Kevin Wolf [Mon, 22 Jul 2019 15:44:27 +0000 (17:44 +0200)]
mirror: Keep mirror_top_bs drained after dropping permissions
mirror_top_bs is currently implicitly drained through its connection to
the source or the target node. However, the drain section for target_bs
ends early after moving mirror_top_bs from src to target_bs, so that
requests can already be restarted while mirror_top_bs is still present
in the chain, but has dropped all permissions and therefore runs into an
assertion failure like this:
Keep mirror_top_bs drained until all graph changes have completed.
Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit d2da5e288a2e71e82866c8fdefd41b5727300124) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Philippe Mathieu-Daudé [Wed, 11 Sep 2019 22:08:49 +0000 (00:08 +0200)]
block/create: Do not abort if a block driver is not available
The 'blockdev-create' QMP command was introduced as experimental
feature in commit b0292b851b8, using the assert() debug call.
It got promoted to 'stable' command in 3fb588a0f2c, but the
assert call was not removed.
Some block drivers are optional, and bdrv_find_format() might
return a NULL value, triggering the assertion.
Stable code is not expected to abort, so return an error instead.
This is easily reproducible when libnfs is not installed:
./configure
[...]
module support no
Block whitelist (rw)
Block whitelist (ro)
libiscsi support yes
libnfs support no
[...]
$ gdb qemu-system-x86_64 core
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0 0x00007ffff510957f in raise () at /lib64/libc.so.6
#1 0x00007ffff50f3895 in abort () at /lib64/libc.so.6
#2 0x00007ffff50f3769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3 0x00007ffff5101a26 in .annobin_assert.c_end () at /lib64/libc.so.6
#4 0x0000555555d7e1f1 in qmp_blockdev_create (job_id=0x555556baee40 "x", options=0x555557666610, errp=0x7fffffffc770) at block/create.c:69
#5 0x0000555555c96b52 in qmp_marshal_blockdev_create (args=0x7fffdc003830, ret=0x7fffffffc7f8, errp=0x7fffffffc7f0) at qapi/qapi-commands-block-core.c:1314
#6 0x0000555555deb0a0 in do_qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false, errp=0x7fffffffc898) at qapi/qmp-dispatch.c:131
#7 0x0000555555deb2a1 in qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false) at qapi/qmp-dispatch.c:174
With this patch applied, QEMU returns a QMP error:
{'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'}
{"id": "x", "error": {"class": "GenericError", "desc": "Block driver 'nfs' not found or not supported"}}
Cc: qemu-stable@nongnu.org Reported-by: Xu Tian <xutian@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit d90d5cae2b10efc0e8d0b3cc91ff16201853d3ba) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Dr. David Alan Gilbert [Wed, 14 Aug 2019 17:55:35 +0000 (18:55 +0100)]
vhost: Fix memory region section comparison
Using memcmp to compare structures wasn't safe,
as I found out on ARM when I was getting falce miscompares.
Use the helper function for comparing the MRSs.
Fixes: ade6d081fc33948e56e6 ("vhost: Regenerate region list from changed sections list") Cc: qemu-stable@nongnu.org Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190814175535.2023-4-dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 3fc4a64cbaed2ddee4c60ddc06740b320e18ab82) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Dr. David Alan Gilbert [Wed, 14 Aug 2019 17:55:34 +0000 (18:55 +0100)]
memory: Provide an equality function for MemoryRegionSections
Provide a comparison function that checks all the fields are the same.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190814175535.2023-3-dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9366cf02e4e31c2a8128904d4d8290a0fad5f888) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Dr. David Alan Gilbert [Wed, 14 Aug 2019 17:55:33 +0000 (18:55 +0100)]
memory: Align MemoryRegionSections fields
MemoryRegionSection includes an Int128 'size' field;
on some platforms the compiler causes an alignment of this to
a 128bit boundary, leaving 8 bytes of dead space.
This deadspace can be filled with junk.
Move the size field to the top avoiding unnecessary alignment.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190814175535.2023-2-dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 44f85d3276397cfa2cfa379c61430405dad4e644) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Daniel P. Berrangé [Wed, 21 Aug 2019 15:14:27 +0000 (16:14 +0100)]
tests: make filemonitor test more robust to event ordering
The ordering of events that are emitted during the rmdir
test have changed with kernel >= 5.3. Semantically both
new & old orderings are correct, so we must be able to
cope with either.
To cope with this, when we see an unexpected event, we
push it back onto the queue and look and the subsequent
event to see if that matches instead.
Tested-by: Peter Xu <peterx@redhat.com> Tested-by: Wei Yang <richardw.yang@linux.intel.com> Tested-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit bf9e0313c27d8e6ecd7f7de3d63e1cb25d8f6311) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Nir Soffer [Tue, 27 Aug 2019 01:05:27 +0000 (04:05 +0300)]
block: posix: Always allocate the first block
When creating an image with preallocation "off" or "falloc", the first
block of the image is typically not allocated. When using Gluster
storage backed by XFS filesystem, reading this block using direct I/O
succeeds regardless of request length, fooling alignment detection.
In this case we fallback to a safe value (4096) instead of the optimal
value (512), which may lead to unneeded data copying when aligning
requests. Allocating the first block avoids the fallback.
Since we allocate the first block even with preallocation=off, we no
longer create images with zero disk size:
Nir Soffer [Tue, 13 Aug 2019 18:21:03 +0000 (21:21 +0300)]
file-posix: Handle undetectable alignment
In some cases buf_align or request_alignment cannot be detected:
1. With Gluster, buf_align cannot be detected since the actual I/O is
done on Gluster server, and qemu buffer alignment does not matter.
Since we don't have alignment requirement, buf_align=1 is the best
value.
2. With local XFS filesystem, buf_align cannot be detected if reading
from unallocated area. In this we must align the buffer, but we don't
know what is the correct size. Using the wrong alignment results in
I/O error.
3. With Gluster backed by XFS, request_alignment cannot be detected if
reading from unallocated area. In this case we need to use the
correct alignment, and failing to do so results in I/O errors.
4. With NFS, the server does not use direct I/O, so both buf_align cannot
be detected. In this case we don't need any alignment so we can use
buf_align=1 and request_alignment=1.
These cases seems to work when storage sector size is 512 bytes, because
the current code starts checking align=512. If the check succeeds
because alignment cannot be detected we use 512. But this does not work
for storage with 4k sector size.
To determine if we can detect the alignment, we probe first with
align=1. If probing succeeds, maybe there are no alignment requirement
(cases 1, 4) or we are probing unallocated area (cases 2, 3). Since we
don't have any way to tell, we treat this as undetectable alignment. If
probing with align=1 fails with EINVAL, but probing with one of the
expected alignments succeeds, we know that we found a working alignment.
Practically the alignment requirements are the same for buffer
alignment, buffer length, and offset in file. So in case we cannot
detect buf_align, we can use request alignment. If we cannot detect
request alignment, we can fallback to a safe value. To use this logic,
we probe first request alignment instead of buf_align.
Here is a table showing the behaviour with current code (the value in
parenthesis is the optimal value).
Case Sector buf_align (opt) request_alignment (opt) result
======================================================================
1 512 512 (1) 512 (512) OK
1 4096 512 (1) 4096 (4096) FAIL
----------------------------------------------------------------------
2 512 512 (512) 512 (512) OK
2 4096 512 (4096) 4096 (4096) FAIL
----------------------------------------------------------------------
3 512 512 (1) 512 (512) OK
3 4096 512 (1) 512 (4096) FAIL
----------------------------------------------------------------------
4 512 512 (1) 512 (1) OK
4 4096 512 (1) 512 (1) OK
Same cases with this change:
Case Sector buf_align (opt) request_alignment (opt) result
======================================================================
1 512 512 (1) 512 (512) OK
1 4096 4096 (1) 4096 (4096) OK
----------------------------------------------------------------------
2 512 512 (512) 512 (512) OK
2 4096 4096 (4096) 4096 (4096) OK
----------------------------------------------------------------------
3 512 4096 (1) 4096 (512) OK
3 4096 4096 (1) 4096 (4096) OK
----------------------------------------------------------------------
4 512 4096 (1) 4096 (1) OK
4 4096 4096 (1) 4096 (1) OK
I tested that provisioning VMs and copying disks on local XFS and
Gluster with 4k bytes sector size work now, resolving bugs [1],[2].
I tested also on XFS, NFS, Gluster with 512 bytes sector size.
Signed-off-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit a6b257a08e3d72219f03e461a52152672fec0612)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Fri, 1 Nov 2019 15:25:10 +0000 (16:25 +0100)]
block/file-posix: Let post-EOF fallocate serialize
The XFS kernel driver has a bug that may cause data corruption for qcow2
images as of qemu commit c8bb23cbdbe32f. We can work around it by
treating post-EOF fallocates as serializing up until infinity (INT64_MAX
in practice).
Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191101152510.11719-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 292d06b925b2787ee6f2430996b95651cae42fce) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Vladimir Sementsov-Ogievskiy [Tue, 4 Jun 2019 16:15:05 +0000 (19:15 +0300)]
block/io: refactor padding
We have similar padding code in bdrv_co_pwritev,
bdrv_co_do_pwrite_zeroes and bdrv_co_preadv. Let's combine and unify
it.
[Squashed in Vladimir's qemu-iotests 077 fix
--Stefan]
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-4-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-4-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 7a3f542fbdfd799be4fa6f8b96dc8c1e6933fce4)
*prereq for 292d06b9 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Vladimir Sementsov-Ogievskiy [Tue, 4 Jun 2019 16:15:04 +0000 (19:15 +0300)]
util/iov: improve qemu_iovec_is_zero
We'll need to check a part of qiov soon, so implement it now.
Optimization with align down to 4 * sizeof(long) is dropped due to:
1. It is strange: it aligns length of the buffer, but where is a
guarantee that buffer pointer is aligned itself?
2. buffer_is_zero() is a better place for optimizations and it has
them.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-3-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-3-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit f76889e7b947d896db51be8a4d9c941c2f70365a)
*prereq for 292d06b9 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Vladimir Sementsov-Ogievskiy [Tue, 4 Jun 2019 16:15:03 +0000 (19:15 +0300)]
util/iov: introduce qemu_iovec_init_extended
Introduce new initialization API, to create requests with padding. Will
be used in the following patch. New API uses qemu_iovec_init_buf if
resulting io vector has only one element, to avoid extra allocations.
So, we need to update qemu_iovec_destroy to support destroying such
QIOVs.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-2-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-2-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit d953169d4840f312d3b9a54952f4a7ccfcb3b311)
*prereq for 292d06b9 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tuguoyi [Fri, 1 Nov 2019 07:37:35 +0000 (07:37 +0000)]
qcow2-bitmap: Fix uint64_t left-shift overflow
There are two issues in In check_constraints_on_bitmap(),
1) The sanity check on the granularity will cause uint64_t
integer left-shift overflow when cluster_size is 2M and the
granularity is BIGGER than 32K.
2) The way to calculate image size that the maximum bitmap
supported can map to is a bit incorrect.
This patch fix it by add a helper function to calculate the
number of bytes needed by a normal bitmap in image and compare
it to the maximum bitmap bytes supported by qemu.
Fixes: 5f72826e7fc62167cf3a Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com>
Message-id: 4ba40cd1e7ee4a708b40899952e49f22@h3c.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 570542ecb11e04b61ef4b3f4d0965a6915232a88) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Fri, 11 Oct 2019 15:28:13 +0000 (17:28 +0200)]
iotests: Add peek_file* functions
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-16-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit fc8ba423ca6b37bee56ec9dc339b44043c39553d)
*prereq for 570542ec Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Mon, 28 Oct 2019 16:18:41 +0000 (17:18 +0100)]
iotests: Add test for 4G+ compressed qcow2 write
Test what qemu-img check says about an image after one has written
compressed data to an offset above 4 GB.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191028161841.1198-3-mreitz@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit b7cd2c11f76d27930f53d3cf26d7b695c78d613b) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Philippe Mathieu-Daudé [Fri, 16 Aug 2019 17:15:03 +0000 (19:15 +0200)]
virtio-blk: Cancel the pending BH when the dataplane is reset
When 'system_reset' is called, the main loop clear the memory
region cache before the BH has a chance to execute. Later when
the deferred function is called, some assumptions that were
made when scheduling them are no longer true when they actually
execute.
This is what happens using a virtio-blk device (fresh RHEL7.8 install):
(gdb) bt
Thread 1 (Thread 0x7f109c17b680 (LWP 10939)):
#0 0x00005604083296d1 in vring_get_region_caches (vq=0x56040a24bdd0) at hw/virtio/virtio.c:227
#1 0x000056040832972b in vring_avail_flags (vq=0x56040a24bdd0) at hw/virtio/virtio.c:235
#2 0x000056040832d13d in virtio_should_notify (vdev=0x56040a240630, vq=0x56040a24bdd0) at hw/virtio/virtio.c:1648
#3 0x000056040832d1f8 in virtio_notify_irqfd (vdev=0x56040a240630, vq=0x56040a24bdd0) at hw/virtio/virtio.c:1662
#4 0x00005604082d213d in notify_guest_bh (opaque=0x56040a243ec0) at hw/block/dataplane/virtio-blk.c:75
#5 0x000056040883dc35 in aio_bh_call (bh=0x56040a243f10) at util/async.c:90
#6 0x000056040883dccd in aio_bh_poll (ctx=0x560409161980) at util/async.c:118
#7 0x0000560408842af7 in aio_dispatch (ctx=0x560409161980) at util/aio-posix.c:460
#8 0x000056040883e068 in aio_ctx_dispatch (source=0x560409161980, callback=0x0, user_data=0x0) at util/async.c:261
#9 0x00007f10a8fca06d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#10 0x0000560408841445 in glib_pollfds_poll () at util/main-loop.c:215
#11 0x00005604088414bf in os_host_main_loop_wait (timeout=0) at util/main-loop.c:238
#12 0x00005604088415c4 in main_loop_wait (nonblocking=0) at util/main-loop.c:514
#13 0x0000560408416b1e in main_loop () at vl.c:1923
#14 0x000056040841e0e8 in main (argc=20, argv=0x7ffc2c3f9c58, envp=0x7ffc2c3f9d00) at vl.c:4578
Fix this by cancelling the BH when the virtio dataplane is stopped.
[This is version of the patch was modified as discussed with Philippe on
the mailing list thread.
--Stefan]
Reported-by: Yihuang Yu <yihyu@redhat.com> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Fixes: https://bugs.launchpad.net/qemu/+bug/1839428 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190816171503.24761-1-philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit ebb6ff25cd888a52a64a9adc3692541c6d1d9a42) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Wed, 14 Aug 2019 12:05:21 +0000 (17:35 +0530)]
scsi: lsi: exit infinite loop while executing script (CVE-2019-12068)
When executing script in lsi_execute_script(), the LSI scsi adapter
emulator advances 's->dsp' index to read next opcode. This can lead
to an infinite loop if the next opcode is empty. Move the existing
loop exit after 10k iterations so that it covers no-op opcodes as
well.
Reported-by: Bugs SysSec <bugs-syssec@rub.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit de594e47659029316bbf9391efb79da0a1a08e08) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Christophe Lyon [Fri, 25 Oct 2019 09:57:11 +0000 (11:57 +0200)]
target/arm: Allow reading flags from FPSCR for M-profile
rt==15 is a special case when reading the flags: it means the
destination is APSR. This patch avoids rejecting
vmrs apsr_nzcv, fpscr
as illegal instruction.
Cc: qemu-stable@nongnu.org Signed-off-by: Christophe Lyon <christophe.lyon@linaro.org>
Message-id: 20191025095711.10853-1-christophe.lyon@linaro.org
[PMM: updated the comment] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 2529ab43b8a05534494704e803e0332d111d8b91) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Vladimir Sementsov-Ogievskiy [Fri, 11 Oct 2019 09:07:07 +0000 (12:07 +0300)]
hbitmap: handle set/reset with zero length
Passing zero length to these functions leads to unpredicted results.
Zero-length set/reset may occur in active-mirror, on zero-length write
(which is unlikely, but not guaranteed to never happen).
Let's just do nothing on zero-length request.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191011090711.19940-2-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit fed33bd175f663cc8c13f8a490a4f35a19756cfe) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Vladimir Sementsov-Ogievskiy [Tue, 6 Aug 2019 15:26:11 +0000 (18:26 +0300)]
util/hbitmap: strict hbitmap_reset
hbitmap_reset has an unobvious property: it rounds requested region up.
It may provoke bugs, like in recently fixed write-blocking mode of
mirror: user calls reset on unaligned region, not keeping in mind that
there are possible unrelated dirty bytes, covered by rounded-up region
and information of this unrelated "dirtiness" will be lost.
Make hbitmap_reset strict: assert that arguments are aligned, allowing
only one exception when @start + @count == hb->orig_size. It's needed
to comfort users of hbitmap_next_dirty_area, which cares about
hb->orig_size.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20190806152611.280389-1-vsementsov@virtuozzo.com>
[Maintainer edit: Max's suggestions from on-list. --js]
[Maintainer edit: Eric's suggestion for aligned macro. --js] Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 48557b138383aaf69c2617ca9a88bfb394fc50ec)
*prereq for fed33bd175f663cc8c13f8a490a4f35a19756cfe Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Fan Yang [Tue, 24 Sep 2019 14:08:29 +0000 (22:08 +0800)]
COLO-compare: Fix incorrect `if` logic
'colo_mark_tcp_pkt' should return 'true' when packets are the same, and
'false' otherwise. However, it returns 'true' when
'colo_compare_packet_payload' returns non-zero while
'colo_compare_packet_payload' is just a 'memcmp'. The result is that
COLO-compare reports inconsistent TCP packets when they are actually
the same.
Fixes: f449c9e549c ("colo: compare the packet based on the tcp sequence number") Cc: qemu-stable@nongnu.org Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Fan Yang <Fan_Yang@sjtu.edu.cn> Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1e907a32b77e5d418538453df5945242e43224fa) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Mikhail Sennikovsky [Fri, 11 Oct 2019 13:58:04 +0000 (15:58 +0200)]
virtio-net: prevent offloads reset on migration
Currently offloads disabled by guest via the VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET
command are not preserved on VM migration.
Instead all offloads reported by guest features (via VIRTIO_PCI_GUEST_FEATURES)
get enabled.
What happens is: first the VirtIONet::curr_guest_offloads gets restored and offloads
are getting set correctly:
#0 qemu_set_offload (nc=0x555556a11400, csum=1, tso4=0, tso6=0, ecn=0, ufo=0) at net/net.c:474
#1 virtio_net_apply_guest_offloads (n=0x555557701ca0) at hw/net/virtio-net.c:720
#2 virtio_net_post_load_device (opaque=0x555557701ca0, version_id=11) at hw/net/virtio-net.c:2334
#3 vmstate_load_state (f=0x5555569dc010, vmsd=0x555556577c80 <vmstate_virtio_net_device>, opaque=0x555557701ca0, version_id=11)
at migration/vmstate.c:168
#4 virtio_load (vdev=0x555557701ca0, f=0x5555569dc010, version_id=11) at hw/virtio/virtio.c:2197
#5 virtio_device_get (f=0x5555569dc010, opaque=0x555557701ca0, size=0, field=0x55555668cd00 <__compound_literal.5>) at hw/virtio/virtio.c:2036
#6 vmstate_load_state (f=0x5555569dc010, vmsd=0x555556577ce0 <vmstate_virtio_net>, opaque=0x555557701ca0, version_id=11) at migration/vmstate.c:143
#7 vmstate_load (f=0x5555569dc010, se=0x5555578189e0) at migration/savevm.c:829
#8 qemu_loadvm_section_start_full (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2211
#9 qemu_loadvm_state_main (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2395
#10 qemu_loadvm_state (f=0x5555569dc010) at migration/savevm.c:2467
#11 process_incoming_migration_co (opaque=0x0) at migration/migration.c:449
However later on the features are getting restored, and offloads get reset to
everything supported by features:
#0 qemu_set_offload (nc=0x555556a11400, csum=1, tso4=1, tso6=1, ecn=0, ufo=0) at net/net.c:474
#1 virtio_net_apply_guest_offloads (n=0x555557701ca0) at hw/net/virtio-net.c:720
#2 virtio_net_set_features (vdev=0x555557701ca0, features=5104441767) at hw/net/virtio-net.c:773
#3 virtio_set_features_nocheck (vdev=0x555557701ca0, val=5104441767) at hw/virtio/virtio.c:2052
#4 virtio_load (vdev=0x555557701ca0, f=0x5555569dc010, version_id=11) at hw/virtio/virtio.c:2220
#5 virtio_device_get (f=0x5555569dc010, opaque=0x555557701ca0, size=0, field=0x55555668cd00 <__compound_literal.5>) at hw/virtio/virtio.c:2036
#6 vmstate_load_state (f=0x5555569dc010, vmsd=0x555556577ce0 <vmstate_virtio_net>, opaque=0x555557701ca0, version_id=11) at migration/vmstate.c:143
#7 vmstate_load (f=0x5555569dc010, se=0x5555578189e0) at migration/savevm.c:829
#8 qemu_loadvm_section_start_full (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2211
#9 qemu_loadvm_state_main (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2395
#10 qemu_loadvm_state (f=0x5555569dc010) at migration/savevm.c:2467
#11 process_incoming_migration_co (opaque=0x0) at migration/migration.c:449
Fix this by preserving the state in saved_guest_offloads field and
pushing out offload initialization to the new post load hook.
Cc: qemu-stable@nongnu.org Signed-off-by: Mikhail Sennikovsky <mikhail.sennikovskii@cloud.ionos.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 7788c3f2e21e35902d45809b236791383bbb613e) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Michael S. Tsirkin [Fri, 11 Oct 2019 13:58:03 +0000 (15:58 +0200)]
virtio: new post_load hook
Post load hook in virtio vmsd is called early while device is processed,
and when VirtIODevice core isn't fully initialized. Most device
specific code isn't ready to deal with a device in such state, and
behaves weirdly.
Add a new post_load hook in a device class instead. Devices should use
this unless they specifically want to verify the migration stream as
it's processed, e.g. for bounds checking.
Cc: qemu-stable@nongnu.org Suggested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mikhail Sennikovsky <mikhail.sennikovskii@cloud.ionos.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1dd713837cac8ec5a97d3b8492d72ce5ac94803c) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Hikaru Nishida [Tue, 15 Oct 2019 01:07:34 +0000 (10:07 +0900)]
ui: Fix hanging up Cocoa display on macOS 10.15 (Catalina)
macOS API documentation says that before applicationDidFinishLaunching
is called, any events will not be processed. However, some events are
fired before it is called in macOS Catalina. This causes deadlock of
iothread_lock in handleEvent while it will be released after the
app_started_sem is posted.
This patch avoids processing events before the app_started_sem is
posted to prevent this deadlock.
Max Reitz [Mon, 14 Oct 2019 15:39:28 +0000 (17:39 +0200)]
mirror: Do not dereference invalid pointers
mirror_exit_common() may be called twice (if it is called from
mirror_prepare() and fails, it will be called from mirror_abort()
again).
In such a case, many of the pointers in the MirrorBlockJob object will
already be freed. This can be seen most reliably for s->target, which
is set to NULL (and then dereferenced by blk_bs()).
Cc: qemu-stable@nongnu.org Fixes: 737efc1eda23b904fbe0e66b37715fb0e5c3e58b Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191014153931.20699-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit f93c3add3a773e0e3f6277e5517583c4ad3a43c2) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Thu, 10 Oct 2019 10:08:58 +0000 (12:08 +0200)]
iotests: Test large write request to qcow2 file
Without HEAD^, the following happens when you attempt a large write
request to a qcow2 file such that the number of bytes covered by all
clusters involved in a single allocation will exceed INT_MAX:
(A) handle_alloc_space() decides to fill the whole area with zeroes and
fails because bdrv_co_pwrite_zeroes() fails (the request is too
large).
(B) If handle_alloc_space() does not do anything, but merge_cow()
decides that the requests can be merged, it will create a too long
IOV that later cannot be written.
(C) Otherwise, all parts will be written separately, so those requests
will work.
In either B or C, though, qcow2_alloc_cluster_link_l2() will have an
overflow: We use an int (i) to iterate over nb_clusters, and then
calculate the L2 entry based on "i << s->cluster_bits" -- which will
overflow if the range covers more than INT_MAX bytes. This then leads
to image corruption because the L2 entry will be wrong (it will be
recognized as a compressed cluster).
Even if that were not the case, the .cow_end area would be empty
(because handle_alloc() will cap avail_bytes and nb_bytes at INT_MAX, so
their difference (which is the .cow_end size) will be 0).
So this test checks that on such large requests, the image will not be
corrupted. Unfortunately, we cannot check whether COW will be handled
correctly, because that data is discarded when it is written to null-co
(but we have to use null-co, because writing 2 GB of data in a test is
not quite reasonable).
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit a1406a9262a087d9ec9627b88da13c4590b61dae)
Conflicts:
tests/qemu-iotests/group
*drop context dep. on tests not in 4.1 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Thu, 10 Oct 2019 10:08:57 +0000 (12:08 +0200)]
qcow2: Limit total allocation range to INT_MAX
When the COW areas are included, the size of an allocation can exceed
INT_MAX. This is kind of limited by handle_alloc() in that it already
caps avail_bytes at INT_MAX, but the number of clusters still reflects
the original length.
This can have all sorts of effects, ranging from the storage layer write
call failing to image corruption. (If there were no image corruption,
then I suppose there would be data loss because the .cow_end area is
forced to be empty, even though there might be something we need to
COW.)
Fix all of it by limiting nb_clusters so the equivalent number of bytes
will not exceed INT_MAX.
Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit d1b9d19f99586b33795e20a79f645186ccbc070f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Wed, 25 Sep 2019 12:16:43 +0000 (14:16 +0200)]
hw/core/loader: Fix possible crash in rom_copy()
Both, "rom->addr" and "addr" are derived from the binary image
that can be loaded with the "-kernel" paramer. The code in
rom_copy() then calculates:
d = dest + (rom->addr - addr);
and uses "d" as destination in a memcpy() some lines later. Now with
bad kernel images, it is possible that rom->addr is smaller than addr,
thus "rom->addr - addr" gets negative and the memcpy() then tries to
copy contents from the image to a bad memory location. This could
maybe be used to inject code from a kernel image into the QEMU binary,
so we better fix it with an additional sanity check here.
Adrian Moreno [Tue, 24 Sep 2019 16:20:44 +0000 (18:20 +0200)]
vhost-user: save features if the char dev is closed
That way the state can be correctly restored when the device is opened
again. This might happen if the backend is restarted.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1738768 Reported-by: Pei Zhang <pezhang@redhat.com> Fixes: 6ab79a20af3a ("do not call vhost_net_cleanup() on running net from char user event") Cc: ddstreet@canonical.com Cc: Michael S. Tsirkin <mst@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Message-Id: <20190924162044.11414-1-amorenoz@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit c6beefd674fff8d41b90365dfccad32e53a5abcb) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Tue, 17 Sep 2019 10:43:42 +0000 (12:43 +0200)]
iotests: Test internal snapshots with -blockdev
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com>
(cherry picked from commit 92b22e7b1789b0e5f20d245706e72eae70dbddce) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Tue, 17 Sep 2019 10:26:23 +0000 (12:26 +0200)]
block/snapshot: Restrict set of snapshot nodes
Nodes involved in internal snapshots were those that were returned by
bdrv_next(), inserted and not read-only. bdrv_next() in turn returns all
nodes that are either the root node of a BlockBackend or monitor-owned
nodes.
With the typical -drive use, this worked well enough. However, in the
typical -blockdev case, the user defines one node per option, making all
nodes monitor-owned nodes. This includes protocol nodes etc. which often
are not snapshottable, so "savevm" only returns an error.
Change the conditions so that internal snapshot still include all nodes
that have a BlockBackend attached (we definitely want to snapshot
anything attached to a guest device and probably also the built-in NBD
server; snapshotting block job BlockBackends is more of an accident, but
a preexisting one), but other monitor-owned nodes are only included if
they have no parents.
This makes internal snapshots usable again with typical -blockdev
configurations.
Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com>
(cherry picked from commit 05f4aced658a02b02d3e89a6c7a2281008fcf26c) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Matthew Rosato [Thu, 26 Sep 2019 14:10:36 +0000 (10:10 -0400)]
s390: PCI: fix IOMMU region init
The fix in dbe9cf606c shrinks the IOMMU memory region to a size
that seems reasonable on the surface, however is actually too
small as it is based against a 0-mapped address space. This
causes breakage with small guests as they can overrun the IOMMU window.
Let's go back to the prior method of initializing iommu for now.
Fixes: dbe9cf606c ("s390x/pci: Set the iommu region size mpcifc request") Cc: qemu-stable@nongnu.org Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reported-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Tested-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reported-by: Stefan Zimmerman <stzi@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <1569507036-15314-1-git-send-email-mjrosato@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit 7df1dac5f1c85312474df9cb3a8fcae72303da62) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Michael Roth [Thu, 12 Sep 2019 23:12:02 +0000 (18:12 -0500)]
roms/Makefile.edk2: don't pull in submodules when building from tarball
Currently the `make efi` target pulls submodules nested under the
roms/edk2 submodule as dependencies. However, when we attempt to build
from a tarball this fails since we are no longer in a git tree.
A preceding patch will pre-populate these submodules in the tarball,
so assume this build dependency is only needed when building from a
git tree.
Cc: Laszlo Ersek <lersek@redhat.com> Cc: Bruce Rogers <brogers@suse.com> Cc: qemu-stable@nongnu.org # v4.1.0 Reported-by: Bruce Rogers <brogers@suse.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20190912231202.12327-3-mdroth@linux.vnet.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
(cherry picked from commit f3e330e3c319160ac04954399b5a10afc965098c) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Michael Roth [Thu, 12 Sep 2019 23:12:01 +0000 (18:12 -0500)]
make-release: pull in edk2 submodules so we can build it from tarballs
The `make efi` target added by 536d2173 is built from the roms/edk2
submodule, which in turn relies on additional submodules nested under
roms/edk2.
The make-release script currently only pulls in top-level submodules,
so these nested submodules are missing in the resulting tarball.
We could try to address this situation more generally by recursively
pulling in all submodules, but this doesn't necessarily ensure the
end-result will build properly (this case also required other changes).
Additionally, due to the nature of submodules, we may not always have
control over how these sorts of things are dealt with, so for now we
continue to handle it on a case-by-case in the make-release script.
Cc: Laszlo Ersek <lersek@redhat.com> Cc: Bruce Rogers <brogers@suse.com> Cc: qemu-stable@nongnu.org # v4.1.0 Reported-by: Bruce Rogers <brogers@suse.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20190912231202.12327-2-mdroth@linux.vnet.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
(cherry picked from commit 45c61c6c23918e3b05ed9ecac5b2328ebae5f774) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Fri, 20 Sep 2019 17:40:39 +0000 (18:40 +0100)]
hw/arm/boot.c: Set NSACR.{CP11,CP10} for NS kernel boots
If we're booting a Linux kernel directly into Non-Secure
state on a CPU which has Secure state, then make sure we
set the NSACR CP11 and CP10 bits, so that Non-Secure is allowed
to access the FPU. Otherwise an AArch32 kernel will UNDEF as
soon as it tries to use the FPU.
It used to not matter that we didn't do this until commit fc1120a7f5f2d4b6, where we implemented actually honouring
these NSACR bits.
The problem only exists for CPUs where EL3 is AArch32; the
equivalent AArch64 trap bits are in CPTR_EL3 and are "0 to
not trap, 1 to trap", so the reset value of the register
permits NS access, unlike NSACR.
Fixes: fc1120a7f5 Fixes: https://bugs.launchpad.net/qemu/+bug/1844597 Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190920174039.3916-1-peter.maydell@linaro.org
(cherry picked from commit ece628fcf69cbbd4b3efb6fbd203af07609467a2) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:43 +0000 (17:20 +0300)]
block/backup: fix backup_cow_with_offload for last cluster
We shouldn't try to copy bytes beyond EOF. Fix it.
Fixes: 9ded4a0114968e Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920142056.12778-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 1048ddf0a32dcdaa952e581bd503d49adad527cc) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:42 +0000 (17:20 +0300)]
block/backup: fix max_transfer handling for copy_range
Of course, QEMU_ALIGN_UP is a typo, it should be QEMU_ALIGN_DOWN, as we
are trying to find aligned size which satisfy both source and target.
Also, don't ignore too small max_transfer. In this case seems safer to
disable copy_range.
Fixes: 9ded4a0114968e Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190920142056.12778-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 981fb5810aa3f68797ee6e261db338bd78857614) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Thu, 24 Oct 2019 14:26:58 +0000 (16:26 +0200)]
qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()
qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which
requires s->lock to be taken to protect its accesses to the refcount
table and refcount blocks. However, nothing in this code path actually
took the lock. This could cause the same cache entry to be used by two
requests at the same time, for different tables at different offsets,
resulting in image corruption.
As it would be preferable to base the detection on consistent data (even
though it's just heuristics), let's take the lock not only around the
qcow2_get_refcount() calls, but around the whole function.
This patch takes the lock in qcow2_co_block_status() earlier and asserts
in qcow2_detect_metadata_preallocation() that we hold the lock.
Fixes: 69f47505ee66afaa513305de0c1895a224e52c45 Cc: qemu-stable@nongnu.org Reported-by: Michael Weiser <michael.weiser@gmx.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 5e9785505210e2477e590e61b1ab100d0ec22b01) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Thu, 24 Oct 2019 14:26:57 +0000 (16:26 +0200)]
coroutine: Add qemu_co_mutex_assert_locked()
Some functions require that the caller holds a certain CoMutex for them
to operate correctly. Add a function so that they can assert the lock is
really held.
Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 944f3d5dd216fcd8cb007eddd4f82dced0a15b3d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The corruption happens when we do a write that
* writes to two or more unallocated clusters at once
* doesn't fully cover the first sector
* doesn't fully cover the last sector
* uses luks encryption
In this case, when allocating the new clusters we COW both areas
prior to the write and after the write, and we encrypt them.
The above mentioned commit accidentally made it so we encrypt the
second COW area using the physical cluster offset of the first area.
The problem is that offset_in_cluster in do_perform_cow_encrypt
can be larger that the cluster size, thus cluster_offset
will no longer point to the start of the cluster at which encrypted
area starts.
Next patch in this series will refactor the code to avoid all these
assumptions.
In the bugreport that was triggered by rebasing a luks image to new,
zero filled base, which lot of such writes, and causes some files
with zero areas to contain garbage there instead.
But as described above it can happen elsewhere as well
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190915203655.21638-2-mlevitsk@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 38e7d54bdc518b5a05a922467304bcace2396945) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
blockjob: update nodes head while removing all bdrv
block_job_remove_all_bdrv() iterates through job->nodes, calling
bdrv_root_unref_child() for each entry. The call to the latter may
reach child_job_[can_]set_aio_ctx(), which will also attempt to
traverse job->nodes, potentially finding entries that where freed
on previous iterations.
To avoid this situation, update job->nodes head on each iteration to
ensure that already freed entries are no longer linked to the list.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1746631 Signed-off-by: Sergio Lopez <slp@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190911100316.32282-1-mreitz@redhat.com Reviewed-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit d876bf676f5e7c6aa9ac64555e48cba8734ecb2f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Tue, 10 Sep 2019 12:41:35 +0000 (14:41 +0200)]
curl: Handle success in multi_check_completion
Background: As of cURL 7.59.0, it verifies that several functions are
not called from within a callback. Among these functions is
curl_multi_add_handle().
curl_read_cb() is a callback from cURL and not a coroutine. Waking up
acb->co will lead to entering it then and there, which means the current
request will settle and the caller (if it runs in the same coroutine)
may then issue the next request. In such a case, we will enter
curl_setup_preadv() effectively from within curl_read_cb().
Calling curl_multi_add_handle() will then fail and the new request will
not be processed.
Fix this by not letting curl_read_cb() wake up acb->co. Instead, leave
the whole business of settling the AIOCB objects to
curl_multi_check_completion() (which is called from our timer callback
and our FD handler, so not from any cURL callbacks).
Reported-by: Natalie Gavrielov <ngavrilo@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1740193 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-7-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit bfb23b480a49114315877aacf700b49453e0f9d9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Tue, 10 Sep 2019 12:41:34 +0000 (14:41 +0200)]
curl: Report only ready sockets
Instead of reporting all sockets to cURL, only report the one that has
caused curl_multi_do_locked() to be called. This lets us get rid of the
QLIST_FOREACH_SAFE() list, which was actually wrong: SAFE foreaches are
only safe when the current element is removed in each iteration. If it
possible for the list to be concurrently modified, we cannot guarantee
that only the current element will be removed. Therefore, we must not
use QLIST_FOREACH_SAFE() here.
Fixes: ff5ca1664af85b24a4180d595ea6873fd3deac57 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-6-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 9abaf9fc474c3dd53e8e119326abc774c977c331) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Tue, 10 Sep 2019 12:41:33 +0000 (14:41 +0200)]
curl: Pass CURLSocket to curl_multi_do()
curl_multi_do_locked() currently marks all sockets as ready. That is
not only inefficient, but in fact unsafe (the loop is). A follow-up
patch will change that, but to do so, curl_multi_do_locked() needs to
know exactly which socket is ready; and that is accomplished by this
patch here.
Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 9dbad87d25587ff640ef878f7b6159fc368ff541) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Tue, 10 Sep 2019 12:41:32 +0000 (14:41 +0200)]
curl: Check completion in curl_multi_do()
While it is more likely that transfers complete after some file
descriptor has data ready to read, we probably should not rely on it.
Better be safe than sorry and call curl_multi_check_completion() in
curl_multi_do(), too, just like it is done in curl_multi_read().
With this change, curl_multi_do() and curl_multi_read() are actually the
same, so drop curl_multi_read() and use curl_multi_do() as the sole FD
handler.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 948403bcb1c7e71dcbe8ab8479cf3934a0efcbb5) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Tue, 10 Sep 2019 12:41:30 +0000 (14:41 +0200)]
curl: Keep pointer to the CURLState in CURLSocket
A follow-up patch will make curl_multi_do() and curl_multi_read() take a
CURLSocket instead of the CURLState. They still need the latter,
though, so add a pointer to it to the former.
Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190910124136.10565-2-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 0487861685294660b23bc146e1ebd5304aa8bbe0) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Lieven [Tue, 10 Sep 2019 15:41:09 +0000 (17:41 +0200)]
block/nfs: tear down aio before nfs_close
nfs_close is a sync call from libnfs and has its own event
handler polling on the nfs FD. Avoid that both QEMU and libnfs
are intefering here.
CC: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 601dc6559725f7a614b6f893611e17ff0908e914) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Alberto Garcia [Fri, 16 Aug 2019 12:17:42 +0000 (15:17 +0300)]
qcow2: Fix the calculation of the maximum L2 cache size
The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
larger than the maximum amount of L2 metadata that the image can have.
For example: with 64 KB clusters the user would need a qcow2 image
with a virtual size of 256 GB in order to have 32 MB of L2 metadata.
Because of that, since commit b749562d9822d14ef69c9eaa5f85903010b86c30
we forbid the L2 cache to become larger than the maximum amount of L2
metadata for the image, calculated using this formula:
The problem with this formula is that the result should be rounded up
to the cluster size because an L2 table on disk always takes one full
cluster.
For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
the last 32 KB of those are not going to be used.
However QEMU rounds the numbers down and only creates 2 cache tables
(128 KB), which is not enough for the image.
A quick test doing 4KB random writes on a 1280 MB image gives me
around 500 IOPS, while with the correct cache size I get 16K IOPS.
Cc: qemu-stable@nongnu.org Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b70d08205b2e4044c529eefc21df2c8ab61b473b) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Johannes Berg [Tue, 3 Sep 2019 20:04:22 +0000 (23:04 +0300)]
libvhost-user: fix SLAVE_SEND_FD handling
It doesn't look like this could possibly work properly since
VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD is defined to 10, but the
dev->protocol_features has a bitmap. I suppose the peer this
was tested with also supported VHOST_USER_PROTOCOL_F_LOG_SHMFD,
in which case the test would always be false, but nevertheless
the code seems wrong.
Use has_feature() to fix this.
Fixes: d84599f56c82 ("libvhost-user: support host notifier") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Message-Id: <20190903200422.11693-1-johannes@sipsolutions.net> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 8726b70b449896f1211f869ec4f608904f027207) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Thu, 22 Aug 2019 13:15:34 +0000 (14:15 +0100)]
target/arm: Don't abort on M-profile exception return in linux-user mode
An attempt to do an exception-return (branch to one of the magic
addresses) in linux-user mode for M-profile should behave like
a normal branch, because linux-user mode is always going to be
in 'handler' mode. This used to work, but we broke it when we added
support for the M-profile security extension in commit d02a8698d7ae2bfed.
In that commit we allowed even handler-mode calls to magic return
values to be checked for and dealt with by causing an
EXCP_EXCEPTION_EXIT exception to be taken, because this is
needed for the FNC_RETURN return-from-non-secure-function-call
handling. For system mode we added a check in do_v7m_exception_exit()
to make any spurious calls from Handler mode behave correctly, but
forgot that linux-user mode would also be affected.
How an attempted return-from-non-secure-function-call in linux-user
mode should be handled is not clear -- on real hardware it would
result in return to secure code (not to the Linux kernel) which
could then handle the error in any way it chose. For QEMU we take
the simple approach of treating this erroneous return the same way
it would be handled on a CPU without the security extensions --
treat it as a normal branch.
The upshot of all this is that for linux-user mode we should never
do any of the bx_excret magic, so the code change is simple.
This ought to be a weird corner case that only affects broken guest
code (because Linux user processes should never be attempting to do
exception returns or NS function returns), except that the code that
assigns addresses in RAM for the process and stack in our linux-user
code does not attempt to avoid this magic address range, so
legitimate code attempting to return to a trampoline routine on the
stack can fall into this case. This change fixes those programs,
but we should also look at restricting the range of memory we
use for M-profile linux-user guests to the area that would be
real RAM in hardware.
Cc: qemu-stable@nongnu.org Reported-by: Christophe Lyon <christophe.lyon@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190822131534.16602-1-peter.maydell@linaro.org Fixes: https://bugs.launchpad.net/qemu/+bug/1840922 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 5e5584c89f36b302c666bc6db535fd3f7ff35ad2) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Tue, 27 Aug 2019 12:19:31 +0000 (13:19 +0100)]
target/arm: Free TCG temps in trans_VMOV_64_sp()
The function neon_store_reg32() doesn't free the TCG temp that it
is passed, so the caller must do that. We got this right in most
places but forgot to free the TCG temps in trans_VMOV_64_sp().
Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190827121931.26836-1-peter.maydell@linaro.org
(cherry picked from commit 342d27581bd3ecdb995e4fc55fcd383cf3242888) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Mon, 2 Sep 2019 19:33:20 +0000 (21:33 +0200)]
iotests: Test blockdev-create for vpc
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit cb73747e1a47b93d3dfdc3f769c576b053916938) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Mon, 2 Sep 2019 19:33:19 +0000 (21:33 +0200)]
iotests: Restrict nbd Python tests to nbd
We have two Python unittest-style tests that test NBD. As such, they
should specify supported_protocols=['nbd'] so they are skipped when the
user wants to test some other protocol.
Furthermore, we should restrict their choice of formats to 'raw'. The
idea of a protocol/format combination is to use some format over some
protocol; but we always use the raw format over NBD. It does not really
matter what the NBD server uses on its end, and it is not a useful test
of the respective format driver anyway.
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 7c932a1d69a6d6ac5c0b615c11d191da3bbe9aa8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Mon, 2 Sep 2019 19:33:18 +0000 (21:33 +0200)]
iotests: Restrict file Python tests to file
Most of our Python unittest-style tests only support the file protocol.
You can run them with any other protocol, but the test will simply
ignore your choice and use file anyway.
We should let them signal that they require the file protocol so they
are skipped when you want to test some other protocol.
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 103cbc771e5660d1f5bb458be80aa9e363547ae0)
Conflicts:
tests/qemu-iotests/257
*drop changes for tests not in 4.1.0 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Mon, 2 Sep 2019 19:33:17 +0000 (21:33 +0200)]
iotests: Add supported protocols to execute_test()
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 88d2aa533a4a1aad44a27c2e6cd5bc5fbcbce7ed) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
John Snow [Mon, 29 Jul 2019 20:35:53 +0000 (16:35 -0400)]
iotests: add testing shim for script-style python tests
Because the new-style python tests don't use the iotests.main() test
launcher, we don't turn on the debugger logging for these scripts
when invoked via ./check -d.
Refactor the launcher shim into new and old style shims so that they
share environmental configuration.
Two cleanup notes: debug was not actually used as a global, and there
was no reason to create a class in an inner scope just to achieve
default variables; we can simply create an instance of the runner with
the values we want instead.
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-14-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 456a2d5ac7641c7e75c76328a561b528a8607a8e)
*prereq for 88d2aa533a Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Mon, 2 Sep 2019 19:33:16 +0000 (21:33 +0200)]
vpc: Return 0 from vpc_co_create() on success
blockdev_create_run() directly uses .bdrv_co_create()'s return value as
the job's return value. Jobs must return 0 on success, not just any
nonnegative value. Therefore, using blockdev-create for VPC images may
currently fail as the vpc driver may return a positive integer.
Because there is no point in returning a positive integer anywhere in
the block layer (all non-negative integers are generally treated as
complete success), we probably do not want to add more such cases.
Therefore, fix this problem by making the vpc driver always return 0 in
case of success.
Suggested-by: Kevin Wolf <kwolf@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 1a37e3124407b5a145d44478d3ecbdb89c63789f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Igor Mammedov [Mon, 2 Sep 2019 12:02:22 +0000 (08:02 -0400)]
x86: do not advertise die-id in query-hotpluggbale-cpus if '-smp dies' is not set
Commit 176d2cda0 (i386/cpu: Consolidate die-id validity in smp context) added
new 'die-id' topology property to CPUs and exposed it via QMP command
query-hotpluggable-cpus, which broke -device/device_add cpu-foo for existing
users that do not support die-id/dies yet. That's would be fine if it happened
to new machine type only but it also happened to old machine types,
which breaks migration from old QEMU to the new one, for example following CLI:
OLD-QEMU -M pc-i440fx-4.0 -smp 1,max_cpus=2 \
-device qemu64-x86_64-cpu,socket-id=1,core-id=0,thread-id
is not able to start with new QEMU, complaining about invalid die-id.
After discovering regression, the patch
"pc: Don't make die-id mandatory unless necessary"
makes die-id optional so old CLI would work.
However it's not enough as new QEMU still exposes die-id via query-hotpluggbale-cpus
QMP command, so the users that started old machine type on new QEMU, using all
properties (including die-id) received from QMP command (as required), won't be
able to start old QEMU using the same properties since it doesn't support die-id.
Fix it by hiding die-id in query-hotpluggbale-cpus for all machine types in case
'-smp dies' is not provided on CLI or -smp dies = 1', in which case smp_dies == 1
and APIC ID is calculated in default way (as it was before DIE support) so we won't
need compat code as in both cases the topology provided to guest via CPUID is the same.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190902120222.6179-1-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit c6c1bb89fb46f3b88f832e654cf5a6f7941aac51) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Markus Armbruster [Thu, 22 Aug 2019 13:38:46 +0000 (15:38 +0200)]
pr-manager: Fix invalid g_free() crash bug
pr_manager_worker() passes its @opaque argument to g_free(). Wrong;
it points to pr_manager_worker()'s automatic @data. Broken when
commit 2f3a7ab39be converted @data from heap- to stack-allocated. Fix
by deleting the g_free().
Fixes: 2f3a7ab39bec4ba8022dc4d42ea641165b004e3e Cc: qemu-stable@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 6b9d62c2a9e83bbad73fb61406f0ff69b46ff6f3) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Fri, 23 Aug 2019 13:03:41 +0000 (15:03 +0200)]
iotests: Test reverse sub-cluster qcow2 writes
This exercises the regression introduced in commit 50ba5b2d994853b38fed10e0841b119da0f8b8e5. On my machine, it has close
to a 50 % false-negative rate, but that should still be sufficient to
test the fix.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit ae6ef0190981a21f2d4bc8dcee7253688f14fae7)
Conflicts:
tests/qemu-iotests/group
*fix context deps on tests not in 4.1.0 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Fri, 23 Aug 2019 13:03:40 +0000 (15:03 +0200)]
block/file-posix: Reduce xfsctl() use
This patch removes xfs_write_zeroes() and xfs_discard(). Both functions
have been added just before the same feature was present through
fallocate():
- fallocate() has supported PUNCH_HOLE for XFS since Linux 2.6.38 (March
2011); xfs_discard() was added in December 2010.
- fallocate() has supported ZERO_RANGE for XFS since Linux 3.15 (June
2014); xfs_write_zeroes() was added in November 2013.
Nowadays, all systems that qemu runs on should support both fallocate()
features (RHEL 7's kernel does).
xfsctl() is still useful for getting the request alignment for O_DIRECT,
so this patch does not remove our dependency on it completely.
Note that xfs_write_zeroes() had a bug: It calls ftruncate() when the
file is shorter than the specified range (because ZERO_RANGE does not
increase the file length). ftruncate() may yield and then discard data
that parallel write requests have written past the EOF in the meantime.
Dropping the function altogether fixes the bug.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: 50ba5b2d994853b38fed10e0841b119da0f8b8e5 Reported-by: Lukáš Doktor <ldoktor@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b2c6f23f4a9f6d8f1b648705cd46d3713b78d6a2) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
/*
* If the toolstack (or unplug request callback) has set the backend
- * state to Closing, but there is no active frontend (i.e. the
- * state is not Connected) then set the backend state to Closed.
+ * state to Closing, but there is no active frontend then set the
+ * backend state to Closed.
*/
if (xendev->backend_state == XenbusStateClosing &&
- xendev->frontend_state != XenbusStateConnected) {
+ !xen_device_state_is_active(state)) {
xen_device_backend_set_state(xendev, XenbusStateClosed);
}
mistakenly replaced the check of 'xendev->frontend_state' with a check
(now in a helper function) of 'state', which actually equates to
'xendev->backend_state'.
This patch fixes the mistake.
Fixes: cb3231460747552d70af9d546dc53d8195bcb796 Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20190910171753.3775-1-paul.durrant@citrix.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit df6180bb56cd03949c2c64083da58755fed81a61) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Anthony PERARD [Fri, 23 Aug 2019 10:15:33 +0000 (11:15 +0100)]
xen-bus: Fix backend state transition on device reset
When a frontend wants to reset its state and the backend one, it
starts with setting "Closing", then waits for the backend (QEMU) to do
the same.
But when QEMU is setting "Closing" to its state, it triggers an event
(xenstore watch) that re-execute xen_device_backend_changed() and set
the backend state to "Closed". QEMU should wait for the frontend to
set "Closed" before doing the same.
Before setting "Closed" to the backend_state, we are also going to
check if there is a frontend. If that the case, when the backend state
is set to "Closing" the frontend should react and sets its state to
"Closing" then "Closed". The backend should wait for that to happen.
Fixes: b6af8926fb858c4f1426e5acb2cfc1f0580ec98a Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Message-Id: <20190823101534.465-2-anthony.perard@citrix.com>
(cherry picked from commit cb3231460747552d70af9d546dc53d8195bcb796) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Eduardo Habkost [Fri, 16 Aug 2019 17:07:50 +0000 (14:07 -0300)]
pc: Don't make die-id mandatory unless necessary
We have this issue reported when using libvirt to hotplug CPUs:
https://bugzilla.redhat.com/show_bug.cgi?id=1741451
Basically, libvirt is not copying die-id from
query-hotpluggable-cpus, but die-id is now mandatory.
We could blame libvirt and say it is not following the documented
interface, because we have this buried in the QAPI schema
documentation:
> Note: currently there are 5 properties that could be present
> but management should be prepared to pass through other
> properties with device_add command to allow for future
> interface extension. This also requires the filed names to be kept in
> sync with the properties passed to -device/device_add.
But I don't think this would be reasonable from us. We can just
make QEMU more flexible and let die-id to be omitted when there's
no ambiguity. This will allow us to keep compatibility with
existing libvirt versions.
Test case included to ensure we don't break this again.
Fixes: commit 176d2cda0dee ("i386/cpu: Consolidate die-id validity in smp context") Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190816170750.23910-1-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit fea374e7c8079563bca7c8fac895c6a880f76adc) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Aurelien Jarno [Thu, 22 Aug 2019 17:45:14 +0000 (10:45 -0700)]
target/alpha: fix tlb_fill trap_arg2 value for instruction fetch
Commit e41c94529740cc26 ("target/alpha: Convert to CPUClass::tlb_fill")
slightly changed the way the trap_arg2 value is computed in case of TLB
fill. The type of the variable used in the ternary operator has been
changed from an int to an enum. This causes the -1 value to not be
sign-extended to 64-bit in case of an instruction fetch. The trap_arg2
ends up with 0xffffffff instead of 0xffffffffffffffff. Fix that by
changing the -1 into -1LL.
This fixes the execution of user space processes in qemu-system-alpha.
Fixes: e41c94529740cc26 Cc: qemu-stable@nongnu.org Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[rth: Test MMU_DATA_LOAD and MMU_DATA_STORE instead of implying them.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit cb1de55a83eaca9ee32be9c959dca99e11f2fea8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
It's not correct to just ignore an error code in a callback; we need to
handle that error and possible report failure to the guest so that they
don't wait indefinitely for an operation that will now never finish.
This ought to help cases reported by Nutanix where iSCSI returns a
legitimate -ECANCELED for certain operations which should be propagated
normally.
Reported-by: Shaju Abraham <shaju.abraham@nutanix.com> Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20190729223605.7163-1-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 8ec41c4265714255d5a138f8b538faf3583dcff6) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Mon, 29 Jul 2019 21:34:16 +0000 (23:34 +0200)]
dma-helpers: ensure AIO callback is invoked after cancellation
dma_aio_cancel unschedules the BH if there is one, which corresponds
to the reschedule_dma case of dma_blk_cb. This can stall the DMA
permanently, because dma_complete will never get invoked and therefore
nobody will ever invoke the original AIO callback in dbs->common.cb.
Fix this by invoking the callback (which is ensured to happen after
a bdrv_aio_cancel_async, or done manually in the dbs->bh case), and
add assertions to check that the DMA state machine is indeed waiting
for dma_complete or reschedule_dma, but never both.
Reported-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20190729213416.1972-1-pbonzini@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 539343c0a47e19d5dd64d846d64d084d9793681f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Here's a very, very last minute pull request for qemu-4.1. This fixes
two nasty bugs with the XIVE interrupt controller in "dual" mode
(where the guest decides which interrupt controller it wants to use).
One occurs when resetting the guest while I/O is active, and the other
with migration of hotplugged CPUs.
The timing here is very unfortunate. Alas, we only spotted these bugs
very late, and I was sick last week, delaying analysis and fix even
further.
This series hasn't had nearly as much testing as I'd really like, but
I'd still like to squeeze it into qemu-4.1 if possible, since
definitely fixing two bad bugs seems like an acceptable tradeoff for
the risk of introducing different bugs.
Cédric Le Goater [Tue, 13 Aug 2019 06:48:53 +0000 (08:48 +0200)]
spapr/xive: Fix migration of hot-plugged CPUs
The migration sequence of a guest using the XIVE exploitation mode
relies on the fact that the states of all devices are restored before
the machine is. This is not true for hot-plug devices such as CPUs
which state come after the machine. This breaks migration because the
thread interrupt context registers are not correctly set.
Fix migration of hotplugged CPUs by restoring their context in the
'post_load' handler of the XiveTCTX model.
Fixes: 277dd3d7712a ("spapr/xive: add migration support for KVM") Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190813064853.29310-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
David Gibson [Tue, 13 Aug 2019 05:59:18 +0000 (15:59 +1000)]
spapr: Reset CAS & IRQ subsystem after devices
This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
caused by the new "dual" interrupt controller model. Specifically,
qemu can crash when used with KVM if a 'system_reset' is requested
while there's active I/O in the guest.
The problem is that in spapr_machine_reset() we:
1. Reset the CAS vector state
spapr_ovec_cleanup(spapr->ov5_cas);
2. Reset all devices
qemu_devices_reset()
3. Reset the irq subsystem
spapr_irq_reset();
However (1) implicitly changes the interrupt delivery mode, because
whether we're using XICS or XIVE depends on the CAS state. We don't
properly initialize the new irq mode until (3) though - in particular
setting up the KVM devices.
During (2), we can temporarily drop the BQL allowing some irqs to be
delivered which will go to an irq system that's not properly set up.
Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
reset will put us back in XICS mode. kvm_kernel_irqchip() still
returns true, because XIVE was using KVM, however XICs doesn't have
its KVM components intialized and kernel_xics_fd == -1. When the irq
is delivered it goes via ics_kvm_set_irq() which assert()s that
kernel_xics_fd != -1.
This change addresses the problem by delaying the CAS reset until
after the devices reset. The device reset should quiesce all the
devices so we won't get irqs delivered while we mess around with the
IRQ. The CAS reset and irq re-initialize should also now be under the
same BQL critical section so nothing else should be able to interrupt
it either.
We also move the spapr_irq_msi_reset() used in one of the legacy irq
modes, since it logically makes sense at the same point as the
spapr_irq_reset() (it's essentially an equivalent operation for older
machine types). Since we don't need to switch between different
interrupt controllers for those old machine types it shouldn't
actually be broken in those cases though.
Cc: Cédric Le Goater <clg@kaod.org> Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend" Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting
XIVE and XICS" Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Gerd Hoffmann [Mon, 12 Aug 2019 06:52:21 +0000 (08:52 +0200)]
display/bochs: fix pcie support
Set QEMU_PCI_CAP_EXPRESS unconditionally in init(), then clear it in
realize() in case the device is not connected to a PCIe bus.
This makes sure the pci config space allocation is big enough, so
accessing the PCIe extended config space doesn't overflow the pci
config space buffer.
PCI(e) config space is guest writable. Writes are limited by
write mask (which probably is also filled with random stuff),
so the guest can only flip enabled bits. But I suspect it
still might be exploitable, so rather serious because it might
be a host escape for the guest. On the other hand the device
is probably not yet in widespread use.
(For a QEMU version without this commit, a mitigation for the
bug is available: use "-device bochs-display" as a conventional pci
device only.)
Cc: qemu-stable@nongnu.org Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190812065221.20907-2-kraxel@redhat.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Cornelia Huck [Tue, 6 Aug 2019 11:58:19 +0000 (13:58 +0200)]
compat: disable edid on virtio-gpu base device
'edid' is a property of the virtio-gpu base device, so turning
it off on virtio-gpu-pci is not enough (it misses -ccw). Turn
it off on the base device instead.
Fixes: 0a71966253c8 ("edid: flip the default to enabled") Signed-off-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190806115819.16026-1-cohuck@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Tue, 6 Aug 2019 12:40:31 +0000 (13:40 +0100)]
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-08-06' into staging
Block patches for 4.1.0-rc4:
- Fix the backup block job when using copy offloading
- Fix the mirror block job when using the write-blocking copy mode
- Fix incremental backups after the image has been grown with the
respective bitmap attached to it
# gpg: Signature made Tue 06 Aug 2019 12:57:07 BST
# gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg: issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* remotes/maxreitz/tags/pull-block-2019-08-06:
block/backup: disable copy_range for compressed backup
iotests: Test unaligned blocking mirror write
mirror: Only mirror granularity-aligned chunks
iotests: Test incremental backup after truncation
util/hbitmap: update orig_size on truncate
iotests: Test backup job with two guest writes
backup: Copy only dirty areas
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Vladimir Sementsov-Ogievskiy [Tue, 30 Jul 2019 16:32:50 +0000 (19:32 +0300)]
block/backup: disable copy_range for compressed backup
Enabled by default copy_range ignores compress option. It's definitely
unexpected for user.
It's broken since introduction of copy_range usage in backup in 9ded4a011496.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190730163251.755248-3-vsementsov@virtuozzo.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Mon, 5 Aug 2019 11:35:26 +0000 (13:35 +0200)]
iotests: Test unaligned blocking mirror write
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190805113526.20319-1-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Mon, 5 Aug 2019 15:33:08 +0000 (17:33 +0200)]
mirror: Only mirror granularity-aligned chunks
In write-blocking mode, all writes to the top node directly go to the
target. We must only mirror chunks of data that are aligned to the
job's granularity, because that is how the dirty bitmap works.
Therefore, the request alignment for writes must be the job's
granularity (in write-blocking mode).
Unfortunately, this forces all reads and writes to have the same
granularity (we only need this alignment for writes to the target, not
the source), but that is something to be fixed another time.
Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190805153308.2657-1-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Fixes: d06107ade0ce74dc39739bac80de84b51ec18546 Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Mon, 5 Aug 2019 12:01:20 +0000 (15:01 +0300)]
util/hbitmap: update orig_size on truncate
Without this, hbitmap_next_zero and hbitmap_next_dirty_area are broken
after truncate. So, orig_size is broken since it's introduction in 76d570dc495c56bb.
Fixes: 76d570dc495c56bb Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190805120120.23585-1-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Thu, 1 Aug 2019 17:39:00 +0000 (19:39 +0200)]
iotests: Test backup job with two guest writes
Perform two guest writes to not yet backed up areas of an image, where
the former touches an inner area of the latter.
Before HEAD^, copy offloading broke this in two ways:
(1) The target image differs from the reference image (what the source
was when the backup started).
(2) But you will not see that in the failing output, because the job
offset is reported as being greater than the job length. This is
because one cluster is copied twice, and thus accounted for twice,
but of course the job length does not increase.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190801173900.23851-3-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Thu, 1 Aug 2019 17:38:59 +0000 (19:38 +0200)]
backup: Copy only dirty areas
The backup job must only copy areas that the copy_bitmap reports as
dirty. This is always the case when using traditional non-offloading
backup, because it copies each cluster separately. When offloading the
copy operation, we sometimes copy more than one cluster at a time, but
we only check whether the first one is dirty.
Therefore, whenever copy offloading is possible, the backup job
currently produces wrong output when the guest writes to an area of
which an inner part has already been backed up, because that inner part
will be re-copied.
Fixes: 9ded4a0114968e98b41494fc035ba14f84cdf700 Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190801173900.23851-2-mreitz@redhat.com Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Olaf Hering [Thu, 30 May 2019 19:28:11 +0000 (21:28 +0200)]
Makefile: remove DESTDIR from firmware file content
The resulting firmware files should only contain the runtime path.
Fixes commit 26ce90fde5c ("Makefile: install the edk2 firmware images
and their descriptors")
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190530192812.17637-1-olaf@aepfle.de> Fixes: https://bugs.launchpad.net/qemu/+bug/1838703 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Peter Maydell [Thu, 1 Aug 2019 10:57:42 +0000 (11:57 +0100)]
target/arm: Avoid bogus NSACR traps on M-profile without Security Extension
In Arm v8.0 M-profile CPUs without the Security Extension and also in
v7M CPUs, there is no NSACR register. However, the code we have to handle
the FPU does not always check whether the ARM_FEATURE_M_SECURITY bit
is set before testing whether env->v7m.nsacr permits access to the
FPU. This means that for a CPU with an FPU but without the Security
Extension we would always take a bogus fault when trying to stack
the FPU registers on an exception entry.
We could fix this by adding extra feature bit checks for all uses,
but it is simpler to just make the internal value of nsacr 0xcff
("all non-secure accesses allowed"), since this is not guest
visible when the Security Extension is not present. This allows
us to continue to follow the Arm ARM pseudocode which takes a
similar approach. (In particular, in the v8.1 Arm ARM the register
is documented as reading as 0xcff in this configuration.)
Dr. David Alan Gilbert [Tue, 30 Jul 2019 09:37:19 +0000 (10:37 +0100)]
pcie_root_port: Disable ACS on older machines
ACS got added in 4.0 unconditionally, that broke older<->4.0 migration
where there was a PCIe root port.
Fix this by turning it off for 3.1 and older machines; note this
fixes compatibility for older QEMUs but breaks compatibility with 4.0
for older machine types.
machine type source qemu dest qemu
3.1 3.1 4.0 broken
3.1 3.1 4.1rc2 broken
3.1 3.1 4.1+this OK ++
3.1 4.0 4.1rc2 OK
3.1 4.0 4.1+this broken --
4.0 4.0 4.1rc2 OK
4.0 4.0 4.1+this OK
So we gain and lose; the consensus seems to be treat this as a
fix for older machine types.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190730093719.12958-3-dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Dr. David Alan Gilbert [Tue, 30 Jul 2019 09:37:18 +0000 (10:37 +0100)]
pcie_root_port: Allow ACS to be disabled
ACS was added in 4.0 unconditionally, this breaks migration
compatibility.
Allow ACS to be disabled by adding a property that's
checked by pcie_root_port.
Unfortunately pcie-root-port doesn't have any instance data,
so there's no where for that flag to live, so stuff it into
PCIESlot.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190730093719.12958-2-dgilbert@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>