Joao Martins [Fri, 12 May 2017 08:46:46 +0000 (09:46 +0100)]
xen-netfront: generalize recycling for grants
Take the existing page-recycling mechanism and extend it to grant
references. The difference is that pages permanently granted to the
backend cannot be revoked (they are mapped by the other side), so they
must go to a separate quarantine pool until they can be consumed again.
The strategy is:
1) get a page by fetching the oldest entry in rx_pool;
2) if it is not granted, the page is freed at the head;
3) if it is reusable, return the page, otherwise add it to the
   quarantine pool;
4) fetch the oldest entry in the quarantine pool; and finally
5) if all else fails, allocate a new page.
In the worst case this adds two atomic read operations to the packet
path when allocating a new page for Rx requests.
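The five steps above can be sketched in plain userspace C. This is only one possible reading of the strategy, not the driver code: struct rx_page, struct page_fifo, rx_get_page() and the fixed POOL_SIZE are hypothetical names, and a plain int stands in for the atomic page refcount. In the worst case the sketch reads a refcount twice (once per pool), which matches the two atomic reads mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8 /* hypothetical; the text ties pool size to ring size */

struct rx_page {
    int refcount;  /* 1 means only the pool still holds a reference */
    bool granted;  /* permanently granted to (and mapped by) the backend */
};

struct page_fifo {
    struct rx_page *slot[POOL_SIZE];
    size_t head, count;
};

static bool fifo_push(struct page_fifo *f, struct rx_page *p)
{
    if (f->count == POOL_SIZE)
        return false; /* a real driver would have to handle a full pool */
    f->slot[(f->head + f->count) % POOL_SIZE] = p;
    f->count++;
    return true;
}

static struct rx_page *fifo_pop(struct page_fifo *f)
{
    struct rx_page *p;

    if (f->count == 0)
        return NULL;
    p = f->slot[f->head];
    f->head = (f->head + 1) % POOL_SIZE;
    f->count--;
    return p;
}

enum rx_source { RX_FROM_POOL, RX_FROM_QUARANTINE, RX_NEEDS_ALLOC };

static enum rx_source rx_get_page(struct page_fifo *rx_pool,
                                  struct page_fifo *quarantine,
                                  struct rx_page **out)
{
    struct rx_page *p;

    assert(out != NULL);

    p = fifo_pop(rx_pool);                 /* step 1: oldest rx_pool entry */
    if (p) {
        if (p->refcount == 1) {            /* step 3: reusable */
            *out = p;
            return RX_FROM_POOL;
        }
        if (!p->granted)
            p->refcount--;                 /* step 2: drop the pool reference */
        else
            fifo_push(quarantine, p);      /* step 3: cannot revoke, park it */
    }

    p = fifo_pop(quarantine);              /* step 4: oldest quarantined page */
    if (p) {
        if (p->refcount == 1) {
            *out = p;
            return RX_FROM_QUARANTINE;
        }
        fifo_push(quarantine, p);          /* still in use: back to the tail */
    }

    *out = NULL;                           /* step 5: caller allocates a page */
    return RX_NEEDS_ALLOC;
}
```

A caller would try rx_get_page() first and fall back to the page allocator only when it returns RX_NEEDS_ALLOC.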
This page reuse strategy allows us to remove a copy for each page handed
over by the backend, raising guest RX performance to ~42-47 Gbit/s when
testing backend -> frontend. The measured recycling percentage is about
30% on TCP streams if pool size == ring size; with pool size == 2 *
ring size this rises to 80 - 100%. This suggests that bigger ring sizes
should allow for better recycling, which remains to be explored.
The only downside of this approach is that it is not 100% guaranteed that
the Rx requests provided to the backend will be already mapped; in other
words, the backend may need to do a grant copy on 1% of the packets.
This is not the case in full copy mode, where we always reuse the same
grants while copying into new pages for the upper layers.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26107942
Joao Martins [Fri, 12 May 2017 08:46:45 +0000 (09:46 +0100)]
xen-netfront: add rx page statistics
Add three new counters, namely rx_alloc_pages, rx_alloc_failed_pages
and rx_packet_pages, so that we can observe how many packets hit
the recycling path (or otherwise).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26107942
Joao Martins [Fri, 12 May 2017 08:46:44 +0000 (09:46 +0100)]
xen-netfront: introduce rx page recyling
Recycling pages lets us avoid the page allocator when possible, an
approach similar to the one followed by the ixgbe and mlx{4,5} drivers.
Introduce a small buffer pool tracking outstanding pages. We increase the
page refcount by 1 to keep the stack from freeing the page in the upper
layers. Pages attached to in-flight skbs can then be recycled once the
stack has processed N requests: when allocating new Rx requests we
attempt to reuse the oldest page in the pool if and only if
page._refcount is 1. Otherwise we just decrement the refcount (on
free_page).
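The refcount rule can be illustrated with a small userspace model. It is a sketch under assumptions, not the driver code: struct page_stub and the stub_get(), stub_put(), pool_track() and pool_try_recycle() helpers are hypothetical stand-ins for struct page, get_page(), put_page() and the pool logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the refcount matters here. */
struct page_stub {
    int refcount;
    bool freed;
};

static void stub_get(struct page_stub *p)
{
    p->refcount++;
}

static void stub_put(struct page_stub *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->freed = true; /* the page would go back to the allocator */
}

/* The pool takes one extra reference when the page is posted for Rx, so
 * the stack dropping its own reference cannot free the page. */
static void pool_track(struct page_stub *p)
{
    stub_get(p);
}

/* Return the oldest page if the pool holds the only remaining reference.
 * Otherwise drop the pool's reference and report that nothing can be
 * reused; the stack's final put then frees the page. */
static struct page_stub *pool_try_recycle(struct page_stub *oldest)
{
    if (oldest->refcount == 1)
        return oldest;
    stub_put(oldest);
    return NULL;
}
```

With this model, a page the stack has already released comes back from pool_try_recycle(), while a page still attached to an in-flight skb is simply let go.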
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26107942
Joao Martins [Fri, 12 May 2017 08:46:42 +0000 (09:46 +0100)]
xen-netfront: introduce staging gref pools
Grant buffers and allow the backend to permanently map these grants
through the newly added control messages. This only happens if
the backend advertises "feature-staging-grants".
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26107942
Joao Martins [Fri, 12 May 2017 08:46:41 +0000 (09:46 +0100)]
xen-netback: use gref mappings for Tx requests
Introduce grants already mapped (through a control ring request from the
guest) for the TX path, which follows a code path similar to grant mapping.
It starts by checking whether a mapped grant is available for the header
and frag grefs and, if so, sets it in tx_grants. If no gref mapping
is found in the tree for the header, it resorts to grant copy. For the
frags it performs a gref lookup on the mapping table and, if no entry
is found, falls back to grant map/unmap using mmap_pages. When the
skb destructor callback gets called we release the slot and the grant
within the callback to avoid waking up the dealloc thread. As long as there
are no unmaps to be done the dealloc thread will remain inactive.
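The fallback order described above can be summarised as a small decision function. This is a hedged sketch: enum tx_action, struct gref_map and tx_pick_action() are hypothetical names, and a linear search stands in for the per-queue mapping tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum tx_action {
    TX_USE_MAPPED,  /* gref already mapped: use the backing page directly */
    TX_GRANT_COPY,  /* header without a mapping: copy it */
    TX_GRANT_MAP    /* frag without a mapping: map now, unmap later */
};

/* Hypothetical stand-in for the per-queue gref mapping table. */
struct gref_map {
    const uint32_t *grefs;
    size_t n;
};

static bool gref_is_mapped(const struct gref_map *m, uint32_t gref)
{
    size_t i;

    for (i = 0; i < m->n; i++)
        if (m->grefs[i] == gref)
            return true;
    return false;
}

static enum tx_action tx_pick_action(const struct gref_map *m,
                                     uint32_t gref, bool is_header)
{
    assert(m != NULL);
    if (gref_is_mapped(m, gref))
        return TX_USE_MAPPED;
    return is_header ? TX_GRANT_COPY : TX_GRANT_MAP;
}
```

Only the TX_GRANT_MAP outcome leaves an unmap to be done later, which is why the dealloc thread can stay idle when every gref is found in the table.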
Results show an improvement of 46% (3.6 vs 1.24 Mpps, 64-byte packets)
measured with pktgen and of over 48% (28 vs 14.5 Gbit/s) measured
with iperf3 on a 2-queue vif, DomU to Dom0. Measured with sendfile() as
well, throughput goes up further to 35.3 Gbit/s given the lack of a
second copy. Tests were run locally on an Intel Xeon CPU E5-2699 v3 with
HT disabled, Dom0 <-> DomU.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26107942
Joao Martins [Fri, 12 May 2017 08:46:40 +0000 (09:46 +0100)]
xen-netback: use gref mappings for Rx requests
First look up the frontend gref mapping table to see whether
the requested gref is already mapped and has the right permissions.
If so, use that mapping instead.
Results are 2.04 Mpps measured with pktgen (pkt_size 64, burst 1)
with already mapped grants versus half of that with grant copy.
Fundamentally it works in the same way as grants; it just avoids
asking Xen to copy the page, and hence opens room for other
improvements.
For example, with the mapped grefs contention on queue->wq increases,
as kthread_guest_rx goes to sleep more often. We could
alternatively copy the skb in xenvif_start_xmit() instead of going
through the RX kthread. That would only be beneficial if the guest
*only* used mapped grants (either through the copying or the recycling
mechanism); otherwise it would add the significant cost of a grant copy
hypercall per packet.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26107942
Joao Martins [Fri, 12 May 2017 08:46:38 +0000 (09:46 +0100)]
xen-netback: introduce staging grant mappings ops
Introduce support for staging grants which means having a
set of preallocated buffers that get reused over time. This is
negotiated through a couple of xenstore entries in the form of:
These entries hand over a list of `struct xen_ext_gref_alloc` which the
frontend provides (a XEN_PAGE_SIZE region, which fits 512 entries). Each
entry contains the gref and flags to map into a Domain-0
ballooned page, which gets added to a hash table of gref <-> backing
page kept per queue. The frontend can use this to pregrant certain pages
and reuse them for Rx/Tx requests.
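A page holding 512 entries implies 8 bytes per entry when XEN_PAGE_SIZE is 4096. The sketch below assumes a layout of two 32-bit fields, which is a guess consistent with that arithmetic and not the actual definition of struct xen_ext_gref_alloc. The gref_table type and its helpers are likewise hypothetical: a tiny open-addressing table standing in for the per-queue hash table of gref <-> backing page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Assumed entry layout: 4 bytes of gref plus 4 bytes of flags. */
struct ext_gref_entry {
    uint32_t gref;
    uint32_t flags;
};

#define ENTRIES_PER_PAGE (SKETCH_PAGE_SIZE / sizeof(struct ext_gref_entry))

#define TABLE_BUCKETS 64u

struct gref_mapping {
    uint32_t gref;
    uint32_t flags;
    void *page;  /* backing page the gref was mapped into */
    bool used;
};

struct gref_table {
    struct gref_mapping slot[TABLE_BUCKETS];
};

/* Insert or update a mapping, probing linearly on collisions. */
static bool table_insert(struct gref_table *t,
                         const struct ext_gref_entry *e, void *page)
{
    uint32_t i, idx;

    assert(page != NULL);
    for (i = 0; i < TABLE_BUCKETS; i++) {
        idx = (e->gref + i) % TABLE_BUCKETS;
        if (!t->slot[idx].used || t->slot[idx].gref == e->gref) {
            t->slot[idx].gref = e->gref;
            t->slot[idx].flags = e->flags;
            t->slot[idx].page = page;
            t->slot[idx].used = true;
            return true;
        }
    }
    return false; /* table full */
}

/* Return the backing page for a gref, or NULL if it was never staged. */
static void *table_lookup(const struct gref_table *t, uint32_t gref)
{
    uint32_t i, idx;

    for (i = 0; i < TABLE_BUCKETS; i++) {
        idx = (gref + i) % TABLE_BUCKETS;
        if (!t->slot[idx].used)
            return NULL;
        if (t->slot[idx].gref == gref)
            return t->slot[idx].page;
    }
    return NULL;
}
```

A NULL result from table_lookup() corresponds to the case where the backend has to fall back to a regular grant copy or map.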
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26107942
Joao Martins [Fri, 12 May 2017 08:46:37 +0000 (09:46 +0100)]
include/xen: import vendor extension to netif.h
Describe in the protocol headers the extension we're making
with respect to staging grants. The extensions described here
are a middle ground with what is being discussed upstream,
while keeping structures similar (yet differently named)
to those to be proposed upstream. The difference from the upstream
proposal is that there the staging grants are set up through a control
ring; here we do it at xenbus feature negotiation, which is more
maintainable while we keep this code out of tree.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26107942
Arnd Bergmann [Fri, 12 May 2017 08:46:36 +0000 (09:46 +0100)]
xen-netback: fix type mismatch warning
With the latest rework of the xen-netback driver, we get a warning
on ARM about the types passed into min():
drivers/net/xen-netback/rx.c: In function 'xenvif_rx_next_chunk':
include/linux/kernel.h:739:16: error: comparison of distinct pointer types lacks a cast [-Werror]
The reason is that XEN_PAGE_SIZE is not size_t here. There
is no actual bug, and we can easily avoid the warning using the
min_t() macro instead of min().
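The effect of min_t() can be shown with a simplified userspace stand-in. The macro below is an assumption for illustration only: unlike the kernel's version it evaluates its arguments more than once. SKETCH_XEN_PAGE_SIZE and next_chunk_len() are hypothetical names; the constant is deliberately an unsigned long, which need not be the same type as size_t on a 32-bit target.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's min_t(): cast both operands to one
 * type before comparing, so mismatched operand types no longer matter. */
#define sketch_min_t(type, a, b) \
    ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* An unsigned long constant, like a page size built from 1UL << shift. */
#define SKETCH_XEN_PAGE_SIZE 4096UL

/* Length of the next chunk: limited by the bytes left in the packet and by
 * the space left in the current page. */
static size_t next_chunk_len(size_t offset_in_page, size_t bytes_left)
{
    assert(offset_in_page < SKETCH_XEN_PAGE_SIZE);
    return sketch_min_t(size_t, bytes_left,
                        SKETCH_XEN_PAGE_SIZE - offset_in_page);
}
```

The result is the same as with a type-checked min(); only the explicit cast to a common type is new.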
Fixes: eb1723a29b9a ("xen-netback: refactor guest rx") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f112be65fd3964ec2d56ddd0d5e6061b0fd502da) Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If a VIF has been ready for rx_stall_timeout (60s by default) and an
Rx ring is drained of all requests an Rx stall will be incorrectly
detected. When this occurs and the guest Rx queue is empty, the Rx
ring's event index will not be set and the frontend will not raise an
event when new requests are placed on the ring, permanently stalling
the VIF.
This is a regression introduced by eb1723a29b9a7 (xen-netback:
refactor guest rx).
Fix this by reinstating the setting of queue->last_rx_time when
placing a packet onto the guest Rx ring.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d1ef006dc116bf6487426b0b50c1bf2bf51e6423) Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ross Lagerwall [Fri, 12 May 2017 08:46:34 +0000 (09:46 +0100)]
xen/netback: add fraglist support for to-guest rx
This allows full 64K skbuffs (with 1500 mtu ethernet, composed of 45
fragments) to be handled by netback for to-guest rx.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
[re-based] Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2167ca029c2449018314fdf8637c1eb3f123036e) Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Fri, 12 May 2017 08:46:33 +0000 (09:46 +0100)]
xen-netback: batch copies for multiple to-guest rx packets
Instead of flushing the copy ops when a packet is complete, complete
packets when their copy ops are done. This improves performance by
reducing the number of grant copy hypercalls.
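The saving can be made concrete with a toy model that counts simulated hypercalls. Everything here is an assumption for illustration: struct copy_batch and the helper names are hypothetical, a hypercall is modelled as a counter increment, and the batch size of 64 is taken from the guest rx refactor described further down in this log.

```c
#include <assert.h>
#include <stddef.h>

#define COPY_BATCH_SIZE 64u

struct copy_batch {
    unsigned int pending;     /* copy ops queued but not yet issued */
    unsigned int hypercalls;  /* simulated grant copy hypercalls */
};

/* Issue all pending ops in one simulated hypercall. */
static void batch_flush(struct copy_batch *b)
{
    if (b->pending == 0)
        return;
    b->hypercalls++;
    b->pending = 0;
}

/* Queue one copy op, flushing first if the batch is already full. */
static void batch_add(struct copy_batch *b)
{
    if (b->pending == COPY_BATCH_SIZE)
        batch_flush(b);
    b->pending++;
}

/* Old behaviour: flush whenever a packet is complete. */
static unsigned int hypercalls_flush_per_packet(const unsigned int *ops,
                                                size_t n)
{
    struct copy_batch b = { 0, 0 };
    size_t i;
    unsigned int j;

    for (i = 0; i < n; i++) {
        for (j = 0; j < ops[i]; j++)
            batch_add(&b);
        batch_flush(&b);
    }
    return b.hypercalls;
}

/* New behaviour: flush only when the batch fills up or the work runs out;
 * packets are completed once their ops have been flushed. */
static unsigned int hypercalls_batched(const unsigned int *ops, size_t n)
{
    struct copy_batch b = { 0, 0 };
    size_t i;
    unsigned int j;

    for (i = 0; i < n; i++)
        for (j = 0; j < ops[i]; j++)
            batch_add(&b);
    batch_flush(&b);
    return b.hypercalls;
}
```

For many small packets the batched variant needs far fewer simulated hypercalls, while for packets that already fill a batch the two variants converge.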
Latency is still limited by the relatively small size of the copy
batch.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[re-based] Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a37f12298c251a48bc74d4012e07bf0d78175f46) Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Fri, 12 May 2017 08:46:32 +0000 (09:46 +0100)]
xen-netback: process guest rx packets in batches
Instead of only placing one skb on the guest rx ring at a time, process
a batch of up to 64. This improves performance by ~10% in some tests.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[re-based] Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 98f6d57ced73b723551568262019f1d6c8771f20) Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Fri, 12 May 2017 08:46:31 +0000 (09:46 +0100)]
xen-netback: immediately wake tx queue when guest rx queue has space
When an skb is removed from the guest rx queue, immediately wake the
tx queue, instead of after processing them.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[re-based] Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7c0b1a23e6f983fe392c8ffa71d05189ae52ebb5) Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Fri, 12 May 2017 08:46:30 +0000 (09:46 +0100)]
xen-netback: refactor guest rx
Refactor the to-guest (rx) path to:
1. Push responses for completed skbs earlier, reducing latency.
2. Reduce the per-queue memory overhead by greatly reducing the
maximum number of grant copy ops in each hypercall (from 4352 to
64). Each struct xenvif_queue is now only 44 kB instead of 220 kB.
3. Make the code more maintainable.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[re-based] Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit eb1723a29b9a75dd787510a39096a68dba6cc200)
Paul Durrant [Fri, 12 May 2017 08:46:29 +0000 (09:46 +0100)]
xen-netback: retire guest rx side prefix GSO feature
As far as I am aware only very old Windows network frontends make use of
this style of passing GSO packets from backend to frontend. These
frontends can easily be replaced by the freely available Xen Project
Windows PV network frontend, which uses the 'default' mechanism for
passing GSO packets, which is also used by all Linux frontends.
NOTE: Removal of this feature will not cause breakage in old Windows
frontends. They simply will no longer receive GSO packets - the
packets instead being fragmented in the backend.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fedbc8c132bcf836358103195d8b6df6c03d9daf) Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Paul Durrant [Fri, 12 May 2017 08:46:28 +0000 (09:46 +0100)]
xen-netback: separate guest side rx code into separate module
The netback source module has become very large and somewhat confusing.
This patch simply moves all code related to the backend to frontend (i.e.
guest side rx) data-path into a separate rx source module.
This patch contains no functional change, it is code movement and
minimal changes to avoid patch style-check issues.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3254f83694fe519ac18b8334a2f481d80c3a8a3a)
Joao Martins [Mon, 15 May 2017 16:51:10 +0000 (17:51 +0100)]
x86/xen/time: setup secondary time info for vdso
In order to support the pvclock vdso on Xen we need to set up the
time info page for each vcpu and register those pages with Xen
using the VCPUOP_register_vcpu_time_memory_area hypercall. This
hypercall will also forcefully update the pvti, which sets
some of the flags necessary for vdso. Afterwards we check whether it
supports the PVCLOCK_TSC_STABLE_BIT flag, which is mandatory for
vdso/vsyscall support. If so, we set the cpu
pvti's that will later be used when mapping the vdso image.
Note that before setting up vdso we check for PVCLOCK_TSC_STABLE_BIT
in the primary vcpu_info and, if it is supported, add this flag
to the supported pvclock flags. This allows the Xen clocksource
to be faster irrespective of how the pvclock vdso pages are set up,
and speeds up pvclock_clocksource_read() users.
The xen headers are also updated to include the new hypercall for
registering the secondary vcpu_time_info copy.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26107942
Konrad Rzeszutek Wilk [Wed, 8 Feb 2017 19:24:18 +0000 (14:24 -0500)]
Merge branch 'topic/uek-4.1/4.10-xen' of git://ca-git.us.oracle.com/linux-bostrovs-public into topic/uek-4.1/xen
* 'topic/uek-4.1/4.10-xen' of git://ca-git.us.oracle.com/linux-bostrovs-public: (49 commits)
xen: events: Replace BUG() with BUG_ON()
xen: remove stale xs_input_avail() from header
xen: return xenstore command failures via response instead of rc
xen: xenbus driver must not accept invalid transaction ids
xen/evtchn: use rb_entry()
xen/setup: Don't relocate p2m over existing one
xen/balloon: Only mark a page as managed when it is released
xen/scsifront: don't request a slot on the ring until request is ready
xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
x86: Make E820_X_MAX unconditionally larger than E820MAX
xen/pci: Bubble up error and fix description.
xen: xenbus: set error code on failure
xen: set error code on failures
xen/events: use xen_vcpu_id mapping for EVTCHNOP_status
xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing
tpm xen: Remove bogus tpm_chip_unregister
xen-scsifront: Add a missing call to kfree
xenfs: Use proc_create_mount_point() to create /proc/xen
xen-netback: fix error handling output
xen: make use of xenbus_read_unsigned() in xenbus
...
OraBug: 25497392 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Paul Durrant [Thu, 12 May 2016 13:43:03 +0000 (14:43 +0100)]
xen-netback: fix extra_info handling in xenvif_tx_err()
Patch 562abd39 "xen-netback: support multiple extra info fragments
passed from frontend" contained a mistake which can result in an
incorrect number of responses being generated when handling errors
encountered when processing packets containing extra info fragments.
This patch fixes the problem.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reported-by: Jan Beulich <JBeulich@suse.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 72eec92accabe3ec34f27a9d3cd459bf5a877c33) Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 25445336 Tested-by: Majid Valiollahzadeh <majid.valiollahzadeh@oracle.com>
Juergen Gross [Thu, 22 Dec 2016 07:19:47 +0000 (08:19 +0100)]
xen: return xenstore command failures via response instead of rc
When the xenbus driver does some special handling for a Xenstore
command any error condition related to the command should be returned
via an error response instead of letting the related write operation
fail. Otherwise the user land handler might take wrong decisions
assuming the connection to Xenstore is broken.
While at it try to return the same error values xenstored would
return for those cases.
Juergen Gross [Thu, 22 Dec 2016 07:19:46 +0000 (08:19 +0100)]
xen: xenbus driver must not accept invalid transaction ids
When accessing Xenstore in a transaction the user is specifying a
transaction id which he normally obtained from Xenstore when starting
the transaction. Xenstore is validating a transaction id against all
known transaction ids of the connection the request came in on. As all
requests of a domain not being the one where Xenstore lives share
one connection, validation of transaction ids of different users of
Xenstore in that domain should be done by the kernel of that domain
being the multiplexer between the Xenstore users in that domain and
Xenstore.
In order to prohibit one Xenstore user "hijacking" a transaction from
another user the xenbus driver has to verify a given transaction id
against all known transaction ids of the user before forwarding it to
Xenstore.
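The check can be pictured as a per-user list of transaction ids that is consulted before a request is forwarded. This is a sketch, not the xenbus driver code: struct xs_user, the helper names, the fixed list size and the -ENOENT return value are assumptions. Treating id 0 as "no transaction" follows the usual Xenstore convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRANSACTIONS 16 /* hypothetical per-user limit */

/* One Xenstore user (for example one open file handle) and the ids of
 * the transactions it started itself. */
struct xs_user {
    uint32_t tx_ids[MAX_TRANSACTIONS];
    size_t n;
};

static bool user_add_transaction(struct xs_user *u, uint32_t id)
{
    assert(id != 0);
    if (u->n == MAX_TRANSACTIONS)
        return false;
    u->tx_ids[u->n++] = id;
    return true;
}

static bool user_owns_transaction(const struct xs_user *u, uint32_t id)
{
    size_t i;

    for (i = 0; i < u->n; i++)
        if (u->tx_ids[i] == id)
            return true;
    return false;
}

/* Decide whether a request may be forwarded to Xenstore: 0 if so, a
 * negative errno if the id belongs to somebody else or is unknown. */
static int check_request(const struct xs_user *u, uint32_t tx_id)
{
    if (tx_id == 0) /* not part of any transaction */
        return 0;
    return user_owns_transaction(u, tx_id) ? 0 : -ENOENT;
}
```

Two users sharing one connection then cannot touch each other's transactions even though Xenstore itself would accept either id.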
Ross Lagerwall [Mon, 12 Dec 2016 14:35:13 +0000 (14:35 +0000)]
xen/setup: Don't relocate p2m over existing one
When relocating the p2m, take special care not to relocate it so
that it overlaps with the current location of the p2m/initrd. This is
needed since the full extent of the current location is not marked as a
reserved region in the e820.
This was seen to happen to a dom0 with a large initial p2m and a small
reserved region in the middle of the initial p2m.
Ross Lagerwall [Fri, 9 Dec 2016 17:10:22 +0000 (17:10 +0000)]
xen/balloon: Only mark a page as managed when it is released
Only mark a page as managed when it is released back to the allocator.
This ensures that the managed page count does not get falsely increased
when a VM is running. Correspondingly change it so that pages are
marked as unmanaged after getting them from the allocator.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 25497392
Juergen Gross [Fri, 2 Dec 2016 06:15:45 +0000 (07:15 +0100)]
xen/scsifront: don't request a slot on the ring until request is ready
Instead of requesting a new slot on the ring to the backend early, do
so only after all has been set up for the request to be sent. This
makes error handling easier as we don't need to undo the request id
allocation and ring slot allocation.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 25497392
Alex Thorlton [Mon, 5 Dec 2016 17:49:14 +0000 (11:49 -0600)]
xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
On systems with sufficiently large e820 tables, and several IOAPICs, it
is possible for the XENMEM_machine_memory_map callback (and its
counterpart, XENMEM_memory_map) to attempt to return an e820 table with
more than 128 entries. This callback adds entries to the BIOS-provided
e820 table to account for IOAPIC registers, which, on sufficiently large
systems, can result in an e820 table that is too large to copy back into
xen_e820_map.
This change simply increases the size of xen_e820_map to E820_X_MAX to
ensure that there is enough room to store the entire e820 map returned
from this callback.
Alex Thorlton [Mon, 5 Dec 2016 17:49:13 +0000 (11:49 -0600)]
x86: Make E820_X_MAX unconditionally larger than E820MAX
It's really not necessary to limit E820_X_MAX to 128 in the non-EFI
case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that
E820_X_MAX is always at least slightly larger than E820MAX.
The real motivation behind this is actually to prevent some issues in
the Xen kernel, where the XENMEM_machine_memory_map hypercall can
produce an e820 map larger than 128 entries, even on systems where the
original e820 table was quite a bit smaller than that, depending on how
many IOAPICs are installed on the system.
Konrad Rzeszutek Wilk [Tue, 6 Dec 2016 14:28:21 +0000 (09:28 -0500)]
xen/pci: Bubble up error and fix description.
The function is never called under PV guests, and only shows up
when MSI (or MSI-X) cannot be allocated. Convert the message
to include the error value.
Pan Bian [Mon, 5 Dec 2016 08:22:22 +0000 (16:22 +0800)]
xen: xenbus: set error code on failure
Variable err is initialized with 0. As a result, the return value may
be 0 even if get_zeroed_page() fails to allocate memory. This patch fixes
the bug, initializing err with "-ENOMEM".
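The pattern being fixed is easy to reproduce outside the kernel. In this sketch setup_ring() and the two allocator helpers are hypothetical, and calloc() stands in for get_zeroed_page(); the point is only that the error variable must start out as -ENOMEM so that the early exit on allocation failure reports an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator that always fails, to exercise the error path. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* Allocator that returns zeroed memory, like get_zeroed_page(). */
static void *zeroed_alloc(size_t n)
{
    return calloc(1, n);
}

static int setup_ring(size_t size, void *(*alloc)(size_t), void **out)
{
    int err = -ENOMEM; /* with "err = 0" the failure below returned 0 */
    void *page;

    assert(out != NULL);
    page = alloc(size);
    if (!page)
        goto out;

    *out = page;
    err = 0;
out:
    return err;
}
```

Calling setup_ring() with failing_alloc yields -ENOMEM, whereas an initial value of 0 would have reported success without a page.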
Pan Bian [Mon, 5 Dec 2016 08:23:05 +0000 (16:23 +0800)]
xen: set error code on failures
Variable rc is reset in the loop, and its value will be non-negative
during the second and subsequent iterations of the loop. If it fails to allocate
memory then, it may return a non-negative integer, which indicates no
error. This patch fixes the bug, assigning "-ENOMEM" to rc when
kzalloc() or alloc_page() returns NULL, and removing the initialization
of rc outside of the loop.
Boris Ostrovsky [Mon, 21 Nov 2016 14:56:06 +0000 (09:56 -0500)]
xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing
Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to
NUMA balancing") set VM_IO flag to prevent grant maps from being
subjected to NUMA balancing.
It was discovered recently that this flag causes get_user_pages() to
always fail with -EFAULT.
Seth Forshee [Mon, 14 Nov 2016 11:12:56 +0000 (11:12 +0000)]
xenfs: Use proc_create_mount_point() to create /proc/xen
Mounting proc in user namespace containers fails if the xenbus
filesystem is mounted on /proc/xen because this directory fails
the "permanently empty" test. proc_create_mount_point() exists
specifically to create such mountpoints in proc but is currently
proc-internal. Export this interface to modules, then use it in
xenbus when creating /proc/xen.
Arnd Bergmann [Thu, 10 Nov 2016 08:55:42 +0000 (09:55 +0100)]
xen-netback: fix error handling output
The connect function prints an uninitialized error code after an
earlier initialization was removed:
drivers/net/xen-netback/xenbus.c: In function 'connect':
drivers/net/xen-netback/xenbus.c:938:3: error: 'err' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This prints it as -EINVAL instead, which seems to be the most
appropriate error code. Before the patch that caused the warning,
this would print a positive number returned by vsscanf() instead,
which is also wrong. We probably don't need a backport though,
as fixing the warning here should be sufficient.
Fixes: f95842e7a9f2 ("xen: make use of xenbus_read_unsigned() in xen-netback") Fixes: 8d3d53b3e433 ("xen-netback: Add support for multiple queues") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 25497392
Juergen Gross [Mon, 31 Oct 2016 13:58:42 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xenbus
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
Juergen Gross [Mon, 31 Oct 2016 13:58:41 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xen-pciback
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
Juergen Gross [Mon, 31 Oct 2016 13:58:41 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xen-fbfront
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.
Juergen Gross [Mon, 31 Oct 2016 13:58:41 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xen-pcifront
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.
Juergen Gross [Mon, 31 Oct 2016 13:58:41 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xen-netfront
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of some reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.
Juergen Gross [Mon, 31 Oct 2016 13:58:41 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xen-netback
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of some reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.
Cc: wei.liu2@citrix.com Cc: paul.durrant@citrix.com Cc: netdev@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
(cherry picked from commit f95842e7a9f235ef3b7d6d4b70fee2244149f1e7) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
drivers/net/xen-netback/xenbus.c
Juergen Gross [Mon, 31 Oct 2016 13:58:40 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xen-kbdfront
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.
Juergen Gross [Mon, 31 Oct 2016 13:58:40 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xen-tpmfront
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of one read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.
Juergen Gross [Mon, 31 Oct 2016 13:58:40 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xen-blkfront
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of some reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.
Juergen Gross [Mon, 31 Oct 2016 13:58:40 +0000 (14:58 +0100)]
xen: make use of xenbus_read_unsigned() in xen-blkback
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of one read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.
Juergen Gross [Mon, 31 Oct 2016 13:58:40 +0000 (14:58 +0100)]
xen: introduce xenbus_read_unsigned()
There are multiple instances of code reading an optional unsigned
parameter from Xenstore via xenbus_scanf(). Instead of repeating the
same code over and over add a service function doing the job.
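The idea of such a service function can be sketched in userspace. read_unsigned_or_default() is a hypothetical analogue, not the kernel helper: it parses a string instead of reading a Xenstore node, and a NULL string models a node that does not exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse an optional unsigned value; fall back to default_val when the
 * value is missing or does not start with a number. */
static unsigned int read_unsigned_or_default(const char *value,
                                             unsigned int default_val)
{
    unsigned int val;

    if (value == NULL || sscanf(value, "%u", &val) != 1)
        return default_val;
    return val;
}
```

A caller that previously had to check a scanf-style return code and assign a default by hand can collapse both steps into one call.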
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
Dongli Zhang [Wed, 2 Nov 2016 01:04:33 +0000 (09:04 +0800)]
xen-netfront: cast grant table reference first to type int
IS_ERR_VALUE() in commit 87557efc27f6a50140fb20df06a917f368ce3c66
("xen-netfront: do not cast grant table reference to signed short") would
not return true for error code unless we cast ref first to type int.
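Why the cast matters can be shown with fixed-width integers. The sketch models an IS_ERR_VALUE()-style check on a 64-bit kernel using uint64_t; sketch_is_err_value() and the two wrappers are hypothetical names, and the 32-bit unsigned type stands in for a grant reference that holds a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* True if x lies in the top 4095 values of the 64-bit range, which is
 * where negative errno values end up after conversion. */
static bool sketch_is_err_value(uint64_t x)
{
    return x >= (uint64_t)-SKETCH_MAX_ERRNO;
}

/* Without the cast the 32-bit value is zero-extended, so 0xffffffe4
 * becomes 0x00000000ffffffe4 and is not seen as an error. */
static bool ref_is_error_without_cast(uint32_t ref)
{
    return sketch_is_err_value(ref);
}

/* Casting to a signed 32-bit int first makes the widening sign-extend,
 * so the value lands in the error range. The unsigned-to-signed
 * conversion is implementation-defined in C but wraps on the usual
 * compilers. */
static bool ref_is_error_with_int_cast(uint32_t ref)
{
    return sketch_is_err_value((uint64_t)(int64_t)(int32_t)ref);
}
```

A valid small reference such as 42 is not reported as an error by either variant; only the detection of stored negative errno values differs.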
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OraBug: 25497392
Juergen Gross [Tue, 11 Oct 2016 11:34:16 +0000 (13:34 +0200)]
xenbus: advertise control feature flags
The Xen docs specify several flags which a guest can set to advertise
which values of the xenstore control/shutdown key it will recognize.
This patch adds code to write all the relevant feature-flag keys.
Based-on-patch-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
Support the driver_override scheme introduced with commit 782a985d7af2
("PCI: Introduce new device binding path using pci_dev.driver_override")
As pcistub_probe() is called for all devices (it has to check for a
match based on the slot address rather than device type) it has to
check for driver_override set to "pciback" itself.
Up to now for assigning a pci device to pciback you need something like:
The Xen pciback driver has a list of all pci devices it is ready to
seize. There is no check whether a to be added entry already exists.
While this might be no problem in the common case it might confuse
those which consume the list via sysfs.
Modify the handling of this list by not adding an entry which already
exists. As this will be needed later split out the list handling into
a separate function.
Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
The Xen pciback driver maintains a list of all its seized devices.
There are two functions searching the list for a specific device with
basically the same semantics just returning different structures in
case of a match.
Split out the search function.
Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
Colin Ian King [Mon, 12 Sep 2016 10:20:46 +0000 (11:20 +0100)]
x86/xen: add missing \n at end of printk warning message
The message is missing a \n, add it.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
xen-netfront: avoid packet loss when ethernet header crosses page boundary
Small packet loss is reported on complex multi host network configurations
including tunnels, NAT, ... My investigation led me to the following check
in netback which drops packets:
But this check itself is legitimate. SKBs consist of a linear part (which
has to have the ethernet header) and (optionally) a number of frags.
Netfront transmits the head of the linear part up to the page boundary
as the first request and all the rest becomes frags so when we're
reconstructing the SKB in netback we can't distinguish between original
frags and the 'tail' of the linear part. The first SKB needs to be at
least ETH_HLEN size. So in case we have an SKB with its linear part
starting too close to the page boundary the packet is lost.
I see two ways to fix the issue:
- Change the 'wire' protocol between netfront and netback to start keeping
the original SKB structure. We'll have to add a flag indicating the fact
that the particular request is a part of the original linear part and not
a frag. We'll need to know the length of the linear part to pre-allocate
memory.
- Avoid transmitting SKBs with linear parts starting too close to the page
boundary. That seems preferable short-term and shouldn't bring
significant performance degradation as such packets are rare. That's what
this patch is trying to achieve with skb_copy().
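The condition behind the second option can be sketched in userspace as follows. This is an illustration only: the 4096-byte page size and the helper names are assumptions, and the real driver works on an skb rather than a bare offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */
#define SKETCH_ETH_HLEN  14u    /* Ethernet header length in bytes */

/* Bytes of the linear area that fit before the next page boundary,
 * given the offset of the data pointer within its page. */
static size_t bytes_to_page_end(size_t offset_in_page)
{
    return SKETCH_PAGE_SIZE - (offset_in_page % SKETCH_PAGE_SIZE);
}

/* True when the first request would carry less than a full Ethernet
 * header, i.e. the case in which the packet would have to be copied to
 * a new buffer before being transmitted. */
static bool needs_copy_before_xmit(size_t offset_in_page)
{
    return bytes_to_page_end(offset_in_page) < SKETCH_ETH_HLEN;
}
```

Only the last 13 byte offsets of a page trigger the copy under these assumptions, which is consistent with the claim that such packets are rare.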
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OraBug: 25497392
Markus Elfring [Thu, 25 Aug 2016 11:23:06 +0000 (13:23 +0200)]
xen/grant-table: Use kmalloc_array() in arch_gnttab_valloc()
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus reuse the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
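Both points can be illustrated with a userspace sketch. alloc_array() below is a hypothetical stand-in that checks the multiplication for overflow before allocating; it is not the kernel's kmalloc_array(), and struct entry is an invented element type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for an overflow-checked array allocation: returns
 * NULL when n * size would wrap, instead of allocating a buffer that is
 * smaller than the caller expects. */
static void *alloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return malloc(n * size);
}

struct entry { uint64_t frame; };  /* hypothetical element type */

static struct entry *alloc_entries(size_t n)
{
    struct entry *e;

    /* sizeof(*e) keeps the element size tied to the pointer's type,
     * so changing the type of e cannot leave a stale sizeof behind. */
    e = alloc_array(n, sizeof(*e));
    return e;
}
```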
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
Juergen Gross [Tue, 2 Aug 2016 07:22:12 +0000 (09:22 +0200)]
xen: Make VPMU init message look less scary
The default for the Xen hypervisor is to not enable VPMU in order to
avoid security issues. In this case the Linux kernel will issue the
message "Could not initialize VPMU for cpu 0, error -95" which looks
more like an error than a normal state.
Change the message to something less scary in case the hypervisor
returns EOPNOTSUPP or ENOSYS when trying to activate VPMU.
Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
Juergen Gross [Tue, 2 Aug 2016 06:53:36 +0000 (08:53 +0200)]
xen: rename xen_pmu_init() in sys-hypervisor.c
There are two functions with name xen_pmu_init() in the kernel. Rename
the one in drivers/xen/sys-hypervisor.c to avoid shadowing the one in
arch/x86/xen/pmu.c
To avoid the same problem in future rename some more functions.
Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
Petr Tesarik [Tue, 2 Aug 2016 21:06:19 +0000 (14:06 -0700)]
kexec: allow kdump with crash_kexec_post_notifiers
If a crash kernel is loaded, do not crash the running domain. This is
needed if the kernel is loaded with crash_kexec_post_notifiers, because
panic notifiers are run before __crash_kexec() in that case, and this
Xen hook prevents its being called later.
[akpm@linux-foundation.org: build fix: unconditionally include kexec.h] Link: http://lkml.kernel.org/r/20160713122000.14969.99963.stgit@hananiah.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OraBug: 25497392
(cherry picked from commit c0253115968c35f3e1ee497282efb75ccf29fb98) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
arch/x86/xen/enlighten.c
Jan Beulich [Fri, 8 Jul 2016 12:15:07 +0000 (06:15 -0600)]
xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7
As of Xen 4.7 PV CPUID doesn't expose either of CPUID[1].ECX[7] and
CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on
both Intel and AMD systems. Doing any kind of hardware capability
checks in the driver as a prerequisite was wrong anyway: With the
hypervisor being in charge, all such checking should be done by it. If
ACPI data gets uploaded despite some missing capability, the hypervisor
is free to ignore part or all of that data.
Ditch the entire check_prereq() function, and do the only valid check
(xen_initial_domain()) in the caller in its place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
Boris Ostrovsky [Wed, 2 Dec 2015 17:10:48 +0000 (12:10 -0500)]
xen: Resume PMU from non-atomic context
Resuming PMU currently triggers a warning from ___might_sleep() (assuming
CONFIG_DEBUG_ATOMIC_SLEEP is set) when xen_pmu_init() allocates GFP_KERNEL
page because we are in state resembling atomic context.
Move resuming PMU to xen_arch_resume() which is called in regular context.
For symmetry move suspending PMU to xen_arch_suspend() as well.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> # 4.3 Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OraBug: 25497392
David Vrabel [Fri, 9 Dec 2016 14:41:13 +0000 (14:41 +0000)]
xenbus: fix deadlock on writes to /proc/xen/xenbus
/proc/xen/xenbus does not work correctly. A read blocked waiting for
a xenstore message holds the mutex needed for atomic file position
updates. This blocks any writes on the same file handle, which can
deadlock if the write is needed to unblock the read.
Clear FMODE_ATOMIC_POS when opening this device to always get
character-device-like semantics.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
Orabug: 25425387
(cherry picked from commit 581d21a2d02a798ee34e56dbfa13f891b3a90c30)
Jira: OCC-36718 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:36 +0000 (17:56 +0200)]
x86/acpi: store ACPI ids from MADT for future usage
Currently we don't save ACPI ids (unlike LAPIC ids which go to
x86_cpu_to_apicid) from MADT and we may need this information later.
Particularly, ACPI ids are the only existing way for a PVHVM Xen guest
to figure out Xen's idea of its vCPUs ids before these CPUs boot and
in some cases these ids diverge from Linux's cpu ids.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 3e9e57fad3d8530aa30787f861c710f598ddc4e7) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Filipe Manco [Thu, 15 Sep 2016 15:10:46 +0000 (17:10 +0200)]
xen-netback: fix error handling on netback_probe()
In case of error during netback_probe() (e.g. an entry missing on the
xenstore) netback_remove() is called on the new device, which will set
the device backend state to XenbusStateClosed by calling
set_backend_state(). However, the backend state wasn't initialized by
netback_probe() at this point, which will cause an invalid transition
and set_backend_state() to BUG().
Initialize the backend state at the beginning of netback_probe() to
XenbusStateInitialising, and create two new valid state transitions on
set_backend_state(), from XenbusStateInitialising to XenbusStateClosed,
and from XenbusStateInitialising to XenbusStateInitWait.
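As a rough sketch of the two added transitions only: the enum and function names below are local to this example, and the remaining transitions handled by set_backend_state() are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the backend states mentioned above; names are local to
 * this sketch. */
enum sketch_state {
    SKETCH_INITIALISING,
    SKETCH_INIT_WAIT,
    SKETCH_CLOSED,
};

/* True for the two transitions out of the initial state that the text
 * describes as newly valid. Returns false for everything this sketch
 * does not model, which says nothing about the real driver's other
 * transitions. */
static bool is_new_initial_transition(enum sketch_state from,
                                      enum sketch_state to)
{
    if (from != SKETCH_INITIALISING)
        return false;
    return to == SKETCH_CLOSED || to == SKETCH_INIT_WAIT;
}
```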
Signed-off-by: Filipe Manco <filipe.manco@neclab.eu> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cce94483e47e8e3d74cf4475dea33f9fd4b6ad9f) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
We pass xen_vcpu_id mapping information to hypercalls which require
uint32_t type so it would be cleaner to have it as uint32_t. The
initializer to -1 can be dropped as we always do the mapping before using
it and we never check the 'not set' value anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 55467dea2967259f21f4f854fc99d39cc5fea60e) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Mon, 15 Aug 2016 15:02:38 +0000 (09:02 -0600)]
xenbus: don't look up transaction IDs for ordinary writes
This should really only be done for XS_TRANSACTION_END messages, or
else at least some of the xenstore-* tools don't work anymore.
Fixes: 0beef634b8 ("xenbus: don't BUG() on user mode induced condition") Reported-by: Richard Schütz <rschuetz@uni-koblenz.de> Cc: <stable@vger.kernel.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Richard Schütz <rschuetz@uni-koblenz.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 9a035a40f7f3f6708b79224b86c5777a3334f7ea) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Bob Liu [Wed, 27 Jul 2016 09:42:04 +0000 (17:42 +0800)]
xen-blkfront: free resources if xlvbd_alloc_gendisk fails
Current code forgets to free resources in the failure path of
xlvbd_alloc_gendisk(); this patch fixes it.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 4e876c2bd37fbb5c37a4554a79cf979d486f0e82) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
xen: add static initialization of steal_clock op to xen_time_ops
pv_time_ops might be overwritten with xen_time_ops after the
steal_clock operation has been initialized already. To prevent calling
a now uninitialized function pointer add the steal_clock static
initialization to xen_time_ops.
Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit d34c30cc1fa80f509500ff192ea6bc7d30671061) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:43 +0000 (17:56 +0200)]
xen/pvhvm: run xen_vcpu_setup() for the boot CPU
Historically we didn't call VCPUOP_register_vcpu_info for CPU0 for
PVHVM guests (while we had it for PV and ARM guests). This is usually
fine as we can use vcpu info in the shared_info page but when we try
booting on a vCPU with Xen's vCPU id > 31 (e.g. when we try to kdump
after crashing on this CPU) we're not able to boot.
Switch to always doing VCPUOP_register_vcpu_info for the boot CPU.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ee42d665d3f5db975caf87baf101a57235ddb566) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:42 +0000 (17:56 +0200)]
xen/evtchn: use xen_vcpu_id mapping
Use the newly introduced xen_vcpu_id mapping to get Xen's idea of vCPU
id for CPU0.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit cbbb4682394c45986a34d8c77a02e7a066e30235) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:41 +0000 (17:56 +0200)]
xen/events: fifo: use xen_vcpu_id mapping
EVTCHNOP_init_control has vCPU id as a parameter and Xen's idea of
vCPU id should be used. Use the newly introduced xen_vcpu_id mapping
to convert it from Linux's id.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit be78da1cf43db4c1a9e13af8b6754199a89d5d75) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:40 +0000 (17:56 +0200)]
xen/events: use xen_vcpu_id mapping in events_base
EVTCHNOP_bind_ipi and EVTCHNOP_bind_virq pass vCPU id as a parameter
and Xen's idea of vCPU id should be used. Use the newly introduced
xen_vcpu_id mapping to convert it from Linux's id.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 8058c0b897e7d1ba5c900cb17eb82aa0d88fca53) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:39 +0000 (17:56 +0200)]
x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
shared_info page has space for 32 vcpu info slots for first 32 vCPUs
but these are the first 32 vCPUs from Xen's perspective and we should
map them accordingly with the newly introduced xen_vcpu_id mapping.
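A small sketch of the slot selection, assuming the 32-slot limit stated above. The function name and the -1 return value for "no embedded slot" are choices made for this example, not taken from the kernel.

```c
#include <assert.h>

#define SKETCH_LEGACY_SLOTS 32u  /* vcpu_info slots in shared_info, per the text */

/* Returns the shared_info slot index for a CPU, or -1 when Xen's vCPU id
 * falls outside the embedded slots. The argument is Xen's id for the
 * CPU, not Linux's CPU number. */
static int shared_info_slot(unsigned int xen_id)
{
    if (xen_id < SKETCH_LEGACY_SLOTS)
        return (int)xen_id;
    return -1;
}
```

The point of the change is which id is used as the index: a CPU that Linux numbers 0 but Xen numbers 40 has no embedded slot, even though its Linux number would suggest otherwise.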
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e15a8621935cac527b4e0ed4078d24c3e5ef73a6) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:38 +0000 (17:56 +0200)]
x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
HYPERVISOR_vcpu_op() passes Linux's idea of vCPU id as a parameter
while Xen's idea is expected. In some cases these ideas diverge so we
need to do remapping.
Convert all callers of HYPERVISOR_vcpu_op() to use xen_vcpu_nr().
Leave xen_fill_possible_map() and xen_filter_cpu_maps() intact as
they're only being called by PV guests before percpu areas are
initialized. While the issue could be solved by switching to
early_percpu for xen_vcpu_id I think it's not worth it: PV guests will
probably never get to the point where their idea of vCPU id diverges
from Xen's.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ad5475f9faf5186b7f59de2c6481ee3e211f1ed7) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:37 +0000 (17:56 +0200)]
xen: introduce xen_vcpu_id mapping
It may happen that Xen's and Linux's ideas of vCPU id diverge. In
particular, when we crash on a secondary vCPU we may want to do kdump
and unlike plain kexec where we do migrate_to_reboot_cpu() we try
booting on the vCPU which crashed. This doesn't work very well for
PVHVM guests as we have a number of hypercalls where we pass vCPU id
as a parameter. These hypercalls either fail or do something
unexpected.
To solve the issue introduce percpu xen_vcpu_id mapping. ARM and PV
guests get direct mapping for now. Boot CPU for PVHVM guest gets its
id from CPUID. With secondary CPUs it is a bit trickier.
Currently, we initialize IPI vectors before these CPUs boot
so we can't use CPUID. Use ACPI ids from MADT instead.
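The idea of the mapping can be sketched in userspace as a lookup table from Linux's CPU number to Xen's vCPU id. Everything here is hypothetical: a plain array stands in for the percpu variable, the table size is arbitrary, and the id table passed to map_from_table() stands in for ids obtained from CPUID or the MADT.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8u  /* arbitrary size for the sketch */

/* Linux CPU number -> Xen vCPU id. */
static uint32_t sketch_vcpu_id[SKETCH_NR_CPUS];

/* Direct mapping: Xen's id equals Linux's CPU number. */
static void map_direct(void)
{
    for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        sketch_vcpu_id[cpu] = cpu;
}

/* Overwrite the first n entries with ids supplied by the caller. */
static void map_from_table(const uint32_t *ids, unsigned int n)
{
    for (unsigned int cpu = 0; cpu < n && cpu < SKETCH_NR_CPUS; cpu++)
        sketch_vcpu_id[cpu] = ids[cpu];
}

/* Translate a Linux CPU number into the id to pass to the hypervisor. */
static uint32_t sketch_vcpu_nr(unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    return sketch_vcpu_id[cpu];
}
```

In the kdump case described above, the crash kernel's CPU 0 may be a vCPU that Xen knows under a different id, so the table rather than the CPU number supplies the hypercall argument.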
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 88e957d6e47f1232ad15b21e54a44f1147ea8c1b) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:35 +0000 (17:56 +0200)]
x86/xen: update cpuid.h from Xen-4.7
Update cpuid.h header from xen hypervisor tree to get
XEN_HVM_CPUID_VCPU_ID_PRESENT definition.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit de2f5537b397249e91cafcbed4de64a24818542e) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
David Vrabel [Mon, 11 Jul 2016 14:45:51 +0000 (15:45 +0100)]
xen/evtchn: add IOCTL_EVTCHN_RESTRICT
IOCTL_EVTCHN_RESTRICT limits the file descriptor to being able to bind
to interdomain event channels from a specific domain. Event channels
that are already bound continue to work for sending and receiving
notifications.
This is useful as part of deprivileging a user space PV backend or
device model (QEMU). For example, once the device model has bound to the
ioreq server event channels it can restrict the file handle so an
exploited DM cannot use it to create or bind to arbitrary event
channels.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
(cherry picked from commit fbc872c38c8fed31948c85683b5326ee5ab9fccc) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Thu, 7 Jul 2016 07:38:13 +0000 (01:38 -0600)]
xen-blkback: really don't leak mode property
Commit 9d092603cc ("xen-blkback: do not leak mode property") left one
path unfixed; correct this.
Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit aea305e11f7a7af12aa2beb7c7e053a338659c49) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Thu, 7 Jul 2016 07:38:58 +0000 (01:38 -0600)]
xen-blkback: constify instance of "struct attribute_group"
The functions these get passed to have been taking pointers to const
since at least 2.6.16.
Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 530439484d2d9f2a7f1038b1afd3d3543ecc63f6) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Thu, 7 Jul 2016 08:05:46 +0000 (02:05 -0600)]
xen-blkfront: prefer xenbus_scanf() over xenbus_gather()
... for single items being collected: It is more typesafe (as the
compiler can check format string and to-be-written-to variable match)
and requires one less parameter to be passed.
Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit ff595325ed556fb4b83af5b9ffd5c427c18405d7) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Thu, 7 Jul 2016 08:05:21 +0000 (02:05 -0600)]
xen-blkback: prefer xenbus_scanf() over xenbus_gather()
... for single items being collected: It is more typesafe (as the
compiler can check format string and to-be-written-to variable match)
and requires one less parameter to be passed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 6694389af9be4d1eb8d3313788a902f0590fb8c2) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Paul Gortmaker [Thu, 14 Jul 2016 00:18:59 +0000 (20:18 -0400)]
x86/xen: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.
Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20160714001901.31603-7-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7a2463dcacee3f2f36c78418c201756372eeea6b) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Sat, 9 Jul 2016 00:35:30 +0000 (17:35 -0700)]
Input: xen-kbdfront - prefer xenbus_write() over xenbus_printf() where possible
... as being the simpler variant.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
(cherry picked from commit cd6763be8f553c7db421d38ddcb36466fb8512cd) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Up to now reading the stolen time of a remote cpu was not possible in a
performant way under Xen. This made support of runqueue steal time via
paravirt_steal_rq_enabled impossible.
With the addition of an appropriate hypervisor interface this is now
possible, so add the support.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6ba286ad845799b135e5af73d1fbc838fa79f709) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Wed, 6 Jul 2016 07:00:14 +0000 (01:00 -0600)]
xen-pciback: drop superfluous variables
req_start is simply an alias of the "offset" function parameter, and
req_end is being used just once in each function. (And both variables
were loop invariant anyway, so should at least have got initialized
outside the loop.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 1ad6344acfbf19288573b4a5fa0b07cbb5af27d7) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Wed, 6 Jul 2016 06:59:35 +0000 (00:59 -0600)]
xen-pciback: short-circuit read path used for merging write values
There's no point calling xen_pcibk_config_read() here - all it'll do is
return whatever conf_space_read() returns for the field which was found
here (and which would be found there again). Also there's no point
clearing tmp_val before the call.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ee87d6d0d36d98c550f99274a81841033226e3bf) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Wed, 6 Jul 2016 06:58:58 +0000 (00:58 -0600)]
xen-pciback: use const and unsigned in bar_init()
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 585203609c894db11dea724b743c04d0c9927f39) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Wed, 6 Jul 2016 06:58:19 +0000 (00:58 -0600)]
xen-pciback: simplify determination of 64-bit memory resource
Other than for raw BAR values, flags are properly separated in the
internal representation.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c8670c22e04e4e42e752cc5b53922106b3eedbda) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937
Jan Beulich [Wed, 6 Jul 2016 06:57:43 +0000 (00:57 -0600)]
xen-pciback: fold read_dev_bar() into its now single caller
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6ad2655d87d2d35c1de4500402fae10fe7b30b4a) Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937