Konrad Rzeszutek Wilk [Tue, 6 Nov 2012 14:51:40 +0000 (09:51 -0500)]
Merge branch 'stable/for-linus-3.7.rebased' into uek2-merge
* stable/for-linus-3.7.rebased:
xen/mmu: Use Xen specific TLB flush instead of the generic one.
xen: resynchronise grant table status codes with upstream
xen/privcmd: return -EFAULT on error
xen/privcmd: Fix mmap batch ioctl error status copy back.
xen/privcmd: add PRIVCMD_MMAPBATCH_V2 ioctl
xen/mm: return more precise error from xen_remap_domain_range()
xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer.
xen/swiotlb: Remove functions not needed anymore.
xen: allow privcmd for HVM guests
xen/sysfs: Use XENVER_guest_handle to query UUID
xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static.
xen: missing includes
xen: update xen_add_to_physmap interface
Konrad Rzeszutek Wilk [Wed, 31 Oct 2012 16:38:31 +0000 (12:38 -0400)]
xen/mmu: Use Xen specific TLB flush instead of the generic one.
As Mukesh explained it, MMUEXT_TLB_FLUSH_ALL allows the
hypervisor to do a TLB flush on all active vCPUs. If we instead
use the generic one (which ends up being xen_flush_tlb), we
make the MMUEXT_TLB_FLUSH_LOCAL hypercall. But before we make
that hypercall, the kernel will IPI all of the vCPUs (even those
that are asleep from the hypervisor's perspective). The end result
is that we needlessly wake them up and do a TLB flush when we can
just let the hypervisor do it correctly.
This patch gives around a 50% speed improvement when migrating
idle guests from one host to another.
Merge branch 'stable/for-linus-3.6.rebased' into uek2-merge
* stable/for-linus-3.6.rebased:
xen/boot: Disable BIOS SMP MP table search.
xen/m2p: do not reuse kmap_op->dev_bus_addr
xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.
Ian Campbell [Fri, 14 Sep 2012 07:19:01 +0000 (08:19 +0100)]
xen: resynchronise grant table status codes with upstream
Adds GNTST_address_too_big and GNTST_eagain.
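As a rough illustration of where the two new codes fit, here is a sketch that maps a grant-table status to a message. The numeric values reflect my understanding of Xen's public grant_table.h (GNTST_address_too_big = -11, GNTST_eagain = -12), and the message strings are modeled on the kernel's message table. The helper gnttab_status_str() is hypothetical, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Grant-table status codes (assumed to match Xen's public grant_table.h). */
#define GNTST_okay               (0)
#define GNTST_general_error     (-1)
#define GNTST_bad_domain        (-2)
#define GNTST_bad_gntref        (-3)
#define GNTST_bad_handle        (-4)
#define GNTST_bad_virt_addr     (-5)
#define GNTST_bad_dev_addr      (-6)
#define GNTST_no_device_space   (-7)
#define GNTST_permission_denied (-8)
#define GNTST_bad_page          (-9)
#define GNTST_bad_copy_arg     (-10)
#define GNTST_address_too_big  (-11) /* newly resynchronised */
#define GNTST_eagain           (-12) /* newly resynchronised */

/* Messages indexed by -status. */
static const char *const gntst_msgs[] = {
    "okay",
    "undefined error",
    "unrecognised domain id",
    "invalid grant reference",
    "invalid mapping handle",
    "invalid virtual address",
    "invalid device address",
    "no spare translation slot in the I/O MMU",
    "permission denied",
    "bad page",
    "copy arguments cross page boundary",
    "page address size too large",
    "operation not done; try again",
};

/* Compile-time check that the table covers every status code. */
static_assert(sizeof(gntst_msgs) / sizeof(gntst_msgs[0]) == 1 - GNTST_eagain,
              "one message per status code");

/* Hypothetical helper: message for a status, or NULL if unknown. */
static const char *gnttab_status_str(int status)
{
    const int count = (int)(sizeof(gntst_msgs) / sizeof(gntst_msgs[0]));

    /* Valid statuses are 0, -1, ..., -(count - 1). */
    if (status > 0 || status <= -count)
        return NULL;
    return gntst_msgs[-status];
}
```

Without the resync, a table sized for the older codes would return NULL (or worse, index out of bounds without the check) for -11 and -12.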
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit e58f5b55113b8fd4eb8eb43f5508d87e4862f280)
Dan Carpenter [Sat, 8 Sep 2012 09:57:35 +0000 (12:57 +0300)]
xen/privcmd: return -EFAULT on error
__copy_to_user() returns the number of bytes remaining to be copied but
we want to return a negative error code here.
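The bug pattern is easy to reproduce outside the kernel. In the sketch below, a hypothetical fake_copy_to_user() mimics the __copy_to_user() contract (it returns the number of bytes it could not copy). The fixed caller translates any non-zero remainder into -EFAULT instead of passing the byte count back as if it were an error code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for __copy_to_user(): copies at most 'limit'
 * bytes and, like the kernel helper, returns the number NOT copied.
 */
static unsigned long fake_copy_to_user(void *dst, const void *src,
                                       unsigned long n, unsigned long limit)
{
    unsigned long done = n < limit ? n : limit;

    memcpy(dst, src, done);
    return n - done;
}

/* Buggy: returns a positive byte count to a caller expecting 0 or -errno. */
static long copy_back_buggy(void *dst, const void *src,
                            unsigned long n, unsigned long limit)
{
    return (long)fake_copy_to_user(dst, src, n, limit);
}

/* Fixed: any partial copy is reported as -EFAULT. */
static long copy_back_fixed(void *dst, const void *src,
                            unsigned long n, unsigned long limit)
{
    if (fake_copy_to_user(dst, src, n, limit))
        return -EFAULT;
    return 0;
}
```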
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 9d2be9287107695708e6aae5105a8a518a6cb4d0)
Andres Lagar-Cavilla [Thu, 6 Sep 2012 17:24:39 +0000 (13:24 -0400)]
xen/privcmd: Fix mmap batch ioctl error status copy back.
Copying back the per-slot error codes is only necessary for V2. V1 does not
provide an error array, so the copy back would unconditionally set the global
rc to -EFAULT. Only copy back for V2.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 1714df7f2cee6a741c3ed24231ec5db25b90633a)
Andres Lagar-Cavilla [Fri, 31 Aug 2012 13:59:30 +0000 (09:59 -0400)]
xen/privcmd: add PRIVCMD_MMAPBATCH_V2 ioctl
PRIVCMD_MMAPBATCH_V2 extends PRIVCMD_MMAPBATCH with an additional
field for reporting the error code for every frame that could not be
mapped. libxc prefers PRIVCMD_MMAPBATCH_V2 over PRIVCMD_MMAPBATCH.
Also extend PRIVCMD_MMAPBATCH to return an appropriate error-encoding top
nibble in the mfn array.
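For V1 callers, the only place to report a per-frame failure is the frame number itself. A minimal sketch of that encoding follows. The two mask values are my recollection of PRIVCMD_MMAPBATCH_MFN_ERROR and PRIVCMD_MMAPBATCH_PAGED_ERROR from the privcmd header and should be checked against it; encode_v1_error() is a hypothetical helper, and -ENOENT as "paged out" follows the xen_remap_domain_range() change in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed values of the privcmd masks; verify against the real header. */
#define MMAPBATCH_MFN_ERROR   0xf0000000U /* frame could not be mapped */
#define MMAPBATCH_PAGED_ERROR 0x80000000U /* frame is paged out: retry */

/*
 * Hypothetical helper: fold a per-frame error into the top nibble of a
 * 32-bit frame number, since V1 has no separate error array.
 * 'err' is 0 on success or a negative errno.
 */
static uint32_t encode_v1_error(uint32_t mfn, int err)
{
    if (err == 0)
        return mfn;
    return mfn | (err == -ENOENT ? MMAPBATCH_PAGED_ERROR
                                 : MMAPBATCH_MFN_ERROR);
}
```

Because the 0x8 nibble is a subset of 0xf, a caller decoding the result should test for the full MFN_ERROR pattern before testing for PAGED_ERROR.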
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit ceb90fa0a8008059ecbbf9114cb89dc71a730bb6)
David Vrabel [Thu, 30 Aug 2012 12:58:11 +0000 (13:58 +0100)]
xen/mm: return more precise error from xen_remap_domain_range()
Callers of xen_remap_domain_range() need to know if the remap failed
because the frame is currently paged out, so that they can retry the
remap later on. Return -ENOENT in this case.
This assumes that the error codes returned by Xen are a subset of
those used by the kernel. It is unclear if this is defined as part of
the hypercall ABI.
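On the caller side, the distinction lets a paged-out frame be retried rather than treated as fatal. The sketch below assumes a hypothetical try_remap() that returns -ENOENT while the frame is still paged out. The retry bound and the absence of any back-off are illustrative choices, not what libxc or the kernel actually does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical frame model: paged out for the first few remap attempts. */
struct fake_frame {
    int paged_in_after; /* failed attempts left before the frame is resident */
    int calls;          /* number of try_remap() invocations */
};

static int try_remap(struct fake_frame *f)
{
    f->calls++;
    if (f->paged_in_after > 0) {
        f->paged_in_after--;
        return -ENOENT; /* paged out: worth retrying later */
    }
    return 0;
}

/* Retry only on -ENOENT; any other result ends the loop immediately. */
static int remap_with_retry(struct fake_frame *f, int max_tries)
{
    int rc = -ENOENT;

    for (int i = 0; i < max_tries && rc == -ENOENT; i++)
        rc = try_remap(f);
    return rc;
}
```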
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 69870a847856a1ba81f655a8633fce5f5b614730)
Konrad Rzeszutek Wilk [Mon, 13 Aug 2012 15:00:08 +0000 (11:00 -0400)]
xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer.
arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer
arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer
Konrad Rzeszutek Wilk [Mon, 13 Aug 2012 17:26:11 +0000 (13:26 -0400)]
xen/swiotlb: Remove functions not needed anymore.
Sparse warns us of:
drivers/xen/swiotlb-xen.c:506:1: warning: symbol 'xen_swiotlb_map_sg' was not declared. Should it be static?
drivers/xen/swiotlb-xen.c:534:1: warning: symbol 'xen_swiotlb_unmap_sg' was not declared. Should it be static?
and it looks like we do not need these functions at all.
Stefano Stabellini [Wed, 22 Aug 2012 16:20:16 +0000 (17:20 +0100)]
xen: allow privcmd for HVM guests
This patch removes the "return -ENOSYS" for auto_translated_physmap
guests from privcmd_mmap, thus it allows ARM guests to issue privcmd
mmap calls. However privcmd mmap calls are still going to fail for HVM
and hybrid guests on x86 because the xen_remap_domain_mfn_range
implementation is currently PV only.
Changes in v2:
- better commit message;
- return -EINVAL from xen_remap_domain_mfn_range if
auto_translated_physmap.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 1a1d43318aeb74d679372c0b65029957be274529)
Daniel De Graaf [Thu, 16 Aug 2012 20:40:26 +0000 (16:40 -0400)]
xen/sysfs: Use XENVER_guest_handle to query UUID
This hypercall has been present since Xen 3.1, and is the preferred
method for a domain to obtain its UUID. Fall back to the xenstore method
if using an older version of Xen (which returns -ENOSYS).
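The fallback logic amounts to asking the hypervisor first and consulting xenstore only when the hypercall is unsupported. In this sketch, query_guest_handle() and read_uuid_from_xenstore() are hypothetical stubs standing in for the XENVER_guest_handle hypercall and the xenstore read. Only -ENOSYS triggers the fallback, so other errors are still reported.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define UUID_LEN 16

/* 0 models a hypervisor older than Xen 3.1. */
static int hypervisor_supports_handle;

/* Hypothetical stub for the XENVER_guest_handle hypercall. */
static int query_guest_handle(unsigned char uuid[UUID_LEN])
{
    if (!hypervisor_supports_handle)
        return -ENOSYS;
    memset(uuid, 0xab, UUID_LEN);
    return 0;
}

/* Hypothetical stub for reading the UUID from xenstore. */
static int read_uuid_from_xenstore(unsigned char uuid[UUID_LEN])
{
    memset(uuid, 0xcd, UUID_LEN);
    return 0;
}

static int get_domain_uuid(unsigned char uuid[UUID_LEN])
{
    int rc = query_guest_handle(uuid);

    if (rc == -ENOSYS) /* older hypervisor: fall back to xenstore */
        rc = read_uuid_from_xenstore(uuid);
    return rc;
}
```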
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 5c13f8067745efc15f6ad0158b58d57c44104c25)
Stefano Stabellini [Mon, 6 Aug 2012 14:27:24 +0000 (15:27 +0100)]
xen: update xen_add_to_physmap interface
Update struct xen_add_to_physmap to be in sync with Xen's version of the
structure.
The size field was introduced by:
changeset: 24164:707d27fe03e7
user: Jean Guyader <jean.guyader@eu.citrix.com>
date: Fri Nov 18 13:42:08 2011 +0000
summary: mm: New XENMEM space, XENMAPSPACE_gmfn_range
According to the comment:
"This new field .size is located in the 16 bits padding between .domid
and .space in struct xen_add_to_physmap to stay compatible with older
versions."
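The compatibility argument can be checked mechanically. A 16-bit field placed right after the 16-bit domid only consumes what used to be alignment padding, so the offset of .space and the overall size stay the same. The sketch assumes domid_t is 16 bits and models xen_ulong_t and xen_pfn_t as unsigned long. The two struct names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t domid_t;

/* Layout before the change: 2 bytes of implicit padding after domid. */
struct add_to_physmap_old {
    domid_t domid;
    unsigned int space;
    unsigned long idx;
    unsigned long gpfn;
};

/* Layout after the change: .size occupies the former padding. */
struct add_to_physmap_new {
    domid_t domid;
    uint16_t size; /* number of pages for XENMAPSPACE_gmfn_range */
    unsigned int space;
    unsigned long idx;
    unsigned long gpfn;
};

/* Every pre-existing field keeps its offset, and the size is unchanged. */
static_assert(offsetof(struct add_to_physmap_old, space) ==
              offsetof(struct add_to_physmap_new, space),
              ".space must not move");
static_assert(sizeof(struct add_to_physmap_old) ==
              sizeof(struct add_to_physmap_new),
              "struct size must not change");
```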
Changes in v2:
- remove erroneous comment in the commit message.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit b58aaa4b0b3506c094308342d746f600468c63d9)
Jan Beulich [Tue, 18 Sep 2012 11:29:03 +0000 (12:29 +0100)]
xen-pciback: support wild cards in slot specifications
Particularly for hiding sets of SR-IOV devices, specifying them all
individually is rather cumbersome. Therefore, allow function and slot
numbers to be replaced by a wildcard character ('*').
Unfortunately this gets complicated by the in-kernel sscanf()
implementation not being fully standards-conformant: matching of
plain text tails cannot be checked by the caller (a patch to overcome
this will be sent shortly, and a follow-up patch simplifying the
code is planned once that fix has gone upstream).
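A userspace sketch of the idea follows. It accepts a "domain:bus:slot.func" string in which slot and function may be '*'. It uses sscanf()'s %n to confirm that the literal ':' tail was actually matched, which is the kind of check the message says the in-kernel sscanf() did not allow. The exact syntax, the range checks, and the WILDCARD marker are assumptions for illustration, not xen-pciback's code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define WILDCARD (-1)

/* Parse a hex field or '*'; return characters consumed, or 0 on error. */
static int parse_field(const char *s, int *val)
{
    unsigned int v;
    int n = 0;

    if (*s == '*') {
        *val = WILDCARD;
        return 1;
    }
    if (!isxdigit((unsigned char)*s))
        return 0;
    if (sscanf(s, "%2x%n", &v, &n) != 1)
        return 0;
    *val = (int)v;
    return n;
}

/* Returns 1 and fills the outputs on success, 0 on any syntax error. */
static int parse_slot(const char *s, int *dom, int *bus, int *slot, int *func)
{
    unsigned int d, b;
    int n = 0, used;

    /* %n is only stored if the trailing literal ':' matched. */
    if (sscanf(s, "%4x:%2x:%n", &d, &b, &n) != 2 || n == 0)
        return 0;
    s += n;

    if (!(used = parse_field(s, slot)) || s[used] != '.')
        return 0;
    s += used + 1;

    /* The function must be the last thing in the string. */
    if (!(used = parse_field(s, func)) || s[used] != '\0')
        return 0;

    if (*slot > 0x1f || *func > 7) /* PCI limits: 32 slots, 8 functions */
        return 0;
    *dom = (int)d;
    *bus = (int)b;
    return 1;
}
```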
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit c3cb4709809e655a4ba5a716086c8bc5bbbbccdb)
As the initial domain we are able to search/map certain regions
of memory to harvest configuration data. For all low-level configuration
we use ACPI tables: for interrupts we use the ACPI _PRT exclusively
(so the DSDT), and the MADT for INT_SRC_OVR.
The SMP MP table is not used at all. As a matter of fact, we do
not even support machines that only have SMP MP but no ACPI tables.
Let's follow how Moorestown does it and just disable searching
for BIOS SMP tables.
This also fixes an issue on HP ProLiant BL680c G5 and DL380 G6:
9f->100 for 1:1 PTE
Freeing 9f-100 pfn range: 97 pages freed
1-1 mapping on 9f->100
.. snip..
e820: BIOS-provided physical RAM map:
Xen: [mem 0x0000000000000000-0x000000000009efff] usable
Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved
Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable
.. snip..
Scan for SMP in [mem 0x00000000-0x000003ff]
Scan for SMP in [mem 0x0009fc00-0x0009ffff]
Scan for SMP in [mem 0x000f0000-0x000fffff]
found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0]
(XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0
(XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e()
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71
. snip..
Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5
.. snip..
Call Trace:
[<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248
[<ffffffff81624731>] ? printk+0x48/0x4a
[<ffffffff81ad32ac>] early_ioremap+0x13/0x15
[<ffffffff81acc140>] get_mpc_size+0x2f/0x67
[<ffffffff81acc284>] smp_scan_config+0x10c/0x136
[<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a
[<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b
[<ffffffff81624731>] ? printk+0x48/0x4a
[<ffffffff81abca7f>] start_kernel+0x90/0x390
[<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136
[<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
The issue is that ioremap would end up mapping 0xff using _PAGE_IOMAP
(which is the flag early_ioremap sets), so we would get MFN 0xff
(PTE ff461, which is OK). It would then also map 0x100 as
_PAGE_IOMAP, because ioremap page-aligns the request: it was trying
to map 0xf4fa0 + PAGE_SIZE, so it mapped the next page as well.
Since 0x100 is actually a RAM page, and _PAGE_IOMAP bypasses the
P2M lookup, we would happily set the PTE to 100461.
Xen would deny the request since we do not have access to
Machine Frame Number (MFN) 0x100. P2M[0x100] is, for example,
0x80140.
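The over-mapping follows from page-granular mapping: any request whose byte range crosses a page boundary pulls in the following frame as well. Below is a small sketch of the arithmetic, assuming 4 KiB pages. The function name is hypothetical, and the sample address and length used with it are hypothetical values chosen only to straddle the 0xff/0x100 boundary.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages */

/* First and last page frame numbers touched by [phys, phys + len). */
static void pfn_span(uint64_t phys, uint64_t len,
                     uint64_t *first, uint64_t *last)
{
    assert(len > 0);
    *first = phys >> PAGE_SHIFT;
    *last = (phys + len - 1) >> PAGE_SHIFT;
}
```

For example, a 0x100-byte request at 0xfffa0 spans PFNs 0xff and 0x100, so mapping it with _PAGE_IOMAP would also cover the RAM page at 0x100. A 0x10-byte request at 0xf4fa0 stays within PFN 0xf4.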
CC: stable@kernel.org
Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit c205ce318b53a73340cad92a14297d213f2f679e)
If the caller passes a valid kmap_op to m2p_add_override, we use
kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is
part of the interface with Xen and if we are batching the hypercalls it
might not have been written by the hypervisor yet. That means that later
on Xen will write to it and we'll think that the original mfn is
actually what Xen has written to it.
Rather than "stealing" struct members from kmap_op, keep using
page->index to store the original mfn and add another parameter to
m2p_remove_override to get the corresponding kmap_op instead.
It is now the responsibility of the caller to keep track of which kmap_op
corresponds to a particular page in the m2p_override (gntdev, the only
user of this interface that passes a valid kmap_op, is already doing that).
CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 2fc136eecd0c647a6b13fcd00d0c41a1a28f35a5)
When doing FLR and saving the PCI config, we did it in the wrong order.
The end result was that if a PCI device was unbound from
its driver, bound to xen-pciback, and then bound back to its
driver, we would get:
Konrad Rzeszutek Wilk [Fri, 17 Aug 2012 20:43:28 +0000 (16:43 -0400)]
xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.
When we have finished returning PFNs to the hypervisor, populating
them back, and marking the E820 MMIO regions and E820 gaps
as IDENTITY_FRAMEs, we call into the P2M code to set the areas that
can be used for ballooning. We were off by one, and ended up
overwriting a P2M entry that most likely was an IDENTITY_FRAME.
For example:
1-1 mapping on 40000->40200
1-1 mapping on bc558->bc5ac
1-1 mapping on bc5b4->bc8c5
1-1 mapping on bc8c6->bcb7c
1-1 mapping on bcd00->100000
Released 614 pages of unused memory
Set 277889 page(s) to 1-1 mapping
Populating 40200-40466 pfn range: 614 pages added
=> here we set from 40466 up to bc559 P2M tree to be
INVALID_P2M_ENTRY. We should have done it up to bc558.
The end result is that anybody trying to construct
a PTE for PFN bc558 ends up with ~PAGE_PRESENT.
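The fix is the usual half-open range discipline. The toy model below uses hypothetical names and a 16-entry array instead of the real P2M tree; the high "identity" bit is only modeled on IDENTITY_FRAME_BIT. It marks [start, end) as invalid, while the buggy variant uses an inclusive bound and clobbers the identity entry at 'end', which is the bc558 situation described above.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_ENTRY (~0UL)
/* Modeled on IDENTITY_FRAME_BIT: a high bit marking a 1:1 entry. */
#define IDENTITY_BIT  (1UL << (sizeof(unsigned long) * 8 - 2))

/* Mark [start, end) invalid: 'end' itself is left untouched. */
static void set_invalid_range(unsigned long *p2m, size_t start, size_t end)
{
    for (size_t pfn = start; pfn < end; pfn++)
        p2m[pfn] = INVALID_ENTRY;
}

/* Buggy variant: the inclusive bound writes one entry too many. */
static void set_invalid_range_buggy(unsigned long *p2m, size_t start,
                                    size_t end)
{
    for (size_t pfn = start; pfn <= end; pfn++)
        p2m[pfn] = INVALID_ENTRY;
}
```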
CC: stable@vger.kernel.org Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit c96aae1f7f393387d160211f60398d58463a7e65)
xen/p2m: Fix one by off error in checking the P2M tree directory.
We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES
inclusive) when trying to figure out whether we can re-use some of the
P2M middle leafs.
This meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
we would try to use the 512th entry. Fortunately for us, p2m_top_index
has a check for this:
Konrad Rzeszutek Wilk [Thu, 16 Aug 2012 20:38:55 +0000 (16:38 -0400)]
xen/p2m: When revectoring deal with holes in the P2M array.
When we free the PFNs and then subsequently populate them back
during bootup:
Freeing 20000-20200 pfn range: 512 pages freed
1-1 mapping on 20000->20200
Freeing 40000-40200 pfn range: 512 pages freed
1-1 mapping on 40000->40200
Freeing bad80-badf4 pfn range: 116 pages freed
1-1 mapping on bad80->badf4
Freeing badf6-bae7f pfn range: 137 pages freed
1-1 mapping on badf6->bae7f
Freeing bb000-100000 pfn range: 282624 pages freed
1-1 mapping on bb000->100000
Released 283999 pages of unused memory
Set 283999 page(s) to 1-1 mapping
Populating 1acb8a-1f20e9 pfn range: 283999 pages added
We end up having the P2M array (that is, the one that was
grafted onto the P2M tree) filled with IDENTITY_FRAME or
INVALID_P2M_ENTRY entries. The patch titled
"xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
recycles said slots: it replaces the P2M tree leafs that point to
&mfn_list[xx] with p2m_identity or p2m_missing, and re-uses those
P2M array sections for other P2M tree leafs.
For the above-mentioned bootup excerpt, the PFNs at
0x20000->0x20200 are going to be IDENTITY based:
P2M[0][256][0] -> P2M[0][257][0] are turned into IDENTITY_FRAMEs.
We can re-use that and make P2M[0][256] point to p2m_identity.
The "old" page (the grafted P2M array provided by Xen) that was at
P2M[0][256] gets put somewhere else, specifically at P2M[6][358],
because when we populate back we fill P2M[6][358][0] (and
P2M[6][358], P2M[6][359], ...) with the new MFNs.
That is all OK, except that when we revector we assume that the PFN
count is the same in the grafted P2M array and in the newly
allocated one. Since that is no longer the case, as we have
holes in the P2M that point to p2m_missing or p2m_identity, we
have to take that into account.
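A toy model of the point being made: once some top-level slots share the p2m_missing or p2m_identity leaf, copying the tree leaf by leaf must skip those shared leaves rather than assume every slot owns its own page of PFNs. The structure below (hypothetical names, 4 leaves of 4 entries) only sketches that accounting; it is not the kernel's xen_revector_p2m_tree().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LEAF_ENTRIES 4
#define TOP_ENTRIES  4

static unsigned long leaf_missing[LEAF_ENTRIES];  /* shared "missing" leaf */
static unsigned long leaf_identity[LEAF_ENTRIES]; /* shared "identity" leaf */

static int is_shared_leaf(const unsigned long *leaf)
{
    return leaf == leaf_missing || leaf == leaf_identity;
}

/*
 * Copy private leaves into 'pool' (room for TOP_ENTRIES leaves) and
 * repoint top[] at the copies. Shared leaves are holes and are left
 * alone, so the pool is consumed only by leaves that really hold PFNs.
 * Returns the number of leaves copied.
 */
static size_t revector(unsigned long *top[TOP_ENTRIES], unsigned long *pool)
{
    size_t copied = 0;

    for (size_t i = 0; i < TOP_ENTRIES; i++) {
        if (is_shared_leaf(top[i]))
            continue; /* a hole: nothing to copy */
        unsigned long *dst = pool + copied * LEAF_ENTRIES;
        memcpy(dst, top[i], sizeof(*dst) * LEAF_ENTRIES);
        top[i] = dst;
        copied++;
    }
    return copied;
}
```

Here the copy is compacted, so the position of a leaf in the new pool no longer matches its top-level index once holes are present.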
[v2: Check for overflow]
[v3: Move within the __va check]
[v4: Fix the computation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 3fc509fc0c590900568ef516a37101d88f3476f5)
Konrad Rzeszutek Wilk [Fri, 17 Aug 2012 14:26:21 +0000 (10:26 -0400)]
Merge branch 'stable/for-linus-3.7.rebased' into uek2-merge
* stable/for-linus-3.7.rebased:
xen/mmu: If the revector fails, don't attempt to revector anything else.
xen/p2m: When revectoring deal with holes in the P2M array.
xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID.
Revert "xen PVonHVM: move shared_info to MMIO before kexec"
xen/mmu: Release just the MFN list, not MFN list and part of pagetables.
Konrad Rzeszutek Wilk [Fri, 17 Aug 2012 13:35:31 +0000 (09:35 -0400)]
xen/mmu: If the revector fails, don't attempt to revector anything else.
If the P2M revectoring failed, we would try to continue on by
cleaning the PMD for L1 (PTE) page-tables. The xen_cleanhighmap
is greedy and erases the PMD on both boundaries. Since the P2M
array can share the PMD, we would wipe out part of the __ka
that is still used in the P2M tree to point to P2M leafs.
This fixes it by bypassing the revectoring and continuing on.
If the revector fails, a nice WARN is printed so we can still
troubleshoot this.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 16 Aug 2012 20:38:55 +0000 (16:38 -0400)]
xen/p2m: When revectoring deal with holes in the P2M array.
When we free the PFNs and then subsequently populate them back
during bootup:
Freeing 20000-20200 pfn range: 512 pages freed
1-1 mapping on 20000->20200
Freeing 40000-40200 pfn range: 512 pages freed
1-1 mapping on 40000->40200
Freeing bad80-badf4 pfn range: 116 pages freed
1-1 mapping on bad80->badf4
Freeing badf6-bae7f pfn range: 137 pages freed
1-1 mapping on badf6->bae7f
Freeing bb000-100000 pfn range: 282624 pages freed
1-1 mapping on bb000->100000
Released 283999 pages of unused memory
Set 283999 page(s) to 1-1 mapping
Populating 1acb8a-1f20e9 pfn range: 283999 pages added
We end up having the P2M array (that is, the one that was
grafted onto the P2M tree) filled with IDENTITY_FRAME or
INVALID_P2M_ENTRY entries. The patch titled
"xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
recycles said slots: it replaces the P2M tree leafs that point to
&mfn_list[xx] with p2m_identity or p2m_missing, and re-uses those
P2M array sections for other P2M tree leafs.
For the above-mentioned bootup excerpt, the PFNs at
0x20000->0x20200 are going to be IDENTITY based:
P2M[0][256][0] -> P2M[0][257][0] are turned into IDENTITY_FRAMEs.
We can re-use that and make P2M[0][256] point to p2m_identity.
The "old" page (the grafted P2M array provided by Xen) that was at
P2M[0][256] gets put somewhere else, specifically at P2M[6][358],
because when we populate back we fill P2M[6][358][0] (and
P2M[6][358], P2M[6][359], ...) with the new MFNs.
That is all OK, except that when we revector we assume that the PFN
count is the same in the grafted P2M array and in the newly
allocated one. Since that is no longer the case, as we have
holes in the P2M that point to p2m_missing or p2m_identity, we
have to take that into account.
[v2: Check for overflow] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 17 Aug 2012 13:27:35 +0000 (09:27 -0400)]
xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID.
If a P2M leaf is completely packed with INVALID_P2M_ENTRY or with
1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
with either a p2m_missing or p2m_identity respectively. The old
page (which was created via extend_brk or was grafted on from the
mfn_list) can be re-used for setting new PFNs.
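Checking whether a leaf can be swapped is a scan over its entries. The sketch assumes, as a simplification, that INVALID_P2M_ENTRY is ~0UL and that an identity entry is the PFN with a high "identity" bit set (modeled on IDENTITY_FRAME()). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define P2M_PER_PAGE        512 /* 4 KiB page / 8-byte entries */
#define BITS_PER_LONG       (sizeof(unsigned long) * 8)
#define INVALID_P2M_ENTRY   (~0UL)
#define IDENTITY_FRAME_BIT  (1UL << (BITS_PER_LONG - 2))
#define IDENTITY_FRAME(pfn) ((pfn) | IDENTITY_FRAME_BIT)

enum leaf_kind { LEAF_MIXED, LEAF_ALL_MISSING, LEAF_ALL_IDENTITY };

/* 'base_pfn' is the PFN described by leaf[0]. */
static enum leaf_kind classify_leaf(const unsigned long *leaf,
                                    unsigned long base_pfn)
{
    size_t missing = 0, identity = 0;

    for (size_t i = 0; i < P2M_PER_PAGE; i++) {
        if (leaf[i] == INVALID_P2M_ENTRY)
            missing++;
        else if (leaf[i] == IDENTITY_FRAME(base_pfn + i))
            identity++;
    }
    if (missing == P2M_PER_PAGE)
        return LEAF_ALL_MISSING;  /* can point at p2m_missing */
    if (identity == P2M_PER_PAGE)
        return LEAF_ALL_IDENTITY; /* can point at p2m_identity */
    return LEAF_MIXED;            /* must keep its own page */
}
```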
This also means we can remove git commit 5bc6f9888db5739abfa0cae279b4b442e4db8049
("xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back"),
which tried to fix this, and make the amount that must be reserved
much smaller.
CC: stable@vger.kernel.org # for 3.5 only. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 14 Aug 2012 20:37:31 +0000 (16:37 -0400)]
xen/mmu: Release just the MFN list, not MFN list and part of pagetables.
We call memblock_reserve for [start of mfn list] -> [PMD aligned end
of mfn list] instead of [start of mfn list] -> [page aligned end of mfn list].
This has the disastrous effect that if at bootup the end of mfn_list is
not PMD aligned, we end up returning to memblock parts of the region
past the mfn_list array. Those parts are the PTE tables, so we
see this at bootup:
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 1860k freed
Freeing unused kernel memory: 200k freed
(XEN) mm.c:2429:d0 Bad type (saw 1400000000000002 != exp 7000000000000000) for mfn 116a80 (pfn 14e26)
...
(XEN) mm.c:908:d0 Error getting mfn 116a83 (pfn 14e2a) from L1 entry 8000000116a83067 for l1e_owner=0, pg_owner=0
(XEN) mm.c:908:d0 Error getting mfn 4040 (pfn 5555555555555555) from L1 entry 0000000004040601 for l1e_owner=0, pg_owner=0
.. and so on.
This is not for upstream, as memblock_x86_reserve_range is no longer
used upstream.
When I back-ported the patches:
xen/x86: Use memblock_reserve for sensitive areas.
xen/mmu: Recycle the Xen provided L4, L3, and L2 pages
I simply used sed s/memblock_reserve/memblock_x86_reserve_range/.
That was incorrect, as the parameters differ: memblock_reserve
expects the size as its second argument, while memblock_x86_reserve_range
expects the end physical address. This patch fixes those bugs.
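The two reservation styles describe the same region differently, and the sed conversion kept the size where an end address was now expected. The sketch below uses hypothetical stand-ins, reserve_by_size() and reserve_by_range(), rather than the real memblock functions. It defines PAGE and PMD alignment locally for 4 KiB pages and 2 MiB PMDs, and pmd_overrun() shows how much past a page-aligned end a PMD-aligned end also covers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL   /* 4 KiB */
#define PMD_SIZE  0x200000ULL /* 2 MiB */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct region { uint64_t start, end; }; /* half-open [start, end) */

/* Hypothetical (base, size) style, as memblock_reserve() takes. */
static struct region reserve_by_size(uint64_t base, uint64_t size)
{
    struct region r = { base, base + size };
    return r;
}

/* Hypothetical (start, end) style, as memblock_x86_reserve_range() takes. */
static struct region reserve_by_range(uint64_t start, uint64_t end)
{
    struct region r = { start, end };
    return r;
}

/* Bytes past a page-aligned end that a PMD-aligned end would also cover. */
static uint64_t pmd_overrun(uint64_t end)
{
    return ALIGN_UP(end, PMD_SIZE) - ALIGN_UP(end, PAGE_SIZE);
}
```

With a hypothetical mfn_list at 0x2000000 of 0x123456 bytes, passing the page-aligned size as the "end" yields end 0x124000, below the start, while the correct call passes start + size. Aligning the end to a PMD instead of a page would also cover another 0xdc000 bytes that belong to something else (here, the page tables).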
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Merge branch 'stable/for-linus-3.7.rebased' into uek2-merge
* stable/for-linus-3.7.rebased:
xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.
xen/mmu: Remove from __ka space PMD entries for pagetables.
xen/mmu: Copy and revector the P2M tree.
xen/p2m: Add logic to revector a P2M tree to use __va leafs.
xen/mmu: Recycle the Xen provided L4, L3, and L2 pages
xen/mmu: For 64-bit do not call xen_map_identity_early
xen/mmu: use copy_page instead of memcpy.
xen/mmu: Provide comments describing the _ka and _va aliasing issue
xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything.
xen/x86: Use memblock_reserve for sensitive areas.
xen/p2m: Fix the comment describing the P2M tree.
xen/perf: Define .glob for the different hypercalls.
We then try to populate those pages back. In the P2M tree, however,
the space for those leafs must be reserved, so we use extend_brk.
We reserve 8MB of _brk space, which is enough for 1048576 PFNs (2048 leaf pages of 512 entries each), which is more than we should ever need.
[v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 99266871de5006ba7ad0bfece6bb283ede4094b9)
xen/mmu: Remove from __ka space PMD entries for pagetables.
Please first read the description in "xen/mmu: Copy and revector the
P2M tree."
At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).
xen_remove_p2m_tree and the code around it have ripped out the __ka for
the old P2M array.
Here we continue doing so up to where the Xen page-tables were.
It is safe to do so, as the page-tables are addressed using __va.
For good measure, we delete anything that is within MODULES_VADDR
and up to the end of the PMD.
At this point the __ka only contains PMD entries for the start
of the kernel up to __brk.
[v1: Per Stefano's suggestion wrapped the MODULES_VADDR in debug] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 4e928e1a48b6b76e0b8384160213a32d03197e4b)
Please first read the description in "xen/p2m: Add logic to revector a
P2M tree to use __va leafs" patch.
The 'xen_revector_p2m_tree()' function allocates a new P2M tree,
copies the contents of the old one into it, and returns the new one.
At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).
We have revectored the P2M tree (and the one for save/restore as well)
to use shiny new __va addresses for the new MFNs. The xen_start_info
has been taken care of already in 'xen_setup_kernel_pagetable()' and
xen_start_info->shared_info in 'xen_setup_shared_info()', so
we are free to roam and delete PMD entries - which is exactly what
we are going to do. We rip out the __ka for the old P2M array.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
As can be seen, the ramdisk, P2M and pagetables take up
a fair bit of __ka address space. This is a problem since
MODULES_VADDR starts at 0xffffffffa0000000, and the P2M sits
right in there! During bootup this results in the inability to
load modules, with this error:
Since the __va and __ka are 1:1 up to MODULES_VADDR and
cleanup_highmap rids __ka of the ramdisk mapping, what
we want to do is similar - get rid of the P2M in the __ka
address space. There are two ways of fixing this:
1) All P2M lookups instead of using the __ka address would
use the __va address. This means we can safely erase from
__ka space the PMD pointers that point to the PFNs for
P2M array and be OK.
2) Allocate a new array, copy the existing P2M into it,
revector the P2M tree to use that, and return the old
P2M to the memory allocator. This has the advantage that
it sets the stage for using XEN_ELF_NOTE_INIT_P2M
feature. That feature allows us to set the exact virtual
address space we want for the P2M - and allows us to
boot as initial domain on large machines.
So we pick option 2).
This patch only lays the groundwork in the P2M code. The patch
that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
xen/mmu: Recycle the Xen provided L4, L3, and L2 pages
We are not using them: we end up only using the L1 pagetables
and grafting those onto our page-tables.
[v1: Per Stefano's suggestion squashed two commits]
[v2: Per Stefano's suggestion simplified loop] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit cbc09be35990fb3d15671507f11c3e90479ef816)
xen/mmu: Provide comments describing the _ka and _va aliasing issue
The issue is that level2_kernel_pgt (__ka virtual addresses)
and level2_ident_pgt (__va virtual addresses) contain the same
PMD entries. So if you modify a PTE in __ka, it will be reflected
in __va (and vice versa).
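The aliasing comes from two upper-level tables pointing at the same lower-level page. Below is a toy model with hypothetical names, where plain arrays stand in for level2_kernel_pgt, level2_ident_pgt, and a page of PTEs.

```c
#include <assert.h>

#define ENTRIES 4

typedef unsigned long model_pte;

static model_pte pte_page[ENTRIES];   /* one shared page of PTEs */
static model_pte *l2_kernel[ENTRIES]; /* stands in for the __ka view */
static model_pte *l2_ident[ENTRIES];  /* stands in for the __va view */

static void build_aliased_tables(void)
{
    /* Both PMD-level tables reference the same PTE page. */
    l2_kernel[0] = pte_page;
    l2_ident[0] = pte_page;
}
```

After build_aliased_tables(), a write through l2_kernel[0] is visible through l2_ident[0] and vice versa, because there is only one PTE page underneath.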
xen/x86: Use memblock_reserve for sensitive areas.
instead of a big memblock_reserve. This way we can be more
selective in freeing regions (and it also makes it easier
to understand what is where).
[v1: Move the auto_translate_physmap to proper line]
[v2: Per Stefano suggestion add more comments] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[upstream git commit 91addbf07abfdd109a9da4e02061e6ed3728b298]
Conflicts:
The P2M code is smart enough to return false (which means that it
cannot allocate any more), and the error can percolate up the calling
stack without trouble, with the error logic doing the proper thing.
So check the __brk_limit values before allocating from extend_brk.
This allows us to boot on machines where we do not have enough
__brk space, where we would otherwise get this:
Interestingly enough, most of the time we are not going to hit this,
because the _brk space is quite large (v3.5):
ffffffff81a25000 B __brk_base
ffffffff81e43000 B __brk_limit
= ~4MB.
In earlier kernels (with this back-ported), the space is smaller:
ffffffff81a25000 B __brk_base
ffffffff81a7b000 B __brk_limit
= 344 kBytes.
With this patch, we would get now a limited amount of pages populated back:
Freeing 9f-100 pfn range: 97 pages freed
Freeing b7ee0-ecd9b pfn range: 216763 pages freed
Released 216860 pages of unused memory
Set 295297 page(s) to 1-1 mapping
Populating 100000-134f1c pfn range: 30720 pages added
[while it was instructed to populate 216860 pages back
on this particular machine]
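Checking the remaining brk space before allocating turns a would-be panic into an ordinary allocation failure. The sketch models brk as a bump allocator over a fixed range. brk_alloc() is hypothetical, not the kernel's extend_brk(), and the test checks the sizes quoted above (0xffffffff81e43000 - 0xffffffff81a25000 = 0x41e000 bytes; 0xffffffff81a7b000 - 0xffffffff81a25000 = 0x56000 bytes, i.e. 344 KiB).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct brk_area {
    uintptr_t base;  /* models __brk_base */
    uintptr_t end;   /* current break */
    uintptr_t limit; /* models __brk_limit */
};

/*
 * Hypothetical bounded allocator: return the start of 'size' bytes
 * aligned to 'align' (a power of two), or 0 if that would pass the limit.
 */
static uintptr_t brk_alloc(struct brk_area *brk, size_t size, size_t align)
{
    uintptr_t start = (brk->end + align - 1) & ~(uintptr_t)(align - 1);

    if (start > brk->limit || size > brk->limit - start)
        return 0; /* the caller treats this as "cannot allocate" */
    brk->end = start + size;
    return start;
}
```

With a 344 KiB area, this hands out 86 4 KiB leaf pages before returning 0, after which the caller can stop populating instead of crashing.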
Andre Przywara [Tue, 29 May 2012 11:07:31 +0000 (13:07 +0200)]
xen/setup: filter APERFMPERF cpuid feature out
Xen PV kernels allow access to the APERF/MPERF registers to read the
effective frequency. Access to the MSRs is however redirected to the
currently scheduled physical CPU, making consecutive reads and
comparisons unreliable. In addition, each rdmsr traps into the hypervisor.
So to avoid bogus readouts and expensive traps, disable the kernel
internal feature flag for APERF/MPERF if running under Xen.
This will
a) remove the aperfmperf flag from /proc/cpuinfo
b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) into
using the feature to improve scheduling (disabled by default)
c) not mislead the cpufreq driver into using the MSRs
This does not cover userland programs which access the MSRs via the
device file interface, but this will be addressed separately.
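APERF/MPERF support is advertised in CPUID leaf 6, ECX bit 0, so hiding it amounts to clearing that bit wherever the feature is tested. The sketch below only illustrates the masking on a captured CPUID value. The macro and function names are hypothetical, and the actual change works on the kernel's internal feature flag rather than on raw CPUID output.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.06H:ECX bit 0 advertises the APERF/MPERF MSRs. */
#define CPUID6_ECX_APERFMPERF (1U << 0)

/* Hypothetical filter applied to a captured leaf-6 ECX value. */
static uint32_t filter_leaf6_ecx(uint32_t ecx, int running_on_xen)
{
    if (running_on_xen)
        ecx &= ~CPUID6_ECX_APERFMPERF; /* readings would be unreliable */
    return ecx;
}
```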
[upstream git commit 5e626254206a709c6e937f3dda69bf26c7344f6f] Signed-off-by: Andre Przywara <andre.przywara@amd.com> Cc: stable@vger.kernel.org # v3.0+ Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Merge branch 'stable/for-linus-3.6.rebased' into uek2-merge
* stable/for-linus-3.6.rebased:
xen PVonHVM: move shared_info to MMIO before kexec
xen: simplify init_hvm_pv_info
xen: remove cast from HYPERVISOR_shared_info assignment
xen: enable platform-pci only in a Xen guest
xen/pv-on-hvm kexec: shutdown watches from old kernel
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
xen/hvc: Fix up checks when the info is allocated.
xen/mm: zero PTEs for non-present MFNs in the initial page table
xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable
xen/x86: add desc_equal() to compare GDT descriptors
x86/xen: avoid updating TLS descriptors if they haven't changed
xen: populate correct number of pages when across mem boundary (v2)
xen/mce: add .poll method for mcelog device driver
Olaf Hering [Tue, 17 Jul 2012 15:43:35 +0000 (17:43 +0200)]
xen PVonHVM: move shared_info to MMIO before kexec
Currently kexec in a PVonHVM guest fails with a triple fault because the
new kernel overwrites the shared info page. The exact failure depends on
the size of the kernel image. This patch moves the pfn from RAM into
MMIO space before the kexec boot.
The pfn containing the shared_info is located somewhere in RAM. This
will cause trouble if the current kernel is doing a kexec boot into a
new kernel. The new kernel (and its startup code) cannot know where the
pfn is, so it cannot reserve the page. The hypervisor will continue to
update the pfn, and as a result memory corruption occurs in the new
kernel.
One way to work around this issue is to allocate a page in the
xen-platform pci device's BAR memory range. But pci init is done very
late and the shared_info page is already in use very early to read the
pvclock. So moving the pfn from RAM to MMIO is racy because some code
paths on other vcpus could access the pfn during the small window when
the old pfn is moved to the new pfn. There is even a small window where
the old pfn is not backed by an mfn, and during that time all reads
return -1.
Because it is not known up front where the MMIO region is located, it
cannot be used right from the start in xen_hvm_init_shared_info.
To minimise trouble the move of the pfn is done shortly before kexec.
This does not eliminate the race because all vcpus are still online when
the syscore_ops will be called. But hopefully there is no work pending
at this point in time. Also the syscore_op is run last which reduces the
risk further.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Tue, 17 Jul 2012 09:59:15 +0000 (11:59 +0200)]
xen: simplify init_hvm_pv_info
init_hvm_pv_info is called only in a PVonHVM context, so move it into the ifdef.
init_hvm_pv_info does not fail, so make it a void function.
Remove the arguments from init_hvm_pv_info because they are not used by the
caller.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Tue, 10 Jul 2012 13:31:39 +0000 (15:31 +0200)]
xen: enable platform-pci only in a Xen guest
While debugging kexec issues in a PVonHVM guest, I modified
xen_hvm_platform() to return false to disable all PV drivers. This
caused a crash in platform_pci_init() because it expects certain data
structures to be initialized properly.
To avoid such a crash, make sure the driver is initialized only when
running in a Xen guest.
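A minimal userspace sketch of the guard described above. It assumes a hypothetical `running_on_xen` flag in place of the kernel's real Xen-detection helpers and a hypothetical `platform_pci_init_sketch()`; it only shows the early-return shape, not the actual driver code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's Xen detection; not a real API. */
static bool running_on_xen;
/* Counts how often the (simulated) driver setup actually ran. */
static int platform_pci_setups;

/* Refuse to initialize unless we are in a Xen guest, so code that relies
 * on Xen-specific data structures is never reached anywhere else. */
static int platform_pci_init_sketch(void)
{
	if (!running_on_xen)
		return -ENODEV;	/* not a Xen guest: skip the driver entirely */

	platform_pci_setups++;	/* would touch Xen-initialized state here */
	return 0;
}
```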
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Tue, 10 Jul 2012 12:50:03 +0000 (14:50 +0200)]
xen/pv-on-hvm kexec: shutdown watches from old kernel
Add an xs_reset_watches function to shut down watches from the old kernel
after a kexec boot. The old kernel does not unregister all watches in the
shutdown path. They remain active, and the double registration cannot
be detected by the new kernel. When the watches fire, unexpected events
will arrive and the xenwatch thread will crash (jumps to NULL). An
orderly reboot of an HVM guest will destroy the entire guest with all its
resources (including the watches) before it is rebuilt from scratch, so
the missing unregister is not an issue in that case.
With this change, xenstored is instructed to wipe all active watches
for the guest. However, a patch for xenstored is required so that it
accepts the XS_RESET_WATCHES request from a client (see changeset
23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
the registration of watches will fail and some features of a PVonHVM
guest are not available. The guest is still able to boot, but repeated
kexec boots will fail.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Mon, 9 Jul 2012 10:39:06 +0000 (11:39 +0100)]
xen/mm: zero PTEs for non-present MFNs in the initial page table
When constructing the initial page tables, if the MFN for a usable PFN
is missing in the p2m then that frame is initially ballooned out. In
this case, zero the PTE (as in decrease_reservation() in
drivers/xen/balloon.c).
This is obviously safer than having a valid PTE with an MFN of
INVALID_P2M_ENTRY (~0).
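The rule above can be sketched as follows. The `initial_pte()` helper and the one-bit flag layout are hypothetical simplifications; only the INVALID_P2M_ENTRY value (~0) comes from the text. A missing MFN yields a zero PTE rather than a "valid" entry pointing at MFN ~0.

```c
#include <stdint.h>

#define INVALID_P2M_ENTRY (~0UL)	/* the (~0) marker named above */
#define PTE_PRESENT 0x1ULL		/* simplified flag set for the sketch */
#define PAGE_SHIFT 12

/* Hypothetical helper: build the initial PTE for a PFN given its MFN.
 * A missing MFN means the frame starts out ballooned, so the PTE is
 * zeroed instead of mapping MFN ~0. */
static uint64_t initial_pte(unsigned long mfn, uint64_t flags)
{
	if (mfn == INVALID_P2M_ENTRY)
		return 0;	/* non-present: no mapping at all */
	return ((uint64_t)mfn << PAGE_SHIFT) | flags;
}
```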
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Mon, 9 Jul 2012 10:39:05 +0000 (11:39 +0100)]
xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable
In xen_set_pte(), if batching is unavailable (because the caller is in
an interrupt context, such as when handling a page fault), it would fall back
to using native_set_pte() and trapping and emulating the PTE write.
On 32-bit guests this requires two traps for each PTE write (one for
each dword of the PTE). Instead, do one mmu_update hypercall
directly.
During construction of the initial page tables, continue to use
native_set_pte() because most of the PTEs being set are in writable
and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
using a hypercall for this is very expensive.
This significantly improves page fault performance in 32-bit PV
guests.
lmbench3 test      Before     After      Improvement
--------------------------------------------------
lat_pagefault      3.18 us    2.32 us    27%
lat_proc fork      356 us     313.3 us   11%
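A rough cost model of the two fallback paths, for illustration only: the counters and function names are hypothetical, and real PTE writes go through the hypervisor rather than a plain store. It captures the point made above that the old fallback on a 32-bit guest costs two traps per PTE, while the new one costs a single mmu_update hypercall.

```c
#include <stdint.h>

/* Hypothetical counters for guest-to-hypervisor transitions. */
static int traps;	/* trap-and-emulate events */
static int hypercalls;	/* direct mmu_update hypercalls */

/* Old fallback on a 32-bit guest: the 64-bit PTE is stored as two 32-bit
 * writes, and each write to the read-only page table traps. */
static void set_pte_old_fallback(uint64_t *ptep, uint64_t val)
{
	traps += 2;	/* one trap per dword of the PTE */
	*ptep = val;	/* net effect after Xen emulates both writes */
}

/* New fallback: one mmu_update hypercall updates the whole PTE at once. */
static void set_pte_new_fallback(uint64_t *ptep, uint64_t val)
{
	hypercalls += 1;
	*ptep = val;	/* hypervisor validates and writes the entry */
}
```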
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Mon, 9 Jul 2012 10:39:08 +0000 (11:39 +0100)]
x86/xen: avoid updating TLS descriptors if they haven't changed
When switching tasks in a Xen PV guest, avoid updating the TLS
descriptors if they haven't changed. This improves the speed of
context switches by almost 10%, as much of the time the descriptors are
the same or only one is different.
The descriptors written into the GDT by Xen are modified from the
values passed in the update_descriptor hypercall so we keep shadow
copies of the three TLS descriptors to compare against.
lmbench3 test       Before   After   Improvement
------------------------------------------------
lat_ctx -s 32 24    7.19     6.52    9%
lat_pipe            12.56    11.66   7%
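The shadow-copy comparison can be sketched like this. The descriptor layout, `shadow_tls`, and the counting stub that stands in for the update_descriptor hypercall are all hypothetical; the point is that the comparison is made against our own shadow copy, because the values Xen writes into the GDT differ from those we passed in.

```c
#include <stdint.h>
#include <string.h>

#define NR_TLS 3	/* three TLS descriptors, as described above */

struct desc { uint32_t a, b; };	/* simplified 8-byte GDT descriptor */

/* Hypothetical shadow of the last values we passed to Xen. */
static struct desc shadow_tls[NR_TLS];
static int update_descriptor_calls;	/* counts simulated hypercalls */

static void update_descriptor_sim(int idx, struct desc d)
{
	(void)idx;
	(void)d;
	update_descriptor_calls++;
}

/* Issue an update only for descriptors that differ from the shadow. */
static void load_tls_sketch(const struct desc next[NR_TLS])
{
	for (int i = 0; i < NR_TLS; i++) {
		if (memcmp(&shadow_tls[i], &next[i], sizeof(next[i])) == 0)
			continue;	/* unchanged: skip the hypercall */
		update_descriptor_sim(i, next[i]);
		shadow_tls[i] = next[i];
	}
}
```

On a context switch where only one TLS slot changed, only one simulated hypercall is issued.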
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen: populate correct number of pages when across mem boundary (v2)
When populating pages across a memory boundary at bootup, the number of
pages populated isn't correct. This is because memory is populated into
a non-memory region, where it is ignored.
The PFN range is also wrongly aligned when the memory boundary isn't page-aligned.
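The alignment part of the fix follows a standard rounding rule, sketched below: round the start of a range up and its end down, so a partial page at either edge is never counted. `whole_pfns()` is a hypothetical helper; PFN_UP/PFN_DOWN are defined here with the same rounding as the kernel macros of those names. This illustrates the rule, not the patch's actual code.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Number of whole pages inside [start, end). */
static unsigned long whole_pfns(uint64_t start, uint64_t end)
{
	uint64_t s = PFN_UP(start), e = PFN_DOWN(end);
	return e > s ? (unsigned long)(e - s) : 0;
}
```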
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
[v2: If xen_do_chunk fails (populate), abort this chunk and any others]
Suggested by David, thanks.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Liu, Jinsong [Fri, 15 Jun 2012 01:03:39 +0000 (09:03 +0800)]
xen/mce: add .poll method for mcelog device driver
If a driver leaves its poll method NULL, the device is assumed to
be both readable and writable without blocking.
This patch adds a .poll method to the Xen mcelog device driver, so that
when mcelog uses system calls like ppoll or select, it blocks when no
data is available instead of spinning on the CPU.
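From userspace, a reader such as mcelog waits on the device with poll(). The sketch below uses a pipe as a stand-in for the mcelog device, which is an assumption made for illustration only. With a working .poll method, an empty device reports "not readable" and the caller sleeps until the timeout, rather than being told it is always readable.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd p = { .fd = fd, .events = POLLIN };
	int n = poll(&p, 1, timeout_ms);

	if (n < 0)
		return -1;
	return (n > 0 && (p.revents & POLLIN)) ? 1 : 0;
}
```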
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 12 Jun 2012 18:33:31 +0000 (14:33 -0400)]
Merge branch 'stable/for-linus-3.6.rebased' into uek2-merge
* stable/for-linus-3.6.rebased:
xen/mce: schedule a workqueue to avoid sleep in atomic context
xen/mce: Register native mce handler as vMCE bounce back point
x86, MCE, AMD: Adjust initcall sequence for xen
xen/mce: Add mcelog support for Xen platform
Liu, Jinsong [Tue, 12 Jun 2012 15:11:16 +0000 (23:11 +0800)]
xen/mce: schedule a workqueue to avoid sleep in atomic context
copy_to_user might sleep and print a stack trace if it is executed
in an atomic spinlock context, like this:
(XEN) CMCI: send CMCI to DOM0 through virq
BUG: sleeping function called from invalid context at /home/konradinux/kernel.h:199
in_atomic(): 1, irqs_disabled(): 0, pid: 4581, name: mcelog
Pid: 4581, comm: mcelog Tainted: G O 3.5.0-rc1upstream-00003-g149000b-dirty #1
[<ffffffff8109ad9a>] __might_sleep+0xda/0x100
[<ffffffff81329b0b>] xen_mce_chrdev_read+0xab/0x140
[<ffffffff81148945>] vfs_read+0xc5/0x190
[<ffffffff81148b0c>] sys_read+0x4c/0x90
[<ffffffff815bd039>] system_call_fastpath+0x16
This patch schedules a workqueue from the IRQ handler to poll the data,
and uses a mutex instead of a spinlock, so copy_to_user no longer sleeps
in atomic context.
[upstream git commit b3856ae3554ef4cea39126f52216182ee83e8ec8] Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Liu, Jinsong [Thu, 7 Jun 2012 11:58:50 +0000 (19:58 +0800)]
x86, MCE, AMD: Adjust initcall sequence for xen
There are 3 functions which need to be initcalled in a logical sequence:
1. xen_late_init_mcelog
2. mcheck_init_device
3. threshold_init_device
xen_late_init_mcelog must register xen_mce_chrdev_device before the
native mce_chrdev_device is registered when running on the Xen platform;
mcheck_init_device must run before threshold_init_device to
initialize mce_device, otherwise a NULL pointer dereference will cause a panic.
So we use the following initcalls:
1. device_initcall(xen_late_init_mcelog);
2. device_initcall_sync(mcheck_init_device);
3. late_initcall(threshold_init_device);
When running under Xen, the initcall order is 1, 2, 3;
on bare metal, we skip 1 and do only 2 and 3.
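A simplified model of why these three levels yield the order above: a "_sync" initcall runs after every plain initcall of the same level, and late initcalls run after both. The numeric ranks and the `xen_only` flag are hypothetical encodings for the sketch, not the kernel's initcall machinery.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ranks: device < device_sync < late. */
enum { DEVICE = 12, DEVICE_SYNC = 13, LATE = 14 };

struct initcall { int rank; const char *name; bool xen_only; };

static const struct initcall calls[] = {
	{ LATE,        "threshold_init_device", false },
	{ DEVICE_SYNC, "mcheck_init_device",    false },
	{ DEVICE,      "xen_late_init_mcelog",  true  },
};

static int by_rank(const void *a, const void *b)
{
	const struct initcall *x = a, *y = b;
	return (x->rank > y->rank) - (x->rank < y->rank);
}

/* Fill out[] with the names that would run, in order; returns the count. */
static size_t run_order(bool on_xen, const char *out[3])
{
	struct initcall sorted[3];
	size_t n = 0;

	memcpy(sorted, calls, sizeof(sorted));
	qsort(sorted, 3, sizeof(sorted[0]), by_rank);
	for (size_t i = 0; i < 3; i++)
		if (on_xen || !sorted[i].xen_only)
			out[n++] = sorted[i].name;
	return n;
}
```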
Liu, Jinsong [Thu, 7 Jun 2012 11:56:51 +0000 (19:56 +0800)]
xen/mce: Add mcelog support for Xen platform
When an MCA error occurs, it is handled by the Xen hypervisor first,
and the error information is then sent to the initial domain for logging.
This patch gets error information from the Xen hypervisor and converts
the Xen-format error into a Linux-format mcelog. This logic is basically
self-contained and does not touch other kernel components.
Using tools like mcelog, users can read specific error information,
just as they would under native Linux.
To test, follow the directions outlined in Documentation/acpi/apei/einj.txt.
Konrad Rzeszutek Wilk [Tue, 29 May 2012 18:25:53 +0000 (14:25 -0400)]
Merge branch 'stable/for-linus-3.5.rebased' into uek2-merge
* stable/for-linus-3.5.rebased:
xen/blkback: Copy id field when doing BLKIF_DISCARD.
xen/balloon: Subtract from xen_released_pages the count that is populated.
Konrad Rzeszutek Wilk [Fri, 25 May 2012 20:11:09 +0000 (16:11 -0400)]
xen/blkback: Copy id field when doing BLKIF_DISCARD.
We weren't copying the id field, so when we sent the response
back to the frontend (especially with a 64-bit host and 32-bit
guest), we ended up using a random value. This led to the
frontend crashing, as it would pass a NULL 'struct request' to
__blk_end_request_all (because it uses the 'id' to find the
proper 'struct request' in its shadow array):
BUG: unable to handle kernel NULL pointer dereference at 000000e4
IP: [<c0646d4c>] __blk_end_request_all+0xc/0x40
.. snip..
EIP is at __blk_end_request_all+0xc/0x40
.. snip..
[<ed95db72>] blkif_interrupt+0x172/0x330 [xen_blkfront]
This fixes the bug by passing in the proper id for the response.
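A hedged sketch of the invariant: wherever the backend copies a guest request (for example, when translating a 32-bit guest's request layout into its native one), the id has to travel with it, because it comes back in the response and the frontend uses it as an index into its shadow array. Both struct layouts and the function name are hypothetical simplifications, not the real blkif ABI.

```c
#include <stdint.h>

/* Hypothetical, simplified layouts; the real blkif structures differ. */
struct req32_sketch  { uint8_t op; uint64_t id; uint64_t sector; };
struct req_native_sketch { uint8_t op; uint64_t id; uint64_t sector; };

/* Translate a guest-ABI request; id is copied for every operation,
 * including discard, so the response can echo it back. */
static void get_req_sketch(struct req_native_sketch *dst,
			   const struct req32_sketch *src)
{
	dst->op = src->op;
	dst->id = src->id;
	dst->sector = src->sector;
}
```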
Konrad Rzeszutek Wilk [Tue, 29 May 2012 16:36:43 +0000 (12:36 -0400)]
xen/balloon: Subtract from xen_released_pages the count that is populated.
We did not take into account that xen_released_pages would be
used outside the initial E820 parsing code. As such, we did
not subtract from xen_released_pages the count of pages
that we had populated back (instead we just did a simple
extra_pages = released - populated).
The balloon driver uses xen_released_pages to set the initial
current_pages count. If this is wrong (too low) then when a new
(higher) target is set, the balloon driver will request too many pages
from Xen.
This fixes errors such as:
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (51 of 512)
during bootup and
free_memory : 0
where the free_memory should be 128.
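The bookkeeping can be modeled in a few lines. The struct and function names are hypothetical, and `initial_current_pages()` assumes that the balloon driver's starting count is the domain's page count minus the released count. Under that assumption, exporting the raw released count makes current_pages too low, which is the failure described above.

```c
/* Hypothetical model of the setup-time counters described above. */
struct setup_counts {
	unsigned long released;		/* pages handed back during setup */
	unsigned long populated;	/* pages populated back afterwards */
};

/* Before: only the local extra_pages accounted for repopulated pages. */
static unsigned long exported_released_before(struct setup_counts c)
{
	return c.released;
}

/* After: the exported count also excludes the repopulated pages. */
static unsigned long exported_released_after(struct setup_counts c)
{
	return c.populated > c.released ? 0 : c.released - c.populated;
}

/* Assumed balloon-driver starting point: nr_pages minus released. */
static unsigned long initial_current_pages(unsigned long nr_pages,
					   unsigned long released)
{
	return nr_pages - released;
}
```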
Acked-by: David Vrabel <david.vrabel@citrix.com>
[v1: Per David's review made the git commit better] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 23 May 2012 17:28:44 +0000 (13:28 -0400)]
xen/events: Add WARN_ON when quick lookup found invalid type.
All of the bind_XYZ_to_irq functions do a quick lookup to see if the
event exists. If it does, that value is returned instead of a new one
being initialized. This patch adds extra logic to check that the
returned type is the proper one, which we can use to find
drivers that are doing something naughty.
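The quick-lookup-plus-check pattern, sketched in userspace. The lookup table, `bind_evtchn_sketch()`, and the `WARN_ON_SKETCH` macro (a stand-in for the kernel's WARN_ON) are hypothetical; the type names mirror the event-channel IRQ types but are redefined here. An existing binding is still reused, but a type mismatch is reported.

```c
#include <stdio.h>

enum xen_irq_type { IRQT_UNBOUND, IRQT_PIRQ, IRQT_VIRQ, IRQT_IPI, IRQT_EVTCHN };

#define NR_EVTCHN_SKETCH 16
struct irq_info_sketch { enum xen_irq_type type; int irq; };

static struct irq_info_sketch evtchn_info[NR_EVTCHN_SKETCH];
static int warnings;

/* Stand-in for WARN_ON: count and report, but keep going. */
#define WARN_ON_SKETCH(cond) do { if (cond) { warnings++; \
	fprintf(stderr, "WARN_ON(%s)\n", #cond); } } while (0)

/* Return the existing irq if already bound (warning on a type mismatch),
 * otherwise record a new binding. */
static int bind_evtchn_sketch(int evtchn, enum xen_irq_type want, int new_irq)
{
	struct irq_info_sketch *info = &evtchn_info[evtchn];

	if (info->type != IRQT_UNBOUND) {
		WARN_ON_SKETCH(info->type != want);
		return info->irq;
	}
	info->type = want;
	info->irq = new_irq;
	return new_irq;
}
```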
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 23 May 2012 16:56:59 +0000 (12:56 -0400)]
xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness.
We need to make sure that those parameters are set up correctly.
A value of 0 is therefore deemed invalid, and if we find it we
bail out. The hypervisor sets all of them to zero by default,
and when the hypercall is done it simply does:
a.value = d->arch.hvm_domain.params[a.index];
This means that if the Xen toolstack forgot to set up the proper
HVM_PARAM_CONSOLE_EVTCHN, we would get the default value of 0
and use it.
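A sketch of the check, assuming a hypothetical `hvm_get_param_sketch()` that behaves like the hypervisor code quoted above: it returns the stored value, which defaults to 0. A zero event channel or PFN is treated as "never set up by the toolstack", and the console is not used.

```c
#include <errno.h>
#include <stdint.h>

enum { PARAM_CONSOLE_EVTCHN, PARAM_CONSOLE_PFN };
static uint64_t hvm_params[2];	/* hypothetical; defaults to zero */

static int hvm_get_param_sketch(int idx, uint64_t *v)
{
	*v = hvm_params[idx];	/* like a.value = params[a.index] */
	return 0;
}

/* Reject the console if either parameter was left at its default 0. */
static int console_params_sketch(uint64_t *evtchn, uint64_t *pfn)
{
	if (hvm_get_param_sketch(PARAM_CONSOLE_EVTCHN, evtchn) < 0 || *evtchn == 0)
		return -ENODEV;
	if (hvm_get_param_sketch(PARAM_CONSOLE_PFN, pfn) < 0 || *pfn == 0)
		return -ENODEV;
	return 0;
}
```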
CC: stable@kernel.org
Fixes-Oracle-Bug: 14091238 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 21 May 2012 21:48:15 +0000 (17:48 -0400)]
Merge branch 'stable/for-linus-3.5.rebased' into uek2-merge
* stable/for-linus-3.5.rebased:
hvc_xen: NULL dereference on allocation failure
xen: do not map the same GSI twice in PVHVM guests.
xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.
Stefano Stabellini [Mon, 21 May 2012 15:54:10 +0000 (16:54 +0100)]
xen: do not map the same GSI twice in PVHVM guests.
PV on HVM guests map GSIs into event channels. At restore time the
event channels are resumed by restore_pirqs.
Device drivers might try to register the same GSI again through ACPI at
restore time, but the GSI has already been mapped and bound by
restore_pirqs. This patch detects these situations and avoids
mapping the same GSI multiple times.
Without this patch we get:
(XEN) irq.c:2235: dom4: pirq 23 or emuirq 28 already mapped
and waste a pirq.
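The detection can be sketched as a "map once" table. `gsi_to_pirq`, `register_gsi_sketch()`, and the counting stub that stands in for the mapping hypercall are hypothetical. A GSI already mapped (for example by restore_pirqs) reuses its pirq instead of consuming a new one.

```c
#define NR_GSI_SKETCH 48

static int gsi_to_pirq[NR_GSI_SKETCH];	/* 0 = not mapped yet */
static int next_pirq = 16;
static int map_calls;

/* Stand-in for the hypercall that maps a GSI to a new pirq. */
static int map_gsi_hypercall_sketch(int gsi)
{
	(void)gsi;
	map_calls++;
	return next_pirq++;
}

/* Map a GSI only if it has not been mapped already. */
static int register_gsi_sketch(int gsi)
{
	if (gsi_to_pirq[gsi])
		return gsi_to_pirq[gsi];	/* already mapped: reuse it */
	gsi_to_pirq[gsi] = map_gsi_hypercall_sketch(gsi);
	return gsi_to_pirq[gsi];
}
```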
Daniel De Graaf [Tue, 8 May 2012 13:46:57 +0000 (09:46 -0400)]
xenbus: Add support for xenbus backend in stub domain
Add an ioctl to the /dev/xen/xenbus_backend device allowing the xenbus
backend to be started after the kernel has booted. This allows xenstore
to run in a different domain from the dom0.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 21 May 2012 13:19:38 +0000 (09:19 -0400)]
xen/smp: unbind irqworkX when unplugging vCPUs.
The git commit 1ff2b0c303698e486f1e0886b4d9876200ef8ca5
"xen: implement IRQ_WORK_VECTOR handler" added the functionality
to have a per-cpu "irqworkX" for the IPI APIC functionality.
However, it missed the unbind when a vCPU is unplugged, resulting
in an orphaned per-cpu interrupt line for the unplugged vCPU:
Konrad Rzeszutek Wilk [Fri, 11 May 2012 20:37:46 +0000 (16:37 -0400)]
Merge branch 'stable/not-upstreamed' into uek2-merge
* stable/not-upstreamed:
xen/mce: Register native mce handler as vMCE bounce back point
xen/mce: Add mcelog support for Xen platform
Revert "Add mcelog support from xen platform"
Revert "xen/mce: Change the machine check point"
When an MCA error occurs, it is handled by the Xen hypervisor first,
and the error information is then sent to the initial domain for logging.
This patch gets error information from the Xen hypervisor and converts
the Xen-format error into a Linux-format mcelog. This logic is basically
self-contained and does not touch other kernel components.
Using tools like mcelog, users can read specific error information,
just as they would under native Linux.
To test, follow the directions outlined in Documentation/acpi/apei/einj.txt.
Konrad Rzeszutek Wilk [Fri, 11 May 2012 20:29:09 +0000 (16:29 -0400)]
xen/gntdev: Fix merge error.
Somehow a merge error ensued where an important part of
"xen/gnt{dev,alloc}: reserve event channels for notify"
went missing. Fortunately for us, we aren't using this driver yet.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 11 May 2012 18:40:21 +0000 (14:40 -0400)]
Merge branch 'stable/for-linus-3.5.rebased' into uek2-merge
* stable/for-linus-3.5.rebased: (22 commits)
x86/apic: Fix UP boot crash
xen/apic: implement io apic read with hypercall
xen/x86: Implement x86_apic_ops
x86/apic: Replace io_apic_ops with x86_io_apic_ops.
x86/ioapic: Add io_apic_ops driver layer to allow interception
xen: implement IRQ_WORK_VECTOR handler
xen: implement apic ipi interface
xen/gnttab: add deferred freeing logic
xen: enter/exit lazy_mmu_mode around m2p_override calls
xen/setup: update VA mapping when releasing memory during setup
xen/setup: Combine the two hypercall functions - since they are quite similar.
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
xen/p2m: An early bootup variant of set_phys_to_machine
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
xen/p2m: Move code around to allow for better re-usage.
xen: only limit memory map to maximum reservation for domain 0.
xen: release all pages within 1-1 p2m mappings
xen: allow extra memory to be in multiple regions
...
Lin Ming [Mon, 30 Apr 2012 16:16:27 +0000 (00:16 +0800)]
xen/apic: implement io apic read with hypercall
Implement xen_io_apic_read with a hypercall, so it returns proper
IO-APIC information instead of fabricated values.
Fall back to returning emulated IO-APIC values if the hypercall fails.
[v2: fall back to returning emulated IO_APIC values if the hypercall fails] Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
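The "hypercall first, emulate on failure" shape can be sketched as follows. The hypercall is replaced by a hypothetical stub, and the emulated values are illustrative assumptions: the APIC id placed in bits 24 and up for register 0, and the 0xfd default taken from the v2 note in the next entry.

```c
#include <stdbool.h>

/* Hypothetical stub for the APIC-read hypercall: 0 on success. */
static bool hypercall_works;
static int apic_read_hypercall_sketch(unsigned apic, unsigned reg,
				      unsigned *value)
{
	(void)apic;
	(void)reg;
	if (!hypercall_works)
		return -1;
	*value = 0x1234;	/* arbitrary "real" data for the sketch */
	return 0;
}

/* Prefer the hypervisor's answer; otherwise return emulated values. */
static unsigned io_apic_read_sketch(unsigned apic, unsigned reg)
{
	unsigned value;

	if (apic_read_hypercall_sketch(apic, reg, &value) == 0)
		return value;
	if (reg == 0x0)
		return apic << 24;	/* ID register: id in the top byte */
	return 0xfd;		/* default emulated value */
}
```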
Konrad Rzeszutek Wilk [Tue, 20 Mar 2012 22:53:10 +0000 (18:53 -0400)]
xen/x86: Implement x86_apic_ops
Or rather, implement just one function that differs from
the native one: the read function.
We synthesize the values.
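The ops-table pattern can be sketched as a struct of function pointers whose native defaults stay in place, except for read, which Xen swaps for a version that synthesizes values. Every name and the fake register file here are hypothetical simplifications in the spirit of x86_io_apic_ops. The 0xfd default matches the v2 note below; the register-0 value is an assumption.

```c
/* Hypothetical, simplified ops table; Xen overrides only read. */
struct io_apic_ops_sketch {
	unsigned (*read)(unsigned apic, unsigned reg);
	void (*write)(unsigned apic, unsigned reg, unsigned value);
};

static unsigned regs[4][4];	/* fake register file for the native path */

static unsigned native_read(unsigned apic, unsigned reg)
{
	return regs[apic][reg];
}

static void native_write(unsigned apic, unsigned reg, unsigned v)
{
	regs[apic][reg] = v;
}

/* Xen's read synthesizes values instead of touching hardware. */
static unsigned xen_read_sketch(unsigned apic, unsigned reg)
{
	if (reg == 0x0)
		return apic << 24;	/* assumed ID-register layout */
	return 0xfd;		/* default case */
}

static struct io_apic_ops_sketch io_apic_ops = { native_read, native_write };

static void xen_init_apic_sketch(void)
{
	io_apic_ops.read = xen_read_sketch;	/* write stays native */
}
```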
[upstream git commit 31b3c9d723407b395564d1fff3624cc0083ae520] Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
[v1: Rebased on top of tip/x86/urgent]
[v2: Return 0xfd instead of 0xff in the default case] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
arch/x86/xen/Makefile
arch/x86/xen/enlighten.c
arch/x86/xen/xen-ops.h
[In the Makefile, vga.o didn't exist in 3.0; it was added
in 3.1]
Konrad Rzeszutek Wilk [Wed, 28 Mar 2012 16:37:36 +0000 (12:37 -0400)]
x86/apic: Replace io_apic_ops with x86_io_apic_ops.
Which makes the code fit within the rest of the x86_ops functions.
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
[v1: Changed x86_apic -> x86_ioapic per Yinghai Lu <yinghai@kernel.org> suggestion]
[v2: Rebased on tip/x86/urgent and redid to match Ingo's syntax style] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>