]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
9 years agoxen: switch extra memory accounting to use pfns
Juergen Gross [Fri, 4 Sep 2015 12:05:51 +0000 (14:05 +0200)]
xen: switch extra memory accounting to use pfns

Instead of using physical addresses for accounting of extra memory
areas available for ballooning switch to pfns as this is much less
error prone regarding partial pages.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 626d7508664c4bc8e67f496da4387ecd0c410b8c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit d2d1374f928fad809f74480fad1cf801354a8a3d)

9 years agoxen: limit memory to architectural maximum
Juergen Gross [Fri, 4 Sep 2015 12:18:08 +0000 (14:18 +0200)]
xen: limit memory to architectural maximum

When a pv-domain (including dom0) is started it tries to size it's
p2m list according to the maximum possible memory amount it ever can
achieve. Limit the initial maximum memory size to the architectural
limit of the hardware in order to avoid overflows during remapping
of memory.

This problem will occur when dom0 is started with an initial memory
size being a multiple of 1GB, but without specifying it's maximum
memory size. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit cb9e444b5aaa900bb4310da411315b6947c53e37)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 2492a73eef33873c4f6e01c0d98ef3396faa7953)

9 years agoxen: avoid another early crash of memory limited dom0
Juergen Gross [Wed, 19 Aug 2015 16:53:11 +0000 (18:53 +0200)]
xen: avoid another early crash of memory limited dom0

Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory occurring
only on some hardware.

The problem arises in case dom0 is started with initial memory and
maximum memory being the same. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen. If all
of this is true and the E820 map of the machine is sparse (some areas
are not covered) then the machine might crash early in the boot
process.

An example E820 map triggering the problem looks like this:

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cf7fafff] usable
[    0.000000] BIOS-e820: [mem 0x00000000cf7fb000-0x00000000cf95ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cf960000-0x00000000cfb62fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfb63000-0x00000000cfd14fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000cfd15000-0x00000000cfd61fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfd62000-0x00000000cfd6cfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000cfd6d000-0x00000000cfd6ffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfd70000-0x00000000cfd70fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000cfd71000-0x00000000cfea8fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfea9000-0x00000000cfeb9fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfeba000-0x00000000cfecafff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfecb000-0x00000000cfecbfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfecc000-0x00000000cfedbfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfedc000-0x00000000cfedcfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfedd000-0x00000000cfeddfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfede000-0x00000000cfee3fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfee4000-0x00000000cfef6fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfef7000-0x00000000cfefffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000020effffff] usable

In this case the area a0000-dffff isn't present in the map. This will
confuse the memory setup of the domain when remapping the memory from
such holes to populated areas.

To avoid the problem the accounting of to be remapped memory has to
count such holes in the E820 map as well.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ab24507cfae8d916814bb6c16f66e453184a29a5)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 7009b48ce4a9830f34a99ff2e8081cb5df96b053)

9 years agoxen: avoid early crash of memory limited dom0
Juergen Gross [Wed, 19 Aug 2015 16:52:34 +0000 (18:52 +0200)]
xen: avoid early crash of memory limited dom0

Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory.

The problem arises in case dom0 is started with initial memory and
maximum memory being the same and exactly a multiple of 1 GB. The
kernel must be configured without CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
for the problem to happen. In this case it will crash very early
during boot due to the virtual mapped p2m list not being large
enough to be able to remap any memory:

(XEN) Freed 304kB init memory.
mapping kernel into physical memory
about to get started...
(XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080229a93 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.5.2-pre  x86_64  debug=n Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff81d120cb>]
(XEN) RFLAGS: 0000000000000206   EM: 1 CONTEXT: pv guest (d0v0)
(XEN) rax: ffffffff81db2000   rbx: 000000004d000000   rcx: 0000000000000000
(XEN) rdx: 000000004d000000   rsi: 0000000000063000   rdi: 000000004d063000
(XEN) rbp: ffffffff81c03d78   rsp: ffffffff81c03d28   r8:  0000000000023000
(XEN) r9:  00000001040ff000   r10: 0000000000007ff0   r11: 0000000000000000
(XEN) r12: 0000000000063000   r13: 000000000004d000   r14: 0000000000000063
(XEN) r15: 0000000000000063   cr0: 0000000080050033   cr4: 00000000000006f0
(XEN) cr3: 0000000105c0f000   cr2: ffffc90000268000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81c03d28:
(XEN)   0000000000000000 0000000000000000 ffffffff81d120cb 000000010000e030
(XEN)   0000000000010006 ffffffff81c03d68 000000000000e02b ffffffffffffffff
(XEN)   0000000000000063 000000000004d063 ffffffff81c03de8 ffffffff81d130a7
(XEN)   ffffffff81c03de8 000000000004d000 00000001040ff000 0000000000105db1
(XEN)   00000001040ff001 000000000004d062 ffff8800092d6ff8 0000000002027000
(XEN)   ffff8800094d8340 ffff8800092d6ff8 00003ffffffff000 ffff8800092d7ff8
(XEN)   ffffffff81c03e48 ffffffff81d13c43 ffff8800094d8000 ffff8800094d9000
(XEN)   0000000000000000 ffff8800092d6000 00000000092d6000 000000004cfbf000
(XEN)   00000000092d6000 00000000052d5442 0000000000000000 0000000000000000
(XEN)   ffffffff81c03ed8 ffffffff81d185c1 0000000000000000 0000000000000000
(XEN)   ffffffff81c03e78 ffffffff810f8ca4 ffffffff81c03ed8 ffffffff8171a15d
(XEN)   0000000000000010 ffffffff81c03ee8 0000000000000000 0000000000000000
(XEN)   ffffffff81f0e402 ffffffffffffffff ffffffff81dae900 0000000000000000
(XEN)   0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81d0cf0f
(XEN)   0000000000000000 0000000000000000 0000000000000000 ffffffff81db82e0
(XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)   ffffffff81c03f38 ffffffff81d0c603 ffffffff81c03ff8 ffffffff81d11c86
(XEN)   0300000100000032 0000000000000005 0000000000000020 0000000000000000
(XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

This can be avoided by allocating aneough space for the p2m to cover
the maximum memory of dom0 plus the identity mapped holes required
for PCI space, BIOS etc.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit eafd72e016c69df511b14a98b61e439c58ad9c51)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit f02b21de441db4d74851774fad93e8a25709b4d8)

9 years agoarm/xen: Remove helpers which are PV specific
Julien Grall [Fri, 7 Aug 2015 16:34:34 +0000 (17:34 +0100)]
arm/xen: Remove helpers which are PV specific

ARM guests are always HVM. The current implementation is assuming a 1:1
mapping which is only true for DOM0 and may not be at all in the future.

Furthermore, all the helpers but arbitrary_virt_to_machine are used in
x86 specific code (or only compiled for).

The helper arbitrary_virt_to_machine is only used in PV specific code.
Therefore we should never call the function.

Add a BUG() in this helper and drop all the others.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 724afaea2020f3bd98891b535f3ce5d3935bcf63)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 05ecc0cc5c59922acfd65c68f202e2bfe3836637)

9 years agoxen/x86: Don't try to set PCE bit in CR4
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:38 +0000 (16:34 -0400)]
xen/x86: Don't try to set PCE bit in CR4

Since VPMU code emulates RDPMC instruction with RDMSR and because hypervisor
does not emulate it there is no reason to try setting CR4's PCE bit (and the
hypervisor will warn on seeing it set).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 3375d8284dfb7866f261ec008d15d30999ff273b)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 07cfc8aee7c0d8976bf945877f11f30755b18896)

9 years agoxen/PMU: PMU emulation code
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:37 +0000 (16:34 -0400)]
xen/PMU: PMU emulation code

Add PMU emulation code that runs when we are processing a PMU interrupt.
This code will allow us not to trap to hypervisor on each MSR/LVTPC access
(of which there may be quite a few in the handler).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit bf6dfb154d935725c9a2005033ca33017b9df439)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 8e4429eb3ce74ce71b782af545e7fe658e311252)

9 years agoxen/PMU: Intercept PMU-related MSR and APIC accesses
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:36 +0000 (16:34 -0400)]
xen/PMU: Intercept PMU-related MSR and APIC accesses

Provide interfaces for recognizing accesses to PMU-related MSRs and
LVTPC APIC and process these accesses in Xen PMU code.

(The interrupt handler performs XENPMU_flush right away in the beginning
since no PMU emulation is available. It will be added with a later patch).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6b08cd6328c58a2ae190c5ee03a2ffcab5ef828e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 32415b6003dacadabcf82b42ea05fa35016edaed)

9 years agoxen/PMU: Describe vendor-specific PMU registers
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:35 +0000 (16:34 -0400)]
xen/PMU: Describe vendor-specific PMU registers

AMD and Intel PMU register initialization and helpers that determine
whether a register belongs to PMU.

This and some of subsequent PMU emulation code is somewhat similar to
Xen's PMU implementation.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e27b72df01109c689062caeba1defa013b759e0e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 754a67bf6d816e449688d46e2989f97756baf0ce)

9 years agoxen/PMU: Initialization code for Xen PMU
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:34 +0000 (16:34 -0400)]
xen/PMU: Initialization code for Xen PMU

Map shared data structure that will hold CPU registers, VPMU context,
V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills
this information in its handler and passes it to the guest for further
processing.

Set up PMU VIRQ.

Now that perf infrastructure will assume that PMU is available on a PV
guest we need to be careful and make sure that accesses via RDPMC
instruction don't cause fatal traps by the hypervisor. Provide a nop
RDPMC handler.

For the same reason avoid issuing a warning on a write to APIC's LVTPC.

Both of these will be made functional in later patches.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 65d0cf0be79feebeb19e7626fd3ed41ae73f642d)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 446983f38ec11e3962acaaca7bb0d1741dc999c5)

9 years agoxen/PMU: Sysfs interface for setting Xen PMU mode
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:33 +0000 (16:34 -0400)]
xen/PMU: Sysfs interface for setting Xen PMU mode

Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 5f141548824cebbff2e838ff401c34e667797467)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 14e786e26bd8504e749478b426c2b65a0f57f0bd)

9 years agoxen: xensyms support
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:32 +0000 (16:34 -0400)]
xen: xensyms support

Export Xen symbols to dom0 via /proc/xen/xensyms (similar to
/proc/kallsyms).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit a11f4f0a4e18b4bdc7d5e36438711e038b7a1f74)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit f2be3b07bdd4dc6649bb9d3844d995495472af7d)

9 years agoxen: remove no longer needed p2m.h
Juergen Gross [Fri, 17 Jul 2015 04:51:37 +0000 (06:51 +0200)]
xen: remove no longer needed p2m.h

Cleanup by removing arch/x86/xen/p2m.h as it isn't needed any more.

Most definitions in this file are used in p2m.c only. Move those into
p2m.c.

set_phys_range_identity() is already declared in
arch/x86/include/asm/xen/page.h, add __init annotation there.

MAX_REMAP_RANGES isn't used at all, just delete it.

The only define left is P2M_PER_PAGE which is moved to page.h as well.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit cb3eb850137cd43fc3e25d2062525f5ba5fd884a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit fb4227a8861bd732964a00bd8015b7506fbb189f)

9 years agoxen: allow more than 512 GB of RAM for 64 bit pv-domains
Juergen Gross [Fri, 17 Jul 2015 04:51:36 +0000 (06:51 +0200)]
xen: allow more than 512 GB of RAM for 64 bit pv-domains

64 bit pv-domains under Xen are limited to 512 GB of RAM today. The
main reason has been the 3 level p2m tree, which was replaced by the
virtual mapped linear p2m list. Parallel to the p2m list which is
being used by the kernel itself there is a 3 level mfn tree for usage
by the Xen tools and eventually for crash dump analysis. For this tree
the linear p2m list can serve as a replacement, too. As the kernel
can't know whether the tools are capable of dealing with the p2m list
instead of the mfn tree, the limit of 512 GB can't be dropped in all
cases.

This patch replaces the hard limit by a kernel parameter which tells
the kernel to obey the 512 GB limit or not. The default is selected by
a configuration parameter which specifies whether the 512 GB limit
should be active per default for domUs (domain save/restore/migration
and crash dump analysis are affected).

Memory above the domain limit is returned to the hypervisor instead of
being identity mapped, which was wrong anyway.

The kernel configuration parameter to specify the maximum size of a
domain can be deleted, as it is not relevant any more.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c70727a5bc18a5a233fddc6056d1de9144d7a293)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 956b08bfc7fadaec4f04578165b867a974267c97)

9 years agoxen: move p2m list if conflicting with e820 map
Juergen Gross [Fri, 17 Jul 2015 04:51:35 +0000 (06:51 +0200)]
xen: move p2m list if conflicting with e820 map

Check whether the hypervisor supplied p2m list is placed at a location
which is conflicting with the target E820 map. If this is the case
relocate it to a new area unused up to now and compliant to the E820
map.

As the p2m list might by huge (up to several GB) and is required to be
mapped virtually, set up a temporary mapping for the copied list.

For pvh domains just delete the p2m related information from start
info instead of reserving the p2m memory, as we don't need it at all.

For 32 bit kernels adjust the memblock_reserve() parameters in order
to cover the page tables only. This requires to memblock_reserve() the
start_info page on it's own.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 70e61199559a09c62714694cd5ac3c3640c41552)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a24fc24795942199cd5817b55ffd65a2b679b0e4)

9 years agoxen: add explicit memblock_reserve() calls for special pages
Juergen Gross [Fri, 17 Jul 2015 04:51:34 +0000 (06:51 +0200)]
xen: add explicit memblock_reserve() calls for special pages

Some special pages containing interfaces to xen are being reserved
implicitly only today. The memblock_reserve() call to reserve them is
meant to reserve the p2m list supplied by xen. It is just reserving
not only the p2m list itself, but some more pages up to the start of
the xen built page tables.

To be able to move the p2m list to another pfn range, which is needed
for support of huge RAM, this memblock_reserve() must be split up to
cover all affected reserved pages explicitly.

The affected pages are:
- start_info page
- xenstore ring (might be missing, mfn is 0 in this case)
- console ring (not for initial domain)

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6c2681c863b24360098d1ba60f2af060a13a0561)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit f9918f852bb0e2cd1e7a86966e5f3d0dd25a094f)

9 years agomm: provide early_memremap_ro to establish read-only mapping
Juergen Gross [Fri, 17 Jul 2015 04:51:33 +0000 (06:51 +0200)]
mm: provide early_memremap_ro to establish read-only mapping

During early boot as Xen pv domain the kernel needs to map some page
tables supplied by the hypervisor read only. This is needed to be
able to relocate some data structures conflicting with the physical
memory map especially on systems with huge RAM (above 512GB).

Provide the function early_memremap_ro() to provide this read only
mapping.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 2592dbbbf4c67501c2bd2dcf89c2b8924d592a9f)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 2acafe8a7ba336ccef0b10c1223a35bcbbb62ae3)

9 years agoxen: check for initrd conflicting with e820 map
Juergen Gross [Fri, 17 Jul 2015 04:51:32 +0000 (06:51 +0200)]
xen: check for initrd conflicting with e820 map

Check whether the initrd is placed at a location which is conflicting
with the target E820 map. If this is the case relocate it to a new
area unused up to now and compliant to the E820 map.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 4b9c15377f96e241be347fd3bbeeff74fbad0b44)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit de1b2d70a721e4f71347de73a9914083caf40d7d)

9 years agoxen: check pre-allocated page tables for conflict with memory map
Juergen Gross [Fri, 17 Jul 2015 04:51:31 +0000 (06:51 +0200)]
xen: check pre-allocated page tables for conflict with memory map

Check whether the page tables built by the domain builder are at
memory addresses which are in conflict with the target memory map.
If this is the case just panic instead of running into problems
later.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 04414baab5ba862b10bde837c4773ffdbb78f0e0)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit d44039e831fc2c360556fa6d2002d3d07f94278b)

9 years agoxen: check for kernel memory conflicting with memory layout
Juergen Gross [Fri, 17 Jul 2015 04:51:30 +0000 (06:51 +0200)]
xen: check for kernel memory conflicting with memory layout

Checks whether the pre-allocated memory of the loaded kernel is in
conflict with the target memory map. If this is the case, just panic
instead of run into problems later, as there is nothing we can do
to repair this situation.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 808fdb71936c41d46245f0e3aa6ec889cba70d97)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit c2c7056af7647b523875d8b9850df6143d6eee11)

9 years agoxen: find unused contiguous memory area
Juergen Gross [Fri, 17 Jul 2015 04:51:29 +0000 (06:51 +0200)]
xen: find unused contiguous memory area

For being able to relocate pre-allocated data areas like initrd or
p2m list it is mandatory to find a contiguous memory area which is
not yet in use and doesn't conflict with the memory map we want to
be in effect.

In case such an area is found reserve it at once as this will be
required to be done in any case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 9ddac5b724a9465e27f25a0aa943e92c8341a85b)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 782b5690df437652e088662d273c42ee0c2075b6)

9 years agoxen: check memory area against e820 map
Juergen Gross [Fri, 17 Jul 2015 04:51:28 +0000 (06:51 +0200)]
xen: check memory area against e820 map

Provide a service routine to check a physical memory area against the
E820 map. The routine will return false if the complete area is RAM
according to the E820 map and true otherwise.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e612b4a7db4ae1dd8c2bbe171e10c21723de95b2)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit be2f0b895b695df3077f73189a3a3b22532eb943)

9 years agoxen: split counting of extra memory pages from remapping
Juergen Gross [Fri, 17 Jul 2015 04:51:27 +0000 (06:51 +0200)]
xen: split counting of extra memory pages from remapping

Memory pages in the initial memory setup done by the Xen hypervisor
conflicting with the target E820 map are remapped. In order to do this
those pages are counted and remapped in xen_set_identity_and_remap().

Split the counting from the remapping operation to be able to setup
the needed memory sizes in time but doing the remap operation at a
later time. This enables us to simplify the interface to
xen_set_identity_and_remap() as the number of remapped and released
pages is no longer needed here.

Finally move the remapping further down to prepare relocating
conflicting memory contents before the memory might be clobbered by
xen_set_identity_and_remap(). This requires to not destroy the Xen
E820 map when the one for the system is being constructed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 5097cdf6cef15439f971df54f9abcf143d7ca698)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 033eda1444432958e750a02627e5c4b029032cf2)

9 years agoxen: move static e820 map to global scope
Juergen Gross [Fri, 17 Jul 2015 04:51:26 +0000 (06:51 +0200)]
xen: move static e820 map to global scope

Instead of using a function local static e820 map in xen_memory_setup()
and calling various functions in the same source with the map as a
parameter use a map directly accessible by all functions in the source.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 69632ecfcd03b12202ed62dfa0aabac83904f8ac)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 09410e003286b1a6f22f127ba7cb87a8f95210a7)

9 years agoxen: eliminate scalability issues from initial mapping setup
Juergen Gross [Fri, 17 Jul 2015 04:51:25 +0000 (06:51 +0200)]
xen: eliminate scalability issues from initial mapping setup

Direct Xen to place the initial P->M table outside of the initial
mapping, as otherwise the 1G (implementation) / 2G (theoretical)
restriction on the size of the initial mapping limits the amount
of memory a domain can be handed initially.

As the initial P->M table is copied rather early during boot to
domain private memory and it's initial virtual mapping is dropped,
the easiest way to avoid virtual address conflicts with other
addresses in the kernel is to use a user address area for the
virtual address of the initial P->M table. This allows us to just
throw away the page tables of the initial mapping after the copy
without having to care about address invalidation.

It should be noted that this patch won't enable a pv-domain to USE
more than 512 GB of RAM. It just enables it to be started with a
P->M table covering more memory. This is especially important for
being able to boot a Dom0 on a system with more than 512 GB memory.

Signed-off-by: Juergen Gross <jgross@suse.com>
Based-on-patch-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 8f5b0c63987207fd5c3c1f89c9eb6cb95b30386e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit afafe7543e71733e5298cd86eb6242ed4aaf7a5d)

9 years agoxen: don't build mfn tree if tools don't need it
Juergen Gross [Fri, 17 Jul 2015 04:51:24 +0000 (06:51 +0200)]
xen: don't build mfn tree if tools don't need it

In case the Xen tools indicate they don't need the p2m 3 level tree
as they support the virtual mapped linear p2m list, just omit building
the tree.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit d51e8b3e85972dee10be7943b0b0106742b1e847)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 174c229b0f57185580e690ae9bb8a1c4b6d8a958)

9 years agoxen: save linear p2m list address in shared info structure
Juergen Gross [Fri, 17 Jul 2015 04:51:23 +0000 (06:51 +0200)]
xen: save linear p2m list address in shared info structure

The virtual address of the linear p2m list should be stored in the
shared info structure read by the Xen tools to be able to support
64 bit pv-domains larger than 512 GB. Additionally the linear p2m
list interface includes a generation count which is changed prior
to and after each mapping change of the p2m list. Reading the
generation count the Xen tools can detect changes of the mappings
and re-read the p2m list eventually.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 4b9c9a11803eaa73b3223da9fcaea39b2f919d80)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 82d85a2e93bed5c0054ee110a618201de8f32853)

9 years agoxen: sync with xen headers
Juergen Gross [Fri, 17 Jul 2015 04:51:22 +0000 (06:51 +0200)]
xen: sync with xen headers

Use the newest headers from the xen tree to get some new structure
layouts.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 17fb46b1190b677a37cdd636e2aa30052109f51b)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 87feb4932d17fa759a755ba34e927440c1591579)

9 years agoarm/xen: Drop the definition of xen_pci_platform_unplug
Julien Grall [Mon, 3 Aug 2015 09:50:55 +0000 (09:50 +0000)]
arm/xen: Drop the definition of xen_pci_platform_unplug

The commit 6f6c15ef912465b3aaafe709f39bd6026a8b3e72 "xen/pvhvm: Remove
the xen_platform_pci int." makes the x86 version of
xen_pci_platform_unplug static.

Therefore we don't need anymore to define a dummy xen_pci_platform_unplug
for ARM.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
(cherry picked from commit 7ed208ef4ef9dbd03cda8a5b5a85cc78f79ef213)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 052bdfc4105123c0a85ece4264ddc82938176f20)

9 years agoxen/events: Support event channel rebind on ARM
Julien Grall [Tue, 28 Jul 2015 09:10:42 +0000 (10:10 +0100)]
xen/events: Support event channel rebind on ARM

Currently, the event channel rebind code is gated with the presence of
the vector callback.

The virtual interrupt controller on ARM has the concept of per-CPU
interrupt (PPI) which allow us to support per-VCPU event channel.
Therefore there is no need of vector callback for ARM.

Xen is already using a free PPI to notify the guest VCPU of an event.
Furthermore, the xen code initialization in Linux (see
arch/arm/xen/enlighten.c) is requesting correctly a per-CPU IRQ.

Introduce new helper xen_support_evtchn_rebind to allow architecture
decide whether rebind an event is support or not. It will always return
true on ARM and keep the same behavior on x86.

This is also allow us to drop the usage of xen_have_vector_callback
entirely in the ARM code.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 4a5b69464e51f4a8dd432e8c2a1468630df1a53c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 59979b4f2bc533345c80cdba3f9613e322f47002)

9 years agoxen-blkfront: convert to blk-mq APIs
Bob Liu [Mon, 13 Jul 2015 09:55:24 +0000 (17:55 +0800)]
xen-blkfront: convert to blk-mq APIs

Note: This patch is based on original work of Arianna's internship for
GNOME's Outreach Program for Women.

Only one hardware queue is used now, so there is no significant
performance change

The legacy non-mq code is deleted completely which is the same as other
drivers like virtio, mtip, and nvme.

Also dropped one unnecessary holding of info->io_lock when calling
blk_mq_stop_hw_queues().

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@fb.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 907c3eb18e0bd86ca12a9de80befe8e3647bac3e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 0cf2faa73e15b21d6493bf1a87fe86ad3605f56f)

9 years agoxen/preempt: use need_resched() instead of should_resched()
Konstantin Khlebnikov [Wed, 15 Jul 2015 09:52:01 +0000 (12:52 +0300)]
xen/preempt: use need_resched() instead of should_resched()

This code is used only when CONFIG_PREEMPT=n and only in non-atomic
context: xen_in_preemptible_hcall is set only in
privcmd_ioctl_hypercall().  Thus preempt_count is zero and
should_resched() is equal to need_resched().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit a7da51ae10032a507ddeae6a490916eadbd1e10a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a2621b7619e9c680813232c9da3a71088688ec3c)

9 years agox86/xen: fix non-ANSI declaration of xen_has_pv_devices()
Colin Ian King [Thu, 16 Jul 2015 19:34:42 +0000 (20:34 +0100)]
x86/xen: fix non-ANSI declaration of xen_has_pv_devices()

xen_has_pv_devices() has no parameters, so use the normal void
parameter convention to make it match the prototype in the header file
include/xen/platform_pci.h.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 772f95e3b9460c64fb99b134022855cbce75b9a0)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit b19b9231523ba06c961a3807f58814fd8071d176)

9 years agoxen/events/fifo: Consume unprocessed events when a CPU dies
Ross Lagerwall [Fri, 19 Jun 2015 15:15:57 +0000 (16:15 +0100)]
xen/events/fifo: Consume unprocessed events when a CPU dies

When a CPU is offlined, there may be unprocessed events on a port for
that CPU.  If the port is subsequently reused on a different CPU, it
could be in an unexpected state with the link bit set, resulting in
interrupts being missed. Fix this by consuming any unprocessed events
for a particular CPU when that CPU dies.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: <stable@vger.kernel.org> # 3.14+
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 3de88d622fd68bd4dbee0f80168218b23f798fd0)

Orabug: 22498877
Tested-by: Carson Hovey <carson.hovey@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoMerge branch 'uek4/bug20386370' of git://ca-git.us.oracle.com/linux-konrad-public...
Santosh Shilimkar [Fri, 8 Jan 2016 16:53:11 +0000 (08:53 -0800)]
Merge branch 'uek4/bug20386370' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen

* 'uek4/bug20386370' of git://ca-git.us.oracle.com/linux-konrad-public:
  Revert "xen/fb: allow xenfb initialization for hvm guests"

9 years agoMerge branch 'uek4/xsa157' of git://ca-git.us.oracle.com/linux-konrad-public into...
Santosh Shilimkar [Fri, 8 Jan 2016 16:51:39 +0000 (08:51 -0800)]
Merge branch 'uek4/xsa157' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen

* 'uek4/xsa157' of git://ca-git.us.oracle.com/linux-konrad-public:
  xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
  xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
  xen/pciback: Do not install an IRQ handler for MSI interrupts.
  xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled
  xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled

9 years agoMerge branch 'uek4/xsa155' of git://ca-git.us.oracle.com/linux-konrad-public into...
Santosh Shilimkar [Fri, 8 Jan 2016 16:51:04 +0000 (08:51 -0800)]
Merge branch 'uek4/xsa155' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen

* 'uek4/xsa155' of git://ca-git.us.oracle.com/linux-konrad-public:
  xen/pciback: Save xen_pci_op commands before processing it
  xen-scsiback: safely copy requests
  xen-blkback: read from indirect descriptors only once
  xen-blkback: only read request operation from shared ring once
  xen-netback: use RING_COPY_REQUEST() throughout
  xen-netback: don't use last request to determine minimum Tx credit
  xen: Add RING_COPY_REQUEST()

9 years agoRevert "xen/fb: allow xenfb initialization for hvm guests"
Konrad Rzeszutek Wilk [Fri, 18 Dec 2015 17:38:23 +0000 (12:38 -0500)]
Revert "xen/fb: allow xenfb initialization for hvm guests"

This reverts commit d9581c7dcac15c02ad4d47c60c60f4d8f197db55.

which is:
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Fri Jan 3 19:02:09 2014 +0000

    xen/fb: allow xenfb initialization for hvm guests

    There is no reasons why an HVM guest shouldn't be allowed to use xenfb.

as "Xend" exposes an PV and VGA (via QEMU) driver to the guests (either
PV or HVM).

The backed in this case is QEMU - and it only provides the emulation
for the VGA backend (for HVM guests). Upstream QEMU can do both - but
since we are using Xend - we only expose QEMU-traditional.

Upstreamwise we hadn't reached a good decisions. Poking for XenStore
keys to see if the backend is Xend vs xl seems odd. And the issue
goes away in 'xl' as you can't define both PV and VGA drivers. Only
one of them is allowed.

The 'proper' fix would be to teach Xend to not expose PV FB for HVM
guests but that code is pretty genric.

So continuing on with this revert.

OraBug: 20386370 - XENBUS_PROBE_FRONTEND: TIMEOUT CONNECTING TO DEVICE EROR WITH UEK4
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 23:13:27 +0000 (18:13 -0500)]
xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.

commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way")
teaches us that dealing with MSI-X can be troublesome.

Further checks in the MSI-X architecture shows that if the
PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
may not be able to access the BAR (since they are memory regions).

Since the MSI-X tables are located in there.. that can lead
to us causing PCIe errors. Inhibit us performing any
operation on the MSI-X unless the MEMORY bit is set.

Note that Xen hypervisor with:
"x86/MSI-X: access MSI-X table only after having enabled MSI-X"
will return:
xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!

When the generic MSI code tries to setup the PIRQ without
MEMORY bit set. Which means with later versions of Xen
(4.6) this patch is not neccessary.

This is part of XSA-157

CC: stable@vger.kernel.org
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 408fb0e5aa7fda0059db282ff58c3b2a4278baa0)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
Konrad Rzeszutek Wilk [Wed, 1 Apr 2015 14:49:47 +0000 (10:49 -0400)]
xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.

Otherwise just continue on, returning the same values as
previously (return of 0, and op->result has the PIRQ value).

This does not change the behavior of XEN_PCI_OP_disable_msi[|x].

The pci_disable_msi or pci_disable_msix have the checks for
msi_enabled or msix_enabled so they will error out immediately.

However the guest can still call these operations and cause
us to disable the 'ack_intr'. That means the backend IRQ handler
for the legacy interrupt will not respond to interrupts anymore.

This will lead to (if the device is causing an interrupt storm)
for the Linux generic code to disable the interrupt line.

Naturally this will only happen if the device in question
is plugged in on the motherboard on shared level interrupt GSI.

This is part of XSA-157

CC: stable@vger.kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 7cfb905b9638982862f0331b36ccaaca5d383b49)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen/pciback: Do not install an IRQ handler for MSI interrupts.
Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 22:24:08 +0000 (17:24 -0500)]
xen/pciback: Do not install an IRQ handler for MSI interrupts.

Otherwise an guest can subvert the generic MSI code to trigger
an BUG_ON condition during MSI interrupt freeing:

 for (i = 0; i < entry->nvec_used; i++)
        BUG_ON(irq_has_action(entry->irq + i));

Xen PCI backed installs an IRQ handler (request_irq) for
the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
(or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
done in case the device has legacy interrupts the GSI line
is shared by the backend devices.

To subvert the backend the guest needs to make the backend
to change the dev->irq from the GSI to the MSI interrupt line,
make the backend allocate an interrupt handler, and then command
the backend to free the MSI interrupt and hit the BUG_ON.

Since the backend only calls 'request_irq' when the guest
writes to the PCI_COMMAND register the guest needs to call
XEN_PCI_OP_enable_msi before any other operation. This will
cause the generic MSI code to setup an MSI entry and
populate dev->irq with the new PIRQ value.

Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
and cause the backend to setup an IRQ handler for dev->irq
(which instead of the GSI value has the MSI pirq). See
'xen_pcibk_control_isr'.

Then the guest disables the MSI: XEN_PCI_OP_disable_msi
which ends up triggering the BUG_ON condition in 'free_msi_irqs'
as there is an IRQ handler for the entry->irq (dev->irq).

Note that this cannot be done using MSI-X as the generic
code does not over-write dev->irq with the MSI-X PIRQ values.

The patch inhibits setting up the IRQ handler if MSI or
MSI-X (for symmetry reasons) code had been called successfully.

P.S.
Xen PCIBack when it sets up the device for the guest consumption
ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
XSA-120 addendum patch removed that - however when upstreaming said
addendum we found that it caused issues with qemu upstream. That
has now been fixed in qemu upstream.

This is part of XSA-157

CC: stable@vger.kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a396f3a210c3a61e94d6b87ec05a75d0be2a60d0)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X...
Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 23:07:44 +0000 (18:07 -0500)]
xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled

The guest sequence of:

  a) XEN_PCI_OP_enable_msix
  b) XEN_PCI_OP_enable_msix

results in hitting an NULL pointer due to using freed pointers.

The device passed in the guest MUST have MSI-X capability.

The a) constructs and SysFS representation of MSI and MSI groups.
The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
in a) pdev->msi_irq_groups is still set) and also free's ALL of the
MSI-X entries of the device (the ones allocated in step a) and b)).

The unwind code: 'free_msi_irqs' deletes all the entries and tries to
delete the pdev->msi_irq_groups (which hasn't been set to NULL).
However the pointers in the SysFS are already freed and we hit an
NULL pointer further on when 'strlen' is attempted on a freed pointer.

The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
against that. The check for msi_enabled is not stricly neccessary.

This is part of XSA-157

CC: stable@vger.kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 5e0ce1455c09dd61d029b8ad45d82e1ac0b6c4c9)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 15:08:22 +0000 (11:08 -0400)]
xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled

The guest sequence of:

 a) XEN_PCI_OP_enable_msi
 b) XEN_PCI_OP_enable_msi
 c) XEN_PCI_OP_disable_msi

results in hitting an BUG_ON condition in the msi.c code.

The MSI code uses an dev->msi_list to which it adds MSI entries.
Under the above conditions an BUG_ON() can be hit. The device
passed in the guest MUST have MSI capability.

The a) adds the entry to the dev->msi_list and sets msi_enabled.
The b) adds a second entry but adding in to SysFS fails (duplicate entry)
and deletes all of the entries from msi_list and returns (with msi_enabled
is still set).  c) pci_disable_msi passes the msi_enabled checks and hits:

BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));

and blows up.

The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
against that. The check for msix_enabled is not stricly neccessary.

This is part of XSA-157.

CC: stable@vger.kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 56441f3c8e5bd45aab10dd9f8c505dd4bec03b0d)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen/pciback: Save xen_pci_op commands before processing it
Konrad Rzeszutek Wilk [Mon, 16 Nov 2015 17:40:48 +0000 (12:40 -0500)]
xen/pciback: Save xen_pci_op commands before processing it

Double fetch vulnerabilities that happen when a variable is
fetched twice from shared memory but a security check is only
performed the first time.

The xen_pcibk_do_op function performs a switch statements on the op->cmd
value which is stored in shared memory. Interestingly this can result
in a double fetch vulnerability depending on the performed compiler
optimization.

This patch fixes it by saving the xen_pci_op command before
processing it. We also use 'barrier' to make sure that the
compiler does not perform any optimization.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen-scsiback: safely copy requests
David Vrabel [Mon, 16 Nov 2015 18:02:32 +0000 (18:02 +0000)]
xen-scsiback: safely copy requests

The copy of the ring request was lacking a following barrier(),
potentially allowing the compiler to optimize the copy away.

Use RING_COPY_REQUEST() to ensure the request is copied to local
memory.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit be69746ec12f35b484707da505c6c76ff06f97dc)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen-blkback: read from indirect descriptors only once
Roger Pau Monné [Tue, 3 Nov 2015 16:40:43 +0000 (16:40 +0000)]
xen-blkback: read from indirect descriptors only once

Since indirect descriptors are in memory shared with the frontend, the
frontend could alter the first_sect and last_sect values after they have
been validated but before they are recorded in the request.  This may
result in I/O requests that overflow the foreign page, possibly
overwriting local pages when the I/O request is executed.

When parsing indirect descriptors, only read first_sect and last_sect
once.

This is part of XSA155.

(cherry-pick from 18779149101c0dd43ded43669ae2a92d21b6f9cb)
CC: stable@vger.kernel.org
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
----
v2: This is against v4.3

9 years agoxen-blkback: only read request operation from shared ring once
Roger Pau Monné [Tue, 3 Nov 2015 16:34:09 +0000 (16:34 +0000)]
xen-blkback: only read request operation from shared ring once

A compiler may load a switch statement value multiple times, which could
be bad when the value is in memory shared with the frontend.

When converting a non-native request to a native one, ensure that
src->operation is only loaded once by using READ_ONCE().

This is part of XSA155.

CC: stable@vger.kernel.org
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 1f13d75ccb806260079e0679d55d9253e370ec8a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen-netback: use RING_COPY_REQUEST() throughout
David Vrabel [Fri, 30 Oct 2015 15:17:06 +0000 (15:17 +0000)]
xen-netback: use RING_COPY_REQUEST() throughout

Instead of open-coding memcpy()s and directly accessing Tx and Rx
requests, use the new RING_COPY_REQUEST() that ensures the local copy
is correct.

This is more than is strictly necessary for guest Rx requests since
only the id and gref fields are used and it is harmless if the
frontend modifies these.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 68a33bfd8403e4e22847165d149823a2e0e67c9c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen-netback: don't use last request to determine minimum Tx credit
David Vrabel [Fri, 30 Oct 2015 15:16:01 +0000 (15:16 +0000)]
xen-netback: don't use last request to determine minimum Tx credit

The last from guest transmitted request gives no indication about the
minimum amount of credit that the guest might need to send a packet
since the last packet might have been a small one.

Instead allow for the worst case 128 KiB packet.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 0f589967a73f1f30ab4ac4dd9ce0bb399b4d6357)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen: Add RING_COPY_REQUEST()
David Vrabel [Fri, 30 Oct 2015 14:58:08 +0000 (14:58 +0000)]
xen: Add RING_COPY_REQUEST()

Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly
(i.e., by not considering that the other end may alter the data in the
shared ring while it is being inspected).  Safe usage of a request
generally requires taking a local copy.

Provide a RING_COPY_REQUEST() macro to use instead of
RING_GET_REQUEST() and an open-coded memcpy().  This takes care of
ensuring that the copy is done correctly regardless of any possible
compiler optimizations.

Use a volatile source to prevent the compiler from reordering or
omitting the copy.

This is part of XSA155.

CC: stable@vger.kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 454d5d882c7e412b840e3c99010fe81a9862f6fb)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen-netfront: update num_queues to real created
Joe Jin [Mon, 19 Oct 2015 05:37:17 +0000 (13:37 +0800)]
xen-netfront: update num_queues to real created

Sometimes xennet_create_queues() may failed to created all requested
queues, we need to update num_queues to real created to avoid NULL
pointer dereference.

Orabug: 22069665

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David S. Miller <davem@davemloft.net>
9 years agoxen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)
Cathy Avery [Fri, 23 Oct 2015 10:12:46 +0000 (18:12 +0800)]
xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)

Orabug: 21935345

xen-blkfront will crash if the check to talk_to_blkback()
in blkback_changed()(XenbusStateInitWait) returns an error.
The driver data is freed and info is set to NULL. Later during
the close process via talk_to_blkback's call to xenbus_dev_fatal()
the null pointer is passed to and dereference in blkfront_closing.

Signed-off-by: Cathy Avery <cathy.avery@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
9 years agoxen-netfront: respect user provided max_queues
Wei Liu [Thu, 10 Sep 2015 10:18:58 +0000 (11:18 +0100)]
xen-netfront: respect user provided max_queues

Originally that parameter was always reset to num_online_cpus during
module initialisation, which renders it useless.

The fix is to only set max_queues to num_online_cpus when user has not
provided a value.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 32a844056fd43dda647e1c3c6b9983bdfa04d17d)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agonet/xen-netfront: only napi_synchronize() if running
Chas Williams [Thu, 27 Aug 2015 16:28:46 +0000 (12:28 -0400)]
net/xen-netfront: only napi_synchronize() if running

If an interface isn't running napi_synchronize() will hang forever.

[  392.248403] rmmod           R  running task        0   359    343 0x00000000
[  392.257671]  ffff88003760fc88 ffff880037193b40 ffff880037193160 ffff88003760fc88
[  392.267644]  ffff880037610000 ffff88003760fcd8 0000000100014c22 ffffffff81f75c40
[  392.277524]  0000000000bc7010 ffff88003760fca8 ffffffff81796927 ffffffff81f75c40
[  392.287323] Call Trace:
[  392.291599]  [<ffffffff81796927>] schedule+0x37/0x90
[  392.298553]  [<ffffffff8179985b>] schedule_timeout+0x14b/0x280
[  392.306421]  [<ffffffff810f91b9>] ? irq_free_descs+0x69/0x80
[  392.314006]  [<ffffffff811084d0>] ? internal_add_timer+0xb0/0xb0
[  392.322125]  [<ffffffff81109d07>] msleep+0x37/0x50
[  392.329037]  [<ffffffffa00ec79a>] xennet_disconnect_backend.isra.24+0xda/0x390 [xen_netfront]
[  392.339658]  [<ffffffffa00ecadc>] xennet_remove+0x2c/0x80 [xen_netfront]
[  392.348516]  [<ffffffff81481c69>] xenbus_dev_remove+0x59/0xc0
[  392.356257]  [<ffffffff814e7217>] __device_release_driver+0x87/0x120
[  392.364645]  [<ffffffff814e7cf8>] driver_detach+0xb8/0xc0
[  392.371989]  [<ffffffff814e6e69>] bus_remove_driver+0x59/0xe0
[  392.379883]  [<ffffffff814e84f0>] driver_unregister+0x30/0x70
[  392.387495]  [<ffffffff814814b2>] xenbus_unregister_driver+0x12/0x20
[  392.395908]  [<ffffffffa00ed89b>] netif_exit+0x10/0x775 [xen_netfront]
[  392.404877]  [<ffffffff81124e08>] SyS_delete_module+0x1d8/0x230
[  392.412804]  [<ffffffff8179a8ee>] system_call_fastpath+0x12/0x71

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 274b045509175db0405c784be85e8cce116e6f7d)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agonet/xen-netfront: only clean up queues if present
Chas Williams [Wed, 19 Aug 2015 23:14:20 +0000 (19:14 -0400)]
net/xen-netfront: only clean up queues if present

If you simply load and unload the module without starting the interfaces,
the queues are never created and you get a bad pointer dereference.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9a873c71e91cabf4c10fd9bbd8358c22deaf6c9e)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agoxen-netback: respect user provided max_queues
Wei Liu [Thu, 10 Sep 2015 10:18:57 +0000 (11:18 +0100)]
xen-netback: respect user provided max_queues

Originally that parameter was always reset to num_online_cpus during
module initialisation, which renders it useless.

The fix is to only set max_queues to num_online_cpus when user has not
provided a value.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Reported-by: Johnny Strom <johnny.strom@linuxsolutions.fi>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4c82ac3c37363e8c4ded6a5fe1ec5fa756b34df3)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agoxen-netback: require fewer guest Rx slots when not using GSO
David Vrabel [Tue, 8 Sep 2015 13:25:14 +0000 (14:25 +0100)]
xen-netback: require fewer guest Rx slots when not using GSO

Commit f48da8b14d04ca87ffcffe68829afd45f926ec6a (xen-netback: fix
unlimited guest Rx internal queue and carrier flapping) introduced a
regression.

The PV frontend in IPXE only places 4 requests on the guest Rx ring.
Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
not receive any packets.

a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
   for the largest possible packet.  Calculate the required slots
   based on the maximum GSO size or the MTU.

   This calculation of the number of required slots relies on
   1650d5455bd2 (xen-netback: always fully coalesce guest Rx packets)
   which present in 4.0-rc1 and later.

b) Reduce the Rx stall detection to checking for at least one
   available Rx request.  This is fine since we're predominately
   concerned with detecting interfaces which are down and thus have
   zero available Rx requests.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1d5d48523900a4b0f25d6b52f1a93c84bd671186)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agoxen-netback: add support for multicast control
Paul Durrant [Wed, 2 Sep 2015 16:58:36 +0000 (17:58 +0100)]
xen-netback: add support for multicast control

Xen's PV network protocol includes messages to add/remove ethernet
multicast addresses to/from a filter list in the backend. This allows
the frontend to request the backend only forward multicast packets
which are of interest thus preventing unnecessary noise on the shared
ring.

The canonical netif header in git://xenbits.xen.org/xen.git specifies
the message format (two more XEN_NETIF_EXTRA_TYPEs) so the minimal
necessary changes have been pulled into include/xen/interface/io/netif.h.

To prevent the frontend from extending the multicast filter list
arbitrarily a limit (XEN_NETBK_MCAST_MAX) has been set to 64 entries.
This limit is not specified by the protocol and so may change in future.
If the limit is reached then the next XEN_NETIF_EXTRA_TYPE_MCAST_ADD
sent by the frontend will be failed with NETIF_RSP_ERROR.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 210c34dcd8d912dcc740f1f17625a7293af5cb56)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agoxen/netback: Wake dealloc thread after completing zerocopy work
Ross Lagerwall [Tue, 4 Aug 2015 14:40:59 +0000 (15:40 +0100)]
xen/netback: Wake dealloc thread after completing zerocopy work

Waking the dealloc thread before decrementing inflight_packets is racy
because it means the thread may go to sleep before inflight_packets is
decremented. If kthread_stop() has already been called, the dealloc
thread may wait forever with nothing to wake it. Instead, wake the
thread only after decrementing inflight_packets.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 57b229063ae6dc65036209018dc7f4290cc026bb)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agoxen-netback: Allocate fraglist early to avoid complex rollback
Ross Lagerwall [Mon, 3 Aug 2015 14:38:03 +0000 (15:38 +0100)]
xen-netback: Allocate fraglist early to avoid complex rollback

Determine if a fraglist is needed in the tx path, and allocate it if
necessary before setting up the copy and map operations.
Otherwise, undoing the copy and map operations is tricky.

This fixes a use-after-free: if allocating the fraglist failed, the copy
and map operations that had been set up were still executed, writing
over the data area of a freed skb.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2475b22526d70234ecfe4a1ff88aed69badefba9)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agonet/xen-netback: off by one in BUG_ON() condition
Dan Carpenter [Sat, 11 Jul 2015 22:20:55 +0000 (01:20 +0300)]
net/xen-netback: off by one in BUG_ON() condition

The > should be >=.  I also added spaces around the '-' operations so
the code is a little more consistent and matches the condition better.

Fixes: f53c3fe8dad7 ('xen-netback: Introduce TX grant mapping')
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 50c2e4dd6749725338621fff456b26d3a592259f)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agoxen-netback: remove duplicated function definition
Li, Liang Z [Mon, 6 Jul 2015 00:42:56 +0000 (08:42 +0800)]
xen-netback: remove duplicated function definition

There are two duplicated xenvif_zerocopy_callback() definitions.
Remove one of them.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6ab13b27699e5a71cca20d301c3c424653bd0841)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agonet/xen-netback: Don't mix hexa and decimal with 0x in the printf format
Julien Grall [Tue, 16 Jun 2015 19:10:48 +0000 (20:10 +0100)]
net/xen-netback: Don't mix hexa and decimal with 0x in the printf format

Append 0x to all %x in order to avoid while reading when there is other
decimal value in the log.

Also replace some of the hexadecimal print to decimal to uniformize the
format with netfront.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 68946159da1b0b6791c5990242940950b9383cfc)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agonet/xen-netback: Remove unused code in xenvif_rx_action
Julien Grall [Tue, 16 Jun 2015 19:10:47 +0000 (20:10 +0100)]
net/xen-netback: Remove unused code in xenvif_rx_action

The variables old_req_cons and ring_slots_used are assigned but never
used since commit 1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b "xen-netback:
always fully coalesce guest Rx packets".

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 44f0764cfec9c607d43cad6a51e8592c7b2b9b84)
Signed-off-by: Annie Li <annie.li@oracle.com>
9 years agoMerge branch 'uek-4.1/xen' of git://ca-git.us.oracle.com/linux-konrad-public into...
Santosh Shilimkar [Thu, 13 Aug 2015 20:14:34 +0000 (13:14 -0700)]
Merge branch 'uek-4.1/xen' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen

* 'uek-4.1/xen' of git://ca-git.us.oracle.com/linux-konrad-public: (31 commits)
  xen-netfront: Remove the meaningless code
  net/xen-netfront: Correct printf format in xennet_get_responses
  xen-netfront: Use setup_timer
  xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
  Revert "xen/events/fifo: Handle linked events when closing a port"
  x86/xen: build "Xen PV" APIC driver for domU as well
  xen/events/fifo: Handle linked events when closing a port
  xen: release lock occasionally during ballooning
  xen/gntdevt: Fix race condition in gntdev_release()
  block/xen-blkback: s/nr_pages/nr_segs/
  block/xen-blkfront: Remove invalid comment
  arm/xen: Drop duplicate define mfn_to_virt
  xen/grant-table: Remove unused macro SPP
  xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
  xen: Include xen/page.h rather than asm/xen/page.h
  kconfig: add xenconfig defconfig helper
  kconfig: clarify kvmconfig is for kvm
  xen/pcifront: Remove usage of struct timeval
  xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
  hvc_xen: avoid uninitialized variable warning
  ...

9 years agoMerge branch 'backports-for-konrad' of git://ca-git.us.oracle.com/linux-eufimtse...
Konrad Rzeszutek Wilk [Wed, 12 Aug 2015 19:20:26 +0000 (15:20 -0400)]
Merge branch 'backports-for-konrad' of git://ca-git.us.oracle.com/linux-eufimtse-public into uek-4.1/xen

* 'backports-for-konrad' of git://ca-git.us.oracle.com/linux-eufimtse-public: (31 commits)
  xen-netfront: Remove the meaningless code
  net/xen-netfront: Correct printf format in xennet_get_responses
  xen-netfront: Use setup_timer
  xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
  Revert "xen/events/fifo: Handle linked events when closing a port"
  x86/xen: build "Xen PV" APIC driver for domU as well
  xen/events/fifo: Handle linked events when closing a port
  xen: release lock occasionally during ballooning
  xen/gntdevt: Fix race condition in gntdev_release()
  block/xen-blkback: s/nr_pages/nr_segs/
  block/xen-blkfront: Remove invalid comment
  arm/xen: Drop duplicate define mfn_to_virt
  xen/grant-table: Remove unused macro SPP
  xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
  xen: Include xen/page.h rather than asm/xen/page.h
  kconfig: add xenconfig defconfig helper
  kconfig: clarify kvmconfig is for kvm
  xen/pcifront: Remove usage of struct timeval
  xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
  hvc_xen: avoid uninitialized variable warning
  ...

Backports from Linux 4.1

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoxen-netfront: Remove the meaningless code
Li, Liang Z [Fri, 26 Jun 2015 23:17:26 +0000 (07:17 +0800)]
xen-netfront: Remove the meaningless code

The function netif_set_real_num_tx_queues() will return -EINVAL if
the second parameter < 1, so call this function with the second
parameter set to 0 is meaningless.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 905726c1c5a3ca620ba7d73c78eddfb91de5ce28)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agonet/xen-netfront: Correct printf format in xennet_get_responses
Julien Grall [Tue, 16 Jun 2015 19:10:46 +0000 (20:10 +0100)]
net/xen-netfront: Correct printf format in xennet_get_responses

rx->status is an int16_t, print it using %d rather than %u in order to
have a meaningful value when the field is negative.

Also use %u rather than %x for rx->offset.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6c10127d91bdc80e02938085d03c232ae3118ad5)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen-netfront: Use setup_timer
Vaishali Thakkar [Mon, 1 Jun 2015 04:58:37 +0000 (10:28 +0530)]
xen-netfront: Use setup_timer

Use the timer API function setup_timer instead of structure field
assignments to initialize a timer.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@change@
expression e, func, da;
@@

-init_timer (&e);
+setup_timer (&e, func, da);
-e.data = da;
-e.function = func;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 493be55ac3d81f9c32832237288eb397a9993d5d)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen/xenbus: Don't leak memory when unmapping the ring on HVM backend
Julien Grall [Mon, 10 Aug 2015 18:10:38 +0000 (19:10 +0100)]
xen/xenbus: Don't leak memory when unmapping the ring on HVM backend

The commit ccc9d90a9a8b5c4ad7e9708ec41f75ff9e98d61d "xenbus_client:
Extend interface to support multi-page ring" removes the call to
free_xenballooned_pages() in xenbus_unmap_ring_vfree_hvm(), leaking a
page for every shared ring.

Only with backends running in HVM domains were affected.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c22fe519e7e2b94ad173e0ea3b89c1a7d8be8d00)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoRevert "xen/events/fifo: Handle linked events when closing a port"
David Vrabel [Mon, 10 Aug 2015 17:11:06 +0000 (18:11 +0100)]
Revert "xen/events/fifo: Handle linked events when closing a port"

This reverts commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8.

This was causing a WARNING whenever a PIRQ was closed since
shutdown_pirq() is called with irqs disabled.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org>
(cherry picked from commit ad6cd7bafcd2c812ba4200d5938e07304f1e2fcd)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agox86/xen: build "Xen PV" APIC driver for domU as well
Jason A. Donenfeld [Mon, 10 Aug 2015 13:40:27 +0000 (15:40 +0200)]
x86/xen: build "Xen PV" APIC driver for domU as well

It turns out that a PV domU also requires the "Xen PV" APIC
driver. Otherwise, the flat driver is used and we get stuck in busy
loops that never exit, such as in this stack trace:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56              while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
 #0  __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
 #1  __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
 #2  apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
 #3  0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
 #4  0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
 #5  0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
 #6  0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
    at kernel/printk/printk.c:1778
 #7  0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
 #8  0xffffffffc00013ea in ?? ()
 #9  0x0000000000000000 in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit fc5fee86bdd3d720e2d1d324e4fae0c35845fa63)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen/events/fifo: Handle linked events when closing a port
Ross Lagerwall [Fri, 31 Jul 2015 13:30:42 +0000 (14:30 +0100)]
xen/events/fifo: Handle linked events when closing a port

An event channel bound to a CPU that was offlined may still be linked
on that CPU's queue.  If this event channel is closed and reused,
subsequent events will be lost because the event channel is never
unlinked and thus cannot be linked onto the correct queue.

When a channel is closed and the event is still linked into a queue,
ensure that it is unlinked before completing.

If the CPU to which the event channel bound is online, spin until the
event is handled by that CPU. If that CPU is offline, it can't handle
the event, so clear the event queue during the close, dropping the
events.

This fixes the missing interrupts (and subsequent disk stalls etc.)
when offlining a CPU.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen: release lock occasionally during ballooning
Juergen Gross [Mon, 20 Jul 2015 11:49:39 +0000 (13:49 +0200)]
xen: release lock occasionally during ballooning

When dom0 is being ballooned balloon_process() will hold the balloon
mutex until it is finished. This will block e.g. creation of new
domains as the device backends for the new domain need some
autoballooned pages for the ring buffers.

Avoid this by releasing the balloon mutex from time to time during
ballooning. Adjust the comment above balloon_process() regarding
multiple instances of balloon_process().

Instead of open coding it, just use cond_resched().

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 929423fa83e5b75e94101b280738b9a5a376a0e1)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen/gntdevt: Fix race condition in gntdev_release()
Marek Marczykowski-Górecki [Fri, 26 Jun 2015 01:28:24 +0000 (03:28 +0200)]
xen/gntdevt: Fix race condition in gntdev_release()

While gntdev_release() is called the MMU notifier is still registered
and can traverse priv->maps list even if no pages are mapped (which is
the case -- gntdev_release() is called after all). But
gntdev_release() will clear that list, so make sure that only one of
those things happens at the same time.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 30b03d05e07467b8c6ec683ea96b5bffcbcd3931)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoblock/xen-blkback: s/nr_pages/nr_segs/
Julien Grall [Wed, 17 Jun 2015 14:28:08 +0000 (15:28 +0100)]
block/xen-blkback: s/nr_pages/nr_segs/

Make the code less confusing to read now that Linux may not have the
same page size as Xen.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6684fa1cdb1ebe804e9707f389255d461b2e95b0)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoblock/xen-blkfront: Remove invalid comment
Julien Grall [Wed, 17 Jun 2015 14:28:07 +0000 (15:28 +0100)]
block/xen-blkfront: Remove invalid comment

Since commit b764915 "xen-blkfront: use a different scatterlist for each
request", biovec has been replaced by scatterlist when copying back the
data during a completion request.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ee4b7179fc9bf19fd22f9d17757a86582c40229e)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoarm/xen: Drop duplicate define mfn_to_virt
Julien Grall [Wed, 17 Jun 2015 14:28:05 +0000 (15:28 +0100)]
arm/xen: Drop duplicate define mfn_to_virt

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 90d73c4f43630ca45398cee7f32f0fe51271b2ce)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen/grant-table: Remove unused macro SPP
Julien Grall [Wed, 17 Jun 2015 14:28:04 +0000 (15:28 +0100)]
xen/grant-table: Remove unused macro SPP

SPP was used by the grant table v2 code which has been removed in
commit 438b33c7145ca8a5131a30c36d8f59bce119a19a "xen/grant-table:
remove support for V2 tables".

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 548f7c94759ac58d4744ef2663e2a66a106e21c5)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
Julien Grall [Wed, 17 Jun 2015 14:28:03 +0000 (15:28 +0100)]
xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring

virt_to_mfn should take a void* rather an unsigned long. While it
doesn't really matter now, it would throw a compiler warning later when
virt_to_mfn will enforce the type.

At the same time, avoid to compute new virtual address every time in the
loop and directly increment the parameter as we don't use it later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c9fd55eb6625ead6a1207e7da38026ff47c5198b)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen: Include xen/page.h rather than asm/xen/page.h
Julien Grall [Wed, 17 Jun 2015 14:28:02 +0000 (15:28 +0100)]
xen: Include xen/page.h rather than asm/xen/page.h

Using xen/page.h will be necessary later for using common xen page
helpers.

As xen/page.h already include asm/xen/page.h, always use the later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit a9fd60e2683fb80f5b26a7d686aebe3327a63e70)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agokconfig: add xenconfig defconfig helper
Luis R. Rodriguez [Wed, 20 May 2015 18:53:39 +0000 (11:53 -0700)]
kconfig: add xenconfig defconfig helper

This lets you build a kernel which can support xen dom0
or xen guests on i386, x86-64 and arm64 by just using:

   make xenconfig

You can start from an allnoconfig and then switch to xenconfig.
This also splits out the options which are available currently
to be built with x86 and 'make ARCH=arm64' under a shared config.

Technically xen supports a dom0 kernel and also a guest
kernel configuration but upon review with the xen team
since we don't have many dom0 options its best to just
combine these two into one.

A few generic notes: we enable both of these:

CONFIG_INET=y
CONFIG_BINFMT_ELF=y

although technically not required given you likely will
end up with a pretty useless system otherwise.

A few architectural differences worth noting:

$ make allnoconfig; make xenconfig > /dev/null ; \
grep XEN .config > 64-bit-config
$ make ARCH=i386 allnoconfig; make ARCH=i386 xenconfig > /dev/null; \
grep XEN .config > 32-bit-config
$ make ARCH=arm64 allnoconfig; make ARCH=arm64 xenconfig > /dev/null; \
grep XEN .config > arm64-config

Since the options are already split up with a generic config and
architecture specific configs you anything on the x86 configs
are known to only work right now on x86. For instance arm64 doesn't
support MEMORY_HOTPLUG yet as such although we try to enabe it
generically arm64 doesn't have it yet, so we leave the xen
specific kconfig option XEN_BALLOON_MEMORY_HOTPLUG on x86's config
file to set expecations correctly.

Then on x86 we have differences between i386 and x86-64. The difference
between 64-bit-config and 32-bit-config is you don't get XEN_MCE_LOG as
this is only supported on 64-bit. You also do not get on i386
XEN_BALLOON_MEMORY_HOTPLUG, there does not seem to be any technical
reasons to not allow this but I gave up after a few attempts.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: penberg@kernel.org
Cc: levinsasha928@gmail.com
Cc: mtosatti@redhat.com
Cc: fengguang.wu@intel.com
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Michal Marek <mmarek@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6c6685055a285de53f18fbf6611687291b57ccd6)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agokconfig: clarify kvmconfig is for kvm
Luis R. Rodriguez [Wed, 20 May 2015 18:53:38 +0000 (11:53 -0700)]
kconfig: clarify kvmconfig is for kvm

We'll be adding options for xen as well.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: penberg@kernel.org
Cc: levinsasha928@gmail.com
Cc: mtosatti@redhat.com
Cc: fengguang.wu@intel.com
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 9bcd776d299e8adb8ed37be0453472aa3cc2b7db)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen/pcifront: Remove usage of struct timeval
Tina Ruchandani [Tue, 19 May 2015 06:08:09 +0000 (11:38 +0530)]
xen/pcifront: Remove usage of struct timeval

struct timeval uses a 32-bit field for representing seconds, which
will overflow in the year 2038 and beyond. Replace struct timeval with
64-bit ktime_t which is 2038 safe.  This is part of a larger effort to
remove instances of 32-bit timekeeping variables (timeval, time_t and
timespec) from the kernel.

Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e1d5bbcdc7ca08d8731f5d780f0de342a768d96a)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
Jan Beulich [Thu, 28 May 2015 12:04:33 +0000 (13:04 +0100)]
xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 01b720f3295b6c1b2dcfc1affd0fedc6f5d28c1e)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agohvc_xen: avoid uninitialized variable warning
Jan Beulich [Thu, 28 May 2015 08:28:22 +0000 (09:28 +0100)]
hvc_xen: avoid uninitialized variable warning

Older compilers don't recognize that "v" can't be used uninitialized;
other code using hvm_get_parameter() zeros the value too, so follow
suit here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c4ace5daf4ff726402b13f1ababf2ad0e0ceec65)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxenbus: avoid uninitialized variable warning
Jan Beulich [Thu, 28 May 2015 08:26:37 +0000 (09:26 +0100)]
xenbus: avoid uninitialized variable warning

Older compilers don't recognize that "v" can't be used uninitialized;
other code using hvm_get_parameter() zeros the value too, so follow
suit here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 76ea3cb428c5dc4fd5a415a7651026caf198fb3d)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen/arm: allow console=hvc0 to be omitted for guests
Ard Biesheuvel [Wed, 6 May 2015 14:14:22 +0000 (14:14 +0000)]
xen/arm: allow console=hvc0 to be omitted for guests

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

This patch registers hvc0 as the preferred console if no console
has been specified explicitly on the kernel command line.

The purpose is to allow platform agnostic kernels and boot images
(such as distro installers) to boot in a Xen/ARM domU without the
need to modify the command line by hand.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit f1dddd118c555508ce383b7262f4e6440927bdf4)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoarm,arm64/xen: move Xen initialization earlier
Stefano Stabellini [Wed, 6 May 2015 14:13:31 +0000 (14:13 +0000)]
arm,arm64/xen: move Xen initialization earlier

Currently, Xen is initialized/discovered in an initcall. This doesn't
allow us to support earlyprintk or choosing the preferred console when
running on Xen.

The current function xen_guest_init is now split in 2 parts:
    - xen_early_init: Check if there is a Xen node in the device tree
    and setup domain type
    - xen_guest_init: Retrieve the information from the device node and
    initialize Xen (grant table, shared page...)

The former is called in setup_arch, while the latter is an initcall.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Will Deacon <will.deacon@arm.com>
(cherry picked from commit 5882bfef6327093bff63569be19795170ff71e5f)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoarm/xen: Correctly check if the event channel interrupt is present
Julien Grall [Wed, 6 May 2015 14:09:06 +0000 (14:09 +0000)]
arm/xen: Correctly check if the event channel interrupt is present

The function irq_of_parse_and_map returns 0 when the IRQ is not found.

Futhermore, move the check before notifying the user that we are running on
Xen.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 81e863c3a28e69fd60411bde9f779b0f8ad0212a)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen-blkback: replace work_pending with work_busy in purge_persistent_gnt()
Bob Liu [Wed, 22 Jul 2015 06:40:10 +0000 (14:40 +0800)]
xen-blkback: replace work_pending with work_busy in purge_persistent_gnt()

The BUG_ON() in purge_persistent_gnt() will be triggered when previous purge
work haven't finished.

There is a work_pending() before this BUG_ON, but it doesn't account if the work
is still currently running.

CC: stable@vger.kernel.org
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 53bc7dc004fecf39e0ba70f2f8d120a1444315d3)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen-blkfront: don't add indirect pages to list when !feature_persistent
Bob Liu [Wed, 22 Jul 2015 06:40:09 +0000 (14:40 +0800)]
xen-blkfront: don't add indirect pages to list when !feature_persistent

We should consider info->feature_persistent when adding indirect page to list
info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.

When we are using persistent grants the indirect_pages list
should always be empty because blkfront has pre-allocated enough
persistent pages to fill all requests on the ring.

CC: stable@vger.kernel.org
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 7b0767502b5db11cb1f0daef2d01f6d71b1192dc)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen-blkfront: introduce blkfront_gather_backend_features()
Bob Liu [Wed, 22 Jul 2015 06:40:08 +0000 (14:40 +0800)]
xen-blkfront: introduce blkfront_gather_backend_features()

There is a bug when migrate from !feature-persistent host to feature-persistent
host, because domU still thinks new host/backend doesn't support persistent.
Dmesg like:
backed has not unmapped grant: 839
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 839

The fix is to recheck feature-persistent of new backend in blkif_recover().
See: https://lkml.org/lkml/2015/5/25/469

As Roger suggested, we can split the part of blkfront_connect that checks for
optional features, like persistent grants, indirect descriptors and
flush/barrier features to a separate function and call it from both
blkfront_connect and blkif_recover

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit d50babbe300eedf33ea5b00a12c5df3a05bd96c7)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agodrivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialising
Bob Liu [Fri, 19 Jun 2015 04:23:00 +0000 (00:23 -0400)]
drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialising

Patch 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b
"drivers: xen-blkback: delay pending_req allocation to connect_ring"
exposed an problem that Xen blkfront has. There is a race
with XenStored and the drivers such that we can see two:

vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 4.

state changes to XenbusStateInitWait ('2'). The end result is that
blkback_changed() receives two notify and calls twice setup_blkring().

While the backend driver may only get the first setup_blkring() which is
wrong and reads out-dated (or reads them as they are being updated
with new ring-ref values).

The end result is that the ring ends up being incorrectly set.

The other drivers in the tree have such checks already in.

Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a9b54bb95176cd27f952cd9647849022c4c998d6)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agoxen/block: add multi-page ring support
Bob Liu [Wed, 3 Jun 2015 05:40:03 +0000 (13:40 +0800)]
xen/block: add multi-page ring support

Extend xen/block to support multi-page ring, so that more requests can be
issued by using more than one pages as the request ring between blkfront
and backend.
As a result, the performance can get improved significantly.

We got some impressive improvements on our highend iscsi storage cluster
backend. If using 64 pages as the ring, the IOPS increased about 15 times
for the throughput testing and above doubled for the latency testing.

The reason was the limit on outstanding requests is 32 if use only one-page
ring, but in our case the iscsi lun was spread across about 100 physical
drives, 32 was really not enough to keep them busy.

Changes in v2:
 - Rebased to 4.0-rc6.
 - Document on how multi-page ring feature working to linux io/blkif.h.

Changes in v3:
 - Remove changes to linux io/blkif.h and follow the protocol defined
   in io/blkif.h of XEN tree.
 - Rebased to 4.1-rc3

Changes in v4:
 - Turn to use 'ring-page-order' and 'max-ring-page-order'.
 - A few comments from Roger.

Changes in v5:
 - Clarify with 4k granularity to comment
 - Address more comments from Roger

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 86839c56dee28c315a4c19b7bfee450ccd84cd25)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agodriver: xen-blkfront: move talk_to_blkback to a more suitable place
Bob Liu [Wed, 3 Jun 2015 05:40:02 +0000 (13:40 +0800)]
driver: xen-blkfront: move talk_to_blkback to a more suitable place

The major responsibility of talk_to_blkback() is allocate and initialize
the request ring and write the ring info to xenstore.
But this work should be done after backend entered 'XenbusStateInitWait' as
defined in the protocol file.
See xen/include/public/io/blkif.h in XEN git tree:
Front                                Back
=================================    =====================================
XenbusStateInitialising              XenbusStateInitialising
 o Query virtual device               o Query backend device identification
   properties.                          data.
 o Setup OS device instance.          o Open and validate backend device.
                                      o Publish backend features and
                                        transport parameters.
                                                     |
                                                     |
                                                     V
                                     XenbusStateInitWait

o Query backend features and
  transport parameters.
o Allocate and initialize the
  request ring.

There is no problem with this yet, but it is an violation of the design and
furthermore it would not allow frontend/backend to negotiate 'multi-page'
and 'multi-queue' features.

Changes in v2:
 - Re-write the commit message to be more clear.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 8ab0144a466320cc37c52e7866b5103c5bbd4e90)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9 years agodrivers: xen-blkback: delay pending_req allocation to connect_ring
Bob Liu [Wed, 3 Jun 2015 05:40:01 +0000 (13:40 +0800)]
drivers: xen-blkback: delay pending_req allocation to connect_ring

This is a pre-patch for multi-page ring feature.
In connect_ring, we can know exactly how many pages are used for the shared
ring, delay pending_req allocation here so that we won't waste too much memory.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
10 years agoxen/microcode: Use dummy microcode_ops for non initial domain guest
Zhenzhong Duan [Thu, 26 Jun 2014 09:18:23 +0000 (17:18 +0800)]
xen/microcode: Use dummy microcode_ops for non initial domain guest

Orabug: 19053626

Currently non initial domain guest use Intel or AMD specific ops.
This will also slow up the startup on heavily overcommited guests (say 256VCPUs
on 20 PCPU), as there are many read and write to x86 MSR registers which will
trap to xen during microcode update. Finally it will fail and report errors.

A dummy ops could fix that and also make udevd silent (bug18379824)
by augmenting the commit c18a317f6892536851e5852b6aaa4ef42cbc11a2
"xen/microcode: Only load under initial domain." which fell short of its
intended fix.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
(cherry picked from commit 62b84234f23c1020c690d162b7d8250042425e1e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
arch/x86/kernel/microcode_core.c

10 years agoxen/microcode: Fix compile warning.
Konrad Rzeszutek Wilk [Fri, 6 Jun 2014 18:09:14 +0000 (14:09 -0400)]
xen/microcode: Fix compile warning.

We get a bunch of them. Might as well fix it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 3b99e95e07cee4bbe3ba2b511fc2ac38ff7769b9)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agomicrocode_xen: Add support for AMD family >= 15h
Ian Campbell [Mon, 26 Nov 2012 09:41:02 +0000 (09:41 +0000)]
microcode_xen: Add support for AMD family >= 15h

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 8b080aa43b95719d8981ba06f357abb6f0ba9d52)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>