www.infradead.org Git - users/jedix/linux-maple.git/log

xen: switch extra memory accounting to use pfns

Instead of using physical addresses for accounting of extra memory
areas available for ballooning switch to pfns as this is much less
error prone regarding partial pages.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 626d7508664c4bc8e67f496da4387ecd0c410b8c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: limit memory to architectural maximum

When a pv-domain (including dom0) is started it tries to size it's
p2m list according to the maximum possible memory amount it ever can
achieve. Limit the initial maximum memory size to the architectural
limit of the hardware in order to avoid overflows during remapping
of memory.

This problem will occur when dom0 is started with an initial memory
size being a multiple of 1GB, but without specifying it's maximum
memory size. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit cb9e444b5aaa900bb4310da411315b6947c53e37)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: avoid another early crash of memory limited dom0

Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory occurring
only on some hardware.

The problem arises in case dom0 is started with initial memory and
maximum memory being the same. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen. If all
of this is true and the E820 map of the machine is sparse (some areas
are not covered) then the machine might crash early in the boot
process.

An example E820 map triggering the problem looks like this:

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cf7fafff] usable
[    0.000000] BIOS-e820: [mem 0x00000000cf7fb000-0x00000000cf95ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cf960000-0x00000000cfb62fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfb63000-0x00000000cfd14fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000cfd15000-0x00000000cfd61fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfd62000-0x00000000cfd6cfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000cfd6d000-0x00000000cfd6ffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfd70000-0x00000000cfd70fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000cfd71000-0x00000000cfea8fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfea9000-0x00000000cfeb9fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfeba000-0x00000000cfecafff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfecb000-0x00000000cfecbfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfecc000-0x00000000cfedbfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfedc000-0x00000000cfedcfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfedd000-0x00000000cfeddfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfede000-0x00000000cfee3fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cfee4000-0x00000000cfef6fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cfef7000-0x00000000cfefffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000020effffff] usable

In this case the area a0000-dffff isn't present in the map. This will
confuse the memory setup of the domain when remapping the memory from
such holes to populated areas.

To avoid the problem the accounting of to be remapped memory has to
count such holes in the E820 map as well.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ab24507cfae8d916814bb6c16f66e453184a29a5)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: avoid early crash of memory limited dom0

Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory.

The problem arises in case dom0 is started with initial memory and
maximum memory being the same and exactly a multiple of 1 GB. The
kernel must be configured without CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
for the problem to happen. In this case it will crash very early
during boot due to the virtual mapped p2m list not being large
enough to be able to remap any memory:

(XEN) Freed 304kB init memory.
mapping kernel into physical memory
about to get started...
(XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080229a93 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.5.2-pre  x86_64  debug=n Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff81d120cb>]
(XEN) RFLAGS: 0000000000000206   EM: 1 CONTEXT: pv guest (d0v0)
(XEN) rax: ffffffff81db2000   rbx: 000000004d000000   rcx: 0000000000000000
(XEN) rdx: 000000004d000000   rsi: 0000000000063000   rdi: 000000004d063000
(XEN) rbp: ffffffff81c03d78   rsp: ffffffff81c03d28   r8:  0000000000023000
(XEN) r9:  00000001040ff000   r10: 0000000000007ff0   r11: 0000000000000000
(XEN) r12: 0000000000063000   r13: 000000000004d000   r14: 0000000000000063
(XEN) r15: 0000000000000063   cr0: 0000000080050033   cr4: 00000000000006f0
(XEN) cr3: 0000000105c0f000   cr2: ffffc90000268000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81c03d28:
(XEN)   0000000000000000 0000000000000000 ffffffff81d120cb 000000010000e030
(XEN)   0000000000010006 ffffffff81c03d68 000000000000e02b ffffffffffffffff
(XEN)   0000000000000063 000000000004d063 ffffffff81c03de8 ffffffff81d130a7
(XEN)   ffffffff81c03de8 000000000004d000 00000001040ff000 0000000000105db1
(XEN)   00000001040ff001 000000000004d062 ffff8800092d6ff8 0000000002027000
(XEN)   ffff8800094d8340 ffff8800092d6ff8 00003ffffffff000 ffff8800092d7ff8
(XEN)   ffffffff81c03e48 ffffffff81d13c43 ffff8800094d8000 ffff8800094d9000
(XEN)   0000000000000000 ffff8800092d6000 00000000092d6000 000000004cfbf000
(XEN)   00000000092d6000 00000000052d5442 0000000000000000 0000000000000000
(XEN)   ffffffff81c03ed8 ffffffff81d185c1 0000000000000000 0000000000000000
(XEN)   ffffffff81c03e78 ffffffff810f8ca4 ffffffff81c03ed8 ffffffff8171a15d
(XEN)   0000000000000010 ffffffff81c03ee8 0000000000000000 0000000000000000
(XEN)   ffffffff81f0e402 ffffffffffffffff ffffffff81dae900 0000000000000000
(XEN)   0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81d0cf0f
(XEN)   0000000000000000 0000000000000000 0000000000000000 ffffffff81db82e0
(XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)   ffffffff81c03f38 ffffffff81d0c603 ffffffff81c03ff8 ffffffff81d11c86
(XEN)   0300000100000032 0000000000000005 0000000000000020 0000000000000000
(XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

This can be avoided by allocating aneough space for the p2m to cover
the maximum memory of dom0 plus the identity mapped holes required
for PCI space, BIOS etc.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit eafd72e016c69df511b14a98b61e439c58ad9c51)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

arm/xen: Remove helpers which are PV specific

ARM guests are always HVM. The current implementation is assuming a 1:1
mapping which is only true for DOM0 and may not be at all in the future.

Furthermore, all the helpers but arbitrary_virt_to_machine are used in
x86 specific code (or only compiled for).

The helper arbitrary_virt_to_machine is only used in PV specific code.
Therefore we should never call the function.

Add a BUG() in this helper and drop all the others.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 724afaea2020f3bd98891b535f3ce5d3935bcf63)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/x86: Don't try to set PCE bit in CR4

Since VPMU code emulates RDPMC instruction with RDMSR and because hypervisor
does not emulate it there is no reason to try setting CR4's PCE bit (and the
hypervisor will warn on seeing it set).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 3375d8284dfb7866f261ec008d15d30999ff273b)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/PMU: PMU emulation code

Add PMU emulation code that runs when we are processing a PMU interrupt.
This code will allow us not to trap to hypervisor on each MSR/LVTPC access
(of which there may be quite a few in the handler).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit bf6dfb154d935725c9a2005033ca33017b9df439)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/PMU: Intercept PMU-related MSR and APIC accesses

Provide interfaces for recognizing accesses to PMU-related MSRs and
LVTPC APIC and process these accesses in Xen PMU code.

(The interrupt handler performs XENPMU_flush right away in the beginning
since no PMU emulation is available. It will be added with a later patch).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6b08cd6328c58a2ae190c5ee03a2ffcab5ef828e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/PMU: Describe vendor-specific PMU registers

AMD and Intel PMU register initialization and helpers that determine
whether a register belongs to PMU.

This and some of subsequent PMU emulation code is somewhat similar to
Xen's PMU implementation.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e27b72df01109c689062caeba1defa013b759e0e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/PMU: Initialization code for Xen PMU

Map shared data structure that will hold CPU registers, VPMU context,
V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills
this information in its handler and passes it to the guest for further
processing.

Set up PMU VIRQ.

Now that perf infrastructure will assume that PMU is available on a PV
guest we need to be careful and make sure that accesses via RDPMC
instruction don't cause fatal traps by the hypervisor. Provide a nop
RDPMC handler.

For the same reason avoid issuing a warning on a write to APIC's LVTPC.

Both of these will be made functional in later patches.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 65d0cf0be79feebeb19e7626fd3ed41ae73f642d)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/PMU: Sysfs interface for setting Xen PMU mode

Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 5f141548824cebbff2e838ff401c34e667797467)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: xensyms support

Export Xen symbols to dom0 via /proc/xen/xensyms (similar to
/proc/kallsyms).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit a11f4f0a4e18b4bdc7d5e36438711e038b7a1f74)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: remove no longer needed p2m.h

Cleanup by removing arch/x86/xen/p2m.h as it isn't needed any more.

Most definitions in this file are used in p2m.c only. Move those into
p2m.c.

set_phys_range_identity() is already declared in
arch/x86/include/asm/xen/page.h, add __init annotation there.

MAX_REMAP_RANGES isn't used at all, just delete it.

The only define left is P2M_PER_PAGE which is moved to page.h as well.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit cb3eb850137cd43fc3e25d2062525f5ba5fd884a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: allow more than 512 GB of RAM for 64 bit pv-domains

64 bit pv-domains under Xen are limited to 512 GB of RAM today. The
main reason has been the 3 level p2m tree, which was replaced by the
virtual mapped linear p2m list. Parallel to the p2m list which is
being used by the kernel itself there is a 3 level mfn tree for usage
by the Xen tools and eventually for crash dump analysis. For this tree
the linear p2m list can serve as a replacement, too. As the kernel
can't know whether the tools are capable of dealing with the p2m list
instead of the mfn tree, the limit of 512 GB can't be dropped in all
cases.

This patch replaces the hard limit by a kernel parameter which tells
the kernel to obey the 512 GB limit or not. The default is selected by
a configuration parameter which specifies whether the 512 GB limit
should be active per default for domUs (domain save/restore/migration
and crash dump analysis are affected).

Memory above the domain limit is returned to the hypervisor instead of
being identity mapped, which was wrong anyway.

The kernel configuration parameter to specify the maximum size of a
domain can be deleted, as it is not relevant any more.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c70727a5bc18a5a233fddc6056d1de9144d7a293)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: move p2m list if conflicting with e820 map

Check whether the hypervisor supplied p2m list is placed at a location
which is conflicting with the target E820 map. If this is the case
relocate it to a new area unused up to now and compliant to the E820
map.

As the p2m list might by huge (up to several GB) and is required to be
mapped virtually, set up a temporary mapping for the copied list.

For pvh domains just delete the p2m related information from start
info instead of reserving the p2m memory, as we don't need it at all.

For 32 bit kernels adjust the memblock_reserve() parameters in order
to cover the page tables only. This requires to memblock_reserve() the
start_info page on it's own.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 70e61199559a09c62714694cd5ac3c3640c41552)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: add explicit memblock_reserve() calls for special pages

Some special pages containing interfaces to xen are being reserved
implicitly only today. The memblock_reserve() call to reserve them is
meant to reserve the p2m list supplied by xen. It is just reserving
not only the p2m list itself, but some more pages up to the start of
the xen built page tables.

To be able to move the p2m list to another pfn range, which is needed
for support of huge RAM, this memblock_reserve() must be split up to
cover all affected reserved pages explicitly.

The affected pages are:
- start_info page
- xenstore ring (might be missing, mfn is 0 in this case)
- console ring (not for initial domain)

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6c2681c863b24360098d1ba60f2af060a13a0561)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

mm: provide early_memremap_ro to establish read-only mapping

During early boot as Xen pv domain the kernel needs to map some page
tables supplied by the hypervisor read only. This is needed to be
able to relocate some data structures conflicting with the physical
memory map especially on systems with huge RAM (above 512GB).

Provide the function early_memremap_ro() to provide this read only
mapping.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 2592dbbbf4c67501c2bd2dcf89c2b8924d592a9f)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: check for initrd conflicting with e820 map

Check whether the initrd is placed at a location which is conflicting
with the target E820 map. If this is the case relocate it to a new
area unused up to now and compliant to the E820 map.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 4b9c15377f96e241be347fd3bbeeff74fbad0b44)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: check pre-allocated page tables for conflict with memory map

Check whether the page tables built by the domain builder are at
memory addresses which are in conflict with the target memory map.
If this is the case just panic instead of running into problems
later.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 04414baab5ba862b10bde837c4773ffdbb78f0e0)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: check for kernel memory conflicting with memory layout

Checks whether the pre-allocated memory of the loaded kernel is in
conflict with the target memory map. If this is the case, just panic
instead of run into problems later, as there is nothing we can do
to repair this situation.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 808fdb71936c41d46245f0e3aa6ec889cba70d97)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: find unused contiguous memory area

For being able to relocate pre-allocated data areas like initrd or
p2m list it is mandatory to find a contiguous memory area which is
not yet in use and doesn't conflict with the memory map we want to
be in effect.

In case such an area is found reserve it at once as this will be
required to be done in any case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 9ddac5b724a9465e27f25a0aa943e92c8341a85b)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: check memory area against e820 map

Provide a service routine to check a physical memory area against the
E820 map. The routine will return false if the complete area is RAM
according to the E820 map and true otherwise.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e612b4a7db4ae1dd8c2bbe171e10c21723de95b2)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: split counting of extra memory pages from remapping

Memory pages in the initial memory setup done by the Xen hypervisor
conflicting with the target E820 map are remapped. In order to do this
those pages are counted and remapped in xen_set_identity_and_remap().

Split the counting from the remapping operation to be able to setup
the needed memory sizes in time but doing the remap operation at a
later time. This enables us to simplify the interface to
xen_set_identity_and_remap() as the number of remapped and released
pages is no longer needed here.

Finally move the remapping further down to prepare relocating
conflicting memory contents before the memory might be clobbered by
xen_set_identity_and_remap(). This requires to not destroy the Xen
E820 map when the one for the system is being constructed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 5097cdf6cef15439f971df54f9abcf143d7ca698)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: move static e820 map to global scope

Instead of using a function local static e820 map in xen_memory_setup()
and calling various functions in the same source with the map as a
parameter use a map directly accessible by all functions in the source.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 69632ecfcd03b12202ed62dfa0aabac83904f8ac)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: eliminate scalability issues from initial mapping setup

Direct Xen to place the initial P->M table outside of the initial
mapping, as otherwise the 1G (implementation) / 2G (theoretical)
restriction on the size of the initial mapping limits the amount
of memory a domain can be handed initially.

As the initial P->M table is copied rather early during boot to
domain private memory and it's initial virtual mapping is dropped,
the easiest way to avoid virtual address conflicts with other
addresses in the kernel is to use a user address area for the
virtual address of the initial P->M table. This allows us to just
throw away the page tables of the initial mapping after the copy
without having to care about address invalidation.

It should be noted that this patch won't enable a pv-domain to USE
more than 512 GB of RAM. It just enables it to be started with a
P->M table covering more memory. This is especially important for
being able to boot a Dom0 on a system with more than 512 GB memory.

Signed-off-by: Juergen Gross <jgross@suse.com>
Based-on-patch-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 8f5b0c63987207fd5c3c1f89c9eb6cb95b30386e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: don't build mfn tree if tools don't need it

In case the Xen tools indicate they don't need the p2m 3 level tree
as they support the virtual mapped linear p2m list, just omit building
the tree.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit d51e8b3e85972dee10be7943b0b0106742b1e847)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: save linear p2m list address in shared info structure

The virtual address of the linear p2m list should be stored in the
shared info structure read by the Xen tools to be able to support
64 bit pv-domains larger than 512 GB. Additionally the linear p2m
list interface includes a generation count which is changed prior
to and after each mapping change of the p2m list. Reading the
generation count the Xen tools can detect changes of the mappings
and re-read the p2m list eventually.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 4b9c9a11803eaa73b3223da9fcaea39b2f919d80)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: sync with xen headers

Use the newest headers from the xen tree to get some new structure
layouts.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 17fb46b1190b677a37cdd636e2aa30052109f51b)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

arm/xen: Drop the definition of xen_pci_platform_unplug

The commit 6f6c15ef912465b3aaafe709f39bd6026a8b3e72 "xen/pvhvm: Remove
the xen_platform_pci int." makes the x86 version of
xen_pci_platform_unplug static.

Therefore we don't need anymore to define a dummy xen_pci_platform_unplug
for ARM.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
(cherry picked from commit 7ed208ef4ef9dbd03cda8a5b5a85cc78f79ef213)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/events: Support event channel rebind on ARM

Currently, the event channel rebind code is gated with the presence of
the vector callback.

The virtual interrupt controller on ARM has the concept of per-CPU
interrupt (PPI) which allow us to support per-VCPU event channel.
Therefore there is no need of vector callback for ARM.

Xen is already using a free PPI to notify the guest VCPU of an event.
Furthermore, the xen code initialization in Linux (see
arch/arm/xen/enlighten.c) is requesting correctly a per-CPU IRQ.

Introduce new helper xen_support_evtchn_rebind to allow architecture
decide whether rebind an event is support or not. It will always return
true on ARM and keep the same behavior on x86.

This is also allow us to drop the usage of xen_have_vector_callback
entirely in the ARM code.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 4a5b69464e51f4a8dd432e8c2a1468630df1a53c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-blkfront: convert to blk-mq APIs

Note: This patch is based on original work of Arianna's internship for
GNOME's Outreach Program for Women.

Only one hardware queue is used now, so there is no significant
performance change

The legacy non-mq code is deleted completely which is the same as other
drivers like virtio, mtip, and nvme.

Also dropped one unnecessary holding of info->io_lock when calling
blk_mq_stop_hw_queues().

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@fb.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 907c3eb18e0bd86ca12a9de80befe8e3647bac3e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/preempt: use need_resched() instead of should_resched()

This code is used only when CONFIG_PREEMPT=n and only in non-atomic
context: xen_in_preemptible_hcall is set only in
privcmd_ioctl_hypercall(). Thus preempt_count is zero and
should_resched() is equal to need_resched().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit a7da51ae10032a507ddeae6a490916eadbd1e10a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

x86/xen: fix non-ANSI declaration of xen_has_pv_devices()

xen_has_pv_devices() has no parameters, so use the normal void
parameter convention to make it match the prototype in the header file
include/xen/platform_pci.h.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 772f95e3b9460c64fb99b134022855cbce75b9a0)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-netfront: update num_queues to real created

Sometimes xennet_create_queues() may failed to created all requested
queues, we need to update num_queues to real created to avoid NULL
pointer dereference.

Orabug: 22069665

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David S. Miller <davem@davemloft.net>

xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)

Orabug: 21935345

xen-blkfront will crash if the check to talk_to_blkback()
in blkback_changed()(XenbusStateInitWait) returns an error.
The driver data is freed and info is set to NULL. Later during
the close process via talk_to_blkback's call to xenbus_dev_fatal()
the null pointer is passed to and dereference in blkfront_closing.

Signed-off-by: Cathy Avery <cathy.avery@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Joe Jin <joe.jin@oracle.com>

xen-netfront: respect user provided max_queues

Originally that parameter was always reset to num_online_cpus during
module initialisation, which renders it useless.

The fix is to only set max_queues to num_online_cpus when user has not
provided a value.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 32a844056fd43dda647e1c3c6b9983bdfa04d17d)
Signed-off-by: Annie Li <annie.li@oracle.com>

net/xen-netfront: only napi_synchronize() if running

If an interface isn't running napi_synchronize() will hang forever.

[  392.248403] rmmod           R  running task        0   359    343 0x00000000
[  392.257671]  ffff88003760fc88 ffff880037193b40 ffff880037193160 ffff88003760fc88
[  392.267644]  ffff880037610000 ffff88003760fcd8 0000000100014c22 ffffffff81f75c40
[  392.277524]  0000000000bc7010 ffff88003760fca8 ffffffff81796927 ffffffff81f75c40
[  392.287323] Call Trace:
[  392.291599]  [<ffffffff81796927>] schedule+0x37/0x90
[  392.298553]  [<ffffffff8179985b>] schedule_timeout+0x14b/0x280
[  392.306421]  [<ffffffff810f91b9>] ? irq_free_descs+0x69/0x80
[  392.314006]  [<ffffffff811084d0>] ? internal_add_timer+0xb0/0xb0
[  392.322125]  [<ffffffff81109d07>] msleep+0x37/0x50
[  392.329037]  [<ffffffffa00ec79a>] xennet_disconnect_backend.isra.24+0xda/0x390 [xen_netfront]
[  392.339658]  [<ffffffffa00ecadc>] xennet_remove+0x2c/0x80 [xen_netfront]
[  392.348516]  [<ffffffff81481c69>] xenbus_dev_remove+0x59/0xc0
[  392.356257]  [<ffffffff814e7217>] __device_release_driver+0x87/0x120
[  392.364645]  [<ffffffff814e7cf8>] driver_detach+0xb8/0xc0
[  392.371989]  [<ffffffff814e6e69>] bus_remove_driver+0x59/0xe0
[  392.379883]  [<ffffffff814e84f0>] driver_unregister+0x30/0x70
[  392.387495]  [<ffffffff814814b2>] xenbus_unregister_driver+0x12/0x20
[  392.395908]  [<ffffffffa00ed89b>] netif_exit+0x10/0x775 [xen_netfront]
[  392.404877]  [<ffffffff81124e08>] SyS_delete_module+0x1d8/0x230
[  392.412804]  [<ffffffff8179a8ee>] system_call_fastpath+0x12/0x71

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 274b045509175db0405c784be85e8cce116e6f7d)
Signed-off-by: Annie Li <annie.li@oracle.com>

net/xen-netfront: only clean up queues if present

If you simply load and unload the module without starting the interfaces,
the queues are never created and you get a bad pointer dereference.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9a873c71e91cabf4c10fd9bbd8358c22deaf6c9e)
Signed-off-by: Annie Li <annie.li@oracle.com>

xen-netback: respect user provided max_queues

Originally that parameter was always reset to num_online_cpus during
module initialisation, which renders it useless.

The fix is to only set max_queues to num_online_cpus when user has not
provided a value.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Reported-by: Johnny Strom <johnny.strom@linuxsolutions.fi>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4c82ac3c37363e8c4ded6a5fe1ec5fa756b34df3)
Signed-off-by: Annie Li <annie.li@oracle.com>

xen-netback: require fewer guest Rx slots when not using GSO

Commit f48da8b14d04ca87ffcffe68829afd45f926ec6a (xen-netback: fix
unlimited guest Rx internal queue and carrier flapping) introduced a
regression.

The PV frontend in IPXE only places 4 requests on the guest Rx ring.
Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
not receive any packets.

a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
   for the largest possible packet.  Calculate the required slots
   based on the maximum GSO size or the MTU.

   This calculation of the number of required slots relies on
   1650d5455bd2 (xen-netback: always fully coalesce guest Rx packets)
   which present in 4.0-rc1 and later.

b) Reduce the Rx stall detection to checking for at least one
   available Rx request.  This is fine since we're predominately
   concerned with detecting interfaces which are down and thus have
   zero available Rx requests.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1d5d48523900a4b0f25d6b52f1a93c84bd671186)
Signed-off-by: Annie Li <annie.li@oracle.com>

xen-netback: add support for multicast control

Xen's PV network protocol includes messages to add/remove ethernet
multicast addresses to/from a filter list in the backend. This allows
the frontend to request the backend only forward multicast packets
which are of interest thus preventing unnecessary noise on the shared
ring.

The canonical netif header in git://xenbits.xen.org/xen.git specifies
the message format (two more XEN_NETIF_EXTRA_TYPEs) so the minimal
necessary changes have been pulled into include/xen/interface/io/netif.h.

To prevent the frontend from extending the multicast filter list
arbitrarily a limit (XEN_NETBK_MCAST_MAX) has been set to 64 entries.
This limit is not specified by the protocol and so may change in future.
If the limit is reached then the next XEN_NETIF_EXTRA_TYPE_MCAST_ADD
sent by the frontend will be failed with NETIF_RSP_ERROR.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 210c34dcd8d912dcc740f1f17625a7293af5cb56)
Signed-off-by: Annie Li <annie.li@oracle.com>

xen/netback: Wake dealloc thread after completing zerocopy work

Waking the dealloc thread before decrementing inflight_packets is racy
because it means the thread may go to sleep before inflight_packets is
decremented. If kthread_stop() has already been called, the dealloc
thread may wait forever with nothing to wake it. Instead, wake the
thread only after decrementing inflight_packets.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 57b229063ae6dc65036209018dc7f4290cc026bb)
Signed-off-by: Annie Li <annie.li@oracle.com>

xen-netback: Allocate fraglist early to avoid complex rollback

Determine if a fraglist is needed in the tx path, and allocate it if
necessary before setting up the copy and map operations.
Otherwise, undoing the copy and map operations is tricky.

This fixes a use-after-free: if allocating the fraglist failed, the copy
and map operations that had been set up were still executed, writing
over the data area of a freed skb.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2475b22526d70234ecfe4a1ff88aed69badefba9)
Signed-off-by: Annie Li <annie.li@oracle.com>

net/xen-netback: off by one in BUG_ON() condition

The > should be >=. I also added spaces around the '-' operations so
the code is a little more consistent and matches the condition better.

Fixes: f53c3fe8dad7 ('xen-netback: Introduce TX grant mapping')
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 50c2e4dd6749725338621fff456b26d3a592259f)
Signed-off-by: Annie Li <annie.li@oracle.com>

xen-netback: remove duplicated function definition

There are two duplicated xenvif_zerocopy_callback() definitions.
Remove one of them.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6ab13b27699e5a71cca20d301c3c424653bd0841)
Signed-off-by: Annie Li <annie.li@oracle.com>

net/xen-netback: Don't mix hexa and decimal with 0x in the printf format

Append 0x to all %x in order to avoid while reading when there is other
decimal value in the log.

Also replace some of the hexadecimal print to decimal to uniformize the
format with netfront.

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 68946159da1b0b6791c5990242940950b9383cfc)
Signed-off-by: Annie Li <annie.li@oracle.com>

net/xen-netback: Remove unused code in xenvif_rx_action

The variables old_req_cons and ring_slots_used are assigned but never
used since commit 1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b "xen-netback:
always fully coalesce guest Rx packets".

Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 44f0764cfec9c607d43cad6a51e8592c7b2b9b84)
Signed-off-by: Annie Li <annie.li@oracle.com>

Merge branch 'uek-4.1/xen' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen

* 'uek-4.1/xen' of git://ca-git.us.oracle.com/linux-konrad-public: (31 commits)
  xen-netfront: Remove the meaningless code
  net/xen-netfront: Correct printf format in xennet_get_responses
  xen-netfront: Use setup_timer
  xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
  Revert "xen/events/fifo: Handle linked events when closing a port"
  x86/xen: build "Xen PV" APIC driver for domU as well
  xen/events/fifo: Handle linked events when closing a port
  xen: release lock occasionally during ballooning
  xen/gntdevt: Fix race condition in gntdev_release()
  block/xen-blkback: s/nr_pages/nr_segs/
  block/xen-blkfront: Remove invalid comment
  arm/xen: Drop duplicate define mfn_to_virt
  xen/grant-table: Remove unused macro SPP
  xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
  xen: Include xen/page.h rather than asm/xen/page.h
  kconfig: add xenconfig defconfig helper
  kconfig: clarify kvmconfig is for kvm
  xen/pcifront: Remove usage of struct timeval
  xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
  hvc_xen: avoid uninitialized variable warning
  ...

Merge branch 'backports-for-konrad' of git://ca-git.us.oracle.com/linux-eufimtse-public into uek-4.1/xen

* 'backports-for-konrad' of git://ca-git.us.oracle.com/linux-eufimtse-public: (31 commits)
  xen-netfront: Remove the meaningless code
  net/xen-netfront: Correct printf format in xennet_get_responses
  xen-netfront: Use setup_timer
  xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
  Revert "xen/events/fifo: Handle linked events when closing a port"
  x86/xen: build "Xen PV" APIC driver for domU as well
  xen/events/fifo: Handle linked events when closing a port
  xen: release lock occasionally during ballooning
  xen/gntdevt: Fix race condition in gntdev_release()
  block/xen-blkback: s/nr_pages/nr_segs/
  block/xen-blkfront: Remove invalid comment
  arm/xen: Drop duplicate define mfn_to_virt
  xen/grant-table: Remove unused macro SPP
  xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
  xen: Include xen/page.h rather than asm/xen/page.h
  kconfig: add xenconfig defconfig helper
  kconfig: clarify kvmconfig is for kvm
  xen/pcifront: Remove usage of struct timeval
  xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
  hvc_xen: avoid uninitialized variable warning
  ...

Backports from Linux 4.1

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-netfront: Remove the meaningless code

The function netif_set_real_num_tx_queues() will return -EINVAL if
the second parameter < 1, so call this function with the second
parameter set to 0 is meaningless.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 905726c1c5a3ca620ba7d73c78eddfb91de5ce28)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

net/xen-netfront: Correct printf format in xennet_get_responses

rx->status is an int16_t, print it using %d rather than %u in order to
have a meaningful value when the field is negative.

Also use %u rather than %x for rx->offset.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6c10127d91bdc80e02938085d03c232ae3118ad5)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen-netfront: Use setup_timer

Use the timer API function setup_timer instead of structure field
assignments to initialize a timer.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@change@
expression e, func, da;
@@

-init_timer (&e);
+setup_timer (&e, func, da);
-e.data = da;
-e.function = func;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 493be55ac3d81f9c32832237288eb397a9993d5d)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/xenbus: Don't leak memory when unmapping the ring on HVM backend

The commit ccc9d90a9a8b5c4ad7e9708ec41f75ff9e98d61d "xenbus_client:
Extend interface to support multi-page ring" removes the call to
free_xenballooned_pages() in xenbus_unmap_ring_vfree_hvm(), leaking a
page for every shared ring.

Only with backends running in HVM domains were affected.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c22fe519e7e2b94ad173e0ea3b89c1a7d8be8d00)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Revert "xen/events/fifo: Handle linked events when closing a port"

This reverts commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8.

This was causing a WARNING whenever a PIRQ was closed since
shutdown_pirq() is called with irqs disabled.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org>
(cherry picked from commit ad6cd7bafcd2c812ba4200d5938e07304f1e2fcd)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

x86/xen: build "Xen PV" APIC driver for domU as well

It turns out that a PV domU also requires the "Xen PV" APIC
driver. Otherwise, the flat driver is used and we get stuck in busy
loops that never exit, such as in this stack trace:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56              while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
#0  __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
#1  __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
#2  apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
#3  0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
#4  0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
#5  0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
#6  0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
    at kernel/printk/printk.c:1778
#7  0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
#8  0xffffffffc00013ea in ?? ()
#9  0x0000000000000000 in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit fc5fee86bdd3d720e2d1d324e4fae0c35845fa63)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/events/fifo: Handle linked events when closing a port

An event channel bound to a CPU that was offlined may still be linked
on that CPU's queue. If this event channel is closed and reused,
subsequent events will be lost because the event channel is never
unlinked and thus cannot be linked onto the correct queue.

When a channel is closed and the event is still linked into a queue,
ensure that it is unlinked before completing.

If the CPU to which the event channel bound is online, spin until the
event is handled by that CPU. If that CPU is offline, it can't handle
the event, so clear the event queue during the close, dropping the
events.

This fixes the missing interrupts (and subsequent disk stalls etc.)
when offlining a CPU.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen: release lock occasionally during ballooning

When dom0 is being ballooned balloon_process() will hold the balloon
mutex until it is finished. This will block e.g. creation of new
domains as the device backends for the new domain need some
autoballooned pages for the ring buffers.

Avoid this by releasing the balloon mutex from time to time during
ballooning. Adjust the comment above balloon_process() regarding
multiple instances of balloon_process().

Instead of open coding it, just use cond_resched().

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 929423fa83e5b75e94101b280738b9a5a376a0e1)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/gntdevt: Fix race condition in gntdev_release()

While gntdev_release() is called the MMU notifier is still registered
and can traverse priv->maps list even if no pages are mapped (which is
the case -- gntdev_release() is called after all). But
gntdev_release() will clear that list, so make sure that only one of
those things happens at the same time.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 30b03d05e07467b8c6ec683ea96b5bffcbcd3931)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

block/xen-blkback: s/nr_pages/nr_segs/

Make the code less confusing to read now that Linux may not have the
same page size as Xen.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6684fa1cdb1ebe804e9707f389255d461b2e95b0)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

block/xen-blkfront: Remove invalid comment

Since commit b764915 "xen-blkfront: use a different scatterlist for each
request", biovec has been replaced by scatterlist when copying back the
data during a completion request.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ee4b7179fc9bf19fd22f9d17757a86582c40229e)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

arm/xen: Drop duplicate define mfn_to_virt

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 90d73c4f43630ca45398cee7f32f0fe51271b2ce)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/grant-table: Remove unused macro SPP

SPP was used by the grant table v2 code which has been removed in
commit 438b33c7145ca8a5131a30c36d8f59bce119a19a "xen/grant-table:
remove support for V2 tables".

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 548f7c94759ac58d4744ef2663e2a66a106e21c5)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring

virt_to_mfn should take a void* rather an unsigned long. While it
doesn't really matter now, it would throw a compiler warning later when
virt_to_mfn will enforce the type.

At the same time, avoid to compute new virtual address every time in the
loop and directly increment the parameter as we don't use it later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c9fd55eb6625ead6a1207e7da38026ff47c5198b)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen: Include xen/page.h rather than asm/xen/page.h

Using xen/page.h will be necessary later for using common xen page
helpers.

As xen/page.h already include asm/xen/page.h, always use the later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit a9fd60e2683fb80f5b26a7d686aebe3327a63e70)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

kconfig: add xenconfig defconfig helper

This lets you build a kernel which can support xen dom0
or xen guests on i386, x86-64 and arm64 by just using:

make xenconfig

You can start from an allnoconfig and then switch to xenconfig.
This also splits out the options which are available currently
to be built with x86 and 'make ARCH=arm64' under a shared config.

Technically xen supports a dom0 kernel and also a guest
kernel configuration but upon review with the xen team
since we don't have many dom0 options its best to just
combine these two into one.

A few generic notes: we enable both of these:

CONFIG_INET=y
CONFIG_BINFMT_ELF=y

although technically not required given you likely will
end up with a pretty useless system otherwise.

A few architectural differences worth noting:

$ make allnoconfig; make xenconfig > /dev/null ; \
grep XEN .config > 64-bit-config
$ make ARCH=i386 allnoconfig; make ARCH=i386 xenconfig > /dev/null; \
grep XEN .config > 32-bit-config
$ make ARCH=arm64 allnoconfig; make ARCH=arm64 xenconfig > /dev/null; \
grep XEN .config > arm64-config

Since the options are already split up with a generic config and
architecture specific configs you anything on the x86 configs
are known to only work right now on x86. For instance arm64 doesn't
support MEMORY_HOTPLUG yet as such although we try to enabe it
generically arm64 doesn't have it yet, so we leave the xen
specific kconfig option XEN_BALLOON_MEMORY_HOTPLUG on x86's config
file to set expecations correctly.

Then on x86 we have differences between i386 and x86-64. The difference
between 64-bit-config and 32-bit-config is you don't get XEN_MCE_LOG as
this is only supported on 64-bit. You also do not get on i386
XEN_BALLOON_MEMORY_HOTPLUG, there does not seem to be any technical
reasons to not allow this but I gave up after a few attempts.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: penberg@kernel.org
Cc: levinsasha928@gmail.com
Cc: mtosatti@redhat.com
Cc: fengguang.wu@intel.com
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Michal Marek <mmarek@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6c6685055a285de53f18fbf6611687291b57ccd6)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

kconfig: clarify kvmconfig is for kvm

We'll be adding options for xen as well.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: penberg@kernel.org
Cc: levinsasha928@gmail.com
Cc: mtosatti@redhat.com
Cc: fengguang.wu@intel.com
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 9bcd776d299e8adb8ed37be0453472aa3cc2b7db)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/pcifront: Remove usage of struct timeval

struct timeval uses a 32-bit field for representing seconds, which
will overflow in the year 2038 and beyond. Replace struct timeval with
64-bit ktime_t which is 2038 safe. This is part of a larger effort to
remove instances of 32-bit timekeeping variables (timeval, time_t and
timespec) from the kernel.

Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e1d5bbcdc7ca08d8731f5d780f0de342a768d96a)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 01b720f3295b6c1b2dcfc1affd0fedc6f5d28c1e)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

hvc_xen: avoid uninitialized variable warning

Older compilers don't recognize that "v" can't be used uninitialized;
other code using hvm_get_parameter() zeros the value too, so follow
suit here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c4ace5daf4ff726402b13f1ababf2ad0e0ceec65)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xenbus: avoid uninitialized variable warning

Older compilers don't recognize that "v" can't be used uninitialized;
other code using hvm_get_parameter() zeros the value too, so follow
suit here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 76ea3cb428c5dc4fd5a415a7651026caf198fb3d)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/arm: allow console=hvc0 to be omitted for guests

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

This patch registers hvc0 as the preferred console if no console
has been specified explicitly on the kernel command line.

The purpose is to allow platform agnostic kernels and boot images
(such as distro installers) to boot in a Xen/ARM domU without the
need to modify the command line by hand.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit f1dddd118c555508ce383b7262f4e6440927bdf4)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

arm,arm64/xen: move Xen initialization earlier

Currently, Xen is initialized/discovered in an initcall. This doesn't
allow us to support earlyprintk or choosing the preferred console when
running on Xen.

The current function xen_guest_init is now split in 2 parts:
    - xen_early_init: Check if there is a Xen node in the device tree
    and setup domain type
    - xen_guest_init: Retrieve the information from the device node and
    initialize Xen (grant table, shared page...)

The former is called in setup_arch, while the latter is an initcall.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Will Deacon <will.deacon@arm.com>
(cherry picked from commit 5882bfef6327093bff63569be19795170ff71e5f)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

arm/xen: Correctly check if the event channel interrupt is present

The function irq_of_parse_and_map returns 0 when the IRQ is not found.

Futhermore, move the check before notifying the user that we are running on
Xen.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 81e863c3a28e69fd60411bde9f779b0f8ad0212a)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen-blkback: replace work_pending with work_busy in purge_persistent_gnt()

The BUG_ON() in purge_persistent_gnt() will be triggered when previous purge
work haven't finished.

There is a work_pending() before this BUG_ON, but it doesn't account if the work
is still currently running.

CC: stable@vger.kernel.org
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 53bc7dc004fecf39e0ba70f2f8d120a1444315d3)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen-blkfront: don't add indirect pages to list when !feature_persistent

We should consider info->feature_persistent when adding indirect page to list
info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.

When we are using persistent grants the indirect_pages list
should always be empty because blkfront has pre-allocated enough
persistent pages to fill all requests on the ring.

CC: stable@vger.kernel.org
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 7b0767502b5db11cb1f0daef2d01f6d71b1192dc)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen-blkfront: introduce blkfront_gather_backend_features()

There is a bug when migrate from !feature-persistent host to feature-persistent
host, because domU still thinks new host/backend doesn't support persistent.
Dmesg like:
backed has not unmapped grant: 839
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 839

The fix is to recheck feature-persistent of new backend in blkif_recover().
See: https://lkml.org/lkml/2015/5/25/469

As Roger suggested, we can split the part of blkfront_connect that checks for
optional features, like persistent grants, indirect descriptors and
flush/barrier features to a separate function and call it from both
blkfront_connect and blkif_recover

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit d50babbe300eedf33ea5b00a12c5df3a05bd96c7)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialising

Patch 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b
"drivers: xen-blkback: delay pending_req allocation to connect_ring"
exposed an problem that Xen blkfront has. There is a race
with XenStored and the drivers such that we can see two:

vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 4.

state changes to XenbusStateInitWait ('2'). The end result is that
blkback_changed() receives two notify and calls twice setup_blkring().

While the backend driver may only get the first setup_blkring() which is
wrong and reads out-dated (or reads them as they are being updated
with new ring-ref values).

The end result is that the ring ends up being incorrectly set.

The other drivers in the tree have such checks already in.

Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a9b54bb95176cd27f952cd9647849022c4c998d6)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/block: add multi-page ring support

Extend xen/block to support multi-page ring, so that more requests can be
issued by using more than one pages as the request ring between blkfront
and backend.
As a result, the performance can get improved significantly.

We got some impressive improvements on our highend iscsi storage cluster
backend. If using 64 pages as the ring, the IOPS increased about 15 times
for the throughput testing and above doubled for the latency testing.

The reason was the limit on outstanding requests is 32 if use only one-page
ring, but in our case the iscsi lun was spread across about 100 physical
drives, 32 was really not enough to keep them busy.

Changes in v2:
- Rebased to 4.0-rc6.
- Document on how multi-page ring feature working to linux io/blkif.h.

Changes in v3:
- Remove changes to linux io/blkif.h and follow the protocol defined
in io/blkif.h of XEN tree.
- Rebased to 4.1-rc3

Changes in v4:
- Turn to use 'ring-page-order' and 'max-ring-page-order'.
- A few comments from Roger.

Changes in v5:
- Clarify with 4k granularity to comment
- Address more comments from Roger

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 86839c56dee28c315a4c19b7bfee450ccd84cd25)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

driver: xen-blkfront: move talk_to_blkback to a more suitable place

The major responsibility of talk_to_blkback() is allocate and initialize
the request ring and write the ring info to xenstore.
But this work should be done after backend entered 'XenbusStateInitWait' as
defined in the protocol file.
See xen/include/public/io/blkif.h in XEN git tree:
Front                                Back
=================================    =====================================
XenbusStateInitialising              XenbusStateInitialising
o Query virtual device               o Query backend device identification
   properties.                          data.
o Setup OS device instance.          o Open and validate backend device.
                                      o Publish backend features and
                                        transport parameters.
                                                     |
                                                     |
                                                     V
                                     XenbusStateInitWait

o Query backend features and
  transport parameters.
o Allocate and initialize the
  request ring.

There is no problem with this yet, but it is an violation of the design and
furthermore it would not allow frontend/backend to negotiate 'multi-page'
and 'multi-queue' features.

Changes in v2:
- Re-write the commit message to be more clear.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 8ab0144a466320cc37c52e7866b5103c5bbd4e90)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

drivers: xen-blkback: delay pending_req allocation to connect_ring

This is a pre-patch for multi-page ring feature.
In connect_ring, we can know exactly how many pages are used for the shared
ring, delay pending_req allocation here so that we won't waste too much memory.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/microcode: Use dummy microcode_ops for non initial domain guest

Orabug: 19053626

Currently non initial domain guest use Intel or AMD specific ops.
This will also slow up the startup on heavily overcommited guests (say 256VCPUs
on 20 PCPU), as there are many read and write to x86 MSR registers which will
trap to xen during microcode update. Finally it will fail and report errors.

A dummy ops could fix that and also make udevd silent (bug18379824)
by augmenting the commit c18a317f6892536851e5852b6aaa4ef42cbc11a2
"xen/microcode: Only load under initial domain." which fell short of its
intended fix.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
(cherry picked from commit 62b84234f23c1020c690d162b7d8250042425e1e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
arch/x86/kernel/microcode_core.c

xen/microcode: Fix compile warning.

We get a bunch of them. Might as well fix it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 3b99e95e07cee4bbe3ba2b511fc2ac38ff7769b9)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

microcode_xen: Add support for AMD family >= 15h

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 8b080aa43b95719d8981ba06f357abb6f0ba9d52)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

x86/microcode: check proper return code.

After pulling in this change from your tree, I found the following bug,
when checking an enum value, which should be considered before inclusion:

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 33a4651e09e0d6fb6e9c1293810d8a66b734840a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: add CPU microcode update driver

Xen does all the hard work for us, including choosing the right update
method for this cpu type and actually doing it for all cpus. We just
need to supply it with the firmware blob.

Because Xen updates all CPUs (and the kernel's virtual cpu numbers have
no fixed relationship with the underlying physical cpus), we only bother
doing anything for cpu "0".

[ Impact: allow CPU microcode update in Xen dom0 ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Conflicts:
arch/x86/xen/Kconfig

(cherry picked from commit da3d1c83399886c443cbf9e57455bcc2e5caf28c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
arch/x86/include/asm/microcode.h
arch/x86/kernel/Makefile
arch/x86/kernel/microcode_core.c
arch/x86/xen/Kconfig

x86/xen: Disable APIC PM for Xen PV guests

Xen PV guests support only few APIC registers and writes to
unsupported registers result in WARN_ONs. Most APIC accesses in these
guests have been eliminated; however, lapic_suspend/resume are still
called (on 32-bit kernels).

We can disable APIC power management in xen_smp_prepare_boot_cpu()
(which is called after APIC has been initialized).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/pvhvm: Support more than 32 VCPUs when migrating (v3).

When Xen migrates an HVM guest, by default its shared_info can
only hold up to 32 CPUs. As such the hypercall
VCPUOP_register_vcpu_info was introduced which allowed us to
setup per-page areas for VCPUs. This means we can boot PVHVM
guest with more than 32 VCPUs. During migration the per-cpu
structure is allocated freshly by the hypervisor (vcpu_info_mfn
is set to INVALID_MFN) so that the newly migrated guest
can make an VCPUOP_register_vcpu_info hypercall.

Unfortunatly we end up triggering this condition in Xen:
/* Run this command on yourself or on other offline VCPUS. */
if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )

which means we are unable to setup the per-cpu VCPU structures
for running vCPUS. The Linux PV code paths make this work by
iterating over every vCPU with:

1) is target CPU up (VCPUOP_is_up hypercall?)
2) if yes, then VCPUOP_down to pause it.
3) VCPUOP_register_vcpu_info
4) if it was down, then VCPUOP_up to bring it back up

But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
not allowed on HVM guests we can't do this. However with the
git commit XYZ ("hvm: Support more than 32 VCPUS when migrating.")
we can do this. As such first check if VCPUOP_is_up is actually
possible before trying this dance.

As most of this dance code is done already in 'xen_setup_vcpu'
lets make it callable on both PV and HVM. This means moving one
of the checks out to 'xen_setup_runstate_info'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Linux 4.1

Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull scsi target fixes from Nicholas Bellinger:
"Apologies for the late pull request.

  Here are the outstanding target-pending fixes for v4.1 code.

  The series contains three patches from Sagi + Co that address a few
  iser-target issues that have been uncovered during recent testing at
  Mellanox.

  Patch #1 has a v3.16+ stable tag, and #2-3 have v3.10+ stable tags"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iser-target: Fix possible use-after-free
  iser-target: release stale iser connections
  iser-target: Fix variable-length response error completion

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"A smattering of fixes,

  mgag200:
      don't accept modes that aren't aligned properly as hw can't do it

  i915:
      two regression fixes

  radeon:
      one query to allow userspace fixes
      one oops fixer for older hw with new options enabled"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: don't probe MST on hw we don't support it on
  drm/radeon: Add RADEON_INFO_VA_UNMAP_WORKING query
  drm/mgag200: Reject non-character-cell-aligned mode widths
  Revert "drm/i915: Don't skip request retirement if the active list is empty"
  drm/i915: Always reset vma->ggtt_view.pages cache on unbinding

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Michael Turquette:
"Very late clk regression fixes for the ARM-based AT91 platform.

  These went unnoticed by me until recently, hence the late pull
  request"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: at91: fix h32mx prototype inclusion in pmc header
  clk: at91: trivial: typo in peripheral clock description
  clk: at91: fix PERIPHERAL_MAX_SHIFT definition
  clk: at91: pll: fix input range validity check

Merge tag 'sound-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Nothing looks scary, just a few usual HD-audio regression fixes and
  fixup, in addition to a minor Kconfig dependency fix for the old MIPS
  drivers"

* tag 'sound-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix unused label skip_i915
  ALSA: hda - Fix noisy outputs on Dell XPS13 (2015 model)
  ALSA: mips: let SND_SGI_O2 select SND_PCM
  ALSA: hda - Fix audio crackles on Dell Latitude E7x40
  ALSA: hda - adding a DAC/pin preference map for a HP Envy TS machine

Merge branch 'ccf/atmel-fixes-for-4.1' of https://github.com/bbrezillon/linux-at91 into clk-fixes

clk: at91: fix h32mx prototype inclusion in pmc header

Trivial fix that prevents to compile this pmc clock driver if h32mx clock is
present but smd clock isn't.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Fixes: bcc5fd49a0fd ("clk: at91: add a driver for the h32mx clock")
Cc: <stable@vger.kernel.org> # 3.18+

clk: at91: trivial: typo in peripheral clock description

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

clk: at91: fix PERIPHERAL_MAX_SHIFT definition

Fix the PERIPHERAL_MAX_SHIFT definition (3 instead of 4) and adapt the
round_rate and set_rate logic accordingly.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: "Wu, Songjun" <Songjun.Wu@atmel.com>

clk: at91: pll: fix input range validity check

The PLL impose a certain input range to work correctly, but it appears that
this input range does not apply on the input clock (or parent clock) but
on the input clock after it has passed the PLL divisor.
Fix the implementation accordingly.

Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Jonas Andersson <jonas@microbit.se>

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c documentation fix from Wolfram Sang:
"Here is a small documentation fix for I2C.

  We already had a user who unsuccessfully tried to get the new slave
  framework running with the currently broken example.  So, before this
  happens again, I'd like to have this how-to-use section fixed for 4.1
  already.  So that no more hacking time is wasted"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: slave: fix the example how to instantiate from userspace

revert "cpumask: don't perform while loop in cpumask_next_and()"

Revert commit 534b483a86e6 ("cpumask: don't perform while loop in
cpumask_next_and()").

This was a minor optimization, but it puts a `struct cpumask' on the
stack, which consumes too much stack space.

Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'drm-intel-fixes-2015-06-18' of git://anongit.freedesktop.org/drm-intel into drm-fixes

one fix, one revert
* tag 'drm-intel-fixes-2015-06-18' of git://anongit.freedesktop.org/drm-intel:
Revert "drm/i915: Don't skip request retirement if the active list is empty"
drm/i915: Always reset vma->ggtt_view.pages cache on unbinding