Konrad Rzeszutek Wilk [Fri, 7 Dec 2012 17:33:30 +0000 (12:33 -0500)]
Merge branch 'stable/for-linus-3.8.rebased' into uek2-merge
* stable/for-linus-3.8.rebased:
xen-blkfront: implement safe version of llist_for_each_entry
xen-blkback: implement safe iterator for the list of persistent grants
Roger Pau Monne [Tue, 4 Dec 2012 14:21:53 +0000 (15:21 +0100)]
xen-blkfront: implement safe version of llist_for_each_entry
Implement a safe version of llist_for_each_entry, and use it in
blkif_free. Previously grants where freed while iterating the list,
which lead to dereferences when trying to fetch the next item.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: xen-devel@lists.xen.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Roger Pau Monne [Tue, 4 Dec 2012 14:21:52 +0000 (15:21 +0100)]
xen-blkback: implement safe iterator for the list of persistent grants
Change foreach_grant iterator to a safe version, that allows freeing
the element while iterating. Also move the free code in
free_persistent_gnts to prevent freeing the element before the rb_next
call.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: xen-devel@lists.xen.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 5 Dec 2012 20:09:29 +0000 (15:09 -0500)]
Merge tag 'uek2-merge-backport-3.8' of git://ca-git/linux-konrad-public into uek2-merge-400
Backport from v3.8 and from not-upstreamed branch.
We are backporting from v3.8:
- feature-persistent in the block layer.
- Xen PAD driver for Intel machines
- PVonHVM kexec fixes
- Lay work for PVH mode (so more ARM code)
From the not-upstreamed branch:
- oprofile for Xen.
* tag 'uek2-merge-backport-3.8' of git://ca-git/linux-konrad-public: (35 commits)
xen: arm: implement remap interfaces needed for privcmd mappings.
xen: correctly use xen_pfn_t in remap_domain_mfn_range.
xen: arm: enable balloon driver
xen: balloon: allow PVMMU interfaces to be compiled out
xen: privcmd: support autotranslated physmap guests.
xen: add pages parameter to xen_remap_domain_mfn_range
xen/PVonHVM: fix compile warning in init_hvm_pv_info
xen/acpi: Move the xen_running_on_version_or_later function.
xen/xenbus: Remove duplicate inclusion of asm/xen/hypervisor.h
xen/acpi: Fix compile error by missing decleration for xen_domain.
xen/acpi: revert pad config check in xen_check_mwait
xen/acpi: ACPI PAD driver
xen PVonHVM: use E820_Reserved area for shared_info
xen-blkfront: free allocated page
xen-blkback: move free persistent grants code
xen/blkback: persistent-grants fixes
xen/blkback: Persistent grant maps for xen blk drivers
xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool
xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
xen/blkfront: Add WARN to deal with misbehaving backends.
...
Konrad Rzeszutek Wilk [Wed, 5 Dec 2012 19:34:19 +0000 (14:34 -0500)]
Merge branch 'stable/not-upstreamed' into uek2-merge
* stable/not-upstreamed:
xen/oprofile: Expose the oprofile_arch_exit_fnc pointer.
xen/oprofile: Switch from syscore_ops to platform_ops.
xen/oprofile: Fix compile issues when CONFIG_XEN is not defined.
xen/oprofile: The arch_ variants for init/exec weren't being called.
xen/oprofile: Compile fix
xen/oprofile: Patch from Michael Petullo
Konrad Rzeszutek Wilk [Wed, 5 Dec 2012 19:49:27 +0000 (14:49 -0500)]
Merge tag 'uek2-merge-backport-3.7' of git://ca-git/linux-konrad-public into uek2-merge-400
Backport from v3.7
We are back-porting:
- the Xen pcifront auto-enabling of SWIOTLB
- Xen ARM (lays the foundation for the PVH work - as they
share similar code)
- self-ballooning fixes (they are actually v3.6 and earlier material)
- fixes to the frontend drivers
- fixes to do kexec in PVonHVM.
- EHCI/Xen driver (Xen 4.2 support to use DBGP as console)
* tag 'uek2-merge-backport-3.7' of git://ca-git/linux-konrad-public: (109 commits)
Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain.
xen/arm: Fix compile errors when drivers are compiled as modules (export more).
xen/arm: Fix compile errors when drivers are compiled as modules.
xen/generic: Disable fallback build on ARM.
xen/hvm: If we fail to fetch an HVM parameter print out which flag it is.
xen/hypercall: fix hypercall fallback code for very old hypervisors
xen/arm: use the __HVC macro
xen/xenbus: fix overflow check in xenbus_file_write()
xen-kbdfront: handle backend CLOSED without CLOSING
xen-fbfront: handle backend CLOSED without CLOSING
xen/gntdev: don't leak memory from IOCTL_GNTDEV_MAP_GRANT_REF
x86: remove obsolete comment from asm/xen/hypervisor.h
xen: dbgp: Fix warning when CONFIG_PCI is not enabled.
USB EHCI/Xen: propagate controller reset information to hypervisor
xen: arm: comment on why 64-bit xen_pfn_t is safe even on 32 bit
xen: balloon: use correct type for frame_list
xen/x86: don't corrupt %eip when returning from a signal handler
xen: arm: make p2m operations NOPs
xen: balloon: don't include e820.h
...
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
arch/x86/xen/mmu.c
drivers/xen/xenbus/xenbus_xs.c
Konrad Rzeszutek Wilk [Wed, 5 Dec 2012 17:59:50 +0000 (12:59 -0500)]
Merge branch 'stable/for-linus-3.8.rebased' into uek2-merge
* stable/for-linus-3.8.rebased: (29 commits)
xen: arm: implement remap interfaces needed for privcmd mappings.
xen: correctly use xen_pfn_t in remap_domain_mfn_range.
xen: arm: enable balloon driver
xen: balloon: allow PVMMU interfaces to be compiled out
xen: privcmd: support autotranslated physmap guests.
xen: add pages parameter to xen_remap_domain_mfn_range
xen/PVonHVM: fix compile warning in init_hvm_pv_info
xen/acpi: Move the xen_running_on_version_or_later function.
xen/xenbus: Remove duplicate inclusion of asm/xen/hypervisor.h
xen/acpi: Fix compile error by missing decleration for xen_domain.
xen/acpi: revert pad config check in xen_check_mwait
xen/acpi: ACPI PAD driver
xen PVonHVM: use E820_Reserved area for shared_info
xen-blkfront: free allocated page
xen-blkback: move free persistent grants code
xen/blkback: persistent-grants fixes
xen/blkback: Persistent grant maps for xen blk drivers
xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool
xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
xen/blkfront: Add WARN to deal with misbehaving backends.
...
Ian Campbell [Wed, 3 Oct 2012 11:17:50 +0000 (12:17 +0100)]
xen: balloon: allow PVMMU interfaces to be compiled out
The ARM platform has no concept of PVMMU and therefor no
HYPERVISOR_update_va_mapping et al. Allow this code to be compiled out
when not required.
In some similar situations (e.g. P2M) we have defined dummy functions
to avoid this, however I think we can/should draw the line at dummying
out actual hypercalls.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
drivers/xen/Kconfig
drivers/xen/balloon.c
Mukesh Rathor [Thu, 18 Oct 2012 00:11:21 +0000 (17:11 -0700)]
xen: privcmd: support autotranslated physmap guests.
PVH and ARM only support the batch interface. To map a foreign page to
a process, the PFN must be allocated and the autotranslated path uses
ballooning for that purpose.
The returned PFN is then mapped to the foreign page.
xen_unmap_domain_mfn_range() is introduced to unmap these pages via the
privcmd close call.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
[v1: Fix up privcmd_close] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
[v2: used for ARM too]
Ian Campbell [Wed, 17 Oct 2012 20:37:49 +0000 (13:37 -0700)]
xen: add pages parameter to xen_remap_domain_mfn_range
Also introduce xen_unmap_domain_mfn_range. These are the parts of
Mukesh's "xen/pvh: Implement MMU changes for PVH" which are also
needed as a baseline for ARM privcmd support.
The original patch was:
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This derivative is also:
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Conflicts:
arch/x86/xen/mmu.c
Konrad Rzeszutek Wilk [Tue, 27 Nov 2012 16:39:40 +0000 (11:39 -0500)]
xen/acpi: Move the xen_running_on_version_or_later function.
As on ia64 builds we get:
include/xen/interface/version.h: In function 'xen_running_on_version_or_later':
include/xen/interface/version.h:76: error: implicit declaration of function 'HYPERVISOR_xen_version'
We can later on make this function exportable if there are
modules using part of it. For right now the only two users are
built-in.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In file included from drivers/xen/features.c:15:0:
include/xen/interface/version.h: In function ‘xen_running_on_version_or_later’:
include/xen/interface/version.h:72:2: error: implicit declaration of function ‘xen_domain’ [-Werror=implicit-function-declaration]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Liu, Jinsong [Thu, 8 Nov 2012 05:44:28 +0000 (05:44 +0000)]
xen/acpi: revert pad config check in xen_check_mwait
With Xen acpi pad logic added into kernel, we can now revert xen mwait related
patch df88b2d96e36d9a9e325bfcd12eb45671cbbc937 ("xen/enlighten: Disable
MWAIT_LEAF so that acpi-pad won't be loaded. "). The reason is, when running under
newer Xen platform, Xen pad driver would be early loaded, so native pad driver
would fail to be loaded, and hence no mwait/monitor #UD risk again.
Another point is, only Xen4.2 or later support Xen acpi pad, so we won't expose
mwait cpuid capability when running under older Xen platform.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Liu, Jinsong [Thu, 8 Nov 2012 05:41:13 +0000 (05:41 +0000)]
xen/acpi: ACPI PAD driver
PAD is acpi Processor Aggregator Device which provides a control point
that enables the platform to perform specific processor configuration
and control that applies to all processors in the platform.
This patch is to implement Xen acpi pad logic. When running under Xen
virt platform, native pad driver would not work. Instead Xen pad driver,
a self-contained and thin logic level, would take over acpi pad logic.
When acpi pad notify OSPM, xen pad logic intercept and parse _PUR object
to get the expected idle cpu number, and then hypercall to hypervisor.
Xen hypervisor would then do the rest work, say, core parking, to idle
specific number of cpus on its own policy.
Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
include/xen/interface/platform.h
Currently kexec in a PVonHVM guest fails with a triple fault because the
new kernel overwrites the shared info page. The exact failure depends on
the size of the kernel image. This patch moves the pfn from RAM into an
E820 reserved memory area.
The pfn containing the shared_info is located somewhere in RAM. This will
cause trouble if the current kernel is doing a kexec boot into a new
kernel. The new kernel (and its startup code) can not know where the pfn
is, so it can not reserve the page. The hypervisor will continue to update
the pfn, and as a result memory corruption occours in the new kernel.
The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
will usually not allocate areas up to FE700000, so FE700000 is expected
to work also with older toolstacks.
In Xen3 there is no reserved area at a fixed location. If the guest is
started on such old hosts the shared_info page will be placed in RAM. As
a result kexec can not be used.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Roger Pau Monne [Fri, 16 Nov 2012 18:26:47 +0000 (19:26 +0100)]
xen-blkfront: free allocated page
Free the page allocated for the persistent grant.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 07c540a0b5f4674538b57ad85bc9306e44fb45dd)
Roger Pau Monne [Fri, 16 Nov 2012 18:26:48 +0000 (19:26 +0100)]
xen-blkback: move free persistent grants code
Move the code that frees persistent grants from the red-black tree
to a function. This will make it easier for other consumers to move
this to a common place.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 4d4f270f1880e52d89a33c944ee86f23d6c85541)
Roger Pau Monne [Fri, 2 Nov 2012 15:43:04 +0000 (16:43 +0100)]
xen/blkback: persistent-grants fixes
This patch contains fixes for persistent grants implementation v2:
* handle == 0 is a valid handle, so initialize grants in blkback
setting the handle to BLKBACK_INVALID_HANDLE instead of 0. Reported
by Konrad Rzeszutek Wilk.
* new_map is a boolean, use "true" or "false" instead of 1 and 0.
Reported by Konrad Rzeszutek Wilk.
* blkfront announces the persistent-grants feature as
feature-persistent-grants, use feature-persistent instead which is
consistent with blkback and the public Xen headers.
* Add a consistency check in blkfront to make sure we don't try to
access segments that have not been set.
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
[v1: The new_map int->bool had already been changed] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit cb5bd4d19b46c220b1ac8462a3da01767dd99488)
Roger Pau Monne [Wed, 24 Oct 2012 16:58:45 +0000 (18:58 +0200)]
xen/blkback: Persistent grant maps for xen blk drivers
This patch implements persistent grants for the xen-blk{front,back}
mechanism. The effect of this change is to reduce the number of unmap
operations performed, since they cause a (costly) TLB shootdown. This
allows the I/O performance to scale better when a large number of VMs
are performing I/O.
Previously, the blkfront driver was supplied a bvec[] from the request
queue. This was granted to dom0; dom0 performed the I/O and wrote
directly into the grant-mapped memory and unmapped it; blkfront then
removed foreign access for that grant. The cost of unmapping scales
badly with the number of CPUs in Dom0. An experiment showed that when
Dom0 has 24 VCPUs, and guests are performing parallel I/O to a
ramdisk, the IPIs from performing unmap's is a bottleneck at 5 guests
(at which point 650,000 IOPS are being performed in total). If more
than 5 guests are used, the performance declines. By 10 guests, only
400,000 IOPS are being performed.
This patch improves performance by only unmapping when the connection
between blkfront and back is broken.
On startup blkfront notifies blkback that it is using persistent
grants, and blkback will do the same. If blkback is not capable of
persistent mapping, blkfront will still use the same grants, since it
is compatible with the previous protocol, and simplifies the code
complexity in blkfront.
To perform a read, in persistent mode, blkfront uses a separate pool
of pages that it maps to dom0. When a request comes in, blkfront
transmutes the request so that blkback will write into one of these
free pages. Blkback keeps note of which grefs it has already
mapped. When a new ring request comes to blkback, it looks to see if
it has already mapped that page. If so, it will not map it again. If
the page hasn't been previously mapped, it is mapped now, and a record
is kept of this mapping. Blkback proceeds as usual. When blkfront is
notified that blkback has completed a request, it memcpy's from the
shared memory, into the bvec supplied. A record that the {gref, page}
tuple is mapped, and not inflight is kept.
Writes are similar, except that the memcpy is peformed from the
supplied bvecs, into the shared pages, before the request is put onto
the ring.
Blkback stores a mapping of grefs=>{page mapped to by gref} in
a red-black tree. As the grefs are not known apriori, and provide no
guarantees on their ordering, we have to perform a search
through this tree to find the page, for every gref we receive. This
operation takes O(log n) time in the worst case. In blkfront grants
are stored using a single linked list.
The maximum number of grants that blkback will persistenly map is
currently set to RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, to
prevent a malicios guest from attempting a DoS, by supplying fresh
grefs, causing the Dom0 kernel to map excessively. If a guest
is using persistent grants and exceeds the maximum number of grants to
map persistenly the newly passed grefs will be mapped and unmaped.
Using this approach, we can have requests that mix persistent and
non-persistent grants, and we need to handle them correctly.
This allows us to set the maximum number of persistent grants to a
lower value than RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, although
setting it will lead to unpredictable performance.
In writing this patch, the question arrises as to if the additional
cost of performing memcpys in the guest (to/from the pool of granted
pages) outweigh the gains of not performing TLB shootdowns. The answer
to that question is `no'. There appears to be very little, if any
additional cost to the guest of using persistent grants. There is
perhaps a small saving, from the reduced number of hypercalls
performed in granting, and ending foreign access.
Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Fixed up the misuse of bool as int]
(cherry picked from commit 0a8704a51f386cab7394e38ff1d66eef924d8ab8)
Oliver Chick [Fri, 21 Sep 2012 09:04:18 +0000 (10:04 +0100)]
xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool
Changing the type of bdev parameters to be unsigned int :1, rather than bool.
This is more consistent with the types of other features in the block drivers.
Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit af4012ab523e8c81d078ca5f6da4ce95278583f0)
Konrad Rzeszutek Wilk [Fri, 25 May 2012 21:34:51 +0000 (17:34 -0400)]
xen/blkfront: Add WARN to deal with misbehaving backends.
Part of the ring structure is the 'id' field which is under
control of the frontend. The frontend stamps it with "some"
value (this some in this implementation being a value less
than BLK_RING_SIZE), and when it gets a response expects
said value to be in the response structure. We have a check
for the id field when spolling new requests but not when
de-spolling responses.
We also add an extra check in add_id_to_freelist to make
sure that the 'struct request' was not NULL - as we cannot
pass a NULL to __blk_end_request_all, otherwise that crashes
(and all the operations that the response is dealing with
end up with __blk_end_request_all).
Lastly we also print the name of the operation that failed.
[v1: s/BUG/WARN/ suggested by Stefano]
[v2: Add extra check in add_id_to_freelist]
[v3: Redid op_name per Jan's suggestion]
[v4: add const * and add WARN on failure returns] Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 6878c32e5cc0e40980abe51d1f02fb453e27493e)
Stephen Rothwell [Wed, 5 Oct 2011 06:25:28 +0000 (17:25 +1100)]
llist: Add back llist_add_batch() and llist_del_first() prototypes
Commit 1230db8e1543 ("llist: Make some llist functions inline")
has deleted the definitions, causing problems for (not upstream yet)
code that tries to make use of them.
Peter Zijlstra [Mon, 12 Sep 2011 13:50:49 +0000 (15:50 +0200)]
llist: Remove cpu_relax() usage in cmpxchg loops
Initial benchmarks show they're a net loss:
$ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
$ echo 4096 32000 64 128 > /proc/sys/kernel/sem
$ ./sembench -t 2048 -w 1900 -o 0
Pre:
run time 30 seconds 778936 worker burns per second
run time 30 seconds 912190 worker burns per second
run time 30 seconds 817506 worker burns per second
run time 30 seconds 830870 worker burns per second
run time 30 seconds 845056 worker burns per second
Post:
run time 30 seconds 905920 worker burns per second
run time 30 seconds 849046 worker burns per second
run time 30 seconds 886286 worker burns per second
run time 30 seconds 822320 worker burns per second
run time 30 seconds 900283 worker burns per second
So about 4% faster. (!)
cpu_relax() stalls the pipeline, therefore, when used in a tight loop
it has the following benefits:
- allows SMT siblings to have a go;
- reduces pressure on the CPU interconnect.
However, cmpxchg loops are unfair and thus have unbounded completion
time, therefore we should avoid getting in such heavily contended
situations where the above benefits make any difference.
A typical cmpxchg loop should not go round more than a handfull of
times at worst, therefore adding extra delays just slows things down.
Since the llist primitives are new, there aren't any bad users yet,
and we should avoid growing them. Heavily contended sites should
generally be better off using the ticket locks for serialization since
they provide bounded completion times (fifo-fair over the cpus).
llist: Return whether list is empty before adding in llist_add()
Extend the llist_add*() functions to return a success indicator, this
allows us in the scheduler code to send an IPI if the queue was empty.
( There's no effect on existing users, because the list_add_xxx() functions
are inline, thus this will be optimized out by the compiler if not used
by callers. )
If in llist_add()/etc. functions the first cmpxchg() call succeeds, it is
not necessary to use cpu_relax() before the cmpxchg(). So cpu_relax() in
a busy loop involving cmpxchg() should go after cmpxchg() instead of before
that.
This patch fixes this for all involved llist functions.
Because llist code will be used in performance critical scheduler
code path, make llist_add() and llist_del_all() inline to avoid
function calling overhead and related 'glue' overhead.
Cmpxchg is used to implement adding new entry to the list, deleting
all entries from the list, deleting first entry of the list and some
other operations.
Because this is a single list, so the tail can not be accessed in O(1).
If there are multiple producers and multiple consumers, llist_add can
be used in producers and llist_del_all can be used in consumers. They
can work simultaneously without lock. But llist_del_first can not be
used here. Because llist_del_first depends on list->first->next does
not changed if list->first is not changed during its operation, but
llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
llist_add) sequence in another consumer may violate that.
If there are multiple producers and one consumer, llist_add can be
used in producers and llist_del_all or llist_del_first can be used in
the consumer.
Where "-" stands for no lock is needed, while "L" stands for lock is
needed.
The list entries deleted via llist_del_all can be traversed with
traversing function such as llist_for_each etc. But the list entries
can not be traversed safely before deleted from the list. The order
of deleted entries is from the newest to the oldest added one. If you
want to traverse from the oldest to the newest, you must reverse the
order by yourself before traversing.
The basic atomic operation of this list is cmpxchg on long. On
architectures that don't have NMI-safe cmpxchg implementation, the
list can NOT be used in NMI handler. So code uses the list in NMI
handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit f49f23abf3dd786ddcac1c1e7db3c2013b07413f)
And also documents setup.c and why we want to do it that way, which
is that we tried to make the the memblock_reserve more selective so
that it would be clear what region is reserved. Sadly we ran
in the problem wherein on a 64-bit hypervisor with a 32-bit
initial domain, the pt_base has the cr3 value which is not
neccessarily where the pagetable starts! As Jan put it: "
Actually, the adjustment turns out to be correct: The page
tables for a 32-on-64 dom0 get allocated in the order "first L1",
"first L2", "first L3", so the offset to the page table base is
indeed 2. When reading xen/include/public/xen.h's comment
very strictly, this is not a violation (since there nothing is said
that the first thing in the page table space is pointed to by
pt_base; I admit that this seems to be implied though, namely
do I think that it is implied that the page table space is the
range [pt_base, pt_base + nt_pt_frames), whereas that
range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
which - without a priori knowledge - the kernel would have
difficulty to figure out)." - so lets just fall back to the
easy way and reserve the whole region.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
arch/x86/xen/enlighten.c
[s/memblock_reserve/memblock_x86_reserve_range]
Konrad Rzeszutek Wilk [Tue, 21 Aug 2012 18:31:24 +0000 (14:31 -0400)]
xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain.
If a 64-bit hypervisor is booted with a 32-bit initial domain,
the hypervisor deals with the initial domain as "compat" and
does some extra adjustments (like pagetables are 4 bytes instead
of 8). It also adjusts the xen_start_info->pt_base incorrectly.
To deal with this, we keep keep track of the highest physical
address we have reserved via memblock_reserve. If that address
does not overlap with pt_base, we have a gap which we reserve.
Konrad Rzeszutek Wilk [Tue, 4 Dec 2012 21:31:31 +0000 (16:31 -0500)]
Merge branch 'stable/for-linus-3.7.rebased' into uek2-merge
* stable/for-linus-3.7.rebased: (92 commits)
xen/arm: Fix compile errors when drivers are compiled as modules (export more).
xen/arm: Fix compile errors when drivers are compiled as modules.
xen/generic: Disable fallback build on ARM.
xen/hvm: If we fail to fetch an HVM parameter print out which flag it is.
xen/hypercall: fix hypercall fallback code for very old hypervisors
xen/arm: use the __HVC macro
xen/xenbus: fix overflow check in xenbus_file_write()
xen-kbdfront: handle backend CLOSED without CLOSING
xen-fbfront: handle backend CLOSED without CLOSING
xen/gntdev: don't leak memory from IOCTL_GNTDEV_MAP_GRANT_REF
x86: remove obsolete comment from asm/xen/hypervisor.h
xen: dbgp: Fix warning when CONFIG_PCI is not enabled.
USB EHCI/Xen: propagate controller reset information to hypervisor
xen: arm: comment on why 64-bit xen_pfn_t is safe even on 32 bit
xen: balloon: use correct type for frame_list
xen/x86: don't corrupt %eip when returning from a signal handler
xen: arm: make p2m operations NOPs
xen: balloon: don't include e820.h
xen: events: pirq_check_eoi_map is X86 specific
xen: XENMEM_translate_gpfn_list was remove ages ago and is unused.
...
Stefano Stabellini [Thu, 8 Nov 2012 15:58:55 +0000 (15:58 +0000)]
xen/arm: Fix compile errors when drivers are compiled as modules (export more).
The commit 911dec0db4de6ccc544178a8ddaf9cec0a11d533
"xen/arm: Fix compile errors when drivers are compiled as modules." exports
the neccessary functions. But to guard ourselves against out-of-tree modules
and future drivers hitting this, lets export all of the relevant
hypercalls.
Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit ab277bbf662ef17ffb7fd8dd7a462a34e326e492)
Konrad Rzeszutek Wilk [Tue, 6 Nov 2012 20:49:27 +0000 (15:49 -0500)]
xen/generic: Disable fallback build on ARM.
As there is no need for it (the fallback code is for older
hypervisors and they only run under x86), and also b/c
we get:
drivers/xen/fallback.c: In function 'xen_event_channel_op_compat':
drivers/xen/fallback.c:10:19: error: storage size of 'op' isn't known
drivers/xen/fallback.c:15:2: error: implicit declaration of function '_hypercall1' [-Werror=implicit-function-declaration]
drivers/xen/fallback.c:15:19: error: expected expression before 'int'
drivers/xen/fallback.c:18:7: error: 'EVTCHNOP_close' undeclared (first use in this function)
drivers/xen/fallback.c:18:7: note: each undeclared identifier is reported only once for each function it appears in
.. and more
[v1: Moved the enablement to be covered by CONFIG_X86 per Ian's suggestion] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 6bf926ddd44ddc67edbeb28d4069f207f2c6e07e)
Konrad Rzeszutek Wilk [Fri, 19 Oct 2012 19:01:46 +0000 (15:01 -0400)]
xen/hvm: If we fail to fetch an HVM parameter print out which flag it is.
Makes it easier to troubleshoot in the field.
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[v1: Use macro per Ian's suggestion] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 6d877e6b85691e0b2b22e90aeb9b86c3dafcfc6b)
Jan Beulich [Fri, 19 Oct 2012 19:25:37 +0000 (15:25 -0400)]
xen/hypercall: fix hypercall fallback code for very old hypervisors
While copying the argument structures in HYPERVISOR_event_channel_op()
and HYPERVISOR_physdev_op() into the local variable is sufficiently
safe even if the actual structure is smaller than the container one,
copying back eventual output values the same way isn't: This may
collide with on-stack variables (particularly "rc") which may change
between the first and second memcpy() (i.e. the second memcpy() could
discard that change).
Move the fallback code into out-of-line functions, and handle all of
the operations known by this old a hypervisor individually: Some don't
require copying back anything at all, and for the rest use the
individual argument structures' sizes rather than the container's.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
[v2: Reduce #define/#undef usage in HYPERVISOR_physdev_op_compat().]
[v3: Fix compile errors when modules use said hypercalls]
[v4: Add xen_ prefix to the HYPERCALL_..]
[v5: Alter the name and only EXPORT_SYMBOL_GPL one of them] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit cf47a83fb06e42ae1b572ed68326068c7feaceae)
[header file changes s/export/module]
Jan Beulich [Wed, 17 Oct 2012 17:14:09 +0000 (13:14 -0400)]
xen/xenbus: fix overflow check in xenbus_file_write()
Acked-by: Ian Campbell <ian.campbell@citrix.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
[v1: Rebased on upstream] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 1bcaba51eba549748917f7d6eb41900ff9ee3d5f)
David Vrabel [Thu, 18 Oct 2012 10:03:38 +0000 (11:03 +0100)]
xen-kbdfront: handle backend CLOSED without CLOSING
Backend drivers shouldn't transistion to CLOSED unless the frontend is
CLOSED. If a backend does transition to CLOSED too soon then the
frontend may not see the CLOSING state and will not properly shutdown.
So, treat an unexpected backend CLOSED state the same as CLOSING.
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 2ebb939ab9c6a2484866c5eae4184c83c2b21af8)
David Vrabel [Thu, 18 Oct 2012 10:03:37 +0000 (11:03 +0100)]
xen-fbfront: handle backend CLOSED without CLOSING
Backend drivers shouldn't transistion to CLOSED unless the frontend is
CLOSED. If a backend does transition to CLOSED too soon then the
frontend may not see the CLOSING state and will not properly shutdown.
So, treat an unexpected backend CLOSED state the same as CLOSING.
Acked-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 01bc825f6311ba2878ae353418eee575d3051594)
Olaf Hering [Thu, 25 Oct 2012 09:00:24 +0000 (11:00 +0200)]
x86: remove obsolete comment from asm/xen/hypervisor.h
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit b6514633bdc6a511f7c44b3ecb86d6071374239d)
Jan Beulich [Tue, 18 Sep 2012 11:23:02 +0000 (12:23 +0100)]
USB EHCI/Xen: propagate controller reset information to hypervisor
Just like for the in-tree early console debug port driver, the
hypervisor - when using a debug port based console - also needs to be
told about controller resets, so it can suppress using and then
re-initialize the debug port accordingly.
Other than the in-tree driver, the hypervisor driver actually cares
about doing this only for the device where the debug is port actually
in use, i.e. it needs to be told the coordinates of the device being
reset (quite obviously, leveraging the addition done for that would
likely benefit the in-tree driver too).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9fa5780beea1274d498a224822397100022da7d4)
Conflicts:
drivers/xen/Makefile
include/xen/interface/physdev.h
[Merges per what upstream has f1c6872e]
Ian Campbell [Thu, 18 Oct 2012 07:26:17 +0000 (08:26 +0100)]
xen: arm: comment on why 64-bit xen_pfn_t is safe even on 32 bit
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 3ab0b83bf6a1e834f4b884150d8012990c75d25d)
Ian Campbell [Wed, 17 Oct 2012 08:39:16 +0000 (09:39 +0100)]
xen: balloon: use correct type for frame_list
This is now a xen_pfn_t.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 965c0aaafe3e75d4e65cd4ec862915869bde3abd)
David Vrabel [Fri, 19 Oct 2012 16:29:07 +0000 (17:29 +0100)]
xen/x86: don't corrupt %eip when returning from a signal handler
In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
(-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
/and/ the process has a pending signal then %eip (and %eax) are
corrupted when returning to the main process after handling the
signal. The application may then crash with SIGSEGV or a SIGILL or it
may have subtly incorrect behaviour (depending on what instruction it
returned to).
The occurs because handle_signal() is incorrectly thinking that there
is a system call that needs to restarted so it adjusts %eip and %eax
to re-execute the system call instruction (even though user space had
not done a system call).
If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK
(-516) then handle_signal() only corrupted %eax (by setting it to
-EINTR). This may cause the application to crash or have incorrect
behaviour.
handle_signal() assumes that regs->orig_ax >= 0 means a system call so
any kernel entry point that is not for a system call must push a
negative value for orig_ax. For example, for physical interrupts on
bare metal the inverse of the vector is pushed and page_fault() sets
regs->orig_ax to -1, overwriting the hardware provided error code.
xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
instead of -1.
Classic Xen kernels pushed %eax which works as %eax cannot be both
non-negative and -RESTARTSYS (etc.), but using -1 is consistent with
other non-system call entry points and avoids some of the tests in
handle_signal().
There were similar bugs in xen_failsafe_callback() of both 32 and
64-bit guests. If the fault was corrected and the normal return path
was used then 0 was incorrectly pushed as the value for orig_ax.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Jan Beulich <JBeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a349e23d1cf746f8bdc603dcc61fae9ee4a695f6)
Ian Campbell [Wed, 17 Oct 2012 08:39:17 +0000 (09:39 +0100)]
xen: arm: make p2m operations NOPs
This makes common code less ifdef-y and is consistent with PVHVM on
x86.
Also note that phys_to_machine_mapping_valid should take a pfn
argument and make it do so.
Add __set_phys_to_machine, make set_phys_to_machine a simple wrapper
(on systems with non-nop implementations the outer one can allocate
new p2m pages).
Make __set_phys_to_machine check for identity mapping or invalid only.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit ee7b5958e2494619ee3ff52de68580feed6906a2)
Ian Campbell [Wed, 17 Oct 2012 08:39:13 +0000 (09:39 +0100)]
xen: XENMEM_translate_gpfn_list was remove ages ago and is unused.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit e84fe8a138fb1aa7aec8ef2fafb312ea5eb0f3dd)
Ian Campbell [Wed, 17 Oct 2012 08:39:09 +0000 (09:39 +0100)]
xen: sysfs: include err.h for PTR_ERR etc
Fixes build error on ARM:
drivers/xen/sys-hypervisor.c: In function 'uuid_show_fallback':
drivers/xen/sys-hypervisor.c:127:2: error: implicit declaration of function 'IS_ERR' [-Werror=implicit-function-declaration]
drivers/xen/sys-hypervisor.c:128:3: error: implicit declaration of function 'PTR_ERR' [-Werror=implicit-function-declaration]
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 609b0b8c4608d59be261dde748f1ff1eccd748ba)
Ian Campbell [Wed, 17 Oct 2012 08:39:08 +0000 (09:39 +0100)]
xen: xenbus: quirk uses x86 specific cpuid
This breaks on ARM. This quirk is not necessary on ARM because no
hypervisors of that vintage exist for that architecture (port is too
new).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
[v1: Moved the ifdef inside the function per Jan Beulich suggestion] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 7644bdac7f4d3e5910f4d3f86f1f2b098d1212ca)
Konrad Rzeszutek Wilk [Wed, 10 Oct 2012 17:23:36 +0000 (13:23 -0400)]
xen/pv-on-hvm kexec: add quirk for Xen 3.4 and shutdown watches.
The commit 254d1a3f02ebc10ccc6e4903394d8d3f484f715e, titled
"xen/pv-on-hvm kexec: shutdown watches from old kernel" assumes that the
XenBus backend can deal with reading of values from:
"control/platform-feature-xs_reset_watches":
... a patch for xenstored is required so that it
accepts the XS_RESET_WATCHES request from a client (see changeset
23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
the registration of watches will fail and some features of a PVonHVM
guest are not available. The guest is still able to boot, but repeated
kexec boots will fail."
Sadly this is not true when using a Xen 3.4 hypervisor and booting a PVHVM
guest. We end up hanging at:
Reverting the commit or using the attached patch fixes the issue. This fix
checks whether the hypervisor is older than 4.0 and if so does not try to
perform the read.
Fixes-Oracle-Bug: 14708233 CC: stable@vger.kernel.org Acked-by: Olaf Hering <olaf@aepfle.de>
[v2: Added a comment in the source code] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit cb6b6df111e46b9d0f79eb971575fd50555f43f4)
Konrad Rzeszutek Wilk [Wed, 10 Oct 2012 17:30:47 +0000 (13:30 -0400)]
xen/bootup: allow read_tscp call for Xen PV guests.
The hypervisor will trap it. However without this patch,
we would crash as the .read_tscp is set to NULL. This patch
fixes it and sets it to the native_read_tscp call.
Olaf Hering [Mon, 1 Oct 2012 19:18:01 +0000 (21:18 +0200)]
xen pv-on-hvm: add pfn_is_ram helper for kdump
Register pfn_is_ram helper speed up reading /proc/vmcore in the kdump
kernel. See commit message of 997c136f518c ("fs/proc/vmcore.c: add hook
to read_from_oldmem() to check for non-ram pages") for details.
It makes use of a new hvmop HVMOP_get_mem_type which was introduced in
xen 4.2 (23298:26413986e6e0) and backported to 4.1.1.
The new function is currently only enabled for reading /proc/vmcore.
Later it will be used also for the kexec kernel. Since that requires
more changes in the generic kernel make it static for the time being.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 34b6f01a79bd65fbd06511d2cb7b28e33a506246)
David Vrabel [Fri, 21 Sep 2012 16:04:24 +0000 (17:04 +0100)]
xen/hvc: handle backend CLOSED without CLOSING
Backend drivers shouldn't transistion to CLOSED unless the frontend is
CLOSED. If a backend does transition to CLOSED too soon then the
frontend may not see the CLOSING state and will not properly shutdown.
So, treat an unexpected backend CLOSED state the same as CLOSING.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 9b6934a3b449266850149b717597408354039e95)
Stefano Stabellini [Wed, 3 Oct 2012 17:08:52 +0000 (18:08 +0100)]
xen/xen_initial_domain: check that xen_start_info is initialized
Since commit commit 4c071ee5268f7234c3d084b6093bebccc28cdcba ("arm:
initial Xen support") PV on HVM guests can be xen_initial_domain.
However PV on HVM guests might have an unitialized xen_start_info, so
check before accessing its fields.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 4ed5978bdd99114db7773cb3d78f2998bd17f694)
Stefano Stabellini [Tue, 2 Oct 2012 14:57:57 +0000 (15:57 +0100)]
xen: mark xen_init_IRQ __init
xen_init_IRQ should be marked __init because it calls other functions
marked __init and is always called by functions marked __init (on both
x86 and arm).
Also remove the unused EXPORT_SYMBOL_GPL(xen_init_IRQ).
Both changes were introduced by "xen/arm: receive Xen events on ARM".
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 2e3d88602814e5fd5b88d5c73be3305060c473b6)
Initialize the grant table mapping at the address specified at index 0
in the DT under the /xen node.
After the grant table is initialized, call xenbus_probe (if not dom0).
Changes in v2:
- introduce GRANT_TABLE_PHYSADDR;
- remove unneeded initialization of boot_max_nr_grant_frames.
Jan Beulich [Fri, 3 Feb 2012 15:09:04 +0000 (15:09 +0000)]
xen/tmem: cleanup
Use 'bool' for boolean variables. Do proper section placement.
Eliminate an unnecessary export.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 8e6f7c23c135b13f3adf90906fac7edd325bb9af)
Currently, the memory target in the Xen selfballooning driver is mainly
driven by the value of "Committed_AS". However, there are cases in
which it is desirable to assign additional memory to be available for
the kernel, e.g. for local caches (which are not covered by cleancache),
e.g. dcache and inode caches.
This adds an additional tunable in the selfballooning driver (accessible
via sysfs) which allows the user to specify an additional constant
amount of memory to be reserved by the selfballoning driver for the
local domain.
Signed-off-by: Jana Saout <jana@saout.de> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit d79d5959a023fd637e90ed1ff6547ff09d19396b)
Conflicts:
drivers/xen/xen-selfballoon.c
[UEK2 hasn't done the 'convert sysdev_class to a regular subsystem'
patchset]
Jan Beulich [Wed, 14 Mar 2012 16:34:19 +0000 (12:34 -0400)]
xen: constify all instances of "struct attribute_group"
The functions these get passed to have been taking pointers to const
since at least 2.6.16.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit ead1d01425bbd28c4354b539caa4075bde00ed72)
Dan Magenheimer [Tue, 27 Sep 2011 14:47:58 +0000 (08:47 -0600)]
xen: Fix selfballooning and ensure it doesn't go too far
The balloon driver's "current_pages" is very different from
totalram_pages. Self-ballooning needs to be driven by
the latter. Also, Committed_AS doesn't account for pages
used by the kernel so:
1) Add totalreserve_pages to Committed_AS for the normal target.
2) Enforce a floor for when there are little or no user-space threads
using memory (e.g. single-user mode) to avoid OOMs. The floor
function includes a "min_usable_mb" tuneable in case we discover
later that the floor function is still too aggressive in some
workloads, though likely it will not be needed.
Changes since version 4:
- change floor calculation so that it is not as aggressive; this version
uses a piecewise linear function similar to minimum_target in the 2.6.18
balloon driver, but modified to add to totalreserve_pages instead of
subtract from max_pfn, the 2.6.18 version causes OOMs on recent kernels
because the kernel has expanded over time
- change safety_margin to min_usable_mb and comment on its use
- since committed_as does NOT include kernel space (and other reserved
pages), totalreserve_pages is now added to committed_as. The result is
less aggressive self-ballooning, but theoretically more appropriate.
Changes since version 3:
- missing include causes compile problem when CONFIG_FRONTSWAP is disabled
- add comments after includes
Changes since version 2:
- missing include causes compile problem only on 32-bit
Changes since version 1:
- tuneable safety margin added
[v5: avi.miller@oracle.com: still too aggressive, seeing some OOMs]
[v4: konrad.wilk@oracle.com: fix compile when CONFIG_FRONTSWAP is disabled]
[v3: guru.anbalagane@oracle.com: fix 32-bit compile]
[v2: konrad.wilk@oracle.com: make safety margin tuneable] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
[v1: Altered description and added an extra include] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 38a1ed4f039db32b418007ac365076cf53647ebd)
Randy Dunlap [Tue, 16 Aug 2011 04:41:43 +0000 (21:41 -0700)]
xen: self-balloon needs module.h
Fix build errors (found when CONFIG_SYSFS is not enabled):
drivers/xen/xen-selfballoon.c:446: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:446: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
drivers/xen/xen-selfballoon.c:446: warning: parameter names (without types) in function declaration
drivers/xen/xen-selfballoon.c:485: error: expected declaration specifiers or '...' before string constant
drivers/xen/xen-selfballoon.c:485: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:485: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE'
drivers/xen/xen-selfballoon.c:485: warning: function declaration isn't a prototype
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 4fec0e0bde09095b6349dc6206dbf19cebcd0a7e)
With a specific enough .config file compile errors show
for missing workqueue declarations.
Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 0642d2edc858a1f08716bb32e1ab890db8dac246)
Dan Magenheimer [Fri, 8 Jul 2011 18:26:21 +0000 (12:26 -0600)]
xen: tmem: self-ballooning and frontswap-selfshrinking
This patch introduces two in-kernel drivers for Xen transcendent memory
("tmem") functionality that complement cleancache and frontswap. Both
use control theory to dynamically adjust and optimize memory utilization.
Selfballooning controls the in-kernel Xen balloon driver, targeting a goal
value (vm_committed_as), thus pushing less frequently used clean
page cache pages (through the cleancache code) into Xen tmem where
Xen can balance needs across all VMs residing on the physical machine.
Frontswap-selfshrinking controls the number of pages in frontswap,
driving it towards zero (effectively doing a partial swapoff) when
in-kernel memory pressure subsides, freeing up RAM for other VMs.
More detail is provided in the header comment of xen-selfballooning.c.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
[v8: konrad.wilk@oracle.com: set default enablement depending on frontswap]
[v7: konrad.wilk@oracle.com: fix capitalization and punctuation in comments]
[v6: fix frontswap-selfshrinking initialization]
[v6: konrad.wilk@oracle.com: fix init pr_infos; add comments about swap]
[v5: konrad.wilk@oracle.com: add NULL to attr list; move inits up to decls]
[v4: dkiper@net-space.pl: use strict_strtoul plus a few syntactic nits]
[v3: konrad.wilk@oracle.com: fix potential divides-by-zero]
[v3: konrad.wilk@oracle.com: add many more comments, fix nits]
[v2: rebased to linux-3.0-rc1]
[v2: Ian.Campbell@citrix.com: reorganize as new file (xen-selfballoon.c)]
[v2: dkiper@net-space.pl: proper access to vm_committed_as]
[v2: dkiper@net-space.pl: accounting fixes] Cc: Jan Beulich <JBeulich@novell.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: <xen-devel@lists.xensource.com>
(cherry picked from commit a50777c791031d7345ce95785ea6220f67339d90)
Ian Campbell [Wed, 17 Oct 2012 08:39:10 +0000 (09:39 +0100)]
xen: sysfs: fix build warning.
Define PRI macros for xen_ulong_t and xen_pfn_t and use to fix:
drivers/xen/sys-hypervisor.c:288:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'xen_ulong_t' [-Wformat]
Ideally this would use PRIx64 on ARM but these (or equivalent) don't
seem to be available in the kernel.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
All the original Xen headers have xen_ulong_t as unsigned long type, however
when they have been imported in Linux, xen_ulong_t has been replaced with
unsigned long. That might work for x86 and ia64 but it does not for arm.
Bring back xen_ulong_t and let each architecture define xen_ulong_t as they
see fit.
Also explicitly size pointers (__DEFINE_GUEST_HANDLE) to 64 bit.
Changes in v3:
- remove the incorrect changes to multicall_entry;
- remove the change to apic_physbase.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefano Stabellini [Wed, 22 Aug 2012 16:20:14 +0000 (17:20 +0100)]
xen: Introduce xen_pfn_t for pfn and mfn types
All the original Xen headers have xen_pfn_t as mfn and pfn type, however
when they have been imported in Linux, xen_pfn_t has been replaced with
unsigned long. That might work for x86 and ia64 but it does not for arm.
Bring back xen_pfn_t and let each architecture define xen_pfn_t as they
see fit.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add a doc to describe the Xen ARM device tree bindings
Changes in v5:
- add a comment about the size of the grant table memory region;
- add a comment about the required presence of a GIC node;
- specify that the described properties are part of a top-level
"hypervisor" node;
- specify #address-cells and #size-cells for the example.
Changes in v4:
- "xen,xen" should be last as it is less specific;
- update reg property using 2 address-cells and 2 size-cells.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Rob Herring <rob.herring@calxeda.com> CC: devicetree-discuss@lists.ozlabs.org CC: David Vrabel <david.vrabel@citrix.com> CC: Rob Herring <robherring2@gmail.com> CC: Dave Martin <dave.martin@linaro.org>
(cherry picked from commit c43cdfbc4cebdf1a7992432615bf5155b51b8cc0)
Stefano Stabellini [Wed, 8 Aug 2012 16:34:01 +0000 (16:34 +0000)]
xen/arm: sync_bitops
sync_bitops functions are equivalent to the SMP implementation of the
original functions, independently from CONFIG_SMP being defined.
We need them because _set_bit etc are not SMP safe if !CONFIG_SMP. But
under Xen you might be communicating with a completely external entity
who might be on another CPU (e.g. two uniprocessor guests communicating
via event channels and grant tables). So we need a variant of the bit
ops which are SMP safe even on a UP kernel.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit e54d2f61528165bbe0e5f628b111eab3be31c3b5)
Use r12 to pass the hypercall number to the hypervisor.
We need a register to pass the hypercall number because we might not
know it at compile time and HVC only takes an immediate argument.
Among the available registers r12 seems to be the best choice because it
is defined as "intra-procedure call scratch register".
Use the ISS to pass an hypervisor specific tag.
Changes in v2:
- define an HYPERCALL macro for 5 arguments hypercall wrappers, even if
at the moment is unused;
- use ldm instead of pop;
- fix up comments.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit aa2466d21bd9e872690693d56feb946781443f28)
- Basic hypervisor.h and interface.h definitions.
- Skeleton enlighten.c, set xen_start_info to an empty struct.
- Make xen_initial_domain dependent on the SIF_PRIVILIGED_BIT.
The new code only compiles when CONFIG_XEN is set, that is going to be
added to arch/arm/Kconfig in patch #11 "xen/arm: introduce CONFIG_XEN on
ARM".
Changes in v3:
- improve comments.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 4c071ee5268f7234c3d084b6093bebccc28cdcba)