Jan Schmidt [Mon, 13 Jun 2011 17:52:59 +0000 (19:52 +0200)]
btrfs: added helper functions to iterate backrefs
These helper functions iterate back references and call a function for each
backref. There is also a function to resolve an inode to a path in the
file system.
Jeff Mahoney [Mon, 15 Aug 2011 17:27:21 +0000 (17:27 +0000)]
btrfs: btrfs_permission's RO check shouldn't apply to device nodes
This patch tightens the read-only access checks in btrfs_permission to
match the constraints in inode_permission. Currently, even though the
device node itself will be unmodified, read-write access to device nodes
is denied to when the device node resides on a read-only subvolume or a
is a file that has been marked read-only by the btrfs conversion utility.
With this patch applied, the check only affects regular files,
directories, and symlinks. It also restructures the code a bit so that
we don't duplicate the MAY_WRITE check for both tests.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit cb6db4e57632ba8589cc2f9fe1d0aa9116b87ab8 with conflicts)
Dan Magenheimer [Fri, 21 Oct 2011 06:35:53 +0000 (23:35 -0700)]
ocfs2: Fix cleancache initialization call to correctly pass uuid
As reported by Steven Whitehouse in https://lkml.org/lkml/2011/5/27/221
the ocfs2 volume UUID is incorrectly passed to cleancache.
As a result, shared-ephemeral tmem pools will not actually
be created; instead they will be private (unshared) which
misses out on a major benefit of tmem.
Reported-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Konrad Rzeszutek Wilk [Wed, 19 Oct 2011 20:31:58 +0000 (16:31 -0400)]
Merge branch 'stable/xen-block.rebase' into uek2-merge
* stable/xen-block.rebase:
xen/blkback: Fix two races in the handling of barrier requests.
xen/blkback: Check for proper operation.
xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
xen/blkback: Report VBD_WSECT (wr_sect) properly.
xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
xen-blkfront: plug device number leak in xlblk_init() error path
xen-blkfront: If no barrier or flush is supported, use invalid operation.
xen-blkback: use kzalloc() in favor of kmalloc()+memset()
xen-blkback: fixed indentation and comments
xen-blkfront: fix a deadlock while handling discard response
xen-blkfront: Handle discard requests.
xen-blkback: Implement discard requests ('feature-discard')
xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
xen/blkback: Add module alias for autoloading
xen/blkback: Don't let in-flight requests defer pending ones.
Konrad Rzeszutek Wilk [Fri, 14 Oct 2011 16:13:05 +0000 (12:13 -0400)]
xen/blkback: Check for proper operation.
The patch titled: "xen/blkback: Fix the inhibition to map pages
when discarding sector ranges." had the right idea except that
it used the wrong comparison operator. It had == instead of !=.
This fixes the bug where all (except discard) operations would
have been ignored.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 10 Oct 2011 04:47:49 +0000 (00:47 -0400)]
xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
The 'operation' parameters are the ones provided to the bio layer while
the req->operation are the ones passed in between the backend and
frontend. We used the wrong 'operation' value to squash the
call to map pages when processing the discard operation resulting
in an hypercall that did nothing. Lets guard against going in the
mapping function by checking for the proper operation type.
CC: Li Dongyang <lidongyang@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 10 Oct 2011 16:33:21 +0000 (12:33 -0400)]
xen/blkback: Report VBD_WSECT (wr_sect) properly.
We did not increment the amount of sectors written to disk
b/c we tested for the == WRITE which is incorrect - as the
operations are more of WRITE_FLUSH, WRITE_ODIRECT. This patch
fixes it by doing a & WRITE check.
CC: stable@kernel.org Reported-by: Andy Burns <xen.lists@burns.me.uk> Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 10 Oct 2011 04:42:22 +0000 (00:42 -0400)]
xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
We emulate the barrier requests by draining the outstanding bio's
and then sending the WRITE_FLUSH command. To drain the I/Os
we use the refcnt that is used during disconnect to wait for all
the I/Os before disconnecting from the frontend. We latch on its
value and if it reaches either the threshold for disconnect or when
there are no more outstanding I/Os, then we have drained all I/Os.
Suggested-by: Christopher Hellwig <hch@infradead.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen-blkfront: If no barrier or flush is supported, use invalid operation.
Guard against issuing BLKIF_OP_WRITE_BARRIER or BLKIF_OP_FLUSH_CACHE
by checking whether we successfully negotiated with the backend.
The negotiation with the backend also sets the q->flush_flags which
fortunately for us is also used when submitting an bio to us. If
we don't support barriers or flushes it would be set to zero so
we should never end up having to deal with REQ_FLUSH | REQ_FUA.
However, other third party implementations of __make_request that
might be stacked on top of us might not be so smart, so lets fix this up.
Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Fri, 16 Sep 2011 07:38:09 +0000 (08:38 +0100)]
xen-blkback: use kzalloc() in favor of kmalloc()+memset()
This fixes the problem of three of those four memset()-s having
improper size arguments passed: Sizeof a pointer-typed expression
returns the size of the pointer, not that of the pointed to data.
It also reverts using kmalloc() instead of kzalloc() for the allocation
of the pending grant handles array, as that array gets fully
initialized in a subsequent loop.
Reported-by: Julia Lawall <julia@diku.dk> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Joe Jin [Mon, 15 Aug 2011 04:57:07 +0000 (12:57 +0800)]
xen-blkback: fixed indentation and comments
This patch fixes belows:
1. Fix code style issue.
2. Fix incorrect functions name in comments.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Li Dongyang [Wed, 14 Sep 2011 06:02:40 +0000 (14:02 +0800)]
xen-blkfront: fix a deadlock while handling discard response
When we get -EOPNOTSUPP response for a discard request, we will clear
the discard flag on the request queue so we won't attempt to send discard
requests to backend again, and this should be protected under rq->queue_lock.
However, when we setup the request queue, we pass blkif_io_lock to
blk_init_queue so rq->queue_lock is blkif_io_lock indeed, and this lock
is already taken when we are in blkif_interrpt, so remove the
spin_lock/spin_unlock when we clear the discard flag or we will end up
with deadlock here
Signed-off-by: Li Dongyang <lidongyang@novell.com>
[v1: Updated description a bit and removed comment from source] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Li Dongyang [Thu, 1 Sep 2011 10:39:09 +0000 (18:39 +0800)]
xen-blkfront: Handle discard requests.
If the backend advertises 'feature-discard', then interrogate
the backend for alignment and granularity. Setup the request
queue with the appropiate values and send the discard operation
as required.
Signed-off-by: Li Dongyang <lidongyang@novell.com>
[v1: Amended commit description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
..aka ATA TRIM/SCSI UNMAP command to be passed through the frontend
and used as appropiately by the backend. We also advertise
certain granulity parameters to the frontend so it can plug them in.
If the backend is a realy device - we just end up using
'blkdev_issue_discard' while for loopback devices - we just punch
a hole in the image file.
Signed-off-by: Li Dongyang <lidongyang@novell.com>
[v1: Fixed up pr_debug and commit description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Li Dongyang [Thu, 1 Sep 2011 10:39:08 +0000 (18:39 +0800)]
xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
Now we use BLKIF_OP_DISCARD and add blkif_request_discard to blkif_request union,
the patch is taken from Owen Smith and Konrad, Thanks
Signed-off-by: Owen Smith <owen.smith@citrix.com> Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Daniel Stodden [Sat, 28 May 2011 20:21:10 +0000 (13:21 -0700)]
xen/blkback: Don't let in-flight requests defer pending ones.
Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
idea. It means that in-flight I/O is essentially blocking continued
batches. This essentially kills throughput on frontends which unplug
(or even just notify) early and rightfully assume addtional requests
will be picked up on time, not synchronously.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
[v1: Rebased and fixed compile problems] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 19 Oct 2011 20:12:04 +0000 (16:12 -0400)]
Merge branch 'stable/e820-3.2.rebased' into uek2-merge
* stable/e820-3.2.rebased:
xen: release all pages within 1-1 p2m mappings
xen: allow extra memory to be in multiple regions
xen: allow balloon driver to use more than one memory region
xen/balloon: simplify test for the end of usable RAM
xen/balloon: account for pages released during memory setup
xen/e820: if there is no dom0_mem=, don't tweak extra_pages.
Revert "xen/e820: if there is no dom0_mem=, don't tweak extra_pages."
xen/e820: if there is no dom0_mem=, don't tweak extra_pages.
xen: use maximum reservation to limit amount of usable RAM
xen: Fix misleading WARN message at xen_release_chunk
xen: Fix printk() format in xen/setup.c
Konrad Rzeszutek Wilk [Wed, 19 Oct 2011 20:05:45 +0000 (16:05 -0400)]
Merge branch 'stable/mmu.fixes.rebased' into uek2-merge
* stable/mmu.fixes.rebased:
xen/gntdev: Fix sleep-inside-spinlock
xen: modify kernel mappings corresponding to granted pages
xen: add an "highmem" parameter to alloc_xenballooned_pages
xen/p2m: Use SetPagePrivate and its friends for M2P overrides.
xen/p2m: Make debug/xen/mmu/p2m visible again.
Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
Konrad Rzeszutek Wilk [Wed, 19 Oct 2011 19:54:47 +0000 (15:54 -0400)]
Merge branch 'stable/drivers-3.2.rebased' into uek2-merge
* stable/drivers-3.2.rebased:
xen: use static initializers in xen-balloon.c
xenbus: don't rely on xen_initial_domain to detect local xenstore
xenbus: Fix loopback event channel assuming domain 0
xen/pv-on-hvm:kexec: Fix implicit declaration of function 'xen_hvm_domain'
xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel
xen/pv-on-hvm kexec: update xs_wire.h:xsd_sockmsg_type from xen-unstable
xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when stale watch events arrive
Konrad Rzeszutek Wilk [Wed, 19 Oct 2011 19:44:10 +0000 (15:44 -0400)]
Merge branch 'stable/pci.fixes-3.2' of git://oss.oracle.com/git/kwilk/xen into uek2-merge
* 'stable/pci.fixes-3.2' of git://oss.oracle.com/git/kwilk/xen:
xen/pci: support multi-segment systems
xen-swiotlb: When doing coherent alloc/dealloc check before swizzling the MFNs.
xen/pci: make bus notifier handler return sane values
xen-swiotlb: fix printk and panic args
xen-swiotlb: Fix wrong panic.
xen-swiotlb: Retry up three times to allocate Xen-SWIOTLB
xen-pcifront: Update warning comment to use 'e820_host' option.
Konrad Rzeszutek Wilk [Wed, 19 Oct 2011 19:43:13 +0000 (15:43 -0400)]
Merge branch 'stable/bug.fixes-3.2.rebased' of git://oss.oracle.com/git/kwilk/xen into uek2-merge
* 'stable/bug.fixes-3.2.rebased' of git://oss.oracle.com/git/kwilk/xen:
xen/irq: If we fail during msi_capability_init return proper error code.
xen: remove XEN_PLATFORM_PCI config option
xen: XEN_PVHVM depends on PCI
xen/p2m/debugfs: Make type_name more obvious.
xen/p2m/debugfs: Fix potential pointer exception.
xen/enlighten: Fix compile warnings and set cx to known value.
xen/xenbus: Remove the unnecessary check.
xen/events: Don't check the info for NULL as it is already done.
xen/pci: Use 'acpi_gsi_to_irq' value unconditionally.
xen/pci: Remove 'xen_allocate_pirq_gsi'.
xen/pci: Retire unnecessary #ifdef CONFIG_ACPI
xen/pci: Move the allocation of IRQs when there are no IOAPIC's to the end
xen/pci: Squash pci_xen_initial_domain and xen_setup_pirqs together.
xen/pci: Use the xen_register_pirq for HVM and initial domain users
xen/pci: In xen_register_pirq bind the GSI to the IRQ after the hypercall.
xen/pci: Provide #ifdef CONFIG_ACPI to easy code squashing.
xen/pci: Update comments and fix empty spaces.
xen/pci: Shuffle code around.
Konrad Rzeszutek Wilk [Wed, 19 Oct 2011 19:42:45 +0000 (15:42 -0400)]
Merge branch 'stable/xen-pciback-0.6.3.bugfixes' of git://oss.oracle.com/git/kwilk/xen into uek2-merge
* 'stable/xen-pciback-0.6.3.bugfixes' of git://oss.oracle.com/git/kwilk/xen:
xen/pciback: Check if the device is found instead of blindly assuming so.
xen/pciback: Do not dereference psdev during printk when it is NULL.
xen/pciback: double lock typo
xen/pciback: use mutex rather than spinlock in vpci backend
xen/pciback: Use mutexes when working with Xenbus state transitions.
xen/pciback: miscellaneous adjustments
xen/pciback: use mutex rather than spinlock in passthrough backend
xen/pciback: use resource_size()
xen/irq: If we fail during msi_capability_init return proper error code.
There are three different modes: PV, HVM, and initial domain 0. In all
the cases we would return -1 for failure instead of a proper error code.
Fix this by propagating the error code from the generic IRQ code.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 19 Oct 2011 17:34:39 +0000 (13:34 -0400)]
xen: remove XEN_PLATFORM_PCI config option
Xen PVHVM needs xen-platform-pci, on the other hand xen-platform-pci is
useless in any other cases.
Therefore remove the XEN_PLATFORM_PCI config option and compile
xen-platform-pci built-in if XEN_PVHVM is selected.
Changes to v1:
- remove xen-platform-pci.o and just use platform-pci.o since it is not
externally visible anymore.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
xen/enlighten: Fix compile warnings and set cx to known value.
We get:
linux/arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’:
linux/arch/x86/xen/enlighten.c:226: warning: ‘cx’ may be used uninitialized in this function
linux/arch/x86/xen/enlighten.c:240: note: ‘cx’ was declared here
and the cx is really not set but passed in the xen_cpuid instruction
which masks the value with returned masked_ecx from cpuid. This
can potentially lead to invalid data being stored in cx.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
.. we check whether 'xdev' is NULL - but there is no need for
it as the 'dev' check is done before. The 'dev' is embedded in
the 'xdev' so having xdev != NULL with dev being being checked
is not going to happen.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/events: Don't check the info for NULL as it is already done.
The list operation checks whether the 'info' structure that is
retrieved from the list is NULL (otherwise it would not been able
to retrieve it). This check is not neccessary.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/pci: Use 'acpi_gsi_to_irq' value unconditionally.
In the past we would only use the function's value if the
returned value was not equal to 'acpi_sci_override_gsi'. Meaning
that the INT_SRV_OVR values for global and source irq were different.
But it is OK to use the function's value even when the global
and source irq are the same.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In the past (2.6.38) the 'xen_allocate_pirq_gsi' would allocate
an entry in a Linux IRQ -> {XEN_IRQ, type, event, ..} array. All
of that has been removed in 2.6.39 and the Xen IRQ subsystem uses
an linked list that is populated when the call to
'xen_allocate_irq_gsi' (universally done from any of the xen_bind_*
calls) is done. The 'xen_allocate_pirq_gsi' is a NOP and there is
no need for it anymore so lets remove it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 15 Jun 2011 18:43:52 +0000 (14:43 -0400)]
xen/pci: Retire unnecessary #ifdef CONFIG_ACPI
As the code paths that are guarded by CONFIG_XEN_DOM0 already depend
on CONFIG_ACPI so the extra #ifdef is not required. The earlier
patch that added them in had done its job.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 6 Jun 2011 18:20:35 +0000 (14:20 -0400)]
xen/pci: Move the allocation of IRQs when there are no IOAPIC's to the end
.. which means we can preset of NR_IRQS_LEGACY interrupts using
the 'acpi_get_override_irq' API before this loop.
This means that we can get the IRQ's polarity (and trigger) from either
the ACPI (or MP); or use the default values. This fixes a bug if we did
not have an IOAPIC we would not been able to preset the IRQ's polarity
if the MP table existed.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/pci: Provide #ifdef CONFIG_ACPI to easy code squashing.
In the past we would guard those code segments to be dependent
on CONFIG_XEN_DOM0 (which depends on CONFIG_ACPI) so this patch is
not stricly necessary. But the next patch will merge common
HVM and initial domain code and we want to make sure the CONFIG_ACPI
dependency is preserved - as HVM code does not depend on CONFIG_XEN_DOM0.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The file is hard to read. Move the code around so that
the contents of it follows a uniform format:
- setup GSIs - PV, HVM, and initial domain case
- then MSI/MSI-x setup - PV, HVM and then initial domain case.
- then MSI/MSI-x teardown - same order.
- lastly, the __init functions in PV, HVM, and initial domain order.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yu Ke [Wed, 24 Mar 2010 18:01:13 +0000 (11:01 -0700)]
xen/acpi: Domain0 acpi parser related platform hypercall
This patches implements the xen_platform_op hypercall, to pass the parsed
ACPI info to hypervisor.
Signed-off-by: Yu Ke <ke.yu@intel.com> Signed-off-by: Tian Kevin <kevin.tian@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Added DEFINE_GUEST.. in appropiate headers]
[v2: Ripped out typedefs] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Wed, 28 Sep 2011 16:46:36 +0000 (17:46 +0100)]
xen: release all pages within 1-1 p2m mappings
In xen_memory_setup() all reserved regions and gaps are set to an
identity (1-1) p2m mapping. If an available page has a PFN within one
of these 1-1 mappings it will become inaccessible (as it MFN is lost)
so release them before setting up the mapping.
This can make an additional 256 MiB or more of RAM available
(depending on the size of the reserved regions in the memory map) if
the initial pages overlap with reserved regions.
The 1:1 p2m mappings are also extended to cover partial pages. This
fixes an issue with (for example) systems with a BIOS that puts the
DMI tables in a reserved region that begins on a non-page boundary.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Thu, 29 Sep 2011 11:26:19 +0000 (12:26 +0100)]
xen: allow extra memory to be in multiple regions
Allow the extra memory (used by the balloon driver) to be in multiple
regions (typically two regions, one for low memory and one for high
memory). This allows the balloon driver to increase the number of
available low pages (if the initial number if pages is small).
As a side effect, the algorithm for building the e820 memory map is
simpler and more obviously correct as the map supplied by the
hypervisor is (almost) used as is (in particular, all reserved regions
and gaps are preserved). Only RAM regions are altered and RAM regions
above max_pfn + extra_pages are marked as unused (the region is split
in two if necessary).
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Wed, 28 Sep 2011 16:46:34 +0000 (17:46 +0100)]
xen: allow balloon driver to use more than one memory region
Allow the xen balloon driver to populate its list of extra pages from
more than one region of memory. This will allow platforms to provide
(for example) a region of low memory and a region of high memory.
The maximum possible number of extra regions is 128 (== E820MAX) which
is quite large so xen_extra_mem is placed in __initdata. This is safe
as both xen_memory_setup() and balloon_init() are in __init.
The balloon regions themselves are not altered (i.e., there is still
only the one region).
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Wed, 28 Sep 2011 16:46:32 +0000 (17:46 +0100)]
xen/balloon: account for pages released during memory setup
In xen_memory_setup() pages that occur in gaps in the memory map are
released back to Xen. This reduces the domain's current page count in
the hypervisor. The Xen balloon driver does not correctly decrease
its initial current_pages count to reflect this. If 'delta' pages are
released and the target is adjusted the resulting reservation is
always 'delta' less than the requested target.
This affects dom0 if the initial allocation of pages overlaps the PCI
memory region but won't affect most domU guests that have been setup
with pseudo-physical memory maps that don't have gaps.
Fix this by accouting for the released pages when starting the balloon
driver.
If the domain's targets are managed by xapi, the domain may eventually
run out of memory and die because xapi currently gets its target
calculations wrong and whenever it is restarted it always reduces the
target by 'delta'.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Tue, 13 Sep 2011 14:17:32 +0000 (10:17 -0400)]
xen/e820: if there is no dom0_mem=, don't tweak extra_pages.
The patch "xen: use maximum reservation to limit amount of usable RAM"
(d312ae878b6aed3912e1acaaf5d0b2a9d08a4f11) breaks machines that
do not use 'dom0_mem=' argument with:
reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
(XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...
The reason being that the last E820 entry is created using the
'extra_pages' (which is based on how many pages have been freed).
The mentioned git commit sets the initial value of 'extra_pages'
using a hypercall which returns the number of pages (if dom0_mem
has been used) or -1 otherwise. If the later we return with
MAX_DOMAIN_PAGES as basis for calculation:
which means we end up with extra_pages = 128GB in PFNs (33554432)
- 8GB in PFNs (2097152, on this specific box, can be larger or smaller),
and then we add that value to the E820 making it:
xen/e820: if there is no dom0_mem=, don't tweak extra_pages.
The patch "xen: use maximum reservation to limit amount of usable RAM"
(d312ae878b6aed3912e1acaaf5d0b2a9d08a4f11) breaks machines that
do not use 'dom0_mem=' argument with:
reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
(XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...
The reason being that the last E820 entry is created using the
'extra_pages' (which is based on how many pages have been freed).
The mentioned git commit sets the initial value of 'extra_pages'
using a hypercall which returns the number of pages (if dom0_mem
has been used) or -1 otherwise. If the later we return with
MAX_DOMAIN_PAGES as basis for calculation:
which means we end up with extra_pages = 128GB in PFNs (33554432)
- 8GB in PFNs (2097152, on this specific box, can be larger or smaller),
and then we add that value to the E820 making it:
David Vrabel [Fri, 19 Aug 2011 14:57:16 +0000 (15:57 +0100)]
xen: use maximum reservation to limit amount of usable RAM
Use the domain's maximum reservation to limit the amount of extra RAM
for the memory balloon. This reduces the size of the pages tables and
the amount of reserved low memory (which defaults to about 1/32 of the
total RAM).
On a system with 8 GiB of RAM with the domain limited to 1 GiB the
kernel reports:
Before:
Memory: 627792k/4472000k available
After:
Memory: 549740k/11132224k available
A increase of about 76 MiB (~1.5% of the unused 7 GiB). The reserved
low memory is also reduced from 253 MiB to 32 MiB. The total
additional usable RAM is 329 MiB.
For dom0, this requires at patch to Xen ('x86: use 'dom0_mem' to limit
the number of pages for dom0') (c/s 23790)
CC: stable@kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Igor Mammedov [Tue, 2 Aug 2011 09:45:25 +0000 (11:45 +0200)]
xen: Fix misleading WARN message at xen_release_chunk
WARN message should not complain
"Failed to release memory %lx-%lx err=%d\n"
^^^^^^^
about range when it fails to release just one page,
instead it should say what pfn is not freed.
In addition line:
printk(KERN_INFO "xen_release_chunk: looking at area pfn %lx-%lx: "
...
printk(KERN_CONT "%lu pages freed\n", len);
will be broken if WARN in between this line is fired. So fix it
by using a single printk for this.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Daniel De Graaf [Tue, 11 Oct 2011 19:16:06 +0000 (15:16 -0400)]
xen/gntdev: Fix sleep-inside-spinlock
BUG: sleeping function called from invalid context at /local/scratch/dariof/linux/kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 3256, name: qemu-dm
1 lock held by qemu-dm/3256:
#0: (&(&priv->lock)->rlock){......}, at: [<ffffffff813223da>] gntdev_ioctl+0x2bd/0x4d5
Pid: 3256, comm: qemu-dm Tainted: G W 3.1.0-rc8+ #5
Call Trace:
[<ffffffff81054594>] __might_sleep+0x131/0x135
[<ffffffff816bd64f>] mutex_lock_nested+0x25/0x45
[<ffffffff8131c7c8>] free_xenballooned_pages+0x20/0xb1
[<ffffffff8132194d>] gntdev_put_map+0xa8/0xdb
[<ffffffff816be546>] ? _raw_spin_lock+0x71/0x7a
[<ffffffff813223da>] ? gntdev_ioctl+0x2bd/0x4d5
[<ffffffff8132243c>] gntdev_ioctl+0x31f/0x4d5
[<ffffffff81007d62>] ? check_events+0x12/0x20
[<ffffffff811433bc>] do_vfs_ioctl+0x488/0x4d7
[<ffffffff81007d4f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<ffffffff8109168b>] ? lock_release+0x21c/0x229
[<ffffffff81135cdd>] ? rcu_read_unlock+0x21/0x32
[<ffffffff81143452>] sys_ioctl+0x47/0x6a
[<ffffffff816bfd82>] system_call_fastpath+0x16/0x1b
gntdev_put_map tries to acquire a mutex when freeing pages back to the
xenballoon pool, so it cannot be called with a spinlock held. In
gntdev_release, the spinlock is not needed as we are freeing the
structure later; in the ioctl, only the list manipulation needs to be
under the lock.
Reported-and-Tested-By: Dario Faggioli <dario.faggioli@citrix.com> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen: modify kernel mappings corresponding to granted pages
If we want to use granted pages for AIO, changing the mappings of a user
vma and the corresponding p2m is not enough, we also need to update the
kernel mappings accordingly.
Currently this is only needed for pages that are created for user usages
through /dev/xen/gntdev. As in, pages that have been in use by the
kernel and use the P2M will not need this special mapping.
However there are no guarantees that in the future the kernel won't
start accessing pages through the 1:1 even for internal usage.
In order to avoid the complexity of dealing with highmem, we allocated
the pages lowmem.
We issue a HYPERVISOR_grant_table_op right away in
m2p_add_override and we remove the mappings using another
HYPERVISOR_grant_table_op in m2p_remove_override.
Considering that m2p_add_override and m2p_remove_override are called
once per page we use multicalls and hypercall batching.
Use the kmap_op pointer directly as argument to do the mapping as it is
guaranteed to be present up until the unmapping is done.
Before issuing any unmapping multicalls, we need to make sure that the
mapping has already being done, because we need the kmap->handle to be
set correctly.
We dropped a lot of the MMU debugfs in favour of using
tracing API - but there is one which just provides
mostly static information that was made invisible by this change.
Bring it back.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 19 Oct 2011 17:39:47 +0000 (13:39 -0400)]
xen/pciback: Do not dereference psdev during printk when it is NULL.
.. instead use BUG_ON() as all the callers of the kill_domain_by_device
check for psdev.
Suggested-by: Jan Beulich <JBeulich@suse.com>
[v1: Rebased on top " xen/pciback: miscellaneous adjustments" causes some conflicts] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/pciback: use mutex rather than spinlock in vpci backend
Similar to the "xen/pciback: use mutex rather than spinlock in passthrough backend"
this patch converts the vpci backend to use a mutex instead of
a spinlock. Note that the code taking the lock won't ever get called
from non-sleepable context
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/pciback: Use mutexes when working with Xenbus state transitions.
The caller that orchestrates the state changes is xenwatch_thread
and it takes a mutex. In our processing of Xenbus states we can take
the luxery of going to sleep on a mutex, so lets do that and
also fix this bug:
BUG: sleeping function called from invalid context at /linux/kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 32, name: xenwatch
2 locks held by xenwatch/32:
#0: (xenwatch_mutex){......}, at: [<ffffffff813856ab>] xenwatch_thread+0x4b/0x180
#1: (&(&pdev->dev_lock)->rlock){......}, at: [<ffffffff8138f05b>] xen_pcibk_disconnect+0x1b/0x80
Pid: 32, comm: xenwatch Not tainted 3.1.0-rc6-00015-g3ce340d #2
Call Trace:
[<ffffffff810892b2>] __might_sleep+0x102/0x130
[<ffffffff8163b90f>] mutex_lock_nested+0x2f/0x50
[<ffffffff81382c1c>] unbind_from_irq+0x2c/0x1b0
[<ffffffff8110da66>] ? free_irq+0x56/0xb0
[<ffffffff81382dbc>] unbind_from_irqhandler+0x1c/0x30
[<ffffffff8138f06b>] xen_pcibk_disconnect+0x2b/0x80
[<ffffffff81390348>] xen_pcibk_frontend_changed+0xe8/0x140
[<ffffffff81387ac2>] xenbus_otherend_changed+0xd2/0x150
[<ffffffff810895c1>] ? get_parent_ip+0x11/0x50
[<ffffffff81387de0>] frontend_changed+0x10/0x20
[<ffffffff81385712>] xenwatch_thread+0xb2/0x180
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Wed, 21 Sep 2011 20:22:11 +0000 (16:22 -0400)]
xen/pciback: miscellaneous adjustments
This is a minor bugfix and a set of small cleanups; as it is not clear
whether this needs splitting into pieces (and if so, at what
granularity), it is a single combined patch.
- add a missing return statement to an error path in
kill_domain_by_device()
- use pci_is_enabled() rather than raw atomic_read()
- remove a bogus attempt to zero-terminate an already zero-terminated
string
- #define DRV_NAME once uniformly in the shared local header
- make DRIVER_ATTR() variables static
- eliminate a pointless use of list_for_each_entry_safe()
- add MODULE_ALIAS()
- a little bit of constification
- adjust a few messages
- remove stray semicolons from inline function definitions
Signed-off-by: Jan Beulich <jbeulich@suse.com>
[v1: Dropped the resource_size fix, altered the description]
[v2: Fixed cleanpatch.pl comments] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Mon, 19 Sep 2011 16:32:15 +0000 (17:32 +0100)]
xen/pciback: use mutex rather than spinlock in passthrough backend
To accommodate the call to the callback function from
__xen_pcibk_publish_pci_roots(), which so far dropped and the re-
acquired the lock without checking that the list didn't actually
change, convert the code to use a mutex instead (observing that the
code taking the lock won't ever get called from non-sleepable
context).
As a result, drop the bogus use of list_for_each_entry_safe() and
remove the inappropriate dropping of the lock.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Thu, 28 Jul 2011 13:23:03 +0000 (15:23 +0200)]
xen: use static initializers in xen-balloon.c
There is no need to use dynamic initializaion, it just confuses the reader.
Switch to static initializers like its used in other files.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
[v2: Rebased on v3.0]
[v3: Rebased on v3.0 without Dan's changes to balloon driver] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Daniel De Graaf [Thu, 13 Oct 2011 20:07:08 +0000 (16:07 -0400)]
xenbus: don't rely on xen_initial_domain to detect local xenstore
The xenstore daemon does not have to run in the xen initial domain;
however, Linux currently uses xen_initial_domain to test if a loopback
event channel should be used instead of the event channel provided in
Xen's start_info structure. Instead, if the event channel passed in the
start_info structure is not valid, assume that this domain will run
xenstored locally and set up the event channel.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The xenbus event channel established in xenbus_init is intended to be a
loopback channel, but the remote domain was hardcoded to 0; this will
cause the channel to be unusable when xenstore is not being run in
domain 0.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Wed, 28 Sep 2011 15:44:54 +0000 (16:44 +0100)]
apic, i386/bigsmp: Fix false warnings regarding logical APIC ID mismatches
These warnings (generally one per CPU) are a result of
initializing x86_cpu_to_logical_apicid while apic_default is
still in use, but the check in setup_local_APIC() being done
when apic_bigsmp was already used as an override in
default_setup_apic_routing():
Overriding APIC driver with bigsmp
Enabling APIC mode: Physflat. Using 5 I/O APICs
------------[ cut here ]------------
WARNING: at .../arch/x86/kernel/apic/apic.c:1239
...
CPU 1 irqstacks, hard=f1c9a000 soft=f1c9c000
Booting Node 0, Processors #1
smpboot cpu 1: start_ip = 9e000
Initializing CPU#1
------------[ cut here ]------------
WARNING: at .../arch/x86/kernel/apic/apic.c:1239
setup_local_APIC+0x137/0x46b() Hardware name: ...
CPU1 logical APIC ID: 2 != 8
...
Fix this (for the time being, i.e. until
x86_32_early_logical_apicid() will get removed again, as Tejun
says ought to be possible) by overriding the previously stored
values at the point where the APIC driver gets overridden.
v2: Move this and the pre-existing override logic into
arch/x86/kernel/apic/bigsmp_32.c.
Dan Magenheimer [Wed, 12 Oct 2011 18:21:55 +0000 (11:21 -0700)]
xen: Fix selfballooning and ensure it doesn't go too far
The balloon driver's "current_pages" is very different from
totalram_pages. Self-ballooning needs to be driven by
the latter. Also, Committed_AS doesn't account for pages
used by the kernel so:
1) Add totalreserve_pages to Committed_AS for the normal target.
2) Enforce a floor for when there are little or no user-space threads
using memory (e.g. single-user mode) to avoid OOMs. The floor
function includes a "min_usable_mb" tuneable in case we discover
later that the floor function is still too aggressive in some
workloads, though likely it will not be needed.
Changes since version 4:
- change floor calculation so that it is not as aggressive; this version
uses a piecewise linear function similar to minimum_target in the 2.6.18
balloon driver, but modified to add to totalreserve_pages instead of
subtract from max_pfn, the 2.6.18 version causes OOMs on recent kernels
because the kernel has bloated over time
- change safety_margin to min_usable_mb and comment on its use
- since committed_as does NOT include kernel space (and other reserved
pages), totalreserve_pages is now added to committed_as. The result is
less aggressive self-ballooning, but theoretically more appropriate.
Changes since version 3:
- missing include causes compile problem when CONFIG_FRONTSWAP is disabled
- add comments after includes
Changes since version 2:
- missing include causes compile problem only on 32-bit
Changes since version 1:
- tuneable safety margin added
[v5: avi.miller@oracle.com: still too aggressive, seeing some OOMs]
[v4: konrad.wilk@oracle.com: fix compile when CONFIG_FRONTSWAP is disabled]
[v3: guru.anbalagane@oracle.com: fix 32-bit compile]
[v2: konrad.wilk@oracle.com: make safety margin tuneable]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>