www.infradead.org Git - users/jedix/linux-maple.git/log

nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:14:10 +0000 (07:44 +0530)]

[SCSI] mpt2sas: Bump driver version to 11.100.00.00

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:13:58 +0000 (07:43 +0530)]

[SCSI] mpt2sas: Rearrange the code so that the completion queues are initialized prior to sending the request to controller firmware

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:13:50 +0000 (07:43 +0530)]

[SCSI] mpt2sas: Do not set sas_device->starget to NULL from the slave_destroy callback when all the LUNS have been deleted

Setting sas_device->starget to NULL from the slave_destroy callback for LUN=1,
even though LUN=0 still exists, results in the entire target getting deleted.
To resolve the issue, the driver should set sas_device->starget to NULL from
slave_destroy only when all the LUNs have been deleted.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
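
A minimal userspace sketch of the rule above, assuming a hypothetical per-target LUN counter (num_luns) and simplified stand-in structs; the driver's real structures, and how it tracks the remaining LUNs, may differ.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's objects. */
struct target_stub {
	int id;
};

struct sas_device_stub {
	struct target_stub *starget; /* cleared only once no LUN remains */
	unsigned int num_luns;       /* hypothetical count of live LUNs */
};

/* slave_configure-style hook: a LUN on this target came up. */
static void lun_configured(struct sas_device_stub *dev, struct target_stub *t)
{
	dev->starget = t;
	dev->num_luns++;
}

/*
 * slave_destroy-style hook for one LUN: clear starget only when the last
 * LUN goes away, so removing LUN 1 while LUN 0 exists keeps the target.
 */
static void lun_destroyed(struct sas_device_stub *dev)
{
	if (dev->num_luns > 0)
		dev->num_luns--;
	if (dev->num_luns == 0)
		dev->starget = NULL;
}
```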

nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:13:37 +0000 (07:43 +0530)]

[SCSI] mpt2sas: MPI next revision header update

1) Added product specific range of ImageType macros for the Extended
   Image Header.

2) Added Flags field and related defines to
   MPI2_TOOLBOX_ISTWI_READ_WRITE_REQUEST to support automatic
   reserve/release and page addressing.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:13:00 +0000 (07:43 +0530)]

[SCSI] mpt2sas: Adding support for customer specific branding

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:12:40 +0000 (07:42 +0530)]

[SCSI] mpt2sas: When IOs are terminated, update the result to DID_SOFT_ERROR to avoid infinite resets

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
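
A small userspace sketch of how the host byte of a SCSI result is encoded. The DID_* values and the 0xff00ffff mask follow the kernel's SCSI headers; complete_terminated_io() is a hypothetical helper that only illustrates the subject line. As far as we understand the SCSI midlayer, a DID_SOFT_ERROR result is retried only while the command's retry budget lasts, rather than escalating to another round of error handling.

```c
#include <stdint.h>

/* Host-byte codes as defined in the kernel's SCSI headers. */
#define DID_OK         0x00
#define DID_SOFT_ERROR 0x0b

/* The host byte occupies bits 16..23 of the SCSI command result. */
static uint32_t set_host_byte_val(uint32_t result, uint8_t host)
{
	return (result & 0xff00ffffu) | ((uint32_t)host << 16);
}

static uint8_t host_byte_val(uint32_t result)
{
	return (uint8_t)((result >> 16) & 0xff);
}

/* Hypothetical completion path for an I/O the firmware reports as
 * terminated: keep the other result bytes, set DID_SOFT_ERROR. */
static uint32_t complete_terminated_io(uint32_t result)
{
	return set_host_byte_val(result, DID_SOFT_ERROR);
}
```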

nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:12:04 +0000 (07:42 +0530)]

[SCSI] mpt2sas: Better handling of DEAD IOC (PCI-E Link down) error condition

Detection of a dead IOC is done in the fault_reset_work thread.

If the IOC doorbell reads 0xFFFFFFFF, the IOC is detected as non-operational
(DEAD). When a DEAD IOC is detected, the driver now removes that IOC and all
its attached devices from the OS. The PCI layer API pci_remove_bus_device()
is called to remove the dead IOC.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
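
A sketch of the watchdog decision, assuming a hypothetical fault_watchdog_step() helper and enum. The all-ones doorbell check is the one described above; the FAULT-state branch is an assumption about what fault_reset_work otherwise does, using the MPI2 IOC state encoding.

```c
#include <stdint.h>

/* IOC state encoding from the MPI2 headers (top nibble of the doorbell). */
#define MPI2_IOC_STATE_MASK  0xF0000000u
#define MPI2_IOC_STATE_FAULT 0x40000000u

enum watchdog_action { WD_NONE, WD_HARD_RESET, WD_REMOVE_IOC };

/*
 * One pass of a fault_reset_work-style watchdog (hypothetical). A doorbell
 * that reads all ones means the IOC no longer answers on the PCIe link,
 * so the IOC and its devices are removed instead of attempting a reset.
 */
static enum watchdog_action fault_watchdog_step(uint32_t doorbell)
{
	if (doorbell == 0xFFFFFFFFu)
		return WD_REMOVE_IOC;   /* dead IOC: remove it from the OS */
	if ((doorbell & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT)
		return WD_HARD_RESET;   /* assumed: firmware fault, try to recover */
	return WD_NONE;
}
```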

Anton Blanchard [Mon, 7 Nov 2011 11:05:21 +0000 (22:05 +1100)]

[SCSI] mpt2sas: _scsih_smart_predicted_fault uses GFP_KERNEL in interrupt context

_scsih_smart_predicted_fault is called in an interrupt and therefore
must allocate memory using GFP_ATOMIC.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Dan Carpenter [Fri, 4 Nov 2011 18:25:01 +0000 (21:25 +0300)]

[SCSI] mpt2sas: add missing allocation.

There was supposed to be a kzalloc() here and the compiler complained
about it.
mpt2sas_scsih.c: In function ‘mpt2sas_scsih_reset_handler’:
mpt2sas_scsih.c:2807:21: warning: ‘fw_event’ may be used uninitialized in this function [-Wuninitialized]

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
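
A userspace sketch of the allocate-before-use fix, with calloc() standing in for kzalloc() and a simplified, hypothetical struct fw_event_work; the real structure and the surrounding reset handler differ.

```c
#include <stdlib.h>

/* Simplified, hypothetical stand-in for the driver's firmware event. */
struct fw_event_work {
	int event;
	void *ioc;
};

/*
 * Writing fw_event's fields without first allocating fw_event is what
 * -Wuninitialized flagged. Allocating (and zeroing) first fixes it;
 * calloc() here plays the role of the kernel's kzalloc().
 */
static struct fw_event_work *alloc_reset_event(void *ioc, int event)
{
	struct fw_event_work *fw_event = calloc(1, sizeof(*fw_event));

	if (!fw_event)
		return NULL;    /* caller must handle allocation failure */
	fw_event->event = event;
	fw_event->ioc = ioc;
	return fw_event;
}
```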

nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:54 +0000 (15:37 +0530)]

[SCSI] mpt2sas: Bump driver version to 10.100.00.00

Bump driver version to 10.100.00.00

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Fri, 21 Oct 2011 04:38:07 +0000 (10:08 +0530)]

[SCSI] mpt2sas: Fix for panic when trying to delete an inactive volume

The driver was setting the action to MPI2_CONFIG_ACTION_PAGE_READ_CURRENT,
which only returns active volumes. To get info on inactive volumes, the
driver needs to change the request to MPI2_RAID_PGAD_FORM_GET_NEXT_CONFIGNUM
and traverse each config until MPI2_IOCSTATUS_CONFIG_INVALID_PAGE is
returned as the iocstatus.
The driver now also removes the sas_device object instance when it returns
"1" from the slave_configure callback.
The code was also fixed to report hot spares to the operating system with
a /dev/sg node assigned.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
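
A sketch of the GET_NEXT traversal, modelling the firmware as an in-memory array. get_next_config(), the cursor, and the two status values are hypothetical stand-ins for the MPI2 config request and its IOCStatus codes.

```c
#include <stddef.h>

/* Stand-ins for MPI2_IOCSTATUS_SUCCESS / MPI2_IOCSTATUS_CONFIG_INVALID_PAGE. */
enum ioc_status { IOC_SUCCESS, IOC_CONFIG_INVALID_PAGE };

struct raid_config {
	int config_num;
	int volume_active;   /* 0 for an inactive volume */
};

/* Hypothetical firmware model: every RAID config, active or not. */
struct fw_model {
	const struct raid_config *cfg;
	size_t n;
};

/* GET_NEXT-style read: return the config after *cursor, or report
 * CONFIG_INVALID_PAGE once the walk has passed the last one. */
static enum ioc_status get_next_config(const struct fw_model *fw,
				       size_t *cursor, struct raid_config *out)
{
	if (*cursor >= fw->n)
		return IOC_CONFIG_INVALID_PAGE;
	*out = fw->cfg[(*cursor)++];
	return IOC_SUCCESS;
}

/* Walk all configs until INVALID_PAGE, counting the inactive volumes
 * that a READ_CURRENT-style request would not have returned. */
static size_t count_inactive_volumes(const struct fw_model *fw)
{
	size_t cursor = 0, inactive = 0;
	struct raid_config c;

	while (get_next_config(fw, &cursor, &c) == IOC_SUCCESS)
		if (!c.volume_active)
			inactive++;
	return inactive;
}
```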

nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:37 +0000 (15:37 +0530)]

[SCSI] mpt2sas: Fix for Port Reset taking a long time (around 5 mins) to complete when issued while creating a volume

This is because the slave_configure routine is called while host reset is
active: config page reads fail, and the driver attempts to add the device
with stale config data.

To fix the issue, error checking was added in slave_configure for failing
configuration page reads; in that case it returns "1" so the device is not
configured. Config page reads fail if a RAID volume is being configured
while a host reset is issued, so the driver was reading stale data and
still attempting to add the volume. Returning an error prevents the volume
from being configured.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:24 +0000 (15:37 +0530)]

[SCSI] mpt2sas: Fix for deadlock between hot plug worker threads and host reset context

This is due to the driver reporting a device as missing to the OS, and the
OS then sending a SYNC_CACHE request to the driver while the IO queues are
locked due to host reset.

To fix the issue, the driver now wakes up the port enable context as soon
as it receives the reply message, instead of waiting on the hot plug worker
threads.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:14 +0000 (15:37 +0530)]

[SCSI] mpt2sas: Fix for deadlock occurring between host_lock and sas_device_lock

Fix for deadlock occurring between host_lock and sas_device_lock.

The deadlock is between two spin locks: shost->host_lock and the driver's
ioc->sas_device_lock.

The fix is to rearrange the code in the FW/Driver device removal
handshake so that ioc->sas_device_lock is not held when
shost->host_lock is taken.

[jejb: zero initialise sas_address to fix spurious compiler warning]
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
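
A userspace sketch of the reordered handshake, with POSIX mutexes standing in for the two spinlocks (an assumption; the driver uses kernel spinlocks). The device list, remove_device(), and the counter guarded by host_lock are hypothetical. The lookup is done under sas_device_lock, that lock is dropped, and only then is host_lock taken, so the two are never nested.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* POSIX mutexes stand in for shost->host_lock and ioc->sas_device_lock. */
static pthread_mutex_t host_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sas_device_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical device list protected by sas_device_lock. */
struct dev_entry {
	uint16_t handle;
	uint64_t sas_address;
};
static const struct dev_entry dev_list[] = {
	{ 0x0009, 0x5000c50012345678ULL },
};

static unsigned int host_state_updates; /* protected by host_lock */

/* Copy what is needed under sas_device_lock, drop it, then take host_lock. */
static int remove_device(uint16_t handle, uint64_t *sas_address_out)
{
	uint64_t sas_address = 0; /* zero-initialised, cf. the jejb note */
	int found = 0;

	pthread_mutex_lock(&sas_device_lock);
	for (size_t i = 0; i < sizeof(dev_list) / sizeof(dev_list[0]); i++) {
		if (dev_list[i].handle == handle) {
			sas_address = dev_list[i].sas_address;
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&sas_device_lock);

	if (!found)
		return -1;

	/* sas_device_lock is no longer held here. */
	pthread_mutex_lock(&host_lock);
	host_state_updates++;
	pthread_mutex_unlock(&host_lock);

	*sas_address_out = sas_address;
	return 0;
}
```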

nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:00 +0000 (15:37 +0530)]

[SCSI] mpt2sas: Fix drives not getting properly deleted if sas cable is removed while host reset is active

The fix is in the driver-firmware handshake device removal code. We
need to read the controller ioc_state to see if the controller is
OPERATIONAL prior to sending target reset and OP_REMOVE. Previously it was
checking the ioc->shost_recovery flag, which is always set when host reset
is active, thus preventing drives from getting properly deleted.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
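
The state check can be sketched as below. The mask and OPERATIONAL values follow the MPI2 IOC state encoding; can_send_remove_handshake() is a hypothetical helper, not the driver's function.

```c
#include <stdbool.h>
#include <stdint.h>

/* IOC state encoding from the MPI2 headers (top nibble of ioc_state). */
#define MPI2_IOC_STATE_MASK        0xF0000000u
#define MPI2_IOC_STATE_OPERATIONAL 0x20000000u

/*
 * Decide whether target reset + OP_REMOVE may be sent. Unlike a test of
 * shost_recovery (always set during host reset), this lets removal go
 * ahead whenever the controller itself reports OPERATIONAL.
 */
static bool can_send_remove_handshake(uint32_t ioc_state)
{
	return (ioc_state & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_OPERATIONAL;
}
```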

nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:06:47 +0000 (15:36 +0530)]

[SCSI] mpt2sas: Fix failure message displayed during diag reset

The fix is to inhibit the warning message in _scsih_get_sas_address
when the MPI2_IOCSTATUS_CONFIG_INVALID_PAGE ioc status is returned.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Fri, 21 Oct 2011 04:36:33 +0000 (10:06 +0530)]

[SCSI] mpt2sas: Fix for system hang when discovery in progress

Fix for issue: while discovery is in progress, hot unplug and hot plug of
an enclosure connected to the controller card causes the system to hang.

If a device is removed while it is still being detected at driver load
time, the device that is no longer present should not be added to the
list. The code in _scsih_probe_sas() is therefore rearranged so that
devices that failed to be detected are not added to the list.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:06:26 +0000 (15:36 +0530)]

[SCSI] mpt2sas: New feature - Fast Load Support

New feature: Fast Load Support.

(1) Asynchronous SCSI scanning: This allows the driver to scan for devices
in parallel while other device drivers are loading, which reduces the time
it takes for the OS to load.

(2) Reporting devices while port enable is active: This feature allows
devices to be reported to the OS immediately while port enable is active.
The previous implementation waited for port enable to complete and then
reported devices. This feature is only enabled on IT firmware
configurations when no boot device is configured in the BIOS Configuration
Utility; otherwise the driver waits until port enable completes before
reporting devices. For IR firmware, this feature is turned off. This
feature addresses large SAS topologies (>100 drives) when the OS boots
from an onboard SATA device, in other words, when the boot device is not
connected to our controller.

(3) Scanning for devices after diagnostic reset completes: A new routine,
_scsih_scan_start, is added. It scans the expander pages, IR pages, and
SAS device pages, then reports new devices to the SCSI mid layer. The
driver apparently does not support adding devices while diagnostic reset
is active, due to the sanity checks on the ioc->shost_recovery flag
throughout the kernel work thread FIFO context and mpt2sas_fw_work.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Manual merge of upstream commit #921cd802.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
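
The enabling rule in item (2) reduces to a small predicate. The struct and function names are hypothetical; how the driver actually learns the firmware type and the BIOS boot device is not shown.

```c
#include <stdbool.h>

/* Hypothetical inputs; the driver derives these from firmware and BIOS data. */
struct fastload_cfg {
	bool ir_firmware;          /* true for IR firmware, false for IT */
	bool bios_boot_device_set; /* boot device configured in the BIOS utility */
};

/* Report devices while port enable is still active only on IT firmware
 * with no BIOS boot device; otherwise wait for port enable to finish. */
static bool report_during_port_enable(const struct fastload_cfg *c)
{
	return !c->ir_firmware && !c->bios_boot_device_set;
}
```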

nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:06:05 +0000 (15:36 +0530)]

[SCSI] mpt2sas: MPI next revision header update

1) Added ProxyVF_ID field to Configuration Request message.
2) Added IO Unit Page 8, IO Unit Page 9, and IO Unit Page 10.
3) Added SASNotifyPrimitiveMasks field to IOC Page 7.
4) Added SAS NOTIFY Primitive event.
5) Added Temperature Threshold Event.
6) Added Host Message Event.
7) Added Send Host Message request and reply.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Julia Lawall [Fri, 16 Sep 2011 06:57:34 +0000 (08:57 +0200)]

[SCSI] mpt2sas: take size of pointed value, not pointer

Sizeof a pointer-typed expression returns the size of the pointer, not that
of the pointed data.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression *e;
type T;
identifier f;
@@

f(...,(T)e,...,
-sizeof(e)
+sizeof(*e)
,...)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
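
A concrete instance of the bug class the semantic patch targets, using a hypothetical struct reply_msg; the mpt2sas call sites themselves are not reproduced here. Recent GCC versions usually warn about the buggy form via -Wsizeof-pointer-memaccess.

```c
#include <string.h>

/* Hypothetical message buffer; any struct larger than a pointer works. */
struct reply_msg {
	unsigned char data[64];
};

/* Buggy: sizeof(reply) is the size of the pointer (typically 4 or 8
 * bytes), so only the first few bytes of the message are cleared. */
static void clear_reply_buggy(struct reply_msg *reply)
{
	memset(reply, 0, sizeof(reply));
}

/* Fixed: sizeof(*reply) is the size of the structure pointed to. */
static void clear_reply_fixed(struct reply_msg *reply)
{
	memset(reply, 0, sizeof(*reply));
}
```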

nagalakshmi.nandigama@lsi.com [Thu, 8 Sep 2011 01:43:35 +0000 (07:13 +0530)]

[SCSI] mpt2sas: Bump driver version 09.100.00.01

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

nagalakshmi.nandigama@lsi.com [Thu, 8 Sep 2011 00:48:50 +0000 (06:18 +0530)]

[SCSI] mpt2sas: Added NUMA IO support in driver which uses multi-reply queue support of the HBA

Support added for controllers capable of multi reply queues.

The following are the modifications to the driver to support NUMA.

(1) Create the new structure adapter_reply_queue to contain the reply queue
    info for every msix vector.  This object will contain a
    reply_post_host_index, reply_post_free for each instance, msix_index, among
    other parameters.  We will track all the reply queues on a linked list
    called ioc->reply_queue_list. Each reply queue is aligned with each IRQ,
    and is passed to the interrupt via the bus_id parameter.

(2) The driver will figure out the msix_vector_count from the PCIe MSIX
    capabilities register instead of the IOC Facts->MaxMSIxVectors. This is
    because the firmware does not fill in this field until the driver has
    already registered MSIX support.

(3) If the ioc_facts reports that the controller is MSIX compatible in the
    capabilities, then the driver will request multiple IRQs.  This count
    is calculated as the minimum of the number of online CPUs and
    ioc->msix_vector_count.  This count is reported to firmware in the
    ioc_init request.

(4) New routines, _base_free_irq and _base_request_irq, were added so that
    registering and freeing msix vectors is done through a simple function
    API.

(5) The new routine _base_assign_reply_queues was added to align the msix
    indexes across cpus. This will initialize the array called
    ioc->cpu_msix_table.  This array is looked up on every MPI request so the
    MSIxIndex is set appropriately.

(6) A new shost sysfs attribute was added to report the reply_queue_count.

(7) The user needs to set the CPU affinity mask so that interrupts occur on
    the same CPU that sent the original request.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
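
A sketch of items (3) and (5): the queue count is the minimum of online CPUs and MSI-X vectors, and each CPU maps to one reply queue through a per-CPU table. The contiguous, near-equal grouping used here is an assumption; the commit does not spell out how _base_assign_reply_queues distributes CPUs, and both function names are hypothetical.

```c
#include <stddef.h>

/* reply queue count = min(online CPUs, MSI-X vectors from PCIe capability) */
static unsigned int reply_queue_count(unsigned int online_cpus,
				      unsigned int msix_vector_count)
{
	return online_cpus < msix_vector_count ? online_cpus : msix_vector_count;
}

/*
 * Fill a cpu -> MSIxIndex table by splitting nr_cpus into nr_queues
 * contiguous groups of near-equal size (assumed distribution). Each
 * entry is < nr_queues because cpu < nr_cpus.
 */
static void assign_reply_queues(unsigned char *cpu_msix_table,
				unsigned int nr_cpus, unsigned int nr_queues)
{
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
		cpu_msix_table[cpu] = (unsigned char)((cpu * nr_queues) / nr_cpus);
}
```

With 8 CPUs and 3 queues, for example, CPUs 0-2 use queue 0, CPUs 3-5 use queue 1, and CPUs 6-7 use queue 2.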

Jesper Juhl [Mon, 1 Aug 2011 21:27:12 +0000 (23:27 +0200)]

Remove unneeded version.h includes from drivers/scsi/

It was pointed out by 'make versioncheck' that some includes of
linux/version.h are not needed in drivers/scsi/.
This patch removes them.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

kashyap.desai@lsi.com [Thu, 4 Aug 2011 11:17:50 +0000 (16:47 +0530)]

[SCSI] mpt2sas: Added missing mpt2sas_base_detach call from scsih_remove context

The mpt2sas_base_detach() call was removed from _scsih_remove() during
some code shuffling, mainly while adding code for scsih_shutdown(). This
patch adds back the mpt2sas_base_detach() call, which will get called
from _scsih_remove().

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Kashyap, Desai [Tue, 5 Jul 2011 07:10:23 +0000 (12:40 +0530)]

[SCSI] mpt2sas: WarpDrive Infinite command retries due to wrong scsi command entry in MPI message

Issue:

This issue is seen on the LSI H/W WarpDrive SSS6200. When a failed direct
I/O is retried as volume I/O, the scmd field in the internal lookup table
gets cleared, and because of that the retried volume I/O is never reported
as completed to the SML.

Result:

I/O timeout, and the error handling thread kicks off.

Fix:

Set the scmd back in the lookup table before retrying the failed
direct I/O.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Kashyap, Desai [Tue, 14 Jun 2011 05:27:51 +0000 (10:57 +0530)]

[SCSI] mpt2sas: Bump version 09.100.00.00

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Kashyap, Desai [Tue, 14 Jun 2011 05:26:43 +0000 (10:56 +0530)]

[SCSI] mpt2sas: fix broadcast AEN and task management issue

Proper handling of target reset in a multi-initiator environment.

Clean up in broadcast change handling:
(1) Need to look at the status of each task management request, and retry
    the TM when there are failures.
(2) Need to quiesce IO so the driver doesn't take on more IO requests while
    it's in the middle of sending a TM request to firmware.
(3) Add support to keep track of how many pending broadcast AEN events
    are received while the broadcast handling is active, then loop back at
    the end of this routine if there were any events received.

Clean up in mpt2sas_scsih_issue_tm routine:
(1) Make sure proper status is returned when host reset fails.
(2) Clean up sanity checks near the end of the routine, ensuring all
    outstanding IOs were completed.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Kashyap, Desai [Tue, 14 Jun 2011 05:26:12 +0000 (10:56 +0530)]

[SCSI] mpt2sas: Set max_sector count from module parameter

This feature overrides the default max_sectors setting at load time,
taking max_sectors as a command line option when loading the driver.
The setting is currently hard-coded in the driver to 8192 sectors
(4MB transfers). If max_sectors is specified at load time, the minimum
accepted setting is 64 and the maximum is 8192. The driver adjusts the
setting to an even boundary. If max_sectors is not specified, the driver
defaults to 8192.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
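
The clamping rules can be sketched as follows. The MAX_SECTORS_UNSET sentinel and the function name are hypothetical, and rounding down (rather than up) to an even value is an assumption, since the text only says "even boundary". 8192 sectors of 512 bytes is 4 MiB, matching the 4MB figure above.

```c
#define MAX_SECTORS_DEFAULT 8192
#define MAX_SECTORS_MIN     64
#define MAX_SECTORS_UNSET   (-1)   /* hypothetical "not given" sentinel */

/* Effective max_sectors for a requested module parameter value. */
static unsigned int effective_max_sectors(int requested)
{
	if (requested == MAX_SECTORS_UNSET)
		return MAX_SECTORS_DEFAULT;       /* not specified: default */
	if (requested < MAX_SECTORS_MIN)
		return MAX_SECTORS_MIN;           /* below minimum: clamp up */
	if (requested > MAX_SECTORS_DEFAULT)
		return MAX_SECTORS_DEFAULT;       /* above maximum: clamp down */
	return (unsigned int)requested & ~1u; /* assumed: round down to even */
}
```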

Kashyap, Desai [Tue, 14 Jun 2011 05:25:45 +0000 (10:55 +0530)]

[SCSI] mpt2sas MPI next revision header update

mpt2sas driver revision q header update:

(1) Modified the descriptions of the LocalAddress bit in the
    Flags field of the MPI SGE Format description and the MPI
    Simple Element.
(2) Modified Data Location Address Space bits in the Flags field
    of the IEEE Chain Element.
(3) Added more detail to the description of the DataLength field
    for the SCSI IO Request and Target Assist Request. Removed
    restriction on using chained SGLs when using multicast or
    bidirectional support.
(4) In Manufacturing Page 7, added ReceptacleID field to
    ConnectorInfo, and reworked how the Pinout field is used.
(5) In IO Unit Page 7, added BoardTemperature and
    BoardTemperatureUnits fields.
(6) In IOC Page 1, changed CoalescingTimeout to units of
    half-microsecond and updated descriptions.
(7) Modified descriptions of SATASlumberTimeout and
    SASSlumberTimeout fields in SAS IO Unit Page 5 to indicate
    the timers start after partial mode is entered.
(8) Added Extended Manufacturing configuration pages.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Guru Anbalagane [Thu, 15 Dec 2011 22:25:25 +0000 (14:25 -0800)]

Merge branch 'directio' of git://ca-git.us.oracle.com/linux-dkleikam-public into uek2-stable

Guru Anbalagane [Thu, 15 Dec 2011 22:12:13 +0000 (14:12 -0800)]

Merge branch 'uek2-merge' of git://oss.oracle.com/git/kwilk/xen into uek2-stable

Konrad Rzeszutek Wilk [Thu, 15 Dec 2011 21:25:32 +0000 (16:25 -0500)]

Merge branches 'stable/pci.fixes-3.2' and 'stable/e820-3.2.rebased' into uek2-merge

* stable/pci.fixes-3.2:
xen/swiotlb: Use page alignment for early buffer allocation.

* stable/e820-3.2.rebased:
xen: only limit memory map to maximum reservation for domain 0.

Conflicts:
arch/x86/xen/setup.c

Konrad Rzeszutek Wilk [Thu, 15 Dec 2011 16:28:46 +0000 (11:28 -0500)]

xen/swiotlb: Use page alignment for early buffer allocation.

This fixes an odd bug found on a Dell PowerEdge 1850/0RC130
(BIOS A05 01/09/2006) where all of the modules doing pci_set_dma_mask
would fail with:

ata_piix 0000:00:1f.1: enabling device (0005 -> 0007)
ata_piix 0000:00:1f.1: can't derive routing for PCI INT A
ata_piix 0000:00:1f.1: BMDMA: failed to set dma mask, falling back to PIO

The issue was that the Xen-SWIOTLB buffer was allocated such that its end
straddled a page (and was also above 4GB). The fix, spotted by Kalev Leonid,
was to piggyback on git commit
e79f86b2ef9c0a8c47225217c1018b7d3d90101c "swiotlb: Use page alignment
for early buffer allocation", which says:

We could call free_bootmem_late() if swiotlb is not used, and
it will shrink to page alignment.

So alloc them with page alignment at first, to avoid lose two pages

And doing that fixes the outstanding issue.

CC: stable@kernel.org
Suggested-by: "Kalev, Leonid" <Leonid.Kalev@ca.com>
Reported-and-Tested-by: "Taylor, Neal E" <Neal.Taylor@ca.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
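
A small sketch of why page alignment helps: a buffer whose size is a multiple of the page size starts and ends on page boundaries when its start is page-aligned, whereas with a smaller alignment its tail can fall part-way into a page. The 4 KiB page size, the helper names, and the example addresses are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ 4096ull   /* assumed 4 KiB pages */

/* Round x up to a multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* True if the buffer [start, start + size) ends part-way into a page. */
static bool ends_mid_page(uint64_t start, uint64_t size)
{
	return (start + size) % PAGE_SZ != 0;
}
```

For example, a 64 MiB buffer starting at 0x1000040 ends mid-page, while the same buffer starting at align_up(0x1000040, PAGE_SZ), which is 0x1001000, does not.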

Ian Campbell [Wed, 14 Dec 2011 12:16:08 +0000 (12:16 +0000)]

xen: only limit memory map to maximum reservation for domain 0.

d312ae878b6a "xen: use maximum reservation to limit amount of usable RAM"
clamped the total amount of RAM to the current maximum reservation. This is
correct for dom0 but is not correct for guest domains. To boot a guest
"pre-ballooned" (e.g. with memory=1G but maxmem=2G) so as to allow for
future memory expansion, the guest must derive max_pfn from the e820 provided
by the toolstack and not from the current maximum reservation (which can
reflect only the current maximum, not the guest's lifetime max). The existing
algorithm already behaves correctly if we do not artificially limit the
maximum number of pages for the guest case.

For a guest booted with maxmem=512, memory=128 this results in:
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
-[    0.000000]  Xen: 0000000000100000 - 0000000008100000 (usable)
-[    0.000000]  Xen: 0000000008100000 - 0000000020800000 (unusable)
+[    0.000000]  Xen: 0000000000100000 - 0000000020800000 (usable)
...
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI not present or invalid.
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
-[    0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
+[    0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
[    0.000000] initial memory mapped : 0 - 027ff000
[    0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
-[    0.000000] init_memory_mapping: 0000000000000000-0000000008100000
-[    0.000000]  0000000000 - 0008100000 page 4k
-[    0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
+[    0.000000] init_memory_mapping: 0000000000000000-0000000020800000
+[    0.000000]  0000000000 - 0020800000 page 4k
+[    0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
[    0.000000] xen: setting RW the range 27e8000 - 27ff000
[    0.000000] 0MB HIGHMEM available.
-[    0.000000] 129MB LOWMEM available.
-[    0.000000]   mapped low ram: 0 - 08100000
-[    0.000000]   low ram: 0 - 08100000
+[    0.000000] 520MB LOWMEM available.
+[    0.000000]   mapped low ram: 0 - 20800000
+[    0.000000]   low ram: 0 - 20800000

With this change "xl mem-set <domain> 512M" will successfully increase the
guest RAM (by reducing the balloon).

There is no change for dom0.

Reported-and-Tested-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: stable@kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
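
The boot-log numbers can be checked with a little arithmetic (4 KiB pages, so 256 page frames per MiB), and the policy reduces to a conditional clamp. choose_max_pfn() is a hypothetical helper, not the actual arch/x86/xen/setup.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* With 4 KiB pages there are 256 page frames per MiB. */
static uint64_t pfns_to_mib(uint64_t pfns)
{
	return pfns / 256;
}

/*
 * Hypothetical helper: only dom0 clamps max_pfn to the current maximum
 * reservation; a guest keeps the e820-derived value so it can balloon up.
 */
static uint64_t choose_max_pfn(uint64_t e820_max_pfn,
			       uint64_t max_reservation, bool is_dom0)
{
	if (is_dom0 && max_reservation < e820_max_pfn)
		return max_reservation;
	return e820_max_pfn;
}
```

pfns_to_mib(0x8100) is 129 and pfns_to_mib(0x20800) is 520, matching the LOWMEM lines before and after the change.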

Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 21:08:52 +0000 (16:08 -0500)]

xen: Enable CONFIG_XEN_WDT so that we can reboot the box in case dom0 hangs.

It does require the generic watchdog RPM to be installed and used.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Guru Anbalagane [Thu, 15 Dec 2011 06:11:42 +0000 (22:11 -0800)]

Merge branch 'uek2-merge' of git://oss.oracle.com/git/kwilk/xen into uek2-stable

Dave Kleikamp [Tue, 13 Dec 2011 19:49:16 +0000 (13:49 -0600)]

AIO: Don't plug the I/O queue in do_io_submit()

Asynchronous I/O latency to a solid-state disk greatly increased
between the 2.6.32 and 3.0 kernels. By removing the plug from
do_io_submit(), we observed a 34% improvement in the I/O latency.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 17:34:43 +0000 (12:34 -0500)]

Merge branch 'stable/bug.fixes-3.3.rebased' into uek2-merge

* stable/bug.fixes-3.3.rebased:
Revert "xen/pm_idle: Make pm_idle be default_idle under Xen."

Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 17:34:15 +0000 (12:34 -0500)]

Revert "xen/pm_idle: Make pm_idle be default_idle under Xen."

as it is already such in kernels that are 3.0 or earlier.
This reverts commit 9964aedb7350736b1f7a799d57ee92bbf4b99ea6.

Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 17:09:34 +0000 (12:09 -0500)]

Merge branch 'stable/acpi-cpufreq.v3.rebased' into uek2-merge

.. which is not yet upstream, although it has been posted:
https://lkml.org/lkml/2011/11/30/245

It still needs guidance from the ACPI maintainers, who are currently
busy with ACPI v5.0, so for the time being we are carrying this patch
out of the tree.

In the future we will have to revert this and insert the one that is in
the upstream kernel.

* stable/acpi-cpufreq.v3.rebased:
  ACPI: xen processor: set ignore_ppc to handle PPC event for Xen vcpu.
  ACPI: xen processor: add PM notification interfaces.
  ACPI: processor: override the interface of register acpi processor handler for Xen vcpu
  ACPI: add processor driver for Xen virtual CPUs.
  ACPI: processor: add __acpi_processor_[un]register_driver helpers.
  ACPI: processor: cache acpi_power_register in cx structure
  ACPI: processor: Don't setup cpu idle handler when we do not want them.
  ACPI: processor: export necessary interfaces
  xen/acpi: Domain0 acpi parser related platform hypercall

Conflicts:
drivers/xen/Makefile

Kevin Tian [Wed, 19 Oct 2011 10:37:18 +0000 (18:37 +0800)]

ACPI: xen processor: set ignore_ppc to handle PPC event for Xen vcpu.

The Xen ACPI processor driver does not get a CPUFREQ_START notification,
hence we need to set ignore_ppc to handle PPC events.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Kevin Tian [Wed, 19 Oct 2011 10:36:39 +0000 (18:36 +0800)]

ACPI: xen processor: add PM notification interfaces.

Since CPU power is controlled by the VMM in Xen, we have to use a hypercall
to exchange power management state between the domain and the hypervisor,
so that the VMM has that information.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Tang Liang [Wed, 19 Oct 2011 10:33:46 +0000 (18:33 +0800)]

ACPI: processor: override the interface of register acpi processor handler for Xen vcpu

This patch adds a call to the check that detects whether to override
the interface used to register the ACPI processor driver.

Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Kevin Tian [Wed, 19 Oct 2011 10:16:51 +0000 (18:16 +0800)]

ACPI: add processor driver for Xen virtual CPUs.

Because the processor is controlled by the VMM in Xen,
we need a new ACPI processor driver for Xen virtual CPUs.

Specifically we need to be able to pass the CXX/PXX states
to the hypervisor, and also deal with the peculiarity
that the number of CPUs that Linux parses from ACPI
is different from the number visible to the Linux kernel.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Conflicts:
drivers/xen/Makefile
include/xen/acpi.h

Tang Liang [Wed, 19 Oct 2011 09:01:20 +0000 (17:01 +0800)]

ACPI: processor: add __acpi_processor_[un]register_driver helpers.

This patch implements the __acpi_processor_[un]register_driver helpers,
so that an overriding processor driver function can be registered,
specifically for the Xen processor driver.

By default the helpers are set to the native implementations.

Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
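
The override pattern amounts to a function pointer that defaults to the native implementation. All names here are hypothetical stand-ins for the __acpi_processor_[un]register_driver helpers, whose real arguments are omitted.

```c
/* Hypothetical signature; the real helpers take ACPI driver arguments. */
typedef int (*processor_register_fn)(void);

/* Return values only tag which implementation ran. */
static int native_processor_register(void) { return 0; }
static int xen_processor_register(void)    { return 1; }

/* Defaults to the native implementation. */
static processor_register_fn processor_register_hook = native_processor_register;

/* A platform such as Xen dom0 installs its own implementation. */
static void install_xen_processor_driver(void)
{
	processor_register_hook = xen_processor_register;
}
```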

Kevin Tian [Wed, 19 Oct 2011 08:51:51 +0000 (16:51 +0800)]

ACPI: processor: cache acpi_power_register in cx structure

This patch saves acpi_power_register in the cx structure because we need
to pass it to the Xen ACPI processor driver.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Kevin Tian [Wed, 19 Oct 2011 08:47:51 +0000 (16:47 +0800)]

ACPI: processor: Don't setup cpu idle handler when we do not want them.

This patch inhibits processing of the CPU idle handler if it is not
set to the appropriate one. This is needed by the Xen processor driver
which, while still needing processor details, wants to use the default_idle
call (which makes a yield hypercall).

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Kevin Tian [Wed, 19 Oct 2011 08:39:37 +0000 (16:39 +0800)]

ACPI: processor: export necessary interfaces

This patch exports some necessary functions which parse processor
power management information. The Xen ACPI processor driver uses them.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Yu Ke [Wed, 24 Mar 2010 18:01:13 +0000 (11:01 -0700)]

xen/acpi: Domain0 acpi parser related platform hypercall

This patch implements the xen_platform_op hypercall to pass the parsed
ACPI info to the hypervisor.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Added DEFINE_GUEST.. in appropriate headers]
[v2: Ripped out typedefs]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 16:27:08 +0000 (11:27 -0500)]

Merge branch 'stable/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into uek2-merge

This adds the microcode support. It is not upstream
and probably won't be, as the x86 maintainers want to
load the microcode blob (in a new format) as part of the GRUB loader:
[http://lists.xen.org/archives/html/xen-devel/2011-12/msg00250.html]

Jan Beulich implemented a patchset for the Xen hypervisor which does this
as part of the mboot loader, selecting the payload with 'ucode=<number>'.
[http://lists.xen.org/archives/html/xen-devel/2011-12/msg00007.html]
but that is not what the x86 maintainers want (he did not define
a new format and just ingested the raw binary blob). There is also
a follow-up patch, "[PATCH] x86/microcode: Allow "ucode=" argument to be negative",
which picks the microcode as the last payload.

For the time being, let's use this old driver, which loads the microcode
in dom0 and pushes it up to the hypervisor, and let the x86 and Xen
folks sort this out.

* 'stable/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  x86/microcode: check proper return code.
  xen/v86d: Fix /dev/mem to access memory below 1MB
  xen: add CPU microcode update driver
  xen: add dom0_op hypercall
  xen/acpi: Domain0 acpi parser related platform hypercall

Conflicts:
arch/x86/xen/Kconfig

Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 16:15:33 +0000 (11:15 -0500)]

Merge branch 'stable/bug.fixes-3.3.rebased' into uek2-merge

* stable/bug.fixes-3.3.rebased:
  x86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test
  x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.
  xen/pm_idle: Make pm_idle be default_idle under Xen.

Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 16:15:27 +0000 (11:15 -0500)]

Merge branches 'stable/xen-block.rebase' and 'stable/vmalloc-3.2.rebased' into uek2-merge

* stable/xen-block.rebase:
  xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
  block: xen-blkback: use API provided by xenbus module to map rings
  xen-blkback: convert hole punching to discard request on loop devices
  xen/blkback: Move processing of BLKIF_OP_DISCARD from dispatch_rw_block_io
  xen/blk[front|back]: Enhance discard support with secure erasing support.
  xen/blk[front|back]: Squash blkif_request_rw and blkif_request_discard together

* stable/vmalloc-3.2.rebased:
  xen: map foreign pages for shared rings by updating the PTEs directly
  net: xen-netback: use API provided by xenbus module to map rings
  block: xen-blkback: use API provided by xenbus module to map rings
  xen: use generic functions instead of xen_{alloc, free}_vm_area()

Joe Jin [Mon, 15 Aug 2011 04:51:31 +0000 (12:51 +0800)]

xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.

When running a block-attach/block-detach test with the steps below, umount hangs
in the guest. Furthermore, shutdown ends up stuck when unmounting file systems.

1. Start the guest.
2. Attach a new block device with xm block-attach in Dom0.
3. Mount the new disk in the guest.
4. Run xm block-detach in Dom0 to detach the block device; it blocks until it times out.
5. Any request to the disk will hang.

Root cause:
This issue is caused by setting the backend device's state to
'XenbusStateClosing', which sends the XenbusStateClosing notification to the
frontend. When the frontend receives the notification it tries to release
the disk in blkfront_closing(), but at that moment the disk is still in use
by the guest, so the frontend refuses to close. Specifically, it sets the disk state to
XenbusStateClosing and sends the notification to the backend. When the backend receives the
event, it disconnects the vbd from the real device and sets the vbd device state to
XenbusStateClosing. Once the backend has disconnected the real device/file, any IO
requests to the disk in the guest vanish into the ether, leaving the disk DEAD and set to
XenbusStateClosing. When the guest later wants to disconnect the disk, umount
hangs in blkif_release()->xlvbd_release_gendisk() because it is unable to send any IO
to the disk, which prevents a clean system shutdown.

Solution:
Don't disconnect backend until frontend state switched to XenbusStateClosed.
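The rule in the solution can be modelled as a tiny state-machine step. This is a hypothetical userspace sketch, not blkback's actual frontend_changed(): only the XenbusState numbering follows Xen's xenbus interface header, while backend_action and frontend_changed_sketch() are made up for illustration.

```c
#include <stdbool.h>

/* XenbusState values as numbered in Xen's xenbus interface header. */
enum xenbus_state {
	XenbusStateUnknown      = 0,
	XenbusStateInitialising = 1,
	XenbusStateInitWait     = 2,
	XenbusStateInitialised  = 3,
	XenbusStateConnected    = 4,
	XenbusStateClosing      = 5,
	XenbusStateClosed       = 6,
};

/* Hypothetical description of what the backend does in response. */
struct backend_action {
	enum xenbus_state new_state; /* XenbusStateUnknown = no change (sketch convention) */
	bool disconnect;             /* tear down the vbd / real device? */
};

static struct backend_action frontend_changed_sketch(enum xenbus_state frontend)
{
	struct backend_action a = { XenbusStateUnknown, false };

	switch (frontend) {
	case XenbusStateClosing:
		/* The guest may still have the disk mounted: acknowledge only. */
		a.new_state = XenbusStateClosing;
		break;
	case XenbusStateClosed:
		/* The frontend has released the disk: now it is safe to disconnect. */
		a.new_state = XenbusStateClosed;
		a.disconnect = true;
		break;
	default:
		break;
	}
	return a;
}
```

The key design point is that Closing no longer implies a disconnect, so IO issued by the guest while it finishes umount still reaches the real device.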

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Daniel Stodden <daniel.stodden@citrix.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
[v1: Modified description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Konrad Rzeszutek Wilk [Fri, 4 Nov 2011 17:18:15 +0000 (13:18 -0400)]

x86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test

For details refer to patch "x86/paravirt: Use pte_attrs instead of
pte_flags on CPA/set_p.._wb/wc operations." which explains that
some pages have the _PAGE_PWT bit set in the _PAGE_PSE field
when running under Xen.

When pageattr_test runs, it uses pte_flags to check whether
it succeeded in setting the _PAGE_UNUSED1 bit, and also whether the
page had _PAGE_PSE set. The latter check misfires when one of the randomly
selected pages has been set to _PAGE_WC, because under Xen that
bit lives in the _PAGE_PSE position. Since the 'pte_huge'
call uses the pte_flags(x) macro, which extracts the "raw" contents
of the PTE, the _PAGE_PSE -> _PAGE_PWT translation does not happen
and we incorrectly identify the PTE as bad.

Using the 'pte_val' instead of 'pte_flags' fixes the problem and
this patch does that.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@kernel.org

Konrad Rzeszutek Wilk [Fri, 4 Nov 2011 15:59:34 +0000 (11:59 -0400)]

x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.

When using the paravirt interface, most of the page operations are wrapped
in the pvops interface. The one that is not is pte_flags. The reason is
that in most cases the "raw" PTE flag values for bare metal and for whatever
pvops platform is running (in this case, Xen) share the same bit meanings.

Except for PAT. Under Linux, the PAT MSR is written to be:

          PAT4                 PAT0
+---+----+----+----+-----+----+----+
WC | WC | WB | UC | UC- | WC | WB |  <= Linux
+---+----+----+----+-----+----+----+
WC | WT | WB | UC | UC- | WT | WB |  <= BIOS
+---+----+----+----+-----+----+----+
WC | WP | WC | UC | UC- | WT | WB |  <= Xen
+---+----+----+----+-----+----+----+

The lookup of this index table translates to looking up
Bit 7, Bit 4, and Bit 3 of PTE:

PAT/PSE (bit 7) ... PCD (bit 4) .. PWT (bit 3).

If all bits are off, we are using PAT0. If only bit 3 is on, we are
using PAT1; if only bit 4 is on, PAT2; if bits 3 and 4 are both on, PAT3;
and with bit 7 set as well, PAT4 through PAT7.

Back to the PAT MSR table:

As you can see, the WC entry that Linux has at PAT1 lives at PAT4 under Xen.
Under Linux we only use PAT0, PAT1, and PAT3 for caching:

WB = none (so PAT0)
WC = PWT (bit 3 on, so PAT1)
UC = PWT | PCD (bits 3 and 4 on, so PAT3).
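The index lookup can be written as a small helper: PAT, PCD and PWT form a 3-bit number that selects one of the eight PAT MSR entries. This sketch assumes 4 KiB PTEs, where the PAT bit is bit 7 (in large-page entries bit 7 is PSE instead, which is why the two names collide); the macro names are illustrative, not the kernel's.

```c
#include <stdint.h>

/* Illustrative names for the 4 KiB PTE cache-control bits. */
#define PTE_PWT (1u << 3)
#define PTE_PCD (1u << 4)
#define PTE_PAT (1u << 7)

/* Return the PAT MSR entry (0..7) selected by a PTE: PAT:PCD:PWT as binary. */
static unsigned int pat_index(uint64_t pte)
{
	return ((pte & PTE_PAT) ? 4u : 0u) |
	       ((pte & PTE_PCD) ? 2u : 0u) |
	       ((pte & PTE_PWT) ? 1u : 0u);
}
```

For example, a PTE value of 0x6f (bit 3 set) selects PAT1, the Linux WC slot, while 0xe7 (bit 7 set, bit 3 clear) selects PAT4, the Xen WC slot.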

But to make this work with Xen, we translate WC as follows:

PWT (so bit 3 on) --> PAT (so bit 7 is on) and clear bit 3

And to translate back (when the paravirt pte_val is used) we would:

PAT (bit 7 on) --> PWT (bit 3 on) and clear bit 7.

This works quite well, except when code uses pte_flags, which reads the
raw value and does not go through paravirt. This means the following
can happen when running under Xen:

1) we allocate some pages.
2) call set_pages_array_wc, which ends up calling:
     __page_change_att_set_clr(.., __pgprot(__PAGE_WC),  /* set */
                                 , __pgprot(__PAGE_MASK), /* clear */
    which ends up reading the _raw_ PTE flags and _only_ looks at the
    _PTE_FLAG_MASK contents with __PAGE_MASK cleared (0x18) and
    __PAGE_WC (0x8) set:

     read raw *pte -> 0x67
     *pte = 0x67 & ~0x18 | 0x8
     *pte = 0x67 & 0xffffffe7 | 0x8
     *pte = 0x6f

   [now set_pte_atomic is called, and 0x6f is written in, but under
    xen_make_pte, bit 3 is translated to bit 7, so it ends up
    writing 0xa7, which is correct]

3) do something to them.
4) call set_pages_array_wb
     __page_change_att_set_clr(.., __pgprot(__PAGE_WB),  /* set */
                                 , __pgprot(__PAGE_MASK), /* clear */
    which ends up reading the _raw_ PTE and _only_ looks at the
    _PTE_FLAG_MASK contents with __PAGE_MASK cleared (0x18) and
    __PAGE_WB (0x0) set:

     read raw *pte -> 0xa7
     *pte = 0xa7 & ~0x18 | 0
     *pte = 0xa7 & 0xffffffe7 | 0
     *pte = 0xa7

   [we check whether the old PTE is different from the new one

    if (pte_val(old_pte) != pte_val(new_pte)) {
        set_pte_atomic(kpte, new_pte);
        ...

   and find out that 0xA7 == 0xA7 so we do not write the new PTE value in]

   End result is that we failed at removing the WC caching bit!

5) free them.
   [and have pages with PAT4 (bit 7) set, so other subsystems end up using
    the pages that have the write combined bit set resulting in crashes. Yikes!].

The fix, which this patch proposes, is to wrap the pte_pgprot in the CPA
code with the newly introduced pte_attrs, which goes through the pvops interface
to get the "emulated" value instead of the raw one. Naturally, if CONFIG_PARAVIRT
is not set, it ends up calling native_pte_val.
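The failure and the fix can be reproduced with a userspace model of one CPA set/clear pass. This is a simplified sketch, not the kernel or Xen code: to_xen() and from_xen() are hypothetical stand-ins for the xen_make_pte()/pte_val() translation described above, and change_attr() stands in for the CPA path, reading the old value either raw (like pte_flags) or translated (like pte_attrs). Because the model touches only bits 3, 4 and 7, it turns the starting value 0x67 into 0xe7 for WC rather than the 0xa7 quoted in the walkthrough; the outcome (bit 7 survives a raw switch back to WB) is the same.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PWT    0x08u                 /* bit 3 */
#define PTE_PCD    0x10u                 /* bit 4 */
#define PTE_PAT    0x80u                 /* bit 7 */
#define CACHE_MASK (PTE_PWT | PTE_PCD)   /* 0x18 */
#define CACHE_WC   PTE_PWT
#define CACHE_WB   0x00u

/* Linux encoding -> value stored in the PTE under Xen: WC moves PWT to PAT. */
static uint32_t to_xen(uint32_t v)
{
	if ((v & CACHE_MASK) == PTE_PWT)
		v = (v & ~PTE_PWT) | PTE_PAT;
	return v;
}

/* Stored PTE -> Linux encoding (what a translating read would report). */
static uint32_t from_xen(uint32_t v)
{
	if ((v & (PTE_PAT | PTE_PWT)) == PTE_PAT)
		v = (v & ~PTE_PAT) | PTE_PWT;
	return v;
}

/*
 * One set/clear pass over a single stored PTE. With translate == false the
 * old value is read raw (pte_flags-like); with translate == true it goes
 * through from_xen() (pte_attrs-like). The PTE is only rewritten when the
 * new value differs from the value that was read.
 */
static uint32_t change_attr(uint32_t stored, uint32_t set, uint32_t clear,
			    bool translate)
{
	uint32_t old = translate ? from_xen(stored) : stored;
	uint32_t new_val = (old & ~clear) | set;

	if (new_val != old)
		stored = to_xen(new_val);   /* the set_pte_atomic() path */
	return stored;
}
```

With a raw read, `change_attr(0xe7, CACHE_WB, CACHE_MASK, false)` sees no difference and leaves bit 7 set; the translated read returns 0x67.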

The other way to fix this is by wrapping pte_flags so that it goes through the
pvops interface, and that really is the Right Thing to do. The problem is that
past experience with mprotect shows it can be really expensive in inner
loops, and pte_flags() is used in some very perf-critical areas.

Example module that exercises this path; load it and watch various
subsystems and applications crash in mysterious ways:

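#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/string.h>
#include <asm/cacheflush.h>	/* set_pages_array_wc/wb (asm/set_memory.h on newer kernels) */

#define WB_TO_WC	"0.1"	/* module version string (assumed value) */
#define MAX_PAGES	64	/* pages per round (assumed); the array lives on the kthread stack */

MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>");
MODULE_DESCRIPTION("wb_to_wc_and_back");
MODULE_LICENSE("GPL");
MODULE_VERSION(WB_TO_WC);

static int thread(void *arg)
{
	struct page *a[MAX_PAGES];
	unsigned int i, j;

	do {
		/* Allocate up to MAX_PAGES pages; j counts the successful ones. */
		for (j = 0; j < MAX_PAGES; j++) {
			a[j] = alloc_page(GFP_KERNEL);
			if (!a[j])
				break;
		}
		/* Switch them to write-combining (PAT bit 7 under Xen). */
		set_pages_array_wc(a, j);
		schedule_timeout_interruptible(HZ);
		for (i = 0; i < j; i++) {
			void *addr = page_address(a[i]);

			if (addr)
				memset(addr, 0xc2, PAGE_SIZE);
		}
		/* Back to write-back; without the fix the WC bit survives here. */
		set_pages_array_wb(a, j);
		/* Free only the pages that were actually allocated. */
		for (i = 0; i < j; i++)
			__free_page(a[i]);
	} while (!kthread_should_stop());
	return 0;
}

static struct task_struct *t;

static int __init wb_to_wc_init(void)
{
	t = kthread_run(thread, NULL, "wb_to_wc_and_back");
	if (IS_ERR(t)) {
		int err = PTR_ERR(t);

		t = NULL;
		return err;
	}
	return 0;
}

static void __exit wb_to_wc_exit(void)
{
	if (t)
		kthread_stop(t);
}
module_init(wb_to_wc_init);
module_exit(wb_to_wc_exit);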

This fixes RH BZ #742032
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Tom Goetz <tom.goetz@virtualcomputer.com>
CC: stable@kernel.org

Konrad Rzeszutek Wilk [Mon, 21 Nov 2011 23:02:02 +0000 (18:02 -0500)]

xen/pm_idle: Make pm_idle be default_idle under Xen.

The idea behind commit d91ee5863b71 ("cpuidle: replace xen access to x86
pm_idle and default_idle") was to have one call - disable_cpuidle()
which would make pm_idle not be molested by other code.  It disallows
cpuidle_idle_call to be set to pm_idle (which is excellent).

But in the select_idle_routine() and idle_setup(), the pm_idle can still
be set to either: amd_e400_idle, mwait_idle or default_idle.  This
depends on some CPU flags (MWAIT) and in AMD case on the type of CPU.

In case of mwait_idle we can hit some instances where the hypervisor
(Amazon EC2 specifically) sets the MWAIT and we get:

  Brought up 2 CPUs
  invalid opcode: 0000 [#1] SMP

  Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
  RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
  ...
  Call Trace:
   [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
   [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
  RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
   RSP <ffff8801d28ddf10>

In the case of amd_e400_idle we don't get such spectacular crashes, but we
do end up making an MSR access which is trapped in the hypervisor, and then
follow it up with a yield hypercall. Meaning we end up going to the
hypervisor twice instead of just once.

The previous behavior before v3.0 was that pm_idle was set to
default_idle regardless of select_idle_routine/idle_setup.

We want to do that, but only for one specific case: Xen.  This patch
does that.
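The selection logic described above can be sketched as a pure function. This is a simplified, hypothetical model of how select_idle_routine() picks a routine from CPU features, with the Xen override this patch adds checked first; the enum and pick_idle_sketch() are illustrative names, not kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the idle routines named in the text. */
enum idle_routine { DEFAULT_IDLE, MWAIT_IDLE, AMD_E400_IDLE };

/*
 * Simplified model: a Xen guest always gets default_idle (one hypercall),
 * regardless of what the CPUID flags advertise; otherwise the choice
 * depends on MWAIT support and, for AMD, the CPU type.
 */
static enum idle_routine pick_idle_sketch(bool xen_guest, bool cpu_has_mwait,
					  bool amd_e400_cpu)
{
	if (xen_guest)
		return DEFAULT_IDLE;
	if (cpu_has_mwait)
		return MWAIT_IDLE;
	if (amd_e400_cpu)
		return AMD_E400_IDLE;
	return DEFAULT_IDLE;
}
```

Checking for Xen first is what makes an advertised but unusable MWAIT (as on EC2) harmless.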

Fixes RH BZ #739499 and Ubuntu #881076
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

David Vrabel [Thu, 29 Sep 2011 15:53:32 +0000 (16:53 +0100)]

xen: map foreign pages for shared rings by updating the PTEs directly

When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs. This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
[v1: Squashed fix by Michal for no-mmu case]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>

David Vrabel [Thu, 29 Sep 2011 15:53:31 +0000 (16:53 +0100)]

net: xen-netback: use API provided by xenbus module to map rings

The xenbus module provides xenbus_map_ring_valloc() and
xenbus_map_ring_vfree(). Use these to map the Tx and Rx ring pages
granted by the frontend.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

David Vrabel [Thu, 29 Sep 2011 15:53:30 +0000 (16:53 +0100)]

block: xen-blkback: use API provided by xenbus module to map rings

The xenbus module provides xenbus_map_ring_valloc() and
xenbus_map_ring_vfree(). Use these to map the ring pages granted by
the frontend.

Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

David Vrabel [Thu, 29 Sep 2011 15:53:29 +0000 (16:53 +0100)]

xen: use generic functions instead of xen_{alloc, free}_vm_area()

Replace calls to the Xen-specific xen_alloc_vm_area() and
xen_free_vm_area() functions with the generic equivalent
(alloc_vm_area() and free_vm_area()).

On x86, these were identical already.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

David Vrabel [Thu, 20 Oct 2011 10:45:17 +0000 (11:45 +0100)]

block: xen-blkback: use API provided by xenbus module to map rings

The xenbus module provides xenbus_map_ring_valloc() and
xenbus_map_ring_vfree(). Use these to map the ring pages granted by
the frontend.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Li Dongyang [Thu, 10 Nov 2011 07:52:06 +0000 (15:52 +0800)]

xen-blkback: convert hole punching to discard request on loop devices

As of dfaa2ef68e80c378e610e3c8c536f1c239e8d3ef, loop devices support
discard requests. We can simply issue a discard request and let
the loop driver punch the hole for us, so we don't need to touch
the internals of the loop device and punch the hole ourselves. Thanks.

V0->V1: rebased on devel/for-jens-3.3

Signed-off-by: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Konrad Rzeszutek Wilk [Wed, 12 Oct 2011 21:26:47 +0000 (17:26 -0400)]

xen/blkback: Move processing of BLKIF_OP_DISCARD from dispatch_rw_block_io

.. and move it to its own function that will deal with the
discard operation.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Konrad Rzeszutek Wilk [Wed, 12 Oct 2011 20:23:30 +0000 (16:23 -0400)]

xen/blk[front|back]: Enhance discard support with secure erasing support.

Part of the blkdev_issue_discard(xx) operation is that it can also
issue a secure discard operation that will permanently remove the
sectors in question. We advertise that we can support that via the
'discard-secure' attribute and on the request, if the 'secure' bit
is set, we will attempt to pass in REQ_DISCARD | REQ_SECURE.
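The flag mapping described above amounts to a couple of bit operations. In this sketch the flag values are hypothetical placeholders (the kernel's REQ_* bits differ), and gating on the backend's own 'discard-secure' capability is an assumption about how the backend uses the attribute it advertises.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define REQ_DISCARD_SKETCH  (1u << 0)
#define REQ_SECURE_SKETCH   (1u << 1)
#define DISCARD_SECURE_BIT  (1u << 0)  /* 'secure' bit in the request's flag byte */

/*
 * Build the flags for a discard request: always DISCARD, plus SECURE when the
 * frontend set the secure bit and the backend advertised 'discard-secure'.
 */
static unsigned int discard_flags(uint8_t req_flag, int backend_secure)
{
	unsigned int flags = REQ_DISCARD_SKETCH;

	if (backend_secure && (req_flag & DISCARD_SECURE_BIT))
		flags |= REQ_SECURE_SKETCH;
	return flags;
}
```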

CC: Li Dongyang <lidongyang@novell.com>
[v1: Used 'flag' instead of 'secure:1' bit]
[v2: Use 'reserved' uint8_t instead of adding a new value]
[v3: Check for nseg when mapping instead of operation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Konrad Rzeszutek Wilk [Wed, 12 Oct 2011 16:12:36 +0000 (12:12 -0400)]

xen/blk[front|back]: Squash blkif_request_rw and blkif_request_discard together

Put them in a union-type structure to deal with the overlapping
attributes more easily.
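The union approach can be illustrated with simplified, hypothetical layouts (the real blkif request structures carry more fields, such as the segment array). Because both members begin with the same sequence of field types, C's common-initial-sequence rule lets code read id and sector_number through either member, and the static assertions check that those fields line up.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical request layouts. */
struct req_rw {
	uint8_t  nr_segments;
	uint64_t id;
	uint64_t sector_number;
};

struct req_discard {
	uint8_t  flag;          /* e.g. the 'secure' bit */
	uint64_t id;
	uint64_t sector_number;
	uint64_t nr_sectors;
};

/* One request type; 'operation' says which union member is valid. */
struct request {
	uint8_t operation;
	union {
		struct req_rw      rw;
		struct req_discard discard;
	} u;
};

/* The overlapping attributes sit at the same offsets in both members (C11). */
_Static_assert(offsetof(struct req_rw, id) == offsetof(struct req_discard, id),
	       "id must overlap");
_Static_assert(offsetof(struct req_rw, sector_number) ==
	       offsetof(struct req_discard, sector_number),
	       "sector_number must overlap");
```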

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Maxim Uvarov [Mon, 12 Dec 2011 22:52:41 +0000 (14:52 -0800)]

Merge from upstream: Silence DEBUG_STRICT_USER_COPY_CHECKS=y warning

./lpfc-8.3.5.40-8.3.5.44-1/r12610_r12603.patch
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Maxim Uvarov [Mon, 12 Dec 2011 22:51:53 +0000 (14:51 -0800)]

Fixed mailbox double free panic

./lpfc-8.3.5.40-8.3.5.44-1/r12522_r12510.patch
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Maxim Uvarov [Mon, 12 Dec 2011 22:51:02 +0000 (14:51 -0800)]

Fixed compiler warning for putting large amount of memory on stack

./lpfc-8.3.5.40-8.3.5.44-1/r12517_r12512.patch
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Maxim Uvarov [Mon, 12 Dec 2011 22:49:44 +0000 (14:49 -0800)]

Enable BG by default

./lpfc-8.3.5.40-8.3.5.44-1/r12228.patch

Maxim Uvarov [Tue, 6 Dec 2011 01:20:56 +0000 (17:20 -0800)]

SPEC: ol6 req dracut-kernel-004-242.0.3

Orabug: 13388545
Since firmware moved to the uname -r directory, dracut has to be able
to load firmware from that directory.
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Maxim Uvarov [Tue, 6 Dec 2011 01:15:22 +0000 (17:15 -0800)]

SPEC: req udev-095-14.27.0.1.el5_7.1 or more

Orabug: 13348381
Since firmware moved to the uname -r directory, udev has to be able
to load firmware from that directory.
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Maxim Uvarov [Tue, 6 Dec 2011 01:10:17 +0000 (17:10 -0800)]

SPEC: el5 mkinitrd newer than 5.1.19.6-71.0.10

Orabug: 13459000
Since firmware moved to the uname -r directory, an updated mkinitrd is required
for el5.
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Maxim Uvarov [Mon, 5 Dec 2011 18:57:21 +0000 (10:57 -0800)]

hpwdt watchdog: mark page executable

Orabug: 13115973
Mark hpwdt watchdog pages executable to prevent the following failure:
BUG: unable to handle kernel paging request at c00f0000
IP: [<c00f0000>] 0xc00effff
*pdpt = 0000000000b7c001 *pde = 0000000000cf5067 *pte = 80000000000f0163
Oops: 0011 [#1] SMP
Modules linked in: hpwdt(+)(U) ipmi_si(U) ipmi_msghandler(U) serio_raw(U)
pcspkr(U) k8temp(U) ext4(U) mbcache(U) jbd2(U) hpsa(U) cciss(U) lpfc(U)
qla2xxx(U) scsi_transport_fc(U) scsi_tgt(U) radeon(U) ttm(U)
drm_kms_helper(U) drm(U) hwmon(U) i2c_algo_bit(U) i2c_core(U) dm_mod(U)
.
Pid: 741, comm: modprobe Not tainted 2.6.39-100.0.15.el6uek.i686 #1 HP
ProLiant BL685c G1
EIP: 0060:[<c00f0000>] EFLAGS: 00010286 CPU: 1
EIP is at 0xc00f0000
EAX: 55524324 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: e892fda0 ESP: e892fd70
  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process modprobe (pid: 741, ti=e892e000 task=e96d0da0 task.ti=e892e000)
Stack:
  f902b020 00000060 0000007b 00000286 ffffffed c00f0000 e892fda0 e892fda0
  c00f0000 00000001 00000000 c00f0000 e892fdc4 f902b500 f902c0e0 c00f0000
  e892fdc4 c0439b6f c00ffee0 c0100000 c00f0000 e892fdf0 f902b627 ea276860
Call Trace:
  [<f902b020>] ? asminline_call+0x20/0x50 [hpwdt]
  [<f902b500>] cru_detect+0x43/0xf6 [hpwdt]
  [<c0439b6f>] ? ioremap_nocache+0x1f/0x30
  [<f902b627>] hpwdt_init_nmi_decoding+0x74/0x16b [hpwdt]
  [<c085f469>] ? printk+0x1d/0x24
  [<f902b7f4>] hpwdt_init_one+0xd6/0x162 [hpwdt]
  [<c06d8475>] ? pm_runtime_enable+0x45/0x70
  [<c06149c7>] local_pci_probe+0x47/0xb0
  [<c0615978>] pci_device_probe+0x68/0x90
  [<c06d0aee>] really_probe+0x5e/0x210
  [<c06d9808>] ? pm_runtime_barrier+0x48/0xb0
  [<c06d0ce3>] driver_probe_device+0x43/0xa0
  [<c061494e>] ? pci_match_device+0x9e/0xb0
  [<c06d0dc1>] __driver_attach+0x81/0x90
  [<c06d0020>] bus_for_each_dev+0x50/0x70
  [<c06d08fe>] driver_attach+0x1e/0x20
  [<c06d0d40>] ? driver_probe_device+0xa0/0xa0
  [<c06d0397>] bus_add_driver+0x197/0x270
  [<c06157f0>] ? pci_dev_put+0x20/0x20
  [<c06d13ea>] driver_register+0x6a/0x130
  [<c0615ba5>] __pci_register_driver+0x45/0xb0
  [<f902e017>] hpwdt_init+0x17/0x19 [hpwdt]
  [<c0403035>] do_one_initcall+0x35/0x170
  [<f902e000>] ? 0xf902dfff
  [<c0491ac5>] sys_init_module+0x75/0x1c0
  [<c04ac8a6>] ? audit_syscall_exit+0x216/0x240
  [<c0868f9f>] sysenter_do_call+0x12/0x28
Code: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  90 80 fc d8 75 0d e9 03 07 00 00 b8 04 00 00 02 05 00 00 9c
EIP: [<c00f0000>] 0xc00f0000 SS:ESP 0068:e892fd70
CR2: 00000000c00f0000

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Maxim Uvarov [Sat, 3 Dec 2011 00:03:06 +0000 (16:03 -0800)]

put firmware to kernel version specific location

Orabug: 13254457
By default, firmware is loaded from the following folders, in priority
order, as set in /lib/udev/firmware.sh:
FIRMWARE_DIRS="/lib/firmware/updates/$(uname -r) /lib/firmware/updates \
/lib/firmware/$(uname -r) /lib/firmware"

Install firmware in /lib/firmware/$(uname -r) instead of /lib/firmware
to avoid collisions between different firmware versions.

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Andi Kleen [Thu, 1 Dec 2011 21:38:15 +0000 (15:38 -0600)]

DIO: optimize cache misses in the submission path

Some investigation of a transaction processing workload showed that
a major consumer of cycles in __blockdev_direct_IO is the cache miss
while accessing the block size. This is because it has to walk
the chain from block_dev to gendisk to queue.

The block size is needed early on to check alignment and sizes.
It's only done if the check for the inode block size fails.
But the costly block device state is unconditionally fetched.

- Reorganize the code to only fetch block dev state when actually
needed.

Then do a prefetch on the block dev early on in the direct IO
path. This is worth it, because there is now substantial code run
before we actually touch the block dev.

- I also added some unlikelies to make it clear to the compiler
that the block device fetch code is not normally executed.
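The reorganized check can be sketched in userspace: try the cheap inode block size first and walk the block_dev -> gendisk -> queue chain only when that fails, with a prefetch issued at the start of the path. The structures and function names are hypothetical simplifications; __builtin_prefetch is the GCC/Clang builtin (the kernel wraps it as prefetch()), and it does not fault on an invalid address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct queue   { unsigned int logical_block_size; };  /* power of two */
struct gendisk { struct queue *queue; };
struct bdev    { struct gendisk *disk; };

/* Cheap inode check first; the costly pointer chase runs only on failure. */
static bool dio_aligned(uint64_t offset, unsigned int inode_blkbits,
			const struct bdev *bdev)
{
	uint64_t mask = (1ULL << inode_blkbits) - 1;

	if (!(offset & mask))
		return true;                 /* common case: no cache misses */
	if (!bdev)
		return false;
	mask = (uint64_t)bdev->disk->queue->logical_block_size - 1;
	return !(offset & mask);
}

/* Entry point: prefetch early so setup code can hide the miss latency. */
static bool dio_start(uint64_t offset, unsigned int inode_blkbits,
		      const struct bdev *bdev)
{
	__builtin_prefetch(bdev);
	/* ... other direct IO setup would run here ... */
	return dio_aligned(offset, inode_blkbits, bdev);
}
```

For example, with a 4 KiB inode block size (blkbits 12) on a device with 512-byte logical blocks, an offset of 512 fails the fast check but passes the device check.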

This gave a small, but measurable improvement on a large database
benchmark (about 0.3%)

v2: Remove unlikely (Jeff Moyer)
Signed-off-by: Andi Kleen <ak@linux.intel.com>

Andi Kleen [Thu, 1 Dec 2011 21:36:56 +0000 (15:36 -0600)]

VFS: Cache request_queue in struct block_device

This makes it possible to get from the inode to the request_queue
with one less cache miss. Used in followon optimization.

The lifetime of the pointer is the same as that of the gendisk.

This assumes that the queue will always stay the same in the
gendisk while it's visible to block_devices. I think that's safe, correct?
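The caching idea is a single extra pointer filled in when the disk is attached. This is a hypothetical, simplified sketch of the structures, not the kernel's definitions; the point is that the fast path needs one load instead of two dependent ones.

```c
/* Hypothetical simplified structures. */
struct request_queue { int id; };
struct gendisk { struct request_queue *queue; };
struct block_device {
	struct gendisk       *bd_disk;
	struct request_queue *bd_queue;  /* cached copy of bd_disk->queue */
};

/* Fill the cache when the disk is attached; valid as long as the gendisk is. */
static void bdev_attach_disk(struct block_device *bdev, struct gendisk *disk)
{
	bdev->bd_disk = disk;
	bdev->bd_queue = disk->queue;
}

/* Before: two dependent loads (bd_disk, then ->queue), often two misses. */
static struct request_queue *queue_slow(const struct block_device *b)
{
	return b->bd_disk->queue;
}

/* After: one load from the block_device itself. */
static struct request_queue *queue_fast(const struct block_device *b)
{
	return b->bd_queue;
}
```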

Cc: axboe@kernel.dk
Signed-off-by: Andi Kleen <ak@linux.intel.com>

Maxim Uvarov [Thu, 1 Dec 2011 19:41:43 +0000 (11:41 -0800)]

Install include/drm headers

Orabug:13260234
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Andi Kleen [Tue, 2 Aug 2011 04:38:09 +0000 (21:38 -0700)]

direct-io: merge direct_io_walker into __blockdev_direct_IO

This doesn't change anything for the compiler, but hch thought it would
make the code clearer.

I moved the reference counting into its own little inline.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Andi Kleen [Tue, 2 Aug 2011 04:38:08 +0000 (21:38 -0700)]

direct-io: inline the complete submission path

Add inlines to all the submission path functions. While this increases
code size it also gives gcc a lot of optimization opportunities
in this critical hotpath.

In particular, together with some other changes, this
allows gcc to get rid of the unnecessary clearing of
sdio at the beginning and to optimize the messy parameter passing.
If any function that takes an sdio parameter is not inlined, this
optimization breaks, because it cannot be done once the
address of the structure is taken.

Note that benefits are only seen with CONFIG_OPTIMIZE_INLINING
and CONFIG_CC_OPTIMIZE_FOR_SIZE both set to off.

This gives about 2.2% improvement on a large database benchmark
with a high IOPS rate.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Andi Kleen [Tue, 2 Aug 2011 04:38:07 +0000 (21:38 -0700)]

direct-io: separate map_bh from dio

Only a single b_private field in the map_bh buffer head is needed after
the submission path. Move map_bh separately to avoid storing
this information in the long term slab.

This avoids the weird 104 byte hole in struct dio_submit, which also needed
to be memset early.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Andi Kleen [Tue, 2 Aug 2011 04:38:06 +0000 (21:38 -0700)]

direct-io: use a slab cache for struct dio

A direct slab call is slightly faster than kmalloc and can be better cached
per CPU. It also avoids rounding to the next kmalloc slab.

In addition this enforces cache line alignment for struct dio to avoid
any false sharing.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Andi Kleen [Tue, 2 Aug 2011 04:38:05 +0000 (21:38 -0700)]

direct-io: rearrange fields in dio/dio_submit to avoid holes

Fix most problems reported by pahole.

There is still a weird 104 byte hole after map_bh. I'm not sure what
causes this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Andi Kleen [Tue, 2 Aug 2011 04:38:04 +0000 (21:38 -0700)]

direct-io: fix a wrong comment

There's nothing on the stack, even before my changes.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Andi Kleen [Tue, 2 Aug 2011 04:38:03 +0000 (21:38 -0700)]

direct-io: separate fields only used in the submission path from struct dio

This large, but largely mechanical, patch moves all fields in struct dio
that are only used in the submission path into a separate on-stack
data structure. This has the advantage that the memory is very likely
cache hot, which is not guaranteed for memory fresh out of kmalloc.

This also gives gcc more optimization potential because it can more easily
determine that there are no external aliases for these variables.

The sdio setup is now an initializer instead of a memset.
This allows gcc to break sdio into individual fields and optimize
away unnecessary zeroing (after all the functions are inlined).
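The split can be sketched as two structures: long-lived state that stays in the allocated dio, and submission-only state that lives on the caller's stack and is set up with a designated initializer. All names and fields here are hypothetical simplifications of the real dio/dio_submit.

```c
#include <stdint.h>

/* Long-lived state: stays in the heap-allocated dio (simplified). */
struct dio {
	long refcount;
};

/* Submission-only state: lives on the caller's stack (simplified). */
struct dio_submit {
	unsigned int blkbits;
	uint64_t     block_in_file;
	unsigned int pages_in_io;
};

static unsigned int submit_sketch(struct dio *dio, unsigned int blkbits,
				  uint64_t first_block, unsigned int npages)
{
	/*
	 * Designated initializer instead of memset(): fields not named are
	 * zeroed, and because sdio's address never escapes, the compiler is
	 * free to keep each field in a register.
	 */
	struct dio_submit sdio = {
		.blkbits       = blkbits,
		.block_in_file = first_block,
	};

	sdio.pages_in_io += npages;
	dio->refcount++;
	return sdio.pages_in_io;
}
```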

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Maxim Uvarov [Wed, 30 Nov 2011 01:19:46 +0000 (17:19 -0800)]

Merge branch 'uek2-oracleasm' of ca-git.us.oracle.com:linux-mkp-public into uek2-stable-update2

Maxim Uvarov [Tue, 29 Nov 2011 22:31:43 +0000 (14:31 -0800)]

Set panic_on_oops to default to true

Orabug: 13248236
(cherry picked from commit 699300f48c4eb16308fec6575fa2047891d56fd1)
(cherry picked from commit 48f59636aca88a6c9c04c8e0919b4d117185037d)
Conflicts:

kernel/panic.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Maxim Uvarov [Tue, 29 Nov 2011 01:37:43 +0000 (17:37 -0800)]

modsign: no sign if keys are missing

Orabug: 13421398
- SPEC: use kernel source dir for gpg
- Do not sign modules if there are no keys

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Martin K. Petersen [Tue, 22 Nov 2011 21:27:39 +0000 (16:27 -0500)]

Oracle ASM Kernel Driver

Include version 2.0.7 of oracleasm.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Maxim Uvarov [Wed, 16 Nov 2011 18:11:13 +0000 (10:11 -0800)]

SPEC: v2.6.39-100.0.17

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Maxim Uvarov [Wed, 16 Nov 2011 18:04:24 +0000 (10:04 -0800)]

Merge branch 'uek2-stable' of ssh://ca-server1/home/mason/git/linux-uek-2.6.39 into uek2-stable-update

Liu Bo [Tue, 15 Nov 2011 01:48:06 +0000 (20:48 -0500)]

Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush

The btrfs snapshotting code requires that once a root has been
snapshotted, we don't change it during a commit.

But there are two cases to lead to tree corruptions:

1) multi-thread snapshots can commit several snapshots in one transaction,
   and this may change the source root while processing the following pending
   snapshots, which corrupts the earlier snapshots;

2) the free inode cache was changing the roots when it wrote out the cache,
   which led to corruption.

This fixes things by making sure we force COW of the block after we create a
snapshot while committing a transaction, so that any changes to the roots
result in COW, and all the fs roots and snapshot roots stay
consistent.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit f1ebcc74d5b2159f44c96b479b6eb8afc7829095)

David Sterba [Fri, 11 Nov 2011 15:14:57 +0000 (10:14 -0500)]

btrfs: rename the option to nospace_cache

Rename no_space_cache option to nospace_cache to be more consistent with
the rest, where the simple prefix 'no' is used to negate an option.

The option was introduced during the -rc1 cycle and has not been
widely used, so it's safe.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 8965593e41dd2d0e2a2f1e6f245336005ea94a2c)

Arne Jansen [Fri, 11 Nov 2011 13:17:10 +0000 (08:17 -0500)]

Btrfs: handle bio_add_page failure gracefully in scrub

Currently scrub fails with ENOMEM when bio_add_page fails. Unfortunately
dm-based targets accept only one page per bio, which makes scrub always
fail. This patch just submits the current bio when an error is encountered
and starts a new one.
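The retry pattern can be modelled in userspace. This is a hypothetical sketch, not the scrub code: bio_sketch and its max_pages cap stand in for a target that rejects additional pages (as bio_add_page() signals by adding 0 bytes), and scrub_pages_sketch() returns how many bios were submitted, or -1 if even an empty bio cannot take a page.

```c
#include <stdbool.h>

/* Hypothetical bio model: a target may cap how many pages one bio holds. */
struct bio_sketch {
	int npages;
	int max_pages;
};

static bool bio_add_page_sketch(struct bio_sketch *bio)
{
	if (bio->npages >= bio->max_pages)
		return false;            /* like bio_add_page() adding 0 bytes */
	bio->npages++;
	return true;
}

/*
 * Queue npages pages. When the current bio is full, submit it and start a
 * new one instead of failing with -ENOMEM. Returns the number of bios
 * submitted, or -1 if a page does not fit even into an empty bio.
 */
static int scrub_pages_sketch(int npages, int max_pages_per_bio)
{
	struct bio_sketch bio = { 0, max_pages_per_bio };
	int submitted = 0;

	for (int i = 0; i < npages; i++) {
		if (!bio_add_page_sketch(&bio)) {
			if (bio.npages == 0)
				return -1;       /* cannot fit even one page */
			submitted++;             /* submit the full bio */
			bio.npages = 0;          /* start a new one */
			if (!bio_add_page_sketch(&bio))
				return -1;
		}
	}
	if (bio.npages)
		submitted++;                     /* submit the partial tail */
	return submitted;
}
```

With a one-page-per-bio dm target, four pages now become four submissions instead of an error.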

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 69f4cb526bd02ae5af35846f9a710c099eec3347)

Miao Xie [Fri, 11 Nov 2011 01:45:05 +0000 (20:45 -0500)]

Btrfs: fix deadlock caused by the race between relocation

We cannot do a flushable reservation for the relocation when we create a snapshot,
because it may make the transaction commit task and the flush task wait for
each other, causing a deadlock.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 62f30c5462374b991e7e3f42d49ce2265c1b82f1)

Josef Bacik [Fri, 11 Nov 2011 01:45:05 +0000 (20:45 -0500)]

Btrfs: only map pages if we know we need them when reading the space cache

People have been running into a warning when loading the space cache because
the page is already mapped when trying to read in a bitmap.  The way we read in
entries and pages is convoluted, so change it so that io_ctl_read_entry
maps the entries if it needs to, and if it hits the end of the page it simply
unmaps the page.  That way we can unconditionally unmap the io_ctl before
reading in the bitmap and we should stop hitting these warnings.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 2f120c05e67ae34c93786b1050c6828904314429)
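
A toy model of the revised read path, assuming a cursor that counts a warning whenever an already-mapped page is mapped again. struct sketch_io_ctl and the io_ctl_* helpers below are hypothetical simplifications of the description above, not the btrfs functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read cursor; names are illustrative only. */
struct sketch_io_ctl {
    int mapped;                 /* is the current page mapped? */
    size_t page;                /* index of the current page */
    size_t off;                 /* offset within the current page */
    size_t page_size;
    int warnings;               /* attempts to map an already-mapped page */
};

static void io_ctl_map(struct sketch_io_ctl *c)
{
    if (c->mapped)
        c->warnings++;          /* the warning the message describes */
    c->mapped = 1;
    c->off = 0;
}

/* Unmapping advances to the next page; it is a no-op if nothing is mapped. */
static void io_ctl_unmap(struct sketch_io_ctl *c)
{
    if (c->mapped) {
        c->mapped = 0;
        c->page++;
    }
}

/* Map only when needed; unmap as soon as another entry would not fit. */
static void io_ctl_read_entry(struct sketch_io_ctl *c, size_t entry_size)
{
    if (!c->mapped)
        io_ctl_map(c);
    c->off += entry_size;
    if (c->off + entry_size > c->page_size)
        io_ctl_unmap(c);
}

/* A bitmap fills a whole page, so unconditionally drop any entry page first. */
static void io_ctl_read_bitmap(struct sketch_io_ctl *c)
{
    io_ctl_unmap(c);
    io_ctl_map(c);
    io_ctl_unmap(c);
}
```

Because io_ctl_unmap() is safe to call whether or not a page is mapped, the bitmap reader no longer needs to know where the entry reader stopped.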

Miao Xie [Fri, 11 Nov 2011 01:45:05 +0000 (20:45 -0500)]

Btrfs: fix orphan backref nodes

Suppose the root node of a fs/file tree is in the block group that is
being relocated, but the tree's other nodes are outside that block group.
If we create a snapshot of this tree after the relocation tree creation
ends but before ->create_reloc_tree is set to 0, Btrfs will create
some backref nodes that are the lowest nodes of the backref cache.
However, we forget to add them to the ->leaves list of the backref cache
and process them, so eventually they trigger a BUG_ON():

kernel BUG at fs/btrfs/relocation.c:239!

This patch fixes it by adding them to the ->leaves list of the backref cache.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 76b9e23d25d5c99f994bee3172de39492e452e93)

Miao Xie [Fri, 11 Nov 2011 01:45:05 +0000 (20:45 -0500)]

Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}

btrfs_block_rsv_add() and btrfs_block_rsv_add_noflush() contain similar
code, so factor that code out into a common helper.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 61b520a9d0083b9b361638e456af45fd75150c87)
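
The refactoring pattern can be sketched as one shared helper that takes a flush flag, with two thin wrappers on top. The reservation model and every name below are hypothetical; the message does not give the real helper's name or signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical reservation model, not the btrfs block_rsv API. */
struct sketch_rsv {
    uint64_t reserved;
};

static uint64_t free_space;      /* space that can be reserved right now */
static uint64_t flushable_space; /* extra space that flushing could free */

/* Shared body of both variants: they differ only in whether we may flush
 * to make room before giving up. */
static int sketch_rsv_add_common(struct sketch_rsv *rsv, uint64_t bytes, int flush)
{
    if (bytes > free_space && flush) {
        free_space += flushable_space;  /* pretend flushing released space */
        flushable_space = 0;
    }
    if (bytes > free_space)
        return -ENOSPC;
    free_space -= bytes;
    rsv->reserved += bytes;
    return 0;
}

static int sketch_rsv_add(struct sketch_rsv *rsv, uint64_t bytes)
{
    return sketch_rsv_add_common(rsv, bytes, 1);
}

static int sketch_rsv_add_noflush(struct sketch_rsv *rsv, uint64_t bytes)
{
    return sketch_rsv_add_common(rsv, bytes, 0);
}
```

The noflush wrapper fails fast instead of waiting for a flush, which is the behaviour callers such as the snapshot-time relocation reservation need.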

Miao Xie [Fri, 11 Nov 2011 01:45:05 +0000 (20:45 -0500)]

Btrfs: fix unreleased path in btrfs_orphan_cleanup()

While stress-testing space relocation, we hit a deadlock. Debugging showed
that btrfs_orphan_cleanup() forgot to release the read locks on the extent
buffers before ending the transaction handle. The transaction commit task
waited for the task running btrfs_orphan_cleanup() to unlock the extent
buffer, while that task waited for the commit task to finish the
transaction commit, so the two deadlocked. Fix it by releasing the path
before ending the transaction handle.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 3254c87618354e58fa2a7b375c6664f567480c33)

Miao Xie [Fri, 11 Nov 2011 01:45:04 +0000 (20:45 -0500)]

Btrfs: fix no reserved space for writing out inode cache

The inode cache forgets to reserve space when it is written out. Under
stress tests such as synctest, this triggers a WARN_ON() in
use_block_rsv():

WARNING: at fs/btrfs/extent-tree.c:5718 btrfs_alloc_free_block+0xbf/0x281 [btrfs]()
...
Call Trace:
[<ffffffff8104df86>] warn_slowpath_common+0x80/0x98
[<ffffffff8104dfb3>] warn_slowpath_null+0x15/0x17
[<ffffffffa0369c60>] btrfs_alloc_free_block+0xbf/0x281 [btrfs]
[<ffffffff810cbcb8>] ? __set_page_dirty_nobuffers+0xfe/0x108
[<ffffffffa035c040>] __btrfs_cow_block+0x118/0x3b5 [btrfs]
[<ffffffffa035c7ba>] btrfs_cow_block+0x103/0x14e [btrfs]
[<ffffffffa035e4c4>] btrfs_search_slot+0x249/0x6a4 [btrfs]
[<ffffffffa036d086>] btrfs_lookup_inode+0x2a/0x8a [btrfs]
[<ffffffffa03788b7>] btrfs_update_inode+0xaa/0x141 [btrfs]
[<ffffffffa036d7ec>] btrfs_save_ino_cache+0xea/0x202 [btrfs]
[<ffffffffa03a761e>] ? btrfs_update_reloc_root+0x17e/0x197 [btrfs]
[<ffffffffa0373867>] commit_fs_roots+0xaa/0x158 [btrfs]
[<ffffffffa03746a6>] btrfs_commit_transaction+0x405/0x731 [btrfs]
[<ffffffff810690df>] ? wake_up_bit+0x25/0x25
[<ffffffffa039d652>] ? btrfs_log_dentry_safe+0x43/0x51 [btrfs]
[<ffffffffa0381c5f>] btrfs_sync_file+0x16a/0x198 [btrfs]
[<ffffffff81122806>] ? mntput+0x21/0x23
[<ffffffff8112d150>] vfs_fsync_range+0x18/0x21
[<ffffffff8112d170>] vfs_fsync+0x17/0x19
[<ffffffff8112d316>] do_fsync+0x29/0x3e
[<ffffffff8112d348>] sys_fsync+0xb/0xf
[<ffffffff81468352>] system_call_fastpath+0x16/0x1b

Sometimes it also triggers a BUG_ON() in the reservation code of the
delayed inode.

So we must reserve enough space for the inode cache.

Note: if we cannot reserve enough space for the inode cache, we give up
writing it out.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit ba38eb4de354d228f2792f93cde2c748a3a3f3b2)
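
A minimal sketch of the "reserve first, or give up" policy from the note, assuming a single counter of reservable metadata space. reserve_metadata() and save_ino_cache() are hypothetical names, not the btrfs functions (the real write-out path goes through btrfs_save_ino_cache(), as the trace shows).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model: avail is what the metadata reservation can hand out. */
static uint64_t avail;
static int cache_written;

static int reserve_metadata(uint64_t bytes)
{
    if (bytes > avail)
        return -ENOSPC;
    avail -= bytes;
    return 0;
}

/* Reserve before writing; if that fails, skip the write-out rather than
 * consuming unreserved space (what triggered the WARN_ON() in use_block_rsv()). */
static int save_ino_cache(uint64_t needed)
{
    if (reserve_metadata(needed) != 0)
        return 0;               /* give up writing out the cache; not an error */
    cache_written = 1;          /* ... write the cache using the reservation ... */
    return 0;
}
```

Skipping the write-out is acceptable in this model because the inode cache is only an optimisation; the caller's commit still succeeds.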

Miao Xie [Fri, 11 Nov 2011 01:45:04 +0000 (20:45 -0500)]

Btrfs: fix nocow when deleting the item

btrfs_previous_item() only searches the b+ tree; it does not COW the nodes
or leaves. If we modify the item it returns, the metadata will be
corrupted. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 924cd8fbe41851eda2b68bf2ed501b2777fd77b4)
