www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
13 years agoMerge branch 'uek2-mpt2sas' of git://ca-git.us.oracle.com/linux-mkp-public into uek2...
Guru Anbalagane [Fri, 16 Dec 2011 21:50:53 +0000 (13:50 -0800)]
Merge branch 'uek2-mpt2sas' of git://ca-git.us.oracle.com/linux-mkp-public into uek2-stable

13 years ago[SCSI] mpt2sas: Removed redundant calling of _scsih_probe_devices() from _scsih_probe
nagalakshmi.nandigama@lsi.com [Tue, 13 Dec 2011 03:59:15 +0000 (09:29 +0530)]
[SCSI] mpt2sas: Removed redundant calling of _scsih_probe_devices() from _scsih_probe

Removed the redundant call to _scsih_probe_devices() from _scsih_probe(),
as it is already called from _scsih_scan_finished().

Also moved the scsi_scan_host(shost) call so that it happens after the
volumes on the WarpDrive are reported to the OS. Otherwise, by the time
the ioc->hide_drives flag is set, the volumes on the WarpDrive
have already been reported to the OS.

Also changed _base_make_ioc_operational() to initialize the reply queues
only at driver load time.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Remove unused duplicate diag_buffer_enable param
Roland Dreier [Wed, 30 Nov 2011 18:05:50 +0000 (10:05 -0800)]
[SCSI] mpt2sas: Remove unused duplicate diag_buffer_enable param

Commit 921cd8024b90 ("[SCSI] mpt2sas: New feature - Fast Load
Support") moved handling of the diag_buffer_enable module parameter
from mpt2sas_base.c to mpt2sas_scsih.c, but it left an old copy of the
parameter in mpt2sas_base.c.  Remove the unused stub.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Fix possible integer truncation of cpu_count
Roland Dreier [Thu, 1 Dec 2011 00:30:33 +0000 (16:30 -0800)]
[SCSI] mpt2sas: Fix possible integer truncation of cpu_count

When computing reply_queue_count (the number of MSI-X vectors to use),
the driver does

ioc->reply_queue_count = min_t(u8, ioc->cpu_count,
    ioc->msix_vector_count);

However, on a big machine, ioc->cpu_count could be outside the range
that fits in a u8; e.g. a system with 256 CPUs will end up with
reply_queue_count set to 0.

Fix this by calculating the minimum as ints and then letting the
assignment to reply_queue_count handle integer demotion.
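
The truncation can be reproduced outside the kernel. The sketch below is
illustrative only: min_as_u8() and min_as_int() are hypothetical helper
names, and uint8_t stands in for the kernel's u8. It contrasts casting both
operands to 8 bits before comparing (what min_t(u8, ...) does) with
comparing as ints and narrowing afterwards.

```c
#include <stdint.h>

/* Mimics min_t(u8, a, b): both operands are cast to u8 BEFORE the compare,
 * so 256 wraps to 0 and "wins" the comparison. */
static uint8_t min_as_u8(int cpu_count, int msix_vector_count)
{
    uint8_t a = (uint8_t)cpu_count;
    uint8_t b = (uint8_t)msix_vector_count;

    return a < b ? a : b;
}

/* Compare as ints first; narrow only when storing the result. */
static uint8_t min_as_int(int cpu_count, int msix_vector_count)
{
    int m = cpu_count < msix_vector_count ? cpu_count : msix_vector_count;

    return (uint8_t)m;
}
```

With 256 CPUs and 16 MSI-X vectors, the first helper yields 0 and the
second yields 16.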

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Fix leak on mpt2sas_base_attach() error path
Roland Dreier [Thu, 1 Dec 2011 01:14:22 +0000 (17:14 -0800)]
[SCSI] mpt2sas: Fix leak on mpt2sas_base_attach() error path

Commit 911ae9434f83 ("[SCSI] mpt2sas: Added NUMA IO support in driver
which uses multi-reply queue support of the HBA") added new
allocations to the beginning of mpt2sas_base_attach(), which means
directly returning an error on failure of mpt2sas_base_map_resources()
will leak those allocations.

Fix this by doing "goto out_free_resources" in this place too, as the
rest of the function does.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas : Bump driver version to 12.100.00.00
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:23:13 +0000 (07:53 +0530)]
[SCSI] mpt2sas : Bump driver version to 12.100.00.00

Bump driver version to 12.100.00.00

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas : Fix for memory allocation error for large host credits
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:23:08 +0000 (07:53 +0530)]
[SCSI] mpt2sas : Fix for memory allocation error for large host credits

The amount of memory required for tracking chain buffers is rather
large, and when the host credit count is big, memory allocation
failure occurs inside __get_free_pages.

The fix is to limit the number of chains to 100,000.  In addition,
the number of host credits is limited to 30,000 IOs.  However, this
limit can be overridden using the command line option
max_queue_depth.  The algorithm for calculating
reply_post_queue_depth is changed so that it equals
(reply_free_queue_depth + 16); previously it was
(reply_free_queue_depth * 2).
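
The limits and the depth calculation described above can be sketched as
follows. This is not the driver code: the macro and function names are
hypothetical, treating a non-positive max_queue_depth as "not set" is an
assumption, and so is capping the override at the firmware-reported
credits.

```c
/* Limits quoted in the commit message; the macro names are hypothetical. */
#define SKETCH_MAX_CHAINS        100000
#define SKETCH_MAX_HOST_CREDITS  30000

/* Cap the number of chain buffers that need tracking. */
static int clamp_chains(int chains)
{
    return chains < SKETCH_MAX_CHAINS ? chains : SKETCH_MAX_CHAINS;
}

/* Assumption: max_queue_depth <= 0 means the option was not given, and an
 * explicit override is still capped at what the firmware reports. */
static int clamp_host_credits(int fw_credits, int max_queue_depth)
{
    if (max_queue_depth > 0)
        return max_queue_depth < fw_credits ? max_queue_depth : fw_credits;
    return fw_credits < SKETCH_MAX_HOST_CREDITS ? fw_credits
                                                : SKETCH_MAX_HOST_CREDITS;
}

/* Previous calculation: twice the reply free queue depth. */
static int reply_post_depth_old(int reply_free_queue_depth)
{
    return reply_free_queue_depth * 2;
}

/* New calculation: reply free queue depth plus 16. */
static int reply_post_depth_new(int reply_free_queue_depth)
{
    return reply_free_queue_depth + 16;
}
```

For a reply free queue depth of 1000, the old formula asks for 2000 reply
post entries and the new one for 1016.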

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Do not retry a timed out direct IO for warpdrive
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:23:02 +0000 (07:53 +0530)]
[SCSI] mpt2sas: Do not retry a timed out direct IO for warpdrive

When the SML times out an I/O request that was sent to a WarpDrive as
direct I/O, the aborted direct I/O is retried as normal volume I/O.
This causes the Target Reset to fail and results in a host reset.

The fix is to not retry a failed I/O to the volume when the original
I/O was sent as direct I/O and completed with the ioc status
MPI2_IOCSTATUS_SCSI_TASK_TERMINATED.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Release spinlock for the raid device list before blocking it
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:22:56 +0000 (07:52 +0530)]
[SCSI] mpt2sas: Release spinlock for the raid device list before blocking it

Added code to release the spinlock that protects the raid device
list before calling a function that can block. The blocking call
caused a reschedule, after which the code tried to acquire the same
lock again, resulting in a panic (NMI Watchdog detecting a CPU
lockup).

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: MPI next revision header update
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:22:49 +0000 (07:52 +0530)]
[SCSI] mpt2sas: MPI next revision header update

1) Removed Power Management Control option for PCIe link.
2) Added RAID Action for performing a compatibility check. Added
   product-specific range to RAID Action values.
3) Added PhysicalPort field to SAS Device Status Change Event data.
4) Added SpinupFlags field containing a Disable Spin-up bit to the
   SpinupGroupParameters fields of SAS IO Unit Page 4.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Added support for customer specific branding
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:22:42 +0000 (07:52 +0530)]
[SCSI] mpt2sas: Added support for customer specific branding

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Increase max transfer support from 4MB to 16MB
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:22:08 +0000 (07:52 +0530)]
[SCSI] mpt2sas: Increase max transfer support from 4MB to 16MB

Increase max transfer support from 4MB to 16MB.
This is done by changing shost->max_sectors from 8192 to 32767.
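
As a quick check of the arithmetic, the sketch below converts a sector
count into bytes. It assumes 512-byte sectors, which the commit message
does not state, and max_transfer_bytes() is a hypothetical helper.

```c
/* Assumption: 512-byte sectors. */
#define SKETCH_SECTOR_SIZE 512UL

/* Largest single transfer, in bytes, for a given max_sectors value. */
static unsigned long max_transfer_bytes(unsigned long max_sectors)
{
    return max_sectors * SKETCH_SECTOR_SIZE;
}
```

Under that assumption 8192 sectors is exactly 4 MiB (4194304 bytes), while
32767 sectors is 16776704 bytes, i.e. one sector short of 16 MiB.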

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Support for greater than 2TB capacity WarpDrive
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:21:55 +0000 (07:51 +0530)]
[SCSI] mpt2sas: Support for greater than 2TB capacity WarpDrive

The driver is modified to allow access to the greater than 2TB WarpDrive
and properly handle direct-io mapping for WarpDrive volumes greater than 2TB.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Bump driver version to 11.100.00.00
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:14:10 +0000 (07:44 +0530)]
[SCSI] mpt2sas: Bump driver version to 11.100.00.00

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Rearrange the code so that the completion queues are initialized...
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:13:58 +0000 (07:43 +0530)]
[SCSI] mpt2sas: Rearrange the code so that the completion queues are initialized prior to sending the request to controller firmware

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Do not set sas_device->starget to NULL from the slave_destroy callbac...
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:13:50 +0000 (07:43 +0530)]
[SCSI] mpt2sas: Do not set sas_device->starget to NULL from the slave_destroy callback when all the LUNS have been deleted

Setting sas_device->starget to NULL from the slave_destroy callback for
LUN=1 while LUN=0 still exists results in the entire target being deleted.
To resolve the issue, the driver should only set sas_device->starget to
NULL from slave_destroy when all the LUNs have been deleted.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: MPI next revision header update
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:13:37 +0000 (07:43 +0530)]
[SCSI] mpt2sas: MPI next revision header update

1) Added product specific range of ImageType macros for the Extended
   Image Header.

2) Added Flags field and related defines to
   MPI2_TOOLBOX_ISTWI_READ_WRITE_REQUEST to support automatic
   reserve/release and page addressing.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Adding support for customer specific branding
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:13:00 +0000 (07:43 +0530)]
[SCSI] mpt2sas: Adding support for customer specific branding

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: When IOs are terminated, update the result to DID_SOFT_ERROR to avoid...
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:12:40 +0000 (07:42 +0530)]
[SCSI] mpt2sas: When IOs are terminated, update the result to DID_SOFT_ERROR to avoid infinite resets

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Better handling DEAD IOC (PCI-E Link down) error condition
nagalakshmi.nandigama@lsi.com [Thu, 1 Dec 2011 02:12:04 +0000 (07:42 +0530)]
[SCSI] mpt2sas: Better handling DEAD IOC (PCI-E Link down) error condition

Detection of a dead IOC is done in the fault_reset_work thread.

If the IOC Doorbell reads 0xFFFFFFFF, the IOC is treated as non-operational (DEAD).
When a dead IOC is detected, the driver now removes that IOC and
all its attached devices from the OS.
The PCI layer API pci_remove_bus_device() is called to remove the dead IOC.
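
The detection test itself is a single comparison. The sketch below is
illustrative only: ioc_looks_dead() is a hypothetical name, and the caller
is assumed to have already read the 32-bit doorbell register. An all-ones
value is what a read typically returns once the PCIe link is down.

```c
#include <stdbool.h>
#include <stdint.h>

/* Doorbell value that the commit message treats as "IOC is dead". */
#define SKETCH_DOORBELL_DEAD 0xFFFFFFFFu

/* Returns true when the doorbell value indicates a dead IOC. */
static bool ioc_looks_dead(uint32_t doorbell)
{
    return doorbell == SKETCH_DOORBELL_DEAD;
}
```

A periodic worker such as fault_reset_work would call a check like this
and start the removal path when it returns true.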

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: _scsih_smart_predicted_fault uses GFP_KERNEL in interrupt context
Anton Blanchard [Mon, 7 Nov 2011 11:05:21 +0000 (22:05 +1100)]
[SCSI] mpt2sas: _scsih_smart_predicted_fault uses GFP_KERNEL in interrupt context

_scsih_smart_predicted_fault is called in an interrupt and therefore
must allocate memory using GFP_ATOMIC.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: add missing allocation.
Dan Carpenter [Fri, 4 Nov 2011 18:25:01 +0000 (21:25 +0300)]
[SCSI] mpt2sas: add missing allocation.

There was supposed to be a kzalloc() here and the compiler complained
about it.
mpt2sas_scsih.c: In function ‘mpt2sas_scsih_reset_handler’:
mpt2sas_scsih.c:2807:21: warning: ‘fw_event’ may be used uninitialized in this function [-Wuninitialized]

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Bump driver version to 10.100.00.00
nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:54 +0000 (15:37 +0530)]
[SCSI] mpt2sas: Bump driver version to 10.100.00.00

Bump driver version to 10.100.00.00

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Fix for Panic when inactive volume is tried deleting
nagalakshmi.nandigama@lsi.com [Fri, 21 Oct 2011 04:38:07 +0000 (10:08 +0530)]
[SCSI] mpt2sas: Fix for Panic when inactive volume is tried deleting

The driver was setting the action to MPI2_CONFIG_ACTION_PAGE_READ_CURRENT,
which only returns active volumes. In order to get info on inactive volumes,
the driver needs to change the action to
MPI2_RAID_PGAD_FORM_GET_NEXT_CONFIGNUM and traverse each config until the
iocstatus MPI2_IOCSTATUS_CONFIG_INVALID_PAGE is returned.
Also changed the driver to remove the sas_device object instance
when the driver returns "1" from the slave_configure callback,
and fixed the code to report hot spares to the operating system with a
/dev/sg device assigned.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Fix for issue Port Reset taking long time(around 5 mins) to complete...
nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:37 +0000 (15:37 +0530)]
[SCSI] mpt2sas: Fix for issue Port Reset taking long time(around 5 mins) to complete while issued during creating a volume

This happens because the slave_configure routine is called while a
host reset is active: the config page reads fail, and the driver
attempts to add the device with stale config data.

To fix the issue, added error checking in slave_configure to detect
failing configuration page reads and return "1" so that the device is
not configured.  The config page reads fail if a RAID volume is
configured while a host reset is being issued, so the driver was
reading stale data and proceeding to add the volume.  Returning an
error ensures the volume is not configured.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Fix for deadlock between hot plug worker threads and host reset context
nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:24 +0000 (15:37 +0530)]
[SCSI] mpt2sas: Fix for deadlock between hot plug worker threads and host reset context

This is due to the driver reporting a device as missing to the OS, and the OS then
sending a SYNC_CACHE request to the driver while the IO queues are locked due to host reset.

To fix the issue, the driver now wakes up the port enable context
immediately when it receives the reply message, instead of waiting
on the hot plug worker threads.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Fix for dead lock occurring between host_lock and sas_device_lock
nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:14 +0000 (15:37 +0530)]
[SCSI] mpt2sas: Fix for dead lock occurring between host_lock and sas_device_lock

Fix for a deadlock occurring between host_lock and sas_device_lock.

The deadlock is between two spinlocks: shost->host_lock
and the driver's ioc->sas_device_lock.

The fix is to rearrange the code in the FW/driver device removal
handshake so that ioc->sas_device_lock is not held when
shost->host_lock is taken.

[jejb: zero initialise sas_address to fix spurious compiler warning]
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Fix drives not getting properly deleted if sas cable is removed while...
nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:07:00 +0000 (15:37 +0530)]
[SCSI] mpt2sas: Fix drives not getting properly deleted if sas cable is removed while host reset is active

The fix is in the driver-firmware handshake device removal code. We
need to read the controller ioc_state to see if the controller is OPERATIONAL
prior to sending target reset and OP_REMOVE. Previously the code checked
the ioc->shost_recovery flag, which is always set when host reset is
active, thus preventing drives from getting properly deleted.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Fix failure message displayed during diag reset
nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:06:47 +0000 (15:36 +0530)]
[SCSI] mpt2sas: Fix failure message displayed during diag reset

The fix is to inhibit the warning message in _scsih_get_sas_address
when the MPI2_IOCSTATUS_CONFIG_INVALID_PAGE ioc status is returned.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Fix for system hang when discovery in progress
nagalakshmi.nandigama@lsi.com [Fri, 21 Oct 2011 04:36:33 +0000 (10:06 +0530)]
[SCSI] mpt2sas: Fix for system hang when discovery in progress

Fix for issue: while discovery is in progress, hot unplug and hot plug of
an enclosure connected to the controller card causes the system to hang.

If a device is removed while it is being detected at driver load time,
the device that is no longer present must not be added
to the list. The code in _scsih_probe_sas() is rearranged so that
devices that failed to be detected are not added to the list.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: New feature - Fast Load Support
nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:06:26 +0000 (15:36 +0530)]
[SCSI] mpt2sas: New feature - Fast Load Support

New feature Fast Load Support.

(1) Asynchronous SCSI scanning: This will allow the driver to scan
for devices in parallel while other device drivers are loading at
the same time. This will reduce the time it takes for the
OS to load.

(2) Reporting Devices while port enable is active: This feature will
allow devices to be reported to the OS immediately while port enable is
active. The previous implementation waits for port enable to complete,
and then reports devices. This feature is only enabled on IT firmware
configurations when no boot device is configured in the BIOS Configuration
Utility; otherwise the driver waits until port enable completes before
reporting devices. For IR firmware, this feature is turned off. This feature is to
address large SAS topologies (>100 drives) when the boot OS is using an onboard
SATA device, in other words, when the boot device is not
connected to our controller.

(3) Scanning for devices after diagnostic reset completes: A new routine
_scsih_scan_start is added. This will scan the expander pages, IR pages,
and sas device pages, then report new devices to the SCSI mid layer. It
seems the driver does not support adding devices while diagnostic reset
is active. Apparently this is due to the sanity checks on the
ioc->shost_recovery flag throughout the context of the kernel work thread FIFO
and the mpt2sas_fw_work.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Manual merge of upstream commit #921cd802.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
13 years ago[SCSI] mpt2sas: MPI next revision header update
nagalakshmi.nandigama@lsi.com [Wed, 19 Oct 2011 10:06:05 +0000 (15:36 +0530)]
[SCSI] mpt2sas: MPI next revision header update

1) Added ProxyVF_ID field to Configuration Request message.
2) Added IO Unit Page 8, IO Unit Page 9, and IO Unit Page 10.
3) Added SASNotifyPrimitiveMasks field to IOC Page 7.
4) Added SAS NOTIFY Primitive event.
5) Added Temperature Threshold Event.
6) Added Host Message Event.
7) Added Send Host Message request and reply.

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: take size of pointed value, not pointer
Julia Lawall [Fri, 16 Sep 2011 06:57:34 +0000 (08:57 +0200)]
[SCSI] mpt2sas: take size of pointed value, not pointer

Sizeof a pointer-typed expression returns the size of the pointer, not that
of the pointed data.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression *e;
type T;
identifier f;
@@

f(...,(T)e,...,
-sizeof(e)
+sizeof(*e)
,...)
// </smpl>
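
The class of bug that the semantic patch targets can be shown in a few
lines of plain C. The struct and helper names below are hypothetical; the
point is only that sizeof applied to a pointer-typed expression measures
the pointer, while sizeof applied to the dereferenced expression measures
the object it points to.

```c
#include <stddef.h>

/* Hypothetical structure standing in for whatever the pointer refers to. */
struct demo_event {
    int  id;
    char payload[60];
};

/* Buggy form: sizeof(e) is the size of the pointer itself. */
static size_t size_of_pointer(struct demo_event *e)
{
    return sizeof(e);
}

/* Fixed form: sizeof(*e) is the size of the pointed-to structure.
 * sizeof does not evaluate its operand, so e is never dereferenced. */
static size_t size_of_pointee(struct demo_event *e)
{
    return sizeof(*e);
}
```

Passing the first value to a copy or clear routine touches only the first
few bytes of the structure, which is why the patch swaps in sizeof(*e).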

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Bump driver version 09.100.00.01
nagalakshmi.nandigama@lsi.com [Thu, 8 Sep 2011 01:43:35 +0000 (07:13 +0530)]
[SCSI] mpt2sas: Bump driver version 09.100.00.01

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Added NUMA IO support in driver which uses multi-reply queue support...
nagalakshmi.nandigama@lsi.com [Thu, 8 Sep 2011 00:48:50 +0000 (06:18 +0530)]
[SCSI] mpt2sas: Added NUMA IO support in driver which uses multi-reply queue support of the HBA

Support added for controllers capable of multi reply queues.

The following are the modifications to the driver to support NUMA.

(1) Create the new structure adapter_reply_queue to contain the reply queue
    info for every msix vector.  This object will contain a
    reply_post_host_index, reply_post_free for each instance, msix_index, among
    other parameters.  We will track all the reply queues on a linked list called
    ioc->reply_queue_list. Each reply queue is aligned with each IRQ, and is
    passed to the interrupt via the bus_id parameter.

(2) The driver will figure out the msix_vector_count from the PCIe MSIX
    capabilities register instead of the IOC Facts->MaxMSIxVectors. This is
    because the firmware is not filling in this field until the driver has
    already registered MSIX support.

(3) If the ioc_facts reports that the controller is MSIX compatible in the
    capabilities, then the driver will request multiple irqs.  This count
    is calculated as the minimum of the number of online cpus and
    ioc->msix_vector_count.  This count is reported to firmware in the
    ioc_init request.

(4) New routines _base_free_irq and _base_request_irq were added, so
    registering and freeing msix vectors is done through a simple function API.

(5) The new routine _base_assign_reply_queues was added to align the msix
    indexes across cpus. This will initialize the array called
    ioc->cpu_msix_table.  This array is looked up on every MPI request so the
    MSIxIndex is set appropriately.

(6) A new shost sysfs attribute was added to report the reply_queue_count.

(7) User needs to set the affinity cpu mask, so the interrupts occur on the
    same cpu that sent the original request.
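
Items (3) and (5) can be sketched in a few lines. This is a simplified
illustration, not the driver's implementation: the function names are
hypothetical, and the round-robin mapping from CPU to MSI-X index is an
assumption, since the commit message does not say how
_base_assign_reply_queues distributes the vectors.

```c
/* Item (3): number of reply queues is the smaller of the online CPU count
 * and the MSI-X vector count. */
static int sketch_reply_queue_count(int online_cpus, int msix_vector_count)
{
    return online_cpus < msix_vector_count ? online_cpus : msix_vector_count;
}

/* Item (5), assumption: spread CPUs over the queues round-robin. The driver
 * caches a mapping like this in ioc->cpu_msix_table so that every MPI
 * request can set its MSIxIndex with a table lookup. */
static int sketch_msix_index_for_cpu(int cpu, int queue_count)
{
    if (queue_count <= 0)
        return 0;               /* fall back to the first (only) queue */
    return cpu % queue_count;
}
```

With 8 online CPUs and 4 vectors, this gives 4 reply queues, and CPU 5
would post its requests with MSI-X index 1.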

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years agoRemove unneeded version.h includes from drivers/scsi/
Jesper Juhl [Mon, 1 Aug 2011 21:27:12 +0000 (23:27 +0200)]
Remove unneeded version.h includes from drivers/scsi/

It was pointed out by 'make versioncheck' that some includes of
linux/version.h are not needed in drivers/scsi/.
This patch removes them.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
13 years ago[SCSI] mpt2sas: Added missing mpt2sas_base_detach call from scsih_remove context
kashyap.desai@lsi.com [Thu, 4 Aug 2011 11:17:50 +0000 (16:47 +0530)]
[SCSI] mpt2sas: Added missing mpt2sas_base_detach call from scsih_remove context

The mpt2sas_base_detach() call was removed from _scsih_remove() during
some code shuffling, mainly while adding the code for
scsih_shutdown().  I have added back mpt2sas_base_detach(), which will
get called from _scsih_remove().

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: WarpDrive Infinite command retries due to wrong scsi command entry...
Kashyap, Desai [Tue, 5 Jul 2011 07:10:23 +0000 (12:40 +0530)]
[SCSI] mpt2sas: WarpDrive Infinite command retries due to wrong scsi command entry in MPI message

Issue:

This issue is seen on LSI H/W WarpDrive SSS6200. When a failed direct I/O
is retried as volume I/O, the scmd field in the internal lookup table gets
cleared, and because of that the retried volume I/O never gets reported
as completed to the SML.

Result:

I/O timeout, and the error handling thread kicks off.

Fix:

Set the scmd back in the lookup table before retrying the failed
direct I/O.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Bump version 09.100.00.00
Kashyap, Desai [Tue, 14 Jun 2011 05:27:51 +0000 (10:57 +0530)]
[SCSI] mpt2sas: Bump version 09.100.00.00

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: fix broadcast AEN and task management issue
Kashyap, Desai [Tue, 14 Jun 2011 05:26:43 +0000 (10:56 +0530)]
[SCSI] mpt2sas: fix broadcast AEN and task management issue

Proper handling of target reset in a multi-initiator environment.

Clean up in broadcast change handling:
(1) Need to look at the status of each task management request, and retry
    the TM when there are failures.
(2) Need to quiesce IO so the driver doesn't take on more IO requests while
    it's in the middle of sending a TM request to firmware.
(3) Add support to keep track of how many pending broadcast AEN events
    are received while the broadcast handling is active, then loop back at
    the end of this routine if there were any events received.

Clean up in mpt2sas_scsih_issue_tm routine:
(1) Make sure proper status is returned when host reset fails.
(2) Clean up sanity checks near end of routine, ensuring all outstanding
    IOs were completed.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas: Set max_sector count from module parameter
Kashyap, Desai [Tue, 14 Jun 2011 05:26:12 +0000 (10:56 +0530)]
[SCSI] mpt2sas: Set max_sector count from module parameter

This feature allows overriding the default
max_sectors setting at load time by taking max_sectors as a
command line option when loading the driver.  The setting is
currently hard-coded in the driver to 8192 sectors (4MB transfers).
If max_sectors is specified at load time, the minimum allowed
setting is 64 and the maximum is 8192.  The driver will
adjust the setting to be on an even boundary. If max_sectors is not
specified, the driver will default to 8192.
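
The validation rules above can be sketched as a small helper. The names
are hypothetical, treating a non-positive value as "not specified" is an
assumption, and "even boundary" is read here as rounding down to an even
number of sectors.

```c
/* Bounds quoted in the commit message; macro names are hypothetical. */
#define SKETCH_MIN_SECTORS      64
#define SKETCH_MAX_SECTORS      8192
#define SKETCH_DEFAULT_SECTORS  8192

/* Assumption: requested <= 0 means max_sectors was not given at load time. */
static int sanitize_max_sectors(int requested)
{
    if (requested <= 0)
        return SKETCH_DEFAULT_SECTORS;
    if (requested < SKETCH_MIN_SECTORS)
        requested = SKETCH_MIN_SECTORS;
    if (requested > SKETCH_MAX_SECTORS)
        requested = SKETCH_MAX_SECTORS;
    return requested & ~1;      /* clear bit 0: round down to an even count */
}
```

Under these assumptions a request for 1001 sectors is adjusted to 1000,
and a request for 10 sectors is raised to the minimum of 64.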

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years ago[SCSI] mpt2sas MPI next revision header update
Kashyap, Desai [Tue, 14 Jun 2011 05:25:45 +0000 (10:55 +0530)]
[SCSI] mpt2sas MPI next revision header update

mpt2sas driver revision q header update:

(1) Modified the descriptions of the LocalAddress bit in the
    Flags field of the MPI SGE Format description and the MPI
    Simple Element.
(2) Modified Data Location Address Space bits in the Flags field
    of the IEEE Chain Element.
(3) Added more detail to the description of the DataLength field
    for the SCSI IO Request and Target Assist Request. Removed
    restriction on using chained SGLs when using multicast or
    bidirectional support.
(4) In Manufacturing Page 7, added ReceptacleID field to
    ConnectorInfo, and reworked how the Pinout field is used.
(5) In IO Unit Page 7, added BoardTemperature and
    BoardTemperatureUnits fields.
(6) In IOC Page 1, changed CoalescingTimeout to units of
    half-microsecond and updated descriptions.
(7) Modified descriptions of SATASlumberTimeout and
    SASSlumberTimeout fields in SAS IO Unit Page 5 to indicate
    the timers start after partial mode is entered.
(8) Added Extended Manufacturing configuration pages.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
13 years agoMerge branch 'uek-2.6.39-stable' of git://ca-git.us.oracle.com/linux-joejin-public...
Guru Anbalagane [Fri, 16 Dec 2011 00:02:39 +0000 (16:02 -0800)]
Merge branch 'uek-2.6.39-stable' of git://ca-git.us.oracle.com/linux-joejin-public into uek2-stable

13 years agoMerge branch 'directio' of git://ca-git.us.oracle.com/linux-dkleikam-public into...
Guru Anbalagane [Thu, 15 Dec 2011 22:25:25 +0000 (14:25 -0800)]
Merge branch 'directio' of git://ca-git.us.oracle.com/linux-dkleikam-public into uek2-stable

13 years agoMerge branch 'uek2-merge' of git://oss.oracle.com/git/kwilk/xen into uek2-stable
Guru Anbalagane [Thu, 15 Dec 2011 22:12:13 +0000 (14:12 -0800)]
Merge branch 'uek2-merge' of git://oss.oracle.com/git/kwilk/xen into uek2-stable

13 years agoMerge branches 'stable/pci.fixes-3.2' and 'stable/e820-3.2.rebased' into uek2-merge
Konrad Rzeszutek Wilk [Thu, 15 Dec 2011 21:25:32 +0000 (16:25 -0500)]
Merge branches 'stable/pci.fixes-3.2' and 'stable/e820-3.2.rebased' into uek2-merge

* stable/pci.fixes-3.2:
  xen/swiotlb: Use page alignment for early buffer allocation.

* stable/e820-3.2.rebased:
  xen: only limit memory map to maximum reservation for domain 0.

Conflicts:
arch/x86/xen/setup.c

13 years agoxen/swiotlb: Use page alignment for early buffer allocation.
Konrad Rzeszutek Wilk [Thu, 15 Dec 2011 16:28:46 +0000 (11:28 -0500)]
xen/swiotlb: Use page alignment for early buffer allocation.

This fixes an odd bug found on a Dell PowerEdge 1850/0RC130
(BIOS A05 01/09/2006) where all of the modules doing pci_set_dma_mask
would fail with:

ata_piix 0000:00:1f.1: enabling device (0005 -> 0007)
ata_piix 0000:00:1f.1: can't derive routing for PCI INT A
ata_piix 0000:00:1f.1: BMDMA: failed to set dma mask, falling back to PIO

The issue was that the Xen-SWIOTLB was allocated such that the end of
the buffer was straddling a page (and was also above 4GB). The fix,
spotted by Kalev Leonid, was to piggyback on git commit
e79f86b2ef9c0a8c47225217c1018b7d3d90101c "swiotlb: Use page alignment
for early buffer allocation", which says:

We could call free_bootmem_late() if swiotlb is not used, and
it will shrink to page alignment.

So alloc them with page alignment at first, to avoid lose two pages

And doing that fixes the outstanding issue.
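
The alignment idea can be illustrated with generic rounding helpers. This
is not the kernel's code: the 4096-byte page size is an assumption, and
both function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Round a size or address up to the next page boundary. */
static uint64_t sketch_page_align_up(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~((uint64_t)SKETCH_PAGE_SIZE - 1);
}

/* Returns 1 when a buffer [start, start + len) ends exactly on a page
 * boundary, 0 when its end falls in the middle of a page. */
static int sketch_ends_on_page_boundary(uint64_t start, uint64_t len)
{
    return ((start + len) % SKETCH_PAGE_SIZE) == 0;
}
```

Rounding the requested size up before allocating from a page-aligned
start guarantees that the end of the buffer also lands on a page boundary.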

CC: stable@kernel.org
Suggested-by: "Kalev, Leonid" <Leonid.Kalev@ca.com>
Reported-and-Tested-by: "Taylor, Neal E" <Neal.Taylor@ca.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen: only limit memory map to maximum reservation for domain 0.
Ian Campbell [Wed, 14 Dec 2011 12:16:08 +0000 (12:16 +0000)]
xen: only limit memory map to maximum reservation for domain 0.

d312ae878b6a "xen: use maximum reservation to limit amount of usable RAM"
clamped the total amount of RAM to the current maximum reservation. This is
correct for dom0 but is not correct for guest domains. To boot a guest
"pre-ballooned" (e.g. with memory=1G but maxmem=2G) and so allow for
future memory expansion, the guest must derive max_pfn from the e820 provided by
the toolstack and not from the current maximum reservation (which can reflect only
the current maximum, not the guest lifetime max). The existing algorithm
already behaves correctly if we do not artificially limit the maximum
number of pages for the guest case.

For a guest booted with maxmem=512, memory=128 this results in:
 [    0.000000] BIOS-provided physical RAM map:
 [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
 [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
-[    0.000000]  Xen: 0000000000100000 - 0000000008100000 (usable)
-[    0.000000]  Xen: 0000000008100000 - 0000000020800000 (unusable)
+[    0.000000]  Xen: 0000000000100000 - 0000000020800000 (usable)
...
 [    0.000000] NX (Execute Disable) protection: active
 [    0.000000] DMI not present or invalid.
 [    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
 [    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
-[    0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
+[    0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
 [    0.000000] initial memory mapped : 0 - 027ff000
 [    0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
-[    0.000000] init_memory_mapping: 0000000000000000-0000000008100000
-[    0.000000]  0000000000 - 0008100000 page 4k
-[    0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
+[    0.000000] init_memory_mapping: 0000000000000000-0000000020800000
+[    0.000000]  0000000000 - 0020800000 page 4k
+[    0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
 [    0.000000] xen: setting RW the range 27e8000 - 27ff000
 [    0.000000] 0MB HIGHMEM available.
-[    0.000000] 129MB LOWMEM available.
-[    0.000000]   mapped low ram: 0 - 08100000
-[    0.000000]   low ram: 0 - 08100000
+[    0.000000] 520MB LOWMEM available.
+[    0.000000]   mapped low ram: 0 - 20800000
+[    0.000000]   low ram: 0 - 20800000

With this change "xl mem-set <domain> 512M" will successfully increase the
guest RAM (by reducing the balloon).

There is no change for dom0.
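
The last_pfn and LOWMEM figures in the log above are related by simple
arithmetic. The following is only a sketch, assuming 4 KiB pages (as the
"page 4k" lines suggest) and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

#define PFN_SKETCH_PAGE_SHIFT 12u  /* assumed 4 KiB pages */

/* Convert a page frame number into the number of whole MiB below it. */
static uint64_t sketch_pfn_to_mib(uint64_t pfn)
{
    assert(pfn <= (UINT64_MAX >> PFN_SKETCH_PAGE_SHIFT)); /* shift must not overflow */
    return (pfn << PFN_SKETCH_PAGE_SHIFT) >> 20;
}
```

Under these assumptions last_pfn = 0x8100 corresponds to 129 MiB and
last_pfn = 0x20800 to 520 MiB, matching the LOWMEM lines before and after
the change.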

Reported-and-Tested-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: stable@kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen: Enable CONFIG_XEN_WDT so that we can reboot the box in case the dom0 is hanged.
Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 21:08:52 +0000 (16:08 -0500)]
xen: Enable CONFIG_XEN_WDT so that we can reboot the box in case the dom0 is hanged.

It does require the generic watchdog RPM to be installed and used.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoMerge branch 'uek2-merge' of git://oss.oracle.com/git/kwilk/xen into uek2-stable
Guru Anbalagane [Thu, 15 Dec 2011 06:11:42 +0000 (22:11 -0800)]
Merge branch 'uek2-merge' of git://oss.oracle.com/git/kwilk/xen into uek2-stable

13 years ago[firmware] radeon: Add License for raedon firmware files
Joe Jin [Thu, 15 Dec 2011 02:57:08 +0000 (10:57 +0800)]
[firmware] radeon: Add License for raedon firmware files

commit 40d1bb513f02ba9c519228c81f948407cc5f395b added the missing firmware
from linux-firmware.git but omitted the LICENSE file.

Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agonetxen: Upgrade netxen_nic driver to v4.0.77
Joe Jin [Thu, 15 Dec 2011 02:04:11 +0000 (10:04 +0800)]
netxen: Upgrade netxen_nic driver to v4.0.77

Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: adding loopback support
Joe Jin [Thu, 15 Dec 2011 02:39:33 +0000 (10:39 +0800)]
mlx4_en: adding loopback support

commit 60d6fe99e4a507f77b63c090eb8aacb67e21687a
Author: Amir Vadai <amirv@mellanox.co.il>
Date:   Sat Nov 26 19:55:19 2011 +0000

    net/mlx4_en: adding loopback support

    Device must be in promiscuous mode or DMAC must be same as the host MAC, or
    else packet will be dropped by the HW rx filtering.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: fix WOL handlers were always looking at port2 capability bit
Joe Jin [Thu, 15 Dec 2011 01:37:32 +0000 (09:37 +0800)]
mlx4_en: fix WOL handlers were always looking at port2 capability bit

commit 559a9f1d354b577af28f84181751820ff7d29feb
Author: Oren Duer <oren@mellanox.co.il>
Date:   Sat Nov 26 19:55:15 2011 +0000

    net/mlx4_en: fix WOL handlers were always looking at port2 capability bit

    There are 2 capability bits for WOL, one for each port.
    WOL handlers were looking only on the second bit, regardless of the port.

Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: using non collapsed CQ on TX
Joe Jin [Thu, 15 Dec 2011 01:37:11 +0000 (09:37 +0800)]
mlx4_en: using non collapsed CQ on TX

commit f0ab34f011d805ce5b1a341409c9c26f0fc8252b
Author: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date:   Sat Nov 26 19:55:10 2011 +0000

    net/mlx4_en: using non collapsed CQ on TX

    Moving to regular Completion Queue implementation (not collapsed)
    Completion for each transmitted packet is written to new entry.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Remove FCS bytes from packet length.
Joe Jin [Thu, 15 Dec 2011 01:36:53 +0000 (09:36 +0800)]
mlx4_en: Remove FCS bytes from packet length.

commit 4a5f4dd8595a3d3cdf75db7247b644ae37f5d460
Author: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date:   Mon Nov 14 14:25:36 2011 -0500

    mlx4_en: Remove FCS bytes from packet length.

    When HW doesn't remove FCS bytes they are reported in the completion
    byte count, we don't need to take them to skb.
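
    A rough sketch of the length adjustment described above. The function
    and parameter names are hypothetical, not the driver's; the only fact
    relied on is that the Ethernet FCS is 4 bytes long:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ETH_FCS_LEN 4u  /* Ethernet frame check sequence length */

/*
 * Return the number of bytes to hand to the skb. If the hardware left the
 * FCS in place, it is included in the completion byte count and is dropped
 * here; otherwise the count is used as-is.
 */
static uint32_t sketch_skb_len(uint32_t completion_byte_cnt, int hw_strips_fcs)
{
    if (hw_strips_fcs)
        return completion_byte_cnt;
    assert(completion_byte_cnt >= SKETCH_ETH_FCS_LEN);
    return completion_byte_cnt - SKETCH_ETH_FCS_LEN;
}
```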

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: fix skb truesize underestimation
Joe Jin [Thu, 15 Dec 2011 01:36:34 +0000 (09:36 +0800)]
mlx4_en: fix skb truesize underestimation

commit 90278c9ffb8a92672d60a618a58a99e2370a98ac
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Wed Oct 19 18:49:52 2011 +0000

    mlx4_en: fix skb truesize underestimation

    skb->truesize must account for allocated memory, not the used part of
    it. Doing this work is important to avoid unexpected OOM situations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Updating driver version
Joe Jin [Thu, 15 Dec 2011 01:36:20 +0000 (09:36 +0800)]
mlx4_en: Updating driver version

commit 1e5c22cde3b85737921d3ec6ecf2c356e5b64ea7
Author: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date:   Tue Oct 18 01:51:36 2011 +0000

    mlx4_en: Updating driver version

    Driver version updated to 1.5.4.2

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Adding rxhash support
Joe Jin [Thu, 15 Dec 2011 01:36:01 +0000 (09:36 +0800)]
mlx4_en: Adding rxhash support

commit ad86107f7ba38f36597d6cfe9ed2ddfd2c88aee9
Author: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date:   Tue Oct 18 01:51:24 2011 +0000

    mlx4_en: Adding rxhash support

    Moving to Toeplitz function in RSS calculation.
    Reporting rxhash in skb.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Recording rx queue for gro packets
Joe Jin [Thu, 15 Dec 2011 01:35:44 +0000 (09:35 +0800)]
mlx4_en: Recording rx queue for gro packets

commit 3b61008d88b55467cffee5db9b1e0cf32edbe8ac
Author: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date:   Tue Oct 18 01:51:09 2011 +0000

    mlx4_en: Recording rx queue for gro packets

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Checksum counters per ring
Joe Jin [Thu, 15 Dec 2011 01:35:20 +0000 (09:35 +0800)]
mlx4_en: Checksum counters per ring

commit ad04378cecca9c33b5ea3e46aa4ed71b15e0be0c
Author: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date:   Tue Oct 18 01:50:56 2011 +0000

    mlx4_en: Checksum counters per ring

    Not updating common counters from data path.
    The checksum counters are per ring, summarizing them when collecting statistics.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Controlling FCS header removal
Joe Jin [Thu, 15 Dec 2011 01:35:00 +0000 (09:35 +0800)]
mlx4_en: Controlling FCS header removal

commit f3a9d1f25dfeadf22c775880633a587cc6778872
Author: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date:   Tue Oct 18 01:50:42 2011 +0000

    mlx4_en: Controlling FCS header removal

    Canceling FCS removal where FW allows for better alignment
    of incoming data.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4: Fix vlan table overflow
Joe Jin [Thu, 15 Dec 2011 01:34:37 +0000 (09:34 +0800)]
mlx4: Fix vlan table overflow

commit e72ebf5a578464204c8418d7d9b375333bb33161
Author: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date:   Tue Oct 18 01:50:29 2011 +0000

    mlx4: Fix vlan table overflow

    Prevent overflow when trying to register more Vlans than the Vlan table in
    HW is configured for.
    Need to take into account that the first 2 entries are reserved.
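
    The bound can be sketched as follows. This is an illustration under
    assumed names (the helper and its parameters are hypothetical), not the
    driver's actual check:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VLAN_RESERVED 2u  /* first two table entries are reserved */

/* Return 1 if one more VLAN can be registered, 0 if the table is full. */
static int sketch_vlan_has_room(uint32_t table_size, uint32_t registered)
{
    assert(table_size >= SKETCH_VLAN_RESERVED);
    return registered < table_size - SKETCH_VLAN_RESERVED;
}
```

    For a table of 128 entries, only 126 are usable under this assumption.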

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Adding 40gb speed report for ethtool
Joe Jin [Thu, 15 Dec 2011 01:33:49 +0000 (09:33 +0800)]
mlx4_en: Adding 40gb speed report for ethtool

commit f0ec7177e239ed94a398a6c70b38530ff1393cb7
Author: Alexander Guller <alexg@mellanox.com>
Date:   Sun Oct 9 05:29:42 2011 +0000

    mlx4_en: Adding 40gb speed report for ethtool

    Query port will now identify a 40G Ethernet speed.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Fix crash upon device initialization error
Joe Jin [Thu, 15 Dec 2011 01:32:14 +0000 (09:32 +0800)]
mlx4_en: Fix crash upon device initialization error

commit 4234144f5ca69a0a13d5adae6c94b6937c52541f
Author: Alexander Guller <alexg@mellanox.com>
Date:   Sun Oct 9 05:29:35 2011 +0000

    mlx4_en: Fix crash upon device initialization error

    Netdevice was being freed without being unregistered first if
    mlx4_SET_PORT_general or mlx4_INIT_PORT failed.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Fix QP number calculation according to module param
Joe Jin [Thu, 15 Dec 2011 01:31:36 +0000 (09:31 +0800)]
mlx4_en: Fix QP number calculation according to module param

commit 999bb4b3831abd6ad53023a0b8e5d304875927dd
Author: Alexander Guller <alexg@mellanox.com>
Date:   Sun Oct 9 05:29:26 2011 +0000

    mlx4_en: Fix QP number calculation according to module param

    Number of bits taken from mac table index in QP
    calculation should be based on log_num_mac parameter.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Added missing iounmap upon releasing a device
Joe Jin [Thu, 15 Dec 2011 01:30:46 +0000 (09:30 +0800)]
mlx4_en: Added missing iounmap upon releasing a device

commit 7398af403f621418fa05c6936cac34aa06b5a758
Author: Alexander Guller <alexg@mellanox.com>
Date:   Sun Oct 9 05:27:11 2011 +0000

    mlx4_en: Added missing iounmap upon releasing a device

    Fixed a memory leak caused by missing iounmap when device
    is being released.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: Sharon Cohen <sharonc@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Adjusting moderation per each ring
Joe Jin [Thu, 15 Dec 2011 01:30:04 +0000 (09:30 +0800)]
mlx4_en: Adjusting moderation per each ring

commit 6b4d8d9fd1acb9ff230810793b363dbdb267b892
Author: Alexander Guller <alexg@mellanox.com>
Date:   Sun Oct 9 05:38:23 2011 +0000

    mlx4_en: Adjusting moderation per each ring

    Moderation is now done per ring and coalescing is enabled
    by set_ring_param in ethtool.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Removing reserve vectors
Joe Jin [Thu, 15 Dec 2011 01:27:23 +0000 (09:27 +0800)]
mlx4_en: Removing reserve vectors

commit fe0af03c69abc2178fc4667664726ec1f688539b
Author: Alexander Guller <alexg@mellanox.com>
Date:   Sun Oct 9 05:26:46 2011 +0000

    mlx4_en: Removing reserve vectors

    Fixed a bug where ring size change caused insufficient memory
    upon driver restart due to unreleased EQs.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_en: Assigning TX irq per ring
Joe Jin [Thu, 15 Dec 2011 01:26:31 +0000 (09:26 +0800)]
mlx4_en: Assigning TX irq per ring

commit 76532d0c7e7424914ab6f24683c63e50f0a08f1c
Author: Alexander Guller <alexg@mellanox.com>
Date:   Sun Oct 9 05:26:31 2011 +0000

    mlx4_en: Assigning TX irq per ring

    Until now only RX rings used irq per ring
    and TX used only one per port.
    From now on, both of them will use the
    irq per ring while RX & TX ring[i] will
    use the same irq.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: Sharon Cohen <sharonc@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_core: Clean up error flow in mlx4_register_mac()
Roland Dreier [Thu, 6 Oct 2011 16:33:11 +0000 (09:33 -0700)]
mlx4_core: Clean up error flow in mlx4_register_mac()

Fix a leak of entry if radix_tree_insert() fails.

Also, reduce the indentation and make the flow easier to read by
sticking to the conventional

    err = do_something();
    if (err)
            return err;

    err = do_another();
    if (err)
            return err;

rather than mixing the direction of the test as

    err = do_something();
    if (!err) {
            err = do_another();
            if (err)
                    return err;
    } else
            return err;
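
The leak fix follows the same early-return shape. Below is a userspace
sketch only: sketch_insert() stands in for radix_tree_insert(), and all
names are hypothetical rather than taken from mlx4_register_mac():

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int sketch_live_allocs;  /* outstanding allocations, for illustration */

/* Stand-in for an insert that may fail (here: on request). */
static int sketch_insert(void *entry, int fail)
{
    (void)entry;
    return fail ? -ENOMEM : 0;
}

/* Allocate an entry and insert it; free the entry if the insert fails. */
static int sketch_register(int make_insert_fail, void **out)
{
    void *entry;
    int err;

    entry = malloc(16);
    if (!entry)
        return -ENOMEM;
    sketch_live_allocs++;

    err = sketch_insert(entry, make_insert_fail);
    if (err) {
        free(entry);            /* without this, the entry would leak */
        sketch_live_allocs--;
        return err;
    }

    *out = entry;
    return 0;
}
```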

Signed-off-by: Roland Dreier <roland@purestorage.com>
(cherry picked from commit 0f6740c7c455693f719580f34bb8afa8a298ea36)

Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4: decreasing ref count when removing mac
Yevgeny Petrilin [Thu, 4 Aug 2011 01:05:12 +0000 (01:05 +0000)]
mlx4: decreasing ref count when removing mac

For older FW versions, when a Mac address is removed from the Mac table,
we should set the reference count to 0 for the corresponding Mac index.
Fixes a bug where removing Mac from the table still left that entry as
invalid.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Tested-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 20e72a44098641f0c4de34a31287a93e006afb5b)

Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4: Fixing Ethernet unicast packet steering
Yevgeny Petrilin [Wed, 3 Aug 2011 23:38:59 +0000 (16:38 -0700)]
mlx4: Fixing Ethernet unicast packet steering

For older FW versions, fixing the usage of per port Mac table.
For each port we must define the base QP number, which is passed
to the HW.
Setting the correct value in SET_PORT FW command to enable the steering.

Reported-by: Roland Dreier <roland@purestorage.com>
Tested-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 06fa0a883a01a34a0449ec116c5288c1d196b4b0)

Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4: do vlan cleanup
Jiri Pirko [Wed, 20 Jul 2011 04:54:22 +0000 (04:54 +0000)]
mlx4: do vlan cleanup

- unify vlan and nonvlan path
- kill priv->vlgrp and mlx4_en_vlan_rx_register

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f1b553fbe73bfad38f41269d1c7a7ce3176d9539)

Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_core: Read extended capabilities into the flags field
Or Gerlitz [Thu, 7 Jul 2011 19:19:29 +0000 (19:19 +0000)]
mlx4_core: Read extended capabilities into the flags field

Query another dword containing up to 32 extended device capabilities
and merge it into struct mlx4_caps.flags.  Update the code that
handles the current extended device capabilities (e.g UDP RSS, WoL,
vep steering, etc) to use the extended device cap flags field instead
of a field per extended capability.  Initial patch done by Eli Cohen
<eli@mellanox.co.il>.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
(cherry picked from commit ccf863219675aa86bebdd6a2806acb8176478e37)

Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4_core: Extend capability flags to 64 bits
Or Gerlitz [Wed, 15 Jun 2011 14:41:42 +0000 (14:41 +0000)]
mlx4_core: Extend capability flags to 64 bits

The latest firmware adds a second dword containing more device flags,
so extend the device capabilities flags field from 32 to 64 bits.
Derived from patch by Eli Cohen <eli@mellanox.co.il>
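
As a sketch of what a 64-bit flags field built from two dwords looks like:
the helper names are hypothetical, and placing the second dword in the upper
32 bits is an assumption made for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Combine the original flags dword with the second (extended) dword. */
static uint64_t sketch_merge_caps(uint32_t base_flags, uint32_t ext_flags)
{
    return ((uint64_t)ext_flags << 32) | base_flags;
}

/* Test a single capability bit in the combined 64-bit field. */
static int sketch_has_cap(uint64_t flags, unsigned int bit)
{
    assert(bit < 64);
    return (int)((flags >> bit) & 1u);
}
```

Callers can then test any capability, old or extended, through one field
instead of one field per extended capability.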

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
(cherry picked from commit 52eafc68d601afd699b023201b0c6be5209f39ce)

Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agomlx4: use pci_dev->revision
Sergei Shtylyov [Thu, 23 Jun 2011 04:44:30 +0000 (04:44 +0000)]
mlx4: use pci_dev->revision

Commit 725c89997e03d71b09ea3c17c997da0712b9d835 (mlx4_en: Reporting HW revision
in ethtool -i) added code to read the revision ID from the PCI configuration
register while it's already stored by PCI subsystem in the 'revision' field of
'struct pci_dev'...

While at it, move the code being changed a bit in order to not break the
initialization sequence.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit aca7a3acb19a7a4b1084f6f2411f6eaf52dd79c0)

Signed-off-by: Joe Jin <joe.jin@oracle.com>
13 years agoAIO: Don't plug the I/O queue in do_io_submit()
Dave Kleikamp [Tue, 13 Dec 2011 19:49:16 +0000 (13:49 -0600)]
AIO: Don't plug the I/O queue in do_io_submit()

Asynchronous I/O latency to a solid-state disk greatly increased
between the 2.6.32 and 3.0 kernels. By removing the plug from
do_io_submit(), we observed a 34% improvement in the I/O latency.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
13 years agoMerge branch 'stable/bug.fixes-3.3.rebased' into uek2-merge
Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 17:34:43 +0000 (12:34 -0500)]
Merge branch 'stable/bug.fixes-3.3.rebased' into uek2-merge

* stable/bug.fixes-3.3.rebased:
  Revert "xen/pm_idle: Make pm_idle be default_idle under Xen."

13 years agoRevert "xen/pm_idle: Make pm_idle be default_idle under Xen."
Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 17:34:15 +0000 (12:34 -0500)]
Revert "xen/pm_idle: Make pm_idle be default_idle under Xen."

as it is already such in kernels that are 3.0 or earlier.
This reverts commit 9964aedb7350736b1f7a799d57ee92bbf4b99ea6.

13 years agoMerge branch 'stable/acpi-cpufreq.v3.rebased' into uek2-merge
Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 17:09:34 +0000 (12:09 -0500)]
Merge branch 'stable/acpi-cpufreq.v3.rebased' into uek2-merge

.. which is not yet upstream, although it has been posted:
https://lkml.org/lkml/2011/11/30/245

It still needs guidance from the ACPI maintainers, but they are busy
right now with ACPI v5.0, so for the time being we are carrying this
patch out of tree.

In the future we will have to revert this and insert the one that is in
the upstream kernel.

* stable/acpi-cpufreq.v3.rebased:
  ACPI: xen processor: set ignore_ppc to handle PPC event for Xen vcpu.
  ACPI: xen processor: add PM notification interfaces.
  ACPI: processor: override the interface of register acpi processor handler for Xen vcpu
  ACPI: add processor driver for Xen virtual CPUs.
  ACPI: processor: add __acpi_processor_[un]register_driver helpers.
  ACPI: processor: cache acpi_power_register in cx structure
  ACPI: processor: Don't setup cpu idle handler when we do not want them.
  ACPI: processor: export necessary interfaces
  xen/acpi: Domain0 acpi parser related platform hypercall

Conflicts:
drivers/xen/Makefile

13 years agoACPI: xen processor: set ignore_ppc to handle PPC event for Xen vcpu.
Kevin Tian [Wed, 19 Oct 2011 10:37:18 +0000 (18:37 +0800)]
ACPI: xen processor: set ignore_ppc to handle PPC event for Xen vcpu.

Xen acpi processor does not CPUFREQ_START, hence we need to set
ignore_ppc to handle PPC events.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoACPI: xen processor: add PM notification interfaces.
Kevin Tian [Wed, 19 Oct 2011 10:36:39 +0000 (18:36 +0800)]
ACPI: xen processor: add PM notification interfaces.

Since cpu power is controlled by the VMM in Xen, to provide
that information to the VMM we have to use a hypercall to exchange
power management state between the domain and the hypervisor.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoACPI: processor: override the interface of register acpi processor handler for Xen...
Tang Liang [Wed, 19 Oct 2011 10:33:46 +0000 (18:33 +0800)]
ACPI: processor: override the interface of register acpi processor handler for Xen vcpu

This patch calls the check which detects whether to override
the interface used to register the ACPI processor.

Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoACPI: add processor driver for Xen virtual CPUs.
Kevin Tian [Wed, 19 Oct 2011 10:16:51 +0000 (18:16 +0800)]
ACPI: add processor driver for Xen virtual CPUs.

Because the processor is controlled by the VMM in Xen,
we need a new ACPI processor driver for Xen virtual CPUs.

Specifically we need to be able to pass the CXX/PXX states
to the hypervisor, and also deal with the peculiarity
that the number of CPUs that Linux parses in the ACPI
is different from the number visible to the Linux kernel.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:

drivers/xen/Makefile
include/xen/acpi.h

13 years agoACPI: processor: add __acpi_processor_[un]register_driver helpers.
Tang Liang [Wed, 19 Oct 2011 09:01:20 +0000 (17:01 +0800)]
ACPI: processor: add __acpi_processor_[un]register_driver helpers.

This patch implements the __acpi_processor_[un]register_driver helpers,
so we can override the processor driver registration functions,
specifically for the Xen processor driver.

By default the values are set to the native one.

Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoACPI: processor: cache acpi_power_register in cx structure
Kevin Tian [Wed, 19 Oct 2011 08:51:51 +0000 (16:51 +0800)]
ACPI: processor: cache acpi_power_register in cx structure

This patch saves acpi_power_register in the cx structure because we need
to pass it to the Xen ACPI processor driver.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoACPI: processor: Don't setup cpu idle handler when we do not want them.
Kevin Tian [Wed, 19 Oct 2011 08:47:51 +0000 (16:47 +0800)]
ACPI: processor: Don't setup cpu idle handler when we do not want them.

This patch inhibits processing of the CPU idle handler if it is not
set to the appropriate one. This is needed by the Xen processor driver
which, while still needing processor details, wants to use the default_idle
call (which makes a yield hypercall).

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoACPI: processor: export necessary interfaces
Kevin Tian [Wed, 19 Oct 2011 08:39:37 +0000 (16:39 +0800)]
ACPI: processor: export necessary interfaces

This patch exports some necessary functions which parse processor
power management information. The Xen ACPI processor driver uses them.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/acpi: Domain0 acpi parser related platform hypercall
Yu Ke [Wed, 24 Mar 2010 18:01:13 +0000 (11:01 -0700)]
xen/acpi: Domain0 acpi parser related platform hypercall

This patch implements the xen_platform_op hypercall, to pass the parsed
ACPI info to the hypervisor.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Added DEFINE_GUEST.. in appropriate headers]
[v2: Ripped out typedefs]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoMerge branch 'stable/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad...
Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 16:27:08 +0000 (11:27 -0500)]
Merge branch 'stable/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into uek2-merge

Which adds the microcode support. It is not upstream
and probably won't be, as the x86 maintainers want to
load the microcode blob (in a new format) as part of the GRUB loader:
[http://lists.xen.org/archives/html/xen-devel/2011-12/msg00250.html]

Jan Beulich implemented a patchset for Xen hypervisor which would do this
as part of the mboot loader and define which payload using 'ucode=<number>'.
[http://lists.xen.org/archives/html/xen-devel/2011-12/msg00007.html]
but that is not what the x86 maintainers want to do (as he did not define
a new format and just ingested the raw binary blob). There is also
a feature: "[PATCH] x86/microcode: Allow "ucode=" argument to be negative"
which will pick the microcode as the last payload.

For the time being let's use this old driver that loads the microcode
in the dom0 and pushes it up to the hypervisor - and let the x86 and xen
folks sort this out.

* 'stable/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  x86/microcode: check proper return code.
  xen/v86d: Fix /dev/mem to access memory below 1MB
  xen: add CPU microcode update driver
  xen: add dom0_op hypercall
  xen/acpi: Domain0 acpi parser related platform hypercall

Conflicts:
arch/x86/xen/Kconfig

13 years agoMerge branch 'stable/bug.fixes-3.3.rebased' into uek2-merge
Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 16:15:33 +0000 (11:15 -0500)]
Merge branch 'stable/bug.fixes-3.3.rebased' into uek2-merge

* stable/bug.fixes-3.3.rebased:
  x86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test
  x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.
  xen/pm_idle: Make pm_idle be default_idle under Xen.

13 years agoMerge branches 'stable/xen-block.rebase' and 'stable/vmalloc-3.2.rebased' into uek2...
Konrad Rzeszutek Wilk [Tue, 13 Dec 2011 16:15:27 +0000 (11:15 -0500)]
Merge branches 'stable/xen-block.rebase' and 'stable/vmalloc-3.2.rebased' into uek2-merge

* stable/xen-block.rebase:
  xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
  block: xen-blkback: use API provided by xenbus module to map rings
  xen-blkback: convert hole punching to discard request on loop devices
  xen/blkback: Move processing of BLKIF_OP_DISCARD from dispatch_rw_block_io
  xen/blk[front|back]: Enhance discard support with secure erasing support.
  xen/blk[front|back]: Squash blkif_request_rw and blkif_request_discard together

* stable/vmalloc-3.2.rebased:
  xen: map foreign pages for shared rings by updating the PTEs directly
  net: xen-netback: use API provided by xenbus module to map rings
  block: xen-blkback: use API provided by xenbus module to map rings
  xen: use generic functions instead of xen_{alloc, free}_vm_area()

13 years agoxen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
Joe Jin [Mon, 15 Aug 2011 04:51:31 +0000 (12:51 +0800)]
xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.

When running a block-attach/block-detach test with the steps below, umount hangs
in the guest. Furthermore, shutdown ends up being stuck when umounting file-systems.

1. start guest.
2. attach new block device by xm block-attach in Dom0.
3. mount new disk in guest.
4. execute xm block-detach to detach the block device in dom0 until timeout
5. Any request to the disk will hang.

Root cause:
This issue is caused when setting backend device's state to
'XenbusStateClosing', which sends to the frontend the XenbusStateClosing
notification. When frontend receives the notification it tries to release
the disk in blkfront_closing(), but at that moment the disk is still in use
by guest, so frontend refuses to close. Specifically it sets the disk state to
XenbusStateClosing and sends the notification to backend - when backend receives the
event, it disconnects the vbd from real device, and sets the vbd device state to
XenbusStateClosing. The backend disconnects the real device/file, and any IO
requests to the disk in guest will end up in the ether, leaving disk DEAD and set to
XenbusStateClosing. When the guest wants to disconnect the disk, umount will
hang on blkif_release()->xlvbd_release_gendisk() as it is unable to send any IO
to the disk, which prevents clean system shutdown.

Solution:
Don't disconnect backend until frontend state switched to XenbusStateClosed.
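
The rule can be sketched as a single predicate. The enum below is a local
stand-in that mirrors the order of the xenbus states for illustration only,
and the function name is hypothetical; it is not the blkback code:

```c
#include <assert.h>

/* Local stand-in for the xenbus connection states (illustrative only). */
enum sketch_xenbus_state {
    SKETCH_STATE_UNKNOWN,
    SKETCH_STATE_INITIALISING,
    SKETCH_STATE_INIT_WAIT,
    SKETCH_STATE_INITIALISED,
    SKETCH_STATE_CONNECTED,
    SKETCH_STATE_CLOSING,
    SKETCH_STATE_CLOSED
};

/*
 * Return 1 only once the frontend has reached Closed. While it is still
 * Closing, the guest may have the disk in use, so the backend keeps the
 * vbd connected and I/O can still complete.
 */
static int sketch_backend_may_disconnect(enum sketch_xenbus_state frontend)
{
    assert(frontend >= SKETCH_STATE_UNKNOWN && frontend <= SKETCH_STATE_CLOSED);
    return frontend == SKETCH_STATE_CLOSED;
}
```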

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Daniel Stodden <daniel.stodden@citrix.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
[v1: Modified description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agox86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test
Konrad Rzeszutek Wilk [Fri, 4 Nov 2011 17:18:15 +0000 (13:18 -0400)]
x86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test

For details refer to patch "x86/paravirt: Use pte_attrs instead of
pte_flags on CPA/set_p.._wb/wc operations." which explains that
some pages have the _PAGE_PWT bit set in the _PAGE_PSE field
when running under Xen.

When pageattr_test is running it uses pte_flags to check whether
it succeeded in setting the _PAGE_UNUSED1 bit, but also whether the
page had _PAGE_PSE. This can happen when one of the randomly selected
pages to be tested is a page that has been set to be _PAGE_WC
as under Xen, that field is under _PAGE_PSE. Since the 'pte_huge'
call is using the pte_flags(x) macro, which extracts the "raw" contents
of the PTE, the translation of _PAGE_PSE -> _PAGE_PWT does not happen
and we incorrectly identify the PTE as bad.

Using the 'pte_val' instead of 'pte_flags' fixes the problem and
this patch does that.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@kernel.org
13 years agox86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.
Konrad Rzeszutek Wilk [Fri, 4 Nov 2011 15:59:34 +0000 (11:59 -0400)]
x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.

When using the paravirt interface, most of the page operations are wrapped
in the pvops interface. The one that is not is pte_flags. The reason is
that in most cases the "raw" PTE flag values on bare metal and on whatever
pvops platform is running (Xen in this case) share the same bit meaning.

Except for PAT. Under Linux, the PAT MSR is written to be:

          PAT4                 PAT0
+---+----+----+----+-----+----+----+
 WC | WC | WB | UC | UC- | WC | WB |  <= Linux
+---+----+----+----+-----+----+----+
 WC | WT | WB | UC | UC- | WT | WB |  <= BIOS
+---+----+----+----+-----+----+----+
 WC | WP | WC | UC | UC- | WT | WB |  <= Xen
+---+----+----+----+-----+----+----+

The lookup of this index table translates to looking up
Bit 7, Bit 4, and Bit 3 of PTE:

 PAT/PSE (bit 7) ... PCD (bit 4) .. PWT (bit 3).

If all bits are off, then we are using PAT0. If bit 3 is turned on,
then we are using PAT1, if bit 3 and bit 4, then PAT2..

Back to the PAT MSR table:

As you can see, the PAT1 translates to PAT4 under Xen. Under Linux
we only use PAT0, PAT1, and PAT2 for the caching as:

 WB = none (so PAT0)
 WC = PWT (bit 3 on)
 UC = PWT | PCD (bit 3 and 4 are on).

But to make it work with Xen, we end up doing for WC a translation:

 PWT (so bit 3 on) --> PAT (so bit 7 is on) and clear bit 3

And to translate back (when the paravirt pte_val is used) we would:

 PAT (bit 7 on) --> PWT (bit 3 on) and clear bit 7.

This works quite well, except if code uses the pte_flags, as pte_flags
reads the raw value and does not go through the paravirt. Which means
that if (when running under Xen):

 1) we allocate some pages.
 2) call set_pages_array_wc, which ends up calling:
     __page_change_att_set_clr(.., __pgprot(__PAGE_WC),  /* set */
                                 , __pgprot(__PAGE_MASK), /* clear */
    which ends up reading the _raw_ PTE flags and _only_ look at the
    _PTE_FLAG_MASK contents with __PAGE_MASK cleared (0x18) and
    __PAGE_WC (0x8) set.

     read raw *pte -> 0x67
     *pte = 0x67 & ~0x18 | 0x8
     *pte = 0x67 & 0xffffffe7 | 0x8
     *pte = 0x6f

   [now set_pte_atomic is called, and 0x6f is written in, but under
    xen_make_pte, the bit 3 is translated to bit 7, so it ends up
    writing 0xa7, which is correct]

 3) do something to them.
 4) call set_pages_array_wb
     __page_change_att_set_clr(.., __pgprot(__PAGE_WB),  /* set */
                                 , __pgprot(__PAGE_MASK), /* clear */
    which ends up reading the _raw_ PTE and _only_ look at the
    _PTE_FLAG_MASK contents with _PAGE_MASK cleared (0x18) and
    __PAGE_WB (0x0) set:

     read raw *pte -> 0xa7
     *pte = 0xa7 & ~0x18 | 0
     *pte = 0xa7 & 0xffffffe7 | 0
     *pte = 0xa7

   [we check whether the old PTE is different from the new one

    if (pte_val(old_pte) != pte_val(new_pte)) {
        set_pte_atomic(kpte, new_pte);
        ...

   and find out that 0xA7 == 0xA7 so we do not write the new PTE value in]

   End result is that we failed at removing the WC caching bit!

 5) free them.
   [and have pages with PAT4 (bit 7) set, so other subsystems end up using
    the pages that have the write combined bit set resulting in crashes. Yikes!].

The fix, which this patch proposes, is to wrap the pte_pgprot in the CPA
code with the newly introduced pte_attrs, which can go through the pvops
interface to get the "emulated" value instead of the raw one. Naturally, if
CONFIG_PARAVIRT is not set, it would end up calling native_pte_val.

The other way to fix this is to wrap pte_flags and go through the pvops
interface, and it really is the Right Thing to do.  The problem is that past
experience with the mprotect stuff demonstrates that it would be really
expensive in inner loops, and pte_flags() is used in some very perf-critical areas.

Example code to reproduce this and see the various mysterious
subsystems/applications crashing:

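#include <linux/module.h>
#include <linux/init.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/string.h>
#include <asm/cacheflush.h>	/* set_pages_array_wc/wb on kernels of this era */

/* Assumed values: the original definitions are not shown in this message. */
#define WB_TO_WC	"0.1"
#define MAX_PAGES	128

MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>");
MODULE_DESCRIPTION("wb_to_wc_and_back");
MODULE_LICENSE("GPL");
MODULE_VERSION(WB_TO_WC);

/* Repeatedly flip a batch of pages WB -> WC -> WB, then free them. */
static int thread(void *arg)
{
	struct page *a[MAX_PAGES];
	unsigned int i, j;

	do {
		/* j counts the pages that were actually allocated. */
		for (j = 0, i = 0; i < MAX_PAGES; i++, j++) {
			a[i] = alloc_page(GFP_KERNEL);
			if (!a[i])
				break;
		}
		set_pages_array_wc(a, j);
		schedule_timeout_interruptible(HZ);
		for (i = 0; i < j; i++) {
			unsigned long *addr = page_address(a[i]);

			if (addr)
				memset(addr, 0xc2, PAGE_SIZE);
		}
		set_pages_array_wb(a, j);
		/* Only the first j entries are valid if an allocation failed. */
		for (i = 0; i < j; i++) {
			__free_page(a[i]);
			a[i] = NULL;
		}
	} while (!kthread_should_stop());
	return 0;
}

static struct task_struct *t;

static int __init wb_to_wc_init(void)
{
	t = kthread_run(thread, NULL, "wb_to_wc_and_back");
	/* kthread_run() returns an ERR_PTR, not NULL, on failure. */
	if (IS_ERR(t))
		return PTR_ERR(t);
	return 0;
}

static void __exit wb_to_wc_exit(void)
{
	if (t)
		kthread_stop(t);
}
module_init(wb_to_wc_init);
module_exit(wb_to_wc_exit);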

This fixes RH BZ #742032
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Tom Goetz <tom.goetz@virtualcomputer.com>
CC: stable@kernel.org
13 years agoxen/pm_idle: Make pm_idle be default_idle under Xen.
Konrad Rzeszutek Wilk [Mon, 21 Nov 2011 23:02:02 +0000 (18:02 -0500)]
xen/pm_idle: Make pm_idle be default_idle under Xen.

The idea behind commit d91ee5863b71 ("cpuidle: replace xen access to x86
pm_idle and default_idle") was to have one call - disable_cpuidle()
which would make pm_idle not be molested by other code.  It prevents
pm_idle from being set to cpuidle_idle_call (which is excellent).

But in the select_idle_routine() and idle_setup(), the pm_idle can still
be set to either: amd_e400_idle, mwait_idle or default_idle.  This
depends on some CPU flags (MWAIT) and in AMD case on the type of CPU.

In the case of mwait_idle we can hit some instances where the hypervisor
(Amazon EC2 specifically) sets the MWAIT flag and we get:

  Brought up 2 CPUs
  invalid opcode: 0000 [#1] SMP

  Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
  RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
  ...
  Call Trace:
   [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
   [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
  RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
   RSP <ffff8801d28ddf10>

In the case of amd_e400_idle we don't get such spectacular crashes, but we
do end up making an MSR access which is trapped in the hypervisor, and then
follow it up with a yield hypercall.  Meaning we end up going to the
hypervisor twice instead of just once.

The behavior before v3.0 was that pm_idle was set to
default_idle regardless of select_idle_routine/idle_setup.

We want to do that, but only for one specific case: Xen.  This patch
does that.

Fixes RH BZ #739499 and Ubuntu #881076
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoxen: map foreign pages for shared rings by updating the PTEs directly
David Vrabel [Thu, 29 Sep 2011 15:53:32 +0000 (16:53 +0100)]
xen: map foreign pages for shared rings by updating the PTEs directly

When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs.  This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
[v1: Squashed fix by Michal for no-mmu case]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
13 years agonet: xen-netback: use API provided by xenbus module to map rings
David Vrabel [Thu, 29 Sep 2011 15:53:31 +0000 (16:53 +0100)]
net: xen-netback: use API provided by xenbus module to map rings

The xenbus module provides xenbus_map_ring_valloc() and
xenbus_unmap_ring_vfree().  Use these to map the Tx and Rx ring pages
granted by the frontend.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoblock: xen-blkback: use API provided by xenbus module to map rings
David Vrabel [Thu, 29 Sep 2011 15:53:30 +0000 (16:53 +0100)]
block: xen-blkback: use API provided by xenbus module to map rings

The xenbus module provides xenbus_map_ring_valloc() and
xenbus_unmap_ring_vfree().  Use these to map the ring pages granted by
the frontend.

Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>