Arun Easi [Tue, 19 Jun 2012 23:56:27 +0000 (16:56 -0700)]
qla2xxx: Fix for continuous rescan attempts in arbitrated loop topology.
Stale information in the temporary fcport created in
qla2x00_configure_local_loop() causes qla2x00_get_port_database() call
to fail. This reschedules scan, which gets stuck continuously in the
rescheduling-of-scan loop due to the failure.
Chad Dupuis [Thu, 19 Jul 2012 09:13:50 +0000 (14:43 +0530)]
qla2xxx: Use bitmap to store loop_id's for fcports.
Store used fcport loop_id's in a bitmap so that as opposed to looping through
all fcports to find the next free loop_id, new loop_id lookup can be just be
done via bitops.
Joe Carnuccio [Wed, 18 Apr 2012 20:19:06 +0000 (13:19 -0700)]
qla2xxx: Add I2C BSG interface.
Add BSG interface to generically access I2C attached devices.
The transferred data limitations:
- the address must be on an even byte boundary,
- the chunk size must be no more than 64 bytes,
(these being limitations of the firmware).
So, to transfer more than 64 bytes, the caller must chunk up
the data into 64 byte chunks and perform multiple accesses.
The caller is responsible for setting the device address,
the data offset, the option bits, and the data length.
Giridhar Malavali [Thu, 29 Mar 2012 20:14:09 +0000 (13:14 -0700)]
qla2xxx: Proper completion to scsi-ml for scsi status task_set_full and busy.
In case of firmmware detected under-run condition and scsi status of task_set_full
or busy_condition, DID_OK is returned to scsi-ml for faster recovery.
Chetan Loke [Thu, 16 Feb 2012 19:34:56 +0000 (13:34 -0600)]
qla2xxx: Micro optimization in queuecommand handler
Optimized queuecommand handler's to eliminate double head-room checks.
The checks are moved inside the 1st if-loop otherwise you would end up checking twice when there is
enough head room.
Add an extra layer of logging granularity for messages that are necessary in
some circumstances but may flood the kernel log buffer with too many messages
otherwise.
Giridhar Malavali [Fri, 27 Jan 2012 15:09:15 +0000 (09:09 -0600)]
qla2xxx: Block flash access from application when device is initialized for ISP82xx.
This could lead to CRB initialization failures or as fail to capture minidump
data. So access to flash needs to be avoided when device is doing reset for
ISP82xx.
Chad Dupuis [Tue, 6 Dec 2011 19:53:04 +0000 (14:53 -0500)]
qla2xxx: Handle interrupt registration failures more gracefully.
If interrupt registration failed we could crash the machine as we were trying
to deference some pointers which weren't allocated yet. Move the allocation
a little earlier and make some checks to the free resource code to make sure
that we don't try to free a resource that was never allocated.
Andre Przywara [Tue, 29 May 2012 11:07:31 +0000 (13:07 +0200)]
xen/setup: filter APERFMPERF cpuid feature out
Xen PV kernels allow access to the APERF/MPERF registers to read the
effective frequency. Access to the MSRs is however redirected to the
currently scheduled physical CPU, making consecutive read and
compares unreliable. In addition each rdmsr traps into the hypervisor.
So to avoid bogus readouts and expensive traps, disable the kernel
internal feature flag for APERF/MPERF if running under Xen.
This will
a) remove the aperfmperf flag from /proc/cpuinfo
b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to
use the feature to improve scheduling (by default disabled)
c) not mislead the cpufreq driver to use the MSRs
This does not cover userland programs which access the MSRs via the
device file interface, but this will be addressed separately.
[upstream git commit 5e626254206a709c6e937f3dda69bf26c7344f6f] Signed-off-by: Andre Przywara <andre.przywara@amd.com> Cc: stable@vger.kernel.org # v3.0+ Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Merge branch 'stable/for-linus-3.6.rebased' into uek2-merge
* stable/for-linus-3.6.rebased:
xen PVonHVM: move shared_info to MMIO before kexec
xen: simplify init_hvm_pv_info
xen: remove cast from HYPERVISOR_shared_info assignment
xen: enable platform-pci only in a Xen guest
xen/pv-on-hvm kexec: shutdown watches from old kernel
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
xen/hvc: Fix up checks when the info is allocated.
xen/mm: zero PTEs for non-present MFNs in the initial page table
xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable
xen/x86: add desc_equal() to compare GDT descriptors
x86/xen: avoid updating TLS descriptors if they haven't changed
xen: populate correct number of pages when across mem boundary (v2)
xen/mce: add .poll method for mcelog device driver
Olaf Hering [Tue, 17 Jul 2012 15:43:35 +0000 (17:43 +0200)]
xen PVonHVM: move shared_info to MMIO before kexec
Currently kexec in a PVonHVM guest fails with a triple fault because the
new kernel overwrites the shared info page. The exact failure depends on
the size of the kernel image. This patch moves the pfn from RAM into
MMIO space before the kexec boot.
The pfn containing the shared_info is located somewhere in RAM. This
will cause trouble if the current kernel is doing a kexec boot into a
new kernel. The new kernel (and its startup code) can not know where the
pfn is, so it can not reserve the page. The hypervisor will continue to
update the pfn, and as a result memory corruption occours in the new
kernel.
One way to work around this issue is to allocate a page in the
xen-platform pci device's BAR memory range. But pci init is done very
late and the shared_info page is already in use very early to read the
pvclock. So moving the pfn from RAM to MMIO is racy because some code
paths on other vcpus could access the pfn during the small window when
the old pfn is moved to the new pfn. There is even a small window were
the old pfn is not backed by a mfn, and during that time all reads
return -1.
Because it is not known upfront where the MMIO region is located it can
not be used right from the start in xen_hvm_init_shared_info.
To minimise trouble the move of the pfn is done shortly before kexec.
This does not eliminate the race because all vcpus are still online when
the syscore_ops will be called. But hopefully there is no work pending
at this point in time. Also the syscore_op is run last which reduces the
risk further.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Tue, 17 Jul 2012 09:59:15 +0000 (11:59 +0200)]
xen: simplify init_hvm_pv_info
init_hvm_pv_info is called only in PVonHVM context, move it into ifdef.
init_hvm_pv_info does not fail, make it a void function.
remove arguments from init_hvm_pv_info because they are not used by the
caller.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Tue, 10 Jul 2012 13:31:39 +0000 (15:31 +0200)]
xen: enable platform-pci only in a Xen guest
While debugging kexec issues in a PVonHVM guest I modified
xen_hvm_platform() to return false to disable all PV drivers. This
caused a crash in platform_pci_init() because it expects certain data
structures to be initialized properly.
To avoid such a crash make sure the driver is initialized only if
running in a Xen guest.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Tue, 10 Jul 2012 12:50:03 +0000 (14:50 +0200)]
xen/pv-on-hvm kexec: shutdown watches from old kernel
Add xs_reset_watches function to shutdown watches from old kernel after
kexec boot. The old kernel does not unregister all watches in the
shutdown path. They are still active, the double registration can not
be detected by the new kernel. When the watches fire, unexpected events
will arrive and the xenwatch thread will crash (jumps to NULL). An
orderly reboot of a hvm guest will destroy the entire guest with all its
resources (including the watches) before it is rebuilt from scratch, so
the missing unregister is not an issue in that case.
With this change the xenstored is instructed to wipe all active watches
for the guest. However, a patch for xenstored is required so that it
accepts the XS_RESET_WATCHES request from a client (see changeset
23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
the registration of watches will fail and some features of a PVonHVM
guest are not available. The guest is still able to boot, but repeated
kexec boots will fail.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Mon, 9 Jul 2012 10:39:06 +0000 (11:39 +0100)]
xen/mm: zero PTEs for non-present MFNs in the initial page table
When constructing the initial page tables, if the MFN for a usable PFN
is missing in the p2m then that frame is initially ballooned out. In
this case, zero the PTE (as in decrease_reservation() in
drivers/xen/balloon.c).
This is obviously safe instead of having an valid PTE with an MFN of
INVALID_P2M_ENTRY (~0).
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Mon, 9 Jul 2012 10:39:05 +0000 (11:39 +0100)]
xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable
In xen_set_pte() if batching is unavailable (because the caller is in
an interrupt context such as handling a page fault) it would fall back
to using native_set_pte() and trapping and emulating the PTE write.
On 32-bit guests this requires two traps for each PTE write (one for
each dword of the PTE). Instead, do one mmu_update hypercall
directly.
During construction of the initial page tables, continue to use
native_set_pte() because most of the PTEs being set are in writable
and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
using a hypercall for this is very expensive.
This significantly improves page fault performance in 32-bit PV
guests.
lmbench3 test Before After Improvement
----------------------------------------------
lat_pagefault 3.18 us 2.32 us 27%
lat_proc fork 356 us 313.3 us 11%
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Mon, 9 Jul 2012 10:39:08 +0000 (11:39 +0100)]
x86/xen: avoid updating TLS descriptors if they haven't changed
When switching tasks in a Xen PV guest, avoid updating the TLS
descriptors if they haven't changed. This improves the speed of
context switches by almost 10% as much of the time the descriptors are
the same or only one is different.
The descriptors written into the GDT by Xen are modified from the
values passed in the update_descriptor hypercall so we keep shadow
copies of the three TLS descriptors to compare against.
lmbench3 test Before After Improvement
--------------------------------------------
lat_ctx -s 32 24 7.19 6.52 9%
lat_pipe 12.56 11.66 7%
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen: populate correct number of pages when across mem boundary (v2)
When populate pages across a mem boundary at bootup, the page count
populated isn't correct. This is due to mem populated to non-mem
region and ignored.
Pfn range is also wrongly aligned when mem boundary isn't page aligned.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
[v2: If xen_do_chunk fail(populate), abort this chunk and any others]
Suggested by David, thanks.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 14306942
RDS will be using devinet_ioctl() to implement IP failover/fallback for IB
devices to support active/active. This is an enhancement request to export
devinet_ioctl() so non-kernel modules such as RDS can use it. Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>
An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent
circular epoll dependencies from being created. However, in that case we
do not properly clear the 'tfile_check_list'. Thus, add a call to
clear_tfile_check_list() for the -ELOOP case.
Signed-off-by: Jason Baron <jbaron@redhat.com> Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Davide Libenzi <davidel@xmailserver.org> Tested-by: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
broke VLAN tagging on outbound packets.
It ifdef'ed BCM_KERNEL_SUPPORTS_8021Q, but this
is not set anywhere. So vlan never gets set, and
all packets are sent with vlan=0.
v2: We can just remove the test. vlan_tx_tag_present
is valid regardless of whether the 802.1q module
is built.
Tested on BCM5721 rev 11.
Signed-off-by: Kasper Pedersen <kernel@kasperkp.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5c1e688388f629e8d8e88183b5ebc21e209252aa)
Andrea Arcangeli [Wed, 20 Jun 2012 19:52:57 +0000 (12:52 -0700)]
thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
Orabug: 14300370
In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
Xen.
So instead of dealing only with "consistent" pmdvals in
pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
where the low 32bit and high 32bit could be inconsistent (to avoid having
to use cmpxchg8b).
The only guarantee we get from pmd_read_atomic is that if the low part of
the pmd was found null, the high part will be null too (so the pmd will be
considered unstable). And if the low part of the pmd is found "stable"
later, then it means the whole pmd was read atomically (because after a
pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
and we read the high part after the low part).
In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
atomically to declare the pmd as "stable" and that's true for THP and no
THP, furthermore in the THP case we also have a barrier() that will
prevent any inconsistent pmdvals to be cached by a later re-read of the
*pmd.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Jonathan Nieder <jrnieder@gmail.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Tested-by: Andrew Jones <drjones@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Clark [Fri, 9 Mar 2012 22:50:30 +0000 (14:50 -0800)]
[SCSI] libfc: fcoe_transport_create fails in single-CPU environment
Orabug: 14239242
(mainline commit: 011a9008b11604b12e8386fa6ac3433ab3175dc2)
Starting fcoe fails at fcoe_transport_create when attempting to allocate a
pool of 4K exchanges on a 64-bit single-CPU environment because the call to
__alloc_percpu() is greater than the max of 32K. This patch reduces the
number of exchanges to fit within the maximum allowed space.
[ Whitespace problems fixed by Robert Love to satisfy chechpatch.pl ]
Signed-off-by: Steven Clark <sclark@crossbeam.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
John Stultz [Mon, 2 Jul 2012 23:28:54 +0000 (19:28 -0400)]
3.0.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt
This patch introduces a new funciton which captures the
CLOCK_MONOTONIC time, along with the CLOCK_REALTIME and
CLOCK_BOOTTIME offsets at the same moment. This new function
is then used in place of ktime_get() when hrtimer_interrupt()
is expiring timers.
This ensures that any changes to realtime or boottime offsets
are noticed and stored into the per-cpu hrtimer base structures,
prior to doing any hrtimer expiration. This should ensure that
timers are not expired early if the offsets changes under us.
This is useful in the case where clock_was_set() is called from
atomic context and have to schedule the hrtimer base offset update
via a timer, as it provides extra robustness in the face of any
possible timer delay.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Backported-by: Joe Jin <joe.jin@oracle.com>
As widely reported on the internet, some Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).
An apparent for this issue workaround is running:
$ date -s "`date`"
John Stultz [Sun, 1 Jul 2012 18:01:21 +0000 (14:01 -0400)]
3.0.x: hrtimer: Fix clock_was_set so it is safe to call from irq context
NOTE:This is a prerequisite patch that's required to
address the widely observed leap-second related futex/hrtimer
issues.
Currently clock_was_set() is unsafe to be called from irq
context, as it calls on_each_cpu(). This causes problems when
we need to adjust the time from update_wall_time().
To fix this, if clock_was_set is called when irqs are
disabled, we schedule a timer to fire for immedately after
we're out of interrupt context to then notify the hrtimer
subsystem.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Acked-by: Prarit Bhargava <prarit@redhat.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Backported-by: Joe Jin <joe.jin@oracle.com>
John Stultz [Mon, 2 Jul 2012 23:28:54 +0000 (19:28 -0400)]
3.0.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt
This patch introduces a new funciton which captures the
CLOCK_MONOTONIC time, along with the CLOCK_REALTIME and
CLOCK_BOOTTIME offsets at the same moment. This new function
is then used in place of ktime_get() when hrtimer_interrupt()
is expiring timers.
This ensures that any changes to realtime or boottime offsets
are noticed and stored into the per-cpu hrtimer base structures,
prior to doing any hrtimer expiration. This should ensure that
timers are not expired early if the offsets changes under us.
This is useful in the case where clock_was_set() is called from
atomic context and have to schedule the hrtimer base offset update
via a timer, as it provides extra robustness in the face of any
possible timer delay.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Backported-by: Joe Jin <joe.jin@oracle.com>
As widely reported on the internet, some Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).
An apparent workaround for this issue is running:
$ date -s "`date`"
I believe this issue is due to the leapsecond being added without
calling clock_was_set() to notify the hrtimer subsystem of the
change. (Although I've not yet chased all the way down to the
hrtimer code to validate exactly what's going on there).
The workaround functions as it forces a clock_was_set()
call from settimeofday().
This fix adds the required clock_was_set() calls to where
we adjust for leapseconds.
NOTE: This fix *depends* on the previous fix, which allows
clock_was_set to be called from atomic context. Do not try
to apply just this patch.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: John Stultz <johnstul@us.ibm.com>
John Stultz [Mon, 2 Jul 2012 00:25:12 +0000 (20:25 -0400)]
Fix clock_was_set so it is safe to call from atomic
Backport for 3.0.36
NOTE:This is a prerequisite patch that's required to
address the widely observed leap-second related futex/hrtimer
issues.
Currently clock_was_set() is unsafe to be called from atomic
context, as it calls on_each_cpu(). This causes problems when
we need to adjust the time from update_wall_time().
To fix this, introduce a work_struct so if we're in_atomic,
we can schedule work to do the necessary update after we're
out of the atomic section.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: John Stultz <johnstul@us.ibm.com>