www.infradead.org Git - users/jedix/linux-maple.git/log

epoll: clear the tfile_check_list on -ELOOP

Orabug: 14306496
upstream commit: 13d518074a952d33d47c428419693f63389547e9

An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent
circular epoll dependencies from being created. However, in that case we
do not properly clear the 'tfile_check_list'. Thus, add a call to
clear_tfile_check_list() for the -ELOOP case.
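The -ELOOP condition described above can be observed from userspace. The following is a minimal sketch, not part of the patch: the helper name epoll_cycle_errno() is made up for illustration, and it only shows the error being returned, not the internal list cleanup that the fix adds.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Illustrative helper (name is hypothetical): try to build a cycle between
 * two epoll instances. Returns the errno of the second epoll_ctl() call,
 * 0 if that call unexpectedly succeeds, or -1 if the setup itself failed.
 */
int epoll_cycle_errno(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int a = epoll_create1(0);
	int b = epoll_create1(0);
	int err = -1;

	if (a < 0 || b < 0)
		goto out;

	/* a watches b: no cycle yet, so this is allowed. */
	ev.data.fd = b;
	if (epoll_ctl(a, EPOLL_CTL_ADD, b, &ev) != 0)
		goto out;

	/* b watches a: this would close the loop, so the kernel refuses. */
	ev.data.fd = a;
	err = (epoll_ctl(b, EPOLL_CTL_ADD, a, &ev) == 0) ? 0 : errno;
out:
	if (a >= 0)
		close(a);
	if (b >= 0)
		close(b);
	return err;
}
```

Calling the helper repeatedly exercises the error path more than once, which is the situation where stale state left behind by an earlier -ELOOP would matter.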

Signed-off-by: Jason Baron <jbaron@redhat.com>
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Tested-by: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

SPEC: v2.6.39-300.0.5

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

tg3: fix VLAN tagging regression

commit 92cd3a17ce9c719abb4c28dee3438e0c641f8de4
tg3: Simplify tx bd assignments

broke VLAN tagging on outbound packets.
It ifdef'ed BCM_KERNEL_SUPPORTS_8021Q, but this
is not set anywhere. So vlan never gets set, and
all packets are sent with vlan=0.

v2: We can just remove the test. vlan_tx_tag_present
is valid regardless of whether the 802.1q module
is built.
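The failure mode is easy to reproduce in isolation: code guarded by a macro that nothing defines is silently dropped by the preprocessor. The sketch below is a standalone model, not tg3 code; struct pkt, the two functions, and the macro HYPOTHETICAL_SUPPORTS_8021Q are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet that may carry a VLAN tag. */
struct pkt {
	int has_vlan;
	uint16_t vlan_tci;
};

/*
 * Broken variant: HYPOTHETICAL_SUPPORTS_8021Q is never defined anywhere,
 * so the tag assignment is compiled out and vlan stays 0.
 */
uint16_t tx_vlan_guarded(const struct pkt *p)
{
	uint16_t vlan = 0;
#ifdef HYPOTHETICAL_SUPPORTS_8021Q
	if (p->has_vlan)
		vlan = p->vlan_tci;
#else
	(void)p;	/* silence the unused-parameter warning */
#endif
	return vlan;
}

/* Fixed variant: the test is unconditional, as in the v2 approach. */
uint16_t tx_vlan_unguarded(const struct pkt *p)
{
	return p->has_vlan ? p->vlan_tci : 0;
}
```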

Tested on BCM5721 rev 11.

Signed-off-by: Kasper Pedersen <kernel@kasperkp.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5c1e688388f629e8d8e88183b5ebc21e209252aa)

Conflicts:
drivers/net/tg3.c

Signed-off-by: Joe Jin <joe.jin@oracle.com>

thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE

Orabug: 14300370
In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
Xen.

So instead of dealing only with "consistent" pmdvals in
pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
where the low 32bit and high 32bit could be inconsistent (to avoid having
to use cmpxchg8b).

The only guarantee we get from pmd_read_atomic is that if the low part of
the pmd was found null, the high part will be null too (so the pmd will be
considered unstable). And if the low part of the pmd is found "stable"
later, then it means the whole pmd was read atomically (because after a
pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
and we read the high part after the low part).

In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
atomically to declare the pmd as "stable" and that's true for THP and no
THP, furthermore in the THP case we also have a barrier() that will
prevent any inconsistent pmdvals to be cached by a later re-read of the
*pmd.
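The read order described above can be modeled in userspace roughly as follows. This is a sketch under assumptions, not the kernel implementation: struct pmd_model, barrier_model() and pmd_read_model() are illustrative names, and a plain compiler barrier stands in for whatever ordering primitive the real code uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit PAE pmd stored as two 32-bit words. */
struct pmd_model {
	volatile uint32_t low;
	volatile uint32_t high;
};

/* Compiler barrier: keeps the compiler from reordering the two loads. */
#define barrier_model() __asm__ __volatile__("" ::: "memory")

/*
 * Read the low half first. If it is zero, report the whole entry as zero
 * without touching the high half; otherwise read the high half afterwards.
 */
uint64_t pmd_read_model(const struct pmd_model *pmd)
{
	uint64_t val = pmd->low;

	if (val) {
		barrier_model();
		val |= (uint64_t)pmd->high << 32;
	}
	return val;
}
```

A zero low half therefore always yields a zero result, which matches the guarantee stated in the commit message: a pmd whose low part is null is treated as unstable regardless of the high part.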

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Tested-by: Andrew Jones <drjones@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[SCSI] libfc: fcoe_transport_create fails in single-CPU environment

Orabug: 14239242
(mainline commit: 011a9008b11604b12e8386fa6ac3433ab3175dc2)
Starting fcoe fails at fcoe_transport_create when attempting to allocate a
pool of 4K exchanges on a 64-bit single-CPU environment because the size
requested from __alloc_percpu() is greater than the max of 32K. This patch reduces the
number of exchanges to fit within the maximum allowed space.
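The arithmetic behind the failure can be sketched as below. The commit message does not give the pool layout, so this assumes one pointer per exchange plus a small bookkeeping header; header_bytes and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PERCPU_MAX_BYTES (32u * 1024u)	/* the 32K limit named above */

/* Assumed layout: a header followed by one pointer slot per exchange. */
size_t exch_pool_bytes(size_t nr_exch, size_t ptr_bytes, size_t header_bytes)
{
	return header_bytes + nr_exch * ptr_bytes;
}

/* Nonzero when the pool fits within the per-CPU allocation limit. */
int exch_pool_fits(size_t nr_exch, size_t ptr_bytes, size_t header_bytes)
{
	return exch_pool_bytes(nr_exch, ptr_bytes, header_bytes)
		<= PERCPU_MAX_BYTES;
}
```

With 8-byte pointers, 4096 slots alone already consume the full 32K, so any header at all pushes the request over the limit. With 4-byte pointers the same pool needs only 16K plus the header, which is consistent with the problem showing up on 64-bit.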

[ Whitespace problems fixed by Robert Love to satisfy checkpatch.pl ]

Signed-off-by: Steven Clark <sclark@crossbeam.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

SPEC: v2.6.39-300.0.4

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Merge branch 'uek-2.6.39-200' of git://ca-git.us.oracle.com/linux-uek-2.6.39 into master_stableup

Conflicts:
kernel/time/timekeeping.c
uek-rpm/ol5/kernel-uek.spec
uek-rpm/ol6/kernel-uek.spec

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"

This reverts commit 05f144a0d5c2207a0349348127f996e104ad7404.

This patch is broken and should have been reverted by now by an
alternative fix.

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

SPEC: v2.6.39-200.29.1

Signed-off-by: Joe Jin <joe.jin@oracle.com>

3.0.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt

This patch introduces a new function which captures the
CLOCK_MONOTONIC time, along with the CLOCK_REALTIME and
CLOCK_BOOTTIME offsets at the same moment. This new function
is then used in place of ktime_get() when hrtimer_interrupt()
is expiring timers.

This ensures that any changes to realtime or boottime offsets
are noticed and stored into the per-cpu hrtimer base structures,
prior to doing any hrtimer expiration. This should ensure that
timers are not expired early if the offsets change under us.

This is useful in the case where clock_was_set() is called from
atomic context and has to schedule the hrtimer base offset update
via a timer, as it provides extra robustness in the face of any
possible timer delay.
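The idea of reading the monotonic time and both offsets "at the same moment" can be sketched with a sequence-counter retry loop. This is a single-threaded model under assumptions: struct timekeeper_model and get_update_offsets_model() are illustrative names, and the memory barriers a real concurrent reader needs are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of timekeeping state guarded by a sequence count. */
struct timekeeper_model {
	unsigned seq;		/* even: stable, odd: update in progress */
	int64_t mono_ns;	/* CLOCK_MONOTONIC "now" */
	int64_t offs_real_ns;	/* monotonic -> realtime offset */
	int64_t offs_boot_ns;	/* monotonic -> boottime offset */
};

/*
 * Return the monotonic time and store both offsets from one consistent
 * snapshot: retry if an update was in progress or completed in between.
 */
int64_t get_update_offsets_model(const struct timekeeper_model *tk,
				 int64_t *offs_real, int64_t *offs_boot)
{
	unsigned seq;
	int64_t now;

	do {
		seq = tk->seq;
		now = tk->mono_ns;
		*offs_real = tk->offs_real_ns;
		*offs_boot = tk->offs_boot_ns;
	} while ((seq & 1) || seq != tk->seq);

	return now;
}
```

Because all three values come from the same snapshot, the caller never pairs a new "now" with a stale offset, which is what prevents timers from expiring early in the scenario described above.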

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Backported-by: Joe Jin <joe.jin@oracle.com>

3.0.x: time: Fix leapsecond triggered hrtimer/futex load spike issue

As widely reported on the internet, some Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

I believe this issue is due to the leapsecond being added without
calling clock_was_set() to notify the hrtimer subsystem of the
change.

The workaround functions as it forces a clock_was_set()
call from settimeofday().

This fix adds the required clock_was_set() calls to where
we adjust for leapseconds.

NOTE: This fix *depends* on the previous fix, which allows
clock_was_set to be called from atomic context. Do not try
to apply just this patch.

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Backported-by: Joe Jin <joe.jin@oracle.com>

3.0.x: hrtimer: Fix clock_was_set so it is safe to call from irq context

NOTE: This is a prerequisite patch that's required to
address the widely observed leap-second related futex/hrtimer
issues.

Currently clock_was_set() is unsafe to be called from irq
context, as it calls on_each_cpu(). This causes problems when
we need to adjust the time from update_wall_time().

To fix this, if clock_was_set is called when irqs are
disabled, we schedule a timer to fire immediately after
we're out of interrupt context to then notify the hrtimer
subsystem.
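The deferral can be modeled as below. This is only a sketch of the control flow, with made-up names throughout: the irqs_disabled field stands in for the kernel's interrupt-state check, timer_pending for the scheduled timer, and notify_all_cpus() for the cross-CPU notification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these fields exist in the kernel. */
struct clock_model {
	bool irqs_disabled;	/* are we in a context where IPIs are unsafe? */
	bool timer_pending;	/* deferred notification has been queued */
	int notifications;	/* how often the hrtimer side was notified */
};

/* Stand-in for the cross-CPU notification that is unsafe with irqs off. */
static void notify_all_cpus(struct clock_model *c)
{
	c->notifications++;
}

/* Notify directly when safe, otherwise queue the work for later. */
void clock_was_set_model(struct clock_model *c)
{
	if (c->irqs_disabled) {
		c->timer_pending = true;
		return;
	}
	notify_all_cpus(c);
}

/* Runs once we are out of interrupt context, like the scheduled timer. */
void run_pending_timer(struct clock_model *c)
{
	if (c->timer_pending) {
		c->timer_pending = false;
		notify_all_cpus(c);
	}
}
```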

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Backported-by: Joe Jin <joe.jin@oracle.com>

Revert "Fix clock_was_set so it is safe to call from atomic"

This reverts commit f84af0ca7768cc12c300cfc42289706199a0c93c.
To apply new patchset.

Signed-off-by: Joe Jin <joe.jin@oracle.com>

Revert "Fix leapsecond triggered hrtimer/futex load spike issue"

This reverts commit 05b3801d5d008ec51a9b9afad9856ce15ee02265.
To apply new patchset.

Signed-off-by: Joe Jin <joe.jin@oracle.com>

Revert "3.0.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt"

This reverts commit 1ecf58256194384908dc2ec31f4ca92c1bd73077.

To apply new patchset.

Signed-off-by: Joe Jin <joe.jin@oracle.com>

SPEC: v2.6.39-200.28.1

Signed-off-by: Joe Jin <joe.jin@oracle.com>

3.0.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt

This patch introduces a new function which captures the
CLOCK_MONOTONIC time, along with the CLOCK_REALTIME and
CLOCK_BOOTTIME offsets at the same moment. This new function
is then used in place of ktime_get() when hrtimer_interrupt()
is expiring timers.

This ensures that any changes to realtime or boottime offsets
are noticed and stored into the per-cpu hrtimer base structures,
prior to doing any hrtimer expiration. This should ensure that
timers are not expired early if the offsets change under us.

This is useful in the case where clock_was_set() is called from
atomic context and has to schedule the hrtimer base offset update
via a timer, as it provides extra robustness in the face of any
possible timer delay.

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Backported-by: Joe Jin <joe.jin@oracle.com>

SPEC: v2.6.39-200.27.1

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

SPEC: replace kernel-ovs with kernel-uek

Orabug: 14238939

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

SPEC: v2.6.39-200.26.1

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Fix leapsecond triggered hrtimer/futex load spike issue

Backport for 3.0.36

As widely reported on the internet, some Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

I believe this issue is due to the leapsecond being added without
calling clock_was_set() to notify the hrtimer subsystem of the
change. (Although I've not yet chased all the way down to the
hrtimer code to validate exactly what's going on there).

The workaround functions as it forces a clock_was_set()
call from settimeofday().

This fix adds the required clock_was_set() calls to where
we adjust for leapseconds.

NOTE: This fix *depends* on the previous fix, which allows
clock_was_set to be called from atomic context. Do not try
to apply just this patch.

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>

Fix clock_was_set so it is safe to call from atomic

Backport for 3.0.36

NOTE: This is a prerequisite patch that's required to
address the widely observed leap-second related futex/hrtimer
issues.

Currently clock_was_set() is unsafe to be called from atomic
context, as it calls on_each_cpu(). This causes problems when
we need to adjust the time from update_wall_time().

To fix this, introduce a work_struct so if we're in_atomic,
we can schedule work to do the necessary update after we're
out of the atomic section.

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>

SPEC: v2.6.39-200.25.1

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

fixed some merge errors

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

futex: Mark get_robust_list as deprecated

Notify get_robust_list users that the syscall is going away.

commit ec0c4274e33c0373e476b73e01995c53128f1257 upstream

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: kernel-hardening@lists.openwall.com
Cc: spender@grsecurity.net
Link: http://lkml.kernel.org/r/20120323190855.GA27213@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Conflicts:

Documentation/feature-removal-schedule.txt

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.36' into uek2-2.6.39-300#14252075

This is the 3.0.36 stable release

Conflicts:
arch/x86/xen/enlighten.c
include/linux/pci.h

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.35' into uek2-2.6.39-300#14252075

This is the 3.0.35 stable release

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.34' into uek2-2.6.39-300#14252075

This is the 3.0.34 stable release

Conflicts:
arch/x86/include/asm/pgtable-3level.h

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.33' into uek2-2.6.39-300#14252075

This is the 3.0.33 stable release

Conflicts:
drivers/scsi/hpsa.c

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.32' into uek2-2.6.39-300#14252075

This is the 3.0.32 stable release

Conflicts:
arch/x86/xen/enlighten.c
drivers/scsi/hpsa.c

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.31' into uek2-2.6.39-300#14252075

This is the 3.0.31 stable release

Conflicts:
drivers/gpu/drm/i915/i915_gem_execbuffer.c
drivers/hwmon/fam15h_power.c
virt/kvm/iommu.c

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.30' into uek2-2.6.39-300#14252075

This is the 3.0.30 stable release

Conflicts:
drivers/xen/xenbus/xenbus_probe_frontend.c

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.29' into uek2-2.6.39-300#14252075

This is the 3.0.29 stable release

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.28' into uek2-2.6.39-300#14252075

This is the 3.0.28 stable release

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge tag 'v3.0.27' into uek2-2.6.39-300#14252075

This is the 3.0.27 stable release

Conflicts:
drivers/hwmon/fam15h_power.c
drivers/net/e1000e/netdev.c
drivers/usb/serial/ftdi_sio.c
include/asm-generic/pgtable.h
net/ipv6/route.c

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Merge git://ca-git.us.oracle.com/linux-ganbalag-public.git v2.6.39-200.24.1#leapsec

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

ntp: Fix leap-second hrtimer livelock

Orabug: 14264454  leap second fix for UEK

Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.

Thomas diagnosed this as the following pattern:
CPU 0                                          CPU 1
do_adjtimex()
  spin_lock_irq(&ntp_lock);
    process_adjtimex_modes();                  timer_interrupt()
      process_adj_status();                      do_timer()
        ntp_start_leap_timer();                    write_lock(&xtime_lock);
          hrtimer_start();                         update_wall_time();
            hrtimer_reprogram();                     ntp_tick_length()
              tick_program_event()                     spin_lock(&ntp_lock);
                clockevents_program_event()
                  ktime_get()
                    seq = read_seqbegin(&xtime_lock);

This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.

The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ)  after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).

This patch applies on top of tip/timers/core.

CC: Sasha Levin <levinsasha928@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Diagnosed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

ntp: Add ntp_lock to replace xtime_locking

Use a ntp_lock spin lock to replace xtime_lock locking in ntp.c

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

ntp: Access tick_length variable via ntp_tick_length()

Currently the NTP managed tick_length value is accessed globally,
in preparations for locking cleanups, make sure it is accessed via
a function and mark it as static.
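The pattern is the usual one of hiding a file-local variable behind an accessor. A minimal standalone sketch follows; ntp_tick_length() is the name used above, while the setter ntp_set_tick_length_model() is hypothetical and exists only so the example can be exercised.

```c
#include <assert.h>
#include <stdint.h>

/* File-local: no other translation unit can touch this directly. */
static uint64_t tick_length;

/* The only read path, so locking can later be added in one place. */
uint64_t ntp_tick_length(void)
{
	return tick_length;
}

/* Hypothetical setter, standing in for the NTP code that updates it. */
void ntp_set_tick_length_model(uint64_t val)
{
	tick_length = val;
}
```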

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

ntp: Cleanup timex.h

Move ntp_synced to ntp.c and mark time_status as static.
Also yank function declaration for non-existent function.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

Merge branch 'kexec-v3' of git://ca-git.us.oracle.com/linux-dkiper-public

dm-nfs: force random mode for the backend file

Orabug: 14092678
Without this flag page_cache_sync_readahead() might take some seconds to
complete.
Since dm-nfs is used for OVM and as a vdisk, random access is expected, so
force this flag on when opening the backend file.

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Adnan Misherfi <adnan.misherfi@oracle.com>
Cc: Kurt C Hackel <kurt.hackel@oracle.com>
Cc: Andrew Thomas <andrew.thomas@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Merge branch 'uek-2.6.39-200' of ca-git.us.oracle.com:linux-uek-2.6.39

Conflicts:
block/scsi_ioctl.c
drivers/hwmon/fam15h_power.c
drivers/infiniband/ulp/ipoib/ipoib_main.c
drivers/infiniband/ulp/ipoib/ipoib_multicast.c
drivers/md/dm-mpath.c
drivers/net/Kconfig
drivers/net/Makefile
drivers/net/hxge/hxge_main.c
drivers/net/hxge/hxge_txdma.c
drivers/pci/pcie/aspm.c
drivers/scsi/sd.c
include/linux/if_vlan.h
net/ipv4/route.c
net/ipv6/route.c
net/sunrpc/svc.c
net/sunrpc/svc_xprt.c
uek-rpm/ol5/kernel-uek.spec
uek-rpm/ol6/config-generic
uek-rpm/ol6/kernel-uek.spec

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

x86: Add Xen kexec control code size check to linker script

Add Xen kexec control code size check to linker script.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

drivers/xen: Export vmcoreinfo through sysfs

Export vmcoreinfo through sysfs.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

x86/xen/enlighten: Add init and crash kexec/kdump hooks

Add init and crash kexec/kdump hooks.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

x86/xen: Add kexec/kdump makefile rules

Add kexec/kdump makefile rules.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

x86/xen: Add x86_64 kexec/kdump implementation

Add x86_64 kexec/kdump implementation.

v2 - Konrad Rzeszutek Wilk suggestions:
  - rewrite assembler code for transition page table initialization,
  - improve comments in assembler code,
  - other code cleanups for assembler code.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

x86/xen: Add placeholder for i386 kexec/kdump implementation

Add placeholder for i386 kexec/kdump implementation
to not break compilation on this architecture.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

x86/xen: Register resources required by kexec-tools

Register resources required by kexec-tools.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

x86/xen: Introduce architecture dependent data for kexec/kdump

Introduce architecture dependent constants, structures and
functions required by Xen kexec/kdump implementation.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

xen: Introduce architecture independent data for kexec/kdump

Introduce architecture independent constants and structures
required by Xen kexec/kdump implementation.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

Some implementations (e.g. Xen PVOPS) could not use part of identity page table
to construct transition page table. It means that they require separate PUDs,
PMDs and PTEs for virtual and physical (identity) mapping. To satisfy that
requirement add extra pointer to PGD, PUD, PMD and PTE and align existing code.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

kexec: introduce kexec_ops struct

Some kexec/kdump implementations (e.g. Xen PVOPS) on different archs could
not use default functions or require some changes in behavior of kexec/kdump
generic code. To cope with that problem kexec_ops struct was introduced.
It allows a developer to replace all or some functions and control some
functionality of kexec/kdump generic code.

Default behavior of kexec/kdump generic code is not changed.
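An ops structure of this kind is typically a table of function pointers preloaded with the generic implementations. The sketch below shows the general shape only: the member names, the default functions, and the Xen override are all invented for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; the real struct's members are not shown above. */
struct kexec_ops_model {
	int (*prepare)(void);
	int (*shutdown)(void);
};

/* Generic defaults, used unless a platform replaces them. */
static int default_prepare(void) { return 0; }
static int default_shutdown(void) { return 0; }

/* Starts out pointing at the defaults, so behavior is unchanged. */
struct kexec_ops_model kexec_ops_instance = {
	.prepare = default_prepare,
	.shutdown = default_shutdown,
};

/* Illustrative platform-specific replacement for one hook. */
static int xen_prepare_model(void) { return 42; }

/* A platform overrides only the hooks it needs. */
void install_xen_override(void)
{
	kexec_ops_instance.prepare = xen_prepare_model;
}
```

Hooks that are not overridden keep pointing at the generic functions, which is how the default behavior stays intact for everyone else.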

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>

Merge branch 'uek2-merge' of git://ca-git.us.oracle.com/linux-konrad-public

SPEC: v2.6.39-200.24.1
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

Revert "Add Oracle VM guest messaging driver"
Orabug: 14233627
This reverts commit 0193318fe7899d2717cabff800c3a51cbfbc6ada.

Linux 3.0.36

USB: fix gathering of interface associations

commit b3a3dd074f7053ef824ad077e5331b52220ceba1 upstream.

TEAC's UD-H01 (and probably other devices) have a gap in the interface
number allocation of their descriptors:

  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength          220
    bNumInterfaces          3
    [...]
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      [...]
    Interface Association:
      bLength                 8
      bDescriptorType        11
      bFirstInterface         2
      bInterfaceCount         2
      bFunctionClass          1 Audio
      bFunctionSubClass       0
      bFunctionProtocol      32
      iFunction               4
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      [...]

Once a configuration is selected, usb_set_configuration() walks the
known interfaces of a given configuration and calls find_iad() on
each of them to set the interface association pointer the interface
is included in.

The problem here is that the loop variable is taken for the interface
number in the comparison logic that gathers the association. Which is
fine as long as the descriptors are sane.

In the case above, however, the logic gets out of sync and the
interface association fields of all interfaces beyond the interface
number gap are wrong.

Fix this by passing the interface's bInterfaceNumber to find_iad()
instead.
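The difference between matching on the loop index and matching on bInterfaceNumber can be shown with a small model. This is a sketch, not the kernel's find_iad(): struct iad_model and find_iad_model() are illustrative, and the interface numbering 0, 2, 3 is an assumption based on the descriptor dump above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of an interface association descriptor. */
struct iad_model {
	unsigned char first;	/* bFirstInterface */
	unsigned char count;	/* bInterfaceCount */
};

/* Return the association covering interface number inum, if any. */
const struct iad_model *find_iad_model(const struct iad_model *iads,
				       size_t n, unsigned inum)
{
	for (size_t i = 0; i < n; i++) {
		unsigned first = iads[i].first;

		if (inum >= first && inum < first + iads[i].count)
			return &iads[i];
	}
	return NULL;
}
```

With interfaces numbered 0, 2, 3 and one association covering 2..3, passing the loop index 1 for the second interface finds nothing, while passing its actual number 2 finds the association.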

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: bEN <ml_all@circa.be>
Reported-by: Ivan Perrone <ivanperrone@hotmail.com>
Tested-by: ivan perrone <ivanperrone@hotmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: serial: Enforce USB driver and USB serial driver match

commit 954c3f8a5f1b7716be9eee978b3bc85bae92d7c8 upstream.

We need to make sure that the USB serial driver we find
matches the USB driver whose probe we are currently
executing. Otherwise we will end up with USB serial
devices bound to the correct serial driver but wrong
USB driver.

An example of such cross-probing, where the usbserial_generic
USB driver has found the sierra serial driver:

May 29 18:26:15 nemi kernel: [ 4442.559246] usbserial_generic 4-4:1.0: Sierra USB modem converter detected
May 29 18:26:20 nemi kernel: [ 4447.556747] usbserial_generic 4-4:1.2: Sierra USB modem converter detected
May 29 18:26:25 nemi kernel: [ 4452.557288] usbserial_generic 4-4:1.3: Sierra USB modem converter detected

sysfs view of the same problem:

bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root    0 May 29 18:23 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root    0 May 29 18:23 module -> ../../../../module/sierra
-rw-r--r-- 1 root root 4096 May 29 18:23 new_id
lrwxrwxrwx 1 root root    0 May 29 18:32 ttyUSB0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/ttyUSB0
lrwxrwxrwx 1 root root    0 May 29 18:32 ttyUSB1 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2/ttyUSB1
lrwxrwxrwx 1 root root    0 May 29 18:32 ttyUSB2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3/ttyUSB2
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind

bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/usbserial_generic/
total 0
lrwxrwxrwx 1 root root    0 May 29 18:33 4-4:1.0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0
lrwxrwxrwx 1 root root    0 May 29 18:33 4-4:1.2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2
lrwxrwxrwx 1 root root    0 May 29 18:33 4-4:1.3 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root    0 May 29 18:33 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/generic/
total 0
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root    0 May 29 18:33 module -> ../../../../module/usbserial
-rw-r--r-- 1 root root 4096 May 29 18:33 new_id
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind

So we end up with a mismatch between the USB driver and the
USB serial driver.  The reason for the above is simple: The
USB driver probe will succeed if *any* registered serial
driver matches, and will use that serial driver for all
serial driver functions.

This makes ref counting go wrong. We count the USB driver
as used, but not the USB serial driver.  This may result
in Oops'es as demonstrated by Johan Hovold <jhovold@gmail.com>:

[11811.646396] drivers/usb/serial/usb-serial.c: get_free_serial 1
[11811.646443] drivers/usb/serial/usb-serial.c: get_free_serial - minor base = 0
[11811.646460] drivers/usb/serial/usb-serial.c: usb_serial_probe - registering ttyUSB0
[11811.646766] usb 6-1: pl2303 converter now attached to ttyUSB0
[11812.264197] USB Serial deregistering driver FTDI USB Serial Device
[11812.264865] usbcore: deregistering interface driver ftdi_sio
[11812.282180] USB Serial deregistering driver pl2303
[11812.283141] pl2303 ttyUSB0: pl2303 converter now disconnected from ttyUSB0
[11812.283272] usbcore: deregistering interface driver pl2303
[11812.301056] USB Serial deregistering driver generic
[11812.301186] usbcore: deregistering interface driver usbserial_generic
[11812.301259] drivers/usb/serial/usb-serial.c: usb_serial_disconnect
[11812.301823] BUG: unable to handle kernel paging request at f8e7438c
[11812.301845] IP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.301871] *pde = 357ef067 *pte = 00000000
[11812.301957] Oops: 0000 [#1] PREEMPT SMP
[11812.301983] Modules linked in: usbserial(-) [last unloaded: pl2303]
[11812.302008]
[11812.302019] Pid: 1323, comm: modprobe Tainted: G        W    3.4.0-rc7+ #101 Dell Inc. Vostro 1520/0T816J
[11812.302115] EIP: 0060:[<f8e38445>] EFLAGS: 00010246 CPU: 1
[11812.302130] EIP is at usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.302141] EAX: f508a180 EBX: f508a180 ECX: 00000000 EDX: f8e74300
[11812.302151] ESI: f5050800 EDI: 00000001 EBP: f5141e78 ESP: f5141e58
[11812.302160]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[11812.302170] CR0: 8005003b CR2: f8e7438c CR3: 34848000 CR4: 000007d0
[11812.302180] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[11812.302189] DR6: ffff0ff0 DR7: 00000400
[11812.302199] Process modprobe (pid: 1323, ti=f5140000 task=f61e2bc0 task.ti=f5140000)
[11812.302209] Stack:
[11812.302216]  f8e3be0f f8e3b29c f8e3ae00 00000000 f513641c f5136400 f513641c f507a540
[11812.302325]  f5141e98 c133d2c1 00000000 00000000 f509c400 f513641c f507a590 f5136450
[11812.302372]  f5141ea8 c12f0344 f513641c f507a590 f5141ebc c12f0c67 00000000 f507a590
[11812.302419] Call Trace:
[11812.302439]  [<c133d2c1>] usb_unbind_interface+0x51/0x190
[11812.302456]  [<c12f0344>] __device_release_driver+0x64/0xb0
[11812.302469]  [<c12f0c67>] driver_detach+0x97/0xa0
[11812.302483]  [<c12f001c>] bus_remove_driver+0x6c/0xe0
[11812.302500]  [<c145938d>] ? __mutex_unlock_slowpath+0xcd/0x140
[11812.302514]  [<c12f0ff9>] driver_unregister+0x49/0x80
[11812.302528]  [<c1457df6>] ? printk+0x1d/0x1f
[11812.302540]  [<c133c50d>] usb_deregister+0x5d/0xb0
[11812.302557]  [<f8e37c55>] ? usb_serial_deregister+0x45/0x50 [usbserial]
[11812.302575]  [<f8e37c8d>] usb_serial_deregister_drivers+0x2d/0x40 [usbserial]
[11812.302593]  [<f8e3a6e2>] usb_serial_generic_deregister+0x12/0x20 [usbserial]
[11812.302611]  [<f8e3acf0>] usb_serial_exit+0x8/0x32 [usbserial]
[11812.302716]  [<c1080b48>] sys_delete_module+0x158/0x260
[11812.302730]  [<c110594e>] ? mntput+0x1e/0x30
[11812.302746]  [<c145c3c3>] ? sysenter_exit+0xf/0x18
[11812.302746]  [<c107777c>] ? trace_hardirqs_on_caller+0xec/0x170
[11812.302746]  [<c145c390>] sysenter_do_call+0x12/0x36
[11812.302746] Code: 24 02 00 00 e8 dd f3 20 c8 f6 86 74 02 00 00 02 74 b4 8d 86 4c 02 00 00 47 e8 78 55 4b c8 0f b6 43 0e 39 f8 7f a9 8b 53 04 89 d8 <ff> 92 8c 00 00 00 89 d8 e8 0e ff ff ff 8b 45 f0 c7 44 24 04 2f
[11812.302746] EIP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial] SS:ESP 0068:f5141e58
[11812.302746] CR2: 00000000f8e7438c

Fix by only evaluating serial drivers pointing back to the
USB driver we are currently probing.  This still allows two
or more drivers to match the same device, running their
serial driver probes to sort out which one to use.
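The fix amounts to one extra filter in the driver search. The following standalone model uses invented types and names (usb_drv_model, serial_drv_model, search_serial_model) and a made-up product ID; it is a sketch of the matching rule, not the usb-serial code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a USB driver and a USB serial driver. */
struct usb_drv_model {
	const char *name;
};

struct serial_drv_model {
	const char *name;
	const struct usb_drv_model *usb_driver;	/* back-pointer */
	unsigned short product_id;
};

/*
 * Only consider serial drivers that point back to the USB driver whose
 * probe is running; without the first check any ID match would win.
 */
const struct serial_drv_model *
search_serial_model(const struct serial_drv_model *tbl, size_t n,
		    const struct usb_drv_model *probing,
		    unsigned short product_id)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].usb_driver != probing)
			continue;
		if (tbl[i].product_id == product_id)
			return &tbl[i];
	}
	return NULL;
}
```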

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Tested-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: serial: sierra: Add support for Sierra Wireless AirCard 320U modem

commit 19a3dd1575e954e8c004413bee3e12d3962f2525 upstream.

Add support for Sierra Wireless AirCard 320U modem

Signed-off-by: Tomas Cassidy <tomas.cassidy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: cdc-acm: fix devices not unthrottled on open

commit 6c4707f3f8c44ec18282e1c014c80e1c257042f9 upstream.

Currently CDC-ACM devices stay throttled when their TTY is closed while
throttled, stalling further communication attempts after the next open.

Unthrottling during open/activate got lost starting with kernel
3.0.0 and this patch reintroduces it.

Signed-off-by: Otto Meta <otto.patches@sister-shadow.de>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: add NO_D3_DURING_SLEEP flag and revert 151b61284776be2

commit c2fb8a3fa25513de8fedb38509b1f15a5bbee47b upstream.

This patch (as1558) fixes a problem affecting several ASUS computers:
The machine crashes or corrupts memory when going into suspend if the
ehci-hcd driver is bound to any controllers.  Users have been forced
to unbind or unload ehci-hcd before putting their systems to sleep.

After extensive testing, it was determined that the machines don't
like going into suspend when any EHCI controllers are in the PCI D3
power state.  Presumably this is a firmware bug, but there's nothing
we can do about it except to avoid putting the controllers in D3
during system sleep.

The patch adds a new flag to indicate whether the problem is present,
and avoids changing the controller's power state if the flag is set.
Runtime suspend is unaffected; this matters only for system suspend.
However as a side effect, the controller will not respond to remote
wakeup requests while the system is asleep.  Hence USB wakeup is not
functional -- but of course, this is already true in the current state
of affairs.

A similar patch has already been applied as commit
151b61284776be2d6f02d48c23c3625678960b97 (USB: EHCI: fix crash during
suspend on ASUS computers).  The patch supersedes that one and reverts
it.  There are two differences:

The old patch added the flag at the USB level; this patch
adds it at the PCI level.

The old patch applied to all chipsets with the same vendor,
subsystem vendor, and product IDs; this patch makes an
exception for a known-good system (based on DMI information).

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Dâniel Fraga <fragabr@gmail.com>
Tested-by: Andrey Rahmatullin <wrar@wrar.name>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: ftdi-sio: Add support for RT Systems USB-RTS01 serial adapter

commit e00a54d772210d450e5c1a801534c3c8a448549f upstream.

Add support for RT Systems USB-RTS01 USB to Serial adapter:
http://www.rtsystemsinc.com/Photos/USBRTS01.html

Tested by controlling Icom IC-718 amateur radio transceiver via hamlib.

Signed-off-by: Evan McNabb <evan@mcnabbs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: serial: cp210x: add Optris MS Pro usb id

commit 5bbfa6f427c1d7244a5ee154ab8fa37265a5e049 upstream.

Signed-off-by: Mikko Tuumanen <mikko.tuumanen@qemsoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: mct_u232: Fix incorrect TIOCMSET return

commit 1aa3c63cf0a79153ee13c8f82e4eb6c40b66a161 upstream.

The low level helper returns 1 on success. The ioctl should however return
0. As this is the only user of the helper return, make the helper return 0 or
an error code.

Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=43009
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: qcserial: Add Sierra Wireless device IDs

commit c41444ccfa33a1c20efa319e554cb531576e64a2 upstream.

Some additional IDs found in the BSD/GPL licensed out-of-tree
GobiSerial driver from Sierra Wireless.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: mos7840: Fix compilation of usb serial driver

commit b9c87663eead64c767e72a373ae6f8a94bead459 upstream.

The __devinitconst section can't be referenced
from the usb_serial_device structure. Thus remove it, as
is done in other mos* device drivers.

Error itself:
WARNING: drivers/usb/serial/mos7840.o(.data+0x8): Section mismatch in reference
from the variable moschip7840_4port_device to the variable
.devinit.rodata:id_table
The variable moschip7840_4port_device references
the variable __devinitconst id_table

[v2] no attach now

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xHCI: Increase the timeout for controller save/restore state operation

commit 622eb783fe6ff4c1baa47db16c3a5db97f9e6e50 upstream.

When system software decides to power down the xHC with the intent of
resuming operation at a later time, it will ask xHC to save the internal
state and restore it on resume, to correctly recover from a power event.
Two bits are used to enable this operation: Save State and Restore State.

xHCI spec 4.23.2 says software should "Set the Controller Save/Restore
State flag in the USBCMD register and wait for the Save/Restore State
Status flag in the USBSTS register to transition to '0'". However, it does
not define how long software should wait for the SSS/RSS bit to transition
to 0.

Currently the timeout is set to 1ms. A bug report
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1002697)
indicates that the timeout is too short for the ASMedia ASM1042 host controller
to save/restore the state successfully. Increasing the timeout to 10ms helps to
resolve the issue.

This patch should be backported to stable kernels as old as 2.6.37, that
contain the commit 5535b1d5f8885695c6ded783c692e3c0d0eda8ca "USB: xHCI:
PCI power management implementation"

Signed-off-by: Andiry Xu <andiry.xu@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hfsplus: fix overflow in sector calculations in hfsplus_submit_bio

commit a6dc8c04218eb752ff79cdc24a995cf51866caed upstream.

The variable io_size was unsigned int, which caused the wrong sector number
to be calculated after aligning it. This then caused mount to fail with big
volumes, as backup volume header information was searched from a
wrong sector.
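
A minimal user-space sketch of how a 32-bit io_size can corrupt a 64-bit
sector number during alignment. The exact expression used in
hfsplus_submit_bio is not quoted in this message, so the mask below and the
names align_sector_u32/align_sector_u64 are assumptions made for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9   /* 512-byte sectors, assumed for this sketch */

/*
 * Buggy variant: the mask is computed in 32 bits, so after promotion to
 * 64 bits its upper half is zero and the AND wipes the high sector bits.
 */
static uint64_t align_sector_u32(uint64_t sector, uint32_t io_size)
{
    assert(io_size >= (1u << SECTOR_SHIFT));
    return sector & ~((io_size >> SECTOR_SHIFT) - 1);
}

/* Fixed variant: a 64-bit io_size keeps the mask 64 bits wide. */
static uint64_t align_sector_u64(uint64_t sector, uint64_t io_size)
{
    assert(io_size >= (1u << SECTOR_SHIFT));
    return sector & ~((io_size >> SECTOR_SHIFT) - 1);
}
```

For sectors below 2^32 both variants agree, which would explain why only big
volumes were affected.
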

Signed-off-by: Janne Kalliomäki <janne@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: option: fix port-data abuse

commit 4273f9878b0a8271df055e3c8f2e7f08c6a4a2f4 upstream.

Commit 8b4c6a3ab596961b78465 ("USB: option: Use generic USB wwan code")
moved option port-data allocation to usb_wwan_startup but still cast the
port data to the old struct...

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: option: fix memory leak

commit b9c3aab315b51f81649a0d737c4c73783fbd8de0 upstream.

Fix memory leak introduced by commit 383cedc3bb435de7a2 ("USB: serial:
full autosuspend support for the option driver") which allocates
usb-serial data but never frees it.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: option: add more YUGA device ids

commit 0ef0be15fd2564767f114c249fc4af704d8e16f4 upstream.

Signed-off-by: gavin zhu <gavin.zhu@qq.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: option: Updated Huawei K4605 has better id

commit 42ca7da1c2363dbef4ba1b6917c4c02274b6a5e2 upstream.

Later firmwares for this device now have proper subclass and
protocol info so we can identify it nicely without needing to use
the blacklist. I'm not removing the old 0xff matching as there
may be devices in the field that still need that.

Signed-off-by: Andrew Bird <ajb@spheresystems.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: option: Add Vodafone/Huawei K5005 support

commit 4cbbb039a9719fb3bba73d255c6a95bc6dc6428b upstream.

Tested-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4.1: Fix a request leak on the back channel

commit b3b02ae5865c2dcd506322e0fc6def59a042e72f upstream.

If the call to svc_process_common() fails, then the request
needs to be freed before we can exit bc_svc_process.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen/setup: filter APERFMPERF cpuid feature out

commit 5e626254206a709c6e937f3dda69bf26c7344f6f upstream.

Xen PV kernels allow access to the APERF/MPERF registers to read the
effective frequency. Access to the MSRs is however redirected to the
currently scheduled physical CPU, making consecutive read and
compares unreliable. In addition each rdmsr traps into the hypervisor.
So to avoid bogus readouts and expensive traps, disable the kernel
internal feature flag for APERF/MPERF if running under Xen.
This will
a) remove the aperfmperf flag from /proc/cpuinfo
b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to
use the feature to improve scheduling (by default disabled)
c) not mislead the cpufreq driver to use the MSRs

This does not cover userland programs which access the MSRs via the
device file interface, but this will be addressed separately.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM i.MX imx21ads: Fix overlapping static i/o mappings

commit 350ab15bb2ffe7103bc6bf6c634f3c5b286eaf2a upstream.

The statically defined I/O memory regions for the i.MX21 on chip
peripherals and the on board I/O peripherals of the i.MX21ADS board
overlap. This results in a kernel crash during startup. This is fixed
by reducing the memory range for the on board I/O peripherals to the
actually required range.

Signed-off-by: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

SPEC: v2.6.39-200.23.1
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

SPEC: add block/net modules to list used by installer

Orabug: 14224837

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

SPEC: v2.6.39-200.22.1

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

NFSv4: include bitmap in nfsv4 get acl data

The NFSv4 bitmap size is unbounded: a server can return an arbitrary
sized bitmap in an FATTR4_WORD0_ACL request. Replace using the
nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
xdr length to the (cached) acl page data.

This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.

Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.
This fixes: CVE-2011-4131

Cc: stable@kernel.org
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Add Oracle VM guest messaging driver

Signed-off-by: Cathy Avery <cathy.avery@oracle.com>
Signed-off-by: Steve Prochniak <steve.prochniak@oracle.com>
Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE

In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding
the mmap_sem for reading, cmpxchg8b cannot be used to read pmd
contents under Xen.

So instead of dealing only with "consistent" pmdvals in
pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
where the low 32bit and high 32bit could be inconsistent (to avoid
having to use cmpxchg8b).

The only guarantee we get from pmd_read_atomic is that if the low part
of the pmd was found null, the high part will be null too (so the pmd
will be considered unstable). And if the low part of the pmd is found
"stable" later, then it means the whole pmd was read atomically
(because after a pmd is stable, neither MADV_DONTNEED nor page faults
can alter it anymore, and we read the high part after the low part).

In the 32bit PAE x86 case, it is enough to read the low part of the
pmdval atomically to declare the pmd as "stable" and that's true for
THP and no THP, furthermore in the THP case we also have a barrier()
that will prevent any inconsistent pmdvals from being cached by a later
re-read of the *pmd.
(cherry picked from commit cdc7a76d4903387391fba3284be3b0b5c364f3d2)

Orabug: 14217003
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

Merge branch 'loop' of git://ca-git.us.oracle.com/linux-dkleikam-public into uek-2.6.39-200

ocfs2:btrfs: aio-dio-loop changes broke setrlimit behavior [orabug 14207636]

The aio-dio changes for the loop device driver broke ocfs2 and btrfs's
handling of rlimit. generic_write_checks() adjusts the IO byte count to
account for the rlimit, but the updated count was not being reflected in
the iov_iter data structure.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

Merge branch 'stable/for-linus-3.6.rebased' into uek2-merge

* stable/for-linus-3.6.rebased:
xen/mce: add .poll method for mcelog device driver

xen/mce: add .poll method for mcelog device driver

If a driver leaves its poll method NULL, the device is assumed to
be both readable and writable without blocking.

This patch adds a .poll method to the xen mcelog device driver, so that
when mcelog uses system calls like ppoll or select, it blocks
when no data is available instead of spinning on the CPU.
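
Seen from user space, the intended behaviour can be sketched with poll(2) on
an ordinary pipe (a stand-in for the mcelog device, which is not opened here).
The helper name wait_readable is hypothetical:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n;

    assert(fd >= 0);
    n = poll(&p, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (p.revents & POLLIN)) ? 1 : 0;
}
```

On an empty pipe the call times out rather than reporting the descriptor as
readable; once a byte is written it returns 1. A device without a .poll
method would always look readable, so a caller looping on it would spin.
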

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

SPEC: v2.6.39-200.21.0

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

KVM: Fix buffer overflow in kvm_set_irq()

Bugdb: 13966
kvm_set_irq() has an internal buffer of three irq routing entries, allowing
connecting a GSI to three IRQ chips or one MSI. However setup_routing_entry()
does not properly enforce this, allowing three irqchip routes followed by
an MSI route to overflow the buffer.

Fix by ensuring that an MSI entry is added to an empty list.
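
A simplified sketch of the rule stated above, not the actual
setup_routing_entry() code. The types gsi_routes and route_type and the
function add_route are hypothetical; only the "MSI goes into an empty list"
rule and the three-entry bound come from this message:

```c
#include <assert.h>
#include <stddef.h>

enum route_type { ROUTE_IRQCHIP, ROUTE_MSI };

#define MAX_ROUTES_PER_GSI 3   /* size of the internal buffer */

struct gsi_routes {
    enum route_type type[MAX_ROUTES_PER_GSI];
    size_t count;
};

/* Returns 0 if the route was added, -1 if it was rejected. */
static int add_route(struct gsi_routes *r, enum route_type t)
{
    assert(r != NULL && r->count <= MAX_ROUTES_PER_GSI);

    /* An MSI entry may only be added to an empty list. */
    if (t == ROUTE_MSI && r->count != 0)
        return -1;
    /* Never exceed the fixed-size buffer. */
    if (r->count >= MAX_ROUTES_PER_GSI)
        return -1;

    r->type[r->count++] = t;
    return 0;
}
```

With this check, three irqchip routes followed by an MSI route leave the list
at three entries instead of producing a fourth.
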
This fixes: CVE-2012-2137
Signed-off-by: Avi Kivity <avi@redhat.com>

net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()

Bugdb: 13966
We need to validate the number of pages consumed by data_len, otherwise the frags
array could be overflowed by userspace. So this patch validates data_len and
returns -EMSGSIZE when data_len would occupy more frags than MAX_SKB_FRAGS.
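
The check can be sketched in user-space C as follows. This is an illustration
under stated assumptions, not the kernel code: check_data_len is a
hypothetical name, and the page size and frag limit are passed in as
parameters because their real values depend on the kernel configuration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Reject a data_len that would need more page fragments than max_frags.
 * Returns 0 if it fits, -EMSGSIZE otherwise.
 */
static int check_data_len(size_t data_len, size_t page_size, size_t max_frags)
{
    size_t npages;

    assert(page_size > 0);
    /* Round up to whole pages without risking overflow in an addition. */
    npages = data_len / page_size + (data_len % page_size != 0);
    if (npages > max_frags)
        return -EMSGSIZE;
    return 0;
}
```

For example, with 4096-byte pages and a limit of 17 fragments (both assumed
values), 17 * 4096 bytes are accepted and one more byte is rejected.
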
This fixes: CVE-2012-2136
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition

Bugdb: 13966
When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
#0 [f06a9dd8] crash_kexec at c049b5ec
#1 [f06a9e2c] oops_end at c083d1c2
#2 [f06a9e40] no_context at c0433ded
#3 [f06a9e64] bad_area_nosemaphore at c043401a
#4 [f06a9e6c] __do_page_fault at c0434493
#5 [f06a9eec] do_page_fault at c083eb45
#6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
#7 [f06a9f38] _spin_lock at c083bc14
#8 [f06a9f44] sys_mincore at c0507b7d
#9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transitions from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problems arises from the fact that the pmd could then
transition freely among the none, pmd_trans_huge and pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't a good idea and it would be a flaky solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd is never at risk of being
truncated because it would be zero at all times, which in turn hides the
SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
    {
            /* depend on compiler for an atomic pmd read */
            pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----
This fixes: CVE-2012-2373

Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

KVM: lock slots_lock around device assignment

Bugdb: 13966
As pointed out by Jason Baron, when assigning a device to a guest
we first set the iommu domain pointer, which enables mapping
and unmapping of memory slots to the iommu.  This leaves a window
where this path is enabled, but we haven't synchronized the iommu
mappings to the existing memory slots.  Thus a slot being removed
at that point could send us down unexpected code paths removing
non-existent pinnings and iommu mappings.  Take the slots_lock
around creating the iommu domain and initial mappings as well as
around iommu teardown to avoid this race.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This fixes: CVE-2012-2121
Conflicts:

virt/kvm/iommu.c

KVM: unmap pages from the iommu when slots are removed

Bugdb: 13966
We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown. A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.

Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa. This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.
This fixes: CVE-2012-2121

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: introduce kvm_for_each_memslot macro

Bugdb: 13966
Introduce kvm_for_each_memslot to walk all valid memslots

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fcaps: clear the same personality flags as suid when fcaps are used

Bugdb: 13966
If a process increases permissions using fcaps all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given privilege with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make it
easier to attack.
This fixes: CVE-2012-2123

Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>

hwmon: (fam15h_power) Correct sign extension of running_avg_capture

The wrong bit was used for sign extension, which caused wrong end results.
Thanks to Andre for spotting this bug.
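
As a generic illustration of why the chosen bit matters (the message does not
state the field width, so the 22-bit width used in the note below is an
assumption, and sign_extend_field is a hypothetical helper rather than the
kernel's own):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Interpret the low `bits` bits of value as a two's-complement number.
 * The top bit of that field (bit index bits - 1) is the sign bit.
 */
static int32_t sign_extend_field(uint32_t value, unsigned int bits)
{
    uint32_t sign, mask;

    assert(bits >= 1 && bits <= 32);
    sign = 1u << (bits - 1);
    mask = (bits == 32) ? 0xffffffffu : ((1u << bits) - 1);
    value &= mask;
    /* Do the subtraction in 64 bits so the result always fits int32_t. */
    return (int32_t)((int64_t)(value ^ sign) - (int64_t)sign);
}
```

Treating an all-ones 22-bit field as 22 bits wide yields -1, while treating
the same raw value as 23 bits wide (sign bit off by one) yields 4194303, a
large positive number.
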

Reported-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org
Conflicts:

drivers/hwmon/fam15h_power.c

EDAC: Make pci_device_id tables __devinitconst.

These const tables are currently marked __devinitdata, but
Documentation/PCI/pci.txt says:

"o The ID table array should be marked __devinitconst; this is done
automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()."

So use DEFINE_PCI_DEVICE_TABLE(x).

Based on PaX and earlier work by Andi Kleen.

Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Conflicts:

drivers/edac/sb_edac.c

x86, MCE, AMD: Make APIC LVT thresholding interrupt optional

Currently, the APIC LVT interrupt for error thresholding is implicitly
enabled. However, there are models in the F15h range which do not enable
it. Make the code machinery which sets up the APIC interrupt support
an optional setting and add an ->interrupt_capable member to the bank
representation mirroring that capability and enable the interrupt offset
programming only if it is true.

Simplify code and fixup comment style while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Conflicts:

arch/x86/kernel/cpu/mcheck/mce_amd.c