An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent
circular epoll dependencies from being created. However, in that case we
do not properly clear the 'tfile_check_list'. Thus, add a call to
clear_tfile_check_list() for the -ELOOP case.
Signed-off-by: Jason Baron <jbaron@redhat.com> Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Davide Libenzi <davidel@xmailserver.org> Tested-by: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
broke VLAN tagging on outbound packets.
It ifdef'ed BCM_KERNEL_SUPPORTS_8021Q, but this
is not set anywhere. So vlan never gets set, and
all packets are sent with vlan=0.
v2: We can just remove the test. vlan_tx_tag_present
is valid regardless of whether the 802.1q module
is built.
Tested on BCM5721 rev 11.
Signed-off-by: Kasper Pedersen <kernel@kasperkp.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5c1e688388f629e8d8e88183b5ebc21e209252aa)
Andrea Arcangeli [Wed, 20 Jun 2012 19:52:57 +0000 (12:52 -0700)]
thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
Orabug: 14300370
In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
Xen.
So instead of dealing only with "consistent" pmdvals in
pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
where the low 32bit and high 32bit could be inconsistent (to avoid having
to use cmpxchg8b).
The only guarantee we get from pmd_read_atomic is that if the low part of
the pmd was found null, the high part will be null too (so the pmd will be
considered unstable). And if the low part of the pmd is found "stable"
later, then it means the whole pmd was read atomically (because after a
pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
and we read the high part after the low part).
In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
atomically to declare the pmd as "stable" and that's true for THP and no
THP, furthermore in the THP case we also have a barrier() that will
prevent any inconsistent pmdvals to be cached by a later re-read of the
*pmd.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Jonathan Nieder <jrnieder@gmail.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Tested-by: Andrew Jones <drjones@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Clark [Fri, 9 Mar 2012 22:50:30 +0000 (14:50 -0800)]
[SCSI] libfc: fcoe_transport_create fails in single-CPU environment
Orabug: 14239242
(mainline commit: 011a9008b11604b12e8386fa6ac3433ab3175dc2)
Starting fcoe fails at fcoe_transport_create when attempting to allocate a
pool of 4K exchanges on a 64-bit single-CPU environment because the call to
__alloc_percpu() is greater than the max of 32K. This patch reduces the
number of exchanges to fit within the maximum allowed space.
[ Whitespace problems fixed by Robert Love to satisfy chechpatch.pl ]
Signed-off-by: Steven Clark <sclark@crossbeam.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
John Stultz [Mon, 2 Jul 2012 23:28:54 +0000 (19:28 -0400)]
3.0.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt
This patch introduces a new funciton which captures the
CLOCK_MONOTONIC time, along with the CLOCK_REALTIME and
CLOCK_BOOTTIME offsets at the same moment. This new function
is then used in place of ktime_get() when hrtimer_interrupt()
is expiring timers.
This ensures that any changes to realtime or boottime offsets
are noticed and stored into the per-cpu hrtimer base structures,
prior to doing any hrtimer expiration. This should ensure that
timers are not expired early if the offsets changes under us.
This is useful in the case where clock_was_set() is called from
atomic context and have to schedule the hrtimer base offset update
via a timer, as it provides extra robustness in the face of any
possible timer delay.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Backported-by: Joe Jin <joe.jin@oracle.com>
As widely reported on the internet, some Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).
An apparent for this issue workaround is running:
$ date -s "`date`"
John Stultz [Sun, 1 Jul 2012 18:01:21 +0000 (14:01 -0400)]
3.0.x: hrtimer: Fix clock_was_set so it is safe to call from irq context
NOTE:This is a prerequisite patch that's required to
address the widely observed leap-second related futex/hrtimer
issues.
Currently clock_was_set() is unsafe to be called from irq
context, as it calls on_each_cpu(). This causes problems when
we need to adjust the time from update_wall_time().
To fix this, if clock_was_set is called when irqs are
disabled, we schedule a timer to fire for immedately after
we're out of interrupt context to then notify the hrtimer
subsystem.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Acked-by: Prarit Bhargava <prarit@redhat.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Backported-by: Joe Jin <joe.jin@oracle.com>
John Stultz [Mon, 2 Jul 2012 23:28:54 +0000 (19:28 -0400)]
3.0.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt
This patch introduces a new funciton which captures the
CLOCK_MONOTONIC time, along with the CLOCK_REALTIME and
CLOCK_BOOTTIME offsets at the same moment. This new function
is then used in place of ktime_get() when hrtimer_interrupt()
is expiring timers.
This ensures that any changes to realtime or boottime offsets
are noticed and stored into the per-cpu hrtimer base structures,
prior to doing any hrtimer expiration. This should ensure that
timers are not expired early if the offsets changes under us.
This is useful in the case where clock_was_set() is called from
atomic context and have to schedule the hrtimer base offset update
via a timer, as it provides extra robustness in the face of any
possible timer delay.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Backported-by: Joe Jin <joe.jin@oracle.com>
As widely reported on the internet, some Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).
An apparent workaround for this issue is running:
$ date -s "`date`"
I believe this issue is due to the leapsecond being added without
calling clock_was_set() to notify the hrtimer subsystem of the
change. (Although I've not yet chased all the way down to the
hrtimer code to validate exactly what's going on there).
The workaround functions as it forces a clock_was_set()
call from settimeofday().
This fix adds the required clock_was_set() calls to where
we adjust for leapseconds.
NOTE: This fix *depends* on the previous fix, which allows
clock_was_set to be called from atomic context. Do not try
to apply just this patch.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: John Stultz <johnstul@us.ibm.com>
John Stultz [Mon, 2 Jul 2012 00:25:12 +0000 (20:25 -0400)]
Fix clock_was_set so it is safe to call from atomic
Backport for 3.0.36
NOTE:This is a prerequisite patch that's required to
address the widely observed leap-second related futex/hrtimer
issues.
Currently clock_was_set() is unsafe to be called from atomic
context, as it calls on_each_cpu(). This causes problems when
we need to adjust the time from update_wall_time().
To fix this, introduce a work_struct so if we're in_atomic,
we can schedule work to do the necessary update after we're
out of the atomic section.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: John Stultz <johnstul@us.ibm.com>
Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.
Thomas diagnosed this as the following pattern:
CPU 0 CPU 1
do_adjtimex()
spin_lock_irq(&ntp_lock);
process_adjtimex_modes(); timer_interrupt()
process_adj_status(); do_timer()
ntp_start_leap_timer(); write_lock(&xtime_lock);
hrtimer_start(); update_wall_time();
hrtimer_reprogram(); ntp_tick_length()
tick_program_event() spin_lock(&ntp_lock);
clockevents_program_event()
ktime_get()
seq = req_seqbegin(xtime_lock);
This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.
The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ) after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).
This patch applies on top of tip/timers/core.
CC: Sasha Levin <levinsasha928@gmail.com> CC: Thomas Gleixner <tglx@linutronix.de> Reported-by: Sasha Levin <levinsasha928@gmail.com> Diagnoised-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>
John Stultz [Mon, 14 Nov 2011 21:48:36 +0000 (13:48 -0800)]
ntp: Add ntp_lock to replace xtime_locking
Use a ntp_lock spin lock to replace xtime_lock locking in ntp.c
CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
John Stultz [Mon, 14 Nov 2011 21:18:07 +0000 (13:18 -0800)]
ntp: Access tick_length variable via ntp_tick_length()
Currently the NTP managed tick_length value is accessed globally,
in preparations for locking cleanups, make sure it is accessed via
a function and mark it as static.
CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
John Stultz [Mon, 14 Nov 2011 21:06:21 +0000 (13:06 -0800)]
ntp: Cleanup timex.h
Move ntp_sycned to ntp.c and mark time_status as static.
Also yank function declaration for non-existant function.
CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
Joe Jin [Mon, 4 Jun 2012 05:45:02 +0000 (13:45 +0800)]
dm-nfs: force random mode for the backend file
Orabug: 14092678
Without this flag page_cache_sync_readahead() might take some seconds to
complete.
Since dm-nfs used for ovm and as vdisk, random access is expect, so force
set this flag when open the backend file.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Adnan Misherfi <adnan.misherfi@oracle.com> Cc: Kurt C Hackel <kurt.hackel@oracle.com> Cc: Andrew Thomas <andrew.thomas@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Joe Jin [Mon, 4 Jun 2012 05:45:02 +0000 (13:45 +0800)]
dm-nfs: force random mode for the backend file
Orabug: 14092678
Without this flag page_cache_sync_readahead() might take some seconds to
complete.
Since dm-nfs used for ovm and as vdisk, random access is expect, so force
set this flag when open the backend file.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Adnan Misherfi <adnan.misherfi@oracle.com> Cc: Kurt C Hackel <kurt.hackel@oracle.com> Cc: Andrew Thomas <andrew.thomas@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Daniel Kiper [Thu, 21 Jun 2012 13:24:09 +0000 (15:24 +0200)]
x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
Some implementations (e.g. Xen PVOPS) could not use part of identity page table
to construct transition page table. It means that they require separate PUDs,
PMDs and PTEs for virtual and physical (identity) mapping. To satisfy that
requirement add extra pointer to PGD, PUD, PMD and PTE and align existing code.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Daniel Kiper [Thu, 21 Jun 2012 13:22:41 +0000 (15:22 +0200)]
kexec: introduce kexec_ops struct
Some kexec/kdump implementations (e.g. Xen PVOPS) on different archs could
not use default functions or require some changes in behavior of kexec/kdump
generic code. To cope with that problem kexec_ops struct was introduced.
It allows a developer to replace all or some functions and control some
functionality of kexec/kdump generic code.
Default behavior of kexec/kdump generic code is not changed.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Once a configuration is selected, usb_set_configuration() walks the
known interfaces of a given configuration and calls find_iad() on
each of them to set the interface association pointer the interface
is included in.
The problem here is that the loop variable is taken for the interface
number in the comparison logic that gathers the association. Which is
fine as long as the descriptors are sane.
In the case above, however, the logic gets out of sync and the
interface association fields of all interfaces beyond the interface
number gap are wrong.
Fix this by passing the interface's bInterfaceNumber to find_iad()
instead.
Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-by: bEN <ml_all@circa.be> Reported-by: Ivan Perrone <ivanperrone@hotmail.com> Tested-by: ivan perrone <ivanperrone@hotmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to make sure that the USB serial driver we find
matches the USB driver whose probe we are currently
executing. Otherwise we will end up with USB serial
devices bound to the correct serial driver but wrong
USB driver.
An example of such cross-probing, where the usbserial_generic
USB driver has found the sierra serial driver:
May 29 18:26:15 nemi kernel: [ 4442.559246] usbserial_generic 4-4:1.0: Sierra USB modem converter detected
May 29 18:26:20 nemi kernel: [ 4447.556747] usbserial_generic 4-4:1.2: Sierra USB modem converter detected
May 29 18:26:25 nemi kernel: [ 4452.557288] usbserial_generic 4-4:1.3: Sierra USB modem converter detected
sysfs view of the same problem:
bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root 0 May 29 18:23 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root 0 May 29 18:23 module -> ../../../../module/sierra
-rw-r--r-- 1 root root 4096 May 29 18:23 new_id
lrwxrwxrwx 1 root root 0 May 29 18:32 ttyUSB0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/ttyUSB0
lrwxrwxrwx 1 root root 0 May 29 18:32 ttyUSB1 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2/ttyUSB1
lrwxrwxrwx 1 root root 0 May 29 18:32 ttyUSB2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3/ttyUSB2
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/usbserial_generic/
total 0
lrwxrwxrwx 1 root root 0 May 29 18:33 4-4:1.0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0
lrwxrwxrwx 1 root root 0 May 29 18:33 4-4:1.2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2
lrwxrwxrwx 1 root root 0 May 29 18:33 4-4:1.3 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root 0 May 29 18:33 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/generic/
total 0
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root 0 May 29 18:33 module -> ../../../../module/usbserial
-rw-r--r-- 1 root root 4096 May 29 18:33 new_id
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind
So we end up with a mismatch between the USB driver and the
USB serial driver. The reason for the above is simple: The
USB driver probe will succeed if *any* registered serial
driver matches, and will use that serial driver for all
serial driver functions.
This makes ref counting go wrong. We count the USB driver
as used, but not the USB serial driver. This may result
in Oops'es as demonstrated by Johan Hovold <jhovold@gmail.com>:
Fix by only evaluating serial drivers pointing back to the
USB driver we are currently probing. This still allows two
or more drivers to match the same device, running their
serial driver probes to sort out which one to use.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Reviewed-by: Felipe Balbi <balbi@ti.com> Tested-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently CDC-ACM devices stay throttled when their TTY is closed while
throttled, stalling further communication attempts after the next open.
Unthrottling during open/activate got lost starting with kernel
3.0.0 and this patch reintroduces it.
Signed-off-by: Otto Meta <otto.patches@sister-shadow.de> Acked-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1558) fixes a problem affecting several ASUS computers:
The machine crashes or corrupts memory when going into suspend if the
ehci-hcd driver is bound to any controllers. Users have been forced
to unbind or unload ehci-hcd before putting their systems to sleep.
After extensive testing, it was determined that the machines don't
like going into suspend when any EHCI controllers are in the PCI D3
power state. Presumably this is a firmware bug, but there's nothing
we can do about it except to avoid putting the controllers in D3
during system sleep.
The patch adds a new flag to indicate whether the problem is present,
and avoids changing the controller's power state if the flag is set.
Runtime suspend is unaffected; this matters only for system suspend.
However as a side effect, the controller will not respond to remote
wakeup requests while the system is asleep. Hence USB wakeup is not
functional -- but of course, this is already true in the current state
of affairs.
A similar patch has already been applied as commit 151b61284776be2d6f02d48c23c3625678960b97 (USB: EHCI: fix crash during
suspend on ASUS computers). The patch supersedes that one and reverts
it. There are two differences:
The old patch added the flag at the USB level; this patch
adds it at the PCI level.
The old patch applied to all chipsets with the same vendor,
subsystem vendor, and product IDs; this patch makes an
exception for a known-good system (based on DMI information).
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Dâniel Fraga <fragabr@gmail.com> Tested-by: Andrey Rahmatullin <wrar@wrar.name> Tested-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The low level helper returns 1 on success. The ioctl should however return
0. As this is the only user of the helper return, make the helper return 0 or
an error code.
Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=43009 Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The __devinitconst section can't be referenced
from usb_serial_device structure. Thus removed it as
it done in other mos* device drivers.
Error itself:
WARNING: drivers/usb/serial/mos7840.o(.data+0x8): Section mismatch in reference
from the variable moschip7840_4port_device to the variable
.devinit.rodata:id_table
The variable moschip7840_4port_device references
the variable __devinitconst id_table
[v2] no attach now
Signed-off-by: Tony Zelenoff <antonz@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When system software decides to power down the xHC with the intent of
resuming operation at a later time, it will ask xHC to save the internal
state and restore it when resume to correctly recover from a power event.
Two bits are used to enable this operation: Save State and Restore State.
xHCI spec 4.23.2 says software should "Set the Controller Save/Restore
State flag in the USBCMD register and wait for the Save/Restore State
Status flag in the USBSTS register to transition to '0'". However, it does
not define how long software should wait for the SSS/RSS bit to transition
to 0.
Currently the timeout is set to 1ms. There is bug report
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1002697)
indicates that the timeout is too short for ASMedia ASM1042 host controller
to save/restore the state successfully. Increase the timeout to 10ms helps to
resolve the issue.
This patch should be backported to stable kernels as old as 2.6.37, that
contain the commit 5535b1d5f8885695c6ded783c692e3c0d0eda8ca "USB: xHCI:
PCI power management implementation"
Signed-off-by: Andiry Xu <andiry.xu@gmail.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The variable io_size was unsigned int, which caused the wrong sector number
to be calculated after aligning it. This then caused mount to fail with big
volumes, as backup volume header information was searched from a
wrong sector.
Signed-off-by: Janne Kalliomäki <janne@tuxera.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 8b4c6a3ab596961b78465 ("USB: option: Use generic USB wwan code")
moved option port-data allocation to usb_wwan_startup but still cast the
port data to the old struct...
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix memory leak introduced by commit 383cedc3bb435de7a2 ("USB: serial:
full autosuspend support for the option driver") which allocates
usb-serial data but never frees it.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Later firmwares for this device now have proper subclass and
protocol info so we can identify it nicely without needing to use
the blacklist. I'm not removing the old 0xff matching as there
may be devices in the field that still need that.
Signed-off-by: Andrew Bird <ajb@spheresystems.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xen PV kernels allow access to the APERF/MPERF registers to read the
effective frequency. Access to the MSRs is however redirected to the
currently scheduled physical CPU, making consecutive read and
compares unreliable. In addition each rdmsr traps into the hypervisor.
So to avoid bogus readouts and expensive traps, disable the kernel
internal feature flag for APERF/MPERF if running under Xen.
This will
a) remove the aperfmperf flag from /proc/cpuinfo
b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to
use the feature to improve scheduling (by default disabled)
c) not mislead the cpufreq driver to use the MSRs
This does not cover userland programs which access the MSRs via the
device file interface, but this will be addressed separately.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The statically defined I/O memory regions for the i.MX21 on chip
peripherals and the on board I/O peripherals of the i.MX21ADS board
overlap. This results in a kernel crash during startup. This is fixed
by reducing the memory range for the on board I/O peripherals to the
actually required range.
Andy Adamson [Wed, 7 Dec 2011 16:55:27 +0000 (11:55 -0500)]
NFSv4: include bitmap in nfsv4 get acl data
The NFSv4 bitmap size is unbounded: a server can return an arbitrary
sized bitmap in an FATTR4_WORD0_ACL request. Replace using the
nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
xdr length to the (cached) acl page data.
This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.
Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.
This fixes: CVE-2011-4131
Cc: stable@kernel.org Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andrea Arcangeli [Thu, 7 Jun 2012 19:45:30 +0000 (21:45 +0200)]
thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding
the mmap_sem for reading, cmpxchg8b cannot be used to read pmd
contents under Xen.
So instead of dealing only with "consistent" pmdvals in
pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
where the low 32bit and high 32bit could be inconsistent (to avoid
having to use cmpxchg8b).
The only guarantee we get from pmd_read_atomic is that if the low part
of the pmd was found null, the high part will be null too (so the pmd
will be considered unstable). And if the low part of the pmd is found
"stable" later, then it means the whole pmd was read atomically
(because after a pmd is stable, neither MADV_DONTNEED nor page faults
can alter it anymore, and we read the high part after the low part).
In the 32bit PAE x86 case, it is enough to read the low part of the
pmdval atomically to declare the pmd as "stable" and that's true for
THP and no THP, furthermore in the THP case we also have a barrier()
that will prevent any inconsistent pmdvals to be cached by a later
re-read of the *pmd.
(cherry picked from commit cdc7a76d4903387391fba3284be3b0b5c364f3d2)
Orabug: 14217003 Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
The aio-dio changes for the loop device driver broke ocfs2 and btrfs's
handling of rlimit. generic_write_checks() adjusts the IO byte count to
account for the rlimit, but the updated count was not being reflected in
the iov_iter data structure.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Liu, Jinsong [Fri, 15 Jun 2012 01:03:39 +0000 (09:03 +0800)]
xen/mce: add .poll method for mcelog device driver
If a driver leaves its poll method NULL, the device is assumed to
be both readable and writable without blocking.
This patch add .poll method to xen mcelog device driver, so that
when mcelog use system calls like ppoll or select, it would be
blocked when no data available, and avoid spinning at CPU.
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Avi Kivity [Sun, 22 Apr 2012 14:02:11 +0000 (17:02 +0300)]
KVM: Fix buffer overflow in kvm_set_irq()
Bugdb: 13966
kvm_set_irq() has an internal buffer of three irq routing entries, allowing
connecting a GSI to three IRQ chips or on MSI. However setup_routing_entry()
does not properly enforce this, allowing three irqchip routes followed by
an MSI route to overflow the buffer.
Fix by ensuring that an MSI entry is added to an empty list.
This fixes: CVE-2012-2137 Signed-off-by: Avi Kivity <avi@redhat.com>
Jason Wang [Wed, 30 May 2012 21:18:10 +0000 (21:18 +0000)]
net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
Bugdb: 13966
We need to validate the number of pages consumed by data_len, otherwise frags
array could be overflowed by userspace. So this patch validate data_len and
return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS.
This fixes: CVE-2012-2136 Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Arcangeli [Tue, 29 May 2012 22:06:49 +0000 (15:06 -0700)]
mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition
Bugdb: 13966
When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.
This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.
With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.
This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.
Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.
NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.
This bug was discovered and fully debugged by Ulrich, quote:
----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.
496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
*pmd)
497 {
498 /* depend on compiler for an atomic pmd read */
499 pmd_t pmdval = *pmd;
Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.
- The PMD entry {high|low} is 0x0000000000000000.
The "mov" at 0xc0507a84 loads 0x00000000 into edx.
- A page fault (on another CPU) sneaks in between the two "mov"
instructions and instantiates the PMD.
- The PMD entry {high|low} is now 0x00000003fda38067.
The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----
This fixes: CVE-2012-2373
Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Williamson [Wed, 18 Apr 2012 03:46:44 +0000 (21:46 -0600)]
KVM: lock slots_lock around device assignment
Bugdb: 13966
As pointed out by Jason Baron, when assigning a device to a guest
we first set the iommu domain pointer, which enables mapping
and unmapping of memory slots to the iommu. This leaves a window
where this path is enabled, but we haven't synchronized the iommu
mappings to the existing memory slots. Thus a slot being removed
at that point could send us down unexpected code paths removing
non-existent pinnings and iommu mappings. Take the slots_lock
around creating the iommu domain and initial mappings as well as
around iommu teardown to avoid this race.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This fixes: CVE-2012-2121
Conflicts:
Alex Williamson [Wed, 11 Apr 2012 15:51:49 +0000 (09:51 -0600)]
KVM: unmap pages from the iommu when slots are removed
Bugdb: 13966
We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown. A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.
Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa. This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.
This fixes: CVE-2012-2121
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Eric Paris [Tue, 17 Apr 2012 20:26:54 +0000 (16:26 -0400)]
fcaps: clear the same personality flags as suid when fcaps are used
Bugdb: 13966
If a process increases permissions using fcaps all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given priviledge with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make it
easier to attack.
This fixes: CVE-2012-2123
Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
x86, MCE, AMD: Make APIC LVT thresholding interrupt optional
Currently, the APIC LVT interrupt for error thresholding is implicitly
enabled. However, there are models in the F15h range which do not enable
it. Make the code machinery which sets up the APIC interrupt support
an optional setting and add an ->interrupt_capable member to the bank
representation mirroring that capability and enable the interrupt offset
programming only if it is true.
Simplify code and fixup comment style while at it.