www.infradead.org Git - users/mchehab/edac.git/log

events/hw_event: Create a Hardware Events Report Mecanism (HERM)

Adds a trace class for handle hardware events

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

    We have several subsystems & methods for reporting hardware errors:

    1) EDAC ("Error Detection and Correction").  In its original form
    this consisted of a platform specific driver that read topology
    information and error counts from chipset registers and reported
    the results via a sysfs interface.

    2) mcelog - x86 specific decoding of machine check bank registers
    reporting in binary form via /dev/mcelog. Recent additions make use
    of the APEI extensions that were documented in version 4.0a of the
    ACPI specification to acquire more information about errors without
    having to rely reading chipset registers directly. A user level
    programs decodes into somewhat human readable format.

    3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
    decodes errors reported via machine check bank registers in AMD
    processors to the console log using printk();

    Each of these mechanisms has a band of followers ... and none
    of them appear to meet all the needs of all users.

In order to provide a proper hardware event subsystem, let's
encapsulate the memory error hardware events into a trace facility.

As no agreement was reached so far for the MCA-based trace events, for
now, let's add events only for memory errors. A latter patch can change
the tracepoint, for events originated via MCA.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

Fix memory error count

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7300_edac: fixup

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Call the minimum grain node as "rank" if chip select is used

The minimum hierarchy unit for csrows-based MC's is rank, and not dimm.

There are a few possible fixes here:

1) add a per-rank location at the dimm struct, and don't create one
dimm_info entry per rank;

2) Convert the drivers to internally convert ranks into dimm's;

3) rename "dimm" with "rank";

A few drivers do (2) internally: some of them have per-dimm registers,
instead of per-rank ones; a few others have some routines to translate
between the two representations of the memory. Those drivers report the
memories via a DIMM slot and a channel.

While no better solution is done, let's do (2) for the drivers that work
with csrows/channel layers to represent a rank.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: add a sysfs node that stores the max possible memory location

This is useful for edac utilities that want to print the memory layout,
including the empty slots.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i5100_edac: Fix the logic

This driver is different from the other drivers: it is not for a
FB-DIMM based device, but it does the right thing of exporting
the memories as dimm's and not as ranks.

Due to that, the mapping should be different than the other drivers.

After this change, the memories are properly reported:

/sys/devices/system/edac/mc/mc0/dimm0/dimm_size:1024
/sys/devices/system/edac/mc/mc0/dimm2/dimm_size:1024
/sys/devices/system/edac/mc/mc0/dimm4/dimm_size:1024
/sys/devices/system/edac/mc/mc0/dimm6/dimm_size:1024
/sys/devices/system/edac/mc/mc0/dimm8/dimm_size:1024
/sys/devices/system/edac/mc/mc0/dimm10/dimm_size:1024
/sys/devices/system/edac/mc/mc0/dimm0/dimm_location:channel 0 slot 0
/sys/devices/system/edac/mc/mc0/dimm2/dimm_location:channel 0 slot 2
/sys/devices/system/edac/mc/mc0/dimm4/dimm_location:channel 0 slot 4
/sys/devices/system/edac/mc/mc0/dimm6/dimm_location:channel 1 slot 0
/sys/devices/system/edac/mc/mc0/dimm8/dimm_location:channel 1 slot 2
/sys/devices/system/edac/mc/mc0/dimm10/dimm_location:channel 1 slot 4

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i5000_edac: Fix the logic that retrieves memory information

The logic there is broken: it basically creates two csrows for
each DIMM and assumes that all DIMM's are dual rank. Only one of
the csrows will contain the entire DIMM size. If single rank
memories are found, they'll be marked with 0 bytes.

The check if the AMB is present were also wrong.

Yet, as the error reports don't use the memory size in order to
credit an error to the right DIMM, that part of the driver seems
to work. That's why probably nobody detected the issue yet.

After this patch, the memory layout is now properly reported,
when debug mode is enabled, and the number of ranks per dimm is
now shown:

calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: slot  3       0 MB   |    0 MB   |    0 MB   |    0 MB   |
calculate_dimm_size: slot  2       0 MB   |    0 MB   |    0 MB   |    0 MB   |
calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: slot  1       0 MB   |    0 MB   |    0 MB   |    0 MB   |
calculate_dimm_size: slot  0     512 MB 1R|  512 MB 1R|  512 MB 1R|  512 MB 1R|
calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size:            channel 0 | channel 1 | channel 2 | channel 3 |
calculate_dimm_size:                   branch 0       |        branch 1       |

(1R above means that all memories on my test machine are single-ranked)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

amd64_edac: remove a duplicated call to edac_mc_handle_error()

By mistake, the error handler call were added twice by a previous
patch.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: be sure to use the GET_POS macro to get memset_info struct

Most drivers use the csrows/channel way to get the struct memset_info
associated with a dimm/rank. FB-DIMM's and new Intel arch drivers
use different memory layers. Be sure to get the right DIMM on
those drivers. So, use a macro to calculate the offset of the dimm inside
its structure.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: fill the location with something useful if the DIMM is not found

If, for any reason, the driver passes a location for a not-filled
memory, uses "unknown memory" for the label. This is to prevent a
message like the one below:

[30418.916116] Generating a fake error to 0.-1.1 to test core handling. NOTE: this won't test the driver-specific decoding logic.
[30418.916123] EDAC MC0: UE FAKE ERROR on (branch 0 slot 1 page 0x0 offset 0x0 grain 0 for EDAC testing only)

(in this case, there's no memories at slot 1 - so it is simulating
a driver decoding failure)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i5400_edac: Better represent the memory controller hierarchy

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i5400_edac: Avoid calling pci_put_device() twice

When i5400_edac driver is removed and re-loaded a few times, it causes
an OOPS, as it is currently decrementing some PCI device usage two
times.

When called inside a loop, pci_get_device() will call
pci_put_device(). That mangles the error count. In this specific
case, it seems easier to just duplicate the call.

Also fixes the error logic when pci_get_device fails.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_mc: Fixes the logic that fills the dimms

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_mc_sysfs: don't create inactive errcount sysfs nodes

It doesn't make any sense to create an error count node for
something that is not active.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Fix module removal logic

There's a bug at the sysfs node removal: dimm's are being removed
twice. Fix it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_mc: Improve the labels parsing

Only show labels for installed memories, and let it clear that,
when more than one label is pointed, the error is not on all
memories (it is an "or" logical operation, and not "and").

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: i5400: Fix DIMM memory filling

The routine were not filling DIMM banks right. Fix it, getting
rid of the rest of the fake csrow info.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Fix per layer error count counters

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Fix new error counts

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Fix sysfs csrow?/*ce*count counters

Fix a bug at the logic that were preventing error counts at the
same csrow/channel to be properly incremented.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: don't OOPS if the csrow is not visible

[  119.831403] EDAC DEBUG: edac_mc_handle_error: edac_mc_handle_error: incrementing csrows (-1,0)
[  119.831418] BUG: unable to handle kernel paging request at 000000010000001b
[  119.838677] IP: [<ffffffffa0131276>] edac_mc_handle_error+0x45e/0x8c1 [edac_core]

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Initialize the dimm label with the known information

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_mc: Fix the enable label filter logic

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Add a sysfs node to test the EDAC error report facility

Not all hardware supports error injection. Also, this feature
could be disabled by the BIOS. As we need to test the EDAC
error report facilities and the edac utils logic, add a way
to generate fake errors, when EDAC debug is enabled.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_mc: Some clenups at the log message

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Cleanup the logs for i7core and sb edac drivers

Remove some information that it is duplicated at the MCE log,
and don't have much usage for the error. Those data will be
added again, when creating a trace function that outputs both
memory errors and MCE fields.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Export MC hierarchy counters for CE and UE

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: rework memory layer hierarchy description

The old way of allocating data were confusing. It were
also introducing a miss-concept with regards to "channel",
when csrows are used.

Instead, use a more generic approach: breaks the memory
controller into layers. Drivers are free to describe the
layers any way they want, in order to match the memory
architecture.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac.h: Use kernel-doc-nano-HOWTO.txt notation for enums

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac-mc: Allow reporting errors on a non-csrow oriented way

The edac core were written with the idea that memory controllers
are able to directly access csrows, and that the channels are
used inside a csrows select.

This is not true for FB-DIMM and RAMBUS memory controllers.

Also, some advanced memory controllers don't present a per-csrows
view.

So, change the allocation and error report routines to allow
them to work with all types of architectures.

This allowed to remove several hacks on FB-DIMM and RAMBUS
memory controllers.

Compile-tested all platforms (x86_64, i386, tile and several ppc subarchs).

Further per-driver tests will happen, in order to fix possible bugs
caused by this changeset.

TODO: a multi-rank DIMM's are currently represented by multiple DIMM
entries at struct dimm_info. That means that changing a label for one
rank won't change the same label for the other ranks at the same dimm.
Such bug is there since the beginning of the EDAC, so it is not a big
deal. However, on several drivers, it is possible to fix this issue, but
it should be a per-driver fix, as the csrow => DIMM arrangement may not
be equal for all. So, don't try to fix it here yet.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: DIMM location cleanup

Cleans up the DIMM location information:
- Remove it from the csrow structure;
- make the location sysfs node code more flexible;
- cleans up the dimm code inside the drivers that fill the
dimm location properties.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Add per-dimm sysfs show nodes

Add sysfs nodes to describe DIMM properties: size, memory type,
dev type and edac mode.

With this change, the physical memory characteristics of the dimm stick
is now properly presented, as detected by the memory controller.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: move nr_pages to dimm struct

The number of pages is a dimm property. Move it to the dimm struct.

After this change, it is possible to add sysfs nodes for the DIMM's that
will properly represent the DIMM stick properties, including its size.

A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
the memory controller represents the memory via chip select rows.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Don't initialize csrow's first_page & friends when not needed

Almost all edac drivers initialize first_page, last_page and
page_mask. Those vars are used inside the EDAC core, in order to
calculate the csrow affected by an error, by using the routine
edac_mc_find_csrow_by_page().

However, very few drivers actually use it:
        e752x_edac.c
        e7xxx_edac.c
        i3000_edac.c
        i82443bxgx_edac.c
        i82860_edac.c
        i82875p_edac.c
        i82975x_edac.c
        r82600_edac.c

There also a few other drivers that have their own calculus
formula internally using those vars.

All the others are just wasting time by initializing those
data.

While initializing data without using them won't cause any troubles, as
those information is stored at the wrong place (at csrows structure), it
is better to remove what is unused, in order to simplify the next patch.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: move dimm properties to struct memset_info

On systems based on chip select rows, all channels need to use memories
with the same properties, otherwise the memories on channels A and B
won't be recognized. However, this is not true for all types of memory
controllers. Controllers for FB-DIMM's don't have such requirements.
Also, modern Intel controllers also seem to be capable of handling such
differences.

So, we need to get rid of storing the DIMM information into a per-csrow
data.

The first step is to move grain, mtype, dtype and edac_mode to the
per-dimm struct.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7300_edac: Convert it to report memory with the new location

On this driver, the memory controller supports only FB-DIMMs.

The memory controller hierarchy here has 3 layers bellow the
memory controller:
        - two branches;
        - each branch has two channels;
        - each channel can select up to 8 DIMM's via the
FB-DIMM AMB (Advanced Memory Buffer) chip.

As EDAC currently limits memory controllers to 2 hierarchy
levels, on this patch, both branches and channels are grouped
together.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i5400_edac: Convert it to report memory with the new location

On this driver, the memory controller supports only FB-DIMMs.

The memory controller hierarchy here has 3 layers bellow the
memory controller:
- two branches;
- each branch has two channels;
- each channel can select up to 4 DIMM's via the
FB-DIMM AMB (Advanced Memory Buffer) chip.

As EDAC currently limits memory controllers to 2 hierarchy
levels, on this patch, both branches and channels are grouped
together.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Prepare to push down to drivers the filling of the memset_info

Several data that it is currently stored per csrow will be
pushed into the dimm themselves. Due to that, the mci->dimms
filling will need to happen inside the drivers.

Prepare for that, by initializing the dimm fields during mci
alloc. With this change, the changes at the drivers will be
smaller, as they won't need to touch at the fields they don't
currently initialize.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Add per dimm's sysfs nodes

Instead of just exporting a per-csrow dimm directories, add
a pure per-dimm attributes. This will help to better map
the DIMM properties, when csrow info is not available.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Create a dimm struct and move the labels into it

The way a DIMM is currently represented implies that they're
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're ridden behind some chip like the AMB's
on FBDIMM's, for example.

This forced drivers to fake a csrow struct, and to create
a mess under csrow/channel original's concept.

Move the DIMM labels into a per-DIMM struct, and add there
the real location of the socket, in terms of csrow/channel,
on csrow-based architectures, or on channel/dimm number,
on modern architectures.

On three drivers based on the modern architectures
(i5100_edac, sb_edac and i7core_edac), the labels were
filled inside the driver, as a way to avoid loosing the
channel/dimm number. Those drivers were converted to
properly fill the DIMM location properties internally.

All other drivers will use a per-csrow type of location.
Some of those drivers will require a latter conversion, as
they also fake the csrows internally.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

drivers/edac: rename channel_info to rank_info

What it is pointed by a csrow/channel vector is a rank information, and
not a channel information.

On a traditional architecture, the memory controller directly access the
memory ranks, via chip select rows. Different ranks at the same DIMM is
selected via different chip select rows. So, typically, one
csrow/channel pair means one different DIMM.

On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced
Memory Buffer (AMB) that serves as the interface between the memory
controller and the memory chips.

The AMB selection is via the DIMM slot, and not via a csrow.

It is up to the AMB to talk with the csrows of the DRAM chips.

So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
rank. RAMBUS is similar.

Newer memory controllers, like the ones found on Intel Sandy Bridge and
Nehalem, even working with normal DDR3 DIMM's, don't use the usual
channel A/channel B interleaving schema to provide 128 bits data access.

Instead, they have more channels (3 or 4 channels), and they can use
several interleaving schemas. Such memory controllers see the DIMMs
directly on their registers, instead of the ranks, which is better for
the driver, as its main usageis to point to a broken DIMM stick (the
Field Repleceable Unit), and not to point to a broken DRAM chip.

The drivers that support such such newer memory architecture models
currently need to fake information and to abuse on EDAC structures, as
the subsystem was conceived with the idea that the csrow would always be
visible by the CPU.

To make things a little worse, those drivers don't currently fake
csrows/channels on a consistent way, as the concepts there don't apply
to the memory controllers they're talking with. So, each driver author
interpreted the concepts using a different logic.

In order to fix it, let's rename the data structure that points into a
DIMM rank to "rank_info", in order to be clearer about what's stored
there.

Latter patches will provide a better way to represent the memory
hierarchy for the other types of memory controller.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: Better describe the memory concepts

The memory terms changed along the time, since when EDAC were originally
written: new concepts were introduced, and some things have different
meanings, depending on the memory architecture. Better define those
terms, and better describe each supported memory type.

No functional changes. Just comments were touched.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac/ppc4xx_edac: Fix compilation

It seems that nobody is cross-compiling for this arch anymore...

drivers/edac/ppc4xx_edac.c: In function 'ppc4xx_edac_probe':
drivers/edac/ppc4xx_edac.c:188:12: error: storage class specified for parameter 'ppc4xx_edac_remove'
...
drivers/edac/ppc4xx_edac.c:1068:19: error: 'match' undeclared (first use in this function)
drivers/edac/ppc4xx_edac.c:1068:19: note: each undeclared identifier is reported only once for each function it appears in
drivers/edac/ppc4xx_edac.c:1068:36: warning: left-hand operand of comma expression has no effect [-Wunused-value]

Acked-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

Linux 3.2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  fix CAN MAINTAINERS SCM tree type
  mwifiex: fix crash during simultaneous scan and connect
  b43: fix regression in PIO case
  ath9k: Fix kernel panic in AR2427 in AP mode
  CAN MAINTAINERS update
  net: fsl: fec: fix build for mx23-only kernel
  sch_qfq: fix overflow in qfq_update_start()
  Revert "Bluetooth: Increase HCI reset timeout in hci_dev_do_close"

minixfs: misplaced checks lead to dentry leak

bitmap size sanity checks should be done *before* allocating ->s_root;
there their cleanup on failure would be correct. As it is, we do iput()
on root inode, but leak the root dentry...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach

This is the temporary simple fix for 3.2, we need more changes in this
area.

1. do_signal_stop() assumes that the running untraced thread in the
   stopped thread group is not possible. This was our goal but it is
   not yet achieved: a stopped-but-resumed tracee can clone the running
   thread which can initiate another group-stop.

   Remove WARN_ON_ONCE(!current->ptrace).

2. A new thread always starts with ->jobctl = 0. If it is auto-attached
   and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
   but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
   in do_jobctl_trap() if another debugger attaches.

   Change __ptrace_unlink() to set the artificial SIGSTOP for report.

   Alternatively we could change ptrace_init_task() to copy signr from
   current, but this means we can copy it for no reason and hide the
   possible similar problems.

Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org> [3.1]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race

Test-case:

int main(void)
{
int pid, status;

pid = fork();
if (!pid) {
for (;;) {
if (!fork())
return 0;
if (waitpid(-1, &status, 0) < 0) {
printf("ERR!! wait: %m\n");
return 0;
}
}
}

assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
assert(waitpid(-1, NULL, 0) == pid);

assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
PTRACE_O_TRACEFORK) == 0);

do {
ptrace(PTRACE_CONT, pid, 0, 0);
pid = waitpid(-1, NULL, 0);
} while (pid > 0);

return 1;
}

It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().

The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.

This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.

I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.

Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com>
Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org> [3.0+]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.samba.org/sfrench/cifs-2.6

* git://git.samba.org/sfrench/cifs-2.6:
[CIFS] default ntlmv2 for cifs mount delayed to 3.3
cifs: fix bad buffer length check in coalesce_t2

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem

Revert "rtc: Expire alarms after the time is set."

This reverts commit 93b2ec0128c431148b216b8f7337c1a52131ef03.

The call to "schedule_work()" in rtc_initialize_alarm() happens too
early, and can cause oopses at bootup

Neil Brown explains why we do it:

  "If you set an alarm in the future, then shutdown and boot again after
   that time, then you will end up with a timer_queue node which is in
   the past.

   When this happens the queue gets stuck.  That entry-in-the-past won't
   get removed until and interrupt happens and an interrupt won't happen
   because the RTC only triggers an interrupt when the alarm is "now".

   So you'll find that e.g.  "hwclock" will always tell you that
   'select' timed out.

   So we force the interrupt work to happen at the start just in case."

and has a patch that convert it to do things in-process rather than with
the worker thread, but right now it's too late to play around with this,
so we just revert the patch that caused problems for now.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Requested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Requested-by: John Stultz <john.stultz@linaro.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[CIFS] default ntlmv2 for cifs mount delayed to 3.3

Turned out the ntlmv2 (default security authentication)
upgrade was harder to test than expected, and we ran
out of time to test against Apple and a few other servers
that we wanted to. Delay upgrade of default security
from ntlm to ntlmv2 (on mount) to 3.3. Still works
fine to specify it explicitly via "sec=ntlmv2" so this
should be fine.

Acked-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>

cifs: fix bad buffer length check in coalesce_t2

The current check looks to see if the RFC1002 length is larger than
CIFSMaxBufSize, and fails if it is. The buffer is actually larger than
that by MAX_CIFS_HDR_SIZE.

This bug has been around for a long time, but the fact that we used to
cap the clients MaxBufferSize at the same level as the server tended
to paper over it. Commit c974befa changed that however and caused this
bug to bite in more cases.

Reported-and-Tested-by: Konstantinos Skarlatos <k.skarlatos@gmail.com>
Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>

Revert "rtc: Disable the alarm in the hardware"

This reverts commit c0afabd3d553c521e003779c127143ffde55a16f.

It causes failures on Toshiba laptops - instead of disabling the alarm,
it actually seems to enable it on the affected laptops, resulting in
(for example) the laptop powering on automatically five minutes after
shutdown.

There's a patch for it that appears to work for at least some people,
but it's too late to play around with this, so revert for now and try
again in the next merge window.

See for example

http://bugs.debian.org/652869

Reported-and-bisected-by: Andreas Friedrich <afrie@gmx.net> (Toshiba Tecra)
Reported-by: Antonio-M. Corbi Bellot <antonio.corbi@ua.es> (Toshiba Portege R500)
Reported-by: Marco Santos <marco.santos@waynext.com> (Toshiba Portege Z830)
Reported-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr> (Toshiba Portege R830)
Cc: Jonathan Nieder <jrnieder@gmail.com>
Requested-by: John Stultz <john.stultz@linaro.org>
Cc: stable@kernel.org # for the versions that applied this
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hung_task: fix false positive during vfork

vfork parent uninterruptibly and unkillably waits for its child to
exec/exit. This wait is of unbounded length. Ignore such waits
in the hung_task detector.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
LKML-Reference: <1325344394.28904.43.camel@lappy>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

security: Fix security_old_inode_init_security() when CONFIG_SECURITY is not set

Commit 1e39f384bb01 ("evm: fix build problems") makes the stub version
of security_old_inode_init_security() return 0 when CONFIG_SECURITY is
not set.

But that makes callers such as reiserfs_security_init() assume that
security_old_inode_init_security() has set name, value, and len
arguments properly - but security_old_inode_init_security() left them
uninitialized which then results in interesting failures.

Revert security_old_inode_init_security() to the old behavior of
returning EOPNOTSUPP since both callers (reiserfs and ocfs2) handle this
just fine.

[ Also fixed the S_PRIVATE(inode) case of the actual non-stub
  security_old_inode_init_security() function to return EOPNOTSUPP
  for the same reason, as pointed out by Mimi Zohar.

  It got incorrectly changed to match the new function in commit
  fb88c2b6cbb1: "evm: fix security/security_old_init_security return
  code".   - Linus ]

Reported-by: Jorge Bastos <mysql.jorge@decimal.pt>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fix CAN MAINTAINERS SCM tree type

As pointed out by Joe Perches the SCM tree type was missing in my patch.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
CC: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
CC: Urs Thuermann <urs.thuermann@volkswagen.de>
CC: Wolfgang Grandegger <wg@grandegger.com>
CC: Marc Kleine-Budde <mkl@pengutronix.de>
CC: linux-can@vger.kernel.org

mwifiex: fix crash during simultaneous scan and connect

If 'iw connect' command is fired when driver is already busy in
serving 'iw scan' command, ssid specific scan operation for connect
is skipped. In this case cmd wait queue handler gets called with no
command in queue (i.e. adapter->cmd_queued = NULL).

This patch adds a NULL check in mwifiex_wait_queue_complete()
routine to fix crash observed during simultaneous scan and assoc
operations.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

b43: fix regression in PIO case

This patch fixes the regression, introduced by

commit 17030f48e31adde5b043741c91ba143f5f7db0fd
From: Rafał Miłecki <zajec5@gmail.com>
Date: Thu, 11 Aug 2011 17:16:27 +0200
Subject: [PATCH] b43: support new RX header, noticed to be used in 598.314+ fw

in PIO case.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ath9k: Fix kernel panic in AR2427 in AP mode

don't do aggregation related stuff for 'AP mode client power save
handling' if aggregation is not enabled in the driver, otherwise it
will lead to panic because those data structures won't be never
intialized in 'ath_tx_node_init' if aggregation is disabled

EIP is at ath_tx_aggr_wakeup+0x37/0x80 [ath9k]
EAX: e8c09a20 EBX: f2a304e8 ECX: 00000001 EDX: 00000000
ESI: e8c085e0 EDI: f2a304ac EBP: f40e1ca4 ESP: f40e1c8c
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper/1 (pid: 0, ti=f40e0000 task=f408e860
task.ti=f40dc000)
Stack:
0001e966 e8c09a20 00000000 f2a304ac e8c085e0 f2a304ac
f40e1cb0 f8186741
f8186700 f40e1d2c f922988d f2a304ac 00000202 00000001
c0b4ba43 00000000
0000000f e8eb75c0 e8c085e0 205b0001 34383220 f2a304ac
f2a30000 00010020
Call Trace:
[<f8186741>] ath9k_sta_notify+0x41/0x50 [ath9k]
[<f8186700>] ? ath9k_get_survey+0x110/0x110 [ath9k]
[<f922988d>] ieee80211_sta_ps_deliver_wakeup+0x9d/0x350
[mac80211]
[<c018dc75>] ? __module_address+0x95/0xb0
[<f92465b3>] ap_sta_ps_end+0x63/0xa0 [mac80211]
[<f9246746>] ieee80211_rx_h_sta_process+0x156/0x2b0
[mac80211]
[<f9247d1e>] ieee80211_rx_handlers+0xce/0x510 [mac80211]
[<c018440b>] ? trace_hardirqs_on+0xb/0x10
[<c056936e>] ? skb_queue_tail+0x3e/0x50
[<f9248271>] ieee80211_prepare_and_rx_handle+0x111/0x750
[mac80211]
[<f9248bf9>] ieee80211_rx+0x349/0xb20 [mac80211]
[<f9248949>] ? ieee80211_rx+0x99/0xb20 [mac80211]
[<f818b0b8>] ath_rx_tasklet+0x818/0x1d00 [ath9k]
[<f8187a75>] ? ath9k_tasklet+0x35/0x1c0 [ath9k]
[<f8187a75>] ? ath9k_tasklet+0x35/0x1c0 [ath9k]
[<f8187b33>] ath9k_tasklet+0xf3/0x1c0 [ath9k]
[<c0151b7e>] tasklet_action+0xbe/0x180

Cc: stable@kernel.org
Cc: Senthil Balasubramanian <senthilb@qca.qualcomm.com>
Cc: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Reported-by: Ashwin Mendonca <ashwinloyal@gmail.com>
Tested-by: Ashwin Mendonca <ashwinloyal@gmail.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth

CAN MAINTAINERS update

Update the CAN MAINTAINERS section:

- point out active maintainers
- pull the CAN driver discussion away from netdev ML
- point to the new CAN web site on gitorious.org
- add CAN development git repository URL to submit patches

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
CC: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
CC: Urs Thuermann <urs.thuermann@volkswagen.de>
CC: Wolfgang Grandegger <wg@grandegger.com>
CC: Marc Kleine-Budde <mkl@pengutronix.de>
CC: linux-can@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fsl: fec: fix build for mx23-only kernel

If one only selects mx23-based boards, compile fails:

drivers/net/ethernet/freescale/fec.c:410:2: error: 'FEC_HASH_TABLE_HIGH' undeclared (first use in this function)
drivers/net/ethernet/freescale/fec.c:411:2: error: 'FEC_HASH_TABLE_LOW' undeclared (first use in this function)

This is because fec.h uses CONFIG_SOC_IMX28 to determine the register
layout of the core which makes sense since the MX23 does not have a fec.
However, Kconfig uses the broader ARCH_MXS symbol and this way even
makes the fec-driver default for MX23. Adapt Kconfig to use the more
precise SOC_IMX28 as well.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_qfq: fix overflow in qfq_update_start()

grp->slot_shift is between 22 and 41, so using 32bit wide variables is
probably a typo.

This could explain QFQ hangs Dave reported to me, after 2^23 packets ?

(23 = 64 - 41)

Reported-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drm/radeon/kms/atom: fix possible segfault in pm setup

If we end up with no power states, don't look up
current vddc.

fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=44130

agd5f: fix patch formatting

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6

* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
dt/device: Fix auxdata matching to handle entries without a name override

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netfilter: ctnetlink: fix timeout calculation
  ipvs: try also real server with port 0 in backup server
  skge: restore rx multicast filter on resume and after config changes
  mlx4_en: nullify cq->vector field when closing completion queue

Merge branch 'fix/asoc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

* 'fix/asoc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: wm8776: add missing break in sample size switch

gspca: Fix falling back to lower isoc alt settings

The current gspca core code has a regression where it no longer properly
falls back to lower alt settings when there is not enough bandwidth.

This causes many iso based usb-1 cameras to not work when plugged into a
usb2 hub or a sandybridge chipset motherboard!

This patch fixes this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

futex: Fix uninterruptible loop due to gate_area

It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.

While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?

In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.

But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

netfilter: ctnetlink: fix timeout calculation

The sanity check (timeout < 0) never works; the dividend is unsigned
and so is the division, which should have been a signed division.

long timeout = (ct->timeout.expires - jiffies) / HZ;
if (timeout < 0)
timeout = 0;

This patch converts the time values to signed for the division.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: try also real server with port 0 in backup server

We should not forget to try for real server with port 0
in the backup server when processing the sync message. We should
do it in all cases because the backup server can use different
forwarding method.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

skge: restore rx multicast filter on resume and after config changes

Restore skge hardware registers for multicast filtering to their
appropriate values after system resume and after hardware restarts
that are done when changing certain settings.

Signed-off-by: Florian Zumbiehl <florz@florz.de>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: nullify cq->vector field when closing completion queue

Caused loss of connectivity when changing ring size.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm

* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
  ARM: 7237/1: PL330: Fix driver freeze
  ARM: 7197/1: errata: Remove SMP dependency for erratum 751472
  ARM: 7196/1: errata: Remove SMP dependency for erratum 720789
  ARM: 7220/1: mmc: mmci: Fixup error handling for dma
  ARM: 7214/1: mmc: mmci: Fixup handling of MCI_STARTBITERR

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: plat-orion: make gpiochip label unique
  enable uncompress log on cpuimx35sd
  cpuimx35: fix touchscreen support
  cpuimx35sd: fix Kconfig
  clock-imx35: fix reboot in internal boot mode
  dma: MX3_IPU fix depends
  imx_v4_v5_defconfig: update default configuration
  cpuimx25sd: fix Kconfig
  arm/imx: fix cpufreq section mismatch
  ARM:imx:fix pwm period value
  ARM: OMAP: hwmod data: fix iva and mailbox hwmods for OMAP 3

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: sentelic - fix retrieving number of buttons
Input: sentelic - release mutex upon register write failure

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: disable use of dcache for readdir etc.

Merge branch 'v3.2-samsung-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung

* 'v3.2-samsung-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: EXYNOS: Remove duplicated SROMC static memory mapping
ARM: SAMSUNG: Fix build error when selecting CPU_FREQ_S3C24XX_DEBUGFS on S3C2440

Revert "clockevents: Set noop handler in clockevents_exchange_device()"

This reverts commit de28f25e8244c7353abed8de0c7792f5f883588c.

It results in resume problems for various people. See for example

  http://thread.gmane.org/gmane.linux.kernel/1233033
  http://thread.gmane.org/gmane.linux.kernel/1233389
  http://thread.gmane.org/gmane.linux.kernel/1233159
  http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877

and the fedora and ubuntu bug reports

  https://bugzilla.redhat.com/show_bug.cgi?id=767248
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569

which got bisected down to the stable version of this commit.

Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Reported-by: Phil Miller <mille121@illinois.edu>
Reported-by: Philip Langdale <philipl@overt.org>
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <gregkh@suse.de>
Cc: stable@kernel.org # for stable kernels that applied the original
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://www.linux-watchdog.org/linux-watchdog

* git://www.linux-watchdog.org/linux-watchdog:
  watchdog: iTCO_wdt.c - problems with newer hardware due to SMI clearing (part 2)
  watchdog: hpwdt: Changes to handle NX secure bit in 32bit path
  watchdog: sp805: Fix section mismatch in ID table.
  watchdog: move coh901327 state holders

Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

* 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Initialize domain->handler in iommu_domain_alloc()

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  packet: fix possible dev refcnt leak when bind fail
  netem: dont call vfree() under spinlock and BH disabled
  netfilter: ctnetlink: fix scheduling while atomic if helper is autoloaded
  netfilter: ctnetlink: fix return value of ctnetlink_get_expect()

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix raw_spin_unlock_irqrestore() usage
oprofile, arm/sh: Fix oprofile_arch_exit() linkage issue

Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: log all dirty inodes in xfs_fs_sync_fs
xfs: log the inode in ->write_inode calls for kupdate

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix blk_queue_end_tag()
  block: re-use existing 'reading' variable instead of checking direction again
  block, cfq: fix empty queue crash caused by request merge

mm: hugetlb: fix non-atomic enqueue of huge page

If a huge page is enqueued under the protection of hugetlb_lock, then the
operation is atomic and safe.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

procfs: do not confuse jiffies with cputime64_t

Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
for nohz") did not take into account that one some architectures jiffies
and cputime use different units.

This causes get_idle_time() to return numbers in the wrong units, making
the idle time fields in /proc/stat wrong.

Instead of converting the usec value returned by
get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Artem S. Tashkinov" <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/mempolicy.c: refix mbind_range() vma issue

commit 8aacc9f550 ("mm/mempolicy.c: fix pgoff in mbind vma merge") is the
slightly incorrect fix.

Why? Think following case.

1. map 4 pages of a file at offset 0

   [0123]

2. map 2 pages just after the first mapping of the same file but with
   page offset 2

   [0123][23]

3. mbind() 2 pages from the first mapping at offset 2.
   mbind_range() should treat new vma is,

   [0123][23]
     |23|
     mbind vma

   but it does

   [0123][23]
     |01|
     mbind vma

   Oops. then, it makes wrong vma merge and splitting ([01][0123] or similar).

This patch fixes it.

[testcase]
  test result - before the patch

case4: 126: test failed. expect '2,4', actual '2,2,2'
        case5: passed
case6: passed
case7: passed
case8: passed
case_n: 246: test failed. expect '4,2', actual '1,4'

------------[ cut here ]------------
kernel BUG at mm/filemap.c:135!
invalid opcode: 0000 [#4] SMP DEBUG_PAGEALLOC

(snip long bug on messages)

  test result - after the patch

case4: passed
        case5: passed
case6: passed
case7: passed
case8: passed
case_n: passed

  source:  mbind_vma_test.c
============================================================
#include <numaif.h>
#include <numa.h>
#include <sys/mman.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

static unsigned long pagesize;
void* mmap_addr;
struct bitmask *nmask;
char buf[1024];
FILE *file;
char retbuf[10240] = "";
int mapped_fd;

char *rubysrc = "ruby -e '\
  pid = %d; \
  vstart = 0x%llx; \
  vend = 0x%llx; \
  s = `pmap -q #{pid}`; \
  rary = []; \
  s.each_line {|line|; \
    ary=line.split(\" \"); \
    addr = ary[0].to_i(16); \
    if(vstart <= addr && addr < vend) then \
      rary.push(ary[1].to_i()/4); \
    end; \
  }; \
  print rary.join(\",\"); \
'";

void init(void)
{
void* addr;
char buf[128];

nmask = numa_allocate_nodemask();
numa_bitmask_setbit(nmask, 0);

pagesize = getpagesize();

sprintf(buf, "%s", "mbind_vma_XXXXXX");
mapped_fd = mkstemp(buf);
if (mapped_fd == -1)
perror("mkstemp "), exit(1);
unlink(buf);

if (lseek(mapped_fd, pagesize*8, SEEK_SET) < 0)
perror("lseek "), exit(1);
if (write(mapped_fd, "\0", 1) < 0)
perror("write "), exit(1);

addr = mmap(NULL, pagesize*8, PROT_NONE,
    MAP_SHARED, mapped_fd, 0);
if (addr == MAP_FAILED)
perror("mmap "), exit(1);

if (mprotect(addr+pagesize, pagesize*6, PROT_READ|PROT_WRITE) < 0)
perror("mprotect "), exit(1);

mmap_addr = addr + pagesize;

/* make page populate */
memset(mmap_addr, 0, pagesize*6);
}

void fin(void)
{
void* addr = mmap_addr - pagesize;
munmap(addr, pagesize*8);

memset(buf, 0, sizeof(buf));
memset(retbuf, 0, sizeof(retbuf));
}

void mem_bind(int index, int len)
{
int err;

err = mbind(mmap_addr+pagesize*index, pagesize*len,
    MPOL_BIND, nmask->maskp, nmask->size, 0);
if (err)
perror("mbind "), exit(err);
}

void mem_interleave(int index, int len)
{
int err;

err = mbind(mmap_addr+pagesize*index, pagesize*len,
    MPOL_INTERLEAVE, nmask->maskp, nmask->size, 0);
if (err)
perror("mbind "), exit(err);
}

void mem_unbind(int index, int len)
{
int err;

err = mbind(mmap_addr+pagesize*index, pagesize*len,
    MPOL_DEFAULT, NULL, 0, 0);
if (err)
perror("mbind "), exit(err);
}

void Assert(char *expected, char *value, char *name, int line)
{
if (strcmp(expected, value) == 0) {
fprintf(stderr, "%s: passed\n", name);
return;
}
else {
fprintf(stderr, "%s: %d: test failed. expect '%s', actual '%s'\n",
name, line,
expected, value);
// exit(1);
}
}

/*
      AAAA
    PPPPPPNNNNNN
    might become
    PPNNNNNNNNNN
    case 4 below
*/
void case4(void)
{
init();
sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

mem_bind(0, 4);
mem_unbind(2, 2);

file = popen(buf, "r");
fread(retbuf, sizeof(retbuf), 1, file);
Assert("2,4", retbuf, "case4", __LINE__);

fin();
}

/*
       AAAA
PPPPPPNNNNNN
might become
PPPPPPPPPPNN
case 5 below
*/
void case5(void)
{
init();
sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

mem_bind(0, 2);
mem_bind(2, 2);

file = popen(buf, "r");
fread(retbuf, sizeof(retbuf), 1, file);
Assert("4,2", retbuf, "case5", __LINE__);

fin();
}

/*
    AAAA
PPPPNNNNXXXX
might become
PPPPPPPPPPPP 6
*/
void case6(void)
{
init();
sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

mem_bind(0, 2);
mem_bind(4, 2);
mem_bind(2, 2);

file = popen(buf, "r");
fread(retbuf, sizeof(retbuf), 1, file);
Assert("6", retbuf, "case6", __LINE__);

fin();
}

/*
    AAAA
PPPPNNNNXXXX
might become
PPPPPPPPXXXX 7
*/
void case7(void)
{
init();
sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

mem_bind(0, 2);
mem_interleave(4, 2);
mem_bind(2, 2);

file = popen(buf, "r");
fread(retbuf, sizeof(retbuf), 1, file);
Assert("4,2", retbuf, "case7", __LINE__);

fin();
}

/*
    AAAA
PPPPNNNNXXXX
might become
PPPPNNNNNNNN 8
*/
void case8(void)
{
init();
sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

mem_bind(0, 2);
mem_interleave(4, 2);
mem_interleave(2, 2);

file = popen(buf, "r");
fread(retbuf, sizeof(retbuf), 1, file);
Assert("2,4", retbuf, "case8", __LINE__);

fin();
}

void case_n(void)
{
init();
sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

/* make redundunt mappings [0][1234][34][7] */
mmap(mmap_addr + pagesize*4, pagesize*2, PROT_READ|PROT_WRITE,
     MAP_FIXED|MAP_SHARED, mapped_fd, pagesize*3);

/* Expect to do nothing. */
mem_unbind(2, 2);

file = popen(buf, "r");
fread(retbuf, sizeof(retbuf), 1, file);
Assert("4,2", retbuf, "case_n", __LINE__);

fin();
}

int main(int argc, char** argv)
{
case4();
case5();
case6();
case7();
case8();
case_n();

return 0;
}
=============================================================

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Caspar Zhang <caspar@casparzhang.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <stable@vger.kernel.org> [3.1.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gspca: Fix bulk mode cameras no longer working (regression fix)

The new iso bandwidth calculation code accidentally has broken support
for bulk mode cameras. This has broken the following drivers:
finepix, jeilinj, ovfx2, ov534, ov534_9, se401, sq905, sq905c, sq930x,
stv0680, vicam.

Thix patch fixes this. Fix tested with: se401, sq905, sq905c, stv0680 & vicam
cams.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Input: sentelic - fix retrieving number of buttons

Fixing wrong register offset which is used to retrieve the number of buttons
attached to the hardware.

Signed-off-by: Tai-hwa Liang <avatar@sentelic.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

ceph: disable use of dcache for readdir etc.

Ceph attempts to use the dcache to satisfy negative lookups and readdir
when the entire directory contents are in cache. Disable this behavior
until lingering bugs in this code are shaken out; we'll re-enable these
hooks once things are fully stable.

Signed-off-by: Sage Weil <sage@newdream.net>

block: fix blk_queue_end_tag()

Commit 5e081591 "block: warn if tag is greater than real_max_depth"
cleaned up blk_queue_end_tag() to warn when the tag is truly invalid
(greater than real_max_depth).  However, it changed behavior in the tag <
max_depth case to not end the request.  Leading to triggering of
BUG_ON(blk_queued_rq(rq)) in the request completion path:

  http://marc.info/?l=linux-kernel&m=132204370518629&w=2

In order to allow blk_queue_resize_tags() to shrink the tag space
blk_queue_end_tag() must always complete tags with a value less than
real_max_depth regardless of the current max_depth.  The comment about
"handling the shrink case" seems to be what prompted changes in this
space, so remove it and BUG on all invalid tags (made even simpler by
Matthew's suggestion to use an unsigned compare).

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Tao Ma <boyu.mt@taobao.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Reported-by: Meelis Roos <mroos@ut.ee>
Reported-by: Ed Nadolski <edmund.nadolski@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ARM: EXYNOS: Remove duplicated SROMC static memory mapping

SROMC static memory mapping is included in the common s5p initialization
code. Hence, remove the duplicated SROMC static memory mapping for EXYNOS.

Signed-off-by: Thomas Abraham <thomas.abraham@linaro.org>
Cc: stable@kernel.org
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

ARM: SAMSUNG: Fix build error when selecting CPU_FREQ_S3C24XX_DEBUGFS on S3C2440

Following is happened when CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
is selected without building of s3c2410-iotiming.c file:

arch/arm/mach-s3c2440/built-in.o:(.data+0x38c): undefined reference to `s3c2410_iotiming_debugfs

Basically, the CONFIG_S3C2410_IOTIMING is not selected for
MACH_MINI2440. Because the s3c2410-iotiming.c is not ever
compiled and enabling CONFIG_CPU_FREQ_S3C24XX_DEBUGFS option
caused undefined reference to s3c2410_iotiming_debugfs()
defined in that file. The s3c2410_iotiming_debugfs defined
as NULL for this case.

Signed-off-by: Denis Kuzmenko <linux@solonet.org.ua>
Cc: stable@kernel.org
[kgene.kim@samsung.com: removed useless changes]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

packet: fix possible dev refcnt leak when bind fail

If bind is fail when bind is called after set PACKET_FANOUT
sock option, the dev refcnt will leak.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

watchdog: iTCO_wdt.c - problems with newer hardware due to SMI clearing (part 2)

Redhat Bugzilla: Bug 727875 - TCO_EN bit is disabled by TCO driver

The previous patch breaks reset watchdog behaviour on the older hardware.
It is therefor better to make sure that the behaviour for older hardware (<=ICH5 or
6300ESB) is preserved and that the behaviour for newer hardware is changed.
We therefor use the iTCO_version to see if we need the clearing of the SMI_TCO_EN
bit in the SMI_EN register.

So the new behaviour becomes:
turn_SMI_watchdog_clear_off=0 -> Do not turn off SMI clearing watchdog.
turn_SMI_watchdog_clear_off=1 -> Turn off SMI clearing watchdog when iTCO_version=1
(ICHO till ICH5 + 6300ESB only)
turn_SMI_watchdog_clear_off=2 -> Turn off SMI clearing watchdog.

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

drm/i915: Disable RC6 on Sandybridge by default

RC6 fails again.

> I found my system freeze mostly during starting up X and KDE. Sometimes it
> works for some minutes, sometimes it freezes immediatly. When the freeze
> happens, everything is dead (even the reset button does not work, I need to
> power cycle).

> I disabled RC6, and my system runs wonderfully.

> The system is a Z68 Pro board with Sandybridge i5-2500K processor, 8
> GB of RAM and UEFI firmware.

Reported-by: Kai Krakow <hurikhan77@gmail.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drm/i915: Disable semaphores by default on SNB

Semaphores still cause problems on some machines:

> From Udo Steinberg:
>
> With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of
> text scroll in an xterm, such as when extracting a tar archive. Such as this
> one (note the timestamps):
>
>  I can reproduce it fairly easily with something
>  as simple as:
>
>   while true; do dmesg; done

This patch turns them off on SNB while leaving them on for IVB.

Reported-by: Udo Steinberg <udo@hypervisor.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Eugeni Dodonov <eugeni@dodonov.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: e500: include linux/export.h
  KVM: PPC: fix kvmppc_start_thread() for CONFIG_SMP=N
  KVM: PPC: protect use of kvmppc_h_pr
  KVM: PPC: move compute_tlbie_rb to book3s_64 common header
  KVM: Don't automatically expose the TSC deadline timer in cpuid
  KVM: Device assignment permission checks
  KVM: Remove ability to assign a device without iommu support
  KVM: x86: Prevent starting PIT timers in the absence of irqchip support