events/hw_event: Create a Hardware Events Report Mecanism (HERM)
Adds a trace class for handle hardware events
Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.
[1] http://lwn.net/Articles/416669/
We have several subsystems & methods for reporting hardware errors:
1) EDAC ("Error Detection and Correction"). In its original form
this consisted of a platform specific driver that read topology
information and error counts from chipset registers and reported
the results via a sysfs interface.
2) mcelog - x86 specific decoding of machine check bank registers
reporting in binary form via /dev/mcelog. Recent additions make use
of the APEI extensions that were documented in version 4.0a of the
ACPI specification to acquire more information about errors without
having to rely reading chipset registers directly. A user level
programs decodes into somewhat human readable format.
3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
decodes errors reported via machine check bank registers in AMD
processors to the console log using printk();
Each of these mechanisms has a band of followers ... and none
of them appear to meet all the needs of all users.
In order to provide a proper hardware event subsystem, let's
encapsulate the memory error hardware events into a trace facility.
As no agreement was reached so far for the MCA-based trace events, for
now, let's add events only for memory errors. A latter patch can change
the tracepoint, for events originated via MCA.
edac: Call the minimum grain node as "rank" if chip select is used
The minimum hierarchy unit for csrows-based MC's is rank, and not dimm.
There are a few possible fixes here:
1) add a per-rank location at the dimm struct, and don't create one
dimm_info entry per rank;
2) Convert the drivers to internally convert ranks into dimm's;
3) rename "dimm" with "rank";
A few drivers do (2) internally: some of them have per-dimm registers,
instead of per-rank ones; a few others have some routines to translate
between the two representations of the memory. Those drivers report the
memories via a DIMM slot and a channel.
While no better solution is done, let's do (2) for the drivers that work
with csrows/channel layers to represent a rank.
This driver is different from the other drivers: it is not for a
FB-DIMM based device, but it does the right thing of exporting
the memories as dimm's and not as ranks.
Due to that, the mapping should be different than the other drivers.
After this change, the memories are properly reported:
i5000_edac: Fix the logic that retrieves memory information
The logic there is broken: it basically creates two csrows for
each DIMM and assumes that all DIMM's are dual rank. Only one of
the csrows will contain the entire DIMM size. If single rank
memories are found, they'll be marked with 0 bytes.
The check if the AMB is present were also wrong.
Yet, as the error reports don't use the memory size in order to
credit an error to the right DIMM, that part of the driver seems
to work. That's why probably nobody detected the issue yet.
After this patch, the memory layout is now properly reported,
when debug mode is enabled, and the number of ranks per dimm is
now shown:
edac: be sure to use the GET_POS macro to get memset_info struct
Most drivers use the csrows/channel way to get the struct memset_info
associated with a dimm/rank. FB-DIMM's and new Intel arch drivers
use different memory layers. Be sure to get the right DIMM on
those drivers. So, use a macro to calculate the offset of the dimm inside
its structure.
edac: fill the location with something useful if the DIMM is not found
If, for any reason, the driver passes a location for a not-filled
memory, uses "unknown memory" for the label. This is to prevent a
message like the one below:
[30418.916116] Generating a fake error to 0.-1.1 to test core handling. NOTE: this won't test the driver-specific decoding logic.
[30418.916123] EDAC MC0: UE FAKE ERROR on (branch 0 slot 1 page 0x0 offset 0x0 grain 0 for EDAC testing only)
(in this case, there's no memories at slot 1 - so it is simulating
a driver decoding failure)
When i5400_edac driver is removed and re-loaded a few times, it causes
an OOPS, as it is currently decrementing some PCI device usage two
times.
When called inside a loop, pci_get_device() will call
pci_put_device(). That mangles the error count. In this specific
case, it seems easier to just duplicate the call.
Also fixes the error logic when pci_get_device fails.
Only show labels for installed memories, and let it clear that,
when more than one label is pointed, the error is not on all
memories (it is an "or" logical operation, and not "and").
edac: Add a sysfs node to test the EDAC error report facility
Not all hardware supports error injection. Also, this feature
could be disabled by the BIOS. As we need to test the EDAC
error report facilities and the edac utils logic, add a way
to generate fake errors, when EDAC debug is enabled.
edac: Cleanup the logs for i7core and sb edac drivers
Remove some information that it is duplicated at the MCE log,
and don't have much usage for the error. Those data will be
added again, when creating a trace function that outputs both
memory errors and MCE fields.
The old way of allocating data were confusing. It were
also introducing a miss-concept with regards to "channel",
when csrows are used.
Instead, use a more generic approach: breaks the memory
controller into layers. Drivers are free to describe the
layers any way they want, in order to match the memory
architecture.
edac-mc: Allow reporting errors on a non-csrow oriented way
The edac core were written with the idea that memory controllers
are able to directly access csrows, and that the channels are
used inside a csrows select.
This is not true for FB-DIMM and RAMBUS memory controllers.
Also, some advanced memory controllers don't present a per-csrows
view.
So, change the allocation and error report routines to allow
them to work with all types of architectures.
This allowed to remove several hacks on FB-DIMM and RAMBUS
memory controllers.
Compile-tested all platforms (x86_64, i386, tile and several ppc subarchs).
Further per-driver tests will happen, in order to fix possible bugs
caused by this changeset.
TODO: a multi-rank DIMM's are currently represented by multiple DIMM
entries at struct dimm_info. That means that changing a label for one
rank won't change the same label for the other ranks at the same dimm.
Such bug is there since the beginning of the EDAC, so it is not a big
deal. However, on several drivers, it is possible to fix this issue, but
it should be a per-driver fix, as the csrow => DIMM arrangement may not
be equal for all. So, don't try to fix it here yet.
Cleans up the DIMM location information:
- Remove it from the csrow structure;
- make the location sysfs node code more flexible;
- cleans up the dimm code inside the drivers that fill the
dimm location properties.
edac: Don't initialize csrow's first_page & friends when not needed
Almost all edac drivers initialize first_page, last_page and
page_mask. Those vars are used inside the EDAC core, in order to
calculate the csrow affected by an error, by using the routine
edac_mc_find_csrow_by_page().
However, very few drivers actually use it:
e752x_edac.c
e7xxx_edac.c
i3000_edac.c
i82443bxgx_edac.c
i82860_edac.c
i82875p_edac.c
i82975x_edac.c
r82600_edac.c
There also a few other drivers that have their own calculus
formula internally using those vars.
All the others are just wasting time by initializing those
data.
While initializing data without using them won't cause any troubles, as
those information is stored at the wrong place (at csrows structure), it
is better to remove what is unused, in order to simplify the next patch.
On systems based on chip select rows, all channels need to use memories
with the same properties, otherwise the memories on channels A and B
won't be recognized. However, this is not true for all types of memory
controllers. Controllers for FB-DIMM's don't have such requirements.
Also, modern Intel controllers also seem to be capable of handling such
differences.
So, we need to get rid of storing the DIMM information into a per-csrow
data.
The first step is to move grain, mtype, dtype and edac_mode to the
per-dimm struct.
i7300_edac: Convert it to report memory with the new location
On this driver, the memory controller supports only FB-DIMMs.
The memory controller hierarchy here has 3 layers bellow the
memory controller:
- two branches;
- each branch has two channels;
- each channel can select up to 8 DIMM's via the
FB-DIMM AMB (Advanced Memory Buffer) chip.
As EDAC currently limits memory controllers to 2 hierarchy
levels, on this patch, both branches and channels are grouped
together.
i5400_edac: Convert it to report memory with the new location
On this driver, the memory controller supports only FB-DIMMs.
The memory controller hierarchy here has 3 layers bellow the
memory controller:
- two branches;
- each branch has two channels;
- each channel can select up to 4 DIMM's via the
FB-DIMM AMB (Advanced Memory Buffer) chip.
As EDAC currently limits memory controllers to 2 hierarchy
levels, on this patch, both branches and channels are grouped
together.
edac: Prepare to push down to drivers the filling of the memset_info
Several data that it is currently stored per csrow will be
pushed into the dimm themselves. Due to that, the mci->dimms
filling will need to happen inside the drivers.
Prepare for that, by initializing the dimm fields during mci
alloc. With this change, the changes at the drivers will be
smaller, as they won't need to touch at the fields they don't
currently initialize.
Instead of just exporting a per-csrow dimm directories, add
a pure per-dimm attributes. This will help to better map
the DIMM properties, when csrow info is not available.
edac: Create a dimm struct and move the labels into it
The way a DIMM is currently represented implies that they're
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're ridden behind some chip like the AMB's
on FBDIMM's, for example.
This forced drivers to fake a csrow struct, and to create
a mess under csrow/channel original's concept.
Move the DIMM labels into a per-DIMM struct, and add there
the real location of the socket, in terms of csrow/channel,
on csrow-based architectures, or on channel/dimm number,
on modern architectures.
On three drivers based on the modern architectures
(i5100_edac, sb_edac and i7core_edac), the labels were
filled inside the driver, as a way to avoid loosing the
channel/dimm number. Those drivers were converted to
properly fill the DIMM location properties internally.
All other drivers will use a per-csrow type of location.
Some of those drivers will require a latter conversion, as
they also fake the csrows internally.
What it is pointed by a csrow/channel vector is a rank information, and
not a channel information.
On a traditional architecture, the memory controller directly access the
memory ranks, via chip select rows. Different ranks at the same DIMM is
selected via different chip select rows. So, typically, one
csrow/channel pair means one different DIMM.
On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced
Memory Buffer (AMB) that serves as the interface between the memory
controller and the memory chips.
The AMB selection is via the DIMM slot, and not via a csrow.
It is up to the AMB to talk with the csrows of the DRAM chips.
So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
rank. RAMBUS is similar.
Newer memory controllers, like the ones found on Intel Sandy Bridge and
Nehalem, even working with normal DDR3 DIMM's, don't use the usual
channel A/channel B interleaving schema to provide 128 bits data access.
Instead, they have more channels (3 or 4 channels), and they can use
several interleaving schemas. Such memory controllers see the DIMMs
directly on their registers, instead of the ranks, which is better for
the driver, as its main usageis to point to a broken DIMM stick (the
Field Repleceable Unit), and not to point to a broken DRAM chip.
The drivers that support such such newer memory architecture models
currently need to fake information and to abuse on EDAC structures, as
the subsystem was conceived with the idea that the csrow would always be
visible by the CPU.
To make things a little worse, those drivers don't currently fake
csrows/channels on a consistent way, as the concepts there don't apply
to the memory controllers they're talking with. So, each driver author
interpreted the concepts using a different logic.
In order to fix it, let's rename the data structure that points into a
DIMM rank to "rank_info", in order to be clearer about what's stored
there.
Latter patches will provide a better way to represent the memory
hierarchy for the other types of memory controller.
The memory terms changed along the time, since when EDAC were originally
written: new concepts were introduced, and some things have different
meanings, depending on the memory architecture. Better define those
terms, and better describe each supported memory type.
No functional changes. Just comments were touched.
It seems that nobody is cross-compiling for this arch anymore...
drivers/edac/ppc4xx_edac.c: In function 'ppc4xx_edac_probe':
drivers/edac/ppc4xx_edac.c:188:12: error: storage class specified for parameter 'ppc4xx_edac_remove'
...
drivers/edac/ppc4xx_edac.c:1068:19: error: 'match' undeclared (first use in this function)
drivers/edac/ppc4xx_edac.c:1068:19: note: each undeclared identifier is reported only once for each function it appears in
drivers/edac/ppc4xx_edac.c:1068:36: warning: left-hand operand of comma expression has no effect [-Wunused-value]
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
fix CAN MAINTAINERS SCM tree type
mwifiex: fix crash during simultaneous scan and connect
b43: fix regression in PIO case
ath9k: Fix kernel panic in AR2427 in AP mode
CAN MAINTAINERS update
net: fsl: fec: fix build for mx23-only kernel
sch_qfq: fix overflow in qfq_update_start()
Revert "Bluetooth: Increase HCI reset timeout in hci_dev_do_close"
Al Viro [Wed, 4 Jan 2012 10:51:03 +0000 (10:51 +0000)]
minixfs: misplaced checks lead to dentry leak
bitmap size sanity checks should be done *before* allocating ->s_root;
there their cleanup on failure would be correct. As it is, we do iput()
on root inode, but leak the root dentry...
Oleg Nesterov [Wed, 4 Jan 2012 16:29:20 +0000 (17:29 +0100)]
ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach
This is the temporary simple fix for 3.2, we need more changes in this
area.
1. do_signal_stop() assumes that the running untraced thread in the
stopped thread group is not possible. This was our goal but it is
not yet achieved: a stopped-but-resumed tracee can clone the running
thread which can initiate another group-stop.
Remove WARN_ON_ONCE(!current->ptrace).
2. A new thread always starts with ->jobctl = 0. If it is auto-attached
and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
in do_jobctl_trap() if another debugger attaches.
Change __ptrace_unlink() to set the artificial SIGSTOP for report.
Alternatively we could change ptrace_init_task() to copy signr from
current, but this means we can copy it for no reason and hide the
possible similar problems.
do {
ptrace(PTRACE_CONT, pid, 0, 0);
pid = waitpid(-1, NULL, 0);
} while (pid > 0);
return 1;
}
It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().
The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.
This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.
I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.
The call to "schedule_work()" in rtc_initialize_alarm() happens too
early, and can cause oopses at bootup
Neil Brown explains why we do it:
"If you set an alarm in the future, then shutdown and boot again after
that time, then you will end up with a timer_queue node which is in
the past.
When this happens the queue gets stuck. That entry-in-the-past won't
get removed until and interrupt happens and an interrupt won't happen
because the RTC only triggers an interrupt when the alarm is "now".
So you'll find that e.g. "hwclock" will always tell you that
'select' timed out.
So we force the interrupt work to happen at the start just in case."
and has a patch that convert it to do things in-process rather than with
the worker thread, but right now it's too late to play around with this,
so we just revert the patch that caused problems for now.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Requested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Requested-by: John Stultz <john.stultz@linaro.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steve French [Wed, 4 Jan 2012 05:08:24 +0000 (23:08 -0600)]
[CIFS] default ntlmv2 for cifs mount delayed to 3.3
Turned out the ntlmv2 (default security authentication)
upgrade was harder to test than expected, and we ran
out of time to test against Apple and a few other servers
that we wanted to. Delay upgrade of default security
from ntlm to ntlmv2 (on mount) to 3.3. Still works
fine to specify it explicitly via "sec=ntlmv2" so this
should be fine.
Acked-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>
Jeff Layton [Sun, 1 Jan 2012 15:34:39 +0000 (10:34 -0500)]
cifs: fix bad buffer length check in coalesce_t2
The current check looks to see if the RFC1002 length is larger than
CIFSMaxBufSize, and fails if it is. The buffer is actually larger than
that by MAX_CIFS_HDR_SIZE.
This bug has been around for a long time, but the fact that we used to
cap the clients MaxBufferSize at the same level as the server tended
to paper over it. Commit c974befa changed that however and caused this
bug to bite in more cases.
Reported-and-Tested-by: Konstantinos Skarlatos <k.skarlatos@gmail.com> Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
It causes failures on Toshiba laptops - instead of disabling the alarm,
it actually seems to enable it on the affected laptops, resulting in
(for example) the laptop powering on automatically five minutes after
shutdown.
There's a patch for it that appears to work for at least some people,
but it's too late to play around with this, so revert for now and try
again in the next merge window.
See for example
http://bugs.debian.org/652869
Reported-and-bisected-by: Andreas Friedrich <afrie@gmx.net> (Toshiba Tecra) Reported-by: Antonio-M. Corbi Bellot <antonio.corbi@ua.es> (Toshiba Portege R500) Reported-by: Marco Santos <marco.santos@waynext.com> (Toshiba Portege Z830) Reported-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr> (Toshiba Portege R830) Cc: Jonathan Nieder <jrnieder@gmail.com> Requested-by: John Stultz <john.stultz@linaro.org> Cc: stable@kernel.org # for the versions that applied this Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vfork parent uninterruptibly and unkillably waits for its child to
exec/exit. This wait is of unbounded length. Ignore such waits
in the hung_task detector.
Jan Kara [Tue, 3 Jan 2012 12:14:29 +0000 (13:14 +0100)]
security: Fix security_old_inode_init_security() when CONFIG_SECURITY is not set
Commit 1e39f384bb01 ("evm: fix build problems") makes the stub version
of security_old_inode_init_security() return 0 when CONFIG_SECURITY is
not set.
But that makes callers such as reiserfs_security_init() assume that
security_old_inode_init_security() has set name, value, and len
arguments properly - but security_old_inode_init_security() left them
uninitialized which then results in interesting failures.
Revert security_old_inode_init_security() to the old behavior of
returning EOPNOTSUPP since both callers (reiserfs and ocfs2) handle this
just fine.
[ Also fixed the S_PRIVATE(inode) case of the actual non-stub
security_old_inode_init_security() function to return EOPNOTSUPP
for the same reason, as pointed out by Mimi Zohar.
It got incorrectly changed to match the new function in commit fb88c2b6cbb1: "evm: fix security/security_old_init_security return
code". - Linus ]
Reported-by: Jorge Bastos <mysql.jorge@decimal.pt> Acked-by: James Morris <jmorris@namei.org> Acked-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amitkumar Karwar [Tue, 3 Jan 2012 00:18:40 +0000 (16:18 -0800)]
mwifiex: fix crash during simultaneous scan and connect
If 'iw connect' command is fired when driver is already busy in
serving 'iw scan' command, ssid specific scan operation for connect
is skipped. In this case cmd wait queue handler gets called with no
command in queue (i.e. adapter->cmd_queued = NULL).
This patch adds a NULL check in mwifiex_wait_queue_complete()
routine to fix crash observed during simultaneous scan and assoc
operations.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Guennadi Liakhovetski [Mon, 26 Dec 2011 17:28:08 +0000 (18:28 +0100)]
b43: fix regression in PIO case
This patch fixes the regression, introduced by
commit 17030f48e31adde5b043741c91ba143f5f7db0fd
From: Rafał Miłecki <zajec5@gmail.com>
Date: Thu, 11 Aug 2011 17:16:27 +0200
Subject: [PATCH] b43: support new RX header, noticed to be used in 598.314+ fw
in PIO case.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mohammed Shafi Shajakhan [Mon, 26 Dec 2011 05:12:15 +0000 (10:42 +0530)]
ath9k: Fix kernel panic in AR2427 in AP mode
don't do aggregation related stuff for 'AP mode client power save
handling' if aggregation is not enabled in the driver, otherwise it
will lead to panic because those data structures won't be never
intialized in 'ath_tx_node_init' if aggregation is disabled
Oliver Hartkopp [Tue, 3 Jan 2012 08:40:28 +0000 (08:40 +0000)]
CAN MAINTAINERS update
Update the CAN MAINTAINERS section:
- point out active maintainers
- pull the CAN driver discussion away from netdev ML
- point to the new CAN web site on gitorious.org
- add CAN development git repository URL to submit patches
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> CC: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> CC: Urs Thuermann <urs.thuermann@volkswagen.de> CC: Wolfgang Grandegger <wg@grandegger.com> CC: Marc Kleine-Budde <mkl@pengutronix.de> CC: linux-can@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Wolfram Sang [Tue, 3 Jan 2012 03:46:47 +0000 (03:46 +0000)]
net: fsl: fec: fix build for mx23-only kernel
If one only selects mx23-based boards, compile fails:
drivers/net/ethernet/freescale/fec.c:410:2: error: 'FEC_HASH_TABLE_HIGH' undeclared (first use in this function)
drivers/net/ethernet/freescale/fec.c:411:2: error: 'FEC_HASH_TABLE_LOW' undeclared (first use in this function)
This is because fec.h uses CONFIG_SOC_IMX28 to determine the register
layout of the core which makes sense since the MX23 does not have a fec.
However, Kconfig uses the broader ARCH_MXS symbol and this way even
makes the fec-driver default for MX23. Adapt Kconfig to use the more
precise SOC_IMX28 as well.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Cc: Fabio Estevam <fabio.estevam@freescale.com> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: David S. Miller <davem@davemloft.net> Acked-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 2 Jan 2012 05:47:57 +0000 (05:47 +0000)]
sch_qfq: fix overflow in qfq_update_start()
grp->slot_shift is between 22 and 41, so using 32bit wide variables is
probably a typo.
This could explain QFQ hangs Dave reported to me, after 2^23 packets ?
(23 = 64 - 41)
Reported-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netfilter: ctnetlink: fix timeout calculation
ipvs: try also real server with port 0 in backup server
skge: restore rx multicast filter on resume and after config changes
mlx4_en: nullify cq->vector field when closing completion queue
Hugh Dickins [Sat, 31 Dec 2011 19:44:01 +0000 (11:44 -0800)]
futex: Fix uninterruptible loop due to gate_area
It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.
While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?
In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.
But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
Julian Anastasov [Fri, 30 Dec 2011 05:19:02 +0000 (14:19 +0900)]
ipvs: try also real server with port 0 in backup server
We should not forget to try for real server with port 0
in the backup server when processing the sync message. We should
do it in all cases because the backup server can use different
forwarding method.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Zumbiehl [Fri, 30 Dec 2011 17:30:09 +0000 (17:30 +0000)]
skge: restore rx multicast filter on resume and after config changes
Restore skge hardware registers for multicast filtering to their
appropriate values after system resume and after hardware restarts
that are done when changing certain settings.
Signed-off-by: Florian Zumbiehl <florz@florz.de> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
which got bisected down to the stable version of this commit.
Reported-by: Jonathan Nieder <jrnieder@gmail.com> Reported-by: Phil Miller <mille121@illinois.edu> Reported-by: Philip Langdale <philipl@overt.org> Reported-by: Tim Gardner <tim.gardner@canonical.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg KH <gregkh@suse.de> Cc: stable@kernel.org # for stable kernels that applied the original Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 30 Dec 2011 20:13:03 +0000 (12:13 -0800)]
Merge git://www.linux-watchdog.org/linux-watchdog
* git://www.linux-watchdog.org/linux-watchdog:
watchdog: iTCO_wdt.c - problems with newer hardware due to SMI clearing (part 2)
watchdog: hpwdt: Changes to handle NX secure bit in 32bit path
watchdog: sp805: Fix section mismatch in ID table.
watchdog: move coh901327 state holders
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
packet: fix possible dev refcnt leak when bind fail
netem: dont call vfree() under spinlock and BH disabled
netfilter: ctnetlink: fix scheduling while atomic if helper is autoloaded
netfilter: ctnetlink: fix return value of ctnetlink_get_expect()
Andreas Schwab [Wed, 28 Dec 2011 23:57:15 +0000 (15:57 -0800)]
procfs: do not confuse jiffies with cputime64_t
Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
for nohz") did not take into account that one some architectures jiffies
and cputime use different units.
This causes get_idle_time() to return numbers in the wrong units, making
the idle time fields in /proc/stat wrong.
Instead of converting the usec value returned by
get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
usecs_to_cputime64 to convert it to the correct unit of cputime64_t.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Artem S. Tashkinov" <t.artem@mailcity.com> Cc: Dave Jones <davej@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hans de Goede [Thu, 29 Dec 2011 21:09:21 +0000 (19:09 -0200)]
gspca: Fix bulk mode cameras no longer working (regression fix)
The new iso bandwidth calculation code accidentally has broken support
for bulk mode cameras. This has broken the following drivers:
finepix, jeilinj, ovfx2, ov534, ov534_9, se401, sq905, sq905c, sq930x,
stv0680, vicam.
Sage Weil [Thu, 29 Dec 2011 16:05:14 +0000 (08:05 -0800)]
ceph: disable use of dcache for readdir etc.
Ceph attempts to use the dcache to satisfy negative lookups and readdir
when the entire directory contents are in cache. Disable this behavior
until lingering bugs in this code are shaken out; we'll re-enable these
hooks once things are fully stable.
Dan Williams [Thu, 29 Dec 2011 08:16:28 +0000 (09:16 +0100)]
block: fix blk_queue_end_tag()
Commit 5e081591 "block: warn if tag is greater than real_max_depth"
cleaned up blk_queue_end_tag() to warn when the tag is truly invalid
(greater than real_max_depth). However, it changed behavior in the tag <
max_depth case to not end the request. Leading to triggering of
BUG_ON(blk_queued_rq(rq)) in the request completion path:
In order to allow blk_queue_resize_tags() to shrink the tag space
blk_queue_end_tag() must always complete tags with a value less than
real_max_depth regardless of the current max_depth. The comment about
"handling the shrink case" seems to be what prompted changes in this
space, so remove it and BUG on all invalid tags (made even simpler by
Matthew's suggestion to use an unsigned compare).
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Tao Ma <boyu.mt@taobao.com> Cc: Matthew Wilcox <matthew@wil.cx> Reported-by: Meelis Roos <mroos@ut.ee> Reported-by: Ed Nadolski <edmund.nadolski@intel.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Denis Kuzmenko [Wed, 28 Dec 2011 05:04:51 +0000 (14:04 +0900)]
ARM: SAMSUNG: Fix build error when selecting CPU_FREQ_S3C24XX_DEBUGFS on S3C2440
Following is happened when CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
is selected without building of s3c2410-iotiming.c file:
arch/arm/mach-s3c2440/built-in.o:(.data+0x38c): undefined reference to `s3c2410_iotiming_debugfs
Basically, the CONFIG_S3C2410_IOTIMING is not selected for
MACH_MINI2440. Because the s3c2410-iotiming.c is not ever
compiled and enabling CONFIG_CPU_FREQ_S3C24XX_DEBUGFS option
caused undefined reference to s3c2410_iotiming_debugfs()
defined in that file. The s3c2410_iotiming_debugfs defined
as NULL for this case.
Wim Van Sebroeck [Mon, 26 Dec 2011 14:23:51 +0000 (15:23 +0100)]
watchdog: iTCO_wdt.c - problems with newer hardware due to SMI clearing (part 2)
Redhat Bugzilla: Bug 727875 - TCO_EN bit is disabled by TCO driver
The previous patch breaks reset watchdog behaviour on the older hardware.
It is therefor better to make sure that the behaviour for older hardware (<=ICH5 or
6300ESB) is preserved and that the behaviour for newer hardware is changed.
We therefor use the iTCO_version to see if we need the clearing of the SMI_TCO_EN
bit in the SMI_EN register.
So the new behaviour becomes:
turn_SMI_watchdog_clear_off=0 -> Do not turn off SMI clearing watchdog.
turn_SMI_watchdog_clear_off=1 -> Turn off SMI clearing watchdog when iTCO_version=1
(ICHO till ICH5 + 6300ESB only)
turn_SMI_watchdog_clear_off=2 -> Turn off SMI clearing watchdog.
Keith Packard [Tue, 27 Dec 2011 01:02:11 +0000 (17:02 -0800)]
drm/i915: Disable RC6 on Sandybridge by default
RC6 fails again.
> I found my system freeze mostly during starting up X and KDE. Sometimes it
> works for some minutes, sometimes it freezes immediatly. When the freeze
> happens, everything is dead (even the reset button does not work, I need to
> power cycle).
> I disabled RC6, and my system runs wonderfully.
> The system is a Z68 Pro board with Sandybridge i5-2500K processor, 8
> GB of RAM and UEFI firmware.
Reported-by: Kai Krakow <hurikhan77@gmail.com> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keith Packard [Tue, 27 Dec 2011 01:02:10 +0000 (17:02 -0800)]
drm/i915: Disable semaphores by default on SNB
Semaphores still cause problems on some machines:
> From Udo Steinberg:
>
> With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of
> text scroll in an xterm, such as when extracting a tar archive. Such as this
> one (note the timestamps):
>
> I can reproduce it fairly easily with something
> as simple as:
>
> while true; do dmesg; done
This patch turns them off on SNB while leaving them on for IVB.
Reported-by: Udo Steinberg <udo@hypervisor.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Eugeni Dodonov <eugeni@dodonov.net> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 26 Dec 2011 21:17:00 +0000 (13:17 -0800)]
Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: e500: include linux/export.h
KVM: PPC: fix kvmppc_start_thread() for CONFIG_SMP=N
KVM: PPC: protect use of kvmppc_h_pr
KVM: PPC: move compute_tlbie_rb to book3s_64 common header
KVM: Don't automatically expose the TSC deadline timer in cpuid
KVM: Device assignment permission checks
KVM: Remove ability to assign a device without iommu support
KVM: x86: Prevent starting PIT timers in the absence of irqchip support