The edac_mc_alloc() routine allocates one dimm_info device for all
possible memories, including the non-filled ones. The debug messages
there are somewhat confusing. So, cleans them, by moving the code
that prints the memory location to edac_mc, and using it on both
edac_mc_sysfs and edac_mc.
Also, only dumps information when DIMM/ranks are actually
filled.
After this patch, a dimm-based memory controller will print the debug
info as:
i7core: fix ranks information at the per-channel struct
There is a flag at the per-channel struct that indicates if there are
any 4R dimm on it. The way the presence of this flag were reported
is not ok, as it might give the false idea that the channel were filled
with 2R memories:
The fatal error channel bits point to a single channel, and not
to a range of channels. Fix the code to properly report it,
instead of printing messages like:
kernel: EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
Edac: Add ABI Documentation for the new device nodes
The EDAC ABI were extended to add support for per-DIMM or per-rank
information and silkscreen labels. Properly document them.
Most of the comments there came from edac.txt descriptions of the
fields that are part of the legacy csrowX ABI (e. g.
/sys/devices/system/edac/mc/mc*/csrow*/*).
edac: Only expose csrows/channels on legacy API if they're populated
This patch actually fixes a bug with the legacy API, where, at the
same csrow, some channels may have different DIMMs. This can happen
on FB-DIMM/RAMBUS and modern Intel controllers.
This is the case, for example, of Nehalem machines:
i7300_edac: Get rid of some wrongly-solved rebase conflict
Drivers should not increment tot_dimms manually. This code were
there for an older approach that got removed. After one of the
several rebases that this code suffered until the final version,
the conflict was wrongly merged, and this code returned.
i5100_edac: Fix a warning when compiled with 32 bits
drivers/edac/i5100_edac.c: In function ‘i5100_init_csrows’:
drivers/edac/i5100_edac.c:862:3: warning: format ‘%zd’ expects argument of type ‘signed size_t’, but argument 5 has type ‘long unsigned int’ [-Wformat]
edac: Move grain/dtype/edac_type calculus to be out of channel loop
The 3e7bddc changeset (edac: move dimm properties to struct memset_info)
moved the calculus inside a loop. However, at those stuff are common to
all channels, on several drivers, it is better to put the calculus
outside the loop, to optimize the code.
Reported-by: Aristeu Rozanski Filho <arozansk@redhat.com> Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
edac: add a sysfs node to report the maximum location for the system
The userspace tools need to know what's the maximum location on each
system, as it helps to create nice maps showing how the memory was
filled at the system.
edac: add a new per-dimm API and make the old per-virtual-rank API obsolete
The old EDAC API is broken. It only works fine for systems manufatured
before 2005 and for AMD 64. The reason is that it forces all memory
controller drivers to discover rank info.
Also, it doesn't allow grouping the several ranks into a DIMM.
So, what almost all modern drivers do is to create a fake virtual-rank
information, and use it to cheat the EDAC core to accept the driver.
While this works if the user has enough time to discover what DIMM slot
corresponds to each "virtual-rank" information, it prevents EDAC usage
for users with less available time. It also makes life hard for vendors
that may want to provide a table with their motherboards to the userspace
tool (edac-utils) as each driver has its own logic for the virtual
mapping.
So, the old API should be removed, in favor of a more flexible API that
allows newer drivers to not lie to the EDAC core.
amd64_edac: convert sysfs logic to use struct device
Now that the EDAC core supports struct device, there's no sense
on having any logic at the EDAC core to simulate it. So, instead
of adding such logic there, change the logic at amd64_edac to
use it.
mpc85xx_edac: convert sysfs logic to use struct device
Now that the EDAC core supports struct device, there's no sense on
having any logic at the EDAC core to simulate it. So, instead of adding
such logic there, change the logic at mpc85xx_edac to use it
The EDAC subsystem uses the old struct sysdev approach,
creating all nodes using the raw sysfs API. This is bad,
as the API is deprecated.
As we'll be changing the EDAC API, let's first port the existing
code to struct device.
There's one drawback on this patch: driver-specific sysfs
nodes, used by mpc85xx_edac, amd64_edac and i7core_edac
won't be created anymore. While it would be possible to
also port the device-specific code, that would mix kobj with
struct device, with is not recommended. Also, it is easier and nicer
to move the code to the drivers, instead, as the core can get rid
of some complex logic that just emulates what the device_add()
and device_create_file() already does.
The next patches will convert the driver-specific code to use
the device-specific calls. Then, the remaining bits of the old
sysfs API will be removed.
NOTE: a per-MC bus is required, otherwise devices with more than
one memory controller will hit a bug like the one below:
This happens because the bus is not being properly initialized.
Instead of putting the memory sub-devices inside the memory controller,
it is putting everything under the same directory:
As EDAC doesn't use struct device itself, it created a parent dev
pointer called as "pdev". Now that we'll be converting it to use
struct device, instead of struct devsys, this needs to be fixed.
No functional changes.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
i5000_edac: Fix the logic that retrieves memory information
The logic there is broken: it basically creates two csrows for
each DIMM and assumes that all DIMM's are dual rank. Only one of
the csrows will contain the entire DIMM size. If single rank
memories are found, they'll be marked with 0 bytes.
The check if the AMB is present were also wrong.
Yet, as the error reports don't use the memory size in order to
credit an error to the right DIMM, that part of the driver seems
to work. That's why probably nobody detected the issue yet.
After this patch, the memory layout is now properly reported,
when debug mode is enabled, and the number of ranks per dimm is
now shown:
RAS: Add a tracepoint for reporting memory controller events
Add a new tracepoint-based hardware events report method for
reporting Memory Controller events.
Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.
[1] http://lwn.net/Articles/416669/
We have several subsystems & methods for reporting hardware errors:
1) EDAC ("Error Detection and Correction"). In its original form
this consisted of a platform specific driver that read topology
information and error counts from chipset registers and reported
the results via a sysfs interface.
2) mcelog - x86 specific decoding of machine check bank registers
reporting in binary form via /dev/mcelog. Recent additions make use
of the APEI extensions that were documented in version 4.0a of the
ACPI specification to acquire more information about errors without
having to rely reading chipset registers directly. A user level
programs decodes into somewhat human readable format.
3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
decodes errors reported via machine check bank registers in AMD
processors to the console log using printk();
Each of these mechanisms has a band of followers ... and none
of them appear to meet all the needs of all users.
As part of a RAS subsystem, let's encapsulate the memory error hardware
events into a trace facility.
Where:
[error msg] is the driver-specific error message
(e. g. "memory read", "bus error", ...);
[location] is the location in terms of memory controller and
branch/channel/slot, channel/slot or csrow/channel;
[label] is the memory stick label;
[edac_mc detail] describes the address location of the error
and the syndrome;
[driver detail] is driver-specifig error message details,
when needed/provided (e. g. "area:DMA", ...)
Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their Management Information
Base.
NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
However, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.
edac: Cleanup the logs for i7core and sb edac drivers
Remove some information that it is duplicated at the MCE log,
and don't have much usage for the error. Those data will be
added again, when creating a trace function that outputs both
memory errors and MCE fields.
edac: Initialize the dimm label with the known information
While userspace doesn't fill the dimm labels, add there the dimm location,
as described by the used memory model. This could eventually match what
is described at the dmidecode, making easier for people to identify the
memory.
For example, on an Intel motherboard where the DMI table is reliable,
the first memory stick is described as:
Memory Device
Array Handle: 0x0029
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 2048 MB
Form Factor: DIMM
Set: 1
Locator: A1_DIMM0
Bank Locator: A1_Node0_Channel0_Dimm0
Type: <OUT OF SPEC>
Type Detail: Synchronous
Speed: 800 MHz
Manufacturer: A1_Manufacturer0
Serial Number: A1_SerNum0
Asset Tag: A1_AssetTagNum0
Part Number: A1_PartNum0
The memory named as "A1_DIMM0" is physically located at the first
memory controller (node 0), at channel 0, dimm slot 0.
After this patch, the memory label will be filled with:
/sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0
And (after the new EDAC API patches) as:
/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0
So, even if the memory label is not initialized on userspace, an useful
information with the error location is filled there, expecially since
several systems/motherboards are provided with enough info to map from
channel/slot (or branch/channel/slot) into the DIMM label. So, letting the
EDAC core fill it by default is a good thing.
It should noticed that, as the label filling happens at the
edac_mc_alloc(), drivers can override it to better describe the memories
(and some actually do it).
edac: Change internal representation to work with layers
Change the EDAC internal representation to work with non-csrow
based memory controllers.
There are lots of those memory controllers nowadays, and more
are coming. So, the EDAC internal representation needs to be
changed, in order to work with those memory controllers, while
preserving backward compatibility with the old ones.
The edac core was written with the idea that memory controllers
are able to directly access csrows.
This is not true for FB-DIMM and RAMBUS memory controllers.
Also, some recent advanced memory controllers don't present a per-csrows
view. Instead, they view memories as DIMMs, instead of ranks.
So, change the allocation and error report routines to allow
them to work with all types of architectures.
This will allow the removal of several hacks with FB-DIMM and RAMBUS
memory controllers.
Also, several tests were done on different platforms using different
x86 drivers.
TODO: a multi-rank DIMMs are currently represented by multiple DIMM
entries in struct dimm_info. That means that changing a label for one
rank won't change the same label for the other ranks at the same DIMM.
This bug is present since the beginning of the EDAC, so it is not a big
deal. However, on several drivers, it is possible to fix this issue, but
it should be a per-driver fix, as the csrow => DIMM arrangement may not
be equal for all. So, don't try to fix it here yet.
I tried to make this patch as short as possible, preceding it with
several other patches that simplified the logic here. Yet, as the
internal API changes, all drivers need changes. The changes are
generally bigger in the drivers for FB-DIMMs.
Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
edac.h: Add generic layers for describing a memory location
The edac core were written with the idea that memory controllers
are able to directly access csrows, and that the channels are
used inside a csrows select.
This is not true for FB-DIMM and RAMBUS memory controllers.
Also, some recent advanced memory controllers don't present a per-csrows
view. Instead, they view memories as DIMMs, instead of ranks, accessed
via csrow/channel.
So, changes are needed in order to allow the EDAC core to
work with all types of architectures.
In preparation for handling non-csrows based memory controllers,
add some memory structs and a macro:
enum hw_event_mc_err_type: describes the type of error
(corrected, uncorrected, fatal)
To be used by the new edac_mc_handle_error function;
enum edac_mc_layer: describes the type of a given memory
architecture layer (branch, channel, slot, csrow).
struct edac_mc_layer: describes the properties of a memory
layer (type, size, and if the layer
will be used on a virtual csrow.
EDAC_DIMM_PTR() - as the number of layers can vary from 1 to 3,
this macro converts from an address with up to 3 layers into
a linear address.
The edac_align_ptr() function is used to prepare data for a single
memory allocation kzalloc() call. It counts how many bytes are needed
by some data structure.
Using it as-is is not that trivial, as the quantity of memory elements
reserved is not there, but, instead, it is on a next call.
In order to avoid mistakes when using it, move the number of allocated
elements into it, making easier to use it.
edac: Don't initialize csrow's first_page & friends when not needed
Almost all edac drivers initialize csrow_info->first_page,
csrow_info->last_page and csrow_info->page_mask. Those vars are
used inside the EDAC core, in order to calculate the csrow affected
by an error, by using the routine edac_mc_find_csrow_by_page().
However, very few drivers actually use it:
e752x_edac.c
e7xxx_edac.c
i3000_edac.c
i82443bxgx_edac.c
i82860_edac.c
i82875p_edac.c
i82975x_edac.c
r82600_edac.c
There also a few other drivers that have their own calculus
formula internally using those vars.
All the others are just wasting time by initializing those
data.
While initializing data without using them won't cause any troubles, as
those information is stored at the wrong place (at csrows structure), it
is better to remove what is unused, in order to simplify the next patch.
On systems based on chip select rows, all channels need to use memories
with the same properties, otherwise the memories on channels A and B
won't be recognized.
However, such assumption is not true for all types of memory
controllers.
Controllers for FB-DIMM's don't have such requirements.
Also, modern Intel controllers seem to be capable of handling such
differences.
So, we need to get rid of storing the DIMM information into a per-csrow
data, storing it, instead at the right place.
The first step is to move grain, mtype, dtype and edac_mode to the
per-dimm struct.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: James Bottomley <James.Bottomley@parallels.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Mike Williams <mike@mikebwilliams.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
edac: Create a dimm struct and move the labels into it
The way a DIMM is currently represented implies that they're
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're ridden behind some chip like the AMB's
on FBDIMM's, for example.
This forced drivers to fake^Wvirtualize a csrow struct, and to create
a mess under csrow/channel original's concept.
Move the DIMM labels into a per-DIMM struct, and add there
the real location of the socket, in terms of csrow/channel.
Latter patches will modify the location to properly represent the
memory architecture.
All other drivers will use a per-csrow type of location.
Some of those drivers will require a latter conversion, as
they also fake the csrows internally.
TODO: While this patch doesn't change the existing behavior, on
csrows-based memory controllers, a csrow/channel pair points to a memory
rank. There's a known bug at the EDAC core that allows having different
labels for the same DIMM, if it has more than one rank. A latter patch
is need to merge the several ranks for a DIMM into the same dimm_info
struct, in order to avoid having different labels for the same DIMM.
The edac_mc_alloc() will now contain a per-dimm initialization loop that
will be changed by latter patches in order to match other types of
memory architectures.
Linus Torvalds [Sun, 13 May 2012 00:27:41 +0000 (17:27 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM: SoC fixes from Olof Johansson:
"I was hoping to be done with fixes for 3.4 but we got two branches
from subarch maintainers the last couple of days. So here is one
last(?) pull request for arm-soc containing 7 patches:
- Five of them are for shmobile dealing with SMP setup and compile
failures
- The remaining two are for regressions on the Samsung platforms"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: EXYNOS: fix ctrlbit for exynos5_clk_pdma1
ARM: EXYNOS: use s5p-timer for UniversalC210 board
ARM / mach-shmobile: Invalidate caches when booting secondary cores
ARM / mach-shmobile: sh73a0 SMP TWD boot regression fix
ARM / mach-shmobile: r8a7779 SMP TWD boot regression fix
ARM: mach-shmobile: convert ag5evm to use the generic MMC GPIO hotplug helper
ARM: mach-shmobile: convert mackerel to use the generic MMC GPIO hotplug helper
Linus Torvalds [Sun, 13 May 2012 00:24:29 +0000 (17:24 -0700)]
Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6
Pull a few more GPIO bug fixes from Grant Likely:
"Oops, missed a couple. Here's an updated pull req for GPIO"
A set of PCH bug fixes, and one patch to fix up compile warnings
* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6:
gpio/exynos: Fix compiler warnings when non-exynos machines are selected
gpio: pch9: Use proper flow type handlers
Olof Johansson [Sat, 12 May 2012 22:41:22 +0000 (15:41 -0700)]
Merge branch 'v3.4-samsung-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into fixes
* 'v3.4-samsung-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: EXYNOS: fix ctrlbit for exynos5_clk_pdma1
ARM: EXYNOS: use s5p-timer for UniversalC210 board
Marek Szyprowski [Fri, 11 May 2012 21:17:59 +0000 (06:17 +0900)]
ARM: EXYNOS: use s5p-timer for UniversalC210 board
Commit 069d4e743 ("ARM: EXYNOS4: Remove clock event timers using
ARM private timers") removed support for local timers and forced
to use MCT as event source. However MCT is not operating properly
on early revision of EXYNOS4 SoCs. All UniversalC210 boards are
based on it, so that commit broke support for it. This patch
provides a workaround that enables UniversalC210 boards to boot
again. s5p-timer is used as an event source, it works only for
non-SMP builds.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Olof Johansson [Sat, 12 May 2012 22:40:56 +0000 (15:40 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/renesas into fixes
By Guennadi Liakhovetski (2) and others via Rafael J. Wysocki:
"[...] urgent fixes for Renesas ARM-based platforms. Four of these
commits are fixes of regressions new in 3.4-rc and the last one is
necessary for SMP to work on those systems in general."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/renesas:
ARM / mach-shmobile: Invalidate caches when booting secondary cores
ARM / mach-shmobile: sh73a0 SMP TWD boot regression fix
ARM / mach-shmobile: r8a7779 SMP TWD boot regression fix
ARM: mach-shmobile: convert ag5evm to use the generic MMC GPIO hotplug helper
ARM: mach-shmobile: convert mackerel to use the generic MMC GPIO hotplug helper
Magnus Damm [Wed, 9 May 2012 07:24:59 +0000 (16:24 +0900)]
ARM / mach-shmobile: Invalidate caches when booting secondary cores
Make sure L1 caches are invalidated when booting secondary
cores. Needed to boot all mach-shmobile SMP systems that
are using Cortex-A9 including sh73a0, r8a7779 and EMEV2.
Thanks to imx and tegra guys for actual code.
Signed-off-by: Magnus Damm <damm@opensource.se> Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Kuninori Morimoto [Thu, 10 May 2012 07:26:58 +0000 (00:26 -0700)]
ARM / mach-shmobile: sh73a0 SMP TWD boot regression fix
Fix SMP TWD boot regression on sh73a0 based platforms caused by:
4200b16 ARM: shmobile: convert to twd_local_timer_register() interface
After the merge of the above commit it has been impossible to boot
sh73a0 based SoCs with SMP enabled and CONFIG_HAVE_ARM_TWD=y. The
kernel crashes at smp_init_cpus() timing which is before the console
has been initialized, so to the user this looks like a kernel lock up
without any particular error message.
This patch fixes the regression on sh73a0 by moving the TWD
registration code from smp_init_cpus() to sys_timer->init() time.
This patch removed shmobile_twd_init() which is no longer needed
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Magnus Damm <damm@opensource.se> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Magnus Damm [Thu, 10 May 2012 05:57:22 +0000 (14:57 +0900)]
ARM / mach-shmobile: r8a7779 SMP TWD boot regression fix
Fix SMP TWD boot regression on r8a7779 based platforms caused by:
4200b16 ARM: shmobile: convert to twd_local_timer_register() interface
After the merge of the above commit it has been impossible to boot
r8a7779 based SoCs with SMP enabled and CONFIG_HAVE_ARM_TWD=y. The
kernel crashes at smp_init_cpus() timing which is before the console
has been initialized, so to the user this looks like a kernel lock up
without any particular error message.
This patch fixes the regression on r8a7779 by moving the TWD
registration code from smp_init_cpus() to sys_timer->init() time.
Signed-off-by: Magnus Damm <damm@opensource.se> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
ARM: mach-shmobile: convert ag5evm to use the generic MMC GPIO hotplug helper
This also fixes the following modular mmc build failure:
arch/arm/mach-shmobile/built-in.o: In function `mackerel_sdhi0_gpio_cd':
pfc-sh7372.c:(.text+0x1138): undefined reference to `mmc_detect_change'
on this platform by eliminating the use of an inline function, which
calls into the mmc core.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Reviewed-by: Simon Horman <horms@verge.net.au> Acked-by: Magnus Damm <damm@opensource.se> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
ARM: mach-shmobile: convert mackerel to use the generic MMC GPIO hotplug helper
This also fixes the following modular mmc build failure:
arch/arm/mach-shmobile/built-in.o: In function `ag5evm_sdhi0_gpio_cd':
pfc-sh73a0.c:(.text+0x7c0): undefined reference to `mmc_detect_change'
on this platform by eliminating the use of an inline function, which
calls into the mmc core.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Tested-by: Simon Horman <horms@verge.net.au> Acked-by: Magnus Damm <damm@opensource.se> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Linus Torvalds [Sat, 12 May 2012 20:02:31 +0000 (13:02 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of minor qla and virto fixes plus one major regression
fix (oops in all legacy host drivers)."
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] virtio_scsi: fix TMF use-after-free
[SCSI] fix oops in all legacy host adapters caused by 6f381fa
[SCSI] qla2xxx: Update version number to 8.04.00.03-k.
[SCSI] qla2xxx: Properly check for current state after the fabric-login request.
[SCSI] qla2xxx: Proper completion to scsi-ml for scsi status task_set_full and busy.
[SCSI] qla2xxx: Block flash access from application when device is initialized for ISP82xx.
[SCSI] qla2xxx: Fix reset time out as qla2xxx not ack to reset request.
1) Since we do RCU lookups on ipv4 FIB entries, we have to test if the
entry is dead before returning it to our caller.
2) openvswitch locking and packet validation fixes from Ansis Atteka,
Jesse Gross, and Pravin B Shelar.
3) Fix PM resume locking in IGB driver, from Benjamin Poirier.
4) Fix VLAN header handling in vhost-net and macvtap, from Basil Gor.
5) Revert a bogus network namespace isolation change that was causing
regressions on S390 networking devices.
6) If bonding decides to process and handle a LACPDU frame, we
shouldn't bump the rx_dropped counter. From Jiri Bohac.
7) Fix mis-calculation of available TX space in r8169 driver when doing
TSO, which can lead to crashes and/or hung device. From Julien
Ducourthial.
8) SCTP does not validate cached routes properly in all cases, from
Nicolas Dichtel.
9) Link status interrupt needs to be handled in ks8851 driver, from
Stephen Boyd.
10) Use capable(), not cap_raised(), in connector/userns netlink code.
From Eric W. Biederman via Andrew Morton.
11) Fix pktgen OOPS on module unload, from Eric Dumazet.
12) iwlwifi under-estimates SKB truesizes, also from Eric Dumazet.
13) Cure division by zero in SFC driver, from Ben Hutchings.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
ks8851: Update link status during link change interrupt
macvtap: restore vlan header on user read
vhost-net: fix handle_rx buffer size
bonding: don't increase rx_dropped after processing LACPDUs
connector/userns: replace netlink uses of cap_raised() with capable()
sctp: check cached dst before using it
pktgen: fix crash at module unload
Revert "net: maintain namespace isolation between vlan and real device"
ehea: fix losing of NEQ events when one event occurred early
igb: fix rtnl race in PM resume path
ipv4: Do not use dead fib_info entries.
r8169: fix unsigned int wraparound with TSO
sfc: Fix division by zero when using one RX channel and no SR-IOV
openvswitch: Validation of IPv6 set port action uses IPv4 header
net: compare_ether_addr[_64bits]() has no ordering
cdc_ether: Ignore bogus union descriptor for RNDIS devices
bnx2x: bug fix when loading after SAN boot
e1000: Silence sparse warnings by correcting type
igb, ixgbe: netdev_tx_reset_queue incorrectly called from tx init path
openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed.
...
Linus Torvalds [Sat, 12 May 2012 19:56:08 +0000 (12:56 -0700)]
Merge tag 'dm-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull device-mapper fixes from Alasdair G Kergon:
"Fix a couple of serious memory leaks in device-mapper thin
provisioning and tidy its MODULE_DESCRIPTION.
Mitigate occasional reported hangs associated with multipath scsi_dh
module loading."
* tag 'dm-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm mpath: check if scsi_dh module already loaded before trying to load
dm thin: correct module description
dm thin: fix unprotected use of prepared_discards list
dm thin: reinstate missing mempool_free in cell_release_singleton
Rafael J. Wysocki [Fri, 11 May 2012 19:35:45 +0000 (21:35 +0200)]
MAINTAINERS: Add myself as the cpufreq maintainer
Since cpufreq has no official maintainer at the moment, I'm willing
to maintain it along some other power management core code I've been
maintaining already.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Snitzer [Sat, 12 May 2012 00:43:21 +0000 (01:43 +0100)]
dm mpath: check if scsi_dh module already loaded before trying to load
If the requested scsi_dh module is already loaded then skip
request_module().
Multipath table loads can hang in an unnecessary __request_module.
Reported-by: Ben Marzinski <bmarzins@redhat.com> Cc: stable@kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Sat, 12 May 2012 00:43:16 +0000 (01:43 +0100)]
dm thin: fix unprotected use of prepared_discards list
Fix two places in commit 104655fd4dce ("dm thin: support discards") that
didn't use pool->lock to protect against concurrent changes to the
prepared_discards list.
Without this fix, thin_endio() can race with process_discard(), leading
to concurrent list_add()s that result in the processes locking up with
an error like the following:
Mike Snitzer [Sat, 12 May 2012 00:43:12 +0000 (01:43 +0100)]
dm thin: reinstate missing mempool_free in cell_release_singleton
Fix a significant memory leak inadvertently introduced during
simplification of cell_release_singleton() in commit 6f94a4c45a6f744383f9f695dde019998db3df55 ("dm thin: fix stacked bi_next
usage").
A cell's hlist_del() must be accompanied by a mempool_free().
Use __cell_release() to do this, like before.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
gpio/exynos: Fix compiler warnings when non-exynos machines are selected
Fixes the following compiler warnings:
drivers/gpio/gpio-samsung.c: In function ‘samsung_gpiolib_init’:
drivers/gpio/gpio-samsung.c:2980:1: warning: label ‘err_ioremap1’ defined but not used [-Wunused-label]
drivers/gpio/gpio-samsung.c:2978:1: warning: label ‘err_ioremap2’ defined but not used [-Wunused-label]
drivers/gpio/gpio-samsung.c:2976:1: warning: label ‘err_ioremap3’ defined but not used [-Wunused-label]
drivers/gpio/gpio-samsung.c:2974:1: warning: label ‘err_ioremap4’ defined but not used [-Wunused-label]
drivers/gpio/gpio-samsung.c:2722:55: warning: unused variable ‘gpio_base4’ [-Wunused-variable]
drivers/gpio/gpio-samsung.c:455:32: warning: ‘exynos_gpio_cfg’ defined but not used [-Wunused-variable]
drivers/gpio/gpio-samsung.c:2126:33: warning: ‘exynos4_gpios_1’ defined but not used [-Wunused-variable]
drivers/gpio/gpio-samsung.c:2228:33: warning: ‘exynos4_gpios_2’ defined but not used [-Wunused-variable]
drivers/gpio/gpio-samsung.c:2373:33: warning: ‘exynos4_gpios_3’ defined but not used [-Wunused-variable]
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Thomas Gleixner [Sat, 28 Apr 2012 08:13:45 +0000 (10:13 +0200)]
gpio: pch9: Use proper flow type handlers
Jean-Francois Dagenais reported:
Configuring a gpio pin with the gpio-pch driver with
"IRQF_TRIGGER_LOW | IRQF_ONESHOT" generates an interrupt storm for
threaded ISR until the ISR thread actually gets to physically clear
the interrupt on the triggering chip!! The immediate observable
symptom is the high CPU usage for my ISR thread task and the
interrupt count in /proc/interrupts incrementing radically.
The driver is wrong in several ways:
1) Using handle_simple_irq() does not provide proper flow control
handling. In the case of oneshot threaded handlers for the
demultiplexed interrupts this results in an interrupt storm because
the simple handler does not deal with masking/unmasking. Even
without threaded oneshot handlers an interrupt storm for level type
interrupts can easily be triggered when the interrupt is disabled
and the interrupt line is activated from the device.
2) Acknowlegding the demultiplexed interrupt before calling the
handler is wrong for level type interrupts.
3) The set_type function unconditionally enables the interrupt. It's
supposed to set the type and nothing else. The unmasking is done by
the core code.
Move the acknowledge code into a separate function and add it to the
demux irqchip callbacks.
Remove the unconditional enabling from the set_type() callback and set
the proper flow handlers depending on the selected type (level/edge).
Reported-and-tested-by: Jean-Francois Dagenais <jeff.dagenais@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Linus Torvalds [Fri, 11 May 2012 23:58:14 +0000 (16:58 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull another powerpc irq fix from Benjamin Herrenschmidt:
"It looks like my previous fix for the lazy irq masking problem wasn't
quite enough. There was another problem related to performance
monitor interrupts acting as NMIs leaving the flags in an incorrect
state. Here's a fix that finally seems to make perf solid again."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/irq: Fix another case of lazy IRQ state getting out of sync
Linus Torvalds [Fri, 11 May 2012 23:49:09 +0000 (16:49 -0700)]
Merge branch '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target fix from Nicholas Bellinger:
"This patch removes some incorrect legacy code to free se_lun_acl
memory in the NodeACL release path that could potentially trigger an
OOPS during shutdown once dynamic -> explicit initiator NodeACL
conversion has occurred.
That said, we've been able to trigger an OOPS in v4.0 code for this
special case when the associated MappedLUNs had not also been made
explicit based on active TPG LUN layout during the conversion, so it
really makes senses to go ahead and drop this extra cruft to avoid any
possible issues here.
This ends up only effecting iscsi-target module code (it's the only
user) and is CC'ed to stable."
* '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Drop incorrect se_lun_acl release for dynamic -> explict ACL conversion
Benjamin Herrenschmidt [Thu, 10 May 2012 16:12:38 +0000 (16:12 +0000)]
powerpc/irq: Fix another case of lazy IRQ state getting out of sync
So we have another case of paca->irq_happened getting out of
sync with the HW irq state. This can happen when a perfmon
interrupt occurs while soft disabled, as it will return to a
soft disabled but hard enabled context while leaving a stale
PACA_IRQ_HARD_DIS flag set.
This patch fixes it, and also adds a test for the condition
of those flags being out of sync in arch_local_irq_restore()
when CONFIG_TRACE_IRQFLAGS is enabled.
This helps catching those gremlins faster (and so far I
can't seem see any anymore, so that's good news).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Boyd [Thu, 10 May 2012 12:51:30 +0000 (12:51 +0000)]
ks8851: Update link status during link change interrupt
If a link change interrupt comes in we just clear the interrupt
and continue along without notifying the upper networking layers
that the link has changed. Use the mii_check_link() function to
update the link status whenever a link change interrupt occurs.
Cc: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Basil Gor [Thu, 3 May 2012 22:55:24 +0000 (22:55 +0000)]
macvtap: restore vlan header on user read
Ethernet vlan header is not on the packet and kept in the skb->vlan_tci
when it comes from lower dev. This patch inserts vlan header in user
buffer during skb copy on user read.
Signed-off-by: Basil Gor <basil.gor@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicholas Bellinger [Fri, 11 May 2012 05:05:49 +0000 (22:05 -0700)]
target: Drop incorrect se_lun_acl release for dynamic -> explict ACL conversion
This patch removes some potentially problematic legacy code within
core_clear_initiator_node_from_tpg() that was originally intended to
release left over se_lun_acl setup during dynamic NodeACL+MappedLUN
generate when running with TPG demo-mode operation.
Since we now only ever expect to allocate and release se_lun_acl from
within target_core_fabric_configfs.c:target_fabric_make_mappedlun() and
target_fabric_drop_mappedlun() context respectively, this code for
demo-mode release is incorrect and needs to be removed.
Cc: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Linus Torvalds [Fri, 11 May 2012 16:28:35 +0000 (09:28 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull a m68knommu fix from Greg Ungerer:
"It contains a single fix for including the ColdFire QSPI interface
setup code when enabled as a module. This was broken in the
consolidation of the ColdFire SoC device tables in the 3.4 merge
window."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: enable qspi support when SPI_COLDFIRE_QSPI = m
Hugh Dickins [Fri, 11 May 2012 08:00:07 +0000 (01:00 -0700)]
mm: raise MemFree by reverting percpu_pagelist_fraction to 0
Why is there less MemFree than there used to be? It perturbed a test,
so I've just been bisecting linux-next, and now find the offender went
upstream yesterday.
Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
us expect it to be left unset at 0 (and it's not then used as a divisor).
MemTotal: 8061476kB 8061476kB 8061476kB 8061476kB 8061476kB 8061476kB
Repetitive test with percpu_pagelist_fraction 8:
MemFree: 6948420kB 6237172kB 6949696kB 6840692kB 6949048kB 6862984kB
Same test with percpu_pagelist_fraction back to 0:
MemFree: 7945000kB 7944908kB 7948568kB 7949060kB 7948796kB 7948812kB
Signed-off-by: Hugh Dickins <hughd@google.com>
[ We really should fix the crazy sysctl interface too, but that's a
separate thing - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Bohac [Wed, 9 May 2012 01:01:40 +0000 (01:01 +0000)]
bonding: don't increase rx_dropped after processing LACPDUs
Since commit 3aba891d, bonding processes LACP frames (802.3ad
mode) with bond_handle_frame(). Currently a copy of the skb is
made and the original is left to be processed by other
rx_handlers and the rest of the network stack by returning
RX_HANDLER_ANOTHER. As there is no protocol handler for
PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped
increased.
Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED
if bonding has processed the LACP frame.
Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 4 May 2012 11:34:03 +0000 (11:34 +0000)]
connector/userns: replace netlink uses of cap_raised() with capable()
In 2009 Philip Reiser notied that a few users of netlink connector
interface needed a capability check and added the idiom
cap_raised(nsp->eff_cap, CAP_SYS_ADMIN) to a few of them, on the premise
that netlink was asynchronous.
In 2011 Patrick McHardy noticed we were being silly because netlink is
synchronous and removed eff_cap from the netlink_skb_params and changed
the idiom to cap_raised(current_cap(), CAP_SYS_ADMIN).
Looking at those spots with a fresh eye we should be calling
capable(CAP_SYS_ADMIN). The only reason I can see for not calling capable
is that it once appeared we were not in the same task as the caller which
would have made calling capable() impossible.
In the initial user_namespace the only difference between between
cap_raised(current_cap(), CAP_SYS_ADMIN) and capable(CAP_SYS_ADMIN) are a
few sanity checks and the fact that capable(CAP_SYS_ADMIN) sets
PF_SUPERPRIV if we use the capability.
Since we are going to be using root privilege setting PF_SUPERPRIV seems
the right thing to do.
The motivation for this that patch is that in a child user namespace
cap_raised(current_cap(),...) tests your capabilities with respect to that
child user namespace not capabilities in the initial user namespace and
thus will allow processes that should be unprivielged to use the kernel
services that are only protected with cap_raised(current_cap(),..).
To fix possible user_namespace issues and to just clean up the code
replace cap_raised(current_cap(), CAP_SYS_ADMIN) with
capable(CAP_SYS_ADMIN).
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Philipp Reisner <philipp.reisner@linbit.com> Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>