www.infradead.org Git - users/mchehab/rasdaemon.git/log
10 years ago: Bump version to 0.5.5 (tag v0.5.5)
Mauro Carvalho Chehab [Wed, 3 Jun 2015 13:59:55 +0000 (10:59 -0300)]
Bump version to 0.5.5

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: Improve INSTALL summary instructions
Mauro Carvalho Chehab [Wed, 3 Jun 2015 13:42:46 +0000 (10:42 -0300)]
Improve INSTALL summary instructions

Using && ensures that each command runs only if the previous one
succeeded, so this is the recommended way.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: add support to match the machine by system's product name
Aristeu Rozanski [Mon, 1 Jun 2015 20:04:00 +0000 (17:04 -0300)]
rasdaemon: add support to match the machine by system's product name

In some cases, the motherboard name changes across a line of products
while the DIMM mapping does not. This patch adds support for specifying
"Product:" in the label files instead of "Model:".

An example:
Vendor: Dell Inc.
  Product: PowerEdge R610
    DIMM_A1: 0.0.0;     DIMM_A2:  0.0.1;        DIMM_A3:  0.0.2;
    DIMM_A4: 0.1.0;     DIMM_A5:  0.1.1;        DIMM_A6:  0.1.2;

    DIMM_B1: 1.0.0;     DIMM_B2:  1.0.1;        DIMM_B3:  1.0.2;
    DIMM_B4: 1.1.0;     DIMM_B5:  1.1.1;        DIMM_B6:  1.1.2;

This would match all 'PowerEdge R610' machines.
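To illustrate the label format above, here is a small C parser for one mapping line. It is a sketch under assumptions: the real parser lives in the ras-mc-ctl Perl script, and struct dimm_label and parse_label_line() are hypothetical names. Entries are split on ';', the text before the first ':' is the label, and three layer numbers follow, separated by '.' or ':' (ras-mc-ctl accepts either separator, per the "ras-mc-ctl: Improve parser" entry in this log).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one "LABEL: a.b.c;" entry. */
struct dimm_label {
	char label[32];
	int layer[3];		/* e.g. memory controller, channel, slot */
};

/*
 * Parse a line such as "DIMM_A1: 0.0.0;  DIMM_A2: 0.0.1;" into at most
 * 'max' entries. Returns the number of entries, or -1 if an entry is
 * malformed or there are more than 'max' of them.
 */
static int parse_label_line(const char *line, struct dimm_label *out, int max)
{
	int n = 0;

	while (*line) {
		char seg[64];
		const char *semi = strchr(line, ';');
		size_t seglen = semi ? (size_t)(semi - line) : strlen(line);
		const char *p, *colon;
		size_t len;
		int i;

		if (seglen >= sizeof(seg))
			return -1;
		memcpy(seg, line, seglen);	/* copy one ';'-terminated entry */
		seg[seglen] = '\0';
		line += seglen + (semi ? 1 : 0);

		p = seg;
		while (isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			continue;		/* only blanks, e.g. after the last ';' */
		if (n == max)
			return -1;

		colon = strchr(p, ':');		/* label ends at the first ':' */
		if (!colon)
			return -1;
		len = (size_t)(colon - p);
		if (len == 0 || len >= sizeof(out[n].label))
			return -1;
		memcpy(out[n].label, p, len);
		out[n].label[len] = '\0';

		p = colon + 1;
		for (i = 0; i < 3; i++) {
			char *end;
			long v = strtol(p, &end, 10);	/* skips leading blanks */

			if (end == p || v < 0)
				return -1;
			out[n].layer[i] = (int)v;
			p = end;
			if (i < 2) {
				if (*p != '.' && *p != ':')
					return -1;	/* layer separator required */
				p++;
			}
		}
		n++;
	}
	return n;
}
```

For the Dell example, the first DIMM_A line yields three entries whose layers are 0.0.0, 0.0.1 and 0.0.2.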

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: make sure the error is valid before handling ranks
Seiichi Ikarashi [Tue, 26 May 2015 14:59:39 +0000 (11:59 -0300)]
rasdaemon: make sure the error is valid before handling ranks

Fix "rank" handling according to the Bit 63 description in Intel SDM Vol.3C
Table 16-23, which says "... Use this information only after there is valid
first error info indicated by bit 62".
Also fix invalid comparisons of the unsigned variables "rank0" and "rank1".
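A hedged C sketch of the rule quoted above: decode rank information only after checking the "valid" bit, and keep the rank variables signed so that -1 can mean "unknown". The bit positions of the rank fields below are assumptions for illustration only (they are not taken from the SDM table), and bits() and decode_ranks() are hypothetical names.

```c
#include <stdint.h>

/* Return bits [lo, hi] (inclusive) of v; assumes hi - lo < 63. */
static uint64_t bits(uint64_t v, unsigned int lo, unsigned int hi)
{
	return (v >> lo) & ((UINT64_C(1) << (hi - lo + 1)) - 1);
}

/*
 * Hypothetical layout, for illustration only (check the SDM table):
 *   bit 62      first-error information valid
 *   bit 63      second-error information valid, meaningful only with bit 62
 *   bits 46-50  rank of the first error
 *   bits 52-56  rank of the second error
 * Ranks are signed so that -1 can mean "unknown"; with unsigned variables,
 * a check such as "rank0 < 0" is always false.
 */
static void decode_ranks(uint64_t misc, int *rank0, int *rank1)
{
	*rank0 = -1;
	*rank1 = -1;

	if (!(misc & (UINT64_C(1) << 62)))
		return;		/* no valid first-error info: leave ranks unknown */

	*rank0 = (int)bits(misc, 46, 50);
	if (misc & (UINT64_C(1) << 63))
		*rank1 = (int)bits(misc, 52, 56);
}
```

With this ordering, a set bit 63 alone never produces a rank, matching the quoted requirement that bit 62 be checked first.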

Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: enable IMC status usage for Haswell-E
Seiichi Ikarashi [Tue, 26 May 2015 14:59:38 +0000 (11:59 -0300)]
rasdaemon: enable IMC status usage for Haswell-E

Enable IMC status bank for Haswell-E, as described in Intel SDM Vol.3C
Table 35-27.

Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: add missing semicolon in hsw_decode_model()
Seiichi Ikarashi [Tue, 26 May 2015 14:59:37 +0000 (11:59 -0300)]
rasdaemon: add missing semicolon in hsw_decode_model()

hsw_decode_model() tries to skip decode_bitfield() if IA32_MC4_STATUS indicates
some internal errors. Unfortunately, because a semicolon is missing, the code
does the opposite of what was intended.
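The commit does not quote the code, so the following is a hypothetical reconstruction of how a single missing ';' can invert an early-return check in C. Without the semicolon, the statement on the next line becomes the operand of return, so the decoder runs only in the case it was meant to skip. decode_fields(), handle_buggy() and handle_fixed() are illustrative names.

```c
static int decoded;	/* counts how many times the decoder ran */

static void decode_fields(void)
{
	decoded++;
}

/*
 * Buggy: no ';' after "return", so this parses as
 * "if (internal_error) return decode_fields();"
 * and decoding happens ONLY for internal errors.
 */
static void handle_buggy(int internal_error)
{
	if (internal_error)
		return
	decode_fields();
}

/* Fixed: the ';' ends the if body, so internal errors skip decoding. */
static void handle_fixed(int internal_error)
{
	if (internal_error)
		return;
	decode_fields();
}
```

The two functions differ by exactly one character, which is why this class of bug is easy to miss in review.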

Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: properly print message strings in decode_bitfield()
Seiichi Ikarashi [Tue, 26 May 2015 14:59:36 +0000 (11:59 -0300)]
rasdaemon: properly print message strings in decode_bitfield()

Fix decode_bitfield() so that it actually prints the message strings from the
struct field table.

Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: add support for Knights Landing
Aristeu Rozanski [Mon, 18 May 2015 17:19:33 +0000 (14:19 -0300)]
rasdaemon: add support for Knights Landing

Patch based on mcelog.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: add support for Broadwell
Aristeu Rozanski [Mon, 18 May 2015 17:19:32 +0000 (14:19 -0300)]
rasdaemon: add support for Broadwell

Only basic support for now.

Based on mcelog code.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: Identify Ivy Bridge properly
Aristeu Rozanski [Mon, 18 May 2015 17:19:31 +0000 (14:19 -0300)]
rasdaemon: Identify Ivy Bridge properly

This patch is based on b29cc4d615cead87cbc163ada0645b10c5b1217d (mcelog)
mcelog: Identify Ivy Bridge properly

Uniquely identify Ivy Bridge even though the machine checks are the same
for Sandy Bridge and Ivy Bridge.  This makes the output for the processor
display "Ivy Bridge".

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: tony.luck@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: Add missing entry to Ivy Bridge memory controller decode table
Aristeu Rozanski [Mon, 18 May 2015 17:19:30 +0000 (14:19 -0300)]
rasdaemon: Add missing entry to Ivy Bridge memory controller decode table

This patch is based on 2577aeb662374cb87169ee675b2e37c06f1aed99 (mcelog)

mcelog: Add missing entry to Ivy Bridge memory controller decode table

September 2013 edition of the software developer manual added an
entry that had been inadvertently omitted from earlier editions.
Add the 0x80 entry for "Corrected memory read error".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: decode new simple error code number 6
Aristeu Rozanski [Mon, 18 May 2015 17:19:29 +0000 (14:19 -0300)]
rasdaemon: decode new simple error code number 6

This patch was based on fa313dd0144596dfa140bd66805367250d6eae9b
(mcelog)

mcelog: Decode new simple error code number 6

Edition 050 of the Intel SDM released in late February 2014
includes a new simple error code in "Table 15-8. IA32_MCi_Status
[15:0] Simple Error Code Encoding".  Code 6 (0000 0000 0000 0110)
has been allocated for the reporting of cases where the BIOS SMM
code attempts to execute code outside of the protected SMRR area.
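As a hedged illustration of the encoding described above, a decoder can take the low 16 bits of IA32_MCi_STATUS and look up the simple code. This is a sketch, not rasdaemon's or mcelog's table: simple_error_name() is hypothetical, only codes 0 to 6 are covered, and the labels for codes 0 to 5 are assumptions that should be checked against the SDM table.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Map the simple error codes in IA32_MCi_STATUS[15:0] to text.
 * Only codes 0..6 are handled; anything else returns NULL.
 */
static const char *simple_error_name(uint64_t status)
{
	static const char *const names[] = {
		"No error",
		"Unclassified",
		"Microcode ROM parity error",
		"External error",
		"FRC error",
		"Internal parity error",
		"SMM handler code access violation",	/* code 6: outside SMRR */
	};
	uint16_t code = (uint16_t)(status & 0xffff);	/* bits 15:0 */

	if (code < sizeof(names) / sizeof(names[0]))
		return names[code];
	return NULL;	/* compound code, or one this sketch does not cover */
}
```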

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: rasdaemon: add support for Haswell
Aristeu Rozanski [Mon, 18 May 2015 17:19:28 +0000 (14:19 -0300)]
rasdaemon: add support for Haswell

Based on mcelog code.

Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
10 years ago: Bump version to 0.5.4 (tag v0.5.4)
Mauro Carvalho Chehab [Fri, 15 Aug 2014 22:15:47 +0000 (19:15 -0300)]
Bump version to 0.5.4

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
10 years ago: rasdaemon: do not assume dimmX/ directories will be present
Aristeu Rozanski [Fri, 15 Aug 2014 17:50:58 +0000 (13:50 -0400)]
rasdaemon: do not assume dimmX/ directories will be present

When finding the labels, size, and location, ras-mc-ctl searches /sys for
the files and calculates the location. When it later uses that location to
map back to the files in order to print or write labels, it simply assumes
that dimm* directories exist, which is not the case with drivers such as
amd64_edac. This patch adds two new hashes that store the location and the
label file path so they can be reused later.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
10 years ago: rasdaemon: enable recording by default in service file
Aristeu Rozanski [Mon, 21 Jul 2014 20:23:18 +0000 (16:23 -0400)]
rasdaemon: enable recording by default in service file

This patch changes the service file to enable the tracing events after
the daemon is started and starts the daemon recording events by default.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
10 years ago: rasdaemon: correct range while parsing top, middle and lower layers
Aristeu Rozanski [Mon, 21 Jul 2014 19:25:40 +0000 (15:25 -0400)]
rasdaemon: correct range while parsing top, middle and lower layers

{top,middle,lower}_layer are signed char and therefore can never be 255.
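A short C sketch shows why: before the comparison, a signed char is promoted to an int in the range SCHAR_MIN to SCHAR_MAX (usually -128 to 127), so "== 255" can never hold. is_255() and count_matches_255() are hypothetical helpers; compilers typically flag such a test with -Wtype-limits, and this is also why the v2 note drops the range tests entirely.

```c
#include <limits.h>
#include <stdbool.h>

/* Always false: layer is promoted to an int in [SCHAR_MIN, SCHAR_MAX]. */
static bool is_255(signed char layer)
{
	return layer == 255;
}

/* Count how many signed char values compare equal to 255. */
static int count_matches_255(void)
{
	int n = 0;

	for (int v = SCHAR_MIN; v <= SCHAR_MAX; v++)
		n += is_255((signed char)v);
	return n;
}
```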

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1035746

Tested in a GHES enabled machine using EINJ.

v2: no need to test ranges at all

Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
10 years ago: Bump version to 0.5.3 (tag v0.5.3)
Mauro Carvalho Chehab [Sun, 10 Aug 2014 14:04:10 +0000 (11:04 -0300)]
Bump version to 0.5.3

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
10 years ago: Add a target to build rasdaemon with mock
Mauro Carvalho Chehab [Sun, 10 Aug 2014 15:51:04 +0000 (12:51 -0300)]
Add a target to build rasdaemon with mock

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
10 years ago: Add an option to build the srpm
Mauro Carvalho Chehab [Sun, 10 Aug 2014 15:47:21 +0000 (12:47 -0300)]
Add an option to build the srpm

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
10 years ago: rasdaemon: Add support for extlog trace events
Luck, Tony [Mon, 4 Aug 2014 20:29:01 +0000 (13:29 -0700)]
rasdaemon: Add support for extlog trace events

Linux kernel 3.17 includes a new trace event to pick up extended
error logs produced by BIOS in the Common Platform Error Record
format described in appendix N of the UEFI standard. This patch
adds support to collect that information and log it both in
readable ASCII and into the sqlite3 database that rasdaemon
uses to store all error information. In addition, ras-mc-ctl
is updated to query that database for both detailed and summary
reports.

Big thanks to Aristeu for pretty much all the sqlite3 pieces,
plus testing and fixing miscellaneous issues elsewhere.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: rasdaemon: handle failures of snprintf()
Aristeu Rozanski [Tue, 24 Jun 2014 15:01:31 +0000 (11:01 -0400)]
rasdaemon: handle failures of snprintf()

Florian Weimer found that bitfield_msg() uses the return value of
snprintf() to calculate the length, ignoring that snprintf() can return
a negative number. This patch makes bitfield_msg() stop writing in that
case.
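A hedged sketch of the defensive pattern: before advancing a write offset, check snprintf()'s result both for a negative value (an output error) and for truncation (a value at least as large as the remaining space). append_fmt() is a hypothetical helper, not rasdaemon's bitfield_msg().

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Append formatted text at offset *pos of buf, which holds len bytes.
 * Refuses to advance when vsnprintf() fails (negative return) or the
 * output was truncated, so *pos can never run past the end of buf.
 * Returns 0 on success and -1 once nothing more can be written.
 */
static int append_fmt(char *buf, size_t len, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*pos >= len)
		return -1;		/* buffer already full */

	va_start(ap, fmt);
	n = vsnprintf(buf + *pos, len - *pos, fmt, ap);
	va_end(ap);

	if (n < 0)
		return -1;		/* output error: stop writing */
	if ((size_t)n >= len - *pos) {
		*pos = len;		/* truncated: mark buffer as full */
		return -1;
	}
	*pos += (size_t)n;
	return 0;
}
```

Treating truncation like an error keeps later calls from computing a negative remaining size.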

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1035741

Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: rasdaemon: fix mce numfield decoded error
Xie XiuQi [Thu, 8 May 2014 12:07:19 +0000 (20:07 +0800)]
rasdaemon: fix mce numfield decoded error

Some fields are missing from the decoded MCE information, as shown below:
...
rasdaemon: register inserted at db
           <...>-31568 [000]  4023.214080: mce_record:
2014-05-07 15:51:16 +0800 bank=2, status= bd000000000000c0, MEMORY
CONTROLLER MS_CHANNEL0_ERR Transaction: Memory scrubbing error %s: %Lu
 %s: %Lx
 %s: %Lx
 %s: %Lu
 %s: %Lu
 %s: %Lx
, mci=Uncorrected_error Error_enabled SRAO, n_errors=0 channel=0,
dimm=0, cpu_type= Intel Xeon 5500 series / Core i3/5/7
("Nehalem/Westmere"), cpu= 0, socketid= 0, ip= 1eadbabe (INEXACT), cs=
73, misc= 8c, addr= 62b000, mcgstatus= 5 RIPV MCIP, mcgcap= 1c09,
apicid= 0

"f->name" and "v" were not passed to the print call in decode_numfield(), so fix it.

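A minimal sketch of what a numeric-field formatter needs to pass: both the field name and the extracted value must be supplied for the "%s: ..." format. struct numfield and format_numfield() are hypothetical; the sketch uses the portable PRIx64/PRIu64 macros instead of the "%Lu"/"%Lx" conversions visible in the log above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical numeric-field descriptor. */
struct numfield {
	const char *name;
	unsigned int start, end;	/* bit range, inclusive */
	int hex;			/* print in hex instead of decimal */
};

/* Format "name: value" for one field of a status word into buf. */
static int format_numfield(char *buf, size_t len, const struct numfield *f,
			   uint64_t status)
{
	unsigned int width = f->end - f->start + 1;
	uint64_t mask = width >= 64 ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
	uint64_t v = (status >> f->start) & mask;

	/* Both f->name and v must be passed to match "%s" and the number. */
	if (f->hex)
		return snprintf(buf, len, "%s: %" PRIx64, f->name, v);
	return snprintf(buf, len, "%s: %" PRIu64, f->name, v);
}
```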
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: rasdaemon: sqlite truncates some MCE fields to 32-bit
Luck, Tony [Mon, 7 Apr 2014 18:27:47 +0000 (11:27 -0700)]
rasdaemon: sqlite truncates some MCE fields to 32-bit

The sqlite3_bind_int() function takes an "int" as the value to save
to the database, but some fields are wider than 32 bits. Use
sqlite3_bind_int64() for the fields whose values can exceed
4G.
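In the output below, the "Before" status 0x00010090 is exactly the low 32 bits of the "After" value 0x8c00004000010090. A minimal C sketch of that truncation, using plain integer arithmetic with no SQLite involved (low32() is a hypothetical helper):

```c
#include <stdint.h>

/*
 * What survives when a 64-bit MCE field goes through a 32-bit binding:
 * only the low 32 bits. Conversion to uint32_t is well defined (modulo
 * 2^32); a signed int on common platforms keeps the same low 32 bits.
 */
static uint32_t low32(uint64_t v)
{
	return (uint32_t)v;
}
```

Binding with sqlite3_bind_int64() keeps all 64 bits. Note that SQLite integers are signed 64-bit values, so a status with bit 63 set is stored as a negative number and should be converted back to uint64_t when read.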

Before:

# ./rasdaemon/util/ras-mc-ctl --errors
 ...
MCE events:
1 2014-04-04 08:50:32 -0700 error: MEMORY CONTROLLER RD_CHANNEL0_ERR Transaction: Memory read error, mcg mcgstatus= 0, mci Corrected_error, mcgcap=0x07000c16, status=0x00010090, addr=0x35fcb9c0, misc=0x5026a686, walltime=0x5342e4f9, cpu=0x0000000e, cpuid=0x000306f1, apicid=0x00000020, socketid=0x00000001, bank=0x00000008
2 2014-04-04 08:50:35 -0700 error: MEMORY CONTROLLER RD_CHANNEL0_ERR Transaction: Memory read error, mcg mcgstatus= 0, mci Corrected_error, mcgcap=0x07000c16, status=0x00010090, addr=0x4187adc0, misc=0x4274f486, walltime=0x5342e4fc, cpu=0x0000000e, cpuid=0x000306f1, apicid=0x00000020, socketid=0x00000001, bank=0x00000007
3 2014-04-04 08:50:37 -0700 error: MEMORY CONTROLLER RD_CHANNEL0_ERR Transaction: Memory read error, mcg mcgstatus= 0, mci Corrected_error, mcgcap=0x07000c16, status=0x00010090, addr=0x52efc600, misc=0x50028286, walltime=0x5342e4fd, cpu=0x0000000e, cpuid=0x000306f1, apicid=0x00000020, socketid=0x00000001, bank=0x00000008

After:
# ./rasdaemon/util/ras-mc-ctl --errors
 ...
1 2014-04-04 09:00:07 -0700 error: MEMORY CONTROLLER RD_CHANNEL0_ERR Transaction: Memory read error, mcg mcgstatus= 0, mci Corrected_error, mcgcap=0x07000c16, status=0x8c00004000010090, addr=0x45340a180, misc=0x140686886, walltime=0x5342e736, cpuid=0x000306f1, bank=0x00000008
2 2014-04-04 09:00:08 -0700 error: MEMORY CONTROLLER RD_CHANNEL0_ERR Transaction: Memory read error, mcg mcgstatus= 0, mci Corrected_error, mcgcap=0x07000c16, status=0x8c00004000010090, addr=0x44d6e4780, misc=0x15060e086, walltime=0x5342e737, cpuid=0x000306f1, bank=0x00000007
3 2014-04-04 09:00:10 -0700 error: MEMORY CONTROLLER RD_CHANNEL0_ERR Transaction: Memory read error, mcg mcgstatus= 0, mci Corrected_error, mcgcap=0x07000c16, status=0x8c00004000010090, addr=0x44cb64640, misc=0x140505086, walltime=0x5342e739, cpuid=0x000306f1, bank=0x00000008

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: rasdaemon: fix some typos and cut/paste errors in sqlite bits
Luck, Tony [Mon, 7 Apr 2014 19:23:25 +0000 (12:23 -0700)]
rasdaemon: fix some typos and cut/paste errors in sqlite bits

The aer event has error_type as field 2 and msg as field 3, but the
sqlite3_bind_text() calls use 3 and 4.

The mce event forgot to declare "mcastatus_msg".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: Bump version to 0.5.2 (tag v0.5.2)
Mauro Carvalho Chehab [Thu, 3 Apr 2014 11:50:45 +0000 (08:50 -0300)]
Bump version to 0.5.2

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: Correct ABRT report data
Jakub Filak [Wed, 2 Apr 2014 13:03:44 +0000 (15:03 +0200)]
Correct ABRT report data

Remove the superfluous '\0' byte from the 'PUT' message.

Replaced the 'BASENAME' item with the 'TYPE' item, because the former is
no longer supported by abrtd and the latter is required. Basically, the
latter is a substitute for the former.

Removed the closing message, which abrtd does not support; abrtd
treats that message as part of the problem report.

Removed a superfluous space from 'Backtrace'.

Signed-off-by: Jakub Filak <jfilak@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: Bump version to 0.5.1 (tag v0.5.1)
Mauro Carvalho Chehab [Fri, 28 Mar 2014 21:36:00 +0000 (18:36 -0300)]
Bump version to 0.5.1

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: Add two new generated files to .gitignore
Mauro Carvalho Chehab [Fri, 28 Mar 2014 21:47:41 +0000 (18:47 -0300)]
Add two new generated files to .gitignore

The service files are now auto-generated.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: Make paths in the systemd services configurable
Jakub Filak [Fri, 21 Feb 2014 14:54:09 +0000 (15:54 +0100)]
Make paths in the systemd services configurable

The path to a binary depends on the configuration, so it is better not
to use hard-coded strings.

Signed-off-by: Jakub Filak <jfilak@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: ras-mc-ctl: Print useful message when run without rasdaemon -r
Betty Dall [Wed, 19 Mar 2014 21:54:56 +0000 (15:54 -0600)]
ras-mc-ctl: Print useful message when run without rasdaemon -r

The utility script ras-mc-ctl requires that rasdaemon --record be run
to create the mc_event table in the SQLite database. The current behaviour
is this:
[root@sa1 util]# ras-mc-ctl --errors
DBD::SQLite::db prepare failed: no such table: mc_event at
/usr/local/sbin/ras-mc-ctl line 914.
Can't call method "execute" on an undefined value at
/usr/local/sbin/ras-mc-ctl line 915.

With this change, the user sees:
[root@sa1 util]# ras-mc-ctl --errors
DBD::SQLite::db prepare failed: no such table: mc_event at
/usr/local/sbin/ras-mc-ctl line 914.
ras-mc-ctl: Error: mc_event table missing from
/usr/local/var/lib/rasdaemon/ras-mc_event.db. Run 'rasdaemon --record'.

Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: rasdaemon: Add record option to rasdaemon man page
Betty Dall [Wed, 19 Mar 2014 20:59:47 +0000 (14:59 -0600)]
rasdaemon: Add record option to rasdaemon man page

Add the already existing rasdaemon option 'record' to the rasdaemon man
page. This option records events via sqlite3.

Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: rasdaemon: Make record option dependent on HAVE_SQLITE3
Betty Dall [Wed, 19 Mar 2014 20:59:46 +0000 (14:59 -0600)]
rasdaemon: Make record option dependent on HAVE_SQLITE3

The record option in parse_opt() can be made a compile-time option
guarded by HAVE_SQLITE3, since HAVE_SQLITE3 already guards that option
in the corresponding argp_option structure.
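A hedged sketch of the idea: the same preprocessor guard wraps both the option's declaration and its handling, so a build without SQLite neither advertises nor accepts the option. rasdaemon uses argp's parse_opt() for this; to stay self-contained, the sketch uses a plain switch instead, and record_events and handle_option() are hypothetical names.

```c
/* Set when the record option is given; only meaningful with SQLite. */
static int record_events;

/* Returns 0 if the option was handled, -1 if it is unknown in this build. */
static int handle_option(int key)
{
	switch (key) {
#ifdef HAVE_SQLITE3
	case 'r':
		record_events = 1;	/* compiled in only with SQLite support */
		return 0;
#endif
	default:
		return -1;
	}
}
```

Guarding the argp_option entry with the same #ifdef keeps --help consistent with what parse_opt() actually accepts.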

Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: Change version to 0.5.0 (tag v0.5.0)
Mauro Carvalho Chehab [Sun, 16 Feb 2014 10:56:05 +0000 (19:56 +0900)]
Change version to 0.5.0

As this version has a new feature, name it 0.5.0.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: add abrt support for rasdaemon
Junliang Li [Thu, 13 Feb 2014 02:39:53 +0000 (10:39 +0800)]
add abrt support for rasdaemon

Adds abrt as another error-reporting mechanism for rasdaemon.
This patch does the following:

1) reads RAS events (mc, mce and aer);

2) sets up an abrt-server Unix socket;

3) writes messages following the ABRT server protocol, storing the
   event info in the backtrace zone;

4) commits the report.

For now, it relies on ABRT to rate-limit floods of reports.

Signed-off-by: Junliang Li <lijunliang.dna@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: mce-amd-k8.c: fix a warning
Mauro Carvalho Chehab [Thu, 13 Feb 2014 20:11:26 +0000 (05:11 +0900)]
mce-amd-k8.c: fix a warning

mce-amd-k8.c: In function ‘bank_name’:
mce-amd-k8.c:250:22: warning: argument to ‘sizeof’ in ‘snprintf’ call is the same expression as the destination; did you mean to provide an explicit length? [-Wsizeof-pointer-memaccess]
  snprintf(buf, sizeof(buf), "%s (bank=%d)", s, e->bank);
                      ^
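The warning fires because buf is a pointer there, so sizeof(buf) yields the size of the pointer (typically 8) rather than the size of the buffer. A minimal sketch of the usual fix is to pass the length explicitly. bank_label() and its parameters are hypothetical, not the actual code in mce-amd-k8.c.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * buf is a pointer here, so sizeof(buf) would be the pointer size.
 * The caller passes the real buffer size in len instead.
 */
static const char *bank_label(char *buf, size_t len, const char *s, int bank)
{
	snprintf(buf, len, "%s (bank=%d)", s, bank);
	return buf;
}
```

At the call site, sizeof works as intended because it is applied to the array itself, for example bank_label(name, sizeof(name), "MEMORY CONTROLLER", 8).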

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: README: describe the location of the main repositories
Mauro Carvalho Chehab [Wed, 12 Feb 2014 23:25:15 +0000 (08:25 +0900)]
README: describe the location of the main repositories

As there may be several copies of rasdaemon on the net, add the
location of the main ones.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: Update README to reflect the patch submission process
Mauro Carvalho Chehab [Wed, 12 Feb 2014 23:13:18 +0000 (08:13 +0900)]
Update README to reflect the patch submission process

This better documents how to contribute code.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: Bump to version 0.4.2 (tag v0.4.2)
Mauro Carvalho Chehab [Tue, 10 Sep 2013 16:22:42 +0000 (13:22 -0300)]
Bump to version 0.4.2

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: ras-mc-ctl: Fix the DIMM layout display
Mauro Carvalho Chehab [Thu, 15 Aug 2013 20:13:43 +0000 (17:13 -0300)]
ras-mc-ctl: Fix the DIMM layout display

The items weren't being presented in the right order. Fix it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
11 years ago: contrib/edac-tests: Make it work without edac-utils
Mauro Carvalho Chehab [Thu, 15 Aug 2013 16:26:03 +0000 (13:26 -0300)]
contrib/edac-tests: Make it work without edac-utils

There were a few traces of edac-utils and of an older version of
the EDAC trace in this script. Remove them, and change the script
to mode 0755.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
11 years ago: Add an example of labels file
Mauro Carvalho Chehab [Thu, 15 Aug 2013 15:58:02 +0000 (12:58 -0300)]
Add an example of labels file

This is an example of a labels file for a Dell PowerEdge T620.

For now, only DIMMs A1 and B1 are tested here.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: ras-mc-ctl: Fix label register with 2 layers
Mauro Carvalho Chehab [Thu, 15 Aug 2013 15:45:18 +0000 (12:45 -0300)]
ras-mc-ctl: Fix label register with 2 layers

When there weren't 3 layers, printing and registering labels didn't work.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
11 years ago: ras-mc-ctl: Improve parser
Mauro Carvalho Chehab [Thu, 15 Aug 2013 15:43:02 +0000 (12:43 -0300)]
ras-mc-ctl: Improve parser

Accept either . or : as the layer separator in config files.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
12 years ago: Makefile.am: fix build if rpmbuild was never called before
Mauro Carvalho Chehab [Tue, 4 Jun 2013 10:41:58 +0000 (07:41 -0300)]
Makefile.am: fix build if rpmbuild was never called before

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: TODO: Update it with the current issues
Mauro Carvalho Chehab [Mon, 3 Jun 2013 13:57:02 +0000 (10:57 -0300)]
TODO: Update it with the current issues

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-mc-ctl: Fix the name of the error table data
Mauro Carvalho Chehab [Fri, 31 May 2013 19:40:40 +0000 (16:40 -0300)]
ras-mc-ctl: Fix the name of the error table data

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-mc-ctl: report errors also for PCIe AER and MCE
Mauro Carvalho Chehab [Fri, 31 May 2013 19:16:44 +0000 (16:16 -0300)]
ras-mc-ctl: report errors also for PCIe AER and MCE

Also show PCIe AER and MCE errors when used with the --errors parameter.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-mc-ctl: add summary for MCE and PCIe AER errors
Mauro Carvalho Chehab [Fri, 31 May 2013 17:57:54 +0000 (14:57 -0300)]
ras-mc-ctl: add summary for MCE and PCIe AER errors

Also report the summary for MCE and PCIe errors.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Add support to store MCE events at the database
Mauro Carvalho Chehab [Fri, 31 May 2013 17:18:24 +0000 (14:18 -0300)]
Add support to store MCE events at the database

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Add support to record AER events
Mauro Carvalho Chehab [Fri, 31 May 2013 16:54:11 +0000 (13:54 -0300)]
Add support to record AER events

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-record: Make the code easier to add support for other tables
Mauro Carvalho Chehab [Fri, 31 May 2013 16:53:18 +0000 (13:53 -0300)]
ras-record: Make the code easier to add support for other tables

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-record: reorder functions
Mauro Carvalho Chehab [Fri, 31 May 2013 16:51:55 +0000 (13:51 -0300)]
ras-record: reorder functions

No functional changes

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-record: rename stmt to stmt_mc_event
Mauro Carvalho Chehab [Fri, 31 May 2013 16:10:16 +0000 (13:10 -0300)]
ras-record: rename stmt to stmt_mc_event

This stmt is used only for mc_event. So, rename it, as we'll be
adding other stmts for the other tables.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-record: make the code more generic
Mauro Carvalho Chehab [Fri, 31 May 2013 15:41:01 +0000 (12:41 -0300)]
ras-record: make the code more generic

Now that we're ready to add more tables to the database, make
the code that creates and inserts data into the table more
generic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-mc-ctl: Improve error summary to show label and mc
Mauro Carvalho Chehab [Thu, 30 May 2013 00:53:58 +0000 (21:53 -0300)]
ras-mc-ctl: Improve error summary to show label and mc

Both pieces of information are useful to users, even in the summary.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Update rasdaemon.spec.in (tag v0.4.1)
Mauro Carvalho Chehab [Wed, 29 May 2013 15:04:29 +0000 (12:04 -0300)]
Update rasdaemon.spec.in

This is exactly what should be used for Fedora.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Create directories via install target
Mauro Carvalho Chehab [Wed, 29 May 2013 14:57:21 +0000 (11:57 -0300)]
Create directories via install target

As the directories are now created via the install target, we can clean
up the RPM spec template file.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Makefile.am: honour destdir at the local install target
Mauro Carvalho Chehab [Wed, 29 May 2013 14:33:11 +0000 (11:33 -0300)]
Makefile.am: honour destdir at the local install target

This avoids build errors like the following when building a distro package:
/bin/sh /builddir/build/BUILD/rasdaemon-0.4.1/install-sh -d "/var/lib/rasdaemon"
mkdir: cannot create directory '/var/lib/rasdaemon': Permission denied
mkdir: cannot create directory '/var/lib/rasdaemon': Permission denied

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Bump to version 0.4.1
Mauro Carvalho Chehab [Wed, 29 May 2013 14:10:44 +0000 (11:10 -0300)]
Bump to version 0.4.1

The sqlite3 bugfix is important enough to deserve a version.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: README: update to reflect the need of perl DBI sqlite
Mauro Carvalho Chehab [Wed, 29 May 2013 14:03:04 +0000 (11:03 -0300)]
README: update to reflect the need of perl DBI sqlite

This is now needed by ras-mc-ctl.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Makefile.am: create ${prefix}/var/lib/rasdaemon on install
Mauro Carvalho Chehab [Wed, 29 May 2013 13:59:43 +0000 (10:59 -0300)]
Makefile.am: create ${prefix}/var/lib/rasdaemon on install

rasdaemon -r requires that directory to exist; otherwise, opening
the SQLite database fails.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-mc-ctl: add support for querying the errors
Mauro Carvalho Chehab [Wed, 29 May 2013 12:33:45 +0000 (09:33 -0300)]
ras-mc-ctl: add support for querying the errors

As the mc_event table is filled by rasdaemon, we need a tool to
extract data from it.

So, use the existing perl script for the basic queries.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-record: use sqlite3_reset to allow reusing the prepared statement
Mauro Carvalho Chehab [Wed, 29 May 2013 10:41:30 +0000 (07:41 -0300)]
ras-record: use sqlite3_reset to allow reusing the prepared statement

Use sqlite3_reset() instead of sqlite3_finalize(); otherwise, the
prepared statement is deallocated and cannot be reused.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: rasdaemon.spec.in: Require sqlite-devel
Mauro Carvalho Chehab [Wed, 29 May 2013 10:40:46 +0000 (07:40 -0300)]
rasdaemon.spec.in: Require sqlite-devel

This library is needed on builds when --enable-sqlite3 is used.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-events: Fence-post error when reporting number of cpus we listen to
Tony Luck [Tue, 28 May 2013 18:20:36 +0000 (11:20 -0700)]
ras-events: Fence-post error when reporting number of cpus we listen to

I see:
rasdaemon: Listening to events for cpus 0 to 64

which would be 65 total cpus - I only have 64.

Fix the log message to use "n_cpus - 1" rather than "n_cpus".
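The off-by-one is easy to see in a sketch: with n_cpus CPUs numbered from zero, the last one is n_cpus - 1. format_cpu_range() is a hypothetical helper that reproduces the corrected message.

```c
#include <stdio.h>

/* CPUs are numbered 0 .. n_cpus - 1, so the upper bound is n_cpus - 1. */
static int format_cpu_range(char *buf, size_t len, int n_cpus)
{
	return snprintf(buf, len, "Listening to events for cpus 0 to %d",
			n_cpus - 1);
}
```

On the 64-CPU machine from the report, this prints "cpus 0 to 63".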

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Add a tool to automate releasing new versions (tag v0.4.0)
Mauro Carvalho Chehab [Tue, 28 May 2013 18:10:05 +0000 (15:10 -0300)]
Add a tool to automate releasing new versions

This small script automates the process of building newer
versions of the tool.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Replace some hard-coded strings by the autotools macro names
Mauro Carvalho Chehab [Tue, 28 May 2013 18:09:29 +0000 (15:09 -0300)]
Replace some hard-coded strings by the autotools macro names

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Bump version to 0.4.0
Mauro Carvalho Chehab [Tue, 28 May 2013 18:00:22 +0000 (15:00 -0300)]
Bump version to 0.4.0

There are too many changes already. Bump it to version 0.4.0.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-events: parse errors at select_tracing_timestamp()
Mauro Carvalho Chehab [Tue, 28 May 2013 17:58:36 +0000 (14:58 -0300)]
ras-events: parse errors at select_tracing_timestamp()

This fixes the following warnings:
ras-events.c: In function 'select_tracing_timestamp':
ras-events.c:501:6: warning: ignoring return value of 'read', declared with attribute warn_unused_result [-Wunused-result]
ras-events.c:531:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
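Both warnings come from glibc marking read() and fscanf() with warn_unused_result. Here is a hedged sketch of checking fscanf()'s result. read_ulong() is hypothetical, and the real function reads tracing files whose format is not shown in this log.

```c
#include <stdio.h>

/*
 * Read one unsigned long from f. Checking that fscanf() converted
 * exactly one item silences -Wunused-result and also catches empty or
 * malformed input. Returns 0 on success, -1 on failure.
 */
static int read_ulong(FILE *f, unsigned long *out)
{
	if (fscanf(f, "%lu", out) != 1)
		return -1;
	return 0;
}
```

The same applies to read(), which returns -1 on error and may return fewer bytes than requested.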

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: Store RAS sqlite3 db file on a proper place
Mauro Carvalho Chehab [Tue, 28 May 2013 17:08:07 +0000 (14:08 -0300)]
Store RAS sqlite3 db file on a proper place

Instead of creating it in the directory from which rasdaemon is
called, put it in the ${prefix}/var/lib/rasdaemon directory.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-events: use sysconf to get the number of CPU's
Mauro Carvalho Chehab [Tue, 28 May 2013 14:37:50 +0000 (11:37 -0300)]
ras-events: use sysconf to get the number of CPU's

There are several "per-cpu" files in sysfs that seem to be
utterly bogus, as trying to poll them just returns POLLERR.

Instead, use sysconf() to get the number of CPUs, avoiding
this bug.

It is not clear whether this works with hotplugged CPUs, though,
so keep the old code there for now.
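A minimal sketch of the sysconf() call. This log does not state which constant rasdaemon passes, so _SC_NPROCESSORS_CONF is an assumption here, and count_cpus() is a hypothetical name. _SC_NPROCESSORS_CONF counts configured CPUs, including offline ones, while _SC_NPROCESSORS_ONLN counts only online CPUs. That distinction matters for the hotplug concern above.

```c
#include <unistd.h>

/* Number of configured CPUs, falling back to 1 if sysconf() fails. */
static long count_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	return n > 0 ? n : 1;
}
```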

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-events: Only use pthreads for collect if poll() not available
Mauro Carvalho Chehab [Tue, 28 May 2013 11:47:57 +0000 (08:47 -0300)]
ras-events: Only use pthreads for collect if poll() not available

Before kernel 3.10, one pthread per CPU was needed, because the code
had to run an endless loop to get events.

With kernel 3.10 and later, we can simply use poll() instead.
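A hedged sketch of the single-threaded approach: one poll() call watches several descriptors, here standing in for the per-CPU trace buffers. wait_readable() and the pipe-based demo are hypothetical; the log does not show rasdaemon's actual loop or which descriptors it polls.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait until any descriptor in fds is readable. Returns the index of the
 * first readable one, -1 on timeout, or -2 on error (including when only
 * POLLERR/POLLHUP/POLLNVAL were reported).
 */
static int wait_readable(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	nfds_t i;
	int ready;

	for (i = 0; i < nfds; i++)
		fds[i].events = POLLIN;

	ready = poll(fds, nfds, timeout_ms);
	if (ready < 0)
		return -2;
	if (ready == 0)
		return -1;

	for (i = 0; i < nfds; i++) {
		if (fds[i].revents & POLLIN)
			return (int)i;
	}
	return -2;	/* only error conditions were reported */
}

/* Usage sketch: two pipes stand in for two per-CPU buffers. */
static int demo_second_pipe_ready(void)
{
	int a[2], b[2], idx;
	struct pollfd fds[2];

	if (pipe(a) != 0)
		return -3;
	if (pipe(b) != 0) {
		close(a[0]);
		close(a[1]);
		return -3;
	}
	fds[0].fd = a[0];
	fds[1].fd = b[0];
	idx = (write(b[1], "x", 1) == 1) ? wait_readable(fds, 2, 1000) : -3;

	close(a[0]);
	close(a[1]);
	close(b[0]);
	close(b[1]);
	return idx;
}
```

Only the second pipe has data, so the demo reports index 1 without any thread blocking on the first pipe.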

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-mce-handler: change the test order to avoid leaked memory
Mauro Carvalho Chehab [Tue, 28 May 2013 11:13:17 +0000 (08:13 -0300)]
ras-mce-handler: change the test order to avoid leaked memory

As getdelim() allocates memory, it is better to swap the
tests; otherwise, the code allocates memory that is never
freed.
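The log does not show the exact code, so this is a hypothetical sketch of the pattern. It checks the cheap completion condition before calling getline() (a getdelim() wrapper) and frees the buffer that getline() allocates exactly once. read_family_model() and key_is() are illustrative names, not rasdaemon functions.

```c
#define _POSIX_C_SOURCE 200809L	/* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if line starts with key followed by optional blanks and ':'. */
static int key_is(const char *line, const char *key)
{
	size_t len = strlen(key);

	if (strncmp(line, key, len) != 0)
		return 0;
	line += len;
	while (*line == ' ' || *line == '\t')
		line++;
	return *line == ':';
}

/*
 * Read /proc/cpuinfo-style text until both "cpu family" and "model" are
 * known. The completion test comes first in the loop condition, so
 * getline() is not called again once parsing is done. Returns 0 when
 * both values were found, -1 otherwise.
 */
static int read_family_model(FILE *f, int *family, int *model)
{
	char *line = NULL;
	size_t size = 0;
	int found = 0;	/* bit 0: family seen, bit 1: model seen */

	while (found != 3 && getline(&line, &size, f) != -1) {
		const char *colon = strchr(line, ':');

		if (!colon)
			continue;
		if (key_is(line, "cpu family")) {
			*family = atoi(colon + 1);
			found |= 1;
		} else if (key_is(line, "model")) {	/* not "model name" */
			*model = atoi(colon + 1);
			found |= 2;
		}
	}
	free(line);	/* allocated by getline(); free(NULL) is also fine */
	return found == 3 ? 0 : -1;
}
```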

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years ago: ras-mce-handler: Fix /proc/cpuinfo parser
Mauro Carvalho Chehab [Tue, 28 May 2013 10:47:53 +0000 (07:47 -0300)]
ras-mce-handler: Fix /proc/cpuinfo parser

The test for parsing completion is wrong. Fix it.

While here, change the namespace to avoid later
conflicts.

Reported-by: Chen Gong <gong.chen@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoras-mce-handler: Fix a warning
Mauro Carvalho Chehab [Mon, 27 May 2013 21:19:08 +0000 (18:19 -0300)]
ras-mce-handler: Fix a warning

ras-mce-handler.c: In function ‘register_mce_handler’:
ras-mce-handler.c:200:13: warning: ‘mce’ may be used uninitialized in this function [-Wuninitialized]
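
The patch itself is not shown here. One common way to address this class of warning is to give the pointer a defined value on every path, as in the sketch below. The enum, the decoder structs and select_decoder() are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical CPU types and decoders, for illustration only. */
enum cpu_type { CPU_UNKNOWN, CPU_NEHALEM, CPU_SANDY_BRIDGE };

struct mce_decoder {
	const char *name;
};

static const struct mce_decoder nehalem_decoder = { "Nehalem" };
static const struct mce_decoder sandybridge_decoder = { "Sandy Bridge" };

/* Starting from NULL and handling the default case means the pointer is
 * assigned on every path, so the compiler has nothing to warn about. */
static const struct mce_decoder *select_decoder(enum cpu_type type)
{
	const struct mce_decoder *mce = NULL;

	switch (type) {
	case CPU_NEHALEM:
		mce = &nehalem_decoder;
		break;
	case CPU_SANDY_BRIDGE:
		mce = &sandybridge_decoder;
		break;
	default:
		break;	/* unsupported CPU: the caller checks for NULL */
	}
	return mce;
}
```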

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoEnable MCE parsing at RPM files
Mauro Carvalho Chehab [Mon, 27 May 2013 20:47:15 +0000 (17:47 -0300)]
Enable MCE parsing at RPM files

As this is known to work, enable it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoREADME: update to reflect the current status
Mauro Carvalho Chehab [Mon, 27 May 2013 20:46:56 +0000 (17:46 -0300)]
README: update to reflect the current status

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoUpdate TODO list
Mauro Carvalho Chehab [Mon, 27 May 2013 20:26:04 +0000 (17:26 -0300)]
Update TODO list

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agomce-intel-sb: add memory controller decoding
Mauro Carvalho Chehab [Mon, 27 May 2013 20:23:48 +0000 (17:23 -0300)]
mce-intel-sb: add memory controller decoding

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoAdd support to decode memory controller data on Nehalem
Mauro Carvalho Chehab [Mon, 27 May 2013 20:19:11 +0000 (17:19 -0300)]
Add support to decode memory controller data on Nehalem

The xeon75xx code can be dropped, as it is not functional in
mcelog anyway: according to the code there, the Kernel lacks
the support needed for it to work.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agomce-intel: Enable iMC log where available
Mauro Carvalho Chehab [Mon, 27 May 2013 19:46:12 +0000 (16:46 -0300)]
mce-intel: Enable iMC log where available

Add code to enable iMC logging where available.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agomce-intel-ivb: enable the code that parses memory controller errors
Mauro Carvalho Chehab [Mon, 27 May 2013 18:50:51 +0000 (15:50 -0300)]
mce-intel-ivb: enable the code that parses memory controller errors

Enable the code that parses the memory controller errors.
This code assumes that iMC logging is already enabled.

A later patch will add support for enabling it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agospelling: Fix spelling in ras-record.c
Tony Luck [Fri, 24 May 2013 16:55:40 +0000 (09:55 -0700)]
spelling: Fix spelling in ras-record.c

s/interted/inserted/

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoconfigure: Fix help string for sqlite3
Tony Luck [Fri, 24 May 2013 16:29:06 +0000 (09:29 -0700)]
configure: Fix help string for sqlite3

The AS_HELP_STRING has a typo and says to use "--enable-sqlite" when
it should say "--enable-sqlite3".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agomce: Some improvements at the output format
Mauro Carvalho Chehab [Fri, 24 May 2013 14:21:32 +0000 (11:21 -0300)]
mce: Some improvements at the output format

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoras-mce-handler: fix /proc/cpuinfo parser
Mauro Carvalho Chehab [Fri, 24 May 2013 11:21:51 +0000 (08:21 -0300)]
ras-mce-handler: fix /proc/cpuinfo parser

The scanf parsers for /proc/cpuinfo were broken, as they
got a "mce->" prefix by mistake. Remove it to fix them.

With that, the MCE parser registers successfully.
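
A minimal sketch of such a parser, assuming the handler only needs the vendor, CPU family and model. The struct cpu_id type, cpu_id_init() and parse_cpuinfo_line() are hypothetical. The format strings hold only the literal field names, and a space in a sscanf() format matches any run of blanks, so the tab-aligned /proc/cpuinfo lines parse while "model name" is not mistaken for "model".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the CPU identification the handler needs. */
struct cpu_id {
	char vendor[64];
	unsigned int family;
	unsigned int model;
};

static void cpu_id_init(struct cpu_id *id)
{
	memset(id, 0, sizeof(*id));
}

/* Parse one /proc/cpuinfo line into id. Returns 1 if the line carried one
 * of the wanted fields, 0 otherwise. Each format must start with the field
 * name exactly as /proc/cpuinfo spells it; anything in front of it keeps
 * sscanf() from ever matching. */
static int parse_cpuinfo_line(const char *line, struct cpu_id *id)
{
	if (sscanf(line, "vendor_id : %63s", id->vendor) == 1)
		return 1;
	if (sscanf(line, "cpu family : %u", &id->family) == 1)
		return 1;
	if (sscanf(line, "model : %u", &id->model) == 1)
		return 1;
	return 0;
}
```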

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoevent-parse: Remove a temporary debug message
Mauro Carvalho Chehab [Fri, 24 May 2013 11:18:48 +0000 (08:18 -0300)]
event-parse: Remove a temporary debug message

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoDon't require that all tracing types to be supported
Mauro Carvalho Chehab [Fri, 24 May 2013 11:16:57 +0000 (08:16 -0300)]
Don't require that all tracing types to be supported

Not all systems support all three types of RAS events (EDAC,
PCIe AER, MCELOG). Don't bail out as long as at least one of
them is supported.
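
A minimal sketch of the "at least one" rule. The struct ras_source type and ras_check_sources() are hypothetical stand-ins for however the daemon records which tracing events it managed to enable.

```c
#include <stdio.h>

/* Hypothetical outcome of trying to enable one kind of RAS event. */
struct ras_source {
	const char *name;	/* e.g. "EDAC", "PCIe AER", "MCE" */
	int enabled;		/* 1 if the kernel exposed it and it was enabled */
};

/* Warn about each missing source; fail only if none is available.
 * Returns the number of enabled sources, or -1 if there are none. */
static int ras_check_sources(const struct ras_source *src, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++) {
		if (src[i].enabled)
			count++;
		else
			fprintf(stderr, "%s events not available, skipping\n",
				src[i].name);
	}
	if (!count) {
		fprintf(stderr, "no RAS events available\n");
		return -1;
	}
	return count;
}
```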

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoUpdate edac-tests to use ras-mc-ctl instead of ./edac-ctl
Mauro Carvalho Chehab [Fri, 24 May 2013 10:37:06 +0000 (07:37 -0300)]
Update edac-tests to use ras-mc-ctl instead of ./edac-ctl

All functionality previously found in my test version of
edac-ctl is present in ras-mc-ctl. So, let's switch to it.

The test code still tries to run edac-util. This tool,
which is part of edac-utils, uses the EDAC error counters to
check for errors. Let's keep it for now, as it might be useful,
although it will likely be removed in future versions of this
test script.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoras-events: Fix the logic that retrieves the debugfs mount point
Mauro Carvalho Chehab [Fri, 24 May 2013 09:18:54 +0000 (06:18 -0300)]
ras-events: Fix the logic that retrieves the debugfs mount point

While on Fedora/RHEL the mount device for debugfs is called "debugfs",
it is common to use "none" on other distros or for manually
mounted debugfs.

So, fix the logic to look at the filesystem type instead, as it
is always "debugfs" in both cases.
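
A minimal sketch using the getmntent(3) family from glibc, matching on mnt_type rather than mnt_fsname. The find_debugfs() name is hypothetical, and the mounts file is a parameter so the function can be pointed at /proc/mounts or at a test file.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Find the mount point of the first filesystem whose type is "debugfs".
 * The device field may read "debugfs", "none" or anything else, so it is
 * ignored. Returns 0 and fills dir on success, -1 otherwise. */
static int find_debugfs(const char *mounts_file, char *dir, size_t len)
{
	struct mntent *ent;
	FILE *f = setmntent(mounts_file, "r");
	int ret = -1;

	if (!f)
		return -1;
	while ((ent = getmntent(f)) != NULL) {
		if (strcmp(ent->mnt_type, "debugfs") == 0) {
			snprintf(dir, len, "%s", ent->mnt_dir);
			ret = 0;
			break;
		}
	}
	endmntent(f);
	return ret;
}
```

Usage: find_debugfs("/proc/mounts", dir, sizeof(dir)) works the same whether debugfs was mounted with device "debugfs" or "none".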

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoras-record: Avoid NULL pointer when running without sqlite
Tony Luck [Thu, 23 May 2013 20:27:31 +0000 (13:27 -0700)]
ras-record: Avoid NULL pointer when running without sqlite

When running "rasdaemon -f" we can dereference a NULL pointer in
ras_store_mc_event() since "ras->db_priv" is NULL.
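
A minimal sketch of the guard. The struct layouts below are trimmed, hypothetical stand-ins, and store_mc_event() is not the actual ras_store_mc_event(); only the idea of returning early when db_priv is NULL comes from the text above.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the daemon's structures. */
struct sqlite3_priv;			/* opaque database state */
struct ras_events {
	struct sqlite3_priv *db_priv;	/* NULL when no database was opened */
};
struct ras_mc_event {
	const char *error_type;
};

/* Returns 1 if the event would be written, 0 if there is no database. */
static int store_mc_event(struct ras_events *ras,
			  const struct ras_mc_event *ev)
{
	if (!ras || !ras->db_priv)
		return 0;	/* nothing to store into: not an error */

	/* ...bind ev's fields to the prepared INSERT and step it here... */
	(void)ev;
	return 1;
}
```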

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoras-events: Fix MCE binding
Mauro Carvalho Chehab [Thu, 23 May 2013 19:42:08 +0000 (16:42 -0300)]
ras-events: Fix MCE binding

The #ifdef for detecting MCE was wrong. Due to that, the MCE
handler was not being enabled.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoMake the enable function more generic
Mauro Carvalho Chehab [Thu, 23 May 2013 19:37:54 +0000 (16:37 -0300)]
Make the enable function more generic

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoGet rid of ras-record warnings
Mauro Carvalho Chehab [Thu, 23 May 2013 17:58:21 +0000 (14:58 -0300)]
Get rid of ras-record warnings

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoget rid of MCE warnings
Mauro Carvalho Chehab [Thu, 23 May 2013 17:44:36 +0000 (14:44 -0300)]
get rid of MCE warnings

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoCleanup warnings at ras-aer-handler.c
Mauro Carvalho Chehab [Thu, 23 May 2013 17:26:07 +0000 (14:26 -0300)]
Cleanup warnings at ras-aer-handler.c

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoFix event handler parser logic
Mauro Carvalho Chehab [Thu, 23 May 2013 16:35:07 +0000 (13:35 -0300)]
Fix event handler parser logic

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agoras-events: Add some hacks to make it work with 3.6.10-rc2
Mauro Carvalho Chehab [Thu, 23 May 2013 14:48:02 +0000 (11:48 -0300)]
ras-events: Add some hacks to make it work with 3.6.10-rc2

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
12 years agolibtrace: sync with the latest code from trace-cmd
Mauro Carvalho Chehab [Thu, 23 May 2013 14:07:29 +0000 (11:07 -0300)]
libtrace: sync with the latest code from trace-cmd

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>