Shiju Jose [Sun, 5 Mar 2023 23:14:42 +0000 (23:14 +0000)]
rasdaemon: fix table create if some cpus are offline
Fix a regression in ras_mc_create_table() that occurs when some CPUs are
offline at system start.
Issue:
A regression occurs in ras_mc_create_table() when rasdaemon is run on a
system that started with some of the CPUs offline.
The issue is reproducible in ras_mc_create_table() when decoding and
recording non-standard events, and is sometimes reproducible in
ras_mc_create_table() for the standard events.
In addition, the multi-threaded path leaks memory in ras_mc_event_opendb():
struct sqlite3_priv *priv and sqlite3 *db are allocated/initialized per
thread, but stored in the common struct ras_events ras in the pthread data,
which is shared across the threads.
Reason:
When the system starts with some of the CPUs offline and rasdaemon is then
run, read_ras_event_all_cpus() exits with an error and rasdaemon switches to
the multi-threaded path. However, read() in read_ras_event() returns an error
in the threads for each of the offline CPUs, and those threads then clean up,
including calling ras_mc_event_closedb().
Since the 'struct ras_events ras' passed in the pthread_data to each of the
threads is common, the struct sqlite3_priv *priv and sqlite3 *db that are
allocated/initialized per thread and stored in the common
'struct ras_events ras' get overwritten by each ras_mc_event_opendb() call
(which is made from the per-CPU pthread), resulting in a memory leak.
Also, when ras_mc_event_closedb() is called in the above error case from
the threads corresponding to the offline CPUs, it closes the sqlite3 *db and
frees the sqlite3_priv *priv stored in the common 'struct ras_events ras',
resulting in a regression when priv->db is later accessed in
ras_mc_create_table() from another context.
Solution:
In ras_mc_event_opendb(), allocate struct sqlite3_priv *priv, initialize
sqlite3 *db and create the tables once, common to all threads sharing the
'struct ras_events ras', based on a reference count, and free them in the
same way.
Also protect the critical code in ras_mc_event_opendb() and
ras_mc_event_closedb() with a mutex in the multi-threaded case, to guard
against any regression caused by thread pre-emption.
Reported-by: Lei Feng <fenglei47@h-partners.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
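The reference-counting scheme described in the Solution above can be sketched roughly as follows. This is a minimal illustration, not the rasdaemon code: struct shared_state, shared_open() and shared_close() are hypothetical names, and a plain calloc() buffer stands in for the sqlite3_priv allocation and the database handle.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state shared by all per-CPU threads. */
struct shared_state {
	pthread_mutex_t lock;   /* serializes open/close across threads */
	int refcount;           /* number of threads currently holding 'priv' */
	void *priv;             /* stand-in for the shared private data */
};

/* Allocate the shared resource on first use; later callers only take a reference. */
int shared_open(struct shared_state *s)
{
	int rc = 0;

	pthread_mutex_lock(&s->lock);
	if (s->refcount == 0) {
		s->priv = calloc(1, 64);
		if (!s->priv)
			rc = -1;
	}
	if (rc == 0)
		s->refcount++;
	pthread_mutex_unlock(&s->lock);
	return rc;
}

/* Drop one reference; free the resource only when the last user is gone. */
int shared_close(struct shared_state *s)
{
	int rc = 0;

	pthread_mutex_lock(&s->lock);
	if (s->refcount <= 0) {
		rc = -1;                /* unbalanced close, nothing to release */
	} else if (--s->refcount == 0) {
		free(s->priv);
		s->priv = NULL;
	}
	pthread_mutex_unlock(&s->lock);
	return rc;
}
```

With this shape, a thread that fails early (for example, one bound to an offline CPU) can call shared_close() without invalidating the resource for the threads that are still running.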
Sam James [Sun, 19 Feb 2023 18:33:20 +0000 (18:33 +0000)]
configure.ac: fix bashisms
configure scripts need to be runnable with a POSIX-compliant /bin/sh.
On many (but not all!) systems, /bin/sh is provided by Bash, so errors
like this aren't spotted. Notably Debian defaults to /bin/sh provided
by dash which doesn't tolerate such bashisms as '=='.
Shiju Jose [Sat, 4 Feb 2023 19:15:55 +0000 (19:15 +0000)]
rasdaemon: Fix poll() on per_cpu trace_pipe_raw blocks indefinitely
Error events have not been received by rasdaemon since kernel 6.1-rc6.
The issue was first detected and reported while testing the CXL error
events in rasdaemon.
Debugging showed that poll() on trace_pipe_raw in ras-events.c does not
return, and that the issue appears after commit
42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
polling block on watermark").
The issue was also verified using a test application for poll()
and select() on per_cpu trace_pipe_raw.
A bug has also been reported for this issue:
https://lore.kernel.org/all/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/
The issue occurs in the per_cpu case, which calls ring_buffer_poll_wait()
in kernel/trace/ring_buffer.c with buffer_percent > 0 and then waits until
that percentage of pages is available. The default value of buffer_percent
is 50, set in kernel/trace/trace.c. However, poll() does not return even
when the percentage-of-pages condition is met.
As a fix, rasdaemon sets buffer_percent to 0 through
/sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent, so that the
task wakes up as soon as data is added to any of the per-CPU buffers and
poll() on per_cpu/cpuX/trace_pipe_raw does not block indefinitely.
This depends on the kernel fix commit
3e46d910d8acf94e5360126593b68bf4fee4c4a1 ("tracing: Fix poll() and select()
do not work on per_cpu trace_pipe and trace_pipe_raw").
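Setting buffer_percent to 0 amounts to writing the string "0" to the tracefs file named above. A minimal sketch, assuming the caller passes the full path; set_buffer_percent_zero() is a hypothetical helper name, not a function from rasdaemon:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write "0" to a buffer_percent file; returns 0 on success, -errno on failure. */
int set_buffer_percent_zero(const char *path)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	ssize_t n = write(fd, "0", 1);
	int err = 0;
	if (n < 0)
		err = -errno;
	else if (n != 1)
		err = -EIO;     /* short write treated as failure in this sketch */

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}
```

A caller would pass /sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent and could log the negative errno value if the kernel does not expose that file.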
For a long time, rasdaemon used an early version of this library,
with the code embedded directly into its own source. The rationale was
that the library had not been officially released at that time, but
this has long since changed.
Sam James [Thu, 29 Dec 2022 17:23:47 +0000 (17:23 +0000)]
configure.ac: fix bashisms
configure scripts need to be runnable with a POSIX-compliant /bin/sh.
On many (but not all!) systems, /bin/sh is provided by Bash, so errors
like this aren't spotted. Notably Debian defaults to /bin/sh provided
by dash which doesn't tolerate such bashisms as '=='.
This retains compatibility with bash.
Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Shiju Jose [Thu, 24 Feb 2022 18:02:14 +0000 (18:02 +0000)]
rasdaemon: ras-mc-ctl: Modify error statistics for HiSilicon KunPeng9xx common errors
Modify the error statistics for the HiSilicon KunPeng9xx platforms common errors
to display the statistics and error info based on the module and the error severity.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Shengwei Luo [Wed, 23 Feb 2022 09:21:58 +0000 (17:21 +0800)]
rasdaemon: Support cpu fault isolation for corrected errors
When the number of corrected errors exceeds the configured limit within a
cycle, try to offline the related CPU core.
Signed-off-by: Shengwei Luo <luoshengwei@huawei.com> Signed-off-by: Junchong Pan <panjunchong@hisilicon.com> Signed-off-by: Lei Feng <fenglei47@h-partners.com> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Yang Shi [Mon, 4 Apr 2022 23:34:05 +0000 (16:34 -0700)]
rasdaemon: use the new block_rq_error tracepoint
Since Linux 5.18-rc1, a new block tracepoint called block_rq_error is
available, dedicated to tracing disk error events. Currently
rasdaemon uses block_rq_complete, which also traces successful cases.
This produces excessive trace logs and some overhead, since the event is
triggered quite often.
Use the new tracepoint for disk error reporting. The new tracepoint
has the same format as block_rq_complete.
Signed-off-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Xiaofei Tan [Wed, 20 Oct 2021 06:33:40 +0000 (14:33 +0800)]
rasdaemon: Add some modules supported by hisi common error section
Add some modules supported by the hisi common error section. Besides,
HHA is a module for some old platforms, and it takes the same place
as MATA, so remove it.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Xiaofei Tan [Wed, 20 Oct 2021 06:33:37 +0000 (14:33 +0800)]
rasdaemon: Fix the issue of sprintf data type mismatch in uuid_le()
The data type passed to sprintf in the function uuid_le() is mismatched.
The Arm64 compiler treats char as unsigned char by default, so it works
normally. But if someone compiles it with the option -fsigned-char, the
function does not work correctly.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
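The signedness problem can be illustrated with a small hex formatter. This is a sketch under assumptions, not the uuid_le() code: bytes_to_hex() is a hypothetical helper. Where plain char is signed, a byte such as 0x80 is promoted to a negative int, and an unqualified %x conversion then prints more than two digits; casting to unsigned char before formatting avoids that regardless of -fsigned-char.

```c
#include <stddef.h>
#include <stdio.h>

/* Format 'len' bytes as lowercase hex into 'out' (needs 2 * len + 1 bytes). */
void bytes_to_hex(const char *bytes, size_t len, char *out)
{
	for (size_t i = 0; i < len; i++) {
		/* Cast first, so the value is 0..255 whether plain char is signed or not. */
		sprintf(out + 2 * i, "%02hhx", (unsigned char)bytes[i]);
	}
	out[2 * len] = '\0';
}
```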
Muralidhara M K [Tue, 27 Jul 2021 11:36:45 +0000 (06:36 -0500)]
rasdaemon: ras-mc-ctl: Fix script to parse dimm sizes
Remove trailing spaces at the end of the line in the file location and
fix the --layout option to parse the dimm nodes, so that ras-mc-ctl gets
the size of each dimm.
The issue is reported at https://github.com/mchehab/rasdaemon/issues/43
where '> ras-mc-ctl --layout' reports all 0s.
Fix the following compile errors that occur when building against musl:
    ras-events.c: In function 'read_ras_event_all_cpus':
    ras-events.c:366:16: error: 'PATH_MAX' undeclared (first use in this function)
      366 |  char pipe_raw[PATH_MAX];
          |                ^~~~~~~~
    ras-events.c: In function 'handle_ras_events_cpu':
    ras-events.c:564:16: error: 'PATH_MAX' undeclared (first use in this function)
      564 |  char pipe_raw[PATH_MAX];
          |
rasdaemon: Add new SMCA bank types with error decoding
Upcoming systems with Scalable Machine Check Architecture (SMCA) have
new MCA banks added.
This patch adds the (HWID, MCATYPE) tuple, name and error decoding for
those new SMCA banks.
While at it, optimize the string names in smca_bank_name[].
Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Matt Whitlock [Wed, 9 Jun 2021 14:25:18 +0000 (10:25 -0400)]
configure.ac: fix SYSCONFDEFDIR default value
configure.ac was using AC_ARG_WITH incorrectly, yielding a generated
configure script like:
    # Check whether --with-sysconfdefdir was given.
    if test "${with_sysconfdefdir+set}" = set; then :
      withval=$with_sysconfdefdir; SYSCONFDEFDIR=$withval
    else
      "/etc/sysconfig"
    fi
This commit fixes the default case so that the SYSCONFDEFDIR variable is
assigned the value "/etc/sysconfig" rather than trying to execute
"/etc/sysconfig" as a command.
rasdaemon: Add Ice Lake and Sapphire Rapids MSCOD values
Based on mcelog commits:
ee90ff20ce6a ("mcelog: Add support for Icelake server, Icelake-D, and Snow Ridge")
391abaac9bdf ("mcelog: Add decode for MCi_MISC from 10nm memory controller")
59cb7ad4bc72 ("mcelog: i10nm: Fix mapping from bank number to functional unit")
c0acd0e6a639 ("mcelog: Add support for Sapphirerapids server.")
Shiju Jose [Tue, 9 Mar 2021 16:18:56 +0000 (16:18 +0000)]
rasdaemon: fix build error in register_ns_ev_decoder if the sqlite3 is not enabled
The assignment ns_ev_decoder->stmt_dec_record = NULL; in
register_ns_ev_decoder() should be under #ifdef HAVE_SQLITE3 to fix the
compilation error when building without the configure option
--enable-sqlite3.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
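The guard pattern looks roughly like the sketch below. It assumes, which the log does not state, that the member itself is only declared when HAVE_SQLITE3 is defined; struct ns_decoder and init_ns_decoder() are hypothetical stand-ins for the real structure and registration function.

```c
#include <stddef.h>

/* Hypothetical decoder structure with an optional, SQLite-only member. */
struct ns_decoder {
	const char *name;
#ifdef HAVE_SQLITE3
	void *stmt_dec_record;   /* assumed to exist only with SQLite support */
#endif
};

void init_ns_decoder(struct ns_decoder *d, const char *name)
{
	d->name = name;
#ifdef HAVE_SQLITE3
	/* Touch the member only in builds where it is declared. */
	d->stmt_dec_record = NULL;
#endif
}
```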
Shiju Jose [Mon, 8 Mar 2021 16:57:26 +0000 (16:57 +0000)]
rasdaemon: add support for memory_failure events
Add support to log the memory_failure kernel trace
events.
Example rasdaemon log and SQLite DB output for the
memory_failure event,
=================================================
rasdaemon: memory_failure_event store: 0x126ce8f8
rasdaemon: register inserted at db
<...>-785 [000] 0.000024: memory_failure_event: 2020-10-02 13:27:13 -0400 pfn=0x204000000 page_type=free buddy page action_result=Delayed
B. Wilson [Mon, 12 Apr 2021 15:29:58 +0000 (00:29 +0900)]
ras-record: Create RASSTATEDIR at runtime instead of install time
Package managers such as Nix and Guix force installation into an
isolated directory hierarchy. Furthermore, said hierarchy becomes
readonly after the install has completed, rendering any
<hierarchy>/var/lib/rasdaemon/ directory effectively useless.
In addition to being standard practice, creating RASSTATEDIR when
necessary at runtime fixes the above use cases.
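Creating a state directory on demand can be sketched as below. This is an illustration only: ensure_dir() is a hypothetical helper, it creates a single directory level, and the 0755 mode is an assumption rather than what rasdaemon uses.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create 'path' if it is missing; returns 0 if it exists as a directory afterwards. */
int ensure_dir(const char *path)
{
	struct stat st;

	if (mkdir(path, 0755) == 0)
		return 0;
	if (errno != EEXIST)
		return -errno;

	/* Something already exists at 'path'; accept it only if it is a directory. */
	if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
		return 0;
	return -ENOTDIR;
}
```

Calling such a helper just before the database is opened means that a read-only install prefix no longer matters, as long as the state directory itself lives on a writable path.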
Jason Tian [Thu, 4 Feb 2021 01:57:05 +0000 (09:57 +0800)]
Add code to decode Ampere specific error
All Ampere-specific errors (payload type 0/1/2/3) include 48 bytes of
OEM data, from which the error type, subtype, instance,
socket number and so on are decoded.
Signed-off-by: Jason Tian <jason@os.amperecomputing.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Josh Hunt [Fri, 8 Jan 2021 00:12:52 +0000 (19:12 -0500)]
rasdaemon: fix memory leak in parse_ras_data
parse_ras_data() is calling trace_seq_init() which allocates a buffer,
but never calls the corresponding trace_seq_destroy() to free it causing
us to leak memory.
Subhendu Saha [Tue, 12 Jan 2021 08:29:55 +0000 (03:29 -0500)]
Fix ras-mc-ctl script.
When rasdaemon is compiled without enabling aer, mce, devlink,
etc., those tables are not created in the database file. The
ras-mc-ctl script then breaks when it tries to query data from the
non-existent tables.
Signed-off-by: Subhendu Saha subhends@akamai.com Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
lvying6 [Sat, 31 Oct 2020 09:57:15 +0000 (17:57 +0800)]
ras-page-isolation: page which is PAGE_OFFLINE_FAILED can be offlined again
The OS may have failed to offline a page on a previous attempt. After some
time, the page's state changes and the page can be offlined by the OS.
If the correctable errors on this page then reach the threshold,
rasdaemon should trigger offlining of this page again.
lvying [Sat, 31 Oct 2020 09:57:14 +0000 (17:57 +0800)]
ras-page-isolation: do_page_offline always considers page offline was successful
do_page_offline always considers the page offline to have succeeded, even if
the kernel failed to soft/hard offline the page.
Calling rasdaemon with:
/etc/sysconfig/rasdaemon PAGE_CE_THRESHOLD="1"
i.e. when a Corrected Error occurs at a page's address, rasdaemon should
trigger a soft offline of this page.
However, after adding a livepatch to the kernel's
store_soft_offline_page to observe this function's return value,
and injecting a CE at address 0x3f7ec30000, the kernel
log reports:
soft_offline: 0x3f7ec30: unknown non LRU page type ffffe0000000000 ()
[store_soft_offline_page]return from soft_offline_page: -5
While rasdaemon log reports:
rasdaemon[73711]: cpu 00:rasdaemon: Corrected Errors at 0x3f7ec30000 exceed threshold
rasdaemon[73711]: rasdaemon: Result of offlining page at 0x3f7ec30000: offlined
Using strace to record rasdaemon's system calls, it reports:
So the kernel actually failed to soft offline pfn 0x3f7ec30, and
store_soft_offline_page returned -EIO. However, rasdaemon always
considers the page offline to be successful.
According to the strace output, ferror was unable to detect the
failure of the write syscall.
This patch changes the fopen-fprintf-ferror-fclose sequence to
lower-level I/O, using open-write-close instead, which
can detect such a syscall failure.
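The open-write-close approach can be sketched as follows. It is a minimal illustration, not the patch itself: write_sysfs_string() is a hypothetical helper, and the caller supplies the sysfs path and the already formatted address string. Because write() is unbuffered, a kernel-side failure such as -EIO is visible in its return value at the point of the call, whereas a buffered stdio stream may not issue the syscall until it is flushed or closed.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write 'buf' to 'path' with unbuffered I/O; returns 0 on success, -errno on failure. */
int write_sysfs_string(const char *path, const char *buf)
{
	size_t len = strlen(buf);
	int fd = open(path, O_WRONLY);
	int err = 0;

	if (fd < 0)
		return -errno;

	ssize_t n = write(fd, buf, len);
	if (n < 0)
		err = -errno;         /* the error returned by the kernel handler */
	else if ((size_t)n != len)
		err = -EIO;           /* short write treated as failure in this sketch */

	close(fd);
	return err;
}
```

The caller can then report the offline attempt as failed whenever the return value is non-zero, instead of assuming success.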
Shiju Jose [Mon, 10 Aug 2020 14:42:56 +0000 (15:42 +0100)]
rasdaemon: Modify non-standard error decoding interface using linked list
Replace the current non-standard error decoding interface with the
interface based on the linked list to avoid using realloc and
to improve the interface.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
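A linked-list registration interface of the kind described above might look like the following sketch. The names (struct decoder, register_decoder(), find_decoder()) and fields are hypothetical and chosen for illustration; the point is that each decoder brings its own node, so registering one never requires realloc of a shared array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder descriptor; field names are illustrative only. */
struct decoder {
	const char *sec_type;                 /* identifier this decoder handles */
	int (*decode)(const void *payload);   /* callback for matching events */
	struct decoder *next;                 /* intrusive list link */
};

static struct decoder *decoder_list;      /* head of the registered decoders */

/* Prepend a caller-owned node to the list; no allocation is needed. */
void register_decoder(struct decoder *d)
{
	d->next = decoder_list;
	decoder_list = d;
}

/* Walk the list and return the decoder matching 'sec_type', or NULL. */
struct decoder *find_decoder(const char *sec_type)
{
	for (struct decoder *d = decoder_list; d; d = d->next)
		if (strcmp(d->sec_type, sec_type) == 0)
			return d;
	return NULL;
}
```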