www.infradead.org Git - users/jedix/linux-maple.git/log

ctf: handle srcdir-relative paths properly.

The dwarf2ctf tool maintains various blacklists to get around things some
modules do which are incompatible with large-scale deduplication of types or
with having one type table per module. These blacklists are in the source tree,
and unfortunately we were looking for them using a relative path, which equates
to the objdir when O= or KBUILD_OUTPUT are in use: so we got assertion failures
from dwarf2ctf and a kernel build failure when it detected that something was
wrong with its duplicate-type detection.

Worse yet, one of these blacklists, member.blacklist, contains the names of
source files.  These paths relate to entities in the source tree, but they are
compared to source-file paths in object files generated by the compiler (which
are in theory objdir-relative, but which are in any case always absolute iff
KBUILD_OUTPUT is in use).  If these comparisons are to be successful, we must
absolutize the relative paths in the member.blacklist against the source tree,
not against the object tree.

We don't need to extend the realpath()-result-caching infrastructure to handle
this case, because we're only absolutizing *one path* this way, and at most
it'll be a few dozen.  The caching infrastructure exists to handle cases where
millions or billions of realpath()s are done.  We just need to chdir() to the
source tree, realpath(), and fchdir() back.  (We use the O_PATH open() flag to
do this if possible, but older glibcs such as that on OL6 do not provide O_PATH,
and it has an architecture-dependent value, so we cannot provide it ourselves: in that
case we fall back to O_RDONLY | O_DIRECTORY, which should always work since the
objdir really should always be readable by the current user.)
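
A minimal sketch of the chdir() / realpath() / fchdir() sequence described
above.  The helper name abs_against_dir() and its signature are illustrative
only and are not taken from dwarf2ctf; the O_PATH fallback mirrors the one
mentioned in the text.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve `rel` against `srcdir` rather than against
 * the current directory, then return to where we started.
 * `out` must have room for PATH_MAX bytes.
 * Returns 0 on success, -1 on failure.
 */
int abs_against_dir(const char *srcdir, const char *rel, char *out)
{
#ifdef O_PATH
	int flags = O_PATH | O_DIRECTORY;
#else
	/* Older C libraries: fall back to a readable directory fd. */
	int flags = O_RDONLY | O_DIRECTORY;
#endif
	int ret = -1;
	int cwd_fd = open(".", flags);

	if (cwd_fd < 0)
		return -1;

	if (chdir(srcdir) == 0) {
		if (realpath(rel, out) != NULL)
			ret = 0;
		/* Always try to get back, even if realpath() failed. */
		if (fchdir(cwd_fd) != 0)
			ret = -1;
	}

	close(cwd_fd);
	return ret;
}
```

Since only a few dozen paths are ever resolved this way, the cost of two
directory changes per call should not matter, which is why no caching is
attempted in this sketch.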

Orabug: 19712731

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Guangyu Sun <guangyu.sun@oracle.com>
Acked-by: Guru Anbalagane <Guru.Anbalagane@oracle.com>

kbuild/ctf: Fix out-of-tree module build when CONFIG_CTF=n.

When doing an out-of-tree build against a kernel with CONFIG_CTF=n (such as
debug kernels), linking fails because the list of object files in the
command-line given to the linker is empty.

The object files are substituted by this line in Makefile.modpost:

$(patsubst $(ctf-dir)/%,,$(filter-out FORCE,$^))

which takes the list of prerequisites for the module (which includes all its CTF
type information as well as its object files and a FORCE dummy target) and
filters out the FORCE target and all the CTF prerequisites (they are not linked
at this stage, but objcopied in immediately afterward). This works because all
the CTF files are generated under the $(ctf-dir) directory, and nothing else is
kept under there.

Unfortunately $(ctf-dir) is left empty if CONFIG_CTF=n. This is unproblematic
in normal kernel builds because all paths are relative, but if an out-of-tree
module build is being performed this strips everything out of the link line,
since out-of-tree builds usually specify all object files using absolute
pathnames, which naturally begin with a slash, and when $(ctf-dir) is empty so
does the patsubst pattern.

The solution is to provide a non-empty expansion of ctf-dir when CONFIG_CTF=n:
any value that can't appear at the start of object file names will do.
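
The failure mode can be modelled outside of make.  The following is a
simplified sketch in C, not the build system's code: it treats the pattern
"$(ctf-dir)/%" as "the directory, a slash, then anything", and the function
name and the ".ctf" directory value are assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Simplified model of whether $(patsubst $(ctf-dir)/%,,word) would
 * replace `word` with the empty string: the word must start with the
 * directory name followed by a slash.
 */
bool ctf_pattern_matches(const char *ctf_dir, const char *word)
{
	size_t n = strlen(ctf_dir);

	return strncmp(word, ctf_dir, n) == 0 && word[n] == '/';
}
```

With an empty ctf_dir the pattern degenerates to "/%", so every absolute
object path matches and is dropped, while relative paths survive.  Any
non-empty value that cannot begin an object file name avoids this.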

Orabug: 19078361

Reported-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Jamie Iles <jamie.iles@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

dtrace: support order-only-prerequisites for sdtstub generation

This commit ensures that order-only-prerequisites are supported in the
module build process when it comes to the generation of the sdtstub file
for SDT probes in modules. This is necessary even when there are no
probes in the actual module source code.

Orabug: 18906444

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

dtrace: ensure that building outside src tree works

The Makefile.build and Makefile.modpost uses of dtrace_sdt.sh were
not safe for out-of-srctree building because they expected to be
able to call the scripts with a relative path. This has been
corrected. The problem was introduced with the SDT-in-modules
support.

Orabug: 18691341

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Acked-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>

dtrace: ensure one can try to get user pages without locking or faulting

This commit changes the FOLL_NOFAULT flag into a FOLL_IMMED flag, to
more accurately convey its meaning, i.e. to request user pages without
waiting for any locks and without servicing any page faults as a result
of the request. This is necessary in order to request user pages from
interrupt context.

This also completes the implementation by ensuring that the PTE spinlock
is checked rather than trying to lock it (and possibly get stuck in a
deadlock spinning for it).

Orabug: 18653173

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>

mm / dtrace: Allow DTrace to entirely disable page faults.

In some very limited circumstances (largely restricted to get_user() calls, but
not entirely) DTrace needs to be able to probe userspace in irq and trap
context, in which page faults are prohibited or otherwise impossible because we
cannot guarantee that certain locks (like the mmap_sem) are not already taken by
the time DTrace is invoked.

In a similar fashion to the existing CPU_DTRACE_NOFAULT machinery, which allows
DTrace to prohibit invalid address faults, we introduce a CPU_DTRACE_NOPF value
in the per-CPU DTrace flag variable, which, when set by the DTrace module,
causes all page faults to set CPU_DTRACE_PF_TRAPPED in the flag variable: the
page fault is then ignored (and the current instruction skipped so that it is
not immediately retriggered).
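
As a rough userspace model of the flag handshake described above (not the
kernel code): the flag bit values, the plain global standing in for the
per-CPU variable, and the function name are all placeholders chosen for this
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, not the kernel's actual assignments. */
#define CPU_DTRACE_NOPF        0x1u
#define CPU_DTRACE_PF_TRAPPED  0x2u

/* Stand-in for the per-CPU DTrace flag variable. */
static uint32_t cpu_dtrace_flags;

/*
 * Called at the top of the (modelled) page fault path.  Returns true if
 * the fault must be ignored, in which case the caller would skip the
 * faulting instruction instead of servicing the fault.
 */
bool dtrace_no_pf(void)
{
	if (!(cpu_dtrace_flags & CPU_DTRACE_NOPF))
		return false;		/* normal fault handling */

	cpu_dtrace_flags |= CPU_DTRACE_PF_TRAPPED;
	return true;
}
```

The DTrace module would set CPU_DTRACE_NOPF before touching userspace, and
afterwards test CPU_DTRACE_PF_TRAPPED to learn whether the access faulted.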

Clearly, ignoring page faults in random pieces of kernel code would be very
risky: this machinery is only used in situations such as get_user() in which the
kernel code does nothing with the page other than to extract data from it and
pass it back to the DTrace module.

The impact on the core x86 fault path when CONFIG_DTRACE is set is one
unlikely() conditional and one function call: when CONFIG_DTRACE is not on the
impact is zero. (Eliminating the function call and directly testing the DTrace
per-CPU flag is possible as a future optimization, but was considered too
invasive for now.)

Orabug: 18412802
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
arch/x86/mm/fault.c

mm: allow __get_user_pages() callers to avoid triggering page faults.

Some callers of __get_user_pages() would like to tentatively probe for
pages which may or may not be faulted in, in contexts in which faulting
is prohibited. __get_user_pages() can return early before the fault is
complete, but cannot avoid the fault entirely.

This change introduces FOLL_NOFAULT, which causes an early return if a
fault would occur, without triggering a fault, and indicating (if the
caller requests it) that the call would block.

Orabug: 18412802
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

dtrace: implement omni-present cyclics

This commit adds support for omni-present cyclics.  An omni-present
cyclic is one that fires on every CPU on the system, each at the same
frequency/interval.  They are implemented based on regular cyclics,
pinned to specific CPUs.  The implementation is such that hotplugging
CPUs is supported, i.e. when new CPUs come online, cyclics are started
on them for every omni-present cyclic that is currently active, and
when a CPU goes offline, the cyclics associated with that CPU (at least
those that are part of an omni-present cyclic) are removed.

Orabug: 18323501

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>

gitignore: update .gitignore with generated SDT files

These files are generated for every module as part of SDT generation, so should
be included in .gitignore to avoid messing up 'git status' output.

Orabug: 17851716
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Acked-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>

dtrace: avoid unreliable entries in stack() output

The original implementation of the stacktrace walker for DTrace often reported
unreliable (i.e. non-callframe) entries in the stack() output. This was most
often seen as a result of data structures allocated on the stack in functions.
The new implementation no longer has this problem. It uses knowledge
of the base pointer (bp) to link callframes.

Orabug: 18323450

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>

dtrace: fix leaking psinfo objects

The psinfo objects created from a kmem cache (slab) to hold information
about a task's environment and arguments can leak when a task executes
two consecutive execve() calls. The implementation also made it possible
for tasks to have no psinfo (NULL) after a fork() unless an execve() was
done shortly after. This commit resolves both problems by adding a refc
to the psinfo objects (so one can be shared across task parentage
relations due to fork()), and by ensuring that upon execve() any existing
psinfo gets its refc decremented and a new psinfo object gets installed
to reflect the new execution environment.
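
The lifecycle described above can be sketched as follows.  This is a
single-threaded userspace model under stated assumptions: the struct layouts
and function names are invented for illustration, malloc() stands in for the
kmem cache, and a plain int stands in for whatever atomic counter the kernel
code uses.

```c
#include <stdlib.h>

struct psinfo {
	int refc;		/* number of tasks sharing this object */
};

struct task {
	struct psinfo *psinfo;
};

/* Drop one reference; free the object when the last one goes away. */
static void psinfo_put(struct psinfo *p)
{
	if (p != NULL && --p->refc == 0)
		free(p);
}

/* fork(): the child shares the parent's psinfo, so it is never NULL. */
void task_fork(const struct task *parent, struct task *child)
{
	child->psinfo = parent->psinfo;
	if (child->psinfo != NULL)
		child->psinfo->refc++;
}

/* execve(): release the old psinfo and install a fresh one. */
int task_execve(struct task *t)
{
	struct psinfo *fresh = calloc(1, sizeof(*fresh));

	if (fresh == NULL)
		return -1;
	fresh->refc = 1;

	psinfo_put(t->psinfo);	/* no leak on consecutive execve() calls */
	t->psinfo = fresh;
	return 0;
}
```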

This commit also adds the necessary initialization of a psinfo object for
tasks that do not have a mm object associated with them.

Orabug: 18383027

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

ctf: spot non-struct/union/enum children of DW_TAG_structure_type

One of the jobs of the dwarf2ctf duplicate detector is to trace the members of
structures and unions in every kernel module, one by one, and recursively mark
the types of every such member as used in that module; any such types used in
more than one module are promoted to the shared type repository (and marked as
used in that in the same way). In conjunction with a variety of other rules,
this ensures that types in per-module type repositories can refer to types in
the shared type repository, but that the shared repository is self-contained:
DTrace userspace can then load any module's type repository, set its parent to
be the shared type repository, and be sure that it can fully characterize any
type in that module.

dwarf2ctf was aborting because a type was missed by the duplicate detector but
then emitted by the type generation phase.  The type in question is found in
drivers/message/i2o/i2o_proc.c:

typedef struct _i2o_user_table {
        [...]
} i2o_user_table;

struct {
        [...]
        i2o_user_table user[64];
} *result;

The array is represented as the following:

[  beca]      structure_type
[  bf2f]        member
                 name       (strp) "user"
                 type       (ref4) [ bfa9]

[  bfa9]    array_type
     type   (ref4) [  bebe]
     sibling   (ref4) [  bfb9]

[  bebe]      typedef
       name     (strp) "i2o_user_table"
       decl_file     (data1) 1
       decl_line     (data2) 1114
       type     (ref4) [  be6f]

[  be6f]      structure_type
       name     (strp) "_i2o_user_table"
       byte_size     (data1) 8
       decl_file     (data1) 1
       decl_line     (data2) 1108
       sibling     (ref4) [  bebe]

As the indentation makes clear, in GCC 4.4.x the array_type is at the top level,
where it is always spotted by the duplicate detector as a matter of course.  In
fact, in GCC 4.4.x *no* types other than aggregates are ever emitted as children
of structure_types: they're all at the top level and thus trivially detected by
the walk through top-level types that the duplicate detector does as a matter of
course: so this part of the scanning phase only needs to look at the types of
structures, unions and enumerations, to handle perverse cases like

       struct foo {
               struct {
                       int womble;
               } *baz;
       };

in which the innermost struct's DIE (but *not* the pointer to it) is represented
as a child of the outer one's, even in GCC 4.4.x.

In GCC 4.8.x this is no longer true: the array_type DIE above is emitted as a
direct child of the structure_type here shown as [ beca], and indeed you can see
all sorts of types as children of structure_types, even basic types like 'int'
if they happen not to be used anywhere else in the translation unit.  So we must
mark them all as seen.

-- or almost all.  We do not bother marking types we will not later emit CTF
for, and we do not mark structure or union members themselves, because structure
members are not types and cannot be referenced by another member.  (Since
members are very numerous and cannot be duplicated across kernel modules unless
their containing structure is also duplicated, not marking them as seen also
saves a great deal of memory.)

Thankfully the emission phase does not care what the parent of DIEs is except if
they are things like structures or unions which have parents in CTF too.  So
arrays, basic types, and the like inside structure_type DIEs do not disturb the
actual emission of types: once we have fixed the duplicate detector, all is
well.

Orabug: 18117464
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

ctf: capture all DIEs with structs/enums as their ultimate supertype

dwarf2ctf has to scan all the DWARF in the kernel repeatedly while generating
its compressed type representation: it even has to scan inside functions, even
though it is only interested in file-scope types and variables, because opaque
structure, union, or enumeration references at the top level can contain
references to a definition inside a function.  Repeated scanning of gigabytes of
DWARF is slow: dwarf2ctf spends most of its time doing this, not emitting types.
To make this less abominably slow, dwarf2ctf contains a filtration mechanism to
avoid scanning or emitting DIEs which will never be useful.

An iron rule of dwarf2ctf is that *all* DIEs that are emitted as types must
first be spotted by this 'duplicate detector' scanning phase, since it not only
determines which types are duplicated but also where they should go: whether
they are shared between kernel modules (and should go into the shared type
repository in ctf.ko) or are shared between translation units in a single module
or not duplicated at all, in which case they go into the module's local CTF.  So
if the filtration mechanism skips types which are later emitted, dwarf2ctf can
do little but abort (in general dwarf2ctf aborts if it finds internal
consistency problems which may affect the generation of many types, but skips
types and warns if it finds problems with single types which are not likely to
affect the generation of other types).

One of these filtration functions is filter_ctf_file_scope().  This is called to
filter out DIEs representing most qualifiers and base types when they appear
inside functions, by simply checking if the parent of the DIE is
DW_TAG_compile_unit.  In GCC 4.4.x, applying this to a variety of DIEs was
sufficient to filter out all DIEs inside functions that we weren't going to emit
CTF representations of.

Unfortunately GCC has since changed its debugging information representation
somewhat in a fashion which breaks this.  Inside drivers/firmware/dmi-id.c we
see the example dwarf2ctf calls out when it fails:

    static const struct mafield {
            const char *prefix;
            int field;
    } fields[] = {
            [...]
            { NULL,  DMI_NONE }
    };

    const struct mafield *f;

This is a structure inside a function, but even so GCC generates some of its
type DIEs at global scope and they are emitted into the CTF.  Over time, GCC is
moving more of the DIEs into subprogram scope, but not yet all of them: this is
what has broken us.

The variable 'f' is represented by the following DIEs, where A, B, C and D are
DIE offsets:

[  A]     structure_type
           name                 (strp) "mafield"
[...]
[  B]     const_type
           type                 (ref4) [  A]

[  C]     pointer_type
           byte_size            (data1) 8
           type                 (ref4) [  B]

In GCC 4.4.x, all of these other than the structure_type itself are at the
global scope, not the subprogram scope, so the filter never kicks in for these
DIEs and all of them are emitted successfully.  Between 4.4.x and 4.8.0, the
'const_type' moved into subprogram scope, but the pointer_type did not, so we
have a pointer_type at global scope referring to a const_type at subprogram
scope -- which has been filtered out by the filtration function, so emission of
the pointer_type fails and dwarf2ctf aborts because its type graph is
incomplete, with a type pointing at a type it has no record of.

Fix this by having filter_ctf_file_scope() look at the DW_AT_type attribute of
its target DIE, chaining to its type-attributeless terminus and noting whether
that terminus is a structure, union, or enumeration: all DIEs having such types
at the terminus of their type chains must not be filtered out.
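
The chain walk can be sketched without libdw as below.  This is only a
model: the enum, the struct and the function name are invented for
illustration (the real code works on DWARF DIEs and DW_TAG_* values), and
the type chain is assumed to be acyclic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the DW_TAG_* values of interest. */
enum die_tag {
	TAG_BASE_TYPE,
	TAG_CONST_TYPE,
	TAG_POINTER_TYPE,
	TAG_STRUCTURE_TYPE,
	TAG_UNION_TYPE,
	TAG_ENUMERATION_TYPE
};

struct die {
	enum die_tag tag;
	const struct die *type;	/* DW_AT_type target, NULL if absent */
};

/*
 * Follow DW_AT_type references until reaching a DIE without one, and
 * report whether that terminus is a structure, union or enumeration.
 * DIEs for which this returns true must not be filtered out.
 */
bool chain_ends_in_aggregate(const struct die *die)
{
	while (die->type != NULL)
		die = die->type;

	return die->tag == TAG_STRUCTURE_TYPE ||
	       die->tag == TAG_UNION_TYPE ||
	       die->tag == TAG_ENUMERATION_TYPE;
}
```

For the mafield example, both the pointer_type and the const_type chain down
to the structure_type, so both are kept whatever scope GCC places them in.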

Orabug: 18117464
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

ctf: handle structure and union offsets in form DW_FORM_data1

GCC 4.8.0 and above have started to represent structure member offsets in the
DW_AT_data_member_location attribute using DWARF forms DW_FORM_data1 and _data2
where possible: indeed, this is the common case, as most structure members are
less than 64K into the structure: and every structure contains at least one
member whose offset is represented this way.

dwarf2ctf neglected to check for DW_FORM_data1, so the build failed when it hit
the first structure in the kernel and realised that it didn't understand its
offset.

Fix trivial.

Orabug: 18117464
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

ctf: cater for elfutils 0.156 change in dwfl_report_elf() prototype

This version of elfutils breaks source compatibility, adding a new parameter to
dwfl_report_elf(). Compensate for this in a manner that does not break
compilation with older elfutils.

Orabug: 18117421
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>

dtrace: vtimestamp implementation

This commit adds DTrace vtimestamp support. It keeps track of how much
time a task has spent actually processing on a CPU. The time is set to
zero at task creation, and is updated whenever the task leaves a CPU
(gets scheduled off), and when the dtrace_probe() function is entered,
to ensure that the most recent value of consumed time is reported.

Some code got moved around for consistency of the implementation.

Orabug: 17741477

Reviewed-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Conflicts:
include/linux/ktime.h
kernel/sched/core.c

dtrace: implement SDT in kernel modules

Full implementation of SDT probes in kernel modules.

The dtrace_sdt.sh script has been modified to handle both the creation
of the SDT stubs and the SDT info.  Its syntax has therefore changed:

  dtrace_sdt.sh sdtstub <stubfile> <object-file> <object-file>*
or
  dtrace_sdt.sh sdtinfo <infofile> vmlinux.o
or
  dtrace_sdt.sh sdtinfo <infofile> vmlinux.o .tmp_vmlinux1
or
  dtrace_sdt.sh sdtinfo <infofile> <kmod>.o kmod

The first form generates a stub file in assembler to ensure that the
(fake) functions that are called from SDT probe points will no longer
be reported as undefined symbols, and to ensure that when SDT is not
enabled, the probes become calls to a function that simply returns.

The second form creates the initial (dummy) SDT info file for the kernel
linking process, mainly to ensure that its size is known.  The third
form then creates the true SDT info file for the kernel, based on the
kernel object file and the first stage linked kernel image.

The fourth and final form generates SDT info for a kernel module, based
on its initial linked object.

This commit also enables the test probes in the dt_test module.

Orabug: 17851716

Reviewed-by: Jamie Iles <jamie.iles@oracle.com>
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Conflicts:
Makefile
scripts/Makefile.build

dtrace: remove functionality of dtrace_os_exit() as deprecated

This function is in its current form no longer called from anywhere, but since
it is exported someone could (maliciously) still call it and cause a mess. We
therefore issue a warning and do not do anything else. This will be removed in
a future version.

Orabug: 17717401

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>

dtrace: fix mutex_owned() implementation

The mutex_owned() function was not accounting for the possibility that a lock
might have an owner registered while unlocked.

Orabug: 17624236

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>

dtrace: new cyclic implementation

The original cyclic implementation (based on hrtimer_*()) failed
because it caused handlers to be called from interrupt context, which
causes some interesting (bad) side effects.

The change to tasklet_hrtimer_*() as underlying implementation solved
the context issues for handlers, but resulted in runaway timers that
could call handlers in modules that are no longer loaded, causing a
crash. The cause was related to a race between timer cancellation and the
tasklet restarting the timer.

The new implementation is a two layer approach where hrtimer_*() is
used to generate handler invocation requests, scheduling a tasklet for
handler call processing as needed, and using a counter to determine how
many times the handler needs to be called (if the timer fires more than
once between two subsequent tasklet processing schedulings). The
tasklet (one per cyclic - only scheduled when there has been one or
more timer expirations) takes care of calling the handler.
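
The two-layer split can be modelled in plain C as below.  This is a
single-threaded sketch, not the kernel implementation: the struct, the
function names and the sample handler are made up for illustration, and the
real code would need atomic operations since the timer and the tasklet can
run concurrently.

```c
/* Illustrative model of one cyclic. */
struct cyclic {
	unsigned long pending;		/* expirations not yet handled */
	int tasklet_scheduled;		/* models tasklet scheduling */
	void (*handler)(void *arg);
	void *arg;
};

/* Layer 1: timer expiry.  Only records the request; no handler call. */
void cyclic_timer_fired(struct cyclic *c)
{
	c->pending++;
	c->tasklet_scheduled = 1;
}

/* Layer 2: tasklet.  Calls the handler once per recorded expiration. */
void cyclic_tasklet_run(struct cyclic *c)
{
	c->tasklet_scheduled = 0;
	while (c->pending > 0) {
		c->pending--;
		c->handler(c->arg);
	}
}

/* Sample handler used to observe how often the cyclic fired. */
void count_calls(void *arg)
{
	++*(int *)arg;
}
```

If the timer fires three times before the tasklet gets to run, the handler
is still called three times, which is the purpose of the counter.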

More details are embedded in comments in the code...

Orabug: 17553446

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

dtrace: Use tasklet_hrtimer_*() instead of hrtimer_*() for cyclics

Orabug: 17553446

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

dtrace: fix for psinfo allocation during execve

Allocate the psinfo structure from a slab (like other structures related to
the task_struct), and use kmalloc() for the argv and envp members (with size
limit to avoid allocation issues).

Orabug: 17407069
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

kbuild/ctf: Use shell expansion, not $(wildcard ...), for CTF section copying.

The stub module kernel/ctf/ctf.ko must contain sections for all CTF for the core
kernel, CTF shared between modules and CTF for every built-in module.  This set
can vary as the kernel configuration changes: it is computed by dwarf2ctf from
objects.builtin, modules.builtin, and analysis of the debugging information in
the object files comprising the kernel build.

This means that we must dynamically construct the command line for the objcopy
which inserts these sections into the final .ko, as it is not known until
recipes are executed.  This is done via the module-ctf-flags variable, which is
expanded into cmd_ld_ko_o.  We chose to use the GNU Make wildcard function to
determine the set of builtin modules, but this has turned out to be problematic,
because, though it is nowhere documented, the output of this function is often
cached if the directory exists.  glob lookups will rescan it as needed, but
$(wildcard) seemingly does not, and can often produce inaccurate or even empty
lists even when many files matching the wildcard have existed for some time.
This shows up as (often parallel) makes failing to insert any CTF sections into
kernel/ctf/ctf.ko, with DTrace subsequently failing because the CTF for the core
kernel and inter-module shared types was missing.

For now, work around this bug by using shell expansion instead.  (This
particular part of the module-ctf-flags is only expanded once, for one module,
and only after a period of computation by dwarf2ctf lasting many minutes, so no
negative impact on build times from this shell loop is expected.)

Orabug: 17445637
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

kbuild/ctf: always build vmlinux when building CTF.

Before now, a 'make modules' in a clean tree would neglect to build the built-in
objects and yield an empty objects.builtin, or no objects.builtin at all.

Correct generation of the CTF requires all the .o and .a files which go into
vmlinux, and a list of them in objects.builtin: without them, dwarf2ctf will
fail one way or the other.

While we're at it, remove all CTF-related references in the build system to
CONFIG_DTRACE and CONFIG_DT_DISABLE_CTF: CTF generation has been decoupled from
DTrace and has its own config symbol nowadays.

Orabug: 17397200
Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: remove unnecessary exported symbol

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 17346878

dtrace: Ensure that USDT probes are carried over correctly across fork().

When a process forks, its child will have a copy of the address space of the
parent, and therefore any enabled USDT probes from the parent will also fire
for the child. In order for those probe firings to be valid, the child must
have its own pid-specific providers (created by duplicating the parent's
providers).

This commit also adds some additional cleanup.

Orabug: 17346878

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: fix retrieval of arg5 through arg9

Fix the retrieval of arguments passed on the stack for SDT, USDT, and direct
call probes. This commit also adds trivial support for testcases related to
this fix.

Orabug: 17368166

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: Ensure that task_struct members are initialized correctly

Due to an initialization issue with current->predcache, it was possible for the
predicate on a probe to never be evaluated because the dtrace_probe() code was
incorrectly assuming that there was a valid predicate cache result.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: ensure that builds in a separate objdir work

Orabug: 17369799

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

ctf: ensure the CTF directory exists before writing the filelist

If the CTF directory does not yet exist, we must create it before writing
the filelist into it.

Orabug: 17363469
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: avoid command-line length limits by passing .o filenames via a file

Historically, dwarf2ctf took the names of all object files it was to run over on
the command line.  This can lead to very long command lines, but since the kernel
now supports lines of almost unlimited length (up to RLIMIT_STACK / 4, which is
much longer than the names of all .c files in the kernel tree added together for
any reasonable stack size), this seemed safe.

It turns out not to be, because GNU Make sometimes passes command lines through
the shell even when it doesn't need to, and GNU Bash does not yet know that the
kernel has a very long command-line length limit and protests if it is
> ARG_MAX, which is much shorter.  We were trying to prevent the dwarf2ctf
command line being passed to the shell, but there is no documented way to do so
in GNU Make and clearly simply avoiding shell metacharacters is not always
enough.

So, instead, write out the list of object files to a file in the .ctf directory
and have dwarf2ctf read it in.  This takes some trickery in Makefile.modpost to
arrange to invoke a separate shell to add each filename to the filelist, but
it's not too complicated.  (To minimize the possibility of disruption, we
duplicate the code already used for reading blacklists in dwarf2ctf.  This
should be unified later.)
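
Reading such a filelist takes only a few lines.  The sketch below is not
the dwarf2ctf code: the function names and the one-name-per-line format are
assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Read one object file name per line from `f`, skipping empty lines, and
 * call `fn` for each name.  Returns the number of names processed.
 */
long for_each_listed_file(FILE *f,
			  void (*fn)(const char *name, void *arg),
			  void *arg)
{
	char *line = NULL;
	size_t cap = 0;
	ssize_t len;
	long n = 0;

	while ((len = getline(&line, &cap, f)) != -1) {
		/* Strip the trailing newline, if any. */
		if (len > 0 && line[len - 1] == '\n')
			line[--len] = '\0';
		if (len == 0)
			continue;
		fn(line, arg);
		n++;
	}

	free(line);
	return n;
}

/* Sample callback: counts the names it is given. */
void count_file(const char *name, void *arg)
{
	(void)name;
	++*(int *)arg;
}
```

Because the names never appear on a command line, neither ARG_MAX nor the
shell's own limits apply, however many object files the build produces.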

Orabug: 17363469
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: DT_FASTTRAP should select UPROBE_EVENT

Orabug: 17325699

Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>

dtrace: Fix for the argument validation code.

The command line argument validation code failed to correctly identify some of
the potential failure cases, and sadly, also one specific valid use case. This
has been corrected.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 17313687

Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>

dtrace: Include asm/current.h for the mutex_owned() function.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 17313687

Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>

dtrace: Bug fix for logic to determine the (inode, offset) pair for uprobes.

The logic used to determine the (inode, offset) pair needed by uprobes, and
calculated based on an address in a process memory space, was flawed. This
caused USDT probes in shared libraries to not work correctly.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: ensure memory allocation results are checked throughout the code

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: remove pre-alpha features for release

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: CONFIG_UPROBES is needed by CONFIG_DT_FASTTRAP, not CONFIG_DTRACE

So move it.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: CONFIG_DTRACE should depend on CONFIG_UPROBES

Since DTrace uses uprobes, it needs to select it, or we end up with most of
uprobes not being compiled in. In this situation, you can still *try* to
register a uprobe, but it will always fail.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

wait: fix loss of error code from waitid() when info is provided

One of the review comments on the original waitid patches suggested using the
pattern

        ret = __put_user(...);
        ret |= __put_user(...);
        ...

rather than

        if (!ret)
                ret = __put_user(...);
        if (!ret)
                ret |= __put_user(...);

This turns out to be a bad idea if ret is used for anything else first, e.g. if
it is used to store errnos from waitid(), since it overwrites any nonzero error
with the return value of __put_user(), even if that is zero.

The solution is to use a temporary error variable instead, and not assign it to
the actual return value if the temporary error variable is still zero after
doing all the __put_user()s (as is the common case).
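
The difference is easy to demonstrate in isolation.  In this sketch
fake_put_user() is a stand-in for __put_user(), and both function names and
the `fail` parameter are invented for illustration.

```c
#include <errno.h>

/* Stand-in for __put_user(): 0 on success, -EFAULT on failure. */
static int fake_put_user(int fail)
{
	return fail ? -EFAULT : 0;
}

/* Problematic pattern: `ret` already holds an errno on entry. */
int copy_info_buggy(int ret, int fail)
{
	ret = fake_put_user(fail);	/* overwrites the earlier error */
	ret |= fake_put_user(fail);
	return ret;
}

/* Fixed pattern: accumulate into a temporary, assign only on failure. */
int copy_info_fixed(int ret, int fail)
{
	int err;

	err = fake_put_user(fail);
	err |= fake_put_user(fail);
	if (err)
		ret = err;
	return ret;
}
```

With ret == -ECHILD on entry and no copy failure, the first variant returns
0 and the second returns -ECHILD.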

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

waitfd selftest: dike out some dead code.

Trivial cleanup.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

epoll, wait: introduce poll_wait_fixed(), and use it in waitfds

The poll() machinery expects to be used with files, or things enough like files
that the wake_up key contains an indication as to whether this wakeup
corresponds to a POLLIN / POLLOUT / POLLERR event on this fd.  You can override
this in your poll_queue_proc, but the poll() and epoll() queue procs both have
this interpretation.

Unfortunately, it is not true for waitfds, which wait on the wait_chldexit
waitqueue, whose key is a pointer to the task_struct of the task being killed.
We can't do anything with this key, but we certainly don't want the poll
machinery treating it as a bitmask and checking it against poll events!

So we introduce a new poll_wait() analogue, poll_wait_fixed().  This is used for
poll_wait() calls which know they must wait on waitqueues whose keys are not
a typecast representation of poll events, and passes in an extra argument to
the poll_queue_proc, which if nonzero is the event which a wakeup on this
waitqueue should be considered as equivalent to.  The poll_queue_proc can then
skip adding entirely if that fixed event is not included in the set to be
caught by this poll().

We also add a new poll_table_entry.fixed_key.  The poll_queue_proc can record
the fixed key it is passed in here, and consult it at wakeup time: a nonzero
fixed_key means that one was passed in to poll_wait_fixed() and that the wakeup
key should be ignored in favour of fixed_key.

With this in place, you can say, e.g. (as waitfd now does)

    poll_wait_fixed(file, &current->signal->wait_chldexit, wait,
                    POLLIN);

and the key passed to wakeups on the wait_chldexit waitqueue will be ignored:
the fd will always be treated as having raised POLLIN, waking up poll()s and
epoll()s that have specified that event.  (Obviously, a poll function that
calls this should return the same value from the poll function as was passed
to poll_wait_fixed(), or, as usual, zero if this was a spurious wakeup.)
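
The wakeup-time decision can be sketched in userspace roughly as follows.
struct entry_sketch and should_wake() are hypothetical simplifications, not the
kernel's poll_table_entry or its wake function; the rule that a zero key
carries no event information and always wakes is an assumption of this sketch.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>

/* Hypothetical, much-simplified analogue of a poll table entry. */
struct entry_sketch {
    unsigned long wanted;     /* events this poll() asked for */
    unsigned long fixed_key;  /* nonzero: treat every wakeup as this event */
};

/*
 * Returns nonzero if the waiter should be woken.  'key' is whatever the
 * waker passed in; for a waitqueue like wait_chldexit it is a pointer,
 * so it must not be interpreted as an event mask.
 */
int should_wake(const struct entry_sketch *e, uintptr_t key)
{
    unsigned long events = e->fixed_key ? e->fixed_key : (unsigned long)key;

    /* Assumption: a zero key with no fixed key means "no information". */
    if (events == 0)
        return 1;
    return (events & e->wanted) != 0;
}
```

An entry with fixed_key set to POLLIN is woken whatever pointer value arrives
as the key, while an entry without a fixed key still filters on the key as an
event mask.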

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: no longer reference 'ctf.ko.unsigned' in CTF debuginfo stripping machinery

We used to generate .ko.unsigned modules, which were renamed to .ko after
signing: so when stripping the debugging information out of ctf.ko, we had to
consider the possibility that we might have to strip it out of ctf.ko.unsigned
instead. With the new signing machinery, this is no longer the case, and that
hack can be removed.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

wait: add waitfd(), and a testcase for it

This syscall, of prototype

int waitfd(int which, pid_t upid, int options, int flags);

yields a pollable file descriptor which yields a 'siginfo_t' whenever
waitid() or waitpid() would return (when child processes die or ptrace()d
tracees undergo an appropriate state change).

The which, upid and options arguments are as to waitid(); the flags argument is
fd flags as to open() or fcntl(F_SETFL), to which O_RDWR is automatically added.
WNOHANG in the options is automatically translated into O_NONBLOCK in the flags,
and vice versa.
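
The flag translation described above might look something like this sketch.
waitfd_sync_nonblock() is a hypothetical helper written for illustration, not
a function taken from the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/wait.h>

/*
 * Make the two ways of asking for non-blocking behaviour agree:
 * WNOHANG in the waitid()-style options implies O_NONBLOCK in the
 * fd flags, and vice versa.
 */
void waitfd_sync_nonblock(int *options, int *flags)
{
    assert(options != 0 && flags != 0);

    if ((*options & WNOHANG) || (*flags & O_NONBLOCK)) {
        *options |= WNOHANG;
        *flags |= O_NONBLOCK;
    }
    /* The commit message states that O_RDWR is always added. */
    *flags |= O_RDWR;
}
```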

No compat wrappers are in place for this syscall: 32-bit calls with a 64-bit
kernel will return a 64-bit version of 'struct siginfo'.

Current bugs:
- select/poll/epoll is not waking up the process yet, even when it should.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: ensure that arg6 through arg9 get retrieved correctly for USDT probes

A bug in the implementation of retrieving arguments from the stack caused
bogus values to be returned for arg6 through arg9 on x86_64. This has been
resolved.

This commit also removes various debugging output that is no longer needed.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: finish the implementation of is-enabled USDT probes

This commit completes the implementation of is-enabled USDT probes, i.e. probes
that when fired cause an execution flow change.  This makes it possible (when
using USDT) to encode more complex code sections that are only executed if a
specific USDT probe has been enabled, i.e. when a consumer is listening for
a specific USDT probe.  This is most commonly used for cases where a USDT probe
might require additional computation for one or more of its arguments, and so
it would be too expensive to always do that computation, whether the probe is
enabled or not.  With is-enabled probes, the overhead when DTrace is not used
is negligible (2 NOPs), and when DTrace is in use but the guarded probe is not
enabled, the cost is a single probe firing (without calling dtrace_probe()).
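
The usage pattern that is-enabled probes make possible can be sketched as
below.  Every name here is hypothetical, and the enabled check is modelled as
a plain variable purely for illustration: in real USDT it is a patched
instruction site, not a memory flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of "is a consumer listening for this probe?". */
static int request_probe_enabled;
static size_t expensive_calls;
static size_t probe_firings;

/* Stand-in for an argument that is costly to compute. */
static size_t expensive_summary(const char *req)
{
    expensive_calls++;
    return strlen(req);
}

/* Stand-in for firing the guarded probe with its computed argument. */
static void fire_request_probe(size_t summary)
{
    (void)summary;
    probe_firings++;
}

void set_request_probe(int on)
{
    request_probe_enabled = on;
}

void handle_request(const char *req)
{
    assert(req != NULL);

    /* Only pay for the argument computation when the probe is enabled. */
    if (request_probe_enabled)
        fire_request_probe(expensive_summary(req));
}
```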

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: fixes for tracepoint cleanup

Various fixes to handle tracepoint cleanup.  It is important to note that it is
most common that USDT providers will be cleaned up (asynchronously) when the
process/task they relate to is already gone.  We therefore cannot use a (pid,
addr) pair to identify the tracepoint for removal.  The new implementation
stores the (inode, offset) pair calculated right before registering the uprobe,
so that we can use that same pair again when unregistering.
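
As a rough illustration of the idea (not the fasttrap code itself), the sketch
below records the pair at enable time and relies on nothing else at disable
time.  struct tracepoint_sketch and both functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kept per tracepoint. */
struct tracepoint_sketch {
    unsigned long ino;   /* inode number backing the probe site */
    long long off;       /* offset of the probe site in that file */
    int registered;
};

/* Called with the pair computed right before registration. */
void tp_enable(struct tracepoint_sketch *tp, unsigned long ino, long long off)
{
    assert(tp != NULL);
    tp->ino = ino;
    tp->off = off;
    tp->registered = 1;
}

/*
 * Hands back the stored pair for unregistration.  Nothing here looks at
 * a pid or a task, so it still works after the process is gone.
 * Returns 0 on success, -1 if the tracepoint was never registered.
 */
int tp_disable(struct tracepoint_sketch *tp, unsigned long *ino, long long *off)
{
    assert(tp != NULL && ino != NULL && off != NULL);
    if (!tp->registered)
        return -1;
    *ino = tp->ino;
    *off = tp->off;
    tp->registered = 0;
    return 0;
}
```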

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: update syscall tracing in view of Linux 3.8 changes

The handling of various stub-based syscalls in Linux 3.8 changed to no longer
use the saved registers directly. This change is now also reflected in the
DTrace overrides for those system calls.

Also reset the compilation debug settings to all-off (they should only be
turned on in local compilations - never in the repo.)

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: USDT implementation (phase 2)

This commit contains the 2nd phase of the USDT implementation for DTrace for
Linux.  It provides the mechanics of fasttrap probes (enabling, firing, and
disabling).  It also contains various debugging statements that will be cleaned
up in a future commit.  They exist right now, because this commit represents a
work in progress that is known to contain various bugs and loose ends.  The
commit for this code is provided as a "sharing of the pain", and to ensure
that others can start playing with this code and test the interaction with
userspace.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: revamp and split up DTrace headers; add ioctl() debugging machinery

It has always been annoying that we have a duplicate set of DTrace headers
in userspace, and further annoying that the DTrace header we have is such a
monolithic monster.

This fixes both of these, at the cost of a (very) little extra complexity when
maintaining the headers.  It also adds automated machinery to verify that each
of the new headers is 'standalone enough'.

What does 'standalone enough' mean?  This is the problem that has stopped us
sharing DTrace headers between userspace and kernelspace for a long time: they
depend on types that come from different places and have different definitions
in the kernel and in userspace, and can never be made to come from the same
place.

So we fix this by dictating that the headers are standalone *given* that certain
headers are included first.  For userspace, this set is <sys/types.h>,
<ctf/types.h>, and <unistd.h>; for kernelspace, the set is <linux/types.h> and a
newly-introduced header included as <dtrace/types.h>.  We avoid exposing this
requirement to header users by arranging for the header included as "dtrace.h"
in kernelspace and as <sys/dtrace.h> in userspace to include the prerequisite
headers, and enough other headers that the users of those headers can keep
using them exactly as they did before.  (There is a single exception for
dtrace/ioctl.h: see below.)

Where have the headers gone?  They have been split in two directions (and the
Kbuild machinery has been adjusted to have its include paths point at the
appropriate places).

All DTrace headers, even <linux/dtrace_cpu.h> in the global kernel tree, have
had 'defines headers' split out of them, named ${header}_defines.h.  These
headers contain all typedefs, enums and #defines which do not depend on
visibility into structure definitions declared in the main non-defines header,
and opaque forwardings for all structure definitions declared in that header.
This means that users can include the defines header if they only want opaque
use of structures and relevant constants.  (This has necessitated some
mechanical changes to the DTrace headers to convert uses of foo_t typedefs into
'struct foo' where necessary.  Not all uses have been converted: only those that
need to be).  A _defines header is always accompanied by a corresponding non
_defines header (even if it is just one #include), but the reverse is not true.
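
The shape of the defines/non-defines split can be sketched in a single file as
follows.  The names (probe_defines.h, probe.h, struct probe_desc and so on)
are invented for illustration and do not correspond to actual DTrace headers.

```c
#include <assert.h>
#include <stdint.h>

/* ---- would live in a hypothetical probe_defines.h ---- */
#define PROBE_NAMELEN 64
typedef uint32_t probe_id_t;
enum probe_state { PROBE_OFF, PROBE_ON };
struct probe_desc;                 /* opaque forwarding only */

/* ---- a user that needs only the defines header ---- */
struct probe_ref {
    struct probe_desc *desc;       /* pointer to an incomplete type is fine */
    enum probe_state state;
};

/* ---- would live in the hypothetical probe.h ---- */
struct probe_desc {
    probe_id_t id;
    char name[PROBE_NAMELEN];
};

/* Needs visibility into the structure, so needs the full header. */
probe_id_t probe_ref_id(const struct probe_ref *ref)
{
    assert(ref != 0 && ref->desc != 0);
    return ref->desc->id;
}
```

Note that struct probe_ref compiles with only the forwarding in scope, which
is the 'opaque use' case; only probe_ref_id() needs the full definition.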

In addition to this split, the core dtrace.h header has been split into three
pieces:

- dtrace/include/uapi/linux/dtrace/*.h contains the majority of the headers,
   split into _defines and further by section roughly corresponding with the
   comment-delimited sections in the original file.  The include paths have been
   set up so that these can be used via

   #include <linux/dtrace/blah.h>

   in both userspace and kernelspace.

   #include <linux/dtrace/dtrace.h> includes all the headers in the same order
   as they were originally.

   These headers are installed in userspace and used by dtrace-util.  They
   should always contain a CDDL license header, and should not use any
   kernel-specific types (modulo those that userspace normally uses, such as
   those related to kernel-specific functionality such as ioctl()).  If you use
   a new kernel-specific type, please add a definition of it to
   uts/common/sys/dtrace_types.h in userspace too.

   The old dtrace_ioctl.h has been renamed to <linux/dtrace/ioctl.h> and
   moved in here too.  It has gained some extra machinery to help debug
   ioctl() type size conflicts.  If you call dtrace_ioctl_sizes(), then
   dtrace_size_dbg_print() (not defined here) is repeatedly called with
   two parameters, the name of each ioctl() type and the size of that type.
   Please keep this list up to date, it is useful!

   There is one unfortunate exception to the userspace-types rule here:
   ioctl.h needs types from <linux/dtrace_cpu_defines.h>, which is not
   even a userspace-installed header.  So userspace must provide a copy of
   this header with appropriate typedefs as well.  (This should probably
   be fixed in due course.)

   <linux/dtrace/universal.h> defines constants and types used by virtually
   everything in any way related to dtrace.

- dtrace/include/dtrace/*.h contains headers that are not shared with
   userspace, included as <dtrace/blah.h>:

   - <dtrace/types.h>: this contains the kernel-side definitions of types
     used in the shared headers.  (It has not been 'gardened' in any way,
     so probably contains a lot of other types as well).  This header is
     installed into the same place as the shared userspace headers.  This
     header needs a bit of care maintaining, as not everything kernel-
     side is allowable in it: see below.

   - <dtrace/provider.h>: The provider API, and its corresponding
     <dtrace/provider_defines.h> defines header.  This includes <dtrace/types.h>
     itself, so should be standalone -- however, this has not been in any way
     tested yet, unlike for the userspace headers.  This header is also
     installed into the same place as the shared userspace headers.

   - <dtrace/dtrace_impl.h> and <dtrace/dtrace_impl_defines.h>.

     These headers contain definitions used only by the DTrace core, and are
     not installed anywhere.
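
The ioctl() size-debugging hook described under <linux/dtrace/ioctl.h> above
can be approximated by the following sketch.  The two structures, the
REPORT_SIZE() macro and the function names are hypothetical stand-ins for the
real ioctl() types, dtrace_ioctl_sizes() and dtrace_size_dbg_print().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical ioctl() argument types. */
struct demo_probedesc { unsigned int id; char name[64]; };
struct demo_status { unsigned long long drops; };

static size_t reported;  /* number of types reported so far */

/* Caller-provided hook, playing the role of dtrace_size_dbg_print(). */
void size_dbg_print_sketch(const char *type_name, size_t size)
{
    assert(type_name != NULL);
    printf("%s: %zu\n", type_name, size);
    reported++;
}

/* Stringify the type so the name and the size cannot drift apart. */
#define REPORT_SIZE(type) size_dbg_print_sketch(#type, sizeof(type))

/* Plays the role of dtrace_ioctl_sizes(): one call per ioctl() type. */
void ioctl_sizes_sketch(void)
{
    REPORT_SIZE(struct demo_probedesc);
    REPORT_SIZE(struct demo_status);
}
```

Running the same list in a userspace build and in a kernel build and comparing
the output is one way to spot a type whose size differs between the two.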

Note that because of the rules regarding kernel-specific types in the UAPI
DTrace headers, a number of uses of CONFIG_64BIT and CONFIG_BIG_ENDIAN have been
reverted to their Solaris-era-and-userspace _LP64 and _LITTLE_ENDIAN forms;
<dtrace/types.h> translates between the two.  (We have also fixed up a
couple of places in core DTrace where the nonexistent _BIG_ENDIAN was used.)

There are several things called dtrace.h now, which might get confusing:

- dtrace.h at the top level is for the DTrace core and included providers
  alone.  Anything goes in here.  It is never installed anywhere, and not even
  standalone providers can see it.

- dtrace/include/uapi/linux/dtrace/dtrace.h is shared with userspace and with
  standalone providers, and needs to follow the same rules as all such shared
  headers.

We have two new packages, dtrace-modules-$kver-headers and
dtrace-modules-$kver-provider-headers; to make it easy for people to Require
them, they provide features named 'dtrace-modules-headers' and
'dtrace-modules-provider-headers' using the same incrementing API version number
as 'dtrace-kernel-interface' (both currently 1).  dtrace-modules-headers serves
much the same purpose as dtrace-kernel-interface, tracking changes in the
userspace/kernelspace API: the dtrace-modules-provider-headers version number
tracks changes in the DTrace core/provider API.

The top-level Makefile has acquired two new rules, headers_install and
headers_check.  The former is a simple emulation of the top-level
headers_install rule, and is used by the RPM build system.  The latter checks to
be sure that the userspace-side headers can be #included on their own
(modulo only <unistd.h> et al, as above), and that dtrace/ioctl.h has all its
types in scope with appropriate sizes, since if they aren't everything will
appear to work until userspace tries to invoke the ioctl() and gets an -ENOTTY
error.

The latter is particularly difficult, since the size-checking machinery only
works for kernelspace builds, and the headers_check build is userspace.  So that
machinery hacks up a sort of halfway-house to the kernel environment, enough to
do the ioctl() size checks but not enough to #include arbitrary kernel
headers (among other things, there are no CONFIG constants here).  This means
that changes to those kernel-side headers which are included by this
machinery (in particular dtrace/types.h, dtrace_os.h) need to include a
headers_check run to make sure that machinery hasn't broken.  In general,
surrounding suspect definitions with an #ifndef HEADERS_CHECK should suffice:
nearly all of dtrace_os.h is so surrounded (and if we ever move dtrace_id_t out
into the shared headers, as perhaps we should, this problem will become less
serious).  (Feel free to make the headers_check dtrace/ioctl.h compilation
environment more like that used for the real kernel.)

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: blacklist certain structure members entirely

The problem of structures with identical names but conflicting members has
bitten dwarf2ctf before.  We defined a deduplication blacklist which prevents
specific modules from participating in deduplication, because they define a
structure with the same name as one in the shared type repository but with
different members.  This is, it turns out, not always enough.

Some kernel modules have begun to define structures with conflicting members in
different translation units within the same module.  There is no trick we can
use to help dwarf2ctf deal with this: each module gets one CTF file, covering
all that module's translation units, and if types have conflicting definitions
within that one module, there's nothing we can do: we must skip them entirely.
But we can limit the damage somewhat.

This commit adds a new blacklist, the 'member blacklist', stored in
scripts/dwarf2ctf/member.blacklist, which blacklists structure or union members
by name.  There are some limitations: only members of named structures and
unions can be blacklisted (not members of typedeffed, unnamed structures or
unions); and you can only blacklist types in the kernel tree, not types in
external modules.  Both of these restrictions can be lifted if it ever becomes
necessary, the latter quite easily.

The blacklist is of the form

filename:structure name.member name

The filenames are absolutized and compared with the filenames in the DWARF for
each structure member, both at emission time and when recursing to mark shared
types.
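
Parsing one blacklist line of this form could look like the sketch below.  It
is not the dwarf2ctf parser: the structure, the function name and the buffer
sizes are hypothetical, and it assumes the structure name contains no '.' and
the filename contains no ':'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of one "filename:structure.member" line. */
struct member_bl_entry {
    char file[256];
    char su_name[128];
    char member[128];
};

/* Returns 0 on success, -1 if the line is malformed or too long. */
int parse_member_blacklist_line(const char *line, struct member_bl_entry *out)
{
    const char *colon, *dot;
    size_t flen, slen, mlen;

    assert(line != NULL && out != NULL);

    colon = strchr(line, ':');
    if (colon == NULL)
        return -1;
    /* Search for '.' only after the colon: filenames contain dots too. */
    dot = strchr(colon + 1, '.');
    if (dot == NULL)
        return -1;

    flen = (size_t)(colon - line);
    slen = (size_t)(dot - (colon + 1));
    mlen = strcspn(dot + 1, "\r\n");   /* drop any trailing newline */
    if (flen == 0 || slen == 0 || mlen == 0 ||
        flen >= sizeof(out->file) || slen >= sizeof(out->su_name) ||
        mlen >= sizeof(out->member))
        return -1;

    memcpy(out->file, line, flen);
    out->file[flen] = '\0';
    memcpy(out->su_name, colon + 1, slen);
    out->su_name[slen] = '\0';
    memcpy(out->member, dot + 1, mlen);
    out->member[mlen] = '\0';
    return 0;
}
```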

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: repair faulty indentation

Part of assemble_ctf_su_member() got misindented.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: split the absolute-file-name caching machinery out of type_id()

We will be needing similar caching elsewhere in a forthcoming commit.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: sentinelize str_appendn()

str_appendn() takes a variable argument list of items to append, terminated by a
NULL. Every single time I have used it, I have forgotten the NULL.

Add a sentinel attribute to make it harder to forget in future.
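
For illustration, here is a NULL-terminated variadic appender carrying the GCC
sentinel attribute.  append_all() is a hypothetical analogue written for this
example; the real str_appendn() signature and behaviour may differ.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* The attribute asks the compiler to check for a trailing NULL argument. */
char *append_all(char *s, ...) __attribute__((sentinel));

/*
 * Appends each string argument to the heap-allocated string 's' (which
 * may be NULL), stopping at the first NULL argument.  Returns the grown
 * string, or NULL on allocation failure (in which case 's' is freed).
 */
char *append_all(char *s, ...)
{
    va_list ap;
    const char *piece;
    size_t len = s ? strlen(s) : 0;

    va_start(ap, s);
    while ((piece = va_arg(ap, const char *)) != NULL) {
        size_t plen = strlen(piece);
        char *grown = realloc(s, len + plen + 1);

        if (grown == NULL) {
            free(s);
            va_end(ap);
            return NULL;
        }
        s = grown;
        memcpy(s + len, piece, plen + 1);  /* copies the NUL as well */
        len += plen;
    }
    va_end(ap);
    return s;
}
```

With the attribute in place, a call such as append_all(s, "x") that omits the
terminating NULL can be diagnosed at compile time by GCC and Clang when the
relevant warnings are enabled, instead of reading past the argument list at
run time.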

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ptrace: Add PTRACE_GETMAPFD.

This request returns (in 'data') the file descriptor backing the mapping that
covers a given 'addr'.  This works even if the mapped file is unlinked and not
open anywhere besides this mapping.

New errors:

-EBADFD if the address is an anonymous mapping
-EFAULT if the address is not mapped at all
-EPERM if the LSM does not permit the receipt of this file descriptor

Note that because the original fd is returned, you can access portions of it
outside the range of the original mapping (the use case for this, involving
acquiring ELF headers for executables and shared libraries that were unlinked
after mapping, relies on this).

This does not introduce a security hole because in order for the ptraced process
to mmap any file in the first place it must have had an fd to it, and the
ptracer could have accessed the outside-original-mapping portions at that point,
or simply forced the ptraced process to send it the fd via a Unix-domain socket
and held on to it.  There is no danger that an execute-only process can be read
by this mechanism either, since you cannot PTRACE_ATTACH to such a
process.  (Shared libraries cannot be execute-only at all.)

PTRACE_GETMAPFD is provisionally of value 0x42A5, in the architecture-
independent addition range, out-of-the-way so as not to collide with other
additions.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: update execve() syscall probe support

The execve syscall stub and entry syscall function changed from 3.6 to 3.7 to
no longer pass a pointer to the registers from the stub to the entry syscall
function. Instead, the entry syscall function now retrieves the registers
using current_pt_regs(), and passes them to the do_execve() function. The DTrace
stub and override function have been updated to use the same mechanism.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: add support for an SDT probe getting called from multiple functions

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: move SDT call location for surrender probe

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: USDT implementation (Phase 1)

This rather large patch provides the implementation for USDT support at the
DTrace kernel core level, and the fasttrap provider level. It ensures that
executables can register their embedded providers (with USDT probes), that
the probes are visible to the dtrace userspace utility, and that probes are
properly removed upon executable completion, execve() invocation, or any
unexpected executable termination.

The following parts are provided by this patch:
- meta-provider support (dtrace_ptofapi)
- helper ioctl interface (dtrace_dev)
- DIF validation for helper objects (dtrace_dif)
- DOF processing for helpers for provider and probe definitions (dtrace_dof)
- fasttrap meta-provider for USDT only (fasttrap*)

The dtrace_helper.c file was removed because this code belongs in dtrace_dof.c
instead.

Minimal changes were made to the core kernel in exec.c, sched.h, exit.c, and
fork.c to add support for process-specific helpers (and those encapsulate
providers and probes).

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: remove incorrect FBT support code

Code that was supporting an early implementation of FBT was still left hanging
in the source code tree. This patch removes it in preparation for a more
generic and above all, correct (and stable) implementation.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: move psinfo to its own header file

The psinfo structure definition and related function prototypes have been moved
to their own header file to avoid including dtrace_os.h in sched.h. This is
also one step closer to providing the psinfo as a more generic piece of info,
rather than it being dtrace specific.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: update copyright statements

Various updates were made to files in the past months, and copyright statements
were never updated to reflect that.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

ctf: update the shared CTF file right after initialization

If we don't do this, the void and function pointer types are not available for
lookup until something else gets added to the shared type repository and
triggers an update, so insertions of types depending on such types into the
shared type repository will fail until that happens.  Since function pointers
are seen a lot in structures in the Linux kernel, such a failure for a member of
a structure means that all later members of the structure are skipped, and since
a major cause of insertion into the shared type repository is recursive
insertion from structure members in headers, this has the effect of losing track
of quite a lot of types first seen in translation units mentioned early on the
dwarf2ctf command line.  Fortunately, since they haven't been successfully
inserted, insertion is retried later and succeeds.  So this bugfix fixes a
latent bug only.

We would have seen this long ago as a bunch of error output were it not for a
bug in libdtrace-ctf leading to bad ID errors in CTF lookups that recurse to
parents not being emitted anywhere we were looking for them.  Fixing that bug
requires us to fix this one, lest we get bombed with error messages
henceforward.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: Improve debugging and indentation fixes

Reporting the type ID helps us determine when types used later on are
introduced.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: dwarf2ctf doc revisions

Mostly small rewordings and reshufflings for clarity, and the introduction of a
program-wide flow graph, and a flow graph in each section. (One hopes they
don't get out of date too fast.)

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: internal performance measurement support code

The dt_perf provider implements a few probes that are used in performance
(or more accurately put, overhead) measurements. It uses an ioctl()
interface to trigger N-count iterations of invoking probes through
various mechanisms, and a probe to post the results back to userspace.
This code also adds an SDT probe in the DTrace kernel support code, just
to measure overhead for triggering trap-based SDT probes.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

kvm / dtrace: disable KVM steal-time accounting when DTrace is in use

This feature does clocksource work in interrupt context, which interoperates
badly with the DTrace probes there.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: cosmetic improvements to CTF linking

DWARF2CTF is too long a tag: CTF aligns better.

We should also not carry around the CONFIG_MODULE_SIG ctf module-name-setting
code while module signing is not present in this kernel.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: remove a few obsolete probes

The handle_sysrq, init_module and delete_module probes were for testing purposes
only.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: cater for changes in the way the kernel is linked

3.6 started linking using a shell script, scripts/link-vmlinux.sh, rather than
the old hair in the top-level Makefile. The SDT stub creation has to move in
there too.

Also, skip debugging sections explicitly when hunting for SDT probes in objdump
output.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: miscellaneous 3.6 porting work

Missing headers, catering for header movement, the occasional missing
prototype, and changes in the way the syscall table is built.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: fix up rq.dtrace_cpu_info member

This moved into sched.h in 3.6, so the member must move there too.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

gitignore: Ignore objects.builtin and dwarf2ctf.

These are generated files.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: fix outright typos in the 3.6 forward-port.

The mass cherry-pick of 3.6 support introduced some build-breaking typos. This
fixes them (fixing here rather than where introduced for simplicity's sake and
because we don't really expect past history to be compilable against 3.6 without
changes).

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: remove obsolete static probe documentation

This is outdated and useless.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

ctf: DTrace-independent CTF

These largely cosmetic changes remove mention of DTrace from the CTF code,
making it clear that it is freely usable by non-DTrace consumers.

The changes are:

- dtrace_ctf.ko is now named ctf.ko, and is stored in kernel/ctf rather than
   kernel/dtrace, controlled by a new CONFIG_CTF Kconfig option select'ed by
   CONFIG_DTRACE.  (CONFIG_DT_DISABLE_CTF, being largely a DTrace debugging
   option, remains under DTrace configure control).  The function used to
   trigger loading of ctf.ko has changed name similarly, from
   dtrace_ctf_forceload() to ctf_forceload().

- The CTF section names have changed, from .dtrace_ctf.* to .ctf.* (which as a
   bonus is more obviously related to the .ctf directory long used to store the
   CTF data during the build process).

- The shared CTF repository is now stored in .ctf.shared_ctf instead of
   .dtrace_ctf.dtrace_ctf, making its intended use somewhat clearer.

These changes depend on a suitably changed libdtrace-ctf: a suitably changed
userspace is needed to take advantage of them.  The dtrace-kernel-interface is
bumped accordingly.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: do not build in CTF data for no-longer-built-in modules

The module-ctf-flags code in Makefile.modpost uses $(wildcard) to include CTF
for all built-in modules (named *.builtin.ctf) inside dtrace_ctf.ko without
needing to know what its name is. Unfortunately, if the .config is changed to
make a built-in module modular, or to not build it at all, a stale .builtin.ctf
file persists, and is built in to dtrace_ctf.ko despite the absence of any
module corresponding to it.

Since all .builtin.ctf files are regenerated (named .builtin.ctf.new) on every
dwarf2ctf run, the solution is to delete .builtin.ctf files without corresponding
.new files on every dwarf2ctf run.
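
The staleness test amounts to a check like the one sketched here.  The
function and its list-of-names interface are hypothetical; how the real build
machinery enumerates and deletes the files is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if 'ctf_name' (e.g. "foo.builtin.ctf") has no matching
 * "<ctf_name>.new" among the 'n' file names in 'present', meaning it
 * was not regenerated by this run and is a candidate for deletion.
 */
int builtin_ctf_is_stale(const char *ctf_name,
                         const char *const *present, size_t n)
{
    char want[512];
    size_t i;
    int len;

    assert(ctf_name != NULL && (present != NULL || n == 0));

    len = snprintf(want, sizeof(want), "%s.new", ctf_name);
    if (len < 0 || (size_t)len >= sizeof(want))
        return 0;  /* name too long to judge: leave the file alone */

    for (i = 0; i < n; i++)
        if (strcmp(present[i], want) == 0)
            return 0;
    return 1;
}
```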

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: document dwarf2ctf

dwarf2ctf is complex enough and has enough tricky corners that it really, really
needs some documentation.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: Extend the deduplication blacklist

A quick audit of the kernel suggests two more modules that need blacklisting
from deduplication due to #inclusion of translation units also used in other
modules while using #defines to change the definitions of structures defined in
those translation units. Neither module is built by default by OEL, and both
are ancient ISA sound cards so are unlikely to be encountered by anyone else
either.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: Improve error message on internal deduplication error

We should at least emit the name of the variable we are looking up (if possible)
as well as the name of the variable we are working over. (Often this will be
unknown, but not always.)

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: Note a future enhancement

In -DDEBUG mode, we should not assume that identically-named structures with
different numbers of members are declarations of the same structure with a
shared prefix, but should validate it. (The compiler cannot always verify this,
e.g. if the same structure is defined in multiple translation units.)

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: document parameters to die_to_ctf()

This function has enough parameters that systematically describing them is
necessary if one is not to become frequently confused.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: do not construct objects.builtin if CTF is not being built

This file is used only for the dwarf2ctf run inside Makefile.modpost. That run
is conditionalized on CONFIG_DT_DISABLE_CTF, so objects.builtin should be
conditionalized on that too.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: do not build dwarf2ctf nor attempt to use it if !CONFIG_DTRACE

We were already not using it for the CONFIG_DT_DISABLE_CTF case, but not
suppressing its build nor noting the !CONFIG_DTRACE case properly.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: additional action support (and bug fixes)

Removed ASSIST_* definitions because they are no longer necessary (though they
may come back in the future).

Changed the behaviour of DTrace in interrupt context to base it on in_irq()
rather than in_interrupt().

On Linux it is always safe to dereference current, so there is no need to do
special casing on various process-based DIF functions. There is no need to
fake values coming from the 0-pid process.

Added curcpu variables.

Added d_path() function. This takes a struct path and turns it into a string.

Renumbered the register IDs to match the xlator support at userspace, and to
also match the on-stack order of registers.

Have dtrace_getreg() operate on the task rather than just a set of registers,
because (in 64-bit mode) segment registers have their value stored in fields
in the process-specific task info.

Implemented the raise() action.

Changed the deadman interval to 10s, and timeout/user to 120s.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: add psinfo/cpuinfo OS level support

Added a member to the task structure, to hold DTrace specific process info.  It
stores (up to) the first 79 characters of the command line (with arguments),
as required for the pr_psargs elements in psinfo_t.  It also stores the number
of initial arguments (pr_argc), and the initial argument (pr_argv) and
environment variable (pr_envp) vectors.
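
Building the truncated command line can be sketched as below.  build_psargs()
and PSARGS_SZ are hypothetical names, and joining arguments with single spaces
is an assumption; only the 79-character limit comes from the description
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PSARGS_SZ 80  /* 79 characters plus the terminating NUL */

/*
 * Joins argv[0..argc-1] with single spaces into 'out', keeping at most
 * the first 79 characters.  'out' is always NUL-terminated.
 */
void build_psargs(char out[PSARGS_SZ], int argc, const char *const *argv)
{
    size_t used = 0;
    int i;

    assert(out != NULL && (argv != NULL || argc == 0));

    for (i = 0; i < argc && used < PSARGS_SZ - 1; i++) {
        const char *p = argv[i];

        if (i > 0)
            out[used++] = ' ';
        while (*p != '\0' && used < PSARGS_SZ - 1)
            out[used++] = *p++;
    }
    out[used] = '\0';
}
```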

This process information is pre-populated when an execve takes place in the
current task.  Note that if a process alters the arguments, this alteration
will be visible when pr_argv is consulted, and thus returned argument strings
may not match the originally provided values.

Enforce that invop handlers return a uint8 value, to be used in the future for
knowing what instruction to emulate upon return from an invop trap.

Added per-cpu CPU information that is used to provide cpuinfo_t data.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: change the DTrace startup handling (at boot time) for SDT

The DTrace OS level handling was initialized at DTrace module load, which
caused major indigestion on the side of the scheduler when SDT probe points at
crucial locations in the scheduler were being patched by one CPU while another
was trying to get some real work done.  Even a nice stop_machine() based
approach turned out not to be possible, because that *cough* depends on the
scheduler also.

Instead, the DTrace OS support is initialized from the Linux boot sequence,
before SMP is enabled, which removes the complications altogether (and it is a
lot cleaner and faster).  We also call CPU-specific initialization for DTrace
during the boot sequence, albeit *after* the CPUs have been identified for SMP,
to ensure that we get accurate information.

Renamed sdt_register.c to dtrace_sdt.c (for consistency), and implemented
better patching of SDT probe points.

Added a 'nosdt' kernel command line option to allow system-wide disabling of
SDT probe points (at the kernel level).  This can be used when the patching of
SDT probe points somehow causes a problem.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: cleanup (and adding) of SDT probe points

Changed io SDT probe points to be located at the buffer_head level rather than
the bio level. This may need to be revisited depending on further analysis,
but doing it this way provides consistent semantics that were not guaranteed by
the previous bio-based placement.

Changed the sched SDT probes to not pass irrelevant arguments, and to pass
specific runqueue CPU information. The CPU information is not available from
the task structure, so it needs to be passed explicitly.

Added proc SDT probes start and lwp-start.

Added proc SDT probes for signal-discard and signal-clear.

Corrected the argument to the exit proc SDT probe, which should indicate the
reason for the process termination (exit, killed, core dumped) rather than the
return code of the process.

Provided argument information for all the new (and changed) SDT probe points.
This depends on working xlator support in userspace.

Enabling of SDT probes now uses a generic dtrace_invop_(enable|disable) rather
than SDT-specific functions.

SDT probes are now destroyed correctly, to ensure that subsequent uses will not
result in unpleasant events.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

ctf: write the CTF files for standalone modules to a subdir of the module dir

Writing all ctf to the .ctf subdirectory of the kernel directory is problematic
when building standalone modules, when the kernel directory may be unwritable.
So write it to the .ctf subdirectory of the module directory in this case
as well. (The old .ctf relative path was hardwired into dwarf2ctf, so this
too is changed to accept the path to write the CTF files to as the first
parameter, in both non-standalone and standalone mode.)

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: unnamed structure/union support

Since neither CTF nor DTrace userspace has support for unnamed structure/union
members, we have to cheat a bit.  We can model an unnamed structure member as
being precisely equivalent to simply naming the structure members in the
enclosing structure, with their offsets biased by the offset of the unnamed
member in its enclosing structure.  An unnamed union is the same, excepting the
overlapping offsets, which we don't need to pay any attention to since we
already get all our offset information directly from the debugging information
anyway.

So we handle this by detecting an anonymous member after offset computation,
skipping to its type DIE's first child (if any), and calling die_to_ctf()
directly with that child, so that die_to_ctf() works over the anonymous member's
members as if they were members of the enclosing structure, skipping all the
usual addition of that structure as a CTF entity in its own right.  We handle
the offset biasing by adding a parent_bias to die_to_ctf() and all CTF
construction functions, and adding that bias to all structure member offsets.
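
A minimal sketch of the offset biasing described above, not the dwarf2ctf
code itself.  The struct member and struct flat_member types and the
flatten_members() function are hypothetical stand-ins for DWARF DIEs and CTF
construction; only the idea of threading a parent_bias through the recursion
is taken from this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for a structure member DIE. */
struct member {
	const char *name;              /* NULL for an unnamed struct/union member */
	size_t offset;                 /* offset within the directly enclosing type */
	const struct member *children; /* members of an unnamed member's type */
	size_t nchildren;
};

/* Hypothetical stand-in for a member as it would be added to CTF. */
struct flat_member {
	const char *name;
	size_t offset;
};

/*
 * Append the members in m[] to out[], adding parent_bias to every offset.
 * An unnamed member is never emitted itself: its children are emitted in
 * its place, biased by the unnamed member's own offset.  Returns the new
 * number of entries in out[].
 */
size_t flatten_members(const struct member *m, size_t n, size_t parent_bias,
		       struct flat_member *out, size_t nout, size_t max)
{
	for (size_t i = 0; i < n; i++) {
		if (m[i].name == NULL) {
			nout = flatten_members(m[i].children, m[i].nchildren,
					       parent_bias + m[i].offset,
					       out, nout, max);
		} else {
			assert(nout < max);
			out[nout].name = m[i].name;
			out[nout].offset = parent_bias + m[i].offset;
			nout++;
		}
	}
	return nout;
}
```

For a structure laid out as { int a; union { int b; char c; }; int d; } with
the unnamed union at offset 4, this yields a at 0, b and c both at 4, and d
at 8.  The overlap between b and c needs no special handling, as noted above.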

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: recurse_ctf() -> die_to_ctf()

This function is badly named: sure, it's recursive, but so are half a dozen
other functions in dwarf2ctf. Its callers do not care that it is recursive:
they care that its function is to translate a DWARF DIE to CTF.

So rename it accordingly.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: fix the signed-modules case

The code in Makefile.modpost's module-ctf-flags variable which computes the name
of the CTF file, given the name of the kernel module being linked, was torpedoed
by the name of the unsigned module that is linked when module signing is in use.
So introduce a new ctf-module-name variable that substitutes the name
appropriately given the state of module signing.

Also, fix up some related places where I used spaces instead of tabs by mistake.

(3.6: most signed-modules code omitted, but a bit of supporting code remains
in readiness for signed modules in 3.7.)

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: correctly propagate IDs for array types

We were constructing things of array type wrongly, for a rather interesting
reason.

DWARF describes an array by putting its base type in the parent DIE, then
describing its bounds in the child DIE.  This puts us in an awkward
position: we always visit the parent before its children (so we can
build things like structures before filling their members in) -- but you can't
build a CTF array without knowing what its bounds are ('flexible' is not the
same as 'we don't know yet') -- and we *do* need the base type to be constructed
anyway.  So we constructed the base type when working over the parent DIE, then
wrapped an array around it when we visited the children where the dimensions
were described (possibly more than one for a multidimensional array: though GCC
happens not to emit those for C, it is permitted to, and handling it is easy, so
we do).  Unfortunately, recurse_ctf() throws away the type ID returned from
child DIEs, so the CTF ID that gets stashed for assignment to things of that
array type (when they are looked up via lookup_ctf_type()) turns out not to be
an array type at all, but the base type!

So we add an optional override parameter to recurse_ctf() and to all the
construction functions: it is passed only by recursive recurse_ctf() calls (when
processing child DIEs): it is set nonzero by construction functions that wrap
and replace the ID from their parent DIE, and when it is set, the type ID
returned by the construction function replaces the type ID that recurse_ctf()
passes back to its parent.  Now assemble_ctf_array_dimension() just needs to
set this override parameter, and all is well.
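
A minimal sketch of the override mechanism, under heavy simplification.  The
struct die, struct type_table, construct() and translate_die() names are
hypothetical and do not come from dwarf2ctf; the sketch only shows how a
child that wraps its parent's ID can replace the ID passed back up.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TYPES 16

/* Hypothetical, much-simplified stand-ins for DWARF DIEs and a type table. */
enum die_kind { DIE_ARRAY, DIE_SUBRANGE };

struct die {
	enum die_kind kind;
	int elem_id;                /* DIE_ARRAY: ID of the element type */
	size_t nelems;              /* DIE_SUBRANGE: number of elements */
	const struct die *children;
	size_t nchildren;
};

struct array_type {
	int elem_id;
	size_t nelems;
};

struct type_table {
	struct array_type arrays[MAX_TYPES]; /* indexed by type ID */
	int next_id;                         /* next unused type ID */
};

/*
 * Construct the type for one DIE.  A construction step that wraps and
 * replaces the ID from its parent DIE sets *override nonzero.
 */
static int construct(struct type_table *t, const struct die *d,
		     int parent_id, int *override)
{
	if (d->kind == DIE_SUBRANGE) {
		assert(t->next_id >= 0 && t->next_id < MAX_TYPES);
		int id = t->next_id++;

		t->arrays[id].elem_id = parent_id; /* wrap the parent's type */
		t->arrays[id].nelems = d->nelems;
		if (override != NULL)
			*override = 1;
		return id;
	}
	/* DIE_ARRAY: bounds not known yet, so hand back the element type. */
	return d->elem_id;
}

/* Translate a DIE and its children; override is NULL for top-level calls. */
int translate_die(struct type_table *t, const struct die *d,
		  int parent_id, int *override)
{
	int id = construct(t, d, parent_id, override);

	for (size_t i = 0; i < d->nchildren; i++) {
		int child_override = 0;
		int child_id = translate_die(t, &d->children[i], id,
					     &child_override);

		if (child_override)
			id = child_id; /* the child's ID replaces ours */
	}
	return id;
}
```

Without the child_override check, translate_die() would return the element
type's ID for an array DIE, which is the bug described above.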

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: fix off-by-one in emitted array bounds

We were treating arrays described by DW_AT_count and DW_AT_upper_bound
identically, but in a language like C with zero-based arrays they are not:
DW_AT_upper_bound does not give the number of members unless you add one
to it.
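
A minimal sketch of the distinction, assuming a zero-based lower bound as in
C.  The struct subrange type and subrange_nelems() are hypothetical and only
model the two attributes named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the bounds found on a subrange DIE. */
struct subrange {
	bool has_count;        /* DW_AT_count is present */
	bool has_upper_bound;  /* DW_AT_upper_bound is present */
	size_t count;
	size_t upper_bound;
};

/*
 * Return the number of elements of a zero-based array.  *known is set to
 * false when neither attribute is present, in which case 0 is returned.
 */
size_t subrange_nelems(const struct subrange *sr, bool *known)
{
	assert(sr != NULL && known != NULL);

	*known = true;
	if (sr->has_count)
		return sr->count;           /* already a number of elements */
	if (sr->has_upper_bound)
		return sr->upper_bound + 1; /* highest valid index: add one */

	*known = false;
	return 0;
}
```

So an array declared as int a[10], described with an upper bound of 9, has
ten elements, while one described with a count of 10 needs no adjustment.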

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: fix tiny comment typo

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: blacklist certain modules from deduplication

sound/pci/ens1371.c #includes another file, ens1370.c, with a #define
that changes the definition, but not name, of a single structure.

While this grotesquerie is permitted in C, there's no way that translation units
that engage in it can be permitted to share types with other translation units.
More specifically, types defined in such TUs must not be permitted to transform
a non-shared type to shared by virtue of their being detected in such TUs.

I'd like to detect the redefined structures themselves, but since the
preprocessor trickery leaves no mark in the DWARF another pass would be
necessary just to detect this. It's easier -- and faster -- to introduce a
blacklist of modules that do things like this and simply turn deduplication
scanning off for these modules. (Because they are still allowed to reuse
duplicates found in other modules, this does not increase their size
appreciably.)
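
A minimal sketch of the effect of such a blacklist, assuming for
illustration that a type becomes shared once two different modules define
it.  The struct sighting type, the function names, and the module and type
names used with them are hypothetical, not the dwarf2ctf implementation or
its blacklist format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One record per (type, module) pair found while scanning. */
struct sighting {
	const char *type_name;
	const char *module;
};

static bool is_blacklisted(const char *module,
			   const char *const *blacklist, size_t nblacklist)
{
	for (size_t i = 0; i < nblacklist; i++)
		if (strcmp(module, blacklist[i]) == 0)
			return true;
	return false;
}

/*
 * A type is treated as shared when two different modules define it.
 * Sightings in blacklisted modules are ignored, so such a module can never
 * turn a non-shared type into a shared one.
 */
bool type_is_shared(const char *type_name,
		    const struct sighting *seen, size_t nseen,
		    const char *const *blacklist, size_t nblacklist)
{
	const char *first = NULL;

	assert(type_name != NULL);
	for (size_t i = 0; i < nseen; i++) {
		if (strcmp(seen[i].type_name, type_name) != 0)
			continue;
		if (is_blacklisted(seen[i].module, blacklist, nblacklist))
			continue;
		if (first == NULL)
			first = seen[i].module;
		else if (strcmp(first, seen[i].module) != 0)
			return true;
	}
	return false;
}
```

A type that other modules have already made shared stays shared, which is
why a blacklisted module can still reuse it.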

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

ctf: include enumeration types inside functions

Like structures and unions, enumerations are a named type in their own
namespace: like structures and unions, arrays and other types based on such
types are represented by a DWARF DIE outside all functions. So the duplicate
detector must treat them like structures and unions, and include them even
if they are inside functions.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: new IO and sched provider probes

New IO provider probes: start, done, wait-start, wait-done

New sched provider probes: enqueue, dequeue, wakeup, preempt, remain-cpu,
change-pri, surrender

(Note that the preempt probe currently passes a debugging argument that
will be removed in the future to match the argument-less version in the
documentation.)

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: fix to handle multiple SDT-based probes in a single function

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>

dtrace: require assembler symbol stripping and debug info

The former is needed because dt_module.c doesn't know how to ignore assembler
labels when reading module symbol data: the latter because dwarf2ctf reads the
types out of debug info.

Use select rather than require because both of these requirements are distinctly
non-obvious and we don't want to force people to hunt about for them.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>