The dtrace_dif_variable() function is inlined during some compilations
and not during others, causing the number of frames to be skipped in
DTrace kernel stack traces to not be a constant. That causes incorrect
values for stackdepth to be reported.
This commit requests dtrace_dif_variable() to always be inlined, and
adjusts the aframes values to account for the inlining.
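To illustrate why a varying number of artificial frames distorts the result, here is a minimal userspace sketch. The helper name and the sample values are hypothetical and are not taken from the DTrace sources; it only shows that the reported depth and the first real frame both depend on the aframes value being exact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pcs[] holds 'captured' program counters, the
 * first 'aframes' of which were pushed by the tracing machinery itself.
 * Returns the depth to report and stores the first real frame in *top.
 */
size_t skip_artificial_frames(const uint64_t *pcs, size_t captured,
			      size_t aframes, uint64_t *top)
{
	assert(pcs != NULL && top != NULL);

	if (captured <= aframes) {
		*top = 0;		/* nothing left after skipping */
		return 0;
	}
	*top = pcs[aframes];
	return captured - aframes;
}
```

If inlining removes one frame but aframes is left unchanged, both the depth and the top frame are off by one, which is the symptom described above.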
Orabug: 25872472 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Kris Van Hees [Wed, 15 Mar 2017 16:52:01 +0000 (12:52 -0400)]
dtrace: incorrect aframes value and wrong logic messes up caller and stack
Due to a mistake in how we compensate for the potential ULONG_MAX
sentinel value being added to kernel stacks on x86_64 (by the
save_stack_trace() function), the caller was always reported as 0.
This in turn was hiding a problem with the aframes values that are
used to ensure we skip the right number of frames when reporting a
stack, caller, and calculating the stackdepth. Effectively, it tells
the stack walker how many frames were added to the stack due to DTrace
processing.
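The sentinel compensation can be pictured with the following sketch. It is an assumption-laden userspace illustration, not the module code: the helper name is hypothetical, and it simply drops a trailing ULONG_MAX entry before anything else looks at the trace.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: drop a trailing ULONG_MAX sentinel, if present,
 * and return the number of genuine entries in the trace.
 */
size_t trim_sentinel(const unsigned long *entries, size_t nr_entries)
{
	assert(entries != NULL || nr_entries == 0);

	if (nr_entries > 0 && entries[nr_entries - 1] == ULONG_MAX)
		nr_entries--;
	return nr_entries;
}
```

Only the last entry is examined, so a trace without the sentinel keeps its length.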
Orabug: 25727046 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Wed, 15 Mar 2017 03:20:52 +0000 (23:20 -0400)]
dtrace: ensure we pass a limit to dtrace_stacktrace for stackdepth
When determining the (kernel) stackdepth, we pass scratch memory to the
dtrace_stacktrace() function because we are not interested in the actual
program counter values. However, we were passing in 0 as limit rather
than the actual maximum number of PCs that could fit in the remaining
scratch memory space.
We now also add no-fault protection to dtrace_getstackdepth().
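As a rough sketch of the limit computation, assuming the scratch region is described by a base address, a size, and a pointer to the next free byte (all hypothetical parameter names, and ignoring any alignment the real code may apply):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'base' and 'size' describe the scratch region and
 * 'ptr' is the next free byte in it. Returns how many 64-bit PC values
 * fit in the space that remains.
 */
size_t scratch_pc_limit(uintptr_t base, size_t size, uintptr_t ptr)
{
	assert(ptr >= base && ptr - base <= size);

	return (size - (size_t)(ptr - base)) / sizeof(uint64_t);
}
```

Passing this value instead of 0 lets the stack walker know how many entries it may write.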
Orabug: 25559321 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Fri, 3 Mar 2017 02:02:01 +0000 (21:02 -0500)]
dtrace: continuing the FBT implementation and fixes
This commit continues the implementation of Function Boundary Tracing
(FBT) and fixes various problems with the original implementation and
other things in DTrace that it caused to break. It is done as a single
commit due to the intertwined nature of the code it touches.
1. The sparc64 fast path implementation (dtrace_caller) for the D 'caller'
variable was trampling the %g4 register which Linux uses to hold the
'current' task pointer. By passing in a dummy argument, we ensure
that we can use the %i1 register to temporarily store %g4.
2. For consistency, we are now using stacktrace_state_t instead of
struct stacktrace_state.
3. We now call dtrace_stacktrace() under NOFAULT protection.
4. The ustack stack walker has been rewritten (in the kernel), so the
previous implementation has been removed.
5. We no longer process probes when the kernel panics, to avoid DTrace
disrupting output that could be crucial to debugging.
6. We now ensure that re-entry of dtrace_probe() can no longer happen,
except for the ERROR probe (which is a re-entry by design).
7. Since FBT now works, the restriction to only support SyS_* functions
has been removed.
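The re-entry rule in item 6 can be sketched as a simple guard. This is a userspace illustration under stated assumptions: the function names are hypothetical, and a single counter stands in for what would be per-CPU state in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU state: nesting level of probe processing. */
static int in_probe;

/*
 * Hypothetical guard: returns true if probe processing may start.
 * Only the ERROR probe is allowed in while another probe is active.
 */
bool probe_enter(bool is_error_probe)
{
	if (in_probe > 0 && !is_error_probe)
		return false;	/* refuse re-entry */
	in_probe++;
	return true;
}

/* Must be called once for every successful probe_enter(). */
void probe_exit(void)
{
	assert(in_probe > 0);
	in_probe--;
}
```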
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Signed-off-by: Tomas Jedlicka <tomas.jedlicka@oracle.com> Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Orabug: 21220305
Orabug: 24829326
Alan Maguire [Mon, 23 Jan 2017 15:18:31 +0000 (15:18 +0000)]
dtrace: introduce and use typedef in6_addr_t
This is for consistency with the similar typedef in_addr_t: we have
to use the typedef in at least one place in the module so that the
compiler incorporates it into the DWARF and it ends up in the CTF
section. (Both the DTrace ip translators and, likely, the users
would expect that if one typedef exists, the other one does too.)
Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Orabug: 25557554
Kris Van Hees [Sat, 17 Dec 2016 23:08:44 +0000 (18:08 -0500)]
dtrace: SDT cleanup and bring in line with kernel
This commit performs some cleanup on the SDT provider, removing some
housekeeping tasks that are no longer needed (such as the need for an
arch-specific sdt_provide_module_arch() function).
This commit also contains a fix for the loop used in enabling and
disabling probes. It was failing to ensure that the enable/disable
function was being called with the correct SDT probe.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Sat, 17 Dec 2016 23:08:44 +0000 (18:08 -0500)]
dtrace: fix preemption checks
The macros to verify whether the current execution can be preempted
were wrong. This commit fixes that. It also ensures that we call the
functions (or macros) provided for enabling/disabling preemption by
the kernel itself.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Sat, 17 Dec 2016 23:08:44 +0000 (18:08 -0500)]
dtrace: when calling all modules do not forget kernel
For DTrace, the kernel is represented as a pseudo-module. When a loop
is made over all loaded modules in order to call a function for each
one of them, we need to also call that function for the kernel
pseudo-module.
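A minimal sketch of such a loop follows. The struct, the callback type, and the function names are hypothetical stand-ins; the only point is that the kernel pseudo-module is visited in addition to the regular modules.

```c
#include <assert.h>
#include <stddef.h>

struct mod {			/* hypothetical stand-in for struct module */
	const char *name;
};

typedef void (*mod_fn)(struct mod *, void *);

/*
 * Hypothetical walker: call fn on the kernel pseudo-module first, then
 * on each loaded module. Returns the number of calls made.
 */
size_t for_each_module(struct mod *kernel, struct mod *mods, size_t nmods,
		       mod_fn fn, void *arg)
{
	size_t i;

	assert(kernel != NULL && fn != NULL);

	fn(kernel, arg);	/* the pseudo-module is easy to forget */
	for (i = 0; i < nmods; i++)
		fn(&mods[i], arg);
	return nmods + 1;
}

/* Example callback: count the modules visited. */
void count_module(struct mod *m, void *arg)
{
	(void)m;
	(*(size_t *)arg)++;
}
```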
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Sat, 17 Dec 2016 23:08:44 +0000 (18:08 -0500)]
dtrace: remove cleanup_module support
There is no need anymore for providers to call a cleanup_module
function in provider modules. The functionality that this function
provided is being rewritten.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Wed, 23 Nov 2016 18:24:10 +0000 (18:24 +0000)]
dtrace: is-enabled probes for SDT
This is the module side of the is-enabled probe implementation. SDT
distinguishes is-enabled probes from normal probes by the leading ? in
their sdpd_name; at probe-firing time, the arch-dependent code arranges
to return 1 appropriately.
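The name-based distinction could look roughly like the sketch below. The helper is hypothetical, and skipping past the marker is an assumption made for illustration; the text above only states that the leading ? identifies an is-enabled probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether a probe name carries the leading
 * '?' marker and, if so, advance *name past it.
 */
bool is_enabled_probe(const char **name)
{
	assert(name != NULL && *name != NULL);

	if ((*name)[0] != '?')
		return false;
	(*name)++;
	return true;
}
```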
On x86, also arrange to jump past the probe's NOP region. There was no
need to do this before now, because a trap followed by a bunch of NOPs
is a perfectly valid instruction stream: but is-enabled probes have a
three-byte sequence implementing "xor %rax, %rax", and overwriting only
the first byte of that leaves us with a couple of bytes that must be
skipped. On SPARC, we drop the necessary return-value-changing
instruction into the delay slot of the call that used to be there
before we overwrote it with NOPs: the instruction already there
is setting up the function argument-and-return-value, which is 0
when the probe is disabled, so we can overwrite it safely.
(We make minor adjustments to allow sdt_provide_probe_arch() to
safely modify the sdp_patchpoint.)
Finally, add a test use of an is-enabled probe to dt_test, used by the
DTrace testsuite.
[nca: sparc implementation, ip address adjustment, commit msg] Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 25143173
Nick Alcock [Wed, 14 Sep 2016 10:05:51 +0000 (11:05 +0100)]
dtrace: add a test probe with an empty translation or two
The empty translation case gets tested nowhere in the kernel, so add a
probe to dt_test with a couple of args that get thrown away by the
translation machinery.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 24661801
Nick Alcock [Wed, 14 Sep 2016 09:52:17 +0000 (10:52 +0100)]
dtrace: parse sdpd_args to handle sdt_getargdesc() rather than hardwiring
This takes the new sdpd_args element in the sdt_probedesc_t and
translates it into argdesc array entries for sdt_getargdesc() to return.
This is not really a parser but a tokenizer, splitting the input string
into "tokens" which are arg strings, translators, etc; but tokenizers,
like parsers, are often fairly squirrelly and full of annoying special
cases, and that is no less true here. The most notable:
- you can have *no* arguments, either via an empty translation with
brackets "foo : ()" or without "foo : " or simply via having a
probe with zero arguments to start with: this might be represented
as an sdpd_args of NULL.
- arg strings often end in trailing commas because of details of the
macros that produce them, so arguments do not match up one-to-one
with commas, even ignoring the rare case of empty translations
(though, because translations also use commas for the same purpose,
they nearly do). It's OK to overcount a bit when allocating memory
for the array, but it has to be right by the time we're done.
- perf probes have their arg strings in a subtly different format to
native probes. The perf arg string is used by tracepoint.h to define
a function, so it's not a string of types, but rather a string of
arguments, complete with names. Thankfully you can't hand a perf
probe a function pointer, and arrays have decayed to pointers by this
stage, so we don't need to deal with the full horror of C declarator
syntax: rather, we just need to note if a probe name starts with
__perf_ and strip off the last run of alphanumeric and _ characters
from each argument if so (followed by any trailing whitespace thus
exposed).
- native probes with no arguments simply have no arg string (NULL).
Perf probes with no arguments have the string 'void', so identify
that and strip it off.
Once we have the argdesc constructed, adjusting sdt_getargdesc() is
simple: it actually gets quite a lot shorter and faster because we can
treat the sdp_argdesc array as an array rather than looping through it
comparing provider and probe names and indices: the index in the array
*is* the index of the (possibly translated) argument.
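The perf-specific stripping step described in the list above might be sketched as follows. This is a standalone illustration with a hypothetical function name, operating on one argument at a time; it is not the tokenizer from the commit.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: turn a single perf-style argument such as
 * "struct foo *bar" into its type, "struct foo *", in place.
 * An argument of "void" becomes the empty string.
 */
void strip_perf_arg_name(char *arg)
{
	size_t len;

	assert(arg != NULL);

	if (strcmp(arg, "void") == 0) {
		arg[0] = '\0';
		return;
	}

	len = strlen(arg);
	/* Drop the trailing identifier: alphanumerics and underscores. */
	while (len > 0 && (isalnum((unsigned char)arg[len - 1]) ||
			   arg[len - 1] == '_'))
		len--;
	/* Drop any whitespace this exposes. */
	while (len > 0 && isspace((unsigned char)arg[len - 1]))
		len--;
	arg[len] = '\0';
}
```

Because function pointers and arrays do not occur at this stage, the name is always the final identifier, which is why a backwards scan is enough.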
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 24661801
Nick Alcock [Tue, 4 Oct 2016 10:13:45 +0000 (11:13 +0100)]
SPEC: dtrace-module specfile revamp.
Merge both the specfiles into one, and remove all conditionalizing for
older module versions: the specfile now relates only to the current
version, but applies to both OL6 and OL7.
The string @SOURCE_TARBALL@ must be replaced with the name of the source
tarball before building. (The release autobuilders already do this.)
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Nick Alcock [Fri, 12 Aug 2016 18:00:10 +0000 (19:00 +0100)]
dtrace: USDT SPARC parts
This commit adds SPARC USDT support. It's a little simpleminded and has
the same faults as SPARC SDT with regard to register windows and
arguments > 5: this probably needs fixing in both places at the same
time.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 24455245
Kris Van Hees [Mon, 23 May 2016 17:38:49 +0000 (10:38 -0700)]
dtrace: ensure pdata and sdt_tab handling works on module reload
The handling of the sdt_tab member in pdata caused crashes when the sdt
module was unloaded and then reloaded. This member holds the trampolines
for SDT probe points in kernel modules, and is allocated when a module is
loaded (and resides in modules-only address space). This memory block was
incorrectly freed from the sdt module code when sdt was unloaded.
This commit also adds verification that the anticipated trampoline max
size as used in the kernel is sufficient for what the sdt module needs.
This commit also adds verification (at runtime, with an assert) that the
reserved allocation to hold the pdata for a module is of a sufficient
size.
Orabug: 23331667 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Nick Alcock [Fri, 29 Jan 2016 14:53:46 +0000 (14:53 +0000)]
dtrace: use copy_from_user() when walking userspace stacks
We were using get_user(), but that doesn't reliably work on all
platforms (such as SPARC64) and cannot trap faults, which meant we were
jumping through extra hoops to trap faults when copy_from_user() does
that anyway.
The extra copy is notably less efficient (since we end up looping over,
and copying, essentially the entire user stack in one-word increments),
but has the advantage of actually working.
Orabug: 22629102 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Nick Alcock [Fri, 29 Jan 2016 14:47:03 +0000 (14:47 +0000)]
dtrace: do not overrun the start of the user stack
When scanning user stacks in dtrace_getufpstack(), we iterate from the
current stack pointer back to the start of the stack, getting the
unsigned long at each location and seeing if we can interpret it as a
pointer.
However, since the stack grows down on all platforms supported by
DTrace, the 'start' of the stack is the end of the VMA -- so we should
stop one unsigned long before the beginning, or we'll try to read off
the end (harmlessly, but still.)
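The loop bound can be illustrated with a small sketch. It assumes the scan moves upwards from the stack pointer towards the end of the mapping; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the word-sized slots that may be read when
 * scanning upwards from 'sp' in a stack whose mapping ends at 'vm_end'
 * (exclusive). The last readable word starts at
 * vm_end - sizeof(unsigned long).
 */
size_t readable_stack_words(uintptr_t sp, uintptr_t vm_end)
{
	size_t words = 0;

	assert(sp <= vm_end);

	for (; sp + sizeof(unsigned long) <= vm_end;
	     sp += sizeof(unsigned long))
		words++;
	return words;
}
```

A bound of sp <= vm_end would instead attempt one read that starts at the very end of the mapping.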
Orabug: 22629102 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Nick Alcock [Tue, 26 Jan 2016 17:53:50 +0000 (17:53 +0000)]
dtrace: fix access to uregs[R_L7]
An off-by-one bug causes this access to happen relative to REG_I0 rather
than REG_L0, leading to an invalid memory access (trapped by DTrace, so
no undefined behaviour is incurred, only a spurious ERROR firing).
Orabug: 22602870 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Mon, 18 Jan 2016 10:28:30 +0000 (05:28 -0500)]
dtrace: correct probe disable behaviour for syscalls
Previously, when both entry and return probes were enabled for a
syscall, upon disabling one of them, the function pointer in the
syscall table would already be reset to the default, removing the
interceptor. This resulted in an inconsistent state when the
2nd probe would get removed, and could cause a nasty race if one
were to try to enable one of the probes in between.
We now only remove the interceptor when we know the last probe
is being disabled.
Orabug: 22352636 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
The dtrace modules packages have long depended on their associated kernel, in
order to cause them to be removed when the associated kernel is removed. It
turns out that the installonly feature (which forces installation of kernel
packages and removal of old ones rather than upgrades on 'yum upgrade') fails to
cope with this situation: you get a broken-packages notice rather than a removal
of the dependent package.
So remove the dependency, and instead install an at job from a %postun trigger
that removes old modules a little later. If this hits the rpm lock, it will fail
and leave modules around; if the kernel is later reinstalled, it will remove a
module that has a corresponding kernel still installed. However, both of these
cases are harmless: the first case is expected to be extremely rare (you'd have
to be, by chance, doing an rpm run precisely four hours after the upgrade that
removed the old kernel) and has no negative consequences but the loss of a bit
of disk space to a useless kernel module; the second case is harmless because
the next dtrace run against that kernel will reinstall the module anyway.
Orabug: 21669543 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Wed, 14 Oct 2015 12:09:29 +0000 (08:09 -0400)]
dtrace: Support Linux-specific handling of envp / argv in psinfo
The implementation of retrievable envp and argv psinfo in Linux
requires those arrays to be located in kernel memory, whereas on
other systems with DTrace implementations they are found in
userspace memory. Therefore, scripts expect to be able to access
this memory using copyin(). We look at the address passed in for
a copyin operation (or copyinstr) and if it is one of these special
cases, we simply pretend to retrieve data from userspace while in
reality we're simply retrieving the data from kernel space.
Orabug: 21984854 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Nick Alcock [Tue, 6 Oct 2015 21:06:28 +0000 (22:06 +0100)]
dtrace: add missing dtrace_*canload() for copyout() and copyoutstr().
On Solaris, where unprivileged tracing is permitted and zone tracing is
implemented, this is a security hole since it allows breaking through
both zone and unprivileged-dtrace boundaries. Linux does not implement
either of these, so this fix is currently unobservable here.
Originally reported as a Solaris DTrace bug, it seems worth fixing here
too, against the day when we implement unprivileged tracing.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
dtrace: ensure dt_perf does not clash with dt_test
The dt_perf provider module (only used for internal testing) mistakenly
registered its device file with the same minor number as dt_test. This
made it impossible for both to be loaded at the same time.
Orabug: 21814949 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Kris Van Hees [Tue, 18 Aug 2015 21:47:27 +0000 (17:47 -0400)]
dtrace: provide OL6 and OL7 spec file with new features
Because of some small differences in building the DTrace modules for OL6
vs OL7, two different spec files are currently used. This commit removes
the old single spec file, and introduces the two specific ones.
This commit also adds support for module signing.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
This commit introduces pdata_init() and pdata_cleanup() to allow
an architecture to perform arch-dependent operations on the pdata
information prior to assigning it to the module, and right before
getting rid of it (at module unloading time).
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Kris Van Hees [Tue, 9 Jun 2015 05:07:54 +0000 (01:07 -0400)]
dtrace: kernel provides SDT trampoline area on SPARC
The allocation of the SDT trampolines was done previously using vmalloc
which may cause the trampolines to be too far away from the code that
they provide a call to dtrace_probe() for, making it impossible to put
a jump to the trampoline in a single instruction at the probe location.
By using module_alloc on SPARC, the trampolines are allocated in the
memory region where modules live, which is by design within the jump
range.
The allocated memory is known to be of sufficient size for trampolines,
yet its actual use is not determined at the kernel level. It is simply
provided as a chunk of memory in the appropriate range.
When requesting a userspace stack trace, the initial frame instruction
pointer should be recorded as frame 0, with the remainder of the stack
trace being filled in based on the stack content. Previously, all the
IP values were taken from the stack. Special handling is provided for
obtaining the correct value of the stack pointer because in pre-4.1
kernels, there isn't an arch-independent way to do so. Once support
for 4.0 is no longer necessary, this can be generalized by using the
current_user_stack_pointer() macro.
If the current task is not a userspace task, an empty stack trace is
returned (and ustackdepth will also report 0).
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Kris Van Hees [Tue, 23 Jun 2015 10:51:35 +0000 (06:51 -0400)]
dtrace: validate argument pointer to d_path()
When an invalid pointer was being passed to d_path(), the system could
crash with an OOPS. This was the result of the kernel implementation
(reasonably) expecting the pointer to be referencing a valid path
struct. We now validate the argument passed to d_path() against the
paths for files known to the current task.
Kris Van Hees [Thu, 4 Jun 2015 14:08:05 +0000 (10:08 -0400)]
dtrace: support USDT for 32-bit applications on 64-bit hosts
A 32-bit application on a 64-bit host was not able to register USDT
probes because the helper ioctl interface was not hooked up to the
compat_ioctl file operation. This has been corrected.
Nick Alcock [Fri, 8 May 2015 13:20:37 +0000 (14:20 +0100)]
dtrace: use the initial user namespace in suitable {from,make}_kuid() calls
There are several places in DTrace (mostly related to privileged or destructive
operations or unprivileged tracing) where we try to compare uids for equality,
thus need to convert them from or to kuid_ts so we can do that. We want to look
in the initial user namespace for this (since it is only in that namespace that
all uids on the system are unambiguous). We were doing this by passing a NULL
to from_kuid() / make_kuid(), but in the presence of CONFIG_USER_NS this results
in dereferencing a null pointer.
So acquire the initial user namespace from a temporary kernel-thread creds
structure, and use it in all such places.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Nick Alcock [Thu, 7 May 2015 14:19:07 +0000 (15:19 +0100)]
dtrace: use the current user namespace for DIF_VAR_[UG]ID lookups
These lookups are not used for authentication, but rather are passed back
to DTrace itself: it seems reasonable that in this case the user would expect
them to be relative to the user namespace of the current process.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Nick Alcock [Mon, 16 Feb 2015 15:38:52 +0000 (15:38 +0000)]
Revise dependencies to get out of the shadow of dtrace-modules-headers.
Before the big dependency revamp in 0.4.3 (in the kernel 3.8.13-22 era), the
headers shared between kernel and userland resided in a versioned package named
something like dtrace-modules-3.8.13-21-headers, which provided a
dtrace-modules-headers symbol as well for users to pull in. Unfortunately some
very old packages both had unversioned provides of the same symbol, and had
versioned provides with a numeric scheme indicating compatibility, starting at
'1'. We could use epochs to force 0.4.5-5 to be greater than 1, but nothing
will get us out of the shadow of the unversioned symbol: these are always
considered both greater and less than all other symbols, leading to wildly
counterintuitive behaviour when yum does dependency resolution on them.
So get out from under their shadow: rename the dtrace-modules-headers package to
dtrace-modules-shared-headers, obsolete the old package so that
already-installed copies are upgraded appropriately, and provide
dtrace-modules-headers 1:1 -- epoched, so that it's higher in version than any
we have ever provided. Older userspace should pick up that epoch and upgrade
accordingly, newer userspace will use the new name. Unfortunately nothing can
stop older packages from attempting to pick up ancient kernels -- the
unversioned provide is out there, and nothing can remove it. But installs of
new dtrace-utils, at least, will work, as will updates: all that may break
is explicitly putting the older packages (but not the newer) into your own
yum repo and installing from that: an unimportant use case.
As usual with package configuration changes, we have to bump the module version
number and introduce the new name only in the new version: the older stanzas
will still be used when building old security errata modules, and we don't
want to introduce the new name there. Even there, so little has changed that
we can share nearly all the RPM headers between 0.4.3 and 0.4.4: only
the Obsoletes/Provides needs to be special-cased.
Orabug: 20508087 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Nick Alcock [Fri, 10 Apr 2015 23:00:04 +0000 (00:00 +0100)]
dtrace: no longer expose kuid_t in the userspace dtrace API
The public header installed as <linux/dtrace/stability.h> exposed
<linux/uidgid.h> to userspace as part of the dtrace_ppriv_t.dtpp_uid member.
This member (used for unprivileged tracing) is part of a facility that is not
yet ported, but using a kuid_t for this is clearly wrong, and as of kernel 4.0
won't compile when used in userspace either.
Fix by migrating to a uid_t and converting it to a kuid at the point of use.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Tue, 10 Feb 2015 17:31:14 +0000 (12:31 -0500)]
dtrace: fix dtrace_helptrace_buffer memory leak
When the help tracing facility is enabled in DTrace, upon loading the
DTrace core module, a buffer was being allocated using vmalloc(), yet
it was never freed upon unloading of the dtrace module. This caused a
leak of (by default) 64K with every load of the dtrace module. This
commit ensures that the memory is freed.
The commit also fixes the problem that the help tracing facility
variables in DTrace were defined in two places.
Kris Van Hees [Tue, 10 Feb 2015 17:18:39 +0000 (12:18 -0500)]
dtrace: support building on UEK4
Support building DTrace modules on UEK4. Various things changed at
the kernel level between UEK3 and UEK4 that require adjustments in the
building of the DTrace modules.
- ARCH no longer reflects the difference between x86 and x86_64. So,
we now use UTS_MACHINE to drive the architecture-specific portions
of DTrace during the building process.
- The trick used to implement a direct call probe in dt_test_probe()
required updating to avoid compiler warnings/errors. It is a little
bit less "ugly" now :)
- The uid and gid used in the task structure now use kuid_t and kgid_t
as datatypes, which are no longer numeric values but rather structs.
- The API for the IDR facility in the Linux kernel changed.
- The flush_delayed_work_sync() function has been removed. Source code
has been updated to use flush_delayed_work().
- The mechanism to enforce turning preemption on and off has been
updated.
Kris Van Hees [Tue, 10 Feb 2015 17:12:27 +0000 (12:12 -0500)]
dtrace: add support for DTrace on sparc64
This commit adds support for sparc64 to the DTrace modules. It also
includes some changes to the arch-independent code, to account for
some extra support pieces that are necessary for sparc64 without needing
to unnecessarily increase the portion of arch-dependent code.
- Add sparc64 implementations for arch-specific portions of DTrace.
- Add support for a provider API function (dtps_cleanup_module) to be
called for modules when a provider module is being unloaded. When
defined, this function can take care of any final cleanup that may
be necessary. This facility is used by the SDT code on sparc64 to
clean up the trampolines for the SDT probes.
- Add support for the pdata member in the module struct. This member
(generic pointer) can be populated with a pointer to a structure that
holds implementation specific DTrace data for the module. Each arch
must define dtrace_module_t (in include/<arch>/dtrace/mod_arch.h),
containing at a minimum:
size_t sdt_probe_cnt
int sdt_enabled
size_t fbt_probe_cnt
For sparc64 there is also a sdt_instr_t *sdt_tab member that will
hold a memory block for SDT trampolines.
The dtrace_module_t structs are allocated from a kmem cache. For
modules that exist before dtrace is loaded, the pdata member is
populated during the loading of dtrace. Modules loaded after dtrace
get it populated from a module notifier. When modules are unloaded,
the module notifier cleans up the pdata member. When dtrace itself
is unloaded, all remaining modules have their pdata member cleaned
up.
- Provide a generic method for calling a function on every loaded
module in the absence of a kernel facility to allow modules access
to the actual list of loaded modules. This adds an exported function
Kris Van Hees [Tue, 10 Feb 2015 16:33:19 +0000 (11:33 -0500)]
dtrace: restructuring to support DTrace on multiple architectures
Restructure the DTrace modules code to facilitate supporting multiple
architectures (rather than just x86_64).
- The assembler implementation of support functions is now in a file
named dtrace_asm_<arch>.S and arch-specific aspects are found in
dtrace_isa_<arch>.c. The SDT provider requires an arch-specific
portion of code as well (in sdt_<arch>.c).
- The number of frames to skip for specific probes has been updated
to be more accurate (mistakes in this area were found during code
review).
- The mechanism for direct calling the test probe in dt_test_probe()
has been updated to work around compiler warnings.
- Removed dtrace_modload and dtrace_modunload. They were expected to
be needed for multi-arch support but it turns out that was not the
case.
- Add conditionals so that we do not try to build anything that relates
to providers that are not supported on all platforms.
- Various fixes for variable datatype issues that were not noticed on
x86 because they mapped to the same or similar numeric datatypes.
- Pass the dtrace_mstate_t struct to dtrace_getstackdepth() to support
the limitation that memory allocation cannot be done from probe
context. The dtrace_getstackdepth() function uses the dtrace_mstate_t
information to obtain a scratch area of memory to use as temporary
storage for PCs in the processing of dtrace_stacktrace().
- Handle the fact that on x86, the user sp for the current task can be
obtained using current_user_stack_pointer() whereas other platforms
use user_stack_pointer(current_pt_regs).
- Support the fact that the current instruction pointer is not always
an 'ip' member of the pt_regs struct. Always obtain the value of
the instruction pointer using the instruction_pointer(regs) function.
- Support the use of asm/dtrace_syscall.h to list the system calls
that are implemented using an assembler stub.
- Ensure that membar functions use the SMP-versions.
- Clean up byte order conditionals.
- Remove dead code.
- Ensure needed header files are explicitly included.
dtrace: ensure one can try to get user pages without locking or faulting
This commit changes the FOLL_NOFAULT flag into a FOLL_IMMED flag, to
more accurately convey its meaning, i.e. to request user pages without
waiting for any locks and without servicing any page faults as a result
of the request. This is necessary in order to request user pages from
interrupt context.
This also completes the implementation by ensuring that the PTE spinlock
is checked rather than trying to lock it (and possibly get stuck in a
deadlock spinning for it).
dtrace_getufpstack() had several flaws exposed by ustack() of multithreaded
processes. All the flaws touch the same small body of code, and none could be
verified to work until all were in place: hence this rather do-everything
commit.
Firstly, it was detecting the end of the stack using mm->start_stack. This is
incorrect for all threads but the first, and is even incorrect for the first
thread in languages such as Go with split stacks. As it is, this causes the
stack traversal to attempt to walk over a gap with no VMAs, causing a crash.
The correct solution is of course to look at the VMAs to find the VMA which
covers the user's stack address. We are already looking at the VMAs in
is_code_addr(), but this is both a linear scan when all but no-mmu platforms
have better ways, and a *lockless* scan. This is barely safe in the
single-threaded case, but in the multithreaded case other tasks sharing the same
mm may well be executing in parallel, and it becomes crucial that scanning the
VMAs be done under the mmap_sem. Unfortunately we cannot always take the
mmap_sem: DTrace may well be invoked in contexts in which sleeping is
prohibited, and in which other threads have the semaphore. So we must do a
down_read_trylock() on the mmap_sem, aborting the ustack() if we cannot take it
just as we already do if this is a process with no mm at all. (We also need to
boost the mm_users to prevent problems with group exits.)
We are also accessing the pages themselves without pinning, which means
concurrent memory pressure could swap them out, or memory compaction move them
around. We can use __get_user_pages() to get the VMA and pin the pages we need
simultaneously, as long as we use the newly-introduced FOLL_NOFAULT to ensure
that __get_user_pages() does not incur page faults. We wrap __get_user_pages()
in a local find_user_vma(), which also arranges to optionally fail if particular
pages (such as the stack pages) are not in core. (We need the VMA for some
pages so we can see if they are likely to be text-segment VMAs or not: such
pages do not need to be in core and ustack() need not fail if they are swapped
out.)
For efficiency's sake, we pin each stack page as we cross the page boundary into
it, releasing it afterwards.
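The pin-on-crossing pattern can be sketched in plain userspace C as follows. This is a simplified model, not the kernel code: SKETCH_PAGE_SIZE, struct pin_stats and walk_with_pinning() are hypothetical names, and pinning is represented only by counters where the real code would pin and release pages.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size, illustration only */

/* Hypothetical counters standing in for real pin/release operations. */
struct pin_stats {
    size_t pins;
    size_t unpins;
};

/*
 * Visit every address in [start, end) in increments of 'step' (which must
 * be non-zero), keeping at most one page pinned at a time.  Returns the
 * number of addresses visited.
 */
static size_t walk_with_pinning(uintptr_t start, uintptr_t end, size_t step,
                                struct pin_stats *st)
{
    const uintptr_t none = (uintptr_t)-1;   /* marker: no page pinned */
    uintptr_t pinned = none;
    size_t reads = 0;

    for (uintptr_t addr = start; addr < end; addr += step) {
        uintptr_t page = addr / SKETCH_PAGE_SIZE;

        if (page != pinned) {
            /* Crossed into a new page: release the old one, pin the new. */
            if (pinned != none)
                st->unpins++;
            st->pins++;
            pinned = page;
        }
        reads++;   /* the stack word would be read here */
    }

    /* Release whatever is still pinned when the walk ends. */
    if (pinned != none)
        st->unpins++;

    return reads;
}
```

Walking a range that straddles one page boundary pins two pages in turn, and never holds both at once.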
But even this does not suffice. FOLL_NOFAULT ensures that __get_user_pages()
will not fault, but does not ensure that a page fault will not happen when
accessing the page itself. So we use the newly-introduced CPU_DTRACE_NOPF
machinery to entirely suppress page faults inside get_user() (and nowhere else),
and check it afterwards.
As an additional feature, dtrace_getufpstack() can now be called with a NULL
pcstack and a pcstack_limit of zero, meaning that the stack frame entries are
only counted, not recorded. We use this feature to reimplement
dtrace_getustackdepth() in terms of dtrace_getufpstack().
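The count-only convention can be illustrated with a minimal userspace sketch. walk_frames() and stack_depth() are hypothetical stand-ins for dtrace_getufpstack() and dtrace_getustackdepth(), and the 'frames' array stands in for the return addresses that a real walk would read from the user stack.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical frame walker.  If pcstack is NULL, frames are only counted;
 * otherwise up to pcstack_limit entries are recorded.
 */
static size_t walk_frames(const uint64_t *frames, size_t nframes,
                          uint64_t *pcstack, size_t pcstack_limit)
{
    size_t depth = 0;

    for (size_t i = 0; i < nframes; i++) {
        if (pcstack != NULL) {
            /* Recording mode: stop once the caller's buffer is full. */
            if (depth >= pcstack_limit)
                break;
            pcstack[depth] = frames[i];
        }
        /* Counting-only mode never stops early. */
        depth++;
    }
    return depth;
}

/* Stack depth expressed in terms of the walker. */
static size_t stack_depth(const uint64_t *frames, size_t nframes)
{
    return walk_frames(frames, nframes, NULL, 0);
}
```

Sharing one walker between the two entry points means any fix to the traversal logic applies to both the recorded stack and the reported depth.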
With this change, multithreaded ustack()s appear to work, even in the presence
of non-glibc stack layouts (as used by Java and other non-glibc threading
libraries) and concurrent group exits and VMA changes.
Orabug: 18412802 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Updated the NEWS and specfile to add a note that there is a known
regression on test stress/buffering/tst.resize1.d due to the memory
allocation checking changes that were made a while ago. This
non-harmful regression will be fixed in the next release.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Nick Alcock [Mon, 24 Mar 2014 22:51:43 +0000 (22:51 +0000)]
Drop CPU_DTRACE_NOFAULT manipulation in progenyof().
This is only doing a traversal of task_structs via real_parent. This is
nonswappable, so faults are impossible, and blocking faults unnecessary.
Orabug: 18412802 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Nick Alcock [Mon, 24 Mar 2014 22:50:06 +0000 (22:50 +0000)]
Drop CPU_DTRACE_NOFAULT manipulation around ustack calls.
dtrace_getufpstack() and (as of the last commit) dtrace_getustackdepth() both
manipulate the CPU_DTRACE_NOFAULT flag themselves: clearing it after calling
those functions is redundant, and setting it is actually dangerous, since
other functions dtrace_getustackdepth() calls (such as __get_user_pages()) do
not expect to have instructions that incur page faults silently skipped without
faulting.
Orabug: 18412802 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Nick Alcock [Mon, 17 Mar 2014 16:39:07 +0000 (16:39 +0000)]
Pass down the tgid to userspace in u{stack,sym,mod,addr}().
Userspace does not know how to attach to threads, only processes (thread group
leaders). All it's doing after attaching is looking up symbols, which are per-
process anyway, so rather than go to the effort of teaching userspace to grab
and release non-thread-group-leaders, simply pass the tgid to userspace so that
it can grab everything the same way.
Also pass the pid (== tid) down, because DTrace consumers could reasonably want
to know the actual thread ID in which the u*() fired (though our userspace does
not care).
This means we are passing one extra item on the buffer for ustack() et al:
internal uses are adjusted accordingly.
Orabug: 18412802 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Nick Alcock [Mon, 17 Mar 2014 16:29:34 +0000 (16:29 +0000)]
Fix the pid and ppid variables in multithreaded processes.
pid is currently equal to the Linux-side PID: i.e., from userspace's
perspective, the thread ID. ppid is equal to the thread ID of the parent. Both
of these are at best inconvenient and at worst wrong: they should both use the
thread group ID of their respective task, which corresponds to the
userspace-visible PID.
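The intended mapping can be sketched with a trimmed-down model. struct task below is a hypothetical stand-in for the kernel's task_struct that keeps only the fields discussed here, and d_pid() and d_ppid() are illustrative names, not functions from the module.

```c
#include <sys/types.h>

/* Hypothetical, trimmed-down stand-in for the kernel's task_struct. */
struct task {
    pid_t pid;                       /* Linux-side PID: the thread ID */
    pid_t tgid;                      /* thread group ID: userspace PID */
    const struct task *real_parent;  /* task that created this one */
};

/* D 'pid': the thread group ID of the current task. */
static pid_t d_pid(const struct task *t)
{
    return t->tgid;
}

/* D 'ppid': the thread group ID of the parent task. */
static pid_t d_ppid(const struct task *t)
{
    return t->real_parent->tgid;
}
```

With this mapping, every thread of a process reports the same pid, and a child created by a non-leader thread still reports the parent process rather than that thread.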
Orabug: 18412802 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Fri, 14 Mar 2014 15:40:53 +0000 (11:40 -0400)]
dtrace: add support for profile-* probes
This commit adds support in the profile provider for profile-*
probes, i.e. probes that fire at a specified frequency/interval on
all active CPUs. Support is also added for passing the appropriate
program counter (kernel or user) as probe argument, as required for
tick-* and profile-* probes.
Nick Alcock [Wed, 29 Jan 2014 20:35:12 +0000 (20:35 +0000)]
Have the new dtrace-modules-provider-headers obsolete the old.
The package name has changed but the new package contains the same files as the
old, so we need to Obsolete: the old ones so that yum will remove them.
(Because the old scheme generated package names on the fly according to the
running kernel, the list in this patch may well be missing a few packages.)
Caveat: this fixes 'yum update' but cannot fix direct RPM installation.
You'll have to uninstall the old package manually if you do that.
Orabug: 18061595 Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com> Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Nick Alcock [Thu, 16 Jan 2014 13:22:08 +0000 (13:22 +0000)]
Remove kernel version from name of dtrace-modules-provider-headers package.
This package had a kernel-version-dependent name on the grounds that it
consisted of kernel headers meant to be included by a single kernel version.
This reasoning was flawed: the headers do not change as the kernel is rebuilt,
and as the package provides files that are not under a kernel-version-specific
path, 'yum update' can attempt to install two versions at once, and conflict.
The right solution is to name the package without a kernel-version-specific part.
(We keep the name unchanged when built against earlier kernels, to avoid
sneaking unrelated changes into security errata releases.)
Orabug: 18061595 Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com> Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Kris Van Hees [Fri, 20 Dec 2013 16:19:01 +0000 (11:19 -0500)]
dtrace: Fix RPM dependencies.
Userspace depends on dtrace-modules-headers so that it can #include the headers
shared between kernel and userspace. However, it is crucial that this inclusion
not drag in the dtrace module itself, nor the kernel on which it depends,
because that module might be of a version different to that already on the
system (likely older, which would cause yum upgrade to fail).
So drop the dependency between dtrace-modules-headers and the module itself.
Also, userspace has ceased depending on the dtrace-kernel-interface capability,
in favour of automatic but explicit yum installation of module RPMs when needed:
so drop that capability, unversion the dtrace-modules-headers capability, and
remove the kernel version from the dtrace-modules-headers package's name, since
it is not dependent on the running kernel in any way. Unversion the
modules-provider-headers capability too, but leave its name versioned: since it
is meant for provider authors, and providers are kernel modules, it is
necessarily kernel-version-dependent.
--
Modified to allow building of modules prior to 0.4.2 using the older scheme
for dependencies, and use the new scheme starting with 0.4.2.
Orabug: 17804881 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Tue, 17 Dec 2013 23:08:17 +0000 (18:08 -0500)]
dtrace: vtimestamp implementation
This commit adds DTrace vtimestamp support. It keeps track of how much
time a task has spent actually processing on a CPU. The time is set to
zero at task creation, and is updated whenever the task leaves a CPU
(gets scheduled off), and when the dtrace_probe() function is entered,
to ensure that the most recent value of consumed time is reported.
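The accounting described above can be sketched roughly as follows. struct vtime_state and the vtime_*() helpers are hypothetical names, timestamps are plain nanosecond counters supplied by the caller, and recording the start time when the task is scheduled onto a CPU is an assumption made for this sketch.

```c
#include <stdint.h>

/* Hypothetical per-task accounting state; field names are illustrative. */
struct vtime_state {
    uint64_t vtime;      /* accumulated on-CPU time in ns */
    uint64_t on_cpu_at;  /* point from which time is not yet accounted */
};

/* Task creation: consumed time starts at zero. */
static void vtime_init(struct vtime_state *s)
{
    s->vtime = 0;
    s->on_cpu_at = 0;
}

/* Assumed hook: the task is scheduled onto a CPU at time 'now'. */
static void vtime_sched_in(struct vtime_state *s, uint64_t now)
{
    s->on_cpu_at = now;
}

/*
 * Called when the task leaves the CPU and on probe entry: fold the time
 * consumed since the last accounting point into the running total.
 */
static void vtime_update(struct vtime_state *s, uint64_t now)
{
    s->vtime += now - s->on_cpu_at;
    s->on_cpu_at = now;
}
```

Updating on probe entry as well as on schedule-off is what lets a probe that fires in the middle of a time slice see an up-to-date value.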
Some code got moved around for consistency of the implementation.
Kris Van Hees [Tue, 17 Dec 2013 23:06:57 +0000 (18:06 -0500)]
dtrace: implement SDT in kernel modules
Full implementation of SDT probes in kernel modules.
The dtrace_sdt.sh script has been modified to handle both the creation
of the SDT stubs and the SDT info. Its syntax has therefore changed:
dtrace_sdt.sh sdtstub <stubfile> <object-file> <object-file>*
or
dtrace_sdt.sh sdtinfo <infofile> vmlinux.o
or
dtrace_sdt.sh sdtinfo <infofile> vmlinux.o .tmp_vmlinux1
or
dtrace_sdt.sh sdtinfo <infofile> <kmod>.o kmod
The first form generates a stub file in assembler to ensure that the
(fake) functions that are called from SDT probe points will no longer
be reported as undefined symbols, and to ensure that when SDT is not
enabled, the probes become calls to a function that simply returns.
The second form creates the initial (dummy) SDT info file for the kernel
linking process, mainly to ensure that its size is known. The third
form then creates the true SDT info file for the kernel, based on the
kernel object file and the first stage linked kernel image.
The fourth and final form generates SDT info for a kernel module, based
on its initial linked object.
This commit also enables the test probes in the dt_test module.
Kris Van Hees [Mon, 16 Dec 2013 19:42:07 +0000 (14:42 -0500)]
dtrace: fix conditionals for changelog composition
Build failure indicated that under some conditions, the changelog created in
the specfile by means of build version conditionals resulted in out-of-order
entries in the changelog. This has been corrected.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Thu, 31 Oct 2013 09:22:56 +0000 (05:22 -0400)]
dtrace: provide a corrected implementation of the 'errno' D variable
This commit provides a corrected implementation for the 'errno' D variable.
It is defined as holding the error code (if non-zero) during the current
system call execution. If the system call is successful, or if no system
call is being executed, its value is to be 0. On (Open)Solaris, this was
retrieved from a task variable that is assigned an error code as soon as
an error is encountered during the processing of a system call, i.e. system
calls use a task variable to store any error code encountered during
execution, and this is used upon return from the system call to alert
userspace of the error code status of the system call. In Linux, system
calls are implemented in the more regular fashion (for Linux at least)
of returning error codes as return values of functions, and therefore
there is no task level variable to consult. So, instead we recognize that
(at this point) 'errno' only has meaning during the processing of syscall
return probes, which are handled from the system call wrapper, after the
system call implementation has been executed.
It would therefore be sufficient and correct to assign the value of 'errno'
at that point, but that would require a task variable to be added to the
task struct in order for this value to be recorded.
In order to avoid adding a member to the task struct, we (ab)use the fact
that we can recognize whether we are executing a D action for a syscall
return probe, and if we are *and* if 'errno' is being retrieved, we look
at the arg0 value for the probe (which is defined as the return value of
the syscall), and if the value is between 0 and -2048, we return the error
code it represents as errno.
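The arg0-to-errno mapping can be sketched as a small pure function. errno_from_syscall_ret() is a hypothetical name, and treating both ends of the range as exclusive is an assumption, since the text does not say whether the bounds are inclusive.

```c
/*
 * Hypothetical helper: derive the D 'errno' value from the return value
 * (arg0) of a syscall return probe.  Values strictly between 0 and -2048
 * are taken to be negated error codes; anything else means "no error".
 */
static int errno_from_syscall_ret(long arg0)
{
    if (arg0 < 0 && arg0 > -2048)
        return (int)-arg0;
    return 0;
}
```

A successful call, or one that returns a large negative value that is not an error code, yields 0, matching the definition of 'errno' given above.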
Kris Van Hees [Thu, 17 Oct 2013 23:18:44 +0000 (19:18 -0400)]
dtrace: fix lock ordering issues, mutex_owned(), and mutex debugging
Several cases of potential lock ordering issues were identified and resolved.
Both static and dynamic analysis of locking comes clean for DTrace after this
commit is applied.
The mutex_owned() function was not accounting for the possibility that a lock
might have an owner registered while unlocked.