From b0338a3520450e7bc68503db81c6f9555e582f55 Mon Sep 17 00:00:00 2001 From: Nick Alcock Date: Wed, 3 Oct 2012 19:37:16 +0100 Subject: [PATCH] ctf: document dwarf2ctf dwarf2ctf is complex enough and has enough tricky corners that it really, really needs some documentation. Signed-off-by: Nick Alcock --- Documentation/dwarf2ctf | 873 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 873 insertions(+) create mode 100644 Documentation/dwarf2ctf diff --git a/Documentation/dwarf2ctf b/Documentation/dwarf2ctf new file mode 100644 index 000000000000..0d7d2b31bc3a --- /dev/null +++ b/Documentation/dwarf2ctf @@ -0,0 +1,873 @@ +dwarf2ctf, a type encoder for the Linux kernel +========= + +Many kernel-level debugging and tracing systems need access to the kernel's type +information. Since C doesn't support any form of introspection, the data must be +extracted from the DWARF debugging information generated by the compiler. +However, this information is very voluminous (just the type information alone +adds up to a couple of hundred megabytes in a 'make allyesconfig' kernel): even +if users are happy to spend the disk space, the time and memory required to read +much of this information in is likely to be prohibitive. + +This problem is not new -- back in 2004, Sun had the same problem when +attempting to give DTrace a view of the type information in the Solaris kernel. +Their solution was the Compact ANSI-C Type Format (CTF), a highly compacted +representation of C types suitable for debuggers and tracers. They combined +this with a highly efficient tool for converting DWARF2 types to CTF, and hacks +in the Solaris kernel causing the kernel itself to emit CTF data for its own +types. + +Unfortunately while this tool may be highly efficient it is not adequate for the +Linux kernel. Not only is its license wrong, but it treats every ELF object as +an independent entity with an independent set of types -- perfectly all right +for the Solaris kernel with a few hundred modules maximum, but very much not for +Linux, where distro kernels often compile in thousands of modules. Ideally, we +would like to treat all kernel modules, built-in or not, the same way, sharing +and deduplicating all globally-visible types across the entire set of visible +modules and recording each precisely once. + +We also want to collect descriptions of interesting global variables and emit +descriptions of their name->type mapping as well, since the kernel has no easily +accessible ELF section we can extract this information from at runtime (kernel +modules must be accessible at runtime for modern Linux systems to work, but the +kernel itself could have come from over the network or off a USB key or from a +non-mounted partition or an EFI boot partition or who knows where, and could +have any name even if it is accessible: so tracing tools should not rely on +being able to look inside the kernel image). + +dwarf2ctf is a CTF generation tool that reads in DWARF from a set of object +files (usually, every object file in the kernel and all modules) and fills a +directory with compressed files containing CTF representations of the types in +those object files: the kernel build system regenerates these as necessary and +links them directly into kernel modules. + +Caveats: It is somewhat specific to the form of DWARF output emitted by GCC, and +doesn't yet support DWARF-4 type signatures or compressed DWARF at all. + +We'll look at each part of this system in turn, from the top down. + + +Invocation and build-system connections +---------- + +dwarf2ctf's command-line syntax emphasises simplicity over compactness. Linux +has nearly-infinitely-long command lines these days, so we can take advantage of +this, as long as we take a bit of care in the makefiles to avoid using shell +metacharacters on makefile lines that run dwarf2ctf, since many Linux shells +still have command-line length limits and a coredumping shell can ruin your +whole day. + +Two syntaxes are supported. The first shares types across multiple modules and +the core kernel; the second is used for out-of-tree module building, and avoids +sharing anything at all across modules, nor depending on the set of shared types +defined for the core kernel. + + +dwarf2ctf outputdir objects.builtin modules.builtin dedup.blacklist \ + vmlinux.o module.o ... +dwarf2ctf outputdir -e module.o ... + +where: + + - 'outputdir' is the possibly-relative path to a directory in which the + generated CTF files get placed. + - 'objects.builtin' is the name of the file containing the object files that + correspond to always-built-in kernel code (that cannot be built as modules). + - 'modules.builtin' is the name of the file containing the names of + kernel modules presently built in to the kernel. + - 'dedup.blacklist' is a blacklist of modules that should never participate + in deduplication: see 'Duplicate type detection' below. + - the .o filenames are the names of object files comprising the kernel and/or + modules: you can feed in whole modules at once (before linking with .mod.o). + This list is often very, very long (I have seen command-lines in excess of + 60Kb). + +dwarf2ctf's output consists of a series of gzip-compressed .ctf.new files in the +outputdir, which the makefile compares with and if necessary moves over the top +of .ctf files with the same basename, so as to avoid relinking things if +dwarf2ctf has written out content identical to what it wrote last time it ran. +These fall into several classes, partitioned according to the contents of +objects.builtin and modules.builtin: + + - dtrace_ctf.builtin.ctf: The shared type repository. Types shared by more + than one of the files below go here. Despite its name it is not DTrace- + specific: it is named after the library that reads and writes it, + libdtrace-ctf). See 'Using dwarf2ctf output' below regarding use of this + data. + - vmlinux.builtin.ctf: Types in the core kernel, that cannot be built in to + modules, go here. + - *.builtin.ctf: One of these is generated for the types in each module that + is presently built in to the kernel. + - *.mod.ctf: One of these is generated for each .ko. + +All the files in the first three classes are linked into the dtrace_ctf.ko +module under various names, an empty module containing nothing but CTF data. + + +A lengthy section of Makefile.modpost, and a short section of the toplevel +Makefile, is dedicated to creating these files, and to linking them into the +kernel modules. The dependency graph related to dwarf2ctf output is quite +complex: modules and objects (ld -r'ed *.o files) are processed by dwarf2ctf to +produce a number of files in the .ctf directory, and the final modules depend on +the relevant ctf files. .mod.ctf's go into the .ko's with the same stem name, +but dtrace_ctf.ko receives content from all the CTF files corresponding to +built-in modules, and until dwarf2ctf runs and creates those files we cannot +tell what those CTF files will be, though we do have a wildcard that matches +them all. + +GNU Make's 'secondary expansion' feature comes to the rescue here: we can +compute a list of expected CTF filenames at runtime, given the names of the +modules we are linking in. For the builtin modules, we cheat and touch a stamp +file after moving any .ctf.new files back over a .ctf file, then depend on that +to see if dtrace_ctf.ko needs to be relinked. + +The actual incorporation of the CTF data into the kernel modules happens before +module signing (if signing is active), by calling objcopy --add-section on the +module in question. This too has some knotty corners. + +First of all, the module linkage process normally links a module using all the +prerequisites of the module's target -- but we have designated all the CTF files +as prerequisites of the module's target, and we don't want to link them directly +in using ld(1), since they aren't object files. So we have to filter them out in +the link line. + +Secondly, those modules which have no CTF files should acquire empty CTF +sections to indicate their lack of unique types -- but objcopy in binutils 2.20 +and below silently exits if asked to --add-section an empty file. So we use dd +to generate a file with a one-byte null in it instead, and teach the users of +CTF sections to treat a one-byte-long 'CTF' section as if it were empty. + + +Using dwarf2ctf output +---------------------- + +Using this data is fairly simple. Once you've read the CTF sections from the +kernel modules and inflated them (or ignored them if they are empty or, as just +mentioned, one byte long), you simply need to look at the ctf_parent_name() for +each module, and if it is set to "dtrace_ctf", call ctf_import() to set the +parent of this module to the CTF data you have read from the +.dtrace_ctf.dtrace_ctf section in the dtrace_ctf.ko kernel module. The core +kernel's types are stored in the .dtrace_ctf.vmlinux section in the same kernel +module, and all built-in kernel modules have their types in +.dtrace_ctf.$module_name. Non-built-in kernel modules just have a .dtrace_ctf +section containing their types, which again might need their parent set to +"dtrace_ctf". (Out-of- tree kernel modules will have no such parent.) + +Once you've set up the parenthood relationships you can call ctf_close() on the +shared type repository and forget about it entirely: it will be refcounted and +destroyed when all its children are closed. + + +You should end up with a family of CTF files, one per kernel module built-in or +not and one for the core kernel, freely usable for whatever purpose you need. + + +Overview of dwarf2ctf operation +-------- + +There are four phases to dwarf2ctf operation: initialization, duplicate type +detection, CTF construction, and writeout. Some of these phases can repeat. +All but the last phase consists purely of sucking data from object files into +GHashTables in memory. (The last two phases could potentially be combined, +shrinking the size of one hash and saving memory, but the hash that is shrunk is +by no means the largest one, so the extra complexity is probably not worth it.) + +dwarf2ctf uses several other libraries to do this: + + - elfutils, used for DWARF parsing. We could potentially write our own + DWARF parser, but elfutils works and is already used by both DTrace and + SystemTap, so isn't even likely to introduce a new dependency. + + - glib, used for the GHashTable. The rest of the kernel uses roll-your-own + hash tables, but dwarf2ctf makes heavy demands of its hashtables: they must + be expanding hashes capable of efficiently storing hundreds of thousands of + items, with amortized log(N) lookup time, and they must support deletion + (though it need not be particularly efficient deletion). This rules out + simple fixed-size bucket hashes like the ones used in other parts of the + kernel build system: GHashTable is already implemented, and works. + + - zlib, used to compress the CTF information. + + - libdtrace-ctf, which both reads and writes the CTF data. This is a port of + the Solaris CTF library, GPLed and with additional support for the storage + of name->type mappings (meant to represent variables) akin to its existing + ELF symbol->type mappings. + +dwarf2ctf has a good few important data structures, described at the top of +scripts/dwarf2ctf/dwarf2ctf.c. + +dwarf2ctf has its own trace facility, implemented via the dw_ctf_trace() macro +and enabled by compiling with -DDEBUG and setting DWARF2CTF_TRACE in the +environment. (The first step is required because some very numerous data +structures are greatly expanded when debugging is turned on, which would waste +memory if it were done all the time). This produces a huge volume of trace +output, several gigabytes when run over an allyesconfig kernel. + + +Unless you're interested in how dwarf2ctf works internally, you can stop reading +her. + + +Initialization +-------------- + +This happens at the top of main() and run(), and in various functions named +init_*() (init_assembly_tab(), init_builtin(), init_blacklist(), and +init_tu_to_modules()). Of these, init_assembly_tab() and init_builtin() serve +only to turn various static arrays and files mentioned on the command line into +more useful internal representations (e.g. the assembly filter array of +structures is turned into a pair of arrays indexed by DWARF tag), and +init_blacklist() is described in the section on duplicate type detection below. + +init_ctf_table(), called both at initialization time and later during CTF +assembly when new CTF files are found to be needed, creates a new CTF file in +memory and either marks it as a child of the shared type repository, or (if it +*is* the shared type repository, or deduplication is off and there is only one +CTF file being processed and no shared type repository at all) creates a few +types in it which CTF has representations of but DWARF does not: a void type, +and a generic catchall pointer-to-function-returning-int. + +That leaves init_tu_to_modules(). This walks over all the top-level +compile_unit DIEs in the DWARF debugging information in every object file +mentioned in the list of modules and built-in modules, constructing a mapping +from translation unit name back to the name of the kernel module it comes from, +even if that module is built in to the kernel. This is normally the same as the +filename (sans extension), but for built-in kernel modules, the name comes from +the modules.builtin file's entry for the translation unit instead, since types +belonging to a built-in module are considered to be in part of that module even +if the module happens to be built in to the kernel, so that the output can land +in a .builtin.ctf file rather than being jammed into vmlinux.builtin.ctf with +the core kernel's types. This means that dwarf2ctf can operate in terms of the +kernel module a type is contained within rather than having to think about the +mapping between object file name, translation unit name and module name all the +time. + + +Toplevel DWARF walkers +---------------------- + +All routines in dwarf2ctf other than initialization and writeout are DWARF +walkers: i.e., they walk over all DWARF DIEs in all object files specified on +the command line and do something with every DIE. This job is done by +process_file() and its helper process_tu_func(), which not only digs out the +corresponding (built-in or non-built-in) module name corresponding to each +object file, but also detects and skips translation units it has handled before +(in case they are incrementally linked into multiple object files) and allows +callbacks to be invoked at the start and end of each translation unit. + +Even though dwarf2ctf only cares about top-level types, in some situations DWARF +can emit top-level types with references to a non-top-level type: if all +occurrences of the top-level type are an opaque structure, and the only +non-opaque definition is inside a function, references in the same translation +unit as the non-opaque definition will point to the definition inside the +function (and references outside the translation unit will not point at any +definition). Thus, if we want to catch all nuances of globally-visible types, +we have to scan types inside functions and lexical blocks inside functions too. + +To avoid generating a vast number of unnecessary type definitions, the 'assembly +table' which describes how to construct a CTF type given a DWARF DIE also +contains a description of a set of filters which are passed the current DIE and +its parent: if they return false, the DIE is skipped and never passed to the +callback function. We also avoid calling the callback for any DWARF DIE whose +tag doesn't appear in the assembly table at all: there's no point doing +duplicate detection or anything else for a DWARF DIE we won't be generating CTF +from. There are currently two filters defined: filter_ctf_file_scope(), which +is called for every DWARF DIE whose tag is one we never expect to see a +reference to if it is inside a function, and filter_ctf_uninteresting(), which +is called for variables to see if they are worthy of recording (top-level named +variables with external linkage not part of the internal workings of macros +only). + + +Type IDs +-------- + +The only thing dwarf2ctf does which the Sun tool does not is the detection of +duplicate and shared types, both within individual kernel modules and across +modules. Our ultimate goal is that a type that appears in the source code once +appears in the CTF output once as well. This goal has mostly been attained, +except for out-of-tree modules, where cross-module type sharing must be disabled +to avoid requiring rebuilds of the module whenever the core kernel is rebuilt. + +The core of this is the concept of a *type ID* and the function type_id() which +computes it. A type ID is an identifier for a type which precisely represents +that type and only that type. Doing this for types in different headers or at +different scopes with the same name without needing to encode knowledge of C +scoping rules into dwarf2ctf is an interesting proposition: we can use the line +number and filename info provided by DWARF in most user-specified types to help. + +A type ID is a recursively-constructed string of the following form (fixed +elements represented by {}, optional elements by []): + +//[filename]//[line number]//{type string} + +Types are *based* upon other types iff they have a DW_AT_type attribute pointing +to some other type. All types based upon other types have a type ID that is the +type ID of the type upon which they are based, with additional information +specific to this type appended to it. The filename and line number is only +added for those types which are not based upon other types and which have a +filename and line number in the DWARF (lots don't, e.g. base types): the +filename is canonicalized with realpath(), though since this is quite slow and +type_id() is called a lot, the mapping from DWARF filename to realpath() result +is cached. Types that have no filename or line number start with '////'. + +We use // to separate the filename and line number elements because this is the +shortest string other than NUL that cannot appear in a canonicalized POSIX +pathname (ignoring Pyramid, Cygwin and other strange systems that actually +return // in the result of realpath(): Linux doesn't use it and that's all that +matters. Should it start to use it, we can switch delimiter to ///.) + +Function pointers are not represented (or, rather, are all mapped to the same +type ID, the generic catchall function-pointer type mentioned above); array +dimensions are represented by [index-type dimension], or [] for flexible array +members. Structure members are not represented, since they are not types, but +the types of their members *are* represented, as are nested structures (the line +number and filename serving, as ever, to disambiguate them from other structures +with the same name declared nested inside different structures). + +The following are some examples of valid type IDs (assuming the kernel source +tree is, implausibly, located at /k/, just off the root directory: comments on +individual types done /* like C */; the last example is broken across lines for +formatting's sake): + +//fp//* # a pointer to a function, any function +////long int +////char [] +////unsigned int typedef __kernel_uid_t typedef __kernel_uid32_t +//fp//* typedef __signalfn_t * typedef __sighandler_t +////struct nsproxy /* an opaque type */ +////struct nsproxy * # /* pointer to it */ +////long unsigned int typedef u64 volatile +////long unsigned int volatile const * const +////long unsigned int typedef sector_t [////long unsigned int 511] +/k/include/linux/types.h//222//struct list_head +/k/include/linux/types.h//222//struct list_head * +/k/include/linux/types.h//222//struct list_head [////long unsigned int 5] +/k/include/linux/types.h//217//struct /* no struct tag */ +/k/include/linux/types.h//217//struct typedef atomic64_t +/k/include/linux/mm_types.h//34//struct page * typedef pgtable_t +/k/fs/eventpoll.c//122//struct nested_calls +/k/include/linux/sysctl.h//1016//struct ctl_table typedef ctl_table + [////long unsigned int 4] var inotify_table /* A global variable */ + +This scheme means that cv-quals and other modifiers applied to other types are +always merged: if there are a dozen typedefs for a single type 'foo' with the +same name declared in the same place, they all end up with the same type ID and +are only emitted into the CTF once. + +The type_id() function can also accept a callback, which is called as the +recursion unwinds, from base type up to derived type: so it might be called for +"////unsigned int", then for "////unsigned int typedef __kernel_uid_t", and so +on up to the DIE that was originally passed in. Because type_id() returns a +dynamically-allocated string, calls to type_id() made purely for the sake of +invoking a callback are normally of the peculiar form "free(type_id(...))". + +type_id() is a very hot spot, so syscall results are cached in it (such as +realpath(), as mentioned above), and when string appending is done, it is done +all at once where possible, via str_appendn(), which calls realloc() only once +no matter the number of strings being appended. + +Lots of core data structures in dwarf2ctf consist of hashes mapping type IDs to +something else (predominantly CTF file/ID pairs and module names). It would +be possible to map from hashes of type IDs, saving some memory, but this would +impair debugging so is not yet implemented. (If it is implemented, it should +probably be implemented only when DEBUG is not defined.) + + +Duplicate detection +------------------- + +The job of the duplicate detection pass is to fill out the id_to_module hash, +which maps type IDs to the module they appear in, with the two special cases +that types that appear only in the core kernel are said to appear in the module +'vmlinux', and types that appear in more than one module (or in a module and in +the core kernel) are said to appear in the module 'dtrace_ctf', the shared type +repository. This is quite a tricky multi-pass process, because we must ensure +that the shared type repository is self-contained: all types in the repository +must not reference any types outside the repository. + +Detecting duplicates itself is easy: we consider two types duplicates if they +have the same type ID: if they both reside in the same module, the resulting +type resides in that module too. However, detecting shared types is harder. +We consider that a type belongs in the shared module if any of these conditions +is true: + + - the type appears multiple times in different modules + - a type for which this type is a base type is shared + - the type is referenced by a structure or union member, and the structure + or union is shared + - the type is a non-opaque type with an opaque variant ('struct foo'), or + vice versa, and either of these variants is shared: these two types will + get different type IDs, so explicit checking is necessary + +It should be fairly easy to see that sharedness is a contagious property: +e.g. if you mark a structure as shared, and one of its members is an +otherwise-unshared opaque pointer to a structure, you have to mark that as +shared: this causes the non-opaque definition of the structure, and all *its* +members, to be shared, and so on. So since dwarf2ctf does not track the members +of structures itself (not until the CTF generation phase, anyway), this means +walking over the DWARF DIEs multiple times, checking for sharedness over and +over until we are done. + +We partition the problem into two parts, both of which are carried out by +process_file() callback functions: detect_duplicates() and +detect_duplicates_alias_fixup(). + + +detect_duplicates() is called first, once for every DIE in the kernel (via +process_file()). This identifies types that are duplicated but not shared, and +identifies shared types without consideration of opaque struct/union aliasing. +It also flags types that have been seen only once as 'seen': this is checked +much later on by the CTF construction phase, since construction of CTF for +any type which has not been inspected by the deduplicator is a sign of a bug in +the deduplicator. + +This has several subtleties. + +If we are running for an out-of-tree module, we must still identify types as +duplicated within the module, but must never mark them as shared: out-of-tree +modules cannot contribute to the shared type repository nor even use types in +it, since they are rebuilt independently from the kernel proper and thus cannot +depend on a type currently in the repository remaining there (e.g. perhaps it +has only two users, both in modules, and the kernel is rebuilt to not build one +of those modules anymore: this should not require rebuilding of any out-of-tree +modules). + +If we mark a type as seen, we must mark aggregate types that appear directly +within that type's DIE as seen as well. This is done by the recursive function +mark_seen_contained(). You might wonder what the point of it is: such types +surely cannot appear anywhere else, and any duplication will precisely match the +duplication of the containing type. The answer is that they can still be +referenced as the type of structure members of their containing structure, +e.g. in + +struct foo { + struct bar { + } *baz; +}; + +Here, a reference to 'struct bar' appears in 'struct foo', and CTF is +constructed for it, even though it is not a top-level DIE. + + +If we find that a type has appeared more than once in different kernel modules +(or in a module and in the core kernel), we must mark it as shared. This is +done via mark_shared(), which is both a function that can be directly called +(e.g. from detect_duplicates()) and a type_id() callback. If it is called +directly, it immediately reinvokes itself as a type_id() callback, which calls +it for the base type of the type in question and then for all qualifiers up the +type ID stack, marking them all as shared if they weren't already. + +If a structure or union is marked as shared, the types of its members are also +marked as shared via a recursive call (even if they have already been so marked: +just because this structure is of a type we've already seen, in a location we've +already seen, doesn't mean that someone might not have legitimately used +#defines to add extra members to the end of it, and we need to mark them as +shared too). We track structures that have been seen in this translation unit +and avoid recursing into them, to avoid an infinite loop in cases like this: + +struct one; +struct two { + struct one *foo; +}; + +struct one { + struct two *foo; +} + +The types being pointers does not help here -- the marking of 'struct one *' as +shared will automatically mark 'struct one' as shared too, because otherwise we +might have a structure in the shared type repository whose members' types +could not be found there. + + +The second pass is the 'alias fixup' pass, implemented by +detect_duplicates_alias_fixup(). This pass serves to detect unshared opaque +types whose non-opaque equivalents are shared, and vice versa. It is executed +repeatedly until no types have been marked as shared for an entire iteration, +but is considerably faster per iteration than the first pass, which often +consumes more than half of dwarf2ctf's total runtime. We work in one direction +only, looking for non-opaque structures, unions or enums which have structure +tags. (Structures without tags cannot have opaque variants, and structures +which are opaque will have non-opaque cousins somewhere, or can be emitted to +the CTF as an opaque structure harmlessly since they truly have no members and +are probably manipulated only via casts.) + +We identify structures, unions or enums with tags via the type_id() callback +is_named_struct_union_enum(), but cannot determine if something is an opaque +structure at this stage. Instead, we do that after the callback, checking to +see if the first four characters of the type ID are "////": this relies on the +fact that GCC never gives opaque structures line numbers in DWARF. We do the +actual checking and marking of each non-opaque structure using +detect_duplicates_alias_fixup_internal(), which is yet another type_id() +callback. + +This function directly synthesises the name which this structure's opaque cousin +would have, if it existed, by stripping off the line number and filename and +replacing them with '////', and sees if either of these types have been marked +as shared while the other has not. If the opaque type is shared, the non-opaque +variant can be marked shared using the same recursive mark_shared() function as +before (thus marking the types of all its members and types it depends upon as +shared too). If it is the opaque type that needs marking shared, this will not +work, since mark_shared() takes a DWARF DIE, and we don't have one for the +opaque type, just a faked-up type ID. However, since an opaque type doesn't +have any members that need recursively tracing, we don't need access to its +DWARF DIE to figure them out, and can just mark it as shared directly, via an +insert into the id_to_module hash. + +Since this function is a type_id() callback, it is called not just for +structures but for types based on them (e.g. in type_id form, "struct foo const +* [43] volatile *"); at every level of this declarator stack, a shared opaque +base type will contaminate its non-opaque cousins with sharedness, and vice +versa. This handles situations in which, say, the opaque version of "struct foo +const * [43]" was used by more than one module and was marked as shared: the +marking process will have marked the opaque versions of "struct foo const * +[43]", "struct foo const *", "struct foo const" and "struct foo" as shared, but +will not have touched any non-opaque versions of these types which may exist +until this routine runs. It also handles typedefs to structures with no need +for any extra code. + +This whole alias fixup process needs to be repeated, because whenever a +non-opaque type is marked as shared and its member's types traced and marked +shared, *those* may themselves be structure types with corresponding opaque or +non-opaque variants, and when they are opaque types the non-opaque variant that +alias fixup works from may already have passed under the DWARF walker's gaze: so +another pass over the kernel's DWARF is necessary to be sure we catch it. +mark_shared() thus sets a flag that scan_duplicates() recognizes and uses to +trigger another run through the alias fixup pass. + + +There are a very few modules that this algorithm doesn't work for. One example +is snd-ens1371, which reads, in toto + +#define CHIP1371 +#include "ens1370.c" + +and ens1370.c (itself a distinct kernel module) then defines a 'struct ensoniq' +whose members vary depending on whether CHIP1371 is defined. Obviously, it is +impossible to share any such types between kernel modules even though their +names are the same and they are defined in the same place in the same source +file in both cases. But this sort of trickery is very rare, so we simply +implement a blacklist of modules which will not introduce new types into the +shared CTF repository, and who do not participate in alias fixup detection +either. Detecting these cases in order to blacklist them is harder: no +automated system has yet been implemented, although instances where #defines of +this nature introduce new types that are then used by later members will cause +assertion failures inside dwarf2ctf which might be a clue. So it is possible +that some examples have been missed. (The blacklist only applies to cases where +structure members change within a single kernel build, so cases where structures +have members whose presence depends on CONFIG_* values are quite all right, as +are cases where #defines are introduced by one translation that #includes +another that then goes on to define whole new structures: it is only cases where +modules #define something that changes the definition of individual +possibly-shared types that will need blacklisting.) + + +CTF construction +---------------- + +The next stage after the detection of duplicate and cross-module shared types is +to generate CTF. We generate all CTF at once before emitting it: this is +potentially somewhat wasteful of memory, but in practice has not proved to be a +problem: substantially less memory is used than is used by other parts of the +kernel build, unless -DDEBUG is enabled. Its job is to look through the +kernel's type DWARF (via process_file(), as usual) and create CTF for every +file-or-global-scope type and every externally-visible variable in the CTF file +in which the duplicate detection pass has said that type should appear. +(Variables are treated exactly like types: it just so happens that they are +never shared because no type or variable can depend upon them, so they always go +directly into the appropriate module and never into the shared type repository.) + +At this stage, the 'CTF files' are not actually files but rather ctf_file_t +structures maintained by libdtrace-ctf and tracked in the module_to_ctf_file +hash. We track every single individual type in the CTF file in the id_to_type +hash, which maps type IDs to pairs of (CTF file ID, ctf type ID): this lets us +use the type IDs described above when considering cross-references within CTF +files (e.g. from one CTF type to a type it depends upon). + +The most important functions in this phase are: + + - construct_ctf_id(), the top-level process_file() callback which is given a + DWARF DIE, looks in module_to_ctf_file for the CTF file where this type + should land (creating it if necessary), makes sure a type with this type ID + has not already been created there, calls die_to_ctf() to create the CTF, + notes where it was created in id_to_type, and handles errors. + + - die_to_ctf(), a recursive function which calls the assembly function for the + DIE it is given and all its immediate children, with special-case handling + for tagged structures and unions. If you want to create a type but not note + where it was created for future lookups by lookup_ctf_type(), this is what to + call. (This is only done currently for unnamed structures/unions.) + + - lookup_ctf_type(), which is called by CTF assembly functions for those + types that depend upon other types: it calls construct_ctf_id() again + to construct the type, and double-checks that all such types appear + either in the module we are constructing types for, or in the shared CTF + module. CTF represents all function pointers the same way, and has a + special type ID for 'void', so we special-case both of these cases. + +These functions are mostly straightforward (though highly recursive, with all +three plus CTF construction functions participating in one recursive loop, +die_to_ctf() calling itself directly and even one situation, the +already-mentioned unnamed structures/unions, in which die_to_ctf() is directly +called back by a CTF construction function.) + + +There are a few subtleties, though. Firstly, error handling. We consider that +errors that will lead to unusable CTF are fatal: mostly, these are errors where +a bug in the deduplicator has failed to trace types correctly and has left at +least one type in the shared module depending on a type in a non-shared module, +or has failed to mark a type as shared at all. In all these cases, you'll +eventually get an error from lookup_ctf_type() of the general form + +blah.c:413:foo_t: Internal error: lookup of flob found in different file. + +The first two parts of this error are the translation unit and line number the +type being assembled (usually a structure or union) was found in: foo_t is the +name of the type being assembled. The type of the structure being looked up +appears nowhere, because we don't know it, but the name of the member is given +("flob" above, or "(unnamed)" if we don't know it). If you want more +information, you can pass -DDEBUG to the compilation of dwarf2ctf.c in +scripts/dwarf2ctf/Makefile, and rerun dwarf2ctf, and you'll get the module and +filename in which both the originating and the target types appear. In order to +actually track down the bug you'll probably have to run dwarf2ctf with +DWARF2CTF_TRACE set in the environment, and look at the place where the target +type was deduplicated, and try to figure out why the deduplicator didn't trace +the reference to the type in foo_t correctly. + +There is another kind of error, though: a failure to assemble a single type, +perhaps because DWARF was emitted that we don't know how to understand (this is +particularly likely in structure assembly, where we are highly dependent on the +form of the DWARF that GCC happens to emit for DW_AT_member_location). We pass +an 'enum skip_type' around, which has three possible values, one of which is +SKIP_ABORT. After every type is assembled, we call the notably inefficient +function ctf_update() on the CTF file we're working over. This takes the +in-memory structures and serializes them (all of them, every time), following +which they can be referred to by other CTF types. If a SKIP_ABORT propagates up +to construct_ctf_id(), we call ctf_discard() instead, which throws away every +type constructed since the last ctf_update() -- i.e., the specific erroneous +type we've just been working on. (We might have emitted some parts of it and +then failed, so we should try to clean up.) + +A SKIP_ABORT is not fatal unless DEBUG is defined: its only effect is to omit +one single type from the resulting CTF, which is probably still usable. + +There's more error-handling complexity inside die_to_ctf(), where errors from +libdtrace-ctf are actually reported (there may be multiple of them for a single +type, e.g. if we are assembling a structure and several members somehow refer to +a type we do not know about). + + +die_to_ctf() itself has the sort of parameter list that can make people swear +off C for life. It is largely explained in the description of ctf_assembly_fun. +Most parts of it are hardly used in the function itself, just passed down to CTF +assembly functions. The only part of die_to_ctf() that is worth noting is that +it goes to extra effort to call ctf_update() early when new structure and union +DIEs are created, before any of their members are assembled. This allows self- +references like + +struct foo { + struct foo *next; +}; + +to work (since you can't refer to a type in CTF if you haven't called +ctf_update() since adding that type). + +Finally, we must note the override flag. Both die_to_ctf() and +construct_ctf_id() return a CTF ID. This is thrown away by the DWARF walking +code (the function construct_ctf() exists just for that purpose), but when +called by lookup_ctf_type(), this CTF ID is taken to be the single ID of the CTF +type that's just been assembled. Normally this is the same as the CTF ID +returned by the CTF assembly function for the top-level DWARF DIE, but there are +a few structures for which we want to return the result of some other CTF +assembly function. + +The only currently-existing example is array dimensions, which DWARF represents +as a typed array DIE whose child is a dimension, but which CTF represents as an +array-with-dimensions that you can't change afterwards. We can't assemble an +'array' at the top level because we don't know how big it is, but we have to +track the type recorded there somehow. We handle this by having +assemble_ctf_array(), the assembly function for the top-level DW_TAG_array_type +DIE, simply look up the type of the array's members and return its ID as if it +had just constructed it, after which assemble_ctf_array_dimension(), the +assembly function for DW_TAG_subrange_type, actually constructs the array, +wrapping it around the CTF 'ID' 'assembled' by the parent and setting the +override flag to make sure that this is what is really recorded. + + +CTF construction functions +-------------------------- + +Each CTF construction function takes a single DWARF DIE and turns it into CTF, +somehow. They are laid out in the assembly table described in 'toplevel DWARF +walkers' above. They all start the same way, with a series of CTF_DW_ENFORCE or +CTF_DW_ENFORCE_NOT assertions. These guard against corrupted DWARF missing some +of the attributes we need, or DWARF containing attributes which indicate that we +can't handle the content (e.g. DW_AT_signature or DW_AT_specification on +structures, which would both indicate this is DWARF 4, which we can't handle +yet.) + +We'll go through these functions one by one, pointing out anything that +maintainers should be aware of. + + +assemble_ctf_base() assembles all integral base types (DW_TAG_base_type) and +transforms them into the corresponding CTF type. The functions we need to call +for this in the CTF API all have the same type signature but have different +names; CTF also distinguishes between the various differently-sized +floating-point types, so we must figure out from the type size which type a +given DWARF base type is referring to. We map from DWARF encoding to a triple +of (CTF addition function, CTF integral type, type size) where the latter is +optional and depends on the size of the DWARF type we are encoding and the size +of various floating_point types on the current system. (This does mean that +cross-compilation using dwarf2ctf is likely to fail fairly often: we need +machinery to determine the sizeof() types on the target system before that can +function.) + +This sizeof()-based search procedure is why we do not currently support +DW_AT_bit_size for base types: we could easily support it for sizes modulo 8, +but GCC happens to emit DW_AT_byte_size in this case. In C DW_AT_bit_size is +likely to be emitted only for bitfields in structures anyway, not for base +types. + + +assemble_ctf_pointer() and assemble_ctf_typedef() are trivial: look up the +associated type with lookup_ctf_type() and assemble the appropriate thing. +assemble_ctf_cvr_qual() is almost as trivial, but has to figure out which of +const, volatile or restrict it was called for and call the corresponding CTF API +function. assemble_ctf_enumeration() and assemble_ctf_enumerator() are quite +simple too. + + +assemble_ctf_variable() has a couple of extra complexities: we unconditionally +set the skip parameter to SKIP_SKIP, suppressing recursion into containing DIEs, +since we already know we won't care about any of them. Also, while the +deduplication pass unifies opaque and non-opaque structures into the same type, +it never makes sure that variables declared in the same header by translation +units which have opaque versus non-opaque structures in scope are deduplicated. +e.g. you could well end up with these two type IDs, depending on whether + was included before in a given translation +unit: + +////struct pid_namespace var init_pid_ns +/path/to/kernel/include/linux/pid_namespace.h//19//struct pid_namespace var init_pid_ns + +These variables both refer to the same type, but deduplicating them would +require an additional deduplication pass. Since variables are always terminal +and nothing can refer to them, nothing will ever look up any of those type IDs +(since the only thing that looks up type IDs is code that is searching for type +that other types depend on). So we don't care about this duplication and +running an additional deduplication pass to eliminate it would slow down +dwarf2ctf to no good end. It's better just to ignore duplicate errors from +ctf_add_variable(). + + +assemble_ctf_array() and assemble_ctf_array_dimension() we talked about +above. One last subtlety remains, which is that figuring out the actual +dimensionality of an array is complicated enough that it has been hived off into +a private_subrange_dimension() function, called both from here and from +type_id(). Arrays with neither a DW_AT_upper_bound nor a DW_AT_count, and +arrays without an indexing type, are best considered flexible arrays; arrays +whose upper bound or count is not unsigned or signed integral data are also +flexible (perhaps they're using a full-blown location list, but we can't encode +that in CTF so we treat it as flexible); and if an upper bound is used, we want +to add one to its value before treating it as a count of elements. + + +This leaves structure/union assembly, both of which are assembled by the same +pair of functions, assemble_ctf_struct_union for the type itself and +assemble_ctf_su_member() for the individual members. As with +assemble_ctf_cvr_qual(), we have to look at the tag to figure out which CTF +function to use to do the assembly, but we have an extra constraint: it is +perfectly idiomatic C to declare a structure repeatedly with a different number +of members every time. This is perfectly permissible as long as the leading +portions of all declarations match. We do not verify this (we hope that the +compiler will diagnose it, which it will unless the conflicting declarations +cross modules), though perhaps we should: we simply look up the structure in the +CTF and the DWARF and skip assembly of the structure members via SKIP_SKIP if +the already-assembled structure has at least as many members as the current one. + +assemble_ctf_su_member() is by far the most complex of the assembly functions. +It has to handle members that already exist, members that need assembly, members +that correspond to unnamed structure members, numerous different ways of +representing structure offsets and members with no offset at all. + +The offset computation is quite laborious and by no means complete: a complete +implementation would require an interpreter for DWARF location lists, which is +total overkill given that in DWARF2 GCC emits a totally stereotyped location +list, and in DWARF3+ we don't need location list parsing at all. CTF wants an +offset in bits. + +We have five cases: + - for DW_AT_data_bit_offset, we just use the offset unchanged. + + - for DW_AT_data_member_location with an integral form (data2, data4, data8, + udata, or sdata) we just look it up and multiply it by eight, adding the + parent's DW_AT_bit_offset to handle structures nested inside other + structures. + + - for DW_AT_data_member_location with a block form, we make sure that the list + is of one particular simple form (DW_OP_plus_uconst and a constant value in + bytes), and abort assembly otherwise. The only case I know of where this + test will trip is C++ virtual bases: if people are using C++ code with + virtual bases inside the kernel they deserve sympathy, but probably not + support in the code. CTF can't represent C++ types in any case. + + - for expression location lists, or anything else that we don't understand, we + simply die (we could simply skip the type, but this seems serious enough that + dying is warranted). + + - with none of these present, we have no offset: the member is at the same + location as the start of the structure. + +But where is the 'start of the structure'? That depends on whether this is an +unnamed struct/union member (usually a union). If it is, we want to fold all +its members directly into the parent structure, with their offsets increased by +the offset of the unnamed member as a whole. This is done by directly calling +die_to_ctf() with the first child of the anonymous member's type and with all +other parameters set as if the parent DIE was the current structure, thus +fooling die_to_ctf() into believing that these members are members of the +current structure, not of the anonymous one. The offset-increasing magic is +done via the parent_bias parameter to die_to_ctf() and all the CTF construction +functions: it is ignored by all of them except for assemble_ctf_su_member() +itself, which adds the parent bias onto the normally-computed offset, and is +otherwise passed down unchanged to all children. This means that even this +terribly contrived case works: + +struct horror { + int spacer; + union { + struct { + int spacer; + struct { + int foo; + int bar; + } b; + } a; + }; +}; + +In this situation, horror.a.b.bar may have: + + - a nonzero parent_bias due to the offset of the anonymous union in 'struct + horror' + - a nonzero offset due to the offset of 'bar' in its containing structure + - if DW_AT_data_member_location with integral form is used, a nonzero + DW_AT_bit_offset of 'b' in 'a' + +If this is not an anonymous union, we are dealing with only one member: we look +up its type and add it reasonably conventionally via ctf_add_member_offset(). +Even here there are subtleties: we use construct_ctf_id() directly rather than +via lookup_ctf_type() so we can get a better error message on failure, and we +ignore any duplicate-member errors because this is probably a sign that this +structure has already been encountered and we are working through another +instance of it with more members. + + +Writeout +-------- + +This couldn't really be simpler. We create an output directory with the +requested name, then work over the entire module_to_ctf_file hash, writing out +every CTF file into a new suitably-named file via zlib's compressed file I/O +functions. -- 2.50.1