--- /dev/null
+dwarf2ctf, a type encoder for the Linux kernel
+=========
+
+Many kernel-level debugging and tracing systems need access to the kernel's type
+information. Since C doesn't support any form of introspection, the data must be
+extracted from the DWARF debugging information generated by the compiler.
+However, this information is very voluminous (just the type information alone
+adds up to a couple of hundred megabytes in a 'make allyesconfig' kernel): even
+if users are happy to spend the disk space, the time and memory required to read
+much of this information in is likely to be prohibitive.
+
+This problem is not new -- back in 2004, Sun had the same problem when
+attempting to give DTrace a view of the type information in the Solaris kernel.
+Their solution was the Compact ANSI-C Type Format (CTF), a highly compacted
+representation of C types suitable for debuggers and tracers. They combined
+this with a highly efficient tool for converting DWARF2 types to CTF, and hacks
+in the Solaris kernel causing the kernel itself to emit CTF data for its own
+types.
+
+Unfortunately while this tool may be highly efficient it is not adequate for the
+Linux kernel. Not only is its license wrong, but it treats every ELF object as
+an independent entity with an independent set of types -- perfectly all right
+for the Solaris kernel with a few hundred modules maximum, but very much not for
+Linux, where distro kernels often compile in thousands of modules. Ideally, we
+would like to treat all kernel modules, built-in or not, the same way, sharing
+and deduplicating all globally-visible types across the entire set of visible
+modules and recording each precisely once.
+
+We also want to collect descriptions of interesting global variables and emit
+descriptions of their name->type mapping as well, since the kernel has no easily
+accessible ELF section we can extract this information from at runtime (kernel
+modules must be accessible at runtime for modern Linux systems to work, but the
+kernel itself could have come from over the network or off a USB key or from a
+non-mounted partition or an EFI boot partition or who knows where, and could
+have any name even if it is accessible: so tracing tools should not rely on
+being able to look inside the kernel image).
+
+dwarf2ctf is a CTF generation tool that reads in DWARF from a set of object
+files (usually, every object file in the kernel and all modules) and fills a
+directory with compressed files containing CTF representations of the types in
+those object files: the kernel build system regenerates these as necessary and
+links them directly into kernel modules.
+
+Caveats: It is somewhat specific to the form of DWARF output emitted by GCC, and
+doesn't yet support DWARF-4 type signatures or compressed DWARF at all.
+
+We'll look at each part of this system in turn, from the top down.
+
+
+Invocation and build-system connections
+----------
+
+dwarf2ctf's command-line syntax emphasises simplicity over compactness. Linux
+has nearly-infinitely-long command lines these days, so we can take advantage of
+this, as long as we take a bit of care in the makefiles to avoid using shell
+metacharacters on makefile lines that run dwarf2ctf, since many Linux shells
+still have command-line length limits and a coredumping shell can ruin your
+whole day.
+
+Two syntaxes are supported. The first shares types across multiple modules and
+the core kernel; the second is used for out-of-tree module building, and avoids
+sharing anything at all across modules, nor depending on the set of shared types
+defined for the core kernel.
+
+
+dwarf2ctf outputdir objects.builtin modules.builtin dedup.blacklist \
+ vmlinux.o module.o ...
+dwarf2ctf outputdir -e module.o ...
+
+where:
+
+ - 'outputdir' is the possibly-relative path to a directory in which the
+ generated CTF files get placed.
+ - 'objects.builtin' is the name of the file containing the object files that
+ correspond to always-built-in kernel code (that cannot be built as modules).
+ - 'modules.builtin' is the name of the file containing the names of
+ kernel modules presently built in to the kernel.
+ - 'dedup.blacklist' is a blacklist of modules that should never participate
+ in deduplication: see 'Duplicate type detection' below.
+ - the .o filenames are the names of object files comprising the kernel and/or
+ modules: you can feed in whole modules at once (before linking with .mod.o).
+ This list is often very, very long (I have seen command-lines in excess of
+ 60Kb).
+
+dwarf2ctf's output consists of a series of gzip-compressed .ctf.new files in the
+outputdir, which the makefile compares with and if necessary moves over the top
+of .ctf files with the same basename, so as to avoid relinking things if
+dwarf2ctf has written out content identical to what it wrote last time it ran.
+These fall into several classes, partitioned according to the contents of
+objects.builtin and modules.builtin:
+
+ - dtrace_ctf.builtin.ctf: The shared type repository. Types shared by more
+ than one of the files below go here. Despite its name it is not DTrace-
+ specific: it is named after the library that reads and writes it,
+ libdtrace-ctf). See 'Using dwarf2ctf output' below regarding use of this
+ data.
+ - vmlinux.builtin.ctf: Types in the core kernel, that cannot be built in to
+ modules, go here.
+ - *.builtin.ctf: One of these is generated for the types in each module that
+ is presently built in to the kernel.
+ - *.mod.ctf: One of these is generated for each .ko.
+
+All the files in the first three classes are linked into the dtrace_ctf.ko
+module under various names, an empty module containing nothing but CTF data.
+
+
+A lengthy section of Makefile.modpost, and a short section of the toplevel
+Makefile, is dedicated to creating these files, and to linking them into the
+kernel modules. The dependency graph related to dwarf2ctf output is quite
+complex: modules and objects (ld -r'ed *.o files) are processed by dwarf2ctf to
+produce a number of files in the .ctf directory, and the final modules depend on
+the relevant ctf files. .mod.ctf's go into the .ko's with the same stem name,
+but dtrace_ctf.ko receives content from all the CTF files corresponding to
+built-in modules, and until dwarf2ctf runs and creates those files we cannot
+tell what those CTF files will be, though we do have a wildcard that matches
+them all.
+
+GNU Make's 'secondary expansion' feature comes to the rescue here: we can
+compute a list of expected CTF filenames at runtime, given the names of the
+modules we are linking in. For the builtin modules, we cheat and touch a stamp
+file after moving any .ctf.new files back over a .ctf file, then depend on that
+to see if dtrace_ctf.ko needs to be relinked.
+
+The actual incorporation of the CTF data into the kernel modules happens before
+module signing (if signing is active), by calling objcopy --add-section on the
+module in question. This too has some knotty corners.
+
+First of all, the module linkage process normally links a module using all the
+prerequisites of the module's target -- but we have designated all the CTF files
+as prerequisites of the module's target, and we don't want to link them directly
+in using ld(1), since they aren't object files. So we have to filter them out in
+the link line.
+
+Secondly, those modules which have no CTF files should acquire empty CTF
+sections to indicate their lack of unique types -- but objcopy in binutils 2.20
+and below silently exits if asked to --add-section an empty file. So we use dd
+to generate a file with a one-byte null in it instead, and teach the users of
+CTF sections to treat a one-byte-long 'CTF' section as if it were empty.
+
+
+Using dwarf2ctf output
+----------------------
+
+Using this data is fairly simple. Once you've read the CTF sections from the
+kernel modules and inflated them (or ignored them if they are empty or, as just
+mentioned, one byte long), you simply need to look at the ctf_parent_name() for
+each module, and if it is set to "dtrace_ctf", call ctf_import() to set the
+parent of this module to the CTF data you have read from the
+.dtrace_ctf.dtrace_ctf section in the dtrace_ctf.ko kernel module. The core
+kernel's types are stored in the .dtrace_ctf.vmlinux section in the same kernel
+module, and all built-in kernel modules have their types in
+.dtrace_ctf.$module_name. Non-built-in kernel modules just have a .dtrace_ctf
+section containing their types, which again might need their parent set to
+"dtrace_ctf". (Out-of- tree kernel modules will have no such parent.)
+
+Once you've set up the parenthood relationships you can call ctf_close() on the
+shared type repository and forget about it entirely: it will be refcounted and
+destroyed when all its children are closed.
+
+
+You should end up with a family of CTF files, one per kernel module built-in or
+not and one for the core kernel, freely usable for whatever purpose you need.
+
+
+Overview of dwarf2ctf operation
+--------
+
+There are four phases to dwarf2ctf operation: initialization, duplicate type
+detection, CTF construction, and writeout. Some of these phases can repeat.
+All but the last phase consists purely of sucking data from object files into
+GHashTables in memory. (The last two phases could potentially be combined,
+shrinking the size of one hash and saving memory, but the hash that is shrunk is
+by no means the largest one, so the extra complexity is probably not worth it.)
+
+dwarf2ctf uses several other libraries to do this:
+
+ - elfutils, used for DWARF parsing. We could potentially write our own
+ DWARF parser, but elfutils works and is already used by both DTrace and
+ SystemTap, so isn't even likely to introduce a new dependency.
+
+ - glib, used for the GHashTable. The rest of the kernel uses roll-your-own
+ hash tables, but dwarf2ctf makes heavy demands of its hashtables: they must
+ be expanding hashes capable of efficiently storing hundreds of thousands of
+ items, with amortized log(N) lookup time, and they must support deletion
+ (though it need not be particularly efficient deletion). This rules out
+ simple fixed-size bucket hashes like the ones used in other parts of the
+ kernel build system: GHashTable is already implemented, and works.
+
+ - zlib, used to compress the CTF information.
+
+ - libdtrace-ctf, which both reads and writes the CTF data. This is a port of
+ the Solaris CTF library, GPLed and with additional support for the storage
+ of name->type mappings (meant to represent variables) akin to its existing
+ ELF symbol->type mappings.
+
+dwarf2ctf has a good few important data structures, described at the top of
+scripts/dwarf2ctf/dwarf2ctf.c.
+
+dwarf2ctf has its own trace facility, implemented via the dw_ctf_trace() macro
+and enabled by compiling with -DDEBUG and setting DWARF2CTF_TRACE in the
+environment. (The first step is required because some very numerous data
+structures are greatly expanded when debugging is turned on, which would waste
+memory if it were done all the time). This produces a huge volume of trace
+output, several gigabytes when run over an allyesconfig kernel.
+
+
+Unless you're interested in how dwarf2ctf works internally, you can stop reading
+her.
+
+
+Initialization
+--------------
+
+This happens at the top of main() and run(), and in various functions named
+init_*() (init_assembly_tab(), init_builtin(), init_blacklist(), and
+init_tu_to_modules()). Of these, init_assembly_tab() and init_builtin() serve
+only to turn various static arrays and files mentioned on the command line into
+more useful internal representations (e.g. the assembly filter array of
+structures is turned into a pair of arrays indexed by DWARF tag), and
+init_blacklist() is described in the section on duplicate type detection below.
+
+init_ctf_table(), called both at initialization time and later during CTF
+assembly when new CTF files are found to be needed, creates a new CTF file in
+memory and either marks it as a child of the shared type repository, or (if it
+*is* the shared type repository, or deduplication is off and there is only one
+CTF file being processed and no shared type repository at all) creates a few
+types in it which CTF has representations of but DWARF does not: a void type,
+and a generic catchall pointer-to-function-returning-int.
+
+That leaves init_tu_to_modules(). This walks over all the top-level
+compile_unit DIEs in the DWARF debugging information in every object file
+mentioned in the list of modules and built-in modules, constructing a mapping
+from translation unit name back to the name of the kernel module it comes from,
+even if that module is built in to the kernel. This is normally the same as the
+filename (sans extension), but for built-in kernel modules, the name comes from
+the modules.builtin file's entry for the translation unit instead, since types
+belonging to a built-in module are considered to be in part of that module even
+if the module happens to be built in to the kernel, so that the output can land
+in a .builtin.ctf file rather than being jammed into vmlinux.builtin.ctf with
+the core kernel's types. This means that dwarf2ctf can operate in terms of the
+kernel module a type is contained within rather than having to think about the
+mapping between object file name, translation unit name and module name all the
+time.
+
+
+Toplevel DWARF walkers
+----------------------
+
+All routines in dwarf2ctf other than initialization and writeout are DWARF
+walkers: i.e., they walk over all DWARF DIEs in all object files specified on
+the command line and do something with every DIE. This job is done by
+process_file() and its helper process_tu_func(), which not only digs out the
+corresponding (built-in or non-built-in) module name corresponding to each
+object file, but also detects and skips translation units it has handled before
+(in case they are incrementally linked into multiple object files) and allows
+callbacks to be invoked at the start and end of each translation unit.
+
+Even though dwarf2ctf only cares about top-level types, in some situations DWARF
+can emit top-level types with references to a non-top-level type: if all
+occurrences of the top-level type are an opaque structure, and the only
+non-opaque definition is inside a function, references in the same translation
+unit as the non-opaque definition will point to the definition inside the
+function (and references outside the translation unit will not point at any
+definition). Thus, if we want to catch all nuances of globally-visible types,
+we have to scan types inside functions and lexical blocks inside functions too.
+
+To avoid generating a vast number of unnecessary type definitions, the 'assembly
+table' which describes how to construct a CTF type given a DWARF DIE also
+contains a description of a set of filters which are passed the current DIE and
+its parent: if they return false, the DIE is skipped and never passed to the
+callback function. We also avoid calling the callback for any DWARF DIE whose
+tag doesn't appear in the assembly table at all: there's no point doing
+duplicate detection or anything else for a DWARF DIE we won't be generating CTF
+from. There are currently two filters defined: filter_ctf_file_scope(), which
+is called for every DWARF DIE whose tag is one we never expect to see a
+reference to if it is inside a function, and filter_ctf_uninteresting(), which
+is called for variables to see if they are worthy of recording (top-level named
+variables with external linkage not part of the internal workings of macros
+only).
+
+
+Type IDs
+--------
+
+The only thing dwarf2ctf does which the Sun tool does not is the detection of
+duplicate and shared types, both within individual kernel modules and across
+modules. Our ultimate goal is that a type that appears in the source code once
+appears in the CTF output once as well. This goal has mostly been attained,
+except for out-of-tree modules, where cross-module type sharing must be disabled
+to avoid requiring rebuilds of the module whenever the core kernel is rebuilt.
+
+The core of this is the concept of a *type ID* and the function type_id() which
+computes it. A type ID is an identifier for a type which precisely represents
+that type and only that type. Doing this for types in different headers or at
+different scopes with the same name without needing to encode knowledge of C
+scoping rules into dwarf2ctf is an interesting proposition: we can use the line
+number and filename info provided by DWARF in most user-specified types to help.
+
+A type ID is a recursively-constructed string of the following form (fixed
+elements represented by {}, optional elements by []):
+
+//[filename]//[line number]//{type string}
+
+Types are *based* upon other types iff they have a DW_AT_type attribute pointing
+to some other type. All types based upon other types have a type ID that is the
+type ID of the type upon which they are based, with additional information
+specific to this type appended to it. The filename and line number is only
+added for those types which are not based upon other types and which have a
+filename and line number in the DWARF (lots don't, e.g. base types): the
+filename is canonicalized with realpath(), though since this is quite slow and
+type_id() is called a lot, the mapping from DWARF filename to realpath() result
+is cached. Types that have no filename or line number start with '////'.
+
+We use // to separate the filename and line number elements because this is the
+shortest string other than NUL that cannot appear in a canonicalized POSIX
+pathname (ignoring Pyramid, Cygwin and other strange systems that actually
+return // in the result of realpath(): Linux doesn't use it and that's all that
+matters. Should it start to use it, we can switch delimiter to ///.)
+
+Function pointers are not represented (or, rather, are all mapped to the same
+type ID, the generic catchall function-pointer type mentioned above); array
+dimensions are represented by [index-type dimension], or [] for flexible array
+members. Structure members are not represented, since they are not types, but
+the types of their members *are* represented, as are nested structures (the line
+number and filename serving, as ever, to disambiguate them from other structures
+with the same name declared nested inside different structures).
+
+The following are some examples of valid type IDs (assuming the kernel source
+tree is, implausibly, located at /k/, just off the root directory: comments on
+individual types done /* like C */; the last example is broken across lines for
+formatting's sake):
+
+//fp//* # a pointer to a function, any function
+////long int
+////char []
+////unsigned int typedef __kernel_uid_t typedef __kernel_uid32_t
+//fp//* typedef __signalfn_t * typedef __sighandler_t
+////struct nsproxy /* an opaque type */
+////struct nsproxy * # /* pointer to it */
+////long unsigned int typedef u64 volatile
+////long unsigned int volatile const * const
+////long unsigned int typedef sector_t [////long unsigned int 511]
+/k/include/linux/types.h//222//struct list_head
+/k/include/linux/types.h//222//struct list_head *
+/k/include/linux/types.h//222//struct list_head [////long unsigned int 5]
+/k/include/linux/types.h//217//struct /* no struct tag */
+/k/include/linux/types.h//217//struct typedef atomic64_t
+/k/include/linux/mm_types.h//34//struct page * typedef pgtable_t
+/k/fs/eventpoll.c//122//struct nested_calls
+/k/include/linux/sysctl.h//1016//struct ctl_table typedef ctl_table
+ [////long unsigned int 4] var inotify_table /* A global variable */
+
+This scheme means that cv-quals and other modifiers applied to other types are
+always merged: if there are a dozen typedefs for a single type 'foo' with the
+same name declared in the same place, they all end up with the same type ID and
+are only emitted into the CTF once.
+
+The type_id() function can also accept a callback, which is called as the
+recursion unwinds, from base type up to derived type: so it might be called for
+"////unsigned int", then for "////unsigned int typedef __kernel_uid_t", and so
+on up to the DIE that was originally passed in. Because type_id() returns a
+dynamically-allocated string, calls to type_id() made purely for the sake of
+invoking a callback are normally of the peculiar form "free(type_id(...))".
+
+type_id() is a very hot spot, so syscall results are cached in it (such as
+realpath(), as mentioned above), and when string appending is done, it is done
+all at once where possible, via str_appendn(), which calls realloc() only once
+no matter the number of strings being appended.
+
+Lots of core data structures in dwarf2ctf consist of hashes mapping type IDs to
+something else (predominantly CTF file/ID pairs and module names). It would
+be possible to map from hashes of type IDs, saving some memory, but this would
+impair debugging so is not yet implemented. (If it is implemented, it should
+probably be implemented only when DEBUG is not defined.)
+
+
+Duplicate detection
+-------------------
+
+The job of the duplicate detection pass is to fill out the id_to_module hash,
+which maps type IDs to the module they appear in, with the two special cases
+that types that appear only in the core kernel are said to appear in the module
+'vmlinux', and types that appear in more than one module (or in a module and in
+the core kernel) are said to appear in the module 'dtrace_ctf', the shared type
+repository. This is quite a tricky multi-pass process, because we must ensure
+that the shared type repository is self-contained: all types in the repository
+must not reference any types outside the repository.
+
+Detecting duplicates itself is easy: we consider two types duplicates if they
+have the same type ID: if they both reside in the same module, the resulting
+type resides in that module too. However, detecting shared types is harder.
+We consider that a type belongs in the shared module if any of these conditions
+is true:
+
+ - the type appears multiple times in different modules
+ - a type for which this type is a base type is shared
+ - the type is referenced by a structure or union member, and the structure
+ or union is shared
+ - the type is a non-opaque type with an opaque variant ('struct foo'), or
+ vice versa, and either of these variants is shared: these two types will
+ get different type IDs, so explicit checking is necessary
+
+It should be fairly easy to see that sharedness is a contagious property:
+e.g. if you mark a structure as shared, and one of its members is an
+otherwise-unshared opaque pointer to a structure, you have to mark that as
+shared: this causes the non-opaque definition of the structure, and all *its*
+members, to be shared, and so on. So since dwarf2ctf does not track the members
+of structures itself (not until the CTF generation phase, anyway), this means
+walking over the DWARF DIEs multiple times, checking for sharedness over and
+over until we are done.
+
+We partition the problem into two parts, both of which are carried out by
+process_file() callback functions: detect_duplicates() and
+detect_duplicates_alias_fixup().
+
+
+detect_duplicates() is called first, once for every DIE in the kernel (via
+process_file()). This identifies types that are duplicated but not shared, and
+identifies shared types without consideration of opaque struct/union aliasing.
+It also flags types that have been seen only once as 'seen': this is checked
+much later on by the CTF construction phase, since construction of CTF for
+any type which has not been inspected by the deduplicator is a sign of a bug in
+the deduplicator.
+
+This has several subtleties.
+
+If we are running for an out-of-tree module, we must still identify types as
+duplicated within the module, but must never mark them as shared: out-of-tree
+modules cannot contribute to the shared type repository nor even use types in
+it, since they are rebuilt independently from the kernel proper and thus cannot
+depend on a type currently in the repository remaining there (e.g. perhaps it
+has only two users, both in modules, and the kernel is rebuilt to not build one
+of those modules anymore: this should not require rebuilding of any out-of-tree
+modules).
+
+If we mark a type as seen, we must mark aggregate types that appear directly
+within that type's DIE as seen as well. This is done by the recursive function
+mark_seen_contained(). You might wonder what the point of it is: such types
+surely cannot appear anywhere else, and any duplication will precisely match the
+duplication of the containing type. The answer is that they can still be
+referenced as the type of structure members of their containing structure,
+e.g. in
+
+struct foo {
+ struct bar {
+ } *baz;
+};
+
+Here, a reference to 'struct bar' appears in 'struct foo', and CTF is
+constructed for it, even though it is not a top-level DIE.
+
+
+If we find that a type has appeared more than once in different kernel modules
+(or in a module and in the core kernel), we must mark it as shared. This is
+done via mark_shared(), which is both a function that can be directly called
+(e.g. from detect_duplicates()) and a type_id() callback. If it is called
+directly, it immediately reinvokes itself as a type_id() callback, which calls
+it for the base type of the type in question and then for all qualifiers up the
+type ID stack, marking them all as shared if they weren't already.
+
+If a structure or union is marked as shared, the types of its members are also
+marked as shared via a recursive call (even if they have already been so marked:
+just because this structure is of a type we've already seen, in a location we've
+already seen, doesn't mean that someone might not have legitimately used
+#defines to add extra members to the end of it, and we need to mark them as
+shared too). We track structures that have been seen in this translation unit
+and avoid recursing into them, to avoid an infinite loop in cases like this:
+
+struct one;
+struct two {
+ struct one *foo;
+};
+
+struct one {
+ struct two *foo;
+}
+
+The types being pointers does not help here -- the marking of 'struct one *' as
+shared will automatically mark 'struct one' as shared too, because otherwise we
+might have a structure in the shared type repository whose members' types
+could not be found there.
+
+
+The second pass is the 'alias fixup' pass, implemented by
+detect_duplicates_alias_fixup(). This pass serves to detect unshared opaque
+types whose non-opaque equivalents are shared, and vice versa. It is executed
+repeatedly until no types have been marked as shared for an entire iteration,
+but is considerably faster per iteration than the first pass, which often
+consumes more than half of dwarf2ctf's total runtime. We work in one direction
+only, looking for non-opaque structures, unions or enums which have structure
+tags. (Structures without tags cannot have opaque variants, and structures
+which are opaque will have non-opaque cousins somewhere, or can be emitted to
+the CTF as an opaque structure harmlessly since they truly have no members and
+are probably manipulated only via casts.)
+
+We identify structures, unions or enums with tags via the type_id() callback
+is_named_struct_union_enum(), but cannot determine if something is an opaque
+structure at this stage. Instead, we do that after the callback, checking to
+see if the first four characters of the type ID are "////": this relies on the
+fact that GCC never gives opaque structures line numbers in DWARF. We do the
+actual checking and marking of each non-opaque structure using
+detect_duplicates_alias_fixup_internal(), which is yet another type_id()
+callback.
+
+This function directly synthesises the name which this structure's opaque cousin
+would have, if it existed, by stripping off the line number and filename and
+replacing them with '////', and sees if either of these types have been marked
+as shared while the other has not. If the opaque type is shared, the non-opaque
+variant can be marked shared using the same recursive mark_shared() function as
+before (thus marking the types of all its members and types it depends upon as
+shared too). If it is the opaque type that needs marking shared, this will not
+work, since mark_shared() takes a DWARF DIE, and we don't have one for the
+opaque type, just a faked-up type ID. However, since an opaque type doesn't
+have any members that need recursively tracing, we don't need access to its
+DWARF DIE to figure them out, and can just mark it as shared directly, via an
+insert into the id_to_module hash.
+
+Since this function is a type_id() callback, it is called not just for
+structures but for types based on them (e.g. in type_id form, "struct foo const
+* [43] volatile *"); at every level of this declarator stack, a shared opaque
+base type will contaminate its non-opaque cousins with sharedness, and vice
+versa. This handles situations in which, say, the opaque version of "struct foo
+const * [43]" was used by more than one module and was marked as shared: the
+marking process will have marked the opaque versions of "struct foo const *
+[43]", "struct foo const *", "struct foo const" and "struct foo" as shared, but
+will not have touched any non-opaque versions of these types which may exist
+until this routine runs. It also handles typedefs to structures with no need
+for any extra code.
+
+This whole alias fixup process needs to be repeated, because whenever a
+non-opaque type is marked as shared and its member's types traced and marked
+shared, *those* may themselves be structure types with corresponding opaque or
+non-opaque variants, and when they are opaque types the non-opaque variant that
+alias fixup works from may already have passed under the DWARF walker's gaze: so
+another pass over the kernel's DWARF is necessary to be sure we catch it.
+mark_shared() thus sets a flag that scan_duplicates() recognizes and uses to
+trigger another run through the alias fixup pass.
+
+
+There are a very few modules that this algorithm doesn't work for. One example
+is snd-ens1371, which reads, in toto
+
+#define CHIP1371
+#include "ens1370.c"
+
+and ens1370.c (itself a distinct kernel module) then defines a 'struct ensoniq'
+whose members vary depending on whether CHIP1371 is defined. Obviously, it is
+impossible to share any such types between kernel modules even though their
+names are the same and they are defined in the same place in the same source
+file in both cases. But this sort of trickery is very rare, so we simply
+implement a blacklist of modules which will not introduce new types into the
+shared CTF repository, and who do not participate in alias fixup detection
+either. Detecting these cases in order to blacklist them is harder: no
+automated system has yet been implemented, although instances where #defines of
+this nature introduce new types that are then used by later members will cause
+assertion failures inside dwarf2ctf which might be a clue. So it is possible
+that some examples have been missed. (The blacklist only applies to cases where
+structure members change within a single kernel build, so cases where structures
+have members whose presence depends on CONFIG_* values are quite all right, as
+are cases where #defines are introduced by one translation that #includes
+another that then goes on to define whole new structures: it is only cases where
+modules #define something that changes the definition of individual
+possibly-shared types that will need blacklisting.)
+
+
+CTF construction
+----------------
+
+The next stage after the detection of duplicate and cross-module shared types is
+to generate CTF. We generate all CTF at once before emitting it: this is
+potentially somewhat wasteful of memory, but in practice has not proved to be a
+problem: substantially less memory is used than is used by other parts of the
+kernel build, unless -DDEBUG is enabled. Its job is to look through the
+kernel's type DWARF (via process_file(), as usual) and create CTF for every
+file-or-global-scope type and every externally-visible variable in the CTF file
+in which the duplicate detection pass has said that type should appear.
+(Variables are treated exactly like types: it just so happens that they are
+never shared because no type or variable can depend upon them, so they always go
+directly into the appropriate module and never into the shared type repository.)
+
+At this stage, the 'CTF files' are not actually files but rather ctf_file_t
+structures maintained by libdtrace-ctf and tracked in the module_to_ctf_file
+hash. We track every single individual type in the CTF file in the id_to_type
+hash, which maps type IDs to pairs of (CTF file ID, ctf type ID): this lets us
+use the type IDs described above when considering cross-references within CTF
+files (e.g. from one CTF type to a type it depends upon).
+
+The most important functions in this phase are:
+
+ - construct_ctf_id(), the top-level process_file() callback which is given a
+ DWARF DIE, looks in module_to_ctf_file for the CTF file where this type
+ should land (creating it if necessary), makes sure a type with this type ID
+ has not already been created there, calls die_to_ctf() to create the CTF,
+ notes where it was created in id_to_type, and handles errors.
+
+ - die_to_ctf(), a recursive function which calls the assembly function for the
+ DIE it is given and all its immediate children, with special-case handling
+ for tagged structures and unions. If you want to create a type but not note
+ where it was created for future lookups by lookup_ctf_type(), this is what to
+ call. (This is only done currently for unnamed structures/unions.)
+
+ - lookup_ctf_type(), which is called by CTF assembly functions for those
+ types that depend upon other types: it calls construct_ctf_id() again
+ to construct the type, and double-checks that all such types appear
+ either in the module we are constructing types for, or in the shared CTF
+ module. CTF represents all function pointers the same way, and has a
+ special type ID for 'void', so we special-case both of these cases.
+
+These functions are mostly straightforward (though highly recursive, with all
+three plus CTF construction functions participating in one recursive loop,
+die_to_ctf() calling itself directly and even one situation, the
+already-mentioned unnamed structures/unions, in which die_to_ctf() is directly
+called back by a CTF construction function.)
+
+
+There are a few subtleties, though. Firstly, error handling. We consider that
+errors that will lead to unusable CTF are fatal: mostly, these are errors where
+a bug in the deduplicator has failed to trace types correctly and has left at
+least one type in the shared module depending on a type in a non-shared module,
+or has failed to mark a type as shared at all. In all these cases, you'll
+eventually get an error from lookup_ctf_type() of the general form
+
+blah.c:413:foo_t: Internal error: lookup of flob found in different file.
+
+The first two parts of this error are the translation unit and line number the
+type being assembled (usually a structure or union) was found in: foo_t is the
+name of the type being assembled. The type of the structure being looked up
+appears nowhere, because we don't know it, but the name of the member is given
+("flob" above, or "(unnamed)" if we don't know it). If you want more
+information, you can pass -DDEBUG to the compilation of dwarf2ctf.c in
+scripts/dwarf2ctf/Makefile, and rerun dwarf2ctf, and you'll get the module and
+filename in which both the originating and the target types appear. In order to
+actually track down the bug you'll probably have to run dwarf2ctf with
+DWARF2CTF_TRACE set in the environment, and look at the place where the target
+type was deduplicated, and try to figure out why the deduplicator didn't trace
+the reference to the type in foo_t correctly.
+
+There is another kind of error, though: a failure to assemble a single type,
+perhaps because DWARF was emitted that we don't know how to understand (this is
+particularly likely in structure assembly, where we are highly dependent on the
+form of the DWARF that GCC happens to emit for DW_AT_member_location). We pass
+an 'enum skip_type' around, which has three possible values, one of which is
+SKIP_ABORT. After every type is assembled, we call the notably inefficient
+function ctf_update() on the CTF file we're working over. This takes the
+in-memory structures and serializes them (all of them, every time), following
+which they can be referred to by other CTF types. If a SKIP_ABORT propagates up
+to construct_ctf_id(), we call ctf_discard() instead, which throws away every
+type constructed since the last ctf_update() -- i.e., the specific erroneous
+type we've just been working on. (We might have emitted some parts of it and
+then failed, so we should try to clean up.)
+
+A SKIP_ABORT is not fatal unless DEBUG is defined: its only effect is to omit
+one single type from the resulting CTF, which is probably still usable.
+
+There's more error-handling complexity inside die_to_ctf(), where errors from
+libdtrace-ctf are actually reported (there may be multiple of them for a single
+type, e.g. if we are assembling a structure and several members somehow refer to
+a type we do not know about).
+
+
+die_to_ctf() itself has the sort of parameter list that can make people swear
+off C for life. It is largely explained in the description of ctf_assembly_fun.
+Most parts of it are hardly used in the function itself, just passed down to CTF
+assembly functions. The only part of die_to_ctf() that is worth noting is that
+it goes to extra effort to call ctf_update() early when new structure and union
+DIEs are created, before any of their members are assembled. This allows self-
+references like
+
+struct foo {
+ struct foo *next;
+};
+
+to work (since you can't refer to a type in CTF if you haven't called
+ctf_update() since adding that type).
+
+Finally, we must note the override flag. Both die_to_ctf() and
+construct_ctf_id() return a CTF ID. This is thrown away by the DWARF walking
+code (the function construct_ctf() exists just for that purpose), but when
+called by lookup_ctf_type(), this CTF ID is taken to be the single ID of the CTF
+type that's just been assembled. Normally this is the same as the CTF ID
+returned by the CTF assembly function for the top-level DWARF DIE, but there are
+a few structures for which we want to return the result of some other CTF
+assembly function.
+
+The only currently-existing example is array dimensions, which DWARF represents
+as a typed array DIE whose child is a dimension, but which CTF represents as an
+array-with-dimensions that you can't change afterwards. We can't assemble an
+'array' at the top level because we don't know how big it is, but we have to
+track the type recorded there somehow. We handle this by having
+assemble_ctf_array(), the assembly function for the top-level DW_TAG_array_type
+DIE, simply look up the type of the array's members and return its ID as if it
+had just constructed it, after which assemble_ctf_array_dimension(), the
+assembly function for DW_TAG_subrange_type, actually constructs the array,
+wrapping it around the CTF 'ID' 'assembled' by the parent and setting the
+override flag to make sure that this is what is really recorded.
+
+
+CTF construction functions
+--------------------------
+
+Each CTF construction function takes a single DWARF DIE and turns it into CTF,
+somehow. They are laid out in the assembly table described in 'toplevel DWARF
+walkers' above. They all start the same way, with a series of CTF_DW_ENFORCE or
+CTF_DW_ENFORCE_NOT assertions. These guard against corrupted DWARF missing some
+of the attributes we need, or DWARF containing attributes which indicate that we
+can't handle the content (e.g. DW_AT_signature or DW_AT_specification on
+structures, which would both indicate this is DWARF 4, which we can't handle
+yet.)
+
+We'll go through these functions one by one, pointing out anything that
+maintainers should be aware of.
+
+
+assemble_ctf_base() assembles all integral base types (DW_TAG_base_type) and
+transforms them into the corresponding CTF type. The functions we need to call
+for this in the CTF API all have the same type signature but have different
+names; CTF also distinguishes between the various differently-sized
+floating-point types, so we must figure out from the type size which type a
+given DWARF base type is referring to. We map from DWARF encoding to a triple
+of (CTF addition function, CTF integral type, type size) where the latter is
+optional and depends on the size of the DWARF type we are encoding and the size
+of various floating_point types on the current system. (This does mean that
+cross-compilation using dwarf2ctf is likely to fail fairly often: we need
+machinery to determine the sizeof() types on the target system before that can
+function.)
+
+This sizeof()-based search procedure is why we do not currently support
+DW_AT_bit_size for base types: we could easily support it for sizes modulo 8,
+but GCC happens to emit DW_AT_byte_size in this case. In C DW_AT_bit_size is
+likely to be emitted only for bitfields in structures anyway, not for base
+types.
+
+
+assemble_ctf_pointer() and assemble_ctf_typedef() are trivial: look up the
+associated type with lookup_ctf_type() and assemble the appropriate thing.
+assemble_ctf_cvr_qual() is almost as trivial, but has to figure out which of
+const, volatile or restrict it was called for and call the corresponding CTF API
+function. assemble_ctf_enumeration() and assemble_ctf_enumerator() are quite
+simple too.
+
+
+assemble_ctf_variable() has a couple of extra complexities: we unconditionally
+set the skip parameter to SKIP_SKIP, suppressing recursion into containing DIEs,
+since we already know we won't care about any of them. Also, while the
+deduplication pass unifies opaque and non-opaque structures into the same type,
+it never makes sure that variables declared in the same header by translation
+units which have opaque versus non-opaque structures in scope are deduplicated.
+e.g. you could well end up with these two type IDs, depending on whether
+<linux/pid_namespace.h> was included before <linux/pid.h> in a given translation
+unit:
+
+////struct pid_namespace var init_pid_ns
+/path/to/kernel/include/linux/pid_namespace.h//19//struct pid_namespace var init_pid_ns
+
+These variables both refer to the same type, but deduplicating them would
+require an additional deduplication pass. Since variables are always terminal
+and nothing can refer to them, nothing will ever look up any of those type IDs
+(since the only thing that looks up type IDs is code that is searching for type
+that other types depend on). So we don't care about this duplication and
+running an additional deduplication pass to eliminate it would slow down
+dwarf2ctf to no good end. It's better just to ignore duplicate errors from
+ctf_add_variable().
+
+
+assemble_ctf_array() and assemble_ctf_array_dimension() we talked about
+above. One last subtlety remains, which is that figuring out the actual
+dimensionality of an array is complicated enough that it has been hived off into
+a private_subrange_dimension() function, called both from here and from
+type_id(). Arrays with neither a DW_AT_upper_bound nor a DW_AT_count, and
+arrays without an indexing type, are best considered flexible arrays; arrays
+whose upper bound or count is not unsigned or signed integral data are also
+flexible (perhaps they're using a full-blown location list, but we can't encode
+that in CTF so we treat it as flexible); and if an upper bound is used, we want
+to add one to its value before treating it as a count of elements.
+
+
+This leaves structure/union assembly, both of which are assembled by the same
+pair of functions, assemble_ctf_struct_union for the type itself and
+assemble_ctf_su_member() for the individual members. As with
+assemble_ctf_cvr_qual(), we have to look at the tag to figure out which CTF
+function to use to do the assembly, but we have an extra constraint: it is
+perfectly idiomatic C to declare a structure repeatedly with a different number
+of members every time. This is perfectly permissible as long as the leading
+portions of all declarations match. We do not verify this (we hope that the
+compiler will diagnose it, which it will unless the conflicting declarations
+cross modules), though perhaps we should: we simply look up the structure in the
+CTF and the DWARF and skip assembly of the structure members via SKIP_SKIP if
+the already-assembled structure has at least as many members as the current one.
+
+assemble_ctf_su_member() is by far the most complex of the assembly functions.
+It has to handle members that already exist, members that need assembly, members
+that correspond to unnamed structure members, numerous different ways of
+representing structure offsets and members with no offset at all.
+
+The offset computation is quite laborious and by no means complete: a complete
+implementation would require an interpreter for DWARF location lists, which is
+total overkill given that in DWARF2 GCC emits a totally stereotyped location
+list, and in DWARF3+ we don't need location list parsing at all. CTF wants an
+offset in bits.
+
+We have five cases:
+ - for DW_AT_data_bit_offset, we just use the offset unchanged.
+
+ - for DW_AT_data_member_location with an integral form (data2, data4, data8,
+ udata, or sdata) we just look it up and multiply it by eight, adding the
+ parent's DW_AT_bit_offset to handle structures nested inside other
+ structures.
+
+ - for DW_AT_data_member_location with a block form, we make sure that the list
+ is of one particular simple form (DW_OP_plus_uconst and a constant value in
+ bytes), and abort assembly otherwise. The only case I know of where this
+ test will trip is C++ virtual bases: if people are using C++ code with
+ virtual bases inside the kernel they deserve sympathy, but probably not
+ support in the code. CTF can't represent C++ types in any case.
+
+ - for expression location lists, or anything else that we don't understand, we
+ simply die (we could simply skip the type, but this seems serious enough that
+ dying is warranted).
+
+ - with none of these present, we have no offset: the member is at the same
+ location as the start of the structure.
+
+But where is the 'start of the structure'? That depends on whether this is an
+unnamed struct/union member (usually a union). If it is, we want to fold all
+its members directly into the parent structure, with their offsets increased by
+the offset of the unnamed member as a whole. This is done by directly calling
+die_to_ctf() with the first child of the anonymous member's type and with all
+other parameters set as if the parent DIE was the current structure, thus
+fooling die_to_ctf() into believing that these members are members of the
+current structure, not of the anonymous one. The offset-increasing magic is
+done via the parent_bias parameter to die_to_ctf() and all the CTF construction
+functions: it is ignored by all of them except for assemble_ctf_su_member()
+itself, which adds the parent bias onto the normally-computed offset, and is
+otherwise passed down unchanged to all children. This means that even this
+terribly contrived case works:
+
+struct horror {
+ int spacer;
+ union {
+ struct {
+ int spacer;
+ struct {
+ int foo;
+ int bar;
+ } b;
+ } a;
+ };
+};
+
+In this situation, horror.a.b.bar may have:
+
+ - a nonzero parent_bias due to the offset of the anonymous union in 'struct
+ horror'
+ - a nonzero offset due to the offset of 'bar' in its containing structure
+ - if DW_AT_data_member_location with integral form is used, a nonzero
+ DW_AT_bit_offset of 'b' in 'a'
+
+If this is not an anonymous union, we are dealing with only one member: we look
+up its type and add it reasonably conventionally via ctf_add_member_offset().
+Even here there are subtleties: we use construct_ctf_id() directly rather than
+via lookup_ctf_type() so we can get a better error message on failure, and we
+ignore any duplicate-member errors because this is probably a sign that this
+structure has already been encountered and we are working through another
+instance of it with more members.
+
+
+Writeout
+--------
+
+This couldn't really be simpler. We create an output directory with the
+requested name, then work over the entire module_to_ctf_file hash, writing out
+every CTF file into a new suitably-named file via zlib's compressed file I/O
+functions.