ctf: dwarf2ctf doc revisions

author Nick Alcock <nick.alcock@oracle.com>

Wed, 10 Oct 2012 13:23:31 +0000 (14:23 +0100)

committer Nick Alcock <nick.alcock@oracle.com>

Mon, 29 Jun 2015 21:41:39 +0000 (22:41 +0100)
diff --git a/Documentation/dwarf2ctf b/Documentation/dwarf2ctf

index 750b5b619cca1a0ab95df5e63144a658794f1288..2e28359afa8255e0a2b4894c16f80cc00cff1d9a 100644 (file)
--- a/Documentation/dwarf2ctf
+++ b/Documentation/dwarf2ctf
@@ -2,12 +2,13 @@ dwarf2ctf, a type encoder for the Linux kernel
  =========
  
  Many kernel-level debugging and tracing systems need access to the kernel's type
-information.  Since C doesn't support any form of introspection, the data must be
-extracted from the DWARF debugging information generated by the compiler.
-However, this information is very voluminous (just the type information alone
-adds up to a couple of hundred megabytes in a 'make allyesconfig' kernel): even
-if users are happy to spend the disk space, the time and memory required to read
-much of this information in is likely to be prohibitive.
+information.  Since C doesn't support any form of introspection, the data must
+be extracted in some other way: here, we extract it from the DWARF debugging
+information generated by the compiler.  Unfortunately, this information is very
+voluminous (just the type information alone adds up to a couple of hundred
+megabytes in a 'make allyesconfig' kernel): even if users are happy to spend the
+disk space, the time and memory required to read much of this information in is
+likely to be prohibitive.
  
  This problem is not new -- back in 2004, Sun had the same problem when
  attempting to give DTrace a view of the type information in the Solaris kernel.
@@ -18,33 +19,57 @@ in the Solaris kernel causing the kernel itself to emit CTF data for its own
  types.
  
  Unfortunately while this tool may be highly efficient it is not adequate for the
-Linux kernel.  Not only is its license wrong, but it treats every ELF object as
-an independent entity with an independent set of types -- perfectly all right
-for the Solaris kernel with a few hundred modules maximum, but very much not for
-Linux, where distro kernels often compile in thousands of modules.  Ideally, we
-would like to treat all kernel modules, built-in or not, the same way, sharing
-and deduplicating all globally-visible types across the entire set of visible
-modules and recording each precisely once.
-
-We also want to collect descriptions of interesting global variables and emit
-descriptions of their name->type mapping as well, since the kernel has no easily
-accessible ELF section we can extract this information from at runtime (kernel
-modules must be accessible at runtime for modern Linux systems to work, but the
-kernel itself could have come from over the network or off a USB key or from a
-non-mounted partition or an EFI boot partition or who knows where, and could
-have any name even if it is accessible: so tracing tools should not rely on
-being able to look inside the kernel image).
-
-dwarf2ctf is a CTF generation tool that reads in DWARF from a set of object
-files (usually, every object file in the kernel and all modules) and fills a
-directory with compressed files containing CTF representations of the types in
-those object files: the kernel build system regenerates these as necessary and
-links them directly into kernel modules.
+Linux kernel.  It treats every ELF object as an independent entity with an
+independent set of types -- perfectly all right for the Solaris kernel with a
+few hundred modules maximum, but very much not for Linux, where distro kernels
+often compile in thousands of modules.  Ideally, we would like to treat all
+kernel modules, built-in or not, the same way, sharing and deduplicating all
+globally-visible types across the entire set of visible modules and recording
+each precisely once.
+
+We also want to collect descriptions of global variables and emit descriptions
+of their name->type mapping as well, since the kernel has no easily accessible
+ELF section we can extract this information from at runtime (kernel modules must
+be accessible at runtime for modern Linux systems to work, but the kernel itself
+could have come from over the network or off a USB key or from a non-mounted
+partition or an EFI boot partition or who knows where, and could have any name
+even if it is accessible: so tracing tools should not rely on being able to look
+inside the kernel image).
+
+We do all this with dwarf2ctf, a CTF generation tool that reads in DWARF from a
+set of object files (usually, every object file in the kernel and all modules)
+and fills a directory with compressed files containing CTF representations of
+the types in those object files: the kernel build system regenerates these as
+necessary and links them directly into kernel modules.
  
  Caveats: It is somewhat specific to the form of DWARF output emitted by GCC, and
  doesn't yet support DWARF-4 type signatures or compressed DWARF at all.
  
-We'll look at each part of this system in turn, from the top down.
+We'll look at each part of this system in turn, from the top down, starting with
+using the kernel type information dwarf2ctf produces in other programs.
+
+
+Using dwarf2ctf output
+----------------------
+
+Using this data is fairly simple.  Once you've read the CTF sections from the
+kernel modules and inflated them (or ignored them if they are empty or, as noted
+below, one byte long), you simply need to look at the ctf_parent_name() for
+each module, and if it is set to "ctf", call ctf_import() to set the parent of
+this module to the CTF data you have read from the .ctf.shared_ctf section in
+the ctf.ko kernel module.  The core kernel's types are stored in the
+.ctf.vmlinux section in the same kernel module, and all built-in kernel modules
+have their types in .ctf.$module_name.  Non-built-in kernel modules just have a
+.ctf section containing their types, which again might need their parent set to
+"shared_ctf".  (Out-of-tree kernel modules will have no such parent.)
+
+Once you've set up the parenthood relationships you can call ctf_close() on the
+shared type repository and forget about it entirely: it will be refcounted and
+destroyed when all its children are closed.
+
+
+You should end up with a family of CTF files, one per kernel module built-in or
+not and one for the core kernel, freely usable for whatever purpose you need.
  
  
  Invocation and build-system connections
@@ -52,15 +77,12 @@ Invocation and build-system connections
  
  dwarf2ctf's command-line syntax emphasises simplicity over compactness.  Linux
  has nearly-infinitely-long command lines these days, so we can take advantage of
-this, as long as we take a bit of care in the makefiles to avoid using shell
-metacharacters on makefile lines that run dwarf2ctf, since many Linux shells
-still have command-line length limits and a coredumping shell can ruin your
-whole day.
+this.
  
  Two syntaxes are supported.  The first shares types across multiple modules and
  the core kernel; the second is used for out-of-tree module building, and avoids
-sharing anything at all across modules, nor depending on the set of shared types
-defined for the core kernel.
+either sharing anything at all across modules or depending on the set of shared
+types defined for the core kernel.
  
  
  dwarf2ctf outputdir objects.builtin modules.builtin dedup.blacklist \
@@ -108,11 +130,12 @@ Makefile, is dedicated to creating these files, and to linking them into the
  kernel modules.  The dependency graph related to dwarf2ctf output is quite
  complex: modules and objects (ld -r'ed *.o files) are processed by dwarf2ctf to
  produce a number of files in the .ctf directory, and the final modules depend on
-the relevant ctf files.  .mod.ctf's go into the .ko's with the same stem name,
-but ctf.ko receives content from all the CTF files corresponding to built-in
-modules, and until dwarf2ctf runs and creates those files we cannot tell what
-those CTF files will be, though we do have a wildcard that matches them all.
-
+the relevant ctf files.  The .mod.ctf's go into the .ko's with the same stem
+name, but ctf.ko receives content from all the CTF files corresponding to
+built-in modules, and until dwarf2ctf runs and creates those files we cannot
+tell what those CTF files will be, though we do have a wildcard that matches
+them all.
+
  GNU Make's 'secondary expansion' feature comes to the rescue here: we can
  compute a list of expected CTF filenames at runtime, given the names of the
  modules we are linking in.  For the builtin modules, we cheat and touch a stamp
@@ -136,29 +159,6 @@ to generate a file with a one-byte null in it instead, and teach the users of
  CTF sections to treat a one-byte-long 'CTF' section as if it were empty.
  
  
-Using dwarf2ctf output
-----------------------
-
-Using this data is fairly simple.  Once you've read the CTF sections from the
-kernel modules and inflated them (or ignored them if they are empty or, as just
-mentioned, one byte long), you simply need to look at the ctf_parent_name() for
-each module, and if it is set to "ctf", call ctf_import() to set the parent of
-this module to the CTF data you have read from the .ctf.shared_ctf section in
-the ctf.ko kernel module.  The core kernel's types are stored in the
-.ctf.vmlinux section in the same kernel module, and all built-in kernel modules
-have their types in .ctf.$module_name.  Non-built-in kernel modules just have a
-.ctf section containing their types, which again might need their parent set to
-"shared_ctf".  (Out-of-tree kernel modules will have no such parent.)
-
-Once you've set up the parenthood relationships you can call ctf_close() on the
-shared type repository and forget about it entirely: it will be refcounted and
-destroyed when all its children are closed.
-
-
-You should end up with a family of CTF files, one per kernel module built-in or
-not and one for the core kernel, freely usable for whatever purpose you need.
-
-
  Overview of dwarf2ctf operation
  --------
  
@@ -172,8 +172,7 @@ by no means the largest one, so the extra complexity is probably not worth it.)
  dwarf2ctf uses several other libraries to do this:
  
   - elfutils, used for DWARF parsing.  We could potentially write our own
-   DWARF parser, but elfutils works and is already used by both DTrace and
-   SystemTap, so isn't even likely to introduce a new dependency.
+   DWARF parser, but elfutils works and is tested.
  
   - glib, used for the GHashTable.  The rest of the kernel uses roll-your-own
     hash tables, but dwarf2ctf makes heavy demands of its hashtables: they must
@@ -202,19 +201,103 @@ output, several gigabytes when run over an allyesconfig kernel.
  
  
  Unless you're interested in how dwarf2ctf works internally, you can stop reading
-her.
-
+here.  If you are interested, now is a good time to read the comments above
+main() in scripts/dwarf2ctf/dwarf2ctf.c, which briefly describe dwarf2ctf's data
+structures and functions.
+
+
+Flow of Control
+---------------
+
+The /* C comments */ point to other sections of this document.
+
+Functions named in the /* Utilities */ section of dwarf2ctf.c are not mentioned
+here for simplicity's sake.
+
+[C]: Callback
+[R]: recursive
+[1]: Numbers: Mutually-recursive loop
+|: Several functions which all call the same functions
+->: Call from array of callbacks (filter_ctf_*() omitted as uninteresting)
+
+main()
+ /* See 'Initialization' */
+ init_assembly_tab()
+ init_builtin()
+ init_blacklist()
+ run()
+   init_tu_to_modules()
+   init_ctf_table()
+
+   /* Duplicate detection */
+
+   scan_duplicates()
+     process_file()                       /* Toplevel DWARF walkers */
+[C]    detect_duplicates_init()
+[R]    process_tu_func()
+[C]      assembly_filter_tab[]
+[C]      detect_duplicates()
+[ 1]       mark_shared()
+[R]          type_id()                    /* Type IDs */
+[C1]           mark_shared()
+[R]        mark_seen_contained()
+[C]    detect_duplicates_done()
+
+     process_file()
+[C]    detect_duplicates_init()
+[R]    process_tu_func()
+[C]      assembly_filter_tab[]
+[C]      detect_duplicates_alias_fixup()
+[R]        type_id()
+[C]          is_named_struct_union_enum()
+[R]        type_id()
+[C]          detect_duplicates_alias_fixup_internal()
+               mark_shared() (see above)
+[C]    detect_duplicates_done()
+
+   /* CTF construction */
+
+   process_file()
+[R]  process_tu_func()
+[C]    assembly_filter_tab[]
+[C]    construct_ctf()
+[ 2]     construct_ctf_id()
+[R3]       die_to_ctf()
+             assembly_tab[]
+[C]           -> assemble_ctf_base()
+              -> assemble_ctf_pointer()
+               | assemble_ctf_array()
+               | assemble_ctf_array_dimension()
+               | assemble_ctf_typedef()
+               | assemble_ctf_cvr_qual()
+               | assemble_ctf_variable()
+                   lookup_ctf_type()
+[ 2]                 construct_ctf_id()
+              -> assemble_ctf_enumeration()
+              -> assemble_ctf_enumerator()
+              -> assemble_ctf_struct_union()
+              -> assemble_ctf_su_member()
+[ 3]               die_to_ctf()
+[ 2]               construct_ctf_id()
+
+   write_types()
  
  Initialization
  --------------
  
+ init_assembly_tab()
+ init_builtin()
+ init_blacklist()
+ run()
+   init_tu_to_modules()
+   init_ctf_table()
+
  This happens at the top of main() and run(), and in various functions named
-init_*() (init_assembly_tab(), init_builtin(), init_blacklist(), and
-init_tu_to_modules()).  Of these, init_assembly_tab() and init_builtin() serve
-only to turn various static arrays and files mentioned on the command line into
-more useful internal representations (e.g. the assembly filter array of
-structures is turned into a pair of arrays indexed by DWARF tag), and
-init_blacklist() is described in the section on duplicate type detection below.
+init_*().  Of these, init_assembly_tab() and init_builtin() serve only to turn
+various static arrays and files mentioned on the command line into more useful
+internal representations (e.g. the assembly filter array of structures is turned
+into a pair of arrays indexed by DWARF tag), and init_blacklist() is described
+in the section on duplicate type detection below.
  
  init_ctf_table(), called both at initialization time and later during CTF
  assembly when new CTF files are found to be needed, creates a new CTF file in
@@ -230,19 +313,25 @@ mentioned in the list of modules and built-in modules, constructing a mapping
  from translation unit name back to the name of the kernel module it comes from,
  even if that module is built in to the kernel.  This is normally the same as the
  filename (sans extension), but for built-in kernel modules, the name comes from
-the modules.builtin file's entry for the translation unit instead, since types
-belonging to a built-in module are considered to be in part of that module even
-if the module happens to be built in to the kernel, so that the output can land
-in a .builtin.ctf file rather than being jammed into vmlinux.builtin.ctf with
-the core kernel's types.  This means that dwarf2ctf can operate in terms of the
-kernel module a type is contained within rather than having to think about the
-mapping between object file name, translation unit name and module name all the
-time.
+the modules.builtin file's entry for the translation unit instead, so that the
+output can land in a .builtin.ctf file rather than being jammed into
+vmlinux.builtin.ctf with the core kernel's types.
+
+This means that dwarf2ctf can operate in terms of the kernel module a type is
+contained within rather than having to think about the mapping between object
+file name, translation unit name and module name all the time.
  
  
  Toplevel DWARF walkers
  ----------------------
  
+     process_file()
+[C]    (per-TU initialization callback)
+[R]    process_tu_func()
+[C]      assembly_filter_tab[]
+[C]      (per-DIE callback)
+[C]    (per-TU cleanup callback)
+
  All routines in dwarf2ctf other than initialization and writeout are DWARF
  walkers: i.e., they walk over all DWARF DIEs in all object files specified on
  the command line and do something with every DIE.  This job is done by
@@ -279,6 +368,9 @@ only).
  Type IDs
  --------
  
+[R]  type_id()
+[C]    (optional per-type callback)
+
  The only thing dwarf2ctf does which the Sun tool does not is the detection of
  duplicate and shared types, both within individual kernel modules and across
  modules.  Our ultimate goal is that a type that appears in the source code once
@@ -374,6 +466,30 @@ probably be implemented only when DEBUG is not defined.)
  Duplicate detection
  -------------------
  
+   scan_duplicates()
+     process_file()                       /* Toplevel DWARF walkers */
+[C]    detect_duplicates_init()
+[R]    process_tu_func()
+[C]      assembly_filter_tab[]
+[C]      detect_duplicates()
+[ 1]       mark_shared()
+[R]          type_id()                    /* Type IDs */
+[C1]           mark_shared()
+[R]        mark_seen_contained()
+[C]    detect_duplicates_done()
+
+     process_file()
+[C]    detect_duplicates_init()
+[R]    process_tu_func()
+[C]      assembly_filter_tab[]
+[C]      detect_duplicates_alias_fixup()
+[R]        type_id()
+[C]          is_named_struct_union_enum()
+[R]        type_id()
+[C]          detect_duplicates_alias_fixup_internal()
+               mark_shared() (see above)
+[C]    detect_duplicates_done()
+
  The job of the duplicate detection pass is to fill out the id_to_module hash,
  which maps type IDs to the module they appear in, with the two special cases
  that types that appear only in the core kernel are said to appear in the module
@@ -563,6 +679,29 @@ possibly-shared types that will need blacklisting.)
  CTF construction
  ----------------
  
+   process_file()
+[R]  process_tu_func()
+[C]    assembly_filter_tab[]
+[C]    construct_ctf()
+[ 2]     construct_ctf_id()
+[R3]       die_to_ctf()
+             assembly_tab[]
+[C]           -> assemble_ctf_base()
+              -> assemble_ctf_pointer()
+               | assemble_ctf_array()
+               | assemble_ctf_array_dimension()
+               | assemble_ctf_typedef()
+               | assemble_ctf_cvr_qual()
+               | assemble_ctf_variable()
+                   lookup_ctf_type()
+[ 2]                 construct_ctf_id()
+              -> assemble_ctf_enumeration()
+              -> assemble_ctf_enumerator()
+              -> assemble_ctf_struct_union()
+              -> assemble_ctf_su_member()
+[ 3]               die_to_ctf()
+[ 2]               construct_ctf_id()
+
  The next stage after the detection of duplicate and cross-module shared types is
  to generate CTF.  We generate all CTF at once before emitting it: this is
  potentially somewhat wasteful of memory, but in practice has not proved to be a
@@ -604,10 +743,10 @@ The most important functions in this phase are:
     special type ID for 'void', so we special-case both of these cases.
  
  These functions are mostly straightforward (though highly recursive, with all
-three plus CTF construction functions participating in one recursive loop,
-die_to_ctf() calling itself directly and even one situation, the
+three plus CTF construction functions participating in loop 2 above,
+die_to_ctf() calling itself directly, and even one situation, the
  already-mentioned unnamed structures/unions, in which die_to_ctf() is directly
-called back by a CTF construction function.)
+called back by a CTF construction function, in loop 3 above.)
  
  
  There are a few subtleties, though.  Firstly, error handling.  We consider that
@@ -864,7 +1003,9 @@ instance of it with more members.
  Writeout
  --------
  
-This couldn't really be simpler.  We create an output directory with the
-requested name, then work over the entire module_to_ctf_file hash, writing out
-every CTF file into a new suitably-named file via zlib's compressed file I/O
-functions.
+   write_types()
+
+This couldn't really be simpler, as the trivial call graph shows.  We create an
+output directory with the requested name, then work over the entire
+module_to_ctf_file hash, writing out every CTF file into a new suitably-named
+file via zlib's compressed file I/O functions.
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost

index de50b794033ae400641f63f3acdaf6bedcae1f6e..085d27d6e8b75166dcee002d08cc59517607d521 100644 (file)
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -138,6 +138,11 @@ ifndef CONFIG_DT_DISABLE_CTF
  #
  # Out-of-tree module CTF gets its own per-module set of stamp files, since its
  # CTF is rebuilt independently.
+#
+# Warning: cmd_ctf can expand to an enormously long command line, long enough
+# that many shells dump core trying to parse it.  We must avoid most shell
+# metacharacters in the definition of cmd_ctf, whereupon GNU make will invoke
+# the command directly rather than going through $SHELL.
  
  ifeq ($(KBUILD_EXTMOD),)
  ctf-dir := .ctf
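
Illustration for the 'Using dwarf2ctf output' text in the patch above: the documentation says a consumer can close the shared type repository as soon as the parenthood relationships are set up, because it is refcounted and destroyed when its last child is closed. The following is a minimal sketch of that lifetime rule only. It does not use libctf: struct toy_ctf, toy_open(), toy_import(), toy_close() and needs_shared_parent() are hypothetical stand-ins, and the parent name string "shared_ctf" is an assumption taken from the text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an open CTF file.  NOT the real libctf type. */
struct toy_ctf {
	const char *parent_name;	/* parent this file expects, or NULL */
	struct toy_ctf *parent;		/* set by toy_import() */
	int refcnt;			/* one for the opener, one per child */
};

static int toy_live;			/* containers not yet freed */

/* Hypothetical opener: returns a container holding one reference. */
static struct toy_ctf *toy_open(const char *parent_name)
{
	struct toy_ctf *fp = calloc(1, sizeof(*fp));

	if (fp == NULL)
		return NULL;
	fp->parent_name = parent_name;
	fp->refcnt = 1;
	toy_live++;
	return fp;
}

/* Attach a parent; the child takes its own reference on it. */
static void toy_import(struct toy_ctf *child, struct toy_ctf *parent)
{
	assert(child->parent == NULL);
	parent->refcnt++;
	child->parent = parent;
}

/* Drop one reference; free on the last one and release the parent. */
static void toy_close(struct toy_ctf *fp)
{
	assert(fp->refcnt > 0);
	if (--fp->refcnt > 0)
		return;
	if (fp->parent != NULL)
		toy_close(fp->parent);
	toy_live--;
	free(fp);
}

/* Assumed convention: a module names its parent "shared_ctf". */
static int needs_shared_parent(const struct toy_ctf *fp)
{
	return fp->parent_name != NULL &&
	       strcmp(fp->parent_name, "shared_ctf") == 0;
}

/* Walks through the sequence described in the text; returns the
 * number of containers still live at the end, or -1 on failure. */
static int demo_parent_lifetime(void)
{
	struct toy_ctf *shared = toy_open(NULL);
	struct toy_ctf *in_tree = toy_open("shared_ctf");
	struct toy_ctf *out_of_tree = toy_open(NULL);	/* no parent */
	struct toy_ctf *mods[2] = { in_tree, out_of_tree };
	int i;

	if (shared == NULL || in_tree == NULL || out_of_tree == NULL)
		return -1;	/* allocation failure; the toy just gives up */

	for (i = 0; i < 2; i++)
		if (needs_shared_parent(mods[i]))
			toy_import(mods[i], shared);

	toy_close(shared);	/* our handle goes; in_tree keeps it alive */
	assert(toy_live == 3);

	toy_close(out_of_tree);
	assert(toy_live == 2);

	toy_close(in_tree);	/* last child closed, so shared is freed too */
	return toy_live;
}
```

Calling demo_parent_lifetime() from a test harness exercises the whole sequence. The point of the design is that the consumer never has to track when the shared repository can be freed: each child's reference does that for it.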