Add some notes

author Matthew Wilcox <willy@infradead.org>

Fri, 21 Dec 2018 20:30:54 +0000 (15:30 -0500)

committer Liam R. Howlett <Liam.Howlett@Oracle.com>

Wed, 31 Jul 2019 14:52:33 +0000 (10:52 -0400)
diff --git a/Thoughts b/Thoughts
new file mode 100644
index 0000000..b11f691
--- /dev/null
+++ b/Thoughts
@@ -0,0 +1,111 @@
+The Maple Tree squeezes various bits in at various points which aren't
+necessarily obvious.  Usually, this is done by observing that pointers are
+N-byte aligned and thus the bottom log_2(N) bits are available for use.
+We don't use the high bits of pointers to store additional information
+because we don't know what bits are unused on any given architecture.
+
+Nodes are 128 bytes in size and are also aligned to 128 bytes, giving us
+7 bits for our own purposes in each entry in an internal node.  We need
+to store the type of the node pointed to (enum maple_type, four bits),
+and whether there are any unallocated slots anywhere below this node
+(for implementing xa_alloc).  That leaves two bits unused for now.
+
+The tree->ma_root slot pointer uses those two bits (if we end up needing
+extra bits, we'll choose some functionality to drop from the root).
+If the bottom two bits of the root entry have value '10', then the pointer
+is a pointer to a node.  Otherwise, the tree contains only a single entry
+at index 0.  If we attempt to store a single entry at index 0 which has
+this pattern in its bottom two bits, we allocate a node and store the
+entry in the node.
+
+state->alloc does not use the low 7 bits for storing the type of the
+node pointed at (since it has no type yet).  Instead it stores the
+number of nodes which still need to be allocated.  The first node allocated
+is pointed to by state->alloc.  Subsequent nodes allocated are pointed
+to by state->alloc->slots[n].
+
+state->node may also contain things which are not nodes.  In this
+case, bit 0 set indicates a tree location which is not a slot, bit
+1 set indicates an error.  Bit 2 (the 'unallocated slots' bit) is
+never set.  Bits 3-6 indicate the node type.  This means that (for the
+optimised iteration loop), the inline function can simply test (unsigned
+long)state->node & 127 and bail to the out-of-line functions if it's
+not 0.  'maple_dense' must remain as enum 0 for this to work.
+
+Leaf nodes do not store pointers to nodes, they store user data.
+Users may store almost any bit pattern.  As noted above, the optimisation
+of storing an entry at 0 in the root pointer cannot be done for data
+which have the bottom two bits set to '10'.  We also reserve values
+with the bottom two bits set to '10' which are below 4096 (i.e. 2, 6,
+10 .. 4094) for internal use.  Some APIs return errnos as a negative
+errno shifted left by two bits and the bottom two bits set to '10',
+and while choosing to store these values in the array is not an error,
+it may lead to confusion if you're testing for an error with xa_is_err().
+
+To summarise: Users may store any valid kernel pointer and any value
+between 0 and LONG_MAX (encoded as an xa_value).  Other values may or
+may not work well.
+
+-----
+
+Inserting multiples of 5:
+
+Insert p0 at 0: Tree contains p0 at root.
+
+Insert p1 at 5: We allocate n0:
+n0: (p0, 0, NULL, 4, p1, 5, NULL, 0)
+
+Insert p2 at 10: We append to n0:
+n0: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10, NULL, 0)
+
+Insert p3 at 15: We append to n0:
+n0: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10, NULL, 14, p3, 15, NULL)
+
+Insert p4 at 20: We allocate a replacement n0 as well as n1 and n2:
+n0: (n1, 10, n2, 0xff..ff)
+n1: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10)
+n2: (NULL, 14, p3, 15, NULL, 19, p4, 20, NULL, 0)
+
+Insert p5 at 25: We append to n2:
+n2: (NULL, 14, p3, 15, NULL, 19, p4, 20, NULL, 24, p5, 25, NULL, 0)
+
+Insert p6 at 30: We allocate a replacement n2 and n3 and add to n0:
+n0: (n1, 10, n2, 20, n3, 0xff..ff)
+n1: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10)
+n2: (NULL, 14, p3, 15, NULL, 19, p4, 20)
+n3: (NULL, 24, p5, 25, NULL, 29, p6, 30, NULL, 0)
+
+Insert p7 at 35: We add to n3:
+n3: (NULL, 24, p5, 25, NULL, 29, p6, 30, NULL, 34, p7, 35, NULL, 0)
+
+Insert p8 at 3: We allocate a replacement n1:
+n1: (p0, 0, NULL, 2, p8, 3, NULL, 4, p1, 5, NULL, 9, p2, 10)
+
+Insert p9 at 4: It already has a slot open for it:
+n1: (p0, 0, NULL, 2, p8, 3, p9, 4, p1, 5, NULL, 9, p2, 10)
+
+Insert p10 at 1: We allocate a replacement n1:
+n1: (p0, 0, p10, 1, NULL, 2, p8, 3, p9, 4, p1, 5, NULL, 9, p2)
+
+Insert p11 at 6: We allocate a replacement n1, a new n4 and a replacement n0:
+n0: (n1, 6, n4, 10, n2, 20, n3, 0xff..ff)
+n1: (p0, 0, p10, 1, NULL, 2, p8, 3, p9, 4, p1, 5, p11, 6, NULL)
+n4: (NULL, 9, p2, 10)
+(yes, n4 violates the minimum occupancy requirement of a B-tree, but that's
+no worse than violating the minimum span requirement of a Maple Tree in
+terms of number of nodes allocated, height of tree or number of cachelines
+accessed.  We might choose to merge n4 with n2 in this specific instance).
+
+We considered this alternative:
+
+# Insert p6 at 30: We append to n2 again and change n0:
+# n0: (n1, 10, n2, 30, NULL, 0)
+# n2: (NULL, 14, p3, 15, NULL, 19, p4, 20, NULL, 24, p5, 25, NULL, 29, p6)
+
+# Insert p7 at 35: We allocate a replacement n2 and n3 and change n0 right-to-left:
+# n0: (n1, 10, n2, 25, n3, 0xff..ff)
+# n1: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10)
+# n2: (NULL, 14, p3, 15, NULL, 19, p4, 20, NULL, 24, p5, 25)
+# n3: (NULL, 29, p6, 30, NULL, 34, p7, 35, NULL, 0)
+
+but decided against it because it makes range32s harder.
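
The low-bit tagging described in the notes can be sketched in plain C.
This is an illustration only, not the code from this tree: every sketch_*
name is hypothetical, and the exact bit positions (node marker '10' in
bits 0-1, 'unallocated slots' flag in bit 2, type in bits 3-6) are an
assumption pieced together from the paragraphs on internal-node entries,
the root pointer and state->node.

```c
#include <assert.h>
#include <stdint.h>

#define NODE_SIZE	128UL		/* nodes are 128 bytes and 128-byte aligned */
#define NODE_MASK	(NODE_SIZE - 1)	/* the 7 low bits that alignment frees up */
#define ROOT_NODE_TAG	0x2UL		/* bottom two bits '10': entry is a node pointer */
#define FREE_SLOTS_BIT	(1UL << 2)	/* assumed position of the 'unallocated slots' bit */
#define TYPE_SHIFT	3		/* assumed: four-bit node type in bits 3-6 */
#define TYPE_MASK	(0xfUL << TYPE_SHIFT)

/* Hypothetical node layout; only its size and alignment matter here. */
struct sketch_node {
	_Alignas(128) void *slot[NODE_SIZE / sizeof(void *)];
};

/* Pack a node pointer, its type and the free-slot flag into one word. */
static inline void *sketch_mk_node(struct sketch_node *node,
				   unsigned int type, int has_free)
{
	uintptr_t v = (uintptr_t)node;

	assert((v & NODE_MASK) == 0);	/* alignment keeps the low bits clear */
	assert(type < 16);		/* the type must fit in four bits */
	v |= ROOT_NODE_TAG | ((uintptr_t)type << TYPE_SHIFT);
	if (has_free)
		v |= FREE_SLOTS_BIT;
	return (void *)v;
}

/* Strip all seven tag bits to recover the real node address. */
static inline struct sketch_node *sketch_to_node(const void *entry)
{
	return (struct sketch_node *)((uintptr_t)entry & ~(uintptr_t)NODE_MASK);
}

static inline unsigned int sketch_node_type(const void *entry)
{
	return (unsigned int)(((uintptr_t)entry & TYPE_MASK) >> TYPE_SHIFT);
}

static inline int sketch_has_free(const void *entry)
{
	return ((uintptr_t)entry & FREE_SLOTS_BIT) != 0;
}

/* Root test from the notes: '10' in the bottom two bits means "node". */
static inline int sketch_root_is_node(const void *root)
{
	return ((uintptr_t)root & 3) == ROOT_NODE_TAG;
}

/* Round-trip check; returns 0 when every field survives the packing. */
static inline int sketch_demo(void)
{
	static struct sketch_node node;
	void *entry = sketch_mk_node(&node, 9, 1);

	if (sketch_to_node(entry) != &node)
		return 1;
	if (sketch_node_type(entry) != 9 || !sketch_has_free(entry))
		return 2;
	if (!sketch_root_is_node(entry))
		return 3;
	/* An untagged, 4-byte-aligned pointer ends in '00': a single entry. */
	if (sketch_root_is_node(&node))
		return 4;
	return 0;
}
```

Because the tag occupies only bits that alignment guarantees to be zero,
decoding is a single mask, which is what lets the inline fast path test
the low seven bits and fall back to out-of-line code when any are set.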