From: Matthew Wilcox
Date: Fri, 21 Dec 2018 20:30:54 +0000 (-0500)
Subject: Add some notes
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=39dbc1c28c6e7e8fb5cd733826cc7babb649b828;p=users%2Fjedix%2Flinux-maple.git

Add some notes

Not for upstream, obviously
---

diff --git a/Thoughts b/Thoughts
new file mode 100644
index 000000000000..b11f691a8f7d
--- /dev/null
+++ b/Thoughts
@@ -0,0 +1,111 @@
+The Maple Tree squeezes various bits in at various points which aren't
+necessarily obvious. Usually, this is done by observing that pointers are
+N-byte aligned and thus the bottom log_2(N) bits are available for use.
+We don't use the high bits of pointers to store additional information
+because we don't know what bits are unused on any given architecture.
+
+Nodes are 128 bytes in size and are also aligned to 128 bytes, giving us
+7 bits for our own purposes in each entry in an internal node. We need
+to store the type of the node pointed to (enum maple_type, four bits),
+and whether there are any unallocated slots anywhere below this node
+(for implementing xa_alloc). That leaves two bits unused for now.
+
+The tree->ma_root slot pointer uses those two bits (if we end up needing
+extra bits, we'll choose some functionality to drop from the root).
+If the bottom two bits of the root entry have value '10', then the pointer
+is a pointer to a node. Otherwise, the tree contains only a single entry
+at index 0. If we attempt to store a single entry at index 0 which has
+this pattern in its bottom two bits, we allocate a node and store the
+entry in the node.
+
+state->alloc does not use the low 7 bits for storing the type of the
+node pointed at (since it has no type yet). Instead it stores the
+number of nodes which still need to be allocated. The first node allocated
+is pointed to by state->alloc. Subsequent nodes allocated are pointed
+to by state->alloc->slots[n].
+
+state->node may also contain things which are not nodes.
+In this case, bit 0 set indicates a tree location which is not a slot,
+bit 1 set indicates an error. Bit 2 (the 'unallocated slots' bit) is
+never set. Bits 3-6 indicate the node type. This means that (for the
+optimised iteration loop), the inline function can simply test
+(unsigned long)state->node & 127 and bail to the out-of-line functions
+if it's not 0. 'maple_dense' must remain as enum 0 for this to work.
+
+Leaf nodes do not store pointers to nodes; they store user data.
+Users may store almost any bit pattern. As noted above, the optimisation
+of storing an entry at 0 in the root pointer cannot be done for data
+which have the bottom two bits set to '10'. We also reserve values
+with the bottom two bits set to '10' which are below 4096 (ie 2, 6,
+10 .. 4094) for internal use. Some APIs return errnos as a negative
+errno shifted left by two bits with the bottom two bits set to '10',
+and while choosing to store these values in the array is not an error,
+it may lead to confusion if you're testing for an error with xa_is_err().
+
+To summarise: Users may store any valid kernel pointer and any value
+between 0 and LONG_MAX (encoded as an xa_value). Other values may or
+may not work well.
+
+-----
+
+Inserting multiples of 5:
+
+Insert p0 at 0: Tree contains p0 at root.
+
+Insert p1 at 5: We allocate n0:
+n0: (p0, 0, NULL, 4, p1, 5, NULL, 0)
+
+Insert p2 at 10: We append to n0:
+n0: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10, NULL, 0)
+
+Insert p3 at 15: We append to n0:
+n0: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10, NULL, 14, p3, 15, NULL)
+
+Insert p4 at 20: We allocate a replacement n0 as well as n1 and n2:
+n0: (n1, 10, n2, 0xff..ff)
+n1: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10)
+n2: (NULL, 14, p3, 15, NULL, 19, p4, 20, NULL, 0)
+
+Insert p5 at 25: We append to n2:
+n2: (NULL, 14, p3, 15, NULL, 19, p4, 20, NULL, 24, p5, 25, NULL, 0)
+
+Insert p6 at 30: We allocate a replacement n2 and n3 and add to n0:
+n0: (n1, 10, n2, 20, n3, 0xff..ff)
+n1: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10)
+n2: (NULL, 14, p3, 15, NULL, 19, p4, 20)
+n3: (NULL, 24, p5, 25, NULL, 29, p6, 30, NULL, 0)
+
+Insert p7 at 35: We add to n3:
+n3: (NULL, 24, p5, 25, NULL, 29, p6, 30, NULL, 34, p7, 35, NULL, 0)
+
+Insert p8 at 3. We allocate a replacement n1:
+n1: (p0, 0, NULL, 2, p8, 3, NULL, 4, p1, 5, NULL, 9, p2, 10)
+
+Insert p9 at 4. It already has a slot open for it:
+n1: (p0, 0, NULL, 2, p8, 3, p9, 4, p1, 5, NULL, 9, p2, 10)
+
+Insert p10 at 1. We allocate a replacement n1:
+n1: (p0, 0, p10, 1, NULL, 2, p8, 3, p9, 4, p1, 5, NULL, 9, p2)
+
+Insert p11 at 6. We allocate a replacement n1, a new n4 and a replacement n0:
+n0: (n1, 6, n4, 10, n2, 20, n3, 0xff..ff)
+n1: (p0, 0, p10, 1, NULL, 2, p8, 3, p9, 4, p1, 5, p11, 6, NULL)
+n4: (NULL, 9, p2, 10)
+(yes, n4 violates the minimum occupancy requirement of a B-tree, but that's
+no worse than violating the minimum span requirement of a Maple Tree in
+terms of number of nodes allocated, height of tree or number of cachelines
+accessed. We might choose to merge n4 with n2 in this specific instance).
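A note on reading the node listings above: each node alternates entry and
pivot, and a pivot appears to be the last index covered by the entry before
it. The C sketch below looks up an index in one node under that reading.
The names sketch_node, sketch_lookup, sketch_demo and SKETCH_SLOTS are
hypothetical, the capacity of 8 is arbitrary, and treating a pivot of 0 after
the first slot as "end of data" is an assumption drawn from the trailing
"NULL, 0" pairs, not from the tree's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SLOTS 8  /* arbitrary capacity for this sketch */

struct sketch_node {
    void *slot[SKETCH_SLOTS];           /* entries; NULL marks a gap */
    unsigned long pivot[SKETCH_SLOTS];  /* pivot[i]: last index covered by slot[i] */
};

/* Return the entry covering 'index', or NULL if the index falls in a gap. */
static void *sketch_lookup(const struct sketch_node *n, unsigned long index)
{
    for (int i = 0; i < SKETCH_SLOTS; i++) {
        /* Assumed convention: a 0 pivot past slot 0 ends the node,
         * so slot[i] covers everything that remains. */
        if (i > 0 && n->pivot[i] == 0)
            return n->slot[i];
        if (index <= n->pivot[i])
            return n->slot[i];
    }
    return NULL;  /* index lies beyond the last pivot */
}

/* n0 after "Insert p2 at 10":
 * (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10, NULL, 0) */
static void sketch_demo(void)
{
    static int p0, p1, p2;  /* stand-ins for user pointers */
    struct sketch_node n0 = {
        .slot  = { &p0, NULL, &p1, NULL, &p2, NULL },
        .pivot = {   0,    4,   5,    9,  10,    0 },
    };

    assert(sketch_lookup(&n0, 0) == &p0);
    assert(sketch_lookup(&n0, 3) == NULL);   /* gap 1..4 */
    assert(sketch_lookup(&n0, 5) == &p1);
    assert(sketch_lookup(&n0, 7) == NULL);   /* gap 6..9 */
    assert(sketch_lookup(&n0, 10) == &p2);
    assert(sketch_lookup(&n0, 11) == NULL);  /* past the last entry */
}
```

Calling sketch_demo() from a main() checks the listing for n0 entry by entry.
A linear scan is used only for clarity; nothing here is meant to suggest how
the real tree searches a node.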
+
+We considered this alternative:
+
+# Insert p6 at 30: We append to n2 again and change n0:
+# n0: (n1, 10, n2, 30, NULL, 0)
+# n2: (NULL, 14, p3, 15, NULL, 19, p4, 20, NULL, 24, p5, 25, NULL, 29, p6)
+
+# Insert p7 at 35: We allocate a replacement n2 and n3 and change n0 right-to-left
+# n0: (n1, 10, n2, 25, n3, 0xff..ff)
+# n1: (p0, 0, NULL, 4, p1, 5, NULL, 9, p2, 10)
+# n2: (NULL, 14, p3, 15, NULL, 19, p4, 20, NULL, 24, p5, 25)
+# n3: (NULL, 29, p6, 30, NULL, 34, p7, 35, NULL, 0)
+
+but decided against it because it makes range32s harder.
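
To make the bit-packing from the opening paragraphs concrete, here is a small
C sketch. All sk_* names are hypothetical. The layout used (bits 0-1 for the
'10' tag, bit 2 for the 'unallocated slots' flag, bits 3-6 for the four-bit
type) is inferred from the state->node description rather than taken from the
implementation, and the error helpers only mirror the "negative errno with
'10' in the bottom two bits" encoding the notes describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_NODE_SIZE    128UL                    /* node size and alignment, per the notes */
#define SK_LOW_MASK     (SK_NODE_SIZE - 1)       /* 127: the seven spare low bits */
#define SK_TAG_MASK     3UL                      /* bottom two bits */
#define SK_TAG_INTERNAL 2UL                      /* the '10' pattern */
#define SK_UNALLOC      (1UL << 2)               /* 'unallocated slots' flag (assumed bit 2) */
#define SK_TYPE_SHIFT   3                        /* type assumed to sit in bits 3-6 */
#define SK_TYPE_MASK    (0xfUL << SK_TYPE_SHIFT)

/* Pack an aligned node address, a four-bit type and the flag into one word. */
static uintptr_t sk_mk_entry(void *node, unsigned int type, bool unalloc)
{
    uintptr_t p = (uintptr_t)node;

    assert((p & SK_LOW_MASK) == 0);  /* 128-byte alignment frees the low 7 bits */
    assert(type < 16);
    return p | ((uintptr_t)type << SK_TYPE_SHIFT) | (unalloc ? SK_UNALLOC : 0);
}

static void *sk_entry_node(uintptr_t e)
{
    return (void *)(e & ~(uintptr_t)SK_LOW_MASK);
}

static unsigned int sk_entry_type(uintptr_t e)
{
    return (unsigned int)((e & SK_TYPE_MASK) >> SK_TYPE_SHIFT);
}

static bool sk_entry_unalloc(uintptr_t e)
{
    return (e & SK_UNALLOC) != 0;
}

/* Root slot: '10' in the bottom two bits means "this points to a node". */
static uintptr_t sk_mk_root(void *node)
{
    return (uintptr_t)node | SK_TAG_INTERNAL;
}

static bool sk_root_is_node(uintptr_t root)
{
    return (root & SK_TAG_MASK) == SK_TAG_INTERNAL;
}

/* Negative errno moved up two bits, tagged '10'. The shift is done on the
 * unsigned value because left-shifting a negative int is undefined in C. */
static uintptr_t sk_mk_err(int err)
{
    return ((uintptr_t)(intptr_t)err << 2) | SK_TAG_INTERNAL;
}

/* Undo sk_mk_err(): strip the tag, then divide instead of right-shifting
 * a negative value. */
static int sk_err(uintptr_t e)
{
    return (int)((intptr_t)(e - SK_TAG_INTERNAL) / 4);
}

static void sk_demo(void)
{
    static _Alignas(128) unsigned char node[128];  /* stand-in for a tree node */
    uintptr_t e = sk_mk_entry(node, 5, true);

    assert(sk_entry_node(e) == (void *)node);
    assert(sk_entry_type(e) == 5);
    assert(sk_entry_unalloc(e));
    /* Type 0 with no flags leaves all seven low bits clear: the fast-path test. */
    assert((sk_mk_entry(node, 0, false) & SK_LOW_MASK) == 0);
    assert(sk_root_is_node(sk_mk_root(node)));
    assert(!sk_root_is_node((uintptr_t)node));     /* '00': a plain entry */
    assert(sk_mk_err(-12) == (uintptr_t)-46);      /* (-12 << 2) | 2 */
    assert(sk_err(sk_mk_err(-12)) == -12);
}
```

Calling sk_demo() from a main() exercises every helper. The point of the
type-0 assertion is the one the notes make about 'maple_dense': if the dense
type were not enum 0, a perfectly ordinary node would fail the "& 127" test.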