From 5d534f331de2a218f85b8bca2b3aac9b8e011421 Mon Sep 17 00:00:00 2001 From: Matthew Wilcox Date: Wed, 2 Jan 2019 16:18:41 -0500 Subject: [PATCH] More Thoughts Mostly relating to auxiliary data Signed-off-by: Matthew Wilcox --- Thoughts | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 54 insertions(+), 5 deletions(-) diff --git a/Thoughts b/Thoughts index 588b331ae5bf..e9fa08c2b427 100644 --- a/Thoughts +++ b/Thoughts @@ -14,9 +14,16 @@ Tree Root If the tree contains a single entry at index 0, it is usually stored in tree->ma_root. To optimise for the page cache, an entry which ends in -'00', '01' or '11' is stored in the root, but an entry which ends in -'10' will be stored in a node. Bit 2 is set if there are any NULL -entries in the tree. Bits 3-6 are used to store enum maple_type. +'00', '01' or '11' is stored in the root, but an entry which ends in '10' +will be stored in a node. Bit 2 is set if there are any NULL entries +in the tree. Bits 3-6 are used to store enum maple_type. + +The flags are used both to store some immutable inforation about this +tree (set at tree creation time) and dynamic information set under +the spinlock. Needed flags: irq/bh safe locking, whether to support +'reserved', whether to track max-range-free, whether to reap nodes which +only contain value entries, whether slot 0 is busy, how many mark bits +to support (5 bits), 18 mark bits (total 29 bits). Node Slots ---------- @@ -47,8 +54,39 @@ Non-root nodes can only have maple_range_* nodes as parents, so we only need to distinguish between range_16, range_32 and range_64. With 32-bit pointers, we can store up to 11 pointers in a range_64, 16 in a range_32 and 21 in a range_16, so encode a range_16 as 00 (Bits 2-6 encode the -the slot number), encode range_32 as 010 (Bits 3-6 encode the slot -number) and range_64 as 110 (Bits 3-6 encode the slot number). +slot number), encode range_32 as 010 (Bits 3-6 encode the slot number) +and range_64 as 110 (Bits 3-6 encode the slot number). + +Auxiliary Data +-------------- + +At tree creation time, the user can specify that they're willing to +trade off storing fewer entries in a tree in return for storing more +information in each node. + +The maple tree supports marking entries and searching for entries which +are marked. In order to do this efficiently, it has to store one bit per +mark per slot in each node. In the absence of auxiliary data, there are +no bits available in the range64 node, so we have to sacrifice a slot, and +so we may as well sacrifice a pivot. That gives us 128 bits for 7 slots, +which lets us support up to 18 mark bits. For range32 nodes, we normally +have 10 slots and 4 bytes free. That lets us support 3 mark bits without +giving up any slots, and once we've given up one slot plus one pivot, we +can support up to 14 mark bits. For range16 nodes, we normally have 12 +slots and 2 bytes free which lets us store 1 mark bit without giving up +any slots. Giving up one slot frees 10 bytes, which lets us store 8 mark +bits, and giving up two slots lets us store 17 mark bits. Dense nodes +can store 4 bits by giving up one slot, 9 bits by giving up two slots, +16 bits by giving up three slots and 23 bits by giving up four slots. + +Another use-case for auxiliary data is to record the largest range of +NULL entries available in this node. This optimises the tree for +allocating a range. + +We do not currently know of any users who want to use both marks and +max-free-range auxiliary data; for ease of implementation, only one of the +two is supported, but should a user show up, it would be straightforward +to add support in the future. Maple State ----------- @@ -133,3 +171,14 @@ We considered this alternative: # n3: (NULL, 29, p6, 30, NULL, 34, p7, 35, NULL, 0) but decided against it because it makes range32s harder. + + +Deep Thoughts +============= + +There are data structures in the kernel which provide support for +overlapping ranges. The XArray / Maple Tree have no support for this +idiom, but maybe this data structure could be enhanced to support those. +If we want to be rid of the rbtree, we'll need to do something like this. +(Examples: DRBD, x86 PAT. file locks might also benefit from being +converted from a linked list) -- 2.50.1