From: Liu Bo
Date: Thu, 21 Dec 2017 05:05:19 +0000 (-0700)
Subject: Btrfs: fix unexpected EEXIST from btrfs_get_extent
X-Git-Tag: v4.1.12-124.31.3~1208
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=2017b656451c5ec3579f58361b8e23f391d9bbf9;p=users%2Fjedix%2Flinux-maple.git

Btrfs: fix unexpected EEXIST from btrfs_get_extent

Orabug: 27446668

This fixes a corner case caused by a race between a dio write and a
concurrent dio read/write.

Here is how the race can happen.

Suppose that no extent map has been loaded into memory yet. There is a
file extent [0, 32K), and two jobs are running concurrently against it:
t1 is doing a dio write to [8K, 32K) and t2 is doing a dio read from
[0, 4K) or [4K, 8K).

t1 goes ahead of t2 and splits em [0, 32K) into em [0, 8K) and
[8K, 32K).

------------------------------------------------------
         t1                                t2
 btrfs_get_blocks_direct()          btrfs_get_blocks_direct()
  -> btrfs_get_extent()              -> btrfs_get_extent()
      -> lookup_extent_mapping()
      -> add_extent_mapping()            -> lookup_extent_mapping()
         # load [0, 32K)
  -> btrfs_new_extent_direct()
      -> btrfs_drop_extent_cache()
         # split [0, 32K)
      -> add_extent_mapping()
         # add [8K, 32K)
                                         -> add_extent_mapping()
                                            # handle -EEXIST when adding
                                            # [0, 32K)
------------------------------------------------------

More details about how t2 (dio read/write) runs into -EEXIST:

When add_extent_mapping() returns -EEXIST for em [0, 32K),
search_extent_mapping() returns [0, 8K) as the existing em. Although
start == existing->start, em is [0, 32K) and
extent_map_end(em) > extent_map_end(existing), i.e. 32K > 8K, so the
code goes through merge_extent_mapping(), which tries to add [8K, 8K)
(an em of length 0). btrfs_get_extent() ends up returning -EEXIST, so
the dio read/write fails with -EEXIST, which confuses applications.
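The zero-length result can be reproduced with plain arithmetic. The following user-space sketch is not the kernel code: `struct range`, `trim_to_gap()` and `race_example()` are hypothetical names, and it assumes that the merge step trims the candidate em to the gap between the end of the previous neighbour and the start of the next one, which is consistent with the [8K, 8K) range described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent map: the byte range [start, start + len). */
struct range {
	uint64_t start;
	uint64_t len;
};

static uint64_t range_end(const struct range *r)
{
	return r->start + r->len;
}

/*
 * Assumed model of the merge step: shrink the candidate em so that it
 * fits between prev and next. Either neighbour may be NULL when there
 * is nothing on that side.
 */
struct range trim_to_gap(struct range em, const struct range *prev,
			 const struct range *next)
{
	uint64_t start = prev ? range_end(prev) : em.start;
	uint64_t end = next ? next->start : range_end(&em);

	/* Never grow the candidate beyond its original bounds. */
	if (start < em.start)
		start = em.start;
	if (end > range_end(&em))
		end = range_end(&em);
	assert(start <= end);	/* the gap must not be negative */

	em.start = start;
	em.len = end - start;
	return em;
}

/* The race from the commit message: t1 has already split [0, 32K). */
struct range race_example(void)
{
	const uint64_t K = 1024;
	struct range em = { 0, 32 * K };	/* t2's candidate */
	struct range existing = { 0, 8 * K };	/* left behind by the split */
	struct range next = { 8 * K, 24 * K };	/* added by t1 */

	return trim_to_gap(em, &existing, &next);
}
```

In this model, `race_example()` returns a range with start 8K and length 0, because the two pieces left by t1 leave no gap for t2's candidate to occupy.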
Here is a summary of all possible situations:

1) start < existing->start

            +-----------+em+-----------+
+--prev---+ |     +-------------+      |
|         | |     |             |      |
+---------+ +     +---+existing++      +
               +
               |
               +
             start

2) start == existing->start

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
            |
            |
            +
          start

3) start > existing->start && start < (existing->start + existing->len)

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
               |
               |
               +
             start

4) start >= (existing->start + existing->len)

+-----------+em+-----------+
|     +-------------+      | +--next---+
|     |             |      | |         |
+     +---+existing++      + +---------+
                      +
                      |
                      +
                    start

Going through the cases above one by one shows that if start lies
within the existing em (front inclusive), the existing em should be
returned; otherwise, we try our best to merge the candidate em with its
sibling ems to form a larger em.

Reported-by: David Vallender
Signed-off-by: Liu Bo
Reviewed-by: Anand Jain
---

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 27831a4aa8cb..ccf83d1f5c79 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6945,19 +6945,12 @@ insert:
 		 * existing will always be non-NULL, since there must be
 		 * extent causing the -EEXIST.
 		 */
-		if (existing->start == em->start &&
-		    extent_map_end(existing) >= extent_map_end(em) &&
-		    em->block_start == existing->block_start) {
-			/*
-			 * The existing extent map already encompasses the
-			 * entire extent map we tried to add.
-			 */
+		if (start >= existing->start &&
+		    start < extent_map_end(existing)) {
 			free_extent_map(em);
 			em = existing;
 			err = 0;
-
-		} else if (start >= extent_map_end(existing) ||
-		    start <= existing->start) {
+		} else {
 			/*
 			 * The existing extent map is the one nearest to
 			 * the [start, start + len) range which overlaps
@@ -6972,10 +6965,6 @@ insert:
 				em = NULL;
 			}
 			free_extent_map(existing);
-		} else {
-			free_extent_map(em);
-			em = existing;
-			err = 0;
 		}
 	}
 	write_unlock(&em_tree->lock);
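
The new condition maps the four cases from the commit message onto two outcomes, and this can be checked outside the kernel. The following is a minimal sketch, not the btrfs code: `struct range`, `enum action`, `start_within()` and `on_eexist()` are hypothetical names, and only the comparison itself is taken from the new `if` condition in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent map: the byte range [start, start + len). */
struct range {
	uint64_t start;
	uint64_t len;
};

static uint64_t range_end(const struct range *r)
{
	return r->start + r->len;
}

/* True when start lies in [existing->start, end of existing): cases 2 and 3. */
static bool start_within(const struct range *existing, uint64_t start)
{
	assert(existing->len > 0);	/* an empty em cannot contain anything */
	return start >= existing->start && start < range_end(existing);
}

enum action {
	USE_EXISTING,	/* drop the candidate and return the existing em */
	TRY_MERGE,	/* try to fit the candidate next to its siblings */
};

/* Decision taken after add_extent_mapping() reported -EEXIST. */
enum action on_eexist(const struct range *existing, uint64_t start)
{
	return start_within(existing, start) ? USE_EXISTING : TRY_MERGE;
}
```

With an existing em of [8K, 16K), a start of 4K (case 1) or 16K (case 4) selects `TRY_MERGE`, while 8K (case 2) or 12K (case 3) selects `USE_EXISTING`. In the race described in the commit message, existing is [0, 8K) and start is 0 or 4K, so the existing em is returned and the zero-length merge is never attempted.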