www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
2 years agobuckets.c fixups XXX squash
Kent Overstreet [Mon, 29 Mar 2021 00:56:25 +0000 (20:56 -0400)]
buckets.c fixups XXX squash

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add repair code for out of order keys in a btree node.
Kent Overstreet [Mon, 29 Mar 2021 04:19:05 +0000 (00:19 -0400)]
bcachefs: Add repair code for out of order keys in a btree node.

This just drops the offending key - in the bug report where this was
seen, it was clearly a single bit memory error, and fsck will fix the
missing key.
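
As a rough illustration of the repair described above, the sketch below drops any key that does not sort after the last key kept. It is a simplified model, not the bcachefs code: `struct demo_key` and `demo_drop_out_of_order()` are hypothetical names, and real btree keys carry far more than a single integer position.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a btree key: only a sortable position. */
struct demo_key {
	uint64_t pos;
};

/*
 * Compact keys[] in place, dropping every key that does not sort strictly
 * after the last key that was kept. Returns the number of keys remaining.
 */
static inline size_t demo_drop_out_of_order(struct demo_key *keys, size_t nr)
{
	size_t out = 0;

	for (size_t i = 0; i < nr; i++) {
		/* Out of order relative to the previous kept key: drop it. */
		if (out && keys[i].pos <= keys[out - 1].pos)
			continue;
		keys[out++] = keys[i];
	}
	return out;
}
```

The dropped key is simply lost at this layer; per the message above, recovering it is left to fsck.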

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Free iterator in bch2_btree_delete_range_trans()
Kent Overstreet [Mon, 29 Mar 2021 01:20:22 +0000 (21:20 -0400)]
bcachefs: Free iterator in bch2_btree_delete_range_trans()

This is specifically to speed up bch2_inode_rm(), so that we're not
traversing iterators we're done with.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Have journal reclaim thread flush more aggressively
Kent Overstreet [Mon, 29 Mar 2021 00:57:59 +0000 (20:57 -0400)]
bcachefs: Have journal reclaim thread flush more aggressively

This adds a new watermark for the journal reclaim when flushing btree
key cache entries - it should try and stay ahead of where foreground
threads doing transaction commits will enter direct journal reclaim.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't use bch2_inode_find_by_inum() in move.c
Kent Overstreet [Tue, 16 Mar 2021 22:08:10 +0000 (18:08 -0400)]
bcachefs: Don't use bch2_inode_find_by_inum() in move.c

Since move.c isn't aware of what subvolume we're in, we can't use the
standard inode lookup code - fortunately, we're just using it for
reading IO options.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Change inode allocation code for snapshots
Kent Overstreet [Mon, 15 Mar 2021 23:18:30 +0000 (19:18 -0400)]
bcachefs: Change inode allocation code for snapshots

For snapshots, when we allocate a new inode we want to allocate an inode
number that isn't in use in any other subvolume. We won't be able to use
ITER_SLOTS for this, so inode allocation needs to change to use
BTREE_ITER_ALL_SNAPSHOTS.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Inode backpointers
Kent Overstreet [Tue, 2 Mar 2021 23:35:30 +0000 (18:35 -0500)]
bcachefs: Inode backpointers

This patch adds two new inode fields, bi_dir and bi_dir_offset, that
point back to the inode's dirent.

Since we're only adding fields for a single backpointer, files that have
been hardlinked won't necessarily have valid backpointers: we also add a
new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has
ever had multiple links to it. That's ok, because we only really need
this functionality for directories, which can never have multiple
hardlinks - when we add subvolumes, we'll need a way to enumerate and
print subvolumes, and this will let us reconstruct a path to a subvolume
root given a subvolume root inode.
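
A minimal sketch of the idea, under stated assumptions: the field names bi_dir and bi_dir_offset come from the message above, but `struct demo_inode`, the flag's bit position, the link counter and both helpers are hypothetical and do not reflect the on-disk format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real flag is BCH_INODE_BACKPTR_UNTRUSTED. */
#define DEMO_INODE_BACKPTR_UNTRUSTED	(1U << 0)

/* Hypothetical in-memory inode with a single backpointer. */
struct demo_inode {
	uint64_t	bi_dir;		/* inode number of the parent directory */
	uint64_t	bi_dir_offset;	/* offset of the dirent within it */
	uint32_t	bi_flags;
	uint32_t	nlink;		/* illustrative link counter */
};

/* Record a new link; only the first link gets a trustworthy backpointer. */
static inline void demo_inode_link(struct demo_inode *inode,
				   uint64_t dir, uint64_t offset)
{
	if (inode->nlink++) {
		/* Second or later link: one backpointer cannot describe them all. */
		inode->bi_flags |= DEMO_INODE_BACKPTR_UNTRUSTED;
		return;
	}
	inode->bi_dir		= dir;
	inode->bi_dir_offset	= offset;
}

static inline bool demo_inode_backptr_trusted(const struct demo_inode *inode)
{
	return inode->nlink &&
	       !(inode->bi_flags & DEMO_INODE_BACKPTR_UNTRUSTED);
}
```

The flag is sticky in this sketch: once set it is never cleared, matching "has ever had multiple links".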

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Start using bpos.snapshot field
Kent Overstreet [Wed, 24 Mar 2021 22:02:16 +0000 (18:02 -0400)]
bcachefs: Start using bpos.snapshot field

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
  and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.
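
The successor behaviour described above might look roughly like the sketch below. `struct demo_bpos` and `demo_bpos_successor()` are hypothetical names; leaving the snapshot field untouched in the non-ALL_SNAPSHOTS case is an assumption of this sketch, and overflow of the maximum position is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical position: inode is most significant, snapshot least. */
struct demo_bpos {
	uint64_t	inode;
	uint64_t	offset;
	uint32_t	snapshot;
};

/*
 * Return the position immediately after p.
 * With all_snapshots set, the snapshot field is the lowest "digit" and
 * carries into offset, then inode. Without it, only offset and inode
 * (the higher bits) are incremented.
 */
static inline struct demo_bpos demo_bpos_successor(struct demo_bpos p,
						   bool all_snapshots)
{
	if (all_snapshots) {
		if (!++p.snapshot && !++p.offset)
			++p.inode;
	} else {
		if (!++p.offset)
			++p.inode;
	}
	return p;
}
```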

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and always U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Split out bpos_cmp() and bkey_cmp()
Kent Overstreet [Thu, 4 Mar 2021 21:20:16 +0000 (16:20 -0500)]
bcachefs: Split out bpos_cmp() and bkey_cmp()

With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.
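
The distinction between the two comparisons can be sketched as follows. The `demo_` names and the struct layout are hypothetical; the only point illustrated is that one comparison includes the snapshot field and the other ignores it.

```c
#include <stdint.h>

/* Hypothetical position type with a snapshot field as the lowest bits. */
struct demo_bpos {
	uint64_t	inode;
	uint64_t	offset;
	uint32_t	snapshot;
};

/* Three-way compare helper: -1, 0 or 1. */
static inline int demo_cmp_u64(uint64_t l, uint64_t r)
{
	return (l > r) - (l < r);
}

/* Upper-level comparison: the snapshot field is ignored. */
static inline int demo_bkey_cmp(struct demo_bpos l, struct demo_bpos r)
{
	int c = demo_cmp_u64(l.inode, r.inode);

	return c ? c : demo_cmp_u64(l.offset, r.offset);
}

/* Core btree comparison: the snapshot field breaks ties. */
static inline int demo_bpos_cmp(struct demo_bpos l, struct demo_bpos r)
{
	int c = demo_bkey_cmp(l, r);

	return c ? c : demo_cmp_u64(l.snapshot, r.snapshot);
}
```

Two keys at the same inode and offset but in different snapshots compare equal under the first function and unequal under the second.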

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add a mechanism for running callbacks at trans commit time
Kent Overstreet [Thu, 4 Feb 2021 02:51:56 +0000 (21:51 -0500)]
bcachefs: Add a mechanism for running callbacks at trans commit time

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: btree key cache locking improvements
Kent Overstreet [Thu, 25 Mar 2021 03:37:33 +0000 (23:37 -0400)]
bcachefs: btree key cache locking improvements

The btree key cache mutex was becoming a significant bottleneck - it was
mainly used to protect the lists of dirty, clean and freed cached keys.

This patch eliminates the dirty and clean lists - instead, when we need
to scan for keys to drop from the cache we iterate over the rhashtable,
and thus we're able to remove most uses of that lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Simplify btree_node_iter_init_pack_failed()
Kent Overstreet [Sun, 28 Mar 2021 01:00:26 +0000 (21:00 -0400)]
bcachefs: Simplify btree_node_iter_init_pack_failed()

Since we now make sure to always generate packed bkey formats that can
pack the min_key of a btree node, this path should actually never
happen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix for bch2_trans_commit() unlocking when it's not supposed to
Kent Overstreet [Sun, 28 Mar 2021 00:58:57 +0000 (20:58 -0400)]
bcachefs: Fix for bch2_trans_commit() unlocking when it's not supposed to

When we pass BTREE_INSERT_NOUNLOCK bch2_trans_commit isn't supposed to
unlock after a successful commit, but it was calling
bch2_trans_cond_resched() - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix packed bkey format calculation for new btree roots
Kent Overstreet [Sat, 27 Mar 2021 00:29:04 +0000 (20:29 -0400)]
bcachefs: Fix packed bkey format calculation for new btree roots

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix building of aux search trees
Kent Overstreet [Sat, 27 Mar 2021 00:10:59 +0000 (20:10 -0400)]
bcachefs: Fix building of aux search trees

We weren't packing the min/max keys, which was a major oversight and
completely disabled generating bkey_floats for adjacent nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Generate better bkey formats when splitting nodes
Kent Overstreet [Sat, 27 Mar 2021 00:08:56 +0000 (20:08 -0400)]
bcachefs: Generate better bkey formats when splitting nodes

On btree node split, we weren't ensuring the min_key of the new larger
node packs in the new format for this node. This triggers some painful
slowpaths in the bset.c aux search tree code - this patch fixes that by
calculating a new format for the new node with the new min_key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Drop bkey noops
Kent Overstreet [Thu, 25 Mar 2021 00:22:51 +0000 (20:22 -0400)]
bcachefs: Drop bkey noops

Bkey noops were introduced to deal with trimming inline data extents in
place in the btree: if the u64s field of a bkey was 0, that u64 was a
noop and we'd start looking for the next bkey immediately after it.

But extent handling has been lifted above the btree - we no longer
modify existing extents in place in the btree, and the compatibility code
for old style extent btree nodes is gone, so we can completely drop this
code.
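
For readers unfamiliar with the convention being removed, here is a simplified sketch of a walk that treats a zero u64s field as a one-word noop. The buffer layout (length in the low byte of a key's first u64) and the name `demo_count_keys()` are assumptions made for illustration, not the real bkey format.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the keys in buf[0..nr_u64s). By assumption, the low byte of a
 * key's first u64 holds its total length in u64s; a length of 0 marks a
 * single-u64 noop that is skipped.
 */
static inline size_t demo_count_keys(const uint64_t *buf, size_t nr_u64s)
{
	size_t i = 0, keys = 0;

	while (i < nr_u64s) {
		uint8_t u64s = buf[i] & 0xff;

		if (!u64s) {
			i++;		/* noop: next key starts right after it */
			continue;
		}
		if (u64s > nr_u64s - i)
			break;		/* truncated key: stop walking */
		keys++;
		i += u64s;
	}
	return keys;
}
```

Once nothing trims extents in place, no noops are produced and the walk no longer needs the zero-length branch.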

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Increase default journal size
Kent Overstreet [Thu, 25 Mar 2021 02:49:05 +0000 (22:49 -0400)]
bcachefs: Increase default journal size

The default was 1/256th of the device and capped at 512MB, which is
fairly tiny these days.
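
The old sizing rule stated above works out as in this sketch. The function name is hypothetical, and the new default is not given in the message, so only the old rule is shown.

```c
#include <stdint.h>

/* Old default as described: 1/256th of the device, capped at 512MB. */
static inline uint64_t demo_old_default_journal_bytes(uint64_t dev_bytes)
{
	const uint64_t cap = 512ULL << 20;
	uint64_t size = dev_bytes / 256;

	return size < cap ? size : cap;
}
```

Under this rule a 64GiB device gets a 256MiB journal, and any device of 128GiB or more hits the 512MiB cap.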

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Use pcpu mode of six locks for interior nodes
Kent Overstreet [Wed, 24 Mar 2021 03:52:27 +0000 (23:52 -0400)]
bcachefs: Use pcpu mode of six locks for interior nodes

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Split btree_iter_traverse and bch2_btree_iter_traverse()
Kent Overstreet [Wed, 24 Mar 2021 01:22:50 +0000 (21:22 -0400)]
bcachefs: Split btree_iter_traverse and bch2_btree_iter_traverse()

External (to the btree iterator code) users of bch2_btree_iter_traverse
expect that on success the iterator will be pointed at iter->pos and
have that position locked - but since we split iter->pos and
iter->real_pos, that means it has to update iter->real_pos if necessary.

Internal users don't expect it to modify iter->real_pos, so we need two
separate functions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Improve inode deletion code
Kent Overstreet [Mon, 22 Mar 2021 02:01:12 +0000 (22:01 -0400)]
bcachefs: Improve inode deletion code

It had some silly redundancies.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add an .invalid method for bch2_btree_ptr_v2
Kent Overstreet [Mon, 22 Mar 2021 21:23:30 +0000 (17:23 -0400)]
bcachefs: Add an .invalid method for bch2_btree_ptr_v2

It was using the method for btree_ptr_v1, but that wasn't checking all
the fields.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Include snapshot field in bch2_bpos_to_text
Kent Overstreet [Mon, 22 Mar 2021 19:50:02 +0000 (15:50 -0400)]
bcachefs: Include snapshot field in bch2_bpos_to_text

More prep work for snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Update iter->real_pos lazily
Kent Overstreet [Mon, 22 Mar 2021 01:16:52 +0000 (21:16 -0400)]
bcachefs: Update iter->real_pos lazily

peek() has to update iter->real_pos - there's no need for
bch2_btree_iter_set_pos() to update it as well.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Consolidate bch2_btree_iter_peek() and peek_with_updates()
Kent Overstreet [Sun, 21 Mar 2021 23:43:31 +0000 (19:43 -0400)]
bcachefs: Consolidate bch2_btree_iter_peek() and peek_with_updates()

Ideally we'll be getting rid of peek_with_updates(), but the callers
will need to be checked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Improve iter->real_pos handling
Kent Overstreet [Sun, 21 Mar 2021 23:32:01 +0000 (19:32 -0400)]
bcachefs: Improve iter->real_pos handling

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Internal btree iterator renaming
Kent Overstreet [Sun, 21 Mar 2021 23:22:58 +0000 (19:22 -0400)]
bcachefs: Internal btree iterator renaming

This just gives some internal helpers some better names.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill btree_iter_peek_uptodate()
Kent Overstreet [Sun, 21 Mar 2021 21:01:34 +0000 (17:01 -0400)]
bcachefs: Kill btree_iter_peek_uptodate()

Since we're no longer doing next() immediately followed by peek(), this
optimization isn't doing anything anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Iterators are now always consistent with iter->real_pos
Kent Overstreet [Sun, 21 Mar 2021 21:09:55 +0000 (17:09 -0400)]
bcachefs: Iterators are now always consistent with iter->real_pos

This means bch2_btree_iter_traverse_one() can be made more efficient.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Have btree_iter_next_node() use btree_iter_set_search_pos()
Kent Overstreet [Sun, 21 Mar 2021 22:09:02 +0000 (18:09 -0400)]
bcachefs: Have btree_iter_next_node() use btree_iter_set_search_pos()

btree node iterators need to obey the regular btree node invariants
w.r.t. iter->real_pos; once they do, bch2_btree_iter_traverse will have
less that it needs to check.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance
Kent Overstreet [Sun, 21 Mar 2021 20:55:25 +0000 (16:55 -0400)]
bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance

The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().
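
The relationship between next, advance and peek can be modelled with a toy array iterator. Everything below is hypothetical (`struct demo_iter` and its helpers are not the bcachefs API); it only shows why calling advance alone is cheaper when the peeked key is not needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy iterator over a sorted array of integer keys. */
struct demo_iter {
	const int	*keys;
	size_t		nr;
	size_t		pos;
};

/* Move to the next position; returns false if already at the end. */
static inline bool demo_iter_advance(struct demo_iter *it)
{
	if (it->pos >= it->nr)
		return false;
	it->pos++;
	return true;
}

/* Return the key at the current position, or NULL past the end. */
static inline const int *demo_iter_peek(const struct demo_iter *it)
{
	return it->pos < it->nr ? &it->keys[it->pos] : NULL;
}

/* next() is just advance followed by peek. */
static inline const int *demo_iter_next(struct demo_iter *it)
{
	if (!demo_iter_advance(it))
		return NULL;
	return demo_iter_peek(it);
}
```

A caller that ignores the result of `demo_iter_next()` pays for a peek it never uses, so it can call `demo_iter_advance()` directly.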

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Get disk reservation when overwriting data in old snapshot
Kent Overstreet [Sun, 21 Mar 2021 04:03:34 +0000 (00:03 -0400)]
bcachefs: Get disk reservation when overwriting data in old snapshot

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Switch extent_handle_overwrites() to one key at a time
Kent Overstreet [Sun, 21 Mar 2021 01:04:57 +0000 (21:04 -0400)]
bcachefs: Switch extent_handle_overwrites() to one key at a time

Prep work for snapshots

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Optimize bch2_btree_iter_verify_level()
Kent Overstreet [Sun, 21 Mar 2021 02:13:30 +0000 (22:13 -0400)]
bcachefs: Optimize bch2_btree_iter_verify_level()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix iterator picking
Kent Overstreet [Sun, 21 Mar 2021 02:05:39 +0000 (22:05 -0400)]
bcachefs: Fix iterator picking

The comparison was wrong.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't unconditionally version_upgrade in initialize
Kent Overstreet [Sun, 21 Mar 2021 20:20:40 +0000 (16:20 -0400)]
bcachefs: Don't unconditionally version_upgrade in initialize

This is mkfs's job. Also, clean up the handling of feature bits some.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Validate bset version field against sb version fields
Kent Overstreet [Sun, 21 Mar 2021 20:03:23 +0000 (16:03 -0400)]
bcachefs: Validate bset version field against sb version fields

The superblock version fields need to be accurate to know whether a
filesystem is supported, thus we should be verifying them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't overwrite snapshot field in bch2_cut_back()
Kent Overstreet [Fri, 19 Mar 2021 20:37:24 +0000 (16:37 -0400)]
bcachefs: Don't overwrite snapshot field in bch2_cut_back()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill bkey ops->debugcheck method
Kent Overstreet [Sat, 20 Mar 2021 03:19:05 +0000 (23:19 -0400)]
bcachefs: Kill bkey ops->debugcheck method

This code used to be used for running some assertions on alloc info at
runtime, but it long predates fsck and hasn't been good for much in
ages - we can delete it now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Assert that iterators aren't being double freed
Kent Overstreet [Sat, 20 Mar 2021 00:40:31 +0000 (20:40 -0400)]
bcachefs: Assert that iterators aren't being double freed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Require all btree iterators to be freed
Kent Overstreet [Sat, 20 Mar 2021 00:29:11 +0000 (20:29 -0400)]
bcachefs: Require all btree iterators to be freed

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: btree_iter_set_dontneed()
Kent Overstreet [Sat, 20 Mar 2021 02:54:18 +0000 (22:54 -0400)]
bcachefs: btree_iter_set_dontneed()

This is a bit clearer than using bch2_btree_iter_free().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fsck code refactoring
Kent Overstreet [Sat, 20 Mar 2021 02:34:54 +0000 (22:34 -0400)]
bcachefs: Fsck code refactoring

Change fsck code to always put btree iterators - also, make some flow
control improvements to deal with lock restarts better, and refactor
check_extents() to not walk extents twice for counting/checking
i_sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix btree iterator leak in extent_handle_overwrites()
Kent Overstreet [Fri, 19 Mar 2021 20:32:46 +0000 (16:32 -0400)]
bcachefs: Fix btree iterator leak in extent_handle_overwrites()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't list non journal devs in journal_debug_to_text()
Kent Overstreet [Fri, 19 Mar 2021 20:30:01 +0000 (16:30 -0400)]
bcachefs: Don't list non journal devs in journal_debug_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add a print statement for when we go read-write
Kent Overstreet [Fri, 19 Mar 2021 17:23:01 +0000 (13:23 -0400)]
bcachefs: Add a print statement for when we go read-write

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill btree_iter_pos_changed()
Kent Overstreet [Tue, 16 Mar 2021 05:52:55 +0000 (01:52 -0400)]
bcachefs: Kill btree_iter_pos_changed()

This is used in only one place now, so just inline it into the caller.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix a btree iterator leak
Kent Overstreet [Tue, 16 Mar 2021 01:18:50 +0000 (21:18 -0400)]
bcachefs: Fix a btree iterator leak

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill reflink option
Kent Overstreet [Mon, 15 Mar 2021 21:26:19 +0000 (17:26 -0400)]
bcachefs: Kill reflink option

An option was added to control whether reflink support was on or off
because for a long time, reflink + inline data extent support was
missing - but that's since been fixed, so we can drop the option now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix read retry path for indirect extents
Kent Overstreet [Mon, 15 Mar 2021 01:30:08 +0000 (21:30 -0400)]
bcachefs: Fix read retry path for indirect extents

In the read path, for retry of indirect extents to work we need to
differentiate between the location in the btree the read was for, vs.
the location where we found the data. This patch adds that plumbing to
bch_read_bio.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Consolidate bch2_read_retry and bch2_read()
Kent Overstreet [Sat, 13 Mar 2021 01:29:28 +0000 (20:29 -0500)]
bcachefs: Consolidate bch2_read_retry and bch2_read()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill ei_str_hash
Kent Overstreet [Tue, 2 Mar 2021 23:35:30 +0000 (18:35 -0500)]
bcachefs: Kill ei_str_hash

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Use __bch2_trans_do() in a few more places
Kent Overstreet [Fri, 12 Mar 2021 22:52:42 +0000 (17:52 -0500)]
bcachefs: Use __bch2_trans_do() in a few more places

Minor cleanup, it was being open coded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Have fsck check for stripe pointers matching stripe
Kent Overstreet [Fri, 12 Mar 2021 21:55:28 +0000 (16:55 -0500)]
bcachefs: Have fsck check for stripe pointers matching stripe

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix locking in bch2_btree_iter_traverse_cached()
Kent Overstreet [Mon, 8 Mar 2021 22:09:13 +0000 (17:09 -0500)]
bcachefs: Fix locking in bch2_btree_iter_traverse_cached()

bch2_btree_iter_traverse() is supposed to ensure we have the correct
type of lock - it was downgrading if necessary, but if we entered with a
read lock it wasn't upgrading to an intent lock, oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS
Kent Overstreet [Sat, 20 Feb 2021 01:44:55 +0000 (20:44 -0500)]
bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Simplify bch2_btree_iter_peek_prev()
Kent Overstreet [Fri, 5 Mar 2021 03:40:41 +0000 (22:40 -0500)]
bcachefs: Simplify bch2_btree_iter_peek_prev()

Since we added iter->real_pos, btree_iter_set_pos_to_(next|prev)_leaf no
longer modify iter->pos, so we don't have to save it at the start
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Simplify for_each_btree_key()
Kent Overstreet [Fri, 5 Mar 2021 03:11:28 +0000 (22:11 -0500)]
bcachefs: Simplify for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix compat code for superblock
Kent Overstreet [Fri, 5 Mar 2021 00:06:26 +0000 (19:06 -0500)]
bcachefs: Fix compat code for superblock

The bkey compat code wasn't being run for btree roots in the superblock
clean section - this patch fixes it to use the journal entry validate
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix bpos_diff()
Kent Overstreet [Thu, 4 Mar 2021 21:26:19 +0000 (16:26 -0500)]
bcachefs: Fix bpos_diff()

Previously, bpos_diff() did not handle borrows correctly. Minor thing
considering how it was used, but worth fixing.
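
Borrow handling in a multi-word subtraction can be sketched as below. The two-word `struct demo_pos` is a hypothetical simplification; the actual fields that bpos_diff() operates on are not stated in the message.

```c
#include <stdint.h>

/* Hypothetical two-word position: hi is more significant than lo. */
struct demo_pos {
	uint64_t	hi;
	uint64_t	lo;
};

/*
 * Compute a - b, treating (hi, lo) as one 128-bit unsigned value.
 * If the low word underflows, one must be borrowed from the high word.
 */
static inline struct demo_pos demo_pos_diff(struct demo_pos a,
					    struct demo_pos b)
{
	struct demo_pos r;

	r.lo = a.lo - b.lo;
	r.hi = a.hi - b.hi - (a.lo < b.lo);	/* subtract the borrow */
	return r;
}
```

Subtracting each word independently would give a high word that is too large by one whenever `a.lo < b.lo`.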

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Use bch2_bpos_to_text() more consistently
Kent Overstreet [Thu, 4 Mar 2021 20:20:22 +0000 (15:20 -0500)]
bcachefs: Use bch2_bpos_to_text() more consistently

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: btree_iter_prev_slot()
Kent Overstreet [Wed, 3 Mar 2021 03:45:28 +0000 (22:45 -0500)]
bcachefs: btree_iter_prev_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Delete some dead code
Kent Overstreet [Wed, 3 Mar 2021 17:10:49 +0000 (12:10 -0500)]
bcachefs: Delete some dead code

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: btree_iter_live()
Kent Overstreet [Sun, 21 Feb 2021 03:19:34 +0000 (22:19 -0500)]
bcachefs: btree_iter_live()

New helper to clean things up a bit - also, improve iter->flags
handling.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Improve handling of extents in bch2_trans_update()
Kent Overstreet [Sun, 21 Feb 2021 01:51:57 +0000 (20:51 -0500)]
bcachefs: Improve handling of extents in bch2_trans_update()

The transaction update/commit path cares about whether it's inserting
extents or regular keys; extents require extra passes (handling of
overlapping extents) but sometimes we want to skip all that. This
clarifies things by adding a new member to btree_insert_entry specifying
whether the key being inserted is an extent, instead of overloading
BTREE_ITER_IS_EXTENTS.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Use x-macros for more enums
Kent Overstreet [Sun, 21 Feb 2021 00:47:58 +0000 (19:47 -0500)]
bcachefs: Use x-macros for more enums

This patch standardizes all the enums that have associated string tables
(probably more enums should have string tables).
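
For context, the x-macro technique generates an enum and its string table from a single list, so the two cannot drift apart. The list and names below (`DEMO_STATES`, `demo_state_strs`) are invented for illustration and are not taken from bcachefs.

```c
#include <stddef.h>

/* Single source of truth: each entry is expanded by whatever x() is. */
#define DEMO_STATES()		\
	x(stopped)		\
	x(running)		\
	x(blocked)

/* First expansion: enum constants. */
enum demo_state {
#define x(n)	DEMO_STATE_##n,
	DEMO_STATES()
#undef x
	DEMO_STATE_NR
};

/* Second expansion: matching string table, NULL terminated. */
static const char * const demo_state_strs[] = {
#define x(n)	#n,
	DEMO_STATES()
#undef x
	NULL
};

/* Bounds-checked lookup of a state's name. */
static inline const char *demo_state_str(enum demo_state s)
{
	return s < DEMO_STATE_NR ? demo_state_strs[s] : "(invalid)";
}
```

Adding a new state means adding one `x(...)` line; the enum value and its string follow automatically.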

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Rename BTREE_ID enums for consistency with other enums
Kent Overstreet [Sun, 21 Feb 2021 00:27:37 +0000 (19:27 -0500)]
bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Rename KEY_TYPE_whiteout -> KEY_TYPE_hash_whiteout
Kent Overstreet [Sun, 21 Feb 2021 00:09:53 +0000 (19:09 -0500)]
bcachefs: Rename KEY_TYPE_whiteout -> KEY_TYPE_hash_whiteout

Snapshots are going to need a different whiteout key type. Also, switch
to using BCH_BKEY_TYPES() to define the bkey value accessors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: KEY_TYPE_discard is no longer used
Kent Overstreet [Sat, 20 Feb 2021 04:41:40 +0000 (23:41 -0500)]
bcachefs: KEY_TYPE_discard is no longer used

KEY_TYPE_discard used to be used for extent whiteouts, but when handling
of overlapping extents was lifted above the core btree code it became
unused. This patch updates various code to reflect that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE()
Kent Overstreet [Sat, 20 Feb 2021 05:00:23 +0000 (00:00 -0500)]
bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE()

bcachefs has been aggressively migrating filesystems and btree nodes to
the new format for quite some time - this shouldn't affect anyone
anymore, and lets us delete a _lot_ of code. Also, it frees up
KEY_TYPE_discard for a new whiteout key type for snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix bch2_btree_cache_scan()
Kent Overstreet [Tue, 28 Dec 2021 03:11:54 +0000 (22:11 -0500)]
bcachefs: Fix bch2_btree_cache_scan()

It was counting nodes on the freed list that it skips - because we want
to leave a few so that btree splits don't touch the allocator - as nodes
that it touched, meaning that if it was called with <= 3 nodes to
reclaim, and those nodes were on the freed list, it would never do any
work.
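
The accounting bug can be reproduced with a toy model. This is a heavily simplified sketch: `demo_scan_freed()` and its parameters are hypothetical, the freed list is reduced to a count, and the `count_skipped` switch exists only to contrast the old and fixed behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Walk a freed list of nr_freed nodes, leaving the first `reserve` nodes
 * alone and reclaiming the rest until nr_to_scan nodes have been touched.
 * count_skipped = true models the bug: skipped nodes eat the scan budget.
 * Returns the number of nodes reclaimed.
 */
static inline size_t demo_scan_freed(size_t nr_freed, size_t reserve,
				     size_t nr_to_scan, bool count_skipped)
{
	size_t touched = 0, reclaimed = 0;

	for (size_t i = 0; i < nr_freed; i++) {
		if (touched >= nr_to_scan)
			break;
		if (i < reserve) {
			/* Kept so btree splits don't touch the allocator. */
			if (count_skipped)
				touched++;
			continue;
		}
		touched++;
		reclaimed++;
	}
	return reclaimed;
}
```

With ten freed nodes, a reserve of three and a budget of three, the buggy accounting reclaims nothing, while the fixed accounting reclaims three nodes.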

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add a mempool for the replicas delta list
Kent Overstreet [Sat, 24 Apr 2021 04:24:25 +0000 (00:24 -0400)]
bcachefs: Add a mempool for the replicas delta list

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add a mempool for btree_trans bump allocator
Kent Overstreet [Sat, 24 Apr 2021 04:09:06 +0000 (00:09 -0400)]
bcachefs: Add a mempool for btree_trans bump allocator

This allocation is required for filesystem operations to make forward
progress, thus needs a mempool.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Start journal reclaim thread earlier
Kent Overstreet [Mon, 21 Jun 2021 20:30:52 +0000 (16:30 -0400)]
bcachefs: Start journal reclaim thread earlier

Especially in userspace, we sometimes run into resource exhaustion issues
with starting up threads after mark and sweep/fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix for copygc getting stuck waiting for reserve to be filled
Kent Overstreet [Sun, 18 Apr 2021 22:01:49 +0000 (18:01 -0400)]
bcachefs: Fix for copygc getting stuck waiting for reserve to be filled

This fixes a regression from the patch
  bcachefs: Fix copygc dying on startup

In general only the allocator thread itself should be updating
ca->allocator_state; having the thread that wakes up the allocator set
it is an ugly hack, needed only to avoid racing with the copygc threads
when we're first starting up.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add allocator thread state to sysfs
Kent Overstreet [Sun, 18 Apr 2021 21:54:56 +0000 (17:54 -0400)]
bcachefs: Add allocator thread state to sysfs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Rip out copygc pd controller
Kent Overstreet [Sun, 18 Apr 2021 00:24:54 +0000 (20:24 -0400)]
bcachefs: Rip out copygc pd controller

We have a separate mechanism for ratelimiting copygc now - the pd
controller has only been causing problems.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add copygc wait to sysfs
Kent Overstreet [Tue, 13 Apr 2021 18:45:55 +0000 (14:45 -0400)]
bcachefs: Add copygc wait to sysfs

Currently debugging an issue with copygc not running when it's supposed
to, and this is an obvious first step.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix copygc threshold
Kent Overstreet [Tue, 13 Apr 2021 13:49:23 +0000 (09:49 -0400)]
bcachefs: Fix copygc threshold

A while back the meaning of is_available_bucket() and thus also
bch_dev_usage->buckets_unavailable changed to include buckets that are
owned by the allocator - this was so that the stat could be persisted
like other allocation information, and wouldn't have to be regenerated
by walking each bucket at mount time.

This broke copygc, which needs to consider buckets that are reclaimable
and haven't yet been grabbed by the allocator thread and moved onto a
freelist. This patch fixes that by adding dev_buckets_reclaimable() for
copygc and the allocator thread, and cleans up some of the callers a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't drop ptrs to btree nodes
Kent Overstreet [Fri, 16 Apr 2021 22:59:54 +0000 (18:59 -0400)]
bcachefs: Don't drop ptrs to btree nodes

If a ptr gen doesn't match the bucket gen, the bucket likely doesn't
contain the data we want - but it's still possible the data we want
might not have been overwritten, and for btree node pointers we can verify
whether or not the node is the one we wanted with the node's sequence
number, so it's better to keep the pointer and try reading from it.
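
The decision described above might look roughly like this. It is a sketch under assumptions, not the real gc code: the `toy_*` names are made up, and dropping stale non-btree pointers is inferred from the commit title, not confirmed by it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_ptr_action {
	TOY_PTR_USE,		/* gens match, pointer is fine */
	TOY_PTR_DROP,		/* stale pointer we cannot verify */
	TOY_PTR_KEEP_AND_VERIFY	/* stale btree node pointer: try reading it */
};

/* Decide what to do with a pointer whose gen may not match the bucket gen. */
enum toy_ptr_action toy_check_ptr(bool btree_node_ptr,
				  uint8_t ptr_gen, uint8_t bucket_gen)
{
	if (ptr_gen == bucket_gen)
		return TOY_PTR_USE;

	/* Btree nodes carry a sequence number we can check after the read. */
	return btree_node_ptr ? TOY_PTR_KEEP_AND_VERIFY : TOY_PTR_DROP;
}

/* After reading, the node is only accepted if its sequence number matches. */
bool toy_node_is_wanted(uint64_t expected_seq, uint64_t seq_read)
{
	return expected_seq == seq_read;
}
```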

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix a use-after-free in bch2_gc_mark_key()
Kent Overstreet [Fri, 16 Apr 2021 22:02:57 +0000 (18:02 -0400)]
bcachefs: Fix a use-after-free in bch2_gc_mark_key()

bch2_check_fix_ptrs() can update/reallocate k

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Bring back metadata only gc
Kent Overstreet [Fri, 16 Apr 2021 20:54:11 +0000 (16:54 -0400)]
bcachefs: Bring back metadata only gc

This is useful for the filesystem dump debugging tool - when we're
hitting bugs we want to skip as much of the recovery process as
possible, and the dump tool only needs to know where metadata lives.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix bch2_write_super to obey very_degraded option
Kent Overstreet [Fri, 9 Apr 2021 23:04:57 +0000 (19:04 -0400)]
bcachefs: Fix bch2_write_super to obey very_degraded option

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't fail mounts due to devices that are marked as failed
Kent Overstreet [Sat, 3 Apr 2021 03:41:10 +0000 (23:41 -0400)]
bcachefs: Don't fail mounts due to devices that are marked as failed

If a given set of replicas is entirely on failed devices, don't fail the
mount: we will still fail the mount if we have some copies on non-failed
devices.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add a cond_resched() to the allocator thread
Kent Overstreet [Mon, 5 Apr 2021 04:53:42 +0000 (00:53 -0400)]
bcachefs: Add a cond_resched() to the allocator thread

This is just a band-aid fix for now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Use x-macros for compat feature bits
Kent Overstreet [Mon, 5 Apr 2021 01:57:35 +0000 (21:57 -0400)]
bcachefs: Use x-macros for compat feature bits

This is to generate strings for them, so that we can print them out.
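
For readers unfamiliar with the technique, here is a minimal x-macro sketch that generates both an enum and a matching string table from one list. The feature names (`alpha`, `beta`, `gamma`) and all `EXAMPLE_*`/`example_*` identifiers are placeholders, not the actual compat feature bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Single source of truth: (name, bit number). Names are placeholders. */
#define EXAMPLE_COMPAT_FEATURES()	\
	x(alpha,	0)		\
	x(beta,		1)		\
	x(gamma,	2)

enum example_compat_feature {
#define x(name, nr) EXAMPLE_COMPAT_##name = nr,
	EXAMPLE_COMPAT_FEATURES()
#undef x
	EXAMPLE_COMPAT_NR
};

static const char * const example_compat_strs[] = {
#define x(name, nr) [nr] = #name,
	EXAMPLE_COMPAT_FEATURES()
#undef x
	NULL
};

/* Write the names of all set bits, comma separated; -1 if buf is too small. */
int example_compat_to_text(char *buf, size_t len, unsigned long bits)
{
	size_t pos = 0;
	unsigned i;

	if (!len)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < EXAMPLE_COMPAT_NR; i++) {
		int n;

		if (!(bits & (1UL << i)))
			continue;
		n = snprintf(buf + pos, len - pos, "%s%s",
			     pos ? "," : "", example_compat_strs[i]);
		if (n < 0 || (size_t) n >= len - pos)
			return -1;
		pos += n;
	}
	return 0;
}
```

Adding a feature then means adding one `x()` line; the enum value and its printable name cannot drift apart.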

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix some (spurious) warnings about uninitialized vars
Kent Overstreet [Thu, 25 Mar 2021 02:11:22 +0000 (22:11 -0400)]
bcachefs: Fix some (spurious) warnings about uninitialized vars

These are only complained about when building in userspace, for some
reason.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix an allocator startup race
Kent Overstreet [Fri, 12 Mar 2021 02:46:23 +0000 (21:46 -0500)]
bcachefs: Fix an allocator startup race

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix bkey format generation for 32 bit fields
Kent Overstreet [Sun, 21 Mar 2021 03:55:36 +0000 (23:55 -0400)]
bcachefs: Fix bkey format generation for 32 bit fields

Having a packed format that can represent a field larger than the
unpacked type breaks bkey_packed_successor() assertions - we need to fix
this to start using the snapshot field.
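
One way to picture the constraint: a packed field stores `value - offset` in `bits` bits, so its largest representable value is `offset + 2^bits - 1`, and that must not exceed what the unpacked type can hold. The helper below is a hypothetical sketch of such a clamp (`toy_packed_field` and `toy_clamp_packed_field` are invented names), not the actual format generation code.

```c
#include <assert.h>
#include <stdint.h>

struct toy_packed_field {
	unsigned bits;		/* width of the packed field */
	uint64_t offset;	/* value subtracted before packing */
};

/*
 * Clamp (bits, offset) so that offset + 2^bits - 1 fits in an unpacked
 * field of unpacked_bits bits. Assumes 1 <= unpacked_bits <= 64.
 */
struct toy_packed_field toy_clamp_packed_field(unsigned unpacked_bits,
					       unsigned bits, uint64_t offset)
{
	uint64_t unpacked_max = unpacked_bits == 64
		? UINT64_MAX
		: (UINT64_C(1) << unpacked_bits) - 1;
	struct toy_packed_field f;
	uint64_t span;

	if (bits >= unpacked_bits) {
		/* Full width: any nonzero offset could overflow. */
		f.bits = unpacked_bits;
		f.offset = 0;
		return f;
	}

	span = (UINT64_C(1) << bits) - 1;	/* bits <= 63 here */
	f.bits = bits;
	f.offset = offset < unpacked_max - span ? offset : unpacked_max - span;
	return f;
}
```

For a 32 bit field, an 8 bit packed width with offset 0xFFFFFFF0 would reach past 0xFFFFFFFF; the clamp lowers the offset to 0xFFFFFF00 so the packed maximum lands exactly on the unpacked maximum.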

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Scan for old btree nodes if necessary on mount
Kent Overstreet [Mon, 22 Mar 2021 22:39:16 +0000 (18:39 -0400)]
bcachefs: Scan for old btree nodes if necessary on mount

We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned
out there were people who still had filesystems with btree nodes in that
format in the wild. This adds a new compat feature that indicates we've
scanned for and rewritten nodes in the old format, and does that scan at
mount time if the option isn't set.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add code to scan for/rewrite old btree nodes
Kent Overstreet [Sun, 14 Mar 2021 23:01:14 +0000 (19:01 -0400)]
bcachefs: Add code to scan for/rewrite old btree nodes

This adds a new data job type to scan for btree nodes in the old extent
format, and rewrite them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Dump journal state when we get stuck
Kent Overstreet [Wed, 24 Feb 2021 06:16:49 +0000 (01:16 -0500)]
bcachefs: Dump journal state when we get stuck

We had a bug reported where the journal is failing to allocate a journal
write - this should help figure out what's going on.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix a 64 bit divide on 32 bit
Kent Overstreet [Sat, 20 Feb 2021 10:05:18 +0000 (05:05 -0500)]
bcachefs: Fix a 64 bit divide on 32 bit

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't use inode btree key cache in fsck code
Kent Overstreet [Mon, 8 Mar 2021 02:43:21 +0000 (21:43 -0500)]
bcachefs: Don't use inode btree key cache in fsck code

We had a cache coherency bug with the btree key cache in the fsck code -
this fixes fsck to be consistent about not using it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't call into journal reclaim when we're not supposed to
Kent Overstreet [Mon, 8 Mar 2021 00:04:16 +0000 (19:04 -0500)]
bcachefs: Don't call into journal reclaim when we're not supposed to

This was causing a deadlock when btree_update_nodes_written() invokes
journal reclaim because of the btree cache being too dirty.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Create allocator threads when allocating filesystem
Kent Overstreet [Fri, 5 Mar 2021 23:00:55 +0000 (18:00 -0500)]
bcachefs: Create allocator threads when allocating filesystem

We're seeing failures to mount because of a failure to start the
allocator threads, which currently happens fairly late in the mount
process, after walking all metadata, and kthread_create() fails if
something has tried to kill the mount process, which is probably not
what we want.

This patch avoids this issue by creating, but not starting, the
allocator threads when we preallocate all of our other in memory data
structures.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix for bch2_btree_node_get_noiter() returning -ENOMEM
Kent Overstreet [Wed, 24 Feb 2021 02:41:25 +0000 (21:41 -0500)]
bcachefs: Fix for bch2_btree_node_get_noiter() returning -ENOMEM

bch2_btree_node_get_noiter() isn't used from the btree iterator code,
which retries with the btree node cache cannibalize lock held on
-ENOMEM, so we should do it ourselves if necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add error message for some allocation failures
Kent Overstreet [Tue, 23 Feb 2021 20:16:41 +0000 (15:16 -0500)]
bcachefs: Add error message for some allocation failures

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Extents may now cross btree node boundaries
Kent Overstreet [Wed, 10 Feb 2021 21:13:57 +0000 (16:13 -0500)]
bcachefs: Extents may now cross btree node boundaries

When snapshots arrive, we won't necessarily be able to arbitrarily split
existing extents - when we need to split an existing extent, we'll have
to check if the extent was overwritten in child snapshots and if so emit
a whiteout for the split in the child snapshot.

Because extents couldn't span btree nodes previously, journal replay
would sometimes have to split existing extents. That's no good anymore,
but fortunately since extent handling has already been lifted above most
of the btree code there's no real need for that rule anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: iter->real_pos
Kent Overstreet [Fri, 12 Feb 2021 02:57:32 +0000 (21:57 -0500)]
bcachefs: iter->real_pos

We need to differentiate between the search position of a btree
iterator, vs. what it actually points at (what we found). This matters
for extents, where iter->pos will typically be the start of the key we
found and iter->real_pos will be the end of the key we found (which soon
won't necessarily be in the same btree node!) and it will also matter
for snapshots.
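
A toy model of the two positions, for illustration only: extents are half-open ranges in a sorted array, and peeking records both where the found key starts (`pos`) and where it ends (`real_pos`). The `toy_*` types are hypothetical, and leaving `pos` unchanged when the search position falls inside an extent is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_extent {
	uint64_t start, end;	/* covers [start, end) */
};

struct toy_iter {
	uint64_t pos;		/* search position, then start of found key */
	uint64_t real_pos;	/* what we actually point at: end of found key */
};

/* Find the first extent ending after iter->pos in a sorted array. */
const struct toy_extent *toy_iter_peek(struct toy_iter *iter,
				       const struct toy_extent *extents,
				       size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (extents[i].end <= iter->pos)
			continue;

		/* Advance pos only if the key starts past where we searched. */
		if (extents[i].start > iter->pos)
			iter->pos = extents[i].start;
		iter->real_pos = extents[i].end;
		return &extents[i];
	}
	return NULL;
}
```

Searching from position 10 over the extents [0, 8) and [16, 32) returns the second extent with `pos` at 16 and `real_pos` at 32.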

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>