www.infradead.org Git - users/hch/uuid.git/log
2 years ago bcachefs: fix stack corruption
Yuxuan Shui [Fri, 22 May 2020 14:50:05 +0000 (15:50 +0100)]
bcachefs: fix stack corruption

When a bkey_on_stack is passed to bch_read_indirect_extent, there is no
guarantee that it will be big enough to hold the bkey. And
bch_read_indirect_extent is not aware of bkey_on_stack, so it cannot call
realloc on it. This causes stack corruption.

This commit makes bch_read_indirect_extent aware of bkey_on_stack so it
can call realloc when appropriate.
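
A minimal userspace sketch of the idea, not the bcachefs code: a small
inline buffer that spills to the heap when a key does not fit. The names
key_on_stack, key_on_stack_realloc and read_indirect_key are hypothetical
stand-ins, and sizes are counted in u64 words as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for bkey_on_stack: a small inline buffer that
 * spills to the heap when a key does not fit. Sizes are in u64 words. */
#define INLINE_U64S 12

struct key_on_stack {
	uint64_t *k;                    /* points at onstack[] or a heap buffer */
	size_t    cap;                  /* capacity of *k in u64 words */
	uint64_t  onstack[INLINE_U64S];
};

static void key_on_stack_init(struct key_on_stack *s)
{
	s->k = s->onstack;
	s->cap = INLINE_U64S;
}

/* Grow the buffer to hold at least u64s words; returns 0, or -1 on OOM. */
static int key_on_stack_realloc(struct key_on_stack *s, size_t u64s)
{
	uint64_t *n;

	if (u64s <= s->cap)
		return 0;
	n = malloc(u64s * sizeof(*n));
	if (!n)
		return -1;
	memcpy(n, s->k, s->cap * sizeof(*n));
	if (s->k != s->onstack)
		free(s->k);
	s->k = n;
	s->cap = u64s;
	return 0;
}

static void key_on_stack_exit(struct key_on_stack *s)
{
	if (s->k != s->onstack)
		free(s->k);
	s->k = NULL;
	s->cap = 0;
}

/* A callee that may store a larger key goes through the wrapper so it
 * can resize first, instead of writing through a raw pointer. */
static int read_indirect_key(struct key_on_stack *dst,
			     const uint64_t *src, size_t u64s)
{
	if (key_on_stack_realloc(dst, u64s))
		return -1;
	memcpy(dst->k, src, u64s * sizeof(*src));
	return 0;
}
```

Passing the wrapper rather than the raw key pointer is what lets the
callee grow the buffer; with only a raw pointer it would have to assume
the caller's buffer is large enough.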

Tested-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Wrap vmap() in memalloc_nofs_save()/restore()
Kent Overstreet [Thu, 21 May 2020 21:23:40 +0000 (17:23 -0400)]
bcachefs: Wrap vmap() in memalloc_nofs_save()/restore()

vmalloc() and vmap() don't take GFP_NOFS - this should be pushed further
up the IO path, but for now just doing the simple fix.
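
The save/restore pattern can be modelled in userspace as below. This is
only a sketch of the scoping behaviour, not kernel code: the model_*
names and the single global flags word are assumptions standing in for
memalloc_nofs_save()/memalloc_nofs_restore() and per-task state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one flag bit standing in for the per-task
 * "no filesystem reclaim" state. */
#define MODEL_NOFS 0x1u

static unsigned int model_task_flags;

/* Set the NOFS bit and return its previous value so scopes can nest. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_NOFS;

	model_task_flags |= MODEL_NOFS;
	return old;
}

/* Put the NOFS bit back to what it was before the matching save. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_NOFS) | old;
}

/* What an allocation made right now would be allowed to do. */
static bool model_alloc_may_enter_fs(void)
{
	return !(model_task_flags & MODEL_NOFS);
}

/* Stand-in for a vmap() call wrapped in save/restore: reports whether
 * an allocation inside the wrapped region could recurse into the fs. */
static bool model_vmap_wrapped(void)
{
	unsigned int flags = model_nofs_save();
	bool may_enter_fs = model_alloc_may_enter_fs();

	model_nofs_restore(flags);
	return may_enter_fs;
}
```

Because save returns the previous state, a wrapped call inside an
already-NOFS region leaves the outer region's state untouched.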

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix another iterator counting bug
Kent Overstreet [Fri, 15 May 2020 01:45:08 +0000 (21:45 -0400)]
bcachefs: Fix another iterator counting bug

We were marking the end of where we could insert incorrectly for
indirect extents.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix setquota
Kent Overstreet [Wed, 13 May 2020 21:53:33 +0000 (17:53 -0400)]
bcachefs: Fix setquota

We were returning -EINTR because we were failing to retry the btree
transaction.
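
The general shape of such a retry, as a userspace sketch: the operation
is rerun for as long as it reports -EINTR, and any other result is
returned. fake_trans, fake_set_quota and run_with_retry are hypothetical
names, not bcachefs functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical transaction body: fails with -EINTR a few times
 * (standing in for a btree transaction restart), then succeeds. */
struct fake_trans {
	int restarts_left;
	int attempts;
};

static int fake_set_quota(struct fake_trans *t)
{
	t->attempts++;
	if (t->restarts_left > 0) {
		t->restarts_left--;
		return -EINTR;
	}
	return 0;
}

/* Retry the whole operation until it stops asking for a restart;
 * success or any other error is returned to the caller. */
static int run_with_retry(struct fake_trans *t)
{
	int ret;

	do {
		ret = fake_set_quota(t);
	} while (ret == -EINTR);

	return ret;
}
```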

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a workqueue deadlock
Kent Overstreet [Wed, 13 May 2020 04:15:28 +0000 (00:15 -0400)]
bcachefs: Fix a workqueue deadlock

Writes running out of a workqueue (via dio path) could block and prevent
other writes from calling bch2_write_index() and completing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Validate that we read the correct btree node
Kent Overstreet [Tue, 12 May 2020 22:34:16 +0000 (18:34 -0400)]
bcachefs: Validate that we read the correct btree node

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fixes for startup on very full filesystems
Kent Overstreet [Tue, 12 May 2020 00:01:07 +0000 (20:01 -0400)]
bcachefs: Fixes for startup on very full filesystems

 - Always pass BTREE_INSERT_USE_RESERVE when writing alloc btree keys
 - Don't strand buckets on the copygc freelist until after recovery is
   done and we're starting copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix initialization of bounce mempools
Kent Overstreet [Sat, 9 May 2020 03:15:42 +0000 (23:15 -0400)]
bcachefs: Fix initialization of bounce mempools

When they were converted to kvpmalloc pools they weren't converted to
pass the actual size of the allocation. Oops.

Also, validate the real length in the zstd decompression path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Some compression improvements
Kent Overstreet [Wed, 6 May 2020 19:37:04 +0000 (15:37 -0400)]
bcachefs: Some compression improvements

In __bio_map_or_bounce(), the check for whether the bio is physically
contiguous is improved; it's now more readable and handles multi-page
but contiguous bios.

Also, when decompressing, we were doing a redundant memcpy in the case
where we were able to use vmap to map a bio contiguously.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix two more deadlocks
Kent Overstreet [Sat, 2 May 2020 20:21:35 +0000 (16:21 -0400)]
bcachefs: Fix two more deadlocks

Deadlock on shutdown:

btree_update_nodes_written() unblocks btree nodes from being written;
after doing so, it has to check if they were marked as needing to be
written and if so kick off those writes - if that doesn't happen, we'll
never release journal pins and shutdown will get stuck when flushing the
journal.

There was an error path where this didn't happen, because in the error
path we don't actually want those btree node writes to happen; however,
we still have to kick off the write path so the journal pins get
released. The btree write path checks if we're in a journal error state
and doesn't do the actual write if we are.

Also - there was another deadlock because btree_update_nodes_written()
was taking the btree update off of the unwritten_list too soon - before
getting a journal reservation, which could fail and have to be retried.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix another deadlock in btree_update_nodes_written()
Kent Overstreet [Fri, 1 May 2020 23:56:31 +0000 (19:56 -0400)]
bcachefs: Fix another deadlock in btree_update_nodes_written()

We also can't be blocking on btree node write locks while holding
btree_interior_update_lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Add some printks for error paths
Kent Overstreet [Wed, 29 Apr 2020 16:57:04 +0000 (12:57 -0400)]
bcachefs: Add some printks for error paths

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't issue writes that are more than 1 MB
Kent Overstreet [Wed, 29 Apr 2020 19:28:25 +0000 (15:28 -0400)]
bcachefs: Don't issue writes that are more than 1 MB

The bcachefs IO path in io.c can't bounce writes larger than that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: More fixes for counting extent update iterators
Kent Overstreet [Fri, 24 Apr 2020 21:57:59 +0000 (17:57 -0400)]
bcachefs: More fixes for counting extent update iterators

This is unfortunately really fragile - hopefully we'll be able to think
of a new approach at some point.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a deadlock
Kent Overstreet [Fri, 24 Apr 2020 22:25:11 +0000 (18:25 -0400)]
bcachefs: Fix a deadlock

btree_node_lock_increment() was incorrectly skipping over the current
iter when checking if we should increment a node we already have locked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Handle -EINTR bch2_migrate_index_update()
Kent Overstreet [Fri, 24 Apr 2020 18:08:56 +0000 (14:08 -0400)]
bcachefs: Handle -EINTR bch2_migrate_index_update()

peek_slot() shouldn't return -EINTR when there's only a single live
iterator, but that's tricky to guarantee - we seem to be returning
-EINTR when we shouldn't, but it's easy enough to handle in the caller.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix for the bkey compat path
Kent Overstreet [Fri, 24 Apr 2020 18:08:18 +0000 (14:08 -0400)]
bcachefs: Fix for the bkey compat path

In the write path, we were calling bch2_bkey_ops.compat() in the wrong
place.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Add a few tracepoints
Kent Overstreet [Sat, 11 Apr 2020 16:32:27 +0000 (12:32 -0400)]
bcachefs: Add a few tracepoints

Transaction restart tracing should probably be overhauled at some
point.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Slightly reduce btree split threshold
Kent Overstreet [Sat, 11 Apr 2020 16:31:16 +0000 (12:31 -0400)]
bcachefs: Slightly reduce btree split threshold

2/3rds performs a lot better than 3/4ths on the tested workload, leading
to significantly fewer btree node compactions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Improve lockdep annotation in journalling code
Kent Overstreet [Sat, 11 Apr 2020 16:30:30 +0000 (12:30 -0400)]
bcachefs: Improve lockdep annotation in journalling code

bch2_journal_res_get() in nonblocking mode is equivalent to a trylock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a locking bug in bch2_journal_pin_copy()
Kent Overstreet [Sat, 11 Apr 2020 16:29:32 +0000 (12:29 -0400)]
bcachefs: Fix a locking bug in bch2_journal_pin_copy()

There was a race where the src pin would be flushed - releasing the last
pin on that sequence number - before adding the new journal pin. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix another deadlock in the btree interior update path
Kent Overstreet [Tue, 7 Apr 2020 21:27:12 +0000 (17:27 -0400)]
bcachefs: Fix another deadlock in the btree interior update path

Can't take read locks on btree nodes while holding
btree_interior_update_lock. Also, fix a bug where we were leaking
journal prereservations.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a locking bug in bch2_btree_ptr_debugcheck()
Kent Overstreet [Tue, 7 Apr 2020 21:31:38 +0000 (17:31 -0400)]
bcachefs: Fix a locking bug in bch2_btree_ptr_debugcheck()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Account for ioclock slop when throttling rebalance thread
Kent Overstreet [Tue, 7 Apr 2020 17:49:14 +0000 (13:49 -0400)]
bcachefs: Account for ioclock slop when throttling rebalance thread

This should fix an issue where the rebalance thread was spinning.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a deadlock on starting an interior btree update
Kent Overstreet [Mon, 6 Apr 2020 01:49:17 +0000 (21:49 -0400)]
bcachefs: Fix a deadlock on starting an interior btree update

Not legal to block on a journal prereservation with btree locks held.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a debug mode assertion
Kent Overstreet [Sat, 4 Apr 2020 20:47:59 +0000 (16:47 -0400)]
bcachefs: Fix a debug mode assertion

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a debug assertion
Kent Overstreet [Sat, 4 Apr 2020 19:49:42 +0000 (15:49 -0400)]
bcachefs: Fix a debug assertion

This assertion was passing the wrong btree node type when inserting into
interior nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix another error path locking bug
Kent Overstreet [Sat, 4 Apr 2020 19:45:06 +0000 (15:45 -0400)]
bcachefs: Fix another error path locking bug

btree_update_nodes_written() was leaking a btree node lock on failure to
get a journal reservation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a null ptr deref during journal replay
Kent Overstreet [Sat, 4 Apr 2020 17:54:19 +0000 (13:54 -0400)]
bcachefs: Fix a null ptr deref during journal replay

We were calling bch2_extent_can_insert() incorrectly; it should only be
called when the extents-to-keys pass is running because that's when we
could be splitting a compressed extent. Calling bch2_extent_can_insert()
without passing in a disk reservation was causing a null ptr deref.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Add another missing bch2_trans_iter_put() call
Kent Overstreet [Wed, 1 Apr 2020 21:28:39 +0000 (17:28 -0400)]
bcachefs: Add another missing bch2_trans_iter_put() call

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Trace where btree iterators are allocated
Kent Overstreet [Wed, 1 Apr 2020 21:14:14 +0000 (17:14 -0400)]
bcachefs: Trace where btree iterators are allocated

This will help with iterator overflow bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix fallocate FL_INSERT_RANGE
Kent Overstreet [Wed, 1 Apr 2020 20:07:57 +0000 (16:07 -0400)]
bcachefs: Fix fallocate FL_INSERT_RANGE

This was another bug because of bch2_btree_iter_set_pos() invalidating
iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Add print method for bch2_btree_ptr_v2
Kent Overstreet [Tue, 31 Mar 2020 20:25:30 +0000 (16:25 -0400)]
bcachefs: Add print method for bch2_btree_ptr_v2

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix journalling of interior node updates
Kent Overstreet [Tue, 31 Mar 2020 20:23:43 +0000 (16:23 -0400)]
bcachefs: Fix journalling of interior node updates

We weren't journalling updates done while splitting/compacting nodes -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix iterating of journal keys within a btree node
Kent Overstreet [Mon, 30 Mar 2020 22:11:13 +0000 (18:11 -0400)]
bcachefs: Fix iterating of journal keys within a btree node

Extent btrees no longer have weird special behaviour for min_key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a locking bug
Kent Overstreet [Mon, 30 Mar 2020 21:43:21 +0000 (17:43 -0400)]
bcachefs: Fix a locking bug

Dropping the wrong kind of lock can't lead to anything good...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix inodes pass in fsck
Kent Overstreet [Mon, 30 Mar 2020 18:29:06 +0000 (14:29 -0400)]
bcachefs: Fix inodes pass in fsck

It wasn't updated for the patch that switched inodes to using the offset
field of struct bkey.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix ec_stripe_update_ptrs()
Kent Overstreet [Mon, 30 Mar 2020 18:05:05 +0000 (14:05 -0400)]
bcachefs: Fix ec_stripe_update_ptrs()

bch2_btree_iter_set_pos() invalidates the key returned by peek().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Check btree topology at startup
Kent Overstreet [Sun, 29 Mar 2020 20:48:53 +0000 (16:48 -0400)]
bcachefs: Check btree topology at startup

When initial btree gc was changed to overlay journal keys as it walks
the btree, it also stopped checking btree topology.

Previously, checking btree topology was a fairly complicated affair -
but it's much easier now that btree_ptr_v2 has min_key in the pointer.

This rewrites the old range_checks code and uses it in both runtime and
initial gc.
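
With min_key available in each child pointer, a topology check reduces to
confirming that the children tile the parent's range. A simplified sketch,
assuming integer keys and the rule that a node's min_key is the successor
of the previous node's max_key; child_ptr and topology_ok are hypothetical
names, not the range_checks code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified child pointer: real keys are multi-field positions; a
 * single integer is used here purely for illustration. */
struct child_ptr {
	uint64_t min_key;
	uint64_t max_key;
};

/* Children must tile [parent_min, parent_max] with no gaps or overlaps:
 * each min_key is the successor of the previous child's max_key. */
static bool topology_ok(const struct child_ptr *c, size_t nr,
			uint64_t parent_min, uint64_t parent_max)
{
	uint64_t expected_min = parent_min;

	if (!nr)
		return false;

	for (size_t i = 0; i < nr; i++) {
		if (c[i].min_key != expected_min)
			return false;
		if (c[i].max_key < c[i].min_key ||
		    c[i].max_key > parent_max)
			return false;
		if (i + 1 < nr) {
			/* Only the last child may end at parent_max. */
			if (c[i].max_key == parent_max)
				return false;
			expected_min = c[i].max_key + 1;
		}
	}
	return c[nr - 1].max_key == parent_max;
}
```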

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't allocate memory while holding journal reservation
Kent Overstreet [Mon, 30 Mar 2020 16:33:30 +0000 (12:33 -0400)]
bcachefs: Don't allocate memory while holding journal reservation

This fixes a lockdep splat - allocating memory can call
bch2_clear_page_bits() which takes mark_lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Reduce max nr of btree iters when lockdep is on
Kent Overstreet [Sun, 29 Mar 2020 21:01:05 +0000 (17:01 -0400)]
bcachefs: Reduce max nr of btree iters when lockdep is on

This is so we don't overflow MAX_LOCK_DEPTH.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Kill bkey_type_successor
Kent Overstreet [Tue, 7 Jan 2020 18:29:32 +0000 (13:29 -0500)]
bcachefs: Kill bkey_type_successor

Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.

Now, inodes in the inodes btree are indexed by the offset field.

Also: previously min_key was special for extents btrees, min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.

This means we can completely get rid of
btree_type_successor/predecessor.

Also make some improvements to the metadata IO validate/compat code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Switch a BUG_ON() to a warning
Kent Overstreet [Sun, 29 Mar 2020 18:21:44 +0000 (14:21 -0400)]
bcachefs: Switch a BUG_ON() to a warning

This has popped and thus needs to be debugged, but the assertion firing
isn't necessarily fatal so switch it to a warning.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Use kvpmalloc mempools for compression bounce
Kent Overstreet [Sun, 29 Mar 2020 16:33:41 +0000 (12:33 -0400)]
bcachefs: Use kvpmalloc mempools for compression bounce

This fixes an issue where mounting would fail because of memory
fragmentation - previously the compression bounce buffers were using
get_free_pages().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Read journal when keep_journal on
Kent Overstreet [Sat, 28 Mar 2020 22:26:01 +0000 (18:26 -0400)]
bcachefs: Read journal when keep_journal on

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Various fixes for interior update path
Kent Overstreet [Sat, 28 Mar 2020 23:17:23 +0000 (19:17 -0400)]
bcachefs: Various fixes for interior update path

The locking was wrong, and we could get a use after free in the error
path where we weren't taking the entry being freed off the unwritten
list.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Use memalloc_nofs_save()
Kent Overstreet [Fri, 27 Mar 2020 21:38:51 +0000 (17:38 -0400)]
bcachefs: Use memalloc_nofs_save()

vmalloc allocations don't always obey GFP_NOFS - memalloc_nofs_save() is
the preferred approach for the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Improve error message in fsck
Kent Overstreet [Wed, 25 Mar 2020 20:13:00 +0000 (16:13 -0400)]
bcachefs: Improve error message in fsck

Seeing the extents that were overlapping is highly useful for figuring
out what went wrong.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Add an option for keeping journal entries after startup
Kent Overstreet [Wed, 25 Mar 2020 20:12:33 +0000 (16:12 -0400)]
bcachefs: Add an option for keeping journal entries after startup

This will be used by the userspace debug tools.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix an assertion when nothing to replay
Kent Overstreet [Wed, 25 Mar 2020 21:57:29 +0000 (17:57 -0400)]
bcachefs: Fix an assertion when nothing to replay

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Journal updates to interior nodes
Kent Overstreet [Sun, 9 Feb 2020 00:06:31 +0000 (19:06 -0500)]
bcachefs: Journal updates to interior nodes

Previously, the btree has always been self contained and internally
consistent on disk without anything from the journal - the journal just
contained pointers to the btree roots.

However, this meant that btree node split or compact operations - i.e.
anything that changes btree node topology and involves updates to
interior nodes - would require that interior btree node to be written
immediately, which means emitting a btree node write that's mostly empty
(using 4k of space on disk if the filesystem blocksize is 4k to only
write perhaps ~100 bytes of new keys).

More importantly, this meant most btree node writes had to be FUA, and
consumer drives have a history of slow and/or buggy FUA support - other
filesystems have been bitten by this.

This patch changes the interior btree update path to journal updates to
interior nodes, after the writes for the new btree nodes have completed.
Best of all, it turns out to simplify the interior node update path
somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Replay interior node keys
Kent Overstreet [Mon, 16 Mar 2020 02:32:03 +0000 (22:32 -0400)]
bcachefs: Replay interior node keys

This slightly modifies the journal replay code so that it can replay
updates to interior nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: trans_commit() path can now insert to interior nodes
Kent Overstreet [Mon, 16 Mar 2020 03:29:43 +0000 (23:29 -0400)]
bcachefs: trans_commit() path can now insert to interior nodes

This will be needed for the upcoming patches to journal updates to
interior btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Disable extent merging
Kent Overstreet [Tue, 24 Mar 2020 21:00:48 +0000 (17:00 -0400)]
bcachefs: Disable extent merging

Extent merging is currently broken, and will be reimplemented
differently soon - right now it only happens when btree nodes are being
compacted, which makes it difficult to test.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a locking bug in fsck
Kent Overstreet [Sat, 21 Mar 2020 18:47:00 +0000 (14:47 -0400)]
bcachefs: Fix a locking bug in fsck

This works around a btree locking issue - we can't be holding read locks
while taking write locks, which currently means we can't have live
iterators holding read locks at commit time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix count_iters_for_insert()
Kent Overstreet [Sat, 21 Mar 2020 18:08:01 +0000 (14:08 -0400)]
bcachefs: Fix count_iters_for_insert()

This fixes a transaction iterator overflow.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix an iterator bug
Kent Overstreet [Wed, 18 Mar 2020 17:40:28 +0000 (13:40 -0400)]
bcachefs: Fix an iterator bug

We were incorrectly not restarting the transaction when re-traversing
iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Shut down quicker
Kent Overstreet [Wed, 18 Mar 2020 15:46:46 +0000 (11:46 -0400)]
bcachefs: Shut down quicker

Internal writes (i.e. copygc/rebalance operations) shouldn't be blocking
on the allocator when we're going RO.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: BCH_FEATURE_new_extent_overwrite is now required
Kent Overstreet [Wed, 18 Mar 2020 15:40:07 +0000 (11:40 -0400)]
bcachefs: BCH_FEATURE_new_extent_overwrite is now required

The patch "bcachefs: Move extent overwrite handling out of core btree
code" should have been flipping on this feature bit; extent btree nodes
in the old format have to be rewritten before we can insert into them
with the new extent update path. Not turning on this feature bit was
causing us to go into an infinite loop where we keep rewriting btree
nodes over and over.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Clear BCH_FEATURE_extents_above_btree_updates on clean shutdown
Kent Overstreet [Mon, 16 Mar 2020 21:23:37 +0000 (17:23 -0400)]
bcachefs: Clear BCH_FEATURE_extents_above_btree_updates on clean shutdown

This is needed so that users can roll back to before "d9bb516b2d
bcachefs: Move extent overwrite handling out of core btree code", which
it appears may still be buggy.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix another iterator leak
Kent Overstreet [Mon, 16 Mar 2020 19:48:58 +0000 (15:48 -0400)]
bcachefs: Fix another iterator leak

This updates bch2_rbio_narrow_crcs() to the current style for
transactional btree code, and fixes a rare panic on iterator overflow.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't use peek_filter() unnecessarily
Kent Overstreet [Mon, 16 Mar 2020 19:49:23 +0000 (15:49 -0400)]
bcachefs: Don't use peek_filter() unnecessarily

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a use after free in dio write path
Kent Overstreet [Mon, 16 Mar 2020 18:49:52 +0000 (14:49 -0400)]
bcachefs: Fix a use after free in dio write path

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Drop unused export
Kent Overstreet [Mon, 16 Mar 2020 02:41:10 +0000 (22:41 -0400)]
bcachefs: Drop unused export

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Move extent overwrite handling out of core btree code
Kent Overstreet [Mon, 30 Dec 2019 19:37:25 +0000 (14:37 -0500)]
bcachefs: Move extent overwrite handling out of core btree code

Ever since the btree code was first written, handling of overwriting
existing extents - including partially overwriting and splitting existing
extents - was handled as part of the core btree insert path. The modern
transaction and iterator infrastructure didn't exist then, so that was
the only way for it to be done.

This patch moves that outside of the core btree code to a pass that runs
at transaction commit time.

This is a significant simplification to the btree code and overall
reduction in code size, but more importantly it gets us much closer to
the core btree code being completely independent of extents and is
important prep work for snapshots.

This introduces a new feature bit; the old and new extent update models
are incompatible when the filesystem needs journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: btree_iter_peek_with_updates()
Kent Overstreet [Thu, 5 Mar 2020 23:44:59 +0000 (18:44 -0500)]
bcachefs: btree_iter_peek_with_updates()

Introduce a new iterator method that provides a consistent view of the
btree plus uncommitted updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix build when CONFIG_BCACHEFS_DEBUG=n
Kent Overstreet [Sun, 15 Mar 2020 20:15:08 +0000 (16:15 -0400)]
bcachefs: Fix build when CONFIG_BCACHEFS_DEBUG=n

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: More btree iter invariants
Kent Overstreet [Tue, 18 Feb 2020 21:17:55 +0000 (16:17 -0500)]
bcachefs: More btree iter invariants

Ensure that iter->pos always lies between the start and end of iter->k
(the last key returned). Also, bch2_btree_iter_set_pos() now invalidates
the key that peek() or next() returned.
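
The two invariants can be sketched with a simplified model, assuming
integer positions and a key covering [start, end]. The model_* names and
the k_valid flag are hypothetical and only illustrate the rule; they are
not the bcachefs iterator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified key: covers the range [start, end]. */
struct model_key {
	uint64_t start;
	uint64_t end;
};

struct model_iter {
	uint64_t         pos;
	struct model_key k;       /* last key returned by peek()/next() */
	bool             k_valid; /* false once the key is invalidated */
};

/* The invariant: while a key is live, pos lies within it. */
static bool model_iter_pos_ok(const struct model_iter *it)
{
	return !it->k_valid ||
	       (it->k.start <= it->pos && it->pos <= it->k.end);
}

/* Returning a key records it; pos is placed at its start here, which
 * is an arbitrary choice made for the model. */
static void model_iter_return_key(struct model_iter *it, struct model_key k)
{
	it->k = k;
	it->k_valid = true;
	it->pos = k.start;
}

/* Moving the position invalidates the previously returned key. */
static void model_iter_set_pos(struct model_iter *it, uint64_t pos)
{
	it->pos = pos;
	it->k_valid = false;
}
```

A caller that keeps using a key after calling the set_pos equivalent is
working with stale data, which is the class of bug the invalidation is
meant to expose.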

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Simplify bch2_btree_iter_peek_slot()
Kent Overstreet [Sat, 14 Mar 2020 01:41:22 +0000 (21:41 -0400)]
bcachefs: Simplify bch2_btree_iter_peek_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Iterator debug code improvements
Kent Overstreet [Tue, 18 Feb 2020 21:17:55 +0000 (16:17 -0500)]
bcachefs: Iterator debug code improvements

More aggressively checking iterator invariants, and fixing the resulting
bugs. Also greatly simplifying iter_next() and iter_next_slot() - they
were hyper optimized before, but the optimizations were getting too
brittle.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Skip 0 size deleted extents in journal replay
Kent Overstreet [Thu, 5 Mar 2020 23:43:31 +0000 (18:43 -0500)]
bcachefs: Skip 0 size deleted extents in journal replay

These are created by the new extent update path, but not used yet by the
recovery code and they break the existing recovery code, so we can just
skip them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Traverse iterator in journal replay
Kent Overstreet [Mon, 9 Mar 2020 20:15:54 +0000 (16:15 -0400)]
bcachefs: Traverse iterator in journal replay

This fixes a bug where we end up spinning in journal replay - in theory
this shouldn't be necessary though, transaction reset should be
re-traversing all iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't log errors that are expected during shutdown
Kent Overstreet [Mon, 9 Mar 2020 18:19:58 +0000 (14:19 -0400)]
bcachefs: Don't log errors that are expected during shutdown

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix bch2_dump_bset()
Kent Overstreet [Sat, 7 Mar 2020 22:20:39 +0000 (17:20 -0500)]
bcachefs: Fix bch2_dump_bset()

It's used in the write path when the bset isn't in the btree node
buffer.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix another iterator leak
Kent Overstreet [Sat, 7 Mar 2020 18:30:55 +0000 (13:30 -0500)]
bcachefs: Fix another iterator leak

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix off by one error in bch2_extent_crc_append()
Kent Overstreet [Thu, 5 Mar 2020 22:06:15 +0000 (17:06 -0500)]
bcachefs: Fix off by one error in bch2_extent_crc_append()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix extent_sort_fix_overlapping()
Kent Overstreet [Mon, 2 Mar 2020 22:08:19 +0000 (17:08 -0500)]
bcachefs: Fix extent_sort_fix_overlapping()

Recently the extent update path started emitting 0 size whiteouts on
extent overwrite, as part of transitioning to moving extent handling
out of the core btree code.

Unfortunately, this broke the old code path that handles overlapping
extents when reading in btree nodes - it relies on sorting incoming
extents by start position, but the 0 size whiteouts broke that ordering.
Skipping over them before the main algorithm sees them fixes this.
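
As a rough illustration only (not the actual bcachefs code), the idea of
filtering zero size whiteouts before an algorithm that expects extents
ordered by start position can be sketched as below. struct toy_extent and
the toy_* helpers are hypothetical, and the assumption that keys are
stored by end offset is made here purely for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent covering [end - size, end). */
struct toy_extent {
	uint64_t end;     /* end offset of the extent */
	uint32_t size;    /* 0 for a zero size whiteout */
	bool     deleted; /* true for whiteouts */
};

static uint64_t toy_extent_start(const struct toy_extent *e)
{
	assert(e->size <= e->end);
	return e->end - e->size;
}

/* Compact the array in place, dropping zero size whiteouts; returns new count. */
static size_t toy_drop_zero_size_whiteouts(struct toy_extent *v, size_t nr)
{
	size_t dst = 0;

	for (size_t src = 0; src < nr; src++) {
		if (v[src].deleted && !v[src].size)
			continue; /* skip before the main algorithm sees it */
		v[dst++] = v[src];
	}
	return dst;
}

/* Check the ordering assumption: start positions must be non-decreasing. */
static bool toy_sorted_by_start(const struct toy_extent *v, size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (toy_extent_start(&v[i - 1]) > toy_extent_start(&v[i]))
			return false;
	return true;
}
```

In this toy model a whiteout ending at 12 sits between extents ending at 8
and 16 when ordered by end offset, but its start (12) is past the start of
the following extent (8), so the start ordering only holds once it is
dropped.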

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Some btree iterator improvements
Kent Overstreet [Mon, 2 Mar 2020 18:38:19 +0000 (13:38 -0500)]
bcachefs: Some btree iterator improvements

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Journal pin cleanups
Kent Overstreet [Thu, 27 Feb 2020 20:03:44 +0000 (15:03 -0500)]
bcachefs: Journal pin cleanups

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't del sysfs dir until after we go RO
Kent Overstreet [Thu, 27 Feb 2020 20:03:53 +0000 (15:03 -0500)]
bcachefs: Don't del sysfs dir until after we go RO

This will help with debugging hangs during unmount.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix error message on bucket sector count overflow
Kent Overstreet [Thu, 27 Feb 2020 03:29:52 +0000 (22:29 -0500)]
bcachefs: Fix error message on bucket sector count overflow

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Improve an error message
Kent Overstreet [Thu, 27 Feb 2020 01:39:06 +0000 (20:39 -0500)]
bcachefs: Improve an error message

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: BCH_SB_FEATURES_ALL
Kent Overstreet [Wed, 26 Feb 2020 22:34:27 +0000 (17:34 -0500)]
bcachefs: BCH_SB_FEATURES_ALL

BCH_FEATURE_btree_ptr_v2 wasn't getting set on new filesystems, oops

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: fix setting btree_node_accessed()
Kent Overstreet [Wed, 26 Feb 2020 22:25:13 +0000 (17:25 -0500)]
bcachefs: fix setting btree_node_accessed()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Use btree_ptr_v2.mem_ptr to avoid hash table lookup
Kent Overstreet [Mon, 24 Feb 2020 20:25:00 +0000 (15:25 -0500)]
bcachefs: Use btree_ptr_v2.mem_ptr to avoid hash table lookup

Nice performance optimization

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix incorrect initialization of btree_node_old_extent_overwrite()
Kent Overstreet [Wed, 26 Feb 2020 22:11:00 +0000 (17:11 -0500)]
bcachefs: Fix incorrect initialization of btree_node_old_extent_overwrite()

b->level and b->btree_id weren't set when the code was checking
btree_node_is_extents()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Issue discards when needed to allocate journal write
Kent Overstreet [Wed, 26 Feb 2020 20:58:36 +0000 (15:58 -0500)]
bcachefs: Issue discards when needed to allocate journal write

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill TRANS_RESET_MEM|TRANS_RESET_ITERS
Kent Overstreet [Wed, 26 Feb 2020 20:39:46 +0000 (15:39 -0500)]
bcachefs: Kill TRANS_RESET_MEM|TRANS_RESET_ITERS

All iterators should be released now with bch2_trans_iter_put(), so
TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is
always used.

Also convert more code to __bch2_trans_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Serialize btree_update operations at btree_update_nodes_written()
Kent Overstreet [Sat, 8 Feb 2020 21:39:37 +0000 (16:39 -0500)]
bcachefs: Serialize btree_update operations at btree_update_nodes_written()

Prep work for journalling updates to interior nodes - enforcing ordering
will greatly simplify those changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: btree_ptr_v2
Kent Overstreet [Fri, 7 Feb 2020 18:38:02 +0000 (13:38 -0500)]
bcachefs: btree_ptr_v2

Add a new btree ptr type which contains the sequence number (random 64
bit cookie, actually) for that btree node - this lets us verify that
when we read in a btree node it really is the btree node we wanted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: introduce b->hash_val
Kent Overstreet [Tue, 18 Feb 2020 22:15:32 +0000 (17:15 -0500)]
bcachefs: introduce b->hash_val

This is partly prep work for introducing bch_btree_ptr_v2, but it'll
also be a bit of a performance boost by moving the full key out of the
hot part of struct btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix traversing to interior nodes
Kent Overstreet [Wed, 19 Feb 2020 00:29:33 +0000 (19:29 -0500)]
bcachefs: Fix traversing to interior nodes

NULL is used to mean "reached end of traversal" - we were only
initializing the leaf node in the iterator to the right sentinel value.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Check for bad key version number
Kent Overstreet [Wed, 19 Feb 2020 01:02:41 +0000 (20:02 -0500)]
bcachefs: Check for bad key version number

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix bch2_ptr_swab for indirect extents
Kent Overstreet [Fri, 7 Feb 2020 01:15:15 +0000 (20:15 -0500)]
bcachefs: Fix bch2_ptr_swab for indirect extents

bch2_ptr_swab was never updated when the code for generic keys with
pointers was added - it assumed the entire val was only used for
pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Make BTREE_ITER_IS_EXTENTS private to iter code
Kent Overstreet [Fri, 31 Jan 2020 18:26:05 +0000 (13:26 -0500)]
bcachefs: Make BTREE_ITER_IS_EXTENTS private to iter code

Prep work for changing the core btree update path to handle extents like
regular keys; we need to reduce the scope of what BTREE_ITER_IS_EXTENTS
means

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: __bch2_btree_iter_set_pos()
Kent Overstreet [Fri, 31 Jan 2020 18:23:18 +0000 (13:23 -0500)]
bcachefs: __bch2_btree_iter_set_pos()

This one takes an additional argument for whether we're searching for >=
or > the search key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: btree_and_journal_iter
Kent Overstreet [Sat, 28 Dec 2019 01:51:35 +0000 (20:51 -0500)]
bcachefs: btree_and_journal_iter

Introduce a new iterator that iterates over keys in the btree with keys
from the journal overlaid on top. This factors out what the erasure
coding init code was doing manually.
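
A minimal sketch of the overlay idea, assuming both inputs are sorted
arrays and that a journal key at the same position replaces the btree key.
struct toy_key, struct toy_overlay_iter and the toy_overlay_* functions
are hypothetical names for illustration, not the interface added by this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified key: a position plus a payload. */
struct toy_key {
	uint64_t pos;
	int      val;
};

/* Walks two sorted key arrays, with journal keys overlaid on btree keys. */
struct toy_overlay_iter {
	const struct toy_key *btree;
	size_t                btree_nr, b;
	const struct toy_key *journal;
	size_t                journal_nr, j;
};

static void toy_overlay_init(struct toy_overlay_iter *it,
			     const struct toy_key *btree, size_t btree_nr,
			     const struct toy_key *journal, size_t journal_nr)
{
	assert(it);
	assert(btree || !btree_nr);
	assert(journal || !journal_nr);

	it->btree = btree;
	it->btree_nr = btree_nr;
	it->b = 0;
	it->journal = journal;
	it->journal_nr = journal_nr;
	it->j = 0;
}

/* Return the next key in position order, or NULL when both inputs are done. */
static const struct toy_key *toy_overlay_next(struct toy_overlay_iter *it)
{
	const struct toy_key *bk = it->b < it->btree_nr ? &it->btree[it->b] : NULL;
	const struct toy_key *jk = it->j < it->journal_nr ? &it->journal[it->j] : NULL;

	if (!bk && !jk)
		return NULL;

	if (jk && (!bk || jk->pos <= bk->pos)) {
		/* Same position: the journal key shadows the btree key. */
		if (bk && bk->pos == jk->pos)
			it->b++;
		it->j++;
		return jk;
	}

	it->b++;
	return bk;
}
```

A caller would run toy_overlay_init() once and then call
toy_overlay_next() until it returns NULL, seeing a single merged view
rather than merging the two sources by hand.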

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Make sure we're releasing btree iterators
Kent Overstreet [Tue, 18 Feb 2020 19:27:10 +0000 (14:27 -0500)]
bcachefs: Make sure we're releasing btree iterators

This wasn't originally required, but this is the model we're moving
towards.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Improve an insert path optimization
Kent Overstreet [Fri, 31 Jan 2020 01:26:08 +0000 (20:26 -0500)]
bcachefs: Improve an insert path optimization

The insert path had an optimization to short circuit lookup
table/iterator fixups when overwriting an existing key with the same
size value - but it was incorrect when other key fields
(size/version) were changing. This is important for the upcoming rework
to have extent updates use the same insert path as regular keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix an uninitialized field in bch_write_op
Kent Overstreet [Wed, 29 Jan 2020 18:05:04 +0000 (13:05 -0500)]
bcachefs: Fix an uninitialized field in bch_write_op

Regression from "bcachefs: Track incompressible data"

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>