www.infradead.org Git - users/griffoul/linux.git/log
Kent Overstreet [Thu, 2 Jul 2020 17:43:58 +0000 (13:43 -0400)]
bcachefs: Use blk_status_to_str()

Improved error messages are always a good thing

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 30 Jun 2020 14:12:45 +0000 (10:12 -0400)]
bcachefs: Don't cap ios in dio write path at 2 MB

It appears this was erroneous; a different bug was responsible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 29 Jun 2020 22:22:06 +0000 (18:22 -0400)]
bcachefs: Refactor dio write code to reinit bch_write_op

This fixes a bug where BCH_WRITE_SKIP_CLOSURE_PUT was set
incorrectly, causing the completion to be delivered multiple times.
Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 28 Jun 2020 22:11:12 +0000 (18:11 -0400)]
bcachefs: Fix bch2_extent_can_insert() not being called

It's supposed to check whether we're splitting a compressed extent and
if so get a bigger disk reservation - hence this fixes a "disk usage
increased by x without a reservation" bug.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 26 Jun 2020 17:56:21 +0000 (13:56 -0400)]
bcachefs: Fix a null ptr deref in bch2_btree_iter_traverse_one()

We use sentinel values that aren't NULL to indicate there's a btree node
at a higher level; occasionally, this may result in
btree_iter_up_until_good_node() stopping at one of those sentinel
values.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
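
Illustrative sketch, not code from this commit: one way to encode "a node
exists at a higher level" as a non-NULL sentinel, together with the check a
caller needs before dereferencing. The names NO_NODE_UP, NO_NODE_ERR,
node_is_real() and first_real_level() are hypothetical, and the sentinel
values are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node { int level; };

/* Hypothetical sentinels: non-NULL values that never alias a real node. */
#define NO_NODE_UP  ((struct node *)(uintptr_t)1)  /* a node exists higher up */
#define NO_NODE_ERR ((struct node *)(uintptr_t)2)  /* traversal hit an error */

/* Only pointers above the sentinel range are safe to dereference. */
static int node_is_real(const struct node *n)
{
    return (uintptr_t)n > (uintptr_t)NO_NODE_ERR;
}

/* Walk up from `level`; return the first level holding a real node, or -1. */
static int first_real_level(struct node **path, int depth, int level)
{
    for (; level < depth; level++)
        if (node_is_real(path[level]))
            return level;
    return -1;
}
```

A plain `if (!node)` test would accept both sentinels, which is the kind of
mistake that leads to a null (or near-null) pointer dereference.
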
Kent Overstreet [Fri, 19 Jun 2020 01:06:42 +0000 (21:06 -0400)]
bcachefs: Track sectors of erasure coded data

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 18 Jun 2020 21:16:29 +0000 (17:16 -0400)]
bcachefs: Use btree reserve when appropriate

Whenever we're doing an update that has pointers, that generally means
we need to do the update in order to release open bucket references - so
we should be using the btree open bucket reserve.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 17 Jun 2020 22:20:26 +0000 (18:20 -0400)]
bcachefs: Add a kthread_should_stop() check to allocator thread

Turns out it's possible during shutdown for the allocator to get stuck
spinning on bch2_invalidate_buckets() without hitting any of the other
checks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
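
Illustrative user-space sketch, not the kernel code: a retry loop that makes
no progress needs its own stop check, otherwise it can spin through shutdown.
Here should_stop() is a toy stand-in for kthread_should_stop(), and struct
allocator, invalidate_buckets() and allocator_fill() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct allocator {
    int buckets_available;  /* buckets that can still be invalidated */
    int invalidate_calls;   /* how often we tried */
    int stop_after;         /* toy stand-in for a shutdown request */
};

/* Stand-in for kthread_should_stop(): true once shutdown is requested. */
static bool should_stop(const struct allocator *a)
{
    return a->invalidate_calls >= a->stop_after;
}

/* Returns 1 if a bucket was invalidated, 0 if no progress was possible. */
static int invalidate_buckets(struct allocator *a)
{
    a->invalidate_calls++;
    if (a->buckets_available == 0)
        return 0;
    a->buckets_available--;
    return 1;
}

/* Returns the number of buckets obtained, or -1 if asked to stop. */
static int allocator_fill(struct allocator *a, int want)
{
    int got = 0;

    while (got < want) {
        if (should_stop(a))  /* without this, the loop can spin forever */
            return -1;
        got += invalidate_buckets(a);
    }
    return got;
}
```
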
Kent Overstreet [Wed, 17 Jun 2020 21:33:53 +0000 (17:33 -0400)]
bcachefs: Change bch2_dump_bset() to also print key values

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 17 Jun 2020 21:30:38 +0000 (17:30 -0400)]
bcachefs: Fix a deadlock in the RO path

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 16 Jun 2020 00:18:02 +0000 (20:18 -0400)]
bcachefs: Fix incorrect gfp check

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jun 2020 23:53:46 +0000 (19:53 -0400)]
bcachefs: Fix lock ordering with new btree cache code

The code that checks lock ordering was recently changed to go off of the
pos of the btree node, rather than the iterator, but the btree cache
code didn't update to handle iterators that point to cached bkeys. Oops

Also, update various debug code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jun 2020 21:59:09 +0000 (17:59 -0400)]
bcachefs: delete a slightly faulty assertion

The state lock isn't held at startup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jun 2020 21:38:26 +0000 (17:38 -0400)]
bcachefs: Increase size of btree node reserve

Also tweak the allocator to be more aggressive about keeping it full.
The recent changes to make updates to interior nodes transactional (and
thus generate updates to the alloc btree) all put more stress on the
btree node reserves.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jun 2020 20:59:36 +0000 (16:59 -0400)]
bcachefs: Give bkey_cached_key same attributes as bpos

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 5 Oct 2019 16:54:53 +0000 (12:54 -0400)]
bcachefs: Use cached iterators for alloc btree

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Mar 2019 00:46:10 +0000 (19:46 -0500)]
bcachefs: Btree key cache

This introduces a new kind of btree iterator, cached iterators, which
point to keys cached in a hash table. The cache also acts as a write
cache - in the update path, we journal the update but defer updating the
btree until the cached entry is flushed by journal reclaim.

Cache coherency is for now up to the users to handle, which isn't ideal
but should be good enough for now.

These new iterators will be used for updating inodes and alloc info (the
alloc and stripes btrees).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
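
Illustrative sketch of the write-cache behaviour described above, under
simplifying assumptions: the "btree" and the cache are plain arrays indexed by
position (the commit uses a hash table), and the journal is just a counter.
All names here (struct fs, key_cache_update(), key_cache_read(),
key_cache_flush()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_KEYS 8

struct cached_key { bool valid, dirty; uint64_t val; };

struct fs {
    uint64_t          btree[NR_KEYS];  /* toy "btree": value indexed by pos */
    struct cached_key cache[NR_KEYS];  /* toy key cache, one slot per pos */
    uint64_t          journal_seq;     /* number of journalled updates */
};

/* Update path: journal the change, dirty the cache, leave the btree alone. */
static void key_cache_update(struct fs *c, unsigned pos, uint64_t val)
{
    assert(pos < NR_KEYS);
    c->journal_seq++;
    c->cache[pos] = (struct cached_key){ .valid = true, .dirty = true, .val = val };
}

/* Lookups must prefer the cache, since the btree may be stale. */
static uint64_t key_cache_read(const struct fs *c, unsigned pos)
{
    assert(pos < NR_KEYS);
    return c->cache[pos].valid ? c->cache[pos].val : c->btree[pos];
}

/* What journal reclaim would trigger: write dirty entries back. */
static void key_cache_flush(struct fs *c)
{
    for (unsigned i = 0; i < NR_KEYS; i++)
        if (c->cache[i].dirty) {
            c->btree[i] = c->cache[i].val;
            c->cache[i].dirty = false;
        }
}
```

Any reader that goes straight to the btree between the update and the flush
sees the old value, which is the coherency burden the commit message leaves
to users of the cache.
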
Kent Overstreet [Mon, 15 Jun 2020 19:10:54 +0000 (15:10 -0400)]
bcachefs: Implement a new gc that only recalcs oldest gen

Full mark and sweep gc doesn't (yet?) work with the new btree key cache
code, but it also blocks updates to interior btree nodes for the
duration and isn't really necessary in practice; we aren't currently
attempting to repair errors in allocation info at runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
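
Illustrative sketch of recalculating only the oldest generation per bucket,
assuming 8-bit generation numbers that wrap and a flat array of live
pointers. The struct layouts and recalc_oldest_gen() are hypothetical;
gen_after() is written here as an assumption about how wrapping generations
can be compared, not as a copy of the kernel helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bucket { uint8_t gen, oldest_gen; };
struct ptr    { unsigned bucket; uint8_t gen; };

/* How far `a` is ahead of `b` in wrapping 8-bit arithmetic (0 if not ahead). */
static uint8_t gen_after(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b);

    return r > 128 ? 0 : r;
}

/* One pass over all live pointers; no marking or sweeping of sector counts. */
static void recalc_oldest_gen(struct bucket *b, size_t nr_buckets,
                              const struct ptr *p, size_t nr_ptrs)
{
    for (size_t i = 0; i < nr_buckets; i++)
        b[i].oldest_gen = b[i].gen;

    for (size_t i = 0; i < nr_ptrs; i++) {
        struct bucket *g = &b[p[i].bucket];

        if (gen_after(g->oldest_gen, p[i].gen))
            g->oldest_gen = p[i].gen;
    }
}
```
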
Kent Overstreet [Mon, 15 Jun 2020 18:58:47 +0000 (14:58 -0400)]
bcachefs: Turn c->state_lock into an rwsem

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 13 Jun 2020 22:43:14 +0000 (18:43 -0400)]
bcachefs: Add an internal option for reading entire journal

To be used by the debug tool that dumps the contents of the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 13 Jun 2020 02:29:48 +0000 (22:29 -0400)]
bcachefs: Don't deadlock when btree node reuse changes lock ordering

Btree node lock ordering is based on the logical key. However, 'struct
btree' may be reused for a different btree node under memory pressure.
This patch uses the new six lock callback to check if a btree node is no
longer the node we wanted to lock before blocking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
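
Illustrative sketch of a "check before blocking" callback. It is a
single-threaded toy: the lock is a flag and "blocking" is reported as a return
value. The callback type should_sleep_fn, still_wanted() and node_lock() are
hypothetical and do not reproduce the six lock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct btree { uint64_t key; bool locked; };

enum lock_result { LOCK_ACQUIRED, LOCK_WOULD_BLOCK, LOCK_RESTART };

/* Called just before sleeping; returns false if waiting is pointless. */
typedef bool (*should_sleep_fn)(const struct btree *b, void *arg);

/* The node is only worth waiting for if it still holds the key we wanted. */
static bool still_wanted(const struct btree *b, void *arg)
{
    return b->key == *(const uint64_t *)arg;
}

static enum lock_result node_lock(struct btree *b, should_sleep_fn fn, void *arg)
{
    if (!b->locked) {
        b->locked = true;
        return LOCK_ACQUIRED;
    }
    if (!fn(b, arg))
        return LOCK_RESTART;      /* node was reused: do not block on it */
    return LOCK_WOULD_BLOCK;      /* a real lock would sleep here */
}
```
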
Kent Overstreet [Fri, 12 Jun 2020 18:58:07 +0000 (14:58 -0400)]
bcachefs: Fix a deadlock

__bch2_btree_node_lock() was incorrectly using iter->pos as a proxy for
btree node lock ordering; this caused an off-by-one error that was
triggered by bch2_btree_node_get_sibling() getting the previous node.

This refactors the code to compare against btree node keys directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 10 Jun 2020 01:00:29 +0000 (21:00 -0400)]
bcachefs: Refactor btree insert path

This splits out the journalling code from the btree update code; prep
work for the btree key cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 10 Jun 2020 00:54:36 +0000 (20:54 -0400)]
bcachefs: Always give out journal pre-res if we already have one

This is better than skipping the journal pre-reservation if we already
have one - we should still account for the journal reservation we're
going to have to get.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jun 2020 19:44:03 +0000 (15:44 -0400)]
bcachefs: More open buckets

We need a larger open bucket reserve now that the btree interior update
path holds onto open bucket references; filesystems with many
high-throughput devices may need more open buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jun 2020 21:49:24 +0000 (17:49 -0400)]
bcachefs: Don't allocate memory under the btree cache lock

The btree cache lock is needed for reclaiming from the btree node cache,
and memory allocation can potentially spin and sleep (for 100 ms at a
time), so.. don't do that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
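
Illustrative sketch of the ordering this commit describes: allocate first,
then take the lock only for the list update. The lock is a plain flag so the
example stays single-threaded, and struct node_cache, node_alloc() and
cache_add_node() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node { struct node *next; };

struct node_cache {
    bool         locked;             /* toy stand-in for the cache lock */
    struct node *head;
    int          allocs_under_lock;  /* should stay at zero */
};

/* Allocation may sleep, so count any call made while the lock is held. */
static struct node *node_alloc(struct node_cache *bc)
{
    if (bc->locked)
        bc->allocs_under_lock++;
    return calloc(1, sizeof(struct node));
}

static int cache_add_node(struct node_cache *bc)
{
    struct node *n = node_alloc(bc);  /* may sleep, so do it unlocked */

    if (!n)
        return -1;

    bc->locked = true;                /* lock held only for the list update */
    n->next = bc->head;
    bc->head = n;
    bc->locked = false;
    return 0;
}
```
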
Kent Overstreet [Tue, 9 Jun 2020 20:25:07 +0000 (16:25 -0400)]
bcachefs: Fix a linked list bug

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jun 2020 19:46:22 +0000 (15:46 -0400)]
bcachefs: Make open bucket reserves more conservative

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jun 2020 19:59:03 +0000 (15:59 -0400)]
bcachefs: btree_update_nodes_written() requires alloc reserve

Also, in the btree_update_start() path, if we already have a journal
pre-reservation we don't want to take another - that's a deadlock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 5 Jun 2020 13:01:23 +0000 (09:01 -0400)]
bcachefs: Check gfp_flags correctly in bch2_btree_cache_scan()

bch2_btree_node_mem_alloc() uses memalloc_nofs_save()/GFP_NOFS, but
GFP_NOFS does include __GFP_IO - oops. We used to use GFP_NOIO, but as
we're a filesystem now, GFP_NOFS makes more sense and is looser.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
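
Illustrative sketch of why testing the IO bit does not catch a GFP_NOFS
caller. The X_ prefixed macros are local stand-ins with made-up bit values;
only their composition mirrors the kernel's GFP_NOIO, GFP_NOFS and GFP_KERNEL.
The two helper functions are hypothetical, and the assumption is that the
scan should test the FS bit instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real ones live in the kernel headers. */
#define X_GFP_RECLAIM 0x1u
#define X_GFP_IO      0x2u
#define X_GFP_FS      0x4u

#define X_GFP_NOIO   (X_GFP_RECLAIM)
#define X_GFP_NOFS   (X_GFP_RECLAIM | X_GFP_IO)
#define X_GFP_KERNEL (X_GFP_RECLAIM | X_GFP_IO | X_GFP_FS)

/* Wrong for a GFP_NOFS caller: the IO bit is set, so this still says yes. */
static bool may_scan_checking_io(unsigned gfp)
{
    return (gfp & X_GFP_IO) != 0;
}

/* Tests the bit that GFP_NOFS actually clears. */
static bool may_scan_checking_fs(unsigned gfp)
{
    return (gfp & X_GFP_FS) != 0;
}
```
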
Kent Overstreet [Mon, 8 Jun 2020 18:28:16 +0000 (14:28 -0400)]
bcachefs: Call bch2_btree_iter_traverse() if necessary in commit path

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 8 Jun 2020 17:26:48 +0000 (13:26 -0400)]
bcachefs: bch2_trans_downgrade()

bch2_btree_iter_downgrade() was looping over all iterators in a
transaction; bch2_trans_downgrade() should be doing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
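
Illustrative sketch of the split in responsibilities: the per-iterator
function touches one iterator, and the loop over all iterators lives in the
per-transaction function. The structs are hypothetical, and treating
"downgrade" as lowering locks_want to 1 is an assumption made for the
example.

```c
#include <assert.h>

#define MAX_ITERS 4

struct iter  { unsigned locks_want; };
struct trans { struct iter iters[MAX_ITERS]; unsigned nr_iters; };

/* Per-iterator operation: touches only the iterator it is given. */
static void iter_downgrade(struct iter *iter)
{
    if (iter->locks_want > 1)
        iter->locks_want = 1;
}

/* Per-transaction operation: the loop over iterators lives here. */
static void trans_downgrade(struct trans *trans)
{
    for (unsigned i = 0; i < trans->nr_iters; i++)
        iter_downgrade(&trans->iters[i]);
}
```
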
Kent Overstreet [Thu, 4 Jun 2020 03:47:50 +0000 (23:47 -0400)]
bcachefs: Improve warning for copygc failing to move data

This will help narrow down which code is at fault when this happens.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 4 Jun 2020 03:46:15 +0000 (23:46 -0400)]
bcachefs: Always increment bucket gen on bucket reuse

Not doing so confuses copygc

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
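
Illustrative sketch of why the generation must change on every reuse,
assuming pointers record the bucket generation they were created against.
The structs, ptr_stale() and bucket_reuse() are hypothetical, and the
staleness test is simplified to an inequality.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bucket { uint8_t gen; };
struct ptr    { unsigned bucket; uint8_t gen; };

/* A pointer is stale once its bucket has moved on to a newer generation. */
static bool ptr_stale(const struct bucket *buckets, struct ptr p)
{
    return buckets[p.bucket].gen != p.gen;
}

/* Reusing a bucket always bumps its gen, invalidating old pointers. */
static void bucket_reuse(struct bucket *b)
{
    b->gen++;
}
```

If a bucket were reused without the increment, an old pointer would still
look valid, and a consumer such as copygc could not tell old contents from
new.
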
Kent Overstreet [Thu, 4 Jun 2020 02:11:10 +0000 (22:11 -0400)]
bcachefs: Kill old allocator startup code

It's not needed anymore since we can now write to buckets before
updating the alloc btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 3 Jun 2020 22:27:07 +0000 (18:27 -0400)]
bcachefs: Improve assorted error messages

This also consolidates the various checks in bch2_mark_pointer() and
bch2_trans_mark_pointer().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 2 Jun 2020 23:41:47 +0000 (19:41 -0400)]
bcachefs: Fix a deadlock in bch2_btree_node_get_sibling()

There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(),
which can leave a btree node locked that is just outside iter->pos,
breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally
we should get rid of this corner case, but for now fix it locally with
verbose comments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 2 Jun 2020 20:36:11 +0000 (16:36 -0400)]
bcachefs: Add debug code to print btree transactions

Intended to help debug deadlocks, since we can't use lockdep to check
btree node lock ordering.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 3 Jun 2020 20:20:22 +0000 (16:20 -0400)]
bcachefs: Set filesystem features earlier in fs init path

Before we were setting features after allocating btree nodes, which
meant we were using the old btree pointer format.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 2 Jun 2020 20:30:54 +0000 (16:30 -0400)]
bcachefs: Add an option to disable reflink support

Reflink might be buggy, so we're adding an option so users can help
bisect what's going on.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 28 May 2020 20:06:13 +0000 (16:06 -0400)]
bcachefs: Fixes for going RO

Now that interior btree updates are fully transactional, we don't need
to write out alloc info in a loop. However, interior btree updates do
put more things in the journal, so we still need a loop in the RO
sequence.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 28 May 2020 19:51:50 +0000 (15:51 -0400)]
bcachefs: Don't require alloc btree to be updated before buckets are used

This is to break a circular dependency in the shutdown path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 28 May 2020 21:15:41 +0000 (17:15 -0400)]
bcachefs: fsck_error_lock requires GFP_NOFS

This fixes a lockdep splat.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 May 2020 18:57:06 +0000 (14:57 -0400)]
bcachefs: Interior btree updates are now fully transactional

We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 26 May 2020 00:35:53 +0000 (20:35 -0400)]
bcachefs: Factor out bch2_fs_btree_interior_update_init()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 May 2020 23:29:48 +0000 (19:29 -0400)]
bcachefs: Add a mechanism for passing extra journal entries to bch2_trans_commit()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 24 May 2020 18:06:10 +0000 (14:06 -0400)]
bcachefs: Fix reading of alloc info after unclean shutdown

When updates to interior nodes started being journalled, that meant that
after an unclean shutdown, until journal replay is done we can't walk
the btree without overlaying the updates from the journal.

The initial btree gc was changed to walk the btree overlaying keys from
the journal - but bch2_alloc_read() and bch2_stripes_read() were missed.
Major whoops...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 27 May 2020 18:10:27 +0000 (14:10 -0400)]
bcachefs: fix memalloc_nofs_restore() usage

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 24 May 2020 18:20:00 +0000 (14:20 -0400)]
bcachefs: Better error messages on bucket sector count overflows

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 24 May 2020 17:37:44 +0000 (13:37 -0400)]
bcachefs: Be more rigorous about marking the filesystem clean

Previously, there was at least one error path where we could mark the
filesystem clean when we hadn't successfully written out alloc info.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 26 May 2020 01:25:31 +0000 (21:25 -0400)]
bcachefs: Handle printing of null bkeys

This fixes a null ptr deref.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 May 2020 22:47:21 +0000 (18:47 -0400)]
bcachefs: Add vmalloc fallback for decompress workspace

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 23 May 2020 15:44:12 +0000 (11:44 -0400)]
bcachefs: Print out d_type in dirent_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Yuxuan Shui [Fri, 22 May 2020 14:50:05 +0000 (15:50 +0100)]
bcachefs: fix stack corruption

When a bkey_on_stack is passed to bch_read_indirect_extent, there is no
guarantee that it will be big enough to hold the bkey. And
bch_read_indirect_extent is not aware of bkey_on_stack, so it cannot call
realloc on it. This causes stack corruption.

This commit makes bch_read_indirect_extent aware of bkey_on_stack so it
can call realloc when appropriate.

Tested-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
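
Illustrative user-space sketch of an on-stack key buffer with a heap
fallback, showing why every writer has to go through the realloc step.
struct key_buf and its helpers are hypothetical and use malloc() where kernel
code would use its own allocators; the inline capacity of 4 u64s is an
arbitrary choice for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ONSTACK_U64S 4

struct key_buf {
    uint64_t *k;                      /* points at onstack[] or a heap buffer */
    size_t    cap;                    /* capacity in u64s */
    uint64_t  onstack[ONSTACK_U64S];
};

static void key_buf_init(struct key_buf *b)
{
    memset(b->onstack, 0, sizeof(b->onstack));
    b->k   = b->onstack;
    b->cap = ONSTACK_U64S;
}

/* Grow to at least `u64s`, moving to the heap if the inline space is too small. */
static int key_buf_realloc(struct key_buf *b, size_t u64s)
{
    uint64_t *n;

    if (u64s <= b->cap)
        return 0;

    n = malloc(u64s * sizeof(*n));
    if (!n)
        return -1;

    memcpy(n, b->k, b->cap * sizeof(*n));
    if (b->k != b->onstack)
        free(b->k);
    b->k   = n;
    b->cap = u64s;
    return 0;
}

/* A writer that knows about the buffer: it grows before it copies. */
static int key_buf_copy(struct key_buf *b, const uint64_t *src, size_t u64s)
{
    if (key_buf_realloc(b, u64s))
        return -1;
    memcpy(b->k, src, u64s * sizeof(*src));
    return 0;
}

static void key_buf_exit(struct key_buf *b)
{
    if (b->k != b->onstack)
        free(b->k);
    b->k = NULL;
}
```

A callee that is handed only `b->k` and copies an oversized key into it
would overrun onstack[], which is the shape of the bug described above.
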
Kent Overstreet [Thu, 21 May 2020 21:23:40 +0000 (17:23 -0400)]
bcachefs: Wrap vmap() in memalloc_nofs_save()/restore()

vmalloc() and vmap() don't take GFP_NOFS - this should be pushed further
up the IO path, but for now just doing the simple fix.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 May 2020 01:45:08 +0000 (21:45 -0400)]
bcachefs: Fix another iterator counting bug

We were marking the end of where we could insert incorrectly for
indirect extents.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 13 May 2020 21:53:33 +0000 (17:53 -0400)]
bcachefs: Fix setquota

We were returning -EINTR because we were failing to retry the btree
transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
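
Illustrative sketch of retrying an operation that reports -EINTR to request a
restart, instead of passing that error up to the caller. struct trans,
do_update() and commit_with_retry() are hypothetical; restarts_left only
simulates how many times the toy update asks to be restarted.

```c
#include <assert.h>
#include <errno.h>

struct trans {
    int restarts_left;  /* how many times the toy update asks to restart */
    int attempts;
};

/* Toy update: returns -EINTR while restarts remain, then succeeds. */
static int do_update(struct trans *t)
{
    t->attempts++;
    if (t->restarts_left > 0) {
        t->restarts_left--;
        return -EINTR;
    }
    return 0;
}

/* -EINTR means "restart", so loop instead of returning it. */
static int commit_with_retry(struct trans *t)
{
    int ret;

    do {
        ret = do_update(t);
    } while (ret == -EINTR);

    return ret;
}
```
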
Kent Overstreet [Wed, 13 May 2020 04:15:28 +0000 (00:15 -0400)]
bcachefs: Fix a workqueue deadlock

writes running out of a workqueue (via dio path) could block and prevent
other writes from calling bch2_write_index() and completing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 12 May 2020 22:34:16 +0000 (18:34 -0400)]
bcachefs: Validate that we read the correct btree node

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 12 May 2020 00:01:07 +0000 (20:01 -0400)]
bcachefs: Fixes for startup on very full filesystems

 - Always pass BTREE_INSERT_USE_RESERVE when writing alloc btree keys
 - Don't strand buckets on the copygc freelist until after recovery is
   done and we're starting copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 9 May 2020 03:15:42 +0000 (23:15 -0400)]
bcachefs: Fix initialization of bounce mempools

When they were converted to kvpmalloc pools they weren't converted to
pass the actual size of the allocation. Oops.

Also, validate the real length in the zstd decompression path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 6 May 2020 19:37:04 +0000 (15:37 -0400)]
bcachefs: Some compression improvements

In __bio_map_or_bounce(), the check for whether the bio is physically
contiguous is improved; it's now more readable and handles multi-page
but contiguous bios.

Also when decompressing, we were doing a redundant memcpy in the case
where we were able to use vmap to map a bio contiguously.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
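
Illustrative sketch of a contiguity check that accepts multi-segment buffers
as long as each segment starts where the previous one ends. struct seg and
segs_contiguous() are hypothetical; kernel code would walk the bio's segments
rather than raw addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct seg { uintptr_t addr; size_t len; };

/* True if the segments form one unbroken address range. */
static bool segs_contiguous(const struct seg *s, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (s[i - 1].addr + s[i - 1].len != s[i].addr)
            return false;
    return true;
}
```
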
Kent Overstreet [Sat, 2 May 2020 20:21:35 +0000 (16:21 -0400)]
bcachefs: Fix two more deadlocks

Deadlock on shutdown:

btree_update_nodes_written() unblocks btree nodes from being written;
after doing so, it has to check if they were marked as needing to be
written and if so kick off those writes - if that doesn't happen, we'll
never release journal pins and shutdown will get stuck when flushing the
journal.

There was an error path where this didn't happen, because in the error
path we don't actually want those btree node writes to happen; however,
we still have to kick off the write path so the journal pins get
released. The btree write path checks if we're in a journal error state
and doesn't do the actual write if we are.

Also - there was another deadlock because btree_update_nodes_written()
was taking the btree update off of the unwritten_list too soon - before
getting a journal reservation, which could fail and have to be retried.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 1 May 2020 23:56:31 +0000 (19:56 -0400)]
bcachefs: Fix another deadlock in btree_update_nodes_written()

We also can't be blocking on btree node write locks while holding
btree_interior_update_lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 29 Apr 2020 16:57:04 +0000 (12:57 -0400)]
bcachefs: Add some printks for error paths

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 29 Apr 2020 19:28:25 +0000 (15:28 -0400)]
bcachefs: Don't issue writes that are more than 1 MB

The bcachefs io path in io.c can't bounce writes larger than that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 24 Apr 2020 21:57:59 +0000 (17:57 -0400)]
bcachefs: More fixes for counting extent update iterators

This is unfortunately really fragile - hopefully we'll be able to think
of a new approach at some point.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 24 Apr 2020 22:25:11 +0000 (18:25 -0400)]
bcachefs: Fix a deadlock

btree_node_lock_increment() was incorrectly skipping over the current
iter when checking if we should increment a node we already have locked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 24 Apr 2020 18:08:56 +0000 (14:08 -0400)]
bcachefs: Handle -EINTR bch2_migrate_index_update()

peek_slot() shouldn't return -EINTR when there's only a single live
iterator, but that's tricky to guarantee - we seem to be returning
-EINTR when we shouldn't, but it's easy enough to handle in the caller.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 24 Apr 2020 18:08:18 +0000 (14:08 -0400)]
bcachefs: Fix for the bkey compat path

In the write path, we were calling bch2_bkey_ops.compat() in the wrong
place.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Apr 2020 16:32:27 +0000 (12:32 -0400)]
bcachefs: Add a few tracepoints

Transaction restart tracing should probably be overhauled at some
point.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Apr 2020 16:31:16 +0000 (12:31 -0400)]
bcachefs: Slightly reduce btree split threshold

2/3rds performs a lot better than 3/4ths on the tested workload, leading
to significantly fewer btree node compactions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
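
Illustrative arithmetic for the two thresholds, assuming the threshold is a
fraction of a node's capacity and that a node is split once its live data
exceeds it. The function names, the u64s unit and the node size of 1200 in
the note below are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold as a fraction num/den of the node's capacity. */
static size_t split_threshold(size_t node_u64s, size_t num, size_t den)
{
    return node_u64s * num / den;
}

/* Split (rather than compact into one node) once live data exceeds it. */
static bool should_split(size_t live_u64s, size_t node_u64s,
                         size_t num, size_t den)
{
    return live_u64s > split_threshold(node_u64s, num, den);
}
```

For a node of 1200 u64s the threshold moves from 900 to 800, so a node with
850 u64s of live data is now split instead of being compacted again.
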
Kent Overstreet [Sat, 11 Apr 2020 16:30:30 +0000 (12:30 -0400)]
bcachefs: Improve lockdep annotation in journalling code

bch2_journal_res_get() in nonblocking mode is equivalent to a trylock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Apr 2020 16:29:32 +0000 (12:29 -0400)]
bcachefs: Fix a locking bug in bch2_journal_pin_copy()

There was a race where the src pin would be flushed - releasing the last
pin on that sequence number - before adding the new journal pin. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 7 Apr 2020 21:27:12 +0000 (17:27 -0400)]
bcachefs: Fix another deadlock in the btree interior update path

Can't take read locks on btree nodes while holding
btree_interior_update_lock. Also, fix a bug where we were leaking
journal prereservations.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 7 Apr 2020 21:31:38 +0000 (17:31 -0400)]
bcachefs: Fix a locking bug in bch2_btree_ptr_debugcheck()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 7 Apr 2020 17:49:14 +0000 (13:49 -0400)]
bcachefs: Account for ioclock slop when throttling rebalance thread

This should fix an issue where the rebalance thread was spinning

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix a deadlock on starting an interior btree update
Kent Overstreet [Mon, 6 Apr 2020 01:49:17 +0000 (21:49 -0400)]
bcachefs: Fix a deadlock on starting an interior btree update

Not legal to block on a journal prereservation with btree locks held.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
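
The rule above (never block on a resource while holding locks that others may need) is commonly handled with a trylock fast path and an unlock, block, relock slow path. The sketch below illustrates that general pattern in userspace with two pthread mutexes; `node_lock`, `resource` and `get_resource_with_node_locked` are hypothetical names, and this is not a claim about how the actual fix is structured.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for btree locks */
static pthread_mutex_t resource  = PTHREAD_MUTEX_INITIALIZER; /* stands in for the prereservation */

/*
 * Called with node_lock held. Returns with both node_lock and resource held.
 * *relocked is set to true if node_lock was dropped and retaken, in which
 * case the caller must revalidate anything it read under node_lock.
 */
static void get_resource_with_node_locked(bool *relocked)
{
	*relocked = false;

	/* Fast path: a nonblocking attempt is safe while holding node_lock. */
	if (pthread_mutex_trylock(&resource) == 0)
		return;

	/* Slow path: never sleep on the resource with node_lock held. */
	pthread_mutex_unlock(&node_lock);
	pthread_mutex_lock(&resource);
	pthread_mutex_lock(&node_lock);
	*relocked = true;
}
```

The cost of the slow path is that state observed before the unlock may be stale, which is why the sketch reports `relocked` back to the caller.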
2 years agobcachefs: Fix a debug mode assertion
Kent Overstreet [Sat, 4 Apr 2020 20:47:59 +0000 (16:47 -0400)]
bcachefs: Fix a debug mode assertion

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix a debug assertion
Kent Overstreet [Sat, 4 Apr 2020 19:49:42 +0000 (15:49 -0400)]
bcachefs: Fix a debug assertion

This assertion was passing the wrong btree node type when inserting into
interior nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix another error path locking bug
Kent Overstreet [Sat, 4 Apr 2020 19:45:06 +0000 (15:45 -0400)]
bcachefs: Fix another error path locking bug

btree_update_nodes_written() was leaking a btree node lock on failure to
get a journal reservation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
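
A lock leaked on an error path is usually avoided by routing every exit through a single unlock label. The following is a minimal userspace sketch of that idiom, not the actual btree_update_nodes_written() code; `struct node`, `get_reservation` and `node_mark_written` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct node {
	pthread_mutex_t lock;
	int written;
};

/* Hypothetical reservation step; fails when no space is available. */
static int get_reservation(int space_available)
{
	return space_available ? 0 : -ENOSPC;
}

static int node_mark_written(struct node *n, int space_available)
{
	int ret;

	pthread_mutex_lock(&n->lock);

	ret = get_reservation(space_available);
	if (ret)
		goto out_unlock;	/* an early "return ret" here would leak n->lock */

	n->written = 1;
out_unlock:
	pthread_mutex_unlock(&n->lock);
	return ret;
}
```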
2 years agobcachefs: Fix a null ptr deref during journal replay
Kent Overstreet [Sat, 4 Apr 2020 17:54:19 +0000 (13:54 -0400)]
bcachefs: Fix a null ptr deref during journal replay

We were calling bch2_extent_can_insert() incorrectly; it should only be
called when the extents-to-keys pass is running because that's when we
could be splitting a compressed extent. Calling bch2_extent_can_insert()
without passing in a disk reservation was causing a null ptr deref.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add another missing bch2_trans_iter_put() call
Kent Overstreet [Wed, 1 Apr 2020 21:28:39 +0000 (17:28 -0400)]
bcachefs: Add another missing bch2_trans_iter_put() call

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Trace where btree iterators are allocated
Kent Overstreet [Wed, 1 Apr 2020 21:14:14 +0000 (17:14 -0400)]
bcachefs: Trace where btree iterators are allocated

This will help with iterator overflow bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix fallocate FL_INSERT_RANGE
Kent Overstreet [Wed, 1 Apr 2020 20:07:57 +0000 (16:07 -0400)]
bcachefs: Fix fallocate FL_INSERT_RANGE

This was another bug because of bch2_btree_iter_set_pos() invalidating
iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add print method for bch2_btree_ptr_v2
Kent Overstreet [Tue, 31 Mar 2020 20:25:30 +0000 (16:25 -0400)]
bcachefs: Add print method for bch2_btree_ptr_v2

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix journalling of interior node updates
Kent Overstreet [Tue, 31 Mar 2020 20:23:43 +0000 (16:23 -0400)]
bcachefs: Fix journalling of interior node updates

We weren't journalling updates done while splitting/compacting nodes -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix iterating of journal keys within a btree node
Kent Overstreet [Mon, 30 Mar 2020 22:11:13 +0000 (18:11 -0400)]
bcachefs: Fix iterating of journal keys within a btree node

Extent btrees no longer have weird special behaviour for min_key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix a locking bug
Kent Overstreet [Mon, 30 Mar 2020 21:43:21 +0000 (17:43 -0400)]
bcachefs: Fix a locking bug

Dropping the wrong kind of lock can't lead to anything good...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix inodes pass in fsck
Kent Overstreet [Mon, 30 Mar 2020 18:29:06 +0000 (14:29 -0400)]
bcachefs: Fix inodes pass in fsck

It wasn't updated for the patch that switched inodes to using the offset
field of struct bkey.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix ec_stripe_update_ptrs()
Kent Overstreet [Mon, 30 Mar 2020 18:05:05 +0000 (14:05 -0400)]
bcachefs: Fix ec_stripe_update_ptrs()

bch2_btree_iter_set_pos() invalidates the key returned by peek().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
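
The hazard named above is that a key returned by peek() points into iterator-owned state, so repositioning the iterator invalidates it. A toy model of the safe pattern (copy the key by value first, then move the iterator) might look like the sketch below. The `struct iter`, `iter_peek` and `iter_set_pos` here are hypothetical simplifications and do not reflect the real btree iterator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct key { uint64_t pos; uint64_t val; };

/* Hypothetical iterator that hands out a pointer into its own scratch slot. */
struct iter {
	const struct key *keys;
	size_t nr;
	size_t idx;
	struct key scratch;
};

/* Returns a pointer that is only valid until the iterator is repositioned. */
static const struct key *iter_peek(struct iter *it)
{
	if (it->idx >= it->nr)
		return NULL;
	it->scratch = it->keys[it->idx];
	return &it->scratch;
}

/* Repositioning clobbers the scratch slot, invalidating earlier peek() results. */
static void iter_set_pos(struct iter *it, uint64_t pos)
{
	size_t i = 0;

	while (i < it->nr && it->keys[i].pos < pos)
		i++;
	it->idx = i;
	it->scratch = (struct key) { 0, 0 };
}

/* Safe pattern: copy the key by value, then move the iterator. */
static int peek_then_advance(struct iter *it, uint64_t new_pos, struct key *out)
{
	const struct key *k = iter_peek(it);

	if (!k)
		return 0;
	*out = *k;		/* copy before iter_set_pos() */
	iter_set_pos(it, new_pos);
	return 1;
}
```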
2 years agobcachefs: Check btree topology at startup
Kent Overstreet [Sun, 29 Mar 2020 20:48:53 +0000 (16:48 -0400)]
bcachefs: Check btree topology at startup

When initial btree gc was changed to overlay journal keys as it walks
the btree, it also stopped checking btree topology.

Previously, checking btree topology was a fairly complicated affair -
but it's much easier now that btree_ptr_v2 has min_key in the pointer.

This rewrites the old range_checks code and uses it in both runtime and
initial gc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
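
Once each child pointer carries its own min_key, a topology check reduces to verifying that consecutive children tile the parent's key range with no gaps or overlaps. The sketch below shows that idea under strong simplifications: keys are plain 64-bit integers rather than struct bpos, and `child_ptr` and `check_topology` are hypothetical names, not the rewritten range_checks code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical child pointer: each child covers the inclusive range [min_key, max_key]. */
struct child_ptr { uint64_t min_key, max_key; };

/*
 * Returns 0 if the children tile [parent_min, parent_max] with no gaps or
 * overlaps, otherwise -1.
 */
static int check_topology(const struct child_ptr *c, size_t nr,
			  uint64_t parent_min, uint64_t parent_max)
{
	uint64_t expected_min = parent_min;
	size_t i;

	if (!nr)
		return -1;

	for (i = 0; i < nr; i++) {
		if (c[i].min_key != expected_min)
			return -1;	/* gap or overlap with the previous child */
		if (c[i].max_key < c[i].min_key || c[i].max_key > parent_max)
			return -1;
		if (i + 1 < nr) {
			if (c[i].max_key == UINT64_MAX)
				return -1;	/* no room for a successor */
			expected_min = c[i].max_key + 1;
		}
	}
	return c[nr - 1].max_key == parent_max ? 0 : -1;
}
```

Because the expected min_key of each child is derived purely from the previous child's max_key, the check needs only one pass over the parent's pointers and never has to read the child nodes themselves.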
2 years agobcachefs: Don't allocate memory while holding journal reservation
Kent Overstreet [Mon, 30 Mar 2020 16:33:30 +0000 (12:33 -0400)]
bcachefs: Don't allocate memory while holding journal reservation

This fixes a lockdep splat - allocating memory can call
bch2_clear_page_bits() which takes mark_lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Reduce max nr of btree iters when lockdep is on
Kent Overstreet [Sun, 29 Mar 2020 21:01:05 +0000 (17:01 -0400)]
bcachefs: Reduce max nr of btree iters when lockdep is on

This is so we don't overflow MAX_LOCK_DEPTH.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill bkey_type_successor
Kent Overstreet [Tue, 7 Jan 2020 18:29:32 +0000 (13:29 -0500)]
bcachefs: Kill bkey_type_successor

Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.

Now, inodes in the inodes btree are indexed by the offset field.

Also: previously min_key was special for extents btrees: min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.

This means we can completely get rid of
btree_type_successor/predecessor.

Also make some improvements to the metadata IO validate/compat code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
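
With every btree indexed the same way, "the next position" can be computed uniformly as an increment with carry. The sketch below illustrates that on a simplified two-field position; `struct pos`, `pos_cmp` and `pos_successor` are hypothetical, and any further fields of the real struct bpos are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified position: ordered by inode first, then offset. */
struct pos { uint64_t inode, offset; };

static int pos_cmp(struct pos a, struct pos b)
{
	if (a.inode != b.inode)
		return a.inode < b.inode ? -1 : 1;
	if (a.offset != b.offset)
		return a.offset < b.offset ? -1 : 1;
	return 0;
}

/* Smallest position strictly greater than p; p must not be the maximum position. */
static struct pos pos_successor(struct pos p)
{
	assert(!(p.inode == UINT64_MAX && p.offset == UINT64_MAX));

	if (++p.offset == 0)	/* offset wrapped, carry into inode */
		p.inode++;
	return p;
}
```

Under this model a node whose predecessor ends at max_key simply starts at pos_successor(max_key), with no per-btree special case.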
2 years agobcachefs: Switch a BUG_ON() to a warning
Kent Overstreet [Sun, 29 Mar 2020 18:21:44 +0000 (14:21 -0400)]
bcachefs: Switch a BUG_ON() to a warning

This has popped and thus needs to be debugged, but the assertion firing
isn't necessarily fatal so switch it to a warning.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Use kvpmalloc mempools for compression bounce
Kent Overstreet [Sun, 29 Mar 2020 16:33:41 +0000 (12:33 -0400)]
bcachefs: Use kvpmalloc mempools for compression bounce

This fixes an issue where mounting would fail because of memory
fragmentation - previously the compression bounce buffers were using
get_free_pages().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Read journal when keep_journal on
Kent Overstreet [Sat, 28 Mar 2020 22:26:01 +0000 (18:26 -0400)]
bcachefs: Read journal when keep_journal on

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Various fixes for interior update path
Kent Overstreet [Sat, 28 Mar 2020 23:17:23 +0000 (19:17 -0400)]
bcachefs: Various fixes for interior update path

The locking was wrong, and we could get a use after free in the error
path where we weren't taking the entry being freed off the unwritten
list.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Use memalloc_nofs_save()
Kent Overstreet [Fri, 27 Mar 2020 21:38:51 +0000 (17:38 -0400)]
bcachefs: Use memalloc_nofs_save()

vmalloc allocations don't always obey GFP_NOFS - memalloc_nofs_save() is
the preferred approach for the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
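
The scoped approach replaces a per-call flag with a save/restore pair around a region of code, so that every allocation made inside the region inherits the restriction. Kernel code cannot run in userspace, so the sketch below is only a mock of the save/restore discipline: `ctx_flags`, `CTX_NOFS`, `scope_nofs_save`, `scope_nofs_restore` and `alloc_may_enter_fs` are hypothetical stand-ins, not the kernel's implementation.

```c
#include <assert.h>

#define CTX_NOFS 0x1u		/* hypothetical flag bit */

static unsigned int ctx_flags;	/* stands in for per-task allocation context */

/* Set NOFS for the current scope; return the bit's previous state for restore. */
static unsigned int scope_nofs_save(void)
{
	unsigned int old = ctx_flags & CTX_NOFS;

	ctx_flags |= CTX_NOFS;
	return old;
}

/* Put the NOFS bit back to what it was when the matching save ran. */
static void scope_nofs_restore(unsigned int old)
{
	ctx_flags = (ctx_flags & ~CTX_NOFS) | old;
}

/* An allocation helper consults the scope instead of trusting per-call flags. */
static int alloc_may_enter_fs(void)
{
	return !(ctx_flags & CTX_NOFS);
}
```

Returning the previous state from the save call is what makes nesting safe: an inner restore leaves the restriction in place when an outer scope had already set it.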