www.infradead.org Git - users/willy/xarray.git/log
2 years ago bcachefs: Change bch2_btree_key_cache_count() to exclude dirty keys
Kent Overstreet [Tue, 27 Apr 2021 18:02:00 +0000 (14:02 -0400)]
bcachefs: Change bch2_btree_key_cache_count() to exclude dirty keys

We're seeing livelocks that appear to be due to
bch2_btree_key_cache_scan repeatedly scanning and blocking other tasks
from using the key cache lock - we probably shouldn't be reporting
objects that can't actually be freed yet.
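
A minimal sketch of the idea, assuming a shrinker-style count callback: only clean keys are reported, since dirty keys cannot be freed until they are flushed. The struct and function names here are hypothetical and are not the bcachefs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the key cache counters; not the bcachefs struct. */
struct key_cache_stats {
	size_t nr_keys;		/* all cached keys */
	size_t nr_dirty;	/* keys that still have to be flushed */
};

/*
 * Report only objects that a scan could actually free. Dirty keys are
 * excluded because they cannot be reclaimed until they have been flushed.
 */
static size_t key_cache_count_reclaimable(const struct key_cache_stats *s)
{
	assert(s->nr_dirty <= s->nr_keys);
	return s->nr_keys - s->nr_dirty;
}
```

With 10 cached keys of which 4 are dirty, the callback reports 6, so the scan is not invoked for work it cannot do.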

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extent
Kent Overstreet [Fri, 30 Apr 2021 02:32:44 +0000 (22:32 -0400)]
bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extent

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: New tracepoint for bch2_trans_get_iter()
Kent Overstreet [Thu, 29 Apr 2021 20:56:17 +0000 (16:56 -0400)]
bcachefs: New tracepoint for bch2_trans_get_iter()

Trying to debug an issue where after traverse_all() we shouldn't have to
traverse any iterators... yet we are

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix __bch2_trans_get_iter()
Kent Overstreet [Tue, 27 Apr 2021 15:12:17 +0000 (11:12 -0400)]
bcachefs: Fix __bch2_trans_get_iter()

We need to also set iter->uptodate to indicate it needs to be traversed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Evict btree nodes we're deleting
Kent Overstreet [Sun, 25 Apr 2021 20:24:03 +0000 (16:24 -0400)]
bcachefs: Evict btree nodes we're deleting

There was a bug that led to duplicate btree node pointers being inserted
at the wrong level. The new topology repair code can fix that, except
that the btree cache code gets confused when we read in a btree node
from the pointer that was at the wrong level. This patch evicts nodes
that we're deleting, which nicely solves the problem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: New check_nlinks algorithm for snapshots
Kent Overstreet [Thu, 22 Apr 2021 01:08:49 +0000 (21:08 -0400)]
bcachefs: New check_nlinks algorithm for snapshots

With snapshots, using a radix tree for the table of link counts won't
work anymore because we also need to distinguish between inodes with
different snapshot IDs. Instead, this patch builds up a sorted array of
inodes that have hardlinks that we can binary search on - taking
advantage of the fact that with inode backpointers, the check_nlinks()
pass _only_ needs to concern itself with inodes that have hardlinks now.
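
The approach described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: the entry layout, the (inum, snapshot) ordering, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: one per inode (in a given snapshot) with hardlinks. */
struct nlink_entry {
	uint64_t inum;
	uint32_t snapshot;
	uint32_t count;		/* links counted so far */
};

/* Order by inode number first, then by snapshot ID. */
static int nlink_cmp(const void *l, const void *r)
{
	const struct nlink_entry *a = l, *b = r;

	if (a->inum != b->inum)
		return a->inum < b->inum ? -1 : 1;
	if (a->snapshot != b->snapshot)
		return a->snapshot < b->snapshot ? -1 : 1;
	return 0;
}

static void nlink_table_sort(struct nlink_entry *t, size_t nr)
{
	qsort(t, nr, sizeof(*t), nlink_cmp);
}

/* Binary search for (inum, snapshot); NULL if the inode is not in the table. */
static struct nlink_entry *nlink_table_find(struct nlink_entry *t, size_t nr,
					    uint64_t inum, uint32_t snapshot)
{
	struct nlink_entry key = { .inum = inum, .snapshot = snapshot };

	return bsearch(&key, t, nr, sizeof(*t), nlink_cmp);
}
```

Because the comparison includes the snapshot ID, two entries with the same inode number in different snapshots stay distinct, which is the property a table keyed by inode number alone would lack.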

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a null ptr deref
Kent Overstreet [Sun, 25 Apr 2021 02:33:25 +0000 (22:33 -0400)]
bcachefs: Fix a null ptr deref

Fix a few memory safety issues, found by asan in userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: New and improved topology repair code
Kent Overstreet [Sat, 24 Apr 2021 20:32:35 +0000 (16:32 -0400)]
bcachefs: New and improved topology repair code

This splits out btree topology repair into a separate pass, and makes
some improvements:
 - When we have to pick which of two overlapping nodes to drop keys
   from, we use the btree node header sequence number to preserve the
   newer node

 - the gc code has been changed so that it doesn't bail out if we're
   continuing/ignoring on fsck error - this way the dump tool can skip
   running the repair pass but still walk all reachable metadata

 - add a new superblock flag indicating when a filesystem is known to
   have btree topology issues, and the topology repair pass should be
   run

 - changing the start/end of a node might mean keys in that node have to
   be deleted: this patch handles that better by splitting it out into a
   separate function and running it explicitly in the topology repair
   code; previously those keys were only being dropped when the btree
   node was read in.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix key cache assertion
Kent Overstreet [Sat, 24 Apr 2021 22:02:59 +0000 (18:02 -0400)]
bcachefs: Fix key cache assertion

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: New helper __bch2_btree_insert_keys_interior()
Kent Overstreet [Fri, 23 Apr 2021 23:25:27 +0000 (19:25 -0400)]
bcachefs: New helper __bch2_btree_insert_keys_interior()

Consolidate common parts of bch2_btree_insert_keys_interior() and
btree_split_insert_keys() - prep work for adding some new topology
assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Rewrite btree nodes with errors
Kent Overstreet [Sat, 24 Apr 2021 06:47:41 +0000 (02:47 -0400)]
bcachefs: Rewrite btree nodes with errors

This patch adds self healing functionality for btree nodes - if we
notice a problem when reading a btree node, we just rewrite it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix bch2_verify_keylist_sorted
Kent Overstreet [Sat, 24 Apr 2021 04:59:29 +0000 (00:59 -0400)]
bcachefs: Fix bch2_verify_keylist_sorted

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix an out of bounds read
Kent Overstreet [Sat, 24 Apr 2021 04:42:02 +0000 (00:42 -0400)]
bcachefs: Fix an out of bounds read

bch2_varint_decode() can read up to 7 bytes past the end of the buffer,
which means we need to allocate slightly larger key cache buffers.
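
To illustrate why padding the allocation fixes this class of bug, here is a sketch with a hypothetical decoder that always loads 8 bytes, whatever the field's real width. It is not the bch2_varint_decode() encoding; `decode_le`, `alloc_padded`, and `DECODE_OVERREAD` are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A fixed 8-byte load can touch up to 7 bytes beyond a 1-byte field. */
#define DECODE_OVERREAD	7

/* Allocate len usable bytes plus zeroed slack for the decoder's over-read. */
static uint8_t *alloc_padded(size_t len)
{
	return calloc(1, len + DECODE_OVERREAD);
}

/*
 * Hypothetical decoder: read a little-endian integer that is nbytes (1..8)
 * wide, always loading 8 bytes from the buffer.
 */
static uint64_t decode_le(const uint8_t *p, unsigned nbytes)
{
	uint8_t tmp[8];
	uint64_t v = 0;

	assert(nbytes >= 1 && nbytes <= 8);
	memcpy(tmp, p, sizeof(tmp));	/* may read past the field's last byte */

	for (unsigned i = 0; i < nbytes; i++)
		v |= (uint64_t) tmp[i] << (8 * i);
	return v;
}
```

Decoding a 1-byte field stored in the last byte of a 3-byte payload reads 7 bytes past the payload; with `alloc_padded(3)` those bytes are inside the allocation.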

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Use mmap() instead of vmalloc_exec() in userspace
Kent Overstreet [Sat, 24 Apr 2021 04:38:16 +0000 (00:38 -0400)]
bcachefs: Use mmap() instead of vmalloc_exec() in userspace

Calling mmap() directly is much better than malloc() followed by
mprotect(): we end up with much less address space fragmentation.
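
A minimal sketch of mapping a region directly with the desired protection, assuming a POSIX system that supports MAP_ANONYMOUS. The helper names `map_region` and `unmap_region` are hypothetical and are not the userspace shim from the patch.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map size bytes of anonymous memory with the requested protection in a
 * single call, instead of allocating from the heap and changing the
 * protection of part of it afterwards. Returns NULL on failure.
 */
static void *map_region(size_t size, int prot)
{
	void *p;

	assert(size > 0);
	p = mmap(NULL, size, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

static void unmap_region(void *p, size_t size)
{
	if (p)
		munmap(p, size);
}
```

For an executable mapping the caller would pass `PROT_READ | PROT_WRITE | PROT_EXEC`; some hardened systems refuse writable and executable mappings, so the NULL return has to be handled.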

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't BUG_ON() btree topology error
Kent Overstreet [Fri, 23 Apr 2021 20:05:49 +0000 (16:05 -0400)]
bcachefs: Don't BUG_ON() btree topology error

This replaces an assertion in the btree merge path with a
bch2_inconsistent_error() - fsck will fix it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix repair leading to replicas not marked
Kent Overstreet [Fri, 23 Apr 2021 20:18:43 +0000 (16:18 -0400)]
bcachefs: Fix repair leading to replicas not marked

bch2_check_fix_ptrs() was being called after checking if the replicas
set was marked - but repair could change which replicas set needed to be
marked. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Lookup/create lost+found lazily
Kent Overstreet [Tue, 20 Apr 2021 02:19:18 +0000 (22:19 -0400)]
bcachefs: Lookup/create lost+found lazily

This is prep work for subvolumes - each subvolume will have its own
lost+found.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't BUG() in update_replicas
Kent Overstreet [Wed, 21 Apr 2021 22:08:39 +0000 (18:08 -0400)]
bcachefs: Don't BUG() in update_replicas

Apparently, we have a bug where in mark and sweep while accounting for a
key, a replicas entry isn't found. Change the code to print out the key
we couldn't mark and halt instead of a BUG_ON().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a deadlock on journal reclaim
Kent Overstreet [Tue, 20 Apr 2021 21:09:25 +0000 (17:09 -0400)]
bcachefs: Fix a deadlock on journal reclaim

Flushing the btree key cache needs to use allocation reserves - journal
reclaim depends on flushing the btree key cache for making forward
progress, and the allocator and copygc depend on journal reclaim making
forward progress.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Update bch2_btree_verify()
Kent Overstreet [Wed, 21 Apr 2021 00:21:12 +0000 (20:21 -0400)]
bcachefs: Update bch2_btree_verify()

bch2_btree_verify() verifies that the btree node on disk matches what we
have in memory. This patch changes it to verify every replica, and also
fixes it for interior btree nodes - there's a mem_ptr field which is
used as a scratch space and needs to be zeroed out for comparing with
what's on disk.
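
The scratch-field point can be sketched like this: copy both versions, zero the in-memory-only field, then compare. The struct layout and names are hypothetical and only illustrate the technique.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer key: mem_ptr is in-memory scratch, meaningless on disk. */
struct node_ptr {
	uint64_t seq;
	uint64_t mem_ptr;
	uint64_t sectors;
};

/* Compare the in-memory and on-disk versions, ignoring the scratch field. */
static bool node_ptr_matches_disk(const struct node_ptr *in_mem,
				  const struct node_ptr *on_disk)
{
	struct node_ptr a = *in_mem, b = *on_disk;

	a.mem_ptr = 0;
	b.mem_ptr = 0;
	return !memcmp(&a, &b, sizeof(a));
}
```

Working on copies keeps the live in-memory pointer intact; the struct uses only 64-bit fields so that `memcmp()` does not compare padding bytes.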

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix two btree iterator leaks
Kent Overstreet [Wed, 21 Apr 2021 00:21:39 +0000 (20:21 -0400)]
bcachefs: Fix two btree iterator leaks

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Punt btree writes to workqueue to submit
Kent Overstreet [Tue, 6 Apr 2021 19:28:34 +0000 (15:28 -0400)]
bcachefs: Punt btree writes to workqueue to submit

We don't want to be submitting IO with btree locks held, and btree
writes usually aren't latency sensitive.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a use after free
Kent Overstreet [Mon, 19 Apr 2021 21:17:34 +0000 (17:17 -0400)]
bcachefs: Fix a use after free

Turns out, we weren't waiting on in flight btree writes when freeing
existing btree nodes. This led to stray btree writes overwriting newly
allocated buckets, but only started showing itself with some of the
recent allocator work and another patch to move submitting of btree
writes to workqueues.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix for btree_gc repairing interior btree ptrs
Kent Overstreet [Mon, 19 Apr 2021 21:07:20 +0000 (17:07 -0400)]
bcachefs: Fix for btree_gc repairing interior btree ptrs

Using the normal transaction commit path to insert and journal updates
to interior nodes hadn't been done before this repair code was written,
so it's not surprising that there was a bug.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Preallocate trans mem in bch2_migrate_index_update()
Kent Overstreet [Mon, 19 Apr 2021 04:33:05 +0000 (00:33 -0400)]
bcachefs: Preallocate trans mem in bch2_migrate_index_update()

This will help avoid transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Allocator refactoring
Kent Overstreet [Sun, 18 Apr 2021 00:37:04 +0000 (20:37 -0400)]
bcachefs: Allocator refactoring

This uses the kthread_wait_freezable() macro to simplify a lot of the
allocator thread code, along with cleaning up bch2_invalidate_bucket2().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Always check for invalid bkeys in trans commit path
Kent Overstreet [Sun, 18 Apr 2021 21:44:35 +0000 (17:44 -0400)]
bcachefs: Always check for invalid bkeys in trans commit path

We check for this prior to metadata being written, but we're seeing some
strange bugs lately, and this will help catch those closer to where they
occur.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Check that keys are in the correct btrees
Kent Overstreet [Sun, 18 Apr 2021 03:18:17 +0000 (23:18 -0400)]
bcachefs: Check that keys are in the correct btrees

We've started seeing bug reports of pointers to btree nodes being
detected in leaf nodes. This should catch that before it's happened, and
it's something we should've been checking anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Handle errors in bch2_trans_mark_update()
Kent Overstreet [Sun, 18 Apr 2021 21:26:34 +0000 (17:26 -0400)]
bcachefs: Handle errors in bch2_trans_mark_update()

It's not actually the case that iterators are always checked here -
__bch2_trans_commit() checks for that after running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Allocator thread doesn't need gc_lock anymore
Kent Overstreet [Sat, 17 Apr 2021 01:53:23 +0000 (21:53 -0400)]
bcachefs: Allocator thread doesn't need gc_lock anymore

Even with runtime gc (which currently isn't supported), runtime gc no
longer clears/recalculates the main set of bucket marks - it allocates
and calculates another set, updating the primary at the end.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: gc shouldn't care about owned_by_allocator
Kent Overstreet [Sat, 17 Apr 2021 01:34:00 +0000 (21:34 -0400)]
bcachefs: gc shouldn't care about owned_by_allocator

The owned_by_allocator field is a purely in memory thing, even if/when
we bring back GC at runtime there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stack
Kent Overstreet [Sat, 17 Apr 2021 00:35:20 +0000 (20:35 -0400)]
bcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stack

Upcoming patch is going to disallow multiple btree_trans on the stack.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix an unused var warning in userspace
Kent Overstreet [Fri, 16 Apr 2021 21:34:53 +0000 (17:34 -0400)]
bcachefs: Fix an unused var warning in userspace

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix some small memory leaks
Kent Overstreet [Fri, 16 Apr 2021 21:26:25 +0000 (17:26 -0400)]
bcachefs: Fix some small memory leaks

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Simplify fsck remove_dirent()
Kent Overstreet [Fri, 16 Apr 2021 18:48:51 +0000 (14:48 -0400)]
bcachefs: Simplify fsck remove_dirent()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix transaction restarts due to upgrading of cloned iterators
Kent Overstreet [Fri, 16 Apr 2021 18:29:26 +0000 (14:29 -0400)]
bcachefs: Fix transaction restarts due to upgrading of cloned iterators

This fixes a regression from
  52d86202fd bcachefs: Improve bch2_btree_iter_traverse_all()

We want to avoid mucking with other iterators in the btree transaction
in operations that are only supposed to be touching individual iterators
- that patch was a cleanup to move lock ordering handling to
bch2_btree_iter_traverse_all(). But it broke upgrading of cloned
iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix journal reclaim loop
Kent Overstreet [Fri, 16 Apr 2021 16:38:14 +0000 (12:38 -0400)]
bcachefs: Fix journal reclaim loop

When dirty key cache keys were separated from other journal pins, we
broke the loop conditional in __bch2_journal_reclaim() - it's supposed
to keep looping as long as there's work to do.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix an RCU splat
Kent Overstreet [Thu, 15 Apr 2021 22:31:58 +0000 (18:31 -0400)]
bcachefs: Fix an RCU splat

Writepoints are never deallocated so the rcu_read_lock() isn't really
needed, but we are doing lockless list traversal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Simplify bch2_set_nr_journal_buckets()
Kent Overstreet [Thu, 15 Apr 2021 00:23:58 +0000 (20:23 -0400)]
bcachefs: Simplify bch2_set_nr_journal_buckets()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix bch2_trans_mark_dev_sb()
Kent Overstreet [Thu, 15 Apr 2021 00:25:33 +0000 (20:25 -0400)]
bcachefs: Fix bch2_trans_mark_dev_sb()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Improve trans_restart_mem_realloced tracepoint
Kent Overstreet [Thu, 15 Apr 2021 16:50:09 +0000 (12:50 -0400)]
bcachefs: Improve trans_restart_mem_realloced tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't downgrade iterators in bch2_trans_get_iter()
Kent Overstreet [Thu, 15 Apr 2021 16:36:40 +0000 (12:36 -0400)]
bcachefs: Don't downgrade iterators in bch2_trans_get_iter()

This fixes a livelock with btree node splits.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Improve bch2_btree_iter_traverse_all()
Kent Overstreet [Wed, 14 Apr 2021 17:26:15 +0000 (13:26 -0400)]
bcachefs: Improve bch2_btree_iter_traverse_all()

By changing it to upgrade iterators to intent locks to avoid lock
restarts we can simplify __bch2_btree_node_lock() quite a bit - this
fixes a probable bug where it could potentially drop a lock on an
unrelated error but still succeed instead of causing a transaction
restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix journal_reclaim_wait_done()
Kent Overstreet [Thu, 15 Apr 2021 02:15:55 +0000 (22:15 -0400)]
bcachefs: Fix journal_reclaim_wait_done()

Can't run arbitrary code inside a wait_event() conditional, due to
task state being weird...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix bch2_gc_done() error messages
Kent Overstreet [Thu, 15 Apr 2021 00:22:10 +0000 (20:22 -0400)]
bcachefs: Fix bch2_gc_done() error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't call bch2_btree_iter_traverse() unnecessarily
Kent Overstreet [Wed, 14 Apr 2021 21:45:31 +0000 (17:45 -0400)]
bcachefs: Don't call bch2_btree_iter_traverse() unnecessarily

If we let bch2_trans_commit() do it, it'll traverse iterators in sorted
order which means we'll get fewer lock restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Better iterator picking
Kent Overstreet [Wed, 14 Apr 2021 17:29:34 +0000 (13:29 -0400)]
bcachefs: Better iterator picking

Avoid cloning iterators if we don't have to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Drop old style btree node coalescing
Kent Overstreet [Wed, 14 Apr 2021 16:17:41 +0000 (12:17 -0400)]
bcachefs: Drop old style btree node coalescing

We have foreground btree node merging now, and any future btree node
merging improvements are going to be based off of that code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Add a perf test for multiple updates per commit
Kent Overstreet [Wed, 14 Apr 2021 16:10:17 +0000 (12:10 -0400)]
bcachefs: Add a perf test for multiple updates per commit

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Ensure bucket gen gc completes
Kent Overstreet [Tue, 13 Apr 2021 19:10:39 +0000 (15:10 -0400)]
bcachefs: Ensure bucket gen gc completes

We don't want it to block: if it can't allocate, it should just continue
instead of possibly deadlocking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Add the status of bucket gen gc to sysfs
Kent Overstreet [Tue, 13 Apr 2021 19:00:40 +0000 (15:00 -0400)]
bcachefs: Add the status of bucket gen gc to sysfs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix heap overrun in bch2_fs_usage_read() XXX squash
Kent Overstreet [Tue, 13 Apr 2021 14:30:58 +0000 (10:30 -0400)]
bcachefs: Fix heap overrun in bch2_fs_usage_read() XXX squash

oops

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: BCH_FEATURE_atomic_nlink is obsolete
Kent Overstreet [Tue, 13 Apr 2021 14:26:59 +0000 (10:26 -0400)]
bcachefs: BCH_FEATURE_atomic_nlink is obsolete

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Improved check_directory_structure()
Kent Overstreet [Wed, 7 Apr 2021 07:11:07 +0000 (03:11 -0400)]
bcachefs: Improved check_directory_structure()

Now that we have inode backpointers, we can simplify checking directory
structure: instead of doing a DFS from the filesystem root and then
checking if we found everything, we can iterate over every inode and see
if we can go up until we get to the root.
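
The walk-up check can be sketched as below, assuming each inode stores the index of its parent directory as a backpointer. The array representation, `ROOT_INO`, and `NO_PARENT` are assumptions for illustration; the real code walks btree keys rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ROOT_INO	0
#define NO_PARENT	((size_t) -1)

/*
 * parent[i] is inode i's backpointer: the index of its parent directory.
 * Returns true if following backpointers from ino reaches the root.
 */
static bool reaches_root(const size_t *parent, size_t nr, size_t ino)
{
	/*
	 * A valid path visits each inode at most once, so nr steps are
	 * enough; needing more means the backpointers form a loop.
	 */
	for (size_t steps = 0; steps <= nr; steps++) {
		if (ino == ROOT_INO)
			return true;
		if (ino >= nr || parent[ino] == NO_PARENT)
			return false;
		ino = parent[ino];
	}
	return false;
}

/* Iterate over every inode instead of doing a DFS from the root. */
static size_t count_unreachable(const size_t *parent, size_t nr)
{
	size_t bad = 0;

	for (size_t i = 0; i < nr; i++)
		if (!reaches_root(parent, nr, i))
			bad++;
	return bad;
}
```

Unlike a DFS from the root, this needs no second pass to find what was missed: an inode that is orphaned or sits in a directory loop is reported when its own walk fails.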

This patch also has a number of fixes and simplifications for the inode
backpointer checks. Also, it turns out we don't actually need the
BCH_INODE_BACKPTR_UNTRUSTED flag.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix fsck to not use bch2_link_trans()
Kent Overstreet [Fri, 9 Apr 2021 07:25:37 +0000 (03:25 -0400)]
bcachefs: Fix fsck to not use bch2_link_trans()

bch2_link_trans() uses the btree key cache for inode updates, and fsck
isn't supposed to - also, it's not really what we want for reattaching
unreachable inodes anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix bch2_trans_relock()
Kent Overstreet [Mon, 12 Apr 2021 18:00:07 +0000 (14:00 -0400)]
bcachefs: Fix bch2_trans_relock()

The patch that changed bch2_trans_relock() to not look at iter->uptodate
also tried to add an optimization by only having it relock
btree_iter_key() iterators (iterators that are live or have been marked
as keep). But, this wasn't thought through - this pops internal iterator
assertions because on transaction restart, when we're traversing
iterators we traverse all iterators marked as linked, and having
bch2_trans_relock() skip some of those means that it can skip the
iterator that bch2_btree_iter_traverse_one() is currently traversing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Redo check_nlink fsck pass
Kent Overstreet [Thu, 8 Apr 2021 19:25:29 +0000 (15:25 -0400)]
bcachefs: Redo check_nlink fsck pass

Now that we have inode backpointers the check_nlink pass only is
concerned with files that have hardlinks, and can be simplified.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Inode backpointers are now required
Kent Overstreet [Wed, 7 Apr 2021 00:15:26 +0000 (20:15 -0400)]
bcachefs: Inode backpointers are now required

This lets us simplify fsck quite a bit, which we need for making fsck
snapshot aware.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Simplify hash table checks
Kent Overstreet [Wed, 7 Apr 2021 05:55:57 +0000 (01:55 -0400)]
bcachefs: Simplify hash table checks

Very early on there was a period where we were accidentally generating
dirents with trailing garbage; we've since dropped support for
filesystems that old and the fsck code can be dropped.

Also, this patch switches to a simpler algorithm for checking hash
tables. It's less efficient on hash collision - but with 64 bit keys,
those are very rare.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Check inodes at start of fsck
Kent Overstreet [Wed, 7 Apr 2021 01:41:48 +0000 (21:41 -0400)]
bcachefs: Check inodes at start of fsck

This splits out checking inode nlinks from the rest of the inode checks
and moves most of the inode checks to the start of fsck, so that other
fsck passes can depend on it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix BTREE_ITER_NOT_EXTENTS
Kent Overstreet [Fri, 9 Apr 2021 20:52:30 +0000 (16:52 -0400)]
bcachefs: Fix BTREE_ITER_NOT_EXTENTS

bch2_btree_iter_peek() wasn't properly checking for
BTREE_ITER_IS_EXTENTS when updating iter->pos.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix bch2_gc_btree_gens()
Kent Overstreet [Fri, 9 Apr 2021 19:10:24 +0000 (15:10 -0400)]
bcachefs: Fix bch2_gc_btree_gens()

Since we're using a NOT_EXTENTS iterator, we shouldn't be setting the
iter pos to the start of the extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Make sure to kick journal reclaim when we're waiting on it
Kent Overstreet [Thu, 8 Apr 2021 20:15:03 +0000 (16:15 -0400)]
bcachefs: Make sure to kick journal reclaim when we're waiting on it

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't wait for ALLOC_SCAN_BATCH buckets in allocator
Kent Overstreet [Thu, 8 Apr 2021 01:04:04 +0000 (21:04 -0400)]
bcachefs: Don't wait for ALLOC_SCAN_BATCH buckets in allocator

It used to be necessary for the allocator thread to batch up
invalidating buckets when possible - but since we added the btree key
cache that hasn't been a concern, and now it's causing the allocator
thread to livelock when the filesystem is nearly full.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Drop bch2_fsck_inode_nlink()
Kent Overstreet [Wed, 7 Apr 2021 01:19:25 +0000 (21:19 -0400)]
bcachefs: Drop bch2_fsck_inode_nlink()

We've had BCH_FEATURE_atomic_nlink for quite some time, we can drop this
now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Move some dirent checks to bch2_dirent_invalid()
Kent Overstreet [Wed, 7 Apr 2021 00:11:28 +0000 (20:11 -0400)]
bcachefs: Move some dirent checks to bch2_dirent_invalid()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Improve bset compaction
Kent Overstreet [Tue, 6 Apr 2021 19:33:19 +0000 (15:33 -0400)]
bcachefs: Improve bset compaction

The previous patch that fixed btree nodes being written too aggressively
now meant that we weren't sorting btree node bsets optimally - this
patch fixes that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Don't flush btree writes more aggressively because of btree key cache
Kent Overstreet [Thu, 1 Apr 2021 01:44:55 +0000 (21:44 -0400)]
bcachefs: Don't flush btree writes more aggressively because of btree key cache

We need to flush the btree key cache when it's too dirty, because
otherwise the shrinker won't be able to reclaim memory - this is done by
journal reclaim. But journal reclaim also kicks btree node writes: this
meant that btree node writes were getting kicked much too often just
because we needed to flush btree key cache keys.

This patch splits journal pins into two different lists, and teaches
journal reclaim to not flush btree node writes when it only needs to
flush key cache keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Eliminate more PAGE_SIZE uses
Kent Overstreet [Tue, 6 Apr 2021 18:00:56 +0000 (14:00 -0400)]
bcachefs: Eliminate more PAGE_SIZE uses

In userspace, we don't really have a well defined PAGE_SIZE and shouldn't
be relying on it. This is some more incremental work to remove
references to it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Increase BSET_CACHELINE to 256 bytes
Kent Overstreet [Tue, 6 Apr 2021 17:43:31 +0000 (13:43 -0400)]
bcachefs: Increase BSET_CACHELINE to 256 bytes

Linear searches have gotten cheaper relative to binary searches on
modern hardware, due to better branch prediction behaviour.
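
One way to picture the trade-off is a lookup that binary searches only until the candidate range fits in one chunk and then scans that chunk linearly. The sketch below works under that assumption on a plain sorted array; it is not the bset lookup code, and `CHUNK_BYTES` is a hypothetical stand-in for the tunable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES	256
#define CHUNK_KEYS	(CHUNK_BYTES / sizeof(uint64_t))

/* Return the index of the first key >= search, or nr if there is none. */
static size_t lower_bound_hybrid(const uint64_t *keys, size_t nr,
				 uint64_t search)
{
	size_t lo = 0, hi = nr;

	/* Binary search only until the candidate range fits in one chunk. */
	while (hi - lo > CHUNK_KEYS) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid] < search)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Linear scan: sequential access and easily predicted branches. */
	while (lo < hi && keys[lo] < search)
		lo++;
	return lo;
}
```

Raising the chunk size shifts work from the binary search, whose branches are hard to predict, to the linear scan, which is the direction this commit moves in.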

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix a startup race
Kent Overstreet [Mon, 5 Apr 2021 05:23:55 +0000 (01:23 -0400)]
bcachefs: Fix a startup race

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Fix an uninitialized variable
Kent Overstreet [Mon, 5 Apr 2021 02:38:07 +0000 (22:38 -0400)]
bcachefs: Fix an uninitialized variable

Fortunately it was just used in an error message

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: kill bset_tree->max_key
Kent Overstreet [Sun, 4 Apr 2021 01:54:14 +0000 (21:54 -0400)]
bcachefs: kill bset_tree->max_key

Since we now ensure a btree node's max key fits in its packed format,
this isn't needed for the reasons it used to be - and, it was being used
inconsistently.

Also reorder struct btree a bit for performance, and kill some dead
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years ago bcachefs: Eliminate memory barrier from fast path of journal_preres_put()
Kent Overstreet [Sun, 4 Apr 2021 01:31:02 +0000 (21:31 -0400)]
bcachefs: Eliminate memory barrier from fast path of journal_preres_put()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Drop some memset() calls
Kent Overstreet [Sun, 4 Apr 2021 01:09:13 +0000 (21:09 -0400)]
bcachefs: Drop some memset() calls

gcc is emitting rep stos here, which is silly (and slow) for an 8 byte
memset.
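
As an illustration only (struct small and both helpers are hypothetical,
not taken from the patch), these are two ways to clear an 8 byte object.
Which instructions a compiler emits for either form depends on the
compiler version and flags:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte structure. */
struct small {
	uint32_t	a;
	uint32_t	b;
};

/* Clearing through the generic byte-wise interface. */
static void clear_with_memset(struct small *s)
{
	assert(s);
	memset(s, 0, sizeof(*s));
}

/* Clearing with a plain struct assignment of a zeroed compound literal. */
static void clear_with_assignment(struct small *s)
{
	assert(s);
	*s = (struct small) { 0 };
}
```

Both leave every member zero; the second form simply makes the small,
fixed size explicit in the source.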

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill bch2_fs_usage_scratch_get()
Kent Overstreet [Sun, 4 Apr 2021 00:29:05 +0000 (20:29 -0400)]
bcachefs: Kill bch2_fs_usage_scratch_get()

This is an important cleanup, eliminating an unnecessary copy in the
transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix livelock calling bch2_mark_bkey_replicas()
Kent Overstreet [Sat, 3 Apr 2021 23:41:09 +0000 (19:41 -0400)]
bcachefs: Fix livelock calling bch2_mark_bkey_replicas()

The bug was that we were trying to find a replicas entry that wasn't
sorted - but, we can also simplify the code by not using
bch2_mark_bkey_replicas and instead ensuring the list of replicas
entries exists directly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Be more careful about JOURNAL_RES_GET_RESERVED
Kent Overstreet [Sat, 3 Apr 2021 20:24:13 +0000 (16:24 -0400)]
bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED

JOURNAL_RES_GET_RESERVED should only be used for updates that need to be
done to free up space in the journal. In particular, when we're flushing
keys from the key cache, if we're flushing them out of order we
shouldn't be using it, since we're using up our remaining space in the
journal without dropping a pin that will let us make forward progress.

With this patch, BTREE_INSERT_JOURNAL_RECLAIM without
BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on
journal reclaim if we're already in journal reclaim.

This means we need to propagate these errors up to journal reclaim,
indicating that flushing a journal pin should be retried in the future.

This is prep work for a patch to change the way journal reclaim works,
separating two cases: flushing key cache keys because the btree key
cache is too dirty, and journal reclaim because we need space in the
journal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix journal deadlock
Kent Overstreet [Sat, 3 Apr 2021 23:27:05 +0000 (19:27 -0400)]
bcachefs: Fix journal deadlock

After we get a journal reservation, we need to use it - if we error out
of a transaction commit, we'll be eating into space in the journal and
if our transaction needs to make forward progress in order to reclaim
space in the journal, we'll deadlock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix this_cpu_ptr() usage
Kent Overstreet [Sat, 3 Apr 2021 22:37:09 +0000 (18:37 -0400)]
bcachefs: Fix this_cpu_ptr() usage

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Increase commonality between BTREE_ITER_NODES and BTREE_ITER_KEYS
Kent Overstreet [Sat, 3 Apr 2021 01:29:05 +0000 (21:29 -0400)]
bcachefs: Increase commonality between BTREE_ITER_NODES and BTREE_ITER_KEYS

Eventually BTREE_ITER_NODES should be going away. This patch is to fix a
transaction iterator overflow in the btree node merge path because
BTREE_ITER_NODES iterators couldn't be reused.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix BTREE_FOREGROUND_MERGE_HYSTERESIS
Kent Overstreet [Wed, 31 Mar 2021 20:10:21 +0000 (16:10 -0400)]
bcachefs: Fix BTREE_FOREGROUND_MERGE_HYSTERESIS

We were multiplying instead of dividing - oops.
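
A worked example of why this matters. The formulas below (threshold at
one third of the node, hysteresis margin of a quarter of the threshold)
and all function names are assumptions chosen for illustration, not
values taken from the patch:

```c
#include <assert.h>

/* Hypothetical: consider merging a node once it is below 1/3 full. */
static unsigned merge_threshold(unsigned max_u64s)
{
	return max_u64s / 3;
}

/* Buggy form: the margin is four times the threshold. */
static unsigned hysteresis_buggy(unsigned max_u64s)
{
	unsigned t = merge_threshold(max_u64s);

	return t + t * 4;
}

/* Fixed form: the margin is a quarter of the threshold. */
static unsigned hysteresis_fixed(unsigned max_u64s)
{
	unsigned t = merge_threshold(max_u64s);
	unsigned h = t + t / 4;

	/* The hysteresis level must never sit below the threshold. */
	assert(h >= t);
	return h;
}
```

For a node of 3000 u64s the threshold is 1000; the fixed form gives a
hysteresis level of 1250, while the multiply gives 5000, which is larger
than the whole node in this toy model.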

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Drop trans->nounlock
Kent Overstreet [Wed, 31 Mar 2021 20:43:50 +0000 (16:43 -0400)]
bcachefs: Drop trans->nounlock

Since we're no longer doing btree node merging post commit, we can now
delete a bunch of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Move btree node merging to before transaction commit
Kent Overstreet [Mon, 29 Mar 2021 05:13:31 +0000 (01:13 -0400)]
bcachefs: Move btree node merging to before transaction commit

Currently, BTREE_INSERT_NOUNLOCK makes it hard to ensure btree node
merging happens reliably - since btree node merging happens after
transaction commit, we can't drop btree locks and block when starting
the btree update.

This patch moves it to before transaction commit - and failure to do a
merge that we wanted to do just restarts the transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: bch2_foreground_maybe_merge() now correctly reports lock restarts
Kent Overstreet [Wed, 31 Mar 2021 20:16:39 +0000 (16:16 -0400)]
bcachefs: bch2_foreground_maybe_merge() now correctly reports lock restarts

This means that btree node splits don't have to automatically trigger a
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Kill bch2_btree_node_get_sibling()
Kent Overstreet [Mon, 29 Mar 2021 05:13:31 +0000 (01:13 -0400)]
bcachefs: Kill bch2_btree_node_get_sibling()

This patch reworks the btree node merge path to use a second btree
iterator to get the sibling node - which means
bch2_btree_node_get_sibling() can be deleted. Also, it uses
bch2_btree_iter_traverse_all() if necessary - which means it should be
more reliable. We don't currently even try to make it work when
trans->nounlock is set, i.e. after a BTREE_INSERT_NOUNLOCK transaction
commit; hopefully this will be a worthwhile tradeoff.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Change where merging of interior btree nodes is triggered from
Kent Overstreet [Wed, 31 Mar 2021 19:39:16 +0000 (15:39 -0400)]
bcachefs: Change where merging of interior btree nodes is triggered from

Previously, we were doing btree node merging from
bch2_btree_insert_node() - but this is called from the split path, when
we're in the middle of creating new nodes and deleting old nodes and the
iterators are in a weird state.

Also, this means we're starting a new btree_update while in the middle
of an existing one, and that's asking for deadlocks.

Much simpler and saner to trigger btree node merging _after_ the whole
btree node split path is finished.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Improve bch2_btree_update_start()
Kent Overstreet [Wed, 31 Mar 2021 19:21:37 +0000 (15:21 -0400)]
bcachefs: Improve bch2_btree_update_start()

bch2_btree_update_start() is now responsible for taking gc_lock and
upgrading the iterator to lock parent nodes - greatly simplifying error
handling and all of the callers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add a sysfs var for average btree write size
Kent Overstreet [Thu, 1 Apr 2021 01:07:37 +0000 (21:07 -0400)]
bcachefs: Add a sysfs var for average btree write size

Useful number for performance tuning.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Improve bch2_trans_relock()
Kent Overstreet [Wed, 31 Mar 2021 00:35:46 +0000 (20:35 -0400)]
bcachefs: Improve bch2_trans_relock()

We're getting away from relying on iter->uptodate - this changes
bch2_trans_relock() to more directly specify which iterators should be
relocked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Move btree lock debugging to slowpath fn
Kent Overstreet [Wed, 31 Mar 2021 18:42:36 +0000 (14:42 -0400)]
bcachefs: Move btree lock debugging to slowpath fn

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't make foreground writes wait behind journal reclaim too long
Kent Overstreet [Wed, 31 Mar 2021 21:52:52 +0000 (17:52 -0400)]
bcachefs: Don't make foreground writes wait behind journal reclaim too long

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobuckets.c fixups XXX squash
Kent Overstreet [Mon, 29 Mar 2021 00:56:25 +0000 (20:56 -0400)]
buckets.c fixups XXX squash

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add repair code for out of order keys in a btree node.
Kent Overstreet [Mon, 29 Mar 2021 04:19:05 +0000 (00:19 -0400)]
bcachefs: Add repair code for out of order keys in a btree node.

This just drops the offending key - in the bug report where this was
seen, it was clearly a single bit memory error, and fsck will fix the
missing key.
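
A sketch of the idea on a flat array, assuming "out of order" means a key
that sorts before the key kept just ahead of it. drop_out_of_order() and
the use of plain 64-bit integers as keys are illustrative, not the
bcachefs repair code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compact keys[] in place, dropping every key that sorts before the
 * previously kept key. Returns the number of keys remaining.
 */
static size_t drop_out_of_order(uint64_t *keys, size_t nr)
{
	size_t dst = 0;

	assert(keys || !nr);

	for (size_t src = 0; src < nr; src++) {
		if (dst && keys[src] < keys[dst - 1])
			continue;	/* offending key: drop it */

		keys[dst++] = keys[src];
	}

	return dst;
}
```

Nothing here tries to guess what the dropped key should have been; as
the message says, that is left to fsck.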

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Free iterator in bch2_btree_delete_range_trans()
Kent Overstreet [Mon, 29 Mar 2021 01:20:22 +0000 (21:20 -0400)]
bcachefs: Free iterator in bch2_btree_delete_range_trans()

This is specifically to speed up bch2_inode_rm(), so that we're not
traversing iterators we're done with.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Have journal reclaim thread flush more aggressively
Kent Overstreet [Mon, 29 Mar 2021 00:57:59 +0000 (20:57 -0400)]
bcachefs: Have journal reclaim thread flush more aggressively

This adds a new watermark for the journal reclaim when flushing btree
key cache entries - it should try and stay ahead of where foreground
threads doing transaction commits will enter direct journal reclaim.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't use bch2_inode_find_by_inum() in move.c
Kent Overstreet [Tue, 16 Mar 2021 22:08:10 +0000 (18:08 -0400)]
bcachefs: Don't use bch2_inode_find_by_inum() in move.c

Since move.c isn't aware of what subvolume we're in, we can't use the
standard inode lookup code - fortunately, we're just using it for
reading IO options.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Change inode allocation code for snapshots
Kent Overstreet [Mon, 15 Mar 2021 23:18:30 +0000 (19:18 -0400)]
bcachefs: Change inode allocation code for snapshots

For snapshots, when we allocate a new inode we want to allocate an inode
number that isn't in use in any other subvolume. We won't be able to use
ITER_SLOTS for this, so inode allocation needs to change to use
BTREE_ITER_ALL_SNAPSHOTS.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Inode backpointers
Kent Overstreet [Tue, 2 Mar 2021 23:35:30 +0000 (18:35 -0500)]
bcachefs: Inode backpointers

This patch adds two new inode fields, bi_dir and bi_dir_offset, that
point back to the inode's dirent.

Since we're only adding fields for a single backpointer, files that have
been hardlinked won't necessarily have valid backpointers: we also add a
new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has
ever had multiple links to it. That's ok, because we only really need
this functionality for directories, which can never have multiple
hardlinks - when we add subvolumes, we'll need a way to enumerate and
print subvolumes, and this will let us reconstruct a path to a subvolume
root given a subvolume root inode.
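
A toy model of that path reconstruction. Only the field names bi_dir and
bi_dir_offset come from this patch; the in-memory tables, the helper
names, and the convention that bi_dir == 0 marks the root are
assumptions made for the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_dirent { uint64_t dir, offset; const char *name; };
struct toy_inode  { uint64_t inum, bi_dir, bi_dir_offset; };

struct toy_fs {
	const struct toy_inode	*inodes;
	size_t			nr_inodes;
	const struct toy_dirent	*dirents;
	size_t			nr_dirents;
};

static const struct toy_inode *find_inode(const struct toy_fs *fs, uint64_t inum)
{
	for (size_t i = 0; i < fs->nr_inodes; i++)
		if (fs->inodes[i].inum == inum)
			return &fs->inodes[i];
	return NULL;
}

static const struct toy_dirent *find_dirent(const struct toy_fs *fs,
					    uint64_t dir, uint64_t offset)
{
	for (size_t i = 0; i < fs->nr_dirents; i++)
		if (fs->dirents[i].dir == dir && fs->dirents[i].offset == offset)
			return &fs->dirents[i];
	return NULL;
}

/* Build the path to inum by following backpointers; 0 on success, -1 on error. */
static int toy_path(const struct toy_fs *fs, uint64_t inum, char *buf, size_t len)
{
	size_t pos = len;

	assert(fs && buf);
	if (len < 2)
		return -1;
	buf[--pos] = '\0';

	/* Depth limit so a corrupt backpointer loop cannot spin forever. */
	for (unsigned depth = 0; depth < 64; depth++) {
		const struct toy_inode *inode = find_inode(fs, inum);
		const struct toy_dirent *d;
		size_t n;

		if (!inode)
			return -1;
		if (!inode->bi_dir) {			/* reached the root */
			if (pos == len - 1)
				buf[--pos] = '/';	/* path of the root itself */
			memmove(buf, buf + pos, len - pos);
			return 0;
		}

		d = find_dirent(fs, inode->bi_dir, inode->bi_dir_offset);
		if (!d)
			return -1;

		n = strlen(d->name);
		if (pos < n + 1)
			return -1;			/* buffer too small */
		pos -= n;
		memcpy(buf + pos, d->name, n);		/* prepend "/name" */
		buf[--pos] = '/';
		inum = inode->bi_dir;
	}
	return -1;
}
```

The path is assembled right to left, one dirent name per backpointer
hop, so no forward directory walk is needed.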

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Start using bpos.snapshot field
Kent Overstreet [Wed, 24 Mar 2021 22:02:16 +0000 (18:02 -0400)]
bcachefs: Start using bpos.snapshot field

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
  and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.
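
A simplified sketch of the successor half of that behaviour. struct
toy_bpos and toy_bpos_successor() are stand-ins written for this
illustration; the field widths are assumptions and the real helpers may
differ in signature and overflow handling:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy position: inode is the most significant field, snapshot the least. */
struct toy_bpos {
	uint64_t	inode;
	uint64_t	offset;
	uint32_t	snapshot;
};

/*
 * Advance *p to the next position. With all_snapshots set the snapshot
 * field takes part in the increment (carrying into offset, then inode);
 * otherwise it is left alone and only the higher fields advance.
 * Returns false if the position overflowed and has no successor.
 */
static bool toy_bpos_successor(struct toy_bpos *p, bool all_snapshots)
{
	assert(p);

	if (all_snapshots && ++p->snapshot)
		return true;	/* no carry out of the snapshot field */

	if (++p->offset)
		return true;	/* no carry out of the offset field */

	if (++p->inode)
		return true;

	return false;
}
```

With all_snapshots set, (1, 5, U32_MAX) advances to (1, 6, 0); without
it, the same position advances to (1, 6, U32_MAX).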

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and always U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>