www.infradead.org Git - users/willy/xarray.git/log
2 years agobcachefs: Ensure iter->real_pos is consistent with key returned
Kent Overstreet [Tue, 24 Aug 2021 20:54:36 +0000 (16:54 -0400)]
bcachefs: Ensure iter->real_pos is consistent with key returned

iter->real_pos needs to match the key returned or bad things will happen
when we go to update the key at that position. When we returned a
pending update from btree_trans_peek_updates(), this wasn't necessarily
the case.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Add SPOS_MAX to bpos_to_text()
Kent Overstreet [Wed, 25 Aug 2021 00:31:44 +0000 (20:31 -0400)]
bcachefs: Add SPOS_MAX to bpos_to_text()

Better pretty printing ftw

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Free iterator if we have duplicate
Kent Overstreet [Mon, 23 Aug 2021 21:19:17 +0000 (17:19 -0400)]
bcachefs: Free iterator if we have duplicate

This helps - but does not fully fix - the outstanding "transaction
iterator overflow" bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix unhandled transaction restart in bch2_gc_btree_gens()
Kent Overstreet [Sun, 22 Aug 2021 16:56:56 +0000 (12:56 -0400)]
bcachefs: Fix unhandled transaction restart in bch2_gc_btree_gens()

This fixes https://github.com/koverstreet/bcachefs/issues/305

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: add progress stats to sysfs
Brett Holman [Fri, 23 Jul 2021 19:57:19 +0000 (13:57 -0600)]
bcachefs: add progress stats to sysfs

This adds progress stats to sysfs for copygc, rebalance, recovery, and the
cmd_job ioctls.

Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix 32 bit build failures
Brett Holman [Tue, 17 Aug 2021 23:14:26 +0000 (17:14 -0600)]
bcachefs: Fix 32 bit build failures

This fix replaces multiple 64 bit divisions with do_div() equivalents.

Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
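
For context on the do_div() change above: on 32-bit targets a bare 64-bit
division can compile into a call to a compiler helper that the kernel does not
provide, which is the usual reason for this kind of build failure. The sketch
below is a userspace stand-in that mimics the do_div() contract (quotient
stored in place, remainder returned). The names sketch_do_div() and
sketch_percent_done() are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's do_div(): divides *n in place by a
 * 32-bit divisor and returns the remainder. Hypothetical helper for
 * illustration only; it is not the kernel macro itself.
 */
static uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
	assert(base != 0);
	uint32_t rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* Hypothetical caller: percentage of work done, without a bare 64-bit '/'. */
static uint64_t sketch_percent_done(uint64_t done, uint32_t total)
{
	uint64_t v = done * 100;	/* assumes done * 100 fits in 64 bits */

	sketch_do_div(&v, total);
	return v;
}
```

Note that the dividend is modified in place, so callers that still need the
original value have to copy it first.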
2 years agobcachefs: Be sure to check ptr->dev in copygc pred function
Kent Overstreet [Wed, 18 Aug 2021 20:19:28 +0000 (16:19 -0400)]
bcachefs: Be sure to check ptr->dev in copygc pred function

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Disk space accounting fix
Kent Overstreet [Tue, 17 Aug 2021 19:29:21 +0000 (15:29 -0400)]
bcachefs: Disk space accounting fix

DIV_ROUND_UP() wasn't doing what we wanted when passing it negative
numbers - fix it by just not passing it negative numbers anymore.

Also, no need to do the scaling by compression ratio for incompressible
data.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
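
The DIV_ROUND_UP() problem above comes from C integer division truncating
toward zero: with a negative numerator the "+ d - 1" adjustment pushes the
result the wrong way. The sketch below reproduces the arithmetic of the macro
and shows one way to keep negative values away from it, by rounding the
magnitude and reapplying the sign. sketch_scale_delta() is a hypothetical
helper, and whether it matches the rounding the patch wanted is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: scale a signed sector delta by num/den, rounding the
 * magnitude up and reapplying the sign, so the macro never sees a negative
 * numerator. Assumes delta is not INT64_MIN and the product does not overflow.
 */
static int64_t sketch_scale_delta(int64_t delta, int64_t num, int64_t den)
{
	assert(num >= 0 && den > 0);
	if (delta < 0)
		return -DIV_ROUND_UP(-delta * num, den);
	return DIV_ROUND_UP(delta * num, den);
}
```

For example, DIV_ROUND_UP(-8, 4) evaluates to -1 even though -8 / 4 is exactly
-2, while sketch_scale_delta(-8, 1, 4) returns -2.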
2 years agobcachefs: Fix a valgrind conditional jump
Kent Overstreet [Tue, 17 Aug 2021 19:03:53 +0000 (15:03 -0400)]
bcachefs: Fix a valgrind conditional jump

Valgrind was complaining about a jump depending on uninitialized memory.
The jump did not actually depend on uninitialized memory, but this change
makes the code less confusing for valgrind to follow.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Minor btree iter refactoring
Kent Overstreet [Sat, 7 Aug 2021 22:19:33 +0000 (18:19 -0400)]
bcachefs: Minor btree iter refactoring

This makes the flow control in bch2_btree_iter_peek() and
bch2_btree_iter_peek_prev() a bit cleaner.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix btree_trans_peek_updates()
Kent Overstreet [Sat, 7 Aug 2021 22:21:35 +0000 (18:21 -0400)]
bcachefs: Fix btree_trans_peek_updates()

Should have been using bpos_cmp(), not bkey_cmp().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
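
To illustrate why the choice of comparison matters here, the sketch below
assumes that, in the code of this period, bkey_cmp() ordered positions by
inode and offset only, while bpos_cmp() also compared the snapshot field. The
struct and function names are hypothetical simplifications, not the kernel
definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified position: field names follow struct bpos, layout is a sketch. */
struct sketch_bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

static int cmp_u64(uint64_t l, uint64_t r)
{
	return (l > r) - (l < r);
}

/* Assumed bkey_cmp()-like ordering: the snapshot field is ignored. */
static int sketch_bkey_cmp(struct sketch_bpos l, struct sketch_bpos r)
{
	int c = cmp_u64(l.inode, r.inode);

	return c ? c : cmp_u64(l.offset, r.offset);
}

/* Assumed bpos_cmp()-like ordering: all three fields take part. */
static int sketch_bpos_cmp(struct sketch_bpos l, struct sketch_bpos r)
{
	int c = sketch_bkey_cmp(l, r);

	return c ? c : cmp_u64(l.snapshot, r.snapshot);
}
```

Under that assumption, two pending updates at the same inode and offset but in
different snapshots compare equal with the first function and distinct with
the second.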
2 years agobcachefs: Fix an unhandled transaction restart
Kent Overstreet [Thu, 5 Aug 2021 17:02:39 +0000 (13:02 -0400)]
bcachefs: Fix an unhandled transaction restart

__bch2_read() -> __bch2_read_extent() -> bch2_bucket_io_time_reset() may
cause a transaction restart, which we don't return an error for because
it doesn't prevent us from making forward progress on the read we're
submitting.

Instead, change __bch2_read() and bchfs_read() to check for transaction
restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Ensure that new inodes hit underlying btree
Kent Overstreet [Fri, 30 Jul 2021 22:01:33 +0000 (18:01 -0400)]
bcachefs: Ensure that new inodes hit underlying btree

Inode creation is done with non-cached btree iterators, but then in the
same transaction the inode may be updated again with a cached iterator -
it makes cache coherency easier if new inodes always land in the
underlying btree.

This patch adds a check to bch2_trans_update() - if the same key is
updated multiple times in the same transaction with both cached and
non-cached iterators, use the non-cached iterator.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Add flags field to bch2_inode_to_text()
Kent Overstreet [Fri, 30 Jul 2021 21:59:37 +0000 (17:59 -0400)]
bcachefs: Add flags field to bch2_inode_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Keep a sorted list of btree iterators
Kent Overstreet [Sat, 12 Jun 2021 19:45:45 +0000 (15:45 -0400)]
bcachefs: Keep a sorted list of btree iterators

This will be used to make other operations on btree iterators within a
transaction more efficient, and enable some other improvements to how we
manage btree iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Zero out mem_ptr field in btree ptr keys from journal replay
Kent Overstreet [Fri, 30 Jul 2021 18:33:06 +0000 (14:33 -0400)]
bcachefs: Zero out mem_ptr field in btree ptr keys from journal replay

This fixes a bad ptr deref on recovery from unclean shutdown in
bch2_btree_node_get_noiter().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Don't drop read locks at transaction commit time
Kent Overstreet [Wed, 28 Jul 2021 02:28:39 +0000 (22:28 -0400)]
bcachefs: Don't drop read locks at transaction commit time

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: traverse_all() shouldn't be restarting the transaction
Kent Overstreet [Wed, 28 Jul 2021 02:32:05 +0000 (22:32 -0400)]
bcachefs: traverse_all() shouldn't be restarting the transaction

We're only called by bch2_trans_begin() now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Kill BTREE_INSERT_NOUNLOCK
Kent Overstreet [Wed, 28 Jul 2021 02:15:04 +0000 (22:15 -0400)]
bcachefs: Kill BTREE_INSERT_NOUNLOCK

With the recent transaction restart changes, it's no longer needed - all
transaction commits have BTREE_INSERT_NOUNLOCK semantics.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Btree splits no longer automatically cause a transaction restart
Kent Overstreet [Sat, 24 Jul 2021 18:25:01 +0000 (14:25 -0400)]
bcachefs: Btree splits no longer automatically cause a transaction restart

With the new and improved handling of transaction restarts, this should
finally be safe.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: __bch2_trans_commit() no longer calls bch2_trans_reset()
Kent Overstreet [Sun, 25 Jul 2021 03:57:28 +0000 (23:57 -0400)]
bcachefs: __bch2_trans_commit() no longer calls bch2_trans_reset()

It's now the caller's responsibility to call bch2_trans_begin.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Ensure btree_iter_traverse() obeys iter->should_be_locked
Kent Overstreet [Thu, 22 Jul 2021 16:39:11 +0000 (12:39 -0400)]
bcachefs: Ensure btree_iter_traverse() obeys iter->should_be_locked

iter->should_be_locked means that if bch2_btree_iter_relock() fails, we
need to restart the transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: bch2_btree_iter_traverse() shouldn't normally call traverse_all()
Kent Overstreet [Tue, 27 Jul 2021 22:01:52 +0000 (18:01 -0400)]
bcachefs: bch2_btree_iter_traverse() shouldn't normally call traverse_all()

If there's more than one iterator in the btree_trans, it's required to
call bch2_trans_begin() to handle transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: trans->restarted
Kent Overstreet [Sun, 25 Jul 2021 21:19:52 +0000 (17:19 -0400)]
bcachefs: trans->restarted

Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Change lockrestart_do() to always call bch2_trans_begin()
Kent Overstreet [Wed, 28 Jul 2021 20:17:10 +0000 (16:17 -0400)]
bcachefs: Change lockrestart_do() to always call bch2_trans_begin()

More consistent behaviour means less likely to trip over ourselves in
silly ways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
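
The restart rule described in the last few commits can be sketched in
userspace as a retry loop. This is a minimal model, assuming a restart is
signalled by -EINTR and that the only relevant state is a "restarted" flag;
struct sketch_trans and the sketch_* functions are hypothetical and stand in
for btree_trans, bch2_trans_begin() and lockrestart_do().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct btree_trans: only the restart state. */
struct sketch_trans {
	bool restarted;
	int  begins;	/* how many times sketch_trans_begin() ran */
};

static void sketch_trans_begin(struct sketch_trans *trans)
{
	trans->restarted = false;
	trans->begins++;
}

/* An operation asks for a restart by returning -EINTR (assumed error code). */
typedef int (*sketch_op_fn)(struct sketch_trans *trans, void *arg);

/*
 * Retry loop in the spirit of lockrestart_do(): begin is called before every
 * attempt, including the first, so the operation never runs on a transaction
 * that is still marked as restarted.
 */
static int sketch_lockrestart_do(struct sketch_trans *trans,
				 sketch_op_fn op, void *arg)
{
	int ret;

	do {
		sketch_trans_begin(trans);
		ret = op(trans, arg);
		if (ret == -EINTR)
			trans->restarted = true;
	} while (ret == -EINTR);

	return ret;
}

/* Example operation: fails with -EINTR until *arg counts down to zero. */
static int sketch_flaky_op(struct sketch_trans *trans, void *arg)
{
	int *remaining = arg;

	assert(!trans->restarted);	/* the invariant being enforced */
	if (*remaining > 0) {
		(*remaining)--;
		return -EINTR;
	}
	return 0;
}
```

Calling begin unconditionally costs one extra call on the first attempt, but
it means the loop has a single, uniform shape.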
2 years agobcachefs: Clean up interior update paths
Kent Overstreet [Sat, 24 Jul 2021 21:38:15 +0000 (17:38 -0400)]
bcachefs: Clean up interior update paths

Btree node merging now happens prior to transaction commit, not after,
so we don't need to pay attention to BTREE_INSERT_NOUNLOCK.

Also, foreground_maybe_merge shouldn't be calling
bch2_btree_iter_traverse_all() - this is becoming private to the btree
iterator code and should only be called by bch2_trans_begin().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Use bch2_trans_begin() more consistently
Kent Overstreet [Sun, 25 Jul 2021 00:24:10 +0000 (20:24 -0400)]
bcachefs: Use bch2_trans_begin() more consistently

Upcoming patch will require that a transaction restart is always
immediately followed by bch2_trans_begin().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Always check for transaction restarts
Kent Overstreet [Sat, 24 Jul 2021 23:50:40 +0000 (19:50 -0400)]
bcachefs: Always check for transaction restarts

On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: traverse_all() is responsible for clearing should_be_locked
Kent Overstreet [Sat, 24 Jul 2021 21:43:35 +0000 (17:43 -0400)]
bcachefs: traverse_all() is responsible for clearing should_be_locked

bch2_btree_iter_traverse_all() may loop, and it needs to clear
iter->should_be_locked on every iteration.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: bch2_trans_relock() only relocks iters that should be locked
Kent Overstreet [Tue, 27 Jul 2021 21:58:58 +0000 (17:58 -0400)]
bcachefs: bch2_trans_relock() only relocks iters that should be locked

This avoids unexpected lock restarts in bch2_btree_iter_traverse_all().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Don't traverse iterators in __bch2_trans_commit()
Kent Overstreet [Sun, 25 Jul 2021 18:20:43 +0000 (14:20 -0400)]
bcachefs: Don't traverse iterators in __bch2_trans_commit()

They should already be traversed, and we're asserting that since the
introduction of iter->should_be_locked

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Add an option for btree node mem ptr optimization
Kent Overstreet [Mon, 26 Jul 2021 19:52:41 +0000 (15:52 -0400)]
bcachefs: Add an option for btree node mem ptr optimization

bch2_btree_node_ptr_v2 has a field for stashing a pointer to the in
memory btree node; this is safe because we clear this field when reading
in nodes from disk and we never free in memory btree nodes - but, we
have bug reports that indicate something might be faulty with this
optimization, so let's add an option for it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Minor tracepoint improvements
Kent Overstreet [Sat, 24 Jul 2021 21:31:25 +0000 (17:31 -0400)]
bcachefs: Minor tracepoint improvements

Btree iterator tracepoints should print whether they're for the key
cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: bch2_btree_iter_relock_intent()
Kent Overstreet [Sat, 24 Jul 2021 21:12:51 +0000 (17:12 -0400)]
bcachefs: bch2_btree_iter_relock_intent()

This adds a new helper for btree_cache.c that does what we want where
the iterator is still being traversed - and also eliminates some
unnecessary transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Use bch2_trans_do() in bch2_btree_key_cache_journal_flush()
Kent Overstreet [Fri, 23 Jul 2021 22:26:38 +0000 (18:26 -0400)]
bcachefs: Use bch2_trans_do() in bch2_btree_key_cache_journal_flush()

We're working to standardize handling of transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix a btree iterator leak
Kent Overstreet [Sun, 25 Jul 2021 00:20:02 +0000 (20:20 -0400)]
bcachefs: Fix a btree iterator leak

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Pretty-ify bch2_bkey_val_to_text()
Kent Overstreet [Wed, 21 Jul 2021 17:55:51 +0000 (13:55 -0400)]
bcachefs: Pretty-ify bch2_bkey_val_to_text()

Don't print out the ": " when there isn't a value to print.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Don't squash return code in check_dirents()
Kent Overstreet [Wed, 21 Jul 2021 17:23:50 +0000 (13:23 -0400)]
bcachefs: Don't squash return code in check_dirents()

We were squashing BCH_FSCK_ERRORS_NOT_FIXED.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Use bch2_inode_find_by_inum() in truncate
Kent Overstreet [Wed, 21 Jul 2021 01:18:16 +0000 (21:18 -0400)]
bcachefs: Use bch2_inode_find_by_inum() in truncate

This is needed for snapshots because we need to start handling lock
restarts even when just calling bch2_inode_peek().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Handle lock restarts in bch2_xattr_get()
Kent Overstreet [Wed, 21 Jul 2021 01:07:21 +0000 (21:07 -0400)]
bcachefs: Handle lock restarts in bch2_xattr_get()

Snapshots add another btree lookup, thus we need to handle lock
restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Don't downgrade in traverse()
Kent Overstreet [Wed, 21 Jul 2021 00:14:44 +0000 (20:14 -0400)]
bcachefs: Don't downgrade in traverse()

Downgrading of btree iterators is something that should only happen
explicitly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: BSET_OFFSET()
Kent Overstreet [Fri, 16 Jul 2021 16:57:27 +0000 (12:57 -0400)]
bcachefs: BSET_OFFSET()

Add a field to struct bset for the sector offset within the btree node
where it was written.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agoRevert "bcachefs: statfs bfree and bavail should be the same"
Kent Overstreet [Mon, 11 Sep 2023 03:35:02 +0000 (23:35 -0400)]
Revert "bcachefs: statfs bfree and bavail should be the same"

This reverts commit 664f9847bec525d396d62d2db094ca9020289ae0.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Update btree ptrs after every write
Kent Overstreet [Sat, 10 Jul 2021 17:44:42 +0000 (13:44 -0400)]
bcachefs: Update btree ptrs after every write

This closes a significant hole (and last known hole) in our ability to
verify metadata. Previously, since btree nodes are log structured, we
couldn't detect lost btree writes that weren't the first write to a
given node. Additionally, this seems to have led to some significant
metadata corruption on multi device filesystems with metadata
replication: since a write may have made it to one device and not
another, if we read that btree node back from the replica that did have
that write and started appending after that point, the other replica
would have a gap in the bset entries and reading from that replica
wouldn't find the rest of the bsets.

But, since updates to interior btree nodes are now journalled, we can
close this hole by updating pointers to btree nodes after every write
with the currently written number of sectors, without negatively
affecting performance. This means we will always detect lost or corrupt
metadata - it also means that our btree is now a curious hybrid of COW
and non COW btrees, with all the benefits of both (excluding
complexity).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Improve btree_bad_header() error message
Kent Overstreet [Thu, 15 Jul 2021 17:42:43 +0000 (13:42 -0400)]
bcachefs: Improve btree_bad_header() error message

We should always print out the full btree node ptr.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fixes for unit tests
Kent Overstreet [Thu, 15 Jul 2021 01:25:55 +0000 (21:25 -0400)]
bcachefs: Fixes for unit tests

The unit tests hadn't been updated for various recent btree changes -
this patch makes them work again.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix bch2_btree_iter_rewind()
Kent Overstreet [Thu, 15 Jul 2021 03:35:11 +0000 (23:35 -0400)]
bcachefs: Fix bch2_btree_iter_rewind()

We'd hit a BUG() when rewinding at the start of the btree on btrees with
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Improvements to fsck check_dirents()
Kent Overstreet [Thu, 15 Jul 2021 00:28:27 +0000 (20:28 -0400)]
bcachefs: Improvements to fsck check_dirents()

The fsck code handles transaction restarts in a very ad hoc way, and not
always correctly. This patch makes some improvements to check_dirents(),
but more work needs to be done to figure out how this kind of code
should be structured.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Tighten up btree_iter locking assertions
Kent Overstreet [Wed, 14 Jul 2021 19:13:27 +0000 (15:13 -0400)]
bcachefs: Tighten up btree_iter locking assertions

We weren't correctly verifying that we had interior node intent locks -
this patch also fixes bugs uncovered by the new assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix a memory leak in the dio write path
Kent Overstreet [Wed, 14 Jul 2021 04:14:45 +0000 (00:14 -0400)]
bcachefs: Fix a memory leak in the dio write path

There were some error paths where we were leaking page refs - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Add an option for whether inodes use the key cache
Kent Overstreet [Sun, 13 Jun 2021 21:07:18 +0000 (17:07 -0400)]
bcachefs: Add an option for whether inodes use the key cache

We probably don't ever want to flip this off in production, but it may
be useful for certain kinds of testing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix an allocator shutdown deadlock
Kent Overstreet [Tue, 13 Jul 2021 20:12:00 +0000 (16:12 -0400)]
bcachefs: Fix an allocator shutdown deadlock

On fstest generic/388, we were seeing sporadic deadlocks in the
emergency shutdown, where we'd get stuck shutting down the allocator
because bch2_btree_update_start() -> bch2_btree_reserve_get() allocated
and then deallocated some btree nodes, putting them back on the
btree_reserve_cache, after the allocator shutdown code had already
cleared out that cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Add safe versions of varint encode/decode
Kent Overstreet [Tue, 13 Jul 2021 20:03:51 +0000 (16:03 -0400)]
bcachefs: Add safe versions of varint encode/decode

This adds safe versions of bch2_varint_(encode|decode) that don't read
or write past the end of the buffer, or varint being encoded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
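
As an illustration of what "safe" means here, the sketch below shows
bounds-checked encode and decode routines that take an explicit end pointer
and refuse to step past it. It uses plain LEB128 as a stand-in encoding; that
is not the varint format bcachefs uses, and the sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bounds-checked LEB128 encode: writes v into [out, end) and returns the
 * number of bytes written, or -1 if the buffer is too small. Bytes before
 * the point of failure may already have been written.
 */
static int sketch_varint_encode_safe(uint8_t *out, const uint8_t *end,
				     uint64_t v)
{
	uint8_t *p = out;

	do {
		if (p >= end)
			return -1;
		uint8_t byte = v & 0x7f;
		v >>= 7;
		*p++ = byte | (v ? 0x80 : 0);
	} while (v);

	return (int)(p - out);
}

/*
 * Bounds-checked decode: reads from [in, end) and returns the number of
 * bytes consumed, or -1 on truncated or over-long input.
 */
static int sketch_varint_decode_safe(const uint8_t *in, const uint8_t *end,
				     uint64_t *v)
{
	const uint8_t *p = in;
	uint64_t ret = 0;
	unsigned shift = 0;

	for (;;) {
		if (p >= end || shift >= 64)
			return -1;
		uint8_t byte = *p++;
		ret |= (uint64_t)(byte & 0x7f) << shift;
		if (!(byte & 0x80))
			break;
		shift += 7;
	}

	*v = ret;
	return (int)(p - in);
}
```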
2 years agobcachefs: Add open_buckets to sysfs
Kent Overstreet [Tue, 13 Jul 2021 03:52:49 +0000 (23:52 -0400)]
bcachefs: Add open_buckets to sysfs

This is to help debug a rare shutdown deadlock in the allocator code -
the btree code is leaking open_buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Ensure bad d_type doesn't oops in bch2_dirent_to_text()
Kent Overstreet [Tue, 13 Jul 2021 03:17:15 +0000 (23:17 -0400)]
bcachefs: Ensure bad d_type doesn't oops in bch2_dirent_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Kick off btree node writes from write completions
Kent Overstreet [Sun, 11 Jul 2021 20:41:14 +0000 (16:41 -0400)]
bcachefs: Kick off btree node writes from write completions

This is a performance improvement by removing the need to wait for the
in flight btree write to complete before kicking one off, which is going
to be needed to avoid a performance regression with the upcoming patch
to update btree ptrs after every btree write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Mask out unknown compat features when going read-write
Kent Overstreet [Sun, 11 Jul 2021 17:54:07 +0000 (13:54 -0400)]
bcachefs: Mask out unknown compat features when going read-write

Compat features should be cleared if the filesystem was touched by a
version that doesn't support them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Really don't hold btree locks while btree IOs are in flight
Kent Overstreet [Sun, 11 Jul 2021 03:03:15 +0000 (23:03 -0400)]
bcachefs: Really don't hold btree locks while btree IOs are in flight

This is something we've attempted to stick to for quite some time, as it
helps guarantee filesystem latency - but there are a few remaining paths
that this patch fixes.

This is also necessary for an upcoming patch to update btree pointers
after every btree write - since the btree write completion path will now
be doing btree operations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Regularize argument passing of btree_trans
Kent Overstreet [Sun, 11 Jul 2021 03:22:06 +0000 (23:22 -0400)]
bcachefs: Regularize argument passing of btree_trans

btree_trans should always be passed when we have one - iter->trans is
disfavoured. This mainly updates old code in btree_update_interior.c,
some of which predates btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: docs: add docs for bch2_trans_reset
Dan Robertson [Thu, 8 Jul 2021 02:31:36 +0000 (22:31 -0400)]
bcachefs: docs: add docs for bch2_trans_reset

Add basic kernel docs for bch2_trans_reset and bch2_trans_begin.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: set disk state should check new_state
Dan Robertson [Thu, 8 Jul 2021 22:15:38 +0000 (18:15 -0400)]
bcachefs: set disk state should check new_state

A new device state that is not a valid state should return -EINVAL
in the disk set state ioctl.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
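
A minimal sketch of the validation described above, assuming the device states
form a dense enum with a trailing count value. The enum and
sketch_set_disk_state() are hypothetical and only model the range check, not
the ioctl itself.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device states; SKETCH_STATE_NR marks the end of the range. */
enum sketch_member_state {
	SKETCH_STATE_RW,
	SKETCH_STATE_RO,
	SKETCH_STATE_FAILED,
	SKETCH_STATE_SPARE,
	SKETCH_STATE_NR,
};

/* Reject out-of-range values before they reach any state-change logic. */
static int sketch_set_disk_state(unsigned *cur_state, unsigned new_state)
{
	if (new_state >= SKETCH_STATE_NR)
		return -EINVAL;

	*cur_state = new_state;
	return 0;
}
```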
2 years agobcachefs: BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE
Kent Overstreet [Tue, 6 Jul 2021 02:16:02 +0000 (22:16 -0400)]
bcachefs: BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE

Add a new flag to control assertions about updating to internal snapshot
nodes, that normally should not be written to - to be used in an
upcoming patch.

Also do some renaming - trigger_flags is now update_flags.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: bch2_d_types[]
Kent Overstreet [Tue, 6 Jul 2021 02:18:07 +0000 (22:18 -0400)]
bcachefs: bch2_d_types[]

Add readable names for d_type, and use it in dirent_to_text().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix bch2_btree_iter_peek_slot() assertion
Kent Overstreet [Tue, 6 Jul 2021 02:08:28 +0000 (22:08 -0400)]
bcachefs: Fix bch2_btree_iter_peek_slot() assertion

This assertion is checking that what the iterator points to is
consistent with iter->real_pos, and since it's an internal btree
ordering property it should be using bpos_cmp.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Split out SPOS_MAX
Kent Overstreet [Tue, 6 Jul 2021 02:02:07 +0000 (22:02 -0400)]
bcachefs: Split out SPOS_MAX

Internal btree code really wants a POS_MAX with all fields ~0; external
code more likely wants the snapshot field to be 0, because when we're
passing it to bch2_trans_get_iter() it's used for the snapshot we're
operating in, which should be 0 for most btrees that don't use
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
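
The distinction between the two maxima can be sketched as follows. The struct
is a simplified, hypothetical stand-in for struct bpos; only the idea that one
maximum leaves the snapshot field at 0 and the other sets every field to ~0 is
taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified position; field names follow struct bpos, layout is a sketch. */
struct sketch_pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* Sketch of the two maxima; the names mirror POS_MAX and SPOS_MAX. */
static const struct sketch_pos SKETCH_POS_MAX  = { UINT64_MAX, UINT64_MAX, 0 };
static const struct sketch_pos SKETCH_SPOS_MAX = { UINT64_MAX, UINT64_MAX, UINT32_MAX };

/* Returns nonzero when every field, snapshot included, is at its maximum. */
static int sketch_is_btree_max(struct sketch_pos p)
{
	return p.inode == UINT64_MAX && p.offset == UINT64_MAX &&
	       p.snapshot == UINT32_MAX;
}
```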
2 years agobcachefs: add bcachefs xxhash support
jpsollie [Thu, 17 Jun 2021 11:42:09 +0000 (13:42 +0200)]
bcachefs: add bcachefs xxhash support

xxhash is a much faster algorithm than crc32 and could be used to speed
up checksum calculation. This adds the 64-bit variant only, as it is much
faster than xxh32 on 64-bit CPUs.

Signed-off-by: jpsollie <janpieter.sollie@edpnet.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Prepare checksums for more advanced algorithms
jpsollie [Thu, 17 Jun 2021 09:29:59 +0000 (11:29 +0200)]
bcachefs: Prepare checksums for more advanced algorithms

Abstract the hash calculation to prepare for advanced checksum algorithms.
Algorithms like xxhash do not store their state as a u64.

Signed-off-by: jpsollie <janpieter.sollie@edpnet.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Enforce SYS_CAP_ADMIN within ioctls
Tobias Geerinckx-Rice [Sun, 4 Jul 2021 19:35:32 +0000 (21:35 +0200)]
bcachefs: Enforce SYS_CAP_ADMIN within ioctls

bch2_fs_ioctl() didn't distinguish between unsupported ioctls and those
which the current user is unauthorised to perform.  That kept the code
simple but meant that, for example, an unprivileged TIOCGWINSZ ioctl on
a bcachefs file would return -EPERM instead of the expected -ENOTTY.
The same call made by a privileged user would correctly return -ENOTTY.

Fix this discrepancy by moving the check for CAP_SYS_ADMIN into each
privileged ioctl function.

Signed-off-by: Tobias Geerinckx-Rice <me@tobias.gr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
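
The shape of this fix can be sketched in userspace. The command numbers and
sketch_* functions are hypothetical, and the boolean argument stands in for a
capable(CAP_SYS_ADMIN) check; the point is only that the permission check sits
inside each privileged handler, so unknown commands fall through to -ENOTTY
regardless of privilege.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command numbers for the sketch. */
enum { SKETCH_IOCTL_QUERY = 1, SKETCH_IOCTL_DISK_ADD = 2 };

/* Privileged handler: the capability check lives inside the handler. */
static long sketch_ioctl_disk_add(bool capable_sys_admin)
{
	if (!capable_sys_admin)
		return -EPERM;
	return 0;
}

/* Unprivileged handler: no check needed. */
static long sketch_ioctl_query(void)
{
	return 0;
}

static long sketch_fs_ioctl(unsigned cmd, bool capable_sys_admin)
{
	switch (cmd) {
	case SKETCH_IOCTL_QUERY:
		return sketch_ioctl_query();
	case SKETCH_IOCTL_DISK_ADD:
		return sketch_ioctl_disk_add(capable_sys_admin);
	default:
		/* Unknown command: -ENOTTY for everyone, privileged or not. */
		return -ENOTTY;
	}
}
```

A single check at the top of the dispatcher is shorter, but it answers "not
permitted" before it knows whether the command exists at all.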
2 years agobcachefs: Fix bch2_btree_iter_peek_prev()
Kent Overstreet [Sun, 4 Jul 2021 03:57:09 +0000 (23:57 -0400)]
bcachefs: Fix bch2_btree_iter_peek_prev()

In !BTREE_ITER_IS_EXTENTS mode, we shouldn't be looking at k->size, i.e.
we shouldn't use bkey_start_pos().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix bch2_acl_chmod() cleanup on error
Dan Robertson [Thu, 24 Jun 2021 01:52:41 +0000 (21:52 -0400)]
bcachefs: Fix bch2_acl_chmod() cleanup on error

Avoid calling kfree on the returned error pointer if
bch2_acl_from_disk fails.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
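
The class of bug fixed here can be sketched with userspace stand-ins for the
kernel's error-pointer helpers. An error pointer is an errno value encoded in
the pointer itself, so it must never be passed to the allocator's free
routine. All sketch_* names are hypothetical, and malloc()/free() stand in for
the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical parser: returns an allocation or an encoded error. */
static void *sketch_acl_from_disk(int fail)
{
	if (fail)
		return sketch_err_ptr(-EINVAL);
	return malloc(16);
}

static long sketch_acl_chmod(int fail)
{
	void *acl = sketch_acl_from_disk(fail);

	if (sketch_is_err(acl))
		return sketch_ptr_err(acl);	/* nothing to free on this path */

	/* ... use acl ... */
	free(acl);
	return 0;
}
```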
2 years agobcachefs: statfs bfree and bavail should be the same
Dan Robertson [Wed, 23 Jun 2021 23:25:00 +0000 (19:25 -0400)]
bcachefs: statfs bfree and bavail should be the same

The value of f_bfree and f_bavail should be the same. The value of
f_bfree is not currently scaled by the availability factor.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix shift-by-64 in bch2_bkey_format_validate()
Kent Overstreet [Thu, 24 Jun 2021 17:19:25 +0000 (13:19 -0400)]
bcachefs: Fix shift-by-64 in bch2_bkey_format_validate()

We need to ensure that packed formats can't represent fields larger than
the unpacked format, which is a bit tricky since the calculations can
also overflow a u64. This patch fixes a shift and simplifies the overall
calculations.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
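
The hazard named in the title is that shifting a 64-bit value by 64 is
undefined behaviour in C, so the obvious (1ULL << bits) - 1 breaks for a
full-width field. The sketch below shows one way to compute a field's maximum
without that shift and to check for u64 overflow. The helper names and the
idea of adding a per-field offset are assumptions made for illustration, not
the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest value representable in `bits` bits. Shifting right by
 * (64 - bits) avoids a shift by 64 for full-width fields; the zero-width
 * case is handled separately for the same reason.
 */
static uint64_t sketch_field_max(unsigned bits)
{
	assert(bits <= 64);
	if (bits == 0)
		return 0;
	return ~0ULL >> (64 - bits);
}

/* Nonzero if field_offset plus the largest packed value fits in a u64. */
static int sketch_field_fits(unsigned bits, uint64_t field_offset)
{
	return field_offset <= UINT64_MAX - sketch_field_max(bits);
}
```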
2 years agobcachefs: fix truncate without a size change
Dan Robertson [Mon, 28 Jun 2021 00:54:34 +0000 (20:54 -0400)]
bcachefs: fix truncate without a size change

Do not attempt to shortcut a truncate when the given new size is
the same as the current size. There may be blocks allocated to the
file that extend beyond the i_size. The ctime and mtime should
not be updated in this case.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: fix ifdef for x86_64 asm
Dan Robertson [Sat, 3 Jul 2021 01:22:06 +0000 (21:22 -0400)]
bcachefs: fix ifdef for x86_64 asm

The implementation of prefetch_four_cachelines should use ifdef
CONFIG_X86_64 to conditionally compile x86_64 asm.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
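
A userspace sketch of the pattern: architecture-specific inline asm guarded by
a preprocessor check, with a portable fallback. Here the compiler's __x86_64__
macro stands in for CONFIG_X86_64 and __builtin_prefetch() (GCC/Clang) stands
in for the kernel's prefetch(); the function name, the 64-byte line size and
the offsets are assumptions, not the kernel implementation.

```c
#include <assert.h>

/*
 * Prefetch four 64-byte cache lines starting at p. The asm branch is only
 * compiled on x86_64; every other target takes the builtin fallback.
 */
static inline void sketch_prefetch_four_cachelines(const void *p)
{
#ifdef __x86_64__
	__asm__ volatile("prefetcht0 0(%0);"
			 "prefetcht0 64(%0);"
			 "prefetcht0 128(%0);"
			 "prefetcht0 192(%0);"
			 :
			 : "r" (p));
#else
	const char *c = p;

	__builtin_prefetch(c);
	__builtin_prefetch(c + 64);
	__builtin_prefetch(c + 128);
	__builtin_prefetch(c + 192);
#endif
}
```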
2 years agobcachefs: ensure iter->should_be_locked is set
Dan Robertson [Tue, 29 Jun 2021 22:52:13 +0000 (18:52 -0400)]
bcachefs: ensure iter->should_be_locked is set

Ensure that iter->should_be_locked is set to true before we
call bch2_trans_update in __bch2_dev_usrdata_drop.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix unused variable warning when !BCACHEFS_DEBUG
Christopher James Halse Rogers [Fri, 25 Jun 2021 01:45:19 +0000 (11:45 +1000)]
bcachefs: Fix unused variable warning when !BCACHEFS_DEBUG

Signed-off-by: Christopher James Halse Rogers <raof@ubuntu.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Use memalloc_nofs_save() in bch2_read_endio()
Kent Overstreet [Wed, 30 Jun 2021 19:44:11 +0000 (15:44 -0400)]
bcachefs: Use memalloc_nofs_save() in bch2_read_endio()

This solves a problematic memory allocation in bch2_bio_uncompress() ->
vmap().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix btree_node_read_all_replicas() error handling
Kent Overstreet [Wed, 23 Jun 2021 01:51:17 +0000 (21:51 -0400)]
bcachefs: Fix btree_node_read_all_replicas() error handling

We weren't checking bch2_btree_node_read_done() for errors, oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Don't loop into topology repair
Kent Overstreet [Wed, 23 Jun 2021 00:44:54 +0000 (20:44 -0400)]
bcachefs: Don't loop into topology repair

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Don't ratelimit certain fsck errors
Kent Overstreet [Mon, 21 Jun 2021 20:28:43 +0000 (16:28 -0400)]
bcachefs: Don't ratelimit certain fsck errors

It's unhelpful if we see "Halting mark and sweep to start topology
repair" but we don't see the error that triggered it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: ensure iter->should_be_locked is set
Dan Robertson [Thu, 17 Jun 2021 03:21:23 +0000 (23:21 -0400)]
bcachefs: ensure iter->should_be_locked is set

Ensure that iter->should_be_locked value is set to true before we
call bch2_trans_update in ec_stripe_update_ptrs.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Don't disable preemption unnecessarily
Kent Overstreet [Fri, 11 Jun 2021 03:34:02 +0000 (23:34 -0400)]
bcachefs: Don't disable preemption unnecessarily

Small improvements to some percpu utility code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Extensive triggers cleanups
Kent Overstreet [Fri, 11 Jun 2021 01:44:27 +0000 (21:44 -0400)]
bcachefs: Extensive triggers cleanups

 - We no longer mark subsets of extents, they're marked like regular
   keys now - which means we can drop the offset & sectors arguments
   to trigger functions
 - Drop other arguments that are no longer needed in various
   places - fs_usage
 - Drop the logic for handling extents in bch2_mark_update() that isn't
   needed anymore, to match bch2_trans_mark_update()
 - Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
   don't have an old key to mark

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: fix truncate with ATTR_MODE
Kent Overstreet [Tue, 15 Jun 2021 02:29:54 +0000 (22:29 -0400)]
bcachefs: fix truncate with ATTR_MODE

After the v5.12 rebase, we started oopsing when truncate was passed
ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This
refactors things so that truncate/extend finish by using
bch2_setattr_nonsize(), which solves the problem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Improve iter->should_be_locked
Kent Overstreet [Mon, 14 Jun 2021 22:16:10 +0000 (18:16 -0400)]
bcachefs: Improve iter->should_be_locked

Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.

This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Kill __btree_delete_at()
Kent Overstreet [Mon, 14 Jun 2021 20:35:03 +0000 (16:35 -0400)]
bcachefs: Kill __btree_delete_at()

With trans->updates2 gone, we can now drop this helper and use
bch2_btree_delete_at() instead.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Make sure bch2_trans_mark_update uses correct iter flags
Kent Overstreet [Mon, 14 Jun 2021 20:32:44 +0000 (16:32 -0400)]
bcachefs: Make sure bch2_trans_mark_update uses correct iter flags

Now that bch2_btree_iter_peek_with_updates() has been removed in favor
of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we
don't want it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix a memory leak in dio write path
Kent Overstreet [Mon, 14 Jun 2021 18:47:26 +0000 (14:47 -0400)]
bcachefs: Fix a memory leak in dio write path

Commit c42bca92be928ce7dece5fc04cf68d0e37ee6718 "bio: don't copy bvec
for direct IO" changed bio_iov_iter_get_pages() to point bio->bi_iovec
at the incoming biovec, meaning if we already allocated one, it'll be
leaked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: fix a possible bcachefs checksum mapping error opt-checksum enum to type...
Janpieter Sollie [Sun, 13 Jun 2021 20:01:08 +0000 (22:01 +0200)]
bcachefs: fix a possible bcachefs checksum mapping error opt-checksum enum to type-checksum enum

This fixes some rare cases where the metadata checksum option specified
may map to the wrong actual checksum type.

Signed-off-by: Janpieter Sollie <janpieter.sollie@edpnet.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Clear iter->should_be_locked in bch2_trans_reset
Kent Overstreet [Sun, 13 Jun 2021 02:33:53 +0000 (22:33 -0400)]
bcachefs: Clear iter->should_be_locked in bch2_trans_reset

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Don't underflow c->sectors_available
Kent Overstreet [Fri, 11 Jun 2021 03:33:27 +0000 (23:33 -0400)]
bcachefs: Don't underflow c->sectors_available

This rarely used error path should've been checking for underflow -
oops.
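
For illustration only, a minimal sketch of an underflow-safe subtraction
on an unsigned sector count. The helper name is hypothetical and this is
not the code from the patch; it only shows the kind of check the error
path was missing:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a bcachefs function): subtract @sectors from
 * @available, clamping at zero instead of wrapping around to a huge
 * unsigned value.
 */
static uint64_t sectors_sub_saturating(uint64_t available, uint64_t sectors)
{
	return sectors > available ? 0 : available - sectors;
}
```

With this, sectors_sub_saturating(3, 10) yields 0 rather than wrapping.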

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Kill bch2_btree_iter_peek_cached()
Kent Overstreet [Fri, 11 Jun 2021 00:15:50 +0000 (20:15 -0400)]
bcachefs: Kill bch2_btree_iter_peek_cached()

It's now been rolled into bch2_btree_iter_peek_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Allow shorter JSET_ENTRY_dev_usage entries
Kent Overstreet [Sat, 12 Jun 2021 21:20:02 +0000 (17:20 -0400)]
bcachefs: Allow shorter JSET_ENTRY_dev_usage entries

If the last entry(ies) would be all zeros, there's no need to write them
out - the read path already handles that.
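
A rough sketch of the idea, assuming the per-type counters are a flat
array of u64s. The function name and array layout are hypothetical and
do not reflect the actual journal entry format:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return how many leading entries of @v need to be
 * written out, dropping any run of all-zero entries at the end. A reader
 * that zero-fills missing entries reconstructs the same array.
 */
static size_t nr_entries_to_write(const uint64_t *v, size_t nr)
{
	while (nr && !v[nr - 1])
		nr--;
	return nr;
}
```

Zeros in the middle of the array are still written; only the trailing
run is dropped.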

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: mount: fix null deref with null devname
Dan Robertson [Thu, 10 Jun 2021 11:52:42 +0000 (07:52 -0400)]
bcachefs: mount: fix null deref with null devname

 - Fix null deref on mount when given a null device name.
 - Move the dev_name checks to return EINVAL when it is invalid.
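
A minimal sketch of an up-front check of this kind. The function name is
hypothetical, and treating an empty string as invalid is an assumption,
not something stated above:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical validation helper: reject a NULL device name before
 * anything dereferences it. Rejecting "" as well is an assumption.
 */
static int check_dev_name(const char *dev_name)
{
	if (!dev_name || !*dev_name)
		return -EINVAL;
	return 0;
}
```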

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 years agobcachefs: Fix null ptr deref when splitting compressed extents
Kent Overstreet [Sat, 12 Jun 2021 19:45:56 +0000 (15:45 -0400)]
bcachefs: Fix null ptr deref when splitting compressed extents

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Fix overflow in journal_replay_entry_early
Kent Overstreet [Fri, 11 Jun 2021 03:51:09 +0000 (23:51 -0400)]
bcachefs: Fix overflow in journal_replay_entry_early

If the filesystem on disk was used by a version with a larger BCH_DATA_NR
than the currently running version, we don't want this to cause a buffer
overrun.
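
As a sketch of the general technique (not the patch itself): clamp the
entry count read from disk to the size of the in-memory array before
copying. DATA_NR, struct dev_usage and copy_usage() below are
hypothetical stand-ins:

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_NR 4	/* stand-in for the running version's BCH_DATA_NR */

struct dev_usage {
	uint64_t sectors[DATA_NR];
};

/*
 * Copy at most DATA_NR entries from @on_disk; entries past DATA_NR are
 * ignored, and entries missing on disk are zeroed. Returns the number
 * of entries actually copied.
 */
static size_t copy_usage(struct dev_usage *dst, const uint64_t *on_disk,
			 size_t on_disk_nr)
{
	size_t nr = on_disk_nr < DATA_NR ? on_disk_nr : DATA_NR;
	size_t i;

	for (i = 0; i < DATA_NR; i++)
		dst->sectors[i] = i < nr ? on_disk[i] : 0;
	return nr;
}
```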

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Always zero memory from bch2_trans_kmalloc()
Kent Overstreet [Mon, 7 Jun 2021 20:50:30 +0000 (16:50 -0400)]
bcachefs: Always zero memory from bch2_trans_kmalloc()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Merging for indirect extents
Kent Overstreet [Sat, 15 May 2021 19:04:08 +0000 (15:04 -0400)]
bcachefs: Merging for indirect extents

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Improved extent merging
Kent Overstreet [Sat, 15 May 2021 04:37:37 +0000 (00:37 -0400)]
bcachefs: Improved extent merging

Previously, checksummed extents could only be merged when the checksum
covered only the currently live data.

xfstest generic/064 creates a test file, then uses finsert calls to
split the extent, then collapse calls to see if they get merged. But
without any reads to trigger the narrow_crcs path, each of the split
extents will still have a checksum for the entire original extent.

This patch improves the extent merge path so that if either of the
extents we're attempting to merge has a checksum that covers the entire
merged extent, we just use that checksum.
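
A simplified sketch of that decision, assuming both extents point into
the same checksummed region on disk and that sizes are in sectors. The
struct, enum and function names are hypothetical, not the bcachefs
definitions:

```c
/* Which existing checksum, if any, can be reused for the merged extent. */
enum crc_choice { CRC_NONE, CRC_LEFT, CRC_RIGHT };

struct crc_range {
	unsigned offset;		/* start of live data in the checksummed region */
	unsigned live_size;		/* sectors of live data */
	unsigned uncompressed_size;	/* sectors covered by the checksum */
};

static enum crc_choice pick_merged_crc(struct crc_range l, struct crc_range r)
{
	/* Left checksum extends far enough to cover the right extent's data. */
	if (l.offset + l.live_size + r.live_size <= l.uncompressed_size)
		return CRC_LEFT;
	/* Right checksum starts early enough to cover the left extent's data. */
	if (l.live_size <= r.offset)
		return CRC_RIGHT;
	return CRC_NONE;
}
```

When neither checksum covers the merged range, the sketch reports
CRC_NONE and leaves any fallback to the caller.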

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2 years agobcachefs: Re-implement extent merging in transaction commit path
Kent Overstreet [Thu, 29 Apr 2021 03:52:19 +0000 (23:52 -0400)]
bcachefs: Re-implement extent merging in transaction commit path

We haven't had extent merging in quite some time. It used to be done by
the btree code when sorting btree nodes, but that was eliminated as part
of the work to separate extent handling from core btree code.

This patch re-implements extent merging in the transaction commit path.
We don't currently have the ability to merge reflink pointers; we need
to do some work on the triggers code to be able to do that without
ending up with incorrect refcounts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>