Daniel Hill [Fri, 30 Sep 2022 03:37:15 +0000 (16:37 +1300)]
bcachefs: add counters for failed shrinker reclaim
This adds distinct counters for every reason the btree node shrinker can
fail to free an object - if our shrinker isn't making progress, this
will tell us why.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
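A minimal userspace sketch of the per-reason counter idea follows. The enum values, struct names and try_reclaim() are hypothetical and only illustrate the pattern; they are not the counters or code paths added to bcachefs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reasons a shrinker scan might fail to free an object. */
enum skip_reason {
	SKIP_DIRTY,
	SKIP_LOCKED,
	SKIP_ACCESSED,
	SKIP_NR,
};

struct shrink_stats {
	unsigned long freed;
	unsigned long skipped[SKIP_NR];	/* one counter per failure reason */
};

struct object {
	int dirty, locked, accessed;
};

/* Try to reclaim one object; on failure, record why. Returns 1 if freed. */
static int try_reclaim(struct shrink_stats *s, const struct object *o)
{
	if (o->dirty) {
		s->skipped[SKIP_DIRTY]++;
		return 0;
	}
	if (o->locked) {
		s->skipped[SKIP_LOCKED]++;
		return 0;
	}
	if (o->accessed) {
		s->skipped[SKIP_ACCESSED]++;
		return 0;
	}
	s->freed++;
	return 1;
}
```

If freed stays at zero across scans, the skipped[] array shows which check is rejecting objects.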
Kent Overstreet [Mon, 6 May 2024 13:16:33 +0000 (09:16 -0400)]
bcachefs: Fix sb_field_downgrade validation
- bch2_sb_downgrade_validate() wasn't checking for a downgrade entry
extending past the end of the superblock section
- for_each_downgrade_entry() is used in to_text() and needs to work on
malformed input; it also was missing a check for a field extending
past the end of the section
Reported-by: syzbot+e49ccab73449180bc9be@syzkaller.appspotmail.com
Fixes: 84f1638795da ("bcachefs: bch_sb_field_downgrade")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
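The two missing checks can be illustrated with a small userspace sketch. The entry layout below (a fixed header followed by nr_errors 16-bit values) is a simplified assumption, not the real on-disk format, and count_entries() is a hypothetical helper rather than the bcachefs iterator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry header; the payload follows it directly. */
struct entry_hdr {
	uint16_t version;
	uint16_t nr_errors;
};

/*
 * Walk entries packed in buf[0..len). Returns the number of well-formed
 * entries, or -1 if a header or its payload extends past the end.
 */
static int count_entries(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int nr = 0;

	while (off < len) {
		struct entry_hdr h;

		/* The header must fit before nr_errors may be read. */
		if (len - off < sizeof(h))
			return -1;
		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);

		/* The variable-length payload must fit as well. */
		size_t payload = (size_t) h.nr_errors * sizeof(uint16_t);
		if (len - off < payload)
			return -1;
		off += payload;
		nr++;
	}
	return nr;
}
```

Both checks compare against the remaining length (len - off), so a malformed nr_errors cannot push the cursor past the end of the section.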
Petr Vorel [Tue, 7 May 2024 15:37:57 +0000 (17:37 +0200)]
bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h
Move the BCACHEFS_STATFS_MAGIC value to UAPI <linux/magic.h> as the
BCACHEFS_SUPER_MAGIC definition (following the common naming approach) and
reuse that definition for BCACHEFS_STATFS_MAGIC in bcachefs_format.h.
There are other bcachefs magic definitions: BCACHE_MAGIC, BCHFS_MAGIC,
which use UUID_INIT() and are used only in libbcachefs. Therefore move
only BCACHEFS_STATFS_MAGIC value, which can be used outside of
libbcachefs for f_type field in struct statfs in statfs() or fstatfs().
Suggested-by: Su Yue <glass.su@suse.com>
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Acked-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
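As a sketch of the userspace side, the helper below compares the f_type reported by statfs() against a magic value supplied by the caller. path_has_fs_magic() is a hypothetical name; the magic is passed as a parameter because older installed headers may not define BCACHEFS_SUPER_MAGIC yet.

```c
#include <assert.h>
#include <sys/vfs.h>

/*
 * Returns 1 if path lives on a filesystem whose statfs f_type equals magic,
 * 0 if it does not, and -1 if statfs() fails.
 */
static int path_has_fs_magic(const char *path, unsigned long magic)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return -1;
	/* f_type is a signed word on Linux; compare as unsigned. */
	return (unsigned long) st.f_type == magic;
}
```

With headers that carry the new definition, a caller would include <linux/magic.h> and pass BCACHEFS_SUPER_MAGIC as the second argument.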
Kent Overstreet [Sat, 20 Apr 2024 20:25:34 +0000 (16:25 -0400)]
bcachefs: Allocator prefers not to expand mi.btree_allocated bitmap
We now have a small bitmap in the member info section of the superblock
for "regions that have btree nodes", so that if we ever have to scan for
btree nodes in repair we don't have to scan the whole device(s).
This tweaks the allocator to prefer allocating from regions that are
already marked in this bitmap.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
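A rough model of the preference, assuming the device is split into 64 equal regions tracked by one 64-bit mask. struct dev_model, bucket_region() and pick_btree_bucket() are hypothetical names for illustration and do not mirror the real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_REGIONS 64

/* Hypothetical model: bit i is set when region i already holds btree nodes. */
struct dev_model {
	uint64_t nr_buckets;
	uint64_t btree_allocated;
};

static unsigned bucket_region(const struct dev_model *d, uint64_t bucket)
{
	assert(bucket < d->nr_buckets);
	return (unsigned) (bucket * NR_REGIONS / d->nr_buckets);
}

/*
 * Pick a bucket from a list of free candidates. First pass: accept only
 * buckets in regions that are already marked. Fallback: take the first
 * candidate and mark its region. Returns -1 if there are no candidates.
 */
static int64_t pick_btree_bucket(struct dev_model *d,
				 const uint64_t *free_buckets, size_t nr_free)
{
	for (size_t i = 0; i < nr_free; i++)
		if (d->btree_allocated &
		    (1ULL << bucket_region(d, free_buckets[i])))
			return (int64_t) free_buckets[i];

	if (!nr_free)
		return -1;

	d->btree_allocated |= 1ULL << bucket_region(d, free_buckets[0]);
	return (int64_t) free_buckets[0];
}
```

The bitmap only grows in the fallback path, which keeps the set of regions a repair scan has to cover as small as the free space allows.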
Kent Overstreet [Wed, 1 May 2024 22:56:40 +0000 (18:56 -0400)]
bcachefs: Move nocow unlock to bch2_write_endio()
This fixes a lifetime issue; bch2_nocow_write_unlock() uses
PTR_BUCKET_POS(), which needs the device - but we drop our ref to the
device in bch2_write_endio().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 1 May 2024 07:59:45 +0000 (03:59 -0400)]
bcachefs: bch2_dev_have_ref()
bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that
we're looking up a device without checking if it's present because we
have a reference to it already.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 1 May 2024 00:32:44 +0000 (20:32 -0400)]
bcachefs: move replica_set from bch_dev to bch_fs
This is needed for the next patch - the write submit path has to be able
to allocate a replica bio even when we weren't able to get a ref on the
device.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 30 Apr 2024 19:30:35 +0000 (15:30 -0400)]
bcachefs: bch2_dev_tryget()
Most uses of bch2_dev_bkey_exists() are going away, where we assume that
because a key references a device the device must exist - instead, we'll
be explicitly checking if the device exists and getting a reference to
it.
This adds the new helpers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
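The check-and-get pattern can be sketched in userspace with C11 atomics. This is an illustration only: struct dev, struct fs_model, ref_tryget() and model_dev_tryget() are hypothetical, and the kernel helpers may use different refcounting primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_DEVS 4

struct dev {
	atomic_int ref;		/* 0 means the device is going away */
};

struct fs_model {
	struct dev *devs[NR_DEVS];
};

/* Take a reference only if the count has not already dropped to zero. */
static bool ref_tryget(atomic_int *ref)
{
	int old = atomic_load(ref);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

static void ref_put(atomic_int *ref)
{
	atomic_fetch_sub(ref, 1);
}

/*
 * Look up a device by index without assuming it exists. Returns NULL if
 * the index is out of range, the slot is empty, or the device is dying.
 */
static struct dev *model_dev_tryget(struct fs_model *fs, unsigned idx)
{
	struct dev *d = idx < NR_DEVS ? fs->devs[idx] : NULL;

	return d && ref_tryget(&d->ref) ? d : NULL;
}
```

Callers handle a NULL return as "device not present" and pair every successful lookup with ref_put().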
Kent Overstreet [Fri, 3 May 2024 18:43:54 +0000 (14:43 -0400)]
closures: closure_sync_timeout()
Add closure_sync_timeout(), a new variant of closure_sync() that takes a
timeout.
Note that when this returns -ETIME the closure will still be waiting on
something, i.e. it's not safe to return if you've got a stack allocated
closure.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The vfs[1] documentation describes free_inode as follows:
```
free_inode
this method is called from RCU callback. If you use call_rcu()
in ->destroy_inode to free ‘struct inode’ memory, then it’s
better to release memory in this method.
```
free_inode will be called by the RCU callback, so it might be better
to move the inode free operation from destroy_inode to free_inode.
Similar to commit ae6b47b5653e ("fs/ntfs3: Change destroy_inode to
free_inode").
Kent Overstreet [Fri, 26 Apr 2024 04:32:56 +0000 (00:32 -0400)]
bcachefs: bch_member.last_journal_bucket
On recovery from clean shutdown we don't typically read the journal, but
we still want to avoid overwriting existing entries in the journal for
list_journal debugging.
Thus, add some fields to the member info section so we can remember
where we left off.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Hongbo Li [Fri, 26 Apr 2024 03:21:35 +0000 (11:21 +0800)]
bcachefs: eliminate the uninitialized compilation warning in bch2_reconstruct_snapshots
When compiling bcachefs-tools, the following compilation warning
is reported:
libbcachefs/snapshot.c: In function ‘bch2_reconstruct_snapshots’:
libbcachefs/snapshot.c:915:19: warning: ‘tree_id’ may be used uninitialized in this function [-Wmaybe-uninitialized]
915 | snapshot->v.tree = cpu_to_le32(tree_id);
libbcachefs/snapshot.c:903:6: note: ‘tree_id’ was declared here
903 | u32 tree_id;
| ^~~~~~~
This is a false positive, because @tree_id is set by
bch2_snapshot_tree_create() whenever it returns 0, and if that function
returns any other value, @tree_id is not used. Thus the logic is
correct.
Although the report itself is a false positive, we can still make the
code more explicit by setting the initial value of @tree_id to 0 (an
invalid tree ID).
Fixes: a292be3b68f3 ("bcachefs: Reconstruct missing snapshot nodes")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
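The shape of the code that triggers the warning, and the fix, can be sketched as follows. create_tree() and reconstruct() are hypothetical stand-ins; only the pattern (an out parameter written on success, then used after an error check) reflects the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes *id only when it succeeds (returns 0). */
static int create_tree(int fail, uint32_t *id)
{
	if (fail)
		return -1;
	*id = 42;
	return 0;
}

static int reconstruct(int fail, uint32_t *out)
{
	/*
	 * The compiler cannot always prove that the callee wrote tree_id on
	 * every path that reaches the use below. Initializing it to 0
	 * (assumed here to be an invalid ID) makes that explicit.
	 */
	uint32_t tree_id = 0;
	int ret = create_tree(fail, &tree_id);

	if (ret)
		return ret;

	*out = tree_id;
	return 0;
}
```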
bcachefs: Fix format specifiers in bch2_btree_key_cache_to_text()
When building for a 32-bit target, for which 'size_t' is 'unsigned int',
there are two warnings around mismatched format specifiers and argument
types:
In file included from fs/bcachefs/vstructs.h:5,
from fs/bcachefs/bcachefs_format.h:79,
from fs/bcachefs/bcachefs.h:207,
from fs/bcachefs/btree_key_cache.c:3:
fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text':
fs/bcachefs/btree_key_cache.c:1046:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
1046 | prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~
| |
| size_t {aka unsigned int}
fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf'
192 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__)
| ^~~~~~~~~~~
fs/bcachefs/btree_key_cache.c:1046:47: note: format string is defined here
1046 | prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu);
| ~~^
| |
| long unsigned int
| %u
fs/bcachefs/btree_key_cache.c:1047:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
1047 | prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu);
| ^~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
| |
| size_t {aka unsigned int}
fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf'
192 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__)
| ^~~~~~~~~~~
fs/bcachefs/btree_key_cache.c:1047:44: note: format string is defined here
1047 | prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu);
| ~~^
| |
| long unsigned int
| %u
cc1: all warnings being treated as errors
Use the proper 'size_t' specifier, '%zu', to clear up the warnings for
these platforms.
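A small userspace illustration of the specifier: '%zu' matches 'size_t' whether it is 'unsigned int' or 'unsigned long' on the target. format_count() is a hypothetical helper built on standard snprintf(), not the bcachefs printbuf API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format "label:\t<count>\n" using the size_t-specific conversion. */
static int format_count(char *buf, size_t len, const char *label, size_t n)
{
	return snprintf(buf, len, "%s:\t%zu\n", label, n);
}
```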
bcachefs: Fix type of flags parameter for some ->trigger() implementations
When building with clang's -Wincompatible-function-pointer-types-strict
(a warning designed to catch potential kCFI failures at build time),
there are several warnings along the lines of:
fs/bcachefs/bkey_methods.c:118:2: error: incompatible function pointer types initializing 'int (*)(struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, enum btree_iter_update_trigger_flags)' with an expression of type 'int (struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, unsigned int)' [-Werror,-Wincompatible-function-pointer-types-strict]
118 | BCH_BKEY_TYPES()
| ^~~~~~~~~~~~~~~~
fs/bcachefs/bcachefs_format.h:394:2: note: expanded from macro 'BCH_BKEY_TYPES'
394 | x(inode, 8) \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
fs/bcachefs/bkey_methods.c:117:41: note: expanded from macro 'x'
117 | #define x(name, nr) [KEY_TYPE_##name] = bch2_bkey_ops_##name,
| ^~~~~~~~~~~~~~~~~~~~
<scratch space>:277:1: note: expanded from here
277 | bch2_bkey_ops_inode
| ^~~~~~~~~~~~~~~~~~~
fs/bcachefs/inode.h:26:13: note: expanded from macro 'bch2_bkey_ops_inode'
26 | .trigger = bch2_trigger_inode, \
| ^~~~~~~~~~~~~~~~~~
There are several functions that did not have their flags parameter
converted to 'enum btree_iter_update_trigger_flags' in the recent
unification, which will cause kCFI failures at runtime because the
types, while ABI compatible (hence no warning from the non-strict
version of this warning), do not match exactly.
Fix up these functions (as well as a few other obvious functions that
should have it, even if there are no warnings currently) to resolve the
warnings and potential kCFI runtime failures.
Fixes: 31e4ef3280c8 ("bcachefs: iter/update/trigger/str_hash flag cleanup")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
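A reduced sketch of the requirement: every function stored in the ops table declares its flags parameter with the same enum type as the function pointer, rather than 'unsigned int'. The enum, typedef and trigger functions below are hypothetical and much simpler than the real ->trigger() signature.

```c
#include <assert.h>

enum trigger_flags {
	TRIGGER_INSERT		= 1 << 0,
	TRIGGER_OVERWRITE	= 1 << 1,
};

typedef int (*trigger_fn)(int key, enum trigger_flags flags);

/* Parameter types match the typedef exactly: enum, not unsigned int. */
static int trigger_inode(int key, enum trigger_flags flags)
{
	return (flags & TRIGGER_INSERT) ? key : -key;
}

static int trigger_extent(int key, enum trigger_flags flags)
{
	return (flags & TRIGGER_OVERWRITE) ? key * 2 : 0;
}

static const trigger_fn ops[] = { trigger_inode, trigger_extent };

/* Indirect call through the table; this is the call kCFI would check. */
static int run_trigger(unsigned type, int key, enum trigger_flags flags)
{
	return ops[type](key, flags);
}
```

Declaring either function with 'unsigned int flags' would still compile without the strict warning, since the types are ABI compatible, but the function type would no longer match the pointer type exactly.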
Kent Overstreet [Sun, 7 Apr 2024 03:58:01 +0000 (23:58 -0400)]
bcachefs: Kill gc_init_recurse()
This unifies the online and offline btree gc passes; we're not yet
running it online.
We now iterate over one level of the btree at a time - the same as
check_extents_to_backpointers(); this ordering preserves order of keys
regardless of btree splits and merges, which will be important when we
re-enable online gc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 7 Apr 2024 23:07:09 +0000 (19:07 -0400)]
bcachefs: Run bch2_check_fix_ptrs() via triggers
Currently, the reflink_p gc trigger does repair as well - turning a
reflink_p key into an error key if the reflink_v it points to doesn't
exist.
This won't work with online check/repair, because the repair path once
online will be subject to transaction restarts, but BTREE_TRIGGER_gc is
not idempotent - we can't run it multiple times if we get a transaction
restart.
So we need to split these paths; to do so this patch calls
check_fix_ptrs() by a new general path - a new trigger type,
BTREE_TRIGGER_check_repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 17 Apr 2024 02:35:02 +0000 (22:35 -0400)]
bcachefs: kill gc looping for bucket gens
Looping when we change a bucket gen is not ideal - it means we risk
failing if we'd go into an infinite loop, and it's better to make
forward progress even if fsck doesn't fix everything.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 22 Apr 2024 23:01:40 +0000 (19:01 -0400)]
bcachefs: Consolidate mark_stripe_bucket() and trans_mark_stripe_bucket()
This eliminates some duplicated logic, and the gc path now handles
stripe updates and deletions - we need this since soon we're bringing
back runtime gc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 21 Apr 2024 02:19:48 +0000 (22:19 -0400)]
bcachefs: journal seq blacklist gc no longer has to walk btree
Since btree_ptr_v2, we no longer require the journal seq blacklist table
for skipping blacklisted bsets (btree node entries); the pointer to a
given node indicates how much data is present.
Therefore there's no longer any need for journal seq blacklist gc to
walk the btree - we can prune entries older than journal last_seq.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
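With no btree walk required, pruning reduces to a single pass over the table. The sketch below assumes each entry is a half-open range [start, end) of journal sequence numbers and treats an entry as stale once end <= last_seq; struct bl_entry and prune_blacklist() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blacklist entry covering journal seqs in [start, end). */
struct bl_entry {
	uint64_t start, end;
};

/*
 * Compact the table in place, keeping only entries that still cover
 * sequence numbers at or after last_seq. Returns the new entry count.
 */
static size_t prune_blacklist(struct bl_entry *e, size_t nr, uint64_t last_seq)
{
	size_t dst = 0;

	for (size_t src = 0; src < nr; src++)
		if (e[src].end > last_seq)
			e[dst++] = e[src];
	return dst;
}
```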
Kent Overstreet [Sat, 20 Apr 2024 18:49:22 +0000 (14:49 -0400)]
bcachefs: Btree key cache instrumentation
It turns out the btree key cache shrinker wasn't actually reclaiming
anything, prior to the previous patch. This adds instrumentation so that
if we have further issues we can see what's going on.
Specifically, sysfs internal/btree_key_cache is greatly expanded with
new counters and the SRCU sequence numbers of the first 10 entries on
each pending freelist. We also add trigger_btree_key_cache_shrink
for testing without having to prune all the system caches.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Common code doesn't test the error flag, so we don't need to set it in
bcachefs. We can use folio_end_read() to combine the setting (or not)
of the uptodate flag and clearing the lock flag.
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Cc: linux-bcachefs@vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>