www.infradead.org Git - users/jedix/linux-maple.git/log
3 weeks ago bcachefs: fix bch2_write_point_to_text() units
Kent Overstreet [Mon, 31 Mar 2025 00:04:16 +0000 (20:04 -0400)]
bcachefs: fix bch2_write_point_to_text() units

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: Log original key being moved in data updates
Kent Overstreet [Sun, 30 Mar 2025 20:57:21 +0000 (16:57 -0400)]
bcachefs: Log original key being moved in data updates

There's something going on with the data move path; log the original key
being moved for debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: BCH_JSET_ENTRY_log_bkey
Kent Overstreet [Sun, 30 Mar 2025 20:50:59 +0000 (16:50 -0400)]
bcachefs: BCH_JSET_ENTRY_log_bkey

Add a journal entry type for logging - but logging a bkey, not a string;
to be used for data move path debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: Reorder error messages that include journal debug
Kent Overstreet [Sun, 30 Mar 2025 13:30:04 +0000 (09:30 -0400)]
bcachefs: Reorder error messages that include journal debug

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: Don't use designated initializers for disk_accounting_pos
Kent Overstreet [Sun, 30 Mar 2025 00:58:32 +0000 (20:58 -0400)]
bcachefs: Don't use designated initializers for disk_accounting_pos

Not all compilers fully initialize these - they're not guaranteed to
because of the union shenanigans.
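
A minimal sketch of the alternative, assuming a memset()-then-assign
pattern. The struct below is a hypothetical stand-in, not the real
disk_accounting_pos layout, and the helper names are made up for
illustration. With a designated initializer on one union member, bytes
outside that member are not guaranteed to be zeroed by every compiler;
zeroing the whole object first sidesteps the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key with a union payload (not the real bcachefs layout). */
struct acct_pos {
	uint8_t type;
	union {
		struct { uint8_t dev; uint8_t data_type; } dev_data_type;
		struct { uint32_t id; } snapshot;
		uint8_t raw[16];
	};
};

/* Zero every byte, padding included, then store the fields we use. */
static void acct_pos_init_dev(struct acct_pos *p, uint8_t dev, uint8_t data_type)
{
	assert(p);

	memset(p, 0, sizeof(*p));
	p->type = 1;
	p->dev_data_type.dev = dev;
	p->dev_data_type.data_type = data_type;
}

/* Returns 1 when every union byte past the two used fields is zero. */
static int acct_pos_tail_is_zero(const struct acct_pos *p)
{
	for (size_t i = 2; i < sizeof(p->raw); i++)
		if (p->raw[i])
			return 0;
	return 1;
}
```

Calling acct_pos_init_dev() on a buffer previously filled with garbage
leaves the unused tail of the union zeroed, which is what a byte-wise
key comparison needs.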

Fixes: https://github.com/koverstreet/bcachefs/issues/844
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: Silence errors after emergency shutdown
Kent Overstreet [Sun, 30 Mar 2025 00:02:44 +0000 (20:02 -0400)]
bcachefs: Silence errors after emergency shutdown

We don't care about errors from asynchronous ops that were caused by an
emergency shutdown; silence them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: fix units in rebalance_status
Kent Overstreet [Sat, 29 Mar 2025 23:29:33 +0000 (19:29 -0400)]
bcachefs: fix units in rebalance_status

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: bch2_ioctl_subvolume_destroy() fixes
Kent Overstreet [Sat, 29 Mar 2025 23:01:09 +0000 (19:01 -0400)]
bcachefs: bch2_ioctl_subvolume_destroy() fixes

bch2_evict_subvolume_inodes() was getting stuck - due to incorrectly
pruning the dcache.

Also, fix missing permissions checks.

Reported-by: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: Clear fs_path_parent on subvolume unlink
Kent Overstreet [Sat, 29 Mar 2025 21:59:50 +0000 (17:59 -0400)]
bcachefs: Clear fs_path_parent on subvolume unlink

This fixes recursive subvolume removal.

Subvolume deletion is asynchronous; fs_path_parent, and thus the entry
in the subvolume_children btree, need to be cleared when the subvolume
is unlinked from the fs hierarchy - else we'll spuriously think a
subvolume has children and deletion will fail.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: Change btree_insert_node() assertion to error
Kent Overstreet [Sat, 29 Mar 2025 18:22:29 +0000 (14:22 -0400)]
bcachefs: Change btree_insert_node() assertion to error

Debug for https://github.com/koverstreet/bcachefs/issues/843

Print useful debug info and go emergency read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: Better printing of inconsistency errors
Kent Overstreet [Wed, 26 Mar 2025 14:41:33 +0000 (10:41 -0400)]
bcachefs: Better printing of inconsistency errors

Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.

Also, add better indenting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks ago bcachefs: bch2_count_fsck_err()
Kent Overstreet [Fri, 28 Mar 2025 16:15:32 +0000 (12:15 -0400)]
bcachefs: bch2_count_fsck_err()

Factor out a helper from __bch2_fsck_err(), for counting the error in
the superblock and deciding whether to print or ratelimit - will be used
to replace some log_fsck_err() calls, where we want to lift out printing
the error message.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Better helpers for inconsistency errors
Kent Overstreet [Fri, 28 Mar 2025 15:59:09 +0000 (11:59 -0400)]
bcachefs: Better helpers for inconsistency errors

An inconsistency error often happens as part of an event with multiple
error messages, and we want to build up one single error message with
proper indenting to produce more readable log messages that don't get
garbled.

Add new helpers that emit messages to a printbuf instead of printing
them directly, next patch will convert to use them.
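
The pattern described above can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the kernel code:
struct msgbuf is a hypothetical fixed-size stand-in for printbuf (which
grows dynamically), and the report_*() helpers are invented names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size message buffer standing in for printbuf. */
struct msgbuf {
	char   data[512];
	size_t len;
};

/* Append formatted text; output is truncated once the buffer is full. */
static void msgbuf_printf(struct msgbuf *b, const char *fmt, ...)
{
	va_list args;
	size_t avail;
	int n;

	assert(b && fmt);

	if (b->len >= sizeof(b->data) - 1)
		return;
	avail = sizeof(b->data) - b->len;

	va_start(args, fmt);
	n = vsnprintf(b->data + b->len, avail, fmt, args);
	va_end(args);

	if (n < 0)
		return;
	b->len = (size_t) n >= avail ? sizeof(b->data) - 1 : b->len + (size_t) n;
}

/* Helpers add their part of the report to the buffer instead of printing. */
static void report_header(struct msgbuf *b, const char *what)
{
	msgbuf_printf(b, "inconsistency: %s\n", what);
}

static void report_detail(struct msgbuf *b, const char *key, long val)
{
	msgbuf_printf(b, "  %s: %ld\n", key, val);
}

/* The caller emits the whole report with a single call. */
static void report_emit(const struct msgbuf *b)
{
	fputs(b->data, stderr);
}
```

Because only report_emit() writes to the log, output from concurrent
reporters cannot interleave inside one report, and indentation is
decided in one place.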

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Consistent indentation of multiline fsck errors
Kent Overstreet [Wed, 26 Mar 2025 17:21:11 +0000 (13:21 -0400)]
bcachefs: Consistent indentation of multiline fsck errors

Add the new helper printbuf_indent_add_nextline(), and use it in
__bch2_fsck_err() to centralize setting the indentation of multiline
fsck errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Add an "ignore unknown" option to bch2_parse_mount_opts()
Kent Overstreet [Tue, 25 Mar 2025 17:19:40 +0000 (13:19 -0400)]
bcachefs: Add an "ignore unknown" option to bch2_parse_mount_opts()

To be used by the mount helper in userspace, where we still have options
to be parsed by other layers.
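
A rough sketch of what an "ignore unknown" mode means for a parser,
assuming a comma-separated option string. The option table, the
function names and the return convention are all hypothetical; the real
bch2_parse_mount_opts() signature is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of options this layer understands. */
static const char * const known_opts[] = { "degraded", "verbose", "fsck" };

static bool opt_is_known(const char *name, size_t len)
{
	for (size_t i = 0; i < sizeof(known_opts) / sizeof(known_opts[0]); i++)
		if (strlen(known_opts[i]) == len &&
		    !strncmp(known_opts[i], name, len))
			return true;
	return false;
}

/*
 * Walk a comma-separated option string. Returns the number of options
 * recognized, or -1 on the first unknown option unless ignore_unknown
 * is set, in which case unknown options are left for another layer.
 */
static int parse_opts(const char *opts, bool ignore_unknown)
{
	int nr = 0;

	assert(opts);

	while (*opts) {
		size_t len = strcspn(opts, ",");
		const char *eq = memchr(opts, '=', len);
		size_t name_len = eq ? (size_t) (eq - opts) : len;

		if (len) {
			if (opt_is_known(opts, name_len))
				nr++;
			else if (!ignore_unknown)
				return -1;
		}

		opts += len;
		if (*opts == ',')
			opts++;
	}

	return nr;
}
```

With ignore_unknown set, a string such as "degraded,noatime,verbose"
parses successfully and "noatime" is simply skipped; without it the
same string is rejected.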

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: bch2_time_stats_init_no_pcpu()
Kent Overstreet [Tue, 25 Mar 2025 14:52:00 +0000 (10:52 -0400)]
bcachefs: bch2_time_stats_init_no_pcpu()

Add a mode to disable automatic switching to percpu mode, useful when a
time_stats will only be used by one thread and we don't want to have to
flush the percpu buffers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix bch2_fs_get_tree() error path
Florian Albrechtskirchinger [Thu, 27 Mar 2025 13:31:08 +0000 (14:31 +0100)]
bcachefs: Fix bch2_fs_get_tree() error path

When a filesystem is mounted read-only, subsequent attempts to mount it
as read-write fail with EBUSY. Previously, the error path in
bch2_fs_get_tree() would unconditionally call __bch2_fs_stop(),
improperly freeing resources for a filesystem that was still actively
mounted. This change modifies the error path to only call
__bch2_fs_stop() if the superblock has no valid root dentry, ensuring
resources are not cleaned up prematurely when the filesystem is in use.

Signed-off-by: Florian Albrechtskirchinger <falbrechtskirchinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: fix logging in journal_entry_err_msg()
Kent Overstreet [Fri, 28 Mar 2025 16:01:41 +0000 (12:01 -0400)]
bcachefs: fix logging in journal_entry_err_msg()

We want to log errors all at once, not spread across multiple printks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: add missing newline in bch2_trans_updates_to_text()
Kent Overstreet [Wed, 26 Mar 2025 14:20:52 +0000 (10:20 -0400)]
bcachefs: add missing newline in bch2_trans_updates_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: print_string_as_lines: fix extra newline
Kent Overstreet [Wed, 26 Mar 2025 15:57:03 +0000 (11:57 -0400)]
bcachefs: print_string_as_lines: fix extra newline

Don't print a newline on empty string; this was causing us to also print
an extra newline when we got to the end of the string.
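
A userspace sketch of the line-splitting loop with that rule applied.
print_lines() is a hypothetical stand-in that writes to a FILE, where
the kernel helper would call printk() once per line; it returns the
number of lines emitted so the behaviour is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Emit a long string one line at a time. Nothing is printed for an
 * empty string, so a trailing '\n' in the input does not produce an
 * extra blank line at the end.
 */
static unsigned print_lines(FILE *out, const char *s)
{
	unsigned nr = 0;

	assert(out && s);

	while (*s) {
		const char *nl = strchr(s, '\n');
		size_t len = nl ? (size_t) (nl - s) : strlen(s);

		fprintf(out, "%.*s\n", (int) len, s);
		nr++;

		/* Step past this line and its terminating newline, if any. */
		s += len;
		if (*s == '\n')
			s++;
	}

	return nr;
}
```

Splitting this way also keeps each individual call short, which matters
when the underlying print call limits how much it accepts at once.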

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix WARN() in bch2_bkey_pick_read_device()
Kent Overstreet [Fri, 28 Mar 2025 16:35:05 +0000 (12:35 -0400)]
bcachefs: Fix WARN() in bch2_bkey_pick_read_device()

syzbot discovered that this one is possible: we have pointers, but none
of them are to valid devices.

Reported-by: syzbot+336a6e6a2dbb7d4dba9a@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Don't return 0 size holes from bch2_seek_hole()
Kent Overstreet [Fri, 28 Mar 2025 15:29:04 +0000 (11:29 -0400)]
bcachefs: Don't return 0 size holes from bch2_seek_hole()

The hole we find in the btree might be fully dirty in the page cache. If
so, keep searching.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix bch2_seek_hole() locking
Kent Overstreet [Thu, 27 Mar 2025 17:34:13 +0000 (13:34 -0400)]
bcachefs: Fix bch2_seek_hole() locking

We can't call bch2_seek_pagecache_hole(), and block on page locks, with
btree locks held.

This is easily fixed because we're at the end of the transaction - we
can just unlock, we don't need a drop_locks_do().

Reported-by: https://github.com/nagalun
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Recovery no longer holds state_lock
Kent Overstreet [Wed, 26 Mar 2025 15:41:07 +0000 (11:41 -0400)]
bcachefs: Recovery no longer holds state_lock

state_lock guards against devices coming or leaving, changing state, or
the filesystem changing between ro <-> rw.

But it's not necessary for running recovery passes, and holding it
blocks asynchronous events that would cause us to go RO or kick out
devices.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix permissions on version modparam
Kent Overstreet [Fri, 28 Mar 2025 15:03:14 +0000 (11:03 -0400)]
bcachefs: Fix permissions on version modparam

There's no reason for this not to be world readable - it provides the
currently supported on disk format version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: cond_resched() in journal_key_sort_cmp()
Kent Overstreet [Wed, 26 Mar 2025 15:44:30 +0000 (11:44 -0400)]
bcachefs: cond_resched() in journal_key_sort_cmp()

Fixes "task out to lunch" warnings during recovery on large machines
with lots of dirty data in the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix 'hung task' messages in btree node scan
Kent Overstreet [Wed, 26 Mar 2025 15:26:30 +0000 (11:26 -0400)]
bcachefs: Fix 'hung task' messages in btree node scan

btree node scan has to wait on kthread workers that scan each device,
potentially for a while.

We would like this to be interruptible, but we may need a different
mechanism than signals for that - we've had bugs in the past where
mounts were failing due to checking for signals, and no explanation on
where they came from.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix btree iter flags in data move (2)
Kent Overstreet [Mon, 17 Mar 2025 19:07:06 +0000 (15:07 -0400)]
bcachefs: Fix btree iter flags in data move (2)

Data move -> move_get_io_opts -> bch2_get_update_rebalance_opts

requires a not_extents iterator; this fixes the path where we're walking
the extents btree and chase a reflink pointer into the reflink btree.

bch2_lookup_indirect_extent() requires working with an extents iterator
(due to peek_slot() semantics), so we implement
bch2_lookup_indirect_extent_for_move().

This is simplified because there's no need to report
indirect_extent_missing_errors here, that can be deferred until fsck or
when a user reads that data.

Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Don't unnecessarily decrypt data when moving
Kent Overstreet [Mon, 24 Mar 2025 20:25:53 +0000 (16:25 -0400)]
bcachefs: Don't unnecessarily decrypt data when moving

There's various checks for "are we going to compress this" - but we're
not going to compress if we know it's incompressible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Document disk accounting keys and counters
Kent Overstreet [Tue, 25 Mar 2025 14:28:53 +0000 (10:28 -0400)]
bcachefs: Document disk accounting keys and counters

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Validate number of counters for accounting keys
Kent Overstreet [Tue, 25 Mar 2025 14:06:33 +0000 (10:06 -0400)]
bcachefs: Validate number of counters for accounting keys

We weren't checking that accounting keys have the expected number of
counters. Originally we probably wanted to be flexible on this, but it
doesn't look like that will be required - accounting is extended by
adding new counter types, not more counters to an existing type.

This means we can drop a BUG_ON() that popped once in automated testing,
and the new validation will make that bug easier to track down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Use print_string_as_lines() for journal stuck messages
Kent Overstreet [Tue, 18 Mar 2025 21:35:50 +0000 (17:35 -0400)]
bcachefs: Use print_string_as_lines() for journal stuck messages

They were being truncated; printk has a 1k limit per call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix duplicate checksum error messages in write path
Kent Overstreet [Mon, 24 Mar 2025 20:40:22 +0000 (16:40 -0400)]
bcachefs: Fix duplicate checksum error messages in write path

Also, improve the message in prep_encoded_data() - it now prints
good/bad checksums, and checksum type.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix silent short reads in data read retry path
Kent Overstreet [Mon, 24 Mar 2025 15:51:01 +0000 (11:51 -0400)]
bcachefs: Fix silent short reads in data read retry path

__bch2_read, before calling __bch2_read_extent(), sets bvec_iter.bi_size
to "the size we can read from the current extent" with a swap, and
restores it to "the size for the total read" after the read_extent call
with another swap.

But we neglected to do the restore before the "if (ret) goto err;" -
which is a problem if we're retrying those errors.
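
The swap/restore ordering can be shown with a small stand-alone sketch.
Everything here is hypothetical scaffolding: struct iter only models
the bi_size field, read_extent() is a dummy that fails on request, and
read_one() stands in for one step of the read loop.

```c
#include <assert.h>
#include <stddef.h>

#define swap_size(a, b) \
	do { size_t _tmp = (a); (a) = (b); (b) = _tmp; } while (0)

/* Hypothetical iterator: only the remaining byte count is modelled. */
struct iter {
	size_t size;
};

/* Dummy per-extent read that fails when asked to. */
static int read_extent(struct iter *it, int fail)
{
	(void) it;
	return fail ? -1 : 0;
}

static int read_one(struct iter *it, size_t extent_bytes, int fail)
{
	size_t bytes;
	int ret;

	assert(it);
	bytes = extent_bytes < it->size ? extent_bytes : it->size;

	swap_size(it->size, bytes);	/* it->size: this extent; bytes: total */
	ret = read_extent(it, fail);
	swap_size(it->size, bytes);	/* restore the total BEFORE checking ret */

	if (ret)
		return ret;		/* a retry still sees the full size */

	it->size -= bytes;		/* success: advance past this extent */
	return 0;
}
```

If the error check sat between the two swaps, a failed attempt would
leave the iterator sized for a single extent, and the retry would
quietly read less than was requested.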

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix nonce inconsistency in bch2_write_prep_encoded_data()
Kent Overstreet [Tue, 25 Mar 2025 15:40:35 +0000 (11:40 -0400)]
bcachefs: Fix nonce inconsistency in bch2_write_prep_encoded_data()

If we're moving an extent that was partially overwritten,
bch2_write_rechecksum() will trim it to the currently live range.

If we then also want to compress it, it'll be decrypted - but the nonce
has been advanced for the overwritten start of the extent that we
dropped, and we were using the nonce we calculated before rechecksum().

Reported-by: Gabriel de Perthuis <g2p.code@gmail.com>
Fixes: 127d90d2823e ("bcachefs: bch2_write_prep_encoded_data() now returns errcode")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Kill unnecessary bch2_dev_usage_read()
Kent Overstreet [Sat, 22 Mar 2025 01:16:50 +0000 (21:16 -0400)]
bcachefs: Kill unnecessary bch2_dev_usage_read()

bch2_dev_usage_read() is fairly expensive, we should optimize this more.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: btree node write errors now print btree node
Kent Overstreet [Sat, 22 Mar 2025 20:26:32 +0000 (16:26 -0400)]
bcachefs: btree node write errors now print btree node

It turned out a user was wondering why we were going read-only after a
write error, and he didn't realize he didn't have replication enabled -
this will make that more obvious, and we should be printing it anyways.

Link: https://www.reddit.com/r/bcachefs/comments/1jf9akl/large_data_transfers_switched_bcachefs_to_readonly/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix race in print_chain()
Kent Overstreet [Sat, 22 Mar 2025 01:53:41 +0000 (21:53 -0400)]
bcachefs: Fix race in print_chain()

00636 Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b0
00636 Mem abort info:
00636   ESR = 0x0000000096000005
00636   EC = 0x25: DABT (current EL), IL = 32 bits
00636   SET = 0, FnV = 0
00636   EA = 0, S1PTW = 0
00636   FSC = 0x05: level 1 translation fault
00636 Data abort info:
00636   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
00636   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
00636   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
00636 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000101b10000
00636 [00000000000000b0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00636 Internal error: Oops: 0000000096000005 [#1] SMP
00636 Modules linked in:
00636 CPU: 12 UID: 0 PID: 79369 Comm: cat Not tainted 6.14.0-rc6-ktest-g3783b8973ab7 #17757
00636 Hardware name: linux,dummy-virt (DT)
00636 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00636 pc : print_chain+0xb8/0x170
00636 lr : print_chain+0xa0/0x170
00636 sp : ffffff80d9c1bbb0
00636 x29: ffffff80d9c1bbb0 x28: 0000000000000002 x27: ffffff80c1be8250
00636 x26: ffffff80dd9b0000 x25: 0000000000000020 x24: 000000000000002d
00636 x23: 000000000000003c x22: ffffffc080a54518 x21: ffffff80da6e00d0
00636 x20: ffffff80da6e0170 x19: ffffff80c1a1d240 x18: 00000000ffffffff
00636 x17: 3535303937202d3c x16: 203139202d3c2035 x15: 00000000ffffffff
00636 x14: 0000000000000000 x13: ffffff80d71b63f1 x12: 0000000000000006
00636 x11: ffffffc080beb1c0 x10: 0000000000000020 x9 : 00000000000134cc
00636 x8 : 0000000000000020 x7 : 0000000000000004 x6 : 0000000000000020
00636 x5 : ffffff80d71b63f7 x4 : ffffffc080a5451b x3 : 0000000000000000
00636 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
00636 Call trace:
00636  print_chain+0xb8/0x170 (P)
00636  bch2_check_for_deadlock+0x444/0x5a0
00636  bch2_btree_deadlock_read+0xb4/0x1c8
00636  full_proxy_read+0x74/0xd8
00636  vfs_read+0x90/0x300

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: btree_trans_restart_foreign_task()
Kent Overstreet [Fri, 21 Mar 2025 18:22:39 +0000 (14:22 -0400)]
bcachefs: btree_trans_restart_foreign_task()

In debug mode, we save the call stack on transaction restart - but
there's no locking, so we can't touch it if we're issuing the restart
from another thread.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: bch2_disk_accounting_mod2()
Kent Overstreet [Fri, 21 Mar 2025 16:29:56 +0000 (12:29 -0400)]
bcachefs: bch2_disk_accounting_mod2()

We're hitting some issues with uninitialized struct padding, flagged by
kmsan.

They appear to be false positives, otherwise bch2_accounting_validate()
would have flagged them as "junk at end". But for now, we'll need to
initialize disk_accounting_pos with memset().

This adds a new helper, bch2_disk_accounting_mod2(), that initializes a
disk_accounting_pos and does the accounting mod all at once - so overall
things actually get slightly more ergonomic.

BCH_DISK_ACCOUNTING_replicas keys are left for now; KMSAN isn't warning
about them and they're a bit special.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: zero init journal bios
Kent Overstreet [Fri, 21 Mar 2025 15:30:09 +0000 (11:30 -0400)]
bcachefs: zero init journal bios

fix a kmsan splat

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Eliminate padding in move_bucket_key
Kent Overstreet [Thu, 20 Mar 2025 18:15:33 +0000 (14:15 -0400)]
bcachefs: Eliminate padding in move_bucket_key

We appear to be tripping over a compiler/kmsan bug with padding fields -
this is an easy workaround.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix a KMSAN splat in btree_update_nodes_written()
Kent Overstreet [Thu, 20 Mar 2025 18:54:49 +0000 (14:54 -0400)]
bcachefs: Fix a KMSAN splat in btree_update_nodes_written()

We may sometimes read from uninitialized memory; we know, and that's ok.

We check if a btree node has been reused before waiting on any
outstanding IO.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: kmsan asserts
Kent Overstreet [Thu, 20 Mar 2025 18:17:53 +0000 (14:17 -0400)]
bcachefs: kmsan asserts

Catching these early makes them a lot easier to track down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix kmsan warnings in bch2_extent_crc_pack()
Kent Overstreet [Thu, 20 Mar 2025 17:24:50 +0000 (13:24 -0400)]
bcachefs: Fix kmsan warnings in bch2_extent_crc_pack()

We store to all fields, so the kmsan warnings were spurious - but
initializing via stores to bitfields appears to have been giving the
compiler/kmsan trouble, and they're not necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Disable asm memcpys when kmsan enabled
Kent Overstreet [Thu, 20 Mar 2025 16:38:59 +0000 (12:38 -0400)]
bcachefs: Disable asm memcpys when kmsan enabled

kmsan doesn't know about inline assembly, obviously; this will close a
ton of syzbot bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Handle backpointers with unknown data types
Kent Overstreet [Wed, 19 Mar 2025 21:01:38 +0000 (17:01 -0400)]
bcachefs: Handle backpointers with unknown data types

New data types might be added later, so we don't want to disallow
unknown data types - that'll be a compatibility hassle later. Instead,
ignore them.
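
As a sketch of the "ignore, don't reject" policy: the enum values, the
counter struct and count_backpointers() below are hypothetical and only
illustrate the control flow, not the real backpointer accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of data types known to this version of the code. */
enum data_type {
	DATA_btree = 1,
	DATA_user  = 2,
};

struct bucket_counts {
	unsigned btree;
	unsigned user;
};

/*
 * Tally backpointers by type. Types this version does not know about
 * are skipped and reported through *ignored instead of being treated
 * as an error, so a newer on-disk type does not break an older reader.
 */
static void count_backpointers(const unsigned *types, size_t nr,
			       struct bucket_counts *c, unsigned *ignored)
{
	assert(types && c && ignored);

	for (size_t i = 0; i < nr; i++) {
		switch (types[i]) {
		case DATA_btree:
			c->btree++;
			break;
		case DATA_user:
			c->user++;
			break;
		default:
			(*ignored)++;
			break;
		}
	}
}
```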

Reported-by: syzbot+3a290f5ff67ca3023834@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Count BCH_DATA_parity backpointers correctly
Kent Overstreet [Thu, 20 Mar 2025 15:53:50 +0000 (11:53 -0400)]
bcachefs: Count BCH_DATA_parity backpointers correctly

These are counted as stripe data in the corresponding alloc keys.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Run bch2_check_dirent_target() at lookup time
Kent Overstreet [Thu, 20 Mar 2025 15:41:07 +0000 (11:41 -0400)]
bcachefs: Run bch2_check_dirent_target() at lookup time

More on the "full online self healing" project:

We now run most of the dirent <-> inode consistency checks, with repair
code, at runtime - the exact same check and repair code that fsck runs.

This will allow us to repair the "dirent points to inode that does not
point back" inconsistencies that have been popping up at runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Refactor bch2_check_dirent_target()
Kent Overstreet [Thu, 20 Mar 2025 15:06:50 +0000 (11:06 -0400)]
bcachefs: Refactor bch2_check_dirent_target()

Prep work for calling bch2_check_dirent_target() from bch2_lookup().

- Add an inline wrapper, if the target and backpointer match we can skip
  the function call.

- We don't (yet?) want to remove the dirent we did the lookup from (when
  we find a directory or subvol with multiple valid dirents pointing to
  it), we can defer on that until later. For now, add an "are we in
  fsck?" parameter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Move bch2_check_dirent_target() to namei.c
Kent Overstreet [Thu, 20 Mar 2025 15:06:50 +0000 (11:06 -0400)]
bcachefs: Move bch2_check_dirent_target() to namei.c

We're gradually running more and more fsck.c checks at runtime,
wherever applicable; when we do so they get moved out of fsck.c.

Next patch will call bch2_check_dirent_target() from bch2_lookup().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: fs-common.c -> namei.c
Kent Overstreet [Thu, 20 Mar 2025 14:53:52 +0000 (10:53 -0400)]
bcachefs: fs-common.c -> namei.c

name <-> inode, code for managing the relationships between inodes and
dirents.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: EIO cleanup
Kent Overstreet [Thu, 20 Mar 2025 14:16:48 +0000 (10:16 -0400)]
bcachefs: EIO cleanup

Replace these with proper private error codes, so that when we get an
error message we're not sifting through the entire codebase to see where
it came from.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: bch2_write_prep_encoded_data() now returns errcode
Kent Overstreet [Thu, 20 Mar 2025 14:30:51 +0000 (10:30 -0400)]
bcachefs: bch2_write_prep_encoded_data() now returns errcode

Prep work for killing off EIO and replacing them with proper private
error codes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Simplify bch2_write_op_error()
Kent Overstreet [Thu, 20 Mar 2025 14:25:15 +0000 (10:25 -0400)]
bcachefs: Simplify bch2_write_op_error()

There's no reason for the caller to do the actual logging, it's all done
the same.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix block/btree node size defaults
Kent Overstreet [Wed, 19 Mar 2025 16:33:40 +0000 (12:33 -0400)]
bcachefs: Fix block/btree node size defaults

We're fixing option parsing in userspace, it now obeys
OPT_SB_FIELD_SECTORS

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Add missing smp_rmb()
Alan Huang [Tue, 18 Mar 2025 07:50:00 +0000 (15:50 +0800)]
bcachefs: Add missing smp_rmb()

The smp_rmb() guarantees that reads from reservations.counter
occur before accessing cur_entry_u64s. It's paired with the
atomic64_try_cmpxchg in journal_entry_open.
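
A userspace analogue of that pairing, written with C11 atomics. This is
a sketch under assumptions, not the journal code: struct jstate, the
"bit 0 means open" encoding and both function names are invented, the
acquire fence plays the role of smp_rmb(), and the release-ordered
compare-exchange stands in for the kernel cmpxchg it pairs with.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the two fields involved. */
struct jstate {
	_Atomic uint64_t counter;		/* models reservations.counter */
	_Atomic uint32_t cur_entry_u64s;
};

/* Writer: store the entry size, then publish "open" with a cmpxchg. */
static int entry_open(struct jstate *j, uint32_t u64s)
{
	uint64_t old;

	assert(j);
	old = atomic_load_explicit(&j->counter, memory_order_relaxed);

	atomic_store_explicit(&j->cur_entry_u64s, u64s, memory_order_relaxed);

	/* Release on success: the size store is visible before bit 0 is. */
	return atomic_compare_exchange_strong_explicit(&j->counter, &old,
						       old | 1,
						       memory_order_release,
						       memory_order_relaxed);
}

/* Reader: returns the entry size if the entry is open, else 0. */
static uint32_t entry_size_if_open(struct jstate *j)
{
	uint64_t v;

	assert(j);
	v = atomic_load_explicit(&j->counter, memory_order_relaxed);
	if (!(v & 1))
		return 0;

	/* Order the counter read before the size read, like smp_rmb(). */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&j->cur_entry_u64s, memory_order_relaxed);
}
```

Without the fence, the reader could observe the entry as open yet still
read a stale size from before it was opened.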

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Kill JOURNAL_ERRORS()
Kent Overstreet [Tue, 18 Mar 2025 19:52:08 +0000 (15:52 -0400)]
bcachefs: Kill JOURNAL_ERRORS()

Convert these to standard error codes, which means we can pass them
outside the journal code, they're easier to pass to tracepoints, etc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Filesystem discard option now propagates to devices
Kent Overstreet [Thu, 13 Mar 2025 04:54:10 +0000 (00:54 -0400)]
bcachefs: Filesystem discard option now propagates to devices

The discard option is special, because it's both a filesystem and a
device option.

When set at the filesystem level, it's supposed to propagate to (if set
persistently via sysfs) or override (if non persistently as a mount
option) the devices - that now works correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Device state is now a runtime option
Kent Overstreet [Thu, 13 Mar 2025 04:55:52 +0000 (00:55 -0400)]
bcachefs: Device state is now a runtime option

Other options can normally be set at runtime via sysfs, no reason for
this one not to be as well - it just doesn't support the degraded flags
argument this way, that requires the ioctl.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Setting foreground_target at runtime now triggers rebalance
Kent Overstreet [Thu, 13 Mar 2025 04:55:23 +0000 (00:55 -0400)]
bcachefs: Setting foreground_target at runtime now triggers rebalance

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Device options now use standard sysfs code
Kent Overstreet [Tue, 11 Mar 2025 22:44:25 +0000 (18:44 -0400)]
bcachefs: Device options now use standard sysfs code

Device options now use the common code for sysfs, and can have
superblock fields (in a struct bch_member).

This replaces BCH_DEV_OPT_SETTERS(), which was weird and easy to miss.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Kill BCH_DEV_OPT_SETTERS()
Kent Overstreet [Thu, 13 Mar 2025 16:05:50 +0000 (12:05 -0400)]
bcachefs: Kill BCH_DEV_OPT_SETTERS()

Previously, device options had their superblock option field listed
separately, which was weird and easy to miss when defining options.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Remove spurious smp_mb()
Alan Huang [Tue, 18 Mar 2025 07:50:01 +0000 (15:50 +0800)]
bcachefs: Remove spurious smp_mb()

The smp_mb() is paired with nothing.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix incorrect state count
Alan Huang [Mon, 17 Mar 2025 17:54:24 +0000 (01:54 +0800)]
bcachefs: Fix incorrect state count

atomic64_read(&j->seq) - j->seq_write_started == JOURNAL_STATE_BUF_NR is
the condition in journal_entry_open where we return JOURNAL_ERR_max_open,
so journal_cur_seq(j) - seq == JOURNAL_STATE_BUF_NR means that the buf
corresponding to seq has started to write.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix btree iter flags in data move
Kent Overstreet [Mon, 17 Mar 2025 19:07:06 +0000 (15:07 -0400)]
bcachefs: Fix btree iter flags in data move

Rebalance requires a not_extents iterator.

This wasn't hit before because all_snapshots disables is_extents on
snapshots btrees - but has no effect on the reflink btree.

Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Validate bch_sb.offset field
Kent Overstreet [Mon, 17 Mar 2025 17:58:51 +0000 (13:58 -0400)]
bcachefs: Validate bch_sb.offset field

This was missed - but it needs to be correct for the superblock recovery
tool that scans the start and end of the device for backup superblocks:
we don't want to pick up superblocks that belong to a different
partition that starts at a different offset.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: bch2_sb_validate() doesn't need bch_sb_handle
Kent Overstreet [Mon, 17 Mar 2025 14:54:21 +0000 (10:54 -0400)]
bcachefs: bch2_sb_validate() doesn't need bch_sb_handle

Minor refactoring, so that bch2_sb_validate() can be used in the new
userspace superblock recovery tool.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Add missing random.h includes
Kent Overstreet [Mon, 17 Mar 2025 15:28:26 +0000 (11:28 -0400)]
bcachefs: Add missing random.h includes

Fix build in userspace, and good hygiene.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Better incompat version/feature error messages
Kent Overstreet [Sat, 15 Mar 2025 23:57:20 +0000 (19:57 -0400)]
bcachefs: Better incompat version/feature error messages

If we can't mount because of an incompatibility, print what's supported
and unsupported - to help solve PEBKAC issues.

Reported-by: Roland Vet <vet.roland@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks ago bcachefs: Fix offset_into_extent in data move path
Kent Overstreet [Sat, 15 Mar 2025 21:27:27 +0000 (17:27 -0400)]
bcachefs: Fix offset_into_extent in data move path

Fixes the following:

[   17.607394] kernel BUG at fs/bcachefs/reflink.c:261!
[   17.608316] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   17.608485] CPU: 0 UID: 0 PID: 564 Comm: bch-rebalance/3 Tainted: G           OE      6.14.0-rc6-arch1-gfcb0bd9609d2 #7 0efd7a8f4a00afeb2c5fb6e7ecb1aec8ddcbb1e1
[   17.608616] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   17.608736] Hardware name: Micro-Star International Co., Ltd. MS-7D75/MAG B650 TOMAHAWK WIFI (MS-7D75), BIOS 1.74 08/01/2023
[   17.608855] RIP: 0010:bch2_lookup_indirect_extent+0x252/0x290 [bcachefs]
[   17.609006] Code: 00 00 00 00 e8 7f 51 f5 ff 89 c3 85 c0 74 52 48 8b 7d b0 4c 89 ee e8 4d 4b f4 ff 48 63 d3 48 89 d0 31 d2 e9 2e ff ff ff 0f 0b <0f> 0b 48 8b 7d b0 4c 89 ee 48 89 55 a8 e8 2c 4b f4 ff 4c 8b 55 a8
[   17.609136] RSP: 0018:ffffa3714455f850 EFLAGS: 00010246
[   17.609261] RAX: 0000000000000080 RBX: ffff895891098790 RCX: 0000000000000000
[   17.609387] RDX: 0000000000000080 RSI: ffffa3714455fa90 RDI: ffff895889550000
[   17.609511] RBP: ffffa3714455f8c0 R08: ffff895891098790 R09: 0000000000000001
[   17.609637] R10: ffffa3714455f8d8 R11: ffffa3714455f950 R12: ffffa3714455fa58
[   17.609763] R13: ffff895891098790 R14: ffffa3714455fa58 R15: ffff895889550000
[   17.609888] FS:  0000000000000000(0000) GS:ffff896757c00000(0000) knlGS:0000000000000000
[   17.610015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   17.610143] CR2: 0000716b8cda2750 CR3: 0000000914e22000 CR4: 0000000000f50ef0
[   17.610272] PKRU: 55555554
[   17.610403] Call Trace:
[   17.610535]  <TASK>
[   17.610662]  ? __die_body.cold+0x19/0x27
[   17.610791]  ? die+0x2e/0x50
[   17.610918]  ? do_trap+0xca/0x110
[   17.611049]  ? do_error_trap+0x6a/0x90
[   17.611178]  ? bch2_lookup_indirect_extent+0x252/0x290 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.611331]  ? exc_invalid_op+0x50/0x70
[   17.611468]  ? bch2_lookup_indirect_extent+0x252/0x290 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.611620]  ? asm_exc_invalid_op+0x1a/0x20
[   17.611757]  ? bch2_lookup_indirect_extent+0x252/0x290 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.611911]  ? bch2_move_data_btree+0x58a/0x6c0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.612084]  bch2_move_data_btree+0x58a/0x6c0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.612256]  ? __pfx_rebalance_pred+0x10/0x10 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.612431]  ? bch2_move_extent+0x3d7/0x6e0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.612607]  ? __bch2_move_data+0xea/0x200 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.612782]  __bch2_move_data+0xea/0x200 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.612959]  ? __pfx_rebalance_pred+0x10/0x10 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.613149]  do_rebalance+0x517/0x8d0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.613342]  ? local_clock_noinstr+0xd/0xd0
[   17.613518]  ? local_clock+0x15/0x30
[   17.613693]  ? __bch2_trans_get+0x152/0x300 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.613890]  ? __pfx_bch2_rebalance_thread+0x10/0x10 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]
[   17.614090]  bch2_rebalance_thread+0x66/0xb0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2]

The offset_into_extent bit was copied from the read path, but it's
unnecessary here, where we always want to read and move the entire
indirect extent, and it causes the assertion to pop - because we're using a
non-extents iterator, which always points to the end of the reflink
pointer.

Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: use sha256() instead of crypto_shash API
Eric Biggers [Sun, 16 Mar 2025 03:47:17 +0000 (20:47 -0700)]
bcachefs: use sha256() instead of crypto_shash API

Just use sha256() instead of the clunky crypto API.  This is much
simpler.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Remove unnecessary softdeps on crc32c and crc64
Eric Biggers [Sun, 16 Mar 2025 03:03:19 +0000 (20:03 -0700)]
bcachefs: Remove unnecessary softdeps on crc32c and crc64

Since bcachefs does not access crc32c and crc64 through the crypto API,
there is no need to use module softdeps to ensure they are loaded.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: #if 0 out (enable|disable)_encryption()
Kent Overstreet [Sun, 16 Mar 2025 17:39:14 +0000 (13:39 -0400)]
bcachefs: #if 0 out (enable|disable)_encryption()

These weren't hooked up, but they probably should be - add some comments
for context.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Improve can_write_extent()
Kent Overstreet [Sun, 16 Mar 2025 01:32:33 +0000 (21:32 -0400)]
bcachefs: Improve can_write_extent()

This fixes another "rebalance spinning and doing no work" issue;
rebalance was reading extents it wanted to move, but then failing in
bch2_write() -> bch2_alloc_sectors_start() due to being unable to
allocate sufficient replicas.

This was triggered by a user playing with the durability settings: the
foreground device was an NVME device with durability=2, and he had
originally set the background device to durability=2 as well, but changed
it back to 1 (the default) after seeing IO errors.

That meant that with replicas=2, we want to move data off the NVME
device which satisfies that constraint, but with a single durability=1
device on the background target there's no way to move the extent to
that target while satisfying the "required replicas" constraint.

The solution for now is for bch2_data_update_init() to check for this,
and return an error - before kicking off the read.

bch2_data_update_init() already had two different checks for "will we be
able to write this extent", with partially duplicated code, so this
patch combines and improves that logic.

Additionally, we now always bail out and return an error if there's
insufficient space on the destination target. Previously, we only did
this for BCH_WRITE_alloc_nowait moves, because it might be the case that
copygc just needs to free up space on the destination target.

But we really shouldn't kick off a move if the destination is full: we
can't currently distinguish between "really full" and "just need to wait
for copygc", and if we are going to wait on copygc it'd be better to do
that before kicking off the move.

This will additionally fix "rebalance spinning" issues caused by a
filesystem that has more data than can fit in background_target - which
is a valid scenario, since we don't exclude foreground/cache devices
when calculating filesystem capacity.

Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
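
The durability check described in the commit above can be illustrated with a
small userspace sketch. This is not the bcachefs implementation: the struct,
its fields, and can_write_extent_sketch() are hypothetical names, and the
model assumes the check amounts to summing the durability of destination
devices that still have free space and comparing that against the required
replicas.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a member device. */
struct dev_sketch {
	unsigned durability;     /* replicas one copy on this device counts for */
	bool     in_target;      /* member of the destination target? */
	bool     has_free_space; /* can we allocate on it right now? */
};

/*
 * Return true if a write to the destination target could reach the
 * required durability, i.e. the move is worth starting at all.
 */
static bool can_write_extent_sketch(const struct dev_sketch *devs, size_t nr,
				    unsigned replicas)
{
	unsigned durability = 0;

	assert(devs || !nr);

	/* Only devices in the target that can still allocate contribute. */
	for (size_t i = 0; i < nr; i++)
		if (devs[i].in_target && devs[i].has_free_space)
			durability += devs[i].durability;

	return durability >= replicas;
}
```

With the reported setup (replicas=2 and a single durability=1 device in the
background target) the function returns false, so the caller could return an
error before issuing the read instead of spinning.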
4 weeks agobcachefs: trace_io_move_write_fail
Kent Overstreet [Sat, 15 Mar 2025 23:24:44 +0000 (19:24 -0400)]
bcachefs: trace_io_move_write_fail

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Increase blacklist range
Alan Huang [Sat, 15 Mar 2025 07:39:42 +0000 (15:39 +0800)]
bcachefs: Increase blacklist range

There are now 16 journal buffers, so 8 is too small.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: __bch2_read() now takes a btree_trans
Kent Overstreet [Mon, 10 Mar 2025 17:33:41 +0000 (13:33 -0400)]
bcachefs: __bch2_read() now takes a btree_trans

Next patch will be checking if the extent we're reading from matches the
IO failure we saw before marking the failure.

For this to work, __bch2_read() needs to take the same transaction
context that bch2_rbio_retry() uses to do that check.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: BCH_READ_data_update -> bch_read_bio.data_update
Kent Overstreet [Wed, 12 Mar 2025 20:56:09 +0000 (16:56 -0400)]
bcachefs: BCH_READ_data_update -> bch_read_bio.data_update

Read flags are codepath dependent and change as they're passed around,
while the fields in rbio._state are mostly fixed properties of that
particular object.

Losing track of BCH_READ_data_update would be bad, and previously it was
not obvious if it was always correctly set in the rbio, so this is a
safety cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Checksum errors get additional retries
Kent Overstreet [Sat, 8 Mar 2025 17:56:43 +0000 (12:56 -0500)]
bcachefs: Checksum errors get additional retries

It's possible for checksum errors to be transient - e.g. flakey
controller or cable, thus we need additional retries (besides retrying
from different replicas) before we can definitely return an error.

This is particularly important for the next patch, which will allow the
data move path to move extents with checksum errors - we don't want to
accidentally introduce bitrot due to a transient error!

- bch2_bkey_pick_read_device() is substantially reworked, and
  bch2_dev_io_failures is expanded to record more information about the
  type of failure (i.e. number of checksum errors).

  It now returns an error code that describes more precisely the reason
  for the failure - checksum error, io error, or offline device, instead
  of the previous generic "insufficient devices". This is important for
  the next patches that add poisoning, as we only want to poison extents
  when we've got real checksum errors (or perhaps IO errors?) - not
  because a device was offline.

- Add a new option and superblock field for the number of checksum
  retries.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
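
The retry policy described in the commit above can be sketched in userspace
C as follows. Everything here is illustrative: the enum, struct dev_failures,
pick_read_device(), read_with_retries() and the flaky_read() test double are
hypothetical names, not the bcachefs API. The sketch assumes each replica may
be read up to 1 + csum_retries times, and that when nothing is left to try a
checksum error is reported in preference to an IO error, and an IO error in
preference to an offline device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum read_result { READ_OK, READ_CSUM_ERR, READ_IO_ERR, READ_DEV_OFFLINE };

/* Per-device failure record (hypothetical layout). */
struct dev_failures {
	unsigned csum_errs;
	bool     io_failed;
	bool     offline;
};

typedef enum read_result (*read_fn)(unsigned dev, void *ctx);

/* Pick the usable device with the fewest checksum errors, or -1 if none. */
static int pick_read_device(const struct dev_failures *f, unsigned nr_devs,
			    unsigned csum_retries)
{
	int best = -1;

	for (unsigned i = 0; i < nr_devs; i++) {
		if (f[i].io_failed || f[i].offline ||
		    f[i].csum_errs > csum_retries)
			continue;
		if (best < 0 || f[i].csum_errs < f[best].csum_errs)
			best = (int) i;
	}
	return best;
}

static enum read_result read_with_retries(struct dev_failures *f,
					  unsigned nr_devs,
					  unsigned csum_retries,
					  read_fn do_read, void *ctx)
{
	int dev;

	assert(f && do_read);

	/* Every iteration either succeeds or records a failure, so this ends. */
	while ((dev = pick_read_device(f, nr_devs, csum_retries)) >= 0) {
		switch (do_read((unsigned) dev, ctx)) {
		case READ_OK:
			return READ_OK;
		case READ_CSUM_ERR:
			f[dev].csum_errs++;	/* possibly transient: retry later */
			break;
		case READ_IO_ERR:
			f[dev].io_failed = true;
			break;
		case READ_DEV_OFFLINE:
			f[dev].offline = true;
			break;
		}
	}

	/* Nothing left to try: report the most specific reason (order assumed). */
	for (unsigned i = 0; i < nr_devs; i++)
		if (f[i].csum_errs)
			return READ_CSUM_ERR;
	for (unsigned i = 0; i < nr_devs; i++)
		if (f[i].io_failed)
			return READ_IO_ERR;
	return READ_DEV_OFFLINE;
}

/* Test double: fails the first fail_first calls with fail_with, then succeeds. */
struct flaky_reader {
	enum read_result fail_with;
	unsigned         fail_first;
	unsigned         calls;
};

static enum read_result flaky_read(unsigned dev, void *ctx)
{
	struct flaky_reader *r = ctx;

	(void) dev;
	return r->calls++ < r->fail_first ? r->fail_with : READ_OK;
}
```

Keeping the failure type per device is what lets the caller tell a real
checksum error apart from an offline device, which matters if only the former
should lead to poisoning an extent.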
5 weeks agobcachefs: Print message on successful read retry
Kent Overstreet [Sat, 8 Mar 2025 23:42:34 +0000 (18:42 -0500)]
bcachefs: Print message on successful read retry

Users have been asking for this, and now that errors are returned to the
top level read retry path - we can.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Return errors to top level bch2_rbio_retry()
Kent Overstreet [Sun, 9 Mar 2025 00:37:10 +0000 (19:37 -0500)]
bcachefs: Return errors to top level bch2_rbio_retry()

Next patch will be adding an additional retry loop for checksum errors,
so that we can rule out transient errors before marking an extent as
poisoned.

Prerequisite to this is returning errors to bch2_rbio_retry(); this will
also let us add a "successful retry" message.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: BCH_ERR_data_read_buffer_too_small
Kent Overstreet [Sat, 8 Mar 2025 16:37:51 +0000 (11:37 -0500)]
bcachefs: BCH_ERR_data_read_buffer_too_small

Now that the read path uses proper error codes, we can get rid of the
weird rbio->hole signalling to the move path that the read didn't
happen.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Read error message now indicates if it was for an internal move
Kent Overstreet [Sat, 8 Mar 2025 16:24:22 +0000 (11:24 -0500)]
bcachefs: Read error message now indicates if it was for an internal move

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Fix BCH_ERR_data_read_csum_err_maybe_userspace in retry path
Kent Overstreet [Tue, 11 Mar 2025 13:04:09 +0000 (09:04 -0400)]
bcachefs: Fix BCH_ERR_data_read_csum_err_maybe_userspace in retry path

When we do a read to a buffer that's mapped into userspace, it's
possible to get a spurious checksum error if userspace was modifying the
buffer at the same time.

When we retry those, they have to be bounced before we know definitively
whether we're reading corrupt data.

But the retry path propagates read flags differently, so needs special
handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Convert read path to standard error codes
Kent Overstreet [Fri, 7 Mar 2025 22:20:22 +0000 (17:20 -0500)]
bcachefs: Convert read path to standard error codes

Kill the READ_ERR/READ_RETRY/READ_RETRY_AVOID enums, and add standard
error codes that describe precisely which error occurred.

This is going to be used for the data move path, to move but poison
extents with checksum errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Debug params for data corruption injection
Kent Overstreet [Sat, 8 Mar 2025 23:42:56 +0000 (18:42 -0500)]
bcachefs: Debug params for data corruption injection

dm-flakey is busted, and this is simpler anyways - this lets us test the
checksum error retry paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Don't create bch_io_failures unless it's needed
Kent Overstreet [Mon, 10 Mar 2025 15:54:13 +0000 (11:54 -0400)]
bcachefs: Don't create bch_io_failures unless it's needed

Only needed in retry path, no point in wasting stack space.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: bch2_bkey_ptrs_rebalance_opts()
Kent Overstreet [Thu, 13 Mar 2025 04:47:51 +0000 (00:47 -0400)]
bcachefs: bch2_bkey_ptrs_rebalance_opts()

Small optimization for bch2_bkey_sectors_need_rebalance()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Add a cond_resched() to btree cache teardown
Kent Overstreet [Fri, 14 Mar 2025 22:19:17 +0000 (18:19 -0400)]
bcachefs: Add a cond_resched() to btree cache teardown

[12308.606480] watchdog: BUG: soft lockup - CPU#18 stuck for 26s! [umount:48479]
[12308.606485] Modules linked in: bcachefs lz4hc_compress lz4_compress lz4_decompress sunrpc overlay nf_conntrack_netlink xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE bridge stp llc xfrm_user ip6table_nat ip6table_filter ip6_tables iptable_nat xt_addrtype iptable_filter ip_tables x_tables nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 psample ext4 mbcache jbd2 nls_iso8859_1 nls_cp850 vfat fat binfmt_misc skx_edac_common nfit edac_core libnvdimm cbc encrypted_keys intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common ipmi_ssif x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm drivetemp rapl intel_cstate coretemp mgag200 i2c_algo_bit ixgbe drm_shmem_helper drm_kms_helper mdio_devres xfrm_algo mdio drm ptp intel_uncore mei_me efi_pstore evdev uas pl2303 pps_core libphy usb_storage usbserial lpc_ich mei drm_panel_orientation_quirks acpi_power_meter tiny_power_button ipmi_si mfd_core intel_pch_thermal acpi_tad acpi_ipmi ioatdma
[12308.606541]  ipmi_devintf ipmi_msghandler dca wmi button efivarfs polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 sha1_generic xhci_pci xhci_hcd aesni_intel ehci_pci ehci_hcd gf128mul crypto_simd cryptd usbcore hpwdt usb_common
[12308.606557] CPU: 18 UID: 0 PID: 48479 Comm: umount Tainted: G             L     6.14.0-rc6-x86_64-00159-ga09496a03e63 #1
[12308.606560] Tainted: [L]=SOFTLOCKUP
[12308.606561] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 07/20/2023
[12308.606563] RIP: 0010:clear_page_erms+0x7/0x10
[12308.606570] Code: 48 89 47 38 48 8d 7f 40 75 d9 90 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[12308.606572] RSP: 0018:ffff9ed5b622fba0 EFLAGS: 00010246
[12308.606574] RAX: 0000000000000000 RBX: ffff90347fffe6c0 RCX: 00000000000004c0
[12308.606575] RDX: ffffe34ea9bec1c0 RSI: 00000000000405f0 RDI: ffff902eafb07b40
[12308.606576] RBP: ffff9ed5b622fbf0 R08: 0000000000000001 R09: 0000000000000006
[12308.606577] R10: 0000000000040001 R11: 0000000000000000 R12: ffffe34ea9bec000
[12308.606578] R13: 0000000000000000 R14: 0000000000000006 R15: ffffe34ea9bed000
[12308.606580] FS:  00007fe704ecfb68(0000) GS:ffff9053fea00000(0000) knlGS:0000000000000000
[12308.606581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12308.606582] CR2: 00007f18159068ae CR3: 00000001314d0005 CR4: 00000000007726f0
[12308.606583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[12308.606584] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[12308.606584] PKRU: 55555554
[12308.606585] Call Trace:
[12308.606587]  <IRQ>
[12308.606590]  ? show_regs.cold+0x19/0x28
[12308.606595]  ? watchdog_timer_fn.cold+0x3d/0x9d
[12308.606598]  ? __pfx_watchdog_timer_fn+0x10/0x10
[12308.606602]  ? __hrtimer_run_queues+0x12e/0x250
[12308.606607]  ? hrtimer_interrupt+0xfd/0x220
[12308.606609]  ? __sysvec_apic_timer_interrupt+0x53/0xe0
[12308.606614]  ? sysvec_apic_timer_interrupt+0x76/0xa0
[12308.606619]  </IRQ>
[12308.606620]  <TASK>
[12308.606620]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[12308.606626]  ? clear_page_erms+0x7/0x10
[12308.606628]  ? __free_pages_ok+0x374/0x640
[12308.606633]  free_frozen_pages+0x34/0x570
[12308.606636]  __folio_put+0x87/0xe0
[12308.606641]  free_large_kmalloc+0x70/0x80
[12308.606645]  kfree+0x2f6/0x390
[12308.606648]  kvfree+0x2d/0x40
[12308.606653]  __btree_node_data_free+0xaf/0xf0 [bcachefs]
[12308.606726]  btree_node_data_free+0x6a/0x80 [bcachefs]
[12308.606778]  bch2_fs_btree_cache_exit+0x262/0x440 [bcachefs]
[12308.606829]  bch2_fs_release+0xe8/0x340 [bcachefs]
[12308.606905]  kobject_put+0x60/0xc0
[12308.606908]  bch2_fs_free+0xdd/0x120 [bcachefs]
[12308.606981]  bch2_kill_sb+0x1e/0x30 [bcachefs]
[12308.607051]  deactivate_locked_super+0x32/0xb0
[12308.607055]  deactivate_super+0x40/0x50
[12308.607057]  cleanup_mnt+0xc3/0x160
[12308.607060]  __cleanup_mnt+0x12/0x20
[12308.607062]  task_work_run+0x5f/0xa0
[12308.607064]  syscall_exit_to_user_mode+0x194/0x1a0
[12308.607066]  do_syscall_64+0x67/0x170
[12308.607068]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[12308.607070] RIP: 0033:0x7fe704e66eed
[12308.607073] Code: 08 49 89 ca b8 a5 00 00 00 0f 05 48 89 c7 e8 8a e6 ff ff 48 83 c4

Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: rebalance, copygc status also print stacktrace
Kent Overstreet [Thu, 13 Mar 2025 19:21:13 +0000 (15:21 -0400)]
bcachefs: rebalance, copygc status also print stacktrace

These are commonly needed when debugging, and this saves having to ask
users to dig.

Also, rebalance_status now includes pending rebalance work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Kill bch2_remount()
Kent Overstreet [Thu, 13 Mar 2025 15:44:52 +0000 (11:44 -0400)]
bcachefs: Kill bch2_remount()

Single caller, so inline it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Kill a bit of dead code
Kent Overstreet [Tue, 11 Mar 2025 13:31:03 +0000 (09:31 -0400)]
bcachefs: Kill a bit of dead code

Found with CC=clang W=1

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Use max() to improve gen_after()
Thorsten Blum [Tue, 11 Mar 2025 11:13:11 +0000 (12:13 +0100)]
bcachefs: Use max() to improve gen_after()

Use max() to simplify gen_after() and improve its readability.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
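
For readers unfamiliar with the helper, the simplification in the commit
above might look roughly like the sketch below. It assumes gen_cmp() is the
signed 8-bit difference of two wrapping generation numbers; the real
definitions live in the bcachefs sources and may differ. max_int() stands in
for the kernel's max() macro, which is not available in userspace, and
gen_after_open_coded() is a hypothetical name for the pre-change form.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's max() macro. */
static inline int max_int(int a, int b)
{
	return a > b ? a : b;
}

/*
 * Signed distance between two wrapping 8-bit generations (assumed form).
 * The narrowing conversion wraps on all common compilers.
 */
static inline int gen_cmp(uint8_t a, uint8_t b)
{
	return (int8_t)(uint8_t)(a - b);
}

/* Open-coded clamp, as the helper might have read before the change. */
static inline int gen_after_open_coded(uint8_t a, uint8_t b)
{
	int r = gen_cmp(a, b);

	return r > 0 ? r : 0;
}

/* Same result expressed with max(): how far a is ahead of b, else 0. */
static inline int gen_after(uint8_t a, uint8_t b)
{
	return max_int(0, gen_cmp(a, b));
}

/* Exhaustively confirm that both forms agree for every pair of generations. */
static void check_gen_after_equivalence(void)
{
	for (unsigned a = 0; a < 256; a++)
		for (unsigned b = 0; b < 256; b++)
			assert(gen_after((uint8_t) a, (uint8_t) b) ==
			       gen_after_open_coded((uint8_t) a, (uint8_t) b));
}
```

Because generations wrap at 256, gen_after(1, 255) is 2 under this model
while gen_after(255, 1) is 0.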
6 weeks agobcachefs: Remove unnecessary byte allocation
Thorsten Blum [Sat, 8 Mar 2025 19:53:53 +0000 (20:53 +0100)]
bcachefs: Remove unnecessary byte allocation

The extra byte is not used - remove it.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: We no longer read stripes into memory at startup
Kent Overstreet [Tue, 11 Feb 2025 01:15:40 +0000 (20:15 -0500)]
bcachefs: We no longer read stripes into memory at startup

And the stripes heap gets deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: trace_stripe_create
Kent Overstreet [Fri, 7 Mar 2025 19:30:29 +0000 (14:30 -0500)]
bcachefs: trace_stripe_create

Add a simple tracepoint for stripe creation, we'll want to expand this
later.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: get_existing_stripe() uses new stripe lru
Kent Overstreet [Tue, 11 Feb 2025 01:34:47 +0000 (20:34 -0500)]
bcachefs: get_existing_stripe() uses new stripe lru

Convert to the new persistent stripe LRU.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: ec_stripe_delete() uses new stripe lru
Kent Overstreet [Tue, 11 Feb 2025 01:35:08 +0000 (20:35 -0500)]
bcachefs: ec_stripe_delete() uses new stripe lru

Convert to the new persistent stripe LRU.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: journal write path comment
Kent Overstreet [Fri, 7 Mar 2025 17:00:56 +0000 (12:00 -0500)]
bcachefs: journal write path comment

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>