www.infradead.org Git - nvme.git/log
nvme.git
9 months ago bcachefs: Fix fsck warning about btree_trans not passed to fsck error
Kent Overstreet [Mon, 15 Jul 2024 23:03:17 +0000 (19:03 -0400)]
bcachefs: Fix fsck warning about btree_trans not passed to fsck error

If a btree_trans is in use it's supposed to be passed to fsck_err so
that it can be unlocked if we're waiting on userspace input; but the
btree IO paths do call fsck errors where a btree_trans exists on the
stack but it's not passed through.

But it's ok, because it's unlocked while doing IO.

Fixes: a850bde6498b ("bcachefs: fsck_err() may now take a btree_trans")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Add an error message for insufficient rw journal devs
Kent Overstreet [Mon, 15 Jul 2024 20:30:44 +0000 (16:30 -0400)]
bcachefs: Add an error message for insufficient rw journal devs

This causes us to go read-only - need an error message saying why.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: varint: Avoid left-shift of a negative value
Tavian Barnes [Fri, 21 Jun 2024 20:39:58 +0000 (16:39 -0400)]
bcachefs: varint: Avoid left-shift of a negative value

Shifting a negative value left is undefined.

Signed-off-by: Tavian Barnes <tavianator@tavianator.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
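Illustration for the varint change above: a minimal sketch of the general pattern, not the actual bcachefs varint code. In C, left-shifting a negative signed value is undefined behaviour, so a common approach is to convert to an unsigned type first and shift that. The helper name `shift_left_safe` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from bcachefs): shift a possibly negative
 * 64-bit value left without invoking undefined behaviour.
 */
static inline uint64_t shift_left_safe(int64_t v, unsigned int bits)
{
	/*
	 * Converting a negative value to uint64_t is well defined
	 * (reduction modulo 2^64), and so is the unsigned shift as
	 * long as bits < 64.
	 */
	return (uint64_t)v << bits;
}
```

The caller is still responsible for keeping `bits` below the width of the type.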
9 months ago bcachefs: darray: Don't pass NULL to memcpy()
Tavian Barnes [Fri, 21 Jun 2024 20:29:32 +0000 (16:29 -0400)]
bcachefs: darray: Don't pass NULL to memcpy()

memcpy's second parameter must not be NULL, even if size is zero.

Signed-off-by: Tavian Barnes <tavianator@tavianator.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
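Illustration for the darray change above: a minimal sketch of one way to avoid handing NULL to memcpy() when there is nothing to copy, assuming the source pointer may legitimately be NULL for an empty array. The helper name `copy_if_nonempty` is hypothetical and this is not the bcachefs darray code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes from src to dst, but skip the
 * memcpy() call entirely when n == 0 so that a NULL src is never
 * passed to it.
 */
static void *copy_if_nonempty(void *dst, const void *src, size_t n)
{
	if (n)
		memcpy(dst, src, n);
	return dst;
}
```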
9 months ago bcachefs: Kill bch2_assert_btree_nodes_not_locked()
Kent Overstreet [Sun, 14 Jul 2024 23:51:01 +0000 (19:51 -0400)]
bcachefs: Kill bch2_assert_btree_nodes_not_locked()

We no longer track individual btree node locks with lockdep, so this
will never be enabled.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Rename BCH_WRITE_DONE -> BCH_WRITE_SUBMITTED
Kent Overstreet [Mon, 28 Aug 2023 20:13:18 +0000 (16:13 -0400)]
bcachefs: Rename BCH_WRITE_DONE -> BCH_WRITE_SUBMITTED

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: __bch2_read(): call trans_begin() on every loop iter
Kent Overstreet [Sun, 14 Jul 2024 20:32:11 +0000 (16:32 -0400)]
bcachefs: __bch2_read(): call trans_begin() on every loop iter

Perusal of /sys/kernel/debug/bcachefs/*/btree_transaction_stats shows
that the read path has been accumulating unneeded paths on the reflink
btree, which we don't want.

The solution is to call bch2_trans_begin(), which drops paths not used
on the previous loop iteration.

bch2_readahead:
  Max mem used: 0
  Transaction duration:
    count:      194235
                           since mount        recent
    duration of events
      min:                      150 ns
      max:                        9 ms
      total:                    838 ms
      mean:                       4 us          6 us
      stddev:                    34 us          7 us
    time between events
      min:                       10 ns
      max:                       15 h
      mean:                       2 s          12 s
      stddev:                     2 s           3 ms
  Maximum allocated btree paths (193):
    path: idx  2 ref 0:0 P   btree=extents l=0 pos 270943112:392:U32_MAX locks 0
    path: idx  3 ref 1:0   S btree=extents l=0 pos 270943112:24578:U32_MAX locks 1
    path: idx  4 ref 0:0 P   btree=reflink l=0 pos 0:24773509:0 locks 0
    path: idx  5 ref 0:0 P S btree=reflink l=0 pos 0:24773631:0 locks 1
    path: idx  6 ref 0:0 P S btree=reflink l=0 pos 0:24773759:0 locks 1
    path: idx  7 ref 0:0 P S btree=reflink l=0 pos 0:24773887:0 locks 1
    path: idx  8 ref 0:0 P S btree=reflink l=0 pos 0:24774015:0 locks 1
    path: idx  9 ref 0:0 P S btree=reflink l=0 pos 0:24774143:0 locks 1
    path: idx 10 ref 0:0 P S btree=reflink l=0 pos 0:24774271:0 locks 1
<many more reflink paths>

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: show none if label is not set
Hongbo Li [Fri, 12 Jul 2024 07:09:25 +0000 (15:09 +0800)]
bcachefs: show none if label is not set

If the label is not set, the Label tag in the superblock info shows '(none)'.

```
[Before]
Device index:                               0
Label:
Version:                                    1.4: member_seq

[After]
Device index:                               0
Label:                                      (none)
Version:                                    1.4: member_seq
```

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: drop packed, aligned from bkey_inode_buf
Kent Overstreet [Fri, 12 Jul 2024 18:35:46 +0000 (14:35 -0400)]
bcachefs: drop packed, aligned from bkey_inode_buf

Unnecessary here, and this broke the rust bindings:

error[E0588]: packed type cannot transitively contain a `#[repr(align)]` type
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:29025:1
      |
29025 | pub struct bkey_i_inode_v3 {
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
note: `bch_inode_v3` has a `#[repr(align)]` attribute
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:8949:1
      |
8949  | pub struct bch_inode_v3 {
      | ^^^^^^^^^^^^^^^^^^^^^^^

error[E0588]: packed type cannot transitively contain a `#[repr(align)]` type
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:32826:1
      |
32826 | pub struct bkey_inode_buf {
      | ^^^^^^^^^^^^^^^^^^^^^^^^^
      |
note: `bch_inode_v3` has a `#[repr(align)]` attribute
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:8949:1
      |
8949  | pub struct bch_inode_v3 {
      | ^^^^^^^^^^^^^^^^^^^^^^^
note: `bkey_inode_buf` contains a field of type `bkey_i_inode_v3`
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:32827:9
      |
32827 |     pub inode: bkey_i_inode_v3,
      |         ^^^^^
note: ...which contains a field of type `bch_inode_v3`
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:29027:9
      |
29027 |     pub v: bch_inode_v3,
      |         ^

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: btree node scan: fall back to comparing by journal seq
Kent Overstreet [Fri, 12 Jul 2024 18:16:01 +0000 (14:16 -0400)]
bcachefs: btree node scan: fall back to comparing by journal seq

Highly damaged filesystems, or filesystems that have been damaged,
repaired, and damaged again, may have sequence numbers we can't fully
trust - which in itself is something we need to debug.

Add a journal_seq fallback so that repair doesn't get stuck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Add lockdep support for btree node locks
Kent Overstreet [Thu, 21 Dec 2023 23:54:09 +0000 (18:54 -0500)]
bcachefs: Add lockdep support for btree node locks

This adds lockdep tracking for held btree locks with a single dep_map in
btree_trans, i.e. tracking all held btree locks as one object.

This is more practical and more useful than having lockdep track held
btree locks individually, because
 - we can take more locks than lockdep can track (unbounded, now that we
   have dynamically resizable btree paths)
 - there's no lock ordering between btree locks for lockdep to track (we
   do cycle detection)
 - and this makes it easy to teach lockdep that btree locks are not safe
   to hold while invoking memory reclaim.

The last rule is one that lockdep would never learn, because we only do
trylock() from within shrinkers - but we very much do not want to be
invoking memory reclaim while holding btree node locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago lockdep: lockdep_set_notrack_class()
Kent Overstreet [Fri, 22 Dec 2023 01:34:17 +0000 (20:34 -0500)]
lockdep: lockdep_set_notrack_class()

Add a new helper to disable lockdep tracking entirely for a given class.

This is needed for bcachefs, which takes too many btree node locks for
lockdep to track. Instead, we have a single lockdep_map for "btree_trans
has any btree nodes locked", which makes more sense given that we have
centralized lock management and a cycle detector.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Improve copygc_wait_to_text()
Kent Overstreet [Sat, 29 Jun 2024 20:04:40 +0000 (16:04 -0400)]
bcachefs: Improve copygc_wait_to_text()

printing the raw values can occasionally be very useful

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Convert clock code to u64s
Kent Overstreet [Sat, 29 Jun 2024 22:08:20 +0000 (18:08 -0400)]
bcachefs: Convert clock code to u64s

Eliminate possible integer truncation bugs on 32 bit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
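Illustration for the clock change above: a minimal sketch of the kind of truncation bug that a 32-bit-wide type can cause, assuming (as on 32-bit Linux) that `unsigned long` is 32 bits. The type and function names are hypothetical; this is not the bcachefs clock code.

```c
#include <stdint.h>

/* Stand-in for `unsigned long` on a 32-bit kernel (assumption). */
typedef uint32_t u32_long;

/* Buggy variant: the 64-bit deadline is stored in a 32-bit type. */
static int expired_32(uint64_t now, uint64_t deadline)
{
	u32_long d = (u32_long)deadline;	/* high 32 bits are dropped */

	return now >= d;
}

/* Fixed variant: everything stays 64 bits wide. */
static int expired_64(uint64_t now, uint64_t deadline)
{
	return now >= deadline;
}
```

With a deadline just past 2^32, the truncating variant reports the timer as expired long before it actually is.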
9 months ago bcachefs: Improve startup message
Kent Overstreet [Sat, 29 Jun 2024 15:43:23 +0000 (11:43 -0400)]
bcachefs: Improve startup message

We're not always mounting when we start the filesystem

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Self healing on read IO error
Kent Overstreet [Fri, 28 Jun 2024 17:28:30 +0000 (13:28 -0400)]
bcachefs: Self healing on read IO error

This repurposes the promote path, which already knows how to call
data_update() after a read: we now automatically rewrite bad data when
we get a read error and then successfully retry from a different
replica.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Make read_only a mount option again, but hidden
Kent Overstreet [Fri, 28 Jun 2024 22:10:47 +0000 (18:10 -0400)]
bcachefs: Make read_only a mount option again, but hidden

fsck passes read_only as a mount option, and it's required for
nochanges, which it also uses.

Usually read_only is handled by the VFS, but we need to be able to
handle it too; we just don't want to print it out twice, so mark it as a
hidden option.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_extent_crc_unpacked_to_text()
Kent Overstreet [Fri, 28 Jun 2024 20:25:39 +0000 (16:25 -0400)]
bcachefs: bch2_extent_crc_unpacked_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Ratelimit checksum error messages
Kent Overstreet [Fri, 28 Jun 2024 17:51:38 +0000 (13:51 -0400)]
bcachefs: Ratelimit checksum error messages

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: spelling fix
Kent Overstreet [Fri, 28 Jun 2024 17:36:00 +0000 (13:36 -0400)]
bcachefs: spelling fix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Simplify btree key cache fill path
Kent Overstreet [Sat, 8 Jun 2024 21:49:11 +0000 (17:49 -0400)]
bcachefs: Simplify btree key cache fill path

Don't allocate the new bkey_cached until after we've done the btree
lookup; this means we can kill bkey_cached.valid.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Improve "unable to allocate journal write" message
Kent Overstreet [Sun, 23 Jun 2024 06:13:44 +0000 (02:13 -0400)]
bcachefs: Improve "unable to allocate journal write" message

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Fix missing BTREE_TRIGGER_bucket_invalidate flag
Kent Overstreet [Sun, 23 Jun 2024 22:48:22 +0000 (18:48 -0400)]
bcachefs: Fix missing BTREE_TRIGGER_bucket_invalidate flag

This fixes an accounting mismatch for cached data.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Ensure buffered writes write as much as they can
Kent Overstreet [Sun, 10 Sep 2023 21:29:39 +0000 (17:29 -0400)]
bcachefs: Ensure buffered writes write as much as they can

This adds a new helper, bch2_folio_reservation_get_partial(), which
reserves as many blocks as possible and may return partial success.

__bch2_buffered_write() is switched to the new helper - this fixes
fstests generic/275, the write until -ENOSPC test.

generic/230 now fails: this appears to be a test bug, where xfs_io isn't
looping after a partial write to get the error code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
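Illustration for the partial-write behaviour described above: a minimal userspace sketch of a caller that keeps looping after a short write, so that it eventually sees the error code (such as ENOSPC) rather than stopping at the first partial success. The helper name `write_all` is hypothetical and this is not taken from xfs_io or fstests.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to write all of buf. Returns the number of
 * bytes actually written; *err is 0 on full success, otherwise the
 * errno of the write() call that finally failed.
 */
static size_t write_all(int fd, const char *buf, size_t len, int *err)
{
	size_t done = 0;

	*err = 0;
	while (done < len) {
		ssize_t ret = write(fd, buf + done, len - done);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, just retry */
			*err = errno;		/* e.g. ENOSPC after a short write */
			break;
		}
		if (ret == 0)
			break;			/* no progress, avoid spinning */
		done += (size_t)ret;		/* short write: loop for the rest */
	}
	return done;
}
```

A caller that stops after the first short write would report success for the bytes written and never observe the error that ended the sequence.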
9 months ago bcachefs: support STATX_DIOALIGN for statx file
Hongbo Li [Thu, 20 Jun 2024 13:21:12 +0000 (21:21 +0800)]
bcachefs: support STATX_DIOALIGN for statx file

Add support for STATX_DIOALIGN to bcachefs, so that direct I/O alignment
restrictions are exposed to userspace in a generic way.

[Before]
```
./statx_test /mnt/bcachefs/test
statx(/mnt/bcachefs/test) = 0
dio mem align:0
dio offset align:0
```

[After]
```
./statx_test /mnt/bcachefs/test
statx(/mnt/bcachefs/test) = 0
dio mem align:1
dio offset align:512
```

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: split out lru_format.h
Kent Overstreet [Wed, 19 Jun 2024 13:00:11 +0000 (09:00 -0400)]
bcachefs: split out lru_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_btree_key_cache_drop() now evicts
Kent Overstreet [Sat, 8 Jun 2024 19:20:53 +0000 (15:20 -0400)]
bcachefs: bch2_btree_key_cache_drop() now evicts

As part of improving btree key cache coherency, the bkey_cached.valid
flag is going away.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: set fgf order hint before starting a buffered write
Pankaj Raghav [Fri, 14 Jun 2024 10:50:31 +0000 (10:50 +0000)]
bcachefs: set fgf order hint before starting a buffered write

Set the preferred folio order in the fgp_flags by calling
fgf_set_order(). The page cache will try to allocate a large folio of
the preferred order whenever possible instead of allocating multiple
order-0 folios.

This improves buffered write performance by up to 1.25x with default
mount options and by up to 1.57x when mounted with the no_data_io
option, with the following fio workload:

fio --name=bcachefs --filename=/mnt/test  --size=100G \
     --ioengine=io_uring --iodepth=16 --rw=write --bs=128k

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: use FGP_WRITEBEGIN instead of combining individual flags
Pankaj Raghav [Fri, 14 Jun 2024 10:50:30 +0000 (10:50 +0000)]
bcachefs: use FGP_WRITEBEGIN instead of combining individual flags

Use FGP_WRITEBEGIN to avoid repeating the individual FGP flags before
starting a buffered write.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Reduce the scope of gc_lock
Kent Overstreet [Thu, 13 Jun 2024 21:07:36 +0000 (17:07 -0400)]
bcachefs: Reduce the scope of gc_lock

gc_lock is now only for synchronization between check_alloc_info and
interior btree updates - nothing else

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: per_cpu_sum()
Kent Overstreet [Thu, 13 Jun 2024 18:11:48 +0000 (14:11 -0400)]
bcachefs: per_cpu_sum()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago MAINTAINERS: remove Brian Foster as a reviewer for bcachefs
Brian Foster [Mon, 10 Jun 2024 12:26:39 +0000 (08:26 -0400)]
MAINTAINERS: remove Brian Foster as a reviewer for bcachefs

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: kill key cache arg to bch2_assert_pos_locked()
Kent Overstreet [Sat, 8 Jun 2024 20:46:58 +0000 (16:46 -0400)]
bcachefs: kill key cache arg to bch2_assert_pos_locked()

this is an internal implementation detail - and we're improving key
cache coherency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: btree_path_cached_set()
Kent Overstreet [Sat, 8 Jun 2024 19:24:14 +0000 (15:24 -0400)]
bcachefs: btree_path_cached_set()

new helper - small refactoring

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: btree_node_unlock() assert
Kent Overstreet [Sat, 8 Jun 2024 19:25:12 +0000 (15:25 -0400)]
bcachefs: btree_node_unlock() assert

we have a separate helper for releasing write locks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_gc_pos_to_text()
Kent Overstreet [Sat, 8 Jun 2024 00:53:02 +0000 (20:53 -0400)]
bcachefs: bch2_gc_pos_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_btree_id_to_text()
Kent Overstreet [Fri, 7 Jun 2024 22:19:39 +0000 (18:19 -0400)]
bcachefs: bch2_btree_id_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Kill gc_pos_btree_node()
Kent Overstreet [Sat, 8 Jun 2024 00:51:57 +0000 (20:51 -0400)]
bcachefs: Kill gc_pos_btree_node()

gc_pos is now based on keys, not nodes, for invariance w.r.t. splits
and merges

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Fix bch2_gc_accounting_done() locking
Kent Overstreet [Thu, 6 Jun 2024 18:33:27 +0000 (14:33 -0400)]
bcachefs: Fix bch2_gc_accounting_done() locking

The transaction commit path takes mark_lock, so we shouldn't be holding
it; use a bpos as an iterator so that we can drop and retake.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_accounting_mem_gc()
Kent Overstreet [Thu, 6 Jun 2024 17:48:54 +0000 (13:48 -0400)]
bcachefs: bch2_accounting_mem_gc()

Add a new helper to free zeroed out accounting entries, and use it in
bch2_replicas_gc2().

bch2_replicas_gc2() was killing superblock replicas entries if their
corresponding accounting counters were zero, but that's incorrect: the
superblock replicas entry needs to exist if the accounting entry exists,
not only if it's nonzero, because we check and create the replicas entry
when creating the new accounting entry - we don't know when it's
becoming nonzero.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Refactor disk accounting data structures
Kent Overstreet [Thu, 6 Jun 2024 17:25:28 +0000 (13:25 -0400)]
bcachefs: Refactor disk accounting data structures

Break up the percpu counter allocations into individual allocations for
each disk accounting counter; this fixes an issue on large systems where
we have too many replicas entries for the percpu allocator's max
practical size.

Also, use just one eytzinger tree for the normal set of counters and the
gc counters; this simplifies accounting_gc_done() where we need the same
set of counters to be present in both tables.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: fix smatch data leak warning in fs usage ioctl
Brian Foster [Thu, 6 Jun 2024 13:58:26 +0000 (09:58 -0400)]
bcachefs: fix smatch data leak warning in fs usage ioctl

smatch warns that the copy of arg to userspace is a potential data
leak by virtue of arg.pad not being checked or zeroed. This was
introduced by the commit referenced below that switched arg from
being a zeroed runtime allocation to living on the stack. Fix by
simply zero initializing the structure.

Fixes: cde738a61e65 ("bcachefs: Convert bch2_ioctl_fs_usage() to new accounting")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
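Illustration for the data leak fix above: a minimal userspace sketch of why an on-stack struct with a padding field should be zero initialized before it is copied out. The struct layout and names are hypothetical (not the real bcachefs ioctl argument), and memcpy() stands in for copy_to_user().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ioctl argument with an explicit padding field. */
struct usage_arg {
	uint32_t	nr;
	uint32_t	pad;		/* never written by the handler below */
	uint64_t	capacity;
};

/* Hypothetical handler: fills in the result and copies it out. */
static void fill_usage(struct usage_arg *out)
{
	/*
	 * Zero initialize so that fields the handler never assigns
	 * (here: pad) do not carry stale stack contents to the caller.
	 */
	struct usage_arg arg = { 0 };

	arg.nr = 1;
	arg.capacity = 4096;

	memcpy(out, &arg, sizeof(arg));	/* stands in for copy_to_user() */
}
```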
9 months ago bcachefs: Fix race in bch2_accounting_mem_insert()
Kent Overstreet [Wed, 5 Jun 2024 16:35:48 +0000 (12:35 -0400)]
bcachefs: Fix race in bch2_accounting_mem_insert()

bch2_accounting_mem_insert() drops and retakes mark_lock; thus, we need
to check if the entry in question has already been inserted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
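Illustration for the race fix above: a minimal userspace sketch of the "drop the lock, retake it, then re-check" pattern. All names (`table_lock`, `table_insert`, and so on) are hypothetical, and a plain pthread mutex is used purely for illustration; this is not the bcachefs accounting code or its mark_lock.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_ENTRIES 16

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table[MAX_ENTRIES];
static int table_nr;

/* Caller must hold table_lock. */
static bool table_contains(int key)
{
	for (int i = 0; i < table_nr; i++)
		if (table[i] == key)
			return true;
	return false;
}

/*
 * Called with table_lock held. The lock is dropped and retaken in the
 * middle (as if to do work that cannot be done under it), so another
 * thread may have inserted the same key in the meantime.
 */
static int table_insert(int key)
{
	pthread_mutex_unlock(&table_lock);
	/* ... work that must happen without the lock held ... */
	pthread_mutex_lock(&table_lock);

	/* Re-check: someone else may have raced with us and inserted it. */
	if (table_contains(key))
		return 0;

	if (table_nr == MAX_ENTRIES)
		return -1;

	table[table_nr++] = key;
	return 0;
}
```

Without the re-check, two threads inserting the same key could each add their own copy of the entry.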
9 months ago bcachefs: bch2_btree_insert() - add btree iter flags
Ariel Miculas [Mon, 3 Jun 2024 20:47:31 +0000 (23:47 +0300)]
bcachefs: bch2_btree_insert() - add btree iter flags

The commit 65bd44239727 ("bcachefs: bch2_btree_insert_trans() no longer
specifies BTREE_ITER_cached") removes BTREE_ITER_cached from
bch2_btree_insert_trans, which causes the update_inode function from
bcachefs-tools to take a long time (~20s).  Add an iter_flags parameter
to bch2_btree_insert, so the users can specify iter update trigger
flags, such as BTREE_ITER_cached.

Signed-off-by: Ariel Miculas <ariel.miculas@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: BCH_IOCTL_QUERY_ACCOUNTING
Kent Overstreet [Fri, 1 Mar 2024 23:43:39 +0000 (18:43 -0500)]
bcachefs: BCH_IOCTL_QUERY_ACCOUNTING

Add a new ioctl that can return the new accounting counter types; it
takes as input a bitmask of accounting types to return.

This will be used for returning e.g. compression accounting and
rebalance_work accounting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: support REMAP_FILE_DEDUP in bch2_remap_file_range
Reed Riley [Sat, 11 May 2024 00:20:12 +0000 (00:20 +0000)]
bcachefs: support REMAP_FILE_DEDUP in bch2_remap_file_range

By removing the early-exit when REMAP_FILE_DEDUP is set, we should be
able to support the FIDEDUPERANGE ioctl, albeit less efficiently than if
we handled some of the extent locking and comparison logic inside
bcachefs.  Extent comparison logic already exists inside of
`__generic_remap_file_range_prep`.

Signed-off-by: Reed Riley <reed@riley.engineer>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: support FS_IOC_SETFSLABEL
Hongbo Li [Mon, 3 Jun 2024 13:26:20 +0000 (21:26 +0800)]
bcachefs: support FS_IOC_SETFSLABEL

Implement support for FS_IOC_SETFSLABEL ioctl to set filesystem
label.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: support get fs label
Hongbo Li [Mon, 3 Jun 2024 13:26:19 +0000 (21:26 +0800)]
bcachefs: support get fs label

Implement support for FS_IOC_GETFSLABEL ioctl to read filesystem
label.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: implement FS_IOC_GETVERSION to support lsattr
Hongbo Li [Mon, 3 Jun 2024 13:26:18 +0000 (21:26 +0800)]
bcachefs: implement FS_IOC_GETVERSION to support lsattr

This patch adds the FS_IOC_GETVERSION ioctl for getting i_generation
from the inode; users can then list a file's generation number by
using "lsattr".

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Unlock trans when waiting for user input in fsck
Kent Overstreet [Thu, 30 May 2024 01:14:40 +0000 (21:14 -0400)]
bcachefs: Unlock trans when waiting for user input in fsck

We can't hold locks while waiting for user input, that's a deadlock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Add tracepoints for bch2_sync_fs() and bch2_fsync()
Youling Tang [Fri, 31 May 2024 02:35:09 +0000 (10:35 +0800)]
bcachefs: Add tracepoints for bch2_sync_fs() and bch2_fsync()

Add trace_bch2_sync_fs() and trace_bch2_fsync() implementations.

The output in trace is as follows:
  sync-29779   [000] .....   193.700935: bch2_sync_fs: dev 254,16 wait 1
  <...>-40027  [002] .....   342.535227: bch2_fsync: dev 254,32 ino 4099 parent 4096 datasync 1

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: track writeback errors using the generic tracking infrastructure
Youling Tang [Fri, 31 May 2024 02:31:15 +0000 (10:31 +0800)]
bcachefs: track writeback errors using the generic tracking infrastructure

We are already using mapping_set_error() in bch2_writepage_io_done(), so
all we need to do is to use file_check_and_advance_wb_err() when handling
fsync() requests in bch2_fsync().

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_dir_emit() - fix directory reads in the fuse driver
Ariel Miculas [Thu, 30 May 2024 21:13:58 +0000 (00:13 +0300)]
bcachefs: bch2_dir_emit() - fix directory reads in the fuse driver

Commit 0c0cbfdb84725e9933a24ecf47c61bdeeda06ba2 dropped the ctx->pos
update before the call to dir_emit. This breaks the userspace
implementation, causing the directory reads to be stuck in an infinite
loop. This doesn't happen in the kernel because the vfs handles the
updates to ctx->pos, but in the fuse implementation nobody updates
it.

Signed-off-by: Ariel Miculas <ariel.miculas@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: twf: delete dead struct fields
Kent Overstreet [Thu, 30 May 2024 19:54:08 +0000 (15:54 -0400)]
bcachefs: twf: delete dead struct fields

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_stdio_redirect_readline_timeout()
Kent Overstreet [Thu, 30 May 2024 00:37:39 +0000 (20:37 -0400)]
bcachefs: bch2_stdio_redirect_readline_timeout()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: twf: convert bch2_stdio_redirect_readline() to darray
Kent Overstreet [Thu, 30 May 2024 00:34:48 +0000 (20:34 -0400)]
bcachefs: twf: convert bch2_stdio_redirect_readline() to darray

We now read the line from the buffer atomically, which means we have to
allow the buffer to grow past STDIO_REDIRECT_BUFSIZE if we're waiting
for a full line - this behaviour is necessary for
stdio_redirect_readline_timeout() in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Plumb more logging through stdio redirect
Kent Overstreet [Thu, 30 May 2024 02:06:00 +0000 (22:06 -0400)]
bcachefs: Plumb more logging through stdio redirect

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: fsck_err() may now take a btree_trans
Kent Overstreet [Fri, 9 Feb 2024 02:10:32 +0000 (21:10 -0500)]
bcachefs: fsck_err() may now take a btree_trans

fsck_err() now optionally takes a btree_trans; if the current thread has
one, it is required that it be passed.

The next patch will use this to unlock when waiting for user input.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: btree_types bitmask cleanups
Kent Overstreet [Wed, 29 May 2024 23:37:29 +0000 (19:37 -0400)]
bcachefs: btree_types bitmask cleanups

Make things more consistent and ensure that we're using u64 bitfields -
key types and btree ids are already around 32 bits.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
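Illustration for the bitmask cleanup above: a minimal sketch of why one-bit masks for ids that approach or exceed 32 should be built from a 64-bit constant. The helper names `id_to_mask` and `mask_has_id` are hypothetical and not taken from bcachefs.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build a one-bit mask for an id. A plain
 * `1 << id` is undefined for id >= 31 when int is 32 bits wide,
 * so the shift is done on a 64-bit unsigned constant instead.
 * Valid for id < 64.
 */
static inline uint64_t id_to_mask(unsigned int id)
{
	return UINT64_C(1) << id;
}

/* Hypothetical helper: test whether an id is present in a mask. */
static inline int mask_has_id(uint64_t mask, unsigned int id)
{
	return (mask & id_to_mask(id)) != 0;
}
```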
9 months ago bcachefs: Delete old assertion for online fsck
Kent Overstreet [Sun, 7 Apr 2024 03:58:01 +0000 (23:58 -0400)]
bcachefs: Delete old assertion for online fsck

The order in which btree_gc walks keys has changed, so we no longer
have the sort of issues with online fsck this assertion was warning
about.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Initialize gc buckets in alloc trigger
Kent Overstreet [Wed, 29 May 2024 22:54:39 +0000 (18:54 -0400)]
bcachefs: Initialize gc buckets in alloc trigger

Needed for online fsck; we need the trigger to initialize newly
allocated buckets and generation number changes while gc is running.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Walk leaf to root in btree_gc
Kent Overstreet [Wed, 29 May 2024 22:53:48 +0000 (18:53 -0400)]
bcachefs: Walk leaf to root in btree_gc

Next change will move gc_alloc_start initialization into the alloc
trigger, so we have to mark those first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Don't block journal when finishing check_allocations()
Kent Overstreet [Wed, 29 May 2024 21:54:46 +0000 (17:54 -0400)]
bcachefs: Don't block journal when finishing check_allocations()

Blocking the journal was needed to finish checking old style accounting,
but that code is gone and it's not needed in the alloc rewrite;
mark_lock is sufficient for synchronization.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_fs_get_tree() cleanup
Kent Overstreet [Wed, 29 May 2024 17:55:49 +0000 (13:55 -0400)]
bcachefs: bch2_fs_get_tree() cleanup

- improve error paths
- call bch2_fs_start() separately, after applying late-parsed options

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Kill bch2_mount()
Kent Overstreet [Wed, 29 May 2024 17:38:06 +0000 (13:38 -0400)]
bcachefs: Kill bch2_mount()

Fold into bch2_fs_get_tree()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Eytzinger accumulation for accounting keys
Kent Overstreet [Wed, 27 Dec 2023 16:33:21 +0000 (11:33 -0500)]
bcachefs: Eytzinger accumulation for accounting keys

The btree write buffer takes as input keys from the journal, sorts them,
deduplicates them, and flushes them back to the btree in sorted order.

The disk space accounting rewrite is moving accounting to normal btree
keys, with updates (in this case deltas) accumulated in the write buffer
and then flushed to the btree; but this is going to increase the number
of keys handled by the write buffer by perhaps as much as 3x-5x.

The overhead from copying around and sorting this many keys would cause
a significant performance regression, but: there is huge locality in
updates to accounting keys that we can take advantage of.

Instead of appending accounting keys to the list of keys to be sorted,
this patch adds an eytzinger search tree of recently seen accounting
keys. We look up the accounting key in the eytzinger search tree and
apply the delta directly, adding it if it doesn't exist, and
periodically prune the eytzinger tree of unused entries.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
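Illustration for the accumulation scheme above: a minimal sketch of "look up the key, apply the delta in place, add the key if it is missing". To keep it short, a sorted array with binary search stands in for the eytzinger search tree, and periodic pruning is omitted. All names and the fixed table size are hypothetical; this is not the bcachefs write buffer code.

```c
#include <stdint.h>
#include <string.h>

#define ACC_MAX 64

struct acc_entry {
	uint64_t	key;	/* stand-in for an accounting key */
	int64_t		delta;	/* accumulated delta for that key */
};

struct acc_table {
	struct acc_entry	e[ACC_MAX];	/* kept sorted by key */
	unsigned int		nr;
};

/* Index of the first entry whose key is >= k (binary search). */
static unsigned int acc_lower_bound(const struct acc_table *t, uint64_t k)
{
	unsigned int lo = 0, hi = t->nr;

	while (lo < hi) {
		unsigned int mid = lo + (hi - lo) / 2;

		if (t->e[mid].key < k)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Accumulate a delta: update the existing entry if the key was seen
 * recently, otherwise insert a new one. Returns -1 if the table is full.
 */
static int acc_add(struct acc_table *t, uint64_t k, int64_t delta)
{
	unsigned int i = acc_lower_bound(t, k);

	if (i < t->nr && t->e[i].key == k) {
		t->e[i].delta += delta;		/* common case: no new key */
		return 0;
	}
	if (t->nr == ACC_MAX)
		return -1;

	/* Shift the tail up by one slot to keep the array sorted. */
	memmove(&t->e[i + 1], &t->e[i], (t->nr - i) * sizeof(t->e[0]));
	t->e[i] = (struct acc_entry) { .key = k, .delta = delta };
	t->nr++;
	return 0;
}
```

Because repeated updates to the same key collapse into one entry, the number of keys that later need sorting and flushing stays small when updates have high locality.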
9 months ago bcachefs: bch_acct_rebalance_work
Kent Overstreet [Tue, 19 Mar 2024 04:04:52 +0000 (00:04 -0400)]
bcachefs: bch_acct_rebalance_work

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch_acct_btree
Kent Overstreet [Thu, 29 Feb 2024 03:37:21 +0000 (22:37 -0500)]
bcachefs: bch_acct_btree

Add counters for how much disk space we're using per btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch_acct_snapshot
Kent Overstreet [Mon, 12 Feb 2024 07:17:02 +0000 (02:17 -0500)]
bcachefs: bch_acct_snapshot

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_fs_usage_base_to_text()
Kent Overstreet [Fri, 23 Feb 2024 22:23:41 +0000 (17:23 -0500)]
bcachefs: bch2_fs_usage_base_to_text()

Helper to show raw accounting in sysfs, mainly for debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch2_fs_accounting_to_text()
Kent Overstreet [Sun, 25 Feb 2024 00:58:07 +0000 (19:58 -0500)]
bcachefs: bch2_fs_accounting_to_text()

Helper to show raw accounting in sysfs, mainly for debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: Convert bch2_compression_stats_to_text() to new accounting
Kent Overstreet [Sun, 25 Feb 2024 02:09:51 +0000 (21:09 -0500)]
bcachefs: Convert bch2_compression_stats_to_text() to new accounting

We no longer have to walk the whole btree to calculate compression
stats.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months ago bcachefs: bch_acct_compression
Kent Overstreet [Sun, 7 Jan 2024 02:42:36 +0000 (21:42 -0500)]
bcachefs: bch_acct_compression

This adds per-compression-type accounting of compressed and uncompressed
size as well as number of extents - meaning we can now see compression
ratio (without walking the whole filesystem).
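
As a rough illustration of why these counters are enough: once compressed
and uncompressed sizes are kept as running totals, the ratio is a single
division. The struct and function names below are hypothetical and are not
the bcachefs definitions; sectors are assumed as the unit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-compression-type counters; not the on-disk layout. */
struct compression_acct {
	uint64_t nr_extents;
	uint64_t uncompressed_sectors;
	uint64_t compressed_sectors;
};

/* Ratio of uncompressed to compressed size; 0.0 if nothing is stored. */
static double compression_ratio(const struct compression_acct *a)
{
	assert(a);
	if (!a->compressed_sectors)
		return 0.0;
	return (double)a->uncompressed_sectors / (double)a->compressed_sectors;
}
```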

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: bch2_verify_accounting_clean()
Kent Overstreet [Sun, 18 Feb 2024 05:13:22 +0000 (00:13 -0500)]
bcachefs: bch2_verify_accounting_clean()

Verify that the in-memory accounting matches the on-disk accounting
after a clean shutdown.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Convert bch2_replicas_gc2() to new accounting
Kent Overstreet [Mon, 12 Feb 2024 20:21:10 +0000 (15:21 -0500)]
bcachefs: Convert bch2_replicas_gc2() to new accounting

bch2_replicas_gc2() is used for garbage collecting superblock replicas
entries that are empty - this converts it to the new accounting scheme.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Convert gc to new accounting
Kent Overstreet [Mon, 12 Feb 2024 03:48:05 +0000 (22:48 -0500)]
bcachefs: Convert gc to new accounting

Rewrite fsck/gc for the new accounting scheme.

This adds a second set of in-memory accounting counters for gc to use;
like with other parts of gc we run all triggers in TRIGGER_GC mode, then
compare what we calculated to existing in-memory accounting at the end.
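
The final comparison step can be pictured as below. This is only a sketch
that assumes both counter sets are flat arrays of equal length; the
function name is made up for illustration and the real code compares
per-key counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count counters where the gc recomputation disagrees with the live value. */
static size_t gc_count_mismatches(const int64_t *live, const int64_t *gc,
				  size_t nr)
{
	size_t bad = 0;

	assert(live && gc);
	for (size_t i = 0; i < nr; i++)
		if (live[i] != gc[i])
			bad++;
	return bad;
}
```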

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Kill replicas_journal_res
Kent Overstreet [Tue, 2 Jan 2024 05:22:57 +0000 (00:22 -0500)]
bcachefs: Kill replicas_journal_res

More dead code deletion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Kill fs_usage_online
Kent Overstreet [Tue, 2 Jan 2024 05:15:16 +0000 (00:15 -0500)]
bcachefs: Kill fs_usage_online

More dead code deletion.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Kill bch2_fs_usage_to_text()
Kent Overstreet [Sun, 25 Feb 2024 01:04:48 +0000 (20:04 -0500)]
bcachefs: Kill bch2_fs_usage_to_text()

Dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Delete journal-buf-sharded old style accounting
Kent Overstreet [Thu, 28 Dec 2023 03:09:25 +0000 (22:09 -0500)]
bcachefs: Delete journal-buf-sharded old style accounting

More deletion of dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Kill writing old accounting to journal
Kent Overstreet [Mon, 1 Jan 2024 03:30:15 +0000 (22:30 -0500)]
bcachefs: Kill writing old accounting to journal

More ripping out of the old disk space accounting.

Note that the new disk space accounting is incompatible with the old,
and writing out old style disk space accounting with the new code is
infeasible.

This means upgrading and downgrading past this version requires
regenerating accounting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: kill bch2_fs_usage_read()
Kent Overstreet [Tue, 2 Jan 2024 04:36:23 +0000 (23:36 -0500)]
bcachefs: kill bch2_fs_usage_read()

With bch2_ioctl_fs_usage() converted to the new accounting, this is now
dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Convert bch2_ioctl_fs_usage() to new accounting
Kent Overstreet [Sun, 7 Jan 2024 01:29:25 +0000 (20:29 -0500)]
bcachefs: Convert bch2_ioctl_fs_usage() to new accounting

This converts bch2_ioctl_fs_usage() to read from the new disk
accounting, via bch2_fs_replicas_usage_read().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Kill bch2_fs_usage_initialize()
Kent Overstreet [Sat, 6 Jan 2024 02:23:07 +0000 (21:23 -0500)]
bcachefs: Kill bch2_fs_usage_initialize()

Deleting code for the old disk accounting scheme.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: dev_usage updated by new accounting
Kent Overstreet [Tue, 2 Jan 2024 00:42:37 +0000 (19:42 -0500)]
bcachefs: dev_usage updated by new accounting

Reading disk accounting now requires an eytzinger lookup (see:
bch2_accounting_mem_read()), but the per-device counters are used
frequently enough that we'd like to still be able to read them with just
a percpu sum, as in the old code.

This patch special cases the device counters; when we update in-memory
accounting we also update the old style percpu counters if it's a device
counter update.
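
A minimal userspace sketch of the idea, assuming a fixed number of CPUs
and a plain array in place of real kernel percpu counters. All names here
(percpu_s64, acct_update, ACCT_DEV_DATA_TYPE) are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4

/* Stand-in for a kernel percpu counter: one slot per CPU. */
struct percpu_s64 {
	int64_t v[SKETCH_NR_CPUS];
};

enum acct_type { ACCT_DEV_DATA_TYPE, ACCT_OTHER };

/* Reading is just a sum over the per-CPU slots, with no tree lookup. */
static int64_t percpu_sum(const struct percpu_s64 *c)
{
	int64_t sum = 0;

	for (unsigned cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += c->v[cpu];
	return sum;
}

/* Mirror device counter deltas into dev_usage; ignore other types. */
static void acct_update(struct percpu_s64 *dev_usage, unsigned cpu,
			enum acct_type type, int64_t delta)
{
	assert(cpu < SKETCH_NR_CPUS);
	if (type == ACCT_DEV_DATA_TYPE)
		dev_usage->v[cpu] += delta;
}
```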

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Coalesce accounting keys before journal replay
Kent Overstreet [Tue, 4 Jun 2024 22:31:13 +0000 (18:31 -0400)]
bcachefs: Coalesce accounting keys before journal replay

This fixes a performance regression in journal replay; without
coalescing accounting keys we have multiple keys at the same position,
which means journal_keys_peek_upto() has to skip past many overwritten
keys - turning journal replay into an O(n^2) algorithm.
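
Coalescing can be sketched as a single pass over keys already sorted by
position, summing the deltas of keys that share a position. The acct_key
struct and function name are assumptions for illustration, not the
journal_keys code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accounting key: a position plus a signed delta. */
struct acct_key {
	uint64_t pos;
	int64_t  delta;
};

/*
 * Merge runs of equal pos in a pos-sorted array, in place, summing
 * their deltas. Returns the new number of keys.
 */
static size_t coalesce_acct_keys(struct acct_key *keys, size_t nr)
{
	size_t dst = 0;

	assert(keys || !nr);
	for (size_t src = 0; src < nr; src++) {
		if (dst && keys[dst - 1].pos == keys[src].pos)
			keys[dst - 1].delta += keys[src].delta;
		else
			keys[dst++] = keys[src];
	}
	return dst;
}
```

After this pass each position holds exactly one key, so a lookup no longer
has to step over a run of superseded entries.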

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Disk space accounting rewrite
Kent Overstreet [Thu, 9 Nov 2023 19:22:46 +0000 (14:22 -0500)]
bcachefs: Disk space accounting rewrite

Main part of the disk accounting rewrite.

This is a wholesale rewrite of the existing disk space accounting, which
relies on percpu counters that are sharded by journal buffer, and
rolled up and added to each journal write.

With the new scheme, every set of counters is a distinct key in the
accounting btree; this fixes scaling limitations of the old scheme,
where counters took up space in each journal entry and required multiple
percpu counters.

Now, in memory accounting requires a single set of percpu counters - not
multiple for each in flight journal buffer - and in the future we'll
probably also have counters that don't use in memory percpu counters,
since they're not strictly required.

An accounting update is now a normal btree update, using the btree write
buffer path. At transaction commit time, we apply accounting updates to
the in memory counters, which are percpu counters indexed in an
eytzinger tree by the accounting key.
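
For readers unfamiliar with the layout: an eytzinger tree stores a sorted
set in breadth-first order in a flat array, so a lookup walks from index 1
to child 2k or 2k+1. The sketch below is a generic 1-based version with
plain u64 keys; it is not the kernel's eytzinger helpers, and the function
names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill out[1..n] with sorted[0..n-1] in eytzinger (breadth-first) order
 * via an in-order walk of the implicit tree. Call with i = 0, k = 1.
 * out[0] is unused.
 */
static size_t eytz_build(const uint64_t *sorted, uint64_t *out, size_t n,
			 size_t i, size_t k)
{
	if (k <= n) {
		i = eytz_build(sorted, out, n, i, 2 * k);
		out[k] = sorted[i++];
		i = eytz_build(sorted, out, n, i, 2 * k + 1);
	}
	return i;
}

/* Return the 1-based index of key in the tree, or 0 if it is absent. */
static size_t eytz_find(const uint64_t *tree, size_t n, uint64_t key)
{
	size_t k = 1;

	assert(tree);
	while (k <= n) {
		if (tree[k] == key)
			return k;
		/* Go right (2k + 1) if the node is smaller, else left (2k). */
		k = 2 * k + (tree[k] < key);
	}
	return 0;
}
```

The index returned by the lookup could then select a counter in a parallel
array, which is the role the accounting key plays in the text above.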

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: btree write buffer knows how to accumulate bch_accounting keys
Kent Overstreet [Fri, 17 Nov 2023 05:23:07 +0000 (00:23 -0500)]
bcachefs: btree write buffer knows how to accumulate bch_accounting keys

Teach the btree write buffer how to accumulate accounting keys - instead
of having the newer key overwrite the older key as we do with other
updates, we need to add them together.

Also, add a flag so that write buffer flush knows when journal replay is
finished flushing accounting, and teach it to hold accounting keys until
that flag is set.
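
The two rules can be sketched as follows, assuming a simplified key with a
single value. The wb_key struct, both function names, and the
accounting_replay_done parameter are hypothetical stand-ins, not the write
buffer's actual types or flag:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical write buffer entry with one value. */
struct wb_key {
	uint64_t pos;
	int	 is_accounting;
	int64_t	 v;
};

/* Combine a newer key into an older one at the same position. */
static void wb_merge(struct wb_key *older, const struct wb_key *newer)
{
	assert(older->pos == newer->pos);
	if (older->is_accounting && newer->is_accounting)
		older->v += newer->v;	/* accounting: add the deltas */
	else
		*older = *newer;	/* everything else: newest wins */
}

/* Accounting keys are held back until journal replay has flushed its own. */
static int wb_may_flush(const struct wb_key *k, int accounting_replay_done)
{
	return !k->is_accounting || accounting_replay_done;
}
```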

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Accumulate accounting keys in journal replay
Kent Overstreet [Thu, 28 Dec 2023 01:59:01 +0000 (20:59 -0500)]
bcachefs: Accumulate accounting keys in journal replay

Until accounting keys hit the btree, they are deltas, not new versions
of the existing key; this means we have to teach journal replay to
accumulate them.

Additionally, the journal doesn't track precisely which entries have
been flushed to the btree; it only tracks a range of entries that may
possibly still need to be flushed.

That means we need to compare accounting keys against the version in the
btree and only flush updates that are newer.

There's another wrinkle with the write buffer: if the write buffer
starts flushing accounting keys before journal replay has finished
flushing accounting keys, journal replay will see the version number
from the new updates and updates from the journal will be lost.

To avoid this, journal replay has to flush accounting keys first, and
we'll be adding a flag so that write buffer flush knows to hold
accounting keys until then.
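
The version comparison might look roughly like this. The sketch assumes
version numbers increase monotonically and that journal deltas are
replayed in version order; the struct and function names are invented for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accounting value as stored in the btree. */
struct acct_btree_val {
	uint64_t version;
	int64_t	 v;
};

/* Hypothetical delta read back from the journal. */
struct acct_delta {
	uint64_t version;
	int64_t	 d;
};

/*
 * Apply a journal delta unless the btree value already includes it.
 * Returns 1 if the delta was applied, 0 if it was skipped.
 */
static int replay_acct_delta(struct acct_btree_val *btree,
			     const struct acct_delta *delta)
{
	assert(btree && delta);
	if (delta->version <= btree->version)
		return 0;	/* already flushed before the crash */

	btree->v += delta->d;
	btree->version = delta->version;
	return 1;
}
```

If the write buffer were allowed to flush a newer accounting key first,
btree->version would jump ahead and the older journal deltas would be
skipped by this check, which is the lost update described above.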

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: KEY_TYPE_accounting
Kent Overstreet [Wed, 27 Dec 2023 23:31:46 +0000 (18:31 -0500)]
bcachefs: KEY_TYPE_accounting

New key type for the disk space accounting rewrite.

 - Holds a variable sized array of u64s (may be more than one for
   accounting e.g. compressed and uncompressed size, or buckets and
   sectors for a given data type)

 - Updates are deltas, not new versions of the key: this means updates
   to accounting can happen via the btree write buffer, which we'll be
   teaching to accumulate deltas.
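
A minimal sketch of delta accumulation for such a value, assuming a small
fixed maximum in place of a truly variable sized array. The acct_val
struct and ACCT_MAX_COUNTERS are hypothetical, not the on-disk format.
Negative deltas are represented as wrapped u64 values, which unsigned
addition handles correctly modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

#define ACCT_MAX_COUNTERS 3

/* Hypothetical accounting value: nr counters, each a u64. */
struct acct_val {
	unsigned nr;
	uint64_t d[ACCT_MAX_COUNTERS];
};

/* Add each counter of a delta key into the accumulated key. */
static void acct_accumulate(struct acct_val *dst, const struct acct_val *src)
{
	assert(dst->nr == src->nr && src->nr <= ACCT_MAX_COUNTERS);
	for (unsigned i = 0; i < src->nr; i++)
		dst->d[i] += src->d[i];
}
```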

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: use new mount API
Thomas Bertschinger [Tue, 28 May 2024 04:36:11 +0000 (22:36 -0600)]
bcachefs: use new mount API

This updates bcachefs to use the new mount API:

- Update the file_system_type to use the new init_fs_context()
  function.

- Define the new fs_context_operations functions.

- No longer register bch2_mount() and bch2_remount(); these are now
  called via the new fs_context functions.

- Define a new helper type, bch2_opts_parse that includes a struct
  bch_opts and additionally a printbuf used to save options that can't
  be parsed until after the FS is opened. This enables us to parse as
  many options as possible prior to opening the filesystem while saving
  those options that need the open FS for later parsing.

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Add error code to defer option parsing
Thomas Bertschinger [Tue, 28 May 2024 04:36:10 +0000 (22:36 -0600)]
bcachefs: Add error code to defer option parsing

This introduces a new error code, option_needs_open_fs, which is used to
indicate that an attempt was made to parse a mount option prior to
opening a filesystem, when that mount option requires an open filesystem
in order to be validated.

Returning this error results in bch2_parse_one_mount_opt() saving that
option for later parsing, after the filesystem is opened.
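
The deferral flow can be sketched in userspace as below. This is an
assumption-laden illustration: ERR_OPTION_NEEDS_OPEN_FS, parse_one_opt()
and parse_or_defer() are stand-ins, a plain char buffer replaces the
printbuf, and "metadata_target" is hard-coded as the only option that
needs an open filesystem.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ERR_OPTION_NEEDS_OPEN_FS 1	/* hypothetical error number */

/* Stand-in parser: target options cannot be validated before open. */
static int parse_one_opt(const char *name, const char *val, int fs_open)
{
	(void)val;
	if (!fs_open && !strcmp(name, "metadata_target"))
		return -ERR_OPTION_NEEDS_OPEN_FS;
	return 0;
}

/*
 * Parse one option; if it needs an open filesystem, append "name=val"
 * to the comma-separated deferred buffer instead of failing.
 * Returns 0 on success, negative on error or if the buffer is full.
 */
static int parse_or_defer(const char *name, const char *val, int fs_open,
			  char *deferred, size_t size)
{
	int ret = parse_one_opt(name, val, fs_open);

	if (ret == -ERR_OPTION_NEEDS_OPEN_FS) {
		size_t len = strlen(deferred);
		int n;

		assert(len < size);
		n = snprintf(deferred + len, size - len, "%s%s=%s",
			     len ? "," : "", name, val);
		if (n < 0 || (size_t)n >= size - len) {
			deferred[len] = '\0';	/* undo partial append */
			return -1;
		}
		return 0;
	}
	return ret;
}
```

Once the filesystem is open, the saved string can be run through the same
parser again with fs_open set.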

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: add printbuf arg to bch2_parse_mount_opts()
Thomas Bertschinger [Tue, 28 May 2024 04:36:09 +0000 (22:36 -0600)]
bcachefs: add printbuf arg to bch2_parse_mount_opts()

Mount options that take the name of a device that may be part of a
filesystem, for example "metadata_target", cannot be validated until
after the filesystem has been opened. However, an attempt to parse those
options may be made prior to the filesystem being opened.

This change adds a printbuf parameter to bch2_parse_mount_opts() which
will be used to save those mount options, when they are supplied prior
to the FS being opened, so that they can be parsed later.

This functionality is not currently needed, but will be used after
bcachefs starts using the new mount API to parse mount options. This is
because using the new mount API, we will process mount options prior to
opening the FS, but the new API doesn't provide a convenient way to
"replay" mount option parsing. So we save these options ourselves to
accomplish this.

This change also splits out the code to parse a single option into
bch2_parse_one_mount_opt(), which will be useful when using the new
mount API which deals with a single mount option at a time.

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: metadata version bucket_stripe_sectors
Kent Overstreet [Fri, 26 Apr 2024 00:45:00 +0000 (20:45 -0400)]
bcachefs: metadata version bucket_stripe_sectors

New on disk format version for bch_alloc->stripe_sectors and
BCH_DATA_unstriped - accounting for unstriped data in stripe buckets.

Upgrade/downgrade requires regenerating alloc info - but only if erasure
coding is in use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: BCH_DATA_unstriped
Kent Overstreet [Thu, 23 Nov 2023 21:34:03 +0000 (16:34 -0500)]
bcachefs: BCH_DATA_unstriped

Add a new pseudo data type, to track buckets that are members of a
stripe, but have unstriped data in them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: bch_alloc->stripe_sectors
Kent Overstreet [Thu, 23 Nov 2023 22:21:23 +0000 (17:21 -0500)]
bcachefs: bch_alloc->stripe_sectors

Add a separate counter to bch_alloc_v4 for amount of striped data; this
lets us separately track striped and unstriped data in a bucket, which
lets us see when erasure coding has failed to update extents with stripe
pointers, and also find buckets to continue updating if we crash mid way
through creating a new stripe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: check_key_has_inode()
Kent Overstreet [Sun, 26 May 2024 22:11:37 +0000 (18:11 -0400)]
bcachefs: check_key_has_inode()

Consolidate duplicated checks for extents/dirents/xattrs - these keys
should all have a corresponding inode of the correct type.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: allow passing full device path for target options
Thomas Bertschinger [Sat, 25 May 2024 19:36:19 +0000 (13:36 -0600)]
bcachefs: allow passing full device path for target options

The output of mount options such as "metadata_target" in `/proc/mounts`
uses the full path to the device.

mount(8) from util-linux uses the output from `/proc/mounts` to pass
existing mount options when performing a remount, so bcachefs should
accept as input the same form that it prints as output.

Without this change:

$ mount -t bcachefs -o metadata_target=vdb /dev/vdb /mnt
$ strace mount -o remount /mnt
...
fsconfig(4, FSCONFIG_SET_STRING, "metadata_target", "/dev/vdb", 0) = -1 EINVAL (Invalid argument)
...

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: bch2_printbuf_strip_trailing_newline()
Kent Overstreet [Mon, 27 May 2024 02:20:34 +0000 (22:20 -0400)]
bcachefs: bch2_printbuf_strip_trailing_newline()

Add a new helper to fix inode_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: don't expose "read_only" as a mount option
Thomas Bertschinger [Sun, 26 May 2024 19:08:20 +0000 (13:08 -0600)]
bcachefs: don't expose "read_only" as a mount option

When "read_only" is exposed as a mount option, it is redundant with the
standard option "ro" and gives users multiple ways to specify that a
bcachefs filesystem should be mounted read-only. This presents the risk
of having inconsistent options specified.

This can be seen when remounting a read-only filesystem in read-write
mode, using mount(8) from util-linux. Because mount(8) parses the
existing mount options from `/proc/mounts` and applies them when
remounting, it can end up applying both "read_only" and "rw":

$ mount img -o ro /mnt
$ strace mount -o remount,rw /mnt
...
fsconfig(4, FSCONFIG_SET_FLAG, "read_only", NULL, 0) = 0
fsconfig(4, FSCONFIG_SET_FLAG, "rw", NULL, 0) = 0
...

Making "read_only" no longer a mount option means this edge case cannot
occur.

Fixes: 62719cf33c3a ("bcachefs: Fix nochanges/read_only interaction")
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>