Kent Overstreet [Mon, 15 Jan 2024 22:59:51 +0000 (17:59 -0500)]
bcachefs: Better journal tracepoints
Factor out bch2_journal_bufs_to_text(), and use it in the
journal_entry_full() tracepoint; when we can't get a journal reservation
we need to know the outstanding journal entry sizes to know if the
problem is due to excessive flushing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jan 2024 22:56:22 +0000 (17:56 -0500)]
bcachefs: Avoid flushing the journal in the discard path
When issuing discards, we may need to flush the journal if there's too
many buckets that can't be discarded until a journal flush.
But the heuristic was bad; we should be comparing the number of buckets
that need to flushes against the number of free buckets, not the number
of buckets we saw.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jan 2024 19:15:26 +0000 (14:15 -0500)]
bcachefs: bch2_kthread_io_clock_wait() no longer sleeps until full amount
Drop t he loop in bch2_kthread_io_clock_wait(): this allows the code
that uses it to be woken up for other reasons, and fixes a bug where
rebalance wouldn't wake up when a scan was requested.
This raises the possibility of spurious wakeups, but callers should
always be able to handle that reasonably well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 11 Jan 2024 04:47:04 +0000 (23:47 -0500)]
bcachefs: Reduce would_deadlock restarts
We don't have to take locks in any particular ordering - we'll make
forward progress just fine - but if we try to stick to an ordering, it
can help to avoid excessive would_deadlock transaction restarts.
This tweaks the reflink path to take extents btree locks in the right
order.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 11 Jan 2024 04:08:30 +0000 (23:08 -0500)]
bcachefs: Don't log errors if BCH_WRITE_ALLOC_NOWAIT
Previously, we added logging in the write path to ensure that any
unexpected errors getting reported to userspace have a log message; but
BCH_WRITE_ALLOC_NOWAIT is a special case, it's used for promotes where
errors are expected and not reported out to userspace - so we need to
silence those.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 3 Jan 2024 22:15:55 +0000 (17:15 -0500)]
bcachefs: increase max_active on io_complete_wq
this definitely should _not_ be 1, and we don't actually want any
concurrency limiting at all here - btree node read completions are
getting blocked behind btree node write submissions.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 28 Dec 2023 04:19:09 +0000 (23:19 -0500)]
bcachefs: mark now takes bkey_s
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 28 Dec 2023 04:19:09 +0000 (23:19 -0500)]
bcachefs: trans_mark now takes bkey_s
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 Jun 2023 01:02:27 +0000 (21:02 -0400)]
bcachefs: bch_member->seq
Add new fields for split brain detection:
- bch_member->seq, which tracks the sequence number of the last superblock
write that happened to each member device
- bch_sb->write_time, which tracks the time of the last superblock write,
to allow detection of when two members have diverged but had the same
number of superblock writes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 23 Dec 2023 22:50:29 +0000 (17:50 -0500)]
bcachefs: Fix nochanges/read_only interaction
nochanges means "we cannot issue writes at all"; it's possible to go
into a pseudo read-write mode where we pin dirty metadata in memory,
which is used for fsck in dry run mode and doing journal replay on a
read only mount, but we do not want to allow an actual read-write mount
in nochanges mode.
But we do always want to allow early read-write, during recovery - this
patch clarifies that.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Dec 2023 19:13:48 +0000 (14:13 -0500)]
bcachefs: Fix reattach_inode() for snapshots
reattach_inode() was broken w.r.t. snapshots - we'd lookup the subvolume
to look up lost+found, but if we're in an interior node snapshot that
didn't make any sense.
Instead, this adds a dirent path for creating in a specific snapshot,
skipping the subvolume; and we also make sure to create lost+found in
the root snapshot, to avoid conflicts with lost+found being created in
overlapping snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 11 Dec 2023 00:26:30 +0000 (19:26 -0500)]
bcachefs: growable btree_paths
XXX: we're allocating memory with btree locks held - bad
We need to plumb through an error path so we can do
allocate_dropping_locks() - but we're merging this now because it fixes
a transaction path overflow caused by indirect extent fragmentation, and
the resize path is rare.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since the btree_paths array is now about to become growable, we have to
be careful not to refer to paths by pointer across contexts where they
may be reallocated.
This fixes the remaining btree_interior_update() paths - split and
merge.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Some tweaks to greatly reduce locking overhead for the list of btree
transactions, so that it can always be enabled: leave btree_trans
objects on the list when they're on the percpu single item freelist,
and only check for duplicates in the same process when
CONFIG_BCACHEFS_DEBUG is enabled
- don't zero out the full btree_trans() unless we allocated it from
the mempool
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 13 Dec 2023 01:08:29 +0000 (20:08 -0500)]
bcachefs: rcu protect trans->paths
Upcoming patches are going to be changing trans->paths to a
reallocatable buffer. We need to guard against use after free when it's
used by other threads; this introduces RCU protection to those paths and
changes them to check for trans->paths == NULL
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>