www.infradead.org Git - users/hch/xfsprogs.git/log
15 months agoxfs_db: support the realtime refcountbt
Darrick J. Wong [Tue, 9 Jan 2024 17:40:30 +0000 (09:40 -0800)]
xfs_db: support the realtime refcountbt

Wire up various parts of xfs_db for realtime refcount support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: display the realtime refcount btree contents
Darrick J. Wong [Tue, 9 Jan 2024 17:40:29 +0000 (09:40 -0800)]
xfs_db: display the realtime refcount btree contents

Implement all the code we need to dump rtrefcountbt contents, starting
from the root inode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agolibfrog: enable scrubbing of the realtime refcount data
Darrick J. Wong [Tue, 9 Jan 2024 17:40:29 +0000 (09:40 -0800)]
libfrog: enable scrubbing of the realtime refcount data

Add a new entry so that we can scrub the rtrefcountbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: scrub the metadir path of rt refcount btree files
Darrick J. Wong [Tue, 9 Jan 2024 17:40:29 +0000 (09:40 -0800)]
xfs: scrub the metadir path of rt refcount btree files

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the refcount btree file for each rt group.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: scrub the realtime refcount btree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:29 +0000 (09:40 -0800)]
xfs: scrub the realtime refcount btree

Add code to scrub realtime refcount btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: report realtime refcount btree corruption errors to the health system
Darrick J. Wong [Tue, 9 Jan 2024 17:44:38 +0000 (09:44 -0800)]
xfs: report realtime refcount btree corruption errors to the health system

Whenever we encounter corrupt realtime refcount btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: enable extent size hints for CoW operations
Darrick J. Wong [Tue, 9 Jan 2024 17:40:28 +0000 (09:40 -0800)]
xfs: enable extent size hints for CoW operations

Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: apply rt extent alignment constraints to CoW extsize hint
Darrick J. Wong [Tue, 9 Jan 2024 17:40:28 +0000 (09:40 -0800)]
xfs: apply rt extent alignment constraints to CoW extsize hint

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
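
The alignment rule in the commit message above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function names are hypothetical and not taken from libxfs, hints and rextsize are both expressed in filesystem blocks, and a failed check on a regular realtime file is assumed to be reported as a validation failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a hint (in fs blocks) is a whole multiple of the rt extent size. */
static bool hint_is_aligned(uint32_t hint, uint32_t rextsize)
{
	assert(rextsize > 0);
	return hint % rextsize == 0;
}

/*
 * Hypothetical validator step, applicable to either the regular or the
 * CoW extent size hint.  A misaligned hint on a directory is cleared,
 * because directories only carry the hint for inheritance.  A misaligned
 * hint on a regular realtime file fails validation (an assumption).
 */
static bool validate_rt_hint(bool is_dir, uint32_t *hint, uint32_t rextsize)
{
	if (hint_is_aligned(*hint, rextsize))
		return true;
	if (is_dir) {
		*hint = 0;	/* drop the misaligned inheritable hint */
		return true;
	}
	return false;		/* misaligned hint on a regular rt file */
}
```

Running the same helper over both hints is what keeps the two checks from drifting apart if rextsize changes later.
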
15 months agoxfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
Darrick J. Wong [Tue, 9 Jan 2024 17:40:28 +0000 (09:40 -0800)]
xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files

Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files.  This apparently was done to disable
delayed allocation for realtime files.

However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files.  In this case, the
logic will incorrectly send the write through the delalloc write path.

Fix this by adjusting the logic slightly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: recover CoW leftovers in the realtime volume
Darrick J. Wong [Tue, 9 Jan 2024 17:40:28 +0000 (09:40 -0800)]
xfs: recover CoW leftovers in the realtime volume

Scan the realtime refcount tree at mount time to get rid of leftover
CoW staging extents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: allow inodes to have the realtime and reflink flags
Darrick J. Wong [Tue, 9 Jan 2024 17:40:28 +0000 (09:40 -0800)]
xfs: allow inodes to have the realtime and reflink flags

Now that we can share blocks between realtime files, allow this
combination.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: compute rtrmap btree max levels when reflink enabled
Darrick J. Wong [Tue, 9 Jan 2024 17:40:27 +0000 (09:40 -0800)]
xfs: compute rtrmap btree max levels when reflink enabled

Compute the maximum possible height of the realtime rmap btree when
reflink is enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: update rmap to allow cow staging extents in the rt rmap
Darrick J. Wong [Tue, 9 Jan 2024 17:40:27 +0000 (09:40 -0800)]
xfs: update rmap to allow cow staging extents in the rt rmap

Don't error out on CoW staging extent records when realtime reflink is
enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: create routine to allocate and initialize a realtime refcount btree inode
Darrick J. Wong [Tue, 9 Jan 2024 17:40:27 +0000 (09:40 -0800)]
xfs: create routine to allocate and initialize a realtime refcount btree inode

Create a library routine to allocate and initialize an empty realtime
refcountbt inode.  We'll use this for growfs, mkfs, and repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: wire up realtime refcount btree cursors
Darrick J. Wong [Tue, 9 Jan 2024 17:44:28 +0000 (09:44 -0800)]
xfs: wire up realtime refcount btree cursors

Wire up realtime refcount btree cursors wherever they're needed
throughout the code base.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: wire up a new inode fork type for the realtime refcount
Darrick J. Wong [Tue, 9 Jan 2024 17:40:27 +0000 (09:40 -0800)]
xfs: wire up a new inode fork type for the realtime refcount

Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: add metadata reservations for realtime refcount btree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:26 +0000 (09:40 -0800)]
xfs: add metadata reservations for realtime refcount btree

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime refcount btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: add realtime refcount btree inode to metadata directory
Darrick J. Wong [Tue, 9 Jan 2024 17:44:25 +0000 (09:44 -0800)]
xfs: add realtime refcount btree inode to metadata directory

Add a metadir path to select the realtime refcount btree inode and load
it at mount time.  The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: add a realtime flag to the refcount update log redo items
Darrick J. Wong [Tue, 9 Jan 2024 17:44:23 +0000 (09:44 -0800)]
xfs: add a realtime flag to the refcount update log redo items

Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: prepare refcount functions to deal with rtrefcountbt
Darrick J. Wong [Tue, 9 Jan 2024 17:44:22 +0000 (09:44 -0800)]
xfs: prepare refcount functions to deal with rtrefcountbt

Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions.  Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.

Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert it all at once.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: add realtime refcount btree operations
Darrick J. Wong [Tue, 9 Jan 2024 17:40:25 +0000 (09:40 -0800)]
xfs: add realtime refcount btree operations

Implement the generic btree operations needed to manipulate rtrefcount
btree blocks. This is different from the regular refcountbt in that we
allocate space from the filesystem at large, and are neither constrained
to the free space nor any particular AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: realtime refcount btree transaction reservations
Darrick J. Wong [Tue, 9 Jan 2024 17:40:25 +0000 (09:40 -0800)]
xfs: realtime refcount btree transaction reservations

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: define the on-disk realtime refcount btree format
Darrick J. Wong [Tue, 9 Jan 2024 17:44:20 +0000 (09:44 -0800)]
xfs: define the on-disk realtime refcount btree format

Start filling out the rtrefcount btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate refcount btree blocks. This prepares the way for connecting
the btree operations implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: namespace the maximum length/refcount symbols
Darrick J. Wong [Tue, 9 Jan 2024 17:40:25 +0000 (09:40 -0800)]
xfs: namespace the maximum length/refcount symbols

Actually namespace these symbols properly, so that readers can tell
that they are XFS symbols, and that they're for the refcount
functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: introduce realtime refcount btree definitions
Darrick J. Wong [Tue, 9 Jan 2024 17:44:18 +0000 (09:44 -0800)]
xfs: introduce realtime refcount btree definitions

Add new realtime refcount btree definitions. The realtime refcount btree
will be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: move xfs_refcount_update_defer_add to xfs_refcount_item.c
Darrick J. Wong [Tue, 9 Jan 2024 17:40:24 +0000 (09:40 -0800)]
xfs: move xfs_refcount_update_defer_add to xfs_refcount_item.c

Move the code that adds the incore xfs_refcount_update_item deferred
work data to a transaction to live with the CUI log item code.  This means
that the refcount code no longer has to know about the inner workings of
the CUI log items.

As a consequence, we can get rid of the _{get,put}_group helpers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: simplify usage of the rcur local variable in xfs_refcount_finish_one
Darrick J. Wong [Tue, 9 Jan 2024 17:44:16 +0000 (09:44 -0800)]
xfs: simplify usage of the rcur local variable in xfs_refcount_finish_one

Only update rcur when we know the final *pcur value.

Inspired-by: Christoph Hellwig <hch@lst.de>
[djwong: don't leave the caller with a dangling ref]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: don't bother calling xfs_refcount_finish_one_cleanup in xfs_refcount_finish_one
Darrick J. Wong [Tue, 9 Jan 2024 17:44:15 +0000 (09:44 -0800)]
xfs: don't bother calling xfs_refcount_finish_one_cleanup in xfs_refcount_finish_one

In xfs_refcount_finish_one we know the cursor is non-NULL when calling
xfs_refcount_finish_one_cleanup and we pass a zero error value.  This
means xfs_refcount_finish_one_cleanup is just calling
xfs_btree_del_cursor.

Open code that and move xfs_refcount_finish_one_cleanup to
fs/xfs/xfs_refcount_item.c.

Inspired-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: reuse xfs_refcount_update_cancel_item
Darrick J. Wong [Tue, 9 Jan 2024 17:40:24 +0000 (09:40 -0800)]
xfs: reuse xfs_refcount_update_cancel_item

Reuse xfs_refcount_update_cancel_item to put the AG/RTG and free the
item in a few places that currently open code the logic.

Inspired-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: add a ci_entry helper
Darrick J. Wong [Tue, 9 Jan 2024 17:40:24 +0000 (09:40 -0800)]
xfs: add a ci_entry helper

Add a helper to translate from the item list head to the
refcount_intent_item structure and use it to shorten assignments and
avoid the need for extra local variables.

Inspired-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
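
A helper of the kind described in the commit above is usually a thin wrapper around the container_of idiom. The sketch below is self-contained and illustrative only: the structure layout and its field names are assumptions made for the example, not the real XFS definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical, trimmed-down intent item; the field names are illustrative. */
struct refcount_intent_item {
	unsigned long long ri_startblock;
	unsigned int ri_blockcount;
	struct list_head ri_list;	/* links the item into a work list */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Translate a list head back to the intent item that embeds it. */
static inline struct refcount_intent_item *
ci_entry(const struct list_head *e)
{
	assert(e != NULL);
	return container_of(e, struct refcount_intent_item, ri_list);
}
```

With such a helper, a caller can write `ci_entry(a)->ri_startblock` directly instead of declaring a temporary pointer for each list element.
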
15 months agoxfs: clean up refcount log intent item tracepoint callsites
Darrick J. Wong [Tue, 9 Jan 2024 17:40:23 +0000 (09:40 -0800)]
xfs: clean up refcount log intent item tracepoint callsites

Pass the incore refcount intent structure to the tracepoints instead of
open-coding the argument passing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: prepare refcount btree tracepoints for widening
Darrick J. Wong [Tue, 9 Jan 2024 17:44:11 +0000 (09:44 -0800)]
xfs: prepare refcount btree tracepoints for widening

Prepare the rest of refcount btree tracepoints for use with realtime
reflink by making them take the btree cursor object as a parameter.
This will save us a lot of trouble later on.

Remove the xfs_refcount_recover_extent tracepoint since it's already
covered by other refcount tracepoints.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: create specialized classes for refcount tracepoints
Darrick J. Wong [Tue, 9 Jan 2024 17:40:23 +0000 (09:40 -0800)]
xfs: create specialized classes for refcount tracepoints

The only user of the "ag" tracepoint event classes is the refcount
btree, so rename them to make that obvious and make them take the btree
cursor to simplify the arguments.  This will save us a lot of trouble
later on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: give refcount btree cursor error tracepoints their own class
Darrick J. Wong [Tue, 9 Jan 2024 17:40:23 +0000 (09:40 -0800)]
xfs: give refcount btree cursor error tracepoints their own class

Convert all the refcount tracepoints to use the btree error tracepoint
class.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agomkfs: use file write helper to populate files
Darrick J. Wong [Tue, 9 Jan 2024 17:40:23 +0000 (09:40 -0800)]
mkfs: use file write helper to populate files

Use the file write helper to write files into the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: use libxfs_alloc_file_space to reallocate rt metadata
Darrick J. Wong [Tue, 9 Jan 2024 17:40:22 +0000 (09:40 -0800)]
xfs_repair: use libxfs_alloc_file_space to reallocate rt metadata

Now that libxfs_alloc_file_space can allocate and zero blocks, use it to
repair the realtime metadata instead of open-coding all this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agomkfs: use libxfs_alloc_file_space for rtinit
Darrick J. Wong [Tue, 9 Jan 2024 17:40:22 +0000 (09:40 -0800)]
mkfs: use libxfs_alloc_file_space for rtinit

Since xfs_bmapi_write can now zero newly allocated blocks, use it to
initialize the realtime inodes instead of open coding this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agolibxfs: resync libxfs_alloc_file_space interface with the kernel
Darrick J. Wong [Tue, 9 Jan 2024 17:40:22 +0000 (09:40 -0800)]
libxfs: resync libxfs_alloc_file_space interface with the kernel

Make the userspace xfs_alloc_file_space behave (more or less) like the
kernel version, at least as far as the interface goes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agomkfs: create the realtime rmap inode
Darrick J. Wong [Tue, 9 Jan 2024 17:40:22 +0000 (09:40 -0800)]
mkfs: create the realtime rmap inode

Create a realtime rmapbt inode if we format the fs with realtime
and rmap.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_logprint: report realtime RUIs
Darrick J. Wong [Tue, 9 Jan 2024 17:40:22 +0000 (09:40 -0800)]
xfs_logprint: report realtime RUIs

Decode the RUI format just enough to report if an RUI targets the
realtime device or not.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: allow sysadmins to add realtime reverse mapping indexes
Darrick J. Wong [Tue, 9 Jan 2024 17:40:21 +0000 (09:40 -0800)]
xfs_repair: allow sysadmins to add realtime reverse mapping indexes

Allow the sysadmin to use xfs_repair to upgrade an existing filesystem
to support the reverse mapping btree index for realtime volumes.  This
is needed for online fsck.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: reserve per-AG space while rebuilding rt metadata
Darrick J. Wong [Tue, 9 Jan 2024 17:40:21 +0000 (09:40 -0800)]
xfs_repair: reserve per-AG space while rebuilding rt metadata

Realtime metadata btrees can consume quite a bit of space on a full
filesystem.  Since the metadata are just regular files, we need to
make the per-AG reservations to avoid overfilling any of the AGs while
rebuilding metadata.  This avoids the situation where a filesystem comes
straight from repair and immediately trips over not having enough space
in an AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: rebuild the bmap btree for realtime files
Darrick J. Wong [Tue, 9 Jan 2024 17:40:21 +0000 (09:40 -0800)]
xfs_repair: rebuild the bmap btree for realtime files

Use the realtime rmap btree information to rebuild an inode's data fork
when appropriate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: check for global free space concerns with default btree slack levels
Darrick J. Wong [Tue, 9 Jan 2024 17:40:21 +0000 (09:40 -0800)]
xfs_repair: check for global free space concerns with default btree slack levels

It's possible that before repair was started, the filesystem might have
been nearly full, and its metadata btree blocks could all have been
nearly full.  If we then rebuild the btrees with blocks that are only
75% full, that expansion might be enough to run out of free space.  The
solution to this is to pack the new blocks completely full if we fear
running out of space.

Previously, we only had to check and decide that on a per-AG basis.
However, now that XFS can have filesystems with metadata btrees rooted
in inodes, we have a global free space concern because there might be
enough space in each AG to regenerate the AG btrees at 75%, but that
might not leave enough space to regenerate the inode btrees, even if we
fill those blocks to 100%.

Hence we need to precompute the worst case space usage for all btrees in
the filesystem and compare /that/ against the global free space to
decide if we're going to pack the btrees maximally to conserve space.
That decision can override the per-AG determination.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
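
The global decision described in the commit above can be sketched as follows. This is a simplified model, not the xfs_repair implementation: the function names are hypothetical, every btree level is assumed to hold the same number of entries per block, and the 75% figure is the default fill level mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Estimate the blocks needed to store nr_records at a given fill level. */
static uint64_t btree_blocks(uint64_t nr_records, uint32_t maxrecs,
			     uint32_t fill_pct)
{
	uint64_t per_block = (uint64_t)maxrecs * fill_pct / 100;
	uint64_t total = 0;
	uint64_t n = nr_records;

	assert(fill_pct > 0 && fill_pct <= 100);
	if (per_block < 2)
		per_block = 2;	/* a block must fan out to make progress */

	do {
		n = (n + per_block - 1) / per_block;	/* blocks at this level */
		if (n == 0)
			n = 1;	/* an empty tree still has a root block */
		total += n;
	} while (n > 1);

	return total;
}

/*
 * Sum the default-slack (75% full) space needs of every btree and compare
 * the total against the global free space.  Returns true if repair should
 * pack the new btree blocks completely full instead.
 */
static bool should_pack_full(const uint64_t *nr_records, size_t nr_btrees,
			     uint32_t maxrecs, uint64_t free_blocks)
{
	uint64_t need = 0;

	for (size_t i = 0; i < nr_btrees; i++)
		need += btree_blocks(nr_records[i], maxrecs, 75);

	return need > free_blocks;
}
```

In this model, two btrees of 1000 records each with 100 records per block need 30 blocks at 75% fill but only 22 when packed full, so a filesystem with 25 free blocks would take the packed path.
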
15 months agoxfs_repair: rebuild the realtime rmap btree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:20 +0000 (09:40 -0800)]
xfs_repair: rebuild the realtime rmap btree

Rebuild the realtime rmap btree file from the reverse mapping records we
gathered from walking the inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: always check realtime file mappings against incore info
Darrick J. Wong [Tue, 9 Jan 2024 17:40:20 +0000 (09:40 -0800)]
xfs_repair: always check realtime file mappings against incore info

Curiously, the xfs_repair code that processes data fork mappings of
realtime files doesn't actually compare the mappings against the incore
state map during the !check_dups phase (aka phase 3).  As a result, we
lose the opportunity to clear damaged realtime data forks before we get
to crosslinked file checking in phase 4, which results in ondisk
metadata errors calling do_error, which aborts repair.

Split the process_rt_rec_state code into two functions: one to check the
mapping, and another to update the incore state.  The first one can be
called to help us decide if we're going to zap the fork, and the second
one updates the incore state if we decide to keep the fork.  We already
do this for regular data files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: check existing realtime rmapbt entries against observed rmaps
Darrick J. Wong [Tue, 9 Jan 2024 17:40:20 +0000 (09:40 -0800)]
xfs_repair: check existing realtime rmapbt entries against observed rmaps

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime rmap btree
(particularly if we're in -n mode) to detect rtrmapbt problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: find and mark the rtrmapbt inodes
Darrick J. Wong [Tue, 9 Jan 2024 17:40:20 +0000 (09:40 -0800)]
xfs_repair: find and mark the rtrmapbt inodes

Make sure that we find the realtime rmapbt inodes and mark them
appropriately, just in case we find a rogue inode claiming to be an
rtrmap, or garbage in the metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: refactor realtime inode check
Darrick J. Wong [Tue, 9 Jan 2024 17:40:20 +0000 (09:40 -0800)]
xfs_repair: refactor realtime inode check

Refactor the realtime bitmap and summary checks into a helper function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: collect realtime reverse-mapping data for refcount/rmap tree rebuilding
Darrick J. Wong [Tue, 9 Jan 2024 17:40:19 +0000 (09:40 -0800)]
xfs_repair: collect realtime reverse-mapping data for refcount/rmap tree rebuilding

Collect reverse-mapping data for realtime files so that we can later
check and rebuild the reference count tree and the reverse mapping
tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: create a new set of incore rmap information for rt groups
Darrick J. Wong [Tue, 9 Jan 2024 17:40:19 +0000 (09:40 -0800)]
xfs_repair: create a new set of incore rmap information for rt groups

Create a parallel set of "xfs_ag_rmap" structures to cache information
about reverse mappings for the realtime groups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: use realtime rmap btree data to check block types
Darrick J. Wong [Tue, 9 Jan 2024 17:40:19 +0000 (09:40 -0800)]
xfs_repair: use realtime rmap btree data to check block types

Use the realtime rmap btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: flag suspect long-format btree blocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:19 +0000 (09:40 -0800)]
xfs_repair: flag suspect long-format btree blocks

Pass a "suspect" counter through scan_lbtree just like we do for
short-format btree blocks, and increment its value when we encounter
blocks with bad CRCs or outright corruption.  This makes it so that
repair actually catches bmbt blocks with bad crcs or other verifier
errors.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agolibxfs: dirty buffers should be marked uptodate too
Darrick J. Wong [Tue, 9 Jan 2024 17:40:19 +0000 (09:40 -0800)]
libxfs: dirty buffers should be marked uptodate too

I started fuzz-testing the realtime rmap feature with a very large
number of realtime allocation groups.  There were so many rt groups that
repair had to rebuild /realtime in the metadata directory tree, and that
directory was big enough to spur the creation of a block format
directory.

Unfortunately, repair then walks both directory trees to look for
unconnected files.  This part of phase 6 emits CRC errors on the newly
created buffers for the /realtime directory, declares the directory to
be garbage, and moves all the rt rmap inodes to /lost+found, resulting
in a corrupt fs.

Poking around in gdb, I noticed that the buffer contents were indeed
zero, and that UPTODATE was not set.  This was very strange, until I
added a watch on bp->b_flags to watch for accesses.  It turns out that
xfs_repair's prefetch code will _get a buffer and zero the contents if
UPTODATE is not set.

The directory tree code in libxfs will also _get a buffer, initialize
it, and log it to the coordinating transaction, which in this case is one
of the transactions used to reconnect the rmap btree inodes to /realtime.
At no point does any of that code ever set UPTODATE on the buffer, which
is why prefetch zaps the contents.

Hence change both buffer dirtying functions to set UPTODATE, since a
dirty buffer is by definition at least as recent as whatever's on disk.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
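
The interaction described in the commit above can be reproduced with a toy buffer model. The flag names, the structure, and the function names below are hypothetical stand-ins chosen for the example; they are not the libxfs buffer cache API.

```c
#include <assert.h>
#include <string.h>

#define BUF_UPTODATE	(1u << 0)	/* contents are at least as new as disk */
#define BUF_DIRTY	(1u << 1)	/* contents must be written back */

/* Toy buffer: a flags word plus a small payload. */
struct buf {
	unsigned int flags;
	unsigned char data[16];
};

/*
 * Dirtying a buffer also marks it uptodate, since a dirty buffer is by
 * definition at least as recent as whatever is on disk.
 */
static void buf_mark_dirty(struct buf *bp)
{
	assert(bp != NULL);
	bp->flags |= BUF_DIRTY | BUF_UPTODATE;
}

/* Model of the prefetch path: zero the contents if they are not uptodate. */
static void prefetch_get(struct buf *bp)
{
	assert(bp != NULL);
	if (!(bp->flags & BUF_UPTODATE))
		memset(bp->data, 0, sizeof(bp->data));
}
```

If `buf_mark_dirty` set only the dirty flag, a later `prefetch_get` would wipe freshly initialized contents, which is the failure mode the message describes.
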
15 months agoxfs_spaceman: report health status of the realtime rmap btree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:18 +0000 (09:40 -0800)]
xfs_spaceman: report health status of the realtime rmap btree

Add reporting of the rt rmap btree health to spaceman.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_scrub: retest metadata across scrub groups after a repair
Darrick J. Wong [Tue, 9 Jan 2024 17:40:18 +0000 (09:40 -0800)]
xfs_scrub: retest metadata across scrub groups after a repair

Certain types of metadata have dependencies that cross scrub groups.
For example, after repairing the part of the realtime bitmap corresponding to
a realtime group, we potentially need to rebuild the realtime summary to
reflect the new bitmap contents.  The rtsummary is a separate scrub group
(metafiles) from the rgbitmap (rtgroup), which means that the rtsummary
repairs must be tracked by a separate scrub_item.

Create the necessary dependency table and code to make these kinds of
cross-group validations possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
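
One way to picture the dependency table mentioned in the commit above is a per-type bitmask of other scrub types that need retesting after a repair. The enum values, table, and function below are hypothetical and cover only the rgbitmap/rtsummary example from the message; the real xfs_scrub table may be organized differently.

```c
#include <assert.h>

/* Hypothetical scrub types, limited to the example in the message. */
enum scrub_type {
	ST_RGBITMAP,	/* per-rtgroup bitmap, in the rtgroup scrub group */
	ST_RTSUMMARY,	/* realtime summary, in the metafiles scrub group */
	ST_NR
};

/* For each repaired type, the set of types that must be retested. */
static const unsigned int repair_deps[ST_NR] = {
	[ST_RGBITMAP] = 1u << ST_RTSUMMARY,
};

/* Given a mask of repaired types, return the mask of types to retest. */
static unsigned int retest_mask(unsigned int repaired_mask)
{
	unsigned int out = 0;

	assert(repaired_mask < (1u << ST_NR));
	for (int t = 0; t < ST_NR; t++)
		if (repaired_mask & (1u << t))
			out |= repair_deps[t];
	return out;
}
```

Because the result is a mask of types rather than a single scrub item, the caller can schedule the follow-up checks in whichever scrub group owns each type.
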
15 months agoxfs_scrub: check rtrmapbt metadata directory connections
Darrick J. Wong [Tue, 9 Jan 2024 17:40:18 +0000 (09:40 -0800)]
xfs_scrub: check rtrmapbt metadata directory connections

Run the rt rmap btree metapath scrubber during phase 5 to ensure that
it's still connected to the metadir tree after we've pruned any bad
links.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agolibfrog: enable scrubbing of the realtime rmap
Darrick J. Wong [Tue, 9 Jan 2024 17:40:18 +0000 (09:40 -0800)]
libfrog: enable scrubbing of the realtime rmap

Add a new entry so that we can scrub the rtrmapbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_io: support scrubbing rtgroup metadata paths
Darrick J. Wong [Tue, 9 Jan 2024 17:40:18 +0000 (09:40 -0800)]
xfs_io: support scrubbing rtgroup metadata paths

Support scrubbing the metadata directory path of an rtgroup metadata
file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: make fsmap query the realtime reverse mapping tree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:17 +0000 (09:40 -0800)]
xfs_db: make fsmap query the realtime reverse mapping tree

Extend the 'fsmap' debugger command to support querying the realtime
rmap btree via a new -r argument.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: copy the realtime rmap btree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:17 +0000 (09:40 -0800)]
xfs_db: copy the realtime rmap btree

Copy the realtime rmapbt when we're metadumping the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: support rudimentary checks of the rtrmap btree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:17 +0000 (09:40 -0800)]
xfs_db: support rudimentary checks of the rtrmap btree

Perform some fairly superficial checks of the rtrmap btree.  We'll
do more sophisticated checks in xfs_repair, but provide enough of
a spot-check here that we can do simple things.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: support the realtime rmapbt
Darrick J. Wong [Tue, 9 Jan 2024 17:40:17 +0000 (09:40 -0800)]
xfs_db: support the realtime rmapbt

Wire up various parts of xfs_db for realtime rmap support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: display the realtime rmap btree contents
Darrick J. Wong [Tue, 9 Jan 2024 17:40:16 +0000 (09:40 -0800)]
xfs_db: display the realtime rmap btree contents

Implement all the code we need to dump rtrmapbt contents, starting
from the root inode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: hook live realtime rmap operations during a repair operation
Darrick J. Wong [Tue, 9 Jan 2024 17:44:08 +0000 (09:44 -0800)]
xfs: hook live realtime rmap operations during a repair operation

Hook the regular realtime rmap code when an rtrmapbt repair operation is
running so that we can unlock the AGF buffer to scan the filesystem and
keep the in-memory btree up to date during the scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: create a shadow rmap btree during realtime rmap repair
Darrick J. Wong [Tue, 9 Jan 2024 17:44:07 +0000 (09:44 -0800)]
xfs: create a shadow rmap btree during realtime rmap repair

Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: online repair of the realtime rmap btree
Darrick J. Wong [Tue, 9 Jan 2024 17:44:07 +0000 (09:44 -0800)]
xfs: online repair of the realtime rmap btree

Repair the realtime rmap btree while mounted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: scrub the metadir path of rt rmap btree files
Darrick J. Wong [Tue, 9 Jan 2024 17:40:16 +0000 (09:40 -0800)]
xfs: scrub the metadir path of rt rmap btree files

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the rmap btree file for each rt group.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: scrub the realtime rmapbt
Darrick J. Wong [Tue, 9 Jan 2024 17:40:15 +0000 (09:40 -0800)]
xfs: scrub the realtime rmapbt

Check the realtime reverse mapping btree against the rtbitmap, and
modify the rtbitmap scrub to check against the rtrmapbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: allow queued realtime intents to drain before scrubbing
Darrick J. Wong [Tue, 9 Jan 2024 17:43:58 +0000 (09:43 -0800)]
xfs: allow queued realtime intents to drain before scrubbing

When a writer thread executes a chain of log intent items for the
realtime volume, the ILOCKs taken during each step are for each rt
metadata file, not the entire rt volume itself.  Although scrub takes
all rt metadata ILOCKs, this isn't sufficient to guard against scrub
checking the rt volume while that writer thread is in the middle of
finishing a chain because there's no higher level locking primitive
guarding the realtime volume.

When there's a collision, cross-referencing between data structures
(e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if
repair is running, this results in incorrect repairs, which is
catastrophic.

Fix this by adding to the mount structure the same drain that we use to
protect scrub against concurrent AG updates, but this time for the
realtime volume.

[Contains a few cleanups from hch]

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
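
The drain idea in the commit above amounts to counting intent chains that are still in flight and letting scrub proceed only when that count is zero. The sketch below models just the counter with C11 atomics. All names are hypothetical, and a real implementation would put the scrubber to sleep on a wait queue rather than expose a polling check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical drain: counts intent chains still in flight for the rt volume. */
struct rt_drain {
	atomic_uint pending;
};

static void rt_drain_init(struct rt_drain *d)
{
	atomic_init(&d->pending, 0);
}

/* A writer calls this when it queues an intent against the rt volume. */
static void rt_intent_hold(struct rt_drain *d)
{
	atomic_fetch_add(&d->pending, 1);
}

/* A writer calls this when the intent has been finished. */
static void rt_intent_rele(struct rt_drain *d)
{
	unsigned int old = atomic_fetch_sub(&d->pending, 1);

	assert(old > 0);	/* a release must match an earlier hold */
}

/* Scrub may cross-reference rt metadata only when nothing is pending. */
static bool rt_drain_idle(struct rt_drain *d)
{
	return atomic_load(&d->pending) == 0;
}
```

The counter supplies the volume-wide signal that the per-file ILOCKs cannot: it stays nonzero across every step of a chain, even between steps when no ILOCK is held.
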
15 months agoxfs: report realtime rmap btree corruption errors to the health system
Darrick J. Wong [Tue, 9 Jan 2024 17:43:56 +0000 (09:43 -0800)]
xfs: report realtime rmap btree corruption errors to the health system

Whenever we encounter corrupt realtime rmap btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: create routine to allocate and initialize a realtime rmap btree inode
Darrick J. Wong [Tue, 9 Jan 2024 17:40:15 +0000 (09:40 -0800)]
xfs: create routine to allocate and initialize a realtime rmap btree inode

Create a library routine to allocate and initialize an empty realtime
rmapbt inode.  We'll use this for mkfs and repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: wire up rmap map and unmap to the realtime rmapbt
Darrick J. Wong [Tue, 9 Jan 2024 17:40:15 +0000 (09:40 -0800)]
xfs: wire up rmap map and unmap to the realtime rmapbt

Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks.  This enables us to
perform rmap operations against the correct btree.

[Contains a minor bugfix from hch]

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: use realtime EFI to free extents when realtime rmap is enabled
Darrick J. Wong [Tue, 9 Jan 2024 17:40:14 +0000 (09:40 -0800)]
xfs: use realtime EFI to free extents when realtime rmap is enabled

When rmap is enabled, XFS expects a certain order of operations, which
is: 1) remove the file mapping, 2) remove the reverse mapping, and then
3) free the blocks.  xfs_bmap_del_extent_real tries to do 1 and 3 in the
same transaction, which means that when rtrmap is enabled, we have to
use realtime EFIs to maintain the expected order.
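The ordering problem can be illustrated with a toy model that records
the steps taken when one extent is deleted.  Everything here is an
assumption made for illustration (the enum, the trace structure and
delete_extent are hypothetical and are not XFS code); the point is only
that queueing the free as an intent moves it behind the rmap update.

```c
#include <stddef.h>

enum step { STEP_UNMAP, STEP_RMAP_REMOVE, STEP_FREE };

struct trace {
	enum step	steps[3];
	size_t		n;
};

static void record(struct trace *t, enum step s)
{
	if (t->n < 3)
		t->steps[t->n++] = s;
}

/*
 * Toy model of deleting one extent.  If defer_free is nonzero, the free
 * is queued behind the rmap update instead of being done in the first
 * transaction.
 */
static void delete_extent(struct trace *t, int defer_free)
{
	/* First transaction: remove the file mapping. */
	record(t, STEP_UNMAP);
	if (!defer_free)
		record(t, STEP_FREE);	/* freed before the rmap is removed */

	/* Deferred work, finished in queue order. */
	record(t, STEP_RMAP_REMOVE);
	if (defer_free)
		record(t, STEP_FREE);
}
```

With defer_free set, the recorded order is unmap, rmap removal, free,
which is the 1-2-3 sequence described above.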

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: allow inodes with zero extents but nonzero nblocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:14 +0000 (09:40 -0800)]
xfs: allow inodes with zero extents but nonzero nblocks

Metadata inodes that store btrees will have zero extents and a nonzero
nblocks.  Adjust the inode verifier so that this combination is not
flagged.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: wire up a new inode fork type for the realtime rmap
Darrick J. Wong [Tue, 9 Jan 2024 17:40:14 +0000 (09:40 -0800)]
xfs: wire up a new inode fork type for the realtime rmap

Plumb in the pieces we need to embed the root of the realtime rmap
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: add metadata reservations for realtime rmap btrees
Darrick J. Wong [Tue, 9 Jan 2024 17:40:14 +0000 (09:40 -0800)]
xfs: add metadata reservations for realtime rmap btrees

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime rmap btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: add realtime reverse map inode to metadata directory
Darrick J. Wong [Tue, 9 Jan 2024 17:43:48 +0000 (09:43 -0800)]
xfs: add realtime reverse map inode to metadata directory

Add a metadir path to select the realtime rmap btree inode and load
it at mount time.  The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: add a lockdep class key per rtgroup
Darrick J. Wong [Fri, 19 Jan 2024 00:41:48 +0000 (16:41 -0800)]
xfs: add a lockdep class key per rtgroup

Add a dynamic lockdep class key to each rtgroup.  This will enable
lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking
order.  Each class can have 8 subclasses, and for now we will only have
2 inodes per group.  This enables rtgroup order and inode order checks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: add a realtime flag to the rmap update log redo items
Darrick J. Wong [Tue, 9 Jan 2024 17:40:13 +0000 (09:40 -0800)]
xfs: add a realtime flag to the rmap update log redo items

Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt.  We'll
wire up the actual rmap code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: prepare rmap functions to deal with rtrmapbt
Darrick J. Wong [Tue, 9 Jan 2024 17:43:45 +0000 (09:43 -0800)]
xfs: prepare rmap functions to deal with rtrmapbt

Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions.  Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: add realtime rmap btree operations
Darrick J. Wong [Tue, 9 Jan 2024 17:43:44 +0000 (09:43 -0800)]
xfs: add realtime rmap btree operations

Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are neither
constrained to the free space nor any particular AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: realtime rmap btree transaction reservations
Darrick J. Wong [Tue, 9 Jan 2024 17:43:43 +0000 (09:43 -0800)]
xfs: realtime rmap btree transaction reservations

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: define the on-disk realtime rmap btree format
Darrick J. Wong [Tue, 9 Jan 2024 17:43:42 +0000 (09:43 -0800)]
xfs: define the on-disk realtime rmap btree format

Start filling out the rtrmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for connecting the
btree operations implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: introduce realtime rmap btree definitions
Darrick J. Wong [Tue, 9 Jan 2024 17:43:42 +0000 (09:43 -0800)]
xfs: introduce realtime rmap btree definitions

Add new realtime rmap btree definitions. The realtime rmap btree will
be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Darrick J. Wong [Tue, 9 Jan 2024 17:40:12 +0000 (09:40 -0800)]
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.
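A minimal sketch of the difference, under stated assumptions: the
message does not show the old signature, so the (agno, agbno) form below
is a guess at what callers previously passed, and geom, intent,
intent_from_ag and intent_from_fsbno are hypothetical stand-ins rather
than the real XFS types and functions.

```c
#include <stdint.h>

typedef uint64_t fsblock_t;	/* stand-in for xfs_fsblock_t */

/* Hypothetical geometry: each AG spans 2^agblklog blocks of address space. */
struct geom {
	unsigned int	agblklog;
};

/* Hypothetical intent record; the start block is stored as an fsbno. */
struct intent {
	fsblock_t	startblock;
	uint32_t	len;
};

/* Assumed old style: the caller splits the address, the callee recombines it. */
static struct intent intent_from_ag(const struct geom *g, uint32_t agno,
				    uint32_t agbno, uint32_t len)
{
	struct intent ri = {
		.startblock = ((fsblock_t)agno << g->agblklog) | agbno,
		.len = len,
	};

	return ri;
}

/* New style: the caller hands over the fsbno and it is stored as-is. */
static struct intent intent_from_fsbno(fsblock_t fsbno, uint32_t len)
{
	struct intent ri = { .startblock = fsbno, .len = len };

	return ri;
}
```

Both helpers produce the same record; the second simply skips a split
and recombine round trip.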

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: move xfs_rmap_update_defer_add to xfs_rmap_item.c
Darrick J. Wong [Tue, 9 Jan 2024 17:40:12 +0000 (09:40 -0800)]
xfs: move xfs_rmap_update_defer_add to xfs_rmap_item.c

Move the code that adds the incore xfs_rmap_update_item deferred work
data to a transaction so that it lives with the RUI log item code.  This means that
the rmap code no longer has to know about the inner workings of the RUI
log items.

As a consequence, we can get rid of the _{get,put}_group helpers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: simplify usage of the rcur local variable in xfs_rmap_finish_one
Christoph Hellwig [Tue, 9 Jan 2024 17:40:12 +0000 (09:40 -0800)]
xfs: simplify usage of the rcur local variable in xfs_rmap_finish_one

Only update rcur when we know the final *pcur value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[djwong: don't leave the caller with a dangling ref]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: don't bother calling xfs_rmap_finish_one_cleanup in xfs_rmap_finish_one
Christoph Hellwig [Tue, 9 Jan 2024 17:40:11 +0000 (09:40 -0800)]
xfs: don't bother calling xfs_rmap_finish_one_cleanup in xfs_rmap_finish_one

In xfs_rmap_finish_one we know the cursor is non-zero when calling
xfs_rmap_finish_one_cleanup and we pass a 0 error variable.  This means
xfs_rmap_finish_one_cleanup is just doing an xfs_btree_del_cursor.

Open code that and move xfs_rmap_finish_one_cleanup to
fs/xfs/xfs_rmap_item.c.
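The equivalence can be seen in a toy model.  The body of the real helper
is not shown here, so the shape below is an assumption (a NULL check,
cursor deletion, and extra teardown only on error), and cursor,
cursor_del, finish_one_cleanup and buffers_released are hypothetical
names.

```c
#include <stddef.h>

/* Hypothetical cursor that only remembers whether it was deleted. */
struct cursor {
	int	deleted;
};

/* Counts how often the error-only teardown path ran. */
static int buffers_released;

static void cursor_del(struct cursor *cur, int error)
{
	(void)error;
	cur->deleted = 1;
}

/* Assumed shape of the cleanup helper. */
static void finish_one_cleanup(struct cursor *cur, int error)
{
	if (cur == NULL)
		return;
	cursor_del(cur, error);
	if (error)
		buffers_released++;	/* only reached on failure */
}
```

With a non-NULL cursor and error == 0, finish_one_cleanup(cur, 0) has
the same effect as cursor_del(cur, 0), which is why the call can be
open coded.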

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor porting changes]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: reuse xfs_rmap_update_cancel_item
Darrick J. Wong [Tue, 9 Jan 2024 17:40:11 +0000 (09:40 -0800)]
xfs: reuse xfs_rmap_update_cancel_item

Reuse xfs_rmap_update_cancel_item to put the AG/RTG and free the item in
a few places that currently open code the logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: add a ri_entry helper
Darrick J. Wong [Tue, 9 Jan 2024 17:40:11 +0000 (09:40 -0800)]
xfs: add a ri_entry helper

Add a helper to translate from the item list head to the
rmap_intent_item structure and use it to shorten assignments and avoid
the need for extra local variables.
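Such a helper is typically a thin wrapper around the container_of
pattern.  The following is a self-contained sketch, not the actual
patch: the structure layout and the member name "list" are assumptions,
and list_head and container_of are redefined locally so that the
example compiles outside the kernel tree.

```c
#include <stddef.h>

/* Local stand-in for the kernel's doubly linked list node. */
struct list_head {
	struct list_head	*next, *prev;
};

/* Hypothetical intent structure with an embedded list node. */
struct rmap_intent {
	int			type;
	struct list_head	list;	/* links the item into a work list */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Translate a list node back to the intent that embeds it. */
static inline struct rmap_intent *ri_entry(const struct list_head *e)
{
	return container_of(e, struct rmap_intent, list);
}
```

A caller can then write ri_entry(item)->type directly instead of first
assigning the converted pointer to a local variable.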

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: clean up rmap log intent item tracepoint callsites
Darrick J. Wong [Tue, 9 Jan 2024 17:43:35 +0000 (09:43 -0800)]
xfs: clean up rmap log intent item tracepoint callsites

Pass the incore rmap structure to the tracepoints instead of open-coding
the argument passing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: prepare rmap btree tracepoints for widening
Darrick J. Wong [Tue, 9 Jan 2024 17:40:11 +0000 (09:40 -0800)]
xfs: prepare rmap btree tracepoints for widening

Prepare the rmap btree tracepoints for use with realtime rmap btrees by
making them take the btree cursor object as a parameter.  This will save
us a lot of trouble later on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: give rmap btree cursor error tracepoints their own class
Darrick J. Wong [Tue, 9 Jan 2024 17:40:10 +0000 (09:40 -0800)]
xfs: give rmap btree cursor error tracepoints their own class

Create a new tracepoint class for btree-related errors, then convert all
the rmap tracepoints to use it.  Also fix the one tracepoint that was
abusing the old class by making it a separate tracepoint.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: attach rtgroup objects to btree cursors
Darrick J. Wong [Tue, 9 Jan 2024 17:43:32 +0000 (09:43 -0800)]
xfs: attach rtgroup objects to btree cursors

Make it so that we can attach realtime group objects to btree cursors.
This will be crucial for enabling rmap btrees in realtime groups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs_logprint: report realtime EFIs
Darrick J. Wong [Tue, 9 Jan 2024 17:40:10 +0000 (09:40 -0800)]
xfs_logprint: report realtime EFIs

Decode the EFI format just enough to report if an EFI targets the
realtime device or not.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: support error injection when freeing rt extents
Darrick J. Wong [Tue, 9 Jan 2024 17:40:10 +0000 (09:40 -0800)]
xfs: support error injection when freeing rt extents

A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent.  Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: support logging EFIs for realtime extents
Darrick J. Wong [Tue, 9 Jan 2024 17:43:31 +0000 (09:43 -0800)]
xfs: support logging EFIs for realtime extents

Teach the EFI mechanism how to free realtime extents.  We're going to
need this to enforce proper ordering of operations when we enable
realtime rmap.

Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer
ops for rt extents.  This keeps the ondisk artifacts and processing code
completely separate between the rt and non-rt cases.  Hopefully this
will make it easier to debug filesystem problems.

Previous versions of this patch accomplished this by setting the high
bit in each rt EFI extent.  This was found to be less transparent by
reviewers.
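The two designs can be contrasted in a short sketch.  All values and
names here are hypothetical (the enum values are not the real log item
type codes, and the message does not say which extent field carried the
high bit, so the start block is assumed).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical item type codes, not the real XFS constants. */
enum item_type {
	ITEM_EFI	= 1,
	ITEM_EFI_RT	= 2,
};

/* Earlier approach (assumed field): rt-ness hidden in the top bit. */
#define EXTENT_RT_FLAG	(UINT64_C(1) << 63)

static bool extent_is_rt_by_flag(uint64_t start)
{
	return (start & EXTENT_RT_FLAG) != 0;
}

/* Current approach: the log item type alone says which device is meant. */
static bool item_is_rt(enum item_type type)
{
	return type == ITEM_EFI_RT;
}
```

With a separate type, anything reading the log can tell rt and non-rt
items apart from the item header without decoding every extent.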

[Contains a bug fix and cleanups from hch]

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: move xfs_extent_free_defer_add to xfs_extfree_item.c
Darrick J. Wong [Tue, 9 Jan 2024 17:40:09 +0000 (09:40 -0800)]
xfs: move xfs_extent_free_defer_add to xfs_extfree_item.c

Move the code that adds the incore xfs_extent_free_item deferred work
data to a transaction so that it lives with the EFI log item code.  This means that
the allocator code no longer has to know about the inner workings of the
EFI log items.

As a consequence, we can get rid of the _{get,put}_group helpers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago xfs: remove xfs_defer_agfl_block
Christoph Hellwig [Tue, 9 Jan 2024 17:40:09 +0000 (09:40 -0800)]
xfs: remove xfs_defer_agfl_block

xfs_free_extent_later can handle the extra AGFL special casing with
very little extra logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>