www.infradead.org Git - users/hch/xfsprogs.git/log
10 months ago mkfs: enable reflink on the realtime device [xfs-realtime-refcount-rebase]
Darrick J. Wong [Wed, 3 Jul 2024 21:22:36 +0000 (14:22 -0700)]
mkfs: enable reflink on the realtime device

Allow the creation of filesystems with both reflink and realtime volumes
enabled.  For now we don't support a realtime extent size > 1.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago mkfs: validate CoW extent size hint when rtinherit is set
Darrick J. Wong [Wed, 3 Jul 2024 21:22:36 +0000 (14:22 -0700)]
mkfs: validate CoW extent size hint when rtinherit is set

Extent size hints exist to nudge the behavior of the file data block
allocator towards trying to make aligned allocations.  Therefore, it
doesn't make sense to allow a hint that isn't a multiple of the
fundamental allocation unit for a given file.

This means that if the sysadmin is formatting with rtinherit set on the
root dir, validate_cowextsize_hint needs to check the hint value on a
simulated realtime file to make sure that it's correct.  This hasn't
been necessary in the past since one cannot have a CoW hint without a
reflink filesystem, and we previously didn't allow rt reflink
filesystems.
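
The alignment rule above can be sketched as follows. This is a minimal
illustration, not mkfs's actual code: the function name
`cowextsize_hint_ok` and its parameters are hypothetical, and both sizes
are assumed to be expressed in filesystem blocks.

```python
def cowextsize_hint_ok(hint_blocks: int, rextsize_blocks: int,
                       rtinherit: bool) -> bool:
    """Return True if a CoW extent size hint is acceptable.

    Hypothetical helper.  A hint of 0 means "no hint" and always passes.
    With rtinherit set, new files are assumed to land on the realtime
    device, so the allocation unit is the rt extent size instead of a
    single filesystem block.
    """
    if hint_blocks == 0:
        return True
    alloc_unit = rextsize_blocks if rtinherit else 1
    return hint_blocks % alloc_unit == 0
```

Under these assumptions, a hint of 6 blocks fails against a 4-block rt
extent size when rtinherit is set, and passes when it is not.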

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_logprint: report realtime CUIs
Darrick J. Wong [Wed, 3 Jul 2024 21:22:36 +0000 (14:22 -0700)]
xfs_logprint: report realtime CUIs

Decode the CUI format just enough to report whether a CUI targets the
realtime device or not.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: allow sysadmins to add realtime reflink
Darrick J. Wong [Wed, 3 Jul 2024 21:22:35 +0000 (14:22 -0700)]
xfs_repair: allow sysadmins to add realtime reflink

Allow the sysadmin to use xfs_repair to upgrade an existing filesystem
to support the realtime reference count btree, and therefore reflink on
realtime volumes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: validate CoW extent size hint on rtinherit directories
Darrick J. Wong [Wed, 3 Jul 2024 21:22:35 +0000 (14:22 -0700)]
xfs_repair: validate CoW extent size hint on rtinherit directories

XFS allows a sysadmin to change the rt extent size when adding a rt
section to a filesystem after formatting.  If there are any directories
with both a cowextsize hint and rtinherit set, the hint could become
misaligned with the new rextsize.  Offer to fix the problem if we're in
modify mode and the verifier didn't trip.  If we're in dry run mode,
we let the kernel fix it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: allow realtime files to have the reflink flag set
Darrick J. Wong [Wed, 3 Jul 2024 21:22:35 +0000 (14:22 -0700)]
xfs_repair: allow realtime files to have the reflink flag set

Now that we allow reflink on the realtime volume, allow that combination
of inode flags if the feature's enabled.  Note that we now allow inodes
to have rtinherit even if there's no realtime volume, since the kernel
has never restricted that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: rebuild the realtime refcount btree
Darrick J. Wong [Wed, 3 Jul 2024 21:22:35 +0000 (14:22 -0700)]
xfs_repair: rebuild the realtime refcount btree

Use the collected reference count information to rebuild the btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: reject unwritten shared extents
Darrick J. Wong [Wed, 3 Jul 2024 21:22:34 +0000 (14:22 -0700)]
xfs_repair: reject unwritten shared extents

We don't allow sharing of unwritten extents, which means that repair
should reject an unwritten extent if someone else has already claimed
the space.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: check existing realtime refcountbt entries against observed refcounts
Darrick J. Wong [Wed, 3 Jul 2024 21:22:34 +0000 (14:22 -0700)]
xfs_repair: check existing realtime refcountbt entries against observed refcounts

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime refcount
btree (particularly if we're in -n mode) to detect rtrefcountbt
problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: compute refcount data for the realtime groups
Darrick J. Wong [Wed, 3 Jul 2024 21:22:34 +0000 (14:22 -0700)]
xfs_repair: compute refcount data for the realtime groups

At the end of phase 4, compute reference count information for realtime
groups from the realtime rmap information collected, just like we do for
AGs in the data section.
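
Deriving reference counts from reverse mappings can be sketched with a
sweep over mapping start and end points.  This is an illustration of the
general idea only, not xfs_repair's implementation; the function name and
the plain (start, length) tuple format are assumptions.

```python
from typing import List, Tuple


def refcounts_from_rmaps(
    rmaps: List[Tuple[int, int]],
) -> List[Tuple[int, int, int]]:
    """Turn (start, length) reverse mappings into (start, length, refcount)
    records for every block range that has two or more owners."""
    events = []
    for start, length in rmaps:
        events.append((start, 1))            # a mapping begins here
        events.append((start + length, -1))  # a mapping ends here
    events.sort()

    records: List[Tuple[int, int, int]] = []
    owners = 0
    prev = None
    for pos, delta in events:
        if prev is not None and pos > prev and owners >= 2:
            last = records[-1] if records else None
            if last and last[0] + last[1] == prev and last[2] == owners:
                # Extend the previous record: adjacent, same refcount.
                records[-1] = (last[0], last[1] + pos - prev, owners)
            else:
                records.append((prev, pos - prev, owners))
        owners += delta
        prev = pos
    return records
```

Ranges with a single owner produce no record, which matches the idea
that only shared space needs a reference count entry.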

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: find and mark the rtrefcountbt inode
Darrick J. Wong [Wed, 3 Jul 2024 21:22:34 +0000 (14:22 -0700)]
xfs_repair: find and mark the rtrefcountbt inode

Make sure that we find the realtime refcountbt inode and mark it
appropriately, just in case we find a rogue inode claiming to
be an rtrefcount, or just plain garbage in the superblock field.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: use realtime refcount btree data to check block types
Darrick J. Wong [Wed, 3 Jul 2024 21:22:34 +0000 (14:22 -0700)]
xfs_repair: use realtime refcount btree data to check block types

Use the realtime refcount btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: allow CoW staging extents in the realtime rmap records
Darrick J. Wong [Wed, 3 Jul 2024 21:22:33 +0000 (14:22 -0700)]
xfs_repair: allow CoW staging extents in the realtime rmap records

Don't flag the rt rmap btree as having errors if there are CoW staging
extent records in it and the filesystem supports realtime reflink.  As
for leftover staging extents, we'll report them when we scan the rt
refcount btree, in a future patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_spaceman: report health of the realtime refcount btree
Darrick J. Wong [Wed, 3 Jul 2024 21:22:33 +0000 (14:22 -0700)]
xfs_spaceman: report health of the realtime refcount btree

Report the health of the realtime reference count btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_db: copy the realtime refcount btree
Darrick J. Wong [Wed, 3 Jul 2024 21:22:33 +0000 (14:22 -0700)]
xfs_db: copy the realtime refcount btree

Copy the realtime refcountbt when we're metadumping the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_db: support the realtime refcountbt
Darrick J. Wong [Wed, 3 Jul 2024 21:22:32 +0000 (14:22 -0700)]
xfs_db: support the realtime refcountbt

Wire up various parts of xfs_db for realtime refcount support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_db: display the realtime refcount btree contents
Darrick J. Wong [Wed, 3 Jul 2024 21:22:32 +0000 (14:22 -0700)]
xfs_db: display the realtime refcount btree contents

Implement all the code we need to dump rtrefcountbt contents, starting
from the root inode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago libfrog: enable scrubbing of the realtime refcount data
Darrick J. Wong [Wed, 3 Jul 2024 21:22:32 +0000 (14:22 -0700)]
libfrog: enable scrubbing of the realtime refcount data

Add a new entry so that we can scrub the rtrefcountbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago fixup
Darrick J. Wong [Mon, 12 Aug 2024 08:21:19 +0000 (10:21 +0200)]
fixup

10 months ago xfs: scrub the metadir path of rt refcount btree files
Darrick J. Wong [Mon, 12 Aug 2024 08:17:57 +0000 (10:17 +0200)]
xfs: scrub the metadir path of rt refcount btree files

Source kernel commit: 08745bdf226a413246fc4edb2947985804dbcb86

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the refcount btree file for each rt group.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: scrub the realtime refcount btree
Darrick J. Wong [Sun, 11 Aug 2024 06:58:22 +0000 (08:58 +0200)]
xfs: scrub the realtime refcount btree

Source kernel commit: 844d7f8755a67b01391da92b99a5342c8b2b83f4

Add code to scrub realtime refcount btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: report realtime refcount btree corruption errors to the health system
Darrick J. Wong [Sun, 11 Aug 2024 06:57:22 +0000 (08:57 +0200)]
xfs: report realtime refcount btree corruption errors to the health system

Source kernel commit: 8c9b0300f04b2a9993f6712a820b93988bcee1b6

Whenever we encounter corrupt realtime refcount btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: enable extent size hints for CoW operations
Darrick J. Wong [Wed, 3 Jul 2024 21:22:31 +0000 (14:22 -0700)]
xfs: enable extent size hints for CoW operations

Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: apply rt extent alignment constraints to CoW extsize hint
Darrick J. Wong [Wed, 3 Jul 2024 21:22:31 +0000 (14:22 -0700)]
xfs: apply rt extent alignment constraints to CoW extsize hint

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.
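
One way to picture the two validator outcomes is a small decision
function.  This is a sketch under assumptions: the names below are
hypothetical, and treating a misaligned hint on a regular realtime file
as a validation failure is an interpretation of "perform rextsize
alignment checks", not something this log spells out.

```python
from enum import Enum


class HintAction(Enum):
    KEEP = "keep"      # hint is absent or aligned
    REJECT = "reject"  # assumed outcome for a regular realtime file
    CLEAR = "clear"    # directory: drop the misaligned inherited hint


def check_rt_hint(hint_blocks: int, rextsize_blocks: int,
                  is_directory: bool) -> HintAction:
    """Classify an extent size hint against the rt extent size (in blocks)."""
    if hint_blocks == 0 or hint_blocks % rextsize_blocks == 0:
        return HintAction.KEEP
    return HintAction.CLEAR if is_directory else HintAction.REJECT
```

The same function would apply to both the regular and the CoW hint,
since the commit message says both follow the same constraints.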

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
Darrick J. Wong [Wed, 3 Jul 2024 21:22:31 +0000 (14:22 -0700)]
xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files

Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files.  This apparently was done to disable
delayed allocation for realtime files.

However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files.  In this case, the
logic will incorrectly send the write through the delalloc write path.

Fix this by adjusting the logic slightly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: recover CoW leftovers in the realtime volume
Darrick J. Wong [Wed, 29 May 2024 04:13:18 +0000 (21:13 -0700)]
xfs: recover CoW leftovers in the realtime volume

Scan the realtime refcount tree at mount time to get rid of leftover
CoW staging extents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: allow inodes to have the realtime and reflink flags
Darrick J. Wong [Wed, 3 Jul 2024 21:22:30 +0000 (14:22 -0700)]
xfs: allow inodes to have the realtime and reflink flags

Now that we can share blocks between realtime files, allow this
combination.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: compute rtrmap btree max levels when reflink enabled
Darrick J. Wong [Wed, 3 Jul 2024 21:22:30 +0000 (14:22 -0700)]
xfs: compute rtrmap btree max levels when reflink enabled

Compute the maximum possible height of the realtime rmap btree when
reflink is enabled.
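
A worst-case btree height can be estimated by assuming every block holds
only its minimum number of entries.  The sketch below shows that generic
calculation; the function and parameter names are illustrative, and how
the record count is bounded once reflink is enabled is not stated in
this log, so it is simply an input here.

```python
def btree_max_levels(min_leaf_recs: int, min_node_ptrs: int,
                     nr_records: int) -> int:
    """Worst-case btree height for nr_records, assuming each leaf holds
    min_leaf_recs records and each node holds min_node_ptrs pointers."""
    if min_leaf_recs < 1 or min_node_ptrs < 2:
        raise ValueError("need >= 1 record per leaf and >= 2 ptrs per node")
    blocks = -(-nr_records // min_leaf_recs)  # ceiling division
    levels = 1
    while blocks > 1:
        blocks = -(-blocks // min_node_ptrs)
        levels += 1
    return levels
```

For instance, 1000 records with a minimum of 10 entries per block at
every level need 100 leaves, 10 nodes above them, and one root.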

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: update rmap to allow cow staging extents in the rt rmap
Darrick J. Wong [Wed, 3 Jul 2024 21:22:30 +0000 (14:22 -0700)]
xfs: update rmap to allow cow staging extents in the rt rmap

Don't error out on CoW staging extent records when realtime reflink is
enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: create routine to allocate and initialize a realtime refcount btree inode
Darrick J. Wong [Sun, 11 Aug 2024 06:55:30 +0000 (08:55 +0200)]
xfs: create routine to allocate and initialize a realtime refcount btree inode

Source kernel commit: 0066145ac851fd746ed22e523c3b60062e94c250

Create a library routine to allocate and initialize an empty realtime
refcountbt inode.  We'll use this for growfs, mkfs, and repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: wire up realtime refcount btree cursors
Darrick J. Wong [Sun, 11 Aug 2024 06:52:51 +0000 (08:52 +0200)]
xfs: wire up realtime refcount btree cursors

Source kernel commit: fb0ac941a3e35fe16375f89d8d817e2790aeab35

Wire up realtime refcount btree cursors wherever they're needed
throughout the code base.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: wire up a new inode fork type for the realtime refcount
Darrick J. Wong [Wed, 3 Jul 2024 21:22:29 +0000 (14:22 -0700)]
xfs: wire up a new inode fork type for the realtime refcount

Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: add metadata reservations for realtime refcount btree
Darrick J. Wong [Wed, 29 May 2024 04:13:12 +0000 (21:13 -0700)]
xfs: add metadata reservations for realtime refcount btree

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime refcount btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: add realtime refcount btree inode to metadata directory
Darrick J. Wong [Sun, 11 Aug 2024 06:42:26 +0000 (08:42 +0200)]
xfs: add realtime refcount btree inode to metadata directory

Source kernel commit: 2a935930d4ffea7f2fab2a0305727e858ebf6487

Add a metadir path to select the realtime refcount btree inode and load
it at mount time.  The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: add a realtime flag to the refcount update log redo items
Darrick J. Wong [Wed, 3 Jul 2024 21:22:28 +0000 (14:22 -0700)]
xfs: add a realtime flag to the refcount update log redo items

Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.
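
A flag-based dispatch of this kind can be sketched as below.  Everything
here is hypothetical: the flag value, the `RefcountIntent` structure and
the `target_device` helper are illustrations, not the on-disk CUI format
or any libxfs interface.

```python
from dataclasses import dataclass

# Hypothetical flag bit; the real on-disk value is not given in this log.
CUI_FLAG_REALTIME = 1 << 0


@dataclass
class RefcountIntent:
    startblock: int
    blockcount: int
    flags: int = 0


def target_device(intent: RefcountIntent) -> str:
    """Report which refcount btree an intent would apply to."""
    return "realtime" if intent.flags & CUI_FLAG_REALTIME else "data"
```

A log-decoding tool would only need this one bit test to say whether an
intent targets the realtime device.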

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: prepare refcount functions to deal with rtrefcountbt
Darrick J. Wong [Wed, 29 May 2024 04:13:09 +0000 (21:13 -0700)]
xfs: prepare refcount functions to deal with rtrefcountbt

Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions.  Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.

Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert it all at once.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: add realtime refcount btree operations
Darrick J. Wong [Wed, 29 May 2024 04:13:09 +0000 (21:13 -0700)]
xfs: add realtime refcount btree operations

Implement the generic btree operations needed to manipulate rtrefcount
btree blocks. This is different from the regular refcountbt in that we
allocate space from the filesystem at large, and are neither constrained
to the free space nor any particular AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: realtime refcount btree transaction reservations
Darrick J. Wong [Wed, 3 Jul 2024 21:22:28 +0000 (14:22 -0700)]
xfs: realtime refcount btree transaction reservations

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: define the on-disk realtime refcount btree format
Darrick J. Wong [Wed, 29 May 2024 04:13:07 +0000 (21:13 -0700)]
xfs: define the on-disk realtime refcount btree format

Start filling out the rtrefcount btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate refcount btree blocks. This prepares the way for connecting
the btree operations implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: namespace the maximum length/refcount symbols
Darrick J. Wong [Wed, 3 Jul 2024 21:22:27 +0000 (14:22 -0700)]
xfs: namespace the maximum length/refcount symbols

Actually namespace these variables properly, so that readers can tell
that this is an XFS symbol, and that it's for the refcount
functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: introduce realtime refcount btree definitions
Darrick J. Wong [Wed, 3 Jul 2024 21:22:27 +0000 (14:22 -0700)]
xfs: introduce realtime refcount btree definitions

Add new realtime refcount btree definitions. The realtime refcount btree
will be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago mkfs: use file write helper to populate files
Darrick J. Wong [Wed, 3 Jul 2024 21:22:27 +0000 (14:22 -0700)]
mkfs: use file write helper to populate files

Use the file write helper to write files into the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago libxfs: resync libxfs_alloc_file_space interface with the kernel
Darrick J. Wong [Wed, 3 Jul 2024 21:22:26 +0000 (14:22 -0700)]
libxfs: resync libxfs_alloc_file_space interface with the kernel

Make the userspace xfs_alloc_file_space behave (more or less) like the
kernel version, at least as far as the interface goes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago mkfs: create the realtime rmap inode
Darrick J. Wong [Fri, 9 Aug 2024 12:34:30 +0000 (14:34 +0200)]
mkfs: create the realtime rmap inode

Create a realtime rmapbt inode if we format the fs with realtime
and rmap.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_logprint: report realtime RUIs
Darrick J. Wong [Wed, 3 Jul 2024 21:22:25 +0000 (14:22 -0700)]
xfs_logprint: report realtime RUIs

Decode the RUI format just enough to report if an RUI targets the
realtime device or not.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: allow sysadmins to add realtime reverse mapping indexes
Darrick J. Wong [Wed, 3 Jul 2024 21:22:25 +0000 (14:22 -0700)]
xfs_repair: allow sysadmins to add realtime reverse mapping indexes

Allow the sysadmin to use xfs_repair to upgrade an existing filesystem
to support the reverse mapping btree index for realtime volumes.  This
is needed for online fsck.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: reserve per-AG space while rebuilding rt metadata
Darrick J. Wong [Fri, 9 Aug 2024 12:34:04 +0000 (14:34 +0200)]
xfs_repair: reserve per-AG space while rebuilding rt metadata

Realtime metadata btrees can consume quite a bit of space on a full
filesystem.  Since the metadata are just regular files, we need to
make the per-AG reservations to avoid overfilling any of the AGs while
rebuilding metadata.  This avoids the situation where a filesystem comes
straight from repair and immediately trips over not having enough space
in an AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: rebuild the bmap btree for realtime files
Darrick J. Wong [Wed, 3 Jul 2024 21:22:25 +0000 (14:22 -0700)]
xfs_repair: rebuild the bmap btree for realtime files

Use the realtime rmap btree information to rebuild an inode's data fork
when appropriate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: check for global free space concerns with default btree slack levels
Darrick J. Wong [Fri, 9 Aug 2024 12:32:54 +0000 (14:32 +0200)]
xfs_repair: check for global free space concerns with default btree slack levels

It's possible that before repair was started, the filesystem might have
been nearly full, and its metadata btree blocks could all have been
nearly full.  If we then rebuild the btrees with blocks that are only
75% full, that expansion might be enough to run out of free space.  The
solution to this is to pack the new blocks completely full if we fear
running out of space.

Previously, we only had to check and decide that on a per-AG basis.
However, now that XFS can have filesystems with metadata btrees rooted
in inodes, we have a global free space concern because there might be
enough space in each AG to regenerate the AG btrees at 75%, but that
might not leave enough space to regenerate the inode btrees, even if we
fill those blocks to 100%.

Hence we need to precompute the worst case space usage for all btrees in
the filesystem and compare /that/ against the global free space to
decide if we're going to pack the btrees maximally to conserve space.
That decision can override the per-AG determination.
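
The global check can be sketched as a comparison between the space all
btrees would need at the default fill level and the free space
available.  This is a simplified illustration, not xfs_repair's code: it
counts leaf blocks only, and the function names and the (records,
records-per-block) input format are assumptions.

```python
import math
from typing import List, Tuple


def blocks_needed(nr_records: int, recs_per_block: int, fill: float) -> int:
    """Leaf blocks needed when each block is filled to the given fraction."""
    per_block = max(1, int(recs_per_block * fill))
    return math.ceil(nr_records / per_block)


def should_pack_full(btrees: List[Tuple[int, int]], free_blocks: int,
                     default_fill: float = 0.75) -> bool:
    """Decide whether to pack btree blocks completely full.

    btrees is a list of (nr_records, recs_per_block) covering every btree
    in the filesystem, both per-AG and inode-rooted.
    """
    worst_case = sum(blocks_needed(n, r, default_fill) for n, r in btrees)
    return worst_case > free_blocks
```

With two btrees of 1000 and 500 records at 100 records per block, the
75% layout needs 14 + 7 = 21 leaf blocks, so 20 free blocks would tip
the decision toward packing the blocks full.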

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: rebuild the realtime rmap btree
Darrick J. Wong [Fri, 9 Aug 2024 12:23:29 +0000 (14:23 +0200)]
xfs_repair: rebuild the realtime rmap btree

Rebuild the realtime rmap btree file from the reverse mapping records we
gathered from walking the inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: always check realtime file mappings against incore info
Darrick J. Wong [Wed, 3 Jul 2024 21:22:24 +0000 (14:22 -0700)]
xfs_repair: always check realtime file mappings against incore info

Curiously, the xfs_repair code that processes data fork mappings of
realtime files doesn't actually compare the mappings against the incore
state map during the !check_dups phase (aka phase 3).  As a result, we
lose the opportunity to clear damaged realtime data forks before we get
to crosslinked file checking in phase 4, which results in ondisk
metadata errors calling do_error, which aborts repair.

Split the process_rt_rec_state code into two functions: one to check the
mapping, and another to update the incore state.  The first one can be
called to help us decide if we're going to zap the fork, and the second
one updates the incore state if we decide to keep the fork.  We already
do this for regular data files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: check existing realtime rmapbt entries against observed rmaps
Darrick J. Wong [Wed, 3 Jul 2024 21:22:24 +0000 (14:22 -0700)]
xfs_repair: check existing realtime rmapbt entries against observed rmaps

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime rmap btree
(particularly if we're in -n mode) to detect rtrmapbt problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: find and mark the rtrmapbt inodes
Darrick J. Wong [Fri, 9 Aug 2024 12:20:14 +0000 (14:20 +0200)]
xfs_repair: find and mark the rtrmapbt inodes

Make sure that we find the realtime rmapbt inodes and mark them
appropriately, just in case we find a rogue inode claiming to be an
rtrmap, or garbage in the metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: refactor realtime inode check
Darrick J. Wong [Fri, 9 Aug 2024 12:18:17 +0000 (14:18 +0200)]
xfs_repair: refactor realtime inode check

Refactor the realtime bitmap and summary checks into a helper function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: collect realtime reverse-mapping data for refcount/rmap tree rebuilding
Darrick J. Wong [Wed, 3 Jul 2024 21:22:23 +0000 (14:22 -0700)]
xfs_repair: collect realtime reverse-mapping data for refcount/rmap tree rebuilding

Collect reverse-mapping data for realtime files so that we can later
check and rebuild the reference count tree and the reverse mapping
tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: create a new set of incore rmap information for rt groups
Darrick J. Wong [Wed, 3 Jul 2024 21:22:23 +0000 (14:22 -0700)]
xfs_repair: create a new set of incore rmap information for rt groups

Create a parallel set of "xfs_ag_rmap" structures to cache information
about reverse mappings for the realtime groups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: use realtime rmap btree data to check block types
Darrick J. Wong [Wed, 3 Jul 2024 21:22:23 +0000 (14:22 -0700)]
xfs_repair: use realtime rmap btree data to check block types

Use the realtime rmap btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_repair: flag suspect long-format btree blocks
Darrick J. Wong [Wed, 3 Jul 2024 21:22:22 +0000 (14:22 -0700)]
xfs_repair: flag suspect long-format btree blocks

Pass a "suspect" counter through scan_lbtree just like we do for
short-format btree blocks, and increment its value when we encounter
blocks with bad CRCs or outright corruption.  This makes it so that
repair actually catches bmbt blocks with bad crcs or other verifier
errors.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_spaceman: report health status of the realtime rmap btree
Darrick J. Wong [Fri, 9 Aug 2024 12:17:15 +0000 (14:17 +0200)]
xfs_spaceman: report health status of the realtime rmap btree

Add reporting of the rt rmap btree health to spaceman.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_scrub: retest metadata across scrub groups after a repair
Darrick J. Wong [Wed, 3 Jul 2024 21:22:22 +0000 (14:22 -0700)]
xfs_scrub: retest metadata across scrub groups after a repair

Certain types of metadata have dependencies that cross scrub groups.
For example, after repairing the part of the realtime bitmap that
corresponds to a realtime group, we potentially need to rebuild the
realtime summary to reflect the new bitmap contents.  The rtsummary is
in a separate scrub group (metafiles) from the rgbitmap (rtgroup), which
means that the rtsummary repairs must be tracked by a separate
scrub_item.

Create the necessary dependency table and code to make these kinds of
cross-group validations possible.
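
A dependency table of this kind can be sketched as a mapping from a
repaired metadata type to the types that need retesting.  The dictionary
and function names are hypothetical; only the rgbitmap/rtsummary pairing
and their group names come from the commit message.

```python
from typing import Dict, Iterable, Set

# Repaired type -> types that must be retested afterwards.
RETEST_AFTER_REPAIR = {
    "rgbitmap": ["rtsummary"],
}

# Scrub group that tracks each metadata type.
SCRUB_GROUP = {
    "rgbitmap": "rtgroup",
    "rtsummary": "metafiles",
}


def retests_by_group(repaired: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the follow-up checks, keyed by the scrub group that owns
    them, so each group's own tracking item can schedule its retests."""
    pending: Dict[str, Set[str]] = {}
    for scrub_type in repaired:
        for dep in RETEST_AFTER_REPAIR.get(scrub_type, []):
            pending.setdefault(SCRUB_GROUP[dep], set()).add(dep)
    return pending
```

Keying the result by group reflects the point made above: the dependent
check belongs to a different group than the repair that triggered it.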

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago libfrog: enable scrubbing of the realtime rmap
Darrick J. Wong [Mon, 12 Aug 2024 08:20:54 +0000 (10:20 +0200)]
libfrog: enable scrubbing of the realtime rmap

Add a new entry so that we can scrub the rtrmapbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_db: make fsmap query the realtime reverse mapping tree
Darrick J. Wong [Wed, 3 Jul 2024 21:22:21 +0000 (14:22 -0700)]
xfs_db: make fsmap query the realtime reverse mapping tree

Extend the 'fsmap' debugger command to support querying the realtime
rmap btree via a new -r argument.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_db: copy the realtime rmap btree
Darrick J. Wong [Wed, 3 Jul 2024 21:22:21 +0000 (14:22 -0700)]
xfs_db: copy the realtime rmap btree

Copy the realtime rmapbt when we're metadumping the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_db: support the realtime rmapbt
Darrick J. Wong [Wed, 3 Jul 2024 21:22:20 +0000 (14:22 -0700)]
xfs_db: support the realtime rmapbt

Wire up various parts of xfs_db for realtime rmap support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs_db: display the realtime rmap btree contents
Darrick J. Wong [Mon, 12 Aug 2024 08:44:45 +0000 (10:44 +0200)]
xfs_db: display the realtime rmap btree contents

Implement all the code we need to dump rtrmapbt contents, starting
from the root inode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: hook live realtime rmap operations during a repair operation
Darrick J. Wong [Fri, 9 Aug 2024 12:11:25 +0000 (14:11 +0200)]
xfs: hook live realtime rmap operations during a repair operation

Source kernel commit: 95ca3a8b151f34e4084aeade83ef25893a41f37e

Hook the regular realtime rmap code when an rtrmapbt repair operation is
running so that we can unlock the AGF buffer to scan the filesystem and
keep the in-memory btree up to date during the scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: create a shadow rmap btree during realtime rmap repair
Darrick J. Wong [Wed, 29 May 2024 04:12:58 +0000 (21:12 -0700)]
xfs: create a shadow rmap btree during realtime rmap repair

Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: online repair of the realtime rmap btree
Darrick J. Wong [Sat, 10 Aug 2024 19:42:27 +0000 (21:42 +0200)]
xfs: online repair of the realtime rmap btree

Source kernel commit: f813af307d62d4c4d620a358bbd406f89ffdeca2

Repair the realtime rmap btree while mounted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago fixup
Christoph Hellwig [Mon, 12 Aug 2024 08:13:26 +0000 (10:13 +0200)]
fixup

10 months ago xfs: scrub the metadir path of rt rmap btree files
Darrick J. Wong [Mon, 12 Aug 2024 08:13:17 +0000 (10:13 +0200)]
xfs: scrub the metadir path of rt rmap btree files

Source kernel commit: 5b353bd96b1c83ae9ad6ac665d6a2db2c085857b

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the rmap btree file for each rt group.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago fixup
Christoph Hellwig [Fri, 9 Aug 2024 12:09:36 +0000 (14:09 +0200)]
fixup

10 months ago xfs: scrub the realtime rmapbt
Darrick J. Wong [Fri, 9 Aug 2024 12:08:26 +0000 (14:08 +0200)]
xfs: scrub the realtime rmapbt

Source kernel commit: 15b31f2d8b71d1e775e9f1fa3cf4d740fa4e917f

Check the realtime reverse mapping btree against the rtbitmap, and
modify the rtbitmap scrub to check against the rtrmapbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago fixup
Christoph Hellwig [Fri, 9 Aug 2024 12:06:48 +0000 (14:06 +0200)]
fixup

10 months ago xfs: allow queued realtime intents to drain before scrubbing
Darrick J. Wong [Sat, 10 Aug 2024 19:41:55 +0000 (21:41 +0200)]
xfs: allow queued realtime intents to drain before scrubbing

Source kernel commit: 77b81645574605ea0c0199ec32fc4a9cdc87bc53

When a writer thread executes a chain of log intent items for the
realtime volume, the ILOCKs taken during each step are for each rt
metadata file, not the entire rt volume itself.  Although scrub takes
all rt metadata ILOCKs, this isn't sufficient to guard against scrub
checking the rt volume while that writer thread is in the middle of
finishing a chain because there's no higher level locking primitive
guarding the realtime volume.

When there's a collision, cross-referencing between data structures
(e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if
repair is running, this results in incorrect repairs, which is
catastrophic.

Fix this by adding to the mount structure the same drain that we use to
protect scrub against concurrent AG updates, but this time for the
realtime volume.

[Contains a few cleanups from hch]

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
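
The drain described in the commit above can be pictured as a counter of unfinished intents that scrub consults before cross-referencing. The following is a minimal userspace sketch, not the kernel implementation: the names `ex_drain`, `ex_drain_intent_queued`, `ex_drain_intent_done`, and `ex_scrub_may_xref` are hypothetical, and a real drain would let scrub sleep until the count reaches zero instead of polling.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-volume intent drain (assumption). */
struct ex_drain {
	atomic_int pending;	/* intents queued but not yet finished */
};

static void ex_drain_init(struct ex_drain *d)
{
	atomic_init(&d->pending, 0);
}

/* A writer calls this when it queues one more intent in a chain. */
static void ex_drain_intent_queued(struct ex_drain *d)
{
	atomic_fetch_add(&d->pending, 1);
}

/* A writer calls this when one queued intent has been finished. */
static void ex_drain_intent_done(struct ex_drain *d)
{
	atomic_fetch_sub(&d->pending, 1);
}

/* Scrub may cross-reference only when no intent chain is in flight. */
static bool ex_scrub_may_xref(struct ex_drain *d)
{
	return atomic_load(&d->pending) == 0;
}
```

In this sketch a scrubber that sees `ex_scrub_may_xref()` return false would back off and retry, which is the behaviour that avoids the false corruption reports mentioned above.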
10 months ago fixup
Christoph Hellwig [Fri, 9 Aug 2024 12:02:54 +0000 (14:02 +0200)]
fixup

10 months ago xfs: report realtime rmap btree corruption errors to the health system
Darrick J. Wong [Fri, 9 Aug 2024 12:02:30 +0000 (14:02 +0200)]
xfs: report realtime rmap btree corruption errors to the health system

Source kernel commit: 803aff2fabcc0cd3f3a41313c1dda6b61a63491e

Whenever we encounter corrupt realtime rmap btree blocks, we should
record that with the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: create routine to allocate and initialize a realtime rmap btree inode
Darrick J. Wong [Fri, 9 Aug 2024 12:00:20 +0000 (14:00 +0200)]
xfs: create routine to allocate and initialize a realtime rmap btree inode

Source kernel commit: 30e82ca461da5c3ee75c78678d943bfc39ad2544

Create a library routine to allocate and initialize an empty realtime
rmapbt inode.  We'll use this for mkfs and repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: wire up rmap map and unmap to the realtime rmapbt
Darrick J. Wong [Fri, 9 Aug 2024 11:59:21 +0000 (13:59 +0200)]
xfs: wire up rmap map and unmap to the realtime rmapbt

Source kernel commit: 3fe80153ddb793f21d2d5fd5063acf594573f943

Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks.  This enables us to
perform rmap operations against the correct btree.

[Contains a minor bugfix from hch]

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: allow inodes with zero extents but nonzero nblocks
Darrick J. Wong [Wed, 29 May 2024 04:11:54 +0000 (21:11 -0700)]
xfs: allow inodes with zero extents but nonzero nblocks

Metadata inodes that store btrees will have zero extents and a nonzero
nblocks.  Adjust the inode verifier so that this combination is not
flagged.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
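
The verifier adjustment described above might look roughly like the check below. This is only a sketch under stated assumptions: the structure `ex_inode_counts`, the `holds_btree_root` field, and the function `ex_counts_plausible` are hypothetical, and the actual verifier works on the on-disk inode with more conditions than are shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the inode fields that the check looks at. */
struct ex_inode_counts {
	bool		holds_btree_root;	/* metadata inode storing a btree */
	uint32_t	nextents;		/* mapped extent count */
	uint64_t	nblocks;		/* blocks owned by the inode */
};

/* Return true if the extent/block combination should not be flagged. */
static bool ex_counts_plausible(const struct ex_inode_counts *ic)
{
	/* Every extent covers at least one block. */
	if (ic->nextents > ic->nblocks)
		return false;

	/*
	 * Zero extents with nonzero nblocks is expected only when the
	 * blocks belong to a btree rooted in the inode.
	 */
	if (ic->nextents == 0 && ic->nblocks != 0)
		return ic->holds_btree_root;

	return true;
}
```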
10 months ago xfs: wire up a new inode fork type for the realtime rmap
Darrick J. Wong [Wed, 3 Jul 2024 21:22:17 +0000 (14:22 -0700)]
xfs: wire up a new inode fork type for the realtime rmap

Plumb in the pieces we need to embed the root of the realtime rmap
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: add metadata reservations for realtime rmap btrees
Darrick J. Wong [Sat, 10 Aug 2024 19:41:15 +0000 (21:41 +0200)]
xfs: add metadata reservations for realtime rmap btrees

Source kernel commit: 96e6d68712d5d66f3980fc5e6f889c4428bf8c38

Reserve some free blocks in the data volume so that we will always have
enough space to handle expansion of the realtime rmap btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: add realtime reverse map inode to metadata directory
Darrick J. Wong [Mon, 12 Aug 2024 08:41:56 +0000 (10:41 +0200)]
xfs: add realtime reverse map inode to metadata directory

Source kernel commit: 7dbfdba02ee58377503d271a9dac7f9b761b6d7f

Add a metadir path to select the realtime rmap btree inode and load
it at mount time.  The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: add a realtime flag to the rmap update log redo items
Darrick J. Wong [Wed, 3 Jul 2024 21:22:17 +0000 (14:22 -0700)]
xfs: add a realtime flag to the rmap update log redo items

Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt.  We'll
wire up the actual rmap code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
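
As an illustration of the idea above, a flag bit in a logged rmap update can steer log recovery toward one btree or the other. The flag names, the `ex_rui_extent` layout, and `ex_rui_target` below are all hypothetical and are not taken from the on-disk log format; the sketch only shows how a single bit would select the target.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real log format values are not shown here. */
#define EX_RUI_FLAG_UNWRITTEN	(1U << 0)
#define EX_RUI_FLAG_REALTIME	(1U << 1)

/* Hypothetical, simplified logged rmap update. */
struct ex_rui_extent {
	uint64_t	owner;
	uint64_t	startblock;
	uint32_t	len;
	uint32_t	flags;
};

enum ex_rmap_target {
	EX_TARGET_DATA_RMAPBT,	/* per-AG rmapbt on the data device */
	EX_TARGET_RT_RMAPBT,	/* realtime rmapbt */
};

/* Decide which btree a logged update applies to. */
static enum ex_rmap_target ex_rui_target(const struct ex_rui_extent *e)
{
	if (e->flags & EX_RUI_FLAG_REALTIME)
		return EX_TARGET_RT_RMAPBT;
	return EX_TARGET_DATA_RMAPBT;
}
```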
10 months ago xfs: prepare rmap functions to deal with rtrmapbt
Darrick J. Wong [Wed, 3 Jul 2024 21:22:16 +0000 (14:22 -0700)]
xfs: prepare rmap functions to deal with rtrmapbt

Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions.  Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: add realtime rmap btree operations
Darrick J. Wong [Wed, 29 May 2024 04:11:49 +0000 (21:11 -0700)]
xfs: add realtime rmap btree operations

Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This differs from the regular rmapbt in that we allocate
space from the filesystem at large and are constrained neither to the
free space nor to any particular AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: realtime rmap btree transaction reservations
Darrick J. Wong [Fri, 9 Aug 2024 11:56:30 +0000 (13:56 +0200)]
xfs: realtime rmap btree transaction reservations

Source kernel commit: 2b08b631d6ad701ba6dda366fde4ae19cb66774a

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
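
The two-split argument above can be turned into a rough worst-case count. The sketch below assumes, purely for illustration, that a full split dirties one block per btree level; the helper names and this per-level cost are assumptions, and the actual reservation macros account for more than this.

```c
/* Assumption: a full split touches one block per level of the btree. */
static unsigned int ex_split_blocks(unsigned int maxlevels)
{
	return maxlevels;
}

/*
 * Blocks to reserve for mapping one realtime extent: a split in the
 * rtrmapbt to add the record, plus a split in the regular rmapbt to
 * record the blocks that the rtrmapbt split allocated.
 */
static unsigned int ex_rtmap_res_blocks(unsigned int rtrmap_maxlevels,
					unsigned int rmap_maxlevels)
{
	return ex_split_blocks(rtrmap_maxlevels) +
	       ex_split_blocks(rmap_maxlevels);
}

/* The same reservation expressed in bytes of logged block space. */
static unsigned long long ex_rtmap_res_bytes(unsigned int rtrmap_maxlevels,
					     unsigned int rmap_maxlevels,
					     unsigned int blocksize)
{
	return (unsigned long long)ex_rtmap_res_blocks(rtrmap_maxlevels,
			rmap_maxlevels) * blocksize;
}
```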
10 months ago fixup
Christoph Hellwig [Mon, 12 Aug 2024 08:11:48 +0000 (10:11 +0200)]
fixup

10 months ago xfs: define the on-disk realtime rmap btree format
Darrick J. Wong [Mon, 12 Aug 2024 08:11:15 +0000 (10:11 +0200)]
xfs: define the on-disk realtime rmap btree format

Source kernel commit: 9c9cd9a8ca9179b851d5eb73428f16a23f1937d5

Start filling out the rtrmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for connecting the
btree operations implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: introduce realtime rmap btree definitions
Darrick J. Wong [Wed, 3 Jul 2024 21:22:15 +0000 (14:22 -0700)]
xfs: introduce realtime rmap btree definitions

Add new realtime rmap btree definitions. The realtime rmap btree will
be rooted in a hidden inode, but it has its own shape and therefore
needs separate types of its own for most things.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Darrick J. Wong [Wed, 3 Jul 2024 21:22:15 +0000 (14:22 -0700)]
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago fixup
Christoph Hellwig [Sat, 10 Aug 2024 19:39:10 +0000 (21:39 +0200)]
fixup

10 months ago xfs: allow inode-based btrees to reserve space in the data device
Darrick J. Wong [Sat, 10 Aug 2024 19:39:03 +0000 (21:39 +0200)]
xfs: allow inode-based btrees to reserve space in the data device

Source kernel commit: 0f422758c646e3160fb1d1285303c91de56a2d18

Create a new space reservation scheme so that btree metadata for the
realtime volume can reserve space in the data device to avoid space
underruns.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
10 months ago xfs: attach rtgroup objects to btree cursors
Darrick J. Wong [Wed, 3 Jul 2024 21:22:15 +0000 (14:22 -0700)]
xfs: attach rtgroup objects to btree cursors

Make it so that we can attach realtime group objects to btree cursors.
This will be crucial for enabling rmap btrees in realtime groups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: update btree keys correctly when _insrec splits an inode root block
Darrick J. Wong [Wed, 3 Jul 2024 21:22:15 +0000 (14:22 -0700)]
xfs: update btree keys correctly when _insrec splits an inode root block

In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.

Fixes: 2c813ad66a72 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
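
To see why a record landing in the first slot of the new block matters, it helps to look at how the keys of a block in an overlapping-interval btree follow from its records. The sketch below is a conceptual illustration only, not xfs_btree_update_keys itself: `ex_rec`, `ex_keys`, and `ex_block_keys` are hypothetical, records are reduced to a start and a nonzero length, and the block is assumed to be sorted by start.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified record: an interval [start, start + len - 1]. */
struct ex_rec {
	uint64_t	start;
	uint64_t	len;	/* assumed to be at least 1 */
};

/* Low and high keys that the parent level stores for a block. */
struct ex_keys {
	uint64_t	low;
	uint64_t	high;
};

/* Recompute a block's keys from its records, sorted by start. */
static struct ex_keys ex_block_keys(const struct ex_rec *recs, size_t nr)
{
	struct ex_keys k = { 0, 0 };
	size_t i;

	if (nr == 0)
		return k;

	/* The low key always comes from the first slot. */
	k.low = recs[0].start;

	/* The high key is the largest end offset of any record. */
	for (i = 0; i < nr; i++) {
		uint64_t end = recs[i].start + recs[i].len - 1;

		if (end > k.high)
			k.high = end;
	}
	return k;
}
```

With records starting at 10, 12, and 30, the low key is 10. Inserting a record that starts at 5 into the first slot changes the low key to 5, so a parent that still holds the old key would be stale, which is the situation the fix above addresses.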
10 months ago xfs: support storing records in the inode core root
Darrick J. Wong [Wed, 29 May 2024 04:11:32 +0000 (21:11 -0700)]
xfs: support storing records in the inode core root

Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree.  This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: hoist the node iroot update code out of xfs_btree_kill_iroot
Darrick J. Wong [Wed, 3 Jul 2024 21:22:14 +0000 (14:22 -0700)]
xfs: hoist the node iroot update code out of xfs_btree_kill_iroot

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function.  Remove some unnecessary conditionals and clean
up a few function calls in the new function.  Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: hoist the node iroot update code out of xfs_btree_new_iroot
Darrick J. Wong [Wed, 3 Jul 2024 21:22:14 +0000 (14:22 -0700)]
xfs: hoist the node iroot update code out of xfs_btree_new_iroot

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function.  Note that the new function explicitly computes
the keys of the new child block and stores them in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: support leaves in the incore btree root block in xfs_iroot_realloc
Darrick J. Wong [Wed, 3 Jul 2024 21:22:14 +0000 (14:22 -0700)]
xfs: support leaves in the incore btree root block in xfs_iroot_realloc

Add some logic to xfs_iroot_realloc so that we can handle leaf records
in the btree root block correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
10 months ago xfs: generalize the btree root reallocation function
Darrick J. Wong [Wed, 3 Jul 2024 21:22:13 +0000 (14:22 -0700)]
xfs: generalize the btree root reallocation function

In preparation for storing realtime rmap btree roots in an inode fork,
make xfs_iroot_realloc take an ops structure that takes care of all the
btree-specific geometry pieces.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
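
The shape of such an ops structure can be sketched as follows. Everything here is hypothetical: `ex_iroot_ops`, the two size callbacks, the byte counts they use, and `ex_iroot_realloc` are illustrative stand-ins, and the actual xfs_iroot_realloc signature and geometry are not reproduced.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-btree-type geometry callbacks. */
struct ex_iroot_ops {
	/* Bytes needed for a root block holding nrecs entries at level. */
	size_t (*size)(unsigned int nrecs, unsigned int level);
};

/* Illustrative geometry: 8-byte header, key + pointer pairs only. */
static size_t ex_bmap_like_size(unsigned int nrecs, unsigned int level)
{
	(void)level;	/* this root never holds leaf records */
	return 8 + (size_t)nrecs * (8 + 8);
}

/* Illustrative geometry: leaf records at level 0, else keys + pointer. */
static size_t ex_rtrmap_like_size(unsigned int nrecs, unsigned int level)
{
	size_t entry = (level == 0) ? 24 : (2 * 20 + 8);

	return 8 + (size_t)nrecs * entry;
}

/* Generic resize: the geometry comes from ops, not from the caller. */
static void *ex_iroot_realloc(const struct ex_iroot_ops *ops, void *root,
			      unsigned int nrecs, unsigned int level)
{
	return realloc(root, ops->size(nrecs, level));
}
```

A caller would pick the ops table that matches the btree type and leave the resize logic itself untouched, which is the point of moving the geometry behind function pointers.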
10 months ago xfs: standardize the btree maxrecs function parameters
Darrick J. Wong [Wed, 3 Jul 2024 21:22:13 +0000 (14:22 -0700)]
xfs: standardize the btree maxrecs function parameters

Standardize the parameters in xfs_{alloc,bm,ino,rmap,refcount}bt_maxrecs
so that we have consistent calling conventions.  This doesn't affect the
kernel that much, but enables us to clean up userspace a bit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
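
A maxrecs calculation with one consistent calling convention could be sketched as below. The `ex_bt_geom` structure and `ex_bt_maxrecs` are hypothetical, as are the sizes used in the usage note; the real xfs_*bt_maxrecs helpers take their geometry from the mount and from per-btree definitions.

```c
#include <stdbool.h>

/* Hypothetical per-btree geometry, in bytes. */
struct ex_bt_geom {
	unsigned int	hdrlen;	/* block header */
	unsigned int	reclen;	/* one leaf record */
	unsigned int	keylen;	/* one node key (or key pair) */
	unsigned int	ptrlen;	/* one node pointer */
};

/*
 * Entries that fit in a block of blocklen bytes.  Leaf blocks hold
 * records; node blocks hold key and pointer pairs.
 */
static unsigned int ex_bt_maxrecs(const struct ex_bt_geom *g,
				  unsigned int blocklen, bool leaf)
{
	unsigned int avail;

	if (blocklen < g->hdrlen)
		return 0;
	avail = blocklen - g->hdrlen;

	if (leaf)
		return avail / g->reclen;
	return avail / (g->keylen + g->ptrlen);
}
```

For example, with an assumed 56-byte header, 24-byte records, 40 bytes of keys, and 4-byte pointers, a 4096-byte block holds 168 leaf records or 91 key and pointer pairs.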