www.infradead.org Git - users/hch/xfs.git/log
9 months ago xfs: create routine to allocate and initialize a realtime rmap btree inode
Darrick J. Wong [Mon, 23 Sep 2024 20:42:04 +0000 (13:42 -0700)]
xfs: create routine to allocate and initialize a realtime rmap btree inode

Create a library routine to allocate and initialize an empty realtime
rmapbt inode.  We'll use this for mkfs and repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: wire up rmap map and unmap to the realtime rmapbt
Darrick J. Wong [Mon, 23 Sep 2024 20:42:03 +0000 (13:42 -0700)]
xfs: wire up rmap map and unmap to the realtime rmapbt

Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks.  This enables us to
perform rmap operations against the correct btree.

[Contains a minor bugfix from hch]

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: allow inodes with zero extents but nonzero nblocks
Darrick J. Wong [Mon, 23 Sep 2024 20:42:02 +0000 (13:42 -0700)]
xfs: allow inodes with zero extents but nonzero nblocks

Metadata inodes that store btrees will have zero extents and a nonzero
nblocks.  Adjust the inode verifier so that this combination is not
flagged.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: wire up a new inode fork type for the realtime rmap
Darrick J. Wong [Mon, 23 Sep 2024 20:42:01 +0000 (13:42 -0700)]
xfs: wire up a new inode fork type for the realtime rmap

Plumb in the pieces we need to embed the root of the realtime rmap
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: add metadata reservations for realtime rmap btrees
Darrick J. Wong [Mon, 23 Sep 2024 20:42:01 +0000 (13:42 -0700)]
xfs: add metadata reservations for realtime rmap btrees

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime rmap btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: add realtime reverse map inode to metadata directory
Darrick J. Wong [Mon, 23 Sep 2024 20:42:00 +0000 (13:42 -0700)]
xfs: add realtime reverse map inode to metadata directory

Add a metadir path to select the realtime rmap btree inode and load
it at mount time.  The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: add realtime rmap btree block detection to log recovery
Darrick J. Wong [Mon, 23 Sep 2024 20:41:59 +0000 (13:41 -0700)]
xfs: add realtime rmap btree block detection to log recovery

Identify rtrmapbt blocks in the log correctly so that we can
validate them during log recovery.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: support recovering rmap intent items targetting realtime extents
Darrick J. Wong [Mon, 23 Sep 2024 20:41:58 +0000 (13:41 -0700)]
xfs: support recovering rmap intent items targetting realtime extents

Now that we have rmap on the realtime device, log recovery has to
support remapping extents on the realtime volume.  Make this work.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: add a realtime flag to the rmap update log redo items
Darrick J. Wong [Mon, 23 Sep 2024 20:41:57 +0000 (13:41 -0700)]
xfs: add a realtime flag to the rmap update log redo items

Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt.  We'll
wire up the actual rmap code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: prepare rmap functions to deal with rtrmapbt
Darrick J. Wong [Mon, 23 Sep 2024 20:41:57 +0000 (13:41 -0700)]
xfs: prepare rmap functions to deal with rtrmapbt

Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions.  Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: add realtime rmap btree operations
Darrick J. Wong [Mon, 23 Sep 2024 20:41:56 +0000 (13:41 -0700)]
xfs: add realtime rmap btree operations

Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are neither
constrained to the free space nor any particular AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: realtime rmap btree transaction reservations
Darrick J. Wong [Mon, 23 Sep 2024 20:41:55 +0000 (13:41 -0700)]
xfs: realtime rmap btree transaction reservations

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: define the on-disk realtime rmap btree format
Darrick J. Wong [Mon, 23 Sep 2024 20:41:54 +0000 (13:41 -0700)]
xfs: define the on-disk realtime rmap btree format

Start filling out the rtrmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for connecting the
btree operations implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: introduce realtime rmap btree definitions
Darrick J. Wong [Mon, 23 Sep 2024 20:41:53 +0000 (13:41 -0700)]
xfs: introduce realtime rmap btree definitions

Add new realtime rmap btree definitions. The realtime rmap btree will
be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Darrick J. Wong [Mon, 23 Sep 2024 20:41:53 +0000 (13:41 -0700)]
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: prepare rmap btree cursor tracepoints for realtime
Darrick J. Wong [Mon, 23 Sep 2024 20:41:52 +0000 (13:41 -0700)]
xfs: prepare rmap btree cursor tracepoints for realtime

Rework the rmap btree cursor tracepoints in preparation to handle the
realtime rmap btree cursor.  Mostly this involves renaming the field to
"rmapbno" and extracting the group number from the cursor when possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: allow inode-based btrees to reserve space in the data device
Darrick J. Wong [Mon, 23 Sep 2024 20:41:51 +0000 (13:41 -0700)]
xfs: allow inode-based btrees to reserve space in the data device

Create a new space reservation scheme so that btree metadata for the
realtime volume can reserve space in the data device to avoid space
underruns.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: update btree keys correctly when _insrec splits an inode root block
Darrick J. Wong [Mon, 23 Sep 2024 20:41:50 +0000 (13:41 -0700)]
xfs: update btree keys correctly when _insrec splits an inode root block

In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.

Fixes: 2c813ad66a72 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: support storing records in the inode core root
Darrick J. Wong [Mon, 23 Sep 2024 20:41:49 +0000 (13:41 -0700)]
xfs: support storing records in the inode core root

Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree.  This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: hoist the node iroot update code out of xfs_btree_kill_iroot
Darrick J. Wong [Mon, 23 Sep 2024 20:41:49 +0000 (13:41 -0700)]
xfs: hoist the node iroot update code out of xfs_btree_kill_iroot

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function.  Remove some unnecessary conditionals and clean
up a few function calls in the new function.  Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: hoist the node iroot update code out of xfs_btree_new_iroot
Darrick J. Wong [Mon, 23 Sep 2024 20:41:48 +0000 (13:41 -0700)]
xfs: hoist the node iroot update code out of xfs_btree_new_iroot

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function.  Note that the new function explicitly computes
the keys of the new child block and stores that in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: tidy up xfs_bmap_broot_realloc a bit
Darrick J. Wong [Mon, 23 Sep 2024 20:41:47 +0000 (13:41 -0700)]
xfs: tidy up xfs_bmap_broot_realloc a bit

Hoist out the code that migrates broot pointers during a resize
operation to avoid code duplication and streamline the caller.  Also
use the correct bmbt pointer type for the sizeof operation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: make xfs_iroot_realloc a bmap btree function
Darrick J. Wong [Mon, 23 Sep 2024 20:41:46 +0000 (13:41 -0700)]
xfs: make xfs_iroot_realloc a bmap btree function

Make the inode fork btree root reallocation function part of the btree
ops because it's now mostly bmbt-specific code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: make xfs_iroot_realloc take the new numrecs instead of deltas
Darrick J. Wong [Mon, 23 Sep 2024 20:41:45 +0000 (13:41 -0700)]
xfs: make xfs_iroot_realloc take the new numrecs instead of deltas

Change the calling signature of xfs_iroot_realloc to take the ifork and
the new number of records in the btree block, not a diff against the
current number.  This will make the callsites easier to understand.

Note that this function is misnamed because it is very specific to the
single type of inode-rooted btree supported.  This will be addressed in
a subsequent patch.

Return the new btree root to reduce the amount of code clutter.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: refactor the inode fork memory allocation functions
Darrick J. Wong [Mon, 23 Sep 2024 20:41:45 +0000 (13:41 -0700)]
xfs: refactor the inode fork memory allocation functions

Hoist the code that allocates, frees, and reallocates if_broot into a
single xfs_iroot_krealloc function.  Eventually we're going to push
xfs_iroot_realloc into the btree ops structure to handle multiple
inode-rooted btrees, but first let's separate out the bits that should
stay in xfs_inode_fork.c.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: tidy up xfs_iroot_realloc
Darrick J. Wong [Mon, 23 Sep 2024 20:41:44 +0000 (13:41 -0700)]
xfs: tidy up xfs_iroot_realloc

Tidy up this function a bit before we start refactoring the memory
handling and move the function to the bmbt code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: enable metadata directory feature
Darrick J. Wong [Mon, 23 Sep 2024 20:41:43 +0000 (13:41 -0700)]
xfs: enable metadata directory feature

Enable the metadata directory feature.  With this feature, all metadata
inodes are placed in the metadata directory, and the only inumbers in
the superblock are the roots of the two directory trees.

The RT device is now sharded into a number of rtgroups, where 0 rtgroups
mean that no RT extents are supported, and the traditional XFS stub RT
bitmap and summary inodes don't exist.  A single rtgroup gives roughly
identical behavior to the traditional RT setup, but now with checksummed
and self identifying free space metadata.

For quota, the quota options are read from the superblock unless
explicitly overridden via mount options.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: update sb field checks when metadir is turned on
Darrick J. Wong [Mon, 23 Sep 2024 20:41:42 +0000 (13:41 -0700)]
xfs: update sb field checks when metadir is turned on

When metadir is enabled, we want to check the two new rtgroups fields,
and we don't want to check the old inumbers that are now in the metadir.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: persist quota flags with metadir
Darrick J. Wong [Mon, 23 Sep 2024 20:41:42 +0000 (13:41 -0700)]
xfs: persist quota flags with metadir

It's annoying that one has to keep reminding XFS about what quota
options it should mount with, since the quota flags recording the
previous state are sitting right there in the primary superblock.  Even
more strangely, there exists a noquota option to disable quotas
completely, so it's odder still that providing no options is the same as
noquota.

Starting with metadir, let's change the behavior so that if the user
does not specify any quota-related mount options at all, the ondisk
quota flags will be used to bring up quota.  In other words, the
filesystem will mount in the same state and with the same functionality
as it had during the last mount.
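
The mount-time decision described above can be sketched as follows. This is only an illustration: the function name, its parameters, and the flag values are hypothetical and are not taken from the XFS code. An explicit option such as noquota counts as "the user specified something", in which case the mount options win.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define QUOTA_USER   0x1u
#define QUOTA_GROUP  0x2u

/*
 * Pick the quota flags to bring up at mount time.
 * - If the user passed any quota-related option (including noquota, which
 *   yields mount_flags == 0), honor the mount options.
 * - Otherwise, on a metadir filesystem, fall back to the on-disk flags.
 * - Otherwise keep the old behavior: no options means no quota.
 */
static unsigned int effective_quota_flags(bool metadir, bool have_quota_opts,
                                          unsigned int mount_flags,
                                          unsigned int ondisk_flags)
{
    if (have_quota_opts)
        return mount_flags;
    return metadir ? ondisk_flags : 0;
}
```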

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: scrub quota file metapaths
Darrick J. Wong [Mon, 23 Sep 2024 20:41:41 +0000 (13:41 -0700)]
xfs: scrub quota file metapaths

Enable online fsck for quota file metadata directory paths.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: use metadir for quota inodes
Darrick J. Wong [Mon, 23 Sep 2024 20:41:40 +0000 (13:41 -0700)]
xfs: use metadir for quota inodes

Store the quota inodes in the /quota metadata directory if metadir is
enabled.  This enables us to stop using the sb_[ugp]uotino fields in the
superblock.  From this point on, all metadata files will be children of
the metadata directory tree root.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: refactor xfs_qm_destroy_quotainos
Darrick J. Wong [Mon, 23 Sep 2024 20:41:39 +0000 (13:41 -0700)]
xfs: refactor xfs_qm_destroy_quotainos

Reuse this function instead of open-coding the logic.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t
Darrick J. Wong [Mon, 30 Sep 2024 20:49:00 +0000 (13:49 -0700)]
xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t

Now that we've finished adding allocation groups to the realtime volume,
let's make the file block mapping address (xfs_rtblock_t) a segmented
value just like we do on the data device.  This means that group number
and block number conversions can be done with shifting and masking
instead of integer division.

While in theory we could continue caching the rgno shift value in
m_rgblklog, the fact that we now always use the shift value means that
we have an opportunity to increase the redundancy of the rt geometry by
storing it in the ondisk superblock and adding more sb verifier code.
Reuse the space vacated by sb_bad_feature2 to store the rgblklog value.
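
A segmented address of this kind can be sketched as below. The helper names and the typedef are hypothetical stand-ins, not the kernel's; the sketch assumes each group's address space is padded to a power of two so that a single shift value (called rgblklog here, after the field mentioned above) separates the group number from the block offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in type: (group number << rgblklog) | block in group. */
typedef uint64_t rtblock_t;

/* Combine a group number and a block offset within that group. */
static inline rtblock_t rtb_pack(uint32_t rgno, uint64_t rgbno,
                                 unsigned int rgblklog)
{
    assert(rgblklog > 0 && rgblklog < 64);
    assert(rgbno < (UINT64_C(1) << rgblklog));
    return ((rtblock_t)rgno << rgblklog) | rgbno;
}

/* Extract the group number with a shift, no division needed. */
static inline uint32_t rtb_to_rgno(rtblock_t rtb, unsigned int rgblklog)
{
    return (uint32_t)(rtb >> rgblklog);
}

/* Extract the block offset within the group with a mask. */
static inline uint64_t rtb_to_rgbno(rtblock_t rtb, unsigned int rgblklog)
{
    return rtb & ((UINT64_C(1) << rgblklog) - 1);
}
```

One consequence of this layout is that segmented addresses are not dense: when the real group size is not a power of two, the values between the last block of one group and the first block of the next do not name any block.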

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries
Darrick J. Wong [Mon, 30 Sep 2024 20:47:22 +0000 (13:47 -0700)]
xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries

We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file mapping block
lengths because the rtb helpers soon will not do the right thing there.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries
Darrick J. Wong [Mon, 30 Sep 2024 20:43:12 +0000 (13:43 -0700)]
xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries

We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file block offsets
because the rtb helpers soon will not do the right thing there.
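
The kind of rounding such helpers perform can be sketched as below. The function names are hypothetical, and the sketch assumes only that the rt extent size is a nonzero number of blocks that need not be a power of two, which is why it uses a remainder rather than a mask. It operates on plain file block offsets, which stay linear even once rt block addresses become segmented.

```c
#include <assert.h>
#include <stdint.h>

/* Round a file block offset down to the start of its rt extent. */
static uint64_t fileoff_rounddown_rtx(uint64_t off, uint32_t rextsize)
{
    assert(rextsize > 0);
    return off - (off % rextsize);
}

/* Round a file block offset up to the next rt extent boundary. */
static uint64_t fileoff_roundup_rtx(uint64_t off, uint32_t rextsize)
{
    uint64_t rem;

    assert(rextsize > 0);
    rem = off % rextsize;
    return rem ? off + (rextsize - rem) : off;
}
```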

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: mask off the rtbitmap and summary inodes when metadir in use
Darrick J. Wong [Mon, 23 Sep 2024 20:41:39 +0000 (13:41 -0700)]
xfs: mask off the rtbitmap and summary inodes when metadir in use

Set the rtbitmap and summary file inumbers to NULLFSINO in the
superblock and make sure they're zeroed whenever we write the superblock
to disk, to mimic mkfs behavior.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: scrub metadir paths for rtgroup metadata
Darrick J. Wong [Mon, 23 Sep 2024 20:41:38 +0000 (13:41 -0700)]
xfs: scrub metadir paths for rtgroup metadata

Add the code we need to scan the metadata directory paths of rt group
metadata files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: repair realtime group superblock
Darrick J. Wong [Mon, 23 Sep 2024 20:41:37 +0000 (13:41 -0700)]
xfs: repair realtime group superblock

Repair the realtime superblock if it has become out of date with the
primary superblock.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: scrub the realtime group superblock
Darrick J. Wong [Mon, 23 Sep 2024 20:41:36 +0000 (13:41 -0700)]
xfs: scrub the realtime group superblock

Enable scrubbing of realtime group superblocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: don't coalesce file mappings that cross rtgroup boundaries in scrub
Darrick J. Wong [Mon, 23 Sep 2024 20:41:35 +0000 (13:41 -0700)]
xfs: don't coalesce file mappings that cross rtgroup boundaries in scrub

The bmbt scrubber will combine file mappings if they are mergeable to
reduce the number of cross-referencing checks.  However, we shouldn't
combine mappings that cross rt group boundaries because that will cause
verifiers to trip incorrectly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: make the RT allocator rtgroup aware
Christoph Hellwig [Mon, 23 Sep 2024 20:41:35 +0000 (13:41 -0700)]
xfs: make the RT allocator rtgroup aware

Make the allocator rtgroup aware by either picking a specific group if
there is a hint, or looping over all groups otherwise.  A simple rotor is
provided to pick the placement for initial allocations.
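
A rotor of this sort can be sketched as a round-robin counter. Everything here is hypothetical (the struct, the function name, and the hint handling) and the sketch ignores locking; it only illustrates the idea of spreading initial allocations across groups while honoring a hint when one exists.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rotor state: the group to try for the next new allocation. */
struct rt_rotor {
    uint32_t next;
};

/*
 * Choose the group in which to start searching for free space.
 * With a hint, use the hinted group; without one, hand out groups in
 * round-robin order so initial allocations spread across the volume.
 */
static uint32_t pick_start_group(struct rt_rotor *r, uint32_t ngroups,
                                 int have_hint, uint32_t hint)
{
    uint32_t rgno;

    assert(ngroups > 0);
    if (have_hint)
        return hint % ngroups;

    rgno = r->next % ngroups;
    r->next = (rgno + 1) % ngroups;
    return rgno;
}
```

A caller that finds no space in the chosen group would then walk the remaining groups in order, wrapping around at the last one.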

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: don't merge ioends across RTGs
Christoph Hellwig [Mon, 23 Sep 2024 20:41:34 +0000 (13:41 -0700)]
xfs: don't merge ioends across RTGs

Unlike AGs, RTGs don't always have metadata in their first blocks, and
thus we don't get automatic protection from merging I/O completions
across RTG boundaries.  Add code to set the IOMAP_F_BOUNDARY flag for
ioends that start at the first block of a RTG so that they never get
merged into the previous ioend.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: use realtime EFI to free extents when rtgroups are enabled
Darrick J. Wong [Mon, 23 Sep 2024 20:41:33 +0000 (13:41 -0700)]
xfs: use realtime EFI to free extents when rtgroups are enabled

When rmap is enabled, XFS expects a certain order of operations, which
is: 1) remove the file mapping, 2) remove the reverse mapping, and then
3) free the blocks.  When reflink is enabled, XFS replaces (3) with a
deferred refcount decrement operation that can schedule freeing the
blocks if that was the last refcount.

For realtime files, xfs_bmap_del_extent_real tries to do 1 and 3 in the
same transaction, which will break both rmap and reflink unless we
switch it to use realtime EFIs.  Both rmap and reflink depend on the
rtgroups feature, so let's turn on EFIs for all rtgroups filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: support error injection when freeing rt extents
Darrick J. Wong [Mon, 23 Sep 2024 20:41:32 +0000 (13:41 -0700)]
xfs: support error injection when freeing rt extents

A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent.  Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: support logging EFIs for realtime extents
Darrick J. Wong [Mon, 23 Sep 2024 20:41:31 +0000 (13:41 -0700)]
xfs: support logging EFIs for realtime extents

Teach the EFI mechanism how to free realtime extents.  We're going to
need this to enforce proper ordering of operations when we enable
realtime rmap.

Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer
ops for rt extents.  This keeps the ondisk artifacts and processing code
completely separate between the rt and non-rt cases.  Hopefully this
will make it easier to debug filesystem problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: force swapext to a realtime file to use the file content exchange ioctl
Darrick J. Wong [Mon, 23 Sep 2024 20:41:31 +0000 (13:41 -0700)]
xfs: force swapext to a realtime file to use the file content exchange ioctl

xfs_swap_extent_rmap does not use log items to track the overall
progress of an attempt to swap the extent mappings between two files.
If the system crashes in the middle of swapping a partially written
realtime extent, the mapping will be left in an inconsistent state
wherein a file can point to multiple extents on the rt volume.

The new file range exchange functionality handles this correctly, so all
callers must upgrade to that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: store rtgroup information with a bmap intent
Darrick J. Wong [Mon, 23 Sep 2024 20:41:30 +0000 (13:41 -0700)]
xfs: store rtgroup information with a bmap intent

Make the bmap intent items take an active reference to the rtgroup
containing the space that is being mapped or unmapped.  We will need
this functionality once we start enabling rmap and reflink on the rt
volume.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: grow the realtime section when realtime groups are enabled
Darrick J. Wong [Mon, 23 Sep 2024 20:41:29 +0000 (13:41 -0700)]
xfs: grow the realtime section when realtime groups are enabled

Enable growing the rt section when realtime groups are enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: encode the rtsummary in big endian format
Darrick J. Wong [Mon, 23 Sep 2024 20:41:28 +0000 (13:41 -0700)]
xfs: encode the rtsummary in big endian format

Currently, the ondisk realtime summary file counters are accessed in
units of 32-bit words.  There's no endian translation of the contents of
this file, which means that Bad Things Happen(tm) if you go from
(say) x86 to powerpc.  Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.  Encode the summary
information in big endian format, like most of the rest of the
filesystem.
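
Fixing the byte order of a 32-bit counter can be sketched with explicit byte stores, which behave the same on any host. The function names are illustrative only; kernel code would normally use its own endian conversion helpers rather than open-coding this.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit counter as big-endian bytes, regardless of host order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load a 32-bit big-endian counter into host order. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Because the on-disk bytes no longer depend on the CPU that wrote them, a file written on a little-endian machine reads back with the same counter values on a big-endian one.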

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: encode the rtbitmap in big endian format
Darrick J. Wong [Mon, 23 Sep 2024 20:41:27 +0000 (13:41 -0700)]
xfs: encode the rtbitmap in big endian format

Currently, the ondisk realtime bitmap file is accessed in units of
32-bit words.  There's no endian translation of the contents of this
file, which means that Bad Things Happen(tm) if you go from (say)
x86 to powerpc.  Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: add block headers to realtime bitmap and summary blocks
Darrick J. Wong [Mon, 23 Sep 2024 20:41:27 +0000 (13:41 -0700)]
xfs: add block headers to realtime bitmap and summary blocks

Upgrade rtbitmap and rtsummary blocks to have self describing metadata
like most every other thing in XFS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: export the geometry of realtime groups to userspace
Darrick J. Wong [Mon, 23 Sep 2024 20:41:26 +0000 (13:41 -0700)]
xfs: export the geometry of realtime groups to userspace

Create an ioctl so that the kernel can report the status of realtime
groups to userspace.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: record rt group metadata errors in the health system
Darrick J. Wong [Mon, 23 Sep 2024 20:41:25 +0000 (13:41 -0700)]
xfs: record rt group metadata errors in the health system

Record the state of per-rtgroup metadata sickness in the rtgroup
structure for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: convert sick_map loops to use ARRAY_SIZE
Darrick J. Wong [Mon, 23 Sep 2024 20:41:24 +0000 (13:41 -0700)]
xfs: convert sick_map loops to use ARRAY_SIZE

Convert these arrays to use ARRAY_SIZE instead of requiring an empty
sentinel array element at the end.  This saves memory and would have
avoided a bug that worked its way into the next patch.
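
The pattern can be sketched as follows. The struct layout, table contents, and function name are made up for illustration and do not reproduce the XFS health code; only the ARRAY_SIZE idiom itself is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical mapping from an internal sickness bit to a reported flag. */
struct sick_map {
    unsigned int internal_mask;
    unsigned int reported_flag;
};

static const struct sick_map example_map[] = {
    { 0x1, 0x10 },
    { 0x2, 0x20 },
    { 0x4, 0x40 },
    /* No empty sentinel entry is needed to terminate the table. */
};

/* Translate internal bits to reported flags by walking the whole table. */
static unsigned int translate_sick(unsigned int internal)
{
    unsigned int out = 0;
    size_t i;

    for (i = 0; i < ARRAY_SIZE(example_map); i++)
        if (internal & example_map[i].internal_mask)
            out |= example_map[i].reported_flag;
    return out;
}
```

With the loop bound derived from the array itself, forgetting a terminator when adding a new table cannot cause the loop to run past the end.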

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: add frextents to the lazysbcounters when rtgroups enabled
Darrick J. Wong [Mon, 23 Sep 2024 20:41:23 +0000 (13:41 -0700)]
xfs: add frextents to the lazysbcounters when rtgroups enabled

Make the free rt extent count a part of the lazy sb counters when the
realtime groups feature is enabled.  This is possible because the patch
to recompute frextents from the rtbitmap during log recovery predates
the code adding rtgroup support, hence we know that the value will
always be correct during runtime.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: add a helper to prevent bmap merges across rtgroup boundaries
Christoph Hellwig [Mon, 23 Sep 2024 20:41:23 +0000 (13:41 -0700)]
xfs: add a helper to prevent bmap merges across rtgroup boundaries

Except for the rt superblock, realtime groups do not store any metadata
at the start (or end) of the group.  There is nothing to prevent the
bmap code from merging allocations from multiple groups into a single
bmap record.  Add a helper to check for this case.
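
Such a check can be sketched as below, under the simplifying assumption of linear block numbers and fixed-size groups. The function name and parameters are hypothetical and do not reflect the actual helper's signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a mapping starting at next_start may be merged onto the
 * end of the mapping [prev_start, prev_start + prev_len).
 * rg_blocks is the (assumed uniform) number of blocks per group.
 */
static bool can_merge_rt(uint64_t prev_start, uint64_t prev_len,
                         uint64_t next_start, uint64_t rg_blocks)
{
    assert(rg_blocks > 0 && prev_len > 0);

    /* The two mappings must be physically contiguous. */
    if (prev_start + prev_len != next_start)
        return false;

    /*
     * If the second mapping begins at the first block of a group, the
     * first mapping ended at the last block of the previous group, so
     * merging them would produce a record that spans two groups.
     */
    return next_start % rg_blocks != 0;
}
```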

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: massage the commit message after pulling this into rtgroups]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: check that rtblock extents do not break rtsupers or rtgroups
Darrick J. Wong [Mon, 23 Sep 2024 20:41:22 +0000 (13:41 -0700)]
xfs: check that rtblock extents do not break rtsupers or rtgroups

Check that rt block pointers do not point to the realtime superblock and
that allocated rt space extents do not cross rtgroup boundaries.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: export realtime group geometry via XFS_FSOP_GEOM
Darrick J. Wong [Mon, 23 Sep 2024 20:41:21 +0000 (13:41 -0700)]
xfs: export realtime group geometry via XFS_FSOP_GEOM

Export the realtime geometry information so that userspace can query it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: update realtime super every time we update the primary fs super
Darrick J. Wong [Mon, 23 Sep 2024 20:41:20 +0000 (13:41 -0700)]
xfs: update realtime super every time we update the primary fs super

Every time we update parts of the primary filesystem superblock that are
echoed in the rt superblock, we must update the rt super.  Avoid
changing the log to support logging to the rt device by using ordered
buffers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: check the realtime superblock at mount time
Darrick J. Wong [Mon, 23 Sep 2024 20:41:20 +0000 (13:41 -0700)]
xfs: check the realtime superblock at mount time

Check the realtime superblock at mount time, to ensure that the label
and uuids actually match the primary superblock on the data device.  If
the rt superblock is good, attach it to the xfs_mount so that the log
can use ordered buffers to keep the rt super in sync with the primary
super on the data device.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: define the format of rt groups
Darrick J. Wong [Mon, 23 Sep 2024 20:41:19 +0000 (13:41 -0700)]
xfs: define the format of rt groups

Define the ondisk format of realtime group metadata, and a superblock
for realtime volumes.  rt supers are conditionally enabled by a
predicate function so that they can be disabled if we ever implement
zoned storage support for the realtime volume.

For rt group enabled file systems there is a separate bitmap and summary
file for each group and thus the number of bitmap and summary blocks
needs to be calculated differently.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months ago iomap: add a merge boundary flag
Christoph Hellwig [Mon, 23 Sep 2024 20:41:18 +0000 (13:41 -0700)]
iomap: add a merge boundary flag

File systems might have boundaries over which merges aren't possible.
In fact these are very common, although most of the time some kind of
header at the beginning of this region (e.g. XFS allocation groups, ext4
block groups) automatically creates a merge barrier.  But if that is
not present, say for a device purely used for data, we need to manually
communicate that to iomap.

Add an IOMAP_F_BOUNDARY flag to never merge I/O into a previous mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
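
The merge rule described in the iomap commit above can be sketched roughly as follows.  This is a minimal illustration under stated assumptions, not the iomap code: struct demo_map, DEMO_F_BOUNDARY and demo_can_merge are hypothetical names, and the only real-world fact relied on is that a mapping carrying the boundary flag must not be merged into the previous one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the boundary flag; not the kernel definition. */
#define DEMO_F_BOUNDARY (1u << 0)

/* Hypothetical, simplified mapping; not the kernel's struct iomap. */
struct demo_map {
	uint64_t addr;     /* disk address of the mapping, in bytes */
	uint64_t length;   /* length of the mapping, in bytes */
	unsigned int flags;
};

/* Decide whether I/O for 'next' may be appended to I/O for 'prev'. */
static bool demo_can_merge(const struct demo_map *prev,
			   const struct demo_map *next)
{
	assert(prev && next);

	/* A boundary mapping never merges into what came before it. */
	if (next->flags & DEMO_F_BOUNDARY)
		return false;

	/* Otherwise merging only makes sense for contiguous disk ranges. */
	return prev->addr + prev->length == next->addr;
}
```

Two physically adjacent mappings merge unless the second one is flagged, which is the case a pure data device without per-group headers would otherwise miss.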
9 months agoxfs: fix rt device offset calculations for FITRIM
Darrick J. Wong [Fri, 27 Sep 2024 22:40:59 +0000 (15:40 -0700)]
xfs: fix rt device offset calculations for FITRIM

FITRIM on xfs has this bizarro uapi where we flatten all the physically
addressable storage across two block devices into a linear address
space.  In this address space, the realtime device comes immediately
after the data device.  Therefore, xfs_trim_rtdev_extents has to
convert its input parameters from the linear address space to actual
rtdev block addresses on the realtime volume.

Right now the address space conversion is done in units of rtblocks.
However, a future patchset will convert xfs_rtblock_t to be a segmented
address space (group:blkno) like the data device.  Change the conversion
code to be done in units of daddrs since those will never be segmented.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
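
The address space conversion described in the FITRIM commit above might look like the sketch below.  It assumes an inclusive [start, end] range expressed in daddrs (512-byte sectors) and a flattened layout in which the realtime device begins where the data device ends; demo_fitrim_to_rtdev and its parameters are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: clip an inclusive range in the flattened FITRIM
 * address space to the realtime device and rebase it so that 0 is the
 * first sector of the realtime device.  All values are daddrs.
 *
 * Returns false if the range does not touch the realtime device.
 */
static bool demo_fitrim_to_rtdev(uint64_t ddev_len, uint64_t rtdev_len,
				 uint64_t start, uint64_t end,
				 uint64_t *rt_start, uint64_t *rt_end)
{
	uint64_t rt_first = ddev_len;	/* rt device follows the data device */
	uint64_t rt_last;

	assert(start <= end);

	if (rtdev_len == 0)
		return false;
	rt_last = ddev_len + rtdev_len - 1;

	/* Entirely on the data device, or entirely past the rt device. */
	if (end < rt_first || start > rt_last)
		return false;

	/* Clip to the rt device, then subtract the data device length. */
	if (start < rt_first)
		start = rt_first;
	if (end > rt_last)
		end = rt_last;

	*rt_start = start - ddev_len;
	*rt_end = end - ddev_len;
	return true;
}
```

Because the arithmetic is done purely in sectors, nothing here depends on how rt block numbers are encoded, which is the property the commit relies on.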
9 months agoxfs: make RT extent numbers relative to the rtgroup
Christoph Hellwig [Mon, 23 Sep 2024 20:41:17 +0000 (13:41 -0700)]
xfs: make RT extent numbers relative to the rtgroup

To prepare for adding per-rtgroup bitmap files, make the xfs_rtxnum_t
type encode the RT extent number relative to the rtgroup.  The biggest
part of this is to clearly distinguish between the relative extent number
that gets masked when converting from a global block number and length
values that just have a factor applied to them when converting from
file system blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
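
The distinction drawn in the commit above, between converting a position and converting a length, can be illustrated as follows.  This is a sketch only: struct demo_geom and both helpers are hypothetical, and it assumes rtgroups are laid out back to back so that a global block number is group_index * rgblocks + offset.  The real code may use masks or a different encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry; field names are illustrative only. */
struct demo_geom {
	uint32_t rextsize;	/* fs blocks per rt extent */
	uint32_t rgblocks;	/* fs blocks per rtgroup */
};

/*
 * Position conversion: a global rt block number becomes an extent number
 * relative to the start of its rtgroup, so the group part is dropped first.
 */
static uint64_t demo_rtb_to_rtx(const struct demo_geom *g, uint64_t rtbno)
{
	assert(g->rextsize != 0 && g->rgblocks != 0);
	return (rtbno % g->rgblocks) / g->rextsize;
}

/*
 * Length conversion: a block count becomes an extent count.  Only the
 * extent size factor applies; there is no group part to remove.
 */
static uint64_t demo_blen_to_rtxlen(const struct demo_geom *g, uint64_t blen)
{
	assert(g->rextsize != 0);
	return blen / g->rextsize;
}
```

Feeding the same number through both helpers gives different answers once it exceeds the group size, which is why the two kinds of value need to be kept apart.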
9 months agoxfs: refactor xfs_rtsummary_blockcount
Christoph Hellwig [Mon, 23 Sep 2024 20:41:16 +0000 (13:41 -0700)]
xfs: refactor xfs_rtsummary_blockcount

Make xfs_rtsummary_blockcount take all the required information from
the mount structure and return the number of summary levels from it
as well.  This cleans up many of the callers and prepares for making the
rtsummary files per-rtgroup where they need to look at different values.

This means we recalculate some values in some callers, but as all these
calculations are outside the fast path and cheap that seems like a price
worth paying.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
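
A helper of the shape described in the commit above, taking everything from the mount and also returning the number of summary levels, could look like this sketch.  The sizing formula is an assumption made for illustration (one 32-bit counter per summary level per bitmap block, with floor(log2(rextents)) + 1 levels); struct demo_mount and the demo_* functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the mount structure. */
struct demo_mount {
	uint32_t blocksize;	/* fs block size in bytes */
	uint64_t rextents;	/* number of rt extents */
};

/* floor(log2(v)) for v > 0; returns 0 for v == 0. */
static unsigned int demo_log2_floor(uint64_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/*
 * Return the number of summary blocks and, through rsumlevels, the number
 * of summary levels.  Both are derived from the mount alone.
 */
static uint64_t demo_rtsummary_blockcount(const struct demo_mount *mp,
					  unsigned int *rsumlevels)
{
	uint64_t bits_per_block = (uint64_t)mp->blocksize * 8;
	uint64_t rbmblocks, rsumbytes;

	assert(mp->blocksize != 0);

	/* One bitmap bit per rt extent, rounded up to whole blocks. */
	rbmblocks = (mp->rextents + bits_per_block - 1) / bits_per_block;

	*rsumlevels = demo_log2_floor(mp->rextents) + 1;
	rsumbytes = rbmblocks * *rsumlevels * sizeof(uint32_t);
	return (rsumbytes + mp->blocksize - 1) / mp->blocksize;
}
```

Callers that previously passed the bitmap block count and level count in by hand now get both recomputed here, which is the small extra cost the commit message mentions.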
9 months agoxfs: refactor xfs_rtbitmap_blockcount
Christoph Hellwig [Mon, 23 Sep 2024 20:41:16 +0000 (13:41 -0700)]
xfs: refactor xfs_rtbitmap_blockcount

Rename the existing xfs_rtbitmap_blockcount to
xfs_rtbitmap_blockcount_len and add a new xfs_rtbitmap_blockcount wrapper
around it that takes the number of extents from the mount structure.

This will simplify the move to per-rtgroup bitmaps as those will need to
pass in the number of extents per rtgroup instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
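
The split described in the commit above, a _len variant that takes an explicit extent count plus a wrapper that pulls the count from the mount, can be sketched as below.  The names mirror the commit but carry a demo_ prefix because this is not the kernel code; the formula assumes one bitmap bit per rt extent and no per-block header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the mount structure. */
struct demo_mount {
	uint32_t blocksize;	/* fs block size in bytes */
	uint64_t rextents;	/* rt extents in the whole rt volume */
};

/* Bitmap blocks needed to track an explicit number of rt extents. */
static uint64_t demo_rtbitmap_blockcount_len(const struct demo_mount *mp,
					     uint64_t rtextents)
{
	uint64_t bits_per_block = (uint64_t)mp->blocksize * 8;

	assert(bits_per_block != 0);
	return (rtextents + bits_per_block - 1) / bits_per_block;
}

/* Wrapper: same calculation, extent count taken from the mount. */
static uint64_t demo_rtbitmap_blockcount(const struct demo_mount *mp)
{
	return demo_rtbitmap_blockcount_len(mp, mp->rextents);
}
```

A per-rtgroup caller would use the _len variant with the number of extents in one group, while existing callers keep the short form.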
9 months agoxfs: factor out a xfs_growfs_check_rtgeom helper
Christoph Hellwig [Mon, 23 Sep 2024 20:41:15 +0000 (13:41 -0700)]
xfs: factor out a xfs_growfs_check_rtgeom helper

Split the check that the rtsummary fits into the log into a separate
helper, and use xfs_growfs_rt_alloc_fake_mount to calculate the new RT
geometry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: avoid division for the 0-rtx growfs check]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: use xfs_growfs_rt_alloc_fake_mount in xfs_growfs_rt_alloc_blocks
Christoph Hellwig [Mon, 23 Sep 2024 20:41:14 +0000 (13:41 -0700)]
xfs: use xfs_growfs_rt_alloc_fake_mount in xfs_growfs_rt_alloc_blocks

Use xfs_growfs_rt_alloc_fake_mount instead of manually recalculating
the RT bitmap geometry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: factor out a xfs_growfs_rt_alloc_fake_mount helper
Darrick J. Wong [Mon, 23 Sep 2024 20:41:13 +0000 (13:41 -0700)]
xfs: factor out a xfs_growfs_rt_alloc_fake_mount helper

Split the code to set up a fake mount point to calculate new RT
geometry out of xfs_growfs_rt_bmblock so that it can be reused.

Note that this changes the rmblocks calculation method to be based
on the passed in rblocks and extsize and not the explicitly passed
one, but both methods will always lead to the same result.  The new
version just does a little bit more math while being more general.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: calculate RT bitmap and summary blocks based on sb_rextents
Christoph Hellwig [Mon, 23 Sep 2024 20:41:12 +0000 (13:41 -0700)]
xfs: calculate RT bitmap and summary blocks based on sb_rextents

Use the on-disk rextents to calculate the bitmap and summary blocks
instead of the calculated one so that we can refactor the helpers for
calculating them.

As the RT bitmap and summary scrubbers already check that sb_rextents
match the block count this does not change coverage of the scrubber.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: remove XFS_ILOCK_RT*
Darrick J. Wong [Mon, 23 Sep 2024 20:41:12 +0000 (13:41 -0700)]
xfs: remove XFS_ILOCK_RT*

Now that we've centralized the realtime metadata locking routines, get
rid of the ILOCK subclasses since we now use explicit lockdep classes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: move RT bitmap and summary information to the rtgroup
Christoph Hellwig [Mon, 23 Sep 2024 20:41:11 +0000 (13:41 -0700)]
xfs: move RT bitmap and summary information to the rtgroup

Move the pointers to the RT bitmap and summary inodes as well as the
summary cache to the rtgroups structure to prepare for having a
separate bitmap and summary inodes for each rtgroup.

Code using the inodes now needs to operate on a rtgroup.  Where easily
possible such code is converted to iterate over all rtgroups, else
rtgroup 0 (the only one that can currently exist) is hardcoded.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: add rtgroup-based realtime scrubbing context management
Darrick J. Wong [Mon, 23 Sep 2024 20:41:10 +0000 (13:41 -0700)]
xfs: add rtgroup-based realtime scrubbing context management

Create a pair of helpers to deal with setting up the necessary incore
context to check metadata records against the realtime metadata.  Right
now this is limited to locking the realtime bitmap and summary inodes,
but as we add rmap and reflink to the realtime device this will grow to
include btree cursors.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: support caching rtgroup metadata inodes
Darrick J. Wong [Mon, 23 Sep 2024 20:41:09 +0000 (13:41 -0700)]
xfs: support caching rtgroup metadata inodes

Create the necessary per-rtgroup infrastructure that we need to load
metadata inodes into memory.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: add a lockdep class key for rtgroup inodes
Darrick J. Wong [Mon, 23 Sep 2024 20:41:08 +0000 (13:41 -0700)]
xfs: add a lockdep class key for rtgroup inodes

Add a dynamic lockdep class key for rtgroup inodes.  This will enable
lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking
order.  Each class can have 8 subclasses, and for now we will only have
2 inodes per group.  This enables rtgroup order and inode order checks
when nesting ILOCKs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: define locking primitives for realtime groups
Darrick J. Wong [Mon, 23 Sep 2024 20:41:08 +0000 (13:41 -0700)]
xfs: define locking primitives for realtime groups

Define helper functions to lock all metadata inodes related to a
realtime group.  There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: create incore realtime group structures
Darrick J. Wong [Mon, 23 Sep 2024 20:41:07 +0000 (13:41 -0700)]
xfs: create incore realtime group structures

Create an incore object that will contain information about a realtime
allocation group.  This will eventually enable us to shard the realtime
section in a similar manner to how we shard the data section, but for
now just a single object for the entire RT subvolume is created.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: clean up xfs_getfsmap_helper arguments
Christoph Hellwig [Mon, 23 Sep 2024 20:41:06 +0000 (13:41 -0700)]
xfs: clean up xfs_getfsmap_helper arguments

The calling conventions for xfs_getfsmap_helper are confusing -- callers
pass in an rmap record, but they must also supply startblock and
blockcount in daddr units.  This was bolted onto the original fsmap
implementation so that we could report *something* for realtime
volumes, which do not support rmap and hence can draw only from the rt
free space bitmap.  Free space on the rt volume can be more than 2^32
fsblocks long, which means that we can't use the rmap startblock or
blockcount fields.

This is confusing for callers, because they must supply redundant
data, but not all of it is used.  Streamline this by creating a separate
fsmap irec structure that contains exactly the data we need, once.

Note that we actually do need rm_startblock for rmap key comparisons
when we're actually querying an rmap btree, so leave that field but
document why it's there.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: repair metadata directory file path connectivity
Darrick J. Wong [Mon, 23 Sep 2024 20:41:05 +0000 (13:41 -0700)]
xfs: repair metadata directory file path connectivity

Fix disconnected or incorrect metadata directory paths.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: confirm dotdot target before replacing it during a repair
Darrick J. Wong [Mon, 23 Sep 2024 20:41:04 +0000 (13:41 -0700)]
xfs: confirm dotdot target before replacing it during a repair

xfs_dir_replace trips an assertion if you tell it to change a dirent to
point to an inumber that it already points at.  Look up the dotdot entry
directly to confirm that we need to make a change.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
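
The pattern in the dotdot commit above is "look up first, replace only on mismatch".  A minimal sketch follows, using a hypothetical in-memory struct demo_dir in place of a real directory; demo_dir_replace_dotdot models only the assertion behaviour the commit message describes and is not xfs_dir_replace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory directory: only the dotdot entry is modelled. */
struct demo_dir {
	uint64_t dotdot;	/* inode number that ".." points at */
	unsigned int replaced;	/* number of replace calls made */
};

/* Models a replace routine that asserts if the target is unchanged. */
static void demo_dir_replace_dotdot(struct demo_dir *dp, uint64_t ino)
{
	assert(dp->dotdot != ino);
	dp->dotdot = ino;
	dp->replaced++;
}

/*
 * Repair-side helper: confirm the current dotdot target and only call the
 * replace routine if it actually differs.  Returns true if a change was made.
 */
static bool demo_reset_dotdot(struct demo_dir *dp, uint64_t parent_ino)
{
	if (dp->dotdot == parent_ino)
		return false;	/* already correct, nothing to replace */

	demo_dir_replace_dotdot(dp, parent_ino);
	return true;
}
```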
9 months agoxfs: check metadata directory file path connectivity
Darrick J. Wong [Mon, 23 Sep 2024 20:41:04 +0000 (13:41 -0700)]
xfs: check metadata directory file path connectivity

Create a new scrubber type that checks that well known metadata
directory paths are connected to the metadata inode that the incore
structures think is in use.  IOWs, check that "/quota/user" in the
metadata directory tree actually points to
mp->m_quotainfo->qi_uquotaip->i_ino.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
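
The connectivity check described in the commit above amounts to walking a well-known path from the metadata directory root and comparing the result with the inode number held in memory.  The sketch below does that over a hypothetical flat table of directory entries; struct demo_dirent, demo_lookup and demo_metapath_ok are illustrative names, and 0 is assumed to mean "no such inode".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical directory entry: (containing directory, name) -> inode. */
struct demo_dirent {
	uint64_t dir;		/* inode number of the containing directory */
	const char *name;	/* entry name */
	uint64_t ino;		/* inode number the entry points to */
};

/* Find 'name' (of length len) in directory 'dir'; 0 means not found. */
static uint64_t demo_lookup(const struct demo_dirent *tab, size_t nr,
			    uint64_t dir, const char *name, size_t len)
{
	for (size_t i = 0; i < nr; i++) {
		if (tab[i].dir == dir && strlen(tab[i].name) == len &&
		    memcmp(tab[i].name, name, len) == 0)
			return tab[i].ino;
	}
	return 0;
}

/*
 * Walk an absolute path such as "/quota/user" from the metadata directory
 * root and check that it ends at the inode the incore state expects.
 */
static bool demo_metapath_ok(const struct demo_dirent *tab, size_t nr,
			     uint64_t metadir_root, const char *path,
			     uint64_t incore_ino)
{
	uint64_t cur = metadir_root;

	assert(path[0] == '/');

	while (*path == '/') {
		const char *name = path + 1;
		size_t len = strcspn(name, "/");

		if (len == 0)
			return false;	/* empty path component */
		cur = demo_lookup(tab, nr, cur, name, len);
		if (cur == 0)
			return false;	/* path is disconnected */
		path = name + len;
	}
	return cur == incore_ino;
}
```

A mismatch at the final comparison corresponds to the case the scrubber is meant to flag: the path resolves, but not to the inode the filesystem is actually using.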
9 months agoxfs: move repair temporary files to the metadata directory tree
Darrick J. Wong [Mon, 23 Sep 2024 20:41:03 +0000 (13:41 -0700)]
xfs: move repair temporary files to the metadata directory tree

Due to resource acquisition rules, we have to create the ondisk
temporary files used to stage a filesystem repair before we can acquire
a reference to the inode that we actually want to repair.  Therefore,
we do not know at tempfile creation time whether the tempfile will
belong to the regular directory tree or the metadata directory tree.

This distinction becomes important when the swapext code tries to figure
out the quota accounting of the two files whose mappings are being
swapped.  The swapext code assumes that accounting updates are required
for a file if dqattach attaches dquots.  Metadir files are never
accounted in quota, which means that swapext must not update the quota
accounting when swapping in a repaired directory/xattr/rtbitmap structure.

Prior to the swapext call, therefore, both files must be marked as
METADIR for dqattach so that dqattach will ignore them.  Add support for
a repair tempfile to be switched to the metadir tree and switched back
before being released so that ifree will just free the file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: check the metadata directory inumber in superblocks
Darrick J. Wong [Mon, 23 Sep 2024 20:41:02 +0000 (13:41 -0700)]
xfs: check the metadata directory inumber in superblocks

When metadata directories are enabled, make sure that the secondary
superblocks point to the metadata directory.  This isn't strictly
required because the secondaries are only used to recover damaged
filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: scrub metadata directories
Darrick J. Wong [Mon, 23 Sep 2024 20:41:01 +0000 (13:41 -0700)]
xfs: scrub metadata directories

Teach online scrub about the metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: fix di_metatype field of inodes that won't load
Darrick J. Wong [Mon, 23 Sep 2024 20:41:01 +0000 (13:41 -0700)]
xfs: fix di_metatype field of inodes that won't load

Make sure that the di_metatype field is at least set plausibly so that
later scrubbers could set the real type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: adjust parent pointer scrubber for sb-rooted metadata files
Darrick J. Wong [Mon, 23 Sep 2024 20:41:00 +0000 (13:41 -0700)]
xfs: adjust parent pointer scrubber for sb-rooted metadata files

Starting with the metadata directory feature, we're allowed to call the
directory and parent pointer scrubbers for every metadata file,
including the ones that are children of the superblock.

For these children, checking the link count against the number of parent
pointers is a bit funny -- there's no such thing as a parent pointer for
a child of the superblock since there's no corresponding dirent.  For
purposes of validating nlink, we pretend that there is a parent pointer.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: metadata files can have xattrs if metadir is enabled
Darrick J. Wong [Mon, 23 Sep 2024 20:40:59 +0000 (13:40 -0700)]
xfs: metadata files can have xattrs if metadir is enabled

If parent pointers are enabled, then metadata files will store parent
pointers in xattrs, just like files in the user visible directory tree.
Therefore, scrub and repair need to handle attr forks for metadata files
on metadir filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: don't fail repairs on metadata files with no attr fork
Darrick J. Wong [Mon, 23 Sep 2024 20:40:58 +0000 (13:40 -0700)]
xfs: don't fail repairs on metadata files with no attr fork

Fix a minor bug where we fail repairs on metadata files that do not have
attr forks because xrep_metadata_inode_subtype doesn't filter ENOENT.

Fixes: 5a8e07e799721 ("xfs: repair the inode core and forks of a metadata inode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: do not count metadata directory files when doing online quotacheck
Darrick J. Wong [Mon, 23 Sep 2024 20:40:58 +0000 (13:40 -0700)]
xfs: do not count metadata directory files when doing online quotacheck

Previously, we stated that files in the metadata directory tree are not
counted in the dquot information.  Fix the online quotacheck code to
reflect this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: refactor directory tree root predicates
Darrick J. Wong [Mon, 23 Sep 2024 20:40:57 +0000 (13:40 -0700)]
xfs: refactor directory tree root predicates

Metadata directory trees make reasoning about the parent of a file more
difficult.  Traditionally, user files are children of sb_rootino, and
metadata files are "children" of the superblock.  Now, we add a third
possibility -- some metadata files can be children of sb_metadirino, but
the classic ones (rt free space data and quotas) are left alone.

Let's add some helper functions (instead of open-coding the logic
everywhere) to make scrub logic easier to understand.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
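
The three parentage cases listed in the commit above can be captured by a pair of small predicates.  The sketch below is an assumption about their general shape, not the scrub helpers themselves: struct demo_sb carries only a few illustrative fields, 0 stands for "no such inode", and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of superblock fields; 0 means "not present". */
struct demo_sb {
	uint64_t rootino;	/* root of the user-visible directory tree */
	uint64_t metadirino;	/* root of the metadata directory tree */
	uint64_t rbmino;	/* classic rt bitmap inode */
	uint64_t rsumino;	/* classic rt summary inode */
	uint64_t uquotino;	/* classic user quota inode */
};

/* Is this inode the root of either directory tree? */
static bool demo_is_dirtree_root(const struct demo_sb *sb, uint64_t ino)
{
	assert(ino != 0);
	return ino == sb->rootino || ino == sb->metadirino;
}

/* Is this a classic metadata file that hangs directly off the superblock? */
static bool demo_is_sb_rooted(const struct demo_sb *sb, uint64_t ino)
{
	assert(ino != 0);
	return ino == sb->rbmino || ino == sb->rsumino ||
	       ino == sb->uquotino;
}
```

Scrub code that would otherwise compare against sb_rootino in several places can then ask one question per case instead.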
9 months agoxfs: record health problems with the metadata directory
Darrick J. Wong [Mon, 23 Sep 2024 20:40:56 +0000 (13:40 -0700)]
xfs: record health problems with the metadata directory

Make a report to the health monitoring subsystem any time we encounter
something in the metadata directory tree that looks like corruption.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: adjust xfs_bmap_add_attrfork for metadir
Darrick J. Wong [Mon, 23 Sep 2024 20:40:55 +0000 (13:40 -0700)]
xfs: adjust xfs_bmap_add_attrfork for metadir

Online repair might use the xfs_bmap_add_attrfork to repair a file in
the metadata directory tree if (say) the metadata file lacks the correct
parent pointers.  In that case, it is not correct to check that the file
is dqattached -- metadata files must not have /any/ dquot attached at
all.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: mark quota inodes as metadata files
Darrick J. Wong [Mon, 23 Sep 2024 20:40:54 +0000 (13:40 -0700)]
xfs: mark quota inodes as metadata files

When we're creating quota files at mount time, make sure to mark them as
metadir inodes if appropriate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: don't count metadata directory files to quota
Darrick J. Wong [Mon, 23 Sep 2024 20:40:54 +0000 (13:40 -0700)]
xfs: don't count metadata directory files to quota

Files in the metadata directory tree are internal to the filesystem.
Don't count the inodes or the blocks they use in the root dquot because
users do not need to know about their resource usage.  This will also
quiet down complaints about dquot usage not matching du output.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: allow bulkstat to return metadata directories
Darrick J. Wong [Mon, 23 Sep 2024 20:40:53 +0000 (13:40 -0700)]
xfs: allow bulkstat to return metadata directories

Allow the V5 bulkstat ioctl to return information about metadata
directory files so that xfs_scrub can find and scrub them, since they
are otherwise ordinary directories.

(Metadata files of course require per-file scrub code and hence do not
need exposure.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: advertise metadata directory feature
Darrick J. Wong [Mon, 23 Sep 2024 20:40:52 +0000 (13:40 -0700)]
xfs: advertise metadata directory feature

Advertise the existence of the metadata directory feature; this will be
used by scrub to decide if it needs to scan the metadir too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: hide metadata inodes from everyone because they are special
Darrick J. Wong [Mon, 23 Sep 2024 20:40:51 +0000 (13:40 -0700)]
xfs: hide metadata inodes from everyone because they are special

Metadata inodes are private files and therefore cannot be exposed to
userspace.  This means no bulkstat, no open-by-handle, no linking them
into the directory tree, and no feeding them to LSMs.  As such, we mark
them S_PRIVATE, which stops all that.

While we're at it, put them in a separate lockdep class so that lockdep
won't get confused by "recursive" i_rwsem locking such as what happens when we
write to a rt file and need to allocate from the rt bitmap file.  The
static function that we use to do this will be exported in the rtgroups
patchset.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: disable the agi rotor for metadata inodes
Darrick J. Wong [Mon, 23 Sep 2024 20:40:51 +0000 (13:40 -0700)]
xfs: disable the agi rotor for metadata inodes

Ideally, we'd put all the metadata inodes in one place if we could, so
that the metadata all stay reasonably close together instead of
spreading out over the disk.  Furthermore, if the log is internal we'd
probably prefer to keep the metadata near the log.  Therefore, disable
AGI rotoring for metadata inode allocations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: read and write metadata inode directory tree
Darrick J. Wong [Mon, 23 Sep 2024 20:40:50 +0000 (13:40 -0700)]
xfs: read and write metadata inode directory tree

Plumb in the bits we need to load metadata inodes from a named entry in
a metadir directory, create (or hardlink) inodes into a metadir
directory, create metadir directories, and flag inodes as being metadata
files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: enforce metadata inode flag
Darrick J. Wong [Mon, 23 Sep 2024 20:40:49 +0000 (13:40 -0700)]
xfs: enforce metadata inode flag

Add checks for the metadata inode flag so that we don't ever leak
metadata inodes out to userspace, and we don't ever try to read a
regular inode as metadata.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>