www.infradead.org Git - users/hch/xfsprogs.git/log
15 months ago  xfs_scrub: check rtrmapbt metadata directory connections
Darrick J. Wong [Tue, 9 Jan 2024 17:40:18 +0000 (09:40 -0800)]
xfs_scrub: check rtrmapbt metadata directory connections

Run the rt rmap btree metapath scrubber during phase 5 to ensure that
it's still connected to the metadir tree after we've pruned any bad
links.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  libfrog: enable scrubbing of the realtime rmap
Darrick J. Wong [Tue, 9 Jan 2024 17:40:18 +0000 (09:40 -0800)]
libfrog: enable scrubbing of the realtime rmap

Add a new entry so that we can scrub the rtrmapbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs_io: support scrubbing rtgroup metadata paths
Darrick J. Wong [Tue, 9 Jan 2024 17:40:18 +0000 (09:40 -0800)]
xfs_io: support scrubbing rtgroup metadata paths

Support scrubbing the metadata directory path of an rtgroup metadata
file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs_db: make fsmap query the realtime reverse mapping tree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:17 +0000 (09:40 -0800)]
xfs_db: make fsmap query the realtime reverse mapping tree

Extend the 'fsmap' debugger command to support querying the realtime
rmap btree via a new -r argument.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs_db: copy the realtime rmap btree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:17 +0000 (09:40 -0800)]
xfs_db: copy the realtime rmap btree

Copy the realtime rmapbt when we're metadumping the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs_db: support rudimentary checks of the rtrmap btree
Darrick J. Wong [Tue, 9 Jan 2024 17:40:17 +0000 (09:40 -0800)]
xfs_db: support rudimentary checks of the rtrmap btree

Perform some fairly superficial checks of the rtrmap btree.  We'll
do more sophisticated checks in xfs_repair, but provide enough of
a spot-check here that we can do simple things.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs_db: support the realtime rmapbt
Darrick J. Wong [Tue, 9 Jan 2024 17:40:17 +0000 (09:40 -0800)]
xfs_db: support the realtime rmapbt

Wire up various parts of xfs_db for realtime rmap support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs_db: display the realtime rmap btree contents
Darrick J. Wong [Tue, 9 Jan 2024 17:40:16 +0000 (09:40 -0800)]
xfs_db: display the realtime rmap btree contents

Implement all the code we need to dump rtrmapbt contents, starting
from the root inode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: hook live realtime rmap operations during a repair operation
Darrick J. Wong [Tue, 9 Jan 2024 17:44:08 +0000 (09:44 -0800)]
xfs: hook live realtime rmap operations during a repair operation

Hook the regular realtime rmap code when an rtrmapbt repair operation is
running so that we can unlock the AGF buffer to scan the filesystem and
keep the in-memory btree up to date during the scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: create a shadow rmap btree during realtime rmap repair
Darrick J. Wong [Tue, 9 Jan 2024 17:44:07 +0000 (09:44 -0800)]
xfs: create a shadow rmap btree during realtime rmap repair

Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: online repair of the realtime rmap btree
Darrick J. Wong [Tue, 9 Jan 2024 17:44:07 +0000 (09:44 -0800)]
xfs: online repair of the realtime rmap btree

Repair the realtime rmap btree while mounted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: scrub the metadir path of rt rmap btree files
Darrick J. Wong [Tue, 9 Jan 2024 17:40:16 +0000 (09:40 -0800)]
xfs: scrub the metadir path of rt rmap btree files

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the rmap btree file for each rt group.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: scrub the realtime rmapbt
Darrick J. Wong [Tue, 9 Jan 2024 17:40:15 +0000 (09:40 -0800)]
xfs: scrub the realtime rmapbt

Check the realtime reverse mapping btree against the rtbitmap, and
modify the rtbitmap scrub to check against the rtrmapbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: allow queued realtime intents to drain before scrubbing
Darrick J. Wong [Tue, 9 Jan 2024 17:43:58 +0000 (09:43 -0800)]
xfs: allow queued realtime intents to drain before scrubbing

When a writer thread executes a chain of log intent items for the
realtime volume, the ILOCKs taken during each step are for each rt
metadata file, not the entire rt volume itself.  Although scrub takes
all rt metadata ILOCKs, this isn't sufficient to guard against scrub
checking the rt volume while that writer thread is in the middle of
finishing a chain because there's no higher level locking primitive
guarding the realtime volume.

When there's a collision, cross-referencing between data structures
(e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if
repair is running, this results in incorrect repairs, which is
catastrophic.

Fix this by adding to the mount structure the same drain that we use to
protect scrub against concurrent AG updates, but this time for the
realtime volume.

[Contains a few cleanups from hch]

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
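The drain described in the commit above can be pictured as a counter of in-flight intent items that scrub consults before cross-referencing. The following is only a minimal single-threaded sketch: `struct intent_drain` and the `drain_*` helpers are hypothetical names, not the kernel's, and a real implementation would need atomic counters and a way for scrub to sleep until the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-volume intent drain (not the kernel struct). */
struct intent_drain {
	unsigned int	pending;	/* intents queued but not yet finished */
};

/* A writer calls this when it queues a log intent item for the volume. */
static void drain_hold(struct intent_drain *dr)
{
	dr->pending++;
}

/* A writer calls this once the intent item has been finished. */
static void drain_rele(struct intent_drain *dr)
{
	assert(dr->pending > 0);
	dr->pending--;
}

/* Scrub should only cross-reference metadata while no chain is in flight. */
static bool drain_is_idle(const struct intent_drain *dr)
{
	return dr->pending == 0;
}
```

In this model, scrub would take its ILOCKs, check `drain_is_idle()`, and back off and retry whenever a writer is still in the middle of a chain.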
15 months ago  xfs: report realtime rmap btree corruption errors to the health system
Darrick J. Wong [Tue, 9 Jan 2024 17:43:56 +0000 (09:43 -0800)]
xfs: report realtime rmap btree corruption errors to the health system

Whenever we encounter corrupt realtime rmap btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: create routine to allocate and initialize a realtime rmap btree inode
Darrick J. Wong [Tue, 9 Jan 2024 17:40:15 +0000 (09:40 -0800)]
xfs: create routine to allocate and initialize a realtime rmap btree inode

Create a library routine to allocate and initialize an empty realtime
rmapbt inode.  We'll use this for mkfs and repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: wire up rmap map and unmap to the realtime rmapbt
Darrick J. Wong [Tue, 9 Jan 2024 17:40:15 +0000 (09:40 -0800)]
xfs: wire up rmap map and unmap to the realtime rmapbt

Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks.  This enables us to
perform rmap operations against the correct btree.

[Contains a minor bugfix from hch]

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: use realtime EFI to free extents when realtime rmap is enabled
Darrick J. Wong [Tue, 9 Jan 2024 17:40:14 +0000 (09:40 -0800)]
xfs: use realtime EFI to free extents when realtime rmap is enabled

When rmap is enabled, XFS expects a certain order of operations, which
is: 1) remove the file mapping, 2) remove the reverse mapping, and then
3) free the blocks.  xfs_bmap_del_extent_real tries to do 1 and 3 in the
same transaction, which means that when rtrmap is enabled, we have to
use realtime EFIs to maintain the expected order.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
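The ordering argument in the commit above can be illustrated with a toy model. This is a sketch under loud assumptions: `enum step`, `struct trace`, and `remove_extent()` are hypothetical and only record the order in which the three steps happen; they do not reflect how the kernel's deferred-operation machinery is actually structured.

```c
#include <assert.h>
#include <stdbool.h>

/* The three steps named in the commit message, in the order XFS expects. */
enum step {
	STEP_UNMAP_FILE = 1,	/* 1) remove the file mapping */
	STEP_UNMAP_RMAP = 2,	/* 2) remove the reverse mapping */
	STEP_FREE_BLOCKS = 3,	/* 3) free the blocks */
};

/* Hypothetical trace of the steps performed for one extent removal. */
struct trace {
	enum step	steps[3];
	int		nr;
};

static void trace_add(struct trace *t, enum step s)
{
	assert(t->nr < 3);
	t->steps[t->nr++] = s;
}

/*
 * Toy model: freeing directly happens in the first transaction, before the
 * deferred rmap update runs; queuing the free as an intent item pushes it
 * behind the rmap update.
 */
static void remove_extent(struct trace *t, bool free_via_intent)
{
	trace_add(t, STEP_UNMAP_FILE);
	if (!free_via_intent)
		trace_add(t, STEP_FREE_BLOCKS);
	trace_add(t, STEP_UNMAP_RMAP);
	if (free_via_intent)
		trace_add(t, STEP_FREE_BLOCKS);
}

/* True if the recorded steps ran in the expected 1, 2, 3 order. */
static bool trace_in_order(const struct trace *t)
{
	for (int i = 1; i < t->nr; i++)
		if (t->steps[i] < t->steps[i - 1])
			return false;
	return true;
}
```

Calling `remove_extent()` with `free_via_intent` set to false produces the order 1, 3, 2, which is the situation the commit avoids by routing the free through a realtime EFI.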
15 months ago  xfs: allow inodes with zero extents but nonzero nblocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:14 +0000 (09:40 -0800)]
xfs: allow inodes with zero extents but nonzero nblocks

Metadata inodes that store btrees will have zero extents and a nonzero
nblocks.  Adjust the inode verifier so that this combination is not
flagged.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: wire up a new inode fork type for the realtime rmap
Darrick J. Wong [Tue, 9 Jan 2024 17:40:14 +0000 (09:40 -0800)]
xfs: wire up a new inode fork type for the realtime rmap

Plumb in the pieces we need to embed the root of the realtime rmap
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: add metadata reservations for realtime rmap btrees
Darrick J. Wong [Tue, 9 Jan 2024 17:40:14 +0000 (09:40 -0800)]
xfs: add metadata reservations for realtime rmap btrees

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime rmap btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: add realtime reverse map inode to metadata directory
Darrick J. Wong [Tue, 9 Jan 2024 17:43:48 +0000 (09:43 -0800)]
xfs: add realtime reverse map inode to metadata directory

Add a metadir path to select the realtime rmap btree inode and load
it at mount time.  The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: add a lockdep class key per rtgroup
Darrick J. Wong [Fri, 19 Jan 2024 00:41:48 +0000 (16:41 -0800)]
xfs: add a lockdep class key per rtgroup

Add a dynamic lockdep class key to each rtgroup.  This will enable
lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking
order.  Each class can have 8 subclasses, and for now we will only have
2 inodes per group.  This enables rtgroup order and inode order checks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: add a realtime flag to the rmap update log redo items
Darrick J. Wong [Tue, 9 Jan 2024 17:40:13 +0000 (09:40 -0800)]
xfs: add a realtime flag to the rmap update log redo items

Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt.  We'll
wire up the actual rmap code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: prepare rmap functions to deal with rtrmapbt
Darrick J. Wong [Tue, 9 Jan 2024 17:43:45 +0000 (09:43 -0800)]
xfs: prepare rmap functions to deal with rtrmapbt

Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions.  Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: add realtime rmap btree operations
Darrick J. Wong [Tue, 9 Jan 2024 17:43:44 +0000 (09:43 -0800)]
xfs: add realtime rmap btree operations

Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are neither
constrained to the free space nor any particular AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: realtime rmap btree transaction reservations
Darrick J. Wong [Tue, 9 Jan 2024 17:43:43 +0000 (09:43 -0800)]
xfs: realtime rmap btree transaction reservations

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: define the on-disk realtime rmap btree format
Darrick J. Wong [Tue, 9 Jan 2024 17:43:42 +0000 (09:43 -0800)]
xfs: define the on-disk realtime rmap btree format

Start filling out the rtrmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for connecting the
btree operations implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: introduce realtime rmap btree definitions
Darrick J. Wong [Tue, 9 Jan 2024 17:43:42 +0000 (09:43 -0800)]
xfs: introduce realtime rmap btree definitions

Add new realtime rmap btree definitions. The realtime rmap btree will
be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Darrick J. Wong [Tue, 9 Jan 2024 17:40:12 +0000 (09:40 -0800)]
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: move xfs_rmap_update_defer_add to xfs_rmap_item.c
Darrick J. Wong [Tue, 9 Jan 2024 17:40:12 +0000 (09:40 -0800)]
xfs: move xfs_rmap_update_defer_add to xfs_rmap_item.c

Move the code that adds the incore xfs_rmap_update_item deferred work
data to a transaction to live with the RUI log item code.  This means that
the rmap code no longer has to know about the inner workings of the RUI
log items.

As a consequence, we can get rid of the _{get,put}_group helpers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: simplify usage of the rcur local variable in xfs_rmap_finish_one
Christoph Hellwig [Tue, 9 Jan 2024 17:40:12 +0000 (09:40 -0800)]
xfs: simplify usage of the rcur local variable in xfs_rmap_finish_one

Only update rcur when we know the final *pcur value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[djwong: don't leave the caller with a dangling ref]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: don't bother calling xfs_rmap_finish_one_cleanup in xfs_rmap_finish_one
Christoph Hellwig [Tue, 9 Jan 2024 17:40:11 +0000 (09:40 -0800)]
xfs: don't bother calling xfs_rmap_finish_one_cleanup in xfs_rmap_finish_one

In xfs_rmap_finish_one we know the cursor is non-zero when calling
xfs_rmap_finish_one_cleanup and we pass a 0 error variable.  This means
xfs_rmap_finish_one_cleanup is just doing a xfs_btree_del_cursor.

Open code that and move xfs_rmap_finish_one_cleanup to
fs/xfs/xfs_rmap_item.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor porting changes]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: reuse xfs_rmap_update_cancel_item
Darrick J. Wong [Tue, 9 Jan 2024 17:40:11 +0000 (09:40 -0800)]
xfs: reuse xfs_rmap_update_cancel_item

Reuse xfs_rmap_update_cancel_item to put the AG/RTG and free the item in
a few places that currently open code the logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: add a ri_entry helper
Darrick J. Wong [Tue, 9 Jan 2024 17:40:11 +0000 (09:40 -0800)]
xfs: add a ri_entry helper

Add a helper to translate from the item list head to the
rmap_intent_item structure and use it to shorten assignments and avoid
the need for extra local variables.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
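A helper of the kind described in the commit above is typically a thin wrapper around the container-of idiom. The sketch below is self-contained and uses hypothetical names throughout (`struct demo_rmap_intent`, `demo_ri_entry`, and a simplified `demo_container_of`); the real structure layout and field names are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, in the style of the kernel's list_head. */
struct list_head {
	struct list_head	*next, *prev;
};

/* Simplified container-of: step back from a member to its enclosing struct. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical intent item; the fields are illustrative only. */
struct demo_rmap_intent {
	unsigned long long	startblock;
	struct list_head	list;
};

/* Translate a list node back into the intent item that embeds it. */
static inline struct demo_rmap_intent *
demo_ri_entry(const struct list_head *e)
{
	return demo_container_of(e, struct demo_rmap_intent, list);
}
```

With such a helper, a caller can write `demo_ri_entry(item)->startblock` directly instead of declaring a local variable just to hold the converted pointer.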
15 months ago  xfs: clean up rmap log intent item tracepoint callsites
Darrick J. Wong [Tue, 9 Jan 2024 17:43:35 +0000 (09:43 -0800)]
xfs: clean up rmap log intent item tracepoint callsites

Pass the incore rmap structure to the tracepoints instead of open-coding
the argument passing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: prepare rmap btree tracepoints for widening
Darrick J. Wong [Tue, 9 Jan 2024 17:40:11 +0000 (09:40 -0800)]
xfs: prepare rmap btree tracepoints for widening

Prepare the rmap btree tracepoints for use with realtime rmap btrees by
making them take the btree cursor object as a parameter.  This will save
us a lot of trouble later on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: give rmap btree cursor error tracepoints their own class
Darrick J. Wong [Tue, 9 Jan 2024 17:40:10 +0000 (09:40 -0800)]
xfs: give rmap btree cursor error tracepoints their own class

Create a new tracepoint class for btree-related errors, then convert all
the rmap tracepoints to use it.  Also fix the one tracepoint that was
abusing the old class by making it a separate tracepoint.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: attach rtgroup objects to btree cursors
Darrick J. Wong [Tue, 9 Jan 2024 17:43:32 +0000 (09:43 -0800)]
xfs: attach rtgroup objects to btree cursors

Make it so that we can attach realtime group objects to btree cursors.
This will be crucial for enabling rmap btrees in realtime groups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs_logprint: report realtime EFIs
Darrick J. Wong [Tue, 9 Jan 2024 17:40:10 +0000 (09:40 -0800)]
xfs_logprint: report realtime EFIs

Decode the EFI format just enough to report if an EFI targets the
realtime device or not.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: support error injection when freeing rt extents
Darrick J. Wong [Tue, 9 Jan 2024 17:40:10 +0000 (09:40 -0800)]
xfs: support error injection when freeing rt extents

A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent.  Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: support logging EFIs for realtime extents
Darrick J. Wong [Tue, 9 Jan 2024 17:43:31 +0000 (09:43 -0800)]
xfs: support logging EFIs for realtime extents

Teach the EFI mechanism how to free realtime extents.  We're going to
need this to enforce proper ordering of operations when we enable
realtime rmap.

Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer
ops for rt extents.  This keeps the ondisk artifacts and processing code
completely separate between the rt and non-rt cases.  Hopefully this
will make it easier to debug filesystem problems.

Previous versions of this patch accomplished this by setting the high
bit in each rt EFI extent.  This was found to be less transparent by
reviewers.

[Contains a bug fix and cleanups from hch]

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: move xfs_extent_free_defer_add to xfs_extfree_item.c
Darrick J. Wong [Tue, 9 Jan 2024 17:40:09 +0000 (09:40 -0800)]
xfs: move xfs_extent_free_defer_add to xfs_extfree_item.c

Move the code that adds the incore xfs_extent_free_item deferred work
data to a transaction to live with the EFI log item code.  This means that
the allocator code no longer has to know about the inner workings of the
EFI log items.

As a consequence, we can get rid of the _{get,put}_group helpers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: remove xfs_defer_agfl_block
Christoph Hellwig [Tue, 9 Jan 2024 17:40:09 +0000 (09:40 -0800)]
xfs: remove xfs_defer_agfl_block

xfs_free_extent_later can handle the extra AGFL special casing with
very little extra logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: remove duplicate asserts in xfs_defer_extent_free
Christoph Hellwig [Tue, 9 Jan 2024 17:40:09 +0000 (09:40 -0800)]
xfs: remove duplicate asserts in xfs_defer_extent_free

The bno/len verification is already done by the calls to
xfs_verify_rtbext / xfs_verify_fsbext, and reporting a corruption error
seems like the better handling than tripping an assert anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: reuse xfs_extent_free_cancel_item
Darrick J. Wong [Tue, 9 Jan 2024 17:40:09 +0000 (09:40 -0800)]
xfs: reuse xfs_extent_free_cancel_item

Reuse xfs_extent_free_cancel_item to put the AG/RTG and free the item in
a few places that currently open code the logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: add a xefi_entry helper
Darrick J. Wong [Tue, 9 Jan 2024 17:40:08 +0000 (09:40 -0800)]
xfs: add a xefi_entry helper

Add a helper to translate from the item list head to the
xfs_extent_free_item structure and use it to shorten assignments and
avoid the need for extra local variables.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: pass the fsbno to xfs_perag_intent_get
Christoph Hellwig [Tue, 9 Jan 2024 17:40:08 +0000 (09:40 -0800)]
xfs: pass the fsbno to xfs_perag_intent_get

All callers of xfs_perag_intent_get have a fsbno and need boilerplate
code to turn that into an agno.  Just pass the fsbno to
xfs_perag_intent_get and look up the agno there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
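The refactoring in the commit above moves a conversion from every caller into the callee. As a rough sketch, assuming the usual XFS encoding in which the AG number occupies the bits of a filesystem block number above `agblklog`, the new calling convention might look like the following. All `demo_*` names are hypothetical, and the hold counter merely stands in for whatever reference the real function takes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mount geometry: only what the conversion needs. */
struct demo_mount {
	unsigned int	agblklog;	/* log2 of AG size in blocks, rounded up */
	uint32_t	agcount;	/* number of allocation groups */
};

/* Hypothetical per-AG structure with a simple intent hold counter. */
struct demo_perag {
	uint32_t	agno;
	unsigned int	intent_holds;
};

/* The AG number lives in the bits of the fsbno above agblklog. */
static uint32_t demo_fsb_to_agno(const struct demo_mount *mp, uint64_t fsbno)
{
	return (uint32_t)(fsbno >> mp->agblklog);
}

/*
 * New-style lookup: the caller hands over the fsbno it already has and the
 * conversion to an AG number happens here, once, instead of at every caller.
 */
static struct demo_perag *
demo_intent_get(const struct demo_mount *mp, struct demo_perag *pags,
		uint64_t fsbno)
{
	uint32_t agno = demo_fsb_to_agno(mp, fsbno);

	if (agno >= mp->agcount)
		return NULL;
	pags[agno].intent_holds++;
	return &pags[agno];
}
```

The benefit is purely one of boilerplate: callers no longer repeat the shift before every lookup.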
15 months ago  xfs: convert "skip_discard" to a proper flags bitset
Darrick J. Wong [Tue, 9 Jan 2024 17:40:08 +0000 (09:40 -0800)]
xfs: convert "skip_discard" to a proper flags bitset

Convert the boolean to skip discard on free into a proper flags field so
that we can add more flags in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
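The conversion described in the commit above follows a common C pattern: a lone boolean parameter becomes an `unsigned int` of named bits so that further options can be added without changing the signature again. A minimal sketch, with every flag and function name invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; only the first mirrors the old boolean. */
#define DEMO_FREE_SKIP_DISCARD	(1U << 0)	/* do not discard the freed extent */
#define DEMO_FREE_FUTURE_FLAG	(1U << 1)	/* placeholder for a later addition */

#define DEMO_FREE_ALL_FLAGS	(DEMO_FREE_SKIP_DISCARD | DEMO_FREE_FUTURE_FLAG)

/* Reject callers that pass bits this code does not understand. */
static bool demo_free_flags_valid(unsigned int flags)
{
	return (flags & ~DEMO_FREE_ALL_FLAGS) == 0;
}

/* Replacement for testing the old "skip_discard" boolean. */
static bool demo_should_discard(unsigned int flags)
{
	return !(flags & DEMO_FREE_SKIP_DISCARD);
}
```

A caller that used to pass `true` would now pass `DEMO_FREE_SKIP_DISCARD`, and one that passed `false` would pass `0`.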
15 months ago  xfs: clean up extent free log intent item tracepoint callsites
Darrick J. Wong [Tue, 9 Jan 2024 17:40:08 +0000 (09:40 -0800)]
xfs: clean up extent free log intent item tracepoint callsites

Pass the incore EFI structure to the tracepoints instead of open-coding
the argument passing.  This cleans up the call sites a bit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: allow inode-based btrees to reserve space in the data device
Darrick J. Wong [Tue, 9 Jan 2024 17:40:08 +0000 (09:40 -0800)]
xfs: allow inode-based btrees to reserve space in the data device

Create a new space reservation scheme so that btree metadata for the
realtime volume can reserve space in the data device to avoid space
underruns.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: simplify xfs_ag_resv_free signature
Darrick J. Wong [Tue, 9 Jan 2024 17:40:07 +0000 (09:40 -0800)]
xfs: simplify xfs_ag_resv_free signature

It's not possible to fail at increasing fdblocks, so get rid of all the
error returns here.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: update btree keys correctly when _insrec splits an inode root block
Darrick J. Wong [Tue, 9 Jan 2024 17:40:07 +0000 (09:40 -0800)]
xfs: update btree keys correctly when _insrec splits an inode root block

In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.

Fixes: 2c813ad66a72 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: support storing records in the inode core root
Darrick J. Wong [Tue, 9 Jan 2024 17:43:21 +0000 (09:43 -0800)]
xfs: support storing records in the inode core root

Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree.  This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: hoist the node iroot update code out of xfs_btree_kill_iroot
Darrick J. Wong [Tue, 9 Jan 2024 17:40:07 +0000 (09:40 -0800)]
xfs: hoist the node iroot update code out of xfs_btree_kill_iroot

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function.  Remove some unnecessary conditionals and clean
up a few function calls in the new function.  Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: hoist the node iroot update code out of xfs_btree_new_iroot
Darrick J. Wong [Tue, 9 Jan 2024 17:40:07 +0000 (09:40 -0800)]
xfs: hoist the node iroot update code out of xfs_btree_new_iroot

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function.  Note that the new function explicitly computes
the keys of the new child block and stores that in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: support leaves in the incore btree root block in xfs_iroot_realloc
Darrick J. Wong [Tue, 9 Jan 2024 17:40:06 +0000 (09:40 -0800)]
xfs: support leaves in the incore btree root block in xfs_iroot_realloc

Add some logic to xfs_iroot_realloc so that we can handle leaf records
in the btree root block correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: generalize the btree root reallocation function
Darrick J. Wong [Tue, 9 Jan 2024 17:43:18 +0000 (09:43 -0800)]
xfs: generalize the btree root reallocation function

In preparation for storing realtime rmap btree roots in an inode fork,
make xfs_iroot_realloc take an ops structure that takes care of all the
btree-specific geometry pieces.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: standardize the btree maxrecs function parameters
Darrick J. Wong [Tue, 9 Jan 2024 17:43:17 +0000 (09:43 -0800)]
xfs: standardize the btree maxrecs function parameters

Standardize the parameters in xfs_{alloc,bm,ino,rmap,refcount}bt_maxrecs
so that we have consistent calling conventions.  This doesn't affect the
kernel that much, but enables us to clean up userspace a bit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: rearrange xfs_iroot_realloc a bit
Darrick J. Wong [Tue, 9 Jan 2024 17:40:06 +0000 (09:40 -0800)]
xfs: rearrange xfs_iroot_realloc a bit

Rearrange the innards of xfs_iroot_realloc so that we can reduce
duplicated code prior to genericizing the function.  No functional
changes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: move the zero records logic into xfs_bmap_broot_space_calc
Darrick J. Wong [Tue, 9 Jan 2024 17:40:05 +0000 (09:40 -0800)]
xfs: move the zero records logic into xfs_bmap_broot_space_calc

The bmap btree cannot ever have zero records in an incore btree block.
If the number of records drops to zero, that means we're converting the
fork to extents format and are trying to remove the tree.  This logic
won't hold for the future realtime rmap btree, so move the logic into
the bmbt code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: hoist the code that moves the incore inode fork broot memory
Darrick J. Wong [Tue, 9 Jan 2024 17:43:14 +0000 (09:43 -0800)]
xfs: hoist the code that moves the incore inode fork broot memory

Whenever we change the size of the memory buffer holding an inode fork
btree root block, we have to copy the contents over.  Refactor all this
into a single function that handles both, in preparation for making
xfs_iroot_realloc more generic.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
Darrick J. Wong [Tue, 9 Jan 2024 17:40:05 +0000 (09:40 -0800)]
xfs: fix a sloppy memory handling bug in xfs_iroot_realloc

While refactoring code, I noticed that when xfs_iroot_realloc tries to
shrink a bmbt root block, it allocates a smaller new block and then
copies "records" and pointers to the new block.  However, bmbt root
blocks cannot ever be leaves, which means that it's not technically
correct to copy records.  We /should/ be copying keys.

Note that this has never resulted in actual memory corruption because
sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)).  However,
this will no longer be true when we start adding realtime rmap stuff,
so fix this now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
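The size coincidence mentioned in the commit above can be checked at compile time. The sketch below uses stand-in types: the `demo_bmbt_*` definitions assume a 16-byte record, an 8-byte key, and an 8-byte pointer, which is consistent with the equality stated in the commit, while the `demo_wide_*` types are purely hypothetical and exist only to show a layout where the equality fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the bmbt layout: sizes chosen to match the stated equality. */
struct demo_bmbt_rec { uint64_t l0, l1; };		/* 16 bytes */
struct demo_bmbt_key { uint64_t br_startoff; };		/*  8 bytes */
typedef uint64_t demo_bmbt_ptr;				/*  8 bytes */

/* Copying "records" only worked because this happens to hold. */
_Static_assert(sizeof(struct demo_bmbt_rec) ==
	       sizeof(struct demo_bmbt_key) + sizeof(demo_bmbt_ptr),
	       "bmbt record size equals key size plus pointer size");

/* Hypothetical wider btree (not the real rtrmapbt layout). */
struct demo_wide_rec { uint32_t start, count; uint64_t owner, offset; };
struct demo_wide_key { uint64_t owner, offset; uint32_t start; };
typedef uint64_t demo_wide_ptr;

/* Here a record is not the same size as a key plus a pointer. */
_Static_assert(sizeof(struct demo_wide_rec) !=
	       sizeof(struct demo_wide_key) + sizeof(demo_wide_ptr),
	       "record-sized copies would move the wrong number of bytes");

/* Bytes moved when n entries are (incorrectly) treated as records. */
static size_t demo_copy_as_recs(unsigned int n)
{
	return n * sizeof(struct demo_bmbt_rec);
}

/* Bytes moved when n entries are (correctly) treated as key/pointer pairs. */
static size_t demo_copy_as_keyptrs(unsigned int n)
{
	return n * (sizeof(struct demo_bmbt_key) + sizeof(demo_bmbt_ptr));
}
```

For the bmbt stand-ins the two byte counts agree, which is why the old code never corrupted memory; for any btree whose record and key sizes differ, only the key/pointer computation is correct for a node block.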
15 months ago  xfs: refactor creation of bmap btree roots
Darrick J. Wong [Tue, 9 Jan 2024 17:40:05 +0000 (09:40 -0800)]
xfs: refactor creation of bmap btree roots

Now that we've created inode fork helpers to allocate and free btree
roots, create a new bmap btree helper to create a new bmbt root, and
refactor the extents <-> btree conversion functions to use our new
helpers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: refactor the allocation and freeing of incore inode fork btree roots
Darrick J. Wong [Tue, 9 Jan 2024 17:43:12 +0000 (09:43 -0800)]
xfs: refactor the allocation and freeing of incore inode fork btree roots

Refactor the code that allocates and frees the incore inode fork btree
roots.  This will help us disentangle some of the weird logic when we're
creating and tearing down inode-based btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  xfs: replace shouty XFS_BM{BT,DR} macros
Darrick J. Wong [Tue, 9 Jan 2024 17:43:11 +0000 (09:43 -0800)]
xfs: replace shouty XFS_BM{BT,DR} macros

Replace all the shouty bmap btree and bmap disk root macros with actual
functions, and fix a type handling error in the xattr code that the
macros previously didn't care about.

sed \
 -e 's/XFS_BMBT_BLOCK_LEN/xfs_bmbt_block_len/g' \
 -e 's/XFS_BMBT_REC_ADDR/xfs_bmbt_rec_addr/g' \
 -e 's/XFS_BMBT_KEY_ADDR/xfs_bmbt_key_addr/g' \
 -e 's/XFS_BMBT_PTR_ADDR/xfs_bmbt_ptr_addr/g' \
 -e 's/XFS_BMDR_REC_ADDR/xfs_bmdr_rec_addr/g' \
 -e 's/XFS_BMDR_KEY_ADDR/xfs_bmdr_key_addr/g' \
 -e 's/XFS_BMDR_PTR_ADDR/xfs_bmdr_ptr_addr/g' \
 -e 's/XFS_BMAP_BROOT_PTR_ADDR/xfs_bmap_broot_ptr_addr/g' \
 -e 's/XFS_BMAP_BROOT_SPACE_CALC/xfs_bmap_broot_space_calc/g' \
 -e 's/XFS_BMAP_BROOT_SPACE/xfs_bmap_broot_space/g' \
 -e 's/XFS_BMDR_SPACE_CALC/xfs_bmdr_space_calc/g' \
 -e 's/XFS_BMAP_BMDR_SPACE/xfs_bmap_bmdr_space/g' \
 -i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch])

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
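The motivation for replacing macros with functions in the commit above is type checking: a macro that casts its argument accepts any pointer, whereas an inline function makes the compiler verify the argument type. The sketch below illustrates the difference with hypothetical `demo_*` types and a simplified 1-based record address calculation; it is not the real bmbt block layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte block header followed by 16-byte records. */
struct demo_block {
	uint32_t	magic;
	uint16_t	level;
	uint16_t	numrecs;
};

struct demo_rec {
	uint64_t	l0, l1;
};

/* Macro style: the cast hides the argument type, so any pointer compiles. */
#define DEMO_REC_ADDR(block, index) \
	((struct demo_rec *)((char *)(block) + sizeof(struct demo_block) + \
			     ((index) - 1) * sizeof(struct demo_rec)))

/* Function style: passing anything but a struct demo_block * is diagnosed. */
static inline struct demo_rec *
demo_rec_addr(struct demo_block *block, unsigned int index)
{
	return (struct demo_rec *)((char *)block + sizeof(*block) +
				   (index - 1) * sizeof(struct demo_rec));
}
```

Both forms compute the same address for a correct argument; the function form simply refuses to compile quietly when handed the wrong kind of pointer, which is how a conversion like this can expose a latent type handling error.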
15 months ago  mkfs: format realtime groups
Darrick J. Wong [Tue, 9 Jan 2024 17:40:04 +0000 (09:40 -0800)]
mkfs: format realtime groups

Create filesystems with the realtime group feature enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months ago  mkfs: add headers to realtime summary blocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:04 +0000 (09:40 -0800)]
mkfs: add headers to realtime summary blocks

When the rtgroups feature is enabled, format rtsummary blocks with the
appropriate block headers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agomkfs: add headers to realtime bitmap blocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:04 +0000 (09:40 -0800)]
mkfs: add headers to realtime bitmap blocks

When the rtgroups feature is enabled, format rtbitmap blocks with the
appropriate block headers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_scrub: use histograms to speed up phase 8 on the realtime volume
Darrick J. Wong [Tue, 9 Jan 2024 17:40:03 +0000 (09:40 -0800)]
xfs_scrub: use histograms to speed up phase 8 on the realtime volume

Use the same statistical methods that we use on the data volume to
compute the minimum threshold size for fstrims on the realtime volume.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
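The commit above does not spell out the statistical method, so the following is only a sketch of one plausible approach: bucket free extents by power-of-two length, then pick the smallest length at which trimming covers a target percentage of free space. The names `freesp_hist`, `hist_add`, and `hist_min_trim_len`, and the percentage parameter, are assumptions made for this example and are not taken from xfs_scrub.

```c
#include <stdint.h>

#define NR_BUCKETS 64

/* Bucket b sums the lengths of extents with 2^b <= length < 2^(b+1). */
struct freesp_hist {
    uint64_t blocks[NR_BUCKETS];
    uint64_t total;
};

/* floor(log2(len)) for len >= 1 */
static unsigned int bucket_of(uint64_t len)
{
    unsigned int b = 0;

    while (len >>= 1)
        b++;
    return b;
}

/* Record one free extent of the given length (in blocks). */
static void hist_add(struct freesp_hist *h, uint64_t len)
{
    if (len == 0)
        return;
    h->blocks[bucket_of(len)] += len;
    h->total += len;
}

/*
 * Walk from the largest bucket down and return the lower bound of the
 * first bucket at which the accumulated space reaches pct percent of all
 * free space.  Returns 0 if there is no free space at all.
 */
static uint64_t hist_min_trim_len(const struct freesp_hist *h, unsigned int pct)
{
    uint64_t covered = 0;
    int b;

    if (h->total == 0)
        return 0;
    for (b = NR_BUCKETS - 1; b >= 0; b--) {
        covered += h->blocks[b];
        if ((double)covered * 100.0 >= (double)h->total * pct)
            return (uint64_t)1 << b;
    }
    return 1;
}
```

The idea is that a few large extents usually hold most of the free space, so skipping the many tiny extents below the threshold saves time while still trimming nearly everything.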
15 months agoxfs_scrub: trim realtime volumes too
Darrick J. Wong [Tue, 9 Jan 2024 17:40:03 +0000 (09:40 -0800)]
xfs_scrub: trim realtime volumes too

On the kernel side, the XFS realtime groups patchset added support for
FITRIM of the realtime volume.  This support doesn't actually require
there to be any realtime groups, so teach scrub to run through the whole
region.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_scrub: call GETFSMAP for each rt group in parallel
Darrick J. Wong [Tue, 9 Jan 2024 17:40:03 +0000 (09:40 -0800)]
xfs_scrub: call GETFSMAP for each rt group in parallel

If realtime groups are enabled, we should take advantage of the sharding
to speed up the spacemap scans.  Do so by issuing per-rtgroup GETFSMAP
calls.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
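To make the sharding idea in the commit above concrete, here is a small sketch that splits a realtime volume into one query range per group, so that each range could be handed to its own worker. It deliberately does not issue any GETFSMAP call, and it assumes a simple linear block numbering; `rg_range`, `rg_count`, and `rg_query_range` are hypothetical names for this example only.

```c
#include <stdint.h>

/* One contiguous query range, in blocks (hypothetical helper type). */
struct rg_range {
    uint64_t start;
    uint64_t len;
};

/* Number of groups needed to cover the volume; the last may be short. */
static uint32_t rg_count(uint64_t total_blocks, uint64_t blocks_per_group)
{
    if (blocks_per_group == 0)
        return 0;
    return (uint32_t)((total_blocks + blocks_per_group - 1) /
                      blocks_per_group);
}

/* Range covered by group rgno; len is 0 if rgno is past the end. */
static struct rg_range rg_query_range(uint64_t total_blocks,
                                      uint64_t blocks_per_group,
                                      uint32_t rgno)
{
    struct rg_range r = { 0, 0 };
    uint64_t start = (uint64_t)rgno * blocks_per_group;

    if (blocks_per_group == 0 || start >= total_blocks)
        return r;
    r.start = start;
    r.len = total_blocks - start < blocks_per_group ?
            total_blocks - start : blocks_per_group;
    return r;
}
```

Because the ranges do not overlap, workers scanning different groups never need to coordinate over the same region.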
15 months agoxfs_scrub: scrub realtime allocation group metadata
Darrick J. Wong [Tue, 9 Jan 2024 17:40:03 +0000 (09:40 -0800)]
xfs_scrub: scrub realtime allocation group metadata

Scan realtime group metadata as part of phase 2, just like we do for AG
metadata.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_spaceman: report on realtime group health
Darrick J. Wong [Tue, 9 Jan 2024 17:40:03 +0000 (09:40 -0800)]
xfs_spaceman: report on realtime group health

Add the realtime group status to the health reporting done by
xfs_spaceman.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_io: display rt group in verbose fsmap output
Darrick J. Wong [Tue, 9 Jan 2024 17:40:03 +0000 (09:40 -0800)]
xfs_io: display rt group in verbose fsmap output

Display the rt group number in the fsmap output, just like we do for
regular data files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_io: display rt group in verbose bmap output
Darrick J. Wong [Tue, 9 Jan 2024 17:40:02 +0000 (09:40 -0800)]
xfs_io: display rt group in verbose bmap output

Display the rt group number in the bmap -v output, just like we do for
regular data files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_io: add a command to display realtime group information
Darrick J. Wong [Tue, 9 Jan 2024 17:40:02 +0000 (09:40 -0800)]
xfs_io: add a command to display realtime group information

Add a new 'rginfo' command to xfs_io so that we can display realtime
group geometry.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_io: add a command to display allocation group information
Darrick J. Wong [Tue, 9 Jan 2024 17:40:02 +0000 (09:40 -0800)]
xfs_io: add a command to display allocation group information

Add a new 'aginfo' command to xfs_io so that we can display allocation
group geometry.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_io: support scrubbing rtgroup metadata
Darrick J. Wong [Tue, 9 Jan 2024 17:40:02 +0000 (09:40 -0800)]
xfs_io: support scrubbing rtgroup metadata

Support scrubbing all rtgroup metadata with a scrubv call.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_mdrestore: restore rt group superblocks to realtime device
Darrick J. Wong [Tue, 9 Jan 2024 17:40:02 +0000 (09:40 -0800)]
xfs_mdrestore: restore rt group superblocks to realtime device

Support restoring realtime device metadata to the realtime device, if
the dumped filesystem had one.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: dump rt summary blocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:01 +0000 (09:40 -0800)]
xfs_db: dump rt summary blocks

Now that rtsummary blocks have a header, make it so that xfs_db can
analyze the structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: dump rt bitmap blocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:01 +0000 (09:40 -0800)]
xfs_db: dump rt bitmap blocks

Now that rtbitmap blocks have a header, make it so that xfs_db can
analyze the structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: support dumping little-endian values
Darrick J. Wong [Tue, 26 Mar 2024 21:25:28 +0000 (14:25 -0700)]
xfs_db: support dumping little-endian values

Make it so that getbitval can handle little endian numbers.  This will
be used in subsequent patches for dumping rt bitmap words, and for
decoding fsverity descriptors.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
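For readers unfamiliar with the distinction in the commit above, the sketch below decodes the same bytes as a big-endian and as a little-endian number. It is a generic illustration, not the getbitval code; `decode_be` and `decode_le` are names made up for this example, and both assume `nbytes` is at most 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian: the first byte in memory is the most significant. */
static uint64_t decode_be(const uint8_t *p, size_t nbytes)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Little-endian: the last byte in memory is the most significant. */
static uint64_t decode_le(const uint8_t *p, size_t nbytes)
{
    uint64_t v = 0;
    size_t i;

    for (i = nbytes; i > 0; i--)
        v = (v << 8) | p[i - 1];
    return v;
}
```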
15 months agoxfs_db: hoist bit scraping function
Darrick J. Wong [Tue, 26 Mar 2024 20:23:03 +0000 (13:23 -0700)]
xfs_db: hoist bit scraping function

Hoist the bit scraping code into a separate helper function so that we
can create a _le version later, and also enable some testing functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
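A standalone "bit scraping" helper of the kind the commit above describes might look roughly like the sketch below, which pulls an arbitrary run of bits out of a buffer, most significant bit first. This is an assumption-laden illustration: `scrape_bits` and its signature are invented here and do not reproduce the xfs_db helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Extract nbits (at most 64) starting at bit offset bitoff, where bit 0
 * is the most significant bit of buf[0].
 */
static uint64_t scrape_bits(const uint8_t *buf, size_t bitoff,
                            unsigned int nbits)
{
    uint64_t v = 0;
    unsigned int i;

    for (i = 0; i < nbits; i++) {
        size_t bit = bitoff + i;
        unsigned int b = (buf[bit / 8] >> (7 - bit % 8)) & 1;

        v = (v << 1) | b;
    }
    return v;
}
```

Keeping the extraction in one small function makes it straightforward to test in isolation and to add a second variant with a different byte order.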
15 months agoxfs_db: metadump realtime devices
Darrick J. Wong [Tue, 9 Jan 2024 17:40:01 +0000 (09:40 -0800)]
xfs_db: metadump realtime devices

Teach the metadump code to dump the filesystem metadata of a realtime
device to the metadump file.  Currently, this is limited to the rt group
superblocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: report rtgroups via version command
Darrick J. Wong [Tue, 9 Jan 2024 17:40:01 +0000 (09:40 -0800)]
xfs_db: report rtgroups via version command

Report the rtgroups feature in the version command output.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: enable conversion of rt space units
Darrick J. Wong [Tue, 9 Jan 2024 17:40:00 +0000 (09:40 -0800)]
xfs_db: enable conversion of rt space units

Teach the xfs_db convert function about realtime extents, blocks, and
realtime group numbers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
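The unit relationships behind the conversions in the commit above can be sketched as follows. This assumes a simplified, purely linear numbering in which group N starts right after group N-1; the real on-disk encoding may differ. The `demo_rtgeom` structure and the three helpers are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Hypothetical geometry; field names are illustrative only. */
struct demo_rtgeom {
    uint32_t rextsize;   /* blocks per realtime extent */
    uint64_t rgextents;  /* realtime extents per realtime group */
};

/* Realtime block number -> realtime extent number (rounds down). */
static uint64_t rtb_to_rtx(const struct demo_rtgeom *g, uint64_t rtbno)
{
    return rtbno / g->rextsize;
}

/* Realtime extent number -> first block of that extent. */
static uint64_t rtx_to_rtb(const struct demo_rtgeom *g, uint64_t rtx)
{
    return rtx * g->rextsize;
}

/* Realtime block number -> realtime group number. */
static uint32_t rtb_to_rgno(const struct demo_rtgeom *g, uint64_t rtbno)
{
    return (uint32_t)(rtb_to_rtx(g, rtbno) / g->rgextents);
}
```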
15 months agoxfs_db: implement check for rt superblocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:00 +0000 (09:40 -0800)]
xfs_db: implement check for rt superblocks

Implement the bare minimum needed to avoid xfs_check regressions when
realtime groups are enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: listify the definition of dbm_t
Darrick J. Wong [Tue, 9 Jan 2024 17:40:00 +0000 (09:40 -0800)]
xfs_db: listify the definition of dbm_t

Convert this enum definition to a list so that code adding elements to
the enum does not have to reflow the whole thing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
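The effect of "listifying" an enum, as in the commit above, can be seen in this small sketch. The enumerator names are hypothetical and are not the actual members of dbm_t.

```c
/* Packed form: inserting a new name forces the lines to be re-wrapped. */
enum demo_packed {
    DEMO_P_UNKNOWN, DEMO_P_AGF, DEMO_P_AGFL, DEMO_P_AGI,
    DEMO_P_NTYPES
};

/* List form: one enumerator per line, so a new entry is a one-line diff. */
enum demo_listed {
    DEMO_UNKNOWN,
    DEMO_AGF,
    DEMO_AGFL,
    DEMO_AGI,
    DEMO_RTSB,      /* hypothetical new entry added later */
    DEMO_NTYPES,    /* count of types; must stay last */
};
```

The two forms assign identical values to the shared names; only the size of the diff needed for later additions changes.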
15 months agoxfs_db: support changing the label and uuid of rt superblocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:00 +0000 (09:40 -0800)]
xfs_db: support changing the label and uuid of rt superblocks

Update the label and uuid commands to change the rt superblocks along
with the filesystem superblocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: support dumping realtime superblocks
Darrick J. Wong [Tue, 9 Jan 2024 17:40:00 +0000 (09:40 -0800)]
xfs_db: support dumping realtime superblocks

Allow debugging of realtime superblocks, and add the relevant fields in
the fs superblock that point us at the existence and location of the rt
supers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_db: listify the definition of enum typnm
Darrick J. Wong [Tue, 9 Jan 2024 17:39:59 +0000 (09:39 -0800)]
xfs_db: listify the definition of enum typnm

Convert the enum definition into a list so that future patches adding
things to enum typnm don't have to reflow the entire thing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: support adding rtgroups to a filesystem
Darrick J. Wong [Tue, 9 Jan 2024 17:39:59 +0000 (09:39 -0800)]
xfs_repair: support adding rtgroups to a filesystem

Allow users to add the rtgroups feature to a filesystem if the
filesystem does not already have a realtime volume attached.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: repair rtsummary block headers
Darrick J. Wong [Tue, 9 Jan 2024 17:39:59 +0000 (09:39 -0800)]
xfs_repair: repair rtsummary block headers

Check and repair the new block headers attached to rtsummary blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: repair rtbitmap block headers
Darrick J. Wong [Tue, 9 Jan 2024 17:39:59 +0000 (09:39 -0800)]
xfs_repair: repair rtbitmap block headers

Check and repair the new block headers attached to rtbitmap blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs_repair: improve rtbitmap discrepancy reporting
Darrick J. Wong [Tue, 9 Jan 2024 17:39:59 +0000 (09:39 -0800)]
xfs_repair: improve rtbitmap discrepancy reporting

Improve the reporting of discrepancies in the realtime bitmap and
summary files by creating a separate helper function that will pinpoint
the exact (word) locations of mismatches.  This will help developers to
diagnose problems with the rtgroups feature and users to figure out
exactly what's bad in a filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
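A helper that pinpoints word-level mismatches, in the spirit of the commit above, could be sketched like this. It is not the xfs_repair function; `find_word_mismatches`, its parameters, and the choice of 32-bit words are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare two word arrays of equal length.  Store the indexes of the
 * first max_where mismatching words in where[] and return the total
 * number of mismatches, which may exceed max_where.
 */
static size_t find_word_mismatches(const uint32_t *expected,
                                   const uint32_t *actual,
                                   size_t nwords,
                                   size_t *where, size_t max_where)
{
    size_t i;
    size_t found = 0;

    for (i = 0; i < nwords; i++) {
        if (expected[i] == actual[i])
            continue;
        if (found < max_where)
            where[found] = i;
        found++;
    }
    return found;
}
```

Reporting the index of each bad word, rather than a single "bitmap differs" verdict, lets the reader map a discrepancy back to a specific range of the realtime volume.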
15 months agoxfs_repair: support realtime groups
Darrick J. Wong [Tue, 9 Jan 2024 17:39:58 +0000 (09:39 -0800)]
xfs_repair: support realtime groups

Support the realtime group feature.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agolibxfs: implement some sanity checking for enormous rgcount
Darrick J. Wong [Tue, 9 Jan 2024 17:39:58 +0000 (09:39 -0800)]
libxfs: implement some sanity checking for enormous rgcount

Similar to what we do for suspiciously large sb_agcount values, if
someone tries to get libxfs to load a filesystem with a very large
realtime group count, let's do some basic checks of the rt device to
see if it's really that large.  If the read fails, only load the first
rtgroup and warn the user.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
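The fallback logic described in the commit above can be sketched as below. Everything here is an assumption made for illustration: the `suspicious` threshold, probing the last block of the last claimed group, and the `probe_fn` callback are not taken from libxfs, and the user-facing warning is omitted.

```c
#include <stdint.h>

/* Hypothetical probe: returns 0 if block blkno can be read. */
typedef int (*probe_fn)(uint64_t blkno, void *priv);

/*
 * Decide how many realtime groups to load.  Counts below the threshold
 * are trusted.  For larger counts, probe the last block the claimed
 * geometry implies; if that fails, load only the first group.
 */
static uint32_t usable_rgcount(uint32_t claimed, uint32_t suspicious,
                               uint64_t rgblocks, probe_fn probe,
                               void *priv)
{
    uint64_t last;

    if (claimed < suspicious || rgblocks == 0)
        return claimed;
    last = (uint64_t)claimed * rgblocks - 1;
    if (probe(last, priv) == 0)
        return claimed;
    return 1;
}

/* Fake probe for demonstration: priv points at the device size in blocks. */
static int fake_probe(uint64_t blkno, void *priv)
{
    const uint64_t *dev_blocks = priv;

    return blkno < *dev_blocks ? 0 : -1;
}
```

A real caller would issue an actual read against the rt device instead of `fake_probe`, and would print a warning when it falls back to a single group.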
15 months agolibfrog: report rt groups in output
Darrick J. Wong [Tue, 9 Jan 2024 17:39:58 +0000 (09:39 -0800)]
libfrog: report rt groups in output

Report realtime group geometry.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
15 months agoxfs: scrub each rtgroup's portion of the rtbitmap separately
Darrick J. Wong [Tue, 9 Jan 2024 17:43:09 +0000 (09:43 -0800)]
xfs: scrub each rtgroup's portion of the rtbitmap separately

Create a new scrub type code so that userspace can scrub each rtgroup's
portion of the rtbitmap file separately.  This reduces the long tail
latency that results from scanning the entire bitmap all at once, and
prepares us for future patchsets, wherein we'll need to be able to lock
a specific rtgroup so that we can rebuild that rtgroup's part of the
rtbitmap contents from the rtgroup's rmap btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>