Darrick J. Wong [Mon, 23 Sep 2024 20:42:29 +0000 (13:42 -0700)]
xfs: add realtime refcount btree inode to metadata directory
Add a metadir path to select the realtime refcount btree inode and load
it at mount time. The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:26 +0000 (13:42 -0700)]
xfs: add a realtime flag to the refcount update log redo items
Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt. We'll
wire up the actual refcount code later.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:25 +0000 (13:42 -0700)]
xfs: prepare refcount functions to deal with rtrefcountbt
Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions. Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.
Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert it all at once.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:25 +0000 (13:42 -0700)]
xfs: add realtime refcount btree operations
Implement the generic btree operations needed to manipulate rtrefcount
btree blocks. This is different from the regular refcountbt in that we
allocate space from the filesystem at large, and are neither constrained
to the free space nor any particular AG.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents. We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:23 +0000 (13:42 -0700)]
xfs: define the on-disk realtime refcount btree format
Start filling out the rtrefcount btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate refcount btree blocks. This prepares the way for connecting
the btree operations implementation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Add new realtime refcount btree definitions. The realtime refcount btree
will be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:20 +0000 (13:42 -0700)]
xfs: prepare refcount btree cursor tracepoints for realtime
Rework the refcount btree cursor tracepoints in preparation to handle the
realtime refcount btree cursor. Mostly this involves renaming the field to
"refcbno" and extracting the group number from the cursor when possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:19 +0000 (13:42 -0700)]
xfs: hook live realtime rmap operations during a repair operation
Hook the regular realtime rmap code when an rtrmapbt repair operation is
running so that we can unlock the AGF buffer to scan the filesystem and
keep the in-memory btree up to date during the scan.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:16 +0000 (13:42 -0700)]
xfs: support repairing metadata btrees rooted in metadir inodes
Adapt the repair code so that we can stage a new btree in the data fork
area of a metadir inode and reap the old blocks. We already have nearly
all of the infrastructure; the only parts that were missing were the
metadata inode reservation handling.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:15 +0000 (13:42 -0700)]
xfs: repair rmap btree inodes
Teach the inode repair code how to deal with realtime rmap btree inodes
that won't load properly. This is most likely moot since the filesystem
generally won't mount without the rtrmapbt inodes being usable, but
we'll add this for completeness.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:13 +0000 (13:42 -0700)]
xfs: walk the rt reverse mapping tree when rebuilding rmap
When we're rebuilding the data device rmap, if we encounter an "rmap"
format fork, we have to walk the (realtime) rmap btree inode to build
the appropriate mappings.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:11 +0000 (13:42 -0700)]
xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings
Teach the bmbt scrubber how to perform a comprehensive check that the
rmapbt does not contain /any/ mappings that are not described by bmbt
records when it's dealing with a realtime file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:08 +0000 (13:42 -0700)]
xfs: allow queued realtime intents to drain before scrubbing
When a writer thread executes a chain of log intent items for the
realtime volume, the ILOCKs taken during each step are for each rt
metadata file, not the entire rt volume itself. Although scrub takes
all rt metadata ILOCKs, this isn't sufficient to guard against scrub
checking the rt volume while that writer thread is in the middle of
finishing a chain because there's no higher level locking primitive
guarding the realtime volume.
When there's a collision, cross-referencing between data structures
(e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if
repair is running, this results in incorrect repairs, which is
catastrophic.
Fix this by adding to the mount structure the same drain that we use to
protect scrub against concurrent AG updates, but this time for the
realtime volume.
[Contains a few cleanups from hch]
Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:07 +0000 (13:42 -0700)]
xfs: fix scrub tracepoints when inode-rooted btrees are involved
Fix a minor mistakes in the scrub tracepoints that can manifest when
inode-rooted btrees are enabled. The existing code worked fine for bmap
btrees, but we should tighten the code up to be less sloppy.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:05 +0000 (13:42 -0700)]
xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs
The size of filesystem transaction reservations depends on the maximum
height (maxlevels) of the realtime btrees. Since we don't want a grow
operation to increase the reservation size enough that we'll fail the
minimum log size checks on the next mount, constrain growfs operations
if they would cause an increase in those maxlevels.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:03 +0000 (13:42 -0700)]
xfs: wire up rmap map and unmap to the realtime rmapbt
Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks. This enables us to
perform rmap operations against the correct btree.
[Contains a minor bugfix from hch]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:01 +0000 (13:42 -0700)]
xfs: wire up a new inode fork type for the realtime rmap
Plumb in the pieces we need to embed the root of the realtime rmap
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:42:00 +0000 (13:42 -0700)]
xfs: add realtime reverse map inode to metadata directory
Add a metadir path to select the realtime rmap btree inode and load
it at mount time. The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:57 +0000 (13:41 -0700)]
xfs: add a realtime flag to the rmap update log redo items
Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt. We'll
wire up the actual rmap code later.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:57 +0000 (13:41 -0700)]
xfs: prepare rmap functions to deal with rtrmapbt
Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions. Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:56 +0000 (13:41 -0700)]
xfs: add realtime rmap btree operations
Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are neither
constrained to the free space nor any particular AG.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:55 +0000 (13:41 -0700)]
xfs: realtime rmap btree transaction reservations
Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents. We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:54 +0000 (13:41 -0700)]
xfs: define the on-disk realtime rmap btree format
Start filling out the rtrmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for connecting the
btree operations implementation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:53 +0000 (13:41 -0700)]
xfs: introduce realtime rmap btree definitions
Add new realtime rmap btree definitions. The realtime rmap btree will
be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:53 +0000 (13:41 -0700)]
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:52 +0000 (13:41 -0700)]
xfs: prepare rmap btree cursor tracepoints for realtime
Rework the rmap btree cursor tracepoints in preparation to handle the
realtime rmap btree cursor. Mostly this involves renaming the field to
"rmapbno" and extracting the group number from the cursor when possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:50 +0000 (13:41 -0700)]
xfs: update btree keys correctly when _insrec splits an inode root block
In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.
However, I missed a subtlety about the way inode-rooted btrees work. If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block. We don't
pass a pointer to the new block to the caller because that work has
already been done. The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.
This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.
Fixes: 2c813ad66a72 ("xfs: support btrees with overlapping intervals for keys") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:49 +0000 (13:41 -0700)]
xfs: support storing records in the inode core root
Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree. This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:49 +0000 (13:41 -0700)]
xfs: hoist the node iroot update code out of xfs_btree_kill_iroot
In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function. Remove some unnecessary conditionals and clean
up a few function calls in the new function. Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:48 +0000 (13:41 -0700)]
xfs: hoist the node iroot update code out of xfs_btree_new_iroot
In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function. Note that the new function explicitly computes
the keys of the new child block and stores that in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:47 +0000 (13:41 -0700)]
xfs: tidy up xfs_bmap_broot_realloc a bit
Hoist out the code that migrates broot pointers during a resize
operation to avoid code duplication and streamline the caller. Also
use the correct bmbt pointer type for the sizeof operation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:45 +0000 (13:41 -0700)]
xfs: make xfs_iroot_realloc take the new numrecs instead of deltas
Change the calling signature of xfs_iroot_realloc to take the ifork and
the new number of records in the btree block, not a diff against the
current number. This will make the callsites easier to understand.
Note that this function is misnamed because it is very specific to the
single type of inode-rooted btree supported. This will be addressed in
a subsequent patch.
Return the new btree root to reduce the amount of code clutter.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:45 +0000 (13:41 -0700)]
xfs: refactor the inode fork memory allocation functions
Hoist the code that allocates, frees, and reallocates if_broot into a
single xfs_iroot_krealloc function. Eventually we're going to push
xfs_iroot_realloc into the btree ops structure to handle multiple
inode-rooted btrees, but first let's separate out the bits that should
stay in xfs_inode_fork.c.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:43 +0000 (13:41 -0700)]
xfs: enable metadata directory feature
Enable the metadata directory feature. With this feature, all metadata
inodes are placed in the metadata directory, and the only inumbers in
the superblock are the roots of the two directory trees.
The RT device is now sharded into a number of rtgroups, where 0 rtgroups
mean that no RT extents are supported, and the traditional XFS stub RT
bitmap and summary inodes don't exist. A single rtgroup gives roughly
identical behavior to the traditional RT setup, but now with checksummed
and self identifying free space metadata.
For quota, the quota options are read from the superblock unless
explicitly overridden via mount options.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 23 Sep 2024 20:20:39 +0000 (13:20 -0700)]
xfs: report realtime block quota limits on realtime directories
On the data device, calling statvfs on a projinherit directory results
in the block and avail counts being curtailed to the project quota block
limits, if any are set. Do the same for realtime files or directories,
only use the project quota rt block limits.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:20:38 +0000 (13:20 -0700)]
xfs: advertise realtime quota support in the xqm stat files
Add a fifth column to this (really old) stat file to advertise that the
kernel supports quota for realtime volumes. This will be used by
fstests to detect kernel support.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:42 +0000 (13:41 -0700)]
xfs: persist quota flags with metadir
It's annoying that one has to keep reminding XFS about what quota
options it should mount with, since the quota flags recording the
previous state are sitting right there in the primary superblock. Even
more strangely, there exists a noquota option to disable quotas
completely, so it's odder still that providing no options is the same as
noquota.
Starting with metadir, let's change the behavior so that if the user
does not specify any quota-related mount options at all, the ondisk
quota flags will be used to bring up quota. In other words, the
filesystem will mount in the same state and with the same functionality
as it had during the last mount.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:40 +0000 (13:41 -0700)]
xfs: use metadir for quota inodes
Store the quota inodes in the /quota metadata directory if metadir is
enabled. This enables us to stop using the sb_[ugp]uotino fields in the
superblock. From this point on, all metadata files will be children of
the metadata directory tree root.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 4 Oct 2024 00:16:49 +0000 (17:16 -0700)]
xfs: use rtgroup busy extent list for FITRIM
For filesystems that have rtgroups and hence use the busy extent list
for freed rt space, use that busy extent list so that FITRIM can issue
discard commands asynchronously without worrying about other callers
accidentally allocating and using space that is being discarded.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 2 Oct 2024 15:52:07 +0000 (08:52 -0700)]
xfs: implement busy extent tracking for rtgroups
For rtgroups filesystems, track newly freed (rt) space through the log
until the rt EFIs have been committed to disk. This way we ensure that
space cannot be reused until all traces of the old owner are gone.
As a fringe benefit, we now support -o discard on the realtime device.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 3 Oct 2024 23:09:52 +0000 (16:09 -0700)]
xfs: port the perag discard code to handle generic groups
Port xfs_discard_extents and its tracepoints to handle generic groups
instead of just perags. This is needed to enable busy extent tracking
for rtgroups.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 8 Oct 2024 23:41:12 +0000 (16:41 -0700)]
xfs: move the min and max group block numbers to xfs_group
Move the min and max agblock numbers to the generic xfs_group structure
so that we can start building validators for extents within an rtgroup.
While we're at it, use check_add_overflow for the extent length
computation because that has much better overflow checking.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 9 Oct 2024 00:43:56 +0000 (17:43 -0700)]
xfs: fix minor bug in xfs_verify_agbno
There's a minor bug in xfs_verify_agbno -- min_block ought to be the
first agblock number in the AG that can be used by non-static metadata.
Unfortunately, we set it to the last agblock of the static metadata.
Fortunately this works due to the <= check, but this isn't technically
correct.
Instead, change the check to < and set it to the next agblock past the
static metadata. This hasn't been an issue up to now, but we're going
to move these things into the generic group struct, and this will cause
problems with rtgroups, where min_block can be zero for an rtgroup.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Thu, 3 Oct 2024 20:12:08 +0000 (13:12 -0700)]
xfs: move the group geometry into struct xfs_groups
Add/move the blocks, blklog and blkmask fields to the generic groups
structure so that code can work with AGs and RTGs by just using the
right index into the array.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 30 Sep 2024 20:49:00 +0000 (13:49 -0700)]
xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t
Now that we've finished adding allocation groups to the realtime volume,
let's make the file block mapping address (xfs_rtblock_t) a segmented
value just like we do on the data device. This means that group number
and block number conversions can be done with shifting and masking
instead of integer division.
While in theory we could continue caching the rgno shift value in
m_rgblklog, the fact that we now always use the shift value means that
we have an opportunity to increase the redundancy of the rt geometry by
storing it in the ondisk superblock and adding more sb verifier code.
Reuse the space vacated by sb_bad_feature2 to store the rgblklog value.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 30 Sep 2024 20:47:22 +0000 (13:47 -0700)]
xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries
We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file mapping block
lengths because the rtb helpers soon will not do the right thing there.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 30 Sep 2024 20:43:12 +0000 (13:43 -0700)]
xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries
We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file block offsets
because the rtb helpers soon will not do the right thing there.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:39 +0000 (13:41 -0700)]
xfs: mask off the rtbitmap and summary inodes when metadir in use
Set the rtbitmap and summary file inumbers to NULLFSINO in the
superblock and make sure they're zeroed whenever we write the superblock
to disk, to mimic mkfs behavior.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:35 +0000 (13:41 -0700)]
xfs: don't coalesce file mappings that cross rtgroup boundaries in scrub
The bmbt scrubber will combine file mappings if they are mergeable to
reduce the number of cross-referencing checks. However, we shouldn't
combine mappings that cross rt group boundaries because that will cause
verifiers to trip incorrectly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 23 Sep 2024 20:41:35 +0000 (13:41 -0700)]
xfs: make the RT allocator rtgroup aware
Make the allocator rtgroup aware by either picking a specific group if
there is a hint, or loop over all groups otherwise. A simple rotor is
provided to pick the placement for initial allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 23 Sep 2024 20:41:34 +0000 (13:41 -0700)]
xfs: don't merge ioends across RTGs
Unlike AGs, RTGs don't always have metadata in their first blocks, and
thus we don't get automatic protection from merging I/O completions
across RTG boundaries. Add code to set the IOMAP_F_BOUNDARY flag for
ioends that start at the first block of a RTG so that they never get
merged into the previous ioend.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:33 +0000 (13:41 -0700)]
xfs: use realtime EFI to free extents when rtgroups are enabled
When rmap is enabled, XFS expects a certain order of operations, which
is: 1) remove the file mapping, 2) remove the reverse mapping, and then
3) free the blocks. When reflink is enabled, XFS replaces (3) with a
deferred refcount decrement operation that can schedule freeing the
blocks if that was the last refcount.
For realtime files, xfs_bmap_del_extent_real tries to do 1 and 3 in the
same transaction, which will break both rmap and reflink unless we
switch it to use realtime EFIs. Both rmap and reflink depend on the
rtgroups feature, so let's turn on EFIs for all rtgroups filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:32 +0000 (13:41 -0700)]
xfs: support error injection when freeing rt extents
A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent. Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:31 +0000 (13:41 -0700)]
xfs: support logging EFIs for realtime extents
Teach the EFI mechanism how to free realtime extents. We're going to
need this to enforce proper ordering of operations when we enable
realtime rmap.
Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer
ops for rt extents. This keeps the ondisk artifacts and processing code
completely separate between the rt and non-rt cases. Hopefully this
will make it easier to debug filesystem problems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:31 +0000 (13:41 -0700)]
xfs: force swapext to a realtime file to use the file content exchange ioctl
xfs_swap_extent_rmap does not use log items to track the overall
progress of an attempt to swap the extent mappings between two files.
If the system crashes in the middle of swapping a partially written
realtime extent, the mapping will be left in an inconsistent state
wherein a file can point to multiple extents on the rt volume.
The new file range exchange functionality handles this correctly, so all
callers must upgrade to that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 4 Oct 2024 17:09:22 +0000 (10:09 -0700)]
xfs: store rtgroup information with a bmap intent
Make the bmap intent items take an active reference to the rtgroup
containing the space that is being mapped or unmapped. We will need
this functionality once we start enabling rmap and reflink on the rt
volume. Technically speaking we need it even for !rtgroups filesystems
to prevent the (dummy) rtgroup 0 from going away, even though this will
never happen.
As a bonus, we can rework the xfs_bmap_deferred_class tracepoint to use
the xfs_group object to figure out the type and group number, widen the
group block number field to fit 64-bit quantities, and get rid of the
now redundant opdev and rtblock fields.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:28 +0000 (13:41 -0700)]
xfs: encode the rtsummary in big endian format
Currently, the ondisk realtime summary file counters are accessed in
units of 32-bit words. There's no endian translation of the contents of
this file, which means that the Bad Things Happen(tm) if you go from
(say) x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file. Encode the summary
information in big endian format, like most of the rest of the
filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 23 Sep 2024 20:41:27 +0000 (13:41 -0700)]
xfs: encode the rtbitmap in big endian format
Currently, the ondisk realtime bitmap file is accessed in units of
32-bit words. There's no endian translation of the contents of this
file, which means that the Bad Things Happen(tm) if you go from (say)
x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>