Darrick J. Wong [Tue, 7 Mar 2023 03:55:54 +0000 (19:55 -0800)]
xfs: use realtime EFI to free extents when realtime rmap is enabled
When rmap is enabled, XFS expects a certain order of operations, which
is: 1) remove the file mapping, 2) remove the reverse mapping, and then
3) free the blocks. xfs_bmap_del_extent_real tries to do 1 and 3 in the
same transaction, which means that when rtrmap is enabled, we have to
use realtime EFIs to maintain the expected order.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:53 +0000 (19:55 -0800)]
xfs: wire up a new inode fork type for the realtime rmap
Plumb in the pieces we need to embed the root of the realtime rmap
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:52 +0000 (19:55 -0800)]
xfs: add realtime reverse map inode to metadata directory
Add a metadir path to select the realtime rmap btree inode and load
it at mount time. The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:50 +0000 (19:55 -0800)]
xfs: add a realtime flag to the rmap update log redo items
Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt. We'll
wire up the actual rmap code later.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:49 +0000 (19:55 -0800)]
xfs: prepare rmap functions to deal with rtrmapbt
Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions. Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:48 +0000 (19:55 -0800)]
xfs: add realtime rmap btree operations
Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are neither
constrained to the free space nor any particular AG.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:48 +0000 (19:55 -0800)]
xfs: realtime rmap btree transaction reservations
Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents. We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:47 +0000 (19:55 -0800)]
xfs: define the on-disk realtime rmap btree format
Start filling out the rtrmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for connecting the
btree operations implementation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:47 +0000 (19:55 -0800)]
xfs: introduce realtime rmap btree definitions
Add new realtime rmap btree definitions. The realtime rmap btree will
be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:46 +0000 (19:55 -0800)]
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 16 Nov 2023 20:54:31 +0000 (12:54 -0800)]
xfs: move xfs_rmap_update_defer_add to xfs_rmap_item.c
Move the code that adds the incore xfs_rmap_update_item deferred work
data to a transaction live with the RUI log item code. This means that
the rmap code no longer has to know about the inner workings of the RUI
log items.
As a consequence, we can get rid of the _{get,put}_group helpers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Thu, 2 Nov 2023 12:08:08 +0000 (13:08 +0100)]
xfs: simplify usage of the rcur local variable in xfs_rmap_finish_one
Only update rcur when we know the final *pcur value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[djwong: don't leave the caller with a dangling ref] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Thu, 2 Nov 2023 12:02:14 +0000 (13:02 +0100)]
xfs: don't bother calling xfs_rmap_finish_one_cleanup in xfs_rmap_finish_one
In xfs_rmap_finish_one we known the cursor is non-zero when calling
xfs_rmap_finish_one_cleanup and we pass a 0 error variable. This means
xfs_rmap_finish_one_cleanup is just doing a xfs_btree_del_cursor.
Open code that and move xfs_rmap_finish_one_cleanup to
fs/xfs/xfs_rmap_item.c.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor porting changes] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 17 Nov 2023 17:33:04 +0000 (09:33 -0800)]
xfs: add a ri_entry helper
Add a helper to translate from the item list head to the
rmap_intent_item structure and use it so shorten assignments and avoid
the need for extra local variables.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:44 +0000 (19:55 -0800)]
xfs: prepare rmap btree tracepoints for widening
Prepare the rmap btree tracepoints for use with realtime rmap btrees by
making them take the btree cursor object as a parameter. This will save
us a lot of trouble later on.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:43 +0000 (19:55 -0800)]
xfs: give rmap btree cursor error tracepoints their own class
Create a new tracepoint class for btree-related errors, then convert all
the rmap tracepoints to use it. Also fix the one tracepoint that was
abusing the old class by making it a separate tracepoint.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:42 +0000 (19:55 -0800)]
xfs: support error injection when freeing rt extents
A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent. Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:42 +0000 (19:55 -0800)]
xfs: support logging EFIs for realtime extents
Teach the EFI mechanism how to free realtime extents. We're going to
need this to enforce proper ordering of operations when we enable
realtime rmap.
Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer
ops for rt extents. This keeps the ondisk artifacts and processing code
completely separate between the rt and non-rt cases. Hopefully this
will make it easier to debug filesystem problems.
Previous versions of this patch accomplished this by setting the high
bit in each rt EFI extent. This was found to be less transparent by
reviewers.
[Contains a bug fix and cleanups from hch]
Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 16 Nov 2023 19:48:37 +0000 (11:48 -0800)]
xfs: move xfs_extent_free_defer_add to xfs_extfree_item.c
Move the code that adds the incore xfs_extent_free_item deferred work
data to a transaction live with the EFI log item code. This means that
the allocator code no longer has to know about the inner workings of the
EFI log items.
As a consequence, we can get rid of the _{get,put}_group helpers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Thu, 16 Nov 2023 00:24:01 +0000 (16:24 -0800)]
xfs: remove duplicate asserts in xfs_defer_extent_free
The bno/len verification is already done by the calls to
xfs_verify_rtbext / xfs_verify_fsbext, and reporting a corruption error
seem like the better handling than tripping an assert anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 17 Nov 2023 17:09:31 +0000 (09:09 -0800)]
xfs: add a xefi_entry helper
Add a helper to translate from the item list head to the
xfs_extent_free_item structure and use it so shorten assignments and
avoid the need for extra local variables.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Thu, 2 Nov 2023 06:16:07 +0000 (07:16 +0100)]
xfs: pass the fsbno to xfs_perag_intent_get
All callers of xfs_perag_intent_get have a fsbno and need boilerplate
code to turn that into an agno. Just pass the fsbno to
xfs_perag_intent_get and look up the agno there.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:38 +0000 (19:55 -0800)]
xfs: update btree keys correctly when _insrec splits an inode root block
In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.
However, I missed a subtlety about the way inode-rooted btrees work. If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block. We don't
pass a pointer to the new block to the caller because that work has
already been done. The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.
This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.
Fixes: 2c813ad66a72 ("xfs: support btrees with overlapping intervals for keys") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:38 +0000 (19:55 -0800)]
xfs: support storing records in the inode core root
Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree. This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:37 +0000 (19:55 -0800)]
xfs: hoist the node iroot update code out of xfs_btree_kill_iroot
In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function. Remove some unnecessary conditionals and clean
up a few function calls in the new function. Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:37 +0000 (19:55 -0800)]
xfs: hoist the node iroot update code out of xfs_btree_new_iroot
In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function. Note that the new function explicitly computes
the keys of the new child block and stores that in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:36 +0000 (19:55 -0800)]
xfs: generalize the btree root reallocation function
In preparation for storing realtime rmap btree roots in an inode fork,
make xfs_iroot_realloc take an ops structure that takes care of all the
btree-specific geometry pieces.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:35 +0000 (19:55 -0800)]
xfs: standardize the btree maxrecs function parameters
Standardize the parameters in xfs_{alloc,bm,ino,rmap,refcount}bt_maxrecs
so that we have consistent calling conventions. This doesn't affect the
kernel that much, but enables us to clean up userspace a bit.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:34 +0000 (19:55 -0800)]
xfs: move the zero records logic into xfs_bmap_broot_space_calc
The bmap btree cannot ever have zero records in an incore btree block.
If the number of records drops to zero, that means we're converting the
fork to extents format and are trying to remove the tree. This logic
won't hold for the future realtime rmap btree, so move the logic into
the bmbt code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:33 +0000 (19:55 -0800)]
xfs: hoist the code that moves the incore inode fork broot memory
Whenever we change the size of the memory buffer holding an inode fork
btree root block, we have to copy the contents over. Refactor all this
into a single function that handles both, in preparation for making
xfs_iroot_realloc more generic.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:33 +0000 (19:55 -0800)]
xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
While refactoring code, I noticed that when xfs_iroot_realloc tries to
shrink a bmbt root block, it allocates a smaller new block and then
copies "records" and pointers to the new block. However, bmbt root
blocks cannot ever be leaves, which means that it's not technically
correct to copy records. We /should/ be copying keys.
Note that this has never resulted in actual memory corruption because
sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)). However,
this will no longer be true when we start adding realtime rmap stuff,
so fix this now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:32 +0000 (19:55 -0800)]
xfs: refactor creation of bmap btree roots
Now that we've created inode fork helpers to allocate and free btree
roots, create a new bmap btree helper to create a new bmbt root, and
refactor the extents <-> btree conversion functions to use our new
helpers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:32 +0000 (19:55 -0800)]
xfs: refactor the allocation and freeing of incore inode fork btree roots
Refactor the code that allocates and freese the incore inode fork btree
roots. This will help us disentangle some of the weird logic when we're
creating and tearing down inode-based btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:31 +0000 (19:55 -0800)]
xfs: replace shouty XFS_BM{BT,DR} macros
Replace all the shouty bmap btree and bmap disk root macros with actual
functions, and fix a type handling error in the xattr code that the
macros previously didn't care about.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 3 Jan 2023 19:12:55 +0000 (11:12 -0800)]
xfs_scrub: trim realtime volumes too
On the kernel side, the XFS realtime groups patchset added support for
FITRIM of the realtime volume. This support doesn't actually require
there to be any realtime groups, so teach scrub to run through the whole
region.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 25 Aug 2022 21:43:51 +0000 (14:43 -0700)]
xfs_db: metadump realtime devices
Teach the metadump device to dump the filesystem metadata of a realtime
device to the metadump file. Currently, this is limited to the rt group
superblocks.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Sat, 13 Aug 2022 00:41:32 +0000 (17:41 -0700)]
xfs_db: support dumping realtime superblocks
Allow debugging of realtime superblocks, and add the relevant fields in
the fs superblock that point us at the existence and location of the rt
supers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Improve the reporting of discrepancies in the realtime bitmap and
summary files by creating a separate helper function that will pinpoint
the exact (word) locations of mismatches. This will help developers to
diagnose problems with the rtgroups feature and users to figure out
exactly what's bad in a filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 9 Nov 2023 23:34:10 +0000 (15:34 -0800)]
libxfs: implement some sanity checking for enormous rgcount
Similar to what we do for suspiciously large sb_agcount values, if
someone tries to get libxfs to load a filesystem with a very large
realtime group count, let's do some basic checks of the rt device to
see if it's really that large. If the read fails, only load the first
rtgroup and warn the user.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:29 +0000 (19:55 -0800)]
xfs: scrub each rtgroup's portion of the rtbitmap separately
Create a new scrub type code so that userspace can scrub each rtgroup's
portion of the rtbitmap file separately. This reduces the long tail
latency that results from scanning the entire bitmap all at once, and
prepares us for future patchsets, wherein we'll need to be able to lock
a specific rtgroup so that we can rebuild that rtgroup's part of the
rtbitmap contents from the rtgroup's rmap btree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 9 Nov 2023 17:35:51 +0000 (09:35 -0800)]
xfs: use an incore rtgroup rotor for rtpick
During the 6.7 merge window, Linus noticed that the realtime allocator
was doing some sketchy things trying to encode a u64 sequence counter
into the rtbitmap file's atime. The sketchy casting of a struct pointer
to a u64 pointer has subtly broken several times over the past decade as
the codebase has transitioned to using the VFS i_atime field and that
field has changed in size and layout over time.
Since the goal of the rtpick code is to _suggest_ a starting place for
new rt file allocations, the repeated breakage has not resulted in
inconsistent metadata. IOWs, it's a hint.
For rtgroups, we don't need this complex code to cut the rtextents space
into fractions. Add an rtgroup rotor and use that for rtpick, similar
to AG rotoring on the data device. The new rotor does not persist,
which reduces the logging overhead slightly.
Darrick J. Wong [Tue, 7 Mar 2023 03:55:28 +0000 (19:55 -0800)]
xfs: store rtgroup information with a bmap intent
Make the bmap intent items take an active reference to the rtgroup
containing the space that is being mapped or unmapped. We will need
this functionality once we start enabling rmap and reflink on the rt
volume.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:27 +0000 (19:55 -0800)]
xfs: encode the rtsummary in big endian format
Currently, the ondisk realtime summary file counters are accessed in
units of 32-bit words. There's no endian translation of the contents of
this file, which means that the Bad Things Happen(tm) if you go from
(say) x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file. Encode the summary
information in big endian format, like most of the rest of the
filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:26 +0000 (19:55 -0800)]
xfs: encode the rtbitmap in little endian format
Currently, the ondisk realtime bitmap file is accessed in units of
32-bit words. There's no endian translation of the contents of this
file, which means that the Bad Things Happen(tm) if you go from (say)
x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.
The natural format of a bitmap is (IMHO) little endian, because the byte
offsets of the bitmap data should always increase in step with the
information being indexed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:24 +0000 (19:55 -0800)]
xfs: define locking primitives for realtime groups
Define helper functions to lock all metadata inodes related to a
realtime group. There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:23 +0000 (19:55 -0800)]
xfs: add frextents to the lazysbcounters when rtgroups enabled
Make the free rt extent count a part of the lazy sb counters when the
realtime groups feature is enabled. This is possible because the patch
to recompute frextents from the rtbitmap during log recovery predates
the code adding rtgroup support, hence we know that the value will
always be correct during runtime.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:23 +0000 (19:55 -0800)]
xfs: check that rtblock extents do not overlap with the rt group metadata
The ondisk format specifies that the start of each realtime group must
have a superblock so that rt space mappings never cross an rtgroup
boundary. Check that rt block pointers obey this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:20 +0000 (19:55 -0800)]
xfs: write secondary realtime superblocks to disk
Create some library functions to make it easy to update all the
secondary realtime superblocks on disk; this will be used by growfs,
xfs_db, mkfs, and xfs_repair.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:20 +0000 (19:55 -0800)]
xfs: update primary realtime super every time we update the primary fs super
Every time we update parts of the primary filesystem superblock that are
echoed in the primary rt super, we should update that primary realtime
super. Avoid an ondisk log format change by using ordered buffers to
write the primary rt super.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 7 Mar 2023 03:55:18 +0000 (19:55 -0800)]
xfs: create incore realtime group structures
Create an incore object that will contain information about a realtime
allocation group. This will eventually enable us to shard the realtime
section in a similar manner to how we shard the data section.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>