Darrick J. Wong [Tue, 9 Jan 2024 17:40:05 +0000 (09:40 -0800)]
xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
While refactoring code, I noticed that when xfs_iroot_realloc tries to
shrink a bmbt root block, it allocates a smaller new block and then
copies "records" and pointers to the new block. However, bmbt root
blocks cannot ever be leaves, which means that it's not technically
correct to copy records. We /should/ be copying keys.
Note that this has never resulted in actual memory corruption because
sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)). However,
this will no longer be true when we start adding realtime rmap stuff,
so fix this now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
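The size coincidence described above can be modelled in a few lines of C.
This is only a sketch: the demo_* types are stand-ins invented for
illustration (the real ondisk structures use big-endian fields), and the
payload model assumes a node root holds nr keys followed by nr pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types; the real ondisk structures use big-endian fields. */
struct demo_bmbt_rec { uint64_t l0, l1; };       /* leaf record, 16 bytes */
struct demo_bmbt_key { uint64_t br_startoff; };  /* node key, 8 bytes */
typedef uint64_t demo_bmbt_ptr;                  /* node pointer, 8 bytes */

/* The size coincidence that kept the old copy inside the new root block. */
static_assert(sizeof(struct demo_bmbt_rec) ==
	      sizeof(struct demo_bmbt_key) + sizeof(demo_bmbt_ptr),
	      "bmbt record size equals key size plus pointer size");

/* Bytes available after the header of a node root holding nr entries. */
static size_t demo_node_payload_bytes(size_t nr, size_t key_sz, size_t ptr_sz)
{
	return nr * (key_sz + ptr_sz);
}

/* Would copying nr "records" run past the end of a node root's payload? */
static int demo_rec_copy_overruns(size_t nr, size_t rec_sz, size_t key_sz,
				  size_t ptr_sz)
{
	return nr * rec_sz > demo_node_payload_bytes(nr, key_sz, ptr_sz);
}
```

With the bmbt sizes the check reports no overrun. With a hypothetical
24-byte record and the same 8-byte key and pointer, it reports one, which
is the situation the fix guards against.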
Darrick J. Wong [Tue, 9 Jan 2024 17:40:05 +0000 (09:40 -0800)]
xfs: refactor creation of bmap btree roots
Now that we've created inode fork helpers to allocate and free btree
roots, create a new bmap btree helper to create a new bmbt root, and
refactor the extents <-> btree conversion functions to use our new
helpers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:43:12 +0000 (09:43 -0800)]
xfs: refactor the allocation and freeing of incore inode fork btree roots
Refactor the code that allocates and frees the incore inode fork btree
roots. This will help us disentangle some of the weird logic when we're
creating and tearing down inode-based btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:43:11 +0000 (09:43 -0800)]
xfs: replace shouty XFS_BM{BT,DR} macros
Replace all the shouty bmap btree and bmap disk root macros with actual
functions, and fix a type handling error in the xattr code that the
macros previously didn't care about.
Darrick J. Wong [Tue, 9 Jan 2024 17:40:03 +0000 (09:40 -0800)]
xfs_scrub: trim realtime volumes too
On the kernel side, the XFS realtime groups patchset added support for
FITRIM of the realtime volume. This support doesn't actually require
there to be any realtime groups, so teach scrub to run through the whole
region.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 26 Mar 2024 21:25:28 +0000 (14:25 -0700)]
xfs_db: support dumping little-endian values
Make it so that getbitval can handle little endian numbers. This will
be used in subsequent patches for dumping rt bitmap words, and for
decoding fsverity descriptors.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
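For illustration, decoding the same bytes in both byte orders might look
like the sketch below. The demo_get_be and demo_get_le helpers are
hypothetical and are not the getbitval interface; they only handle whole
bytes and assume nbytes is at most 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode nbytes (at most 8) as a big-endian unsigned integer. */
static uint64_t demo_get_be(const uint8_t *p, size_t nbytes)
{
	uint64_t v = 0;

	for (size_t i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode nbytes (at most 8) as a little-endian unsigned integer. */
static uint64_t demo_get_le(const uint8_t *p, size_t nbytes)
{
	uint64_t v = 0;

	/* Walk from the last byte, which is the most significant one. */
	for (size_t i = nbytes; i > 0; i--)
		v = (v << 8) | p[i - 1];
	return v;
}
```

Given the bytes 12 34 56 78 (hex), the big-endian decode yields 0x12345678
and the little-endian decode yields 0x78563412.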
Darrick J. Wong [Tue, 9 Jan 2024 17:40:01 +0000 (09:40 -0800)]
xfs_db: metadump realtime devices
Teach the metadump command to dump the filesystem metadata of a realtime
device to the metadump file. Currently, this is limited to the rt group
superblocks.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:40:00 +0000 (09:40 -0800)]
xfs_db: support dumping realtime superblocks
Allow debugging of realtime superblocks, and add the relevant fields in
the fs superblock that point us at the existence and location of the rt
supers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Improve the reporting of discrepancies in the realtime bitmap and
summary files by creating a separate helper function that will pinpoint
the exact (word) locations of mismatches. This will help developers to
diagnose problems with the rtgroups feature and users to figure out
exactly what's bad in a filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:58 +0000 (09:39 -0800)]
libxfs: implement some sanity checking for enormous rgcount
Similar to what we do for suspiciously large sb_agcount values, if
someone tries to get libxfs to load a filesystem with a very large
realtime group count, let's do some basic checks of the rt device to
see if it's really that large. If the read fails, only load the first
rtgroup and warn the user.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
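A rough sketch of the fallback policy follows. Note the substitution: the
commit describes attempting a read from the rt device, whereas this
example compares against a known device size so that it stays
self-contained. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Return the number of rtgroups to load.  If the last claimed group would
 * start beyond the end of the device, fall back to one group.  A caller
 * would be expected to warn the user in that case.
 */
static uint32_t demo_usable_rgcount(uint32_t claimed, uint64_t rg_bytes,
				    uint64_t dev_bytes)
{
	if (claimed <= 1)
		return claimed;
	if (rg_bytes == 0 || dev_bytes == 0)
		return 1;

	/*
	 * The last group starts at (claimed - 1) * rg_bytes.  Divide rather
	 * than multiply so that a huge count cannot overflow.
	 */
	if ((uint64_t)(claimed - 1) > (dev_bytes - 1) / rg_bytes)
		return 1;
	return claimed;
}
```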
Darrick J. Wong [Tue, 9 Jan 2024 17:43:09 +0000 (09:43 -0800)]
xfs: scrub each rtgroup's portion of the rtbitmap separately
Create a new scrub type code so that userspace can scrub each rtgroup's
portion of the rtbitmap file separately. This reduces the long tail
latency that results from scanning the entire bitmap all at once, and
prepares us for future patchsets, wherein we'll need to be able to lock
a specific rtgroup so that we can rebuild that rtgroup's part of the
rtbitmap contents from the rtgroup's rmap btree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:57 +0000 (09:39 -0800)]
xfs: use an incore rtgroup rotor for rtpick
During the 6.7 merge window, Linus noticed that the realtime allocator
was doing some sketchy things trying to encode a u64 sequence counter
into the rtbitmap file's atime. The sketchy casting of a struct pointer
to a u64 pointer has subtly broken several times over the past decade as
the codebase has transitioned to using the VFS i_atime field and that
field has changed in size and layout over time.
Since the goal of the rtpick code is to _suggest_ a starting place for
new rt file allocations, the repeated breakage has not resulted in
inconsistent metadata. IOWs, it's a hint.
For rtgroups, we don't need this complex code to cut the rtextents space
into fractions. Add an rtgroup rotor and use that for rtpick, similar
to AG rotoring on the data device. The new rotor does not persist,
which reduces the logging overhead slightly.
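A minimal sketch of an incore rotor, assuming a simple round-robin policy;
struct demo_mount and demo_rtpick_group are made-up names, and the real
allocator may choose groups differently.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical incore mount state; nothing here is written to disk. */
struct demo_mount {
	uint32_t	rgcount;	/* number of realtime groups */
	atomic_uint	rotor;		/* next group to suggest */
};

/* Suggest a starting rtgroup for a new rt file allocation. */
static uint32_t demo_rtpick_group(struct demo_mount *mp)
{
	unsigned int seq;

	if (mp->rgcount == 0)
		return 0;
	seq = atomic_fetch_add(&mp->rotor, 1);
	return seq % mp->rgcount;
}
```

Because the rotor lives only in memory, it restarts from zero on every
mount. That is acceptable for a hint.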
Darrick J. Wong [Tue, 9 Jan 2024 17:39:57 +0000 (09:39 -0800)]
xfs: store rtgroup information with a bmap intent
Make the bmap intent items take an active reference to the rtgroup
containing the space that is being mapped or unmapped. We will need
this functionality once we start enabling rmap and reflink on the rt
volume.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:57 +0000 (09:39 -0800)]
xfs: encode the rtsummary in big endian format
Currently, the ondisk realtime summary file counters are accessed in
units of 32-bit words. There's no endian translation of the contents of
this file, which means that Bad Things Happen(tm) if you go from
(say) x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file. Encode the summary
information in big endian format, like most of the rest of the
filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:56 +0000 (09:39 -0800)]
xfs: encode the rtbitmap in little endian format
Currently, the ondisk realtime bitmap file is accessed in units of
32-bit words. There's no endian translation of the contents of this
file, which means that Bad Things Happen(tm) if you go from (say)
x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.
The natural format of a bitmap is (IMHO) little endian, because the byte
offsets of the bitmap data should always increase in step with the
information being indexed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
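The "byte offsets increase in step" argument can be checked with a small
sketch. It assumes that bit k of a 32-bit word is (word >> k) & 1, i.e.
bits are numbered from the least significant end; all demo_* names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* File byte holding bitmap bit `bit` when 32-bit words are little endian. */
static size_t demo_le_byte_of_bit(uint64_t bit)
{
	size_t word = bit / 32;
	unsigned int in_word = bit % 32;

	return word * 4 + in_word / 8;		/* same as bit / 8 */
}

/* Same question when the 32-bit words are stored big endian. */
static size_t demo_be_byte_of_bit(uint64_t bit)
{
	size_t word = bit / 32;
	unsigned int in_word = bit % 32;

	return word * 4 + (3 - in_word / 8);	/* bytes reversed per word */
}

/* With little-endian words, a bit can be tested without loading a word. */
static int demo_test_bit_le(const uint8_t *buf, uint64_t bit)
{
	return (buf[bit / 8] >> (bit % 8)) & 1;
}
```

In the little-endian layout, bits 0, 8, 16 and 24 land in bytes 0, 1, 2
and 3. In the big-endian layout they land in bytes 3, 2, 1 and 0, so the
byte offset moves backwards within each word as the bit index grows.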
Darrick J. Wong [Tue, 9 Jan 2024 17:39:56 +0000 (09:39 -0800)]
xfs: define locking primitives for realtime groups
Define helper functions to lock all metadata inodes related to a
realtime group. There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:55 +0000 (09:39 -0800)]
xfs: add frextents to the lazysbcounters when rtgroups enabled
Make the free rt extent count a part of the lazy sb counters when the
realtime groups feature is enabled. This is possible because the patch
to recompute frextents from the rtbitmap during log recovery predates
the code adding rtgroup support, hence we know that the value will
always be correct during runtime.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:55 +0000 (09:39 -0800)]
xfs: check that rtblock extents do not overlap with the rt group metadata
The ondisk format specifies that the start of each realtime group must
have a superblock so that rt space mappings never cross an rtgroup
boundary. Check that rt block pointers obey this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
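One way to picture the check is the sketch below. It assumes rt blocks
are numbered linearly so that a block's group is block / rg_blocks, and
that the first sb_blocks blocks of every group are reserved for the
superblock; both the layout and the function name are assumptions made
for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Is the rt extent [start, start + len) acceptable?  It must not overlap
 * the reserved blocks at the start of its group, and it must not cross
 * into the next group.
 */
static bool demo_rt_extent_ok(uint64_t start, uint64_t len,
			      uint64_t rg_blocks, uint64_t sb_blocks)
{
	uint64_t off;

	if (len == 0 || rg_blocks == 0)
		return false;

	off = start % rg_blocks;	/* offset within the group */
	if (off < sb_blocks)
		return false;		/* overlaps the group superblock */
	return len <= rg_blocks - off;	/* stays inside the group */
}
```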
Darrick J. Wong [Tue, 9 Jan 2024 17:39:54 +0000 (09:39 -0800)]
xfs: write secondary realtime superblocks to disk
Create some library functions to make it easy to update all the
secondary realtime superblocks on disk; this will be used by growfs,
xfs_db, mkfs, and xfs_repair.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:42:55 +0000 (09:42 -0800)]
xfs: update primary realtime super every time we update the primary fs super
Every time we update parts of the primary filesystem superblock that are
echoed in the primary rt super, we should update that primary realtime
super. Avoid an ondisk log format change by using ordered buffers to
write the primary rt super.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:42:54 +0000 (09:42 -0800)]
xfs: reduce rt summary file levels for rtgroups filesystems
The rt summary file is supposed to be large enough to track the number
of log2(rtextentcount) free space extents that start in a given rt
bitmap block. Prior to rt groups, there could be a single 2^52 block
free extent, which implies a summary file with 53 levels.
However, each rtgroup uses its first rt extent to hold a superblock,
so there can't be any free extents longer than the length of a group.
Groups are limited to 2^32-1 blocks, which means that the longest
freespace will be counted in level 31. Hence we only need 32 levels.
Adjust the rextslog computation to create smaller rt summary files for
rtgroups filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
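The level arithmetic above can be reproduced with a short sketch. The
helper names are hypothetical, and the only assumption is that a free
extent of length len is counted in level floor(log2(len)).

```c
#include <stdint.h>

/* Index of the highest set bit of v, i.e. floor(log2(v)); v must be nonzero. */
static unsigned int demo_highbit64(uint64_t v)
{
	unsigned int bit = 0;

	while (v >>= 1)
		bit++;
	return bit;
}

/* Summary levels needed to count free extents up to max_len in length. */
static unsigned int demo_rsum_levels(uint64_t max_len)
{
	return demo_highbit64(max_len) + 1;
}
```

A maximum length of 2^52 gives 53 levels, and a maximum length of 2^32-1
gives 32 levels, matching the numbers in the commit message.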
Darrick J. Wong [Tue, 9 Jan 2024 17:42:52 +0000 (09:42 -0800)]
xfs: create incore realtime group structures
Create an incore object that will contain information about a realtime
allocation group. This will eventually enable us to shard the realtime
section in a similar manner to how we shard the data section.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:53 +0000 (09:39 -0800)]
xfs: refactor realtime inode locking
Create helper functions to deal with locking realtime metadata inodes.
This enables us to maintain correct locking order once we start adding
the realtime rmap and refcount btree inodes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:52 +0000 (09:39 -0800)]
xfs_db: access realtime file blocks
Now that we have the ability to point the io cursor at the realtime
device, let's make it so that the "dblock" command can walk the contents
of realtime files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:51 +0000 (09:39 -0800)]
xfs_db: support passing the realtime device to the debugger
Create a new -R flag so that sysadmins can pass the realtime device to
the xfs debugger. Since we can now have superblocks on the rt device,
we need this to be able to inspect/dump/etc.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:50 +0000 (09:39 -0800)]
xfs_repair: allow sysadmins to add metadata directories
Allow the sysadmin to use xfs_repair to upgrade an existing filesystem
to support metadata directories. This will be needed to upgrade
filesystems to support realtime rmap and reflink.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:50 +0000 (09:39 -0800)]
xfs_repair: do not count metadata directory files when doing quotacheck
Previously, we stated that files in the metadata directory tree are not
counted in the dquot information. Fix the offline quotacheck code in
xfs_repair and xfs_check to reflect this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:50 +0000 (09:39 -0800)]
xfs_repair: truncate and unmark orphaned metadata inodes
If an inode claims to be a metadata inode but wasn't linked in either
directory tree, remove the attr fork and reset the data fork if the
contents weren't regular extent mappings before moving the inode to the
lost+found.
We don't ifree the inode, because it's possible that the inode was not
actually a metadata inode but simply got corrupted due to bitflips or
something, and we'd rather let the sysadmin examine what's left of the
file instead of photorec'ing it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:50 +0000 (09:39 -0800)]
xfs_repair: drop all the metadata directory files during pass 4
Drop the entire metadata directory tree during pass 4 so that we can
reinitialize the entire tree in phase 6. The existing metadata files
(rtbitmap, rtsummary, quotas) will be reattached to the newly rebuilt
directory tree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:49 +0000 (09:39 -0800)]
xfs_repair: metadata dirs are never plausible root dirs
Metadata directories are never candidates to be the root of the
user-accessible directory tree. Update has_plausible_rootdir to ignore
them all, and to detect the case where the superblock incorrectly
thinks both trees have the same root.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:49 +0000 (09:39 -0800)]
xfs_repair: adjust keep_fsinos to handle metadata directories
On a filesystem with metadata directories, we only want to automatically
mark the two root directories present because those are the only two
statically allocated inode numbers -- the rt summary inode is now just a
regular file in a directory.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:49 +0000 (09:39 -0800)]
xfs_repair: mark space used by metadata files
Track space used by metadata files as a separate incore extent type.
This ensures that we can warn about cross-linked metadata files, even
though we are going to rebuild the entire metadata directory tree in the
end.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:48 +0000 (09:39 -0800)]
xfs_repair: don't let metadata and regular files mix
Track whether or not inodes thought they were metadata inodes. We
cannot allow metadata inodes to appear in the regular directory tree,
and we cannot allow regular inodes to appear in the metadata directory
tree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Create a helper function to grab a realtime metadata inode. When
metadir arrives, the bitmap and summary inodes can float, so we'll
turn this function into a "load or allocate" function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:47 +0000 (09:39 -0800)]
xfs_repair: don't check metadata directory dirent inumbers
Phase 6 always rebuilds the entire metadata directory tree, and repair
quietly ignores all the DIFLAG2_METADATA directory inodes that it finds.
As a result, none of the metadata directories are marked inuse in the
incore data. Therefore, the is_inode_free checks are not valid for
anything we find in a metadata directory.
Therefore, avoid checking is_inode_free when scanning metadata directory
dirents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:47 +0000 (09:39 -0800)]
xfs_repair: reject regular directory dirents that point to metadata files
If a directory that's in the regular (non-metadata) directory tree has
an entry that points to a metadata file, trash the dirent. Files are
not allowed to cross between the two trees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:46 +0000 (09:39 -0800)]
xfs_repair: don't zero the incore secondary super when zeroing
If secondary_sb_whack detects nonzero bytes beyond the end of the ondisk
superblock, it will try to zero the end of the ondisk buffer as well as
the incore superblock prior to scan_ag using that incore super to
rewrite the ondisk super.
However, the metadata directory feature adds a sb_metadirino field to
the incore super. On disk, this is stored in the same slot as
sb_rbmino, but we wanted to cache both inumbers incore to minimize the
churn. Therefore, it is now only safe to zero the "end" of an xfs_dsb
buffer, and never an xfs_sb object.
Most of the XFS codebase moved off that second behavior long ago, with
the exception of this one part of repair. The zeroing probably ought to
be turned into explicit logic to zero fields that weren't defined with
the featureset encoded in the primary superblock, but for now we'll
resort to always resetting the values from the xfs_mount's xfs_sb.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Jan 2024 17:39:45 +0000 (09:39 -0800)]
xfs_db: mask superblock fields when metadir feature is enabled
When the metadata directory feature is enabled, mask the superblock
fields (rt, quota inodes) that got migrated to the directory tree.
Similarly, hide the 'metadirino' field when the feature is disabled.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>