Every time we update parts of the primary filesystem superblock that are
echoed in the rt superblock, we must update the rt super. Avoid
changing the log to support logging to the rt device by using ordered
buffers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
Define the ondisk format of realtime group metadata, and a superblock
for realtime volumes. rt supers are protected by a separate rocompat
bit so that we can leave them off if the rt device is zoned.
Add a xfs_sb_version_hasrtgroups so that xfs_repair knows how to zero
the tail of superblocks.
For rt group enabled file systems there is a separate bitmap and summary
file for each group and thus the number of bitmap and summary blocks
needs to be calculated differently.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
To prepare for adding per-rtgroup bitmap files, make the xfs_rtxnum_t
type encode the RT extent number relative to the rtgroup. The biggest
part of this to clearly distinguish between the relative extent number
that gets masked when converting from a global block number and length
values that just have a factor applied to them when converting from
file system blocks.
Make xfs_rtsummary_blockcount take all the required information from
the mount structure and return the number of summary levels from it
as well. This cleans up many of the callers and prepares for making the
rtsummary files per-rtgroup where they need to look at different value.
This means we recalculate some values in some callers, but as all these
calculations are outside the fast path and cheap that seems like a price
worth paying.
Rename the existing xfs_rtbitmap_blockcount to
xfs_rtbitmap_blockcount_len and add a new xfs_rtbitmap_blockcount wrapper
around it that takes the number of extents from the mount structure.
This will simplify the move to per-rtgroup bitmaps as those will need to
pass in the number of extents per rtgroup instead.
Move the pointers to the RT bitmap and summary inodes as well as the
summary cache to the rtgroups structure to prepare for having a
separate bitmap and summary inodes for each rtgroup.
Code using the inodes now needs to operate on a rtgroup. Where easily
possible such code is converted to iterate over all rtgroups, else
rtgroup 0 (the only one that can currently exist) is hardcoded.
Add a dynamic lockdep class key for rtgroup inodes. This will enable
lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking
order. Each class can have 8 subclasses, and for now we will only have
2 inodes per group. This enables rtgroup order and inode order checks
when nesting ILOCKs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
Define helper functions to lock all metadata inodes related to a
realtime group. There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
Create an incore object that will contain information about a realtime
allocation group. This will eventually enable us to shard the realtime
section in a similar manner to how we shard the data section, but for
now just a single object for the entire RT subvolume is created.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
Track the RT summary file size in blocks, just like the RT bitmap
file. While we have users of both units, blocks are used slightly
more often and this matches the bitmap file for consistency.
xfs_rtbitmap_wordcount and xfs_rtsummary_wordcount are currently unused,
so remove them to simplify refactoring other rtbitmap helpers. They
can be added back or simply open coded when actually needed.
Christoph Hellwig [Wed, 3 Jul 2024 21:21:59 +0000 (14:21 -0700)]
xfs: simplify xfs_rtalloc_query_range
There isn't much of a good reason to pass the xfs_rtalloc_rec structures
that describe extents to xfs_rtalloc_query_range as we really just want
a lower and upper bound xfs_rtxnum_t. Pass the rtxnum directly and
simply the interface.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Wed, 3 Jul 2024 21:21:59 +0000 (14:21 -0700)]
xfs: remove xfs_rtb_to_rtxrem
Simplify the number of block number conversion helpers by removing
xfs_rtb_to_rtxrem. Any recent compiler is smart enough to eliminate
the double divisions if using separate xfs_rtb_to_rtx and
xfs_rtb_to_rtxoff calls.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 9 Jul 2024 05:46:15 +0000 (22:46 -0700)]
libxfs: remove duplicate rtalloc declarations in libxfs.h
These already come from xfs_rtbitmap.h.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: libxfs, not xfs] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
When growfs sets an extent size, it doesn't updated the m_rtxblklog and
m_rtxblkmask values, which could lead to incorrect usage of them if they
were set before and can't be used for the new extent size.
Add a xfs_mount_sb_set_rextsize helper that updates the two fields, and
also use it when calculating the new RT geometry instead of disabling
the optimization there.
Christoph Hellwig [Tue, 30 Jul 2024 20:58:38 +0000 (13:58 -0700)]
xfs_repair: refactor generate_rtinfo
Move the allocation of the computed values into generate_rtinfo, and thus
make the variables holding them private in rt.c, and clean up a few
formatting nits.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: move functions to fix build errors] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 20:31:50 +0000 (13:31 -0700)]
xfs_repair: stop preallocating blocks in mk_rbmino and mk_rsumino
Now that repair is using libxfs_rtfile_initialize_blocks to write to the
rtbitmap and rtsummary inodes, space allocation is already taken care of
that helper and there is no need to preallocate it. Remove the code to
do so.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 20:31:06 +0000 (13:31 -0700)]
xfs_repair: use libxfs_rtfile_initialize_blocks
Use libxfs_rtfile_initialize_blocks to write the re-computed rtbitmap
and rtsummary contents. This removes duplicate code and prepares for
even more sharing once the rtgroup features adds a metadata header to
the rtbitmap and rtsummary blocks.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 20:25:34 +0000 (13:25 -0700)]
mkfs: use xfs_rtfile_initialize_blocks
Use the new libxfs helper for initializing the rtbitmap/summary files
for rtgroup-enabled file systems. Also skip the zeroing of the blocks
for rtgroup file systems as we'll overwrite every block instantly.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 23:54:04 +0000 (16:54 -0700)]
xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock
To prepare for being able to join an already locked rtbitmap inode to a
transaction split out separate helpers for joining the transaction from
the locking helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 17:54:12 +0000 (10:54 -0700)]
xfs: factor out rtbitmap/summary initialization helpers
Add helpers to libxfs that can be shared by growfs and mkfs for
initializing the rtbitmap and summary, and by passing the optional
data pointer also by repair for rebuilding them. This will become
even more useful when the rtgroups feature adѕ a metadata header
to each block, which means even more shared code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor documentation and data advance tweaks] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 18:17:09 +0000 (11:17 -0700)]
xfs: factor out a xfs_validate_rt_geometry helper
Split the RT geometry validation in the early mount code into a
helper than can be reused by repair (from which this code was
apparently originally stolen anyway).
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: u64 return value for calc_rbmblocks] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 18:17:09 +0000 (11:17 -0700)]
xfs: remove xfs_validate_rtextents
Replace xfs_validate_rtextents with an open coded check for 0
rtextents. The name for the function implies it does a lot more
than a zero check, which is more obvious when open coded.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:57 +0000 (14:21 -0700)]
xfs_db: access realtime file blocks
Now that we have the ability to point the io cursor at the realtime
device, let's make it so that the "dblock" command can walk the contents
of realtime files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:56 +0000 (14:21 -0700)]
xfs_db: support passing the realtime device to the debugger
Create a new -R flag so that sysadmins can pass the realtime device to
the xfs debugger. Since we can now have superblocks on the rt device,
we need this to be able to inspect/dump/etc.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:56 +0000 (14:21 -0700)]
xfs_repair: allow sysadmins to add metadata directories
Allow the sysadmin to use xfs_repair to upgrade an existing filesystem
to support metadata directories. This will be needed to upgrade
filesystems to support realtime rmap and reflink.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:56 +0000 (14:21 -0700)]
xfs_repair: do not count metadata directory files when doing quotacheck
Previously, we stated that files in the metadata directory tree are not
counted in the dquot information. Fix the offline quotacheck code in
xfs_repair and xfs_check to reflect this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:55 +0000 (14:21 -0700)]
xfs_repair: truncate and unmark orphaned metadata inodes
If an inode claims to be a metadata inode but wasn't linked in either
directory tree, remove the attr fork and reset the data fork if the
contents weren't regular extent mappings before moving the inode to the
lost+found.
We don't ifree the inode, because it's possible that the inode was not
actually a metadata inode but simply got corrupted due to bitflips or
something, and we'd rather let the sysadmin examine what's left of the
file instead of photorec'ing it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:55 +0000 (14:21 -0700)]
xfs_repair: drop all the metadata directory files during pass 4
Drop the entire metadata directory tree during pass 4 so that we can
reinitialize the entire tree in phase 6. The existing metadata files
(rtbitmap, rtsummary, quotas) will be reattached to the newly rebuilt
directory tree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:55 +0000 (14:21 -0700)]
xfs_repair: metadata dirs are never plausible root dirs
Metadata directories are never candidates to be the root of the
user-accessible directory tree. Update has_plausible_rootdir to ignore
them all, as well as detecting the case where the superblock incorrectly
thinks both trees have the same root.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:55 +0000 (14:21 -0700)]
xfs_repair: adjust keep_fsinos to handle metadata directories
In keep_fsinos, mark the root of the metadata directory tree as inuse.
The realtime bitmap and summary files still come after the root
directories, so this is a fairly simple change to the loop test.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:54 +0000 (14:21 -0700)]
xfs_repair: mark space used by metadata files
Track space used by metadata files as a separate incore extent type.
This ensures that we can warn about cross-linked metadata files, even
though we are going to rebuild the entire metadata directory tree in the
end.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:54 +0000 (14:21 -0700)]
xfs_repair: don't let metadata and regular files mix
Track whether or not inodes thought they were metadata inodes. We
cannot allow metadata inodes to appear in the regular directory tree,
and we cannot allow regular inodes to appear in the metadata directory
tree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:54 +0000 (14:21 -0700)]
xfs_repair: rebuild the metadata directory
Check the dirents in metadata directories for problems and repair them
if necessary. Also make sure that the sb-rooted inodes (root, metadir
root, rt bitmap, rt summary) are always allocated in that order.
Note that xfs_repair will always rebuild the metadata directory tree
itself, so we only need to report problems, not fix them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Create a helper function to grab a realtime metadata inode. When
metadir arrives, the bitmap and summary inodes can float, so we'll
turn this function into a "load or allocate" function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:52 +0000 (14:21 -0700)]
xfs_repair: dont check metadata directory dirent inumbers
Phase 6 always rebuilds the entire metadata directory tree, and repair
quietly ignores all the DIFLAG2_METADATA directory inodes that it finds.
As a result, none of the metadata directories are marked inuse in the
incore data. Therefore, the is_inode_free checks are not valid for
anything we find in a metadata directory.
Therefore, avoid checking is_inode_free when scanning metadata directory
dirents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:50 +0000 (14:21 -0700)]
xfs_db: disable xfs_check when metadir is enabled
As of July 2024, xfs_repair can detect more types of corruptions than
xfs_check does. I don't think it makes sense to maintain the xfs_check
code anymore, so let's just turn it off for any filesystem that has
metadata directory trees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Create a new scrubber type that checks that well known metadata
directory paths are connected to the metadata inode that the incore
structures think is in use. IOWs, check that "/quota/user" in the
metadata directory tree actually points to
mp->m_quotainfo->qi_uquotaip->i_ino.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:02 +0000 (21:11 -0700)]
xfs: adjust xfs_bmap_add_attrfork for metadir
Online repair might use the xfs_bmap_add_attrfork to repair a file in
the metadata directory tree if (say) the metadata file lacks the correct
parent pointers. In that case, it is not correct to check that the file
is dqattached -- metadata files must be not have /any/ dquot attached at
all.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:48 +0000 (14:21 -0700)]
xfs: allow bulkstat to return metadata directories
Allow the V5 bulkstat ioctl to return information about metadata
directory files so that xfs_scrub can find and scrub them, since they
are otherwise ordinary directories.
(Metadata files of course require per-file scrub code and hence do not
need exposure.)
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:47 +0000 (14:21 -0700)]
xfs: disable the agi rotor for metadata inodes
Ideally, we'd put all the metadata inodes in one place if we could, so
that the metadata all stay reasonably close together instead of
spreading out over the disk. Furthermore, if the log is internal we'd
probably prefer to keep the metadata near the log. Therefore, disable
AGI rotoring for metadata inode allocations.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:10:58 +0000 (21:10 -0700)]
xfs: read and write metadata inode directory tree
Plumb in the bits we need to load metadata inodes from a named entry in
a metadir directory, create (or hardlink) inodes into a metadir
directory, create metadir directories, and flag inodes as being metadata
files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:10:57 +0000 (21:10 -0700)]
xfs: enforce metadata inode flag
Add checks for the metadata inode flag so that we don't ever leak
metadata inodes out to userspace, and we don't ever try to read a
regular inode as metadata.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:10:55 +0000 (21:10 -0700)]
xfs: define the on-disk format for the metadir feature
Define the on-disk layout and feature flags for the metadata inode
directory feature. Add a xfs_sb_version_hasmetadir for benefit of
xfs_repair, which needs to know where the new end of the superblock
lies.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:10:54 +0000 (21:10 -0700)]
xfs: iget for metadata inodes
Create a xfs_imeta_iget function for metadata inodes to ensure that when
we try to iget a metadata file, the inobt thinks a metadata inode is in
use and that the file type matches what we are expecting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 3 Jul 2024 21:21:45 +0000 (14:21 -0700)]
xfs: pass the icreate args object to xfs_dialloc
Pass the xfs_icreate_args object to xfs_dialloc since we can extract the
relevant mode (really just the file type) and parent inumber from there.
This simplifies the calling convention in preparation for the next
patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>