Christoph Hellwig [Sun, 4 Aug 2024 09:51:16 +0000 (11:51 +0200)]
xfs: return -ENOENT when trying to scrub non-existing rtgroup
Provide a fallback for scrub code trying to scrub RTG 0 when it doesn't
actually exist for a file system with the RTGROUPS feature bit, but without
any RT extents.
Darrick J. Wong [Wed, 26 Jun 2024 18:19:00 +0000 (11:19 -0700)]
xfs: don't coalesce file mappings that cross rtgroup boundaries
The bmbt scrubber will combine file mappings if they are mergeable to
reduce the number of cross-referencing checks. However, we shouldn't
combine mappings that cross rt group boundaries because that will cause
verifiers to trip incorrectly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 5 Aug 2024 20:32:55 +0000 (13:32 -0700)]
xfs: make the RT allocator rtgroup aware
Make the allocator rtgroup aware by either picking a specific group if
there is a hint, or loop over all groups otherwise. A simple rotor is
provided to pick the placement for initial allocations.
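
A minimal sketch of the group selection described above, assuming a plain counter as the rotor. The names ex_alloc_state, EX_NO_HINT and ex_pick_start_group are hypothetical and do not come from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NO_HINT	UINT32_MAX	/* hypothetical "no group hint" marker */

/* Hypothetical allocator state: number of groups plus a round-robin rotor. */
struct ex_alloc_state {
	uint32_t	group_count;
	uint32_t	rotor;
};

/*
 * Pick the group in which a search should start.  A usable hint wins;
 * otherwise groups are handed out round-robin so that initial
 * allocations spread across the volume.
 */
static inline uint32_t ex_pick_start_group(struct ex_alloc_state *st,
					   uint32_t hint)
{
	assert(st->group_count > 0);

	if (hint != EX_NO_HINT && hint < st->group_count)
		return hint;
	return st->rotor++ % st->group_count;
}
```

A caller without a hint would then try each group once, starting at the returned group and wrapping around, until one of them satisfies the request.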
Christoph Hellwig [Mon, 8 Jul 2024 21:38:06 +0000 (14:38 -0700)]
xfs: don't merge ioends across RTGs
Unlike AGs, RTGs don't always have metadata in their first blocks, and
thus we don't get automatic protection from merging I/O completions
across RTG boundaries. Add code to set the IOMAP_F_BOUNDARY flag for
ioends that start at the first block of a RTG so that they never get
merged into the previous ioend.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 1 Aug 2024 14:35:00 +0000 (16:35 +0200)]
xfs: use realtime EFI to free extents when rtgroups are enabled
When rmap is enabled, XFS expects a certain order of operations, which
is: 1) remove the file mapping, 2) remove the reverse mapping, and then
3) free the blocks. When reflink is enabled, XFS replaces (3) with a
deferred refcount decrement operation that can schedule freeing the
blocks if that was the last refcount.
For realtime files, xfs_bmap_del_extent_real tries to do 1 and 3 in the
same transaction, which will break both rmap and reflink unless we
switch it to use realtime EFIs. Both rmap and reflink depend on the
rtgroups feature, so let's turn on EFIs for all rtgroups filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 2 Aug 2024 04:20:09 +0000 (06:20 +0200)]
xfs: support error injection when freeing rt extents
A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent. Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:39 +0000 (21:11 -0700)]
xfs: support logging EFIs for realtime extents
Teach the EFI mechanism how to free realtime extents. We're going to
need this to enforce proper ordering of operations when we enable
realtime rmap.
Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer
ops for rt extents. This keeps the ondisk artifacts and processing code
completely separate between the rt and non-rt cases. Hopefully this
will make it easier to debug filesystem problems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:22 +0000 (21:11 -0700)]
xfs: force swapext to a realtime file to use the file content exchange ioctl
xfs_swap_extent_rmap does not use log items to track the overall
progress of an attempt to swap the extent mappings between two files.
If the system crashes in the middle of swapping a partially written
realtime extent, the mapping will be left in an inconsistent state
wherein a file can point to multiple extents on the rt volume.
The new file range exchange functionality handles this correctly, so all
callers must upgrade to that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:21 +0000 (21:11 -0700)]
xfs: store rtgroup information with a bmap intent
Make the bmap intent items take an active reference to the rtgroup
containing the space that is being mapped or unmapped. We will need
this functionality once we start enabling rmap and reflink on the rt
volume.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:21 +0000 (21:11 -0700)]
xfs: encode the rtsummary in big endian format
Currently, the ondisk realtime summary file counters are accessed in
units of 32-bit words. There's no endian translation of the contents of
this file, which means that Bad Things Happen(tm) if you go from
(say) x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file. Encode the summary
information in big endian format, like most of the rest of the
filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:19 +0000 (21:11 -0700)]
xfs: encode the rtbitmap in big endian format
Currently, the ondisk realtime bitmap file is accessed in units of
32-bit words. There's no endian translation of the contents of this
file, which means that Bad Things Happen(tm) if you go from (say)
x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
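
To illustrate what a fixed byte order for 32-bit bitmap or summary words means in practice, here is a small sketch. The helper names are hypothetical, and the bit numbering (bit 0 is the least significant bit of word 0) is an assumption made for the example, not something stated in the commit messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit word at buf in big-endian byte order. */
static inline void ex_put_be32(uint8_t *buf, uint32_t val)
{
	buf[0] = (uint8_t)(val >> 24);
	buf[1] = (uint8_t)(val >> 16);
	buf[2] = (uint8_t)(val >> 8);
	buf[3] = (uint8_t)val;
}

/* Load a big-endian 32-bit word; the result is the same on any host. */
static inline uint32_t ex_get_be32(const uint8_t *buf)
{
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/*
 * Test one bit in a bitmap made of big-endian 32-bit words.
 * Assumption: bit 0 is the least significant bit of word 0.
 */
static inline int ex_bitmap_test(const uint8_t *bitmap, size_t nwords,
				 uint64_t bit)
{
	assert(bit / 32 < nwords);
	return (ex_get_be32(bitmap + (bit / 32) * 4) >> (bit % 32)) & 1;
}
```

Because every access goes through the load and store helpers, the bytes on disk are identical whether the filesystem was written on a little-endian or a big-endian machine.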
Darrick J. Wong [Wed, 29 May 2024 04:11:17 +0000 (21:11 -0700)]
xfs: add frextents to the lazysbcounters when rtgroups enabled
Make the free rt extent count a part of the lazy sb counters when the
realtime groups feature is enabled. This is possible because the patch
to recompute frextents from the rtbitmap during log recovery predates
the code adding rtgroup support, hence we know that the value will
always be correct during runtime.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 3 Oct 2023 08:26:14 +0000 (10:26 +0200)]
xfs: add a helper to prevent bmap merges across rtgroup boundaries
Except for the rt superblock, realtime groups do not store any metadata
at the start (or end) of the group. There is nothing to prevent the
bmap code from merging allocations from multiple groups into a single
bmap record. Add a helper to check for this case.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: massage the commit message after pulling this into rtgroups] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
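
A sketch of the kind of check such a helper could perform, assuming fixed-size groups and that the left extent already lies within a single group. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the extent [left_start, left_start + left_len) and the
 * extent starting at right_start may be merged into one mapping.  They
 * must be physically contiguous and must not straddle a group boundary.
 */
static inline bool ex_extents_mergeable(uint64_t group_blocks,
					uint64_t left_start,
					uint64_t left_len,
					uint64_t right_start)
{
	assert(group_blocks > 0 && left_len > 0);

	if (left_start + left_len != right_start)
		return false;	/* not contiguous */

	/* The last left block and the first right block must share a group. */
	return (right_start - 1) / group_blocks == right_start / group_blocks;
}
```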
Darrick J. Wong [Thu, 1 Aug 2024 11:32:34 +0000 (13:32 +0200)]
xfs: update realtime super every time we update the primary fs super
Every time we update parts of the primary filesystem superblock that are
echoed in the rt superblock, we must update the rt super. Avoid
changing the log to support logging to the rt device by using ordered
buffers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:13 +0000 (21:11 -0700)]
xfs: check the realtime superblock at mount time
Check the realtime superblock at mount time, to ensure that the label
and uuids actually match the primary superblock on the data device. If
the rt superblock is good, attach it to the xfs_mount so that the log
can use ordered buffers to keep the rt super in sync with the primary
super on the data device.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Sun, 4 Aug 2024 09:01:47 +0000 (11:01 +0200)]
xfs: define the format of rt groups
Define the ondisk format of realtime group metadata, and a superblock
for realtime volumes. rt supers are protected by a separate rocompat
bit so that we can leave them off if the rt device is zoned.
Add a xfs_sb_version_hasrtgroups so that xfs_repair knows how to zero
the tail of superblocks.
For rt group enabled file systems there is a separate bitmap and summary
file for each group and thus the number of bitmap and summary blocks
needs to be calculated differently.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 5 Aug 2024 05:01:11 +0000 (07:01 +0200)]
xfs: make RT extent numbers relative to the rtgroup
To prepare for adding per-rtgroup bitmap files, make the xfs_rtxnum_t
type encode the RT extent number relative to the rtgroup. The biggest
part of this is to clearly distinguish between the relative extent number
that gets masked when converting from a global block number and length
values that just have a factor applied to them when converting from
file system blocks.
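
The distinction between positions and lengths can be sketched as follows, assuming fixed-size groups and plain division instead of the shift and mask shortcuts. The ex_rt_geom structure and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry: blocks per group and blocks per rt extent. */
struct ex_rt_geom {
	uint64_t	group_blocks;
	uint32_t	rextsize;
};

/* Which group does a global rt block number fall into? */
static inline uint32_t ex_rtb_to_group(const struct ex_rt_geom *g,
				       uint64_t rtbno)
{
	assert(g->group_blocks > 0);
	return (uint32_t)(rtbno / g->group_blocks);
}

/* Position: strip the group part, then scale down to rt extents. */
static inline uint64_t ex_rtb_to_rel_rtx(const struct ex_rt_geom *g,
					 uint64_t rtbno)
{
	assert(g->group_blocks > 0 && g->rextsize > 0);
	return (rtbno % g->group_blocks) / g->rextsize;
}

/* Length: no group part to strip, only the scaling factor applies. */
static inline uint64_t ex_blen_to_rtxlen(const struct ex_rt_geom *g,
					 uint64_t blocks)
{
	assert(g->rextsize > 0);
	return blocks / g->rextsize;
}
```

Keeping the two conversions in separate helpers makes it harder to apply the group masking to a length by accident.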
Christoph Hellwig [Sun, 4 Aug 2024 20:00:10 +0000 (22:00 +0200)]
xfs: refactor xfs_rtsummary_blockcount
Make xfs_rtsummary_blockcount take all the required information from
the mount structure and return the number of summary levels from it
as well. This cleans up many of the callers and prepares for making the
rtsummary files per-rtgroup where they need to look at different values.
This means we recalculate some values in some callers, but as all these
calculations are outside the fast path and cheap that seems like a price
worth paying.
Christoph Hellwig [Fri, 2 Aug 2024 03:51:57 +0000 (05:51 +0200)]
xfs: refactor xfs_rtbitmap_blockcount
Rename the existing xfs_rtbitmap_blockcount to
xfs_rtbitmap_blockcount_len and add a new xfs_rtbitmap_blockcount wrapper
around it that takes the number of extents from the mount structure.
This will simplify the move to per-rtgroup bitmaps as those will need to
pass in the number of extents per rtgroup instead.
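
The split into a length-taking function and a mount-based wrapper might look roughly like this. The ex_mount structure is a hypothetical stand-in, and the sketch assumes one bit per rt extent with no per-block header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of the mount structure used here. */
struct ex_mount {
	uint32_t	blocksize;	/* bytes per filesystem block */
	uint64_t	rextents;	/* rt extents in the volume */
};

/* Bitmap blocks needed to track the given number of rt extents. */
static inline uint64_t ex_rtbitmap_blockcount_len(const struct ex_mount *mp,
						  uint64_t rtextents)
{
	uint64_t bits_per_block = (uint64_t)mp->blocksize * 8;

	assert(bits_per_block > 0);
	/* Round up without risking overflow in the addition. */
	return rtextents / bits_per_block + (rtextents % bits_per_block != 0);
}

/* Wrapper that takes the extent count from the mount structure. */
static inline uint64_t ex_rtbitmap_blockcount(const struct ex_mount *mp)
{
	return ex_rtbitmap_blockcount_len(mp, mp->rextents);
}
```

A per-group caller would use the length-taking variant and pass the number of extents in one group.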
Christoph Hellwig [Sun, 4 Aug 2024 09:01:18 +0000 (11:01 +0200)]
xfs: factor out a xfs_growfs_rt_alloc_fake_mount helper
Split the code to set up a fake mount point to calculate new RT
geometry out of xfs_growfs_rt_bmblock so that it can be reused.
Note that this changes the rmblocks calculation method to be based
on the passed in rblocks and extsize and not the explicitly passed
one, but both methods will always lead to the same result. The new
version just does a little bit more math while being more general.
Christoph Hellwig [Fri, 2 Aug 2024 03:28:04 +0000 (05:28 +0200)]
xfs: calculate RT bitmap and summary blocks based on sb_rextents
Use the on-disk rextents to calculate the bitmap and summary blocks
instead of the calculated one so that we can refactor the helpers for
calculating them.
As the RT bitmap and summary scrubbers already check that sb_rextents
matches the block count this does not change coverage of the scrubber.
Christoph Hellwig [Sun, 4 Aug 2024 20:11:28 +0000 (22:11 +0200)]
xfs: move RT bitmap and summary information to the rtgroup
Move the pointers to the RT bitmap and summary inodes as well as the
summary cache to the rtgroups structure to prepare for having
separate bitmap and summary inodes for each rtgroup.
Code using the inodes now needs to operate on a rtgroup. Where easily
possible such code is converted to iterate over all rtgroups, else
rtgroup 0 (the only one that can currently exist) is hardcoded.
Create a pair of helpers to deal with setting up the necessary incore
context to check metadata records against the realtime metadata. Right
now this is limited to locking the realtime bitmap and summary inodes,
but as we add rmap and reflink to the realtime device this will grow to
include btree cursors.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 1 Aug 2024 04:38:53 +0000 (21:38 -0700)]
xfs: add a lockdep class key for rtgroup inodes
Add a dynamic lockdep class key for rtgroup inodes. This will enable
lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking
order. Each class can have 8 subclasses, and for now we will only have
2 inodes per group. This enables rtgroup order and inode order checks
when nesting ILOCKs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 1 Aug 2024 11:28:32 +0000 (13:28 +0200)]
xfs: define locking primitives for realtime groups
Define helper functions to lock all metadata inodes related to a
realtime group. There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Sun, 4 Aug 2024 09:00:47 +0000 (11:00 +0200)]
xfs: create incore realtime group structures
Create an incore object that will contain information about a realtime
allocation group. This will eventually enable us to shard the realtime
section in a similar manner to how we shard the data section, but for
now just a single object for the entire RT subvolume is created.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 8 Jul 2024 21:36:01 +0000 (14:36 -0700)]
iomap: add a merge boundary flag
File systems might have boundaries over which merges aren't possible.
In fact these are very common, although most of the time some kind of
header at the beginning of the region (e.g. XFS allocation groups, ext4
block groups) automatically creates a merge barrier. But if that is
not present, say for a device used purely for data, we need to manually
communicate that to iomap.
Add a IOMAP_F_BOUNDARY flag to never merge I/O into a previous mapping.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
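
The effect of such a flag on a merge decision can be sketched as below. EX_F_BOUNDARY and struct ex_ioend are hypothetical stand-ins chosen for the example; they are not the iomap definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_F_BOUNDARY	(1u << 0)	/* stand-in for IOMAP_F_BOUNDARY */

/* Hypothetical, simplified I/O completion descriptor. */
struct ex_ioend {
	uint64_t	offset;		/* file offset in bytes */
	uint64_t	size;		/* length in bytes */
	uint64_t	sector;		/* first 512-byte sector on disk */
	unsigned int	flags;
};

/* May "next" be merged onto the end of "prev"? */
static inline bool ex_ioend_can_merge(const struct ex_ioend *prev,
				      const struct ex_ioend *next)
{
	assert(prev->size > 0);

	if (next->flags & EX_F_BOUNDARY)
		return false;	/* next starts a region that must stand alone */
	if (prev->offset + prev->size != next->offset)
		return false;	/* not contiguous in the file */
	return prev->sector + (prev->size >> 9) == next->sector;
}
```

The flag is checked first because two ioends on either side of a group boundary can look perfectly contiguous both in the file and on disk.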
Darrick J. Wong [Wed, 29 May 2024 04:11:56 +0000 (21:11 -0700)]
xfs: rearrange xfs_fsmap.c a little bit
The order of the functions in this file has gotten a little confusing
over the years. Specifically, the two data device implementations
(bnobt and rmapbt) could be adjacent in the source code instead of split
in two by the logdev and rtdev fsmap implementations. We're about to
add more functionality to this file, so rearrange things now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Sun, 4 Aug 2024 09:00:12 +0000 (11:00 +0200)]
xfs: replace m_rsumsize with m_rsumblocks
Track the RT summary file size in blocks, just like the RT bitmap
file. While we have users of both units, blocks are used slightly
more often and this matches the bitmap file for consistency.
Christoph Hellwig [Fri, 2 Aug 2024 02:51:04 +0000 (04:51 +0200)]
xfs: remove xfs_{rtbitmap,rtsummary}_wordcount
xfs_rtbitmap_wordcount and xfs_rtsummary_wordcount are currently unused,
so remove them to simplify refactoring other rtbitmap helpers. They
can be added back or simply open coded when actually needed.
Christoph Hellwig [Thu, 1 Aug 2024 18:40:48 +0000 (20:40 +0200)]
xfs: make the rtalloc start hint a xfs_rtblock_t
0 is a valid start RT extent, and with pending changes it will become
both more common and non-unique. Switch to pass a xfs_rtblock_t instead
so that we can use NULLRTBLOCK to determine if a hint was set or not.
Christoph Hellwig [Wed, 31 Jul 2024 19:37:18 +0000 (12:37 -0700)]
xfs: rework the rtalloc fallback handling
xfs_rtallocate currently has two fallbacks, when an allocation fails:
1) drop the requested extent size alignment, if any, and retry
2) ignore the locality hint
Oddly enough it does those in that order, even though trying a different
location is more in line with what the user asked for than dropping the
alignment, and it does so in a very unstructured way.
Lift the fallback to try to allocate without the locality hint into
xfs_rtallocate to both perform them in a more sensible order and to
clean up the code.
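
A sketch of the reordered fallbacks, under the assumption that the locality hint is given up before the alignment. The toy backend, the EX_NO_HINT marker and all function names are hypothetical and exist only to make the ordering visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_NO_HINT	UINT64_MAX	/* hypothetical "no locality hint" */

struct ex_request {
	uint64_t	hint;	/* preferred start, or EX_NO_HINT */
	uint32_t	align;	/* required alignment, 1 for none */
};

/* Toy backend that records which kinds of request it can satisfy. */
struct ex_backend {
	bool	near_hint_ok;	/* space exists near the hint */
	bool	aligned_ok;	/* aligned space exists somewhere */
	bool	unaligned_ok;	/* some space exists somewhere */
	int	attempts;
};

static inline int ex_try(struct ex_backend *be, uint64_t hint, uint32_t align)
{
	be->attempts++;
	if (hint != EX_NO_HINT)
		return be->near_hint_ok ? 0 : -ENOSPC;
	if (align > 1)
		return be->aligned_ok ? 0 : -ENOSPC;
	return be->unaligned_ok ? 0 : -ENOSPC;
}

static inline int ex_allocate(struct ex_backend *be,
			      const struct ex_request *rq)
{
	int error;

	assert(rq->align >= 1);

	error = ex_try(be, rq->hint, rq->align);
	if (error != -ENOSPC)
		return error;

	/* Fallback 1: keep the alignment, give up on locality. */
	if (rq->hint != EX_NO_HINT) {
		error = ex_try(be, EX_NO_HINT, rq->align);
		if (error != -ENOSPC)
			return error;
	}

	/* Fallback 2: only now drop the alignment requirement. */
	if (rq->align > 1)
		error = ex_try(be, EX_NO_HINT, 1);
	return error;
}
```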
Christoph Hellwig [Wed, 31 Jul 2024 18:02:54 +0000 (11:02 -0700)]
xfs: factor out a xfs_rtallocate helper
Split out a helper from xfs_rtallocate that performs the actual
allocation. This keeps the scope of the xfs_rtalloc_args structure
contained, and prepares for rtgroups support.
Christoph Hellwig [Mon, 1 Jul 2024 18:57:44 +0000 (11:57 -0700)]
xfs: simplify xfs_rtalloc_query_range
There isn't much of a good reason to pass the xfs_rtalloc_rec structures
that describe extents to xfs_rtalloc_query_range as we really just want
a lower and upper bound xfs_rtxnum_t. Pass the rtxnum directly and
simplify the interface.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 1 Jul 2024 18:55:19 +0000 (11:55 -0700)]
xfs: remove xfs_rtb_to_rtxrem
Reduce the number of block number conversion helpers by removing
xfs_rtb_to_rtxrem. Any recent compiler is smart enough to eliminate
the double divisions if using separate xfs_rtb_to_rtx and
xfs_rtb_to_rtxoff calls.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
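
The two remaining conversions amount to a quotient and a remainder by the same divisor. A sketch with hypothetical names, using plain division:

```c
#include <assert.h>
#include <stdint.h>

/* rt extent number containing the given rt block */
static inline uint64_t ex_rtb_to_rtx(uint64_t rtbno, uint32_t rextsize)
{
	assert(rextsize > 0);
	return rtbno / rextsize;
}

/* offset of the given rt block within its rt extent */
static inline uint32_t ex_rtb_to_rtxoff(uint64_t rtbno, uint32_t rextsize)
{
	assert(rextsize > 0);
	return (uint32_t)(rtbno % rextsize);
}
```

When both are called back to back with the same arguments, an optimizing compiler can typically compute the quotient and remainder with a single division.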
Darrick J. Wong [Sat, 29 Jun 2024 16:14:53 +0000 (09:14 -0700)]
xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block
This function tries to find a suitable free space extent starting from
a particular rtbitmap block. Some time ago, I added a clamping function
to prevent the free space scans from running off the end of the bitmap,
but I didn't quite get the logic right.
Let's say there's an allocation request with a minlen of 5 and a maxlen
of 32 and we're scanning the last rtbitmap block. If we come within 4
rtx of the end of the rt volume, maxlen will get clamped to 4. If the
next 3 rtx are free, we could have satisfied the allocation, but the
code setting partial besti/bestlen for "minlen < maxlen" will think that
we're doing a non-variable allocation and ignore it.
The root of this problem is overwriting maxlen; I should have stuffed
the results in a different variable, which would not have introduced
this bug.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
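
The remedy described in the last paragraph, keeping the clamped value in its own variable, can be sketched like this. The names are hypothetical and the sketch only covers the clamping and the variable-size test, not the scan itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request: the caller accepts any length in [minlen, maxlen]. */
struct ex_alloc_req {
	uint32_t	minlen;
	uint32_t	maxlen;
};

/*
 * How many extents may be scanned from "start" without running past the
 * end of the volume.  The result is returned separately; the request's
 * maxlen is left untouched.
 */
static inline uint32_t ex_scan_len(uint64_t start, uint64_t rextents,
				   uint32_t maxlen)
{
	uint64_t left;

	assert(start < rextents);
	left = rextents - start;
	return left < maxlen ? (uint32_t)left : maxlen;
}

/* Variable-sized is a property of the request, not of the clamped scan. */
static inline bool ex_is_variable(const struct ex_alloc_req *rq)
{
	return rq->minlen < rq->maxlen;
}
```

With minlen 5 and maxlen 32, a scan starting 4 extents before the end is clamped to 4, yet the request is still recognized as variable-sized because the test looks at the original maxlen.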
Darrick J. Wong [Fri, 2 Aug 2024 06:51:05 +0000 (08:51 +0200)]
xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near
The near rt allocator employs two allocation strategies -- first it
tries to allocate at exactly @start. If that fails, it will pivot back
and forth around that starting point looking for an appropriately sized
free space.
However, I clamped maxlen ages ago to prevent the exact allocation scan
from running off the end of the rt volume. This, I realize, was
excessive. If the allocation request is (say) for 32 rtx but the start
position is 5 rtx from the end of the volume, we clamp maxlen to 5. If
the exact allocation fails, we then pivot back and forth looking for 5
rtx, even though the original intent was to try to get 32 rtx.
If we then find 5 rtx when we could have gotten 32 rtx, we've not done
as well as we could have. This may be moot if the caller immediately
comes back for more space, but it might not be. Either way, we can do
better here.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 28 Jun 2024 16:40:47 +0000 (09:40 -0700)]
xfs: clean up xfs_rtallocate_extent_exact a bit
Before we start doing more surgery on the rt allocator, let's clean up
the exact allocator so that it doesn't change its arguments and uses the
helper introduced in the previous patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 28 Jun 2024 16:24:31 +0000 (09:24 -0700)]
xfs: refactor aligning bestlen to prod
There are two places in xfs_rtalloc.c where we want to make sure that a
count of rt extents is aligned with a particular prod(uct) factor. In
one spot, we actually use rounddown(), albeit unnecessarily if prod < 2.
In the other case, we open-code this rounding inefficiently by promoting
the 32-bit length value to a 64-bit value and then performing a 64-bit
division to figure out the subtraction.
Refactor this into a single helper that uses the correct types and
division method for the type, and skips the division entirely unless
prod is large enough to make a difference.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
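
A sketch of what such a helper could look like, staying in 32-bit arithmetic and skipping the division when it cannot change the result. The name ex_round_down_len is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a count of rt extents down to a multiple of prod.  A prod of
 * 0 or 1 means there is nothing to align, so the division is skipped.
 */
static inline uint32_t ex_round_down_len(uint32_t len, uint32_t prod)
{
	uint32_t aligned;

	if (prod < 2)
		return len;

	aligned = len - (len % prod);	/* 32-bit remainder, no promotion */
	assert(aligned <= len);
	return aligned;
}
```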
Darrick J. Wong [Fri, 28 Jun 2024 16:08:07 +0000 (09:08 -0700)]
xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block
The loop conditional here is not quite correct because an rtbitmap block
can represent rtextents beyond the end of the rt volume. There's no way
that it makes sense to scan for free space beyond EOFS, so don't do it.
This overrun has been present since v2.6.0.
Also fix the type of bestlen, which was incorrectly converted.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 28 Jun 2024 16:02:36 +0000 (09:02 -0700)]
xfs: don't return too-short extents from xfs_rtallocate_extent_block
If xfs_rtallocate_extent_block is asked for a variable-sized allocation,
it will try to return the best-sized free extent, which is apparently
the largest one that it finds starting in this rtbitmap block. It will
then trim the size of the extent as needed to align it with prod.
However, it misses one thing -- rounding down the best-fit candidate to
the required alignment could make the extent shorter than minlen. In
the case where minlen > 1, we'd rather the caller relaxed its alignment
requirements and tried again, as the allocator already supports that.
Returning a too-short extent can cause xfs_bmapi_write to return
ENOSR if there aren't enough nmaps to handle multiple new allocations,
which can then cause filesystem shutdowns.
I haven't seen this happen on any production systems, but then I don't
think it's very common to set a per-file extent size hint on realtime
files. I tripped it while working on the rtgroups feature and pounding
on the realtime allocator enthusiastically.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
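
The missing check can be sketched as follows. The function name is hypothetical, and returning 0 to mean "no usable extent here" is a convention chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Trim the best candidate to the required alignment and make sure the
 * result still satisfies minlen.  Returns the usable length, or 0 if
 * the candidate became too short, in which case the caller is expected
 * to relax its alignment requirements and try again.
 */
static inline uint32_t ex_usable_len(uint32_t bestlen, uint32_t minlen,
				     uint32_t prod)
{
	uint32_t len = bestlen;

	assert(minlen > 0);

	if (prod > 1)
		len -= len % prod;
	if (len < minlen)
		return 0;
	return len;
}
```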
Christoph Hellwig [Sun, 4 Aug 2024 08:58:24 +0000 (10:58 +0200)]
xfs: ensure rtx mask/shift are correct after growfs
When growfs sets an extent size, it doesn't update the m_rtxblklog and
m_rtxblkmask values, which could lead to incorrect usage of them if they
were set before and can't be used for the new extent size.
Add a xfs_mount_sb_set_rextsize helper that updates the two fields, and
also use it when calculating the new RT geometry instead of disabling
the optimization there.
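
A sketch of a setter that keeps the cached shift and mask consistent with the extent size. The structure and names are hypothetical, and using -1 and 0 as "not a power of two" markers is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the mount fields involved. */
struct ex_mount {
	uint32_t	rextsize;	/* blocks per rt extent */
	int		rtxblklog;	/* log2(rextsize), or -1 */
	uint64_t	rtxblkmask;	/* rextsize - 1, or 0 */
};

/* Update the extent size together with the values derived from it. */
static inline void ex_set_rextsize(struct ex_mount *mp, uint32_t rextsize)
{
	assert(rextsize > 0);

	mp->rextsize = rextsize;
	if ((rextsize & (rextsize - 1)) == 0) {
		int log = 0;

		while ((1u << log) < rextsize)
			log++;
		mp->rtxblklog = log;
		mp->rtxblkmask = rextsize - 1;
	} else {
		/* Not a power of two: the shortcuts must not be used. */
		mp->rtxblklog = -1;
		mp->rtxblkmask = 0;
	}
}

/* Block to extent conversion that uses the shift only when it is valid. */
static inline uint64_t ex_mount_rtb_to_rtx(const struct ex_mount *mp,
					   uint64_t rtbno)
{
	if (mp->rtxblklog >= 0)
		return rtbno >> mp->rtxblklog;
	return rtbno / mp->rextsize;
}
```

If only rextsize were changed from 8 to 6, the stale shift of 3 would still map block 37 to extent 4 instead of 6, which is the kind of mismatch the setter avoids.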
Christoph Hellwig [Sun, 4 Aug 2024 08:54:39 +0000 (10:54 +0200)]
xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock
After going to great lengths to calculate the transaction reservation for
the new geometry, we should also use it to allocate the transaction it
was calculated for.
Fixes: 578bd4ce7100 ("xfs: recompute growfsrtfree transaction reservation while growing rt volume") Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Tue, 30 Jul 2024 23:54:04 +0000 (16:54 -0700)]
xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock
To prepare for being able to join an already locked rtbitmap inode to a
transaction, split out separate helpers for joining the transaction from
the locking helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 17:54:12 +0000 (10:54 -0700)]
xfs: factor out rtbitmap/summary initialization helpers
Add helpers to libxfs that can be shared by growfs and mkfs for
initializing the rtbitmap and summary, and by passing the optional
data pointer also by repair for rebuilding them. This will become
even more useful when the rtgroups feature adds a metadata header
to each block, which means even more shared code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor documentation and data advance tweaks] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 17:54:11 +0000 (10:54 -0700)]
xfs: factor out a xfs_growfs_rt_bmblock helper
Add a helper to contain the per-rtbitmap block logic in xfs_growfs_rt.
Note that this helper now allocates a new fake mount structure for
each rtbitmap block iteration instead of reusing the memory for an
entire growfs call. Compared to all the other work done when freeing
the blocks the overhead for this is in the noise and it keeps the code
nicely modular.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 17:54:10 +0000 (10:54 -0700)]
xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc
Currently the various low-level RT allocator functions call into
xfs_rtallocate_range directly, which ties them into the locking protocol
for the RT bitmap. As these helpers already return the allocated range,
lift the call to xfs_rtallocate_range into xfs_bmap_rtalloc so that it
happens as high as possible in the stack, which will simplify future
changes to the locking protocol.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 17:54:09 +0000 (10:54 -0700)]
xfs: cleanup the calling convention for xfs_rtpick_extent
xfs_rtpick_extent never returns an error. Do away with the error return
and directly return the picked extent instead of doing that through a
call by reference argument.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 17:54:06 +0000 (10:54 -0700)]
xfs: make the RT rsum_cache mandatory
Currently the RT mount code simply ignores an allocation failure for the
rsum_cache. The code mostly works fine with it, but not having it leads
to nasty corner cases in the growfs code that we don't really handle
well. Switch to failing the mount if we can't allocate the memory; the
file system would not exactly be useful in such a constrained environment
to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 17:54:05 +0000 (10:54 -0700)]
xfs: factor out a xfs_validate_rt_geometry helper
Split the RT geometry validation in the early mount code into a
helper that can be reused by repair (from which this code was
apparently originally stolen anyway).
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: u64 return value for calc_rbmblocks] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 30 Jul 2024 17:54:04 +0000 (10:54 -0700)]
xfs: remove xfs_validate_rtextents
Replace xfs_validate_rtextents with an open coded check for 0
rtextents. The name for the function implies it does a lot more
than a zero check, which is more obvious when open coded.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 14 Jun 2024 21:07:48 +0000 (14:07 -0700)]
xfs: confirm dotdot target before replacing it during a repair
xfs_dir_replace trips an assertion if you tell it to change a dirent to
point to an inumber that it already points at. Look up the dotdot entry
directly to confirm that we need to make a change.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Create a new scrubber type that checks that well known metadata
directory paths are connected to the metadata inode that the incore
structures think is in use. IOWs, check that "/quota/user" in the
metadata directory tree actually points to
mp->m_quotainfo->qi_uquotaip->i_ino.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:06 +0000 (21:11 -0700)]
xfs: move repair temporary files to the metadata directory tree
Due to resource acquisition rules, we have to create the ondisk
temporary files used to stage a filesystem repair before we can acquire
a reference to the inode that we actually want to repair. Therefore,
we do not know at tempfile creation time whether the tempfile will
belong to the regular directory tree or the metadata directory tree.
This distinction becomes important when the swapext code tries to figure
out the quota accounting of the two files whose mappings are being
swapped. The swapext code assumes that accounting updates are required
for a file if dqattach attaches dquots. Metadir files are never
accounted in quota, which means that swapext must not update the quota
accounting when swapping in a repaired directory/xattr/rtbitmap structure.
Prior to the swapext call, therefore, both files must be marked as
METADIR for dqattach so that dqattach will ignore them. Add support for
a repair tempfile to be switched to the metadir tree and switched back
before being released so that ifree will just free the file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:06 +0000 (21:11 -0700)]
xfs: check the metadata directory inumber in superblocks
When metadata directories are enabled, make sure that the secondary
superblocks point to the metadata directory. This isn't strictly
required because the secondaries are only used to recover damaged
filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 27 Jun 2024 20:59:09 +0000 (13:59 -0700)]
xfs: adjust parent pointer scrubber for sb-rooted metadata files
Starting with the metadata directory feature, we're allowed to call the
directory and parent pointer scrubbers for every metadata file,
including the ones that are children of the superblock.
For these children, checking the link count against the number of parent
pointers is a bit funny -- there's no such thing as a parent pointer for
a child of the superblock since there's no corresponding dirent. For
purposes of validating nlink, we pretend that there is a parent pointer.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:04 +0000 (21:11 -0700)]
xfs: metadata files can have xattrs if metadir is enabled
If metadata directory trees are enabled, it's possible that some future
metadata file might want to store information in extended attributes.
Or, if parent pointers are enabled, then children of the metadir tree
need parent pointers. Either way, we start allowing xattr data when
metadir is enabled, so we now need check and repair to examine attr
forks for metadata files on metadir filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:04 +0000 (21:11 -0700)]
xfs: do not count metadata directory files when doing online quotacheck
Previously, we stated that files in the metadata directory tree are not
counted in the dquot information. Fix the online quotacheck code to
reflect this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 25 Jun 2024 18:50:11 +0000 (11:50 -0700)]
xfs: refactor directory tree root predicates
Metadata directory trees make reasoning about the parent of a file more
difficult. Traditionally, user files are children of sb_rootino, and
metadata files are "children" of the superblock. Now, we add a third
possibility -- some metadata files can be children of sb_metadirino, but
the classic ones (rt free space data and quotas) are left alone.
Let's add some helper functions (instead of open-coding the logic
everywhere) to make scrub logic easier to understand.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:02 +0000 (21:11 -0700)]
xfs: adjust xfs_bmap_add_attrfork for metadir
Online repair might use the xfs_bmap_add_attrfork to repair a file in
the metadata directory tree if (say) the metadata file lacks the correct
parent pointers. In that case, it is not correct to check that the file
is dqattached -- metadata files must not have /any/ dquot attached at
all.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:02 +0000 (21:11 -0700)]
xfs: don't count metadata directory files to quota
Files in the metadata directory tree are internal to the filesystem.
Don't count the inodes or the blocks they use in the root dquot because
users do not need to know about their resource usage. This will also
quiet down complaints about dquot usage not matching du output.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:01 +0000 (21:11 -0700)]
xfs: allow bulkstat to return metadata directories
Allow the V5 bulkstat ioctl to return information about metadata
directory files so that xfs_scrub can find and scrub them, since they
are otherwise ordinary directories.
(Metadata files of course require per-file scrub code and hence do not
need exposure.)
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:11:00 +0000 (21:11 -0700)]
xfs: hide metadata inodes from everyone because they are special
Metadata inodes are private files and therefore cannot be exposed to
userspace. This means no bulkstat, no open-by-handle, no linking them
into the directory tree, and no feeding them to LSMs. As such, we mark
them S_PRIVATE, which stops all that.
While we're at it, put them in a separate lockdep class so that lockdep
won't get confused by "recursive" i_rwsem locking such as what happens when we
write to a rt file and need to allocate from the rt bitmap file. The
static function that we use to do this will be exported in the rtgroups
patchset.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:10:59 +0000 (21:10 -0700)]
xfs: disable the agi rotor for metadata inodes
Ideally, we'd put all the metadata inodes in one place if we could, so
that the metadata all stay reasonably close together instead of
spreading out over the disk. Furthermore, if the log is internal we'd
probably prefer to keep the metadata near the log. Therefore, disable
AGI rotoring for metadata inode allocations.
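
A simplified model of the idea, assuming that ordinary directories are
spread across AGs by a round-robin rotor and that every other inode
starts in its parent's AG. That policy, and every name below, is an
assumption of this sketch rather than the kernel's allocator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mount state: just enough to model the rotor. */
struct toy_mount {
	uint32_t	agcount;	/* number of AGs; must be nonzero */
	uint32_t	agirotor;	/* next AG to hand to a new directory */
};

/*
 * Pick the AG where the search for a free inode starts.  Metadata inodes
 * never consult or advance the rotor, so they stay close to their parent.
 */
static uint32_t
toy_pick_start_ag(struct toy_mount *mp, bool is_dir, bool is_metadata,
		  uint32_t parent_ag)
{
	uint32_t	ag;

	if (is_metadata || !is_dir)
		return parent_ag;

	ag = mp->agirotor % mp->agcount;
	mp->agirotor = (ag + 1) % mp->agcount;
	return ag;
}
```

Creating a metadata directory leaves the rotor untouched, so it does not
disturb the placement of the next user directory either.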
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:10:58 +0000 (21:10 -0700)]
xfs: read and write metadata inode directory tree
Plumb in the bits we need to load metadata inodes from a named entry in
a metadir directory, create (or hardlink) inodes into a metadir
directory, create metadir directories, and flag inodes as being metadata
files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:10:57 +0000 (21:10 -0700)]
xfs: enforce metadata inode flag
Add checks for the metadata inode flag so that we don't ever leak
metadata inodes out to userspace, and we don't ever try to read a
regular inode as metadata.
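
The two-way nature of the check can be sketched as follows. This is an
illustration only; the result codes and the function name are invented
here and do not correspond to kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical result codes for the toy check. */
enum toy_iget_result {
	TOY_IGET_OK = 0,
	TOY_IGET_WOULD_LEAK,	/* metadata inode requested as a regular file */
	TOY_IGET_NOT_METADATA,	/* regular inode requested as metadata */
};

/*
 * Compare what the inode is against what the caller asked for.  A
 * mismatch in either direction is rejected.
 */
static enum toy_iget_result
toy_check_metadata_flag(bool inode_is_metadata, bool caller_wants_metadata)
{
	if (inode_is_metadata && !caller_wants_metadata)
		return TOY_IGET_WOULD_LEAK;
	if (!inode_is_metadata && caller_wants_metadata)
		return TOY_IGET_NOT_METADATA;
	return TOY_IGET_OK;
}
```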
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:10:55 +0000 (21:10 -0700)]
xfs: define the on-disk format for the metadir feature
Define the on-disk layout and feature flags for the metadata inode
directory feature. Add xfs_sb_version_hasmetadir for the benefit of
xfs_repair, which needs to know where the new end of the superblock
lies.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 29 May 2024 04:10:54 +0000 (21:10 -0700)]
xfs: iget for metadata inodes
Create a xfs_imeta_iget function for metadata inodes to ensure that when
we try to iget a metadata file, the inobt thinks a metadata inode is in
use and that the file type matches what we are expecting.
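
A toy version of those two checks, assuming a hypothetical record that
carries the allocation state and the file type. This is not the
xfs_imeta_iget implementation, and none of these names exist in XFS.

```c
#include <stdbool.h>

/* Hypothetical file types; the kernel uses mode bits instead. */
enum toy_ftype {
	TOY_FT_REG,
	TOY_FT_DIR,
};

/* Hypothetical view of what the inode btree and the inode core say. */
struct toy_inode_rec {
	bool		in_use;		/* inode btree marks the inode allocated */
	enum toy_ftype	ftype;		/* file type recorded in the inode */
};

/* Accept the inode only if it is allocated and has the expected type. */
static bool
toy_imeta_iget_ok(const struct toy_inode_rec *rec, enum toy_ftype expected)
{
	if (!rec->in_use)
		return false;
	return rec->ftype == expected;
}
```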
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 1 Jul 2024 16:54:51 +0000 (09:54 -0700)]
xfs: pass the icreate args object to xfs_dialloc
Pass the xfs_icreate_args object to xfs_dialloc since we can extract the
relevant mode (really just the file type) and parent inumber from there.
This simplifies the calling convention in preparation for the next
patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 1 Jul 2024 18:39:34 +0000 (11:39 -0700)]
xfs: match on the global RT inode numbers in xfs_is_metadata_inode
Match the inode number instead of the inode pointers, as the inode
pointers in the superblock will go away soon.
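
Roughly, the comparison moves from pointers to numbers, as in the sketch
below. The structs are toys; the field names are borrowed from the XFS
superblock for readability, and the function name is invented for this
example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_ino_t;		/* stand-in for an inode number */

/* Toy superblock with the two global RT inode numbers. */
struct toy_sb {
	toy_ino_t	sb_rbmino;	/* rt bitmap inode number */
	toy_ino_t	sb_rsumino;	/* rt summary inode number */
};

/* Toy incore inode carrying only its inode number. */
struct toy_inode {
	toy_ino_t	i_ino;
};

/*
 * Match on inode numbers, so no cached inode pointers are needed.  The
 * inode parameter is a const pointer because it is only read.
 */
static bool
toy_is_rt_metadata_inode(const struct toy_sb *sb, const struct toy_inode *ip)
{
	return ip->i_ino == sb->sb_rbmino || ip->i_ino == sb->sb_rsumino;
}
```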
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: port to my tree, make the parameter a const pointer]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 1 Jul 2024 23:30:24 +0000 (16:30 -0700)]
xfs: attr forks require attr, not attr2
It turns out that I misunderstood the difference between the attr and
attr2 feature bits. "attr" means that at some point an attr fork was
created somewhere in the filesystem. "attr2" means that inodes have
variable-sized forks, but says nothing about whether or not there
actually /are/ attr forks in the system.
If we have an attr fork, we only need to check that attr is set.
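
The corrected rule can be sketched with two toy feature bits. The bit
values and names are placeholders for this example, not the on-disk
flag definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder feature bits for this sketch. */
#define TOY_FEAT_ATTR	(1u << 0)	/* an attr fork was created at some point */
#define TOY_FEAT_ATTR2	(1u << 1)	/* inodes have variable-sized forks */

/*
 * An inode with an attr fork is consistent only if the attr bit is set.
 * The attr2 bit says nothing about whether attr forks exist, so it is
 * deliberately not consulted.
 */
static bool
toy_attr_fork_consistent(uint32_t features, bool has_attr_fork)
{
	if (!has_attr_fork)
		return true;
	return (features & TOY_FEAT_ATTR) != 0;
}
```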
Fixes: 99d9d8d05da26 ("xfs: scrub inode block mappings")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>