Christoph Hellwig [Sat, 14 Sep 2024 07:14:17 +0000 (09:14 +0200)]
xfs: store a generic group structure in the intents
Replace the pag and rtg pointers in the extent free, bmap, rmap and
refcount intent structures with a pointer to the generic code, and
the merge the helpers that are identifical now.
Note: there might be opportunity to push this further down and unify
even more code.
Christoph Hellwig [Sat, 14 Sep 2024 06:44:00 +0000 (08:44 +0200)]
xfs: convert busy extent tracking to the generic group structure
Split busy extent tracking from struct xfs_perag into its own private
structure, which can be pointed to by the generic group structure.
Note that this structure is now dynamically allocated instead of embedded
as the upcoming zone XFS code doesn't need it and will also have an
unusually high number of groups due to hardware constraints. Dynamically
allocating the structure this is a big memory saver for this case.
Christoph Hellwig [Thu, 19 Sep 2024 06:28:29 +0000 (08:28 +0200)]
xfs: factor out a generic xfs_group structure
Split the lookup and refcount handling of struct xfs_perag into an
embedded xfs_group structure that can be reused for the upcoming
realtime groups.
It will be extended with more features later.
Note that he xg_type field will only need a single bit even with
realtime group support. For now it fills a hole, but it might be
worth to fold it into another field if we can use this space better.
Christoph Hellwig [Sat, 31 Aug 2024 08:10:49 +0000 (11:10 +0300)]
xfs: convert remaining tracepoints to pass pag structures
Convert all tracepoints that take [mp,agno] tuples to take a pag argument
instead so that decoding only happens when tracepoints are enabled and to
clean up the callers.
Christoph Hellwig [Sun, 1 Sep 2024 05:39:19 +0000 (08:39 +0300)]
xfs: pass the pag to the xrep_newbt_extent_class tracepoints
This requires moving a few of the callsites a little bit to ensure that
we already have the reference, but allows for the decoding to only happen
when tracing is actually enabled, and cleans up the callsites a bit.
Christoph Hellwig [Sun, 1 Sep 2024 05:34:33 +0000 (08:34 +0300)]
xfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points
This requires holding the pag refcount a little longer, but allows for the
decoding to only happen when tracing is actually enabled, and cleans up the
callsites a bit.
Christoph Hellwig [Sun, 1 Sep 2024 05:26:13 +0000 (08:26 +0300)]
xfs: pass objects to the xrep_ibt_walk_rmap tracepoint
Pass the perag structure and the irec so that the decoding is only done
when tracing is actually enabled and the call sites look a lot neater,
and remove the pointless class indirection.
Christoph Hellwig [Sat, 31 Aug 2024 07:39:59 +0000 (10:39 +0300)]
xfs: pass objects to the xfs_irec_merge_{pre,post} tracepoints
Pass the perag structure and the irec to these tracepoints so that the
decoding is only done when tracing is actually enabled and the call sites
look a lot neater.
Christoph Hellwig [Thu, 19 Sep 2024 07:02:34 +0000 (09:02 +0200)]
xfs: constify pag arguments to trace points
Trace points never modify their arguments. Mark all the pag objects
passed to trace points. The exception is the xfs_ag_resv_class, which
uses the xfs_perag_resv helper that can't be marked const due to
other users modifying the returned structure.
Christoph Hellwig [Sat, 31 Aug 2024 08:18:02 +0000 (11:18 +0300)]
xfs: remove the agno argument to xfs_free_ag_extent
xfs_free_ag_extent already has a pointer to the pag structure through
the agf buffer. Use that instead of passing the redundant argument,
and do the same for the tracepoint.
Christoph Hellwig [Thu, 19 Sep 2024 13:15:10 +0000 (15:15 +0200)]
xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag
__GFP_RETRY_MAYFAIL increases the likelyhood of allocations to fail,
which isn't really helpful during log recovery. Remove the flag and
stick to the default GFP_KERNEL policies.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Sun, 1 Sep 2024 08:09:32 +0000 (11:09 +0300)]
xfs: merge the perag freeing helpers
There is no good reason to have two different routines for freeing perag
structures for the unmount and error cases. Add two arguments to specify
the range of AGs to free to xfs_free_perag, and use that to replace
xfs_free_unused_perag_range.
The addition RCU grace period for the error case is harmless, and the
extra check for the AG to actually exist is not required now that the
callers pass the exact known allocated range.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Sun, 8 Sep 2024 07:53:41 +0000 (10:53 +0300)]
xfs: pass the exact range to initialize to xfs_initialize_perag
Currently only the new agcount is passed to xfs_initialize_perag, which
requires lookups of existing AGs to skip them and complicates error
handling. Also pass the previous agcount so that the range that
xfs_initialize_perag operates on is exactly defined. That way the
extra lookups can be avoided, and error handling can clean up the
exact range from the old count to the last added perag structure.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Wed, 28 Aug 2024 04:58:02 +0000 (07:58 +0300)]
xfs: use xas_for_each_marked in xfs_reclaim_inodes_count
xfs_reclaim_inodes_count iterates over all AGs to sum up the reclaimable
inodes counts. There is no point in grabbing a reference to the them or
unlock the RCU critical section for each iteration, so switch to the
more efficient xas_for_each_marked iterator.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Wed, 21 Aug 2024 05:31:46 +0000 (07:31 +0200)]
xfs: simplify tagged perag iteration
Pass the old perag structure to the tagged loop helpers so that they can
grab the old agno before releasing the reference. This removes the need
to separately track the agno and the iterator macro, and thus also
obsoletes the for_each_perag_tag syntactic sugar.
Christoph Hellwig [Wed, 21 Aug 2024 05:27:51 +0000 (07:27 +0200)]
xfs: move the tagged perag lookup helpers to xfs_icache.c
The tagged perag helpers are only used in xfs_icache.c in the kernel code
and not at all in xfsprogs. Move them to xfs_icache.c in preparation for
switching to an xarray, for which I have no plan to implement the tagged
lookup functions for userspace.
Darrick J. Wong [Thu, 15 Aug 2024 18:49:48 +0000 (11:49 -0700)]
xfs: check for shared rt extents when rebuilding rt file's data fork
When we're rebuilding the data fork of a realtime file, we need to
cross-reference each mapping with the rt refcount btree to ensure that
the reflink flag is set if there are any shared extents found.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:45 +0000 (11:49 -0700)]
xfs: walk the rt reference count tree when rebuilding rmap
When we're rebuilding the data device rmap, if we encounter a "refcount"
format fork, we have to walk the (realtime) refcount btree inode to
build the appropriate mappings.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:44 +0000 (11:49 -0700)]
xfs: check new rtbitmap records against rt refcount btree
When we're rebuilding the realtime bitmap, check the proposed free
extents against the rt refcount btree to make sure we don't commit any
grievous errors.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:43 +0000 (11:49 -0700)]
xfs: don't flag quota rt block usage on rtreflink filesystems
Quota space usage is allowed to exceed the size of the physical storage
when reflink is enabled. Now that we have reflink for the realtime
volume, apply this same logic to the rtb repair logic.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:41 +0000 (11:49 -0700)]
xfs: detect and repair misaligned rtinherit directory cowextsize hints
If we encounter a directory that has been configured to pass on a CoW
extent size hint to a new realtime file and the hint isn't an integer
multiple of the rt extent size, we should flag the hint for
administrative review and/or turn it off because that is a
misconfiguration.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:40 +0000 (11:49 -0700)]
xfs: check reference counts of gaps between rt refcount records
If there's a gap between records in the rt refcount btree, we ought to
cross-reference the gap with the rtrmap records to make sure that there
aren't any overlapping records for a region that doesn't have any shared
ownership.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:35 +0000 (11:49 -0700)]
xfs: check that the rtrefcount maxlevels doesn't increase when growing fs
The size of filesystem transaction reservations depends on the maximum
height (maxlevels) of the realtime btrees. Since we don't want a grow
operation to increase the reservation size enough that we'll fail the
minimum log size checks on the next mount, constrain growfs operations
if they would cause an increase in the rt refcount btree maxlevels.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:33 +0000 (11:49 -0700)]
xfs: apply rt extent alignment constraints to CoW extsize hint
The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint. Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.
Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:33 +0000 (11:49 -0700)]
xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files. This apparently was done to disable
delayed allocation for realtime files.
However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files. In this case, the
logic will incorrectly send the write through the delalloc write path.
Fix this by adjusting the logic slightly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:29 +0000 (11:49 -0700)]
xfs: enable CoW for realtime data
Update our write paths to support copy on write on the rt volume. This
works in more or less the same way as it does on the data device, with
the major exception that we never do delalloc on the rt volume.
Because we consider unwritten CoW fork staging extents to be incore
quota reservation, we update xfs_quota_reserve_blkres to support this
case. Though xfs doesn't allow rt and quota together, the change is
trivial and we shouldn't leave a logic bomb here.
While we're at it, add a missing xfs_mod_delalloc call when we remove
delalloc block reservation from the inode. This is largely irrelvant
since realtime files do not use delalloc, but we want to avoid leaving
logic bombs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Thu, 15 Aug 2024 18:49:24 +0000 (11:49 -0700)]
xfs: refactor xfs_reflink_find_shared
Move lookup of the perag structure from the callers into the helpers,
and return the offset into the extent of the shared region instead of
the block number that needs post-processing. This prepares the
callsites for the creation of an rt-specific variant in the next patch.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: port to the middle of the rtreflink series for cleanliness] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:23 +0000 (11:49 -0700)]
xfs: wire up a new inode fork type for the realtime refcount
Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:22 +0000 (11:49 -0700)]
xfs: add realtime refcount btree inode to metadata directory
Add a metadir path to select the realtime refcount btree inode and load
it at mount time. The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:19 +0000 (11:49 -0700)]
xfs: add a realtime flag to the refcount update log redo items
Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt. We'll
wire up the actual refcount code later.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:18 +0000 (11:49 -0700)]
xfs: prepare refcount functions to deal with rtrefcountbt
Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions. Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.
Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert it all at once.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:17 +0000 (11:49 -0700)]
xfs: add realtime refcount btree operations
Implement the generic btree operations needed to manipulate rtrefcount
btree blocks. This is different from the regular refcountbt in that we
allocate space from the filesystem at large, and are neither constrained
to the free space nor any particular AG.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents. We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:16 +0000 (11:49 -0700)]
xfs: define the on-disk realtime refcount btree format
Start filling out the rtrefcount btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate refcount btree blocks. This prepares the way for connecting
the btree operations implementation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Add new realtime refcount btree definitions. The realtime refcount btree
will be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:13 +0000 (11:49 -0700)]
xfs: prepare refcount btree cursor tracepoints for realtime
Rework the refcount btree cursor tracepoints in preparation to handle the
realtime refcount btree cursor. Mostly this involves renaming the field to
"refcbno" and extracting the group number from the cursor when possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 15 Aug 2024 18:49:12 +0000 (11:49 -0700)]
xfs: hook live realtime rmap operations during a repair operation
Hook the regular realtime rmap code when an rtrmapbt repair operation is
running so that we can unlock the AGF buffer to scan the filesystem and
keep the in-memory btree up to date during the scan.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>