www.infradead.org Git - users/hch/xfsprogs.git/log
xfs_db: support the realtime refcountbt
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]

Wire up various parts of xfs_db for realtime refcount support so that we
can dump the rt refcount btree contents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_db: display the realtime refcount btree contents
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]

Implement all the code we need to dump rtrefcountbt contents, starting
from the inode root.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
man: document userspace API changes due to rt reflink
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]

Update documentation to describe userspace ABI changes made for realtime
reflink support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
libfrog: enable scrubbing of the realtime refcount data
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]

Add a new entry so that we can scrub the rtrefcountbt and its metadata
directory tree path.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: scrub the metadir path of rt refcount btree files
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the refcount btree file for each rt group.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: scrub the realtime refcount btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:45 +0000 (12:44 -0700)]

Source kernel commit: 844d7f8755a67b01391da92b99a5342c8b2b83f4

Add code to scrub realtime refcount btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
xfs: report realtime refcount btree corruption errors to the health system
Darrick J. Wong [Tue, 15 Oct 2024 19:44:45 +0000 (12:44 -0700)]

Whenever we encounter corrupt realtime refcount btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: enable extent size hints for CoW operations
Darrick J. Wong [Tue, 15 Oct 2024 19:44:45 +0000 (12:44 -0700)]

Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: apply rt extent alignment constraints to CoW extsize hint
Darrick J. Wong [Tue, 15 Oct 2024 19:44:45 +0000 (12:44 -0700)]

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
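A minimal sketch of the alignment rule described above, assuming hint values are in filesystem blocks and that a hint is only valid on the realtime device if it is a whole multiple of rextsize (0 meaning "no hint"). The enum and function names are hypothetical, not xfsprogs or kernel symbols, and treating a misaligned regular-file hint as corruption is this sketch's own assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum hint_verdict { HINT_OK, HINT_CLEAR, HINT_CORRUPT };

/* A hint of 0 means "no hint" and is trivially aligned. */
static bool hint_is_rt_aligned(uint32_t hint_fsb, uint32_t rextsize)
{
	assert(rextsize > 0);
	return hint_fsb % rextsize == 0;
}

/*
 * Apply the same rule to both the extent size hint and the CoW extent
 * size hint of a realtime file, or of a directory that passes hints on
 * to new rt files. Directories get the bad hint dropped; regular files
 * are reported (an assumption of this sketch).
 */
static enum hint_verdict check_rt_hint(uint32_t hint_fsb, uint32_t rextsize,
				       bool is_dir)
{
	if (hint_is_rt_aligned(hint_fsb, rextsize))
		return HINT_OK;
	return is_dir ? HINT_CLEAR : HINT_CORRUPT;
}
```

Running one helper over both hints is the point: whatever rextsize turns out to be, the two hints are judged by identical rules.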
xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]

Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files.  This apparently was done to disable
delayed allocation for realtime files.

However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files.  In this case, the
logic will incorrectly send the write through the delalloc write path.

Fix this by adjusting the logic slightly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: recover CoW leftovers in the realtime volume
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]

Scan the realtime refcount tree at mount time to get rid of leftover
CoW staging extents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
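A sketch of the mount-time cleanup, using a hypothetical in-memory record type in place of real rtrefcountbt records. It assumes staging extents are recognizable by a CoW marker and always carry a refcount of 1; `free_staging_extent` is a stand-in that only counts blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: `cow` marks a CoW staging extent rather than a
 * shared-extent refcount record. */
struct refc_rec {
	uint64_t start;
	uint64_t len;
	uint32_t refcount;
	bool     cow;
};

static uint64_t freed_blocks;	/* stand-in bookkeeping for this sketch */

static void free_staging_extent(const struct refc_rec *rec)
{
	freed_blocks += rec->len;
}

/* No CoW can be in flight at mount time, so every staging extent still
 * in the tree is left over from before a crash and can be released. */
static size_t recover_cow_leftovers(const struct refc_rec *recs, size_t nr)
{
	size_t found = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!recs[i].cow)
			continue;	/* ordinary shared extent: keep */
		assert(recs[i].refcount == 1);
		free_staging_extent(&recs[i]);
		found++;
	}
	return found;
}
```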
xfs: allow inodes to have the realtime and reflink flags
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]

Now that we can share blocks between realtime files, allow this
combination.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: compute rtrmap btree max levels when reflink enabled
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]

Compute the maximum possible height of the realtime rmap btree when
reflink is enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
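A sketch of two ways to bound btree height, with hypothetical helper names. The first derives the height from a record count, with every block at its minimum fill. With reflink, one rt block can have many owners, so the record count is no longer bounded by the size of the device; the second helper instead bounds the height by the number of blocks the tree could occupy, assuming the root has at least two children and every lower level is at minimum fanout. Whether the rtrmapbt code uses exactly this formula is not stated here.

```c
#include <assert.h>

/* Height needed for `records` records when every block is minimally full. */
static unsigned int height_from_records(unsigned long long records,
					unsigned int minrecs_leaf,
					unsigned int minrecs_node)
{
	assert(minrecs_leaf >= 1 && minrecs_node >= 2);
	unsigned long long blocks = (records + minrecs_leaf - 1) / minrecs_leaf;
	unsigned int height = 1;

	while (blocks > 1) {
		blocks = (blocks + minrecs_node - 1) / minrecs_node;
		height++;
	}
	return height;
}

/*
 * Tallest tree that fits in `blocks` blocks: the root has at least two
 * children and each lower level is at minimum fanout, so level n below
 * the root holds at least 2 * minrecs_node^(n-1) blocks.
 */
static unsigned int height_from_space(unsigned long long blocks,
				      unsigned int minrecs_node)
{
	assert(minrecs_node >= 2);
	if (blocks == 0)
		return 0;

	unsigned long long used = 1;	/* the root */
	unsigned long long level = 2;	/* smallest possible next level */
	unsigned int height = 1;

	while (used + level <= blocks) {
		used += level;
		level *= minrecs_node;
		height++;
	}
	return height;
}
```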
xfs: update rmap to allow cow staging extents in the rt rmap
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]

Don't error out on CoW staging extent records when realtime reflink is
enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: create routine to allocate and initialize a realtime refcount btree inode
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]

Create a library routine to allocate and initialize an empty realtime
refcountbt inode.  We'll use this for growfs, mkfs, and repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: wire up realtime refcount btree cursors
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]

Wire up realtime refcount btree cursors wherever they're needed
throughout the code base.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: wire up a new inode fork type for the realtime refcount
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]

Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: add metadata reservations for realtime refcount btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime refcount btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: add realtime refcount btree inode to metadata directory
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]

Add a metadir path to select the realtime refcount btree inode and load
it at mount time.  The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: add a realtime flag to the refcount update log redo items
Darrick J. Wong [Tue, 15 Oct 2024 19:44:42 +0000 (12:44 -0700)]

Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
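A sketch of how a realtime flag can share one 32-bit field with the update type, so that a log decoder (for example the xfs_logprint RUI reporting in this series) can tell which btree an intent targets. The mask and bit values, and the enum names, are illustrative assumptions, not the on-disk constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT_TYPE_MASK	0xFFu		/* low byte: update type (hypothetical) */
#define EXT_REALTIME	(1u << 31)	/* high bit: targets the rt btree (hypothetical) */

enum refc_update { REFC_INCREASE = 1, REFC_DECREASE, REFC_ALLOC_COW, REFC_FREE_COW };

static uint32_t encode_update_flags(enum refc_update type, bool realtime)
{
	assert(((uint32_t)type & ~EXT_TYPE_MASK) == 0);
	return (uint32_t)type | (realtime ? EXT_REALTIME : 0u);
}

static enum refc_update decode_update_type(uint32_t flags)
{
	return (enum refc_update)(flags & EXT_TYPE_MASK);
}

static bool update_is_realtime(uint32_t flags)
{
	return (flags & EXT_REALTIME) != 0;
}
```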
xfs: prepare refcount functions to deal with rtrefcountbt
Darrick J. Wong [Tue, 15 Oct 2024 19:44:42 +0000 (12:44 -0700)]

Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions.  Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.

Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert it all at once.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: add realtime refcount btree operations
Darrick J. Wong [Tue, 15 Oct 2024 19:44:42 +0000 (12:44 -0700)]

Implement the generic btree operations needed to manipulate rtrefcount
btree blocks. This is different from the regular refcountbt in that we
allocate space from the filesystem at large, and are neither constrained
to the free space nor any particular AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: realtime refcount btree transaction reservations
Darrick J. Wong [Tue, 15 Oct 2024 19:44:42 +0000 (12:44 -0700)]

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
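A back-of-the-envelope sketch of the reservation reasoning above. It uses a deliberately simplified, hypothetical cost model (the old and new block logged at every level of a full-height split, plus one new root block); real XFS log reservations account for many more items, so only the structure of the sum should be taken from this.

```c
#include <assert.h>

/* Hypothetical cost of a split that propagates through every level. */
static unsigned long long split_log_bytes(unsigned int levels,
					  unsigned int blocksize)
{
	assert(levels >= 1);
	return (2ULL * levels + 1) * blocksize;
}

/* As described above: one split in the rt btree, plus a second split in
 * the data-device btree caused by recording the first one. */
static unsigned long long rt_map_log_res(unsigned int rt_levels,
					 unsigned int data_levels,
					 unsigned int blocksize)
{
	return split_log_bytes(rt_levels, blocksize) +
	       split_log_bytes(data_levels, blocksize);
}
```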
xfs: introduce realtime refcount btree ondisk definitions
Darrick J. Wong [Tue, 15 Oct 2024 19:44:41 +0000 (12:44 -0700)]

Add the ondisk structure definitions for realtime refcount btrees. The
realtime refcount btree will be rooted from a hidden inode so it needs
to have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate refcount btree
blocks. This prepares the way for connecting the btree operations
implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: namespace the maximum length/refcount symbols
Darrick J. Wong [Tue, 15 Oct 2024 19:44:41 +0000 (12:44 -0700)]

Actually namespace these variables properly, so that readers can tell
that this is an XFS symbol, and that it's for the refcount
functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
mkfs: create the realtime rmap inode
Darrick J. Wong [Tue, 15 Oct 2024 19:44:41 +0000 (12:44 -0700)]

Create a realtime rmapbt inode if we format the fs with realtime
and rmap.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_logprint: report realtime RUIs
Darrick J. Wong [Tue, 15 Oct 2024 19:44:40 +0000 (12:44 -0700)]

Decode the RUI format just enough to report if an RUI targets the
realtime device or not.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_repair: reserve per-AG space while rebuilding rt metadata
Darrick J. Wong [Tue, 15 Oct 2024 19:44:40 +0000 (12:44 -0700)]

Realtime metadata btrees can consume quite a bit of space on a full
filesystem.  Since the metadata are just regular files, we need to
make the per-AG reservations to avoid overfilling any of the AGs while
rebuilding metadata.  This avoids the situation where a filesystem comes
straight from repair and immediately trips over not having enough space
in an AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_repair: rebuild the bmap btree for realtime files
Darrick J. Wong [Tue, 15 Oct 2024 19:44:40 +0000 (12:44 -0700)]

Use the realtime rmap btree information to rebuild an inode's data fork
when appropriate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_repair: check for global free space concerns with default btree slack levels
Darrick J. Wong [Tue, 15 Oct 2024 19:44:40 +0000 (12:44 -0700)]

It's possible that before repair was started, the filesystem might have
been nearly full, and its metadata btree blocks could all have been
nearly full.  If we then rebuild the btrees with blocks that are only
75% full, that expansion might be enough to run out of free space.  The
solution to this is to pack the new blocks completely full if we fear
running out of space.

Previously, we only had to check and decide that on a per-AG basis.
However, now that XFS can have filesystems with metadata btrees rooted
in inodes, we have a global free space concern because there might be
enough space in each AG to regenerate the AG btrees at 75%, but that
might not leave enough space to regenerate the inode btrees, even if we
fill those blocks to 100%.

Hence we need to precompute the worst case space usage for all btrees in
the filesystem and compare /that/ against the global free space to
decide if we're going to pack the btrees maximally to conserve space.
That decision can override the per-AG determination.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
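A sketch of the global decision, assuming hypothetical per-btree estimates (record count plus maximum records per leaf and per node) and that `free_blocks` is the free space available to hold every rebuilt btree. 75% is the default load mentioned above; the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct btree_est {
	unsigned long long nrecs;
	unsigned int leaf_maxrecs;
	unsigned int node_maxrecs;
};

/* Records (or pointers) per block when filled to `pct` percent. */
static unsigned int load_at(unsigned int maxrecs, unsigned int pct,
			    unsigned int floor)
{
	unsigned int load = maxrecs * pct / 100;
	return load < floor ? floor : load;
}

/* Total blocks for a tree with the given per-block loads. */
static unsigned long long btree_blocks(unsigned long long nrecs,
				       unsigned int leaf_load,
				       unsigned int node_load)
{
	assert(leaf_load >= 1 && node_load >= 2);
	if (nrecs == 0)
		return 1;	/* an empty tree still has a root block */

	unsigned long long level = (nrecs + leaf_load - 1) / leaf_load;
	unsigned long long total = level;

	while (level > 1) {
		level = (level + node_load - 1) / node_load;
		total += level;
	}
	return total;
}

/* Pack every btree full if the slack-loaded total would not fit. */
static bool must_pack_fully(const struct btree_est *bt, size_t nr,
			    unsigned long long free_blocks,
			    unsigned int slack_pct)
{
	unsigned long long need = 0;

	for (size_t i = 0; i < nr; i++)
		need += btree_blocks(bt[i].nrecs,
				     load_at(bt[i].leaf_maxrecs, slack_pct, 1),
				     load_at(bt[i].node_maxrecs, slack_pct, 2));
	return need > free_blocks;
}
```

Forcing at least two pointers per node keeps the height loop from stalling when a slack percentage rounds a tiny node capacity down.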
xfs_repair: rebuild the realtime rmap btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]

Rebuild the realtime rmap btree file from the reverse mapping records we
gathered from walking the inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_repair: always check realtime file mappings against incore info
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]

Curiously, the xfs_repair code that processes data fork mappings of
realtime files doesn't actually compare the mappings against the incore
state map during the !check_dups phase (aka phase 3).  As a result, we
lose the opportunity to clear damaged realtime data forks before we get
to crosslinked file checking in phase 4, which results in ondisk
metadata errors calling do_error, which aborts repair.

Split the process_rt_rec_state code into two functions: one to check the
mapping, and another to update the incore state.  The first one can be
called to help us decide if we're going to zap the fork, and the second
one updates the incore state if we decide to keep the fork.  We already
do this for regular data files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
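A heavily simplified model of the check/update split, with a hypothetical two-state incore map per rt extent. All mappings in a fork are checked before any state is updated, so a fork that gets zapped leaves no half-recorded claims behind. This sketch does not detect overlaps between mappings of the same fork.

```c
#include <assert.h>
#include <stddef.h>

enum rtx_state { RTX_FREE, RTX_INUSE };

struct rt_mapping { size_t start, count; };

/* Inspect only: reject mappings off the end or onto claimed extents. */
static int check_rt_mapping(const enum rtx_state *map, size_t len,
			    const struct rt_mapping *m)
{
	assert(map != NULL && m != NULL);
	if (m->start > len || m->count > len - m->start)
		return -1;		/* runs off the end of the device */
	for (size_t i = m->start; i < m->start + m->count; i++)
		if (map[i] != RTX_FREE)
			return -1;	/* already claimed by another owner */
	return 0;
}

/* Commit: only called once the whole fork is known to be kept. */
static void update_rt_mapping(enum rtx_state *map, const struct rt_mapping *m)
{
	for (size_t i = m->start; i < m->start + m->count; i++)
		map[i] = RTX_INUSE;
}

static int process_rt_fork(enum rtx_state *map, size_t len,
			   const struct rt_mapping *maps, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (check_rt_mapping(map, len, &maps[i]))
			return -1;	/* caller zaps the fork; map untouched */
	for (size_t i = 0; i < nr; i++)
		update_rt_mapping(map, &maps[i]);
	return 0;
}
```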
xfs_repair: check existing realtime rmapbt entries against observed rmaps
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime rmap btree
(particularly if we're in -n mode) to detect rtrmapbt problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
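A sketch of the comparison, assuming both the observed records and the ondisk rtrmapbt records are available as arrays sorted by (start, owner, length). Real rmap records also carry offsets and flags, which are omitted here; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rmap_rec { uint64_t start, len, owner; };

static int rec_cmp(const struct rmap_rec *a, const struct rmap_rec *b)
{
	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	if (a->owner != b->owner)
		return a->owner < b->owner ? -1 : 1;
	if (a->len != b->len)
		return a->len < b->len ? -1 : 1;
	return 0;
}

/* Walk both sorted lists in lockstep, counting records that appear in
 * only one of them. */
static size_t count_rmap_mismatches(const struct rmap_rec *observed, size_t no,
				    const struct rmap_rec *ondisk, size_t nd)
{
	size_t i = 0, j = 0, bad = 0;

	assert((observed != NULL || no == 0) && (ondisk != NULL || nd == 0));
	while (i < no && j < nd) {
		int c = rec_cmp(&observed[i], &ondisk[j]);

		if (c == 0) {
			i++;
			j++;
		} else if (c < 0) {
			bad++;		/* observed but missing from the btree */
			i++;
		} else {
			bad++;		/* btree has a record nobody claims */
			j++;
		}
	}
	return bad + (no - i) + (nd - j);
}
```

A record whose length differs is counted twice: once as missing from the btree and once as extra.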
xfs_repair: find and mark the rtrmapbt inodes
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]

Make sure that we find the realtime rmapbt inodes and mark them
appropriately, just in case we find a rogue inode claiming to be an
rtrmap, or garbage in the metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_repair: refactor realtime inode check
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]

Refactor the realtime bitmap and summary checks into a helper function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_repair: create a new set of incore rmap information for rt groups
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]

Create a parallel set of "xfs_ag_rmap" structures to cache information
about reverse mappings for the realtime groups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_repair: use realtime rmap btree data to check block types
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]

Use the realtime rmap btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_repair: flag suspect long-format btree blocks
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]

Pass a "suspect" counter through scan_lbtree just like we do for
short-format btree blocks, and increment its value when we encounter
blocks with bad CRCs or outright corruption.  This makes it so that
repair actually catches bmbt blocks with bad CRCs or other verifier
errors.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
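A sketch of threading a suspect counter through a recursive long-format btree walk, using a hypothetical in-memory block type. It assumes a CRC failure still allows descending into the block's children while a verifier failure does not; that policy is this sketch's assumption, not necessarily what scan_lbtree does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct lblock {
	bool crc_ok;
	bool verifier_ok;
	size_t nchildren;
	const struct lblock **children;
};

static void scan_lbtree_sketch(const struct lblock *blk, int *suspect)
{
	assert(blk != NULL && suspect != NULL);
	if (!blk->verifier_ok) {
		(*suspect)++;
		return;		/* contents untrustworthy: do not descend */
	}
	if (!blk->crc_ok)
		(*suspect)++;	/* checksum wrong, structure still walkable */
	for (size_t i = 0; i < blk->nchildren; i++)
		scan_lbtree_sketch(blk->children[i], suspect);
}

/* The caller flags the whole fork if anything below it looked bad. */
static bool fork_is_suspect(const struct lblock *root)
{
	int suspect = 0;

	scan_lbtree_sketch(root, &suspect);
	return suspect > 0;
}
```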
xfs_spaceman: report health status of the realtime rmap btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]

Add reporting of the rt rmap btree health to spaceman.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_db: make fsmap query the realtime reverse mapping tree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]

Extend the 'fsmap' debugger command to support querying the realtime
rmap btree via a new -r argument.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_db: copy the realtime rmap btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]

Copy the realtime rmapbt when we're metadumping the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_db: support the realtime rmapbt
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]

Wire up various parts of xfs_db for realtime rmap support so that we can
dump the btree contents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_db: display the realtime rmap btree contents
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]

Implement all the code we need to dump rtrmapbt contents, starting
from the inode root.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_db: don't abort when bmapping on a non-extents/bmbt fork
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]

We're going to introduce new fork formats, so let's fix the problem that
xfs_db's bmap command aborts when the fork format isn't one of the
existing ones.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
man: document userspace API changes due to rt rmap
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]

Update documentation to describe userspace ABI changes made for realtime
rmap support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
libfrog: enable scrubbing of the realtime rmap
Darrick J. Wong [Tue, 15 Oct 2024 19:44:36 +0000 (12:44 -0700)]

Add a new entry so that we can scrub the rtrmapbt and its metadata
directory tree path too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: create a shadow rmap btree during realtime rmap repair
Darrick J. Wong [Tue, 15 Oct 2024 19:44:36 +0000 (12:44 -0700)]

Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: online repair of the realtime rmap btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:36 +0000 (12:44 -0700)]

Source kernel commit: f813af307d62d4c4d620a358bbd406f89ffdeca2

Repair the realtime rmap btree while mounted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
xfs: online repair of realtime bitmaps for a realtime group
Darrick J. Wong [Tue, 15 Oct 2024 19:44:36 +0000 (12:44 -0700)]

For a given rt group, regenerate the bitmap contents from the group's
realtime rmap btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
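A sketch of regenerating the bitmap for one group from its rmap records, assuming block numbers are relative to the start of the group, a byte-addressed bitmap (the real one uses larger words), and the XFS convention that a set bit marks a free rt extent. The record type and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rt_rmap { uint64_t startblock, blockcount; };	/* in rt blocks */

/* Start from "everything free", then clear the bit of every rt extent
 * that any rmap record touches; a partly mapped extent counts as used. */
static void rebuild_rtbitmap(uint8_t *bitmap, size_t nextents,
			     uint32_t rextsize,
			     const struct rt_rmap *recs, size_t nr)
{
	assert(rextsize > 0);
	for (size_t x = 0; x < nextents; x++)
		bitmap[x / 8] |= (uint8_t)(1u << (x % 8));

	for (size_t i = 0; i < nr; i++) {
		if (recs[i].blockcount == 0)
			continue;
		uint64_t first = recs[i].startblock / rextsize;
		uint64_t last = (recs[i].startblock + recs[i].blockcount - 1) /
				rextsize;

		assert(last < nextents);
		for (uint64_t x = first; x <= last; x++)
			bitmap[x / 8] &= (uint8_t)~(1u << (x % 8));
	}
}
```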
xfs: scrub the metadir path of rt rmap btree files
Darrick J. Wong [Tue, 15 Oct 2024 19:44:35 +0000 (12:44 -0700)]

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the rmap btree file for each rt group.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: scrub the realtime rmapbt
Darrick J. Wong [Tue, 15 Oct 2024 19:44:35 +0000 (12:44 -0700)]

Check the realtime reverse mapping btree against the rtbitmap, and
modify the rtbitmap scrub to check against the rtrmapbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: report realtime rmap btree corruption errors to the health system
Darrick J. Wong [Tue, 15 Oct 2024 19:44:35 +0000 (12:44 -0700)]

Whenever we encounter corrupt realtime rmap btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: create routine to allocate and initialize a realtime rmap btree inode
Darrick J. Wong [Tue, 15 Oct 2024 19:44:35 +0000 (12:44 -0700)]

Create a library routine to allocate and initialize an empty realtime
rmapbt inode.  We'll use this for mkfs and repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: wire up rmap map and unmap to the realtime rmapbt
Darrick J. Wong [Tue, 15 Oct 2024 19:44:35 +0000 (12:44 -0700)]

Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks.  This enables us to
perform rmap operations against the correct btree.

[Contains a minor bugfix from hch]

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: allow inodes with zero extents but nonzero nblocks
Darrick J. Wong [Tue, 15 Oct 2024 19:44:34 +0000 (12:44 -0700)]

Metadata inodes that store btrees will have zero extents and a nonzero
nblocks.  Adjust the inode verifier so that this combination is not
flagged.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: wire up a new inode fork type for the realtime rmap
Darrick J. Wong [Tue, 15 Oct 2024 19:44:34 +0000 (12:44 -0700)]

Plumb in the pieces we need to embed the root of the realtime rmap
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: add metadata reservations for realtime rmap btrees
Darrick J. Wong [Tue, 15 Oct 2024 19:44:34 +0000 (12:44 -0700)]

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime rmap btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: add realtime reverse map inode to metadata directory
Darrick J. Wong [Tue, 15 Oct 2024 19:44:34 +0000 (12:44 -0700)]

Add a metadir path to select the realtime rmap btree inode and load
it at mount time.  The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: add a realtime flag to the rmap update log redo items
Darrick J. Wong [Tue, 15 Oct 2024 19:44:33 +0000 (12:44 -0700)]

Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt.  We'll
wire up the actual rmap code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: prepare rmap functions to deal with rtrmapbt
Darrick J. Wong [Tue, 15 Oct 2024 19:44:33 +0000 (12:44 -0700)]

Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions.  Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: add realtime rmap btree operations
Darrick J. Wong [Tue, 15 Oct 2024 19:44:33 +0000 (12:44 -0700)]

Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are neither
constrained to the free space nor any particular AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: realtime rmap btree transaction reservations
Darrick J. Wong [Tue, 15 Oct 2024 19:44:33 +0000 (12:44 -0700)]

Source kernel commit: 2b08b631d6ad701ba6dda366fde4ae19cb66774a

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
xfs: introduce realtime rmap btree ondisk definitions
Darrick J. Wong [Tue, 15 Oct 2024 19:44:33 +0000 (12:44 -0700)]

Add the ondisk structure definitions for realtime rmap btrees. The
realtime rmap btree will be rooted from a hidden inode so it needs to
have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate rmap btree
blocks. This prepares the way for connecting the btree operations
implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Darrick J. Wong [Tue, 15 Oct 2024 19:44:32 +0000 (12:44 -0700)]

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
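A sketch of why passing the fsbno straight through is simpler. A data-device fsblock number packs the AG number above the low sb_agblklog bits and the AG block number in those low bits (compare XFS_AGB_TO_FSB, XFS_FSB_TO_AGNO and XFS_FSB_TO_AGBNO), so splitting it apart only to rebuild it inside the intent is wasted work. The helper and struct names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t fsblock_t;
typedef uint32_t agnumber_t;
typedef uint32_t agblock_t;

static fsblock_t agb_to_fsb(unsigned int agblklog, agnumber_t agno,
			    agblock_t agbno)
{
	assert(agblklog < 32 && agbno < (1ULL << agblklog));
	return ((fsblock_t)agno << agblklog) | agbno;
}

static agnumber_t fsb_to_agno(unsigned int agblklog, fsblock_t fsbno)
{
	return (agnumber_t)(fsbno >> agblklog);
}

static agblock_t fsb_to_agbno(unsigned int agblklog, fsblock_t fsbno)
{
	return (agblock_t)(fsbno & ((1ULL << agblklog) - 1));
}

/* Hypothetical deferred-work item that simply stores the fsbno. */
struct rmap_intent_sketch { fsblock_t startblock; uint64_t blockcount; };

static struct rmap_intent_sketch make_intent(fsblock_t fsbno, uint64_t len)
{
	return (struct rmap_intent_sketch){ .startblock = fsbno,
					    .blockcount = len };
}
```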
xfs: allow inode-based btrees to reserve space in the data device
Darrick J. Wong [Tue, 15 Oct 2024 19:44:32 +0000 (12:44 -0700)]

Create a new space reservation scheme so that btree metadata for the
realtime volume can reserve space in the data device to avoid space
underruns.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: update btree keys correctly when _insrec splits an inode root block
Darrick J. Wong [Tue, 15 Oct 2024 19:44:32 +0000 (12:44 -0700)]

In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.

Cc: <stable@vger.kernel.org> # v4.8
Fixes: 2c813ad66a7218 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: support storing records in the inode core root
Darrick J. Wong [Tue, 15 Oct 2024 19:44:31 +0000 (12:44 -0700)]

Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree.  This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: hoist the node iroot update code out of xfs_btree_kill_iroot
Darrick J. Wong [Tue, 15 Oct 2024 19:44:31 +0000 (12:44 -0700)]

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function.  Remove some unnecessary conditionals and clean
up a few function calls in the new function.  Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: hoist the node iroot update code out of xfs_btree_new_iroot
Darrick J. Wong [Tue, 15 Oct 2024 19:44:30 +0000 (12:44 -0700)]

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function.  Note that the new function explicitly computes
the keys of the new child block and stores them in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
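A sketch of computing a child block's key pair in an overlapping-interval btree such as the rmapbt, using a hypothetical interval record (length at least 1) in place of the real rmap key, which also includes owner and offset. With records sorted by start, the low key is the first start, but the high key needs a full scan, which is why it has to be computed rather than left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ival { uint64_t start, len; };		/* hypothetical record */
struct ival_keys { uint64_t low, high; };

/* An early long record can end after a later short one, so scan all. */
static struct ival_keys compute_block_keys(const struct ival *recs, size_t nr)
{
	assert(nr > 0);
	struct ival_keys k = { recs[0].start, recs[0].start + recs[0].len - 1 };

	for (size_t i = 1; i < nr; i++) {
		uint64_t last = recs[i].start + recs[i].len - 1;

		if (last > k.high)
			k.high = last;
	}
	return k;
}
```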
xfs: tidy up xfs_bmap_broot_realloc a bit
Darrick J. Wong [Tue, 15 Oct 2024 19:44:30 +0000 (12:44 -0700)]

Hoist out the code that migrates broot pointers during a resize
operation to avoid code duplication and streamline the caller.  Also
use the correct bmbt pointer type for the sizeof operation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
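A sketch of the pointer migration, assuming a hypothetical root layout with the header omitted: an array of keys sized for the root's maximum record count, followed by an array of pointers. Because the pointer array's offset depends on that maximum, resizing the root means moving the live pointers. PTR_SIZE stands in for the sizeof of the on-disk pointer type, which is the sizeof the commit corrects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_SIZE 8	/* hypothetical */
#define PTR_SIZE 8	/* stands in for sizeof(on-disk pointer type) */

static size_t root_maxrecs(size_t bytes)
{
	return bytes / (KEY_SIZE + PTR_SIZE);
}

/* The pointer array starts right after maxrecs keys. */
static unsigned char *root_ptrs(unsigned char *root, size_t bytes)
{
	return root + root_maxrecs(bytes) * KEY_SIZE;
}

/* Call after growing the buffer, or before shrinking it; memmove copes
 * with the old and new pointer arrays overlapping. */
static void migrate_root_ptrs(unsigned char *root, size_t old_bytes,
			      size_t new_bytes, size_t numrecs)
{
	assert(numrecs <= root_maxrecs(old_bytes) &&
	       numrecs <= root_maxrecs(new_bytes));
	memmove(root_ptrs(root, new_bytes), root_ptrs(root, old_bytes),
		numrecs * PTR_SIZE);
}
```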
xfs: make xfs_iroot_realloc a bmap btree function
Darrick J. Wong [Tue, 15 Oct 2024 19:44:29 +0000 (12:44 -0700)]

Move the inode fork btree root reallocation function into the bmap
btree code, because it's now mostly bmbt-specific.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs: make xfs_iroot_realloc take the new numrecs instead of deltas
Darrick J. Wong [Tue, 15 Oct 2024 19:44:29 +0000 (12:44 -0700)]
xfs: make xfs_iroot_realloc take the new numrecs instead of deltas

Change the calling signature of xfs_iroot_realloc to take the ifork and
the new number of records in the btree block, not a diff against the
current number.  This will make the callsites easier to understand.

Note that this function is misnamed because it is very specific to the
single type of inode-rooted btree currently supported.  This will be
addressed in a subsequent patch.

Return the new btree root to reduce the amount of code clutter.
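
To illustrate the calling convention change, here is a toy sketch that
assumes the root is a plain int array.  struct toy_root and
toy_root_realloc are hypothetical and are not the xfs_iroot_realloc
implementation.  The caller passes the absolute record count it wants
and gets the (possibly moved) record array back.

```c
#include <stdlib.h>

/* Hypothetical in-memory root: a record count plus a record array. */
struct toy_root {
	int	numrecs;
	int	*recs;
};

/*
 * Resize the root to hold exactly new_numrecs records and return the
 * (possibly moved) record array.  Callers pass the absolute count they
 * want, so no caller has to compute "current + delta" itself.
 * Returns NULL on allocation failure (root left unchanged) or when
 * new_numrecs is zero (array freed).
 */
static int *toy_root_realloc(struct toy_root *root, int new_numrecs)
{
	int	*recs;

	if (new_numrecs == 0) {
		free(root->recs);
		root->recs = NULL;
		root->numrecs = 0;
		return NULL;
	}

	recs = realloc(root->recs, (size_t)new_numrecs * sizeof(*recs));
	if (!recs)
		return NULL;	/* old array is still valid and unchanged */

	/* Zero any newly added slots so they never hold stale data. */
	for (int i = root->numrecs; i < new_numrecs; i++)
		recs[i] = 0;

	root->recs = recs;
	root->numrecs = new_numrecs;
	return recs;
}
```

With an absolute count, a caller that is about to hold n records simply
asks for n instead of working out a signed delta from the current count.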

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs: refactor the inode fork memory allocation functions
Darrick J. Wong [Tue, 15 Oct 2024 19:44:28 +0000 (12:44 -0700)]
xfs: refactor the inode fork memory allocation functions

Hoist the code that allocates, frees, and reallocates if_broot into a
single xfs_iroot_krealloc function.  Eventually we're going to push
xfs_iroot_realloc into the btree ops structure to handle multiple
inode-rooted btrees, but first let's separate out the bits that should
stay in xfs_inode_fork.c.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs: tidy up xfs_iroot_realloc
Darrick J. Wong [Tue, 15 Oct 2024 19:44:28 +0000 (12:44 -0700)]
xfs: tidy up xfs_iroot_realloc

Tidy up this function a bit before we start refactoring the memory
handling and moving the function to the bmbt code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs: port ondisk structure checks from xfs/122 to the kernel
Darrick J. Wong [Fri, 1 Nov 2024 17:34:36 +0000 (10:34 -0700)]
xfs: port ondisk structure checks from xfs/122 to the kernel

Check the ondisk structure sizes and offsets with every kernel and
userspace build, so we can drop the nonsense in xfs/122.  Roughly
drafted with:

sed -e 's/^offsetof/\tXFS_CHECK_OFFSET/g' \
-e 's/^sizeof/\tXFS_CHECK_STRUCT_SIZE/g' \
-e 's/ = \([0-9]*\)/,\t\t\t\1);/g' \
-e 's/xfs_sb_t/struct xfs_dsb/g' \
-e 's/),/,/g' \
-e 's/xfs_\([a-z0-9_]*\)_t,/struct xfs_\1,/g' \
< tests/xfs/122.out | sort

and then manual fixups.
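
A minimal sketch of what such compile-time checks look like, assuming
C11 _Static_assert.  The real macros in xfs_ondisk.h are
XFS_CHECK_STRUCT_SIZE and XFS_CHECK_OFFSET; the DEMO_* macros and
struct demo_alloc_rec below are hypothetical stand-ins, not the kernel
definitions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for XFS_CHECK_STRUCT_SIZE and XFS_CHECK_OFFSET,
 * written with C11 _Static_assert so that a size or layout mismatch
 * fails the build rather than a separate test run.
 */
#define DEMO_CHECK_STRUCT_SIZE(type, size) \
	_Static_assert(sizeof(type) == (size), "size of " #type " changed")
#define DEMO_CHECK_OFFSET(type, member, off) \
	_Static_assert(offsetof(type, member) == (off), \
		       "offset of " #type "." #member " changed")

/* Hypothetical two-field record; real XFS stores such fields big-endian. */
struct demo_alloc_rec {
	uint32_t	ar_startblock;
	uint32_t	ar_blockcount;
};

/* One check per line, the same shape the sed script above produces. */
DEMO_CHECK_STRUCT_SIZE(struct demo_alloc_rec,			8);
DEMO_CHECK_OFFSET(struct demo_alloc_rec, ar_startblock,		0);
DEMO_CHECK_OFFSET(struct demo_alloc_rec, ar_blockcount,		4);
```

Because the compiler evaluates these checks, every build that includes
them catches a layout change, which is what makes the xfs/122 checks
redundant.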

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs: separate space btree structures in xfs_ondisk.h
Darrick J. Wong [Fri, 1 Nov 2024 17:34:22 +0000 (10:34 -0700)]
xfs: separate space btree structures in xfs_ondisk.h

Create a separate section for space management btrees so that they're
not mixed in with file structures.  Ignore the dsb stuff sprinkled
around for now, because we'll deal with that in a subsequent patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs: convert struct typedefs in xfs_ondisk.h
Darrick J. Wong [Fri, 1 Nov 2024 17:34:21 +0000 (10:34 -0700)]
xfs: convert struct typedefs in xfs_ondisk.h

Replace xfs_foo_t with struct xfs_foo where appropriate.  The next patch
will import more checks from xfs/122, and it's easier to automate
deduplication if we don't have to reason about typedefs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs: enable metadata directory feature
Darrick J. Wong [Tue, 15 Oct 2024 19:44:28 +0000 (12:44 -0700)]
xfs: enable metadata directory feature

Enable the metadata directory feature.  With this feature, all metadata
inodes are placed in the metadata directory, and the only inumbers in
the superblock are the roots of the two directory trees.

The RT device is now sharded into a number of rtgroups, where zero
rtgroups means that no RT extents are supported, and the traditional XFS
stub RT bitmap and summary inodes don't exist.  A single rtgroup gives
roughly identical behavior to the traditional RT setup, but now with
checksummed and self-identifying free space metadata.

For quota, the quota options are read from the superblock unless
explicitly overridden via mount options.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 months agomkfs: enable rt quota options
Darrick J. Wong [Tue, 15 Oct 2024 19:44:27 +0000 (12:44 -0700)]
mkfs: enable rt quota options

Now that the kernel supports quota on filesystems with realtime devices,
allow people to format filesystems with permanent quota options.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_quota: report warning limits for realtime space quotas
Darrick J. Wong [Tue, 15 Oct 2024 19:44:27 +0000 (12:44 -0700)]
xfs_quota: report warning limits for realtime space quotas

Report the number of warnings that a user will get for exceeding the
realtime space soft limit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agomkfs: add quota flags when setting up filesystem
Darrick J. Wong [Tue, 15 Oct 2024 19:44:27 +0000 (12:44 -0700)]
mkfs: add quota flags when setting up filesystem

If we're creating a metadir filesystem, the quota accounting and
enforcement flags persist until the sysadmin changes them.  Add a means
to specify those qflags at format time.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_repair: try not to trash qflags on metadir filesystems
Darrick J. Wong [Tue, 15 Oct 2024 19:44:27 +0000 (12:44 -0700)]
xfs_repair: try not to trash qflags on metadir filesystems

Try to preserve the accounting and enforcement quota flags when
repairing filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_repair: support quota inodes in the metadata directory
Darrick J. Wong [Tue, 15 Oct 2024 19:44:27 +0000 (12:44 -0700)]
xfs_repair: support quota inodes in the metadata directory

Handle quota inodes on metadir filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_repair: hoist the secondary sb qflags handling
Darrick J. Wong [Tue, 15 Oct 2024 19:44:26 +0000 (12:44 -0700)]
xfs_repair: hoist the secondary sb qflags handling

Hoist all the secondary superblock qflags and quota inode modification
code into a separate function so that we can disable it in the next
patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_repair: refactor quota inumber handling
Darrick J. Wong [Tue, 15 Oct 2024 19:44:26 +0000 (12:44 -0700)]
xfs_repair: refactor quota inumber handling

In preparation for putting quota files in the metadata directory tree,
refactor repair's quota inumber handling to use its own variables
instead of the xfs_mount's.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_db: support metadir quotas
Darrick J. Wong [Tue, 15 Oct 2024 19:44:26 +0000 (12:44 -0700)]
xfs_db: support metadir quotas

Support finding the quota files in the metadata directory.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agolibfrog: scrub quota file metapaths
Darrick J. Wong [Tue, 15 Oct 2024 19:44:26 +0000 (12:44 -0700)]
libfrog: scrub quota file metapaths

Support scrubbing quota file metadir paths.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs: scrub quota file metapaths
Darrick J. Wong [Tue, 15 Oct 2024 19:39:29 +0000 (12:39 -0700)]
xfs: scrub quota file metapaths

Enable online fsck for quota file metadata directory paths.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 months agoxfs: use metadir for quota inodes
Darrick J. Wong [Tue, 15 Oct 2024 19:39:28 +0000 (12:39 -0700)]
xfs: use metadir for quota inodes

Store the quota inodes in the /quota metadata directory if metadir is
enabled.  This lets us stop using the sb_[ugp]quotino fields in the
superblock.  From this point on, all metadata files will be children of
the metadata directory tree root.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 months agomkfs: format realtime groups
Darrick J. Wong [Tue, 15 Oct 2024 19:44:25 +0000 (12:44 -0700)]
mkfs: format realtime groups

Create filesystems with the realtime group feature enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agomkfs: add headers to realtime bitmap blocks
Darrick J. Wong [Tue, 15 Oct 2024 19:44:25 +0000 (12:44 -0700)]
mkfs: add headers to realtime bitmap blocks

When the rtgroups feature is enabled, format rtbitmap blocks with the
appropriate block headers.  libxfs takes care of the actual writing for
us, so all we have to do is ensure that the bitmap is the correct size.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_scrub: use histograms to speed up phase 8 on the realtime volume
Darrick J. Wong [Tue, 15 Oct 2024 19:44:25 +0000 (12:44 -0700)]
xfs_scrub: use histograms to speed up phase 8 on the realtime volume

Use the same statistical methods that we use on the data volume to
compute the minimum threshold size for fstrims on the realtime volume.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_scrub: trim realtime volumes too
Darrick J. Wong [Tue, 15 Oct 2024 19:44:25 +0000 (12:44 -0700)]
xfs_scrub: trim realtime volumes too

On the kernel side, the XFS realtime groups patchset added support for
FITRIM of the realtime volume.  This support doesn't actually require
there to be any realtime groups, so teach scrub to run through the whole
realtime volume.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_scrub: call GETFSMAP for each rt group in parallel
Darrick J. Wong [Tue, 15 Oct 2024 19:44:25 +0000 (12:44 -0700)]
xfs_scrub: call GETFSMAP for each rt group in parallel

If realtime groups are enabled, we should take advantage of the sharding
to speed up the spacemap scans.  Do so by issuing per-rtgroup GETFSMAP
calls.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_scrub: check rtgroup metadata directory connections
Darrick J. Wong [Tue, 15 Oct 2024 19:44:24 +0000 (12:44 -0700)]
xfs_scrub: check rtgroup metadata directory connections

Run the rtgroup metapath scrubber during phase 5 to ensure that any
rtgroup metadata files are still connected to the metadir tree after
we've pruned any bad links.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_scrub: scrub realtime allocation group metadata
Darrick J. Wong [Tue, 15 Oct 2024 19:44:24 +0000 (12:44 -0700)]
xfs_scrub: scrub realtime allocation group metadata

Scan realtime group metadata as part of phase 2, just like we do for AG
metadata.  For pre-rtgroup filesystems, pretend that this is an "rtgroup
0" scrub request because the kernel expects that.  Replace the old
cond_wait code with a scrub barrier because they're equivalent for two
items that cannot be scrubbed in parallel.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_spaceman: report on realtime group health
Darrick J. Wong [Tue, 15 Oct 2024 19:44:24 +0000 (12:44 -0700)]
xfs_spaceman: report on realtime group health

Add the realtime group status to the health reporting done by
xfs_spaceman.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_io: display rt group in verbose fsmap output
Darrick J. Wong [Tue, 15 Oct 2024 19:44:24 +0000 (12:44 -0700)]
xfs_io: display rt group in verbose fsmap output

Display the rt group number in the verbose fsmap output, just like we
display the AG number for the data device.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_io: display rt group in verbose bmap output
Darrick J. Wong [Tue, 15 Oct 2024 19:44:24 +0000 (12:44 -0700)]
xfs_io: display rt group in verbose bmap output

Display the rt group number in the bmap -v output, just like we do for
regular data files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
8 months agoxfs_io: add a command to display realtime group information
Darrick J. Wong [Tue, 15 Oct 2024 19:44:23 +0000 (12:44 -0700)]
xfs_io: add a command to display realtime group information

Add a new 'rginfo' command to xfs_io so that we can display realtime
group geometry.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>