www.infradead.org Git - users/hch/xfsprogs.git/log
9 months ago scrub: support internal RT sections xfs-zoned-2024-21-07
Christoph Hellwig [Thu, 31 Oct 2024 06:18:14 +0000 (07:18 +0100)]
scrub: support internal RT sections

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: support creating internal RT devices
Christoph Hellwig [Wed, 28 Aug 2024 07:31:26 +0000 (10:31 +0300)]
mkfs: support creating internal RT devices

Default to using all sequential write required zones for the RT device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago repair: fix the RT device check in process_dinode_int
Christoph Hellwig [Thu, 31 Oct 2024 04:07:30 +0000 (05:07 +0100)]
repair: fix the RT device check in process_dinode_int

Don't look at the variable for the rtname command line option, but
the actual file system geometry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago FIXUP: xfs: support an internal zoned rtdev
Christoph Hellwig [Tue, 27 Aug 2024 15:21:51 +0000 (18:21 +0300)]
FIXUP: xfs: support an internal zoned rtdev

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: support an internal zoned rtdev
Christoph Hellwig [Thu, 31 Oct 2024 17:51:02 +0000 (18:51 +0100)]
xfs: support an internal zoned rtdev

Source kernel commit: bb2d1019d901d96f89e8618cdf46088cb67640bd

Allow creating an RT subvolume on the same device as the main data
device.  This is mostly used for SMR HDDs where the conventional zones
are used for the data device and the sequential write required zones
for the zoned RT section.  One day we should also support the log
on sequential write required zones, but that is not supported here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago use xfs_rtb_to_daddr in more places
Christoph Hellwig [Tue, 27 Aug 2024 15:20:09 +0000 (18:20 +0300)]
use xfs_rtb_to_daddr in more places

Fix up various places to use the proper helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago repair: also update the last written flag when rebuilding the rmap inode
Christoph Hellwig [Mon, 28 Oct 2024 08:16:20 +0000 (09:16 +0100)]
repair: also update the last written flag when rebuilding the rmap inode

Otherwise zoned file systems on non-zoned devices will be badly corrupted
after repair as this counter is used to initialize the write pointer
for them (it is entirely unused on actual zoned devices).

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: don't set the zoned flag without a RT device
Christoph Hellwig [Sun, 27 Oct 2024 08:38:28 +0000 (09:38 +0100)]
mkfs: don't set the zoned flag without a RT device

This papers over _mkfs_dev in xfstests keeping all options but
dropping the RT device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: pick a sensible default rtgroup size for zoned file systems
Christoph Hellwig [Sun, 27 Oct 2024 08:07:28 +0000 (09:07 +0100)]
mkfs: pick a sensible default rtgroup size for zoned file systems

We'll need enough groups to satisfy the default group count.  So
start with a reasonablish size and adjust down until it fits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
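
The sizing approach described in the commit above can be sketched as
follows.  This is only an illustration, not the mkfs code: the function
name pick_rgsize, its parameters, and the choice to halve the size on
each step are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: start from a candidate group size and shrink it
 * until the RT section holds at least min_groups groups.  Halving is an
 * assumption of this sketch; the commit only says the size is adjusted
 * down.  Returns 0 if no size can satisfy the requested group count.
 */
uint64_t pick_rgsize(uint64_t rt_blocks, uint64_t start_size,
                     uint64_t min_groups)
{
    uint64_t size = start_size;

    if (min_groups == 0)
        return size;

    /* Keep halving while too few whole groups fit into the RT section. */
    while (size > 0 && rt_blocks / size < min_groups)
        size /= 2;

    return size;
}
```

A caller would treat a return value of 0 as "the device is too small for
the requested group count" and reject the configuration.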
9 months ago mkfs: better zone input validation
Christoph Hellwig [Sun, 27 Oct 2024 07:56:08 +0000 (08:56 +0100)]
mkfs: better zone input validation

Don't support empty zoned realtime devices and reject user specified
rgsize when on an actual zoned device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: document the zoned parameter
Christoph Hellwig [Sun, 27 Oct 2024 07:47:08 +0000 (08:47 +0100)]
mkfs: document the zoned parameter

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: set a defaultval for -r zoned
Christoph Hellwig [Sun, 27 Oct 2024 07:38:56 +0000 (08:38 +0100)]
mkfs: set a defaultval for -r zoned

This allows specifying just "-r zoned" instead of the more verbose
"-r zoned=1".

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: fix metadir conflict check
Christoph Hellwig [Sun, 27 Oct 2024 07:43:35 +0000 (08:43 +0100)]
mkfs: fix metadir conflict check

metadir is in mopts, not ropts.  Also improve the error message a little
bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago libfrog: report the zoned flag
Christoph Hellwig [Thu, 24 Oct 2024 08:56:44 +0000 (10:56 +0200)]
libfrog: report the zoned flag

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: report a XFS_FSOP_GEOM_FLAGS_ZONED in the file system geometry
Christoph Hellwig [Thu, 24 Oct 2024 08:58:13 +0000 (10:58 +0200)]
xfs: report a XFS_FSOP_GEOM_FLAGS_ZONED in the file system geometry

Source kernel commit: 79473ee0b7f1157cefef5e4b180fa14e30641d36

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: fix rtmblocks handling for zoned file systems on conventional devices
Christoph Hellwig [Thu, 24 Oct 2024 11:56:09 +0000 (13:56 +0200)]
mkfs: fix rtmblocks handling for zoned file systems on conventional devices

nr_zones will be zero for non-zoned devices, so this was the wrong check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago fixup
Christoph Hellwig [Wed, 23 Oct 2024 11:25:50 +0000 (13:25 +0200)]
fixup

9 months ago xfs: constify feature checks
Christoph Hellwig [Wed, 23 Oct 2024 11:24:48 +0000 (13:24 +0200)]
xfs: constify feature checks

Source kernel commit: 52a3f6d90a651c3eb14a4d7845934888d8eb9089

We'll need to call them on a const structure in growfs in a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago fixup
Christoph Hellwig [Wed, 23 Oct 2024 06:33:11 +0000 (08:33 +0200)]
fixup

9 months ago xfs: clear sb_rbmblocks to zero for zoned file systems
Christoph Hellwig [Wed, 23 Oct 2024 11:31:19 +0000 (13:31 +0200)]
xfs: clear sb_rbmblocks to zero for zoned file systems

Source kernel commit: 72b6d637661f2a037fb0dac08a34966f2b3f52c1

Zoned file systems don't have a RT bitmap, so they should not set this
value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago repair: initialize the used counter when rebuilding the rmap inode
Christoph Hellwig [Mon, 21 Oct 2024 14:45:37 +0000 (16:45 +0200)]
repair: initialize the used counter when rebuilding the rmap inode

When creating a new rmap inode during phase 6 the used counter is
currently always set to 0, which will cause premature zone resets
for full zones or lead to an assert triggering when freeing blocks
on open zones.  Ensure that repair sets the proper used count.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: support zone gaps
Christoph Hellwig [Thu, 10 Oct 2024 13:09:13 +0000 (15:09 +0200)]
mkfs: support zone gaps

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago fixups
Christoph Hellwig [Thu, 10 Oct 2024 11:18:21 +0000 (13:18 +0200)]
fixups

9 months ago xfs: support zone gaps
Christoph Hellwig [Fri, 18 Oct 2024 14:23:06 +0000 (16:23 +0200)]
xfs: support zone gaps

Source kernel commit: 7ebf6897445538e8c6277581428728b5b2d3c357

Zoned devices can have gaps between the end of the usable capacity of a
zone and the end of the zone in the LBA/daddr address space.  In other
words, the hardware
equivalent to the RT groups already takes care of the power of 2
alignment for us.  In this case the sparse FSB/RTB address space maps 1:1
to the device address space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
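
To illustrate the 1:1 mapping described in the commit above, here is a
minimal sketch.  It is not the libxfs implementation: the zone_geom
structure, its fields, and rtb_to_dev_block are hypothetical names, and
the sketch assumes the sparse RTB address space places each group at a
power-of-2 stride.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry description used only for this sketch. */
struct zone_geom {
    uint64_t stride;    /* power-of-2 spacing of groups in the sparse RTB space */
    uint64_t capacity;  /* usable blocks per zone, capacity <= stride */
    bool     has_gaps;  /* device leaves a gap between capacity and zone end */
};

/*
 * Translate a sparse RTB address into a device block number.
 * With gaps, the device is assumed to lay zones out at the same
 * stride, so the address is used unchanged.  Without gaps, zones are
 * packed back to back and the group number has to be rescaled.
 */
uint64_t rtb_to_dev_block(const struct zone_geom *g, uint64_t rtb)
{
    uint64_t group = rtb / g->stride;
    uint64_t offset = rtb % g->stride;

    if (g->has_gaps)
        return rtb;
    return group * g->capacity + offset;
}
```

The point of the sketch is that the gapped case needs no arithmetic at
all, which is what "maps 1:1" means here.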
9 months ago mkfs: reflink conflicts with zoned file systems for now
Christoph Hellwig [Fri, 16 Aug 2024 18:23:12 +0000 (20:23 +0200)]
mkfs: reflink conflicts with zoned file systems for now

Until GC is enhanced to not unshare reflinked blocks we had better prohibit
this combination.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: calculate zone overprovisioning when specifying size
Hans Holmberg [Tue, 10 Sep 2024 08:55:02 +0000 (08:55 +0000)]
mkfs: calculate zone overprovisioning when specifying size

When size is specified for zoned file systems, calculate the required
overprovisioning to back the requested capacity.

Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: support creating zoned file systems
Christoph Hellwig [Tue, 8 Oct 2024 07:49:32 +0000 (09:49 +0200)]
mkfs: support creating zoned file systems

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago repair: support repairing zoned file systems
Christoph Hellwig [Tue, 8 Oct 2024 07:47:20 +0000 (09:47 +0200)]
repair: support repairing zoned file systems

Not really much to do here.  Mostly ignore the validation and
regeneration of the bitmap and summary inodes.  Eventually this
could grow a bit of validation of the hardware zone state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago FIXUP: xfs: support zoned RT devices
Christoph Hellwig [Fri, 13 Sep 2024 08:37:02 +0000 (10:37 +0200)]
FIXUP: xfs: support zoned RT devices

Mostly more libxfs glue, and a few crude hacks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: support zoned RT devices
Christoph Hellwig [Fri, 18 Oct 2024 14:22:15 +0000 (16:22 +0200)]
xfs: support zoned RT devices

Source kernel commit: 2e881e292c279aff05529b87d9ec33b7b78b8fab

WARNING: this is early prototype code.

The zoned allocator works by handing out data blocks to the direct or
buffered write code at the place where XFS currently does block
allocations.  It does not actually insert them into the bmap extent tree
at this time, but only after I/O completion when we know the block number.

The zoned allocator works on any kind of device, including conventional
devices or conventional zones by having a crude write pointer emulation.
For zoned devices active zone management is fully supported, as is
zone capacity < zone size.

The two major limitations are:

- there is no support for unwritten extents and thus persistent
  file preallocations from fallocate().  This is inherent to an
  always out of place write scheme as there is no way to persistently
  preallocate blocks for an indefinite number of overwrites.
- because the metadata blocks and data blocks are on different
  devices you can run out of space for metadata while having plenty
  of space for data and vice versa.  This is inherent to a scheme
  where we use different devices or pools for each.

For zoned file systems we reserve the free extents before taking the
ilock so that if we have to force garbage collection it happens before we
take the iolock.  This is done because GC has to take the iolock after it
moved data to a new place, and this could otherwise deadlock.

This unfortunately has to exclude block zeroing, as for truncate we are
called with the iolock (aka i_rwsem) already held.  As zeroing is always
only for a single block at a time, or up to two total for a syscall in
the case of free_file_range, we deal with that by just stealing the block,
but failing the allocation if we'd have to wait for GC.

Add a new RTAVAILABLE counter of blocks that are actually directly
available to be written into in addition to the classic free counter.
Only allow a write to go ahead if it has blocks available to write, and
otherwise wait for GC.  This also requires tweaking the need GC condition a
bit as we now always need to GC if someone is waiting for space.

Thanks to Hans Holmberg <hans.holmberg@wdc.com> for lots of fixes
and improvements.

Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
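
The interaction between the classic free counter and the new RTAVAILABLE
counter described above can be sketched roughly as below.  This is a
simplified, single-threaded illustration, not the kernel code: the
rt_space structure, the reserve_result values and reserve_blocks are
hypothetical names, and the sketch assumes that the available count
never exceeds the free count.

```c
#include <stdint.h>

/* Hypothetical counters; the names are illustrative, not the kernel's. */
struct rt_space {
    uint64_t free;       /* classic free counter */
    uint64_t available;  /* blocks that can be written into right now */
};

enum reserve_result {
    RESERVE_OK,        /* blocks reserved, the write may proceed */
    RESERVE_WAIT_GC,   /* space is free but not yet writable, wait for GC */
    RESERVE_NOSPC,     /* not enough free space at all */
};

/*
 * Try to reserve count blocks for a write.  The write only goes ahead
 * when the blocks are directly available; otherwise the caller has to
 * wait for garbage collection to make more space writable.
 */
enum reserve_result reserve_blocks(struct rt_space *s, uint64_t count)
{
    if (count > s->free)
        return RESERVE_NOSPC;
    if (count > s->available)
        return RESERVE_WAIT_GC;

    s->free -= count;
    s->available -= count;
    return RESERVE_OK;
}
```

A real implementation would use per-cpu counters and would also have to
kick GC whenever a caller ends up waiting, as the message above notes.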
9 months ago xfs: add an incompat feature bit for zoned RT devices
Christoph Hellwig [Fri, 23 Aug 2024 14:35:20 +0000 (16:35 +0200)]
xfs: add an incompat feature bit for zoned RT devices

Source kernel commit: c137705016e30bbcb51b48109a4050eb08e26493

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay
Christoph Hellwig [Thu, 15 Aug 2024 16:49:08 +0000 (18:49 +0200)]
xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay

Source kernel commit: 422100222f665aadccf03ed11f5fd51b0066d9be

The zone allocator wants to be able to remove a delalloc mapping in the
COW fork while keeping the block reservation.  To support that pass the
bflags argument down to xfs_bmap_del_extent_delay and support the
XFS_BMAPI_REMAP flag to keep the reservation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago FIXUP: add libxfs freecounter infrastructure
Christoph Hellwig [Tue, 31 Oct 2023 07:52:32 +0000 (08:52 +0100)]
FIXUP: add libxfs freecounter infrastructure

To be folded into the previous patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: generalize the freespace and reserved blocks handling
Christoph Hellwig [Thu, 15 Aug 2024 17:11:33 +0000 (19:11 +0200)]
xfs: generalize the freespace and reserved blocks handling

Source kernel commit: 9b08de5f50f114510b77b177255579f0a47e728a

The main handling of the incore per-cpu freespace counters is already
handled in xfs_mod_freecounter for both the block and RT extent cases,
but the actual counter is passed in and special-cased.

Replace both the percpu counters and the resblks counters with arrays,
so that reserved RT extents can be supported, which will be
needed for garbage collection on zoned devices.

Use helpers to access the freespace counters everywhere instead of
poking through the abstraction by using the percpu_counter helpers
directly.  This also switches the flooring of the frextents counter
to 0 in statfs for the rtinherit case to a manual min_t call to match
the handling of the fdblocks counter for normal file systems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: pass an xfs_inode to xfs_bmap_extsize_align
Christoph Hellwig [Tue, 31 Oct 2023 07:43:55 +0000 (08:43 +0100)]
xfs: pass an xfs_inode to xfs_bmap_extsize_align

Source kernel commit: 417af7cf6d5ca02a263dd9aa03718fa13aadd3c0

Pass the inode instead of the mount so that it can deduce by itself if
data should go to the RT device instead of explicitly passing an
argument for that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago mkfs: enable reflink on the realtime device
Darrick J. Wong [Tue, 15 Oct 2024 19:44:49 +0000 (12:44 -0700)]
mkfs: enable reflink on the realtime device

Allow the creation of filesystems with both reflink and realtime volumes
enabled.  For now we don't support a realtime extent size > 1.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago mkfs: validate CoW extent size hint when rtinherit is set
Darrick J. Wong [Tue, 15 Oct 2024 19:44:49 +0000 (12:44 -0700)]
mkfs: validate CoW extent size hint when rtinherit is set

Extent size hints exist to nudge the behavior of the file data block
allocator towards trying to make aligned allocations.  Therefore, it
doesn't make sense to allow a hint that isn't a multiple of the
fundamental allocation unit for a given file.

This means that if the sysadmin is formatting with rtinherit set on the
root dir, validate_cowextsize_hint needs to check the hint value on a
simulated realtime file to make sure that it's correct.  This hasn't
been necessary in the past since one cannot have a CoW hint without a
reflink filesystem, and we previously didn't allow rt reflink
filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
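
The alignment rule argued for in the commit above can be sketched as a
small check.  This is an illustration only and not the body of
validate_cowextsize_hint: the function name cowextsize_hint_ok and its
parameters are hypothetical, and all values are assumed to be in
filesystem blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a CoW extent size hint is acceptable only if it
 * is a multiple of the file's allocation unit.  For a realtime file
 * the unit is taken to be the rt extent size, otherwise one fs block.
 * A hint of 0 is treated as "no hint set".
 */
bool cowextsize_hint_ok(uint32_t hint, uint32_t rextsize, bool is_realtime)
{
    uint32_t unit = is_realtime ? rextsize : 1;

    if (hint == 0)
        return true;
    if (unit == 0)
        return false;
    return hint % unit == 0;
}
```

With rtinherit set on the root directory, the check would be run with
is_realtime set to true, which mirrors the "simulated realtime file"
mentioned in the message.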
9 months ago xfs_logprint: report realtime CUIs
Darrick J. Wong [Tue, 15 Oct 2024 19:44:49 +0000 (12:44 -0700)]
xfs_logprint: report realtime CUIs

Decode the CUI format just enough to report if a CUI targets the
realtime device or not.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: validate CoW extent size hint on rtinherit directories
Darrick J. Wong [Tue, 15 Oct 2024 19:44:49 +0000 (12:44 -0700)]
xfs_repair: validate CoW extent size hint on rtinherit directories

XFS allows a sysadmin to change the rt extent size when adding a rt
section to a filesystem after formatting.  If there are any directories
with both a cowextsize hint and rtinherit set, the hint could become
misaligned with the new rextsize.  Offer to fix the problem if we're in
modify mode and the verifier didn't trip.  If we're in dry run mode,
we let the kernel fix it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: allow realtime files to have the reflink flag set
Darrick J. Wong [Tue, 15 Oct 2024 19:44:48 +0000 (12:44 -0700)]
xfs_repair: allow realtime files to have the reflink flag set

Now that we allow reflink on the realtime volume, allow that combination
of inode flags if the feature's enabled.  Note that we now allow inodes
to have rtinherit even if there's no realtime volume, since the kernel
has never restricted that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: rebuild the realtime refcount btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:48 +0000 (12:44 -0700)]
xfs_repair: rebuild the realtime refcount btree

Use the collected reference count information to rebuild the btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: reject unwritten shared extents
Darrick J. Wong [Tue, 15 Oct 2024 19:44:48 +0000 (12:44 -0700)]
xfs_repair: reject unwritten shared extents

We don't allow sharing of unwritten extents, which means that repair
should reject an unwritten extent if someone else has already claimed
the space.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: check existing realtime refcountbt entries against observed refcounts
Darrick J. Wong [Tue, 15 Oct 2024 19:44:48 +0000 (12:44 -0700)]
xfs_repair: check existing realtime refcountbt entries against observed refcounts

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime refcount
btree (particularly if we're in -n mode) to detect rtrefcountbt
problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: compute refcount data for the realtime groups
Darrick J. Wong [Tue, 15 Oct 2024 19:44:48 +0000 (12:44 -0700)]
xfs_repair: compute refcount data for the realtime groups

At the end of phase 4, compute reference count information for realtime
groups from the realtime rmap information collected, just like we do for
AGs in the data section.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: find and mark the rtrefcountbt inode
Darrick J. Wong [Tue, 15 Oct 2024 19:44:47 +0000 (12:44 -0700)]
xfs_repair: find and mark the rtrefcountbt inode

Make sure that we find the realtime refcountbt inode and mark it
appropriately, just in case we find a rogue inode claiming to
be an rtrefcount, or just plain garbage in the superblock field.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: use realtime refcount btree data to check block types
Darrick J. Wong [Tue, 15 Oct 2024 19:44:47 +0000 (12:44 -0700)]
xfs_repair: use realtime refcount btree data to check block types

Use the realtime refcount btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: allow CoW staging extents in the realtime rmap records
Darrick J. Wong [Tue, 15 Oct 2024 19:44:47 +0000 (12:44 -0700)]
xfs_repair: allow CoW staging extents in the realtime rmap records

Don't flag the rt rmap btree as having errors if there are CoW staging
extent records in it and the filesystem supports reflink.  As far as
reporting leftover staging extents, we'll report them when we scan the
rt refcount btree, in a future patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_spaceman: report health of the realtime refcount btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:47 +0000 (12:44 -0700)]
xfs_spaceman: report health of the realtime refcount btree

Report the health of the realtime reference count btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_db: copy the realtime refcount btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:47 +0000 (12:44 -0700)]
xfs_db: copy the realtime refcount btree

Copy the realtime refcountbt when we're metadumping the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_db: support the realtime refcountbt
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]
xfs_db: support the realtime refcountbt

Wire up various parts of xfs_db for realtime refcount support so that we
can dump the rt refcount btree contents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_db: display the realtime refcount btree contents
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]
xfs_db: display the realtime refcount btree contents

Implement all the code we need to dump rtrefcountbt contents, starting
from the inode root.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago man: document userspace API changes due to rt reflink
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]
man: document userspace API changes due to rt reflink

Update documentation to describe userspace ABI changes made for realtime
reflink support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago libfrog: enable scrubbing of the realtime refcount data
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]
libfrog: enable scrubbing of the realtime refcount data

Add a new entry so that we can scrub the rtrefcountbt and its metadata
directory tree path.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: scrub the metadir path of rt refcount btree files
Darrick J. Wong [Tue, 15 Oct 2024 19:44:46 +0000 (12:44 -0700)]
xfs: scrub the metadir path of rt refcount btree files

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the refcount btree file for each rt group.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: scrub the realtime refcount btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:45 +0000 (12:44 -0700)]
xfs: scrub the realtime refcount btree

Source kernel commit: 844d7f8755a67b01391da92b99a5342c8b2b83f4

Add code to scrub realtime refcount btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months ago xfs: report realtime refcount btree corruption errors to the health system
Darrick J. Wong [Tue, 15 Oct 2024 19:44:45 +0000 (12:44 -0700)]
xfs: report realtime refcount btree corruption errors to the health system

Whenever we encounter corrupt realtime refcount btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: enable extent size hints for CoW operations
Darrick J. Wong [Tue, 15 Oct 2024 19:44:45 +0000 (12:44 -0700)]
xfs: enable extent size hints for CoW operations

Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: apply rt extent alignment constraints to CoW extsize hint
Darrick J. Wong [Tue, 15 Oct 2024 19:44:45 +0000 (12:44 -0700)]
xfs: apply rt extent alignment constraints to CoW extsize hint

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]
xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files

Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files.  This apparently was done to disable
delayed allocation for realtime files.

However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files.  In this case, the
logic will incorrectly send the write through the delalloc write path.

Fix this by adjusting the logic slightly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: recover CoW leftovers in the realtime volume
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]
xfs: recover CoW leftovers in the realtime volume

Scan the realtime refcount tree at mount time to get rid of leftover
CoW staging extents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: allow inodes to have the realtime and reflink flags
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]
xfs: allow inodes to have the realtime and reflink flags

Now that we can share blocks between realtime files, allow this
combination.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: compute rtrmap btree max levels when reflink enabled
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]
xfs: compute rtrmap btree max levels when reflink enabled

Compute the maximum possible height of the realtime rmap btree when
reflink is enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: update rmap to allow cow staging extents in the rt rmap
Darrick J. Wong [Tue, 15 Oct 2024 19:44:44 +0000 (12:44 -0700)]
xfs: update rmap to allow cow staging extents in the rt rmap

Don't error out on CoW staging extent records when realtime reflink is
enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: create routine to allocate and initialize a realtime refcount btree inode
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]
xfs: create routine to allocate and initialize a realtime refcount btree inode

Create a library routine to allocate and initialize an empty realtime
refcountbt inode.  We'll use this for growfs, mkfs, and repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: wire up realtime refcount btree cursors
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]
xfs: wire up realtime refcount btree cursors

Wire up realtime refcount btree cursors wherever they're needed
throughout the code base.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: wire up a new inode fork type for the realtime refcount
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]
xfs: wire up a new inode fork type for the realtime refcount

Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: add metadata reservations for realtime refcount btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]
xfs: add metadata reservations for realtime refcount btree

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime refcount btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: add realtime refcount btree inode to metadata directory
Darrick J. Wong [Tue, 15 Oct 2024 19:44:43 +0000 (12:44 -0700)]
xfs: add realtime refcount btree inode to metadata directory

Add a metadir path to select the realtime refcount btree inode and load
it at mount time.  The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: add a realtime flag to the refcount update log redo items
Darrick J. Wong [Tue, 15 Oct 2024 19:44:42 +0000 (12:44 -0700)]
xfs: add a realtime flag to the refcount update log redo items

Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: prepare refcount functions to deal with rtrefcountbt
Darrick J. Wong [Tue, 15 Oct 2024 19:44:42 +0000 (12:44 -0700)]
xfs: prepare refcount functions to deal with rtrefcountbt

Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions.  Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.

Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert it all at once.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: add realtime refcount btree operations
Darrick J. Wong [Tue, 15 Oct 2024 19:44:42 +0000 (12:44 -0700)]
xfs: add realtime refcount btree operations

Implement the generic btree operations needed to manipulate rtrefcount
btree blocks. This is different from the regular refcountbt in that we
allocate space from the filesystem at large, and are neither constrained
to the free space nor any particular AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: realtime refcount btree transaction reservations
Darrick J. Wong [Tue, 15 Oct 2024 19:44:42 +0000 (12:44 -0700)]
xfs: realtime refcount btree transaction reservations

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: introduce realtime refcount btree ondisk definitions
Darrick J. Wong [Tue, 15 Oct 2024 19:44:41 +0000 (12:44 -0700)]
xfs: introduce realtime refcount btree ondisk definitions

Add the ondisk structure definitions for realtime refcount btrees. The
realtime refcount btree will be rooted from a hidden inode so it needs
to have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate refcount btree
blocks. This prepares the way for connecting the btree operations
implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs: namespace the maximum length/refcount symbols
Darrick J. Wong [Tue, 15 Oct 2024 19:44:41 +0000 (12:44 -0700)]
xfs: namespace the maximum length/refcount symbols

Actually namespace these variables properly, so that readers can tell
that this is an XFS symbol, and that it's for the refcount
functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago mkfs: use file write helper to populate files
Darrick J. Wong [Tue, 15 Oct 2024 19:44:41 +0000 (12:44 -0700)]
mkfs: use file write helper to populate files

Use the file write helper to write files into the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago libxfs: resync libxfs_alloc_file_space interface with the kernel
Darrick J. Wong [Tue, 15 Oct 2024 19:44:41 +0000 (12:44 -0700)]
libxfs: resync libxfs_alloc_file_space interface with the kernel

Make the userspace xfs_alloc_file_space behave (more or less) like the
kernel version, at least as far as the interface goes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago mkfs: create the realtime rmap inode
Darrick J. Wong [Tue, 15 Oct 2024 19:44:41 +0000 (12:44 -0700)]
mkfs: create the realtime rmap inode

Create a realtime rmapbt inode if we format the fs with realtime
and rmap.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_logprint: report realtime RUIs
Darrick J. Wong [Tue, 15 Oct 2024 19:44:40 +0000 (12:44 -0700)]
xfs_logprint: report realtime RUIs

Decode the RUI format just enough to report if an RUI targets the
realtime device or not.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months ago xfs_repair: reserve per-AG space while rebuilding rt metadata
Darrick J. Wong [Tue, 15 Oct 2024 19:44:40 +0000 (12:44 -0700)]
xfs_repair: reserve per-AG space while rebuilding rt metadata

Realtime metadata btrees can consume quite a bit of space on a full
filesystem.  Since the metadata are just regular files, we need to
make the per-AG reservations to avoid overfilling any of the AGs while
rebuilding metadata.  This avoids the situation where a filesystem comes
straight from repair and immediately trips over not having enough space
in an AG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_repair: rebuild the bmap btree for realtime files
Darrick J. Wong [Tue, 15 Oct 2024 19:44:40 +0000 (12:44 -0700)]
xfs_repair: rebuild the bmap btree for realtime files

Use the realtime rmap btree information to rebuild an inode's data fork
when appropriate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_repair: check for global free space concerns with default btree slack levels
Darrick J. Wong [Tue, 15 Oct 2024 19:44:40 +0000 (12:44 -0700)]
xfs_repair: check for global free space concerns with default btree slack levels

It's possible that before repair was started, the filesystem might have
been nearly full, and its metadata btree blocks could all have been
nearly full.  If we then rebuild the btrees with blocks that are only
75% full, that expansion might be enough to run out of free space.  The
solution to this is to pack the new blocks completely full if we fear
running out of space.

Previously, we only had to check and decide that on a per-AG basis.
However, now that XFS can have filesystems with metadata btrees rooted
in inodes, we have a global free space concern because there might be
enough space in each AG to regenerate the AG btrees at 75%, but that
might not leave enough space to regenerate the inode btrees, even if we
fill those blocks to 100%.

Hence we need to precompute the worst case space usage for all btrees in
the filesystem and compare /that/ against the global free space to
decide if we're going to pack the btrees maximally to conserve space.
That decision can override the per-AG determination.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
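
The global check described above could be sketched roughly as follows.
This is an illustration only, not the xfs_repair code: the struct, the
function names, and the leaf-only block estimate are all assumptions
made for the example, and "worst case" is taken to mean rebuilding every
btree at the default 75% fill.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one btree to be rebuilt; not an xfs_repair type. */
struct btree_estimate {
	uint64_t nr_records;      /* records that must be stored */
	uint32_t recs_per_block;  /* capacity of a completely full block */
};

/*
 * Blocks needed to hold the records at a given fill percentage.
 * Simplification: only leaf blocks are counted, interior levels are ignored.
 */
static uint64_t blocks_needed(const struct btree_estimate *bt,
			      unsigned int fill_pct)
{
	uint64_t per_block;

	assert(fill_pct > 0 && fill_pct <= 100);
	assert(bt->recs_per_block > 0);

	per_block = (uint64_t)bt->recs_per_block * fill_pct / 100;
	if (per_block == 0)
		per_block = 1;
	/* Round up: a partially used block still consumes a whole block. */
	return (bt->nr_records + per_block - 1) / per_block;
}

/*
 * Sum the space every btree (per-AG and inode-rooted alike) would take at
 * the default 75% fill and compare it against the global free space.
 * Returns true if the rebuild should pack blocks completely full instead.
 */
static bool need_packed_btrees(const struct btree_estimate *bts, size_t nr,
			       uint64_t free_blocks)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		total += blocks_needed(&bts[i], 75);
	return total > free_blocks;
}
```

With 1000 and 500 records at 100 records per block, the 75% estimate is
14 + 7 = 21 blocks, so a filesystem with only 20 free blocks would be
told to pack its btrees full (10 + 5 = 15 blocks).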
9 months agoxfs_repair: rebuild the realtime rmap btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]
xfs_repair: rebuild the realtime rmap btree

Rebuild the realtime rmap btree file from the reverse mapping records we
gathered from walking the inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_repair: always check realtime file mappings against incore info
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]
xfs_repair: always check realtime file mappings against incore info

Curiously, the xfs_repair code that processes data fork mappings of
realtime files doesn't actually compare the mappings against the incore
state map during the !check_dups phase (aka phase 3).  As a result, we
lose the opportunity to clear damaged realtime data forks before we get
to crosslinked file checking in phase 4, which results in ondisk
metadata errors calling do_error, which aborts repair.

Split the process_rt_rec_state code into two functions: one to check the
mapping, and another to update the incore state.  The first one can be
called to help us decide if we're going to zap the fork, and the second
one updates the incore state if we decide to keep the fork.  We already
do this for regular data files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
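
A minimal sketch of the check/update split described above, under stated
assumptions: the state array, the mapping struct, and all function names
here are hypothetical stand-ins, not the process_rt_rec_state code or
its replacements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical incore state: one entry per realtime extent. */
enum blk_state { BLK_FREE = 0, BLK_INUSE };

/* Hypothetical data fork mapping: a run of realtime extents. */
struct rt_mapping {
	uint64_t start;
	uint64_t len;
};

/* Step 1: read-only check of one mapping against the incore state. */
static bool mapping_ok(const enum blk_state *state, size_t nr,
		       const struct rt_mapping *m)
{
	uint64_t i;

	if (m->len == 0 || m->start >= nr || m->len > nr - m->start)
		return false;
	for (i = m->start; i < m->start + m->len; i++)
		if (state[i] != BLK_FREE)
			return false;	/* already claimed by someone else */
	return true;
}

/* Step 2: record the mapping in the incore state. */
static void mapping_claim(enum blk_state *state, const struct rt_mapping *m)
{
	uint64_t i;

	assert(state != NULL && m != NULL);
	for (i = m->start; i < m->start + m->len; i++)
		state[i] = BLK_INUSE;
}

/*
 * Check every mapping first; only if all pass is the fork kept and the
 * state updated.  Returns false if the fork should be zapped, in which
 * case the incore state is left untouched.  Simplification: overlaps
 * between two mappings of the same fork are not detected here.
 */
static bool process_fork(enum blk_state *state, size_t nr,
			 const struct rt_mapping *maps, size_t nr_maps)
{
	size_t i;

	for (i = 0; i < nr_maps; i++)
		if (!mapping_ok(state, nr, &maps[i]))
			return false;
	for (i = 0; i < nr_maps; i++)
		mapping_claim(state, &maps[i]);
	return true;
}
```

The point of the split is that a fork rejected in the first loop never
marks any extent in use, so a damaged fork cannot leave partial claims
behind for later phases to trip over.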
9 months agoxfs_repair: check existing realtime rmapbt entries against observed rmaps
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]
xfs_repair: check existing realtime rmapbt entries against observed rmaps

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime rmap btree
(particularly if we're in -n mode) to detect rtrmapbt problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_repair: find and mark the rtrmapbt inodes
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]
xfs_repair: find and mark the rtrmapbt inodes

Make sure that we find the realtime rmapbt inodes and mark them
appropriately, just in case we find a rogue inode claiming to be an
rtrmap, or garbage in the metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_repair: refactor realtime inode check
Darrick J. Wong [Tue, 15 Oct 2024 19:44:39 +0000 (12:44 -0700)]
xfs_repair: refactor realtime inode check

Refactor the realtime bitmap and summary checks into a helper function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_repair: create a new set of incore rmap information for rt groups
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]
xfs_repair: create a new set of incore rmap information for rt groups

Create a parallel set of "xfs_ag_rmap" structures to cache information
about reverse mappings for the realtime groups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_repair: use realtime rmap btree data to check block types
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]
xfs_repair: use realtime rmap btree data to check block types

Use the realtime rmap btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_repair: flag suspect long-format btree blocks
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]
xfs_repair: flag suspect long-format btree blocks

Pass a "suspect" counter through scan_lbtree just like we do for
short-format btree blocks, and increment its value when we encounter
blocks with bad CRCs or outright corruption.  This makes it so that
repair actually catches bmbt blocks with bad CRCs or other verifier
errors.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
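
The idea of threading a "suspect" counter through a btree walk might
look something like the sketch below.  The block struct and scan_block
are hypothetical and much simpler than scan_lbtree; in particular, the
choice to stop descending below a bad block is an assumption of this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for a btree block. */
struct blk {
	bool crc_ok;		/* checksum matched */
	bool verifier_ok;	/* structural verifier passed */
	const struct blk *child[2];
};

/*
 * Walk the tree and bump *suspect for every block that fails its CRC or
 * verifier.  A bad block's pointers are not trusted, so the walk does not
 * descend below it.
 */
static void scan_block(const struct blk *b, int *suspect)
{
	assert(suspect != NULL);

	if (b == NULL)
		return;
	if (!b->crc_ok || !b->verifier_ok) {
		(*suspect)++;
		return;
	}
	scan_block(b->child[0], suspect);
	scan_block(b->child[1], suspect);
}
```

The caller starts with the counter at zero and treats any nonzero result
as a reason to distrust the btree.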
9 months agoxfs_spaceman: report health status of the realtime rmap btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]
xfs_spaceman: report health status of the realtime rmap btree

Add reporting of the rt rmap btree health to spaceman.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_db: make fsmap query the realtime reverse mapping tree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:38 +0000 (12:44 -0700)]
xfs_db: make fsmap query the realtime reverse mapping tree

Extend the 'fsmap' debugger command to support querying the realtime
rmap btree via a new -r argument.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_db: copy the realtime rmap btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]
xfs_db: copy the realtime rmap btree

Copy the realtime rmapbt when we're metadumping the filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_db: support the realtime rmapbt
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]
xfs_db: support the realtime rmapbt

Wire up various parts of xfs_db for realtime rmap support so that we can
dump the btree contents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_db: display the realtime rmap btree contents
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]
xfs_db: display the realtime rmap btree contents

Implement all the code we need to dump rtrmapbt contents, starting
from the inode root.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs_db: don't abort when bmapping on a non-extents/bmbt fork
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]
xfs_db: don't abort when bmapping on a non-extents/bmbt fork

We're going to introduce new fork formats, so let's fix the problem that
xfs_db's bmap command aborts when the fork format isn't one of the
existing ones.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoman: document userspace API changes due to rt rmap
Darrick J. Wong [Tue, 15 Oct 2024 19:44:37 +0000 (12:44 -0700)]
man: document userspace API changes due to rt rmap

Update documentation to describe userspace ABI changes made for realtime
rmap support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agolibfrog: enable scrubbing of the realtime rmap
Darrick J. Wong [Tue, 15 Oct 2024 19:44:36 +0000 (12:44 -0700)]
libfrog: enable scrubbing of the realtime rmap

Add a new entry so that we can scrub the rtrmapbt and its metadata
directory tree path too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: create a shadow rmap btree during realtime rmap repair
Darrick J. Wong [Tue, 15 Oct 2024 19:44:36 +0000 (12:44 -0700)]
xfs: create a shadow rmap btree during realtime rmap repair

Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: online repair of the realtime rmap btree
Darrick J. Wong [Tue, 15 Oct 2024 19:44:36 +0000 (12:44 -0700)]
xfs: online repair of the realtime rmap btree

Source kernel commit: f813af307d62d4c4d620a358bbd406f89ffdeca2

Repair the realtime rmap btree while mounted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: online repair of realtime bitmaps for a realtime group
Darrick J. Wong [Tue, 15 Oct 2024 19:44:36 +0000 (12:44 -0700)]
xfs: online repair of realtime bitmaps for a realtime group

For a given rt group, regenerate the bitmap contents from the group's
realtime rmap btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>