Christoph Hellwig [Thu, 17 Oct 2024 05:50:06 +0000 (07:50 +0200)]
xfs: delayed block allocation
There is no need to assign a zone until we submit the bio using it.
Assigning a zone earlier and incrementing the write pointer just means
we use up zone active resources longer than required. So instead of
allocating the block in the iomap_begin and map_blocks methods, just
provide stub iomaps there, and then allocate the blocks just before
submitting the bios, which also fits in really nicely with the flow
to split the bios to the hardware limits.
Christoph Hellwig [Sat, 31 Aug 2024 04:27:26 +0000 (07:27 +0300)]
xfs: support an internal zoned rtdev
Allow creating an RT subvolume on the same device as the main data
device. This is mostly used for SMR HDDs where the conventional zones
are used for the data device and the sequential write required zones
for the zoned RT section. One day we should also support the log
on sequential write required zones, but that is not supported here.
Christoph Hellwig [Tue, 30 Jul 2024 23:42:42 +0000 (16:42 -0700)]
xfs: factor out a xfs_rt_check_size helper
Add a helper to check that the last block of a RT device is readable
to share the code between mount and growfs. This also adds the mount
time overflow check to growfs and improves the error messages.
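As an illustration only, the overflow part of such a size check could be modeled in userspace as below. This is a sketch under stated assumptions: sketch_rt_size_fits and its parameters are hypothetical names, not the actual xfs_rt_check_size code, and the read of the last block is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model, not the XFS helper: convert a size in file system
 * blocks to 512-byte sectors and reject sizes that overflow 64 bits.
 * blocklog is log2 of the file system block size.
 */
static bool sketch_rt_size_fits(uint64_t rblocks, unsigned int blocklog,
				uint64_t *last_block_daddr)
{
	unsigned int shift;
	uint64_t daddr;

	if (blocklog < 9 || blocklog - 9 >= 64 || rblocks == 0)
		return false;
	shift = blocklog - 9;		/* fs blocks -> 512-byte sectors */

	daddr = rblocks << shift;
	if ((daddr >> shift) != rblocks)
		return false;		/* the conversion overflowed */

	/* Start sector of the last block, which a caller would try to read. */
	*last_block_daddr = daddr - (1ULL << shift);
	return true;
}
```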
Christoph Hellwig [Tue, 30 Jul 2024 23:15:43 +0000 (16:15 -0700)]
xfs: simplify sector number calculation in xfs_zero_extent
xfs_zero_extent does some really odd gymnastics to calculate the block
layer sector numbers passed to blkdev_issue_zeroout. This is because it
used to call sb_issue_zeroout and the calculations in that helper got
open coded here in the rather misleadingly named commit 3dc29161070a
("dax: use sb_issue_zerout instead of calling dax_clear_sectors").
Christoph Hellwig [Sat, 2 Nov 2024 06:04:18 +0000 (07:04 +0100)]
block: take chunk_sectors into account in bio_split_write_zeroes
For zoned devices, write zeroes must be split at the zone boundary
which is represented as chunk_sectors. For other uses like the
internally RAIDed NVMe devices it is probably at least useful.
Enhance get_max_io_size to know about write zeroes and use it in
bio_split_write_zeroes. Also add a comment about the seemingly
nonsensical zero max_write_zeroes limit.
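A minimal sketch of the boundary rule described above, assuming all values are in 512-byte sectors. The function name and signature are hypothetical and this is not the kernel's get_max_io_size.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model: limit a write zeroes request so that it
 * never crosses a chunk_sectors boundary. A chunk_sectors of 0 means the
 * device has no such boundary.
 */
static uint64_t sketch_max_write_zeroes_sectors(uint64_t start_sector,
						uint64_t max_write_zeroes,
						uint64_t chunk_sectors)
{
	uint64_t limit = max_write_zeroes;

	if (chunk_sectors) {
		/* Sectors left before the next chunk (zone) boundary. */
		uint64_t to_boundary =
			chunk_sectors - (start_sector % chunk_sectors);

		if (to_boundary < limit)
			limit = to_boundary;
	}
	return limit;
}
```

A request starting 200 sectors into a 256 sector chunk is thus capped at 56 sectors, even if the device limit is larger.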
Christoph Hellwig [Thu, 31 Oct 2024 14:09:05 +0000 (15:09 +0100)]
block: lift bio_is_zone_append to bio.h
Make bio_is_zone_append globally available, because file systems need
to use it to check for a zone append bio in their end_io handlers to deal
with the block layer emulation.
Damien Le Moal [Fri, 1 Nov 2024 01:33:52 +0000 (10:33 +0900)]
block: Add a public bdev_zone_is_seq() helper
Turn the private disk_zone_is_conv() function in blk-zoned.c into a
public and documented bdev_zone_is_seq() helper with the inverse
polarity of the original function, also adding a check for non-zoned
devices so that all file systems can use the helper, even with a regular
block device.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Damien Le Moal [Fri, 1 Nov 2024 01:33:51 +0000 (10:33 +0900)]
block: RCU protect disk->conv_zones_bitmap
Ensure that a disk revalidation changing the conventional zones bitmap
of a disk does not cause invalid memory references when using the
disk_zone_is_conv() helper by RCU protecting the disk->conv_zones_bitmap
pointer.
disk_zone_is_conv() is modified to operate under the RCU read lock and
the function disk_set_conv_zones_bitmap() is added to update a disk
conv_zones_bitmap pointer using rcu_replace_pointer() with the disk
zone_wplugs_lock spinlock held.
disk_free_zone_resources() is modified to call
disk_update_zone_resources() with a NULL bitmap pointer to free the disk
conv_zones_bitmap. disk_set_conv_zones_bitmap() is also used in
disk_update_zone_resources() to set the new (revalidated) bitmap and
free the old one.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 25 Oct 2024 13:48:40 +0000 (15:48 +0200)]
xfs: split zoned GC writes on demand
Get rid of the max_zone_append_size hack, and always read up to the
available size for GC, and then split the chunks to the hardware limits
when writing the data.
Christoph Hellwig [Fri, 25 Oct 2024 13:48:40 +0000 (15:48 +0200)]
iomap/xfs: split bios to zone append limits in the submission handlers
Follow the btrfs lead and don't try to build bios to hardware limits,
but instead build them as normal and split them to the hardware limits
in the I/O submission handler.
This fixes a regression introduced in upstream commit ed9832bc08db
("block: introduce folio awareness and add a bigger size from folio")
that broke the zone append support in __bio_iov_iter_get_pages().
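The build-then-split idea can be sketched in plain C as follows. This is a userspace model only: sketch_split_to_limit is a hypothetical name, it works on byte lengths instead of bios, and it does not reflect the actual iomap or XFS submission code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: an I/O of len bytes was built without regard to the
 * hardware limit. At submission time, cut it into pieces of at most
 * max_bytes each. Piece lengths are stored in out[], and the number of
 * pieces written is returned (at most out_cap).
 */
static size_t sketch_split_to_limit(uint64_t len, uint64_t max_bytes,
				    uint64_t *out, size_t out_cap)
{
	size_t n = 0;

	if (max_bytes == 0)
		return 0;

	while (len > 0 && n < out_cap) {
		uint64_t piece = len < max_bytes ? len : max_bytes;

		out[n++] = piece;
		len -= piece;
	}
	return n;
}
```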
Christoph Hellwig [Tue, 29 Oct 2024 08:24:30 +0000 (09:24 +0100)]
iomap: allow the file system to submit the writeback bios
Change ->prepare_ioend to ->submit_ioend and require file systems that
implement it to submit the bio. This is needed for file systems that
do their own work on the bios before submitting them to the block layer
like btrfs or zoned xfs.
Christoph Hellwig [Tue, 15 Oct 2024 08:15:35 +0000 (10:15 +0200)]
block: add a public bdev_zone_is_seq helper
Turn the private disk_zone_is_conv in blk-zoned.c into a public and
documented bdev_zone_is_seq helper with inverse polarity and a check
for non-zoned devices that file systems can use.
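A rough userspace sketch of the semantics described above: inverse polarity compared to an "is conventional" test, plus a check for non-zoned devices. The struct and function names are hypothetical stand-ins, not the block layer definitions.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant per-disk state. */
struct sketch_disk {
	bool zoned;				/* false for a regular device */
	uint64_t zone_sectors;			/* zone size in sectors */
	const unsigned long *conv_zones_bitmap;	/* NULL: no conventional zones */
};

/* Return true if the zone containing sector is sequential write required. */
static bool sketch_zone_is_seq(const struct sketch_disk *disk, uint64_t sector)
{
	size_t bits = sizeof(unsigned long) * CHAR_BIT;
	uint64_t zno;

	if (!disk->zoned)
		return false;	/* regular devices have no sequential zones */
	if (!disk->conv_zones_bitmap)
		return true;	/* zoned, and no zone is conventional */

	zno = sector / disk->zone_sectors;
	return !((disk->conv_zones_bitmap[zno / bits] >> (zno % bits)) & 1UL);
}
```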
Christoph Hellwig [Thu, 24 Oct 2024 15:14:49 +0000 (17:14 +0200)]
xfs: use the correct conversion helper in xfs_zone_alloc_blocks_rtg
We need to convert to a daddr and then to the weird byte format used by
iomap. This mostly doesn't matter, except when using a non-power-of-two
size without zone capacity support.
Christoph Hellwig [Thu, 24 Oct 2024 03:07:51 +0000 (05:07 +0200)]
xfs: reinstate post-EOF and setattr zeroing
generic/363 made it clear we can't actually skip this for zoned
file systems. Move the non-blocking reserved pool allocation to
setattr and wire up the alloc context everywhere, as well as
re-enable the post-EOF zeroing in write().
Christoph Hellwig [Wed, 23 Oct 2024 07:01:12 +0000 (09:01 +0200)]
xfs: support growfs on zoned file systems
Replace the inner loop growing one RT bitmap block at a time with
one just modifying the superblock counters for growing an entire
zone (aka RTG). The big restriction is that, just like at mkfs time,
only an RT extent size of a single FSB is allowed, and the file system
capacity needs to be aligned to the zone size.
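The two restrictions named above could be checked roughly as follows. This is an illustrative sketch with hypothetical names and units (all sizes in file system blocks), not the actual growfs validation code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the restrictions on growing a zoned RT section:
 * the RT extent size must be a single FSB, and the new capacity must be
 * a whole number of zones.
 */
static bool sketch_zoned_grow_ok(uint64_t new_rblocks, uint64_t zone_blocks,
				 uint32_t rextsize)
{
	if (rextsize != 1)
		return false;
	if (zone_blocks == 0)
		return false;
	return new_rblocks % zone_blocks == 0;
}
```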
Hans Holmberg [Sun, 6 Oct 2024 05:04:42 +0000 (07:04 +0200)]
xfs: add data placement info to mount stats
Add per-rtg active refs, life time hint and data separation score and
an aggregate data separation score as output to the mount stats
to aid debugging and analysis.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Christoph Hellwig [Mon, 22 Jul 2024 13:31:28 +0000 (06:31 -0700)]
xfs: support xrep_require_rtext_inuse on zoned file systems
Space usage is tracked by the rmap, which already is separately
cross-referenced. But on top of that we have the write pointer and can
do a basic sanity check here that the block is not beyond the write
pointer.
Christoph Hellwig [Thu, 15 Aug 2024 16:12:33 +0000 (18:12 +0200)]
xfs: support xchk_xref_is_used_rt_space on zoned file systems
Space usage is tracked by the rmap, which already is separately
cross-referenced. But on top of that we have the write pointer and can
do a basic sanity check here that the block is not beyond the write
pointer.
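A minimal sketch of such a write pointer sanity check, assuming the extent start and the write pointer are both offsets within the same zone. The function name is hypothetical and this is not the scrub code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: an extent that is claimed to be in use must lie
 * entirely below the zone's write pointer. Written as two comparisons
 * so that start + len cannot overflow.
 */
static bool sketch_extent_below_write_pointer(uint64_t start, uint64_t len,
					      uint64_t write_pointer)
{
	return len <= write_pointer && start <= write_pointer - len;
}
```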
Christoph Hellwig [Fri, 18 Oct 2024 14:19:15 +0000 (16:19 +0200)]
xfs: support zone gaps
Zoned devices can have gaps beyond the usable capacity of a zone and the
end in the LBA/daddr address space. In other words, the hardware
equivalent to the RT groups already takes care of the power of 2
alignment for us. In this case the sparse FSB/RTB address space maps 1:1
to the device address space.
Christoph Hellwig [Fri, 31 May 2024 09:25:00 +0000 (11:25 +0200)]
xfs: support zoned RT devices
WARNING: this is early prototype code.
The zoned allocator works by handing out data blocks to the direct or
buffered write code at the place where XFS currently does block
allocations. It does not actually insert them into the bmap extent tree
at this time, but only after I/O completion when we know the block number.
The zoned allocator works on any kind of device, including conventional
devices or conventional zones by having a crude write pointer emulation.
For zoned devices active zone management is fully supported, as is
zone capacity < zone size.
The two major limitations are:
- there is no support for unwritten extents and thus persistent
file preallocations from fallocate(). This is inherent to an
always out of place write scheme as there is no way to persistently
preallocate blocks for an indefinite number of overwrites
- because the metadata blocks and data blocks are on different
devices, you can run out of space for metadata while having plenty
of space for data and vice versa. This is inherent to a scheme
where we use different devices or pools for each.
For zoned file systems we reserve the free extents before taking the
iolock so that if we have to force garbage collection it happens before we
take the iolock. This is done because GC has to take the iolock after it
moved data to a new place, and this could otherwise deadlock.
This unfortunately has to exclude block zeroing, as for truncate we are
called with the iolock (aka i_rwsem) already held. As zeroing is always
only for a single block at a time, or up to two total for a syscall in
the case of free_file_range, we deal with that by just stealing the block,
but failing the allocation if we'd have to wait for GC.
Add a new RTAVAILABLE counter of blocks that are actually directly
available to be written into in addition to the classic free counter.
Only allow a write to go ahead if it has blocks available to write, and
otherwise wait for GC. This also requires tweaking the need GC condition a
bit as we now always need to GC if someone is waiting for space.
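The gating described in this paragraph can be modeled roughly as below. This is a sketch under assumptions: the struct, its fields, and the base_condition parameter (standing in for whatever the existing GC trigger is) are hypothetical, and no locking or waiting is shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters; the real ones are per-mount and not plain ints. */
struct sketch_zoned_space {
	int64_t free;		/* classic free counter */
	int64_t available;	/* blocks directly available to be written */
	unsigned int waiters;	/* writers currently waiting for space */
};

/* A write may only go ahead if it has blocks available to write into. */
static bool sketch_can_write(const struct sketch_zoned_space *s,
			     int64_t blocks)
{
	return blocks >= 0 && s->available >= blocks;
}

/* GC is always needed if someone is waiting for space. */
static bool sketch_need_gc(const struct sketch_zoned_space *s,
			   bool base_condition)
{
	return base_condition || s->waiters > 0;
}
```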
Thanks to Hans Holmberg <hans.holmberg@wdc.com> for lots of fixes
and improvements.
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Sun, 12 May 2024 05:39:45 +0000 (07:39 +0200)]
xfs: disable sb_frextents scrub/repair for zoned file systems
Zoned file systems not only don't use the frextents counter, but the
in-memory percpu counter also includes reservations taken before even
allocating delalloc extent records, so it will never match the per-zone
used information.
Christoph Hellwig [Tue, 10 Sep 2024 05:00:17 +0000 (08:00 +0300)]
xfs: don't zero post-EOF blocks on write and truncate up for zoned file systems
Zoned file systems don't leave blocks past the last allocated block
around ever, so don't bother with a zeroing operation for these
non-existent blocks. This avoids having to take a space reservation
for these operations.
Christoph Hellwig [Sun, 6 Oct 2024 04:30:30 +0000 (06:30 +0200)]
xfs: add a helper to check if an inode sits on a zoned device
Add a xfs_is_zoned_inode helper that returns true if an inode has the
RT flag set and the file system is zoned. This will be used to key
off zoned allocator behavior.
Make xfs_is_always_cow_inode return true for zoned inodes as we always
need to write out of place on zoned devices.
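The two predicates described above could look roughly like this in a userspace model. The structs and the always_cow field (standing in for any pre-existing always-COW condition) are hypothetical; this is not the XFS inode code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the mount and inode state involved. */
struct sketch_mount {
	bool zoned;		/* file system uses the zoned allocator */
	bool always_cow;	/* some other always-COW condition applies */
};

struct sketch_inode {
	const struct sketch_mount *mount;
	bool realtime;		/* inode has the RT flag set */
};

/* True if the inode has the RT flag set and the file system is zoned. */
static bool sketch_is_zoned_inode(const struct sketch_inode *ip)
{
	return ip->realtime && ip->mount->zoned;
}

/* Zoned inodes must always be written out of place. */
static bool sketch_is_always_cow_inode(const struct sketch_inode *ip)
{
	return sketch_is_zoned_inode(ip) || ip->mount->always_cow;
}
```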
Christoph Hellwig [Fri, 27 Oct 2023 07:58:24 +0000 (09:58 +0200)]
xfs: refine the unaligned check for always COW inodes in xfs_file_dio_write
For always COW inodes we also must check the alignment of each individual
iovec segment, as they could end up with different I/Os due to the way
bio_iov_iter_get_pages works, and we'd then overwrite an already written
block.
Christoph Hellwig [Tue, 10 Sep 2024 04:58:17 +0000 (07:58 +0300)]
xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay
The zone allocator wants to be able to remove a delalloc mapping in the
COW fork while keeping the block reservation. To support that pass the
bflags argument down to xfs_bmap_del_extent_delay and support the
XFS_BMAPI_REMAP flag to keep the reservation.
Christoph Hellwig [Thu, 10 Oct 2024 05:27:50 +0000 (07:27 +0200)]
xfs: generalize the freespace and reserved blocks handling
The main handling of the incore per-cpu freespace counters is already
handled in xfs_mod_freecounter for both the block and RT extent cases,
but the actual counter is passed in and special cased.
Replace both the percpu counters and the resblks counters with arrays,
so that reserved RT extents can be supported, which will be
needed for garbage collection on zoned devices.
Use helpers to access the freespace counters everywhere instead of
poking through the abstraction by using the percpu_count helpers
directly. This also switches the flooring of the frextents counter
to 0 in statfs for the rtinherit case to a manual min_t call to match
the handling of the fdblocks counter for normal file systems.
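The array-based layout can be sketched as below. This is a deliberately simplified userspace model: plain integers replace percpu counters, the enum and helper names are hypothetical, and the fallback to the reserve pool is an assumed policy for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical index for the two kinds of free space. */
enum sketch_free_counter {
	SKETCH_FREE_BLOCKS,
	SKETCH_FREE_RTEXTENTS,
	SKETCH_FREE_NR,
};

struct sketch_mount {
	int64_t free[SKETCH_FREE_NR];		/* free space counters */
	int64_t resavail[SKETCH_FREE_NR];	/* reserved pools */
};

/* Accessor, so callers do not poke at the arrays directly. */
static int64_t sketch_free_count(const struct sketch_mount *mp,
				 enum sketch_free_counter ctr)
{
	return mp->free[ctr];
}

/*
 * One code path for both counters: take delta from the free counter, or
 * from the matching reserve pool if the caller may use it.
 */
static bool sketch_dec_free(struct sketch_mount *mp,
			    enum sketch_free_counter ctr, int64_t delta,
			    bool use_reserve)
{
	if (delta < 0)
		return false;
	if (mp->free[ctr] >= delta) {
		mp->free[ctr] -= delta;
		return true;
	}
	if (use_reserve && mp->resavail[ctr] >= delta) {
		mp->resavail[ctr] -= delta;
		return true;
	}
	return false;
}
```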
Christoph Hellwig [Fri, 16 Aug 2024 16:49:13 +0000 (18:49 +0200)]
iomap: pass private data to iomap_truncate_page
Allow the file system to pass private data which can be used by the
iomap_begin and iomap_end methods through the private pointer in the
iomap_iter structure.
Christoph Hellwig [Fri, 16 Aug 2024 16:48:16 +0000 (18:48 +0200)]
iomap: pass private data to iomap_zero_range
Allow the file system to pass private data which can be used by the
iomap_begin and iomap_end methods through the private pointer in the
iomap_iter structure.
Christoph Hellwig [Tue, 10 Sep 2024 04:57:21 +0000 (07:57 +0300)]
iomap: pass private data to iomap_page_mkwrite
Allow the file system to pass private data which can be used by the
iomap_begin and iomap_end methods through the private pointer in the
iomap_iter structure.
Christoph Hellwig [Fri, 24 Nov 2023 09:45:35 +0000 (10:45 +0100)]
iomap: optionally use ioends for direct I/O
struct iomap_ioend currently tracks outstanding buffered writes and has
some really nice code in core iomap and XFS to merge contiguous I/Os
and defer them to userspace for completion in a very efficient way.
For zoned writes we'll also need a per-bio user context completion to
record the written blocks, and the infrastructure for that would look
basically like the ioend handling for buffered I/O.
So instead of reinventing the wheel, reuse the existing infrastructure.
Christoph Hellwig [Wed, 14 Feb 2024 14:09:44 +0000 (15:09 +0100)]
iomap: support IOMAP_F_ZONE_APPEND for buffered I/O
Add support for Zone Append commands to the iomap buffer writeback code.
This involves selecting the right block layer operation and using the
right helper to add data to the bio, as well as not creating chained
bios inside the iomap for zone append as they could be written
non-contiguously and adjusting the sector based merge criteria.
Christoph Hellwig [Fri, 20 Oct 2023 13:36:03 +0000 (15:36 +0200)]
iomap: make iomap_sector Zone Append aware
Zone Append commands always point to the zone start sector. Change
the iomap_sector() helper to not adjust the start block for the position
in the iomap range for Zone Append iomaps.
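A sketch of the described behavior, using a simplified stand-in for the iomap structure. The struct, the flag value, and the function name are hypothetical; only the arithmetic is meant to illustrate the change.

```c
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT	9
#define SKETCH_F_ZONE_APPEND	(1u << 0)	/* hypothetical flag value */

/* Simplified stand-in for the fields of an iomap used here. */
struct sketch_iomap {
	uint64_t addr;		/* disk offset of the mapping, in bytes */
	int64_t offset;		/* file offset of the mapping, in bytes */
	unsigned int flags;
};

static uint64_t sketch_iomap_sector(const struct sketch_iomap *iomap,
				    int64_t pos)
{
	/* Zone Append always targets the zone start, so ignore pos. */
	if (iomap->flags & SKETCH_F_ZONE_APPEND)
		return iomap->addr >> SKETCH_SECTOR_SHIFT;

	/* Otherwise adjust for the position inside the mapped range. */
	return (iomap->addr + (uint64_t)(pos - iomap->offset)) >>
		SKETCH_SECTOR_SHIFT;
}
```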
Christoph Hellwig [Sun, 5 Nov 2023 05:40:52 +0000 (06:40 +0100)]
iomap: reinstate IOMAP_F_ZONE_APPEND support
Add back the support for using Zone Append in the iomap direct I/O code
that was removed a while ago as we'll use it for the XFS zoned device
support.
This is essentially a revert of commit 8e81aa16a421 ("iomap: remove
IOMAP_F_ZONE_APPEND") with an additional comment describing the flag now
that all the other IOMAP_F_* flags have a nice description.
Christoph Hellwig [Fri, 13 Oct 2023 06:09:39 +0000 (08:09 +0200)]
iomap: wait for writeback before allocating new blocks
This means we are actually forced to allocate new delalloc space for the
new dirtier instead of reusing one that is currently being used for
writeback.
Christoph Hellwig [Mon, 26 Aug 2024 06:16:23 +0000 (08:16 +0200)]
xfs: punch delalloc extents from the COW fork for COW writes
When ->iomap_end is called on a short write to the COW fork it needs to
punch stale delalloc data from the COW fork and not the data fork.
Ensure that IOMAP_F_NEW is set for new COW fork allocations in
xfs_buffered_write_iomap_begin, and then use the IOMAP_F_SHARED flag
in xfs_buffered_write_delalloc_punch to decide which fork to punch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 26 Aug 2024 05:32:22 +0000 (07:32 +0200)]
xfs: set IOMAP_F_SHARED for all COW fork allocations
Change xfs_buffered_write_iomap_begin to always set IOMAP_F_SHARED for COW fork
allocations even if they don't overlap existing data fork extents,
which will allow the iomap_end callback to detect if it has to punch
stale delalloc blocks from the COW fork instead of the data fork. It
also means we sample the sequence counter for both the data and the COW
fork when writing to the COW fork, which ensures we properly revalidate
when only COW fork changes happen.
This is essentially a revert of commit 72a048c1056a ("xfs: only set
IOMAP_F_SHARED when providing a srcmap to a write"). This is fine because
the problem that the commit fixed has now been dealt with in iomap by
only looking at the actual srcmap and not the fallback to the write
iomap.
Note that the direct I/O path was never changed and has always set
IOMAP_F_SHARED for all COW fork allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 26 Aug 2024 05:04:01 +0000 (07:04 +0200)]
xfs: share more code in xfs_buffered_write_iomap_begin
Introduce a local iomap_flags variable so that the code allocating new
delalloc blocks in the data fork can fall through to the found_imap
label and reuse the code to unlock and fill the iomap.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 26 Aug 2024 06:16:10 +0000 (08:16 +0200)]
xfs: support the COW fork in xfs_bmap_punch_delalloc_range
xfs_buffered_write_iomap_begin can also create delalloc reservations
that need cleaning up. Prepare for that by adding support for the COW
fork in xfs_bmap_punch_delalloc_range.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 20 Sep 2024 10:14:49 +0000 (12:14 +0200)]
xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
All XFS callers of iomap_zero_range and iomap_file_unshare already hold
invalidate_lock, so we can't take it again in
iomap_file_buffered_write_punch_delalloc.
Use the passed in flags argument to detect if we're called from a zero
or unshare operation and don't take the lock again in this case.
Christoph Hellwig [Tue, 3 Sep 2024 08:24:34 +0000 (11:24 +0300)]
xfs: take XFS_MMAPLOCK_EXCL in xfs_file_write_zero_eof
xfs_file_write_zero_eof is the only caller of xfs_zero_range that does
not take XFS_MMAPLOCK_EXCL (aka the invalidate lock). Currently that
is actually the right thing, as an error in the iomap zeroing code will
also take the invalidate_lock to clean up, but to fix that deadlock we
need a consistent locking pattern first.
The only extra thing that XFS_MMAPLOCK_EXCL will lock out are read
pagefaults, which isn't really needed here, but also not actively
harmful.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 20 Sep 2024 10:11:32 +0000 (12:11 +0200)]
iomap: move locking out of iomap_write_delalloc_release
XFS (which currently is the only user of iomap_write_delalloc_release)
already holds invalidate_lock for most zeroing operations. To be able
to avoid a deadlock it needs to stop taking the lock, but doing so
in iomap would leak XFS locking details into iomap.
To avoid this require the caller to hold invalidate_lock when calling
iomap_write_delalloc_release instead of taking it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Currently iomap_file_buffered_write_punch_delalloc can be called from
XFS either with the invalidate lock held or not. To fix this while
keeping the locking in the file system and not the iomap library
code we'll need to lift the locking up into the file system.
To prepare for that, open code iomap_file_buffered_write_punch_delalloc
in the only caller, and instead export iomap_write_delalloc_release.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:42 +0000 (12:40 -0700)]
xfs: check for shared rt extents when rebuilding rt file's data fork
When we're rebuilding the data fork of a realtime file, we need to
cross-reference each mapping with the rt refcount btree to ensure that
the reflink flag is set if there are any shared extents found.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:39 +0000 (12:40 -0700)]
xfs: walk the rt reference count tree when rebuilding rmap
When we're rebuilding the data device rmap, if we encounter a "refcount"
format fork, we have to walk the (realtime) refcount btree inode to
build the appropriate mappings.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:39 +0000 (12:40 -0700)]
xfs: check new rtbitmap records against rt refcount btree
When we're rebuilding the realtime bitmap, check the proposed free
extents against the rt refcount btree to make sure we don't commit any
grievous errors.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:38 +0000 (12:40 -0700)]
xfs: don't flag quota rt block usage on rtreflink filesystems
Quota space usage is allowed to exceed the size of the physical storage
when reflink is enabled. Now that we have reflink for the realtime
volume, apply this same logic to the rtb repair logic.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:36 +0000 (12:40 -0700)]
xfs: detect and repair misaligned rtinherit directory cowextsize hints
If we encounter a directory that has been configured to pass on a CoW
extent size hint to a new realtime file and the hint isn't an integer
multiple of the rt extent size, we should flag the hint for
administrative review and/or turn it off because that is a
misconfiguration.
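The alignment rule behind this check can be sketched as follows. The function name is hypothetical, both values are assumed to be in file system blocks, and treating a zero hint as "no hint set" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a CoW extent size hint that will be inherited by
 * realtime files should be an integer multiple of the rt extent size.
 */
static bool sketch_cowextsize_aligned(uint32_t cowextsize, uint32_t rextsize)
{
	if (cowextsize == 0)
		return true;	/* assumed: no hint set, nothing to check */
	if (rextsize == 0)
		return false;	/* invalid rt extent size */
	return cowextsize % rextsize == 0;
}
```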
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:35 +0000 (12:40 -0700)]
xfs: check reference counts of gaps between rt refcount records
If there's a gap between records in the rt refcount btree, we ought to
cross-reference the gap with the rtrmap records to make sure that there
aren't any overlapping records for a region that doesn't have any shared
ownership.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:31 +0000 (12:40 -0700)]
xfs: check that the rtrefcount maxlevels doesn't increase when growing fs
The size of filesystem transaction reservations depends on the maximum
height (maxlevels) of the realtime btrees. Since we don't want a grow
operation to increase the reservation size enough that we'll fail the
minimum log size checks on the next mount, constrain growfs operations
if they would cause an increase in the rt refcount btree maxlevels.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:29 +0000 (12:40 -0700)]
xfs: apply rt extent alignment constraints to CoW extsize hint
The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint. Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.
Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:29 +0000 (12:40 -0700)]
xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files. This apparently was done to disable
delayed allocation for realtime files.
However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files. In this case, the
logic will incorrectly send the write through the delalloc write path.
Fix this by adjusting the logic slightly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:25 +0000 (12:40 -0700)]
xfs: enable CoW for realtime data
Update our write paths to support copy on write on the rt volume. This
works in more or less the same way as it does on the data device, with
the major exception that we never do delalloc on the rt volume.
Because we consider unwritten CoW fork staging extents to be incore
quota reservation, we update xfs_quota_reserve_blkres to support this
case. Though xfs doesn't allow rt and quota together, the change is
trivial and we shouldn't leave a logic bomb here.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:21 +0000 (12:40 -0700)]
xfs: refactor xfs_reflink_find_shared
Move lookup of the perag structure from the callers into the helpers,
and return the offset into the extent of the shared region instead of
the block number that needs post-processing. This prepares the
callsites for the creation of an rt-specific variant in the next patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: port to the middle of the rtreflink series for cleanliness]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 15 Oct 2024 19:40:20 +0000 (12:40 -0700)]
xfs: wire up a new inode fork type for the realtime refcount
Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>