www.infradead.org Git - users/hch/xfs.git/log

]> www.infradead.org Git - users/hch/xfs.git/log

Darrick J. Wong [Fri, 9 Aug 2024 19:51:19 +0000 (12:51 -0700)]

xfs: fix di_metatype field of inodes that won't load

Make sure that the di_metatype field is at least set plausibly so that
later scrubbers could set the real type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:22 +0000 (15:54 -0700)]

xfs: adjust parent pointer scrubber for sb-rooted metadata files

Starting with the metadata directory feature, we're allowed to call the
directory and parent pointer scrubbers for every metadata file,
including the ones that are children of the superblock.

For these children, checking the link count against the number of parent
pointers is a bit funny -- there's no such thing as a parent pointer for
a child of the superblock since there's no corresponding dirent. For
purposes of validating nlink, we pretend that there is a parent pointer.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:21 +0000 (15:54 -0700)]

xfs: metadata files can have xattrs if metadir is enabled

If parent pointers are enabled, then metadata files will store parent
pointers in xattrs, just like files in the user visible directory tree.
Therefore, scrub and repair need to handle attr forks for metadata files
on metadir filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:21 +0000 (15:54 -0700)]

xfs: don't fail repairs on metadata files with no attr fork

Fix a minor bug where we fail repairs on metadata files that do not have
attr forks because xrep_metadata_inode_subtype doesn't filter ENOENT.

Fixes: 5a8e07e799721 ("xfs: repair the inode core and forks of a metadata inode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:20 +0000 (15:54 -0700)]

xfs: do not count metadata directory files when doing online quotacheck

Previously, we stated that files in the metadata directory tree are not
counted in the dquot information. Fix the online quotacheck code to
reflect this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:19 +0000 (15:54 -0700)]

xfs: refactor directory tree root predicates

Metadata directory trees make reasoning about the parent of a file more
difficult. Traditionally, user files are children of sb_rootino, and
metadata files are "children" of the superblock. Now, we add a third
possibility -- some metadata files can be children of sb_metadirino, but
the classic ones (rt free space data and quotas) are left alone.

Let's add some helper functions (instead of open-coding the logic
everywhere) to make scrub logic easier to understand.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:18 +0000 (15:54 -0700)]

xfs: record health problems with the metadata directory

Make a report to the health monitoring subsystem any time we encounter
something in the metadata directory tree that looks like corruption.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:17 +0000 (15:54 -0700)]

xfs: adjust xfs_bmap_add_attrfork for metadir

Online repair might use the xfs_bmap_add_attrfork to repair a file in
the metadata directory tree if (say) the metadata file lacks the correct
parent pointers. In that case, it is not correct to check that the file
is dqattached -- metadata files must be not have /any/ dquot attached at
all.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:17 +0000 (15:54 -0700)]

xfs: mark quota inodes as metadata files

When we're creating quota files at mount time, make sure to mark them as
metadir inodes if appropriate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:16 +0000 (15:54 -0700)]

xfs: don't count metadata directory files to quota

Files in the metadata directory tree are internal to the filesystem.
Don't count the inodes or the blocks they use in the root dquot because
users do not need to know about their resource usage. This will also
quiet down complaints about dquot usage not matching du output.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:15 +0000 (15:54 -0700)]

xfs: allow bulkstat to return metadata directories

Allow the V5 bulkstat ioctl to return information about metadata
directory files so that xfs_scrub can find and scrub them, since they
are otherwise ordinary directories.

(Metadata files of course require per-file scrub code and hence do not
need exposure.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:14 +0000 (15:54 -0700)]

xfs: advertise metadata directory feature

Advertise the existence of the metadata directory feature; this will be
used by scrub to decide if it needs to scan the metadir too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:13 +0000 (15:54 -0700)]

xfs: hide metadata inodes from everyone because they are special

Metadata inodes are private files and therefore cannot be exposed to
userspace.  This means no bulkstat, no open-by-handle, no linking them
into the directory tree, and no feeding them to LSMs.  As such, we mark
them S_PRIVATE, which stops all that.

While we're at it, put them in a separate lockdep class so that it won't
get confused by "recursive" i_rwsem locking such as what happens when we
write to a rt file and need to allocate from the rt bitmap file.  The
static function that we use to do this will be exported in the rtgroups
patchset.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:13 +0000 (15:54 -0700)]

xfs: disable the agi rotor for metadata inodes

Ideally, we'd put all the metadata inodes in one place if we could, so
that the metadata all stay reasonably close together instead of
spreading out over the disk. Furthermore, if the log is internal we'd
probably prefer to keep the metadata near the log. Therefore, disable
AGI rotoring for metadata inode allocations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:12 +0000 (15:54 -0700)]

xfs: read and write metadata inode directory tree

Plumb in the bits we need to load metadata inodes from a named entry in
a metadir directory, create (or hardlink) inodes into a metadir
directory, create metadir directories, and flag inodes as being metadata
files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:11 +0000 (15:54 -0700)]

xfs: enforce metadata inode flag

Add checks for the metadata inode flag so that we don't ever leak
metadata inodes out to userspace, and we don't ever try to read a
regular inode as metadata.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:10 +0000 (15:54 -0700)]

xfs: load metadata directory root at mount time

Load the metadata directory root inode into memory at mount time and
release it at unmount time.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 14 Aug 2024 17:49:00 +0000 (10:49 -0700)]

xfs: iget for metadata inodes

Create a xfs_metafile_iget function for metadata inodes to ensure that
when we try to iget a metadata file, the inobt thinks a metadata inode
is in use and that the metadata type matches what we are expecting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Aug 2024 22:54:09 +0000 (15:54 -0700)]

xfs: define the on-disk format for the metadir feature

Define the on-disk layout and feature flags for the metadata inode
directory feature. Add a xfs_sb_version_hasmetadir for benefit of
xfs_repair, which needs to know where the new end of the superblock
lies.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 22 Sep 2024 06:04:55 +0000 (08:04 +0200)]

xfs: store a generic group structure in the intents

Replace the pag pointers in the extent free, bmap, rmap and refcount
intent structures with a pointer to the generic group to prepare
for adding intents for realtime groups.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 22 Sep 2024 06:03:14 +0000 (08:03 +0200)]

xfs: remove xfs_group_intent_hold and xfs_group_intent_rele

Each of them just has a single caller, so fold them.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 20 Sep 2024 15:35:42 +0000 (17:35 +0200)]

xfs: add group based bno conversion helpers

Add convenience helpers to convert block numbers based on the generic
group. This will allow writing code that doesn't care if it is used
on AGs or the upcoming realtime groups.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 19 Sep 2024 06:45:52 +0000 (08:45 +0200)]

xfs: store a generic xfs_group pointer in xfs_getfsmap_info

Replace the pag and rtg pointers with a generic group pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 22 Sep 2024 05:39:53 +0000 (07:39 +0200)]

xfs: add a generic group pointer to the btree cursor

Replace the pag pointers in the type specific union with a generic
xfs_group pointer. This prepares for adding realtime group support.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 20 Sep 2024 13:40:57 +0000 (15:40 +0200)]

xfs: convert busy extent tracking to the generic group structure

Split busy extent tracking from struct xfs_perag into its own private
structure, which can be pointed to by the generic group structure.

Note that this structure is now dynamically allocated instead of embedded
as the upcoming zone XFS code doesn't need it and will also have an
unusually high number of groups due to hardware constraints. Dynamically
allocating the structure this is a big memory saver for this case.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 19 Sep 2024 13:09:06 +0000 (15:09 +0200)]

xfs: convert extent busy tracing to the generic group structure

Prepare for tracking busy RT extents by passing the generic group
structure to the xfs_extent_busy_class tracepoints.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 2 Sep 2024 05:45:37 +0000 (08:45 +0300)]

xfs: return the busy generation from xfs_extent_busy_list_empty

This avoid having to poke into the internals of the busy tracking in
ẋrep_setup_ag_allocbt.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 1 Sep 2024 12:45:33 +0000 (15:45 +0300)]

xfs: move the online repair rmap hooks to the generic group structure

Prepare for the upcoming realtime groups feature by moving the online
repair rmap hooks to based to the generic xfs_group structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 19 Sep 2024 07:07:48 +0000 (09:07 +0200)]

xfs: move draining of deferred operations to the generic group structure

Prepare supporting the upcoming realtime groups feature by moving the
deferred operation draining to the generic xfs_group structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 14 Sep 2024 07:09:31 +0000 (09:09 +0200)]

xfs: mark xfs_perag_intent_{hold,rele} static

These two functions are only used inside of xfs_drain.c, so mark them
static.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 20 Sep 2024 13:40:20 +0000 (15:40 +0200)]

xfs: move metadata health tracking to the generic group structure

Prepare for also tracking the health status of the upcoming realtime
groups by moving the health tracking code to the generic xfs_group
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 20 Sep 2024 13:37:43 +0000 (15:37 +0200)]

xfs: switch perag iteration from the for_each macros to a while based iterator

The current for_each_perag* macros are a bit annoying in that they
require the caller to both provide an object and an index iterator, and
also somewhat obsfucate the underlying control flow mechanism.

Switch to open coded while loops using new xfs_perag_next{,_from,_range}
helpers that return the next pag structure to iterate on based on the
previous one or NULL for the loop start.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 5 Sep 2024 04:23:38 +0000 (07:23 +0300)]

xfs: add a xfs_group_next_range helper

Add a helper to iterate over iterate over all groups, which can be used
as a simple while loop:

struct xfs_group *xg = NULL;

while ((xg = xfs_group_next_range(mp, xg, 0, MAX_GROUP))) {
...
}

This will be wrapped by the realtime group code first, and eventually
replace the for_each_rtgroup_from and for_each_rtgroup_range helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 22 Sep 2024 03:03:22 +0000 (05:03 +0200)]

xfs: factor out a generic xfs_group structure

Split the lookup and refcount handling of struct xfs_perag into an
embedded xfs_group structure that can be reused for the upcoming
realtime groups.

It will be extended with more features later.

Note that he xg_type field will only need a single bit even with
realtime group support. For now it fills a hole, but it might be
worth to fold it into another field if we can use this space better.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 22 Sep 2024 03:02:42 +0000 (05:02 +0200)]

xfs: factor out a xfs_iwalk_args helper

Add a helper to share more code between xfs_iwalk and xfs_inobt_walk,
and at the same time do away with the extra flags indirect so that
everyone use the same names for the same flags when using the common
iwalk code.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 1 Sep 2024 08:48:37 +0000 (11:48 +0300)]

xfs: insert the pag structures into the xarray later

Cleaning up is much easier if a structure can't be looked up yet, so only
insert the pag once it is fully set up.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 19 Sep 2024 13:16:07 +0000 (15:16 +0200)]

xfs: split xfs_initialize_perag

Factor out a xfs_perag_alloc helper that allocates a single perag
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 31 Aug 2024 08:10:49 +0000 (11:10 +0300)]

xfs: convert remaining trace points to pass pag structures

Convert all tracepoints that take [mp,agno] tuples to take a pag argument
instead so that decoding only happens when tracepoints are enabled and to
clean up the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 1 Sep 2024 05:39:19 +0000 (08:39 +0300)]

xfs: pass the pag to the xrep_newbt_extent_class tracepoints

This requires moving a few of the callsites a little bit to ensure that
we already have the reference, but allows for the decoding to only happen
when tracing is actually enabled, and cleans up the callsites a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 1 Sep 2024 05:34:33 +0000 (08:34 +0300)]

xfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points

This requires holding the pag refcount a little longer, but allows for the
decoding to only happen when tracing is actually enabled, and cleans up the
callsites a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 1 Sep 2024 05:26:13 +0000 (08:26 +0300)]

xfs: pass objects to the xrep_ibt_walk_rmap tracepoint

Pass the perag structure and the irec so that the decoding is only done
when tracing is actually enabled and the call sites look a lot neater,
and remove the pointless class indirection.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 31 Aug 2024 08:20:24 +0000 (11:20 +0300)]

xfs: pass the iunlink item to the xfs_iunlink_update_dinode trace point

So that decoding is only done when tracing is actually enabled and the
call site look a lot neater.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 31 Aug 2024 07:39:59 +0000 (10:39 +0300)]

xfs: pass objects to the xfs_irec_merge_{pre,post} trace points

Pass the perag structure and the irec to these tracepoints so that the
decoding is only done when tracing is actually enabled and the call sites
look a lot neater.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 31 Aug 2024 07:37:07 +0000 (10:37 +0300)]

xfs: pass a perag structure to the xfs_ag_resv_init_error trace point

And remove the single instance class indirection for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 19 Sep 2024 07:02:34 +0000 (09:02 +0200)]

xfs: constify pag arguments to trace points

Trace points never modify their arguments. Mark all the pag objects
passed to trace points. The exception is the xfs_ag_resv_class, which
uses the xfs_perag_resv helper that can't be marked const due to
other users modifying the returned structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 14 Sep 2024 06:51:29 +0000 (08:51 +0200)]

xfs: remove the unused xrep_bmap_walk_rmap trace point

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 31 Aug 2024 08:22:05 +0000 (11:22 +0300)]

xfs: remove the unused trace_xfs_iwalk_ag trace point

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 14 Sep 2024 06:31:53 +0000 (08:31 +0200)]

xfs: remove the mount field from struct xfs_busy_extents

The mount field is only passed to xfs_extent_busy_clear, which never uses
it.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 31 Aug 2024 05:29:58 +0000 (08:29 +0300)]

xfs: keep a reference to the pag for busy extents

Processing of busy extents requires the perag structure, so keep the
reference while they are in flight.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 1 Sep 2024 05:10:12 +0000 (08:10 +0300)]

xfs: pass a pag to xfs_extent_busy_{search,reuse}

Replace the [mp,agno] tuple with the perag structure, which will become
more useful later.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 31 Aug 2024 07:55:11 +0000 (10:55 +0300)]

xfs: add a xfs_agino_to_ino helper

Add a helpers to convert an agino to an ino based on a pag structure.

This provides a simpler conversion and better type safety compared to the
existing code that passes the mount structure and the agno separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 19 Sep 2024 06:25:30 +0000 (08:25 +0200)]

xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers

Add helpers to convert an agbno to a daddr or fsbno based on a pag
structure.

This provides a simpler conversion and better type safety compared to the
existing code that passes the mount structure and the agno separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 31 Aug 2024 08:18:02 +0000 (11:18 +0300)]

xfs: remove the agno argument to xfs_free_ag_extent

xfs_free_ag_extent already has a pointer to the pag structure through
the agf buffer. Use that instead of passing the redundant argument,
and do the same for the tracepoint.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sat, 31 Aug 2024 18:05:55 +0000 (21:05 +0300)]

xfs: pass a pag to xfs_difree_inode_chunk

We'll want to use more than just the agno field in a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 1 Sep 2024 14:29:54 +0000 (17:29 +0300)]

xfs: remove the unused pag_active_wq field in struct xfs_perag

pag_active_wq is only woken, but never waited for.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 1 Sep 2024 14:29:37 +0000 (17:29 +0300)]

xfs: remove the unused pagb_count field in struct xfs_perag

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 15 Sep 2024 04:49:40 +0000 (06:49 +0200)]

xfs: fix superfluous clearing of info->low in __xfs_getfsmap_datadev

The for_each_perag helpers update the agno passed in for each iteration,
and thus the "if (pag->pag_agno == start_ag)" check will always be true.

Add another variable for the loop iterator so that the field is only
cleared after the first iteration.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 19 Sep 2024 13:15:10 +0000 (15:15 +0200)]

xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag

__GFP_RETRY_MAYFAIL increases the likelyhood of allocations to fail,
which isn't really helpful during log recovery. Remove the flag and
stick to the default GFP_KERNEL policies.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Sun, 1 Sep 2024 08:09:32 +0000 (11:09 +0300)]

xfs: merge the perag freeing helpers

There is no good reason to have two different routines for freeing perag
structures for the unmount and error cases. Add two arguments to specify
the range of AGs to free to xfs_free_perag, and use that to replace
xfs_free_unused_perag_range.

The addition RCU grace period for the error case is harmless, and the
extra check for the AG to actually exist is not required now that the
callers pass the exact known allocated range.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Sun, 8 Sep 2024 07:53:41 +0000 (10:53 +0300)]

xfs: pass the exact range to initialize to xfs_initialize_perag

Currently only the new agcount is passed to xfs_initialize_perag, which
requires lookups of existing AGs to skip them and complicates error
handling. Also pass the previous agcount so that the range that
xfs_initialize_perag operates on is exactly defined. That way the
extra lookups can be avoided, and error handling can clean up the
exact range from the old count to the last added perag structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Tue, 27 Aug 2024 05:03:21 +0000 (07:03 +0200)]

xfs: ensure st_blocks never goes to zero during COW writes

COW writes remove the amount overwritten either directly for delalloc
reservations, or in earlier deferred transactions than adding the new
amount back in the bmap map transaction. This means st_blocks on an
inode where all data is overwritten using the COW path can temporarily
show a 0 st_blocks. This can easily be reproduced with the pending
zoned device support where all writes use this path and trips the
check in generic/615, but could also happen on a reflink file without
that.

Fix this by temporarily add the pending blocks to be mapped to
i_delayed_blks while the item is queued.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Thu, 29 Aug 2024 04:08:41 +0000 (07:08 +0300)]

xfs: use xas_for_each_marked in xfs_reclaim_inodes_count

xfs_reclaim_inodes_count iterates over all AGs to sum up the reclaimable
inodes counts. There is no point in grabbing a reference to the them or
unlock the RCU critical section for each iteration, so switch to the
more efficient xas_for_each_marked iterator.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Thu, 29 Aug 2024 04:08:40 +0000 (07:08 +0300)]

xfs: convert perag lookup to xarray

Convert the perag lookup from the legacy radix tree to the xarray,
which allows for much nicer iteration and bulk lookup semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Thu, 29 Aug 2024 04:08:39 +0000 (07:08 +0300)]

xfs: simplify tagged perag iteration

Pass the old perag structure to the tagged loop helpers so that they can
grab the old agno before releasing the reference. This removes the need
to separately track the agno and the iterator macro, and thus also
obsoletes the for_each_perag_tag syntactic sugar.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Thu, 29 Aug 2024 04:08:38 +0000 (07:08 +0300)]

xfs: move the tagged perag lookup helpers to xfs_icache.c

The tagged perag helpers are only used in xfs_icache.c in the kernel code
and not at all in xfsprogs. Move them to xfs_icache.c in preparation for
switching to an xarray, for which I have no plan to implement the tagged
lookup functions for userspace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Thu, 29 Aug 2024 04:08:37 +0000 (07:08 +0300)]

xfs: use kfree_rcu_mightsleep to free the perag structures

Using the kfree_rcu_mightsleep is simpler and removes the need for a
rcu_head in the perag structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Hongbo Li [Wed, 21 Aug 2024 06:43:55 +0000 (14:43 +0800)]

xfs: use LIST_HEAD() to simplify code

list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD().

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Jiapeng Chong [Wed, 10 Jul 2024 02:45:14 +0000 (10:45 +0800)]

xfs: Remove duplicate xfs_trans_priv.h header

./fs/xfs/libxfs/xfs_defer.c: xfs_trans_priv.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9491
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Dan Carpenter [Fri, 12 Jul 2024 14:07:36 +0000 (09:07 -0500)]

xfs: remove unnecessary check

We checked that "pip" is non-NULL at the start of the if else statement
so there is no need to check again here. Delete the check.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

John Garry [Wed, 10 Jul 2024 10:31:19 +0000 (10:31 +0000)]

xfs: Use xfs set and clear mp state helpers

Use the set and clear mp state helpers instead of open-coding.

It is noted that in some instances calls to atomic operation set_bit() and
clear_bit() are being replaced with test_and_set_bit() and
test_and_clear_bit(), respectively, as there is no specific helpers for
set_bit() and clear_bit() only. However should be ok, as we are just
ignoring the returned value from those "test" variants.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Aug 2024 07:39:42 +0000 (09:39 +0200)]

xfs: reclaim speculative preallocations for append only files

The XFS XFS_DIFLAG_APPEND maps to the VFS S_APPEND flag, which forbids
writes that don't append at the current EOF.

But the commit originally adding XFS_DIFLAG_APPEND support (commit
a23321e766d in xfs xfs-import repository) also checked it to skip
releasing speculative preallocations, which doesn't make any sense.

Another commit (dd9f438e3290 in the xfs-import repository) later extended
that flag to also report these speculation preallocations which should
not exist in getbmap.

Remove these checks as nothing XFS_DIFLAG_APPEND implies that
preallocations beyond EOF should exist, but explicitly check for
XFS_DIFLAG_APPEND in xfs_file_release to bypass the algorithm that
discard preallocations on the first close as append only files aren't
expected to be written to only once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Aug 2024 07:39:41 +0000 (09:39 +0200)]

xfs: simplify extent lookup in xfs_can_free_eofblocks

xfs_can_free_eofblocks just cares if there is an extent beyond EOF.
Replace the call to xfs_bmapi_read with a xfs_iext_lookup_extent
as we've already checked that extents are read in earlier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Aug 2024 07:39:40 +0000 (09:39 +0200)]

xfs: check XFS_EOFBLOCKS_RELEASED earlier in xfs_release_eofblocks

If the XFS_EOFBLOCKS_RELEASED flag is set, we are not going to free the
eofblocks, so don't bother locking the inode or performing the checks in
xfs_can_free_eofblocks. Also switch to a test_and_set operation once
the iolock has been acquire so that only the caller that sets it actually
frees the post-EOF blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Darrick J. Wong [Tue, 13 Aug 2024 07:39:39 +0000 (09:39 +0200)]

xfs: only free posteof blocks on first close

Certain workloads fragment files on XFS very badly, such as a software
package that creates a number of threads, each of which repeatedly run
the sequence: open a file, perform a synchronous write, and close the
file, which defeats the speculative preallocation mechanism. We work
around this problem by only deleting posteof blocks the /first/ time a
file is closed to preserve the behavior that unpacking a tarball lays
out files one after the other with no gaps.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: rebased, updated comment, renamed the flag]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Dave Chinner [Tue, 13 Aug 2024 07:39:38 +0000 (09:39 +0200)]

xfs: don't free post-EOF blocks on read close

When we have a workload that does open/read/close in parallel with other
allocation, the file becomes rapidly fragmented. This is due to close()
calling xfs_file_release() and removing the speculative preallocation
beyond EOF.

Add a check for a writable context to xfs_file_release to skip the
post-EOF block freeing (an the similarly pointless flushing on truncate
down).

Before:

Test 1: sync write fragmentation counts

/mnt/scratch/file.0: 919
/mnt/scratch/file.1: 916
/mnt/scratch/file.2: 919
/mnt/scratch/file.3: 920
/mnt/scratch/file.4: 920
/mnt/scratch/file.5: 921
/mnt/scratch/file.6: 916
/mnt/scratch/file.7: 918

After:

Test 1: sync write fragmentation counts

/mnt/scratch/file.0: 24
/mnt/scratch/file.1: 24
/mnt/scratch/file.2: 11
/mnt/scratch/file.3: 24
/mnt/scratch/file.4: 3
/mnt/scratch/file.5: 24
/mnt/scratch/file.6: 24
/mnt/scratch/file.7: 23

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: wordsmithing, fix commit message]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: ported to the new ->release code structure]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Aug 2024 07:39:37 +0000 (09:39 +0200)]

xfs: skip all of xfs_file_release when shut down

There is no point in trying to free post-EOF blocks when the file system
is shutdown, as it will just error out ASAP. Instead return instantly
when xfs_file_release is called on a shut down file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Aug 2024 07:39:36 +0000 (09:39 +0200)]

xfs: don't bother returning errors from xfs_file_release

While ->release returns int, the only caller ignores the return value.
As we're only doing cleanup work there isn't much of a point in
return a value to start with, so just document the situation instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Aug 2024 07:39:35 +0000 (09:39 +0200)]

xfs: refactor f_op->release handling

Currently f_op->release is split in not very obvious ways. Fix that by
folding xfs_release into xfs_file_release.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Aug 2024 07:39:34 +0000 (09:39 +0200)]

xfs: remove the i_mode check in xfs_release

xfs_release is only called from xfs_file_release, which is wired up as
the f_op->release handler for regular files only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

commit | commitdiff | tree

Chandan Babu R [Tue, 3 Sep 2024 04:33:07 +0000 (10:03 +0530)]

Merge tag 'btree-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: cleanups for inode rooted btree code [v4.2 8/8]

This series prepares the btree code to support realtime reverse mapping btrees
by refactoring xfs_ifork_realloc to be fed a per-btree ops structure so that it
can handle multiple types of inode-rooted btrees.  It moves on to refactoring
the btree code to use the new realloc routines.

With a bit of luck, this should all go splendidly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'btree-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: standardize the btree maxrecs function parameters
  xfs: replace shouty XFS_BM{BT,DR} macros

commit | commitdiff | tree

Chandan Babu R [Tue, 3 Sep 2024 03:45:32 +0000 (09:15 +0530)]

Merge tag 'xfs-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: various bug fixes for 6.12 [7/8]

Various bug fixes for 6.12.

With a bit of luck, this should all go splendidly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'xfs-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
  xfs: fix FITRIM reporting again
  xfs: fix C++ compilation errors in xfs_fs.h

commit | commitdiff | tree

Chandan Babu R [Tue, 3 Sep 2024 03:44:59 +0000 (09:14 +0530)]

Merge tag 'quota-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: cleanups for quota mount [v4.2 6/8]

Refactor the quota file loading code in preparation for adding metadata
directory trees. Did you know that quotarm works even when quota isn't active?

With a bit of luck, this should all go splendidly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'quota-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: refactor loading quota inodes in the regular case

commit | commitdiff | tree

Chandan Babu R [Tue, 3 Sep 2024 03:44:28 +0000 (09:14 +0530)]

Merge tag 'rtalloc-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: cleanups for the realtime allocator [v4.2 5/8]

This third series cleans up the realtime allocator code so that it'll be
somewhat less difficult to figure out what on earth it's doing.  We also
rearrange the fsmap code a bit.

With a bit of luck, this should all go splendidly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'rtalloc-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c
  xfs: rearrange xfs_fsmap.c a little bit
  xfs: replace m_rsumsize with m_rsumblocks
  xfs: remove xfs_{rtbitmap,rtsummary}_wordcount
  xfs: add xchk_setup_nothing and xchk_nothing helpers
  xfs: make the rtalloc start hint a xfs_rtblock_t
  xfs: factor out a xfs_rtallocate_align helper
  xfs: rework the rtalloc fallback handling
  xfs: factor out a xfs_rtallocate helper
  xfs: clean up the ISVALID macro in xfs_bmap_adjacent

commit | commitdiff | tree

Chandan Babu R [Tue, 3 Sep 2024 03:43:51 +0000 (09:13 +0530)]

Merge tag 'rtalloc-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: fixes for the realtime allocator [v4.2 4/8]

While I was reviewing how to integrate realtime allocation groups with
the rt allocator, I noticed several bugs in the existing allocation code
with regards to calculating the maximum range of rtx to scan for free
space.  This series fixes those range bugs and cleans up a few things
too.

I also added a few cleanups from Christoph.

With a bit of luck, this should all go splendidly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'rtalloc-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: simplify xfs_rtalloc_query_range
  xfs: remove xfs_rtb_to_rtxrem
  xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block
  xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near
  xfs: clean up xfs_rtallocate_extent_exact a bit
  xfs: refactor aligning bestlen to prod
  xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block
  xfs: don't return too-short extents from xfs_rtallocate_extent_block
  xfs: ensure rtx mask/shift are correct after growfs
  xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock

commit | commitdiff | tree

Chandan Babu R [Tue, 3 Sep 2024 03:43:18 +0000 (09:13 +0530)]

Merge tag 'rtbitmap-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: clean up the rtbitmap code [v4.2 3/8]

Here are some cleanups and reorganization of the realtime bitmap code to share
more of that code between userspace and the kernel.

With a bit of luck, this should all go splendidly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'rtbitmap-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock
  xfs: factor out rtbitmap/summary initialization helpers
  xfs: factor out a xfs_last_rt_bmblock helper
  xfs: factor out a xfs_growfs_rt_bmblock helper
  xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc
  xfs: cleanup the calling convention for xfs_rtpick_extent
  xfs: add bounds checking to xfs_rt{bitmap,summary}_read_buf
  xfs: assert a valid limit in xfs_rtfind_forw
  xfs: remove the limit argument to xfs_rtfind_back
  xfs: make the RT rsum_cache mandatory
  xfs: factor out a xfs_validate_rt_geometry helper
  xfs: remove xfs_validate_rtextents

commit | commitdiff | tree

Chandan Babu R [Tue, 3 Sep 2024 03:42:47 +0000 (09:12 +0530)]

Merge tag 'metadir-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: cleanups before adding metadata directories [v4.2 2/8]

Before we start adding code for metadata directory trees, let's clean up
some warts in the realtime bitmap code and the inode allocator code.

With a bit of luck, this should all go splendidly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'metadir-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: pass the icreate args object to xfs_dialloc
  xfs: match on the global RT inode numbers in xfs_is_metadata_inode
  xfs: validate inumber in xfs_iget

commit | commitdiff | tree

Chandan Babu R [Tue, 3 Sep 2024 03:41:43 +0000 (09:11 +0530)]

Merge tag 'atomic-file-commits-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: atomic file content commits [v31.1 1/8]

This series creates XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE ioctls
to perform the exchange only if the target file has not been changed
since a given sampling point.

This new functionality uses the mechanism underlying EXCHANGE_RANGE to
stage and commit file updates such that reader programs will see either
the old contents or the new contents in their entirety, with no chance
of torn writes.  A successful call completion guarantees that the new
contents will be seen even if the system fails.  The pair of ioctls
allows userspace to perform what amounts to a compare and exchange
operation on entire file contents.

Note that there are ongoing arguments in the community about how best to
implement some sort of file data write counter that nfsd could also use
to signal invalidations to clients.  Until such a thing is implemented,
this patch will rely on ctime/mtime updates.

Here are the proposed manual pages:

IOCTL-XFS-COMMIT-RANGE(2) System Calls ManualIOCTL-XFS-COMMIT-RANGE(2)

NAME
       ioctl_xfs_start_commit  -  prepare  to exchange the contents of
       two files ioctl_xfs_commit_range - conditionally  exchange  the
       contents of parts of two files

SYNOPSIS
       #include <sys/ioctl.h>
       #include <xfs/xfs_fs.h>

       int  ioctl(int  file2_fd, XFS_IOC_START_COMMIT, struct xfs_com‐
       mit_range *arg);

       int ioctl(int file2_fd, XFS_IOC_COMMIT_RANGE,  struct  xfs_com‐
       mit_range *arg);

DESCRIPTION
       Given  a  range  of bytes in a first file file1_fd and a second
       range of bytes in a second file  file2_fd,  this  ioctl(2)  ex‐
       changes  the contents of the two ranges if file2_fd passes cer‐
       tain freshness criteria.

       Before exchanging the  contents,  the  program  must  call  the
       XFS_IOC_START_COMMIT   ioctl   to  sample  freshness  data  for
       file2_fd.  If the sampled metadata  does  not  match  the  file
       metadata  at  commit  time,  XFS_IOC_COMMIT_RANGE  will  return
       EBUSY.

       Exchanges are atomic with regards  to  concurrent  file  opera‐
       tions.   Implementations must guarantee that readers see either
       the old contents or the new contents in their entirety, even if
       the system fails.

       The  system  call  parameters are conveyed in structures of the
       following form:

           struct xfs_commit_range {
               __s32    file1_fd;
               __u32    pad;
               __u64    file1_offset;
               __u64    file2_offset;
               __u64    length;
               __u64    flags;
               __u64    file2_freshness[5];
           };

       The field pad must be zero.

       The fields file1_fd, file1_offset, and length define the  first
       range of bytes to be exchanged.

       The fields file2_fd, file2_offset, and length define the second
       range of bytes to be exchanged.

       The field file2_freshness is an opaque field whose contents are
       determined  by  the  kernel.  These file attributes are used to
       confirm that file2_fd has not changed by another  thread  since
       the current thread began staging its own update.

       Both  files must be from the same filesystem mount.  If the two
       file descriptors represent the same file, the byte ranges  must
       not  overlap.   Most  disk-based  filesystems  require that the
       starts of both ranges must be aligned to the file  block  size.
       If  this  is  the  case, the ends of the ranges must also be so
       aligned unless the XFS_EXCHANGE_RANGE_TO_EOF flag is set.

       The field flags control the behavior of the exchange operation.

           XFS_EXCHANGE_RANGE_TO_EOF
                  Ignore the length parameter.  All bytes in  file1_fd
                  from  file1_offset to EOF are moved to file2_fd, and
                  file2's size is set to  (file2_offset+(file1_length-
                  file1_offset)).   Meanwhile, all bytes in file2 from
                  file2_offset to EOF are moved to file1  and  file1's
                  size    is   set   to   (file1_offset+(file2_length-
                  file2_offset)).

           XFS_EXCHANGE_RANGE_DSYNC
                  Ensure that all modified in-core data in  both  file
                  ranges  and  all  metadata updates pertaining to the
                  exchange operation are flushed to persistent storage
                  before  the  call  returns.  Opening either file de‐
                  scriptor with O_SYNC or O_DSYNC will have  the  same
                  effect.

           XFS_EXCHANGE_RANGE_FILE1_WRITTEN
                  Only  exchange sub-ranges of file1_fd that are known
                  to contain data  written  by  application  software.
                  Each  sub-range  may  be  expanded (both upwards and
                  downwards) to align with the file  allocation  unit.
                  For files on the data device, this is one filesystem
                  block.  For files on the realtime  device,  this  is
                  the realtime extent size.  This facility can be used
                  to implement fast atomic  scatter-gather  writes  of
                  any  complexity for software-defined storage targets
                  if all writes are aligned  to  the  file  allocation
                  unit.

           XFS_EXCHANGE_RANGE_DRY_RUN
                  Check  the parameters and the feasibility of the op‐
                  eration, but do not change anything.

RETURN VALUE
       On error, -1 is returned, and errno is set to indicate the  er‐
       ror.

ERRORS
       Error  codes can be one of, but are not limited to, the follow‐
       ing:

       EBADF  file1_fd is not open for reading and writing or is  open
              for  append-only  writes;  or  file2_fd  is not open for
              reading and writing or is open for append-only writes.

       EBUSY  The file2 inode number and timestamps  supplied  do  not
              match file2_fd.

       EINVAL The  parameters  are  not correct for these files.  This
              error can also appear if either file  descriptor  repre‐
              sents  a device, FIFO, or socket.  Disk filesystems gen‐
              erally require the offset and  length  arguments  to  be
              aligned to the fundamental block sizes of both files.

       EIO    An I/O error occurred.

       EISDIR One of the files is a directory.

       ENOMEM The  kernel  was unable to allocate sufficient memory to
              perform the operation.

       ENOSPC There is not enough free space  in  the  filesystem  ex‐
              change the contents safely.

       EOPNOTSUPP
              The filesystem does not support exchanging bytes between
              the two files.

       EPERM  file1_fd or file2_fd are immutable.

       ETXTBSY
              One of the files is a swap file.

       EUCLEAN
              The filesystem is corrupt.

       EXDEV  file1_fd and  file2_fd  are  not  on  the  same  mounted
              filesystem.

CONFORMING TO
       This API is XFS-specific.

USE CASES
       Several use cases are imagined for this system call.  Coordina‐
       tion between multiple threads is performed by the kernel.

       The first is a filesystem defragmenter, which copies  the  con‐
       tents  of  a  file into another file and wishes to exchange the
       space mappings of the two files,  provided  that  the  original
       file has not changed.

       An example program might look like this:

           int fd = open("/some/file", O_RDWR);
           int temp_fd = open("/some", O_TMPFILE | O_RDWR);
           struct stat sb;
           struct xfs_commit_range args = {
               .flags = XFS_EXCHANGE_RANGE_TO_EOF,
           };

           /* gather file2's freshness information */
           ioctl(fd, XFS_IOC_START_COMMIT, &args);
           fstat(fd, &sb);

           /* make a fresh copy of the file with terrible alignment to avoid reflink */
           clone_file_range(fd, NULL, temp_fd, NULL, 1, 0);
           clone_file_range(fd, NULL, temp_fd, NULL, sb.st_size - 1, 0);

           /* commit the entire update */
           args.file1_fd = temp_fd;
           ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
           if (ret && errno == EBUSY)
               printf("file changed while defrag was underway
");

       The  second is a data storage program that wants to commit non-
       contiguous updates to a file atomically.  This  program  cannot
       coordinate updates to the file and therefore relies on the ker‐
       nel to reject the COMMIT_RANGE command if the file has been up‐
       dated  by  someone else.  This can be done by creating a tempo‐
       rary file, calling FICLONE(2) to share the contents, and  stag‐
       ing  the  updates into the temporary file.  The FULL_FILES flag
       is recommended for this purpose.  The  temporary  file  can  be
       deleted or punched out afterwards.

       An example program might look like this:

           int fd = open("/some/file", O_RDWR);
           int temp_fd = open("/some", O_TMPFILE | O_RDWR);
           struct xfs_commit_range args = {
               .flags = XFS_EXCHANGE_RANGE_TO_EOF,
           };

           /* gather file2's freshness information */
           ioctl(fd, XFS_IOC_START_COMMIT, &args);

           ioctl(temp_fd, FICLONE, fd);

           /* append 1MB of records */
           lseek(temp_fd, 0, SEEK_END);
           write(temp_fd, data1, 1000000);

           /* update record index */
           pwrite(temp_fd, data1, 600, 98765);
           pwrite(temp_fd, data2, 320, 54321);
           pwrite(temp_fd, data2, 15, 0);

           /* commit the entire update */
           args.file1_fd = temp_fd;
           ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
           if (ret && errno == EBUSY)
               printf("file changed before commit; will roll back
");

NOTES
       Some  filesystems may limit the amount of data or the number of
       extents that can be exchanged in a single call.

SEE ALSO
       ioctl(2)

XFS                           2024-02-18     IOCTL-XFS-COMMIT-RANGE(2)

With a bit of luck, this should all go splendidly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'atomic-file-commits-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: introduce new file range commit ioctls

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:37:21 +0000 (15:37 -0700)]

xfs: standardize the btree maxrecs function parameters

Standardize the parameters in xfs_{alloc,bm,ino,rmap,refcount}bt_maxrecs
so that we have consistent calling conventions. This doesn't affect the
kernel that much, but enables us to clean up userspace a bit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:37:20 +0000 (15:37 -0700)]

xfs: fix a sloppy memory handling bug in xfs_iroot_realloc

While refactoring code, I noticed that when xfs_iroot_realloc tries to
shrink a bmbt root block, it allocates a smaller new block and then
copies "records" and pointers to the new block.  However, bmbt root
blocks cannot ever be leaves, which means that it's not technically
correct to copy records.  We /should/ be copying keys.

Note that this has never resulted in actual memory corruption because
sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)).  However,
this will no longer be true when we start adding realtime rmap stuff,
so fix this now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:37:17 +0000 (15:37 -0700)]

xfs: refactor loading quota inodes in the regular case

Create a helper function to load quota inodes in the case where the
dqtype and the sb quota inode fields correspond. This is true for
nearly all the iget callsites in the quota code, except for when we're
switching the group and project quota inodes. We'll need this in
subsequent patches to make the metadir handling less convoluted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:37:20 +0000 (15:37 -0700)]

xfs: replace shouty XFS_BM{BT,DR} macros

Replace all the shouty bmap btree and bmap disk root macros with actual
functions.

sed \
-e 's/XFS_BMBT_BLOCK_LEN/xfs_bmbt_block_len/g' \
-e 's/XFS_BMBT_REC_ADDR/xfs_bmbt_rec_addr/g' \
-e 's/XFS_BMBT_KEY_ADDR/xfs_bmbt_key_addr/g' \
-e 's/XFS_BMBT_PTR_ADDR/xfs_bmbt_ptr_addr/g' \
-e 's/XFS_BMDR_REC_ADDR/xfs_bmdr_rec_addr/g' \
-e 's/XFS_BMDR_KEY_ADDR/xfs_bmdr_key_addr/g' \
-e 's/XFS_BMDR_PTR_ADDR/xfs_bmdr_ptr_addr/g' \
-e 's/XFS_BMAP_BROOT_PTR_ADDR/xfs_bmap_broot_ptr_addr/g' \
-e 's/XFS_BMAP_BROOT_SPACE_CALC/xfs_bmap_broot_space_calc/g' \
-e 's/XFS_BMAP_BROOT_SPACE/xfs_bmap_broot_space/g' \
-e 's/XFS_BMDR_SPACE_CALC/xfs_bmdr_space_calc/g' \
-e 's/XFS_BMAP_BMDR_SPACE/xfs_bmap_bmdr_space/g' \
-i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch])

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:37:19 +0000 (15:37 -0700)]

xfs: fix FITRIM reporting again

Don't report FITRIMming more bytes than possibly exist in the
filesystem.

Fixes: 410e8a18f8e93 ("xfs: don't bother reporting blocks trimmed via FITRIM")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:37:18 +0000 (15:37 -0700)]

xfs: fix C++ compilation errors in xfs_fs.h

Several people reported C++ compilation errors due to things that C
compilers allow but C++ compilers do not. Fix both of these problems,
and hope there aren't more of these brown paper bags in 2 months when we
finally get these fixes through the process into a released xfsprogs.

NOTE: I am submitting this bugfix over the objections of a former
maintainer, who insists that we should remove this function from the
published userspace ABI instead of fixing the C++ compilation errors.
No deprecation period, no discussion, just a hard drop of an already
provided and correct C function, which would be in contravention of
Linus' rules. IOWs, removing ABI that have already shipped in a
released kernel requires a careful deprecation period, so I will let
that maintainer run that process.

Reported-by: kernel@mattwhitlock.name
Reported-by: sam@gentoo.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219203
Fixes: 233f4e12bbb2c ("xfs: add parent pointer ioctls")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:37:16 +0000 (15:37 -0700)]

xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c

Move this function out of xfs_ioctl.c to reduce the clutter in there,
and make the entire getfsmap implementation self-contained in a single
file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 30 Aug 2024 22:37:08 +0000 (15:37 -0700)]

xfs: simplify xfs_rtalloc_query_range

There isn't much of a good reason to pass the xfs_rtalloc_rec structures
that describe extents to xfs_rtalloc_query_range as we really just want
a lower and upper bound xfs_rtxnum_t. Pass the rtxnum directly and
simply the interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

commit | commitdiff | tree

Christoph Hellwig [Fri, 30 Aug 2024 22:36:59 +0000 (15:36 -0700)]

xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock

To prepare for being able to join an already locked rtbitmap inode to a
transaction split out separate helpers for joining the transaction from
the locking helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:36:49 +0000 (15:36 -0700)]

xfs: pass the icreate args object to xfs_dialloc

Pass the xfs_icreate_args object to xfs_dialloc since we can extract the
relevant mode (really just the file type) and parent inumber from there.
This simplifies the calling convention in preparation for the next
patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:36:47 +0000 (15:36 -0700)]

xfs: introduce new file range commit ioctls

This patch introduces two more new ioctls to manage atomic updates to
file contents -- XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE.  The
commit mechanism here is exactly the same as what XFS_IOC_EXCHANGE_RANGE
does, but with the additional requirement that file2 cannot have changed
since some sampling point.  The start-commit ioctl performs the sampling
of file attributes.

Note: This patch currently samples i_ctime during START_COMMIT and
checks that it hasn't changed during COMMIT_RANGE.  This isn't entirely
safe in kernels prior to 6.12 because ctime only had coarse grained
granularity and very fast updates could collide with a COMMIT_RANGE.
With the multi-granularity ctime introduced by Jeff Layton, it's now
possible to update ctime such that this does not happen.

It is critical, then, that this patch must not be backported to any
kernel that does not support fine-grained file change timestamps.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Darrick J. Wong [Fri, 30 Aug 2024 22:37:15 +0000 (15:37 -0700)]

xfs: rearrange xfs_fsmap.c a little bit

The order of the functions in this file has gotten a little confusing
over the years. Specifically, the two data device implementations
(bnobt and rmapbt) could be adjacent in the source code instead of split
in two by the logdev and rtdev fsmap implementations. We're about to
add more functionality to this file, so rearrange things now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 30 Aug 2024 22:37:07 +0000 (15:37 -0700)]

xfs: remove xfs_rtb_to_rtxrem

Simplify the number of block number conversion helpers by removing
xfs_rtb_to_rtxrem. Any recent compiler is smart enough to eliminate
the double divisions if using separate xfs_rtb_to_rtx and
xfs_rtb_to_rtxoff calls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom