www.infradead.org Git - users/hch/xfsprogs.git/log
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:32 +0000 (14:21 -0700)]
libxfs: pass IGET flags through to xfs_iread

Change the lock_flags parameter to iget_flags so that we can supply
XFS_IGET_ flags in future patches.  All callers of libxfs_iget and
libxfs_trans_iget pass zero for this parameter and there are no inode
locks in xfsprogs, so there's no behavior change here.

Port the kernel's version of the xfs_inode_from_disk callsite.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:32 +0000 (14:21 -0700)]
libxfs: put all the inode functions in a single file

Move all the inode functions into a single source code file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:32 +0000 (14:21 -0700)]
xfs: hoist project id get/set functions to libxfs

Move the project id get and set functions into libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:31 +0000 (14:21 -0700)]
xfs: hoist inode flag conversion functions to libxfs

Hoist the inode flag conversion functions into libxfs so that we can
keep them in sync.  Do this by creating a new xfs_inode_util.c file in
libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:31 +0000 (14:21 -0700)]
xfs: hoist extent size helpers to libxfs

Move the extent size helpers to xfs_bmap.c in libxfs since they're used
there already.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Wenchao Hao [Tue, 16 Jul 2024 21:54:14 +0000 (14:54 -0700)]
xfs: Remove header files which are included more than once

Source kernel commit: a330cae8a7147890262b06e1aa13db048e3b130f

The following warnings are reported, so remove the duplicated header
includes:

./fs/xfs/libxfs/xfs_trans_resv.c: xfs_da_format.h is included more than once.
./fs/xfs/scrub/quota_repair.c: xfs_format.h is included more than once.
./fs/xfs/xfs_handle.c: xfs_da_btree.h is included more than once.
./fs/xfs/xfs_qm_bhv.c: xfs_mount.h is included more than once.
./fs/xfs/xfs_trace.c: xfs_bmap.h is included more than once.

This is just a code cleanup; no logic is changed.

Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
11 months ago
lei lu [Fri, 14 Jun 2024 02:22:53 +0000 (10:22 +0800)]
From 0c7fcdb6d06cdf8b19b57c17605215b06afa864a Mon Sep 17 00:00:00 2001
Subject: xfs: don't walk off the end of a directory data block

This adds sanity checks for xfs_dir2_data_unused and xfs_dir2_data_entry
to make sure we don't stray beyond the valid memory region. Before patching,
the loop simply checks that the start offset of the dup and dep is within the
range. So in a crafted image, if the last entry is xfs_dir2_data_unused, we
can change dup->length to dup->length-1 and leave 1 byte of space. In the
next traversal, this space will be considered as dup or dep. We may
encounter an out-of-bounds read when accessing the fixed members.

In the patch, we make sure that the remaining bytes are large enough to hold
an unused entry before accessing xfs_dir2_data_unused, and that
xfs_dir2_data_unused is XFS_DIR2_DATA_ALIGN byte aligned. We also make
sure that the remaining bytes are large enough to hold a dirent with a
single-byte name before accessing xfs_dir2_data_entry.

Signed-off-by: lei lu <llfamsec@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
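
The bounds checks described in the commit above can be illustrated with a
minimal sketch.  The record layout, the DEMO_* constants, and demo_walk()
are hypothetical simplifications made up for this example; they are not the
real XFS on-disk structures or the code from the patch.  The point is only
that each record type is checked against the remaining bytes (and the
alignment) before any of its fixed members are read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record layout; NOT the real XFS on-disk format. */
#define DEMO_ALIGN      8u       /* stand-in for XFS_DIR2_DATA_ALIGN */
#define DEMO_FREE_TAG   0xffffu  /* marks an unused record */
#define DEMO_UNUSED_HDR 4u       /* free tag (2 bytes) + length (2 bytes) */
#define DEMO_ENTRY_HDR  9u       /* inode number (8 bytes) + name length (1 byte) */
#define DEMO_ENTRY_TAG  2u       /* trailing tag */

/* Read a big-endian 16-bit value. */
static uint16_t demo_get16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Size of a dirent with the given name length, rounded up to DEMO_ALIGN. */
static size_t demo_entsize(uint8_t namelen)
{
	size_t raw = DEMO_ENTRY_HDR + namelen + DEMO_ENTRY_TAG;

	return (raw + DEMO_ALIGN - 1) & ~(size_t)(DEMO_ALIGN - 1);
}

/* Walk all records; return the record count, or -1 if the block is corrupt. */
int demo_walk(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		size_t remaining = len - off;
		size_t reclen;

		/* Must be able to read the free tag and length safely. */
		if (remaining < DEMO_UNUSED_HDR)
			return -1;

		if (demo_get16(buf + off) == DEMO_FREE_TAG) {
			reclen = demo_get16(buf + off + 2);
			/* Unused records must be non-trivial and aligned. */
			if (reclen < DEMO_UNUSED_HDR || reclen % DEMO_ALIGN != 0)
				return -1;
		} else {
			/* Need room for a dirent with a single-byte name. */
			if (remaining < demo_entsize(1))
				return -1;
			if (buf[off + 8] == 0)
				return -1;
			reclen = demo_entsize(buf[off + 8]);
		}

		/* Never step past the end of the block. */
		if (reclen > remaining)
			return -1;
		off += reclen;
		count++;
	}
	return count;
}
```

In this sketch, shrinking an unused record's length by one byte (the crafted
case described above) fails the alignment check instead of leaving a stray
byte to be misread on the next iteration.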
11 months ago
Gao Xiang [Tue, 28 May 2024 04:12:39 +0000 (12:12 +0800)]
From d40c2865bdbbbba6418436b0a877daebe1d7c63e Mon Sep 17 00:00:00 2001
Subject: xfs: avoid redundant AGFL buffer invalidation

Currently AGFL blocks can be filled from the following three sources:
 - allocbt free blocks, as in xfs_allocbt_free_block();
 - rmapbt free blocks, as in xfs_rmapbt_free_block();
 - refilled from freespace btrees, as in xfs_alloc_fix_freelist().

Originally, allocbt free blocks would be marked as stale only when they
were put back in the general free space pool; as Dave mentioned on IRC, "we
don't stale AGF metadata btree blocks when they are returned to the
AGFL .. but once they get put back in the general free space pool, we
have to make sure the buffers are marked stale as the next user of
those blocks might be user data...."

However, after commit ca250b1b3d71 ("xfs: invalidate allocbt blocks
moved to the free list") and commit edfd9dd54921 ("xfs: move buffer
invalidation to xfs_btree_free_block"), even allocbt / bmapbt free
blocks will be invalidated immediately since they may fail to pass
V5 format validation on writeback, even though writeback to free space
would be safe.

IOWs, IMHO there is currently no difference between free blocks in the
AGFL freespace pool and those in the general free space pool.  So let's
avoid the redundant AGFL buffer invalidation, since otherwise we face an
unnecessary xfs_log_force() when xfs_trans_binval() is called again on
buffers that were already marked stale, as below:

[  333.507469] Call Trace:
[  333.507862]  xfs_buf_find+0x371/0x6a0       <- xfs_buf_lock
[  333.508451]  xfs_buf_get_map+0x3f/0x230
[  333.509062]  xfs_trans_get_buf_map+0x11a/0x280
[  333.509751]  xfs_free_agfl_block+0xa1/0xd0
[  333.510403]  xfs_agfl_free_finish_item+0x16e/0x1d0
[  333.511157]  xfs_defer_finish_noroll+0x1ef/0x5c0
[  333.511871]  xfs_defer_finish+0xc/0xa0
[  333.512471]  xfs_itruncate_extents_flags+0x18a/0x5e0
[  333.513253]  xfs_inactive_truncate+0xb8/0x130
[  333.513930]  xfs_inactive+0x223/0x270

xfs_log_force() will take tens of milliseconds with the AGF buffer locked.
This becomes an unnecessarily long latency, especially on our PMEM devices
with FSDAX enabled, where fsops like xfs_reflink_find_shared() running at
the same time are stuck on the same AGF lock.  Removing the double
invalidation on the AGFL blocks does not make this issue go away, but
this patch fixes it for our workloads in practice, and code analysis
suggests it should work as well.

Note that I'm not sure whether I need to remove another redundant one in
xfs_alloc_ag_vextent_small(), since it's unrelated to our workloads.
Also, fstests pass with this patch.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
11 months ago
Darrick J. Wong [Thu, 11 Jul 2024 22:59:42 +0000 (15:59 -0700)]
debian: create a new package for automatic self-healing

Create a new package for people who explicitly want self-healing turned
on by default for XFS.  This package is named xfsprogs-self-healing.

Note: This introduces a new "install-selfheal" target to install only
the files needed for enabling online fsck by default.  Other
distributions should take note of the new target if they choose to
create a package for enabling autonomous self healing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Mon, 29 Jul 2024 18:02:51 +0000 (11:02 -0700)]
xfs_scrub: use the self_healing fsproperty to select mode

Now that we can set properties on xfs filesystems, make the xfs_scrub
background service query the self_healing property to figure out which
mode (dry run, optimize, repair, none) it should use.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:25:59 +0000 (14:25 -0700)]
misc: shift install targets

Modify each Makefile so that "install-pkg" installs the main package
contents, and "install" just invokes "install-pkg".  We'll need this
indirection for the next patch where we add an install-selfheal target
to build the xfsprogs-self-healing package but will still want 'make
install' to install everything on a developer's workstation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Fri, 26 Jul 2024 18:31:47 +0000 (11:31 -0700)]
mkfs: set self_healing property

Add a new mkfs option so that sysadmins can control the background
scrubbing behavior of filesystems from the start.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Fri, 26 Jul 2024 05:47:58 +0000 (22:47 -0700)]
xfs_scrub: allow sysadmin to control background scrubs

Define a "self_healing" filesystem property so that sysadmins can
indicate their preferences for background online fsck.  Add an extended
option to xfs_scrub so that it selects the operation mode from the self
healing fs property.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Fri, 26 Jul 2024 20:32:43 +0000 (13:32 -0700)]
libfrog: define a self_healing filesystem property

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Fri, 26 Jul 2024 22:09:28 +0000 (15:09 -0700)]
xfs_property: add a new tool to administer fs properties

Create a tool to list, get, set, and remove filesystem properties.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Fri, 26 Jul 2024 21:32:56 +0000 (14:32 -0700)]
xfs_db: add a command to list xattrs

Add a command to list extended attributes from xfs_db.  We'll need this
later to manage the fs properties when unmounted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Fri, 26 Jul 2024 21:15:49 +0000 (14:15 -0700)]
libxfs: pass a transaction context through listxattr

Pass a transaction context so that a new caller can walk the attr names
and query the values all in one go without deadlocking on nested buffer
access.

While we're at it, make the existing xfs_repair callers try to use
empty transactions so that we don't deadlock on cycles in the xattr
structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Fri, 26 Jul 2024 21:08:19 +0000 (14:08 -0700)]
libxfs: hoist listxattr from xfs_repair

Hoist the listxattr code from xfs_repair so that we can use it in
xfs_db.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Sat, 27 Jul 2024 00:38:11 +0000 (17:38 -0700)]
xfs_db: improve getting and setting extended attributes

Add an attr_get command to retrieve the value of an xattr from a file;
and extend the attr_set command to allow passing of string values.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Thu, 25 Jul 2024 18:46:48 +0000 (11:46 -0700)]
xfs_spaceman: edit filesystem properties

Add some new subcommands to xfs_spaceman so that we can examine
filesystem properties.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Fri, 26 Jul 2024 17:37:30 +0000 (10:37 -0700)]
libfrog: support editing filesystem property sets

Add some library functions so that spaceman and scrub can share the same
code to edit and retrieve filesystem properties.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:31 +0000 (14:21 -0700)]
xfs_repair: allow symlinks with short remote targets

Symbolic links can have extended attributes.  If the attr fork consumes
enough space in the inode record, a shortform symlink can become a
remote symlink.  However, if we delete those extended attributes, the
target is not moved back into the inode core.

IOWs, we can end up with a symlink inode that looks like this:

core.magic = 0x494e
core.mode = 0120777
core.version = 3
core.format = 2 (extents)
core.nlinkv2 = 1
core.nextents = 1
core.size = 297
core.nblocks = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,12,1,0]

This is a symbolic link with a 297-byte target stored in a disk block,
which is to say this is a symlink with a remote target.  The forkoff is
0, which is to say that there's 512 - 176 == 336 bytes in the inode core
to store the data fork.

Prior to kernel commit 1eb70f54c445f, the kernel was ok with this
arrangement, but the change to symlink validation in that patch now
produces corruption errors on filesystems written by older kernels that
are not otherwise inconsistent.  Those changes were inspired by reports
of illegal memory accesses, which I think were a result of making data
fork access decisions based on symlink di_size and not on di_format.

Unfortunately, for a very long time xfs_repair has flagged these inodes
as being corrupt, even though the kernel has historically been willing
to read and write symlinks with these properties.  Resolve the conflict
by adjusting the xfs_repair corruption tests to allow extents format.
This change matches the kernel patch "xfs: allow symlinks with short
remote targets".

While we're at it, fix a lurking bad symlink fork access.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
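
The space arithmetic in the commit above can be restated as a small sketch.
The helper names are hypothetical and exist only for this example; the
512-byte inode size, the 176-byte core size, and the 297-byte target come
from the inode dump in the message.  The sketch assumes that forkoff is
expressed in 8-byte units and that a forkoff of zero means the data fork
may use the whole area after the core.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Bytes available to the data fork (hypothetical helper).
 * Assumption: forkoff counts 8-byte units, and zero means there is no
 * attr fork, so the data fork gets everything after the inode core.
 */
size_t demo_data_fork_space(size_t inode_size, size_t core_size,
			    unsigned int forkoff)
{
	if (forkoff)
		return (size_t)forkoff << 3;
	return inode_size - core_size;
}

/* Would a symlink target of this length fit in the inode itself? */
bool demo_target_fits_inline(size_t target_len, size_t fork_space)
{
	return target_len <= fork_space;
}
```

With the values from the dump, demo_data_fork_space(512, 176, 0) is 336
bytes, so the 297-byte target would fit inline even though it is stored
remotely.  That is the "short remote target" case that xfs_repair now
accepts.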
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:31 +0000 (14:21 -0700)]
xfs_scrub: try spot repairs of metadata items to make scrub progress

Now that we've enabled scrub dependency barriers, it's possible that a
scrub_item_check call will return with some of the scrub items still in
NEEDSCHECK state.  If, for example, scrub type B depends on scrub type
A being clean and A is not clean, B will still be in NEEDSCHECK state.

In order to make as much scanning progress as possible during phase 2
and phase 3, allow ourselves to try some spot repairs in the hopes that
it will enable us to make progress towards at least scanning the whole
metadata item.  If we can't make any forward progress, we'll queue the
scrub item for repair in phase 4, which means that anything still in
NEEDSCHECK state becomes CORRUPT state.  (At worst, the NEEDSCHECK item
will actually be clean by phase 4, and xfs_scrub will report that it
didn't need any work after all.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:31 +0000 (14:21 -0700)]
xfs_scrub: use scrub barriers to reduce kernel calls

Use scrub barriers so that we can submit a single scrub request for a
bunch of things, and have the kernel stop midway through if it finds
anything broken.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:30 +0000 (14:21 -0700)]
xfs_scrub: vectorize repair calls

Use the new vectorized scrub kernel calls to reduce the overhead of
performing repairs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:30 +0000 (14:21 -0700)]
xfs_scrub: vectorize scrub calls

Use the new vectorized kernel scrub calls to reduce the overhead of
checking metadata.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:30 +0000 (14:21 -0700)]
xfs_scrub: convert scrub and repair epilogues to use xfs_scrub_vec

Convert the scrub and repair epilogue code to pass around xfs_scrub_vecs
as we prepare for vectorized operation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:30 +0000 (14:21 -0700)]
xfs_scrub: split the repair epilogue code into a separate function

Move all the code that updates the internal state in response to a
repair ioctl() call completion into a separate function.  This will help
with vectorizing repair calls later on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:29 +0000 (14:21 -0700)]
xfs_scrub: split the scrub epilogue code into a separate function

Move all the code that updates the internal state in response to a scrub
ioctl() call completion into a separate function.  This will help with
vectorizing scrub calls later on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:29 +0000 (14:21 -0700)]
xfs_io: support vectored scrub

Create a new scrubv command to xfs_io to support the vectored scrub
ioctl.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:29 +0000 (14:21 -0700)]
libfrog: support vectored scrub

Enhance libfrog to support performing vectored metadata scrub.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:29 +0000 (14:21 -0700)]
man: document vectored scrub mode

Add a manpage to document XFS_IOC_SCRUBV_METADATA.  From the kernel
patch:

Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored
mode.  The caller specifies the principal metadata object that they want
to scrub (allocation group, inode, etc.) once, followed by an array of
scrub types they want called on that object.  The kernel runs the scrub
operations and writes the output flags and errno code to the
corresponding array element.

A new pseudo scrub type BARRIER is introduced to force the kernel to
return to userspace if any corruptions have been found when scrubbing
the previous scrub types in the array.  This enables userspace to
schedule, for example, the sequence:

 1. data fork
 2. barrier
 3. directory

If the data fork scrub is clean, then the kernel will perform the
directory scrub.  If not, the barrier in 2 will exit back to userspace.

The alternative would have been an interface where userspace passes a
pointer to an empty buffer, and the kernel formats that with
xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome
was.  With that the kernel would have to communicate that the buffer
needed to have been at least X size, even though for our cases
XFS_SCRUB_TYPE_NR + 2 would always be enough.

Compared to that, this design keeps all the dependency policy and
ordering logic in userspace where it already resides instead of
duplicating it in the kernel. The downside of that is that it needs the
barrier logic.

When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime due to fewer transitions across the
system call boundary.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
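
The barrier behavior described in the commit above can be modeled in
userspace with a short sketch.  Everything here is hypothetical: the
demo_scrub_vec structure, the DEMO_* constants, and the callbacks are
stand-ins invented for illustration, not the real XFS_IOC_SCRUBV_METADATA
interface.  The loop only shows the control flow: run each scrub type in
order, record its output flags, and stop at a barrier if an earlier
element reported corruption.

```c
#include <stddef.h>

/* Hypothetical scrub types and flags, for illustration only. */
enum {
	DEMO_TYPE_BARRIER   = -1,
	DEMO_TYPE_DATA_FORK = 1,
	DEMO_TYPE_DIRECTORY = 2,
};
#define DEMO_FLAG_CORRUPT 0x1u

struct demo_scrub_vec {
	int		type;	/* input: what to scrub */
	unsigned int	flags;	/* output: result flags */
	int		ret;	/* output: errno-style code */
};

/* A scrub callback returns the output flags for one scrub type. */
typedef unsigned int (*demo_scrub_fn)(int type);

/* Example callback: every scrub type is clean. */
unsigned int demo_all_clean(int type)
{
	(void)type;
	return 0;
}

/* Example callback: only the data fork is corrupt. */
unsigned int demo_data_fork_corrupt(int type)
{
	return type == DEMO_TYPE_DATA_FORK ? DEMO_FLAG_CORRUPT : 0;
}

/*
 * Process the vector in order.  Returns the index at which processing
 * stopped, which equals nr if every element was handled.
 */
size_t demo_scrubv(struct demo_scrub_vec *vecs, size_t nr, demo_scrub_fn scrub)
{
	int found_corruption = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (vecs[i].type == DEMO_TYPE_BARRIER) {
			/* Return to the caller if anything earlier was bad. */
			if (found_corruption)
				break;
			continue;
		}
		vecs[i].flags = scrub(vecs[i].type);
		vecs[i].ret = 0;
		if (vecs[i].flags & DEMO_FLAG_CORRUPT)
			found_corruption = 1;
	}
	return i;
}
```

For the sequence data fork, barrier, directory, this model handles all
three elements when the data fork is clean and stops at index 1 (the
barrier) when it is corrupt, leaving the directory element untouched.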
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:29 +0000 (14:21 -0700)]
xfs_scrub: defer phase5 file scans if dirloop fails

If we cannot fix dirloop problems during the initial phase 5 inode scan,
defer them until later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:28 +0000 (14:21 -0700)]
xfs_scrub: detect and repair directory tree corruptions

Now that we have online fsck for directory tree structure problems, we
need to find a place to call it.  The scanner requires that parent
pointers are enabled, that directory link counts are correct, and that
every directory entry has a corresponding parent pointer.  Therefore, we
can only run it after phase 4 fixes every file, and phase 5 resets the
link counts.

In other words, we call it as part of the phase 5 file scan that we do
to warn about weird looking file names.  This has the added benefit that
opening the directory by handle is less likely to fail if there are
loops in the directory structure.  For now, only plumb in enough to try
to fix directory tree problems right away; the next patch will make
phase 5 retry the dirloop scanner until the problems are fixed or we
stop making forward progress.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:28 +0000 (14:21 -0700)]
xfs_scrub: fix erroring out of check_inode_names

The early exit logic in this function is a bit suboptimal -- we don't
need to close the @fd if we haven't even opened it, and since all errors
are fatal, we don't need to bump the progress counter.  The logic in
this function is about to get more involved due to the addition of the
directory tree structure checker, so clean up these warts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:28 +0000 (14:21 -0700)]
xfs_spaceman: report directory tree corruption in the health information

Report directories that are the source of corruption in the directory
tree.  While we're at it, add the documentation updates for the new
reporting flags and scrub type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:28 +0000 (14:21 -0700)]
libfrog: add directory tree structure scrubber to scrub library

Make it so that scrub clients can detect corruptions within the
directory tree structure itself.  Update the documentation for the scrub
ioctl to mention this new functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:27 +0000 (14:21 -0700)]
xfs_repair: wipe ondisk parent pointers when there are none

Erase all the parent pointers when there aren't any found by the
directory entry scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:27 +0000 (14:21 -0700)]
xfs_repair: update ondisk parent pointer records

Update the ondisk parent pointer records as necessary.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:27 +0000 (14:21 -0700)]
xfs_repair: dump garbage parent pointer attributes

Delete xattrs that have ATTR_PARENT set but are so garbage that they
clearly aren't parent pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:27 +0000 (14:21 -0700)]
xfs_repair: check parent pointers

Use the parent pointer index that we constructed in the previous patch
to check that each file's parent pointer records exactly match the
directory entries that we recorded while walking directory entries.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:27 +0000 (14:21 -0700)]
xfs_repair: deduplicate strings stored in string blob

Reduce the memory requirements of the string blob structure by
deduplicating the strings stored within.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:26 +0000 (14:21 -0700)]
xfs_repair: move the global dirent name store to a separate object

Abstract the main parent pointer dirent names xfblob object into a
separate data structure to hide implementation details.

The goals here are (a) reduce memory usage when we can by deduplicating
dirent names that exist in multiple directories; and (b) provide a
unique id for each name in the system so that sorting incore parent
pointer records can be done in a stable manner.  Fast stable sorting of
records is required for the dirent <-> pptr matching algorithm.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
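
The two goals named in the commit above (deduplicating names and giving
each name a unique id) can be sketched with a tiny interning table.  The
demo_name_store type, its fixed capacity, and the linear search are
assumptions made for brevity; this is not the xfblob-backed structure that
xfs_repair uses.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_NAMES   64	/* arbitrary capacity for the sketch */
#define DEMO_MAX_NAMELEN 255

/* Hypothetical name store: each distinct name is kept exactly once. */
struct demo_name_store {
	char	names[DEMO_MAX_NAMES][DEMO_MAX_NAMELEN + 1];
	int	nr;
};

/*
 * Return the id of the name, storing it only if it is not already
 * present.  Returns -1 if the name is too long or the store is full.
 */
int demo_name_store_add(struct demo_name_store *s, const char *name)
{
	size_t len = strlen(name);
	int i;

	if (len > DEMO_MAX_NAMELEN)
		return -1;

	/* A name seen before keeps the id it was first given. */
	for (i = 0; i < s->nr; i++)
		if (strcmp(s->names[i], name) == 0)
			return i;

	if (s->nr == DEMO_MAX_NAMES)
		return -1;

	memcpy(s->names[s->nr], name, len + 1);
	return s->nr++;
}
```

Because the same name always maps to the same id, records that refer to
names by id can be compared and sorted on a small integer key rather than
on the string itself.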
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:26 +0000 (14:21 -0700)]
xfs_repair: build a parent pointer index

When we're walking directories during phase 6, build an index of parent
pointers that we expect to find.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:26 +0000 (14:21 -0700)]
xfs_repair: junk duplicate hashtab entries when processing sf dirents

dir_hash_add() adds the passed-in dirent to the directory hashtab even
if there's already a duplicate.  Therefore, if we detect a duplicate or
a garbage entry while processing a shortform directory's entries, we
need to junk the newly added entry, just like we do when processing
directory data blocks.

This will become particularly relevant in the next patch, where we
generate a master index of parent pointers from the non-junked hashtab
entries of each directory that phase6 scans.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:26 +0000 (14:21 -0700)]
xfs_repair: add parent pointers when messing with /lost+found

Make sure that /lost+found gets created with parent pointers, and
that lost children being put in there get new parent pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:26 +0000 (14:21 -0700)]
xfs_repair: junk parent pointer attributes when filesystem doesn't support them

Drop a parent pointer xattr if the filesystem doesn't support parent
pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:25 +0000 (14:21 -0700)]
xfs_db: actually report errors from libxfs_attr_set

Actually tell the user what went wrong when setting or removing xattrs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:25 +0000 (14:21 -0700)]
xfs_db: remove some boilerplate from xfs_attr_set

In preparation for online/offline repair wanting to use xfs_attr_set,
move some of the boilerplate out of this function into the callers.
Repair can initialize the da_args completely, and the userspace flag
handling/twisting goes away once we move it to xfs_attr_change.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:25 +0000 (14:21 -0700)]
man2: update ioctl_xfs_scrub_metadata.2 for parent pointers

Update the man page for the scrub ioctl to reflect the new scrubbing
abilities when parent pointers are enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:25 +0000 (14:21 -0700)]
xfs: create a blob array data structure

Create a simple 'blob array' data structure for storage of arbitrarily
sized metadata objects that will be used to reconstruct metadata.  For
the intended usage (temporarily storing extended attribute names and
values) we only have to support storing objects and retrieving them.
Use the xfile abstraction to store the attribute information in memory
that can be swapped out.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Allison Henderson [Wed, 3 Jul 2024 21:21:24 +0000 (14:21 -0700)]
mkfs: enable formatting with parent pointers

Enable parent pointer support in mkfs via the '-n parent' parameter.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: move the no-V4 filesystem check to join the rest]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Allison Henderson [Wed, 3 Jul 2024 21:21:24 +0000 (14:21 -0700)]
mkfs: Add parent pointers during protofile creation

Inodes created from protofile parsing will also need to add the
appropriate parent pointers.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: use xfs_parent_add from libxfs instead of open-coding xfs_attr_set]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:24 +0000 (14:21 -0700)]
libxfs: create new files with attr forks if necessary

Create new files with attr forks if they're going to have parent
pointers.  In the next patch we'll fix mkfs to use the same parent
creation functions as the kernel, so we're going to need this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:24 +0000 (14:21 -0700)]
xfs_db: compute hashes of parent pointers

Enhance the hash command to compute the hashes of parent pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:24 +0000 (14:21 -0700)]
xfs_db: add link and unlink expert commands

Create a pair of commands to create and remove directory entries to
support functional testing of directory tree corruption.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months ago
Darrick J. Wong [Wed, 3 Jul 2024 21:21:23 +0000 (14:21 -0700)]
xfs_db: make attr_set and attr_remove handle parent pointers

Make it so that xfs_db can load up the filesystem (somewhat uselessly)
with parent pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_db: add a parents command to list the parents of a file
Darrick J. Wong [Wed, 3 Jul 2024 21:21:23 +0000 (14:21 -0700)]
xfs_db: add a parents command to list the parents of a file

Create a command to dump the parents of a file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agolibxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h
Darrick J. Wong [Wed, 3 Jul 2024 21:21:23 +0000 (14:21 -0700)]
libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h

Do the xfs -> libxfs switcheroo and cleanups separately so the next
patch doesn't become an even larger mess.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_db: obfuscate dirent and parent pointer names consistently
Darrick J. Wong [Wed, 3 Jul 2024 21:21:23 +0000 (14:21 -0700)]
xfs_db: obfuscate dirent and parent pointer names consistently

When someone wants to perform an obfuscated metadump of a filesystem
where parent pointers are enabled, we have to use the *exact* same
obfuscated name for both the directory entry and the parent pointer.

Create a name remapping table so that when we obfuscate a dirent name or
a parent pointer name, we can apply the same obfuscation when we find
the corresponding parent pointer or dirent.
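
As a rough illustration of the remapping idea (not the metadump code
itself), the sketch below keeps one table that both the dirent path and
the parent pointer path consult.  The NameRemap class, the (directory
inode, name) key, and the random same-length replacement are all
assumptions made for the example; the real obfuscator has further
constraints that are not modelled here.

```python
import random
import string


class NameRemap:
    """Hypothetical remapping table shared by dirent and parent pointer code."""

    def __init__(self, seed=None):
        self._table = {}
        self._rng = random.Random(seed)

    def obfuscate(self, dir_ino, name):
        """Return the obfuscated form of `name` within directory `dir_ino`.

        The first caller (dirent or parent pointer) creates the mapping;
        every later caller gets the same answer back.
        """
        key = (dir_ino, name)
        if key not in self._table:
            # Assumption: any same-length lowercase string will do here.
            self._table[key] = "".join(
                self._rng.choice(string.ascii_lowercase) for _ in name
            )
        return self._table[key]
```

Whichever side is encountered first populates the table, so the order in
which dirents and parent pointers are visited does not matter.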

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_db: report parent pointers embedded in xattrs
Darrick J. Wong [Wed, 3 Jul 2024 21:21:22 +0000 (14:21 -0700)]
xfs_db: report parent pointers embedded in xattrs

Decode the parent pointer inode, generation, and name fields if the
parent pointer passes basic validation checks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_db: report parent bit on xattrs
Darrick J. Wong [Wed, 3 Jul 2024 21:21:22 +0000 (14:21 -0700)]
xfs_db: report parent bit on xattrs

Display the parent bit on xattr keys.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_db: report parent pointers in version command
Darrick J. Wong [Wed, 3 Jul 2024 21:21:22 +0000 (14:21 -0700)]
xfs_db: report parent pointers in version command

Report the presence of parent pointers from the version subcommand.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub: use parent pointers to report lost file data
Darrick J. Wong [Wed, 3 Jul 2024 21:21:22 +0000 (14:21 -0700)]
xfs_scrub: use parent pointers to report lost file data

If parent pointers are enabled, compute the path to the file while we're
doing the fsmap scan and report that, instead of walking the entire
directory tree to print the paths of the (hopefully few) files that lost
data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub: use parent pointers when possible to report file operations
Darrick J. Wong [Wed, 3 Jul 2024 21:21:22 +0000 (14:21 -0700)]
xfs_scrub: use parent pointers when possible to report file operations

If parent pointers are available, use them to supply file paths when
doing things to files, instead of merely printing the inode number.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_spaceman: report file paths
Darrick J. Wong [Wed, 3 Jul 2024 21:21:21 +0000 (14:21 -0700)]
xfs_spaceman: report file paths

Teach the health command to report file paths when possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_logprint: decode parent pointers in ATTRI items fully
Allison Henderson [Wed, 3 Jul 2024 21:21:21 +0000 (14:21 -0700)]
xfs_logprint: decode parent pointers in ATTRI items fully

This patch modifies the ATTRI print routines to look for the parent
pointer flag, and decode logged parent pointers fully when dumping log
contents.  Between the existing ATTRI: printouts and the new ones
introduced here, we can figure out what was stored in each log iovec,
as well as the higher level parent pointer that was logged.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: adjust to new ondisk format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_io: Add i, n and f flags to parent command
Allison Henderson [Wed, 3 Jul 2024 21:21:21 +0000 (14:21 -0700)]
xfs_io: Add i, n and f flags to parent command

This patch adds the flags i, n, and f to the parent command. These flags add
filtering options that are used by the new parent pointer tests in xfstests, and
help to improve the test run time.  The flags are:

-i: Only show parent pointer records containing the given inode
-n: Only show parent pointer records containing the given filename
-f: Print records in short format: ino/gen/namelen/name
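
A minimal sketch of what the three flags do to a list of parent pointer
records, assuming each record carries an inode number, a generation, and
a name.  ParentRec, filter_parents and short_format are hypothetical
names for this example and are not the xfs_io implementation.

```python
from typing import NamedTuple


class ParentRec(NamedTuple):
    """Hypothetical in-memory form of one parent pointer record."""
    ino: int
    gen: int
    name: str


def filter_parents(records, ino=None, name=None):
    """Keep only records matching the -i inode and/or -n name filters."""
    return [
        rec for rec in records
        if (ino is None or rec.ino == ino)
        and (name is None or rec.name == name)
    ]


def short_format(rec):
    """Render a record as ino/gen/namelen/name, as described for -f."""
    namelen = len(rec.name.encode("utf-8"))
    return f"{rec.ino}/{rec.gen}/{namelen}/{rec.name}"
```

Filtering before formatting keeps the output small, which is what helps
the test run time.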

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: adapt to new getparents ioctl]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_io: adapt parent command to new parent pointer ioctls
Darrick J. Wong [Wed, 3 Jul 2024 21:21:21 +0000 (14:21 -0700)]
xfs_io: adapt parent command to new parent pointer ioctls

For ages, xfs_io has had a totally useless 'parent' command that enabled
callers to walk the parents or print the directory tree path of an open
file.  This code used the ioctl interface presented by SGI's version of
parent pointers that was never merged.  Rework the code in here to use
the new ioctl interfaces that we've settled upon.  Get rid of the old
parent pointer checking code since xfs_repair/xfs_scrub will take care
of that.

(This originally was in the "xfsprogs: implement the upper half of
parent pointers" megapatch.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agolibfrog: add parent pointer support code
Darrick J. Wong [Wed, 3 Jul 2024 21:21:20 +0000 (14:21 -0700)]
libfrog: add parent pointer support code

Add some support code to libfrog so that client programs can walk file
descriptors and handles upwards through the directory tree; and obtain a
reasonable file path from a file descriptor/handle.  This code will be
used in xfsprogs utilities.
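
The upward walk can be pictured with the sketch below.  It assumes the
parent pointers have already been read into a dictionary mapping an
inode number to one (parent inode, name) pair; the function name and
that data layout are assumptions for illustration and do not reflect the
libfrog interfaces.

```python
def path_from_parents(ino, parents, root_ino):
    """Build a path for `ino` by following parent pointers up to the root.

    `parents` maps inode number -> (parent inode number, name in parent).
    Returns None if the chain is broken or loops back on itself.
    """
    parts = []
    seen = set()
    while ino != root_ino:
        if ino in seen or ino not in parents:
            return None
        seen.add(ino)
        parent_ino, name = parents[ino]
        parts.append(name)
        ino = parent_ino
    # Names were collected leaf-first, so reverse them for printing.
    return "/" + "/".join(reversed(parts))
```

A file with several hard links has several parents; this sketch follows
only one of them, which is enough to print a reasonable path.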

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agolibfrog: report parent pointers to userspace
Darrick J. Wong [Wed, 3 Jul 2024 21:21:20 +0000 (14:21 -0700)]
libfrog: report parent pointers to userspace

Report the presence of parent pointers to userspace.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoman: document the XFS_IOC_GETPARENTS ioctl
Darrick J. Wong [Wed, 3 Jul 2024 21:21:20 +0000 (14:21 -0700)]
man: document the XFS_IOC_GETPARENTS ioctl

Document how this new ioctl works.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_logprint: dump new attr log item fields
Darrick J. Wong [Wed, 3 Jul 2024 21:21:20 +0000 (14:21 -0700)]
xfs_logprint: dump new attr log item fields

Dump the new extended attribute log item fields.  This was split out
from the previous patch to make libxfs resyncing easier.  This code
needs more cleaning, which we'll do in the next few patches before
moving on to the parent pointer code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_{db,repair}: implement new attr hash value function
Darrick J. Wong [Wed, 3 Jul 2024 21:21:20 +0000 (14:21 -0700)]
xfs_{db,repair}: implement new attr hash value function

Port existing utilities to use libxfs_attr_hashname instead of calling
libxfs_da_hashname directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agolibxfs: create attr log item opcodes and formats for parent pointers
Darrick J. Wong [Wed, 3 Jul 2024 21:21:19 +0000 (14:21 -0700)]
libxfs: create attr log item opcodes and formats for parent pointers

Update xfs_attr_defer_add to use the pptr-specific opcodes if it's
reading or writing parent pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_repair: check for unknown flags in attr entries
Darrick J. Wong [Wed, 3 Jul 2024 21:21:19 +0000 (14:21 -0700)]
xfs_repair: check for unknown flags in attr entries

Explicitly check for unknown bits being set in the shortform and leaf
attr entries.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_repair: enforce one namespace bit per extended attribute
Darrick J. Wong [Wed, 3 Jul 2024 21:21:19 +0000 (14:21 -0700)]
xfs_repair: enforce one namespace bit per extended attribute

Enforce that all extended attributes have at most one namespace bit.
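
The check amounts to masking off the namespace bits and verifying that
no more than one of them is set.  The sketch below uses flag values that
are assumed to mirror the ondisk attr flags; NSP_MASK and namespace_ok
are names invented for this example.

```python
# Assumed flag values, for illustration only.
XFS_ATTR_LOCAL = 1 << 0    # not a namespace bit
XFS_ATTR_ROOT = 1 << 1
XFS_ATTR_SECURE = 1 << 2
XFS_ATTR_PARENT = 1 << 3

# Hypothetical mask covering every namespace bit.
NSP_MASK = XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT


def namespace_ok(flags):
    """Return True if at most one namespace bit is set in `flags`."""
    nsp = flags & NSP_MASK
    # Clearing the lowest set bit leaves zero only if 0 or 1 bits were set.
    return (nsp & (nsp - 1)) == 0
```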

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_repair: check free space requirements before allowing upgrades
Darrick J. Wong [Wed, 3 Jul 2024 21:21:19 +0000 (14:21 -0700)]
xfs_repair: check free space requirements before allowing upgrades

Currently, the V5 feature upgrades permitted by xfs_repair do not affect
filesystem space usage, so we haven't needed to verify the geometry.

However, this will change once we start to allow the sysadmin to add new
metadata indexes to existing filesystems.  Add all the infrastructure we
need to ensure that there's enough space for metadata space reservations
and per-AG reservations the next time the filesystem will be mounted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
[david: Recompute transaction reservation values; Exit with error if upgrade fails]
Signed-off-by: Dave Chinner <david@fromorbit.com>
[djwong: Refuse to upgrade if any part of the fs has < 10% free]
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub: add an optimization-only mode
Darrick J. Wong [Wed, 3 Jul 2024 21:21:18 +0000 (14:21 -0700)]
xfs_scrub: add an optimization-only mode

Add a "preen" mode in which we only optimize filesystem metadata.
Repairs will result in early exits.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub: automatic downgrades to dry-run mode in service mode
Darrick J. Wong [Wed, 3 Jul 2024 21:21:18 +0000 (14:21 -0700)]
xfs_scrub: automatic downgrades to dry-run mode in service mode

When service mode is enabled, xfs_scrub is being run within the context
of a systemd service.  The service description language doesn't have any
particularly good constructs for adding in a '-n' argument if the
filesystem is readonly, which means that xfs_scrub is passed a path, and
needs to switch to dry-run mode on its own if the fs is mounted
readonly or the kernel doesn't support repairs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: fail fast on masked units
Darrick J. Wong [Tue, 23 Jul 2024 23:27:45 +0000 (16:27 -0700)]
xfs_scrub_all: fail fast on masked units

If xfs_scrub_all tries to start a masked xfs_scrub@ unit, that's a sign
that the system administrator really didn't want us to scrub that
filesystem.  Instead of retrying pointlessly, just make a note of the
failure and move on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: implement retry and backoff for dbus calls
Darrick J. Wong [Wed, 3 Jul 2024 21:21:18 +0000 (14:21 -0700)]
xfs_scrub_all: implement retry and backoff for dbus calls

Calls to systemd across dbus are remote procedure calls, which means
that they're subject to transitory connection failures (e.g. systemd
re-executing itself).  We don't want to fail at the *first* sign of what
could be temporary trouble, so implement a limited retry with fibonacci
backoff before we resort to invoking xfs_scrub as a subprocess.
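
A minimal sketch of a limited retry with Fibonacci backoff follows.  The
helper names, the default of five retries, the delays in seconds, and
the use of ConnectionError as the transient failure are assumptions for
the example, not what xfs_scrub_all necessarily does.

```python
import time


def fibonacci_delays(limit):
    """Yield the first `limit` Fibonacci numbers: 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    for _ in range(limit):
        yield a
        a, b = b, a + b


def call_with_backoff(func, retries=5, sleep=time.sleep,
                      transient=(ConnectionError,)):
    """Call func(), retrying transient failures with growing delays.

    After `retries` failed retries the last exception is re-raised so
    the caller can fall back to another strategy.
    """
    delays = fibonacci_delays(retries)
    while True:
        try:
            return func()
        except transient:
            delay = next(delays, None)
            if delay is None:
                raise
            sleep(delay)
```

A caller would wrap each dbus call in call_with_backoff() and treat the
re-raised exception as the cue to run xfs_scrub as a subprocess instead.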

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: convert systemctl calls to dbus
Darrick J. Wong [Wed, 3 Jul 2024 21:21:18 +0000 (14:21 -0700)]
xfs_scrub_all: convert systemctl calls to dbus

Convert the systemctl invocations to direct dbus calls, which decouples
us from the CLI in favor of direct API calls.  This spares us from some
of the insanity of divining service state from program outputs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: add CLI option for easier debugging
Darrick J. Wong [Wed, 3 Jul 2024 21:21:17 +0000 (14:21 -0700)]
xfs_scrub_all: add CLI option for easier debugging

Add a new CLI argument to make it easier to figure out what exactly the
program is doing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: encapsulate all the systemctl code in an object
Darrick J. Wong [Wed, 3 Jul 2024 21:21:17 +0000 (14:21 -0700)]
xfs_scrub_all: encapsulate all the systemctl code in an object

Move all the systemd service handling code to an object so that we can
contain all the insanity^Wdetails in a single place.  This also makes
the killfuncs handling similar to starting background processes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: encapsulate all the subprocess code in an object
Darrick J. Wong [Wed, 3 Jul 2024 21:21:17 +0000 (14:21 -0700)]
xfs_scrub_all: encapsulate all the subprocess code in an object

Move all the xfs_scrub subprocess handling code to an object so that we
can contain all the details in a single place.  This also simplifies the
background state management.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: failure reporting for the xfs_scrub_all job
Darrick J. Wong [Wed, 3 Jul 2024 21:21:17 +0000 (14:21 -0700)]
xfs_scrub_all: failure reporting for the xfs_scrub_all job

Create a failure reporting service for when xfs_scrub_all fails.  This
shouldn't happen often, but let's report anyways.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: trigger automatic media scans once per month
Darrick J. Wong [Wed, 3 Jul 2024 21:21:16 +0000 (14:21 -0700)]
xfs_scrub_all: trigger automatic media scans once per month

Teach the xfs_scrub_all background service to trigger an automatic scan
of all file data once per month.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: enable periodic file data scrubs automatically
Darrick J. Wong [Wed, 3 Jul 2024 21:21:16 +0000 (14:21 -0700)]
xfs_scrub_all: enable periodic file data scrubs automatically

Enhance xfs_scrub_all with the ability to initiate a file data scrub
periodically.  The user must specify the period, and they may optionally
specify the path to a file that will record the last time the file data
was scrubbed.
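
One way to picture the stamp file logic is sketched below.  It assumes
the last scrub time is tracked through the stamp file's modification
time and that the period is given in seconds; both are assumptions for
the example, as are the function names.

```python
import os
import time


def media_scan_due(stamp_path, period_seconds, now=None):
    """Return True if no file data scrub was recorded within the period."""
    now = time.time() if now is None else now
    try:
        last = os.stat(stamp_path).st_mtime
    except FileNotFoundError:
        # No record of a previous scrub, so one is due.
        return True
    return now - last >= period_seconds


def record_media_scan(stamp_path):
    """Create the stamp file if needed and set its mtime to now."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)
```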

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: support metadata+media scans of all filesystems
Darrick J. Wong [Wed, 3 Jul 2024 21:21:16 +0000 (14:21 -0700)]
xfs_scrub_all: support metadata+media scans of all filesystems

Add the necessary systemd services and control bits so that
xfs_scrub_all can kick off a metadata+media scan of a filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: remove journalctl background process
Darrick J. Wong [Wed, 3 Jul 2024 21:21:16 +0000 (14:21 -0700)]
xfs_scrub_all: remove journalctl background process

Now that we only start systemd services if we're running in service
mode, there's no need for the background journalctl process that only
ran if we had started systemd services in non-service mode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: only use the xfs_scrub@ systemd services in service mode
Darrick J. Wong [Wed, 3 Jul 2024 21:21:16 +0000 (14:21 -0700)]
xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode

Since the per-mount xfs_scrub@.service definition includes a bunch of
resource usage constraints, we no longer want to use those services if
xfs_scrub_all is being run directly by the sysadmin (aka not in service
mode) on the presumption that sysadmins want answers as quickly as
possible.

Therefore, only try to call the systemd service from xfs_scrub_all if
SERVICE_MODE is set in the environment.  If reaching out to systemd
fails and we're in service mode, we still want to run xfs_scrub
directly.  Split the makefile variables as necessary so that we only
pass -b to xfs_scrub in service mode.
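
The decision described above can be sketched as follows.  The two
callables stand in for "start the systemd unit" and "run xfs_scrub
directly"; they, the function name, and the use of OSError to signal a
systemd failure are hypothetical.

```python
import os


def run_scrub(mnt, start_unit, run_directly, environ=None):
    """Pick how to scrub `mnt` based on SERVICE_MODE.

    In service mode, try the systemd unit first and fall back to a
    direct run if that fails.  Outside service mode, always run directly.
    """
    environ = os.environ if environ is None else environ
    if "SERVICE_MODE" in environ:
        try:
            return start_unit(mnt)
        except OSError:
            # Assumption: systemd trouble surfaces as OSError here.
            pass
    return run_directly(mnt)
```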

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_all: tighten up the security on the background systemd service
Darrick J. Wong [Wed, 3 Jul 2024 21:21:15 +0000 (14:21 -0700)]
xfs_scrub_all: tighten up the security on the background systemd service

Currently, xfs_scrub_all has to run with enough privileges to find
mounted XFS filesystems and the device associated with that mount and to
start xfs_scrub@<mountpoint> sub-services.  Minimize the risk of
xfs_scrub_all escaping its service container or contaminating the rest
of the system by using systemd's sandboxing controls to prohibit as much
access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub_all.service' in systemd 249.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub_fail: tighten up the security on the background systemd service
Darrick J. Wong [Wed, 3 Jul 2024 21:21:15 +0000 (14:21 -0700)]
xfs_scrub_fail: tighten up the security on the background systemd service

Currently, xfs_scrub_fail has to run with enough privileges to access
the journal contents for a given scrub run and to send a report via
email.  Minimize the risk of xfs_scrub_fail escaping its service
container or contaminating the rest of the system by using systemd's
sandboxing controls to prohibit as much access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub_fail@.service' in systemd 249.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub: tighten up the security on the background systemd service
Darrick J. Wong [Wed, 3 Jul 2024 21:21:15 +0000 (14:21 -0700)]
xfs_scrub: tighten up the security on the background systemd service

Currently, xfs_scrub has to run with some elevated privileges.  Minimize
the risk of xfs_scrub escaping its service container or contaminating
the rest of the system by using systemd's sandboxing controls to
prohibit as much access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub@.service' in systemd 249.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub: use dynamic users when running as a systemd service
Darrick J. Wong [Wed, 3 Jul 2024 21:21:15 +0000 (14:21 -0700)]
xfs_scrub: use dynamic users when running as a systemd service

Five years ago, systemd introduced the DynamicUser directive that
allocates a new unique user/group id, runs a service with those ids, and
deletes them after the service exits.  This is a good replacement for
User=nobody, since it eliminates the threat of nobody-services messing
with each other.

Make this transition ahead of all the other security tightenings that
will land in the next few patches, and add credits for the people who
suggested the change and reviewed it.

Link: https://0pointer.net/blog/dynamic-users-with-systemd.html
Suggested-by: Helle Vaanzinn <glitsj16@riseup.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
11 months agoxfs_scrub.service: reduce background CPU usage to less than one core if possible
Darrick J. Wong [Wed, 3 Jul 2024 21:21:14 +0000 (14:21 -0700)]
xfs_scrub.service: reduce background CPU usage to less than one core if possible

Currently, the xfs_scrub background service is configured to use -b,
which means that the program runs completely serially.  However, even
using all of one CPU core with idle priority may be enough to cause
thermal throttling and unwanted fan noise on smaller systems (e.g.
laptops) with fast IO systems.

Let's try to avoid this (at least on systemd) by using cgroups to limit
the program's usage to slightly more than half of one CPU and lowering
the nice priority in the scheduler.  What we /really/ want is to run
steadily on an efficiency core, but there doesn't seem to be a means to
ask the scheduler not to ramp up the CPU frequency for a particular
task.

While we're at it, group the resource limit directives together.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub: allow auxiliary pathnames for sandboxing
Darrick J. Wong [Wed, 3 Jul 2024 21:21:14 +0000 (14:21 -0700)]
xfs_scrub: allow auxiliary pathnames for sandboxing

In the next patch, we'll tighten up the security on the xfs_scrub
service so that it can't escape.  However, sandboxing the service
involves making the host filesystem as inaccessible as possible, with
the filesystem to scrub bind mounted onto a known location within the
sandbox.  Hence we need one path for reporting and a new -M argument to
tell scrub what it should actually be trying to open.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub: tune fstrim minlen parameter based on free space histograms
Darrick J. Wong [Wed, 3 Jul 2024 21:21:14 +0000 (14:21 -0700)]
xfs_scrub: tune fstrim minlen parameter based on free space histograms

Currently, phase 8 runs very slowly on filesystems with a lot of small
free space extents.  To reduce the amount of time spent on fstrim
activities during phase 8, we want to balance estimated runtime against
completeness of the trim.  In short, the goal is to reduce runtime by
avoiding small trim requests.

At the start of phase 8, a CDF is computed in decreasing order of extent
length from the histogram buckets created during the fsmap scan in phase
7.  A point corresponding to the fstrim percentage target is chosen from
the CDF and mapped back to a histogram bucket, and free space extents
smaller than that amount are omitted from fstrim.

On my aging /home filesystem, the free space histogram reported by
xfs_spaceman looks like this:

   from      to extents    blocks    pct blkcdf extcdf
      1       1  121953    121953   0.04 100.00 100.00
      2       3  124741    299694   0.09  99.96  81.16
      4       7  113492    593763   0.18  99.87  61.89
      8      15  109215   1179524   0.36  99.69  44.36
     16      31   76972   1695455   0.52  99.33  27.48
     32      63   48655   2219667   0.68  98.82  15.59
     64     127   31398   2876898   0.88  98.14   8.08
    128     255    8014   1447920   0.44  97.27   3.23
    256     511    4142   1501758   0.46  96.82   1.99
    512    1023    2433   1768732   0.54  96.37   1.35
   1024    2047    1795   2648460   0.81  95.83   0.97
   2048    4095    1429   4206103   1.28  95.02   0.69
   4096    8191    1045   6162111   1.88  93.74   0.47
   8192   16383     791   9242745   2.81  91.87   0.31
  16384   32767     473  10883977   3.31  89.06   0.19
  32768   65535     272  12385566   3.77  85.74   0.12
  65536  131071     192  18098739   5.51  81.98   0.07
 131072  262143     108  20675199   6.29  76.47   0.04
 262144  524287      80  29061285   8.84  70.18   0.03
 524288 1048575      39  29002829   8.83  61.33   0.02
1048576 2097151      25  36824985  11.21  52.51   0.01
2097152 4194303      32 101727192  30.95  41.30   0.01
4194304 8388607       7  34007410  10.35  10.35   0.00

From this table, we see that free space extents that are 16 blocks or
longer constitute 99.3% of the free space in the filesystem but only
27.5% of the extents.  If we set the fstrim minlen parameter to 16
blocks, that means that we can trim over 99% of the space in one third
of the time it would take to trim everything.

Add a new -o fstrim_pct= option to xfs_scrub just in case there are
users out there who want a different percentage.  For example, accepting
a 95% trim would net us a speed increase of nearly two orders of
magnitude, ignoring system call overhead.  Setting it to 100% will trim
everything, just like fstrim(8).
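
The selection can be sketched as below, using the bucket lower bounds
and block counts from the table above.  pick_minlen is a hypothetical
name, and this is only an approximation of the idea, not the xfs_scrub
code: walk the buckets from longest to shortest, accumulate free blocks,
and stop at the first bucket where the running total reaches the target
percentage.

```python
# (lowest extent length in bucket, free blocks in bucket), from the table.
HISTOGRAM = [
    (1, 121953), (2, 299694), (4, 593763), (8, 1179524),
    (16, 1695455), (32, 2219667), (64, 2876898), (128, 1447920),
    (256, 1501758), (512, 1768732), (1024, 2648460), (2048, 4206103),
    (4096, 6162111), (8192, 9242745), (16384, 10883977),
    (32768, 12385566), (65536, 18098739), (131072, 20675199),
    (262144, 29061285), (524288, 29002829), (1048576, 36824985),
    (2097152, 101727192), (4194304, 34007410),
]


def pick_minlen(histogram, target_pct):
    """Return the smallest extent length worth trimming.

    Extents shorter than the returned value are skipped; the remaining
    extents cover at least `target_pct` percent of the free blocks.
    """
    total = sum(blocks for _, blocks in histogram)
    if total == 0:
        return 1
    covered = 0
    for low, blocks in sorted(histogram, reverse=True):
        covered += blocks
        if covered * 100 >= target_pct * total:
            return low
    # Target above 100%: fall back to trimming everything.
    return min(low for low, _ in histogram)
```

With this data a 99% target yields a minlen of 16 blocks and a 95%
target yields 2048 blocks, matching the blkcdf column.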

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
11 months agoxfs_scrub: collect free space histograms during phase 7
Darrick J. Wong [Wed, 3 Jul 2024 21:21:14 +0000 (14:21 -0700)]
xfs_scrub: collect free space histograms during phase 7

Collect a histogram of free space observed during phase 7.  We'll put
this information to use in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>