]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
7 months agoxfs: enable realtime reflink
Darrick J. Wong [Thu, 21 Nov 2024 00:21:17 +0000 (16:21 -0800)]
xfs: enable realtime reflink

Enable reflink for realtime devices, as long as the realtime allocation
unit is a single fsblock.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: fix CoW forks for realtime files
Darrick J. Wong [Thu, 21 Nov 2024 00:21:16 +0000 (16:21 -0800)]
xfs: fix CoW forks for realtime files

Port the copy on write fork repair to realtime files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: check for shared rt extents when rebuilding rt file's data fork
Darrick J. Wong [Thu, 21 Nov 2024 00:21:15 +0000 (16:21 -0800)]
xfs: check for shared rt extents when rebuilding rt file's data fork

When we're rebuilding the data fork of a realtime file, we need to
cross-reference each mapping with the rt refcount btree to ensure that
the reflink flag is set if there are any shared extents found.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: repair inodes that have a refcount btree in the data fork
Darrick J. Wong [Thu, 21 Nov 2024 00:21:15 +0000 (16:21 -0800)]
xfs: repair inodes that have a refcount btree in the data fork

Plumb knowledge of refcount btrees into the inode core repair code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: online repair of the realtime refcount btree
Darrick J. Wong [Thu, 21 Nov 2024 00:21:14 +0000 (16:21 -0800)]
xfs: online repair of the realtime refcount btree

Port the data device's refcount btree repair code to the realtime
refcount btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: capture realtime CoW staging extents when rebuilding rt rmapbt
Darrick J. Wong [Thu, 21 Nov 2024 00:21:13 +0000 (16:21 -0800)]
xfs: capture realtime CoW staging extents when rebuilding rt rmapbt

Walk the realtime refcount btree to find the CoW staging extents when
we're rebuilding the realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: walk the rt reference count tree when rebuilding rmap
Darrick J. Wong [Thu, 21 Nov 2024 00:21:12 +0000 (16:21 -0800)]
xfs: walk the rt reference count tree when rebuilding rmap

When we're rebuilding the data device rmap, if we encounter a "refcount"
format fork, we have to walk the (realtime) refcount btree inode to
build the appropriate mappings.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: check new rtbitmap records against rt refcount btree
Darrick J. Wong [Thu, 21 Nov 2024 00:21:11 +0000 (16:21 -0800)]
xfs: check new rtbitmap records against rt refcount btree

When we're rebuilding the realtime bitmap, check the proposed free
extents against the rt refcount btree to make sure we don't commit any
grievous errors.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: don't flag quota rt block usage on rtreflink filesystems
Darrick J. Wong [Thu, 21 Nov 2024 00:21:11 +0000 (16:21 -0800)]
xfs: don't flag quota rt block usage on rtreflink filesystems

Quota space usage is allowed to exceed the size of the physical storage
when reflink is enabled.  Now that we have reflink for the realtime
volume, apply this same logic to the rtb repair logic.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: scrub the metadir path of rt refcount btree files
Darrick J. Wong [Thu, 21 Nov 2024 00:21:10 +0000 (16:21 -0800)]
xfs: scrub the metadir path of rt refcount btree files

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the refcount btree file for each rt group.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: detect and repair misaligned rtinherit directory cowextsize hints
Darrick J. Wong [Thu, 21 Nov 2024 00:21:09 +0000 (16:21 -0800)]
xfs: detect and repair misaligned rtinherit directory cowextsize hints

If we encounter a directory that has been configured to pass on a CoW
extent size hint to a new realtime file and the hint isn't an integer
multiple of the rt extent size, we should flag the hint for
administrative review and/or turn it off because that is a
misconfiguration.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: allow dquot rt block count to exceed rt blocks on reflink fs
Darrick J. Wong [Thu, 21 Nov 2024 00:21:09 +0000 (16:21 -0800)]
xfs: allow dquot rt block count to exceed rt blocks on reflink fs

Update the quota scrubber to allow dquots where the realtime block count
exceeds the block count of the rt volume if reflink is enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: check reference counts of gaps between rt refcount records
Darrick J. Wong [Thu, 21 Nov 2024 00:21:08 +0000 (16:21 -0800)]
xfs: check reference counts of gaps between rt refcount records

If there's a gap between records in the rt refcount btree, we ought to
cross-reference the gap with the rtrmap records to make sure that there
aren't any overlapping records for a region that doesn't have any shared
ownership.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: allow overlapping rtrmapbt records for shared data extents
Darrick J. Wong [Thu, 21 Nov 2024 00:21:07 +0000 (16:21 -0800)]
xfs: allow overlapping rtrmapbt records for shared data extents

Allow overlapping realtime reverse mapping records if they both describe
shared data extents and the fs supports reflink on the realtime volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: cross-reference checks with the rt refcount btree
Darrick J. Wong [Thu, 21 Nov 2024 00:21:06 +0000 (16:21 -0800)]
xfs: cross-reference checks with the rt refcount btree

Use the realtime refcount btree to implement cross-reference checks in
other data structures.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: scrub the realtime refcount btree
Darrick J. Wong [Thu, 21 Nov 2024 00:21:06 +0000 (16:21 -0800)]
xfs: scrub the realtime refcount btree

Add code to scrub realtime refcount btrees.  Similar to the refcount
btree checking code for the data device, we walk the rmap btree for each
refcount record to confirm that the reference counts are correct.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: report realtime refcount btree corruption errors to the health system
Darrick J. Wong [Thu, 21 Nov 2024 00:21:05 +0000 (16:21 -0800)]
xfs: report realtime refcount btree corruption errors to the health system

Whenever we encounter corrupt realtime refcount btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: check that the rtrefcount maxlevels doesn't increase when growing fs
Darrick J. Wong [Thu, 21 Nov 2024 00:21:04 +0000 (16:21 -0800)]
xfs: check that the rtrefcount maxlevels doesn't increase when growing fs

The size of filesystem transaction reservations depends on the maximum
height (maxlevels) of the realtime btrees.  Since we don't want a grow
operation to increase the reservation size enough that we'll fail the
minimum log size checks on the next mount, constrain growfs operations
if they would cause an increase in the rt refcount btree maxlevels.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: enable extent size hints for CoW operations
Darrick J. Wong [Thu, 21 Nov 2024 00:21:03 +0000 (16:21 -0800)]
xfs: enable extent size hints for CoW operations

Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: apply rt extent alignment constraints to CoW extsize hint
Darrick J. Wong [Thu, 21 Nov 2024 00:21:03 +0000 (16:21 -0800)]
xfs: apply rt extent alignment constraints to CoW extsize hint

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
Darrick J. Wong [Thu, 21 Nov 2024 00:21:02 +0000 (16:21 -0800)]
xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files

Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files.  This apparently was done to disable
delayed allocation for realtime files.

However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files.  In this case, the
logic will incorrectly send the write through the delalloc write path.

Fix this by adjusting the logic slightly.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: recover CoW leftovers in the realtime volume
Darrick J. Wong [Thu, 21 Nov 2024 00:21:01 +0000 (16:21 -0800)]
xfs: recover CoW leftovers in the realtime volume

Scan the realtime refcount tree at mount time to get rid of leftover
CoW staging extents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: allow inodes to have the realtime and reflink flags
Darrick J. Wong [Thu, 21 Nov 2024 00:21:00 +0000 (16:21 -0800)]
xfs: allow inodes to have the realtime and reflink flags

Now that we can share blocks between realtime files, allow this
combination.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: enable sharing of realtime file blocks
Darrick J. Wong [Thu, 21 Nov 2024 00:21:00 +0000 (16:21 -0800)]
xfs: enable sharing of realtime file blocks

Update the remapping routines to be able to handle realtime files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: enable CoW for realtime data
Darrick J. Wong [Thu, 21 Nov 2024 00:20:59 +0000 (16:20 -0800)]
xfs: enable CoW for realtime data

Update our write paths to support copy on write on the rt volume.  This
works in more or less the same way as it does on the data device, with
the major exception that we never do delalloc on the rt volume.

Because we consider unwritten CoW fork staging extents to be incore
quota reservation, we update xfs_quota_reserve_blkres to support this
case.  Though xfs doesn't allow rt and quota together, the change is
trivial and we shouldn't leave a logic bomb here.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: refactor reflink quota updates
Darrick J. Wong [Thu, 21 Nov 2024 00:20:58 +0000 (16:20 -0800)]
xfs: refactor reflink quota updates

Hoist all quota updates for reflink into a helper function, since things
are about to become more complicated.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: compute rtrmap btree max levels when reflink enabled
Darrick J. Wong [Thu, 21 Nov 2024 00:20:57 +0000 (16:20 -0800)]
xfs: compute rtrmap btree max levels when reflink enabled

Compute the maximum possible height of the realtime rmap btree when
reflink is enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: update rmap to allow cow staging extents in the rt rmap
Darrick J. Wong [Thu, 21 Nov 2024 00:20:57 +0000 (16:20 -0800)]
xfs: update rmap to allow cow staging extents in the rt rmap

Don't error out on CoW staging extent records when realtime reflink is
enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: create routine to allocate and initialize a realtime refcount btree inode
Darrick J. Wong [Thu, 21 Nov 2024 00:20:56 +0000 (16:20 -0800)]
xfs: create routine to allocate and initialize a realtime refcount btree inode

Create a library routine to allocate and initialize an empty realtime
refcountbt inode.  We'll use this for growfs, mkfs, and repair.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: wire up realtime refcount btree cursors
Darrick J. Wong [Thu, 21 Nov 2024 00:20:55 +0000 (16:20 -0800)]
xfs: wire up realtime refcount btree cursors

Wire up realtime refcount btree cursors wherever they're needed
throughout the code base.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: refactor xfs_reflink_find_shared
Christoph Hellwig [Thu, 21 Nov 2024 00:20:54 +0000 (16:20 -0800)]
xfs: refactor xfs_reflink_find_shared

Move lookup of the perag structure from the callers into the helpers,
and return the offset into the extent of the shared region instead of
the block number that needs post-processing.  This prepares the
callsites for the creation of an rt-specific variant in the next patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
[djwong: port to the middle of the rtreflink series for cleanliness]
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
7 months agoxfs: wire up a new metafile type for the realtime refcount
Darrick J. Wong [Thu, 21 Nov 2024 00:20:54 +0000 (16:20 -0800)]
xfs: wire up a new metafile type for the realtime refcount

Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with metafile type and on-disk
interpretation functions.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add metadata reservations for realtime refcount btree
Darrick J. Wong [Thu, 21 Nov 2024 00:20:53 +0000 (16:20 -0800)]
xfs: add metadata reservations for realtime refcount btree

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime refcount btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add realtime refcount btree inode to metadata directory
Darrick J. Wong [Thu, 21 Nov 2024 00:20:52 +0000 (16:20 -0800)]
xfs: add realtime refcount btree inode to metadata directory

Add a metadir path to select the realtime refcount btree inode and load
it at mount time.  The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add realtime refcount btree block detection to log recovery
Darrick J. Wong [Thu, 21 Nov 2024 00:20:51 +0000 (16:20 -0800)]
xfs: add realtime refcount btree block detection to log recovery

Identify rt refcount btree blocks in the log correctly so that we can
validate them during log recovery.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: support recovering refcount intent items targetting realtime extents
Darrick J. Wong [Thu, 21 Nov 2024 00:20:51 +0000 (16:20 -0800)]
xfs: support recovering refcount intent items targetting realtime extents

Now that we have reflink on the realtime device, refcount intent items
have to support remapping extents on the realtime volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add a realtime flag to the refcount update log redo items
Darrick J. Wong [Thu, 21 Nov 2024 00:20:50 +0000 (16:20 -0800)]
xfs: add a realtime flag to the refcount update log redo items

Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: prepare refcount functions to deal with rtrefcountbt
Darrick J. Wong [Thu, 21 Nov 2024 00:20:49 +0000 (16:20 -0800)]
xfs: prepare refcount functions to deal with rtrefcountbt

Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions.  Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.

Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert it all at once.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add realtime refcount btree operations
Darrick J. Wong [Thu, 21 Nov 2024 00:20:48 +0000 (16:20 -0800)]
xfs: add realtime refcount btree operations

Implement the generic btree operations needed to manipulate rtrefcount
btree blocks. This is different from the regular refcountbt in that we
allocate space from the filesystem at large, and are neither constrained
to the free space nor any particular AG.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: realtime refcount btree transaction reservations
Darrick J. Wong [Thu, 21 Nov 2024 00:20:48 +0000 (16:20 -0800)]
xfs: realtime refcount btree transaction reservations

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: introduce realtime refcount btree ondisk definitions
Darrick J. Wong [Thu, 21 Nov 2024 00:20:47 +0000 (16:20 -0800)]
xfs: introduce realtime refcount btree ondisk definitions

Add the ondisk structure definitions for realtime refcount btrees. The
realtime refcount btree will be rooted from a hidden inode so it needs
to have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate refcount btree
blocks. This prepares the way for connecting the btree operations
implementation, though the changes to actually root the rtrefcount btree
in an inode come later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: namespace the maximum length/refcount symbols
Darrick J. Wong [Thu, 21 Nov 2024 00:20:46 +0000 (16:20 -0800)]
xfs: namespace the maximum length/refcount symbols

Actually namespace these variables properly, so that readers can tell
that this is an XFS symbol, and that it's for the refcount
functionality.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: prepare refcount btree cursor tracepoints for realtime
Darrick J. Wong [Thu, 21 Nov 2024 00:20:45 +0000 (16:20 -0800)]
xfs: prepare refcount btree cursor tracepoints for realtime

Rework the refcount btree cursor tracepoints in preparation to handle the
realtime refcount btree cursor.  Mostly this involves renaming the field to
"refcbno" and extracting the group number from the cursor when possible.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: enable realtime rmap btree
Darrick J. Wong [Thu, 21 Nov 2024 00:20:45 +0000 (16:20 -0800)]
xfs: enable realtime rmap btree

Permit mounting filesystems with realtime rmap btrees.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: react to fsdax failure notifications on the rt device
Darrick J. Wong [Tue, 17 Dec 2024 21:22:44 +0000 (13:22 -0800)]
xfs: react to fsdax failure notifications on the rt device

Now that we have reverse mapping for the realtime device, use the
information to kill processes that have mappings to bad pmem.  This
requires refactoring the existing routines to handle rtgroups or AGs;
and splitting out the translation function to improve cohesion.
Also make a proper header file for the dax holder ops.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: don't shut down the filesystem for media failures beyond end of log
Darrick J. Wong [Tue, 17 Dec 2024 21:43:06 +0000 (13:43 -0800)]
xfs: don't shut down the filesystem for media failures beyond end of log

If the filesystem has an external log device on pmem and the pmem
reports a media error beyond the end of the log area, don't shut down
the filesystem because we don't use that space.

Cc: <stable@vger.kernel.org> # v6.0
Fixes: 6f643c57d57c56 ("xfs: implement ->notify_failure() for XFS")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: hook live realtime rmap operations during a repair operation
Darrick J. Wong [Thu, 21 Nov 2024 00:20:44 +0000 (16:20 -0800)]
xfs: hook live realtime rmap operations during a repair operation

Hook the regular realtime rmap code when an rtrmapbt repair operation is
running so that we can unlock the AGF buffer to scan the filesystem and
keep the in-memory btree up to date during the scan.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: create a shadow rmap btree during realtime rmap repair
Darrick J. Wong [Thu, 21 Nov 2024 00:20:43 +0000 (16:20 -0800)]
xfs: create a shadow rmap btree during realtime rmap repair

Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: online repair of the realtime rmap btree
Darrick J. Wong [Thu, 21 Nov 2024 00:20:42 +0000 (16:20 -0800)]
xfs: online repair of the realtime rmap btree

Repair the realtime rmap btree while mounted.  Similar to the regular
rmap btree repair code, we walk the data fork mappings of every realtime
file in the filesystem to collect reverse-mapping records in an xfarray.
Then we sort the xfarray, and use the btree bulk loader to create a new
rtrmap btree ondisk.  Finally, we swap the btree roots, and reap the old
blocks in the usual way.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: support repairing metadata btrees rooted in metadir inodes
Darrick J. Wong [Thu, 21 Nov 2024 00:20:42 +0000 (16:20 -0800)]
xfs: support repairing metadata btrees rooted in metadir inodes

Adapt the repair code so that we can stage a new btree in the data fork
area of a metadir inode and reap the old blocks.  We already have nearly
all of the infrastructure; the only parts that were missing were the
metadata inode reservation handling.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: online repair of realtime bitmaps for a realtime group
Darrick J. Wong [Thu, 21 Nov 2024 00:20:41 +0000 (16:20 -0800)]
xfs: online repair of realtime bitmaps for a realtime group

For a given rt group, regenerate the bitmap contents from the group's
realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: repair rmap btree inodes
Darrick J. Wong [Thu, 21 Nov 2024 00:20:40 +0000 (16:20 -0800)]
xfs: repair rmap btree inodes

Teach the inode repair code how to deal with realtime rmap btree inodes
that won't load properly.  This is most likely moot since the filesystem
generally won't mount without the rtrmapbt inodes being usable, but
we'll add this for completeness.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: repair inodes that have realtime extents
Darrick J. Wong [Thu, 21 Nov 2024 00:20:39 +0000 (16:20 -0800)]
xfs: repair inodes that have realtime extents

Plumb into the inode core repair code the ability to search for extents
on realtime devices.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: online repair of realtime file bmaps
Darrick J. Wong [Thu, 21 Nov 2024 00:20:39 +0000 (16:20 -0800)]
xfs: online repair of realtime file bmaps

Now that we have a reverse-mapping index of the realtime device, we can
rebuild the data fork forward-mappings of any realtime file.  Enhance
the existing bmbt repair code to walk the rtrmap btrees to gather this
information.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: walk the rt reverse mapping tree when rebuilding rmap
Darrick J. Wong [Thu, 21 Nov 2024 00:20:38 +0000 (16:20 -0800)]
xfs: walk the rt reverse mapping tree when rebuilding rmap

When we're rebuilding the data device rmap, if we encounter an "rmap"
format fork, we have to walk the (realtime) rmap btree inode to build
the appropriate mappings.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: scrub the metadir path of rt rmap btree files
Darrick J. Wong [Thu, 21 Nov 2024 00:20:37 +0000 (16:20 -0800)]
xfs: scrub the metadir path of rt rmap btree files

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the rmap btree file for each rt group.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings
Darrick J. Wong [Thu, 21 Nov 2024 00:20:36 +0000 (16:20 -0800)]
xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings

Teach the bmbt scrubber how to perform a comprehensive check that the
rmapbt does not contain /any/ mappings that are not described by bmbt
records when it's dealing with a realtime file.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: cross-reference the realtime rmapbt
Darrick J. Wong [Thu, 21 Nov 2024 00:20:36 +0000 (16:20 -0800)]
xfs: cross-reference the realtime rmapbt

Teach the data fork and realtime bitmap scrubbers to cross-reference
information with the realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: cross-reference realtime bitmap to realtime rmapbt scrubber
Darrick J. Wong [Thu, 21 Nov 2024 00:20:35 +0000 (16:20 -0800)]
xfs: cross-reference realtime bitmap to realtime rmapbt scrubber

When we're checking the realtime rmap btree entries, cross-reference
those entries with the realtime bitmap too.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: scrub the realtime rmapbt
Darrick J. Wong [Thu, 21 Nov 2024 00:20:34 +0000 (16:20 -0800)]
xfs: scrub the realtime rmapbt

Check the realtime reverse mapping btree against the rtbitmap, and
modify the rtbitmap scrub to check against the rtrmapbt.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: allow queued realtime intents to drain before scrubbing
Darrick J. Wong [Thu, 21 Nov 2024 00:20:33 +0000 (16:20 -0800)]
xfs: allow queued realtime intents to drain before scrubbing

When a writer thread executes a chain of log intent items for the
realtime volume, the ILOCKs taken during each step are for each rt
metadata file, not the entire rt volume itself.  Although scrub takes
all rt metadata ILOCKs, this isn't sufficient to guard against scrub
checking the rt volume while that writer thread is in the middle of
finishing a chain because there's no higher level locking primitive
guarding the realtime volume.

When there's a collision, cross-referencing between data structures
(e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if
repair is running, this results in incorrect repairs, which is
catastrophic.

Fix this by adding to the mount structure the same drain that we use to
protect scrub against concurrent AG updates, but this time for the
realtime volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: report realtime rmap btree corruption errors to the health system
Darrick J. Wong [Thu, 21 Nov 2024 00:20:33 +0000 (16:20 -0800)]
xfs: report realtime rmap btree corruption errors to the health system

Whenever we encounter corrupt realtime rmap btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: check that the rtrmapbt maxlevels doesn't increase when growing fs
Darrick J. Wong [Thu, 21 Nov 2024 00:20:32 +0000 (16:20 -0800)]
xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs

The size of filesystem transaction reservations depends on the maximum
height (maxlevels) of the realtime btrees.  Since we don't want a grow
operation to increase the reservation size enough that we'll fail the
minimum log size checks on the next mount, constrain growfs operations
if they would cause an increase in those maxlevels.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: wire up getfsmap to the realtime reverse mapping btree
Darrick J. Wong [Thu, 21 Nov 2024 00:20:31 +0000 (16:20 -0800)]
xfs: wire up getfsmap to the realtime reverse mapping btree

Connect the getfsmap ioctl to the realtime rmapbt.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: create routine to allocate and initialize a realtime rmap btree inode
Darrick J. Wong [Thu, 21 Nov 2024 00:20:30 +0000 (16:20 -0800)]
xfs: create routine to allocate and initialize a realtime rmap btree inode

Create a library routine to allocate and initialize an empty realtime
rmapbt inode.  We'll use this for mkfs and repair.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: wire up rmap map and unmap to the realtime rmapbt
Darrick J. Wong [Thu, 21 Nov 2024 00:20:30 +0000 (16:20 -0800)]
xfs: wire up rmap map and unmap to the realtime rmapbt

Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks.  This enables us to
perform rmap operations against the correct btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: wire up a new metafile type for the realtime rmap
Darrick J. Wong [Thu, 21 Nov 2024 00:20:29 +0000 (16:20 -0800)]
xfs: wire up a new metafile type for the realtime rmap

Plumb in the pieces we need to embed the root of the realtime rmap btree
in an inode's data fork, complete with new metafile type and on-disk
interpretation functions.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add metadata reservations for realtime rmap btrees
Darrick J. Wong [Thu, 21 Nov 2024 00:20:28 +0000 (16:20 -0800)]
xfs: add metadata reservations for realtime rmap btrees

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add realtime reverse map inode to metadata directory
Darrick J. Wong [Thu, 21 Nov 2024 00:20:27 +0000 (16:20 -0800)]
xfs: add realtime reverse map inode to metadata directory

Add a metadir path to select the realtime rmap btree inode and load
it at mount time.  The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: support file data forks containing metadata btrees
Darrick J. Wong [Thu, 21 Nov 2024 00:20:27 +0000 (16:20 -0800)]
xfs: support file data forks containing metadata btrees

Create a new fork format type for metadata btrees.  This fork type
requires that the inode is in the metadata directory tree, and only
applies to the data fork.  The actual type of the metadata btree itself
is determined by the di_metatype field.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: pretty print metadata file types in error messages
Darrick J. Wong [Thu, 21 Nov 2024 00:20:26 +0000 (16:20 -0800)]
xfs: pretty print metadata file types in error messages

Create a helper function to turn a metadata file type code into a
printable string, and use this to complain about lockdep problems with
rtgroup inodes.  We'll use this more in the next patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: support recovering rmap intent items targetting realtime extents
Darrick J. Wong [Thu, 21 Nov 2024 00:20:25 +0000 (16:20 -0800)]
xfs: support recovering rmap intent items targetting realtime extents

Now that we have rmap on the realtime device and rmap intent items that
target the realtime device, log recovery has to support remapping
extents on the realtime volume.  Make this work.  Identify rtrmapbt
blocks in the log correctly so that we can validate them during log
recovery.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add a realtime flag to the rmap update log redo items
Darrick J. Wong [Thu, 21 Nov 2024 00:20:24 +0000 (16:20 -0800)]
xfs: add a realtime flag to the rmap update log redo items

Extend the rmap update (RUI) log items to handle realtime volumes by
adding a new log intent item type.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: prepare rmap functions to deal with rtrmapbt
Darrick J. Wong [Thu, 21 Nov 2024 00:20:23 +0000 (16:20 -0800)]
xfs: prepare rmap functions to deal with rtrmapbt

Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions.  Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add realtime rmap btree operations
Darrick J. Wong [Thu, 21 Nov 2024 00:20:23 +0000 (16:20 -0800)]
xfs: add realtime rmap btree operations

Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are neither
constrained to the free space nor any particular AG.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: realtime rmap btree transaction reservations
Darrick J. Wong [Thu, 21 Nov 2024 00:20:22 +0000 (16:20 -0800)]
xfs: realtime rmap btree transaction reservations

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: introduce realtime rmap btree ondisk definitions
Darrick J. Wong [Thu, 21 Nov 2024 00:20:21 +0000 (16:20 -0800)]
xfs: introduce realtime rmap btree ondisk definitions

Add the ondisk structure definitions for realtime rmap btrees. The
realtime rmap btree will be rooted from a hidden inode so it needs to
have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate rmap btree
blocks. This prepares the way for connecting the btree operations
implementation, though embedding the rtrmap btree root in the inode
comes later in the series.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Darrick J. Wong [Thu, 21 Nov 2024 00:20:21 +0000 (16:20 -0800)]
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: prepare rmap btree cursor tracepoints for realtime
Darrick J. Wong [Thu, 21 Nov 2024 00:20:20 +0000 (16:20 -0800)]
xfs: prepare rmap btree cursor tracepoints for realtime

Rework the rmap btree cursor tracepoints in preparation to handle the
realtime rmap btree cursor.  Mostly this involves renaming the field to
"gbno" and extracting the group number from the cursor.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: add some rtgroup inode helpers
Darrick J. Wong [Fri, 22 Nov 2024 20:33:17 +0000 (12:33 -0800)]
xfs: add some rtgroup inode helpers

Create some simple helpers to reduce the amount of typing whenever we
access rtgroup inodes.  Conversion was done with this spatch and some
minor reformatting:

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_BITMAP]
+ rtg_bitmap(rtg)

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_SUMMARY]
+ rtg_summary(rtg)

and the CLI command:

$ spatch --sp-file /tmp/moo.cocci --dir fs/xfs/ --use-gitgrep --in-place

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: allow inode-based btrees to reserve space in the data device
Darrick J. Wong [Thu, 21 Nov 2024 00:20:19 +0000 (16:20 -0800)]
xfs: allow inode-based btrees to reserve space in the data device

Create a new space reservation scheme so that btree metadata for the
realtime volume can reserve space in the data device to avoid space
underruns.

Back when we were testing the rmap and refcount btrees for the data
device, people observed occasional shutdowns when xfs_btree_split was
called for either of those two btrees.  This happened when certain
operations (mostly writeback ioends) created new rmap or refcount
records, which would expand the size of the btree.  If there were no
free blocks available the allocation would fail and the split would shut
down the filesystem.

I considered pre-reserving blocks for btree expansion at the time of a
write() call, but there wasn't any good way to attach the reservations
to an inode and keep them there all the way to ioend processing.  Unlike
delalloc reservations which have that indlen mechanism, there's no way
to do that for mapped extents; and indlen blocks are given back during
the delalloc -> unwritten transition.

The solution was to reserve sufficient blocks for rmap/refcount btree
expansion at mount time.  This is what the XFS_AG_RESV_* flags provide;
any expansion of those two btrees can come from the pre-reserved space.

This patch brings that pre-reservation ability to inode-rooted btrees so
that the rt rmap and refcount btrees can also save room for future
expansion.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: prepare to reuse the dquot pointer space in struct xfs_inode
Darrick J. Wong [Thu, 21 Nov 2024 00:20:18 +0000 (16:20 -0800)]
xfs: prepare to reuse the dquot pointer space in struct xfs_inode

Files participating in the metadata directory tree are not accounted to
the quota subsystem.  Therefore, the i_[ugp]dquot pointers in struct
xfs_inode are never used and should always be NULL.

In the next patch we want to add a u64 count of fs blocks reserved for
metadata btree expansion, but we don't want every inode in the fs to pay
the memory price for this feature.  The intent is to union those three
pointers with the u64 counter, but for that to work we must guard
against all access to the dquot pointers for metadata files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: support storing records in the inode core root
Darrick J. Wong [Thu, 21 Nov 2024 00:20:18 +0000 (16:20 -0800)]
xfs: support storing records in the inode core root

Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree.  This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: hoist the node iroot update code out of xfs_btree_kill_iroot
Darrick J. Wong [Thu, 21 Nov 2024 00:20:17 +0000 (16:20 -0800)]
xfs: hoist the node iroot update code out of xfs_btree_kill_iroot

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function.  Remove some unnecessary conditionals and clean
up a few function calls in the new function.  Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: hoist the node iroot update code out of xfs_btree_new_iroot
Darrick J. Wong [Thu, 21 Nov 2024 00:20:16 +0000 (16:20 -0800)]
xfs: hoist the node iroot update code out of xfs_btree_new_iroot

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function.  Note that the new function explicitly computes
the keys of the new child block and stores that in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: tidy up xfs_bmap_broot_realloc a bit
Darrick J. Wong [Thu, 21 Nov 2024 00:20:15 +0000 (16:20 -0800)]
xfs: tidy up xfs_bmap_broot_realloc a bit

Hoist out the code that migrates broot pointers during a resize
operation to avoid code duplication and streamline the caller.  Also
use the correct bmbt pointer type for the sizeof operation.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: make xfs_iroot_realloc a bmap btree function
Darrick J. Wong [Thu, 21 Nov 2024 00:20:15 +0000 (16:20 -0800)]
xfs: make xfs_iroot_realloc a bmap btree function

Move the inode fork btree root reallocation function part of the btree
ops because it's now mostly bmbt-specific code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: make xfs_iroot_realloc take the new numrecs instead of deltas
Darrick J. Wong [Thu, 21 Nov 2024 00:20:14 +0000 (16:20 -0800)]
xfs: make xfs_iroot_realloc take the new numrecs instead of deltas

Change the calling signature of xfs_iroot_realloc to take the ifork and
the new number of records in the btree block, not a diff against the
current number.  This will make the callsites easier to understand.

Note that this function is misnamed because it is very specific to the
single type of inode-rooted btree supported.  This will be addressed in
a subsequent patch.

Return the new btree root to reduce the amount of code clutter.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: refactor the inode fork memory allocation functions
Darrick J. Wong [Thu, 21 Nov 2024 00:20:13 +0000 (16:20 -0800)]
xfs: refactor the inode fork memory allocation functions

Hoist the code that allocates, frees, and reallocates if_broot into a
single xfs_iroot_krealloc function.  Eventually we're going to push
xfs_iroot_realloc into the btree ops structure to handle multiple
inode-rooted btrees, but first let's separate out the bits that should
stay in xfs_inode_fork.c.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: tidy up xfs_iroot_realloc
Darrick J. Wong [Thu, 21 Nov 2024 00:20:12 +0000 (16:20 -0800)]
xfs: tidy up xfs_iroot_realloc

Tidy up this function a bit before we start refactoring the memory
handling and move the function to the bmbt code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: release the dquot buf outside of qli_lock
Darrick J. Wong [Tue, 17 Dec 2024 23:00:49 +0000 (15:00 -0800)]
xfs: release the dquot buf outside of qli_lock

Lai Yi reported a lockdep complaint about circular locking:

 Chain exists of:
   &lp->qli_lock --> &bch->bc_lock --> &l->lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&l->lock);
                                lock(&bch->bc_lock);
                                lock(&l->lock);
   lock(&lp->qli_lock);

I /think/ the problem here is that xfs_dquot_attach_buf during
quotacheck will release the buffer while it's holding the qli_lock.
Because this is a cached buffer, xfs_buf_rele_cached takes b_lock before
decrementing b_hold.  Other threads have taught lockdep that a locking
dependency chain is bp->b_lock -> bch->bc_lock -> l(ru)->lock; and that
another chain is l(ru)->lock -> lp->qli_lock.  Hence we do not want to
take b_lock while holding qli_lock.

Reported-by: syzbot+3126ab3db03db42e7a31@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org> # v6.13-rc3
Fixes: ca378189fdfa89 ("xfs: convert quotacheck to attach dquot buffers")
Tested-by: syzbot+3126ab3db03db42e7a31@syzkaller.appspotmail.com
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoxfs: don't over-report free space or inodes in statvfs
Darrick J. Wong [Thu, 12 Dec 2024 22:37:56 +0000 (14:37 -0800)]
xfs: don't over-report free space or inodes in statvfs

Emmanual Florac reports a strange occurrence when project quota limits
are enabled, free space is lower than the remaining quota, and someone
runs statvfs:

  # mkfs.xfs -f /dev/sda
  # mount /dev/sda /mnt -o prjquota
  # xfs_quota  -x -c 'limit -p bhard=2G 55' /mnt
  # mkdir /mnt/dir
  # xfs_io -c 'chproj 55' -c 'chattr +P' -c 'stat -vvvv' /mnt/dir
  # fallocate -l 19g /mnt/a
  # df /mnt /mnt/dir
  Filesystem      Size  Used Avail Use% Mounted on
  /dev/sda         20G   20G  345M  99% /mnt
  /dev/sda        2.0G     0  2.0G   0% /mnt

I think the bug here is that xfs_fill_statvfs_from_dquot unconditionally
assigns to f_bfree without checking that the filesystem has enough free
space to fill the remaining project quota.  However, this is a
longstanding behavior of xfs so it's unclear what to do here.

Cc: <stable@vger.kernel.org> # v2.6.18
Fixes: 932f2c323196c2 ("[XFS] statvfs component of directory/project quota support, code originally by Glen.")
Reported-by: Emmanuel Florac <eflorac@intellique.com>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months agoLinux 6.13-rc4
Linus Torvalds [Sun, 22 Dec 2024 21:22:21 +0000 (13:22 -0800)]
Linux 6.13-rc4

7 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 22 Dec 2024 20:16:41 +0000 (12:16 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM x86 fixes from Paolo Bonzini:

 - Disable AVIC on SNP-enabled systems that don't allow writes to the
   virtual APIC page, as such hosts will hit unexpected RMP #PFs in the
   host when running VMs of any flavor.

 - Fix a WARN in the hypercall completion path due to KVM trying to
   determine if a guest with protected register state is in 64-bit mode
   (KVM's ABI is to assume such guests only make hypercalls in 64-bit
   mode).

 - Allow the guest to write to supported bits in MSR_AMD64_DE_CFG to fix
   a regression with Windows guests, and because KVM's read-only
   behavior appears to be entirely made up.

 - Treat TDP MMU faults as spurious if the faulting access is allowed
   given the existing SPTE. This fixes a benign WARN (other than the
   WARN itself) due to unexpectedly replacing a writable SPTE with a
   read-only SPTE.

 - Emit a warning when KVM is configured with ignore_msrs=1 and also to
   hide the MSRs that the guest is looking for from the kernel logs.
   ignore_msrs can trick guests into assuming that certain processor
   features are present, and this in turn leads to bogus bug reports.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: let it be known that ignore_msrs is a bad idea
  KVM: VMX: don't include '<linux/find.h>' directly
  KVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowed
  KVM: SVM: Allow guest writes to set MSR_AMD64_DE_CFG bits
  KVM: x86: Play nice with protected guests in complete_hypercall_exit()
  KVM: SVM: Disable AVIC on SNP-enabled system without HvInUseWrAllowed feature

7 months agoMerge tag 'kvm-x86-fixes-6.13-rcN' of https://github.com/kvm-x86/linux into HEAD
Paolo Bonzini [Sun, 22 Dec 2024 17:07:16 +0000 (12:07 -0500)]
Merge tag 'kvm-x86-fixes-6.13-rcN' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.13:

 - Disable AVIC on SNP-enabled systems that don't allow writes to the virtual
   APIC page, as such hosts will hit unexpected RMP #PFs in the host when
   running VMs of any flavor.

 - Fix a WARN in the hypercall completion path due to KVM trying to determine
   if a guest with protected register state is in 64-bit mode (KVM's ABI is to
   assume such guests only make hypercalls in 64-bit mode).

 - Allow the guest to write to supported bits in MSR_AMD64_DE_CFG to fix a
   regression with Windows guests, and because KVM's read-only behavior appears
   to be entirely made up.

 - Treat TDP MMU faults as spurious if the faulting access is allowed given the
   existing SPTE.  This fixes a benign WARN (other than the WARN itself) due to
   unexpectedly replacing a writable SPTE with a read-only SPTE.

7 months agoKVM: x86: let it be known that ignore_msrs is a bad idea
Paolo Bonzini [Thu, 19 Dec 2024 12:43:20 +0000 (07:43 -0500)]
KVM: x86: let it be known that ignore_msrs is a bad idea

When running KVM with ignore_msrs=1 and report_ignored_msrs=0, the user has
no clue that that the guest is being lied to.  This may cause bug reports
such as https://gitlab.com/qemu-project/qemu/-/issues/2571, where enabling
a CPUID bit in QEMU caused Linux guests to try reading MSR_CU_DEF_ERR; and
being lied about the existence of MSR_CU_DEF_ERR caused the guest to assume
other things about the local APIC which were not true:

  Sep 14 12:02:53 kernel: mce: [Firmware Bug]: Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.
  Sep 14 12:02:53 kernel: unchecked MSR access error: RDMSR from 0x852 at rIP: 0xffffffffb548ffa7 (native_read_msr+0x7/0x40)
  Sep 14 12:02:53 kernel: Call Trace:
  ...
  Sep 14 12:02:53 kernel:  native_apic_msr_read+0x20/0x30
  Sep 14 12:02:53 kernel:  setup_APIC_eilvt+0x47/0x110
  Sep 14 12:02:53 kernel:  mce_amd_feature_init+0x485/0x4e0
  ...
  Sep 14 12:02:53 kernel: [Firmware Bug]: cpu 0, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu

Without reported_ignored_msrs=0 at least the host kernel log will contain
enough information to avoid going on a wild goose chase.  But if reports
about individual MSR accesses are being silenced too, at least complain
loudly the first time a VM is started.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 months agoKVM: VMX: don't include '<linux/find.h>' directly
Wolfram Sang [Tue, 17 Dec 2024 07:05:40 +0000 (08:05 +0100)]
KVM: VMX: don't include '<linux/find.h>' directly

The header clearly states that it does not want to be included directly,
only via '<linux/bitmap.h>'. Replace the include accordingly.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Message-ID: <20241217070539.2433-2-wsa+renesas@sang-engineering.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 months agoMerge tag 'devicetree-fixes-for-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 22 Dec 2024 16:40:23 +0000 (08:40 -0800)]
Merge tag 'devicetree-fixes-for-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Disable #address-cells/#size-cells warning on coreboot (Chromebooks)
   platforms

 - Add missing root #address-cells/#size-cells in default empty DT

 - Fix uninitialized variable in of_irq_parse_one()

 - Fix interrupt-map cell length check in of_irq_parse_imap_parent()

 - Fix refcount handling in __of_get_dma_parent()

 - Fix error path in of_parse_phandle_with_args_map()

 - Fix dma-ranges handling with flags cells

 - Drop explicit fw_devlink handling of 'interrupt-parent'

 - Fix "compression" typo in fixed-partitions binding

 - Unify "fsl,liodn" property type definitions

* tag 'devicetree-fixes-for-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: Add coreboot firmware to excluded default cells list
  of/irq: Fix using uninitialized variable @addr_len in API of_irq_parse_one()
  of/irq: Fix interrupt-map cell length check in of_irq_parse_imap_parent()
  of: Fix refcount leakage for OF node returned by __of_get_dma_parent()
  of: Fix error path in of_parse_phandle_with_args_map()
  dt-bindings: mtd: fixed-partitions: Fix "compression" typo
  of: Add #address-cells/#size-cells in the device-tree root empty node
  dt-bindings: Unify "fsl,liodn" type definitions
  of: address: Preserve the flags portion on 1:1 dma-ranges mapping
  of/unittest: Add empty dma-ranges address translation tests
  of: property: fw_devlink: Do not use interrupt-parent directly

7 months agoMerge tag 'soc-fixes-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Sat, 21 Dec 2024 23:45:06 +0000 (15:45 -0800)]
Merge tag 'soc-fixes-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
 "Two more small fixes, correcting the cacheline size on Raspberry Pi 5
  and fixing a logic mistake in the microchip mpfs firmware driver"

* tag 'soc-fixes-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: broadcom: Fix L2 linesize for Raspberry Pi 5
  firmware: microchip: fix UL_IAP lock check in mpfs_auto_update_state()

7 months agoMerge tag 'mm-hotfixes-stable-2024-12-21-12-09' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 21 Dec 2024 23:31:56 +0000 (15:31 -0800)]
Merge tag 'mm-hotfixes-stable-2024-12-21-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "25 hotfixes.  16 are cc:stable.  19 are MM and 6 are non-MM.

  The usual bunch of singletons and doubletons - please see the relevant
  changelogs for details"

* tag 'mm-hotfixes-stable-2024-12-21-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
  mm: huge_memory: handle strsep not finding delimiter
  alloc_tag: fix set_codetag_empty() when !CONFIG_MEM_ALLOC_PROFILING_DEBUG
  alloc_tag: fix module allocation tags populated area calculation
  mm/codetag: clear tags before swap
  mm/vmstat: fix a W=1 clang compiler warning
  mm: convert partially_mapped set/clear operations to be atomic
  nilfs2: fix buffer head leaks in calls to truncate_inode_pages()
  vmalloc: fix accounting with i915
  mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy()
  fork: avoid inappropriate uprobe access to invalid mm
  nilfs2: prevent use of deleted inode
  zram: fix uninitialized ZRAM not releasing backing device
  zram: refuse to use zero sized block device as backing device
  mm: use clear_user_(high)page() for arch with special user folio handling
  mm: introduce cpu_icache_is_aliasing() across all architectures
  mm: add RCU annotation to pte_offset_map(_lock)
  mm: correctly reference merged VMA
  mm: use aligned address in copy_user_gigantic_page()
  mm: use aligned address in clear_gigantic_page()
  mm: shmem: fix ShmemHugePages at swapout
  ...