]> www.infradead.org Git - users/hch/xfs.git/log
users/hch/xfs.git
9 months agoxfs: add a lockdep class key for rtgroup inodes
Darrick J. Wong [Mon, 23 Sep 2024 20:41:08 +0000 (13:41 -0700)]
xfs: add a lockdep class key for rtgroup inodes

Add a dynamic lockdep class key for rtgroup inodes.  This will enable
lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking
order.  Each class can have 8 subclasses, and for now we will only have
2 inodes per group.  This enables rtgroup order and inode order checks
when nesting ILOCKs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: define locking primitives for realtime groups
Darrick J. Wong [Mon, 23 Sep 2024 20:41:08 +0000 (13:41 -0700)]
xfs: define locking primitives for realtime groups

Define helper functions to lock all metadata inodes related to a
realtime group.  There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: create incore realtime group structures
Darrick J. Wong [Mon, 23 Sep 2024 20:41:07 +0000 (13:41 -0700)]
xfs: create incore realtime group structures

Create an incore object that will contain information about a realtime
allocation group.  This will eventually enable us to shard the realtime
section in a similar manner to how we shard the data section, but for
now just a single object for the entire RT subvolume is created.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: clean up xfs_getfsmap_helper arguments
Christoph Hellwig [Mon, 23 Sep 2024 20:41:06 +0000 (13:41 -0700)]
xfs: clean up xfs_getfsmap_helper arguments

The calling conventions for xfs_getfsmap_helper are confusing -- callers
pass in an rmap record, but they must also supply startblock and
blockcount in daddr units.  This was bolted onto the original fsmap
implementation so that we could report *something* for realtime
volumes, which do not support rmap and hence can draw only from the rt
free space bitmap.  Free space on the rt volume can be more than 2^32
fsblocks long, which means that we can't use the rmap startblock or
blockcount fields.

This is confusing for callers, because they must supplying redundant
data, but not all of it is used.  Streamline this by creating a separate
fsmap irec structure that contains exactly the data we need, once.

Note that we actually do need rm_startblock for rmap key comparisons
when we're actually querying an rmap btree, so leave that field but
document why it's there.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: repair metadata directory file path connectivity
Darrick J. Wong [Mon, 23 Sep 2024 20:41:05 +0000 (13:41 -0700)]
xfs: repair metadata directory file path connectivity

Fix disconnected or incorrect metadata directory paths.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: confirm dotdot target before replacing it during a repair
Darrick J. Wong [Mon, 23 Sep 2024 20:41:04 +0000 (13:41 -0700)]
xfs: confirm dotdot target before replacing it during a repair

xfs_dir_replace trips an assertion if you tell it to change a dirent to
point to an inumber that it already points at.  Look up the dotdot entry
directly to confirm that we need to make a change.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: check metadata directory file path connectivity
Darrick J. Wong [Mon, 23 Sep 2024 20:41:04 +0000 (13:41 -0700)]
xfs: check metadata directory file path connectivity

Create a new scrubber type that checks that well known metadata
directory paths are connected to the metadata inode that the incore
structures think is in use.  IOWs, check that "/quota/user" in the
metadata directory tree actually points to
mp->m_quotainfo->qi_uquotaip->i_ino.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: move repair temporary files to the metadata directory tree
Darrick J. Wong [Mon, 23 Sep 2024 20:41:03 +0000 (13:41 -0700)]
xfs: move repair temporary files to the metadata directory tree

Due to resource acquisition rules, we have to create the ondisk
temporary files used to stage a filesystem repair before we can acquire
a reference to the inode that we actually want to repair.  Therefore,
we do not know at tempfile creation time whether the tempfile will
belong to the regular directory tree or the metadata directory tree.

This distinction becomes important when the swapext code tries to figure
out the quota accounting of the two files whose mappings are being
swapped.  The swapext code assumes that accounting updates are required
for a file if dqattach attaches dquots.  Metadir files are never
accounted in quota, which means that swapext must not update the quota
accounting when swapping in a repaired directory/xattr/rtbitmap structure.

Prior to the swapext call, therefore, both files must be marked as
METADIR for dqattach so that dqattach will ignore them.  Add support for
a repair tempfile to be switched to the metadir tree and switched back
before being released so that ifree will just free the file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: check the metadata directory inumber in superblocks
Darrick J. Wong [Mon, 23 Sep 2024 20:41:02 +0000 (13:41 -0700)]
xfs: check the metadata directory inumber in superblocks

When metadata directories are enabled, make sure that the secondary
superblocks point to the metadata directory.  This isn't strictly
required because the secondaries are only used to recover damaged
filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: scrub metadata directories
Darrick J. Wong [Mon, 23 Sep 2024 20:41:01 +0000 (13:41 -0700)]
xfs: scrub metadata directories

Teach online scrub about the metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: fix di_metatype field of inodes that won't load
Darrick J. Wong [Mon, 23 Sep 2024 20:41:01 +0000 (13:41 -0700)]
xfs: fix di_metatype field of inodes that won't load

Make sure that the di_metatype field is at least set plausibly so that
later scrubbers could set the real type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: adjust parent pointer scrubber for sb-rooted metadata files
Darrick J. Wong [Mon, 23 Sep 2024 20:41:00 +0000 (13:41 -0700)]
xfs: adjust parent pointer scrubber for sb-rooted metadata files

Starting with the metadata directory feature, we're allowed to call the
directory and parent pointer scrubbers for every metadata file,
including the ones that are children of the superblock.

For these children, checking the link count against the number of parent
pointers is a bit funny -- there's no such thing as a parent pointer for
a child of the superblock since there's no corresponding dirent.  For
purposes of validating nlink, we pretend that there is a parent pointer.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: metadata files can have xattrs if metadir is enabled
Darrick J. Wong [Mon, 23 Sep 2024 20:40:59 +0000 (13:40 -0700)]
xfs: metadata files can have xattrs if metadir is enabled

If parent pointers are enabled, then metadata files will store parent
pointers in xattrs, just like files in the user visible directory tree.
Therefore, scrub and repair need to handle attr forks for metadata files
on metadir filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: don't fail repairs on metadata files with no attr fork
Darrick J. Wong [Mon, 23 Sep 2024 20:40:58 +0000 (13:40 -0700)]
xfs: don't fail repairs on metadata files with no attr fork

Fix a minor bug where we fail repairs on metadata files that do not have
attr forks because xrep_metadata_inode_subtype doesn't filter ENOENT.

Fixes: 5a8e07e799721 ("xfs: repair the inode core and forks of a metadata inode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: do not count metadata directory files when doing online quotacheck
Darrick J. Wong [Mon, 23 Sep 2024 20:40:58 +0000 (13:40 -0700)]
xfs: do not count metadata directory files when doing online quotacheck

Previously, we stated that files in the metadata directory tree are not
counted in the dquot information.  Fix the online quotacheck code to
reflect this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: refactor directory tree root predicates
Darrick J. Wong [Mon, 23 Sep 2024 20:40:57 +0000 (13:40 -0700)]
xfs: refactor directory tree root predicates

Metadata directory trees make reasoning about the parent of a file more
difficult.  Traditionally, user files are children of sb_rootino, and
metadata files are "children" of the superblock.  Now, we add a third
possibility -- some metadata files can be children of sb_metadirino, but
the classic ones (rt free space data and quotas) are left alone.

Let's add some helper functions (instead of open-coding the logic
everywhere) to make scrub logic easier to understand.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: record health problems with the metadata directory
Darrick J. Wong [Mon, 23 Sep 2024 20:40:56 +0000 (13:40 -0700)]
xfs: record health problems with the metadata directory

Make a report to the health monitoring subsystem any time we encounter
something in the metadata directory tree that looks like corruption.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: adjust xfs_bmap_add_attrfork for metadir
Darrick J. Wong [Mon, 23 Sep 2024 20:40:55 +0000 (13:40 -0700)]
xfs: adjust xfs_bmap_add_attrfork for metadir

Online repair might use the xfs_bmap_add_attrfork to repair a file in
the metadata directory tree if (say) the metadata file lacks the correct
parent pointers.  In that case, it is not correct to check that the file
is dqattached -- metadata files must be not have /any/ dquot attached at
all.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: mark quota inodes as metadata files
Darrick J. Wong [Mon, 23 Sep 2024 20:40:54 +0000 (13:40 -0700)]
xfs: mark quota inodes as metadata files

When we're creating quota files at mount time, make sure to mark them as
metadir inodes if appropriate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: don't count metadata directory files to quota
Darrick J. Wong [Mon, 23 Sep 2024 20:40:54 +0000 (13:40 -0700)]
xfs: don't count metadata directory files to quota

Files in the metadata directory tree are internal to the filesystem.
Don't count the inodes or the blocks they use in the root dquot because
users do not need to know about their resource usage.  This will also
quiet down complaints about dquot usage not matching du output.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: allow bulkstat to return metadata directories
Darrick J. Wong [Mon, 23 Sep 2024 20:40:53 +0000 (13:40 -0700)]
xfs: allow bulkstat to return metadata directories

Allow the V5 bulkstat ioctl to return information about metadata
directory files so that xfs_scrub can find and scrub them, since they
are otherwise ordinary directories.

(Metadata files of course require per-file scrub code and hence do not
need exposure.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: advertise metadata directory feature
Darrick J. Wong [Mon, 23 Sep 2024 20:40:52 +0000 (13:40 -0700)]
xfs: advertise metadata directory feature

Advertise the existence of the metadata directory feature; this will be
used by scrub to decide if it needs to scan the metadir too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: hide metadata inodes from everyone because they are special
Darrick J. Wong [Mon, 23 Sep 2024 20:40:51 +0000 (13:40 -0700)]
xfs: hide metadata inodes from everyone because they are special

Metadata inodes are private files and therefore cannot be exposed to
userspace.  This means no bulkstat, no open-by-handle, no linking them
into the directory tree, and no feeding them to LSMs.  As such, we mark
them S_PRIVATE, which stops all that.

While we're at it, put them in a separate lockdep class so that it won't
get confused by "recursive" i_rwsem locking such as what happens when we
write to a rt file and need to allocate from the rt bitmap file.  The
static function that we use to do this will be exported in the rtgroups
patchset.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: disable the agi rotor for metadata inodes
Darrick J. Wong [Mon, 23 Sep 2024 20:40:51 +0000 (13:40 -0700)]
xfs: disable the agi rotor for metadata inodes

Ideally, we'd put all the metadata inodes in one place if we could, so
that the metadata all stay reasonably close together instead of
spreading out over the disk.  Furthermore, if the log is internal we'd
probably prefer to keep the metadata near the log.  Therefore, disable
AGI rotoring for metadata inode allocations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: read and write metadata inode directory tree
Darrick J. Wong [Mon, 23 Sep 2024 20:40:50 +0000 (13:40 -0700)]
xfs: read and write metadata inode directory tree

Plumb in the bits we need to load metadata inodes from a named entry in
a metadir directory, create (or hardlink) inodes into a metadir
directory, create metadir directories, and flag inodes as being metadata
files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: enforce metadata inode flag
Darrick J. Wong [Mon, 23 Sep 2024 20:40:49 +0000 (13:40 -0700)]
xfs: enforce metadata inode flag

Add checks for the metadata inode flag so that we don't ever leak
metadata inodes out to userspace, and we don't ever try to read a
regular inode as metadata.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: load metadata directory root at mount time
Darrick J. Wong [Mon, 23 Sep 2024 20:40:48 +0000 (13:40 -0700)]
xfs: load metadata directory root at mount time

Load the metadata directory root inode into memory at mount time and
release it at unmount time.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: iget for metadata inodes
Darrick J. Wong [Mon, 23 Sep 2024 20:40:47 +0000 (13:40 -0700)]
xfs: iget for metadata inodes

Create a xfs_metafile_iget function for metadata inodes to ensure that
when we try to iget a metadata file, the inobt thinks a metadata inode
is in use and that the metadata type matches what we are expecting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: undefine the sb_bad_features2 when metadir is enabled
Darrick J. Wong [Wed, 25 Sep 2024 21:48:35 +0000 (14:48 -0700)]
xfs: undefine the sb_bad_features2 when metadir is enabled

The metadir feature is a good opportunity to break from the past where
we had to do this strange dance with sb_bad_features2 due to a past bug
where the superblock was not kept aligned to a multiple of 8 bytes.

Therefore, stop this pretense when metadir is enabled.  We'll just set
the incore and ondisk fields to zero, thereby freeing up 4 bytes of
space in the superblock.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: define the on-disk format for the metadir feature
Darrick J. Wong [Mon, 23 Sep 2024 20:40:47 +0000 (13:40 -0700)]
xfs: define the on-disk format for the metadir feature

Define the on-disk layout and feature flags for the metadata inode
directory feature.  Add a xfs_sb_version_hasmetadir for benefit of
xfs_repair, which needs to know where the new end of the superblock
lies.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: constify the xfs_inode predicates
Darrick J. Wong [Wed, 25 Sep 2024 22:46:59 +0000 (15:46 -0700)]
xfs: constify the xfs_inode predicates

Change the xfs_inode predicates to take a const struct xfs_inode pointer
because they do not change the inode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: constify the xfs_sb predicates
Darrick J. Wong [Wed, 25 Sep 2024 22:43:35 +0000 (15:43 -0700)]
xfs: constify the xfs_sb predicates

Change the xfs_sb predicates to take a const struct xfs_sb pointer
because they do not change the superblock.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: store a generic group structure in the intents
Christoph Hellwig [Mon, 23 Sep 2024 20:40:46 +0000 (13:40 -0700)]
xfs: store a generic group structure in the intents

Replace the pag pointers in the extent free, bmap, rmap and refcount
intent structures with a pointer to the generic group to prepare
for adding intents for realtime groups.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: remove xfs_group_intent_hold and xfs_group_intent_rele
Christoph Hellwig [Mon, 23 Sep 2024 20:40:45 +0000 (13:40 -0700)]
xfs: remove xfs_group_intent_hold and xfs_group_intent_rele

Each of them just has a single caller, so fold them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: add group based bno conversion helpers
Christoph Hellwig [Mon, 23 Sep 2024 20:40:44 +0000 (13:40 -0700)]
xfs: add group based bno conversion helpers

Add convenience helpers to convert block numbers based on the generic
group.  This will allow writing code that doesn't care if it is used
on AGs or the upcoming realtime groups.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: store a generic xfs_group pointer in xfs_getfsmap_info
Christoph Hellwig [Mon, 23 Sep 2024 20:40:43 +0000 (13:40 -0700)]
xfs: store a generic xfs_group pointer in xfs_getfsmap_info

Replace the pag and rtg pointers with a generic group pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: add a generic group pointer to the btree cursor
Christoph Hellwig [Mon, 23 Sep 2024 20:40:43 +0000 (13:40 -0700)]
xfs: add a generic group pointer to the btree cursor

Replace the pag pointers in the type specific union with a generic
xfs_group pointer.  This prepares for adding realtime group support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: convert busy extent tracking to the generic group structure
Christoph Hellwig [Mon, 23 Sep 2024 20:40:42 +0000 (13:40 -0700)]
xfs: convert busy extent tracking to the generic group structure

Split busy extent tracking from struct xfs_perag into its own private
structure, which can be pointed to by the generic group structure.

Note that this structure is now dynamically allocated instead of embedded
as the upcoming zone XFS code doesn't need it and will also have an
unusually high number of groups due to hardware constraints.  Dynamically
allocating the structure this is a big memory saver for this case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: convert extent busy tracking to the generic group structure
Christoph Hellwig [Mon, 23 Sep 2024 20:40:41 +0000 (13:40 -0700)]
xfs: convert extent busy tracking to the generic group structure

Prepare for tracking busy RT extents by passing the generic group
structure to the xfs_extent_busy_class tracepoints.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: return the busy generation from xfs_extent_busy_list_empty
Christoph Hellwig [Mon, 23 Sep 2024 20:40:40 +0000 (13:40 -0700)]
xfs: return the busy generation from xfs_extent_busy_list_empty

This avoid having to poke into the internals of the busy tracking in
ẋrep_setup_ag_allocbt.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: move the online repair rmap hooks to the generic group structure
Christoph Hellwig [Mon, 23 Sep 2024 20:40:39 +0000 (13:40 -0700)]
xfs: move the online repair rmap hooks to the generic group structure

Prepare for the upcoming realtime groups feature by moving the online
repair rmap hooks to based to the generic xfs_group structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: move draining of deferred operations to the generic group structure
Christoph Hellwig [Mon, 23 Sep 2024 20:40:39 +0000 (13:40 -0700)]
xfs: move draining of deferred operations to the generic group structure

Prepare supporting the upcoming realtime groups feature by moving the
deferred operation draining to the generic xfs_group structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: mark xfs_perag_intent_{hold,rele} static
Christoph Hellwig [Mon, 23 Sep 2024 20:40:38 +0000 (13:40 -0700)]
xfs: mark xfs_perag_intent_{hold,rele} static

These two functions are only used inside of xfs_drain.c, so mark them
static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: move metadata health tracking to the generic group structure
Christoph Hellwig [Mon, 23 Sep 2024 20:40:37 +0000 (13:40 -0700)]
xfs: move metadata health tracking to the generic group structure

Prepare for also tracking the health status of the upcoming realtime
groups by moving the health tracking code to the generic xfs_group
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: switch perag iteration from the for_each macros to a while based iterator
Christoph Hellwig [Mon, 23 Sep 2024 20:40:36 +0000 (13:40 -0700)]
xfs: switch perag iteration from the for_each macros to a while based iterator

The current for_each_perag* macros are a bit annoying in that they
require the caller to both provide an object and an index iterator, and
also somewhat obsfucate the underlying control flow mechanism.

Switch to open coded while loops using new xfs_perag_next{,_from,_range}
helpers that return the next pag structure to iterate on based on the
previous one or NULL for the loop start.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: add a xfs_group_next_range helper
Christoph Hellwig [Mon, 23 Sep 2024 20:40:35 +0000 (13:40 -0700)]
xfs: add a xfs_group_next_range helper

Add a helper to iterate over iterate over all groups, which can be used
as a simple while loop:

struct xfs_group *xg = NULL;

while ((xg = xfs_group_next_range(mp, xg, 0, MAX_GROUP))) {
...
}

This will be wrapped by the realtime group code first, and eventually
replace the for_each_rtgroup_from and for_each_rtgroup_range helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: factor out a generic xfs_group structure
Christoph Hellwig [Mon, 23 Sep 2024 20:40:34 +0000 (13:40 -0700)]
xfs: factor out a generic xfs_group structure

Split the lookup and refcount handling of struct xfs_perag into an
embedded xfs_group structure that can be reused for the upcoming
realtime groups.

It will be extended with more features later.

Note that he xg_type field will only need a single bit even with
realtime group support.  For now it fills a hole, but it might be
worth to fold it into another field if we can use this space better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: factor out a xfs_iwalk_args helper
Christoph Hellwig [Mon, 23 Sep 2024 20:40:33 +0000 (13:40 -0700)]
xfs: factor out a xfs_iwalk_args helper

Add a helper to share more code between xfs_iwalk and xfs_inobt_walk,
and at the same time do away with the extra flags indirect so that
everyone use the same names for the same flags when using the common
iwalk code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: insert the pag structures into the xarray later
Christoph Hellwig [Mon, 23 Sep 2024 20:40:32 +0000 (13:40 -0700)]
xfs: insert the pag structures into the xarray later

Cleaning up is much easier if a structure can't be looked up yet, so only
insert the pag once it is fully set up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: split xfs_initialize_perag
Christoph Hellwig [Mon, 23 Sep 2024 20:40:32 +0000 (13:40 -0700)]
xfs: split xfs_initialize_perag

Factor out a xfs_perag_alloc helper that allocates a single perag
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: convert remaining trace points to pass pag structures
Christoph Hellwig [Mon, 23 Sep 2024 20:40:31 +0000 (13:40 -0700)]
xfs: convert remaining trace points to pass pag structures

Convert all tracepoints that take [mp,agno] tuples to take a pag argument
instead so that decoding only happens when tracepoints are enabled and to
clean up the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: pass the pag to the xrep_newbt_extent_class tracepoints
Christoph Hellwig [Mon, 23 Sep 2024 20:40:30 +0000 (13:40 -0700)]
xfs: pass the pag to the xrep_newbt_extent_class tracepoints

This requires moving a few of the callsites a little bit to ensure that
we already have the reference, but allows for the decoding to only happen
when tracing is actually enabled, and cleans up the callsites a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points
Christoph Hellwig [Mon, 23 Sep 2024 20:40:29 +0000 (13:40 -0700)]
xfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points

This requires holding the pag refcount a little longer, but allows for the
decoding to only happen when tracing is actually enabled, and cleans up the
callsites a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: pass objects to the xrep_ibt_walk_rmap tracepoint
Christoph Hellwig [Mon, 23 Sep 2024 20:40:29 +0000 (13:40 -0700)]
xfs: pass objects to the xrep_ibt_walk_rmap tracepoint

Pass the perag structure and the irec so that the decoding is only done
when tracing is actually enabled and the call sites look a lot neater,
and remove the pointless class indirection.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: pass the iunlink item to the xfs_iunlink_update_dinode trace point
Christoph Hellwig [Mon, 23 Sep 2024 20:40:28 +0000 (13:40 -0700)]
xfs: pass the iunlink item to the xfs_iunlink_update_dinode trace point

So that decoding is only done when tracing is actually enabled and the
call site look a lot neater.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: pass objects to the xfs_irec_merge_{pre,post} trace points
Christoph Hellwig [Mon, 23 Sep 2024 20:40:27 +0000 (13:40 -0700)]
xfs: pass objects to the xfs_irec_merge_{pre,post} trace points

Pass the perag structure and the irec to these tracepoints so that the
decoding is only done when tracing is actually enabled and the call sites
look a lot neater.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: pass a perag structure to the xfs_ag_resv_init_error trace point
Christoph Hellwig [Mon, 23 Sep 2024 20:40:26 +0000 (13:40 -0700)]
xfs: pass a perag structure to the xfs_ag_resv_init_error trace point

And remove the single instance class indirection for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: constify pag arguments to trace points
Christoph Hellwig [Mon, 23 Sep 2024 20:40:26 +0000 (13:40 -0700)]
xfs: constify pag arguments to trace points

Trace points never modify their arguments.  Mark all the pag objects
passed to trace points.  The exception is the xfs_ag_resv_class, which
uses the xfs_perag_resv helper that can't be marked const due to
other users modifying the returned structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: remove the unused xrep_bmap_walk_rmap trace point
Christoph Hellwig [Mon, 23 Sep 2024 20:40:25 +0000 (13:40 -0700)]
xfs: remove the unused xrep_bmap_walk_rmap trace point

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: remove the unused trace_xfs_iwalk_ag trace point
Christoph Hellwig [Mon, 23 Sep 2024 20:40:24 +0000 (13:40 -0700)]
xfs: remove the unused trace_xfs_iwalk_ag trace point

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: remove the mount field from struct xfs_busy_extents
Christoph Hellwig [Mon, 23 Sep 2024 20:40:23 +0000 (13:40 -0700)]
xfs: remove the mount field from struct xfs_busy_extents

The mount field is only passed to xfs_extent_busy_clear, which never uses
it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: keep a reference to the pag for busy extents
Christoph Hellwig [Mon, 23 Sep 2024 20:40:23 +0000 (13:40 -0700)]
xfs: keep a reference to the pag for busy extents

Processing of busy extents requires the perag structure, so keep the
reference while they are in flight.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: pass a pag to xfs_extent_busy_{search,reuse}
Christoph Hellwig [Mon, 23 Sep 2024 20:40:22 +0000 (13:40 -0700)]
xfs: pass a pag to xfs_extent_busy_{search,reuse}

Replace the [mp,agno] tuple with the perag structure, which will become
more useful later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: add a xfs_agino_to_ino helper
Christoph Hellwig [Mon, 23 Sep 2024 20:40:21 +0000 (13:40 -0700)]
xfs: add a xfs_agino_to_ino helper

Add a helpers to convert an agino to an ino based on a pag structure.

This provides a simpler conversion and better type safety compared to the
existing code that passes the mount structure and the agno separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers
Christoph Hellwig [Mon, 23 Sep 2024 20:40:20 +0000 (13:40 -0700)]
xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers

Add helpers to convert an agbno to a daddr or fsbno based on a pag
structure.

This provides a simpler conversion and better type safety compared to the
existing code that passes the mount structure and the agno separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: remove the agno argument to xfs_free_ag_extent
Christoph Hellwig [Mon, 23 Sep 2024 20:40:20 +0000 (13:40 -0700)]
xfs: remove the agno argument to xfs_free_ag_extent

xfs_free_ag_extent already has a pointer to the pag structure through
the agf buffer.  Use that instead of passing the redundant argument,
and do the same for the tracepoint.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: pass a pag to xfs_difree_inode_chunk
Christoph Hellwig [Mon, 23 Sep 2024 20:40:19 +0000 (13:40 -0700)]
xfs: pass a pag to xfs_difree_inode_chunk

We'll want to use more than just the agno field in a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: remove the unused pag_active_wq field in struct xfs_perag
Christoph Hellwig [Mon, 23 Sep 2024 20:40:18 +0000 (13:40 -0700)]
xfs: remove the unused pag_active_wq field in struct xfs_perag

pag_active_wq is only woken, but never waited for.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: remove the unused pagb_count field in struct xfs_perag
Christoph Hellwig [Mon, 23 Sep 2024 20:40:17 +0000 (13:40 -0700)]
xfs: remove the unused pagb_count field in struct xfs_perag

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: fix superfluous clearing of info->low in __xfs_getfsmap_datadev
Christoph Hellwig [Mon, 23 Sep 2024 20:40:17 +0000 (13:40 -0700)]
xfs: fix superfluous clearing of info->low in __xfs_getfsmap_datadev

The for_each_perag helpers update the agno passed in for each iteration,
and thus the "if (pag->pag_agno == start_ag)" check will always be true.

Add another variable for the loop iterator so that the field is only
cleared after the first iteration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag
Christoph Hellwig [Mon, 23 Sep 2024 20:40:16 +0000 (13:40 -0700)]
xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag

__GFP_RETRY_MAYFAIL increases the likelyhood of allocations to fail,
which isn't really helpful during log recovery.  Remove the flag and
stick to the default GFP_KERNEL policies.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: merge the perag freeing helpers
Christoph Hellwig [Mon, 23 Sep 2024 20:40:15 +0000 (13:40 -0700)]
xfs: merge the perag freeing helpers

There is no good reason to have two different routines for freeing perag
structures for the unmount and error cases.  Add two arguments to specify
the range of AGs to free to xfs_free_perag, and use that to replace
xfs_free_unused_perag_range.

The addition RCU grace period for the error case is harmless, and the
extra check for the AG to actually exist is not required now that the
callers pass the exact known allocated range.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: pass the exact range to initialize to xfs_initialize_perag
Christoph Hellwig [Mon, 23 Sep 2024 20:40:14 +0000 (13:40 -0700)]
xfs: pass the exact range to initialize to xfs_initialize_perag

Currently only the new agcount is passed to xfs_initialize_perag, which
requires lookups of existing AGs to skip them and complicates error
handling.  Also pass the previous agcount so that the range that
xfs_initialize_perag operates on is exactly defined.  That way the
extra lookups can be avoided, and error handling can clean up the
exact range from the old count to the last added perag structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: fix integer overflow in xrep_bmap
Darrick J. Wong [Mon, 7 Oct 2024 16:41:12 +0000 (09:41 -0700)]
xfs: fix integer overflow in xrep_bmap

The variable declaration in this function predates the merge of the
nrext64 (aka 64-bit extent counters) feature, which means that the
variable declaration type is insufficient to avoid an integer overflow.
Fix that by redeclaring the variable to be xfs_extnum_t.

Coverity-id: 1630958
Fixes: 8f71bede8efd ("xfs: repair inode fork block mapping data structures")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agofsdax: dax_unshare_iter needs to copy entire blocks
Darrick J. Wong [Wed, 2 Oct 2024 20:11:51 +0000 (13:11 -0700)]
fsdax: dax_unshare_iter needs to copy entire blocks

The code that copies data from srcmap to iomap in dax_unshare_iter is
very very broken, which bfoster's recent fsx changes have exposed.

If the pos and len passed to dax_file_unshare are not aligned to an
fsblock boundary, the iter pos and length in the _iter function will
reflect this unalignment.

dax_iomap_direct_access always returns a pointer to the start of the
kmapped fsdax page, even if its pos argument is in the middle of that
page.  This is catastrophic for data integrity when iter->pos is not
aligned to a page, because daddr/saddr do not point to the same byte in
the file as iter->pos.  Hence we corrupt user data by copying it to the
wrong place.

If iter->pos + iomap_length() in the _iter function not aligned to a
page, then we fail to copy a full block, and only partially populate the
destination block.  This is catastrophic for data confidentiality
because we expose stale pmem contents.

Fix both of these issues by aligning copy_pos/copy_len to a page
boundary (remember, this is fsdax so 1 fsblock == 1 base page) so that
we always copy full blocks.

We're not done yet -- there's no call to invalidate_inode_pages2_range,
so programs that have the file range mmap'd will continue accessing the
old memory mapping after the file metadata updates have completed.

Be careful with the return value -- if the unshare succeeds, we still
need to return the number of bytes that the iomap iter thinks we're
operating on.

Cc: ruansy.fnst@fujitsu.com
Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agofsdax: remove zeroing code from dax_unshare_iter
Darrick J. Wong [Wed, 2 Oct 2024 20:44:02 +0000 (13:44 -0700)]
fsdax: remove zeroing code from dax_unshare_iter

Remove the code in dax_unshare_iter that zeroes the destination memory
because it's not necessary.

If srcmap is unwritten, we don't have to do anything because that
unwritten extent came from the regular file mapping, and unwritten
extents cannot be shared.  The same applies to holes.

Furthermore, zeroing to unshare a mapping is just plain wrong because
unsharing means copy on write, and we should be copying data.

This is effectively a revert of commit 13dd4e04625f ("fsdax: unshare:
zero destination if srcmap is HOLE or UNWRITTEN")

Cc: ruansy.fnst@fujitsu.com
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoiomap: share iomap_unshare_iter predicate code with fsdax
Darrick J. Wong [Wed, 2 Oct 2024 20:27:18 +0000 (13:27 -0700)]
iomap: share iomap_unshare_iter predicate code with fsdax

The predicate code that iomap_unshare_iter uses to decide if it's really
needs to unshare a file range mapping should be shared with the fsdax
version, because right now they're opencoded and inconsistent.

Note that we simplify the predicate logic a bit -- we no longer allow
unsharing of inline data mappings, but there aren't any filesystems that
allow shared inline data currently.

This is a fix in the sense that it should have been ported to fsdax.

Fixes: b53fdb215d13 ("iomap: improve shared block detection in iomap_unshare_iter")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: don't allocate COW extents when unsharing a hole
Darrick J. Wong [Wed, 2 Oct 2024 20:12:14 +0000 (13:12 -0700)]
xfs: don't allocate COW extents when unsharing a hole

It doesn't make sense to allocate a COW extent when unsharing a hole
because holes cannot be shared.

Fixes: 1f1397b7218d7 ("xfs: don't allocate into the data fork for an unshare request")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: fix simplify extent lookup in xfs_can_free_eofblocks
Darrick J. Wong [Wed, 25 Sep 2024 18:54:52 +0000 (11:54 -0700)]
xfs: fix simplify extent lookup in xfs_can_free_eofblocks

In commit 11f4c3a53adde, we tried to simplify the extent lookup in
xfs_can_free_eofblocks so that it doesn't incur the overhead of all the
extra stuff that xfs_bmapi_read does around the iext lookup.

Unfortunately, this causes regressions on generic/603, xfs/108,
generic/219, xfs/173, generic/694, xfs/052, generic/230, and xfs/441
when always_cow is turned on.  In all cases, the regressions take the
form of alwayscow files consuming rather more space than the golden
output is expecting.  I observed that in all these cases, the cause of
the excess space usage was due to CoW fork delalloc reservations that go
beyond EOF.

For alwayscow files we allow posteof delalloc CoW reservations because
all writes go through the CoW fork.  Recall that all extents in the CoW
fork are accounted for via i_delayed_blks, which means that prior to
this patch, we'd invoke xfs_free_eofblocks on first close if anything
was in the CoW fork.  Now we don't do that.

Fix the problem by reverting the removal of the i_delayed_blks check.

Fixes: 11f4c3a53adde ("xfs: simplify extent lookup in xfs_can_free_eofblocks")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 months agoxfs: don't free cowblocks from under dirty pagecache on unshare
Brian Foster [Tue, 1 Oct 2024 22:19:13 +0000 (15:19 -0700)]
xfs: don't free cowblocks from under dirty pagecache on unshare

fallocate unshare mode explicitly breaks extent sharing. When a
command completes, it checks the data fork for any remaining shared
extents to determine whether the reflink inode flag and COW fork
preallocation can be removed. This logic doesn't consider in-core
pagecache and I/O state, however, which means we can unsafely remove
COW fork blocks that are still needed under certain conditions.

For example, consider the following command sequence:

xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \
-c "pwrite 0 32k" -c "funshare 0 1k" <file>

This allocates a data block at offset 0, shares it, and then
overwrites it with a larger buffered write. The overwrite triggers
COW fork preallocation, 32 blocks by default, which maps the entire
32k write to delalloc in the COW fork. All but the shared block at
offset 0 remains hole mapped in the data fork. The unshare command
redirties and flushes the folio at offset 0, removing the only
shared extent from the inode. Since the inode no longer maps shared
extents, unshare purges the COW fork before the remaining 28k may
have written back.

This leaves dirty pagecache backed by holes, which writeback quietly
skips, thus leaving clean, non-zeroed pagecache over holes in the
file. To verify, fiemap shows holes in the first 32k of the file and
reads return different data across a remount:

$ xfs_io -c "fiemap -v" <file>
<file>:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   ...
   1: [8..511]:        hole               504
   ...
$ xfs_io -c "pread -v 4k 8" <file>
00001000:  cd cd cd cd cd cd cd cd  ........
$ umount <mnt>; mount <dev> <mnt>
$ xfs_io -c "pread -v 4k 8" <file>
00001000:  00 00 00 00 00 00 00 00  ........

To avoid this problem, make unshare follow the same rules used for
background cowblock scanning and never purge the COW fork for inodes
with dirty pagecache or in-flight I/O.

Fixes: 46afb0628b ("xfs: only flush the unshared range in xfs_reflink_unshare")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoxfs: skip background cowblock trims on inodes open for write
Brian Foster [Mon, 30 Sep 2024 17:03:10 +0000 (10:03 -0700)]
xfs: skip background cowblock trims on inodes open for write

The background blockgc scanner runs on a 5m interval by default and
trims preallocation (post-eof and cow fork) from inodes that are
otherwise idle. Idle effectively means that iolock can be acquired
without blocking and that the inode has no dirty pagecache or I/O in
flight.

This simple mechanism and heuristic has worked fairly well for
post-eof speculative preallocations. Support for reflink and COW
fork preallocations came sometime later and plugged into the same
mechanism, with similar heuristics. Some recent testing has shown
that COW fork preallocation may be notably more sensitive to blockgc
processing than post-eof preallocation, however.

For example, consider an 8GB reflinked file with a COW extent size
hint of 1MB. A worst case fully randomized overwrite of this file
results in ~8k extents of an average size of ~1MB. If the same
workload is interrupted a couple times for blockgc processing
(assuming the file goes idle), the resulting extent count explodes
to over 100k extents with an average size <100kB. This is
significantly worse than ideal and essentially defeats the COW
extent size hint mechanism.

While this particular test is instrumented, it reflects a fairly
reasonable pattern in practice where random I/Os might spread out
over a large period of time with varying periods of (in)activity.
For example, consider a cloned disk image file for a VM or container
with long uptime and variable and bursty usage. A background blockgc
scan that races and processes the image file when it happens to be
clean and idle can have a significant effect on the future
fragmentation level of the file, even when still in use.

To help combat this, update the heuristic to skip cowblocks inodes
that are currently opened for write access during non-sync blockgc
scans. This allows COW fork preallocations to persist for as long as
possible unless otherwise needed for functional purposes (i.e. a
sync scan), the file is idle and closed, or the inode is being
evicted from cache. While here, update the comments to help
distinguish performance oriented heuristics from the logic that
exists to maintain functional correctness.

Suggested-by: Darrick Wong <djwong@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
9 months agoLinux 6.12-rc2
Linus Torvalds [Sun, 6 Oct 2024 22:32:27 +0000 (15:32 -0700)]
Linux 6.12-rc2

9 months agoMerge tag 'kbuild-fixes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masah...
Linus Torvalds [Sun, 6 Oct 2024 18:34:55 +0000 (11:34 -0700)]
Merge tag 'kbuild-fixes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Move non-boot built-in DTBs to the .rodata section

 - Fix Kconfig bugs

 - Fix maint scripts in the linux-image Debian package

 - Import some list macros to scripts/include/

* tag 'kbuild-fixes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: deb-pkg: Remove blank first line from maint scripts
  kbuild: fix a typo dt_binding_schema -> dt_binding_schemas
  scripts: import more list macros
  kconfig: qconf: fix buffer overflow in debug links
  kconfig: qconf: move conf_read() before drawing tree pain
  kconfig: clear expr::val_is_valid when allocated
  kconfig: fix infinite loop in sym_calc_choice()
  kbuild: move non-boot built-in DTBs to .rodata section

9 months agoMerge tag 'platform-drivers-x86-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Oct 2024 18:11:01 +0000 (11:11 -0700)]
Merge tag 'platform-drivers-x86-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - Intel PMC fix for suspend/resume issues on some Sky and Kaby Lake
   laptops

 - Intel Diamond Rapids hw-id additions

 - Documentation and MAINTAINERS fixes

 - Some other small fixes

* tag 'platform-drivers-x86-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: x86-android-tablets: Fix use after free on platform_device_register() errors
  platform/x86: wmi: Update WMI driver API documentation
  platform/x86: dell-ddv: Fix typo in documentation
  platform/x86: dell-sysman: add support for alienware products
  platform/x86/intel: power-domains: Add Diamond Rapids support
  platform/x86: ISST: Add Diamond Rapids to support list
  platform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby Lake
  platform/x86: dell-laptop: Do not fail when encountering unsupported batteries
  MAINTAINERS: Update Intel In Field Scan(IFS) entry
  platform/x86: ISST: Fix the KASAN report slab-out-of-bounds bug

9 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 6 Oct 2024 17:53:28 +0000 (10:53 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix pKVM error path on init, making sure we do not change critical
     system registers as we're about to fail

   - Make sure that the host's vector length is at capped by a value
     common to all CPUs

   - Fix kvm_has_feat*() handling of "negative" features, as the current
     code is pretty broken

   - Promote Joey to the status of official reviewer, while James steps
     down -- hopefully only temporarly

  x86:

   - Fix compilation with KVM_INTEL=KVM_AMD=n

   - Fix disabling KVM_X86_QUIRK_SLOT_ZAP_ALL when shadow MMU is in use

  Selftests:

   - Fix compilation on non-x86 architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/reboot: emergency callbacks are now registered by common KVM code
  KVM: x86: leave kvm.ko out of the build if no vendor module is requested
  KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU
  KVM: arm64: Fix kvm_has_feat*() handling of negative features
  KVM: selftests: Fix build on architectures other than x86_64
  KVM: arm64: Another reviewer reshuffle
  KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM
  KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path

9 months agoMerge tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 6 Oct 2024 17:43:00 +0000 (10:43 -0700)]
Merge tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:

 - Allow r30 to be used in vDSO code generation of getrandom

Thanks to Jason A. Donenfeld

* tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/vdso: allow r30 in vDSO code generation of getrandom

9 months agokbuild: deb-pkg: Remove blank first line from maint scripts
Aaron Thompson [Fri, 4 Oct 2024 07:52:45 +0000 (07:52 +0000)]
kbuild: deb-pkg: Remove blank first line from maint scripts

The blank line causes execve() to fail:

  # strace ./postinst
  execve("./postinst", ...) = -1 ENOEXEC (Exec format error)
  strace: exec: Exec format error
  +++ exited with 1 +++

However running the scripts via shell does work (at least with bash)
because the shell attempts to execute the file as a shell script when
execve() fails.

Fixes: b611daae5efc ("kbuild: deb-pkg: split image and debug objects staging out into functions")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
9 months agokbuild: fix a typo dt_binding_schema -> dt_binding_schemas
Xu Yang [Wed, 25 Sep 2024 05:32:30 +0000 (13:32 +0800)]
kbuild: fix a typo dt_binding_schema -> dt_binding_schemas

If we follow "make help" to "make dt_binding_schema", we will see
below error:

$ make dt_binding_schema
make[1]: *** No rule to make target 'dt_binding_schema'.  Stop.
make: *** [Makefile:224: __sub-make] Error 2

It should be a typo. So this will fix it.

Fixes: 604a57ba9781 ("dt-bindings: kbuild: Add separate target/dependency for processed-schema.json")
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
9 months agoscripts: import more list macros
Sami Tolvanen [Mon, 23 Sep 2024 18:18:47 +0000 (18:18 +0000)]
scripts: import more list macros

Import list_is_first, list_is_last, list_replace, and list_replace_init.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
9 months agoplatform/x86: x86-android-tablets: Fix use after free on platform_device_register...
Hans de Goede [Sat, 5 Oct 2024 13:05:45 +0000 (15:05 +0200)]
platform/x86: x86-android-tablets: Fix use after free on platform_device_register() errors

x86_android_tablet_remove() frees the pdevs[] array, so it should not
be used after calling x86_android_tablet_remove().

When platform_device_register() fails, store the pdevs[x] PTR_ERR() value
into the local ret variable before calling x86_android_tablet_remove()
to avoid using pdevs[] after it has been freed.

Fixes: 5eba0141206e ("platform/x86: x86-android-tablets: Add support for instantiating platform-devs")
Fixes: e2200d3f26da ("platform/x86: x86-android-tablets: Add gpio_keys support to x86_android_tablet_init()")
Cc: stable@vger.kernel.org
Reported-by: Aleksandr Burakov <a.burakov@rosalinux.ru>
Closes: https://lore.kernel.org/platform-driver-x86/20240917120458.7300-1-a.burakov@rosalinux.ru/
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20241005130545.64136-1-hdegoede@redhat.com
9 months agoplatform/x86: wmi: Update WMI driver API documentation
Armin Wolf [Sat, 5 Oct 2024 21:38:24 +0000 (23:38 +0200)]
platform/x86: wmi: Update WMI driver API documentation

The WMI driver core now passes the WMI event data to legacy notify
handlers, so WMI devices sharing notification IDs are now being
handled properly.

Fixes: e04e2b760ddb ("platform/x86: wmi: Pass event data directly to legacy notify handlers")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241005213825.701887-1-W_Armin@gmx.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86: dell-ddv: Fix typo in documentation
Anaswara T Rajan [Sat, 5 Oct 2024 07:00:56 +0000 (12:30 +0530)]
platform/x86: dell-ddv: Fix typo in documentation

Fix typo in word 'diagnostics' in documentation.

Signed-off-by: Anaswara T Rajan <anaswaratrajan@gmail.com>
Reviewed-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241005070056.16326-1-anaswaratrajan@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86: dell-sysman: add support for alienware products
Crag Wang [Fri, 4 Oct 2024 15:27:58 +0000 (23:27 +0800)]
platform/x86: dell-sysman: add support for alienware products

Alienware supports firmware-attributes and has its own OEM string.

Signed-off-by: Crag Wang <crag_wang@dell.com>
Link: https://lore.kernel.org/r/20241004152826.93992-1-crag_wang@dell.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86/intel: power-domains: Add Diamond Rapids support
Srinivas Pandruvada [Thu, 3 Oct 2024 21:55:54 +0000 (14:55 -0700)]
platform/x86/intel: power-domains: Add Diamond Rapids support

Add Diamond Rapids (INTEL_PANTHERCOVE_X) to tpmi_cpu_ids to support
domaid id mappings.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20241003215554.3013807-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86: ISST: Add Diamond Rapids to support list
Srinivas Pandruvada [Thu, 3 Oct 2024 21:55:53 +0000 (14:55 -0700)]
platform/x86: ISST: Add Diamond Rapids to support list

Add Diamond Rapids (INTEL_PANTHERCOVE_X) to SST support list by adding
to isst_cpu_ids.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20241003215554.3013807-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby Lake
Hans de Goede [Thu, 3 Oct 2024 20:26:13 +0000 (22:26 +0200)]
platform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby Lake

There have been multiple reports that the ACPI PM Timer disabling is
causing Sky and Kaby Lake systems to hang on all suspend (s2idle, s3,
hibernate) methods.

Remove the acpi_pm_tmr_ctl_offset and acpi_pm_tmr_disable_bit settings from
spt_reg_map to disable the ACPI PM Timer disabling on Sky and Kaby Lake to
fix the hang on suspend.

Fixes: e86c8186d03a ("platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/linux-pm/18784f62-91ff-4d88-9621-6c88eb0af2b5@molgen.mpg.de/
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219346
Cc: Marek Maslanka <mmaslanka@google.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360/0596KF
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20241003202614.17181-2-hdegoede@redhat.com
9 months agoplatform/x86: dell-laptop: Do not fail when encountering unsupported batteries
Armin Wolf [Tue, 1 Oct 2024 21:28:35 +0000 (23:28 +0200)]
platform/x86: dell-laptop: Do not fail when encountering unsupported batteries

If the battery hook encounters a unsupported battery, it will
return an error. This in turn will cause the battery driver to
automatically unregister the battery hook.

On machines with multiple batteries however, this will prevent
the battery hook from handling the primary battery, since it will
always get unregistered upon encountering one of the unsupported
batteries.

Fix this by simply ignoring unsupported batteries.

Reviewed-by: Pali Rohár <pali@kernel.org>
Fixes: ab58016c68cc ("platform/x86:dell-laptop: Add knobs to change battery charge settings")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241001212835.341788-4-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoMAINTAINERS: Update Intel In Field Scan(IFS) entry
Jithu Joseph [Tue, 1 Oct 2024 17:08:08 +0000 (10:08 -0700)]
MAINTAINERS: Update Intel In Field Scan(IFS) entry

Ashok is no longer with Intel and his e-mail address will start bouncing
soon.  Update his email address to the new one he provided to ensure
correct contact details in the MAINTAINERS file.

Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Link: https://lore.kernel.org/r/20241001170808.203970-1-jithu.joseph@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoMerge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Sun, 6 Oct 2024 07:59:22 +0000 (03:59 -0400)]
Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not change critical
  system registers as we're about to fail

- Make sure that the host's vector length is at capped by a value
  common to all CPUs

- Fix kvm_has_feat*() handling of "negative" features, as the current
  code is pretty broken

- Promote Joey to the status of official reviewer, while James steps
  down -- hopefully only temporarly

9 months agox86/reboot: emergency callbacks are now registered by common KVM code
Paolo Bonzini [Tue, 1 Oct 2024 14:34:58 +0000 (10:34 -0400)]
x86/reboot: emergency callbacks are now registered by common KVM code

Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules.
In practice this has no functional change, because CONFIG_KVM_X86_COMMON
is set if and only if at least one vendor-specific module is being built.
However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that
are used in kvm.ko.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Fixes: 6d55a94222db ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>