Christoph Hellwig [Sat, 31 Aug 2024 08:10:49 +0000 (11:10 +0300)]
xfs: convert remaining trace points to pass pag structures
Convert all tracepoints that take [mp,agno] tuples to take a pag argument
instead so that decoding only happens when tracepoints are enabled and to
clean up the callers.
Christoph Hellwig [Sun, 1 Sep 2024 05:39:19 +0000 (08:39 +0300)]
xfs: pass the pag to the xrep_newbt_extent_class tracepoints
This requires moving a few of the callsites a little bit to ensure that
we already have the reference, but allows for the decoding to only happen
when tracing is actually enabled, and cleans up the callsites a bit.
Christoph Hellwig [Sun, 1 Sep 2024 05:34:33 +0000 (08:34 +0300)]
xfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points
This requires holding the pag refcount a little longer, but allows for the
decoding to only happen when tracing is actually enabled, and cleans up the
callsites a bit.
Christoph Hellwig [Sun, 1 Sep 2024 05:26:13 +0000 (08:26 +0300)]
xfs: pass objects to the xrep_ibt_walk_rmap tracepoint
Pass the perag structure and the irec so that the decoding is only done
when tracing is actually enabled and the call sites look a lot neater,
and remove the pointless class indirection.
Christoph Hellwig [Sat, 31 Aug 2024 07:39:59 +0000 (10:39 +0300)]
xfs: pass objects to the xfs_irec_merge_{pre,post} trace points
Pass the perag structure and the irec to these tracepoints so that the
decoding is only done when tracing is actually enabled and the call sites
look a lot neater.
Christoph Hellwig [Thu, 19 Sep 2024 07:02:34 +0000 (09:02 +0200)]
xfs: constify pag arguments to trace points
Trace points never modify their arguments. Mark all the pag objects
passed to trace points. The exception is the xfs_ag_resv_class, which
uses the xfs_perag_resv helper that can't be marked const due to
other users modifying the returned structure.
Christoph Hellwig [Sat, 31 Aug 2024 08:18:02 +0000 (11:18 +0300)]
xfs: remove the agno argument to xfs_free_ag_extent
xfs_free_ag_extent already has a pointer to the pag structure through
the agf buffer. Use that instead of passing the redundant argument,
and do the same for the tracepoint.
Christoph Hellwig [Thu, 19 Sep 2024 13:15:10 +0000 (15:15 +0200)]
xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag
__GFP_RETRY_MAYFAIL increases the likelyhood of allocations to fail,
which isn't really helpful during log recovery. Remove the flag and
stick to the default GFP_KERNEL policies.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Sun, 1 Sep 2024 08:09:32 +0000 (11:09 +0300)]
xfs: merge the perag freeing helpers
There is no good reason to have two different routines for freeing perag
structures for the unmount and error cases. Add two arguments to specify
the range of AGs to free to xfs_free_perag, and use that to replace
xfs_free_unused_perag_range.
The addition RCU grace period for the error case is harmless, and the
extra check for the AG to actually exist is not required now that the
callers pass the exact known allocated range.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Sun, 8 Sep 2024 07:53:41 +0000 (10:53 +0300)]
xfs: pass the exact range to initialize to xfs_initialize_perag
Currently only the new agcount is passed to xfs_initialize_perag, which
requires lookups of existing AGs to skip them and complicates error
handling. Also pass the previous agcount so that the range that
xfs_initialize_perag operates on is exactly defined. That way the
extra lookups can be avoided, and error handling can clean up the
exact range from the old count to the last added perag structure.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Tue, 27 Aug 2024 05:03:21 +0000 (07:03 +0200)]
xfs: ensure st_blocks never goes to zero during COW writes
COW writes remove the amount overwritten either directly for delalloc
reservations, or in earlier deferred transactions than adding the new
amount back in the bmap map transaction. This means st_blocks on an
inode where all data is overwritten using the COW path can temporarily
show a 0 st_blocks. This can easily be reproduced with the pending
zoned device support where all writes use this path and trips the
check in generic/615, but could also happen on a reflink file without
that.
Fix this by temporarily add the pending blocks to be mapped to
i_delayed_blks while the item is queued.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Christoph Hellwig [Thu, 29 Aug 2024 04:08:41 +0000 (07:08 +0300)]
xfs: use xas_for_each_marked in xfs_reclaim_inodes_count
xfs_reclaim_inodes_count iterates over all AGs to sum up the reclaimable
inodes counts. There is no point in grabbing a reference to the them or
unlock the RCU critical section for each iteration, so switch to the
more efficient xas_for_each_marked iterator.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Christoph Hellwig [Thu, 29 Aug 2024 04:08:39 +0000 (07:08 +0300)]
xfs: simplify tagged perag iteration
Pass the old perag structure to the tagged loop helpers so that they can
grab the old agno before releasing the reference. This removes the need
to separately track the agno and the iterator macro, and thus also
obsoletes the for_each_perag_tag syntactic sugar.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Christoph Hellwig [Thu, 29 Aug 2024 04:08:38 +0000 (07:08 +0300)]
xfs: move the tagged perag lookup helpers to xfs_icache.c
The tagged perag helpers are only used in xfs_icache.c in the kernel code
and not at all in xfsprogs. Move them to xfs_icache.c in preparation for
switching to an xarray, for which I have no plan to implement the tagged
lookup functions for userspace.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Hongbo Li [Wed, 21 Aug 2024 06:43:55 +0000 (14:43 +0800)]
xfs: use LIST_HEAD() to simplify code
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD().
Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Dan Carpenter [Fri, 12 Jul 2024 14:07:36 +0000 (09:07 -0500)]
xfs: remove unnecessary check
We checked that "pip" is non-NULL at the start of the if else statement
so there is no need to check again here. Delete the check.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
John Garry [Wed, 10 Jul 2024 10:31:19 +0000 (10:31 +0000)]
xfs: Use xfs set and clear mp state helpers
Use the set and clear mp state helpers instead of open-coding.
It is noted that in some instances calls to atomic operation set_bit() and
clear_bit() are being replaced with test_and_set_bit() and
test_and_clear_bit(), respectively, as there is no specific helpers for
set_bit() and clear_bit() only. However should be ok, as we are just
ignoring the returned value from those "test" variants.
Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Christoph Hellwig [Tue, 13 Aug 2024 07:39:42 +0000 (09:39 +0200)]
xfs: reclaim speculative preallocations for append only files
The XFS XFS_DIFLAG_APPEND maps to the VFS S_APPEND flag, which forbids
writes that don't append at the current EOF.
But the commit originally adding XFS_DIFLAG_APPEND support (commit a23321e766d in xfs xfs-import repository) also checked it to skip
releasing speculative preallocations, which doesn't make any sense.
Another commit (dd9f438e3290 in the xfs-import repository) later extended
that flag to also report these speculation preallocations which should
not exist in getbmap.
Remove these checks as nothing XFS_DIFLAG_APPEND implies that
preallocations beyond EOF should exist, but explicitly check for
XFS_DIFLAG_APPEND in xfs_file_release to bypass the algorithm that
discard preallocations on the first close as append only files aren't
expected to be written to only once.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Christoph Hellwig [Tue, 13 Aug 2024 07:39:41 +0000 (09:39 +0200)]
xfs: simplify extent lookup in xfs_can_free_eofblocks
xfs_can_free_eofblocks just cares if there is an extent beyond EOF.
Replace the call to xfs_bmapi_read with a xfs_iext_lookup_extent
as we've already checked that extents are read in earlier.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Christoph Hellwig [Tue, 13 Aug 2024 07:39:40 +0000 (09:39 +0200)]
xfs: check XFS_EOFBLOCKS_RELEASED earlier in xfs_release_eofblocks
If the XFS_EOFBLOCKS_RELEASED flag is set, we are not going to free the
eofblocks, so don't bother locking the inode or performing the checks in
xfs_can_free_eofblocks. Also switch to a test_and_set operation once
the iolock has been acquire so that only the caller that sets it actually
frees the post-EOF blocks.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Darrick J. Wong [Tue, 13 Aug 2024 07:39:39 +0000 (09:39 +0200)]
xfs: only free posteof blocks on first close
Certain workloads fragment files on XFS very badly, such as a software
package that creates a number of threads, each of which repeatedly run
the sequence: open a file, perform a synchronous write, and close the
file, which defeats the speculative preallocation mechanism. We work
around this problem by only deleting posteof blocks the /first/ time a
file is closed to preserve the behavior that unpacking a tarball lays
out files one after the other with no gaps.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: rebased, updated comment, renamed the flag] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Dave Chinner [Tue, 13 Aug 2024 07:39:38 +0000 (09:39 +0200)]
xfs: don't free post-EOF blocks on read close
When we have a workload that does open/read/close in parallel with other
allocation, the file becomes rapidly fragmented. This is due to close()
calling xfs_file_release() and removing the speculative preallocation
beyond EOF.
Add a check for a writable context to xfs_file_release to skip the
post-EOF block freeing (an the similarly pointless flushing on truncate
down).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: wordsmithing, fix commit message] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: ported to the new ->release code structure] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Christoph Hellwig [Tue, 13 Aug 2024 07:39:37 +0000 (09:39 +0200)]
xfs: skip all of xfs_file_release when shut down
There is no point in trying to free post-EOF blocks when the file system
is shutdown, as it will just error out ASAP. Instead return instantly
when xfs_file_release is called on a shut down file system.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Christoph Hellwig [Tue, 13 Aug 2024 07:39:36 +0000 (09:39 +0200)]
xfs: don't bother returning errors from xfs_file_release
While ->release returns int, the only caller ignores the return value.
As we're only doing cleanup work there isn't much of a point in
return a value to start with, so just document the situation instead.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Merge tag 'btree-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA
xfs: cleanups for inode rooted btree code [v4.2 8/8]
This series prepares the btree code to support realtime reverse mapping btrees
by refactoring xfs_ifork_realloc to be fed a per-btree ops structure so that it
can handle multiple types of inode-rooted btrees. It moves on to refactoring
the btree code to use the new realloc routines.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'btree-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: standardize the btree maxrecs function parameters
xfs: replace shouty XFS_BM{BT,DR} macros
Merge tag 'xfs-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA
xfs: various bug fixes for 6.12 [7/8]
Various bug fixes for 6.12.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'xfs-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
xfs: fix FITRIM reporting again
xfs: fix C++ compilation errors in xfs_fs.h
Merge tag 'quota-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA
xfs: cleanups for quota mount [v4.2 6/8]
Refactor the quota file loading code in preparation for adding metadata
directory trees. Did you know that quotarm works even when quota isn't active?
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'quota-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: refactor loading quota inodes in the regular case
Merge tag 'rtalloc-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA
xfs: cleanups for the realtime allocator [v4.2 5/8]
This third series cleans up the realtime allocator code so that it'll be
somewhat less difficult to figure out what on earth it's doing. We also
rearrange the fsmap code a bit.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'rtalloc-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c
xfs: rearrange xfs_fsmap.c a little bit
xfs: replace m_rsumsize with m_rsumblocks
xfs: remove xfs_{rtbitmap,rtsummary}_wordcount
xfs: add xchk_setup_nothing and xchk_nothing helpers
xfs: make the rtalloc start hint a xfs_rtblock_t
xfs: factor out a xfs_rtallocate_align helper
xfs: rework the rtalloc fallback handling
xfs: factor out a xfs_rtallocate helper
xfs: clean up the ISVALID macro in xfs_bmap_adjacent
Merge tag 'rtalloc-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA
xfs: fixes for the realtime allocator [v4.2 4/8]
While I was reviewing how to integrate realtime allocation groups with
the rt allocator, I noticed several bugs in the existing allocation code
with regards to calculating the maximum range of rtx to scan for free
space. This series fixes those range bugs and cleans up a few things
too.
I also added a few cleanups from Christoph.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'rtalloc-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: simplify xfs_rtalloc_query_range
xfs: remove xfs_rtb_to_rtxrem
xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block
xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near
xfs: clean up xfs_rtallocate_extent_exact a bit
xfs: refactor aligning bestlen to prod
xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block
xfs: don't return too-short extents from xfs_rtallocate_extent_block
xfs: ensure rtx mask/shift are correct after growfs
xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock
Merge tag 'rtbitmap-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA
xfs: clean up the rtbitmap code [v4.2 3/8]
Here are some cleanups and reorganization of the realtime bitmap code to share
more of that code between userspace and the kernel.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'rtbitmap-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock
xfs: factor out rtbitmap/summary initialization helpers
xfs: factor out a xfs_last_rt_bmblock helper
xfs: factor out a xfs_growfs_rt_bmblock helper
xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc
xfs: cleanup the calling convention for xfs_rtpick_extent
xfs: add bounds checking to xfs_rt{bitmap,summary}_read_buf
xfs: assert a valid limit in xfs_rtfind_forw
xfs: remove the limit argument to xfs_rtfind_back
xfs: make the RT rsum_cache mandatory
xfs: factor out a xfs_validate_rt_geometry helper
xfs: remove xfs_validate_rtextents
Merge tag 'metadir-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA
xfs: cleanups before adding metadata directories [v4.2 2/8]
Before we start adding code for metadata directory trees, let's clean up
some warts in the realtime bitmap code and the inode allocator code.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'metadir-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: pass the icreate args object to xfs_dialloc
xfs: match on the global RT inode numbers in xfs_is_metadata_inode
xfs: validate inumber in xfs_iget
Merge tag 'atomic-file-commits-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA
xfs: atomic file content commits [v31.1 1/8]
This series creates XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE ioctls
to perform the exchange only if the target file has not been changed
since a given sampling point.
This new functionality uses the mechanism underlying EXCHANGE_RANGE to
stage and commit file updates such that reader programs will see either
the old contents or the new contents in their entirety, with no chance
of torn writes. A successful call completion guarantees that the new
contents will be seen even if the system fails. The pair of ioctls
allows userspace to perform what amounts to a compare and exchange
operation on entire file contents.
Note that there are ongoing arguments in the community about how best to
implement some sort of file data write counter that nfsd could also use
to signal invalidations to clients. Until such a thing is implemented,
this patch will rely on ctime/mtime updates.
Here are the proposed manual pages:
IOCTL-XFS-COMMIT-RANGE(2) System Calls ManualIOCTL-XFS-COMMIT-RANGE(2)
NAME
ioctl_xfs_start_commit - prepare to exchange the contents of
two files ioctl_xfs_commit_range - conditionally exchange the
contents of parts of two files
int ioctl(int file2_fd, XFS_IOC_START_COMMIT, struct xfs_com‐
mit_range *arg);
int ioctl(int file2_fd, XFS_IOC_COMMIT_RANGE, struct xfs_com‐
mit_range *arg);
DESCRIPTION
Given a range of bytes in a first file file1_fd and a second
range of bytes in a second file file2_fd, this ioctl(2) ex‐
changes the contents of the two ranges if file2_fd passes cer‐
tain freshness criteria.
Before exchanging the contents, the program must call the
XFS_IOC_START_COMMIT ioctl to sample freshness data for
file2_fd. If the sampled metadata does not match the file
metadata at commit time, XFS_IOC_COMMIT_RANGE will return
EBUSY.
Exchanges are atomic with regards to concurrent file opera‐
tions. Implementations must guarantee that readers see either
the old contents or the new contents in their entirety, even if
the system fails.
The system call parameters are conveyed in structures of the
following form:
The fields file1_fd, file1_offset, and length define the first
range of bytes to be exchanged.
The fields file2_fd, file2_offset, and length define the second
range of bytes to be exchanged.
The field file2_freshness is an opaque field whose contents are
determined by the kernel. These file attributes are used to
confirm that file2_fd has not changed by another thread since
the current thread began staging its own update.
Both files must be from the same filesystem mount. If the two
file descriptors represent the same file, the byte ranges must
not overlap. Most disk-based filesystems require that the
starts of both ranges must be aligned to the file block size.
If this is the case, the ends of the ranges must also be so
aligned unless the XFS_EXCHANGE_RANGE_TO_EOF flag is set.
The field flags control the behavior of the exchange operation.
XFS_EXCHANGE_RANGE_TO_EOF
Ignore the length parameter. All bytes in file1_fd
from file1_offset to EOF are moved to file2_fd, and
file2's size is set to (file2_offset+(file1_length-
file1_offset)). Meanwhile, all bytes in file2 from
file2_offset to EOF are moved to file1 and file1's
size is set to (file1_offset+(file2_length-
file2_offset)).
XFS_EXCHANGE_RANGE_DSYNC
Ensure that all modified in-core data in both file
ranges and all metadata updates pertaining to the
exchange operation are flushed to persistent storage
before the call returns. Opening either file de‐
scriptor with O_SYNC or O_DSYNC will have the same
effect.
XFS_EXCHANGE_RANGE_FILE1_WRITTEN
Only exchange sub-ranges of file1_fd that are known
to contain data written by application software.
Each sub-range may be expanded (both upwards and
downwards) to align with the file allocation unit.
For files on the data device, this is one filesystem
block. For files on the realtime device, this is
the realtime extent size. This facility can be used
to implement fast atomic scatter-gather writes of
any complexity for software-defined storage targets
if all writes are aligned to the file allocation
unit.
XFS_EXCHANGE_RANGE_DRY_RUN
Check the parameters and the feasibility of the op‐
eration, but do not change anything.
RETURN VALUE
On error, -1 is returned, and errno is set to indicate the er‐
ror.
ERRORS
Error codes can be one of, but are not limited to, the follow‐
ing:
EBADF file1_fd is not open for reading and writing or is open
for append-only writes; or file2_fd is not open for
reading and writing or is open for append-only writes.
EBUSY The file2 inode number and timestamps supplied do not
match file2_fd.
EINVAL The parameters are not correct for these files. This
error can also appear if either file descriptor repre‐
sents a device, FIFO, or socket. Disk filesystems gen‐
erally require the offset and length arguments to be
aligned to the fundamental block sizes of both files.
EIO An I/O error occurred.
EISDIR One of the files is a directory.
ENOMEM The kernel was unable to allocate sufficient memory to
perform the operation.
ENOSPC There is not enough free space in the filesystem ex‐
change the contents safely.
EOPNOTSUPP
The filesystem does not support exchanging bytes between
the two files.
EPERM file1_fd or file2_fd are immutable.
ETXTBSY
One of the files is a swap file.
EUCLEAN
The filesystem is corrupt.
EXDEV file1_fd and file2_fd are not on the same mounted
filesystem.
CONFORMING TO
This API is XFS-specific.
USE CASES
Several use cases are imagined for this system call. Coordina‐
tion between multiple threads is performed by the kernel.
The first is a filesystem defragmenter, which copies the con‐
tents of a file into another file and wishes to exchange the
space mappings of the two files, provided that the original
file has not changed.
An example program might look like this:
int fd = open("/some/file", O_RDWR);
int temp_fd = open("/some", O_TMPFILE | O_RDWR);
struct stat sb;
struct xfs_commit_range args = {
.flags = XFS_EXCHANGE_RANGE_TO_EOF,
};
/* make a fresh copy of the file with terrible alignment to avoid reflink */
clone_file_range(fd, NULL, temp_fd, NULL, 1, 0);
clone_file_range(fd, NULL, temp_fd, NULL, sb.st_size - 1, 0);
/* commit the entire update */
args.file1_fd = temp_fd;
ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
if (ret && errno == EBUSY)
printf("file changed while defrag was underway
");
The second is a data storage program that wants to commit non-
contiguous updates to a file atomically. This program cannot
coordinate updates to the file and therefore relies on the ker‐
nel to reject the COMMIT_RANGE command if the file has been up‐
dated by someone else. This can be done by creating a tempo‐
rary file, calling FICLONE(2) to share the contents, and stag‐
ing the updates into the temporary file. The FULL_FILES flag
is recommended for this purpose. The temporary file can be
deleted or punched out afterwards.
/* gather file2's freshness information */
ioctl(fd, XFS_IOC_START_COMMIT, &args);
ioctl(temp_fd, FICLONE, fd);
/* append 1MB of records */
lseek(temp_fd, 0, SEEK_END);
write(temp_fd, data1, 1000000);
/* update record index */
pwrite(temp_fd, data1, 600, 98765);
pwrite(temp_fd, data2, 320, 54321);
pwrite(temp_fd, data2, 15, 0);
/* commit the entire update */
args.file1_fd = temp_fd;
ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
if (ret && errno == EBUSY)
printf("file changed before commit; will roll back
");
NOTES
Some filesystems may limit the amount of data or the number of
extents that can be exchanged in a single call.
SEE ALSO
ioctl(2)
XFS 2024-02-18 IOCTL-XFS-COMMIT-RANGE(2)
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'atomic-file-commits-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: introduce new file range commit ioctls
Darrick J. Wong [Fri, 30 Aug 2024 22:37:21 +0000 (15:37 -0700)]
xfs: standardize the btree maxrecs function parameters
Standardize the parameters in xfs_{alloc,bm,ino,rmap,refcount}bt_maxrecs
so that we have consistent calling conventions. This doesn't affect the
kernel that much, but enables us to clean up userspace a bit.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:20 +0000 (15:37 -0700)]
xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
While refactoring code, I noticed that when xfs_iroot_realloc tries to
shrink a bmbt root block, it allocates a smaller new block and then
copies "records" and pointers to the new block. However, bmbt root
blocks cannot ever be leaves, which means that it's not technically
correct to copy records. We /should/ be copying keys.
Note that this has never resulted in actual memory corruption because
sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)). However,
this will no longer be true when we start adding realtime rmap stuff,
so fix this now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:17 +0000 (15:37 -0700)]
xfs: refactor loading quota inodes in the regular case
Create a helper function to load quota inodes in the case where the
dqtype and the sb quota inode fields correspond. This is true for
nearly all the iget callsites in the quota code, except for when we're
switching the group and project quota inodes. We'll need this in
subsequent patches to make the metadir handling less convoluted.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:19 +0000 (15:37 -0700)]
xfs: fix FITRIM reporting again
Don't report FITRIMming more bytes than possibly exist in the
filesystem.
Fixes: 410e8a18f8e93 ("xfs: don't bother reporting blocks trimmed via FITRIM") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:18 +0000 (15:37 -0700)]
xfs: fix C++ compilation errors in xfs_fs.h
Several people reported C++ compilation errors due to things that C
compilers allow but C++ compilers do not. Fix both of these problems,
and hope there aren't more of these brown paper bags in 2 months when we
finally get these fixes through the process into a released xfsprogs.
NOTE: I am submitting this bugfix over the objections of a former
maintainer, who insists that we should remove this function from the
published userspace ABI instead of fixing the C++ compilation errors.
No deprecation period, no discussion, just a hard drop of an already
provided and correct C function, which would be in contravention of
Linus' rules. IOWs, removing ABI that have already shipped in a
released kernel requires a careful deprecation period, so I will let
that maintainer run that process.
Reported-by: kernel@mattwhitlock.name Reported-by: sam@gentoo.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219203 Fixes: 233f4e12bbb2c ("xfs: add parent pointer ioctls") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 30 Aug 2024 22:37:08 +0000 (15:37 -0700)]
xfs: simplify xfs_rtalloc_query_range
There isn't much of a good reason to pass the xfs_rtalloc_rec structures
that describe extents to xfs_rtalloc_query_range as we really just want
a lower and upper bound xfs_rtxnum_t. Pass the rtxnum directly and
simply the interface.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:36:59 +0000 (15:36 -0700)]
xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock
To prepare for being able to join an already locked rtbitmap inode to a
transaction split out separate helpers for joining the transaction from
the locking helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 30 Aug 2024 22:36:49 +0000 (15:36 -0700)]
xfs: pass the icreate args object to xfs_dialloc
Pass the xfs_icreate_args object to xfs_dialloc since we can extract the
relevant mode (really just the file type) and parent inumber from there.
This simplifies the calling convention in preparation for the next
patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 30 Aug 2024 22:36:47 +0000 (15:36 -0700)]
xfs: introduce new file range commit ioctls
This patch introduces two more new ioctls to manage atomic updates to
file contents -- XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE. The
commit mechanism here is exactly the same as what XFS_IOC_EXCHANGE_RANGE
does, but with the additional requirement that file2 cannot have changed
since some sampling point. The start-commit ioctl performs the sampling
of file attributes.
Note: This patch currently samples i_ctime during START_COMMIT and
checks that it hasn't changed during COMMIT_RANGE. This isn't entirely
safe in kernels prior to 6.12 because ctime only had coarse grained
granularity and very fast updates could collide with a COMMIT_RANGE.
With the multi-granularity ctime introduced by Jeff Layton, it's now
possible to update ctime such that this does not happen.
It is critical, then, that this patch must not be backported to any
kernel that does not support fine-grained file change timestamps.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:15 +0000 (15:37 -0700)]
xfs: rearrange xfs_fsmap.c a little bit
The order of the functions in this file has gotten a little confusing
over the years. Specifically, the two data device implementations
(bnobt and rmapbt) could be adjacent in the source code instead of split
in two by the logdev and rtdev fsmap implementations. We're about to
add more functionality to this file, so rearrange things now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 30 Aug 2024 22:37:07 +0000 (15:37 -0700)]
xfs: remove xfs_rtb_to_rtxrem
Simplify the number of block number conversion helpers by removing
xfs_rtb_to_rtxrem. Any recent compiler is smart enough to eliminate
the double divisions if using separate xfs_rtb_to_rtx and
xfs_rtb_to_rtxoff calls.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:36:59 +0000 (15:36 -0700)]
xfs: factor out rtbitmap/summary initialization helpers
Add helpers to libxfs that can be shared by growfs and mkfs for
initializing the rtbitmap and summary, and by passing the optional data
pointer also by repair for rebuilding them. This will become even more
useful when the rtgroups feature adds a metadata header to each block,
which means even more shared code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor documentation and data advance tweaks] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:36:49 +0000 (15:36 -0700)]
xfs: match on the global RT inode numbers in xfs_is_metadata_inode
Match the inode number instead of the inode pointers, as the inode
pointers in the superblock will go away soon.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: port to my tree, make the parameter a const pointer] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:37:15 +0000 (15:37 -0700)]
xfs: replace m_rsumsize with m_rsumblocks
Track the RT summary file size in blocks, just like the RT bitmap
file. While we have users of both units, blocks are used slightly
more often and this matches the bitmap file for consistency.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:06 +0000 (15:37 -0700)]
xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block
This function tries to find a suitable free space extent starting from
a particular rtbitmap block. Some time ago, I added a clamping function
to prevent the free space scans from running off the end of the bitmap,
but I didn't quite get the logic right.
Let's say there's an allocation request with a minlen of 5 and a maxlen
of 32 and we're scanning the last rtbitmap block. If we come within 4
rtx of the end of the rt volume, maxlen will get clamped to 4. If the
next 3 rtx are free, we could have satisfied the allocation, but the
code setting partial besti/bestlen for "minlen < maxlen" will think that
we're doing a non-variable allocation and ignore it.
The root of this problem is overwriting maxlen; I should have stuffed
the results in a different variable, which would not have introduced
this bug.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 30 Aug 2024 22:37:14 +0000 (15:37 -0700)]
xfs: remove xfs_{rtbitmap,rtsummary}_wordcount
xfs_rtbitmap_wordcount and xfs_rtsummary_wordcount are currently unused,
so remove them to simplify refactoring other rtbitmap helpers. They
can be added back or simply open coded when actually needed.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:05 +0000 (15:37 -0700)]
xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near
The near rt allocator employs two allocation strategies -- first it
tries to allocate at exactly @start. If that fails, it will pivot back
and forth around that starting point looking for an appropriately sized
free space.
However, I clamped maxlen ages ago to prevent the exact allocation scan
from running off the end of the rt volume. This, I realize, was
excessive. If the allocation request is (say) for 32 rtx but the start
position is 5 rtx from the end of the volume, we clamp maxlen to 5. If
the exact allocation fails, we then pivot back and forth looking for 5
rtx, even though the original intent was to try to get 32 rtx.
If we then find 5 rtx when we could have gotten 32 rtx, we've not done
as well as we could have. This may be moot if the caller immediately
comes back for more space, but it might not be. Either way, we can do
better here.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 30 Aug 2024 22:36:57 +0000 (15:36 -0700)]
xfs: factor out a xfs_growfs_rt_bmblock helper
Add a helper to contain the per-rtbitmap block logic in xfs_growfs_rt.
Note that this helper now allocates a new fake mount structure for
each rtbitmap block iteration instead of reusing the memory for an
entire growfs call. Compared to all the other work done when freeing
the blocks the overhead for this is in the noise and it keeps the code
nicely modular.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:13 +0000 (15:37 -0700)]
xfs: add xchk_setup_nothing and xchk_nothing helpers
Add common helpers for no-op scrubbing methods.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:05 +0000 (15:37 -0700)]
xfs: clean up xfs_rtallocate_extent_exact a bit
Before we start doing more surgery on the rt allocator, let's clean up
the exact allocator so that it doesn't change its arguments and uses the
helper introduced in the previous patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 30 Aug 2024 22:36:56 +0000 (15:36 -0700)]
xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc
Currently the various low-level RT allocator functions call into
xfs_rtallocate_range directly, which ties them into the locking protocol
for the RT bitmap. As these helpers already return the allocated range,
lift the call to xfs_rtallocate_range into xfs_bmap_rtalloc so that it
happens as high as possible in the stack, which will simplify future
changes to the locking protocol.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:37:12 +0000 (15:37 -0700)]
xfs: make the rtalloc start hint a xfs_rtblock_t
0 is a valid start RT extent, and with pending changes it will become
both more common and non-unique. Switch to pass a xfs_rtblock_t instead
so that we can use NULLRTBLOCK to determine if a hint was set or not.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:04 +0000 (15:37 -0700)]
xfs: refactor aligning bestlen to prod
There are two places in xfs_rtalloc.c where we want to make sure that a
count of rt extents is aligned with a particular prod(uct) factor. In
one spot, we actually use rounddown(), albeit unnecessarily if prod < 2.
In the other case, we open-code this rounding inefficiently by promoting
the 32-bit length value to a 64-bit value and then performing a 64-bit
division to figure out the subtraction.
Refactor this into a single helper that uses the correct types and
division method for the type, and skips the division entirely unless
prod is large enough to make a difference.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 30 Aug 2024 22:36:55 +0000 (15:36 -0700)]
xfs: cleanup the calling convention for xfs_rtpick_extent
xfs_rtpick_extent never returns an error. Do away with the error return
and directly return the picked extent instead of doing that through a
call by reference argument.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:03 +0000 (15:37 -0700)]
xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block
The loop conditional here is not quite correct because an rtbitmap block
can represent rtextents beyond the end of the rt volume. There's no way
that it makes sense to scan for free space beyond EOFS, so don't do it.
This overrun has been present since v2.6.0.
Also fix the type of bestlen, which was incorrectly converted.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 30 Aug 2024 22:37:10 +0000 (15:37 -0700)]
xfs: rework the rtalloc fallback handling
xfs_rtallocate currently has two fallbacks, when an allocation fails:
1) drop the requested extent size alignment, if any, and retry
2) ignore the locality hint
Oddly enough it does those in order, as trying a different location
is more in line with what the user asked for, and does it in a very
unstructured way.
Lift the fallback to try to allocate without the locality hint into
xfs_rtallocate to both perform them in a more sensible order and to
clean up the code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Fri, 30 Aug 2024 22:37:02 +0000 (15:37 -0700)]
xfs: don't return too-short extents from xfs_rtallocate_extent_block
If xfs_rtallocate_extent_block is asked for a variable-sized allocation,
it will try to return the best-sized free extent, which is apparently
the largest one that it finds starting in this rtbitmap block. It will
then trim the size of the extent as needed to align it with prod.
However, it misses one thing -- rounding down the best-fit candidate to
the required alignment could make the extent shorter than minlen. In
the case where minlen > 1, we'd rather the caller relaxed its alignment
requirements and tried again, as the allocator already supports that.
Returning a too-short extent that causes xfs_bmapi_write to return
ENOSR if there aren't enough nmaps to handle multiple new allocations,
which can then cause filesystem shutdowns.
I haven't seen this happen on any production systems, but then I don't
think it's very common to set a per-file extent size hint on realtime
files. I tripped it while working on the rtgroups feature and pounding
on the realtime allocator enthusiastically.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 30 Aug 2024 22:37:10 +0000 (15:37 -0700)]
xfs: factor out a xfs_rtallocate helper
Split out a helper from xfs_rtallocate that performs the actual
allocation. This keeps the scope of the xfs_rtalloc_args structure
contained, and prepares for rtgroups support.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:37:01 +0000 (15:37 -0700)]
xfs: ensure rtx mask/shift are correct after growfs
When growfs sets an extent size, it doesn't updated the m_rtxblklog and
m_rtxblkmask values, which could lead to incorrect usage of them if they
were set before and can't be used for the new extent size.
Add a xfs_mount_sb_set_rextsize helper that updates the two fields, and
also use it when calculating the new RT geometry instead of disabling
the optimization there.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:37:00 +0000 (15:37 -0700)]
xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock
After going great length to calculate the transaction reservation for
the new geometry, we should also use it to allocate the transaction it
was calculated for.
Fixes: 578bd4ce7100 ("xfs: recompute growfsrtfree transaction reservation while growing rt volume") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:36:52 +0000 (15:36 -0700)]
xfs: make the RT rsum_cache mandatory
Currently the RT mount code simply ignores an allocation failure for the
rsum_cache. The code mostly works fine with it, but not having it leads
to nasty corner cases in the growfs code that we don't really handle
well. Switch to failing the mount if we can't allocate the memory, the
file system would not exactly be useful in such a constrained environment
to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:36:51 +0000 (15:36 -0700)]
xfs: factor out a xfs_validate_rt_geometry helper
Split the RT geometry validation in the early mount code into a
helper than can be reused by repair (from which this code was
apparently originally stolen anyway).
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: u64 return value for calc_rbmblocks] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Fri, 30 Aug 2024 22:36:50 +0000 (15:36 -0700)]
xfs: remove xfs_validate_rtextents
Replace xfs_validate_rtextents with an open coded check for 0
rtextents. The name for the function implies it does a lot more
than a zero check, which is more obvious when open coded.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- copy_file_range fix
- two read fixes including read past end of file rc fix and read retry
crediting fix
- falloc zero range fix
* tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
cifs: Fix copy offload to flush destination region
netfs, cifs: Fix handling of short DIO read
cifs: Fix lack of credit renegotiation on read retry
Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs
Push bcachefs fixes from Kent Overstreet:
"The data corruption in the buffered write path is troubling; inode
lock should not have been able to cause that...
- Fix a rare data corruption in the rebalance path, caught as a nonce
inconsistency on encrypted filesystems
- Revert lockless buffered write path
- Mark more errors as autofix"
* tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
bcachefs: Mark more errors as autofix
bcachefs: Revert lockless buffered IO path
bcachefs: Fix bch2_extents_match() false positive
bcachefs: Fix failure to return error in data_update_index_update()
Kent Overstreet [Thu, 22 Aug 2024 15:47:32 +0000 (11:47 -0400)]
bcachefs: Mark more errors as autofix
errors that are known to always be safe to fix should be autofix: this
should be most errors even at this point, but that will need some
thorough review.
note that errors are still logged in the superblock, so we'll still know
that they happened.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It seems that writes are being dropped, but only when issued by QEMU,
and possibly only in snapshot mode. It's undetermined if it's write
calls are being dropped or dirty folios.
Further testing, via minimizing the original patch to just the change
that skips the inode lock on non appends/truncates, reveals that it
really is just not taking the inode lock that causes the corruption: it
has nothing to do with the other logic changes for preserving write
atomicity in corner cases.
It's also kernel config dependent: it doesn't reproduce with the minimal
kernel config that ktest uses, but it does reproduce with nixos's distro
config. Bisection the kernel config initially pointer the finger at page
migration or compaction, but it appears that was erroneous; we haven't
yet determined what kernel config option actually triggers it.
Sadly it appears this will have to be reverted since we're getting too
close to release and my plate is full, but we'd _really_ like to fully
debug it.
My suspicion is that this patch is exposing a preexisting bug - the
inode lock actually covers very little in IO paths, and we have a
different lock (the pagecache add lock) that guards against races with
truncate here.
Fixes: 7e64c86cdc6c ("bcachefs: Buffered write path now can avoid the inode lock") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Linus Torvalds [Sat, 31 Aug 2024 21:18:48 +0000 (09:18 +1200)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull misc fixes from Guenter Roeck.
These are fixes for regressions that Guenther has been reporting, and
the maintainers haven't picked up and sent in. With rc6 fairly imminent,
I'm taking them directly from Guenter.
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
apparmor: fix policy_unpack_test on big endian systems
Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
microblaze: don't treat zero reserved memory regions as error
Linus Torvalds [Sat, 31 Aug 2024 21:07:44 +0000 (09:07 +1200)]
Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull power sequencing fix from Bartosz Golaszewski:
"A follow-up fix for the power sequencing subsystem. It turned out the
previous fix for this driver was incomplete and broke the WLAN support
on some platforms. This addresses the issue.
- set the direction of the wlan-enable GPIO to output after
requesting it as-is"
* tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
Bartosz Golaszewski [Fri, 23 Aug 2024 11:55:00 +0000 (13:55 +0200)]
power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
Commit a9aaf1ff88a8 ("power: sequencing: request the WLAN enable GPIO
as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the
wifi module isn't in output mode by default. We need to set direction to
output while retaining the value that was already set to keep the ath
module on if it's already started.