www.infradead.org Git - users/hch/xfsprogs.git/log

mkfs: break up the rest of the rtinit() function

Break up this really long function into smaller functions that each do
one thing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs: clean up the rtinit() function

Clean up some of the warts in this function, like the inconsistent use
of @i for @error, missing comments, and make this more visually pleasing
by adding some whitespace between major sections. Some things are left
untouched for the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: use library functions for orphanage creation

Use new library functions to create lost+found.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: use library functions to reset root/rbm/rsum inodes

Use the iroot reset function to reset root inodes instead of open-coding
the reset routine. While we're at it, fix a longstanding memory leak if
the inode being reset actually had an xattr fork full of mappings.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: don't crash in get_inode_parent

The xfs_repair fuzz test suite encountered a crash in xfs_repair.  In
the fuzzed filesystem, inode 8388736 is a single-block directory where
the one dir data block has been trashed.  This inode maps to agno 1
agino 128, and all other inodes in that inode chunk are regular files.
Output is as follows:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
Metadata corruption detected at 0x565335fbd534, xfs_dir3_block block 0x4ebc78/0x1000
corrupt directory block 0 for inode 8388736
no . entry for directory 8388736
no .. entry for directory 8388736
problem with directory contents in inode 8388736
would have cleared inode 8388736
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
entry "S_IFDIR.FMT_BLOCK" at block 0 offset 1728 in directory inode 128 references free inode 8388736
        would clear inode number in entry at offset 1728...
        - agno = 1
entry "." at block 0 offset 64 in directory inode 8388736 references free inode 8388736
imap claims in-use inode 8388736 is free, would correct imap
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
./common/xfs: line 387: 84940 Segmentation fault      (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV

From the coredump, we crashed in get_inode_parent here because ptbl is a
NULL pointer:

if (ptbl->pmask & (1ULL << offset))  {

Directory inode 8388736 doesn't have a dotdot entry and phase 3 decides
to clear that inode, so it never calls set_inode_parent for 8388736.
Because the rest of the inodes in the chunk are regular files, phase 3
never calls set_inode_parent on the corresponding irec.  As a result,
neither irec->ino_un.plist nor irec->ino_un.ex_data->parents are ever
set to a parents array.

When phase 6 calls get_inode_parent to check the S_IFDIR.FMT_BLOCK
dirent from the root directory to inode 8388736, it sets ptbl to
irec->ino_un.ex_data->parents (which is still NULL) and walks off the
NULL pointer.

Because get_inode_parent already has the behavior that it can return
zero for "unknown parent", the correction is simple: check ptbl before
dereferencing it.  git blame says this code has been in xfsprogs since
the beginning of git, so I won't bother with a fixes tag.

Found by fuzzing bhdr.hdr.bno = zeroes in xfs/386.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: fix exchrange upgrade

Set the correct superblock field for the exchange-range feature upgrade.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: port the iunlink command to use the libxfs iunlink function

Now that we've ported the kernel's iunlink code to userspace, adapt the
debugger command to use it instead of duplicating the logic.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db/mdrestore/repair: don't use the incore struct xfs_sb for offsets into struct xfs_dsb

Source kernel commit: ac3a0275165b4f80d9b7b516d6a8f8b308644fff

Currently, the XFS_SB_CRC_OFF macro uses the incore superblock struct
(xfs_sb) to compute the address of sb_crc within the ondisk superblock
struct (xfs_dsb). This is a landmine if we ever change the layout of
the incore superblock (as we're about to do), so redefine the macro
to use xfs_dsb to compute the layout of xfs_dsb.

Port the userspace utilities to the new code just like we did for the
kernel.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db/mkfs/xfs_repair: port to use XFS_ICREATE_UNLINKABLE

INIT_XATTRS is overloaded here -- it's set during the creat process when
we think that we're immediately going to set some ACL xattrs to save
time.  However, it's also used by the parent pointers code to enable the
attr fork in preparation to receive ppptr xattrs.  This results in
xfs_has_parent() branches scattered around the codebase to turn on
INIT_XATTRS.

Linkable files are created far more commonly than unlinkable temporary
files or directory tree roots, so we should centralize this logic in
xfs_inode_init.  For the three callers that don't want parent pointers
(online repiar tempfiles, unlinkable tempfiles, rootdir creation) we
provide an UNLINKABLE flag to skip attr fork initialization.

Port these three utilities to use XFS_ICREATE_UNLINKABLE the same as we
did for the kernel.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[aalbersh: drop reference to kernel commit]

xfs_db: port the unlink command to use libxfs_droplink

Port this command to use the libxfs droplink implementation instead of
opencoding it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: xfs_finobt_count_blocks() walks the wrong btree

Source kernel commit: 95179935beadccaf0f0bb461adb778731e293da4

As a result of the factoring in commit 14dd46cf31f4 ("xfs: split
xfs_inobt_init_cursor"), mount started taking a long time on a
user's filesystem. For Anders, this made mount times regress from
under a second to over 15 minutes for a filesystem with only 30
million inodes in it.

Anders bisected it down to the above commit, but even then the bug
was not obvious. In this commit, over 20 calls to
xfs_inobt_init_cursor() were modified, and some we modified to call
a new function named xfs_finobt_init_cursor().

If that takes you a moment to reread those function names to see
what the rename was, then you have realised why this bug wasn't
spotted during review. And it wasn't spotted on inspection even
after the bisect pointed at this commit - a single missing "f" isn't
the easiest thing for a human eye to notice....

The result is that xfs_finobt_count_blocks() now incorrectly calls
xfs_inobt_init_cursor() so it is now walking the inobt instead of
the finobt. Hence when there are lots of allocated inodes in a
filesystem, mount takes a -long- time run because it now walks a
massive allocated inode btrees instead of the small, nearly empty
free inode btrees. It also means all the finobt space reservations
are wrong, so mount could potentially given ENOSPC on kernel
upgrade.

In hindsight, commit 14dd46cf31f4 should have been two commits - the
first to convert the finobt callers to the new API, the second to
modify the xfs_inobt_init_cursor() API for the inobt callers. That
would have made the bug very obvious during review.

Fixes: 14dd46cf31f4 ("xfs: split xfs_inobt_init_cursor")
Reported-by: Anders Blomdell <anders.blomdell@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs: fix di_onlink checking for V1/V2 inodes

Source kernel commit: e21fea4ac3cf12eba1921fbbf7764bf69c6d4b2c

"KjellR" complained on IRC that an old V4 filesystem suddenly stopped
mounting after upgrading from 6.9.11 to 6.10.3, with the following splat
when trying to read the rt bitmap inode:

00000000: 49 4e 80 00 01 02 00 01 00 00 00 00 00 00 00 00  IN..............
00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000020: 00 00 00 00 00 00 00 00 43 d2 a9 da 21 0f d6 30  ........C...!..0
00000030: 43 d2 a9 da 21 0f d6 30 00 00 00 00 00 00 00 00  C...!..0........
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000050: 00 00 00 02 00 00 00 00 00 00 00 04 00 00 00 00  ................
00000060: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

As Dave Chinner points out, this is a V1 inode with both di_onlink and
di_nlink set to 1 and di_flushiter == 0.  In other words, this inode was
formatted this way by mkfs and hasn't been touched since then.

Back in the old days of xfsprogs 3.2.3, I observed that libxfs_ialloc
would set di_nlink, but if the filesystem didn't have NLINK, it would
then set di_version = 1.  libxfs_iflush_int later sees the V1 inode and
copies the value of di_nlink to di_onlink without zeroing di_onlink.

Eventually this filesystem must have been upgraded to support NLINK
because 6.10 doesn't support !NLINK filesystems, which is how we tripped
over this old behavior.  The filesystem doesn't have a realtime section,
so that's why the rtbitmap inode has never been touched.

Fix this by removing the di_onlink/di_nlink checking for all V1/V2
inodes because this is a muddy mess.  The V3 inode handling code has
always supported NLINK and written di_onlink==0 so keep that check.
The removal of the V1 inode handling code when we dropped support for
!NLINK obscured this old behavior.

Reported-by: kjell.m.randa@gmail.com
Fixes: 40cb8613d612 ("xfs: check unused nlink fields in the ondisk inode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs: remove unused parameter in macro XFS_DQUOT_LOGRES

Source kernel commit: af5d92f2fad818663da2ce073b6fe15b9d56ffdc

In the macro definition of XFS_DQUOT_LOGRES, a parameter is accepted,
but it is not used. Hence, it should be removed.

This patch has only passed compilation test, but it should be fine.

Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs: get rid of xfs_ag_resv_rmapbt_alloc

Source kernel commit: 49cdc4e834e46d7c11a91d7adcfa04f56d19efaf

The pag in xfs_ag_resv_rmapbt_alloc() is already held when the struct
xfs_btree_cur is initialized in xfs_rmapbt_init_cursor(), so there is no
need to get pag again.

On the other hand, in xfs_rmapbt_free_block(), the similar function
xfs_ag_resv_rmapbt_free() was removed in commit 92a005448f6f ("xfs: get
rid of unnecessary xfs_perag_{get,put} pairs"), xfs_ag_resv_rmapbt_alloc()
was left because scrub used it, but now scrub has removed it. Therefore,
we could get rid of xfs_ag_resv_rmapbt_alloc() just like the rmap free
block, make the code cleaner.

Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs: background AIL push should target physical space

Source kernel commit: b50b4c49d8d79af05ac3bb3587f58589713139cc

Currently the AIL attempts to keep 25% of the "log space" free,
where the current used space is tracked by the reserve grant head.
That is, it tracks both physical space used plus the amount reserved
by transactions in progress.

When we start tail pushing, we are trying to make space for new
reservations by writing back older metadata and the log is generally
physically full of dirty metadata, and reservations for modifications
in flight take up whatever space the AIL can physically free up.

Hence we don't really need to take into account the reservation
space that has been used - we just need to keep the log tail moving
as fast as we can to free up space for more reservations to be made.
We know exactly how much physical space the journal is consuming in
the AIL (i.e. max LSN - min LSN) so we can base push thresholds
directly on this state rather than have to look at grant head
reservations to determine how much to physically push out of the
log.

This also allows code that needs to know if log items in the current
transaction need to be pushed or re-logged to simply sample the
current target - they don't need to calculate the current target
themselves. This avoids the need for any locking when doing such
checks.

Further, moving to a physical target means we don't need "push all
until empty semantics" like were introduced in the previous patch.
We can now test and clear the "push all" as a one-shot command to
set the target to the current head of the AIL. This allows the
xfsaild to maximise the use of log space right up to the point where
conditions indicate that the xfsaild is not keeping up with load and
it needs to work harder, and as soon as those constraints go away
(i.e. external code no longer needs everything pushed) the xfsaild
will return to maintaining the normal 25% free space thresholds.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs: AIL doesn't need manual pushing

Source kernel commit: 9adf40249e6cfd7231c2973bb305f6c20902bfd9

We have a mechanism that checks the amount of log space remaining
available every time we make a transaction reservation. If the
amount of space is below a threshold (25% free) we push on the AIL
to tell it to do more work. To do this, we end up calculating the
LSN that the AIL needs to push to on every reservation and updating
the push target for the AIL with that new target LSN.

This is silly and expensive. The AIL is perfectly capable of
calculating the push target itself, and it will always be running
when the AIL contains objects.

What the target does is determine if the AIL needs to do
any work before it goes back to sleep. If we haven't run out of
reservation space or memory (or some other push all trigger), it
will simply go back to sleep for a while if there is more than 25%
of the journal space free without doing anything.

If there are items in the AIL at a lower LSN than the target, it
will try to push up to the target or to the point of getting stuck
before going back to sleep and trying again soon after.`

Hence we can modify the AIL to calculate it's own 25% push target
before it starts a push using the same reserve grant head based
calculation as is currently used, and remove all the places where we
ask the AIL to push to a new 25% free target. We can also drop the
minimum free space size of 256BBs from the calculation because the
25% of a minimum sized log is *always going to be larger than
256BBs.

This does still require a manual push in certain circumstances.
These circumstances arise when the AIL is not full, but the
reservation grants consume the entire of the free space in the log.
In this case, we still need to push on the AIL to free up space, so
when we hit this condition (i.e. reservation going to sleep to wait
on log space) we do a single push to tell the AIL it should empty
itself. This will keep the AIL moving as new reservations come in
and want more space, rather than keep queuing them and having to
push the AIL repeatedly.

The reason for using the "push all" when grant space runs out is
that we can run out of grant space when there is more than 25% of
the log free. Small logs are notorious for this, and we have a hack
in the log callback code (xlog_state_set_callback()) where we push
the AIL because the *head* moved) to ensure that we kick the AIL
when we consume space in it because that can push us over the "less
than 25% available" available that starts tail pushing back up
again.

Hence when we run out of grant space and are going to sleep, we have
to consider that the grant space may be consuming almost all the log
space and there is almost nothing in the AIL. In this situation, the
AIL pins the tail and moving the tail forwards is the only way the
grant space will come available, so we have to force the AIL to push
everything to guarantee grant space will eventually be returned.
Hence triggering a "push all" just before sleeping removes all the
nasty corner cases we have in other parts of the code that work
around the "we didn't ask the AIL to push enough to free grant
space" condition that leads to log space hangs...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs: Avoid races with cnt_btree lastrec updates

Source kernel commit: 94a0333b9212a114d19096a77903f76d0d5bca26

A concurrent file creation and little writing could unexpectedly return
-ENOSPC error since there is a race window that the allocator could get
the wrong agf->agf_longest.

Write file process steps:
1) Find the entry that best meets the conditions, then calculate the start
address and length of the remaining part of the entry after allocation.
2) Delete this entry and update the -current- agf->agf_longest.
3) Insert the remaining unused parts of this entry based on the
calculations in 1), and update the agf->agf_longest again if necessary.

Create file process steps:
1) Check whether there are free inodes in the inode chunk.
2) If there is no free inode, check whether there has space for creating
inode chunks, perform the no-lock judgment first.
3) If the judgment succeeds, the judgment is performed again with agf lock
held. Otherwire, an error is returned directly.

If the write process is in step 2) but not go to 3) yet, the create file
process goes to 2) at this time, it may be mistaken for no space,
resulting in the file system still has space but the file creation fails.

We have sent two different commits to the community in order to fix this
problem[1][2]. Unfortunately, both solutions have flaws. In [2], I
discussed with Dave and Darrick, realized that a better solution to this
problem requires the "last cnt record tracking" to be ripped out of the
generic btree code. And surprisingly, Dave directly provided his fix code.
This patch includes appropriate modifications based on his tmp-code to
address this issue.

The entire fix can be roughly divided into two parts:
1) Delete the code related to lastrec-update in the generic btree code.
2) Place the process of updating longest freespace with cntbt separately
to the end of the cntbt modifications. Move the cursor to the rightmost
firstly, and update the longest free extent based on the record.

Note that we can not update the longest with xfs_alloc_get_rec() after
find the longest record, as xfs_verify_agbno() may not pass because
pag->block_count is updated on the outside. Therefore, use
xfs_btree_get_rec() as a replacement.

[1] https://lore.kernel.org/all/20240419061848.1032366-2-yebin10@huawei.com
[2] https://lore.kernel.org/all/20240604071121.3981686-1-wozizhi@huawei.com

Reported by: Ye Bin <yebin10@huawei.com>

Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs: move xfs_refcount_update_defer_add to xfs_refcount_item.c

Source kernel commit: 783e8a7c9cab6744ebc5dfe75081248ac39181b2

Move the code that adds the incore xfs_refcount_update_item deferred
work data to a transaction live with the CUI log item code. This means
that the refcount code no longer has to know about the inner workings of
the CUI log items.

As a consequence, we can get rid of the _{get,put}_group helpers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: simplify usage of the rcur local variable in xfs_refcount_finish_one

Source kernel commit: e51987a12cb57ca3702bff5df8a615037b2c8f8a

Only update rcur when we know the final *pcur value.

Inspired-by: Christoph Hellwig <hch@lst.de>
[djwong: don't leave the caller with a dangling ref]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: don't bother calling xfs_refcount_finish_one_cleanup in xfs_refcount_finish_one

Source kernel commit: bac3f784925299b5e69a857e7e03e59c88aa14be

In xfs_refcount_finish_one we know the cursor is non-zero when calling
xfs_refcount_finish_one_cleanup and we pass a 0 error variable. This
means xfs_refcount_finish_one_cleanup is just doing a
xfs_btree_del_cursor.

Open code that and move xfs_refcount_finish_one_cleanup to
fs/xfs/xfs_refcount_item.c.

Inspired-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: reuse xfs_refcount_update_cancel_item

Source kernel commit: 8aef79928b3ddd8c10a3235f982933addc15a977

Reuse xfs_refcount_update_cancel_item to put the AG/RTG and free the
item in a few places that currently open code the logic.

Inspired-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: add a ci_entry helper

Source kernel commit: 0e9254861f980bd60a58b7c2b57ba0414c038409

Add a helper to translate from the item list head to the
refcount_intent_item structure and use it so shorten assignments and
avoid the need for extra local variables.

Inspired-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: clean up refcount log intent item tracepoint callsites

Source kernel commit: 886f11c797722650d98c554b28e66f12317a33e4

Pass the incore refcount intent structure to the tracepoints instead of
open-coding the argument passing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: pass btree cursors to refcount btree tracepoints

Source kernel commit: 8fbac2f1a0947dc45ecf13e9b5aa17b5942b4a2d

Prepare the rest of refcount btree tracepoints for use with realtime
reflink by making them take the btree cursor object as a parameter.
This will save us a lot of trouble later on.

Remove the xfs_refcount_recover_extent tracepoint since it's already
covered by other refcount tracepoints.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create specialized classes for refcount tracepoints

Source kernel commit: bb0efb0d0a2885b4c65ca31e2815da2281b99153

The only user of the "ag" tracepoint event classes is the refcount
btree, so rename them to make that obvious and make them take the btree
cursor to simplify the arguments. This will save us a lot of trouble
later on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: give refcount btree cursor error tracepoints their own class

Source kernel commit: 7cf2663ff1cfb20f5fe025122016b68920b28041

Convert all the refcount tracepoints to use the btree error tracepoint
class.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: move xfs_rmap_update_defer_add to xfs_rmap_item.c

Source kernel commit: ea7b0820d960d5a3ee72bc67cbd8b5d47c67aa4c

Move the code that adds the incore xfs_rmap_update_item deferred work
data to a transaction to live with the RUI log item code. This means
that the rmap code no longer has to know about the inner workings of the
RUI log items.

As a consequence, we can get rid of the _{get,put}_group helpers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: simplify usage of the rcur local variable in xfs_rmap_finish_one

Source kernel commit: 905af72610d90f58f994feff4ead1fc258f5d2b1

Only update rcur when we know the final *pcur value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[djwong: don't leave the caller with a dangling ref]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: don't bother calling xfs_rmap_finish_one_cleanup in xfs_rmap_finish_one

Source kernel commit: 8363b4361997044ecb99880a1a9bfdebf9145eed

In xfs_rmap_finish_one we known the cursor is non-zero when calling
xfs_rmap_finish_one_cleanup and we pass a 0 error variable. This means
xfs_rmap_finish_one_cleanup is just doing a xfs_btree_del_cursor.

Open code that and move xfs_rmap_finish_one_cleanup to
fs/xfs/xfs_rmap_item.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor porting changes]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: reuse xfs_rmap_update_cancel_item

Source kernel commit: 37f9d1db03ba0511403c5d25ba0baaddf5208ba7

Reuse xfs_rmap_update_cancel_item to put the AG/RTG and free the item in
a few places that currently open code the logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: add a ri_entry helper

Source kernel commit: f93963779b438a33ca4b13384c070a6864ce2b2b

Add a helper to translate from the item list head to the
rmap_intent_item structure and use it so shorten assignments
and avoid the need for extra local variables.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: clean up rmap log intent item tracepoint callsites

Source kernel commit: fbe8c7e167a6b226ae0234c26ebb65d8401473a5

Pass the incore rmap structure to the tracepoints instead of open-coding
the argument passing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: pass btree cursors to rmap btree tracepoints

Source kernel commit: 47492ed124219b37acf65cd931c1e45d5bc0c274

Prepare the rmap btree tracepoints for use with realtime rmap btrees by
making them take the btree cursor object as a parameter. This will save
us a lot of trouble later on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: give rmap btree cursor error tracepoints their own class

Source kernel commit: 71f5a17e526775f001f643c9d54e5b59fa29d7ac

Create a new tracepoint class for btree-related errors, then convert all
the rmap tracepoints to use it. Also fix the one tracepoint that was
abusing the old class by making it a separate tracepoint.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: move xfs_extent_free_defer_add to xfs_extfree_item.c

Source kernel commit: 84a3c1576c5aade32170fae6c61d51bd2d16010f

Move the code that adds the incore xfs_extent_free_item deferred work
data to a transaction to live with the EFI log item code. This means
that the allocator code no longer has to know about the inner workings
of the EFI log items.

As a consequence, we can get rid of the _{get,put}_group helpers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: remove xfs_defer_agfl_block

Source kernel commit: 7272f77c67c0710918e5678266f8dad6e3bfc8d2

xfs_free_extent_later can handle the extra AGFL special casing with
very little extra logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: remove duplicate asserts in xfs_defer_extent_free

Source kernel commit: 851a6781895a0f6e0ba75168dc7aecc132d13e6a

The bno/len verification is already done by the calls to
xfs_verify_rtbext / xfs_verify_fsbext, and reporting a corruption error
seem like the better handling than tripping an assert anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: reuse xfs_extent_free_cancel_item

Source kernel commit: 61665fae4e4302f2a48de56749640a9f1a4c2ec5

Reuse xfs_extent_free_cancel_item to put the AG/RTG and free the item in
a few places that currently open code the logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: add a xefi_entry helper

Source kernel commit: 649c0c2b86ee944a1a9962b310b1b97ead12e97a

Add a helper to translate from the item list head to the
xfs_extent_free_item structure and use it so shorten assignments
and avoid the need for extra local variables.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: pass the fsbno to xfs_perag_intent_get

Source kernel commit: 62d597a197e390a89eadff60b98231e91b32ab83

All callers of xfs_perag_intent_get have a fsbno and need boilerplate
code to turn that into an agno. Just pass the fsbno to
xfs_perag_intent_get and look up the agno there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: convert "skip_discard" to a proper flags bitset

Source kernel commit: 980faece91a60c279e7c24cb1d1a378bbbb74bb9

Convert the boolean to skip discard on free into a proper flags field so
that we can add more flags in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: clean up extent free log intent item tracepoint callsites

Source kernel commit: 4e0e2c0fe35b44cd4db6a138ed4316178ed60b5c

Pass the incore EFI structure to the tracepoints instead of open-coding
the argument passing. This cleans up the call sites a bit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb

Source kernel commit: ac3a0275165b4f80d9b7b516d6a8f8b308644fff

Currently, the XFS_SB_CRC_OFF macro uses the incore superblock struct
(xfs_sb) to compute the address of sb_crc within the ondisk superblock
struct (xfs_dsb). This is a landmine if we ever change the layout of
the incore superblock (as we're about to do), so redefine the macro
to use xfs_dsb to compute the layout of xfs_dsb.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: move dirent update hooks to xfs_dir2.c

Source kernel commit: 62bbf50bea21b1c76990fd1bae58a65660a11c27

Move the directory entry update hook code to xfs_dir2 so that it is
mostly consolidated with the higher level directory functions. Retain
the exports so that online fsck can still send notifications through the
hooks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create libxfs helper to rename two directory entries

Source kernel commit: 28d0d813444645689fefa232bcf88e86a5a3a746

Create a new libxfs function to rename two directory entries. The
upcoming metadata directory feature will need this to replace a metadata
inode directory entry.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create libxfs helper to exchange two directory entries

Source kernel commit: a55712b35c065eee4ab1195233a5478fb7c93efa

Create a new libxfs function to exchange two directory entries.
The upcoming metadata directory feature will need this to replace a
metadata inode directory entry.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create libxfs helper to remove an existing inode/name from a directory

Source kernel commit: 90636e4531a8bfb5ef37d38a76eb97e5f5793deb

Create a new libxfs function to remove a (name, inode) entry from a
directory. The upcoming metadata directory feature will need this to
create a metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: hoist inode free function to libxfs

Source kernel commit: 1964435d19d947b8626379d09db3e33b9669f333

Create a libxfs helper function that marks an inode free on disk.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create libxfs helper to link an existing inode into a directory

Source kernel commit: c1f0bad4232fd309b2fe849153fcf473e775b1f7

Create a new libxfs function to link an existing inode into a directory.
The upcoming metadata directory feature will need this to create a
metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create libxfs helper to link a new inode into a directory

Source kernel commit: 1fa2e81957cf11620867729fb613b121692ee0d3

Create a new libxfs function to link a newly created inode into a
directory. The upcoming metadata directory feature will need this to
create a metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: separate the icreate logic around INIT_XATTRS

Source kernel commit: b11b11e3b7a72606cfef527255a9467537bcaaa5

INIT_XATTRS is overloaded here -- it's set during the creat process when
we think that we're immediately going to set some ACL xattrs to save
time.  However, it's also used by the parent pointers code to enable the
attr fork in preparation to receive ppptr xattrs.  This results in
xfs_has_parent() branches scattered around the codebase to turn on
INIT_XATTRS.

Linkable files are created far more commonly than unlinkable temporary
files or directory tree roots, so we should centralize this logic in
xfs_inode_init.  For the three callers that don't want parent pointers
(online repiar tempfiles, unlinkable tempfiles, rootdir creation) we
provide an UNLINKABLE flag to skip attr fork initialization.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: hoist xfs_{bump,drop}link to libxfs

Source kernel commit: a9e583d34facc64b6edf3c9afb2ff4891038176d

Move xfs_bumplink and xfs_droplink to libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: hoist xfs_iunlink to libxfs

Source kernel commit: b8a6107921ca799330ff3efdd154b7fa0ff54582

Move xfs_iunlink and xfs_iunlink_remove to libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: hoist new inode initialization functions to libxfs

Source kernel commit: e9d2b35bb9d3ff372fad27998fc3969ced3f563d

Move all the code that initializes a new inode's attributes from the
icreate_args structure and the parent directory into libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: implement get_random_u32

Actually query the kernel for some random bytes instead of returning
zero, if that's possible. The most noticeable effect of this is that
mkfs will now create the rtbitmap file, the rtsummary file, and children
of the root directory with a nonzero generation. Apparently xfsdump
requires that the root directory have a generation number of zero.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: remove libxfs_dir_ialloc

This function no longer exists in the kernel, and it's not really needed
in userspace either. There are two users of it: repair and mkfs.
xfs_repair and xfs_db do not have useful cred and fsxattr structures so
they can call libxfs_dialloc and libxfs_icreate directly. For mkfs
we'll move the guts of libxfs_dir_ialloc into proto.c as a creatproto
function that handles setting user/group ids, and move struct cred to
mkfs since it's now the only user.

This gets us ready to hoist the rest of the inode initialization code to
libxfs for metadata directories.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: backport inode init code from the kernel

Reorganize the userspace inode initialization code to more closely
resemble its kernel counterpart. This is preparation to hoist the
initialization routines to libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: split new inode creation into two pieces

Source kernel commit: 38fd3d6a956f1b104f11cd6eee116c54bfe458c4

There are two parts to initializing a newly allocated inode: setting up
the incore structures, and initializing the new inode core based on the
parent inode and the current user's environment. The initialization
code is not specific to the kernel, so we would like to share that with
userspace by hoisting it to libxfs. Therefore, split xfs_icreate into
separate functions to prepare for the next few patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: split new inode creation into two pieces

Source kernel commit: 38fd3d6a956f1b104f11cd6eee116c54bfe458c4

There are two parts to initializing a newly allocated inode: setting up
the incore structures, and initializing the new inode core based on the
parent inode and the current user's environment. The initialization
code is not specific to the kernel, so we would like to share that with
userspace by hoisting it to libxfs. Therefore, split xfs_icreate into
separate functions to prepare for the next few patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: pass flags2 from parent to child when creating files

When mkfs creates a new file as a child of an existing directory, we
should propagate the flags2 field from parent to child like the kernel
does. This ensures that mkfs propagates cowextsize hints properly when
protofiles are in use.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: when creating a file in a directory, set the project id based on the parent

When we're creating a file as a child of an existing directory, use
xfs_get_initial_prid to have the child inherit the project id of the
directory if the directory has PROJINHERIT set, just like the kernel
does. This fixes mkfs project id propagation with -d projinherit=X when
protofiles are in use.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: set access time when creating files

Set the access time on files that we're creating, to match the behavior
of the kernel.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: rearrange libxfs_trans_ichgtime call when creating inodes

Rearrange the libxfs_trans_ichgtime call in libxfs_ialloc so that we
call it once with the flags we want.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: implement atime updates in xfs_trans_ichgtime

Source kernel commit: 3d1dfb6df9b7b9ffc95499b9ddd92d949e5a60d2

Enable xfs_trans_ichgtime to change the inode access time so that we can
use this function to set inode times when allocating inodes instead of
open-coding it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: pack icreate initialization parameters into a separate structure

Source kernel commit: ba4b39fe4c011078469dcd28f51447d75852d21c

Callers that want to create an inode currently pass all possible file
attribute values for the new inode into xfs_init_new_inode as ten
separate parameters.  This causes two code maintenance issues: first, we
have large multi-line call sites which programmers must read carefully
to make sure they did not accidentally invert a value.  Second, all
three file id parameters must be passed separately to the quota
functions; any discrepancy results in quota count errors.

Clean this up by creating a new icreate_args structure to hold all this
information, some helpers to initialize them properly, and make the
callers pass this structure through to the creation function, whose name
we shorten to xfs_icreate.  This eliminates the issues, enables us to
keep the inode init code in sync with userspace via libxfs, and is
needed for future metadata directory tree management.

(A subsequent cleanup will also fix the quota alloc calls.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: pack icreate initialization parameters into a separate structure

Source kernel commit: ba4b39fe4c011078469dcd28f51447d75852d21c

Callers that want to create an inode currently pass all possible file
attribute values for the new inode into xfs_init_new_inode as ten
separate parameters.  This causes two code maintenance issues: first, we
have large multi-line call sites which programmers must read carefully
to make sure they did not accidentally invert a value.  Second, all
three file id parameters must be passed separately to the quota
functions; any discrepancy results in quota count errors.

Clean this up by creating a new icreate_args structure to hold all this
information, some helpers to initialize them properly, and make the
callers pass this structure through to the creation function, whose name
we shorten to xfs_icreate.  This eliminates the issues, enables us to
keep the inode init code in sync with userspace via libxfs, and is
needed for future metadata directory tree management.

(A subsequent cleanup will also fix the quota alloc calls.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: pass IGET flags through to xfs_iread

Change the lock_flags parameter to iget_flags so that we can supply
XFS_IGET_ flags in future patches. All callers of libxfs_iget and
libxfs_trans_iget pass zero for this parameter and there are no inode
locks in xfsprogs, so there's no behavior change here.

Port the kernel's version of the xfs_inode_from_disk callsite.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: put all the inode functions in a single file

Move all the inode functions into a single source code file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: hoist project id get/set functions to libxfs

Source kernel commit: fcea5b35f36233c04003ab8b3eb081b5e20e1aa4

Move the project id get and set functions into libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: hoist inode flag conversion functions to libxfs

Source kernel commit: b7c477be396948ce88ea591b91070fa68ac12437

Hoist the inode flag conversion functions into libxfs so that we can
keep them in sync. Do this by creating a new xfs_inode_util.c file in
libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: hoist extent size helpers to libxfs

Source kernel commit: acdddbe168040372a8b6b9b5876b92b715322910

Move the extent size helpers to xfs_bmap.c in libxfs since they're used
there already.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: Remove header files which are included more than once

Source kernel commit: a330cae8a7147890262b06e1aa13db048e3b130f

Following warning is reported, so remove these duplicated header
including:

./fs/xfs/libxfs/xfs_trans_resv.c: xfs_da_format.h is included more than once.
./fs/xfs/scrub/quota_repair.c: xfs_format.h is included more than once.
./fs/xfs/xfs_handle.c: xfs_da_btree.h is included more than once.
./fs/xfs/xfs_qm_bhv.c: xfs_mount.h is included more than once.
./fs/xfs/xfs_trace.c: xfs_bmap.h is included more than once.

This is just a clean code, no logic changed.

Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs: don't walk off the end of a directory data block

Source kernel commit: 0c7fcdb6d06cdf8b19b57c17605215b06afa864a

This adds sanity checks for xfs_dir2_data_unused and xfs_dir2_data_entry
to make sure don't stray beyond valid memory region. Before patching, the
loop simply checks that the start offset of the dup and dep is within the
range. So in a crafted image, if last entry is xfs_dir2_data_unused, we
can change dup->length to dup->length-1 and leave 1 byte of space. In the
next traversal, this space will be considered as dup or dep. We may
encounter an out of bound read when accessing the fixed members.

In the patch, we make sure that the remaining bytes large enough to hold
an unused entry before accessing xfs_dir2_data_unused and
xfs_dir2_data_unused is XFS_DIR2_DATA_ALIGN byte aligned. We also make
sure that the remaining bytes large enough to hold a dirent with a
single-byte name before accessing xfs_dir2_data_entry.

Signed-off-by: lei lu <llfamsec@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs: avoid redundant AGFL buffer invalidation

Source kernel commit: d40c2865bdbbbba6418436b0a877daebe1d7c63e

Currently AGFL blocks can be filled from the following three sources:
- allocbt free blocks, as in xfs_allocbt_free_block();
- rmapbt free blocks, as in xfs_rmapbt_free_block();
- refilled from freespace btrees, as in xfs_alloc_fix_freelist().

Originally, allocbt free blocks would be marked as stale only when they
put back in the general free space pool as Dave mentioned on IRC, "we
don't stale AGF metadata btree blocks when they are returned to the
AGFL .. but once they get put back in the general free space pool, we
have to make sure the buffers are marked stale as the next user of
those blocks might be user data...."

However, after commit ca250b1b3d71 ("xfs: invalidate allocbt blocks
moved to the free list") and commit edfd9dd54921 ("xfs: move buffer
invalidation to xfs_btree_free_block"), even allocbt / bmapbt free
blocks will be invalidated immediately since they may fail to pass
V5 format validation on writeback even writeback to free space would be
safe.

IOWs, IMHO currently there is actually no difference of free blocks
between AGFL freespace pool and the general free space pool.  So let's
avoid extra redundant AGFL buffer invalidation, since otherwise we're
currently facing unnecessary xfs_log_force() due to xfs_trans_binval()
again on buffers already marked as stale before as below:

[  333.507469] Call Trace:
[  333.507862]  xfs_buf_find+0x371/0x6a0       <- xfs_buf_lock
[  333.508451]  xfs_buf_get_map+0x3f/0x230
[  333.509062]  xfs_trans_get_buf_map+0x11a/0x280
[  333.509751]  xfs_free_agfl_block+0xa1/0xd0
[  333.510403]  xfs_agfl_free_finish_item+0x16e/0x1d0
[  333.511157]  xfs_defer_finish_noroll+0x1ef/0x5c0
[  333.511871]  xfs_defer_finish+0xc/0xa0
[  333.512471]  xfs_itruncate_extents_flags+0x18a/0x5e0
[  333.513253]  xfs_inactive_truncate+0xb8/0x130
[  333.513930]  xfs_inactive+0x223/0x270

xfs_log_force() will take tens of milliseconds with AGF buffer locked.
It becomes an unnecessary long latency especially on our PMEM devices
with FSDAX enabled and fsops like xfs_reflink_find_shared() at the same
time are stuck due to the same AGF lock.  Removing the double
invalidation on the AGFL blocks does not make this issue go away, but
this patch fixes for our workloads in reality and it should also work
by the code analysis.

Note that I'm not sure I need to remove another redundant one in
xfs_alloc_ag_vextent_small() since it's unrelated to our workloads.
Also fstests are passed with this patch.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

xfs_io: add RWF_ATOMIC support to pwrite

Enable testing write behavior with the per-io RWF_ATOMIC flag.

Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>

libfrog: emulate deprecated attrlist functionality in libattr

Break our dependence on the (deprecated) libattr by emulating the
necessary pieces in libfrog. It's a little sketchy to open-code these
symbols, but they're originally from XFS so we trust ourselves.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

misc: clean up code around attr_list_by_handle calls

Reduce stack usage of the attr_list_by_handle calls by allocating the
buffers dynamically. Remove some redundant bits while we're at it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

debian: Correct the day-of-week on 2024-09-04

Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

debian: Modernize build script

- Use autoreconf template, this will now properly installs the
translation files
- Use $(CURDIR) to replace `pwd` syntax

Link: https://bugs.launchpad.net/ubuntu/+source/xfsprogs/+bug/2076309
Suggested-by: Zixing Liu <zixing.liu@canonical.com>
Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

debian: Add Build-Depends on pkg with systemd.pc

The detection for the systemd unit installation searches for systemd.pc.
This file was moved after the bookworm release, so we depend on two
alternative packages.

Fixes: 45cc055588 ("debian: enable xfs_scrub_all systemd timer services by default")
Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

debian: Prevent recreating the orig tarball

Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

debian: Update public release key

New key material, used for some releases in the Debian archive:

pub   ed25519 2022-05-27 [C]
      4020459E58C1A52511F5399113F703E6C11CF6F0
uid           Carlos Eduardo Maiolino <carlos@maiolino.me>
uid           Carlos Eduardo Maiolino <cem@kernel.org>
uid           Carlos Eduardo Maiolino <cmaiolino@redhat.com>
sub   ed25519 2022-05-27 [A]
sub   ed25519 2022-05-27 [S]
sub   nistp384 2024-02-15 [A]
sub   nistp384 2024-02-15 [S]
sub   nistp384 2024-02-15 [E]
sub   cv25519 2022-05-27 [E]

Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

debian: Update debhelper-compat level

debhelper-compat-upgrade-checklist.7 discourages using level 11.
Update to compat level 12, which is supported back to buster.

Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

fsck.xfs: fix fsck.xfs run by different shells when fsck.mode=force is set

When fsck.mode=force is specified in the kernel command line, fsck.xfs
is executed during the boot process. However, when the default shell is
not bash, $PS1 should be a different value, consider the following script:
cat ps1.sh
echo "$PS1"

run ps1.sh with different shells:
ash ./ps1.sh
$
bash ./ps1.sh

dash ./ps1.sh
$
ksh ./ps1.sh

zsh ./ps1.sh

On systems like Ubuntu, where dash is the default shell during the boot
process to improve startup speed. This results in FORCE being incorrectly
set to false and then xfs_repair is not invoked:
if [ -n "$PS1" -o -t 0 ]; then
FORCE=false
fi

Other distros may encounter this issue too if the default shell is set
to anoother shell.

Check "-t 0" is enough to determine if we are in interactive mode, and
xfs_repair is invoked as expected regardless of the shell used.

Fixes: 04a2d5dc ("fsck.xfs: allow forced repairs using xfs_repair")
Signed-off-by: Gerald Yang <gerald.yang@canonical.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

libxfs: provide a memfd_create() wrapper if not present in libc

Commit 1cb2e387 "libxfs: add xfile support" introduced
a use of the memfd_create() system call, included in version
xfsprogs v6.9.0.

This system call was introduced in kernel commit [1], first included
in kernel v3.17 (released on 2014-10-05).

The memfd_create() glibc wrapper function was added much later in
commit [2], first included in glibc version 2.27 (released on
2018-02-01).

This direct use memfd_create() introduced a requirement on
Kernel >= 3.17 and glibc >= 2.27.

There is old toolchains like [3] for example (which ships gcc 7.3.1,
glibc 2.25 and includes kernel v4.10 headers), that can still be used
to build newer kernels. Even if such toolchains can be seen as
outdated, they are still claimed as supported by recent kernel.
For example, Kernel v6.10.5 has a requirement on gcc version 5.1 and
greater. See [4].

When compiling xfsprogs v6.9.0 with a toolchain not providing the
memfd_create() syscall wrapper, the compilation fail with message:

    xfile.c: In function 'xfile_create_fd':
    xfile.c:56:7: warning: implicit declaration of function 'memfd_create'; did you mean 'timer_create'? [-Wimplicit-function-declaration]
      fd = memfd_create(description, MFD_CLOEXEC | MFD_NOEXEC_SEAL);
           ^~~~~~~~~~~~

    ../libxfs/.libs/libxfs.a(xfile.o): In function 'xfile_create_fd':
    /build/xfsprogs-6.9.0/libxfs/xfile.c:56: undefined reference to 'memfd_create'

In order to let xfsprogs compile in a wider range of configurations,
this commit adds a memfd_create() function check in autoconf configure
script, and adds a system call wrapper which will be used if the
function is not available. With this commit, the environment
requirement is relaxed to only kernel >= v3.17.

Note: this issue was found in xfsprogs integration in Buildroot [5]
using the command "utils/test-pkg -a -p xfsprogs", which tests many
toolchain/arch combinations.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9183df25fe7b194563db3fec6dc3202a5855839c
[2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=59d2cbb1fe4b8601d5cbd359c3806973eab6c62d
[3] https://releases.linaro.org/components/toolchain/binaries/7.3-2018.05/aarch64-linux-gnu/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu.tar.xz
[4] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/process/changes.rst?h=v6.10.5#n32
[5] https://buildroot.org/

Signed-off-by: Julien Olivain <ju.o@free.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_io: Fix fscrypt macros ordering

We've been reported a failure to build xfsprogs within buildroot's
environment when they tried to upgrade xfsprogs from 6.4 to 6.9:

encrypt.c:53:36: error: 'FSCRYPT_KEY_IDENTIFIER_SIZE' undeclared
here (not in a function)
        53 |         __u8
master_key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE];
           |
^~~~~~~~~~~~~~~~~~~~~~~~~~~
     encrypt.c:61:42: error: field 'v1' has incomplete type
        61 |                 struct fscrypt_policy_v1 v1;
           |                                          ^~

They were using a kernel version without FS_IOC_GET_ENCRYPTION_POLICY_EX
set and OVERRIDE_SYSTEM_FSCRYPT_POLICY_V2 was unset.
This combination caused xfsprogs to attempt to define fscrypt_policy_v2
locally, which uses:
__u8 master_key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE];

The problem is FSCRYPT_KEY_IDENTIFIER_SIZE is only after this block of
code, so we need to define it earlier.

This also attempts to use fscrypt_policy_v1, which is defined only
later.

To fix this, just reorder both ifdef blocks, but we need to move the
definition of FS_IOC_GET_ENCRYPTION_POLICY_EX to the later, otherwise,
the later definitions won't be enabled causing havoc.

Fixes: e97caf714697a ("xfs_io/encrypt: support specifying crypto data unit size")
Reported-by: Waldemar Brodkorb <wbx@openadk.org>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Tested-by: Waldemar Brodkorb <wbx@openadk.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>

man: Update unit for fsx_extsize and fsx_cowextsize

The values in fsx_extsize and fsx_cowextsize are in units of bytes, and not
filesystem blocks, so update.

In addition, the default cowextsize is 32 filesystem blocks, not 128, so
fix that as well.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

xfs_db: release ip resource before returning from get_next_unlinked()

Fix potential memory leak in function get_next_unlinked(). Call
libxfs_irele(ip) before exiting.

Details:
Error: RESOURCE_LEAK (CWE-772):
xfsprogs-6.5.0/db/iunlink.c:51:2: alloc_arg: "libxfs_iget" allocates memory that is stored into "ip".
xfsprogs-6.5.0/db/iunlink.c:68:2: noescape: Resource "&ip->i_imap" is not freed or pointed-to in "libxfs_imap_to_bp".
xfsprogs-6.5.0/db/iunlink.c:76:2: leaked_storage: Variable "ip" going out of scope leaks the storage it points to.
#   74|    libxfs_buf_relse(ino_bp);
#   75|
#   76|-> return ret;
#   77|   bad:
#   78|    dbprintf(_("AG %u agino %u: %s\n"), agno, agino, strerror(error));

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

libxfs: dirty buffers should be marked uptodate too

I started fuzz-testing the realtime rmap feature with a very large
number of realtime allocation groups.  There were so many rt groups that
repair had to rebuild /realtime in the metadata directory tree, and that
directory was big enough to spur the creation of a block format
directory.

Unfortunately, repair then walks both directory trees to look for
unconnceted files.  This part of phase 6 emits CRC errors on the newly
created buffers for the /realtime directory, declares the directory to
be garbage, and moves all the rt rmap inodes to /lost+found, resulting
in a corrupt fs.

Poking around in gdb, I noticed that the buffer contents were indeed
zero, and that UPTODATE was not set.  This was very strange, until I
added a watch on bp->b_flags to watch for accesses.  It turns out that
xfs_repair's prefetch code will _get a buffer and zero the contents if
UPTODATE is not set.

The directory tree code in libxfs will also _get a buffer, initialize
it, and log it to the coordinating transaction, which in this case is
the transactions used to reconnect the rmap btree inodes to /realtime.
At no point does any of that code ever set UPTODATE on the buffer, which
is why prefetch zaps the contents.

Hence change both buffer dirtying functions to set UPTODATE, since a
dirty buffer is by definition at least as recent as whatever's on disk.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfsprogs: Release v6.10.1

Update all the necessary files for a v6.10.1 release.

Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: fix C++ compilation errors in xfs_fs.h

Source kernel commit: 64dfa18d6e322034c8a30b080f4c380a0b20bb7f

Several people reported C++ compilation errors due to things that C
compilers allow but C++ compilers do not. Fix both of these problems,
and hope there aren't more of these brown paper bags in 2 months when we
finally get these fixes through the process into a released xfsprogs.

NOTE: I am submitting this bugfix over the objections of a former
maintainer, who insists that we should remove this function from the
published userspace ABI instead of fixing the C++ compilation errors.
No deprecation period, no discussion, just a hard drop of an already
provided and correct C function, which would be in contravention of
Linus' rules. IOWs, removing ABI that have already shipped in a
released kernel requires a careful deprecation period, so I will let
that maintainer run that process.

Reported-by: kernel@mattwhitlock.name
Reported-by: sam@gentoo.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219203
Fixes: 233f4e12bbb2c ("xfs: add parent pointer ioctls")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfsprogs: Release v6.10.0

Update all the necessary files for a v6.10.0 release.

Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'debian-autofsck-6.10_2024-08-08' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

debian: enable xfs_scrub_all by default [v30.11 2/2]

Update our packaging to enable the background xfs_scrub timer by default.
This won't do much unless the sysadmin sets the autofsck fs property or
formats a filesystem with backref metadata.

This has been running on the djcloud for months with no problems. Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Merge tag 'autofsck-6.10_2024-08-08' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

xfs_scrub: admin control of automatic fsck [v30.11 1/2]

Now that we have the ability to set per-filesystem properties, teach the
background xfs_scrub service to pick up advice from the filesystem that it
wants to examine, and pick a mode from that. We're only going to enable this
by default for newer filesystems.

This has been running on the djcloud for months with no problems. Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>

debian: enable xfs_scrub_all systemd timer services by default

Now that we're finished building online fsck, enable the periodic
background scrub service by default.  This involves the postinst script
starting the resource management slice and the timer.

No other sub-services need to be enabled or unmasked explicitly.  They
also shouldn't be started or restarted because that might interrupt
background operation unnecessarily.

Although the xfs_scrub_all timer is activated by default, the individual
xfs_scrub@ services that it spawns will only do real work on filesystems
that are new enough to have back reference metadata available.  This
avoids surprises for people who are upgrading Debian; only new installs
with new mkfs will get any automatic fsck behavior.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs: set autofsck filesystem property

Add a new mkfs option -m autofsck so that sysadmins can control the
background scrubbing behavior of filesystems from the start.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: use the autofsck fsproperty to select mode

Now that we can set properties on xfs filesystems, make the xfs_scrub
background service query the autofsck property to figure out which
operating mode it should use.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: allow sysadmin to control background scrubs

Add an extended option -o autofsck to xfs_scrub so that it selects the
operation mode from the "autofsck" filesystem property.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libfrog: define a autofsck filesystem property

Now that we have the ability to set properties on filesystems, create an
"autofsck" property so that sysadmins can control background xfs_scrub
behaviors.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_property: add a new tool to administer fs properties

Create a tool to list, get, set, and remove filesystem properties.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>