www.infradead.org Git - users/hch/xfsprogs.git/log

xfs_quota: fix typo in manpage

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: check ino alignment value to avoid mod by zero

xfs_repair checks inode records for valid alignment according to the
alignment specified in the superblock. It currently performs the
alignment check whenever fs_aligned_inodes is set, which is determined
based on whether the fs supports the field.

Support for the field does not guarantee its value is non-zero, however.
For example, a large block size fs on a large page size arch (e.g.,
ppc64):

mkfs.xfs -f -m crc=1,finobt=1 -b size=64k <dev>

... can lead to incorrect badly aligned inode record messages from
xfs_repair and other problems.

Update the inobt and finobt checks to verify that the alignment is a
non-zero value before using it as the divisor of a modulo operation.
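
A minimal sketch of the guarded check, assuming a hypothetical helper
ino_rec_misaligned() that takes the record's starting AG block and the
superblock alignment (in blocks). This is not the actual repair code,
which also gates the check on fs_aligned_inodes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether an inode record starting at
 * agbno violates the superblock inode alignment. An alignment of zero
 * means there is nothing to check, so it is never used as a divisor.
 */
static bool ino_rec_misaligned(uint32_t agbno, uint32_t ino_align)
{
	if (ino_align == 0)
		return false;		/* avoids agbno % 0 */
	return (agbno % ino_align) != 0;
}
```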

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: fix v5 sb ino alignment calculation for large blocksizes

xfs_repair validates the superblock inode alignment field against
several possible valid values. On v5 superblocks, the inode alignment
can be scaled up based on the inode size in relation to the minimum
inode size.

If the block size is larger than the default cluster size (consider
large page size arches such as ppc64), the initial alignment value
calculates to zero. If the inode size is large enough such that
sb_inoalignmt is not zero, sb_validate_ino_align() scales the align
value by the factor of inode size increase. If align is zero, however,
the product is also zero, so the subsequent check incorrectly fails and the
overall superblock verification fails as well. To reproduce, format an
fs as follows on ppc64 and run xfs_repair:

mkfs.xfs -f -m crc=1 -b size=64k -i size=2k <dev>

Fix the scaled alignment calculation by scaling the default cluster size
appropriately to avoid a multiplication by zero.
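
To illustrate the ordering problem, a sketch under assumed values for
the constants (8192 bytes for the big inode cluster size, 256 bytes for
the minimum inode size). The helper names are hypothetical and the real
sb_validate_ino_align() may differ in detail.

```c
#include <stdint.h>

/* Assumed values of the XFS constants involved (bytes). */
#define BIG_CLUSTER_SIZE	8192u	/* stands in for XFS_INODE_BIG_CLUSTER_SIZE */
#define DINODE_MIN_SIZE		256u	/* stands in for XFS_DINODE_MIN_SIZE */

/* Old order: convert to blocks first, then scale; 8192 >> 16 is 0. */
static uint32_t scaled_align_buggy(unsigned blocklog, uint32_t inodesize)
{
	uint32_t align = BIG_CLUSTER_SIZE >> blocklog;

	return align * (inodesize / DINODE_MIN_SIZE);
}

/* Fixed order: scale the cluster size in bytes, then convert to blocks. */
static uint32_t scaled_align_fixed(unsigned blocklog, uint32_t inodesize)
{
	return (BIG_CLUSTER_SIZE * (inodesize / DINODE_MIN_SIZE)) >> blocklog;
}
```

With 64k blocks (blocklog 16) and 2k inodes, the old ordering yields 0
while the fixed one yields 1; with 4k blocks and 512-byte inodes both
yield 4.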

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: report VERSION in libxfs_fs_repair_cmn_err()

Because this is usually the first question asked...

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: add finsert command for insert range

Add finsert command for fallocate FALLOC_FL_INSERT_RANGE flag.
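
For reference, the underlying call is fallocate(2) with
FALLOC_FL_INSERT_RANGE, which shifts the data at the given offset up by
len bytes and leaves a hole in its place. A minimal sketch of what such
a command wraps; insert_range() is a hypothetical helper, not xfs_io's
code, and the fallback flag value is only used if the headers lack it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for fallocate(2) */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>		/* off_t, ftruncate(), close() */

#ifndef FALLOC_FL_INSERT_RANGE
#define FALLOC_FL_INSERT_RANGE	0x20	/* value from <linux/falloc.h> */
#endif

/*
 * Insert a len-byte hole at offset, growing the file by len.
 * Returns 0 or an errno value; the filesystem must support the
 * operation and offset/len must be filesystem-block aligned.
 */
static int insert_range(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len) < 0)
		return errno;
	return 0;
}
```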

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: remove unused strided secondary sb scan logic

verify_set_primary_sb() scans and verifies secondary superblocks based
on the primary sb. It currently defines a maximum number of 8
superblocks to scan per iteration. It also implements a strided
algorithm that appears intended to ultimately scan every secondary,
albeit in a strided order.

Given that the algorithm is written to hit every sb and the stride value
is initialized as follows:

num_sbs = MIN(NUM_SBS, rsb->sb_agcount);
skip = howmany(num_sbs, rsb->sb_agcount);

... which is guaranteed to be 1 since the howmany() parameters are
backwards, the strided algorithm doesn't appear to accomplish anything
that can't be done with a simple for loop. In other words, despite the
max value and strided algorithm, repair always scans all of the
secondary superblocks in incremental order.

Therefore, remove the strided algorithm bits and replace with a simple
for loop. As a result of this cleanup, also remove the 'checked' buffer
used to track repeated ag visits and the now unused NUM_SBS definition.
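
The claim that the stride is always 1 is easy to check. A sketch,
assuming the usual ceiling-division definition of howmany():

```c
#ifndef howmany
#define howmany(x, y)	(((x) + ((y) - 1)) / (y))	/* ceil(x / y) */
#endif
#ifndef MIN
#define MIN(a, b)	((a) < (b) ? (a) : (b))
#endif
#define NUM_SBS		8

/* The stride as it was computed, with the howmany() arguments reversed. */
static int old_skip(int agcount)
{
	int num_sbs = MIN(NUM_SBS, agcount);

	return howmany(num_sbs, agcount);
}
```

Since 1 <= num_sbs <= agcount, the numerator is at least agcount and
below 2 * agcount, so the quotient is always 1. The intended order,
howmany(agcount, num_sbs), would have given 2 for 16 AGs.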

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: fix unnecessary secondary scan if only last sb is corrupt

verify_set_primary_sb() scans the secondary superblocks based on the
geometry specified in the primary and determines the most likely correct
geometry by tracking how many superblocks are consistent across the set.
The most frequent geometry is copied into the primary superblock. The
return value is checked by the caller (phase1()) to determine whether a
brute force secondary scan is necessary.

This generally occurs when not enough secondary sb's are consistent to
declare the geometry correct. If enough secondaries are consistent,
verify_set_primary_sb() returns the status of the last secondary sb that
was scanned. Corruptions to secondary supers other than the last are
thus resolved fine. If the last secondary is corrupt, however, an error
is returned to phase1(). This causes a brute force scan even if enough
supers were found to repair the last secondary.

Move the initialization of retval to after the sb scan to return an
error only if not enough secondary supers were found to declare a
correct geometry.
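
A hypothetical sketch of the intended behaviour: tally matching
geometries over the whole scan and derive the return value only
afterwards, from how many secondaries agree. Geometry is reduced to an
integer ID and the `required` threshold is an assumption for
illustration.

```c
/*
 * geom[] holds one geometry ID per secondary scanned; equal IDs mean
 * matching geometry. Picks the most frequent one (first seen wins ties).
 */
static int pick_geometry(const int *geom, int nsbs, int required, int *best)
{
	int i, j, best_count = 0;

	for (i = 0; i < nsbs; i++) {
		int count = 0;

		for (j = 0; j < nsbs; j++)
			count += geom[j] == geom[i];
		if (count > best_count) {
			best_count = count;
			*best = geom[i];
		}
	}

	/* The result is decided here, after the scan, not by the last sb. */
	return best_count >= required ? 0 : -1;
}
```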

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: fix max block offset test

Eryu pointed out that in fstest xfs/071, we find corruption
reported at the end. This test attempts to do IO at the
maximum possible offsets, and repair yields:

inode 1027 - extent offset too large - start 70, count 1, offset 2251799813685247
correcting nextents for inode 1027
bad data fork in inode 1027
would have cleared inode 1027

Repair is complaining that an extent *starts* at the maximum
block, but AFAICT, starting there is just fine, as long as
we also end there; i.e., a one-block extent at the limit
is just fine.

So change the xfs_repair test to allow this situation.
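
In other words, the test should look at the extent's last block, not
only its first. A sketch with a hypothetical helper, where max_off is
taken to be the highest valid file block:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An extent covers file blocks [startoff, startoff + count - 1] and is
 * valid as long as its last block does not pass max_off, so a one-block
 * extent starting (and ending) at max_off is allowed. Assumes count >= 1.
 */
static bool extent_offset_ok(uint64_t startoff, uint64_t count, uint64_t max_off)
{
	if (startoff > max_off)
		return false;
	return count - 1 <= max_off - startoff;	/* end check without overflow */
}
```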

Also, the warning text is a bit unclear, mixing in the physical
block w/ the logical block... rearrange that a little to make
it obvious.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: do not check symlink component lengths

As reported by Andy Grimm,

# ln -s $( python -c 'print("a" * 260)' ) /mnt/foo

will succeed on xfs, but then xfs_repair will complain:

component of symlink in inode 131 too long
problem with symbolic link in inode 131
would have cleared inode 131

The kernel checks the total length of the symlink on both read
and write, but does not look at component paths.

Looking around the kernel, no other filesystem checks component
lengths, nor does the vfs. And as Andy points out, the target
could even be on a different filesystem, with different limitations.

And having a "too-long" component doesn't even seem like something
likely to stem from disk corruption anyway, so I'm not sure why repair
should care.

Therefore I propose removing the component length checks from xfs_repair.
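
A sketch of what remains after the change, assuming a 1024-byte limit on
the whole target (the value is an assumption here, standing in for the
kernel's total-length check); the helper name is hypothetical:

```c
#include <stdbool.h>
#include <string.h>

#define SYMLINK_MAXLEN	1024	/* assumed total-length limit, bytes */

/* Validate only the total target length; components are not checked. */
static bool symlink_target_ok(const char *target)
{
	size_t len = strlen(target);

	return len > 0 && len <= SYMLINK_MAXLEN;
}
```

A 260-character single-component target like the one above passes;
only the total length matters.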

Reported-by: Andy Grimm <agrimm@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: v3.2.2 release

Update all the version and changelog files for release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: fix crash on zero record finobt reconstruction

The inode btrees are reconstructed in phase 5. init_ino_cursor() helps
determine the block requirements of the tree based on the number of
records. If the finobt is empty, we can crash in the btree blocks
calculation code due to a divide-by-zero error in the following line:

lptr->modulo = num_recs % lptr->num_blocks;

This occurs if num_recs, and in turn lptr->num_blocks, evaluate to zero.

We already have an execution path for the zero record btree scenario.
However, it is only invoked when no records are found in the in-core
tree. The finobt zero-record scenario can occur with a populated in-core
tree provided that none of the existing records contain free inodes.

Move the zero-record handling code after the loop and use the record
count to trigger it. This is safe because the loop iterator checks for
ino_rec != NULL. This allows reuse of the same code regardless of
whether the in-core tree is empty or non-empty but contains no records
that meet the requirements for the particular on-disk tree under
reconstruction (e.g., finobt).
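
A hypothetical sketch of the reordered logic: count only the records
that belong in the tree being rebuilt, then branch on that count after
the loop, so num_blocks is never zero when used as a divisor. The
structures are simplified stand-ins for the repair code.

```c
#include <stddef.h>

struct ino_rec {
	int		ir_freecount;
	struct ino_rec	*next;
};

struct cursor {
	int num_recs;
	int num_blocks;
	int modulo;
};

static void init_cursor(struct cursor *c, const struct ino_rec *head,
			int finobt, int maxrecs)
{
	const struct ino_rec *rec;
	int num_recs = 0;

	/* finobt only takes records that still have free inodes */
	for (rec = head; rec != NULL; rec = rec->next)
		if (!finobt || rec->ir_freecount > 0)
			num_recs++;

	c->num_recs = num_recs;
	if (num_recs == 0) {
		/* empty tree, whether or not the in-core tree was empty */
		c->num_blocks = 1;
		c->modulo = 0;
		return;
	}
	c->num_blocks = (num_recs + maxrecs - 1) / maxrecs;
	c->modulo = num_recs % c->num_blocks;
}
```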

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_copy: simplify first_agbno calculation

After ffe9a9a xfsprogs: xfs_copy: fix data corruption of target,
xfs_copy started hitting an ASSERT for a 4k sector / 4k blocksize
filesystem:

# dd if=/dev/zero of=test.img bs=1M count=1024
# mkfs.xfs -s size=4096 test.img
# xfs_copy test.img xfs.img
xfs_copy: xfs_copy.c:720: main: Assertion `((((((xfs_daddr_t)(3 << (mp)->m_sectbb_log)) + 1) * (1<<9)) + first_residue) % source_blocksize) == 0' failed.
Aborted

I started digging through all the calculations below, and realized
that in the end, all it wants is the first filesystem block after
the AG header. XFS_AGFL_BLOCK(mp) + 1 suffices for this purpose;
rip out the rest which seems overly complex and apparently bug-prone.
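
A sketch of what XFS_AGFL_BLOCK(mp) + 1 works out to, using a
hypothetical helper that takes log2 of the sector and block sizes. It
assumes the AG header sectors are laid out SB, AGF, AGI, AGFL, one
sector each:

```c
#include <stdint.h>

static uint32_t first_agbno(unsigned sectlog, unsigned blocklog)
{
	uint64_t agfl_byte = (uint64_t)3 << sectlog;	/* AGFL is sector 3 */
	uint32_t agfl_block = (uint32_t)(agfl_byte >> blocklog);	/* round down */

	return agfl_block + 1;		/* first block after the AG header */
}
```

For 512-byte sectors and 4k blocks this gives block 1; for 4k sectors
and 4k blocks (the failing case above) it gives block 4.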

I tested this by creating a 4g filesystem with combinations of
sector & block size between 512 and 4k, copying in /lib/modules,
running an xfs_copy of that, and running repair against the copy;
it all looks good. It took a long time, but I will create a
simpler/shorter xfstest based on this.

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

packaging: update deb changelog for upcoming release

Add list of resolved issues from merged patches into the
debian/changelog for processing with the next release.

Signed-off-by: Nathan Scott <nathans@debian.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

packaging: clarify xfs header licensing within deb builds

Tackles bug #751511 - ensuring the licensing information in
the debian/copyright file matches reality. Use explanation
that Christoph Hellwig came up with, pretty much verbatim.

Signed-off-by: Nathan Scott <nathans@debian.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>

packaging: rework dh_autoreconf invocation for deb builds

Reviewed, tested and merged the final iteration of proposed
solutions to #757455, which originally addressed configure-script
generation for clean ppc64le builds. Many thanks to Andreas
Barth and Matthias Klose for coming up with this solution.

Signed-off-by: Nathan Scott <nathans@debian.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: AGFL rebuild fails if btree split required

In phase 5 we rebuild the freelist btrees, the AGF and the AGFL from
all the free space information we have gathered and resolved during
phases 3 and 4. If the freespace information is laid out just right,
we end up having to allocate free space for the AGFL blocks.

If the size of the free space we allocate from is larger than the
space we need, then we have to insert the remainder back into the
freespace btree. For the by-size tree, this means we are likely to
be removing a record from one leaf, and then inserting the remainder
- a smaller size - into another leaf.

The issue is that the leaf blocks to the left of the original leaf
block we removed the extent record from are full and hence require a
split to insert the new record. That, of course, requires a free
block in the AGFL to allocate from, and now we have a chicken and
egg situation: there are no free blocks in the AGFL because we are
setting it up.

As a result, setting up the free list silently fails, leaving the
freespace btrees in an inconsistent state and the AGFL in question
empty. When the filesystem is next mounted, the first allocation
from that AGF results in attempting to fix the AGFL, and it then
does exactly the same thing as the repair code, fails to allocate a
block during the split and fails. This results in an immediate
shutdown because the transaction doing the allocation is dirty by
this stage.

The fix for the problem is to make repair handle rebuilding the btree
differently. If we leave space for a couple of records in each
btree leaf and node, there is never a need for a split to occur when
initially setting up the AGFL. This results in repair doing the
right thing, and hence the runtime problems after mount don't occur.
Further, add error checking to the AGFL setup code and abort repair
if we have a failure to correctly set up the AGFL so we catch this
problem at repair time, not mount time...
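
A hypothetical sketch of the sizing side of this change: pack each new
leaf with `slack` fewer records than it can hold. The names and the
choice of two spare slots are illustrative:

```c
/* Number of leaf blocks needed when each leaf keeps `slack` free slots. */
static int leaf_blocks_needed(int num_recs, int maxrecs, int slack)
{
	int per_block = maxrecs - slack;

	if (per_block < 1)
		per_block = 1;
	return (num_recs + per_block - 1) / per_block;
}
```

With 100 records per leaf and two spare slots, 196 records fit in two
leaves but 197 need a third, and each leaf can absorb two inserts
without splitting.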

Reported-by: Barkley Vowk <bvowk@box.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: fix XR_BLD_FREE_TRACE compilation errors

Obviously hasn't been used for quite some time, so fix the
build problems and make it useful again.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs: don't warn about log sunit size if it was auto-discovered

Today, users doing a bare mkfs on storage with a large default
stripe size may be surprised to get this warning:

log stripe unit (%d bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB

through no fault of their own. The fallback is appropriate
and harmless, and there's no need to warn about this in the
defaults case.

However, we keep the warning if a large log stripe unit was
specified by the user on the commandline.
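
A hypothetical sketch of the rule, using the 256KiB limit and 32KiB
fallback quoted in the message above; the helper and its signature are
illustrative only:

```c
#include <stdbool.h>

#define KIB			1024
#define LOG_SUNIT_MAX		(256 * KIB)
#define LOG_SUNIT_FALLBACK	(32 * KIB)

/* Clamp an oversized log stripe unit; warn only if the user asked for it. */
static int clamp_log_sunit(int lsu_bytes, bool user_specified, bool *warn)
{
	*warn = false;
	if (lsu_bytes <= LOG_SUNIT_MAX)
		return lsu_bytes;
	*warn = user_specified;
	return LOG_SUNIT_FALLBACK;
}
```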

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs: ignore stripe geom if sunit or swidth == physical sector size

Today, this geometry:

# modprobe scsi_debug opt_blks=2048 dev_size_mb=2048
# blockdev --getpbsz --getss --getiomin --getioopt /dev/sdd
512
512
512
1048576

will result in a warning at mkfs time, like this:

# mkfs.xfs -f -d su=64k,sw=12 -l su=64k /dev/sdd
mkfs.xfs: Specified data stripe width 1536 is not the same as the volume stripe width 2048

because our geometry discovery thinks it looks like a
valid striping setup which the commandline is overriding.
However, a stripe unit of 512 really isn't indicative of
a proper stripe geometry.

Prior to this patch, we reset only sunit *or* swidth,
if either was equal to physical block size, but not
necessarily both.

Change the heuristic so that if either the discovered
sunit or the discovered swidth is physical block size,
we reset *both* to zero and ignore the geom completely.
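
A hypothetical sketch of the new heuristic, assuming all three values
are in bytes:

```c
/* If either value equals the physical sector size, drop both. */
static void sanitize_stripe_geom(int *sunit, int *swidth, int psectorsize)
{
	if (*sunit == psectorsize || *swidth == psectorsize) {
		*sunit = 0;
		*swidth = 0;
	}
}
```

With the scsi_debug geometry above (512-byte sunit, 1MiB swidth,
512-byte sectors) both values are discarded; a genuine 64k/768k
geometry is left alone.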

While we're at it, don't pass &dummy in for multiple
arguments to blkid_get_topology(); that'll mean that
inside the function, the last assignment wins, and could
lead to unexpected results.

Reported-by: Stan Hoeppner <stan@hardwarefreak.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: add sync and syncfs commands

There's no easy way to invoke syncfs from the commandline,
as far as I know, so add it to xfs_io to be handy.

Add sync while we're at it, just for completeness.
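
For context, syncfs(2) takes a file descriptor and flushes only the
filesystem it lives on, while sync(2) takes no arguments and flushes
everything. A minimal sketch of the former, with a hypothetical helper
name (not xfs_io's implementation):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syncfs(2) */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Flush the filesystem containing path; returns 0 or an errno value. */
static int syncfs_path(const char *path)
{
	int fd = open(path, O_RDONLY);
	int err = 0;

	if (fd < 0)
		return errno;
	if (syncfs(fd) < 0)
		err = errno;
	close(fd);
	return err;
}
```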

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: two more completely harmless sparse nits

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: fix harmless sparse endian nit

h_crc is __le32 but cpu_to_be32() is... __be32. So sparse
complains, even though it's harmless.

Although sparse is smart about bare 0s, and we could
drop the swap, other places explicitly swap to keep
things clear (I guess?) so "swap" the 0 with the proper
routine.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: fix endian mishap in xfs_dialloc_ag()

Fixes a regression introduced by:

88fc730 xfs: use and update the finobt on inode allocation

which passed the non-swapped version of agi->agi_newino to
xfs_inobt_lookup()
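
A minimal illustration of the rule being enforced, with a hypothetical
struct standing in for the AGI (where agi_newino is a __be32, which is
how sparse can flag the mistake): convert on-disk values with be32toh()
or an equivalent before using them as numbers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for be32toh() */
#endif
#include <endian.h>
#include <stdint.h>
#include <string.h>

struct disk_agi {
	uint32_t agi_newino;	/* stored big-endian on disk */
};

static uint32_t agi_newino_cpu(const struct disk_agi *agi)
{
	return be32toh(agi->agi_newino);	/* swap before use */
}
```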

Caught by make C=2, ftw!

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsctl.3: fix XFS_IOC_FSSETXATTR fields

The xfsctl manual page fails to mention that fsx_projid is a
settable field.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: use structure initializers for cache_operations

This makes it a lot easier for cscope etc.

Surely all modern compilers can cope?
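
For anyone unfamiliar, a sketch of the style with a hypothetical,
trimmed-down ops table (not libxfs's struct cache_operations);
designated initializers have been standard since C99:

```c
struct cache_ops {
	unsigned int	(*hash)(unsigned long key, unsigned int hashsize);
	int		(*compare)(unsigned long a, unsigned long b);
};

static unsigned int my_hash(unsigned long key, unsigned int hashsize)
{
	return (unsigned int)(key % hashsize);
}

static int my_compare(unsigned long a, unsigned long b)
{
	return a == b;
}

/* Named members are easy to grep for and survive member reordering. */
static const struct cache_ops my_ops = {
	.hash		= my_hash,
	.compare	= my_compare,
};
```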

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: add supported file attributes to xfs.5 manpage

The chattr(1) manpage suffers from the same problems mount(1) had:
many options listed, not kept up to date for various filesystems.

I've submitted a manpage update for chattr(1) which says to refer to
filesystem-specific manpages for supported attributes; this patch
updates xfs(5) to list the attributes supported by xfs.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: add mremap command

This adds a simple mremap command to xfs_io.

It does not take a start address; it uses the existing start
address, so the size passed will be the new total size of the
mapping.
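
A minimal sketch of the underlying call, where the new size is the
total length of the mapping at its current address. Passing
MREMAP_MAYMOVE, which lets the kernel relocate the mapping, is a choice
made here and not necessarily what xfs_io does; the helper is
hypothetical:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for mremap(2) and MREMAP_MAYMOVE */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Grow or shrink a mapping to new_size bytes in total (not a delta). */
static void *resize_mapping(void *addr, size_t old_size, size_t new_size)
{
	void *p = mremap(addr, old_size, new_size, MREMAP_MAYMOVE);

	return p == MAP_FAILED ? NULL : p;
}
```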

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: preserve error state in process_shortform_attr

process_shortform_attr uses the "junkit" error to track whether an
error was found, but by assigning it directly to the result of
valuecheck, previous errors are ignored, leading to unrepairable
errors such as:

"entry has INCOMPLETE flag on in shortform attribute"
or
"entry contains illegal character in shortform attribute name"

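A hypothetical sketch of the difference between assigning and
accumulating the error flag; value_bad() stands in for valuecheck():

```c
/* Nonzero means "bad". */
static int value_bad(int v)
{
	return v < 0;
}

/*
 * `junkit = value_bad(...)` would overwrite, and so forget, an earlier
 * failure; `|=` preserves it.
 */
static int check_entry(int incomplete_flag, int bad_name_char, int value)
{
	int junkit = 0;

	if (incomplete_flag)
		junkit = 1;
	if (bad_name_char)
		junkit = 1;
	junkit |= value_bad(value);
	return junkit;
}
```
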
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: clear bad flags in process_dinode_int

process_dinode_int() reports bad flags if dino->di_flags &
~XFS_DIFLAG_ANY - i.e. if any flags are set outside the known set.
But then instead of clearing them, it does flags &= ~XFS_DIFLAG_ANY
which keeps *only* the bad flags. This leads to persistent,
unrepairable errors of the form:

"Bad flags set in inode XXX"

Fix this.

While we are at it, fix a couple lines which look like they used to
be continuation lines, but are no longer.
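
A sketch of the mask logic, with an assumed value for the known-flags
mask (not necessarily XFS_DIFLAG_ANY's real value):

```c
#include <stdint.h>

#define DIFLAG_ANY	0x7fffu		/* assumed mask of all known flags */

static uint16_t clear_unknown_flags(uint16_t flags)
{
	/* Buggy: flags & ~DIFLAG_ANY keeps only the unknown bits. */
	/* Fixed: keep only the known bits. */
	return flags & DIFLAG_ANY;
}
```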

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: free flist on error in write_struct()

One error path in write_struct() wasn't freeing the flist_t *fl
which was allocated, so it leaks.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: fix leaks in parent_list()

parent_list() has instances where a handle is leaked, both by going
out of scope, and on error paths.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libhandle: Fix handle leak in path_to_fshandle error paths

path_to_fshandle calls obj_to_handle, which potentially allocates a
handle, but the handle isn't freed on a subsequent error path.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: free handlep in fsrfs

We leaked the fshandlep in both error returns and normal function
exit.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: fix leaks & catch error in fsrfile()

The allocated fshandlep leaks on most error paths; restructure with
an out: target that does all necessary freeing, and initialize
filehandles to -1 so that we know whether they need to be closed on
the error path.

While we're at it, if gettmpname() fails, we still return 0 for an
error, because error is initialized to 0 and only set otherwise by
fsrfile_common. So if gettmpname() fails, we return success from
the function even though we did no work. Fix that as well by
initializing error to -1.
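
A hypothetical sketch of the resulting shape: one out: label,
descriptors initialized to -1, and error initialized to -1 so that any
early exit reports failure. malloc() stands in for the handle
allocation:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

static int fsrfile_sketch(const char *path, const char *tmppath)
{
	void *fshandlep = NULL;
	int fd = -1, tfd = -1;
	int error = -1;			/* only success sets 0 */

	fshandlep = malloc(64);
	if (!fshandlep)
		goto out;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto out;
	tfd = open(tmppath, O_RDONLY);
	if (tfd < 0)
		goto out;
	error = 0;			/* the real work would happen here */
out:
	if (tfd >= 0)
		close(tfd);
	if (fd >= 0)
		close(fd);
	free(fshandlep);		/* free(NULL) is a no-op */
	return error;
}
```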

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: free fshandlep in parent_check()

The allocated fshandle wasn't freed in either normal
exit or error paths.

Do this, and consolidate cleanup into an out: target.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs: fix typo in output

Fix typo in mkfs.xfs output.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: Set ftype for entries in lost+found

So far all entries in lost+found had file type XFS_DIR3_FT_UNKNOWN which
is somewhat annoying as the next xfs_repair pass will find these and
report them as errors. Set the proper file type when creating these entries.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libhandle: fix installation for symlinked /usr

Canonicalize the pathnames for PKG_LIB_DIR and PKG_ROOT_LIB_DIR
before checking if they are the same. This is required for Fedora
which doesn't have a separate /usr/lib directory anymore.

[Christoph Hellwig: reformat and change to canonical names]

Reported-by: Jan Tulak <jan@tulak.me>
Signed-off-by: Jan Tulak <jan@tulak.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

logprint: Fix printing of AGF and AGI buffers

Currently xfs_logprint doesn't show detailed data about AGF and AGI
buffers and instead always shows "Out of space". This is because
xfs_agf_t has additional fields and padding which we never read from
disk and thus buffer length is always smaller than the size of
xfs_agf_t or xfs_agi_t respectively.

Fix the problem by only making sure we have enough data in the buffer
to contain all the information we want to print.
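
A hypothetical sketch of the idea: require only as many bytes as reach
the end of the last field we print, rather than sizeof() of an
in-memory struct that carries extra fields:

```c
#include <stddef.h>
#include <stdint.h>

struct agf_like {
	uint32_t magic;
	uint32_t versionnum;
	uint32_t seqno;
	uint32_t length;
	uint8_t  incore_padding[64];	/* never read from disk */
};

/* Bytes up to and including the last printed field. */
#define AGF_PRINT_LEN	(offsetof(struct agf_like, length) + \
			 sizeof(((struct agf_like *)0)->length))

static int can_print_agf(size_t buflen)
{
	/* buflen >= sizeof(struct agf_like) fails for shorter logged buffers */
	return buflen >= AGF_PRINT_LEN;
}
```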

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

quota: fix NULL pointer dereference in report_f

Running xfs_quota report against an invalid XFS path, or one without the
desired quota limits enabled, hits SIGSEGV because fs_path is uninitialized, e.g.

# xfs_quota -xc 'report -up' /invalid_path
xfs_quota: cannot setup path for mount /invalid_path: No such file or directory
Segmentation fault (core dumped)

(gdb) r -xc 'report -up' /invalid_path
xfs_quota: cannot setup path for mount /invalid_path: No such file or directory

Program received signal SIGSEGV, Segmentation fault.
0x0000000000408b4d in report_f (argc=2, argv=0x105ea70) at report.c:627
627 else if (fs_path->fs_flags & FS_MOUNT_POINT)

This patch fixes report_f() to report only if fs_path is initialized.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxcmd: make all comparisons using realpath'd paths

Both mountpoints and devices can be symlinks, so given a path
to look for, and mountpoints/devices from the system, use
realpath() on *everything* before making the comparison to see
if our path is a match.

So, with symlinks for mount points as well as for devices:

# ls -l /dev/mapper/testvg-lvol0
lrwxrwxrwx. 1 root root 7 Jul 11 19:24 /dev/mapper/testvg-lvol0 -> ../dm-3
# ls -l /mnt/scratch2
lrwxrwxrwx. 1 root root 12 Jul 11 19:57 /mnt/scratch2 -> /mnt/scratch

this should all work, and does now:

# xfs_quota -xc "report -h" /mnt/scratch2
User quota on /mnt/scratch (/dev/mapper/testvg-lvol0)
                        Blocks
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            0      0      0  00 [------]

# xfs_quota -xc "report -h" /mnt/scratch
User quota on /mnt/scratch (/dev/mapper/testvg-lvol0)
                        Blocks
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            0      0      0  00 [------]

# xfs_quota -xc "report -h" /dev/dm-3
User quota on /mnt/scratch (/dev/mapper/testvg-lvol0)
                        Blocks
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            0      0      0  00 [------]

# xfs_quota -xc "report -h" /dev/mapper/testvg-lvol0
User quota on /mnt/scratch (/dev/mapper/testvg-lvol0)
                        Blocks
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            0      0      0  00 [------]

The commit:

050a7f1 xfsprogs: handle symlinks etc in fs_table_initialise_mounts()

tried to fix this earlier, but only worked one way;
it compared the argument path in both given and realpath
form to the paths in getmntent, but did not compare to
the realpaths of the getmntent devices.

If we reduce everything, everywhere, to a realpath(), we've
got our best shot at finding the match.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: copy, don't clear, stripe geometry in backup SB

Today, if we have a filesystem with stripe geometry and
a damaged primary superblock, we will zero out stripe geometry
if we have copied the backup.

I'm guessing this might be because changing geometry with mount
options only updates the primary, so backups aren't guaranteed
to be current or correct.

Unfortunately, that leaves us with sb 0 w/ no geom, and backups
*with* geom, so the next repair finds the mismatch, and complains.
(In other words, the 2nd repair does not come up clean.)
And ... the second repair copies the backup stripe geometry back
into the primary!

Rather than clearing stripe geometry in this case, just leave it
at what was found in the backup super, and inform the user that this
was done. This leaves a consistent filesystem, and gives the user
a heads-up to double-check the result.

This can all be demonstrated and tested by running xfs/030 with
geometry set in MKFS_OPTIONS. (To really make the test pass,
we need to filter the warning out of repair output.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: update debian packaging for next release

Make a note of each of the Debian bugs resolved in this release,
so that they'll be automatically closed during next upload.

Signed-off-by: Nathan Scott <nathans@debian.org>

xfsprogs: add a watch file into the debian packaging

Apparently it can improve some Debian tools that check it (e.g. UDD).
Resolves Debian bug #748483.

Signed-off-by: Nathan Scott <nathans@debian.org>

xfsprogs: v3.2.1 release

Update all the version and changelog files for release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: update polish translation

Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: handle uncorrected corruptions in phase 2

Some of the AG header corruptions detected by the IO verifiers
cannot be corrected in phase 2 when we do the initial scan of the
AGs. Correcting some errors cannot be done until a full rebuild of
the trees is done in phase 5.

Hence we can end up with a "clean" AGF/AGI buffer but have an
EFSCORRUPTED error on the buffer. This results in an assert failing:

ASSERT(agf_dirty || agfbuf->b_error != EFSCORRUPTED);

and repair not being able to fix the problems it has tripped over.
Hence the assert that we corrected all corruptions in the buffers
is not valid and should be removed.

Reported-by: Hans Kraus <hans.w.kraus@gmx.at>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: clear the buffer error while the buffer is locked

When releasing a buffer, the error should be cleared while the lock
is still held on the buffer to avoid racing with a new user of the
buffer.

This was pointed out in review of commit 6af7c1e ("libxfs: reused
invalidated buffers leak state and data") but the version committed
didn't have the fix. Thanks to Christoph Hellwig for checking and
pointing out the oversight.
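
A hypothetical sketch of the ordering, using a pthread mutex as the
buffer lock: reset the error first, then unlock.

```c
#include <pthread.h>

struct buf {
	pthread_mutex_t	lock;
	int		b_error;
};

/* The next holder of the lock can never see the previous user's error. */
static void buf_release(struct buf *bp)
{
	bp->b_error = 0;
	pthread_mutex_unlock(&bp->lock);
}
```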

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: get rid of BADFSINO

When we find a bad dirent, we "clear" the inode number by
writing BADFSINO to the inode number in the entry:

#define BADFSINO ((xfs_ino_t)0xfeffffffffffffffULL)

We then capture this bad inode number later in the same function
either in the same pass or in a later phase and junk the entry.
When we junk the entry, we write a "/" over the first character of
the dirent name, which is then picked up later by the directory
rebuild and ignored.

The issue with this is that the directory buffer can be written to
disk between the dirent being marked with BADFSINO and the directory
rebuild processing in phase 6, resulting in the directory block
verifier firing this error:

Invalid inode number 0xfeffffffffffffff
xfs_dir_ino_validate: XFS_ERROR_REPORT
Metadata corruption detected at block 0x11fbb698/0x1000
libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000

And so will not write the *corrupt block* to disk. The result is
that we don't repair a corruption in the directory block correctly
and subsequent repair runs continue to find problems with the
directory.

We really don't need both BADFSINO *and* overwriting the dirent name
with "/" to mark an entry as junked. They both mean exactly the same
thing, so get rid of BADFSINO and only use the name junking to mark
dirents as bad. This prevents the directory data block verifier from
triggering on bad inode numbers, and so the later reread of the
block will find the junked entries correctly.
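
A hypothetical sketch of name-based junking; the struct is a simplified
stand-in for a directory entry. The inode number is not overwritten
with a sentinel such as BADFSINO, so the write verifier is not handed a
value it is guaranteed to reject:

```c
#include <stdbool.h>
#include <stdint.h>

struct dirent_like {
	uint64_t	inumber;
	uint8_t		namelen;
	char		name[255];
};

/* '/' can never appear in a valid name, so it marks the entry as junk. */
static void junk_entry(struct dirent_like *dep)
{
	dep->name[0] = '/';
}

static bool entry_is_junked(const struct dirent_like *dep)
{
	return dep->namelen > 0 && dep->name[0] == '/';
}
```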

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: fix quota inode handling in secondary superblocks

Changes to support separate project quota inodes changed the way
quota inodes got written to the superblock. The current code is
tailored for the needs of the kernel, where the inodes should only
be written if certain flags are set saying a quota type is enabled.

Unfortunately, when recovering a corrupt secondary superblock, we
need to unconditionally write the quota inode fields after we
unconditionally zero the quota flags field. The result of this bug
is that the bad quota inode fields cannot be cleared and hence
are always reported as bad by repair in subsequent runs.

Fix this by directly clearing the quota inodes in the superblock
buffers so that we do not need to set special flags to get
xfs_sb_to_disk() to do the right thing, as setting flags would leave
bad flag values in the superblock instead of bad inode numbers.

Also, when clearing the inode numbers, write them as NULLFSINO
rather than 0 as this is what the kernel will write them as if quota
is turned off.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: reused invalidated buffers leak state and data

When rebuilding a bad directory, repair first truncates away all the
blocks in the directory, good or bad. This removes blocks from the
bmap btree, and when those blocks are freed the bmap btree code
invalidates them. This marks the buffers LIBXFS_B_STALE so that we
don't try to write stale data from that buffer at a later time.

However, when rebuilding the directory, blocks may get reallocated
and we reuse the underlying buffers. This has two problems.

The first is that if the buffer was previously detected as having a
verifier error (i.e. an error that is leading to the block being
freed and the buffer being invalidated) then the error might still
be held in b_error. Hence the libxfs code needs to ensure that
b_error does not leak from one buffer usage context to another
after invalidation.

The second problem is that when new data is written into a buffer,
it no longer has stale contents. Hence when we write the buffer, we
need to clear the LIBXFS_B_STALE flag to ensure that the new data
gets written.
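The two rules above can be sketched with a toy buffer. Everything here is hypothetical: struct buf, the BUF_STALE/BUF_DIRTY flags and the helper names stand in for the real libxfs buffer, b_error and LIBXFS_B_STALE handling.

```c
/* Hypothetical buffer state; not the real libxfs buffer layout. */
#define BUF_STALE	(1u << 0)	/* contents invalid, skip writeback */
#define BUF_DIRTY	(1u << 1)	/* contents need writing */

struct buf {
	unsigned int	flags;
	int		error;	/* e.g. a verifier failure from the last read */
};

/* Invalidate: mark stale so the old contents are never written back. */
static void buf_invalidate(struct buf *bp)
{
	bp->flags |= BUF_STALE;
	bp->flags &= ~BUF_DIRTY;
}

/* Reuse after invalidation: do not leak the previous user's error. */
static void buf_reuse(struct buf *bp)
{
	bp->error = 0;
}

/* New data written in: contents are no longer stale, so allow writeback. */
static void buf_mark_dirty(struct buf *bp)
{
	bp->flags &= ~BUF_STALE;
	bp->flags |= BUF_DIRTY;
}

/* Only dirty, non-stale buffers are written. */
static int buf_needs_write(const struct buf *bp)
{
	return (bp->flags & BUF_DIRTY) && !(bp->flags & BUF_STALE);
}
```

Without the flag clearing in buf_mark_dirty(), a buffer that was invalidated and then reused for a new directory block would silently never reach disk.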

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: handle directory block corruption in phase 6

This should only occur in no-modify mode, but when we fail to find
the last extent in a directory btree due to corruption we need to
trash the directory if it's the first data block we find the error
on. That is because there is nothing to recover from the directory,
and if we try to scan it xfs_repair segfaults because nothing has been
read from disk.

Also catch a memory allocation failure in this code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: write command broken on 64 bit values

convert_args() has problems with 64 bit fields because it tries to
shift them by 64 bits. The result of doing so is undefined by the C
standard, and in practice the result is the original value unchanged
rather than 0. Hence you can't write 64 bit fields because the code
thinks that all values other than 0 are out of range.
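A minimal sketch of a range check that avoids the undefined shift, assuming the field width is known in bits (1 to 64); value_fits_bits() is a hypothetical helper, not the actual convert_args() fix. The full-width case is handled before any shift happens.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does an unsigned value fit in a field that is
 * `bits` wide (1..64)?  Shifting a 64-bit value by 64 is undefined in
 * C, so a full-width field is accepted without computing val >> 64.
 */
static bool value_fits_bits(uint64_t val, unsigned int bits)
{
	if (bits >= 64)
		return true;		/* every uint64_t fits in 64 bits */
	return (val >> bits) == 0;	/* no bits set above the field */
}
```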

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: support more than 25 ACLs

The v5 superblock format supports many more than 25 ACLs on an inode,
but xfs_repair still thinks that the maximum is 25. This slipped
through because the repair code does not share any of the kernel side
ACL code in libxfs, and instead has all its own internal ACL
definitions.

Fix the repair code to support more than 25 ACLs and update
the ACL definitions to match the kernel definitions. In doing so,
this tickles an off-by-one bug on remote attribute maximum sizes
that is already fixed in the kernel code. So in addition to fixing
the repair code, this patch pulls in parts of the following kernel
commits:

bba719b5 xfs: fix off-by-one error in xfs_attr3_rmt_verify
0a8aa193 xfs: increase number of ACL entries for V5 superblocks

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Tested-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: Repair directory block CRC mismatches

It can happen that only the CRC doesn't match for a directory block. In
that case xfs_repair just reports the error but doesn't fix anything (as
further checking of the block doesn't reveal any problems). Make sure we
recompute and write out a new CRC when verification failed during
reading.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs: add "-m" options to the man page

Because they are missing.

Reported-by: Matthias Schniedermeyer <ms@citd.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: fix crc field handling in xfs_sb_to/from_disk

If we xfs_mdrestore an image from a non-crc filesystem, lo
and behold the restored image has gained a CRC:

# db/xfs_metadump.sh -o /dev/sdc1 - | xfs_mdrestore - test.img
# xfs_db -c "sb 0" -c "p crc" /dev/sdc1
crc = 0 (correct)
# xfs_db -c "sb 0" -c "p crc" test.img
crc = 0xb6f8d6a0 (correct)

This is because xfs_sb_from_disk doesn't fill in sb_crc,
but xfs_sb_to_disk(XFS_SB_ALL_BITS) does write the in-memory
CRC to disk - so we get uninitialized memory on disk.

Fix this by always initializing sb_crc to 0 when we read
the superblock, and masking out the CRC bit from ALL_BITS
when we write it.

This same fix has already been sent for kernelspace.
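The two halves of the fix can be sketched as follows. The field bits, struct incore_sb and both helpers are hypothetical stand-ins; the real code uses the in-core struct xfs_sb, xfs_sb_from_disk() and xfs_sb_to_disk() with XFS_SB_ALL_BITS.

```c
#include <stdint.h>

/* Hypothetical field bits and in-core superblock, for illustration only. */
#define SB_F_UUID	(1ULL << 0)
#define SB_F_BLOCKS	(1ULL << 1)
#define SB_F_CRC	(1ULL << 2)
#define SB_F_ALL	(SB_F_UUID | SB_F_BLOCKS | SB_F_CRC)

struct incore_sb {
	uint64_t	blocks;
	uint32_t	crc;	/* not decoded from disk; computed on write */
};

/* On read: explicitly zero the CRC so no uninitialized memory survives. */
static void sb_from_disk(struct incore_sb *sb, uint64_t disk_blocks)
{
	sb->blocks = disk_blocks;
	sb->crc = 0;
}

/* On write: never copy the in-core CRC field, even for "all fields". */
static uint64_t sb_fields_to_write(uint64_t fields)
{
	return fields & ~SB_F_CRC;
}
```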

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: don't send null bp to xfs_trans_brelse()

In this case, if bp is null, error is set, and we send
bp to xfs_trans_brelse, which will try to dereference it.

Test whether we actually have a buffer before we try to
free it.

Same fix as was sent for kernelspace.

Coverity spotted this.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: indicate default mount options in xfs.5 manpage

Not every pair of mount options indicated which was the
default, so add those.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: add mount options to xfs.5 manpage

This is a straight cut and paste from the util-linux
mount manpage to xfs.5.

It's pretty much impossible for util-linux to keep up
with every filesystem out there, and Karel has more than
once expressed a wish that mount options move into fs-specific
manpages.

So, here we go.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: test for more potential failures in packfile()

Test for lseek, ftruncate, and fsync failures in packfile()

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: create a cleanup/return target in packfile()

Error handling is a mishmash of closes, frees, etc at every
error point. Create an "out" target that does this all
in one place.

Minor comment/doc update while we're at it.
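The single-exit cleanup pattern described above looks roughly like this. copy_prefix() is a hypothetical function, not packfile() itself; it only shows how each failure, including an fsync() failure, jumps to one "out" label that releases whatever was acquired so far.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical example: copy up to `len` bytes from src to dst.
 * Every error path jumps to "out"; resources are initialised to
 * "not acquired" values so the cleanup code can test them safely.
 */
static int copy_prefix(const char *src, const char *dst, size_t len)
{
	int	sfd = -1, dfd = -1;
	char	*buf = NULL;
	int	error = -1;
	ssize_t	n;

	sfd = open(src, O_RDONLY);
	if (sfd < 0)
		goto out;
	dfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (dfd < 0)
		goto out;
	buf = malloc(len);
	if (!buf)
		goto out;
	n = read(sfd, buf, len);
	if (n < 0 || write(dfd, buf, n) != n)
		goto out;
	if (fsync(dfd) < 0)		/* failures here are checked too */
		goto out;
	error = 0;
out:
	free(buf);			/* free(NULL) is a no-op */
	if (dfd >= 0)
		close(dfd);
	if (sfd >= 0)
		close(sfd);
	return error;
}
```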

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: ensure the line we read from leftofffile is null terminated

Ensure that the string we read from leftofffile is NUL
terminated; the buffer gets passed to strchr(), so
it's important that we ensure it ends with a NUL byte.
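read(2) does not terminate what it reads, so the usual approach is to read at most one byte less than the buffer size and add the NUL explicitly. This sketch assumes the file is read with read(2); read_line_terminated() is a hypothetical helper, not the xfs_fsr code.

```c
#include <unistd.h>

/*
 * Hypothetical helper: read at most size - 1 bytes and always
 * NUL-terminate, so string functions such as strchr() stop inside
 * the buffer.  Returns the byte count, or -1 on error.
 */
static ssize_t read_line_terminated(int fd, char *buf, size_t size)
{
	ssize_t	n;

	if (size == 0)
		return -1;
	n = read(fd, buf, size - 1);	/* leave room for the terminator */
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```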

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_copy: fix data corruption of target

The unit of XFS_AGFL_DADDR(mp) is the "basic block", whose size is "BBSIZE"
(512 bytes), so when "source_sectorsize" is not 512, the target
filesystem ends up corrupted.
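A small illustration of the unit mismatch, assuming a 64-bit daddr: the byte offset of a daddr is always daddr * BBSIZE, independent of the device sector size. For example, sector 3 of a device with 4096-byte sectors is daddr 24 (24 * 512 = 12288 bytes), while scaling 24 by the 4096-byte sector size would point at byte 98304. daddr_to_bytes() is a hypothetical helper.

```c
#include <stdint.h>

#define BBSIZE	512	/* XFS basic block size in bytes */

/*
 * Hypothetical helper: convert a daddr (counted in 512-byte basic
 * blocks) to a byte offset.  The sector size plays no part here.
 */
static int64_t daddr_to_bytes(int64_t daddr)
{
	return daddr * BBSIZE;
}
```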

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs.xfs: don't call blkid_get_topology on existing regular files

If we encounter a target that's really a regular file,
even without "-d file..." on the cmdline, call
platform_findsizes() instead of blkid_get_topology to
try to discover the "sector size" via the fsgeom() call.

Otherwise mkfs.xfs will try to do direct IO with a default
512-byte sector size, and if the underlying file has different
DIO requirements, mkfs will fail.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: try to handle mkfs of a file on 4k sector device

Try the xfs geometry ioctl if the mkfs target resides
in a file; this gives us the equivalent of a device
sector size.

If this fails, and there's a sector size mismatch
between the host FS and the filesystem being created, then mkfs might
fail - but that's no worse than it's been before.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: update Polish translation

Jakub provided the Polish translation here:

http://qboosh.pl/pl.po/xfsprogs-3.2.0.pl.po

Signed-off-by: Dave Chinner <dchinner@redhat.com>

db: add finobt support to metadump

Include the free inode btree in metadump images. If the source fs
is finobt-enabled, run an additional scan_btree() of the finobt.
Since the private 'agi' scanfunc_ino() parameter is unused, change
the private parameter to a flag that indicates whether the current
scan is for the inobt or finobt. If the latter, we skip copying the
actual inode chunks as this work is already performed by the inobt
scan.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

growfs: report finobt status in fs geometry (xfs_info)

Check and report on the free inode btree status bit in the fs
geometry.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: reconstruct the finobt in phase 5

Support reconstruction of the finobt in phase 5 of xfs_repair. We
create a new cursor for the finobt and write the in-core records
that contain free inodes to the tree. Finally, pass the cursor
along to build_agi() to include the finobt root and level count in
the agi header.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: helpers for finding in-core inode records w/ free inodes

Add the findfirst_free_inode_rec() and next_free_ino_rec() helpers
to assist scanning the in-core inode records for records with at
least one free inode. These will be used to determine what records
are included in the free inode btree.
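A simplified sketch of the "next record with a free inode" scan. It assumes each record carries a 64-bit mask with one bit per inode in the chunk, set when the inode is free; struct ino_rec and next_free_rec() are hypothetical, and the real helpers walk repair's in-core inode tree rather than an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-core record: one 64-inode chunk, set bits = free inodes. */
struct ino_rec {
	uint64_t	startino;
	uint64_t	free_mask;
};

/*
 * Return the index of the first record at or after `from` that has at
 * least one free inode, or n if there is none.  Calling it with from = 0
 * plays the role of a "find first" helper.
 */
static size_t next_free_rec(const struct ino_rec *recs, size_t n, size_t from)
{
	for (size_t i = from; i < n; i++)
		if (recs[i].free_mask != 0)
			return i;
	return n;
}
```

A caller iterates with `for (i = next_free_rec(r, n, 0); i < n; i = next_free_rec(r, n, i + 1))`, visiting exactly the records that belong in the finobt.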

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: pull the build_agi() call up out of the inode tree build

Pull the build_agi() call out of build_ino_tree() in phase 5. This
is to prepare for finobt support, in which build_agi() will require
context from multiple inode tree reconstructions (both the inode
allocation tree and free inode tree, when it exists).

Create the new 'agi_stat' structure to carry the requisite state
from the build_ino_tree() operation to build_agi().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: pass btree block magic as param to build_ino_tree()

A minor cleanup to build_ino_tree() to provide the appropriate
magic value for btree block initialization from the caller. This
facilitates use of separate magic values for finobt blocks when
building the free inode btree.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: phase 2 finobt scan

If one exists, scan the free inode btree in phase 2 of xfs_repair.
We use the same general infrastructure as for the inobt scan, but
trigger finobt chunk scan logic in scan_inobt() via the magic
value.

The new scan_single_finobt_chunk() function is similar to the inobt
equivalent with some finobt specific logic. We can expect that
underlying inode chunk blocks are already marked used due to the
previous inobt scan. We can also expect to find every record
tracked by the finobt already accounted for in the in-core tree
with equivalent (and internally consistent) inobt record data.

Spit out a warning on any divergences from the above and add the
inodes referenced by the current finobt record to the appropriate
in-core tree.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: account for finobt in ag 0 geometry pre-calculation

Account for the finobt in calc_mkfs().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

db: finobt support

Add the AGI finobt fields and fibt layouts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs: finobt mkfs support

Add the 'finobt' metadata option to mkfs to format an fs with free
inode btree support. If enabled, initialize the associated AGI
header fields and btree root block.

Also, do the initialization of the superblock version and feature
bits (including the new finobt flag) a bit earlier. These fields
must now be initialized prior to the use of XFS_PREALLOC_BLOCKS(),
as the latter returns a value that depends on whether a finobt root
btree block is reserved.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: enable the finobt feature on v5 superblocks

Add the finobt feature bit to the list of known features. As of
this point, the kernel code knows how to mount and manage both
finobt and non-finobt formatted filesystems.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: report finobt status in fs geometry

Define the XFS_FSOP_GEOM_FLAGS_FINOBT fs geometry flag and set the
associated bit if the filesystem supports the free inode btree.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: update the finobt on inode free

An inode free operation can have several effects on the finobt. If
all inodes have been freed and the chunk deallocated, we remove the
finobt record. If the inode chunk was previously full, we must
insert a new record based on the existing inobt record. Otherwise,
we modify the record in place.

Create the xfs_ifree_finobt() function to identify the potential
scenarios and update the finobt appropriately.
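The three cases above reduce to a small decision. This is a hypothetical sketch: finobt_on_ifree() and the enum are illustrative names, and freecount_before is assumed to be the chunk's free inode count before this inode is freed.

```c
#include <stdbool.h>

/* Hypothetical actions corresponding to the three cases described above. */
enum finobt_action {
	FINOBT_REMOVE,	/* chunk deallocated: drop the finobt record */
	FINOBT_INSERT,	/* chunk was full: it now has a free inode */
	FINOBT_UPDATE,	/* chunk already tracked: modify in place */
};

static enum finobt_action finobt_on_ifree(int freecount_before,
					  bool chunk_deallocated)
{
	if (chunk_deallocated)
		return FINOBT_REMOVE;
	if (freecount_before == 0)	/* not in the finobt until now */
		return FINOBT_INSERT;
	return FINOBT_UPDATE;
}
```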

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: refactor xfs_difree() inobt bits into xfs_difree_inobt() helper

Refactor xfs_difree() in preparation for the finobt. xfs_difree()
performs the validity checks against the ag and reads the agi
header. The work of physically updating the inode allocation btree
is pushed down into the new xfs_difree_inobt() helper.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: use and update the finobt on inode allocation

Replace xfs_dialloc_ag() with an implementation that looks for a
record in the finobt. The finobt only tracks records with at least
one free inode. This eliminates the need for the intra-ag scan in
the original algorithm. Once the inode is allocated, update the
finobt appropriately (possibly removing the record) as well as the
inobt.

Move the original xfs_dialloc_ag() algorithm to
xfs_dialloc_ag_slow() and fall back as such if finobt support is
not enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: insert newly allocated inode chunks into the finobt

A newly allocated inode chunk, by definition, has at least one
free inode, so a record is always inserted into the finobt.

Create the xfs_inobt_insert() helper from existing code to insert
a record in an inobt based on the provided BTNUM. Update
xfs_ialloc_ag_alloc() to invoke the helper for the existing
XFS_BTNUM_INO tree and XFS_BTNUM_FINO tree, if enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: update inode allocation/free transaction reservations for finobt

Create the xfs_calc_finobt_res() helper to calculate the finobt log
reservation for inode allocation and free. Update
XFS_IALLOC_SPACE_RES() to reserve blocks for the additional finobt
insertion on inode allocation. Create XFS_IFREE_SPACE_RES() to
reserve blocks for the potential finobt record insertion on inode
free (i.e., if an inode chunk was previously fully allocated).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: support the XFS_BTNUM_FINOBT free inode btree type

Define the AGI fields for the finobt root/level and add magic
numbers. Update the btree code to add support for the new
XFS_BTNUM_FINOBT inode btree.

The finobt root block is reserved immediately following the inobt
root block in the AG. Update XFS_PREALLOC_BLOCKS() to determine the
starting AG data block based on whether finobt support is enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: reserve v5 superblock read-only compat. feature bit for finobt

Reserve a v5 read-only compatibility feature bit for the finobt and
create the xfs_sb_version_hasfinobt() helper to determine whether
an fs has the feature enabled.

The finobt does not change existing on-disk structures, but must
remain consistent with the ialloc btree. Modifications from older
kernels would violate that constraint. Therefore, we restrict older
kernels to read-only mounts of finobt-enabled filesystems.

Note that this does not yet enable the ability to rw mount a finobt
fs (by setting the feature bit in the XFS_SB_FEAT_RO_COMPAT_ALL
mask).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: refactor xfs_ialloc_btree.c to support multiple inobt numbers

The introduction of the free inode btree (finobt) requires that
xfs_ialloc_btree.c handle multiple trees. Refactor xfs_ialloc_btree.c
so the caller specifies the btree type on cursor initialization to
prepare for addition of the finobt.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: don't let bplist index go negative in prefetch

After:

bbd3275 repair: don't unlock prefetch tree to read discontig buffers

Coverity spotted that it's possible for us to arrive at the loop
below with num == 1, and then we decrement it to 0, and try to
index bplist[num-1].

I think this was possible before the change, i.e. it's probably
not a regression.

Fix this by not trying to shrink the window unless we have
more than one buffer in the array.
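A hypothetical sketch of the guard: whatever condition drives the shrinking (here an invented "block number above limit" test), the loop stops at one entry, so bplist[num - 1] can never become bplist[-1]. trim_tail() and its arguments are illustrative only, not the prefetch code.

```c
/*
 * Hypothetical: drop entries from the tail of a gathered list while the
 * tail block lies beyond `limit`, but never shrink below one entry, so
 * bplist[num - 1] always stays a valid index.
 */
static int trim_tail(const long *bplist, int num, long limit)
{
	while (num > 1 && bplist[num - 1] > limit)
		num--;
	return num;
}
```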

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: fix compile error when libxfs header used in C++ code

xfs_ialloc.h:102: error: expected ',' or '...' before 'delete'

Simple parameter rename, no changes to behaviour.

Signed-off-by: Roger Willcocks <roger@filmlight.ltd.uk>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove xfs_check

This removes xfs_check and all references to it in
manpages. The DIAGNOSTICS section from xfs_check(8)
has been moved to xfs_db(8).

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_copy: use exit() to replace killall()

Sending a SIGKILL signal to a child thread terminates the whole process,
so xfs_copy returns exit status 137. This makes it hard for scripts to
know whether the copy succeeded.

Calling exit() in the main thread terminates the whole process and returns
the right value. Replace killall()+abort() with exit(1) to match the old
exit behaviour in the error case. Also remove killall()+pthread_exit(NULL),
since the return 0 is followed by an exit(0) that terminates the process.
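The 137 is how a shell such as bash reports death by signal: 128 plus the signal number, and SIGKILL is signal 9. A minimal sketch that reproduces both outcomes, assuming a POSIX system; shell_status(), die_by_sigkill() and die_by_exit() are hypothetical names, not part of xfs_copy.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run fn in a child process and return what a shell would report in $?:
 * the exit code for a normal exit, or 128 + signal number when the
 * child was killed by a signal.  Returns -1 if fork/wait fails.
 */
static int shell_status(void (*fn)(void))
{
	pid_t	pid = fork();
	int	status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		fn();
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);
	return -1;
}

/* Old behaviour: SIGKILL takes down the whole process. */
static void die_by_sigkill(void) { kill(getpid(), SIGKILL); }

/* New behaviour: a normal exit with a meaningful status. */
static void die_by_exit(void) { exit(1); }
```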

[ Christoph Hellwig:
Btw, I think the reason for this cruft is that xfs_copy was originally
written using the IRIX sproc interface, and the port to pthreads didn't
remove this gem:

http://marc.info/?l=linux-xfs&m=99535721110020&w=2 ]

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: dont free xfs_inode until complete

Originally, xfs_inodes were released upon the first
call to xfs_trans_cancel, xfs_trans_commit, or
inode_item_done. This code used the log item lock field
to prevent the release of the inode on the next call to
one of the above functions. This is an unusual use of the
log item lock field, which is supposed to specify which lock
is to be released on transaction commit or cancel. User
space does not perform locking in transactions.

Unfortunately, this breaks any code that relies on multiple
transaction operations. For example, adding an extended
attribute to an inode that does not have an attribute fork
will fail:

# xfs_db -x XFS_DEVICE
xfs_db> inode INO_NUM
xfs_db> attr_set newattribute

This patch does the following:
1) Removes the iput from the transaction completion and
    requires that the xfs_inode allocators call IRELE()
    when they are done with the pointer. The real time
    inodes are pointed to by the xfs_mount and have a longer
    lifetime.
2) Removes libxfs_trans_iput() because transaction entries
    are removed in transaction commit and cancel.
3) Removes libxfs_trans_ihold() which is an obsolete interface.
4) Removes the now unneeded ili_ilock_flags from the
    xfs_inode_log_item structure.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: remove unused argument in trans_iput

Remove the unused second argument to xfs_iput() and
xfs_trans_iput().

Introduce the define "IRELE()" and use in place of xfs_iput().

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: v3.2.0 release

Update all the version and changelog files for v3.2.0.

Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: v3.2.0-rc3 release

Update all the various version and changelog files for a v3.2.0-rc3
release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: don't grind CPUs with large extent lists

When repairing a large filesystem with fragmented files, xfs_repair
can grind to a halt burning multiple CPUs in blkmap_set_ext().
blkmap_set_ext() inserts extents into the blockmap for the inode
fork and it keeps them in order, even if the inserts are not done in
order.

The ordered insert is highly inefficient: it starts at the first
extent and simply walks the array to find the insertion point, i.e.
each insert is an O(N) operation. When we have a fragmented file with
a large number of extents, the entire mapping operation becomes very
expensive.

The thing is, we are doing the insertion from an *ordered btree
scan* which is inserting the extents in ascending offset order.
IOWs, we are always inserting the extent at the end of the array
after searching the entire array. i.e. the mapping operation cost is
O(N^2).

Fix this simply by reversing the order of the insert slot search.
Start at the end of the blockmap array, where almost all insertions
happen, which brings the overhead of each insertion down to O(1)
complexity. This, in turn, reduces the overall map building
operation to an O(N) operation, and so performance degrades
linearly with increasing extent counts rather than quadratically.
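A minimal sketch of the reversed slot search, assuming a plain array of extents sorted by file offset with room for one more entry; struct ext and ext_insert_sorted() are hypothetical names, not the real blkmap_t or blkmap_set_ext() code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extent record; not the real blkmap layout. */
struct ext {
	uint64_t	startoff;	/* file offset of the extent */
	uint64_t	blockcount;	/* length of the extent */
};

/*
 * Insert e into exts, which holds n entries sorted by startoff and has
 * capacity for at least n + 1.  The search starts at the end, so when
 * extents arrive in ascending offset order (as from an ordered btree
 * scan) the loop exits immediately and each insert is O(1).
 */
static void ext_insert_sorted(struct ext *exts, size_t n, struct ext e)
{
	size_t	i = n;

	/* Walk backwards past every entry that must follow the new one. */
	while (i > 0 && exts[i - 1].startoff > e.startoff)
		i--;

	/* Open a gap at slot i (moves nothing for an append) and fill it. */
	memmove(&exts[i + 1], &exts[i], (n - i) * sizeof(*exts));
	exts[i] = e;
}
```

Out-of-order inserts still work, they just pay for the backwards walk; the common ascending case is the one made cheap.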

While there, I noticed that the growing of the blkmap array was only
done 4 extents at a time. When we are dealing with files that may
have hundreds of thousands of extents, growing the map only 4 extents
at a time requires excessive amounts of reallocation. Reduce the
reallocation rate by increasing the grow increment according to how
large the array currently is.
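One way to express a size-proportional growth policy, as a hypothetical sketch: next_capacity() and ensure_room() are illustrative names, and the 4-entry minimum and 50% increment are assumptions, not the values xfs_repair actually uses.

```c
#include <stdlib.h>

/*
 * Hypothetical policy: grow by at least 4 entries, or by half the
 * current capacity once the array is large, so the number of realloc()
 * calls grows logarithmically rather than linearly with the entry count.
 */
static size_t next_capacity(size_t cur)
{
	size_t	inc = cur / 2;

	if (inc < 4)
		inc = 4;
	return cur + inc;
}

/*
 * Ensure room for one more element.  Returns 0 on success, -1 if
 * realloc() fails.  Overflow checks are omitted for brevity.
 */
static int ensure_room(void **arr, size_t *cap, size_t used, size_t elemsz)
{
	size_t	newcap;
	void	*p;

	if (used < *cap)
		return 0;
	newcap = next_capacity(*cap);
	p = realloc(*arr, newcap * elemsz);
	if (!p)
		return -1;
	*arr = p;
	*cap = newcap;
	return 0;
}
```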

The result is that the test filesystem (27TB, 30M inodes, at ENOSPC)
takes 5m10s to *fully repair* on my test system, rather than getting
15 (of 60) AGs into phase three and sitting there burning 3-4 CPUs
making no progress for over half an hour.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: don't unlock prefetch tree to read discontig buffers

The way discontiguous buffers are currently handled in prefetch is
by unlocking the prefetch tree and reading them one at a time in
pf_read_discontig(), inside the normal loop of searching for buffers
to read in a more optimized fashion.

But by unlocking the tree, we allow other threads to come in and
find buffers which we've already stashed locally on our bplist[].
If 2 threads think they own the same set of buffers, they may both
try to delete them from the prefetch btree, and the second one to
arrive will not find it, resulting in:

fatal error -- prefetch corruption

To fix this, simply abort the buffer gathering loop when we come
across a discontiguous buffer, process the gathered list as per
normal, and then after running the large optimised read, check to
see if the last buffer on the list is a discontiguous buffer.
If it is discontiguous, then issue the discontiguous buffer read
while the locks are not held. We only ever have one discontiguous
buffer per read loop, so it is safe just to check the last buffer in
the list.

The fix is loosely based on a patch provided by Eric Sandeen, who
did all the hard work of finding the bug and demonstrating how to
fix it.

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

Update debian changelog in preparation for pending release

In particular, add a note to the changelog about the added
dh_autoconf use so the BTS will be updated appropriately.

Signed-off-by: Nathan Scott <nathans@debian.org>

Fix msgfmt warning when building the German translation

Use the same header present in the Polish language files to
resolve the following warning in the de.po build:
[MSGFMT] de.mo
de.po:7: warning: header field 'Language' missing in header

Signed-off-by: Nathan Scott <nathans@debian.org>

Fix 32 bit build warning in libxfs, xfs_daddr_t printing

Add the usual type casts to resolve the following warnings:
rdwr.c: In function 'libxfs_getbufr_map':
rdwr.c:499:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'xfs_daddr_t' [-Wformat]
rdwr.c:499:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'xfs_daddr_t' [-Wformat]
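A sketch of the kind of cast involved, assuming a 64-bit signed daddr type (daddr_example_t is a stand-in, not xfs_daddr_t itself). "%lx" only matches where long is 64 bits; casting to unsigned long long and printing with "%llx" is correct on both 32- and 64-bit builds. Casting to uint64_t and using the PRIx64 macro from <inttypes.h> is an equivalent alternative.

```c
#include <inttypes.h>
#include <stdio.h>

typedef int64_t daddr_example_t;	/* stand-in for a 64-bit daddr type */

/* Format a daddr portably: the cast makes "%llx" match on every ABI. */
static int format_daddr(char *buf, size_t size, daddr_example_t daddr)
{
	return snprintf(buf, size, "0x%llx", (unsigned long long)daddr);
}
```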

Signed-off-by: Nathan Scott <nathans@debian.org>