Mechanical change of flist/free_list to dfops, since they're now
deferred ops, not just a freeing list.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Drop the compatibility shims that we were using to integrate the new
deferred operation mechanism into the existing code. No new code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Restructure everything that used xfs_bmap_free to use xfs_defer_ops
instead. For now we'll just remove the old symbols and play some
cpp magic to make it work; in the next patch we'll actually rename
everything.
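By way of illustration, the cpp magic amounts to aliasing of roughly this
form (a sketch only; the macro set shown is indicative of the mapping, not a
verbatim copy of the shim):

/* illustrative only: alias the old bmap-free names to the deferred ops code */
#define xfs_bmap_free           xfs_defer_ops
#define xfs_bmap_free_item      xfs_extent_free_item
#define xfs_bmap_init(dfp, fbp) xfs_defer_init((dfp), (fbp))
#define xfs_bmap_cancel(dfp)    xfs_defer_cancel((dfp))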
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred extent freeing. We'll wire up the existing code to
our new deferred mechanism later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Add tracepoints for the internals of the deferred ops mechanism
and tracepoint classes for clients of the dops, to make debugging
easier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
All the code around struct xfs_bmap_free basically implements a
deferred operation framework through which we can roll transactions
(to unlock buffers and avoid violating lock order rules) while
managing all the necessary log redo items. Previously we only used
this code to free extents after some sort of mapping operation, but
with the advent of rmap and reflink, we suddenly need to do more than
that.
With that in mind, xfs_bmap_free really becomes a deferred ops control
structure. Rename the structure and move the deferred ops into their
own file to avoid further bloating of the bmap code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Refactor the btree_change_owner function into a more generic apparatus
which visits all blocks in a btree. We'll use this in a subsequent
patch for counting btree blocks for AG reservations.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Create a function to enable querying of btree records mapping to a
range of keys. This will be used in subsequent patches to allow
querying the reverse mapping btree to find the extents mapped to a
range of physical blocks, though the generic code can be used for
any range query.
The overlapped query range function needs to use the btree get_block
helper because the root block could be an inode, in which case
bc_bufs[nlevels-1] will be NULL.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
On a filesystem with both reflink and reverse mapping enabled, it's
possible to have multiple rmap records referring to the same blocks on
disk. When overlapping intervals are possible, querying a classic
btree to find all records intersecting a given interval is inefficient
because we cannot use the left side of the search interval to filter
out non-matching records the same way that we can use the existing
btree key to filter out records coming after the right side of the
search interval. This will become important once we want to use the
rmap btree to rebuild BMBTs, or implement the (future) fsmap ioctl.
(For the non-overlapping case, we can perform such queries trivially
by starting at the left side of the interval and walking the tree
until we pass the right side.)
Therefore, extend the btree code to come closer to supporting
intervals as a first-class record attribute. This involves widening
the btree node's key space to store both the lowest key reachable via
the node pointer (as the btree does now) and the highest key reachable
via the same pointer and teaching the btree modifying functions to
keep the highest-key records up to date.
This behavior can be turned on via a new btree ops flag so that btrees
that cannot store overlapping intervals don't pay the overhead costs
in terms of extra code and disk format changes.
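As a rough model of the idea (illustrative C only, not the XFS on-disk key
format), each node entry carries both bounds, and a range query visits a
subtree only when its key range overlaps the query interval:

/* illustrative model; field and function names are made up */
struct node_key {
    unsigned long long lo;  /* lowest key reachable via this pointer */
    unsigned long long hi;  /* highest key reachable via this pointer */
};

/* visit the child only if [lo, hi] overlaps the query interval [qlo, qhi] */
static int
key_overlaps(const struct node_key *k, unsigned long long qlo,
             unsigned long long qhi)
{
    return k->lo <= qhi && k->hi >= qlo;
}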
When we're deleting a record in a btree that supports overlapped
interval records and the deletion results in two btree blocks being
joined, we defer updating the high/low keys until after all possible
joining (at higher levels in the tree) has finished. At this point,
the btree pointers at all levels have been updated to remove the empty
blocks and we can update the low and high keys.
When we're doing this, we must be careful to update the keys of all
node pointers up to the root instead of stopping at the first set of
keys that don't need updating. This is because it's possible for a
single deletion to cause joining at multiple levels of the tree, and so
we need to update everything going back to the root.
The diff_two_keys functions return < 0, 0, or > 0 if key1 is less than,
equal to, or greater than key2, respectively. This is consistent
with the rest of the kernel and the C library.
In btree_updkeys(), we need to evaluate the force_all parameter before
running the key diff to avoid reading uninitialized memory when we're
forcing a key update. This happens when we've allocated an empty slot
at level N + 1 to point to a new block at level N and we're in the
process of filling out the new keys.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Add some function pointers to bc_ops to get the btree keys for
leaf and node blocks, and to update parent keys of a block.
Convert the _btree_updkey calls to use our new pointer, and
modify the tree shape changing code to call the appropriate
get_*_keys pointer instead of _btree_copy_keys because the
overlapping btree has to calculate high key values.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
When a btree block has to be split, we pass the new block's ptr from
xfs_btree_split() back to xfs_btree_insert() via a pointer parameter;
however, we pass the block's key through the cursor's record. It is a
little weird to "initialize" a record from a key since the non-key
attributes will have garbage values.
When we go to add support for interval queries, we have to be able to
pass the lowest and highest keys accessible via a pointer. There's no
clean way to pass this back through the cursor's record field.
Therefore, pass the key directly back to xfs_btree_insert() the same
way that we pass the btree_ptr.
As a bonus, we no longer need init_rec_from_key and can drop it from the
codebase.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
If we make the inode root block of a btree unfull by expanding the
root, we must set *stat to 1 to signal success, rather than leaving
it uninitialized.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
When we're deleting realtime extents, we need to lock the summary
inode in case we need to update the summary info to prevent an assert
on the rsumip inode lock on a debug kernel. While we're at it, fix
the locking annotations so that we avoid triggering lockdep warnings.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Apparently cris doesn't require structure stride to align with the
largest type in the struct, so list[0] isn't at offset 4 like it is
everywhere else. Fix this... insofar as existing XFSes on cris are
screwed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Instead we always declare struct xfs_dir2_sf_hdr as packed. That's
the expected layout, and while most major architectures do the packing
by default, the new structure size and offset checker showed that not
only did the old ARM ABI get this wrong, but various minor embedded
architectures did as well.
[Verified that no code change on x86-64 results from this change]
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
And use an array of unsigned char values directly to avoid problems
with architectures that pad the size of structures. This also gets
rid of the xfs_dir2_ino4_t and xfs_dir2_ino8_t types, and introduces
new constants for the size of 4 and 8 bytes as well as the size
difference between the two.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Create a common function to calculate the maximum height of a per-AG
btree. This will eventually be used by the rmapbt and refcountbt
code to calculate appropriate maxlevels values for each. This is
important because the verifiers and the transaction block
reservations depend on accurate estimates of how many blocks are
needed to satisfy a btree split.
We were mistakenly using the max bnobt height for all the btrees,
which creates a dangerous situation since the larger records and
keys in an rmapbt make it very possible that the rmapbt will be
taller than the bnobt and so we can run out of transaction block
reservation.
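The calculation is of roughly this shape (a simplified sketch, where
limits[0] and limits[1] stand for the minimum records per leaf block and per
node block respectively):

/* sketch: worst-case tree height needed to index 'len' records */
static unsigned int
btree_compute_maxlevels(const unsigned int *limits, unsigned long len)
{
    unsigned long maxblocks;
    unsigned int level;

    maxblocks = (len + limits[0] - 1) / limits[0];  /* worst-case leaf blocks */
    for (level = 1; maxblocks > 1; level++)
        maxblocks = (maxblocks + limits[1] - 1) / limits[1];
    return level;
}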
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
In struct xfs_bmap_free, convert the open-coded free extent list to
a regular list, then use list_sort to sort it prior to processing.
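A minimal kernel-style sketch of the pattern (the item layout and names here
are illustrative, not the actual xfs_bmap structures):

#include <linux/list.h>
#include <linux/list_sort.h>

struct free_item {
    unsigned long long start;   /* first block of the extent to free */
    unsigned long long count;   /* number of blocks to free */
    struct list_head   list;    /* links the pending frees together */
};

/* order pending frees by starting block */
static int
free_item_cmp(void *priv, struct list_head *a, struct list_head *b)
{
    struct free_item *ia = list_entry(a, struct free_item, list);
    struct free_item *ib = list_entry(b, struct free_item, list);

    if (ia->start < ib->start)
        return -1;
    return ia->start > ib->start;
}

With a comparator like that, list_sort(NULL, &pending, free_item_cmp) puts
the pending items in starting-block order before they are processed (the
list head name here is hypothetical).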
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Break up xfs_free_extent() into a helper that fixes the freelist.
This helper will be used subsequently to ensure the freelist during
deferred rmap processing.
[darrick: refactor to put this at the head of the patchset]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 10 Aug 2016 01:29:35 +0000 (11:29 +1000)]
libxfs: add more list operations
Add some list operations that the deferred rmap code requires.
Code comes from the following kernel files:
lib/list_sort.c for all the list_sort stuff,
include/linux/list.h for the rest of the list_* stuff,
include/linux/kernel.h for container_of.
[ dchinner: move list_sort code to libxfs/list_sort.c ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Wed, 10 Aug 2016 01:29:30 +0000 (11:29 +1000)]
libxfs: fix set-but-unused warning in dir2 code
Fix these build warnings:
xfs_dir2_leaf.c: In function 'xfs_dir2_block_to_leaf':
xfs_dir2_leaf.c:389:16: warning: variable 'tp' set but not used [-Wunused-but-set-variable]
xfs_trans_t *tp; /* transaction pointer */
^
xfs_dir2_node.c: In function 'xfs_dir2_leaf_to_node':
xfs_dir2_node.c:302:16: warning: variable 'tp' set but not used [-Wunused-but-set-variable]
xfs_trans_t *tp; /* transaction pointer */
^
Zorro Lang [Thu, 4 Aug 2016 01:29:49 +0000 (11:29 +1000)]
xfs_quota: fall back silently if XFS_GETNEXTQUOTA fails
After the XFS_GETNEXTQUOTA feature was merged into the Linux kernel and
xfsprogs, xfs_quota uses Q_XGETNEXTQUOTA for report and dump, and
falls back to the old XFS_GETQUOTA ioctl if XFS_GETNEXTQUOTA fails.
But when XFS_GETNEXTQUOTA fails, xfs_quota prints a warning such as
"XFS_GETQUOTA: Invalid argument". That's because the kernel doesn't
recognize the XFS_GETNEXTQUOTA ioctl and returns EINVAL. In that case
the warning is unhelpful; xfs_quota just needs to fall back.
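The fallback being described is roughly the following sketch (it assumes the
Q_XGETNEXTQUOTA/Q_XGETQUOTA and fs_disk_quota definitions from the XFS quota
headers; the real code lives in xfs_quota's report and dump paths and uses
its own helpers):

#include <errno.h>
#include <sys/quota.h>
#include <xfs/xqm.h>

/* sketch: try GETNEXTQUOTA and silently fall back to GETQUOTA on EINVAL */
static int
get_user_quota(const char *dev, int id, struct fs_disk_quota *d)
{
    if (quotactl(QCMD(Q_XGETNEXTQUOTA, USRQUOTA), dev, id, (void *)d) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;  /* a real failure, worth reporting */
    /* old kernel: GETNEXTQUOTA unknown, quietly use the old ioctl */
    return quotactl(QCMD(Q_XGETQUOTA, USRQUOTA), dev, id, (void *)d);
}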
Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Anna Schumaker [Thu, 4 Aug 2016 01:29:49 +0000 (11:29 +1000)]
xfs_io: Update man page for copy_range command
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Thu, 4 Aug 2016 01:29:49 +0000 (11:29 +1000)]
mkfs: Remove workaround for getsubopt() on older glibc
The workaround addressed a const-correctness warning on glibc
versions older than 2.2. However, it also captures alternative C
libraries on Linux, which it should not do. glibc versions that old are
ancient by now, so let's just remove the workaround.
Signed-off-by: Felix Janda <felix.janda@posteo.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Anna Schumaker [Wed, 20 Jul 2016 05:31:54 +0000 (15:31 +1000)]
xfs_io: implement 'copy_range' command
Implement a new xfs_io command, named 'copy_range', which copies a
range of data from one file to another.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Zorro Lang [Wed, 20 Jul 2016 05:31:18 +0000 (15:31 +1000)]
xfs_repair: low memory shouldn't indicate corruption on exit
When I run "xfs_repair -n" on a 500T device with 16G memory,
xfs_repair print warning as below:
Memory available for repair (11798MB) may not be sufficient.
At least 64048MB is needed to repair this filesystem efficiently
If repair fails due to lack of memory, please
turn prefetching off (-P) to reduce the memory footprint.
And it returned an exit value of 1. But xfs_repair didn't hit any
error, so there is no reason to mark the fs as corrupted just
because it thinks it might *possibly* not have enough memory to run
to completion.
do_warn() will set fs_is_dirty=1 and hence give a non-zero exit
status. If we only want to print an informational message (not a
real issue), then we should use do_log() instead.
Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
kernel commit 5ef828c4
xfs: avoid false quotacheck after unclean shutdown
made xfs_sb_from_disk() also call xfs_sb_quota_from_disk
by default.
However, when this was merged to libxfs, existing separate
calls to libxfs_sb_quota_from_disk remained, and calling it
twice in a row on a V4 superblock leads to issues: after the
second call, we have set both pquotino and gquotino to NULLFSINO.
Fix this by making it safe to call twice, and also remove the extra
calls to libxfs_sb_quota_from_disk.
This is only spotted when running xfstests with "-m crc=0" because
the sb_from_disk change came about after V5 became default, and
the above behavior only exists on a V4 superblock.
Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Coverity complains that when two 32 bit values are multiplied and
the result is eventually stored in a 64 bit value, the math can
overflow unless one of the operands being multiplied is type cast
to the proper size.
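The fix is of this general form (an illustrative example, not the exact
expression Coverity flagged):

#include <stdint.h>

/* without the cast, nblocks * blocksize is a 32-bit multiply and can wrap */
static uint64_t
fs_size_in_bytes(uint32_t nblocks, uint32_t blocksize)
{
    return (uint64_t)nblocks * blocksize;
}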
Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Wed, 29 Jun 2016 01:13:02 +0000 (11:13 +1000)]
libxfs: fix double free in libxfs_alloc_file_space
When porting the transaction allocation interface to userspace
(commit 9074815), I missed a change in libxfs_alloc_file_space() that
could lead to a double free of a transaction pointer in an error path.
Coverity spotted it, so fix it.
Coverity-id: 1362811 Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Wed, 29 Jun 2016 01:12:48 +0000 (11:12 +1000)]
libxfs: fix use after free in xfs_trans_roll
When porting the transaction allocation interface to userspace
(commit 9074815), I missed a change in xfs_trans_roll() that could
lead to a use after free. Coverity spotted it, so fix it.
Coverity-id: 1362812 Signed-off-by: Dave Chinner <david@fromorbit.com>
This goes over the whole indirection array and calls
xfs_iext_irec_remove for each one of the erps (from the last one to
the first one). As a result, we keep shrinking (reallocating
actually) the indirection array until we shrink out all of its
elements. When we have files with huge numbers of extents, umount
takes 30-80 sec, depending on the number of files that XFS loaded
and the number of indirection entries in each file. The unmount
stack looks like:
Further, this reallocation prevents us from freeing the extent list
from a RCU callback as allocation can block. Hence if the extent
list is in indirect format, optimise the freeing of the extent list
to only use kmem_free calls by freeing entire extent buffer pages at
a time, rather than extent by extent.
[dchinner: simplified freeing loop based on Christoph's suggestion]
Signed-off-by: Alex Lyakas <alex@zadarastorage.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Use krealloc to implement our realloc function. This helps to avoid
new allocations if we are still in the slab bucket. At least for the
bmap btree root that's actually the common case.
This also allows removing the now unused oldsize argument.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
that returns a transaction with all the required log and block reservations,
and which allows passing transaction flags directly to avoid the cumbersome
_xfs_trans_alloc interface.
While we're at it, we also get rid of the transaction type argument that has
been superfluous since we stopped supporting the non-CIL logging mode. The
guts of it will be removed in another patch.
[dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
By overallocating the in-core inode fork data buffer and zero
terminating the link target in xfs_init_local_fork we can avoid
the memory allocation in ->follow_link.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
time ago with the promise that one day it would be possible to implement
the page cache with bigger chunks than PAGE_SIZE.
This promise never materialized, and it is unlikely it ever will.
[....]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs_dir2_node_trim_free can return without setting the rvalp argument
pointer. Initialize it to 0 at the beginning of the function and
only update it to 1 if we succeeded trimming a freespace block.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs_bmap_del_extent() handles extent removal from the in-core and
on-disk extent lists. When removing a delalloc range, it updates the
indirect block reservation appropriately based on the removal. It
currently enforces that the new indirect block reservation is less than
or equal to the original. This is normally the case in all situations
except for in certain cases when the removed range creates a hole in a
single delalloc extent, thus splitting a single delalloc extent in two.
It is possible with small enough extents to split an indlen==1 extent
into two such slightly smaller extents. This leaves one extent with 0
indirect blocks and leads to assert failures in other areas (e.g.,
xfs_bunmapi() if the extent happens to be removed).
Update the indlen distribution code to steal blocks from the deleted
extent, if necessary, to satisfy the worst case total indirect
reservation for the new extents. This is safe as the caller does not
update the fdblocks counters until the extent is removed. Blocks stolen
in this manner simply remain accounted as allocated, having ownership
transferred from the data extent to an indirect reservation.
As a precaution, fall back to the original reservation algorithm if the
new indlen requirement is not met and warn if we end up with extents
without any reservation at all to detect this more easily in the future.
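A simplified model of the distribution (not the actual xfs_bmap code; names
and shapes are illustrative): split the original reservation between the two
new extents, and if their combined worst-case requirement exceeds it, take
the shortfall from the blocks being deleted.

/* ores = original indlen; w1/w2 = worst-case indlen wanted by the two new
 * extents; avail = blocks of the deleted range we are allowed to steal */
static void
split_indlen(unsigned int ores, unsigned int *w1, unsigned int *w2,
             unsigned int *avail)
{
    unsigned int need = *w1 + *w2;

    if (need > ores) {
        unsigned int steal = need - ores;

        if (steal > *avail)
            steal = *avail;     /* can't steal more than was deleted */
        *avail -= steal;
        ores += steal;
    }
    if (ores >= need)
        return;                 /* both new extents keep their worst case */
    /* fallback: not enough even after stealing, scale both shares to fit */
    *w1 = ores * *w1 / need;
    *w2 = ores - *w1;
}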
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
The delayed allocation indirect reservation splitting code is not
sufficient in some cases where a delalloc extent is split in two. In
preparation for enhancements to this code, refactor the current indlen
distribution algorithm into a new helper function.
[dchinner: rename temp, temp2 variables]
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs_bunmapi() currently updates the fdblocks counter, unreserves quota,
etc. before the extent is deleted by xfs_bmap_del_extent(). The function
has problems dividing up the indirect reserved blocks for scenarios
where a single delalloc extent is split in two. Particularly, there
aren't always enough blocks reserved for multiple extents in a single
extent reservation.
The solution to this problem is to allow the extent removal code to
steal from the deleted extent to meet indirect reservation requirements.
Move the block of code in xfs_bunmapi() that updates the fdblocks counter
to after the call to xfs_bmap_del_extent() to allow the codepath to
update the extent record before the free blocks are accounted. Also,
reshuffle the code slightly so the delalloc accounting occurs near the
xfs_bmap_del_extent() call to provide context for the comments.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
bp_release is set to 0 just before the break out of the for loop, before
the conditional check (at line 458). The other way out of the loop is a
goto that skips the dead code.
Addresses-Coverity-Id: 102338
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Commit 88740da18[1] introduced a function to compute the maximum
height of the inode btree back in 1994. Back then, apparently, the
freespace and inode btrees shared the same geometry; however, it has
long since been the case that the inode and freespace btrees have
different record and key sizes. Therefore, we must use m_inobt_mnr if
we want a correct calculation/log reservation/etc.
(Yes, this bug has been around for 21 years and ten months.)
(Yes, I was in middle school when this bug was committed.)
Historical-research-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Tue, 21 Jun 2016 02:59:57 +0000 (12:59 +1000)]
xfs_check: process sparse inode chunks correctly
Update the inode btree scanning functions to process sparse inode chunks
correctly. For filesystems with sparse inode support enabled, process
each chunk a cluster at a time. Each cluster is checked against the
inobt record to determine if it is a hole and skipped if so.
Note that since xfs_check is deprecated in favor of xfs_repair, this
adds the minimum support necessary to process sparse inode enabled
filesystems. In other words, this adds no sparse inode specific checks
or verifications. We only update the inobt scanning functions to extend
the existing level of verification to sparse inode enabled filesystems
(e.g., avoid incorrectly tracking sparse regions as inodes). Problems
or corruptions associated with sparse inode records must be detected and
recovered via xfs_repair.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Tue, 21 Jun 2016 02:58:57 +0000 (12:58 +1000)]
xfs_db: Revert "xfs_db: make check work for sparse inodes"
This reverts commit bb2f98b78f20f4abbfbbd442162d9f535c84888a which
introduced support for multi-record inode chunks in
xfs_db/xfs_check. However, it doesn't currently handle filesystems
with multi-record inode chunks correctly. For example, do the
following on a 64k page size arch such as ppc64:
# mkfs.xfs -f -b size=64k <dev>
# xfs_db -c check <dev>
bad magic number 0 for inode 1152
bad magic number 0 for inode 1153
bad magic number 0 for inode 1154
bad magic number 0 for inode 1155
bad magic number 0 for inode 1156
bad magic number 0 for inode 1157
...
This boils down to a regression in the inode record processing code
(scanfunc_ino()) in db/check.c. Specifically, the cblocks value can
end up being zero after it is shifted by mp->m_sb.sb_inopblog (i.e.,
64 >> 7 == 0 for an -isize=512 -bsize=64k fs).
Fixing this problem is easier to do from scratch, so revert the
original commit first.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Eric Sandeen [Tue, 21 Jun 2016 02:55:15 +0000 (12:55 +1000)]
xfs_repair: set rsumino version to 2
If we run xfs/033 with "-m crc=0", the test fails with a repair
output difference:
Phase 7 - verify and correct link counts...
+resetting inode INO nlinks from 0 to 1
done
This is because when we zero out the realtime summary inode and
rebuild it, we set its version to 1, then set its ip->i_d.di_nlink
to 1. This is a little odd, because v1 inodes store their link
count in di_onlink...
Then, later in repair we call xfs_inode_from_disk(), which sees the
version one inode, and converts it to version 2 in part by copying
di_onlink to di_nlink. But we never *set* di_onlink, so di_nlink
gets reset to zero, and this error is discovered later in repair.
Interestingly, mk_rbmino() was changed in 138659f1 to set version 2;
it looks like mk_rsumino was just missed.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Jan Tulak [Tue, 21 Jun 2016 02:54:30 +0000 (12:54 +1000)]
mkfs: test that -l su is a multiple of block size
lsunit was already tested, but lsu was not, so something like -l su=4097 was
possible. This commit adds a check to catch this, and moves the entire
lsu/lsunit block size testing to calc_stripe_factors(), where some
lsu/lsunit logic already lives.
Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Jan Tulak [Tue, 21 Jun 2016 02:53:32 +0000 (12:53 +1000)]
mkfs: better error with incorrect b/s value suffix usage
If a user writes a value using the b or s suffix without explicitly stating
the block or sector size, mkfs ends with an unhelpful error about the value
being too small. That happens because we read the physical geometry only
after all options are parsed.
So, tell the user exactly what is wrong with the input.
Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Jan Tulak [Tue, 21 Jun 2016 02:52:22 +0000 (12:52 +1000)]
mkfs: fix -l su minval
-l su should be in the range BBTOB(1) <= L_SU <= XLOG_MAX_RECORD_BSIZE,
because the upper limit is imposed by the kernel on the iclogbuf size:
the stripe unit can't be bigger than the log buffer, but the log buffer
can span multiple stripe units. L_SUNIT is changed in the same way.
Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Christoph Hellwig [Fri, 3 Jun 2016 01:04:15 +0000 (11:04 +1000)]
xfs.h: define XFS_IOC_FREEZE even if FIFREEZE is defined
And the same for XFS_IOC_THAW. Even though we now have a common
version of the ioctl, we still need to provide the old names for
anyone using them.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Eric Sandeen [Fri, 3 Jun 2016 01:04:15 +0000 (11:04 +1000)]
xfs_quota: only round up timer reporting > 1 day
I was too hasty with:
d1fe6ff xfs_quota: remove extra 30 seconds from time limit reporting
The point of that extra 30s, it turns out, was to allow the user
to set a limit, query it, and get back what they just set, if
it is set to more than a day.
Without it, if we set a grace period to e.g. 3 days, and query it
1 second later, the rounding in the time_to_string function returns
"2 days" not "3 days" as it did before, because we are at
2 days 23:59:59 and it essentially applies a floor() for
brevity. I guess this was confusing.
(I've run into this same conundrum on my stove digital timer;
if you set it to 10m, it blinks "10" at you twice so that you
know what you set, then quickly flips to 9 as it counts down).
In some cases, however (and this is the case that prompted the
prior patch), we display a full "XYZ days hh:mm:ss" - we do this
if the verbose flag is set, or if the timer is less than one day.
In these cases, we should not add the 30s, because we are showing
full time resolution to the user.
Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Bill O'Donnell [Fri, 3 Jun 2016 01:04:03 +0000 (11:04 +1000)]
xfs_repair: further improvement on secondary superblock search method
This patch is a further optimization of secondary sb search, in
order to handle non-default geometries. Once again, use a similar
method to find the fs geometry as that of xfs_mkfs. Refactor
verify_sb(), creating a new sub-function, verify_sb_blocksize(),
that checks the sanity of agblocks and agcount.
If verify_sb_blocksize() verifies sane parameters, use the found values
for the sb search. Otherwise, try the search with default values. If
both of these faster methods fail, fall back to the original, slower
brute force search.
Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Eric Sandeen [Mon, 30 May 2016 02:21:31 +0000 (12:21 +1000)]
xfs_quota: check report_mount return value
The new call to report_mount doesn't check the return value
like every other caller does...
Returning 1 means it printed something; if the terse flag
is used and there is no usage, nothing gets printed.
If we set the NO_HEADER_FLAG anyway, then we won't see
the header for subsequent entries as we expect.
For example, project ID 0 has no usage in this case:
# xfs_quota -x -c "report -a" /mnt/test
Project quota on /mnt/test (/dev/sdb1)
Blocks
Project ID Used Soft Hard Warn/Grace
---------- --------------------------------------------------
#0 0 0 0 00 [--------]
project 2048 4 4 00 [--none--]
So using the terse flag results in no header when it prints
projects with usage:
# xfs_quota -x -c "report -t -a" /mnt/test
Project quota on /mnt/test (/dev/sdb1)
Blocks
Project ID Used Soft Hard Warn/Grace
---------- --------------------------------------------------
project 2048 4 4 00 [--none--]
Addresses-Coverity-Id: 1361552 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Bill O'Donnell [Mon, 30 May 2016 02:21:26 +0000 (12:21 +1000)]
xfs_repair: new secondary superblock search method
Optimize secondary sb search, using similar method to find
fs geometry as that of xfs_mkfs. If this faster method fails
in finding a secondary sb, fall back to original brute force
slower search.
Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Eric Sandeen [Mon, 30 May 2016 00:35:56 +0000 (10:35 +1000)]
xfs_db: defang frag command
Too many people freak out about this fictitious "fragmentation
factor." As shown in the fact, it is largely meaningless, because
the number approaches 100% extremely quickly for just a few
extents per file.
I thought about removing it altogether, but perhaps a note
about its uselessness, and a more soothing metric (avg extents
per file) might be useful.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
The result should be just the single entry being asked for.
Currently this outputs the entire remainder of the array starting at
the given index. This makes it difficult to extract single entry
values.
This occurs because the printing of a flat array of number types
does not take into account the range that is specified on the
command line, which is held in fl->low and fl->high. To make this
work for flat arrays of number types (print function fp_num), change
print_flist() to limit the count of values to be emitted to the
range specified. This now gives:
To further simplify external parsing of single entry values, if only
a single value is requested from the array of fp_num type, don't
print the array index - it's already known. Hence:
This change will take effect on all types of flat number arrays that
are printed. e.g. the range limiting will work for things like the
AGI unlinked list arrays.
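A generic sketch of the resulting behaviour (not the xfs_db print code
itself):

#include <stdio.h>

/* print array entries low..high only; omit the index when a single entry
 * was requested, since the caller already knows it */
static void
print_num_array(const unsigned int *arr, int low, int high)
{
    int i;

    if (low == high) {
        printf("%u\n", arr[low]);
        return;
    }
    for (i = low; i <= high; i++)
        printf("[%d] %u\n", i, arr[i]);
}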
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Mon, 30 May 2016 00:32:41 +0000 (10:32 +1000)]
xfs_db: allow recalculating CRCs on invalid metadata
Currently we can't write corrupt structures with valid CRCs on v5
filesystems via xfs_db. To emulate certain types of corruption that
result from software bugs in the kernel code, we need this
capability to set up the corrupted state. i.e. corrupt state with a
valid CRC needs to appear on disk.
This requires us to avoid running the verifier that would otherwise
prevent writing corrupt state to disk. To enable this, add the CRC
offset to the type table for different buffers and add a new flag to
the write command to trigger running a CRC calculation based on this
type table. We can then insert the calculated value into the correct
location in the buffer...
Because some objects are not directly buffer based, we can't easily
do this CRC trick. Those object types will be marked as
TYP_NO_CRC_OFF, and as a result will emit an error such as:
# xfs_db -x -c "inode 96" -c "write -d magic 0x4949" /dev/ram0
Cannot recalculate CRCs on this type of object
#
All v4 superblock types are configured this way, as are inode,
dquots and other v5 metadata types that either don't have CRCs or
don't have a fixed offset into a buffer to store their CRC.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Eric Sandeen [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
xfs_db: fix unaligned accesses
Fix 2 unaligned accesses in xfs_db which caused bus errors on
sparc64. Similar treatment was already done in xfs_repair and
xfs_metadump but somehow xfs_db got missed.
Thanks to Anatoly for reminding me that unaligned access is
a thing. ;)
Resolves-oss-bugzilla: #1140 Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
In practice, this works out to 63 indices; sadly 64 are required
to store a 32k metadata chunk, if the filesystem was created with
XFS_MAX_SECTORSIZE. This leads to more sadness later on, as we
index past the end of arrays, etc.
For now, just refuse to create a metadump from a 32k sector
filesystem; that's largely just theoretical at this point anyway.
Also check this on mdrestore, and check the lower bound as well;
the AFL fuzzer showed that interesting things happen when the
metadump image claims to contain a sector size of 0.
Oh, and spell "indices" correctly while we're at it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Jan Tulak [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
mkfs: add optional 'reason' for illegal_option
Allow us to tell the user what exactly is wrong with the specified
options. For example, that the value is too small, instead of just
generic "bad option."
Dave Chinner [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
mkfs: don't treat files as though they are block devices
If the device is actually a file, and "-d file" is not specified,
mkfs will try to treat it as a block device and get stuff wrong.
Image files don't necessarily have the same sector sizes as the
block device or filesystem underlying the image file, nor should we
be issuing discard ioctls on image files.
To fix this sanely, only require "-d file" if the device name is
invalid to trigger creation of the file. Otherwise, use stat() to
determine if the device is a file or block device and deal with that
appropriately by setting the "isfile" variables and turning off
direct IO. Then ensure that we check the "isfile" options before
doing things that are specific to block devices.
Other file/blockdev issues fixed:
- use getstr to detect specifying the data device name
twice.
- check file/size/name parameters before anything else.
- overwrite checks need to be done before the image file is
opened and potentially truncated.
- blkid_get_topology() should not be called for image files,
so warn when it is called that way.
- zero_old_xfs_structures() emits a spurious error:
"existing superblock read failed: Success"
when it is run on a truncated image file. Don't warn if we
see this problem on an image file.
- Don't issue discards on image files.
- Use fsync() for image files, not BLKFLSBUF in
platform_flush_device() for Linux.
[ Eric Sandeen <sandeen@redhat.com>: move check for no mkfs target,
other minor fixes. ]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
mkfs: add string options to generic parsing
So that string options are correctly detected for conflicts and
respecification, add a getstr() function that modifies the option
tables appropriately.
[ Eric Sandeen <sandeen@redhat.com>: whitespace fixup ]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
mkfs: encode conflicts into parsing table
Many options conflict, so we need to specify which options conflict
with each other in a generic manner. We already have a "seen"
variable used for respecification detection, so we can also use this
code for conflict detection. Hence add a "conflicts" array to the
sub-option parameter definition.
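In outline, the idea is a table of this shape (a schematic model, not the
actual mkfs structures):

#define MAX_CONFLICTS 8
#define LAST_CONFLICT (-1)

/* schematic sub-option descriptor; one entry per sub-option */
struct subopt_param {
    int index;                      /* which sub-option this is */
    int seen;                       /* respecification detection */
    int conflicts[MAX_CONFLICTS];   /* conflicting indices, -1 terminated */
};

/* after parsing sub-option 'just_seen', reject it if a conflicting
 * sub-option has already been seen */
static int
check_conflicts(const struct subopt_param *table, int just_seen)
{
    int c;

    for (c = 0; table[just_seen].conflicts[c] != LAST_CONFLICT; c++)
        if (table[table[just_seen].conflicts[c]].seen)
            return -1;  /* caller reports the conflict and exits */
    return 0;
}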
[ Eric Sandeen <sandeen@redhat.com>: remove explicit L_FILE conflict ]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
mkfs: merge getnum
getnum() is now only called by getnum_checked(). Move the two
together into a single getnum() function and change all the callers
back to getnum().
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
mkfs: table based parsing for converted parameters
All the parameters that can be passed as block or sector sizes need
to be passed the block and sector sizes that they should be using
for conversion. For parameter parsing, it is always the same two
variables, so to make things easy just declare them as global
variables so we can avoid needing to pass them to getnum_checked().
We also need to mark these parameters as requiring conversion so
that we don't need to pass this information manually to
getnum_checked(). Further, some of these options are required to
have a power of 2 value, so add optional checking for that as well.
[ Eric Sandeen <sandeen@redhat.com>: fix min/max for "-l su" ]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
mkfs: add respecification detection to generic parsing
Add flags to the generic input parameter tables so that
respecification can be detected in a generic manner.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
mkfs: use getnum_checked for all ranged parameters
Now that getnum_checked can handle min/max checking, use this for
all parameters that take straight numbers and don't require unit
conversions.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:07 +0000 (17:16 +1000)]
mkfs: getbool is redundant
getbool() can be replaced with getnum_checked with appropriate
min/max values set for the boolean variables. Make boolean
arguments consistent - all accept 0 or 1 value now.
[ Eric Sandeen <sandeen@redhat.com>: manpage tidiness ]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:06 +0000 (17:16 +1000)]
mkfs: structify input parameter passing
Passing large number of parameters around to number conversion
functions is painful. Add a structure to encapsulate the constant
parameters that are passed, and convert getnum_checked to use it.
This is the first real step towards a table driven option parser.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Yeah, I just asked for a block size of 2^456858480, and it didn't
get rejected. Great, isn't it?
So, factor out the parsing of logarithmic parameters, and pass in
the maximum valid value that they can take. These maximum values
might not be completely accurate (e.g. block/sector sizes will
affect the eventual valid maximum) but we can get rid of all the
overflows and stupidities before we get to fine-grained validity
checking later in mkfs once things like block and sector sizes have
been finalised.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:06 +0000 (17:16 +1000)]
mkfs: factor boolean option parsing
Many of the options passed to mkfs have boolean options (0 or 1) and
all hand roll the same code and validity checks. Factor these out
into a common function.
Note that the lazy-count option is now changed to match other
booleans in that if you don't specify a value, it reverts to the
default value (on) rather than throwing an error.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:06 +0000 (17:16 +1000)]
mkfs: validate all input values
Right now, mkfs does a poor job of input validation of values. Many
parameters do not check for trailing garbage and so will pass
obviously invalid values as OK. Some don't even detect completely
invalid values, leaving it for other checks later on to fail due to
a bad value conversion - these tend to rely on atoi() implicitly
returning a sane value when it is passed garbage, and atoi gives no
guarantee of the return value when passed garbage.
Clean all this up by passing all strings that need to be converted
into values into a common function that is called regardless of
whether unit conversion is needed or not. Further, make sure every
conversion is checked for a valid result, and abort the moment an
invalid value is detected.
Get rid of the silly "isdigits(), cvtnum()" calls which don't use
any of the conversion capabilities of cvtnum() because we've already
ensured that there are no conversion units in the string via the
isdigits() call. These can simply be replaced by a standard
strtoll() call followed by checking for no trailing bytes.
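The replacement is roughly of this shape (a sketch of the strtoll plus
trailing-byte check, not the final getnum() implementation):

#include <errno.h>
#include <stdlib.h>

/* convert a plain number, rejecting empty strings, trailing garbage and
 * out-of-range values instead of silently accepting them */
static long long
parse_plain_number(const char *str, int *error)
{
    char *endp;
    long long val;

    errno = 0;
    val = strtoll(str, &endp, 0);
    *error = (errno != 0 || endp == str || *endp != '\0');
    return val;
}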
Finally, the block size of the filesystem is not known until all
the options have been parsed and we can determine if the default is
to be used. This means any parameter that relies on using conversion
from filesystem block size (the "NNNb" format) requires the block
size to first be specified on the command line so it is known.
Similarly, we make the same rule for specifying counts in sectors.
This is a change from the existing behaviour that assumes sectors
are 512 bytes unless otherwise changed on the command line. This,
unfortunately, leads to complete silliness where you can specify the
sector size as a count of sectors. It also means that you can do
some conversions with 512 byte sector sizes, and others with
whatever was specified on the command line, meaning the mkfs
behaviour changes depending on where in the command line the sector
size is changed....
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:06 +0000 (17:16 +1000)]
mkfs: Sanitise the superblock feature macros
They are horrible macros that simply obfuscate the code, so
let's factor the code and make them nice functions.
To do this, add a sb_feat_args structure that carries around the
variables rather than a strange assortment of variables. This means
all the defaults can be clearly defined in a structure
initialisation, and dependent feature bits are easy to check.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:06 +0000 (17:16 +1000)]
mkfs: sanitise ftype parameter values.
Because passing "-n ftype=2" should fail.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 10 May 2016 07:16:06 +0000 (17:16 +1000)]
xfsprogs: use common code for multi-disk detection
Both xfs_repair and mkfs.xfs need to agree on what is a "multidisk"
configuration - mkfs for determining the AG count of the filesystem,
repair for determining how to automatically parallelise its
execution. This requires a bunch of common defines that both mkfs
and repair need to share.
In fact, most of the defines in xfs_mkfs.h could be shared with
other programs (i.e. all the defaults mkfs uses) and so it is
simplest to move xfs_mkfs.h to the shared include directory and add
the new defines to it directly.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Zorro Lang [Tue, 10 May 2016 07:16:06 +0000 (17:16 +1000)]
xfs_quota: print quota id number if the name can't be found
When using the GETNEXTQUOTA ioctl to report project quota, it always
reports an unexpected quota entry:
(null) 0 0 0 00 [--------]
ID 0 stores the default quota; even if no one has set a default quota,
it still has quota accounting (though not enforcement), so GETNEXTQUOTA
can find and report this undefined quota.
From this problem I realized that if any other quota's name is missing,
(null) will be printed too, e.g.:
# xfs_quota -xc "limit -u bsoft=300m bhard=400m test" $mnt
# xfs_quota -xc "report -u" $mnt
User ID Used Soft Hard Warn/Grace
---------- --------------------------------------------------
root 0 0 0 00 [--------]
test 0 307200 409600 00 [--------]
# userdel -r test
# xfs_quota -xc "report -u" $mnt
User ID Used Soft Hard Warn/Grace
---------- --------------------------------------------------
root 0 0 0 00 [--------]
(null) 0 307200 409600 00 [--------]
So this problem is the same as the id 0 problem above. To deal with it,
this patch prints the id number if the name can't be found.
However, the old GETQUOTA ioctl won't print project id 0 quota
information if it's not defined, which differs from GETNEXTQUOTA. To
keep the output consistent, this patch also prints project id 0 when
the old GETQUOTA is used.
Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Zorro Lang [Tue, 10 May 2016 07:16:06 +0000 (17:16 +1000)]
xfs_quota: fully support users and groups beginning with digits
A normal user or group name is allowed to begin with digits, but
xfs_quota can't set a limit for such a user or group. The reason is
that the 'strtoul' function only translates the digits at the
beginning and ignores the letters that follow them.
Commit fd537fc50eeade63bbd2a66105f39d04a011a7f5 tried to fix
"xfsprogs: xfs_quota allow user or group names beginning with
digits", but it doesn't affect the 'limit' command, so a command like:
Zorro Lang [Tue, 10 May 2016 07:16:06 +0000 (17:16 +1000)]
xfs_quota: add missed options -D and -P into man page
There are two options in the xfsprogs/quota/init.c:init() function: the -D
option is used to specify a file to use instead of /etc/projects, and the
-P option is used to specify a file to use instead of /etc/projid. I didn't
know about these two options when I wrote xfstests case xfs/133, because
there is no information about them in the xfs_quota and other related
man pages.
Document these options in the xfs_quota man page.
Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Zorro Lang [Tue, 10 May 2016 07:16:05 +0000 (17:16 +1000)]
xfs_io: allow mmap command to reserve some free space
When using xfs_io to rewrite LTP's mmap16 testcase for xfstests,
we always hit an mremap ENOMEM error. But according to linux commit:
90a8020 vfs: fix data corruption when blocksize < pagesize for mmaped data
the reproducer shouldn't use the MREMAP_MAYMOVE flag for mremap(). So we
need a stable method that lets mremap() extend the mapping from the
original mapped starting address.
Generally if we try to mremap from the original mapped starting point
in a C program, at first we always do:
The "res_size" is bigger than "real_len". This will help us get a
region between real_len and res_size, which maybe free space. The
truth is we can't guarantee that this free memory will stay free.
But this method is still very helpful for reserve some free space
in short time.
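The general reservation trick looks roughly like this (a userspace sketch of
the technique, not the xfs_io implementation; error handling omitted):

#include <sys/mman.h>

/* reserve res_size of address space, then map the file over the front of it
 * so a later in-place mremap() has room to grow into */
static void *
map_with_reservation(int fd, size_t real_len, size_t res_size)
{
    void *base;

    /* find a contiguous free region big enough for the eventual size */
    base = mmap(NULL, res_size, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return base;
    /* give the reservation back; the range is now likely to stay free */
    munmap(base, res_size);

    /* map the file at the start of that region */
    return mmap(base, real_len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FIXED, fd, 0);
}

A later mremap(addr, real_len, new_len, 0) can then grow the mapping in place
as long as new_len stays within res_size and nothing else has claimed the
reserved range in the meantime.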
This patch brings in "res_size" via a new -s option. Without this
option, xfs_io -c "mmap 0 1024" -c "mremap 8192" hits the ENOMEM
error nearly 100% of the time; with it, the remap nearly always
succeeds.
Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>