When trying to flush qgroups in order to release space we run delayed
iputs in order to release space from recently deleted files (their link
counted reached zero), and then we start delalloc and wait for any
existing ordered extents to complete.
However there's a time window here where we end up not doing the final
iput on a deleted file which could release necessary space:
1) An unlink operation starts;
2) During the unlink, or right before it completes, delalloc is flushed
and an ordered extent is created;
3) When the ordered extent is created, the inode's ref count is
incremented (with igrab() at alloc_ordered_extent());
4) When the unlink finishes it doesn't drop the last reference on the
inode and so it doesn't trigger inode eviction to delete all of
the inode's items in its root and drop all references on its data
extents;
5) Another task enters try_flush_qgroup() to try to release space,
it runs all delayed iputs, but there's no delayed iput yet for that
deleted file because the ordered extent hasn't completed yet;
6) Then at try_flush_qgroup() we wait for the ordered extent to complete
and that results in adding a delayed iput at btrfs_put_ordered_extent()
when called from btrfs_finish_one_ordered();
7) Adding the delayed iput results in waking the cleaner kthread if it's
not running already. However it may take some time for it to be
scheduled, or it may be running but busy running auto defrag, dropping
deleted snapshots or doing other work, so by the time we return from
try_flush_qgroup() the space for deleted file isn't released.
Improve on this by running delayed iputs only after flushing delalloc
and waiting for ordered extent completion.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
return 0;
}
- btrfs_run_delayed_iputs(root->fs_info);
- btrfs_wait_on_delayed_iputs(root->fs_info);
ret = btrfs_start_delalloc_snapshot(root, true);
if (ret < 0)
goto out;
btrfs_wait_ordered_extents(root, U64_MAX, NULL);
+ /*
+ * After waiting for ordered extents run delayed iputs in order to free
+ * space from unlinked files before committing the current transaction,
+ * as ordered extents may have been holding the last reference of an
+ * inode and they add a delayed iput when they complete.
+ */
+ btrfs_run_delayed_iputs(root->fs_info);
+ btrfs_wait_on_delayed_iputs(root->fs_info);
+
ret = btrfs_commit_current_transaction(root);
out:
clear_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state);