Filipe Manana [Wed, 23 Oct 2024 13:14:11 +0000 (14:14 +0100)]
btrfs: remove num_entries atomic counter from delayed ref root
The atomic counter 'num_entries' is not used anymore, we increment it
and decrement it but then we don't ever read it to use for any logic.
Its last use was removed with commit 61a56a992fcf ("btrfs: delayed refs
pre-flushing should only run the heads we have"). So remove it.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 18 Oct 2024 10:29:37 +0000 (11:29 +0100)]
btrfs: use helper to find first ref head at btrfs_destroy_delayed_refs()
Instead of open coding it, use the find_first_ref_head() helper at
btrfs_destroy_delayed_refs(). This avoids duplicating the logic,
specially with the upcoming changes in subsequent patches.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 21 Oct 2024 11:10:26 +0000 (12:10 +0100)]
btrfs: remove duplicated code to drop delayed ref during transaction abort
When destroying delayed refs during a transaction abort, we have open
coded the removal of a delayed ref, which is also done by the static
helper function drop_delayed_ref(). So remove that duplicated code and
use drop_delayed_ref() instead.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 21 Oct 2024 16:12:22 +0000 (17:12 +0100)]
btrfs: remove fs_info parameter from btrfs_cleanup_one_transaction()
The fs_info parameter is redundant because it can be extracted from the
transaction given as another parameter. So remove it and use the fs_info
accessible from the transaction.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 21 Oct 2024 16:03:46 +0000 (17:03 +0100)]
btrfs: remove fs_info parameter from btrfs_destroy_delayed_refs()
The fs_info parameter is redundant because it can be extracted from the
transaction given as another parameter. So remove it and use the fs_info
accessible from the transaction.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 17 Oct 2024 15:23:41 +0000 (16:23 +0100)]
btrfs: move btrfs_destroy_delayed_refs() to delayed-ref.c
It's better suited at delayed-ref.c since it's about delayed refs and
contains logic to iterate over them (using the red black tree, doing all
the locking, freeing, etc), so move it from disk-io.c, which is pretty
big, into delayed-ref.c, hiding implementation details of how delayed
refs are tracked and managed. This also facilitates the next patches in
the series.
This change moves the code between files but also does the following
simple cleanups:
1) Rename the 'cache' variable to 'bg', since it's a block group
(the 'cache' logic comes from old days where the block group
structure was named 'btrfs_block_group_cache');
2) Move the 'ref' variable declaration to the scope of the inner
while loop, since it's not used outside that loop.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 17 Oct 2024 14:07:45 +0000 (15:07 +0100)]
btrfs: remove BUG_ON() at btrfs_destroy_delayed_refs()
At btrfs_destroy_delayed_refs() it's unexpected to not find the block
group to which a delayed reference's extent belongs to, so we have this
BUG_ON(), not just because it's highly unexpected but also because we
don't know what to do there.
Since we are in the transaction abort path, there's nothing we can do
other than proceed and cleanup all used resources we can. So remove
the BUG_ON() and deal with a missing block group by logging an error
message and continuing to cleanup all we can related to the current
delayed ref head and moving to other delayed refs.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Robbie Ko [Thu, 24 Oct 2024 02:31:42 +0000 (10:31 +0800)]
btrfs: reduce extent tree lock contention when searching for inline backref
When inserting extent backref, in order to check whether refs other than
inline refs are used, we always use path keep locks for tree search, which
will increase the lock contention of extent tree.
We do not need the parent node every time to determine whether normal
refs are used. It is only needed when the extent item is the last item
in a leaf.
Therefore, we change it to first use keep_locks=0 for search. If the
extent item happens to be the last item in the leaf, we then change to
keep_locks=1 for the second search to reduce lock contention.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Wed, 23 Oct 2024 13:25:18 +0000 (15:25 +0200)]
btrfs: tests: implement case for partial RAID stripe-tree delete
Implement self-tests for partial deletion of RAID stripe-tree entries.
These two new tests cover both the deletion of the front of a RAID
stripe-tree stripe extent as well as truncation of an item to make it
smaller.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
This ASSERT()ion triggers, because for the initial design of RAID
stripe-tree, I had the "one ordered-extent equals one bio" rule of zoned
btrfs in mind.
But for a RAID stripe-tree based system, that is not hosted on a zoned
storage device, but on a regular device this rule doesn't apply.
So in case the range we want to delete starts in the middle of the
previous item, grab the item and "truncate" it's length. That is, clone
the item, subtract the deleted portion from the key's offset, delete the
old item and insert the new one.
In case the range to delete ends in the middle of an item, we have to
adjust both the item's key as well as the stripe extents and then
re-insert the modified clone into the tree after deleting the old stripe
extent.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Thu, 10 Oct 2024 04:46:13 +0000 (15:16 +1030)]
btrfs: convert btrfs_buffered_write() to use folios
The buffered write path is still heavily utilizing the page interface.
Since we have converted it to do a page-by-page copying, it's much easier
to convert all involved functions to folio interface, this involves:
All function are changed to accept a folio parameter, and if the word
"page" is in the function name, change that to "folio" too.
The function btrfs_dirty_page() is exported for v1 space cache, convert
v1 cache call site to convert its page to folio for the new interface.
And there is a small enhancement for prepare_one_folio(), instead of
manually waiting for the page writeback, let __filemap_get_folio() to
handle that by using FGP_WRITEBEGIN, which implies
(FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE).
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Mark Harmstone [Tue, 15 Oct 2024 11:37:29 +0000 (12:37 +0100)]
btrfs: fix wrong sizeof in btrfs_do_encoded_write()
btrfs_do_encoded_write() was converted to use folios in 400b172b8cdc,
but we're still allocating based on sizeof(struct page *) rather than
sizeof(struct folio *). There's no functional change.
Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Boris Burkov [Tue, 15 Oct 2024 21:27:32 +0000 (14:27 -0700)]
btrfs: do not clear read-only when adding sprout device
If you follow the seed/sprout wiki, it suggests the following workflow:
btrfstune -S 1 seed_dev
mount seed_dev mnt
btrfs device add sprout_dev
mount -o remount,rw mnt
The first mount mounts the FS readonly, which results in not setting
BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
somewhat surprisingly clears the readonly bit on the sb (though the
mount is still practically readonly, from the users perspective...).
Finally, the remount checks the readonly bit on the sb against the flag
and sees no change, so it does not run the code intended to run on
ro->rw transitions, leaving BTRFS_FS_OPEN unset.
As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
does no work. This results in leaking deleted snapshots until we run out
of space.
I propose fixing it at the first departure from what feels reasonable:
when we clear the readonly bit on the sb during device add.
A new fstest I have written reproduces the bug and confirms the fix.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 16 Oct 2024 10:13:24 +0000 (11:13 +0100)]
btrfs: remove local generation variable from read_block_for_search()
It's redundant to have the 'gen' variable since we already have the same
value in the local btrfs_tree_parent_check structure. So remove it and
instead use the structure's field.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 16 Oct 2024 10:09:26 +0000 (11:09 +0100)]
btrfs: remove redundant initializations for struct btrfs_tree_parent_check
It's pointless to initialize the has_first_key field of the stack local
btrfs_tree_parent_check structure at btrfs_tree_parent_check() and at
btrfs_qgroup_trace_subtree() since all fields not explicitly initialized
are zeroed out. In the case of the first function it's a bit odd because
we are assigning 0 and the field is of type bool, however not incorrect
since a 0 is converted to false.
Just remove the explicit initializations due to their redundancy.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 16 Oct 2024 09:57:48 +0000 (10:57 +0100)]
btrfs: simplify arguments for btrfs_verify_level_key()
The only caller of btrfs_verify_level_key() is read_block_for_search() and
it's passing 3 arguments to it that can be extracted from its on stack
variable of type struct btrfs_tree_parent_check.
So change btrfs_verify_level_key() to accept an argument of type
struct btrfs_tree_parent_check instead of level, first key and parent
transid arguments.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 16 Oct 2024 09:45:47 +0000 (10:45 +0100)]
btrfs: remove redundant level argument from read_block_for_search()
The level parameter passed to read_block_for_search() always matches the
level of the extent buffer passed in the "eb_ret" parameter, which we are
also extracting into the "parent_level" local variable.
So remove the level parameter and instead use the "parent_level" variable
which in fact has a better name (it's the level of the parent node from
which we are reading a child node/leaf).
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 17 Sep 2024 11:03:11 +0000 (12:03 +0100)]
btrfs: re-enable the extent map shrinker
Now that the extent map shrinker can only be run by a single task and runs
asynchronously as a work queue job, enable it as it can no longer cause
stalls on tasks allocating memory and entering the extent map shrinker
through the fs shrinker (implemented by btrfs_free_cached_objects()).
This is crucial to prevent exhaustion of memory due to unbounded extent
map creation, primarily with direct IO but also for buffered IO on files
with holes. This problem, for the direct IO case, was first reported in
the Link tag below. That report was added to a Link tag of the first patch
that introduced the extent map shrinker, commit 956a17d9d050 ("btrfs: add
a shrinker for extent maps"), however the Link tag disappeared somehow
from the committed patch (but was included in the submitted patch to the
mailing list), so adding it below for future reference.
Filipe Manana [Thu, 5 Sep 2024 10:31:49 +0000 (11:31 +0100)]
btrfs: rename extent map shrinker members from struct btrfs_fs_info
The names for the members of struct btrfs_fs_info related to the extent
map shrinker are a bit too long, so rename them to be shorter by replacing
the "extent_map_" prefix with the "em_" prefix.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 4 Sep 2024 16:03:43 +0000 (17:03 +0100)]
btrfs: simplify tracking progress for the extent map shrinker
Now that the extent map shrinker can only be run by a single task (as a
work queue item) there is no need to keep the progress of the shrinker
protected by a spinlock and passing the progress to trace events as
parameters. So remove the lock and simplify the arguments for the trace
events.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 29 Aug 2024 14:23:32 +0000 (15:23 +0100)]
btrfs: make the extent map shrinker run asynchronously as a work queue job
Currently the extent map shrinker is run synchronously for kswapd tasks
that end up calling the fs shrinker (fs/super.c:super_cache_scan()).
This has some disadvantages and for some heavy workloads with memory
pressure it can cause some delays and stalls that make a machine
unresponsive for some periods. This happens because:
1) We can have several kswapd tasks on machines with multiple NUMA zones,
and running the extent map shrinker concurrently can cause high
contention on some spin locks, namely the spin locks that protect
the radix tree that tracks roots, the per root xarray that tracks
open inodes and the list of delayed iputs. This not only delays the
shrinker but also causes high CPU consumption and makes the task
running the shrinker monopolize a core, resulting in the symptoms
of an unresponsive system. This was noted in previous commits such as
commit ae1e766f623f ("btrfs: only run the extent map shrinker from
kswapd tasks");
2) The extent map shrinker's iteration over inodes can often be slow, even
after changing the data structure that tracks open inodes for a root
from a red black tree (up to kernel 6.10) to an xarray (kernel 6.10+).
The transition to the xarray while it made things a bit faster, it's
still somewhat slow - for example in a test scenario with 10000 inodes
that have no extent maps loaded, the extent map shrinker took between
5ms to 8ms, using a release, non-debug kernel. Iterating over the
extent maps of an inode can also be slow if have an inode with many
thousands of extent maps, since we use a red black tree to track and
search extent maps. So having the extent map shrinker run synchronously
adds extra delay for other things a kswapd task does.
So make the extent map shrinker run asynchronously as a job for the
system unbounded workqueue, just like what we do for data and metadata
space reclaim jobs.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Robbie Ko [Tue, 15 Oct 2024 07:41:37 +0000 (15:41 +0800)]
btrfs: reduce lock contention when eb cache miss for btree search
When crawling btree, if an eb cache miss occurs, we change to use the eb
read lock and release all previous locks (including the parent lock) to
reduce lock contention.
If an eb cache miss occurs in a leaf and needs to execute IO, before this
change we released locks only from level 2 and up and we read a leaf's
content from disk while holding a lock on its parent (level 1), causing
the unnecessary lock contention on the parent, after this change we
release locks from level 1 and up, but we lock level 0, and read leaf's
content from disk.
Because we have prepared the check parameters and the read lock of eb we
hold, we can ensure that no race will occur during the check and cause
unexpected errors.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Oct 2024 14:32:17 +0000 (16:32 +0200)]
btrfs: drop unused parameter data from btrfs_fill_super()
The only caller passes NULL, we can drop the parameter. This is since
the new mount option parser done in 3bb17a25bcb09a ("btrfs: add get_tree
callback for new mount API").
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Oct 2024 14:32:05 +0000 (16:32 +0200)]
btrfs: drop unused parameter fs_info from folio_range_has_eb()
The parameter was added in 8ff8466d29efc2 ("btrfs: support subpage for
extent buffer page release") for page but hasn't been used since, so we
can drop it.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Oct 2024 14:31:58 +0000 (16:31 +0200)]
btrfs: drop unused parameter mask from try_release_extent_state()
The mask parameter used for allocations got unified to GFP_NOFS and
removed from relevant functions in 1d1268004430 ("btrfs: drop gfp from
parameter extent state helpers").
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Oct 2024 14:31:56 +0000 (16:31 +0200)]
btrfs: drop unused parameter refs from visit_node_for_delete()
The parameter duplicates what can be effectively obtained from
wc->refs[level - 1] and this is what's actually used inside. Added in
commit 2b73c7e761c4 ("btrfs: unify logic to decide if we need to walk
down into a node during snapshot delete").
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Oct 2024 14:31:49 +0000 (16:31 +0200)]
btrfs: drop unused parameter iov_iter from btrfs_write_check()
The parameter 'from' has never been used since commit b8d8e1fd570a
("btrfs: introduce btrfs_write_check()"), this is for buffered write.
Direct io write needs it so it was probably an interface thing, but we
can drop it.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Oct 2024 14:31:47 +0000 (16:31 +0200)]
btrfs: drop unused parameter file_offset from btrfs_encoded_read_regular_fill_pages()
The file_offset parameter used to be passed to encoded read struct but
was removed in commit b665affe93d8 ("btrfs: remove unused members from
struct btrfs_encoded_read_private").
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Oct 2024 14:31:18 +0000 (16:31 +0200)]
btrfs: drop unused parameter map from scrub_simple_mirror()
The parameter map used to be passed to scrub_extent() until e02ee89baa66c4 ("btrfs: scrub: switch scrub_simple_mirror() to
scrub_stripe infrastructure"), where the scrub implementation was
completely reworked.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Oct 2024 14:30:53 +0000 (16:30 +0200)]
btrfs: drop unused parameter path from btrfs_tree_mod_log_rewind()
The path parameter was used for our own locking, that got converted to
rwsem eventually. Last usage in ac5887c8e013d6 ("btrfs: locking: remove
all the blocking helpers").
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Sun, 6 Oct 2024 00:06:20 +0000 (10:36 +1030)]
btrfs: remove btrfs_set_range_writeback()
The function btrfs_set_range_writeback() was originally a callback for
metadata and data, to mark a range with writeback flag.
Then it was converted into a common function call for both metadata and
data.
From the very beginning, the function had been only called on a full page,
later converted to handle range inside a page.
But it never needed to handle multiple pages, and since commit 8189197425e7 ("btrfs: refactor __extent_writepage_io() to do
sector-by-sector submission") the function was only called on a
sector-by-sector basis.
This makes the function unnecessary, and can be converted to a simple
btrfs_folio_set_writeback() call instead.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 7 Oct 2024 14:55:43 +0000 (15:55 +0100)]
btrfs: qgroup: run delayed iputs after ordered extent completion
When trying to flush qgroups in order to release space we run delayed
iputs in order to release space from recently deleted files (their link
counted reached zero), and then we start delalloc and wait for any
existing ordered extents to complete.
However there's a time window here where we end up not doing the final
iput on a deleted file which could release necessary space:
1) An unlink operation starts;
2) During the unlink, or right before it completes, delalloc is flushed
and an ordered extent is created;
3) When the ordered extent is created, the inode's ref count is
incremented (with igrab() at alloc_ordered_extent());
4) When the unlink finishes it doesn't drop the last reference on the
inode and so it doesn't trigger inode eviction to delete all of
the inode's items in its root and drop all references on its data
extents;
5) Another task enters try_flush_qgroup() to try to release space,
it runs all delayed iputs, but there's no delayed iput yet for that
deleted file because the ordered extent hasn't completed yet;
6) Then at try_flush_qgroup() we wait for the ordered extent to complete
and that results in adding a delayed iput at btrfs_put_ordered_extent()
when called from btrfs_finish_one_ordered();
7) Adding the delayed iput results in waking the cleaner kthread if it's
not running already. However it may take some time for it to be
scheduled, or it may be running but busy running auto defrag, dropping
deleted snapshots or doing other work, so by the time we return from
try_flush_qgroup() the space for deleted file isn't released.
Improve on this by running delayed iputs only after flushing delalloc
and waiting for ordered extent completion.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Mon, 7 Oct 2024 11:52:48 +0000 (13:52 +0200)]
btrfs: scrub: skip initial RST lookup errors
Performing the initial extent sector read on a RAID stripe-tree backed
filesystem with pre-allocated extents will cause the RAID stripe-tree
lookup code to return ENODATA, as pre-allocated extents do not have any
on-disk bytes and thus no RAID stripe-tree entries.
But the current scrub read code marks these extents as errors, because
the lookup fails.
If btrfs_map_block() returns -ENODATA, it means that the call to
btrfs_get_raid_extent_offset() returned -ENODATA, because there is no
entry for the corresponding range in the RAID stripe-tree. But as this
range is in the extent tree it means we've hit a pre-allocated extent. In
this case, don't mark the sector in the stripe's error bitmaps as faulty
and carry on to the next.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Mon, 7 Oct 2024 11:52:47 +0000 (13:52 +0200)]
btrfs: return ENODATA in case RST lookup fails
In case a lookup in the RAID stripe-tree fails, return ENODATA instead of
ENOENT to better distinguish stripe-tree lookups from other code paths
where we return ENOENT.
Suggested-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Fri, 4 Oct 2024 13:19:01 +0000 (15:19 +0200)]
btrfs: handle empty list of NOCOW ordered extents with checksum list
Currently we BUG_ON() in btrfs_finish_one_ordered() if we are finishing
an ordered extent that is flagged as NOCOW, but it's checksum list is
not empty.
This is clearly a logic error which we can recover from by aborting the
transaction.
For developer builds which enable CONFIG_BTRFS_ASSERT, also ASSERT()
that the list is empty.
Suggested-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 1 Oct 2024 23:17:49 +0000 (08:47 +0930)]
btrfs: simplify the page uptodate preparation for prepare_pages()
Currently inside prepare_pages(), we handle the leading and tailing page
differently, and skip the middle pages (if any). This is to avoid
reading pages which are fully covered by the dirty range.
Refactor the code by moving all checks (alignment check, range check,
force read check) into prepare_uptodate_page().
So that prepare_pages() only needs to iterate all the pages
unconditionally.
And since we're here, also update prepare_uptodate_page() to use
folio API other than the old page API.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 1 Oct 2024 23:17:48 +0000 (08:47 +0930)]
btrfs: remove the dirty_page local variable
Inside btrfs_buffered_write(), we have a local variable @dirty_pages,
recording the number of pages we dirtied in the current iteration.
However we do not really need that variable, since it can be calculated
from @pos and @copied.
In fact there is already a problem inside the short copy path, where we
use @dirty_pages to calculate the range we need to release.
But that usage assumes sectorsize == PAGE_SIZE, which is no longer true.
Instead of keeping @dirty_pages and cause incorrect usage, just
calculate the number of dirtied pages inside btrfs_dirty_pages().
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Dr. David Alan Gilbert [Thu, 3 Oct 2024 14:27:27 +0000 (15:27 +0100)]
btrfs: remove unused btrfs_try_tree_write_lock()
btrfs_try_tree_write_lock() has been unused since commit 50b21d7a066f ("btrfs: submit a writeback bio per extent_buffer").
Remove it as we don't need it anymore.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Dr. David Alan Gilbert [Thu, 3 Oct 2024 14:27:26 +0000 (15:27 +0100)]
btrfs: remove unused btrfs_is_parity_mirror()
btrfs_is_parity_mirror() has been unused since commit 4886ff7b50f6
("btrfs: introduce a new helper to submit write bio for repair").
Remove it as the code was refactored and we don't need the helper
anymore.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Dr. David Alan Gilbert [Thu, 3 Oct 2024 20:33:19 +0000 (21:33 +0100)]
btrfs: remove unused btrfs_free_squota_rsv()
btrfs_free_squota_rsv() was added in commit e85a0adacf17 ("btrfs: ensure releasing squota reserve on head refs")
but has remained unused since then.
Remove it as we don't seem to need it and was probably a leftover.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Wed, 2 Oct 2024 10:11:48 +0000 (12:11 +0200)]
btrfs: tests: add selftests for raid-stripe-tree
Add first stash of very basic self tests for the RAID stripe-tree.
More test cases will follow exercising the tree.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: remove redundant stop_loop variable in scrub_stripe()
The variable stop_loop was originally introduced in commit 625f1c8dc66d7
("Btrfs: improve the loop of scrub_stripe"). It was initialized to 0 in
commit 3b080b2564287 ("Btrfs: scrub raid56 stripes in the right way").
However, in a later commit 18d30ab961497 ("btrfs: scrub: use
scrub_simple_mirror() to handle RAID56 data stripe scrub"), the code
that modified stop_loop was removed, making the variable redundant.
Currently, stop_loop is only initialized with 0 and is never used or
modified within the scrub_stripe() function. As a result, this patch
removes the stop_loop variable to clean up the code and eliminate
unnecessary redundancy.
This change has no impact on functionality, as stop_loop was never
utilized in any meaningful way in the final version of the code.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Riyan Dhiman <riyandhiman14@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 24 Sep 2024 16:37:28 +0000 (17:37 +0100)]
btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()
There's no need to hold the delayed refs spinlock when calling
btrfs_qgroup_trace_extent_nolock() from btrfs_qgroup_trace_extent(), since
it doesn't change anything in delayed refs and it only changes the xarray
used to track qgroup extent records, which is protected by the xarray's
lock.
Holding the lock is only adding unnecessary lock contention with other
tasks that actually need to take the lock to add/remove/change delayed
references. So remove the locking.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 24 Sep 2024 16:12:32 +0000 (17:12 +0100)]
btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record
Now that we track qgroup extent records in a xarray we don't need to have
a "bytenr" field in struct btrfs_qgroup_extent_record, since we can get
it from the index of the record in the xarray.
So remove the field and grab the bytenr from either the index key or any
other place where it's available (delayed refs). This reduces the size of
struct btrfs_qgroup_extent_record from 40 bytes down to 32 bytes, meaning
that we now can store 128 instances of this structure instead of 102 per
4K page.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Normally it's not a big deal, and later udev can trigger a device path
rename. But if udev didn't trigger, the device path "/proc/self/fd/3"
will show up in mtab.
[CAUSE]
For filename "/proc/self/fd/3", it means the opened file descriptor 3.
In above case, it's exactly the device we want to open, aka points to
"/dev/test/scratch1" which is another symlink pointing to "/dev/dm-2".
Inside kernel we solve the mount source using LOOKUP_FOLLOW, which
follows the symbolic link and grab the proper block device.
But inside btrfs we also save the filename into btrfs_device::name, and
utilize that member to report our mount source, which leads to the above
situation.
[FIX]
Instead of unconditionally trust the path, check if the original file
(not following the symbolic link) is inside "/dev/", if not, then
manually lookup the path to its final destination, and use that as our
device path.
This allows us to still use symbolic links, like
"/dev/mapper/test-scratch" from LVM2, which is required for fstests runs
with LVM2 setup.
And for really weird names, like the above case, we solve it to
"/dev/dm-2" instead.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641 Reported-by: Fabian Vogt <fvogt@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: avoid unnecessary device path update for the same device
[PROBLEM]
It is very common for udev to trigger device scan, and every time a
mounted btrfs device got re-scan from different soft links, we will get
some of unnecessary device path updates, this is especially common
for LVM based storage:
# lvs
scratch1 test -wi-ao---- 10.00g
scratch2 test -wi-a----- 10.00g
scratch3 test -wi-a----- 10.00g
scratch4 test -wi-a----- 10.00g
scratch5 test -wi-a----- 10.00g
test test -wi-a----- 10.00g
# mkfs.btrfs -f /dev/test/scratch1
# mount /dev/test/scratch1 /mnt/btrfs
# dmesg -c
[ 205.705234] BTRFS: device fsid 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 devid 1 transid 6 /dev/mapper/test-scratch1 (253:4) scanned by mount (1154)
[ 205.710864] BTRFS info (device dm-4): first mount of filesystem 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9
[ 205.711923] BTRFS info (device dm-4): using crc32c (crc32c-intel) checksum algorithm
[ 205.713856] BTRFS info (device dm-4): using free-space-tree
[ 205.722324] BTRFS info (device dm-4): checking UUID tree
So far so good, but even if we just touched any soft link of
"dm-4", we will get quite some unnecessary device path updates.
# touch /dev/mapper/test-scratch1
# dmesg -c
[ 469.295796] BTRFS info: devid 1 device path /dev/mapper/test-scratch1 changed to /dev/dm-4 scanned by (udev-worker) (1221)
[ 469.300494] BTRFS info: devid 1 device path /dev/dm-4 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (1221)
Such device path rename is unnecessary and can lead to random path
change due to the udev race.
[CAUSE]
Inside device_list_add(), we are using a very primitive way checking if
the device has changed, strcmp().
Which can never handle links well, no matter if it's hard or soft links.
So every different link of the same device will be treated as a different
device, causing the unnecessary device path update.
[FIX]
Introduce a helper, is_same_device(), and use path_equal() to properly
detect the same block device.
So that the different soft links won't trigger the rename race.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641 Reported-by: Fabian Vogt <fvogt@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: allow compression even if the range is not page aligned
Previously for btrfs with sector size smaller than page size (subpage),
we only allow compression if the range is fully page aligned.
This is to work around the asynchronous submission of compressed range,
which delayed the page unlock and writeback into a workqueue,
furthermore asynchronous submission can lock multiple sector range
across page boundary.
Such asynchronous submission makes it very hard to co-operate with other
regular writes.
With the recent changes to the subpage folio unlock path, now
asynchronous submission of compressed pages can co-operate with regular
submission, so enable sector perfect compression if it's an experimental
build.
The ETA for moving this feature out of experimental is 6.15, and I hope
all remaining corner cases can be exposed before that.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
For above 64K page size case, writepage_delalloc() for page 0 will find
and lock the delalloc range [32K, 96K), which is beyond the page
boundary.
Then when writepage_delalloc() is called for the page 64K, since [64K,
96K) is already locked, only [120K, 128K) will be locked.
This means, although range [64K, 96K) is dirty and will be submitted
later by extent_writepage_io(), it will not be marked as locked.
This is fine for now, as we call btrfs_folio_end_writer_lock_bitmap() to
free every non-compressed sector, and compression is only allowed for
full page range.
But this is not safe for future sector perfect compression support, as
this can lead to double folio unlock:
Thread A | Thread B
---------------------------------------+--------------------------------
| submit_one_async_extent()
| |- extent_clear_unlock_delalloc()
extent_writepage() | |- btrfs_folio_end_writer_lock()
|- btrfs_folio_end_writer_lock_bitmap()| |- btrfs_subpage_end_and_test_writer()
| | | |- atomic_sub_and_test()
| | | /* Now the atomic value is 0 */
|- if (atomic_read() == 0) | |
|- folio_unlock() | |- folio_unlock()
The root cause is the above range [64K, 96K) is dirtied and should also
be locked but it isn't.
So to make everything more consistent and prepare for the incoming
sector perfect compression, mark all dirty sectors as locked.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: move the delalloc range bitmap search into extent_io.c
Currently for subpage (sector size < page size) cases, we reuse subpage
locked bitmap to find out all delalloc ranges we have locked, and run
all those found ranges.
For above range, writepage_delalloc() for page 0 will handle the range
[32K, 96k), note delalloc range can be beyond the page boundary.
But writepage_delalloc() for page 64K will only handle range [120K,
128K), as the previous run on page 0 has already handled range [64K,
96K).
Meanwhile for the writeback we should expect range [64K, 96K) to also be
locked, this leads to the mismatch from locked bitmap and delalloc
range.
This is not causing problems yet, but it's still an inconsistent
behavior.
So instead of relying on the subpage locked bitmap, move the delalloc
range search using local @delalloc_bitmap, so that we can remove the
existing btrfs_folio_find_writer_locked().
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: do not assume the full page range is not dirty in extent_writepage_io()
The function extent_writepage_io() will submit the dirty sectors inside
the page for the write.
But recently to co-operate with the incoming subpage compression
enhancement, a new bitmap is introduced to
btrfs_bio_ctrl::submit_bitmap, to only avoid a subset of the dirty
range.
This is because we can have the following cases with 64K page size:
0 16K 32K 48K 64K
| |/////////| |/|
52K
For range [16K, 32K), we queue the dirty range for compression, which is
ran in a delayed workqueue.
Then for range [48K, 52K), we go through the regular submission path.
In that case, our btrfs_bio_ctrl::submit_bitmap will exclude the range
[16K, 32K).
The dirty flags for the range [16K, 32K) is only cleared when the
compression is done, by the extent_clear_unlock_delalloc() call inside
submit_one_async_extent().
This patch fix the false alert by removing the
btrfs_folio_assert_not_dirty() check, since it's no longer correct for
subpage compression cases.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: make extent_range_clear_dirty_for_io() to handle sector size < page size cases
For btrfs with sector size < page size (e.g. 4K sector size, 64K page
size), and enable the sector perfect compression support, then the
following dirty range can lead to problems:
0 32K 64K 96K 128K
| |///////||//////| |/|
124K
In above case, if we start writeback for that inode, the last dirty
range [124K, 128K) will not be submitted and cause reserved space
leakage:
- Start writeback for page 0
We find the range [32K, 96K) is suitable for compression, and queue it
into a workqueue to do the delayed compression and submission.
- Compression happens for range [32K, 96K)
Function extent_range_clear_dirty_for_io() is called, however it is
only doing full page handling, not considering any the extra bitmaps
for subpage cases.
That function will clear page dirty for both page 0 and page 64K.
- Writeback for the inode is done
Because page 64K has its dirty flag cleared, it will not be considered
as a writeback target.
This means the range [124K, 128K) will not be submitted, and reserved
space for it will be leaked.
Fix this problem by using the subpage helper to clear the dirty flag.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: wait for writeback if sector size is smaller than page size
[PROBLEM]
If sector perfect compression is enabled for sector size < page size
case, the following case can lead dirty ranges not being written back:
0 32K 64K 96K 128K
| |///////||//////| |/|
124K
In above example, the page size is 64K, and we need to write back above
two pages.
- Submit for page 0 (main thread)
We found delalloc range [32K, 96K), which can be compressed.
So we queue an async range for [32K, 96K).
This means, the page unlock/clearing dirty/setting writeback will
all happen in a workqueue context.
- The compression is done, and compressed range is submitted (workqueue)
Since the compression is done in asynchronously, the compression can
be done before the main thread to submit for page 64K.
Now the whole range [32K, 96K), involving two pages, will be marked
writeback.
- Submit for page 64K (main thread)
extent_write_cache_pages() got its wbc->sync_mode is WB_SYNC_NONE,
so it skips the writeback wait.
And unlock the page and exit. This means the dirty range [124K, 128K)
will never be submitted, until next writeback happens for page 64K.
This will never happen for previous kernels because:
- For sector size == page size case
Since one page is one sector, if a page is marked writeback it will
not have dirty flags.
So this corner case will never hit.
- For sector size < page size case
We never do subpage compression, a range can only be submitted for
compression if the range is fully page aligned.
This change makes the subpage behavior mostly the same as non-subpage
cases.
[ENHANCEMENT]
Instead of relying WB_SYNC_NONE check only, if it's a subpage case, then
always wait for writeback flags.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: compression: add an ASSERT() to ensure the read-in length is sane
There are already two bugs (one in zlib, one in zstd) that involved
compression path is not handling sector size < page size cases well.
So it makes more sense to make sure that btrfs_compress_folios() returns
Since we already have two bugs (one in zlib, one in zstd) in the
compression path resulting the @total_in be to larger than the
to-be-compressed range length, there is enough reason to add an ASSERT()
to make sure the total read-in length doesn't exceed the input length.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: zlib: make the compression path to handle sector size < page size
Inside zlib_compress_folios(), each time we switch the input page cache,
the @start is increased by PAGE_SIZE.
But for the incoming compression support for sector size < page size
(previously we support compression only when the range is fully page
aligned), this is not going to handle the following case:
0 32K 64K 96K
| |///////////||///////////|
@start has the initial value 32K, indicating the start filepos of the
to-be-compressed range.
And when grabbing the first page as input, we always call "start +=
PAGE_SIZE;".
But since @start is starting at 32K, it will be increased by 64K,
resulting it to be 96K for the next range, causing incorrect input range
and corruption for the future subpage compression.
Fix it by only increase @start by the input size.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
This happens because on RAID stripe-tree filesystems we recurse back into
btrfs_map_block() on scrub to perform the logical to device physical
mapping.
But as the device replace task is already holding the dev_replace::rwsem
we deadlock.
So don't take the dev_replace::rwsem in case our task is the task performing
the device replace.
Suggested-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 10 Nov 2024 22:16:28 +0000 (14:16 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of Qualcomm clk driver fixes:
- Correct flags for X Elite USB MP GDSC and pcie pipediv2 clocks
- Fix alpha PLL post_div mask for the cases where width is not
specified
- Avoid hangs in the SM8350 video driver (venus) by setting HW_CTRL
trigger feature on the video clocks"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: gcc-x1e80100: Fix USB MP SS1 PHY GDSC pwrsts flags
clk: qcom: gcc-x1e80100: Fix halt_check for pipediv2 clocks
clk: qcom: clk-alpha-pll: Fix pll post div mask when width is not set
clk: qcom: videocc-sm8350: use HW_CTRL_TRIGGER for vcodec GDSCs
Linus Torvalds [Sun, 10 Nov 2024 22:13:05 +0000 (14:13 -0800)]
Merge tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"i2c-host fixes for v6.12-rc7 (from Andi):
- Fix designware incorrect behavior when concluding a transmission
- Fix Mule multiplexer error value evaluation"
* tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set
i2c: muxes: Fix return value check in mule_i2c_mux_probe()
If the caller supplies an iocb->ki_pos value that is close to the
filesystem upper limit, and an iterator with a count that causes us to
overflow that limit, then filemap_read() enters an infinite loop.
This behaviour was discovered when testing xfstests generic/525 with the
"localio" optimisation for loopback NFS mounts.
Reported-by: Mike Snitzer <snitzer@kernel.org> Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") Tested-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 10 Nov 2024 17:37:47 +0000 (09:37 -0800)]
Merge tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
- Make sure GICv3 controller interrupt activation doesn't race with a
concurrent deactivation due to propagation delays of the register
write
* tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Force propagation of the active state with a read-back
Linus Torvalds [Sun, 10 Nov 2024 17:04:27 +0000 (09:04 -0800)]
Merge tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"20 hotfixes, 14 of which are cc:stable.
Three affect DAMON. Lorenzo's five-patch series to address the
mmap_region error handling is here also.
Apart from that, various singletons"
* tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: add entry for Thorsten Blum
ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
signal: restore the override_rlimit logic
fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
ucounts: fix counter leak in inc_rlimit_get_ucounts()
selftests: hugetlb_dio: check for initial conditions to skip in the start
mm: fix docs for the kernel parameter ``thp_anon=``
mm/damon/core: avoid overflow in damon_feed_loop_next_input()
mm/damon/core: handle zero schemes apply interval
mm/damon/core: handle zero {aggregation,ops_update} intervals
mm/mlock: set the correct prev on failure
objpool: fix to make percpu slot allocation more robust
mm/page_alloc: keep track of free highatomic
mm: resolve faulty mmap_region() error path behaviour
mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
mm: refactor map_deny_write_exec()
mm: unconditionally close VMAs on error
mm: avoid unsafe VMA hook invocation when error arises on mmap hook
mm/thp: fix deferred split unqueue naming and locking
mm/thp: fix deferred split queue not partially_mapped
Linus Torvalds [Sun, 10 Nov 2024 16:56:48 +0000 (08:56 -0800)]
Merge tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are some small remaining USB and Thunderbolt fixes and device ids
for 6.12-rc7. Included in here are:
- new USB serial driver device ids
- thunderbolt driver fixes for reported problems
- typec bugfixes
- dwc3 driver fix
- musb driver fix
All of these have been in linux-next this past week with no reported
issues"
* tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: qcserial: add support for Sierra Wireless EM86xx
thunderbolt: Fix connection issue with Pluggable UD-4VPD dock
usb: typec: fix potential out of bounds in ucsi_ccg_update_set_new_cam_cmd()
usb: dwc3: fix fault at system suspend if device was already runtime suspended
usb: typec: qcom-pmic: init value of hdr_len/txbuf_len earlier
usb: musb: sunxi: Fix accessing an released usb phy
USB: serial: io_edgeport: fix use after free in debug printk
USB: serial: option: add Quectel RG650V
USB: serial: option: add Fibocom FG132 0x0112 composition
thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING
Linus Torvalds [Sun, 10 Nov 2024 16:53:24 +0000 (08:53 -0800)]
Merge tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two small memory leak fixes for the vchiq_arm staging driver
that have been sitting in my tree for weeks and should get merged for
6.12-rc7 so that people don't keep tripping over them.
They both have been in linux-next for a while with no reported
problems"
* tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vchiq_arm: Use devm_kzalloc() for drv_mgmt allocation
staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation