Josef Bacik [Tue, 7 May 2024 18:12:05 +0000 (14:12 -0400)]
btrfs: push lookup_info into struct walk_control
Instead of using a flag we're passing around everywhere, add a field to
walk_control that we're already passing around everywhere and use that
instead.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 7 May 2024 18:12:04 +0000 (14:12 -0400)]
btrfs: use btrfs_read_extent_buffer() in do_walk_down()
Currently if our extent buffer isn't uptodate we will drop the lock,
free it, and then call read_tree_block() for the bytenr. This is
inefficient, we already have the extent buffer, we can simply call
btrfs_read_extent_buffer().
Merge these two cases down into one if statement, if we are not uptodate
we can drop the lock, trigger readahead, and do the read using
btrfs_read_extent_buffer(), and carry on.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 7 May 2024 18:12:03 +0000 (14:12 -0400)]
btrfs: remove all extra btrfs_check_eb_owner() calls
Currently we have a handful of btrfs_check_eb_owner() calls in various
places and helpers that read extent buffers. However we call this in
the endio handler for every metadata block, so these extra checks are
unnecessary, simply remove them from everywhere except the endio
handler.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 7 May 2024 18:12:02 +0000 (14:12 -0400)]
btrfs: don't do extra find_extent_buffer() in do_walk_down()
We do find_extent_buffer(), and then if we don't find the eb in cache we
call btrfs_find_create_tree_block(), which calls find_extent_buffer()
first and then allocates the extent buffer.
The reason we're doing this is because if we don't find the extent
buffer in cache we set reada = 1. However this doesn't matter, because
lower down we only trigger reada if !btrfs_buffer_uptodate(eb), which is
what the case would be if we didn't find the extent buffer in cache and
had to allocate it.
Clean this up to simply call btrfs_find_create_tree_block(), and then
use the fact that we're having to read the extent buffer off of disk to
go ahead and kick off readahead.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 11 Jun 2024 10:44:33 +0000 (11:44 +0100)]
btrfs: avoid transaction commit on any fsync after subvolume creation
As of commit 1b53e51a4a8f ("btrfs: don't commit transaction for every
subvol create") we started to make any fsync after creating a subvolume
to fallback to a transaction commit if the fsync is performed in the
same transaction that was used to create the subvolume. This happens
with the following at ioctl.c:create_subvol():
$ cat fs/btrfs/ioctl.c
(...)
/* Tree log can't currently deal with an inode which is a new root. */
btrfs_set_log_full_commit(trans);
(...)
Note that the comment is misleading as the problem is not that fsync can
not deal with the root inode of a new root, but that we can not log any
inode that belongs to a root that was not yet persisted because that would
make log replay fail since the root doesn't exist at log replay time.
The above simply makes any fsync fallback to a full transaction commit if
it happens in the same transaction used to create the subvolume - even if
it's an inode that belongs to any other subvolume. This is a brute force
solution and it doesn't necessarily improve performance for every workload
out there - it just moves a full transaction commit from one place, the
subvolume creation, to another - an fsync for any inode.
Just improve on this by making the fallback to a transaction commit only
for an fsync against an inode of the new subvolume, or for the directory
that contains the dentry that points to the new subvolume (in case anyone
attempts to fsync the directory in the same transaction).
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 6 Jun 2024 08:20:18 +0000 (09:20 +0100)]
btrfs: remove pointless code when creating and deleting a subvolume
When creating and deleting a subvolume, after starting a transaction we
are explicitly calling btrfs_record_root_in_trans() for the root which we
passed to btrfs_start_transaction(). This is pointless because at
transaction.c:start_transaction() we end up doing that call, regardless
of whether we actually start a new transaction or join an existing one,
and if we were not it would mean the root item of that root would not
be updated in the root tree when committing the transaction, leading to
problems easy to spot with fstests for example.
Remove these redundant calls. They were introduced with commit 74e97958121a ("btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume
operations").
Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Wed, 5 Jun 2024 14:45:48 +0000 (16:45 +0200)]
btrfs: pass reloc_control to setup_relocation_extent_mapping()
All parameters passed into setup_relocation_extent_mapping() can be
derived from 'struct reloc_control', so only pass in a 'struct
reloc_control'.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Wed, 5 Jun 2024 14:15:21 +0000 (16:15 +0200)]
btrfs: pass a struct reloc_control to prealloc_file_extent_cluster()
Pass a 'struct reloc_control' to prealloc_file_extent_cluster()
instead of passing its members 'data_inode' and 'cluster' on their own.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Wed, 5 Jun 2024 12:55:54 +0000 (14:55 +0200)]
btrfs: don't pass fs_info to describe_relocation()
In describe_relocation() the fs_info is only needed for printing
information via btrfs_info() and can easily be accessed via the passed
in 'struct btrfs_block_group'.
So we can safely remove the fs_info parameter.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Tue, 4 Jun 2024 15:32:37 +0000 (17:32 +0200)]
btrfs: pass a reloc_control to relocate_one_folio()
Pass a struct reloc_control to relocate_one_folio, instead of passing
it's members data_inode and cluster as separate arguments to the function.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Tue, 4 Jun 2024 11:28:25 +0000 (13:28 +0200)]
btrfs: pass a reloc_control to relocate_file_extent_cluster()
Instead of passing in a reloc_control's data_inode and
file_extent_cluster members, pass in the whole reloc_control structure.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Tue, 4 Jun 2024 11:15:34 +0000 (13:15 +0200)]
btrfs: pass reloc_control to relocate_data_extent()
Pass a 'struct reloc_control' to relocate_data_extent() instead of
it's data_inode and file_extent_cluster separately.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 3 Jun 2024 16:54:36 +0000 (17:54 +0100)]
btrfs: update panic message when splitting ordered extent
During ordered extent splitting if we find a duplicated ordered extent
when attempting to insert the new ordered extent we panic but with a
message that has the "zoned:" prefix. This is because the splitting used
to be exclusive for zoned filesystems, but as of commit b73a6fd1b1ef
("btrfs: split partial dio bios before submit") it can also be done for
non zoned filesystems during direct IO writes. So remove the "zoned:"
prefix from the message and mention the split to make it more specific
and different from the panic message at insert_ordered_extent().
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 3 Jun 2024 16:20:30 +0000 (17:20 +0100)]
btrfs: mark ordered extent insertion failure checks as unlikely
We never expect an ordered extent insertion to fail due to already having
another ordered extent in the tree for the same file offset, since we
always wait for existing ordered extents in a range to complete before
writing into the range again. So mark the failure checks for the results
of tree_insert() as unlikely, to make it clear it's never expected (save
exceptional causes like bugs or memory corruptions) and to serve as a hint
for the compiler to possibly generate better code.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 3 Jun 2024 16:02:26 +0000 (17:02 +0100)]
btrfs: avoid removal and re-insertion of split ordered extent
At btrfs_split_ordered_extent(), we are removing and re-inserting the
ordered extent that we are trimming, but we don't need to since the
trimming doesn't change its position in the red black tree because we
don't have overlapping ordered extents (that would imply double allocation
of extents) and we know the split length is smaller than the ordered
extent's num_bytes field (we checked that early in the function).
So drop the remove and re-insert code for the slit ordered extent.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 3 Jun 2024 15:50:31 +0000 (16:50 +0100)]
btrfs: add comment about locking to btrfs_split_ordered_extent()
There are subtle details about why the root's ordered_extent_lock is held,
so add a comment mentioning them.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 3 Jun 2024 12:30:35 +0000 (13:30 +0100)]
btrfs: reduce critical section at btrfs_wait_ordered_extents()
At btrfs_wait_ordered_extents(), there's no point in updating the counters
after locking the root's ordered extent lock, as the counters are local.
So change this to update the counters before taking the lock.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 3 Jun 2024 12:25:00 +0000 (13:25 +0100)]
btrfs: reduce critical section at btrfs_wait_ordered_roots()
At btrfs_wait_ordered_roots(), there's no point in decrementing the
counter after locking fs_info->ordered_root_lock as the counter is local.
So change this to decrement the counter before taking the lock.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 30 May 2024 17:14:12 +0000 (19:14 +0200)]
btrfs: constify pointer parameters where applicable
We can add const to many parameters, this is for clarity and minor
addition to safety. There are some minor effects, in the assembly
code and .ko measured on release config. This patch does not cover all
possible conversions.
Qu Wenruo [Tue, 28 May 2024 05:26:13 +0000 (14:56 +0930)]
btrfs: cleanup recursive include of the same header
We have several headers that are including themselves, triggering clangd
warnings.
Such includes are caused by commit 602035d7fecf ("btrfs: add forward
declarations and headers, part 2").
Just remove such unnecessary include.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Junchao Sun [Tue, 28 May 2024 06:23:43 +0000 (14:23 +0800)]
btrfs: qgroup: delete a TODO about using kmem cache to allocate structures
Generic slab works fine allocating btrfs_qgroup_extent_record
structures. It's not necessary to create a dedicated kmem cache that
would be created but unused if quotas were not enabled. Let's delete the
TODO line.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Junchao Sun <sunjunchao2870@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 5 Mar 2024 06:57:58 +0000 (17:27 +1030)]
btrfs: make extent_write_locked_range() handle subpage writeback correctly
When extent_write_locked_range() generated an inline extent, it would
set and finish the writeback for the whole page.
Although currently it's safe since subpage disables inline creation,
for the sake of consistency, let it go with subpage helpers to set and
clear the writeback flags.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The dmesg includes the following rsv leak detection warning (all call
trace skipped):
------------[ cut here ]------------
WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8653 btrfs_destroy_inode+0x1e0/0x200 [btrfs]
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8654 btrfs_destroy_inode+0x1a8/0x200 [btrfs]
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8660 btrfs_destroy_inode+0x1a0/0x200 [btrfs]
---[ end trace 0000000000000000 ]---
BTRFS info (device sda): last unmount of filesystem 1b4abba9-de34-4f07-9e7f-157cf12a18d6
------------[ cut here ]------------
WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs]
---[ end trace 0000000000000000 ]---
BTRFS info (device sda): space_info DATA has 268218368 free, is not full
BTRFS info (device sda): space_info total=268435456, used=204800, pinned=0, reserved=0, may_use=12288, readonly=0 zone_unusable=0
BTRFS info (device sda): global_block_rsv: size 0 reserved 0
BTRFS info (device sda): trans_block_rsv: size 0 reserved 0
BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0
BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0
BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0
------------[ cut here ]------------
WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs]
---[ end trace 0000000000000000 ]---
BTRFS info (device sda): space_info METADATA has 267796480 free, is not full
BTRFS info (device sda): space_info total=268435456, used=131072, pinned=0, reserved=0, may_use=262144, readonly=0 zone_unusable=245760
BTRFS info (device sda): global_block_rsv: size 0 reserved 0
BTRFS info (device sda): trans_block_rsv: size 0 reserved 0
BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0
BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0
BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0
Above $dev is a tcmu-runner emulated zoned HDD, which has a max zone
append size of 64K, and the system has 64K page size.
[CAUSE]
I have added several trace_printk() to show the events (header skipped):
> btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688
> btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288
> btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536
> btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864
The above lines show our buffered write has dirtied 3 pages of inode
259 of root 5:
704K 768K 832K 896K
I |////I/////////////////I///////////| I
756K 868K
|///| is the dirtied range using subpage bitmaps. and 'I' is the page
boundary.
Meanwhile all three pages (704K, 768K, 832K) have their PageDirty
flag set.
> btrfs_direct_write: r/i=5/259 start dio filepos=696320 len=102400
Then direct IO write starts, since the range [680K, 780K) covers the
beginning part of the above dirty range, we need to writeback the
two pages at 704K and 768K.
Now the above 2 lines show that we're writing back for dirty range
[756K, 756K + 64K).
We only writeback 64K because the zoned device has max zone append size
as 64K.
> extent_write_locked_range: r/i=5/259 clear dirty for page=786432
!!! The above line shows the root cause. !!!
We're calling clear_page_dirty_for_io() inside extent_write_locked_range(),
for the page 768K.
This is because extent_write_locked_range() can go beyond the current
locked page, here we hit the page at 768K and clear its page dirt.
In fact this would lead to the desync between subpage dirty and page
dirty flags. We have the page dirty flag cleared, but the subpage range
[820K, 832K) is still dirty.
After the writeback of range [756K, 820K), the dirty flags look like
this, as page 768K no longer has dirty flag set.
704K 768K 832K 896K
I I | I/////////////| I
820K 868K
This means we will no longer writeback range [820K, 832K), thus the
reserved data/metadata space would never be properly released.
Now we writeback the remaining dirty range, which is [832K, 868K).
Causing the range [820K, 832K) never to be submitted, thus leaking the
reserved space.
This bug only affects subpage and zoned case. For non-subpage and zoned
case, we have exactly one sector for each page, thus no such partial dirty
cases.
For subpage and non-zoned case, we never go into run_delalloc_cow(), and
normally all the dirty subpage ranges would be properly submitted inside
__extent_writepage_io().
[FIX]
Just do not clear the page dirty at all inside extent_write_locked_range().
As __extent_writepage_io() would do a more accurate, subpage compatible
clear for page and subpage dirty flags anyway.
Now the correct trace would look like this:
> btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688
> btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288
> btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536
> btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864
Qu Wenruo [Sun, 18 Feb 2024 06:39:32 +0000 (17:09 +1030)]
btrfs: lock subpage ranges in one go for writepage_delalloc()
If we have a subpage range like this for a 16K page with 4K sectorsize:
0 4K 8K 12K 16K
|/////| |//////| |
|/////| = dirty range
Currently writepage_delalloc() would go through the following steps:
- lock range [0, 4K)
- run delalloc range for [0, 4K)
- lock range [8K, 12K)
- run delalloc range for [8K 12K)
So far it's fine for regular subpage writeback, as
btrfs_run_delalloc_range() can only go into one of run_delalloc_nocow(),
cow_file_range() and run_delalloc_compressed().
But there is a special case for zoned subpage, where we will go
through run_delalloc_cow(), which would create the ordered extent for the
range and immediately submit the range.
This would unlock the whole page range, causing all kinds of different
ASSERT()s related to locked page.
Address the page unlocking problem of run_delalloc_cow(), by changing
the workflow to the following one:
- lock range [0, 4K)
- lock range [8K, 12K)
- run delalloc range for [0, 4K)
- run delalloc range for [8K, 12K)
So that run_delalloc_cow() can only unlock the full page until the
last lock user released.
To do that:
- Utilize subpage locked bitmap
So for every delalloc range we found, call
btrfs_folio_set_writer_lock() to populate the subpage locked bitmap,
and later btrfs_folio_end_all_writers() if the page is fully unlocked.
So we know there is a delalloc range that needs to be run later.
- Save the @delalloc_end as @last_delalloc_end inside writepage_delalloc()
Since subpage locked bitmap is only for ranges inside the page,
meanwhile we can have delalloc range ends beyond our page boundary,
we have to save the @last_delalloc_end just in case it's beyond our
page boundary.
Although there is one extra point to notice:
- We need to handle errors in previous iteration
Since we can have multiple locked delalloc ranges we have to call
run_delalloc_ranges() multiple times.
If we hit an error half way, we still need to unlock the remaining
ranges.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 19 Feb 2024 02:43:24 +0000 (13:13 +1030)]
btrfs: subpage: introduce helpers to handle subpage delalloc locking
Three new helpers are introduced for the incoming subpage delalloc locking
change.
- btrfs_folio_set_writer_lock()
This is to mark specified range with subpage specific writer lock.
After calling this, the subpage range can be proper unlocked by
btrfs_folio_end_writer_lock()
- btrfs_subpage_find_writer_locked()
This is to find the writer locked subpage range in a page.
With the help of btrfs_folio_set_writer_lock(), it can allow us to
record and find previously locked subpage range without extra memory
allocation.
- btrfs_folio_end_all_writers()
This is for the locked_page of __extent_writepage(), as there may be
multiple subpage delalloc ranges locked.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 16 Feb 2024 04:03:41 +0000 (14:33 +1030)]
btrfs: make __extent_writepage_io() to write specified range only
Function __extent_writepage_io() is designed to find all dirty ranges of
a page, and add the dirty ranges to the bio_ctrl for submission.
It requires all the dirtied ranges to be covered by an ordered extent.
It gets called in two locations, but one call site is not subpage aware:
- __extent_writepage()
It gets called when writepage_delalloc() returned 0, which means
writepage_delalloc() has handled delalloc for all subpage sectors
inside the page.
So this call site is OK.
- extent_write_locked_range()
This call site is utilized by zoned support, and in this case, we may
only run delalloc range for a subset of the page, like this: (64K page
size)
0 16K 32K 48K 64K
|/////| |///////| |
In the above case, if extent_write_locked_range() is only triggered for
range [0, 16K), __extent_writepage_io() would still try to submit
the dirty range of [32K, 48K), then it would not find any ordered
extent for it and triggers various ASSERT()s.
Fix this problem by:
- Introducing @start and @len parameters to specify the range
For the first call site, we just pass the whole page, and the behavior
is not touched, since run_delalloc_range() for the page should have
created all ordered extents for the page.
For the second call site, we avoid touching anything beyond the
range, thus avoiding the dirty range which is not yet covered by any
delalloc range.
- Making btrfs_folio_assert_not_dirty() subpage aware
The only caller is inside __extent_writepage_io(), and since that
caller now accepts a subpage range, we should also check the subpage
range other than the whole page.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Fri, 10 May 2024 07:10:50 +0000 (15:10 +0800)]
btrfs: rename ret to err in btrfs_recover_relocation()
In the function btrfs_recover_relocation(), currently the variable 'err'
carries the return value and 'ret' holds the intermediary return value.
However, in some lines, we don't need this two-step approach; we can
directly use 'err'. So, optimize them, which requires reinitializing 'err'
to zero at two locations.
This is a preparatory patch to fix the code style by renaming 'err'
to 'ret'.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Tue, 19 Mar 2024 14:55:09 +0000 (20:25 +0530)]
btrfs: rename err to ret in btrfs_cleanup_fs_roots()
Since err represents the function return value, rename it as ret,
and rename the original ret, which serves as a helper return value,
to found. Also, optimize the code to continue call btrfs_put_root()
for the rest of the root if even after btrfs_orphan_cleanup() returns
error.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 3 May 2024 03:32:00 +0000 (13:02 +0930)]
btrfs: cleanup duplicated parameters related to create_io_em()
Most parameters of create_io_em() can be replaced by the members with
the same name inside btrfs_file_extent.
Do a direct parameters cleanup here.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 3 May 2024 03:19:57 +0000 (12:49 +0930)]
btrfs: cleanup duplicated parameters related to btrfs_alloc_ordered_extent
All parameters after @filepos of btrfs_alloc_ordered_extent() can be
replaced with btrfs_file_extent structure.
This patch does the cleanup, meanwhile some points to note:
- Move btrfs_file_extent structure to ordered-data.h
The structure is needed by both btrfs_alloc_ordered_extent() and
can_nocow_extent(), but since btrfs_inode.h includes
ordered-data.h, so we need to move the structure to ordered-data.h.
- Move the special handling of NOCOW/PREALLOC into
btrfs_alloc_ordered_extent()
This is to allow btrfs_split_ordered_extent() to properly split them
for DIO.
For now just move the handling into btrfs_alloc_ordered_extent() to
simplify the callers.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 3 May 2024 00:05:21 +0000 (09:35 +0930)]
btrfs: cleanup duplicated parameters related to can_nocow_file_extent_args
The following functions and structures can be simplified using the
btrfs_file_extent structure:
- can_nocow_extent()
No need to return ram_bytes/orig_block_len through the parameter list,
the @file_extent parameter contains all the needed info.
- can_nocow_file_extent_args
The following members are no longer needed:
* disk_bytenr
This one is confusing as it's not really the
btrfs_file_extent_item::disk_bytenr, but where the IO would be,
thus it's file_extent::disk_bytenr + file_extent::offset now.
* num_bytes
Now file_extent::num_bytes.
* extent_offset
Now file_extent::offset.
* disk_num_bytes
Now file_extent::disk_num_bytes.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The member extent_map::block_start can be calculated from
extent_map::disk_bytenr + extent_map::offset for regular extents.
And otherwise just extent_map::disk_bytenr.
And this is already validated by the validate_extent_map(). Now we can
remove the member.
However there is a special case in btrfs_create_dio_extent() where we
for NOCOW/PREALLOC ordered extents cannot directly use the resulting
btrfs_file_extent, as btrfs_split_ordered_extent() cannot handle them
yet.
So for that call site, we pass file_extent->disk_bytenr +
file_extent->num_bytes as disk_bytenr for the ordered extent, and 0 for
offset.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The extent_map::block_len is either extent_map::len (non-compressed
extent) or extent_map::disk_num_bytes (compressed extent).
Since we already have sanity checks to do the cross-checks between the
new and old members, we can drop the old extent_map::block_len now.
For most call sites, they can manually select extent_map::len or
extent_map::disk_num_bytes, since most if not all of them have checked
if the extent is compressed.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Since we have extent_map::offset, the old extent_map::orig_start is just
extent_map::start - extent_map::offset for non-hole/inline extents.
And since the new extent_map::offset is already verified by
validate_extent_map() while the old orig_start is not, let's just remove
the old member from all call sites.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: introduce extra sanity checks for extent maps
Since extent_map structure has the all the needed members to represent a
file extent directly, we can apply all the file extent sanity checks to
an extent map.
The new sanity checks will cross check both the old members
(block_start/block_len/orig_start) and the new members
(disk_bytenr/disk_num_bytes/offset).
There is a special case for offset/orig_start/start cross check, we only
do such sanity check for compressed extent, as only compressed
read/encoded write really utilize orig_start.
This can be proved by the cleanup patch of orig_start.
The checks happens at the following times:
- add_extent_mapping()
This is for newly added extent map
- replace_extent_mapping()
This is for btrfs_drop_extent_map_range() and split_extent_map()
- try_merge_map()
For a lot of call sites we have to properly populate all the members to
pass the sanity check, meanwhile the following code needs extra
modification:
- setup_file_extents() from inode-tests
The file extents layout of setup_file_extents() is already too invalid
that tree-checker would reject most of them in real world.
However there is just a special unaligned regular extent which has
mismatched disk_num_bytes (4096) and ram_bytes (4096 - 1).
So instead of dropping the whole test case, here we just unify
disk_num_bytes and ram_bytes to 4096 - 1.
- test_case_7() from extent-map-tests
An extent is inserted with 16K length, but on-disk extent size is
only 4K.
This means it must be a compressed extent, so set the compressed flag
for it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Both are matching the members with the same name inside
btrfs_file_extent_items.
For now this patch only touches those members when:
- Reading btrfs_file_extent_items from disk
- Inserting new holes
- Merging two extent maps
With the new disk_bytenr and disk_num_bytes, doing merging would be a
little more complex, as we have 3 different cases:
* Both extent maps are referring to the same data extents
|<----- data extent A ----->|
|<- em 1 ->|<- em 2 ->|
* Both extent maps are referring to different data extents
|<-- data extent A -->|<-- data extent B -->|
|<- em 1 ->|<- em 2 ->|
* One of the extent maps is referring to a merged and larger data
extent that covers both extent maps
This is not really valid case other than some selftests.
So this test case would be removed.
A new helper merge_ondisk_extents() is introduced to handle the above
valid cases.
To properly assign values for those new members, a new btrfs_file_extent
parameter is introduced to all the involved call sites.
- For NOCOW writes the btrfs_file_extent would be exposed from
can_nocow_file_extent().
- For other writes, the members can be easily calculated
As most of them have 0 offset and utilizing the whole on-disk data
extent.
The exception is encoded write, but thankfully that interface provided
offset directly and all other needed info.
For now, both the old members (block_start/block_len/orig_start) are
co-existing with the new members (disk_bytenr/offset), meanwhile all the
critical code is still using the old members only.
The cleanup will happen later after all the old and new members are
properly validated.
There would be some re-ordering for the assignment of the extent_map
members, now we follow the new ordering:
- start and len
Or file_pos and num_bytes for other structures.
- disk_bytenr and disk_num_bytes
- offset and ram_bytes
- compression
So expect some seemingly unrelated line movement.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: export the expected file extent through can_nocow_extent()
Currently function can_nocow_extent() only returns members needed for
extent_map.
However since we will soon change the extent_map structure to be more
like btrfs_file_extent_item, we want to expose the expected file extent
caused by the NOCOW write for future usage.
This introduces a new structure, btrfs_file_extent, to be a more
memory access friendly representation of btrfs_file_extent_item.
And use that structure to expose the expected file extent caused by the
NOCOW write.
For now there is no user of the new structure yet.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: rename extent_map::orig_block_len to disk_num_bytes
This would make it very obvious that the member just matches
btrfs_file_extent_item::disk_num_bytes.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 22 May 2024 14:29:05 +0000 (15:29 +0100)]
btrfs: move fiemap code into its own file
Currently the core of the fiemap code lives in extent_io.c, which does
not make any sense because it's not related to extent IO at all (and it
was not as well before the big rewrite of fiemap I did some time ago).
The entry point for fiemap, btrfs_fiemap(), lives in inode.c since it's
an inode operation.
Since there's a significant amount of fiemap code, move all of it into a
dedicated file, including its entry point inode.c:btrfs_fiemap().
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 22 May 2024 08:33:32 +0000 (09:33 +0100)]
btrfs: send: get rid of the label and gotos at ensure_commit_roots_uptodate()
Now that there is a helper to commit the current transaction and we are
using it, there's no need for the label and goto statements at
ensure_commit_roots_uptodate(). So replace them with direct return
statements that call btrfs_commit_current_transaction().
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 22 May 2024 08:26:44 +0000 (09:26 +0100)]
btrfs: add and use helper to commit the current transaction
We have several places that attach to the current transaction with
btrfs_attach_transaction_barrier() and then commit the transaction if
there is one. Add a helper and use it to deduplicate this pattern.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 21 May 2024 16:08:06 +0000 (17:08 +0100)]
btrfs: scrub: avoid create/commit empty transaction at finish_extent_writes_for_zoned()
At finish_extent_writes_for_zoned() we use btrfs_join_transaction() to
catch any running transaction and then commit it. This will however create
a new and empty transaction in case there's no running transaction anymore
(got committed by the transaction kthread or other task for example) or
there's a running transaction finishing its commit and with a state >=
TRANS_STATE_UNBLOCKED. In the former case we don't need to do anything
while in the second case we just need to wait for the transaction to
complete its commit.
So improve this by using btrfs_attach_transaction_barrier() instead, which
does not create a new transaction if there's none running, and if there's
a current transaction that is committing, it will wait for it to fully
commit and not create a new transaction. This helps avoiding creating and
committing empty transactions, saving IO, time and unnecessary rotation of
the backup roots in the super block.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 21 May 2024 10:57:37 +0000 (11:57 +0100)]
btrfs: send: avoid create/commit empty transaction at ensure_commit_roots_uptodate()
At ensure_commit_roots_uptodate() we use btrfs_join_transaction() to
catch any running transaction and then commit it. This will however create
a new and empty transaction in case there's no running transaction anymore
(got committed by the transaction kthread or other task for example) or
there's a running transaction finishing its commit and with a state >=
TRANS_STATE_UNBLOCKED. In the former case we don't need to do anything
while in the second case we just need to wait for the transaction to
complete its commit.
So improve this by using btrfs_attach_transaction_barrier() instead, which
does not create a new transaction if there's none running, and if there's
a current transaction that is committing, it will wait for it to fully
commit and not create a new transaction. This helps avoiding creating and
committing empty transactions, saving IO, time and unnecessary rotation of
the backup roots in the super block.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 21 May 2024 10:20:54 +0000 (11:20 +0100)]
btrfs: send: make ensure_commit_roots_uptodate() simpler and more efficient
Before starting a send operation we have to make sure that every root has
its commit root matching the regular root, to that send doesn't find stale
inodes in the commit root (inodes that were deleted in the regular root)
and fails the inode lookups with -ESTALE.
Currently we keep looking for roots used by the send operation and as soon
as we find one we commit the current transaction (or a new one since
btrfs_join_transaction() creates one if there isn't any running or the
running one is in a state >= TRANS_STATE_UNBLOCKED). It's pointless to
keep looking until we don't find any, because after the first transaction
commit all the other roots are updated too, as they were already tagged in
the fs_info->fs_roots_radix radix tree when they were modified in order to
have a commit root different from the regular root.
Currently we are also always passing the main send root into
btrfs_join_transaction(), which despite not having any functional issue,
it is not optimal because in case the root wasn't modified we end up
adding it to fs_info->fs_roots_radix and then update its root item in the
root tree when committing the transaction, causing unnecessary work.
So simplify and make this more efficient by removing the looping and by
passing the first root we found that is modified as the argument to
btrfs_join_transaction().
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 21 May 2024 09:45:27 +0000 (10:45 +0100)]
btrfs: avoid create and commit empty transaction when committing super
At btrfs_commit_super(), called in a few contexts such as when unmounting
a filesystem, we use btrfs_join_transaction() to catch any running
transaction and then commit it. This will however create a new and empty
transaction in case there's no running transaction or there's a running
transaction with a state >= TRANS_STATE_UNBLOCKED.
As we just want to be sure that any existing transaction is fully
committed, we can use btrfs_attach_transaction_barrier() instead of
btrfs_join_transaction(), therefore avoiding the creation and commit of
empty transactions, which only waste IO and causes rotation of the
precious backup roots.
Example where we create and commit a pointless empty transaction:
The transaction with id 8 was pointless, an empty transaction that did
not achieve anything.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 20 May 2024 12:21:12 +0000 (13:21 +0100)]
btrfs: qgroup: avoid start/commit empty transaction when flushing reservations
When flushing reservations we are using btrfs_join_transaction() to get a
handle for the current transaction and then commit it to try to release
space. However btrfs_join_transaction() has some undesirable consequences:
1) If there's no running transaction, it will create one, and we will
commit it right after. This is unnecessary because it will not release
any space, and it will result in unnecessary IO and rotation of backup
roots in the superblock;
2) If there's a current transaction and that transaction is committing
(its state is >= TRANS_STATE_COMMIT_DOING), it will wait for that
transaction to almost finish its commit (for its state to be >=
TRANS_STATE_UNBLOCKED) and then start and return a new transaction.
We will then commit that new transaction, which is pointless because
all we wanted was to wait for the current (previous) transaction to
fully finish its commit (state == TRANS_STATE_COMPLETED), and by
starting and committing a new transaction we are wasting IO too and
causing unnecessary rotation of backup roots in the superblock.
So improve this by using btrfs_attach_transaction_barrier() instead, which
does not create a new transaction if there's none running, and if there's
a current transaction that is committing, it will wait for it to fully
commit and not create a new transaction.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Currently if we fully clean a subvolume (not only delete its directory,
but fully clean all it's related data and root item), the associated
qgroup would not be removed.
We have "btrfs qgroup clear-stale" to handle such 0 level qgroups.
Change the behavior to automatically removie the qgroup of a fully
cleaned subvolume when possible:
- Full qgroup but still consistent
We can and should remove the qgroup.
The qgroup numbers should be 0, without any rsv.
- Full qgroup but inconsistent
Can happen with drop_subtree_threshold feature (skip accounting
and mark qgroup inconsistent).
We can and should remove the qgroup.
Higher level qgroup numbers will be incorrect, but since qgroup
is already inconsistent, it should not be a problem.
- Squota mode
This is the special case, we can only drop the qgroup if its numbers
are all 0.
This would be handled by can_delete_qgroup(), so we only need to check
the return value and ignore the -EBUSY error.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1222847 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: slightly loosen the requirement for qgroup removal
[BUG]
Currently if one is utilizing "qgroups/drop_subtree_threshold" sysfs,
and a snapshot with level higher than that value is dropped, we will
not be able to delete the qgroup until next qgroup rescan:
The final qgroup removal would fail with the following error:
ERROR: unable to destroy quota group: Device or resource busy
[CAUSE]
The above script would generate a subvolume of level 2, then snapshot
it, enable qgroup, set the drop_subtree_threshold, then drop the
snapshot.
Since the subvolume drop would meet the threshold, qgroup would be
marked inconsistent and skip accounting to avoid hanging the system at
transaction commit.
But currently we do not allow a qgroup with any rfer/excl numbers to be
dropped, and this is not really compatible with the new
drop_subtree_threshold behavior.
[FIX]
Only require the strict zero rfer/excl/rfer_cmpr/excl_cmpr for squota
mode. This is due to the fact that squota can never go inconsistent,
and it can have dropped subvolume but with non-zero qgroup numbers for
future accounting.
For full qgroup mode, we only check if there is a subvolume for it.
Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 20 May 2024 18:49:18 +0000 (20:49 +0200)]
btrfs: constify parameters of write_eb_member() and its users
Reported by 'gcc -Wcast-qual', the argument from which write_extent_buffer()
reads data to write to the eb should be const. In addition the const
needs to be also added to __write_extent_buffer() local buffers.
All callers of write_eb_member() can now be updated to use const for the
input buffer structure or type.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 20 May 2024 18:19:53 +0000 (20:19 +0200)]
btrfs: remove unused define EXTENT_SIZE_PER_ITEM
This was added in c61a16a701a126 ("Btrfs: fix the confusion between
delalloc bytes and metadata bytes") and removed in 03fe78cc2942c5
("btrfs: use delalloc_bytes to determine flush amount for
shrink_delalloc") where the calculation was reworked to use a
non-constant numbers. This was found by 'make W=2'.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 20 May 2024 17:46:44 +0000 (19:46 +0200)]
btrfs: use for-local variables that shadow function variables
We've started to use for-loop local variables and in a few places this
shadows a function variable. Convert a few cases reported by 'make W=2'.
If applicable also change the style to post-increment, that's the
preferred one.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 20 May 2024 17:49:17 +0000 (19:49 +0200)]
btrfs: rename macro local variables that clash with other variables
Fix variable names in two macros where there's a local function variable
of the same name. In subpage_calc_start_bit() it's in several callers,
in btrfs_abort_transaction() it's only in replace_file_extents().
Found by 'make W=2'.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 20 May 2024 17:40:26 +0000 (19:40 +0200)]
btrfs: remove duplicate name variable declarations
When running 'make W=2' there are a few reports where a variable of the
same name is declared in a nested block. In all the cases we can use the
one declared in the parent block, no problematic cases were found.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Sat, 18 May 2024 17:22:03 +0000 (18:22 +0100)]
btrfs: use a btrfs_inode local variable at btrfs_sync_file()
Instead of using a VFS inode local pointer and then doing many BTRFS_I()
calls inside btrfs_sync_file(), use a btrfs_inode pointer instead. This
makes everything a bit easier to read and less confusing, allowing to
make some statements shorter.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Sat, 18 May 2024 17:14:06 +0000 (18:14 +0100)]
btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()
Instead of passing a (VFS) inode pointer argument, pass a btrfs_inode
instead, as this is generally what we do for internal APIs, making it
more consistent with most of the code base. This will later allow to
help to remove a lot of BTRFS_I() calls in btrfs_sync_file().
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Sat, 18 May 2024 17:01:47 +0000 (18:01 +0100)]
btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()
Instead of passing a (VFS) inode pointer argument, pass a btrfs_inode
instead, as this is generally what we do for internal APIs, making it
more consistent with most of the code base. This will later allow to
help to remove a lot of BTRFS_I() calls in btrfs_sync_file().
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Sat, 18 May 2024 13:09:41 +0000 (14:09 +0100)]
btrfs: use a btrfs_inode in the log context (struct btrfs_log_ctx)
Instead of using a inode pointer, use a btrfs_inode pointer in the log
context structure, as this is generally what we need and allows for some
internal APIs to take a btrfs_inode instead, making them more consistent
with most of the code base. This will later allow to help to remove a lot
of BTRFS_I() calls in btrfs_sync_file().
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 15 May 2024 12:54:14 +0000 (13:54 +0100)]
btrfs: make btrfs_finish_ordered_extent() return void
Currently btrfs_finish_ordered_extent() returns a boolean indicating if
the ordered extent was added to the work queue for completion, but none
of its callers cares about it, so make it return void.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Sun, 19 May 2024 00:14:56 +0000 (08:14 +0800)]
btrfs: move btrfs_block_group_root() to block-group.c
The function btrfs_block_group_root() is declared in disk-io.c; however,
all its callers are in block-group.c. Move it to the latter file and
declare it static.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Thu, 16 May 2024 03:10:23 +0000 (11:10 +0800)]
btrfs: drop bytenr_orig and fix comment in btrfs_scan_one_device()
Drop the single-use variable bytenr_orig and instead use btrfs_sb_offset()
in the function argument passing.
Fix a stale comment about not automatically fixing a bad primary
superblock from the backup mirror copies. Also, move the comment closer
to where the primary superblock read occurs.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 10 May 2024 16:41:04 +0000 (17:41 +0100)]
btrfs: use a regular rb_root instead of cached rb_root for extent_map_tree
We are currently using a cached rb_root (struct rb_root_cached) for the
rb root of struct extent_map_tree. This doesn't offer much of an advantage
here because:
1) It's only advantage over the regular rb_root is that it caches a
pointer to the left most node (first node), so a call to
rb_first_cached() doesn't have to chase pointers until it reaches
the left most node;
2) We only have two scenarios that access left most node with
rb_first_cached():
When dropping all extent maps from an inode, during inode eviction;
When iterating over extent maps during the extent map shrinker;
3) In both cases we keep removing extent maps, which causes deletion of
the left most node so rb_erase_cached() has to call rb_next() to find
out what's the next left most node and assign it to
struct rb_root_cached::rb_leftmost;
4) We can do that ourselves in those two uses cases and stop using a
rb_root_cached rb tree and use instead a regular rb_root rb tree.
This reduces the size of struct extent_map_tree by 8 bytes and, since
this structure is embedded in struct btrfs_inode, it also reduces the
size of that structure by 8 bytes.
So on a 64 bits platform the size of btrfs_inode is reduced from 1032
bytes down to 1024 bytes.
This means we will be able to have 4 inodes per 4K page instead of 3.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Sun, 5 May 2024 12:47:02 +0000 (13:47 +0100)]
btrfs: remove objectid from struct btrfs_inode on 64 bits platforms
On 64 bits platforms we don't really need to have a dedicated member (the
objectid field) for the inode's number since we store in the VFS inode's
i_ino member, which is an unsigned long and this type is 64 bits wide on
64 bits platforms. We only need that field in case we are on a 32 bits
platform because the unsigned long type is 32 bits wide on such platforms
See commit 33345d01522f ("Btrfs: Always use 64bit inode number") regarding
this 64/32 bits detail.
The objectid field of struct btrfs_inode is also used to store the ID of
a root for directories that are stubs for unreferenced roots. In such
cases the inode is a directory and has the BTRFS_INODE_ROOT_STUB runtime
flag set.
So in order to reduce the size of btrfs_inode structure on 64 bits
platforms we can remove the objectid member and use the VFS inode's i_ino
member instead whenever we need to get the inode number. In case the inode
is a root stub (BTRFS_INODE_ROOT_STUB set) we can use the member
last_reflink_trans to store the ID of the unreferenced root, since such
inode is a directory and reflinks can't be done against directories.
So remove the objectid fields for 64 bits platforms and alias the
last_reflink_trans field with a name of ref_root_id in a union.
On a release kernel config, this reduces the size of struct btrfs_inode
from 1040 bytes down to 1032 bytes.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 3 May 2024 17:10:06 +0000 (18:10 +0100)]
btrfs: remove location key from struct btrfs_inode
Currently struct btrfs_inode has a key member, named "location", that is
either:
1) The key of the inode's item. In this case the objectid is the number
of the inode;
2) A key stored in a dir entry with a type of BTRFS_ROOT_ITEM_KEY, for
the case where we have a root that is a snapshot of a subvolume that
points to other subvolumes. In this case the objectid is the ID of
a subvolume inside the snapshotted parent subvolume.
The key is only used to lookup the inode item for the first case, while
for the second it's never used since it corresponds to directory stubs
created with new_simple_dir() and which are marked as dummy, so there's
no actual inode item to ever update. In the second case we only check
the key type at btrfs_ino() for 32 bits platforms and its objectid is
only needed for unlink.
Instead of using a key we can do fine with just the objectid, since we
can generate the key whenever we need it having only the objectid, as
in all use cases the type is always BTRFS_INODE_ITEM_KEY and the offset
is always 0.
So use only an objectid instead of a full key. This reduces the size of
struct btrfs_inode from 1048 bytes down to 1040 bytes on a release kernel.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 30 Apr 2024 15:52:27 +0000 (16:52 +0100)]
btrfs: don't allocate file extent tree for non regular files
When not using the NO_HOLES feature we always allocate an io tree for an
inode's file_extent_tree. This is wasteful because that io tree is only
used for regular files, so we allocate more memory than needed for inodes
that represent directories or symlinks for example, or for inodes that
correspond to free space inodes.
So improve on this by allocating the io tree only for inodes of regular
files that are not free space inodes.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 30 Apr 2024 09:55:05 +0000 (10:55 +0100)]
btrfs: unify index_cnt and csum_bytes from struct btrfs_inode
The index_cnt field of struct btrfs_inode is used only for two purposes:
1) To store the index for the next entry added to a directory;
2) For the data relocation inode to track the logical start address of the
block group currently being relocated.
For the relocation case we use index_cnt because it's not used for
anything else in the relocation use case - we could have used other fields
that are not used by relocation such as defrag_bytes, last_unlink_trans
or last_reflink_trans for example (among others).
Since the csum_bytes field is not used for directories, do the following
changes:
1) Put index_cnt and csum_bytes in a union, and index_cnt is only
initialized when the inode is a directory. The csum_bytes is only
accessed in IO paths for regular files, so we're fine here;
2) Use the defrag_bytes field for relocation, since the data relocation
inode is never used for defrag purposes. And to make the naming better,
alias it to reloc_block_group_start by using a union.
This reduces the size of struct btrfs_inode by 8 bytes in a release
kernel, from 1056 bytes down to 1048 bytes.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 6 May 2024 12:27:29 +0000 (13:27 +0100)]
btrfs: remove inode_lock from struct btrfs_root and use xarray locks
Currently we use the spinlock inode_lock from struct btrfs_root to
serialize access to two different data structures:
1) The delayed inodes xarray (struct btrfs_root::delayed_nodes);
2) The inodes xarray (struct btrfs_root::inodes).
Instead of using our own lock, we can use the spinlock that is part of the
xarray implementation, by using the xa_lock() and xa_unlock() APIs and
using the xarray APIs with the double underscore prefix that don't take
the xarray locks and assume the caller is using xa_lock() and xa_unlock().
So remove the spinlock inode_lock from struct btrfs_root and use the
corresponding xarray locks. This brings 2 benefits:
1) We reduce the size of struct btrfs_root, from 1336 bytes down to
1328 bytes on a 64 bits release kernel config;
2) We reduce lock contention by not using anymore the same lock for
changing two different and unrelated xarrays.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 29 Apr 2024 13:01:09 +0000 (14:01 +0100)]
btrfs: reduce nesting and deduplicate error handling at btrfs_iget_path()
Make btrfs_iget_path() simpler and easier to read by avoiding nesting of
if-then-else statements and having an error label to do all the error
handling instead of repeating it a couple times.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 29 Apr 2024 12:08:12 +0000 (13:08 +0100)]
btrfs: preallocate inodes xarray entry to avoid transaction abort
When creating a new inode, at btrfs_create_new_inode(), one of the very
last steps is to add the inode to the root's inodes xarray. This often
requires allocating memory which may fail (even though xarrays have a
dedicated kmem_cache which make it less likely to fail), and at that point
we are forced to abort the current transaction (as some, but not all, of
the inode metadata was added to its subvolume btree).
To avoid a transaction abort, preallocate memory for the xarray early at
btrfs_create_new_inode(), so that if we fail we don't need to abort the
transaction and the insertion into the xarray is guaranteed to succeed.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 24 Apr 2024 15:58:01 +0000 (16:58 +0100)]
btrfs: use an xarray to track open inodes in a root
Currently we use a red black tree (rb-tree) to track the currently open
inodes of a root (in struct btrfs_root::inode_tree). This however is not
very efficient when the number of inodes is large since rb-trees are
binary trees. For example for 100K open inodes, the tree has a depth of
17. Besides that, inserting into the tree requires navigating through it
and pulling useless cache lines in the process since the red black tree
nodes are embedded within the btrfs inode - on the other hand, by being
embedded, it requires no extra memory allocations.
We can improve this by using an xarray instead, which is efficient when
indices are densely clustered (such as inode numbers), is more cache
friendly and behaves like a resizable array, with a much better search
and insertion complexity than a red black tree. This only has one small
disadvantage which is that insertion will sometimes require allocating
memory for the xarray - which may fail (not that often since it uses a
kmem_cache) - but on the other hand we can reduce the btrfs inode
structure size by 24 bytes (from 1080 down to 1056 bytes) after removing
the embedded red black tree node, which after the next patches will allow
to reduce the size of the structure to 1024 bytes, meaning we will be able
to store 4 inodes per 4K page instead of 3 inodes.
This change does a straightforward change to use an xarray, and results
in a transaction abort if we can't allocate memory for the xarray when
creating an inode - but the next patch changes things so that we don't
need to abort.
Running the following fs_mark test showed some improvements:
The test was run with a release kernel config (Debian's default config).
Also capturing the insertion times into the rb tree and into the xarray,
that is measuring the duration of the old function inode_tree_add() and
the duration of the new btrfs_add_inode_to_root() function, gave the
following results (in nanoseconds):
Before this patch, inode_tree_add() execution times:
Qu Wenruo [Tue, 14 May 2024 09:02:13 +0000 (18:32 +0930)]
btrfs: raid56: do extra dumping for CONFIG_BTRFS_ASSERT
There are several hard-to-hit ASSERT()s hit inside raid56.
Unfortunately the ASSERT() expression is a little complex, and except
the ASSERT(), there is nothing to provide any clue.
Considering if race is involved, it's pretty hard to reproduce.
Meanwhile sometimes the dump of the rbio structure can provide some
pretty good clues, it's worth to do the extra multi-line dump for
btrfs raid56 related code.
Filipe Manana [Tue, 14 May 2024 12:19:12 +0000 (13:19 +0100)]
btrfs: fix function name in comment for btrfs_remove_ordered_extent()
Due to a refactoring introduced by commit 53d9981ca20e ("btrfs: split
btrfs_alloc_ordered_extent to allocation and insertion helpers"), the
function btrfs_alloc_ordered_extent() was renamed to
alloc_ordered_extent(), so the comment at btrfs_remove_ordered_extent()
is no longer very accurate. Update the comment to refer to the new
name "alloc_ordered_extent()".
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 13 May 2024 17:05:47 +0000 (18:05 +0100)]
btrfs: remove no longer used btrfs_migrate_to_delayed_refs_rsv()
The function btrfs_migrate_to_delayed_refs_rsv() is no longer used.
Its last use was removed in commit 2f6397e448e6 ("btrfs: don't refill
whole delayed refs block reserve when starting transaction").
So remove the function.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 7 May 2024 22:50:16 +0000 (23:50 +0100)]
btrfs: zoned: make btrfs_get_dev_zone() static
It's not used outside zoned.c, so make it static.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Wed, 8 May 2024 11:14:47 +0000 (13:14 +0200)]
btrfs: pass struct btrfs_io_geometry into handle_ops_on_dev_replace()
Passing in a 'struct btrfs_io_geometry into handle_ops_on_dev_replace
can reduce the number of arguments by two.
No functional changes otherwise.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 2 May 2024 21:11:33 +0000 (23:11 +0200)]
btrfs: qgroup: do quick checks if quotas are enabled before starting ioctls
The ioctls that add relations, create qgroups or set limits start/join
transaction. When quotas are not enabled this is not necessary, there
will be errors reported back anyway but this could be also misleading
and we should really report that quotas are not enabled. For that use
-ENOTCONN.
The helper is meant to do a quick check before any other standard ioctl
checks are done. If quota is disabled meanwhile we still rely on proper
locking inside any active operation changing the qgroup structures.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A set of clk fixes for the Qualcomm, Mediatek, and Allwinner drivers:
- Fix the Qualcomm Stromer Plus PLL set_rate() clk_op to explicitly
set the alpha enable bit and not set bits that don't exist
- Mark Qualcomm IPQ9574 crypto clks as voted to avoid stuck clk
warnings
- Fix the parent of some PLLs on Qualcomm sm6530 so their rate is
correct
- Fix the min/max rate clamping logic in the Allwinner driver that
got broken in v6.9
- Limit runtime PM enabling in the Mediatek driver to only
mt8183-mfgcfg so that system wide resume doesn't break on other
Mediatek SoCs"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: mediatek: mt8183: Only enable runtime PM on mt8183-mfgcfg
clk: sunxi-ng: common: Don't call hw_to_ccu_common on hw without common
clk: qcom: gcc-ipq9574: Add BRANCH_HALT_VOTED flag
clk: qcom: apss-ipq-pll: remove 'config_ctl_hi_val' from Stromer pll configs
clk: qcom: clk-alpha-pll: set ALPHA_EN bit for Stromer Plus PLLs
clk: qcom: gcc-sm6350: Fix gpll6* & gpll7 parents
Merge tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix unnecessary copy to 0 when kernel is booted at address 0
- Fix usercopy crash when dumping dtl via debugfs
- Avoid possible crash when PCI hotplug races with error handling
- Fix kexec crash caused by scv being disabled before other CPUs
call-in
- Fix powerpc selftests build with USERCFLAGS set
Thanks to Anjali K, Ganesh Goudar, Gautam Menghani, Jinglin Wen,
Nicholas Piggin, Sourabh Jain, Srikar Dronamraju, and Vishal Chourasia.
* tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix build with USERCFLAGS set
powerpc/pseries: Fix scv instruction crash with kexec
powerpc/eeh: avoid possible crash when edev->pdev changes
powerpc/pseries: Whitelist dtl slub object for copying to userspace
powerpc/64s: Fix unnecessary copy to 0 when kernel is booted at address 0
Merge tag 'i2c-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"An i2c driver fix"
* tag 'i2c-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr
Michael Ellerman [Sat, 6 Jul 2024 12:08:33 +0000 (22:08 +1000)]
selftests/powerpc: Fix build with USERCFLAGS set
Currently building the powerpc selftests with USERCFLAGS set to anything
causes the build to break:
$ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error
...
gcc -Wno-error cache_shape.c ...
cache_shape.c:18:10: fatal error: utils.h: No such file or directory
18 | #include "utils.h"
| ^~~~~~~~~
compilation terminated.
This happens because the USERCFLAGS are added to CFLAGS in lib.mk, which
causes the check of CFLAGS in powerpc/flags.mk to skip setting CFLAGS at
all, resulting in none of the usual CFLAGS being passed. That can
be seen in the output above, the only flag passed to the compiler is
-Wno-error.
Fix it by dropping the conditional setting of CFLAGS in flags.mk.
Instead always set CFLAGS, but also append USERCFLAGS if they are set.
Note that appending to CFLAGS (with +=) wouldn't work, because flags.mk
is included by multiple Makefiles (to support partial builds), causing
CFLAGS to be appended to multiple times. Additionally that would place
the USERCFLAGS prior to the standard CFLAGS, meaning the USERCFLAGS
couldn't override the standard flags. Being able to override the
standard flags is desirable, for example for adding -Wno-error.
With the fix in place, the CFLAGS are set correctly, including the
USERCFLAGS:
Merge tag 'integrity-v6.10-fix' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fix from Mimi Zohar:
"A single bug fix to properly remove all of the securityfs IMA
measurement lists"
* tag 'integrity-v6.10-fix' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: fix wrong zero-assignment during securityfs dentry remove
Merge tag 'pci-v6.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci update from Bjorn Helgaas:
- Update MAINTAINERS and CREDITS to credit Gustavo Pimentel with the
Synopsys DesignWare eDMA driver and reflect that he is no longer at
Synopsys and isn't in a position to maintain the DesignWare xData
traffic generator (Bjorn Helgaas)
* tag 'pci-v6.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
CREDITS: Add Synopsys DesignWare eDMA driver for Gustavo Pimentel
MAINTAINERS: Orphan Synopsys DesignWare xData traffic generator
Merge tag 'riscv-for-linus-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for the CMODX example in the recently added icache flushing
prctl()
- A fix to the perf driver to avoid corrupting event data on counter
overflows when external overflow handlers are in use
- A fix to clear all hardware performance monitor events on boot, to
avoid dangling events firmware or previously booted kernels from
triggering spuriously
- A fix to the perf event probing logic to avoid erroneously reporting
the presence of unimplemented counters. This also prevents some
implemented counters from being reported
- A build fix for the vector sigreturn selftest on clang
- A fix to ftrace, which now requires the previously optional index
argument to ftrace_graph_ret_addr()
- A fix to avoid deadlocking if kexec crash handling triggers in an
interrupt context
* tag 'riscv-for-linus-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: kexec: Avoid deadlock in kexec crash path
riscv: stacktrace: fix usage of ftrace_graph_ret_addr()
riscv: selftests: Fix vsetivli args for clang
perf: RISC-V: Check standard event availability
drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus
drivers/perf: riscv: Do not update the event data if uptodate
documentation: Fix riscv cmodx example
- xe: migration error handling + typoed register name in gt setup
- i915: usb-c fix to shut up warnings on MTL+
- panthor: fix sync-only jobs + ioctl validation fix to not EINVAL
wrongly
- panel quirks
- nouveau: NULL deref in get_modes
drm core:
- fbdev big endian fix for the dma memory backed variant
drivers/firmware:
- fix sysfb refcounting"
* tag 'drm-fixes-2024-07-05' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/mcr: Avoid clobbering DSS steering
drm/xe: fix error handling in xe_migrate_update_pgtables
drm/ttm: Always take the bo delayed cleanup path for imported bos
drm/fbdev-generic: Fix framebuffer on big endian devices
drm/panthor: Fix sync-only jobs
drm/panthor: Don't check the array stride on empty uobj arrays
drm/amdgpu/atomfirmware: silence UBSAN warning
drm/radeon: check bo_va->bo is non-NULL before using it
drm/amd/display: Fix array-index-out-of-bounds in dml2/FCLKChangeSupport
drm/amd/display: Update efficiency bandwidth for dcn351
drm/amd/display: Fix refresh rate range for some panel
drm/amd/display: Account for cursor prefetch BW in DML1 mode support
drm/amd/display: Add refresh rate range check
drm/amd/display: Reset freesync config before update new state
drm: panel-orientation-quirks: Add labels for both Valve Steam Deck revisions
drm: panel-orientation-quirks: Add quirk for Valve Galileo
drm/i915/display: For MTL+ platforms skip mg dp programming
drm/nouveau: fix null pointer dereference in nouveau_connector_get_modes
firmware: sysfb: Fix reference count of sysfb parent device
Merge tag 'gpio-fixes-for-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"Two OF lookup quirks and one fix for an issue in the generic gpio-mmio
driver:
- add two OF lookup quirks for TSC2005 and MIPS Lantiq
- don't try to figure out bgpio_bits from the 'ngpios' property in
gpio-mmio"
* tag 'gpio-fixes-for-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: of: add polarity quirk for TSC2005
gpio: mmio: do not calculate bgpio_bits via "ngpios"
gpiolib: of: fix lookup quirk for MIPS Lantiq
Merge tag 'tpmdd-next-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull TPM fixes from Jarkko Sakkinen:
"This contains the fixes for !chip->auth condition, preventing the
breakage of:
- tpm_ftpm_tee.c
- tpm_i2c_nuvoton.c
- tpm_ibmvtpm.c
- tpm_tis_i2c_cr50.c
- tpm_vtpm_proxy.c
All drivers will continue to work as they did in 6.9, except a single
warning (dev_warn() not WARN()) is printed to klog only to inform that
authenticated sessions are not enabled"
* tag 'tpmdd-next-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Address !chip->auth in tpm_buf_append_hmac_session*()
tpm: Address !chip->auth in tpm_buf_append_name()
tpm: Address !chip->auth in tpm2_*_auth_session()
Wolfram Sang [Fri, 5 Jul 2024 14:08:55 +0000 (16:08 +0200)]
Merge tag 'i2c-host-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
This tag includes a nice fix in the PNX driver that has been
pending for a long time. Piotr has replaced a potential lock in
the interrupt context with a more efficient and straightforward
handling of the timeout signaling.