Baokun Li [Tue, 2 Jul 2024 13:23:49 +0000 (21:23 +0800)]
ext4: make sure the first directory block is not a hole
The syzbot constructs a directory that has no dirblock but is non-inline,
i.e. the first directory block is a hole. And no errors are reported when
creating files in this directory in the following flow.
ext4_mknod
 ...
  ext4_add_entry
    // Read block 0
    ext4_read_dirblock(dir, block, DIRENT)
      bh = ext4_bread(NULL, inode, block, 0)
      if (!bh && (type == INDEX || type == DIRENT_HTREE))
      // The first directory block is a hole
      // But type == DIRENT, so no error is reported.
After that, we get a directory block without '.' and '..' but with a valid
dentry. This may cause some code that relies on dot or dotdot (such as
make_indexed_dir()) to crash.
Therefore, when ext4_read_dirblock() finds that the first directory block
is a hole, report that the filesystem is corrupted and return an error to
avoid loading corrupted data from disk and triggering further problems.
Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
Fixes: 4e19d6b65fb4 ("ext4: allow directory holes")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240702132349.2600605-3-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
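The check described in the "first directory block is not a hole" commit can be modelled in userspace roughly as follows. This is only a sketch of the decision, not the kernel code: hole_is_corruption() is a hypothetical name, the enum values are illustrative, and the assumption is that the fix simply treats block 0 like the htree block types when no buffer is returned.

```c
#include <stdbool.h>
#include <stddef.h>

/* Block types as named in the call flow above; numeric values are illustrative. */
enum dirblock_type { EITHER, INDEX, DIRENT, DIRENT_HTREE };

/*
 * Userspace model (hypothetical helper, not kernel code): returns true when
 * a missing buffer, i.e. a hole, must be reported as filesystem corruption.
 */
static bool hole_is_corruption(const void *bh, enum dirblock_type type,
                               unsigned int block)
{
    if (bh != NULL)
        return false;   /* the block exists, nothing to report */

    /* Rule already present in the flow above: htree blocks are never holes. */
    if (type == INDEX || type == DIRENT_HTREE)
        return true;

    /* Assumed new rule: the first directory block is never a hole. */
    return block == 0;
}
```

With this model a hole at block 0 of a DIRENT lookup is rejected, while a hole at a later block of a plain directory is still tolerated.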
The immediate cause of this problem is that there is only one valid dentry
for the block to be split during do_split, so split==0 results in out of
bounds accesses to the map triggering the issue.
The maximum length of a filename is 255 and the minimum block size is 1024,
so it is always guaranteed that the number of entries is greater than or
equal to 2 when do_split() is called.
But syzbot's crafted image has no dot and dotdot in dir, and the dentry
distribution in dirblock is as follows:
So when renaming dentry1 increases its name_len length by 1, neither hole
nor free is sufficient to hold the new dentry, and make_indexed_dir() is
called.
In make_indexed_dir() it is assumed that the first two entries of the
dirblock must be dot and dotdot, so bus and dentry1 are left in dx_root
because they are treated as dot and dotdot, and only dentry2 is moved
to the new leaf block. That's why count is equal to 1.
Therefore add the ext4_check_dx_root() helper function to add more sanity
checks to dot and dotdot before starting the conversion to avoid the above
issue.
Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240702132349.2600605-2-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Wojciech Gładysz [Wed, 3 Jul 2024 07:01:12 +0000 (09:01 +0200)]
ext4: sanity check for NULL pointer after ext4_force_shutdown
Test case: 2 threads write short inline data to a file.
In ext4_page_mkwrite the resulting inline data is converted.
Handling ext4_grp_locked_error with description "block bitmap
and bg descriptor inconsistent: X vs Y free clusters" calls
ext4_force_shutdown. The conversion clears
EXT4_STATE_MAY_INLINE_DATA but fails for
ext4_destroy_inline_data_nolock and ext4_mark_iloc_dirty due
to ext4_forced_shutdown. The restoration of inline data fails
for the same reason, so EXT4_STATE_MAY_INLINE_DATA is not set again.
Without the flag set, the regular path in ext4_da_write_end
follows and tries to dereference the folio's private pointer, which has
not been set. The fix returns early with -EIO should that
pointer be NULL.
Jan Kara [Mon, 1 Jul 2024 13:28:00 +0000 (15:28 +0200)]
jbd2: increase maximum transaction size
Originally, we were quite conservative in limiting maximum transaction
size to a quarter of the journal because we were not accounting
transaction descriptor and revoke blocks. These days we do properly
account them and reserve space for them from the total transaction
credits. Thus there's no need to be so conservative and we can increase
the maximum transaction size to one third of the journal (even half
should work fine in principle but the performance will likely suffer in
that case). This also fixes failures to grow filesystems with tiny
journals.
Link: CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240701132800.7158-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Mon, 24 Jun 2024 17:01:20 +0000 (19:01 +0200)]
jbd2: drop pointless shrinker batch initialization
In jbd2_journal_init_common() we set batch size of a shrinker shrinking
checkpointed buffers to journal->j_max_transaction_buffers. But that is
guaranteed to be 0 at that point so we effectively stay with the default
shrinker batch size of 128. It has been like this since introduction of
jbd2 shrinkers so just drop the pointless initialization.
Jan Kara [Mon, 24 Jun 2024 17:01:19 +0000 (19:01 +0200)]
jbd2: avoid infinite transaction commit loop
Commit 9f356e5a4f12 ("jbd2: Account descriptor blocks into
t_outstanding_credits") started to account descriptor blocks into
a transaction's outstanding credits. However it didn't appropriately
decrease the maximum amount of credits available to userspace. Thus if
the filesystem requests a transaction smaller than
j_max_transaction_buffers but large enough that when descriptor blocks
are added the size exceeds j_max_transaction_buffers, we confuse
add_transaction_credits() into thinking previous handles have grown the
transaction too much and enter an infinite journal commit loop in
start_this_handle() -> add_transaction_credits() trying to create a
transaction with enough credits available.
Fix the problem by properly accounting for transaction space reserved
for descriptor blocks when verifying requested transaction handle size.
Jan Kara [Mon, 24 Jun 2024 17:01:18 +0000 (19:01 +0200)]
jbd2: precompute number of transaction descriptor blocks
Instead of computing the number of descriptor blocks a transaction can
have each time we need it (which is currently when starting each
transaction but will become more frequent later) precompute the number
once during journal initialization together with maximum transaction
size. We perform the precomputation whenever journal feature set is
updated similarly as for computation of
journal->j_revoke_records_per_block.
Jan Kara [Mon, 24 Jun 2024 17:01:17 +0000 (19:01 +0200)]
jbd2: make jbd2_journal_get_max_txn_bufs() internal
There's no reason to keep jbd2_journal_get_max_txn_bufs() as a public
function. Currently all users are internal and can use
journal->j_max_transaction_buffers instead. This saves some unnecessary
recomputations of the limit as a bonus which becomes important as this
function gets more complex in the following patch.
Ye Bin [Thu, 20 Jun 2024 07:24:05 +0000 (15:24 +0800)]
jbd2: avoid mount failed when commit block is partial submitted
We encountered a problem where the file system could not be mounted after
a power-off. Analysis of the file system image shows that
only part of the data was written to the last commit block.
The valid data of the commit block is concentrated in the first sector.
However, the data of the entire block is involved in the checksum calculation.
For different hardware, the minimum atomic unit may be different.
If the checksum of a commit block is incorrect, clear the data except the
'commit_header' and then recalculate the checksum. If the checksum is then
correct, the block is considered to be partially committed, and journal
replay continues.
Jan Kara [Thu, 13 Jun 2024 15:02:34 +0000 (17:02 +0200)]
ext4: avoid writing unitialized memory to disk in EA inodes
If the extended attribute size is not a multiple of block size, the last
block in the EA inode will have uninitialized tail which will get
written to disk. We will never expose the data to userspace but still
this is not a good practice so just zero out the tail of the block as it
isn't going to cause a noticeable performance overhead.
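A minimal userspace sketch of the idea in the EA inode commit above: copy the last piece of the attribute value into a block-sized buffer and zero the remainder so that no stale memory follows the data. fill_last_ea_block() and its parameters are hypothetical names chosen for illustration; the kernel change itself operates on buffer heads.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: place the final `len` bytes of an attribute value at
 * the start of a block buffer and zero the tail. `len` must not exceed
 * `blocksize`.
 */
static void fill_last_ea_block(unsigned char *block, size_t blocksize,
                               const unsigned char *value, size_t len)
{
    memcpy(block, value, len);
    /* Zero the tail so uninitialized bytes never reach the disk. */
    memset(block + len, 0, blocksize - len);
}
```

The memset() covers at most one block per attribute write, which matches the commit's remark that the overhead is not noticeable.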
Luis Henriques (SUSE) [Tue, 18 Jun 2024 14:43:12 +0000 (15:43 +0100)]
ext4: don't track ranges in fast_commit if inode has inlined data
When fast-commit needs to track ranges, it has to handle inodes that have
inlined data in a different way because ext4_fc_write_inode_data(), in the
actual commit path, will attempt to map the required blocks for the range.
However, inodes that have inlined data will have their data stored in
inode->i_block and, eventually, in the extended attribute space.
Unfortunately, because fast commit doesn't currently support extended
attributes, the solution is to mark this commit as ineligible.
Luis Henriques (SUSE) [Wed, 29 May 2024 09:20:30 +0000 (10:20 +0100)]
ext4: fix possible tid_t sequence overflows
In the fast commit code there are a few places where tid_t variables are
being compared without taking into account the fact that these sequence
numbers may wrap. Fix this issue by using the helper functions tid_gt()
and tid_geq().
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://patch.msgid.link/20240529092030.9557-3-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
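To illustrate why the tid_t commit above prefers tid_gt() and tid_geq() over plain comparisons, here is a userspace sketch of wraparound-safe sequence comparison. The helpers are written from memory in the style of the jbd2 ones and should be read as an approximation; they rely on the unsigned difference converting to a negative int when y is ahead of x, which holds on the compilers the kernel supports.

```c
typedef unsigned int tid_t;   /* 32-bit sequence number that may wrap */

/* True if x is strictly newer than y, even across a wrap. */
static inline int tid_gt(tid_t x, tid_t y)
{
    int difference = (int)(x - y);
    return difference > 0;
}

/* True if x is the same as or newer than y, even across a wrap. */
static inline int tid_geq(tid_t x, tid_t y)
{
    int difference = (int)(x - y);
    return difference >= 0;
}
```

For example, tid 1 issued just after the counter wrapped is newer than tid 0xfffffffe, which tid_gt() reports correctly while a plain `>` does not.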
Luis Henriques (SUSE) [Mon, 27 May 2024 16:14:47 +0000 (17:14 +0100)]
ext4: use ext4_update_inode_fsync_trans() helper in inode creation
Call helper function ext4_update_inode_fsync_trans() instead of open
coding it in __ext4_new_inode(). This helper checks both that the handle
is valid *and* that it hasn't been aborted due to some fatal error in the
journalling layer, using is_handle_aborted().
Kees Cook [Thu, 23 May 2024 22:54:12 +0000 (15:54 -0700)]
ext4: use memtostr_pad() for s_volume_name
As with the other strings in struct ext4_super_block, s_volume_name is
not NUL terminated. The other strings were marked in commit 072ebb3bffe6
("ext4: add nonstring annotations to ext4.h"). Using strscpy() isn't
the right replacement for strncpy(); it should use memtostr_pad()
instead.
Zhang Yi [Mon, 20 May 2024 13:18:31 +0000 (21:18 +0800)]
jbd2: speed up jbd2_transaction_committed()
jbd2_transaction_committed() is used to check whether a transaction with
the given tid has already committed. It holds j_state_lock in read mode
and checks the tid of the current running transaction and the committing
transaction, but holding the j_state_lock is expensive.
We have already stored the sequence number of the most recently
committed transaction in journal->j_commit_sequence, so we could do this
check by comparing it with the given tid instead. If the given tid isn't
larger than j_commit_sequence, we can ensure that the given transaction
has been committed. That way we could drop the expensive lock and
achieve about 10% ~ 20% performance gains in concurrent DIOs on my
virtual machine with 100G ramdisk.
Zhang Yi [Fri, 17 May 2024 12:40:05 +0000 (20:40 +0800)]
ext4: make ext4_da_map_blocks() buffer_head unaware
After calling ext4_da_map_blocks(), a delalloc extent state can
be identified through the EXT4_MAP_DELAYED flag in map. So factor out
the buffer_head related handling from ext4_da_map_blocks(), making this
function buffer_head unaware so that it becomes a common helper, and also
update the stale function comments, in preparation for the iomap da write
path in the future.
Zhang Yi [Fri, 17 May 2024 12:40:04 +0000 (20:40 +0800)]
ext4: make ext4_insert_delayed_block() insert multi-blocks
Rename ext4_insert_delayed_block() to ext4_insert_delayed_blocks(),
pass length parameter to make it insert multiple delalloc blocks at a
time. For non-bigalloc case, just reserve len blocks and insert delalloc
extent. For bigalloc case, we can ensure that the clusters in the middle
of an extent must be unallocated, we only need to check whether the start
and end clusters are delayed/allocated. We should subtract the space for
the start and/or end block(s) if they are allocated.
Zhang Yi [Fri, 17 May 2024 12:40:03 +0000 (20:40 +0800)]
ext4: factor out a helper to check the cluster allocation state
Factor out a common helper ext4_clu_alloc_state(), which checks whether the
cluster containing a delalloc block to be added has been allocated or
has a delalloc reservation. No logic changes.
Zhang Yi [Fri, 17 May 2024 12:40:02 +0000 (20:40 +0800)]
ext4: make ext4_da_reserve_space() reserve multi-clusters
Add an 'nr_resv' parameter to ext4_da_reserve_space(), which indicates the
number of clusters to reserve, so that it can reserve multiple clusters
at a time.
Zhang Yi [Fri, 17 May 2024 12:40:01 +0000 (20:40 +0800)]
ext4: make ext4_es_insert_delayed_block() insert multi-blocks
Rename ext4_es_insert_delayed_block() to ext4_es_insert_delayed_extent()
and pass length parameter to make it insert multiple delalloc blocks at
a time. For the case of bigalloc, split the allocated parameter into
lclu_allocated and end_allocated. lclu_allocated indicates the
allocation state of the cluster containing the lblk, and
end_allocated indicates the allocation state of the extent end. Clusters
in the middle of a delayed allocated extent must be unallocated.
Zhang Yi [Fri, 17 May 2024 12:39:59 +0000 (20:39 +0800)]
ext4: trim delalloc extent
In ext4_da_map_blocks(), we could find four kinds of extents in the
extent status tree: hole, unwritten, written and delayed extent. Now we
only trim the map len if we found an unwritten extent or a written
extent. This is okay now since map->m_len is always set to one and we
always insert one delayed block at a time. But this will no longer be
okay for the other two cases if ext4_insert_delayed_block() and
ext4_da_map_blocks() support inserting multiple map->m_len blocks later.
1. If we found a hole in the extent status tree whose es->es_len is
shorter than the length we want to write, we should trim the
map->m_len to prevent adding more delayed blocks than we
expected. For example, assume we write data [A, C) to a file that
contains a hole extent [A, B) and a written extent [B, D) in the
cache.
                    A      B  C  D
before da write: ...hhhhhh|wwwwww....
Then we will get extent [A, B). We should trim map->m_len to B-A
before inserting new delalloc blocks; if not, the range [B, C) will
be duplicated.
2. If we found a delayed extent in the extent status tree whose
es->es_len is shorter than the length we want to write, we should
trim the map->m_len to es->es_len and return directly. Since the front
part of this map has already been delayed, we can't insert the delalloc
extent that contains the latter part in this round, so we should return
the delayed length and the caller should advance the position and
call ext4_da_map_blocks() again. For example, assume we write data
[A, C) to a file that contains a delayed extent [A, B) in the cache.
                    A      B  C
before da write: ...dddddd|hhh....
Then we will get delayed extent [A, B). We should also trim map->m_len
to B-A and return; if not, we will incorrectly assume that the write
is complete and won't insert [B, C).
So we need to always trim the map->m_len if the found es->es_len in the
extent status tree is shorter than the map->m_len, preparing for
inserting an extent with multiple delalloc blocks. This patch is only a
pre-fix; the handling is crude and ext4_da_map_blocks() deserves a cleanup,
which we will do later.
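The trimming rule from the "trim delalloc extent" commit above can be sketched as a small pure function. trim_map_len() is a hypothetical userspace helper, not the kernel code; it assumes the cached extent found in the extent status tree contains the first block of the request.

```c
typedef unsigned int ext4_lblk_t;   /* logical block number, modelled as 32-bit */

/*
 * Hypothetical helper: given a request [m_lblk, m_lblk + m_len) and a cached
 * extent [es_lblk, es_lblk + es_len) that contains m_lblk, return the length
 * the request should be trimmed to so it never runs past the cached extent.
 */
static unsigned int trim_map_len(ext4_lblk_t m_lblk, unsigned int m_len,
                                 ext4_lblk_t es_lblk, unsigned int es_len)
{
    /* Blocks of the cached extent that lie at or after m_lblk. */
    unsigned int remaining = es_len - (m_lblk - es_lblk);

    return remaining < m_len ? remaining : m_len;
}
```

Taking A = 100, B = 106 and C = 109 as made-up block numbers, a write of [A, C) against a cached extent [A, B) is trimmed from 9 blocks to B-A = 6, and the caller is expected to come back for [B, C).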
Zhang Yi [Fri, 17 May 2024 12:39:58 +0000 (20:39 +0800)]
ext4: warn if delalloc counters are not zero on inactive
The per-inode i_reserved_data_blocks counts the reserved delalloc blocks
in a regular file; it should be zero when destroying the file. The
per-fs s_dirtyclusters_counter counts all reserved delalloc blocks in a
filesystem; it should also be zero when unmounting the filesystem. Now we
have only an error message if the i_reserved_data_blocks is not zero,
which is easy to miss, so add WARN_ON_ONCE to make it
more visible.
Zhang Yi [Fri, 17 May 2024 12:39:57 +0000 (20:39 +0800)]
ext4: check the extent status again before inserting delalloc block
ext4_da_map_blocks looks up any extent entry in the extent status
tree (w/o i_data_sem) and then looks up any ondisk extent
mapping (with i_data_sem in read mode).
If it finds a hole in the extent status tree or if it couldn't find any
entry at all, it then takes the i_data_sem in write mode to add a da
entry into the extent status tree. This can actually race with page
mkwrite & fallocate path.
Note that this is ok between
1. ext4 buffered-write path v/s ext4_page_mkwrite(), because of the
folio lock
2. ext4 buffered write path v/s ext4 fallocate because of the inode
lock.
But this can race between ext4_page_mkwrite() & ext4 fallocate path
ext4_page_mkwrite()             ext4_fallocate()
 block_page_mkwrite()
  ext4_da_map_blocks()
   //find hole in extent status tree
                                 ext4_alloc_file_blocks()
                                  ext4_map_blocks()
                                   //allocate block and unwritten extent
   ext4_insert_delayed_block()
    ext4_da_reserve_space()
     //reserve one more block
    ext4_es_insert_delayed_block()
     //drop unwritten extent and add delayed extent by mistake
Then, the delalloc extent is wrong until writeback and the extra
reserved block can't be released any more and it triggers below warning:
EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!
Fix the problem by looking up extent status tree again while the
i_data_sem is held in write mode. If it still can't find any entry, then
we insert a new da entry into the extent status tree.
Zhang Yi [Fri, 17 May 2024 12:39:56 +0000 (20:39 +0800)]
ext4: factor out a common helper to query extent map
Factor out a new common helper ext4_map_query_blocks() from the
ext4_da_map_blocks(), which queries and returns the extent map status on
the inode's extent path. No logic changes.
Luis Henriques (SUSE) [Wed, 15 May 2024 08:28:57 +0000 (09:28 +0100)]
ext4: fix infinite loop when replaying fast_commit
When doing fast_commit replay an infinite loop may occur due to an
uninitialized extent_status struct. ext4_ext_determine_insert_hole() does
not detect the replay and calls ext4_es_find_extent_range(), which will
return immediately without initializing the 'es' variable.
Because 'es' contains garbage, an integer overflow may happen causing an
infinite loop in this function, easily reproducible using fstest generic/039.
This commit fixes this issue by unconditionally initializing the structure
in function ext4_es_find_extent_range().
Thanks to Zhang Yi, for figuring out the real problem!
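The shape of the fast_commit replay fix above, initializing the output structure before any early return, can be sketched in userspace as follows. The struct and function are simplified stand-ins with hypothetical names; the lookup itself is replaced by a placeholder.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's extent_status structure. */
struct extent_status_model {
    unsigned int es_lblk;
    unsigned int es_len;
    unsigned long long es_pblk;
};

/* Hypothetical model of a lookup that is skipped during replay. */
static void find_extent_range_model(bool replaying, unsigned int lblk,
                                    struct extent_status_model *es)
{
    /* Initialize first, so every return path leaves `es` well defined. */
    es->es_lblk = 0;
    es->es_len = 0;
    es->es_pblk = 0;

    if (replaying)
        return;   /* the extent status tree is not consulted during replay */

    /* Placeholder for the real lookup: pretend one block was found. */
    es->es_lblk = lblk;
    es->es_len = 1;
}
```

A caller that passes in a struct full of stack garbage now sees a zero-length extent during replay instead of arbitrary values.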
Kemeng Shi [Tue, 14 May 2024 11:24:37 +0000 (19:24 +0800)]
jbd2: remove dead check of JBD2_UNMOUNT in kjournald2
We always set JBD2_UNMOUNT with j_state_lock held in journal_kill_thread.
In kjournald2, we check JBD2_UNMOUNT flag two times under the same
j_state_lock. Then the second check is unnecessary.
Kemeng Shi [Tue, 14 May 2024 11:24:36 +0000 (19:24 +0800)]
jbd2: remove dead equality check of j_commit_[sequence/request] in kjournald2
The j_commit_[sequence/request] are updated with j_state_lock held during
runtime. In kjournald2, two equality checks of j_commit_[sequence/request]
are under the same j_state_lock, then the second check is unnecessary.
Linus Torvalds [Sun, 23 Jun 2024 15:06:01 +0000 (11:06 -0400)]
Merge tag 'i2c-for-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"The core gains placeholders for recently added functions when
CONFIG_I2C is not defined as well documentation fixes to start using
inclusive terminology.
The drivers get paths in DT bindings fixed as well as proper interrupt
handling for the ocores driver"
* tag 'i2c-for-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
docs: i2c: summary: be clearer with 'controller/target' and 'adapter/client' pairs
docs: i2c: summary: document 'local' and 'remote' targets
docs: i2c: summary: document use of inclusive language
docs: i2c: summary: update speed mode description
docs: i2c: summary: update I2C specification link
docs: i2c: summary: start sentences consistently.
i2c: Add nop fwnode operations
i2c: ocores: set IACK bit after core is enabled
dt-bindings: i2c: google,cros-ec-i2c-tunnel: correct path to i2c-controller schema
dt-bindings: i2c: atmel,at91sam: correct path to i2c-controller schema
Linus Torvalds [Sun, 23 Jun 2024 15:01:57 +0000 (11:01 -0400)]
Merge tag '6.10-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Five smb3 client fixes
- three netfs/folios cifs fixes
- fix typo in module parameters description
- fix incorrect swap warning"
* tag '6.10-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Move the 'pid' from the subreq to the req
cifs: Only pick a channel once per read request
cifs: Defer read completion
cifs: fix typo in module parameter enable_gcm_256
cifs: drop the incorrect assertion in cifs_swap_rw()
Linus Torvalds [Sun, 23 Jun 2024 14:32:24 +0000 (10:32 -0400)]
Merge tag 'fixes-2024-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"Fix fragility in checks for unset node ID.
Use numa_valid_node() function to verify that nid is a valid node
ID instead of inconsistent comparisons with either NUMA_NO_NODE or
MAX_NUMNODES"
* tag 'fixes-2024-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: use numa_valid_node() helper to check for invalid node ID
Linus Torvalds [Sun, 23 Jun 2024 14:13:23 +0000 (07:13 -0700)]
Merge tag 'powerpc-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Prevent use-after-free in 64-bit KVM VFIO
- Add generated Power8 crypto asm to .gitignore
Thanks to Al Viro and Nathan Lynch.
* tag 'powerpc-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group()
powerpc/crypto: Add generated P8 asm to .gitignore
Wolfram Sang [Sun, 23 Jun 2024 00:13:27 +0000 (02:13 +0200)]
Merge tag 'i2c-host-fixes-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
This pull request fixes the paths of the dt-schema to their
complete locations for the ChromeOS EC tunnel driver and the
Atmel at91sam drivers.
Additionally, the OpenCores driver receives a fix for an issue
that dates back to version 2.6.18. Specifically, the interrupts
need to be acknowledged (clearing all pending interrupts) after
enabling the core.
Linus Torvalds [Sat, 22 Jun 2024 21:02:16 +0000 (14:02 -0700)]
Merge tag 'regulator-fix-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A few driver specific fixes for incorrect device descriptions, plus a
fix for a missing symbol export which causes build failures for some
newly added drivers in other trees"
* tag 'regulator-fix-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: axp20x: AXP717: fix LDO supply rails and off-by-ones
regulator: bd71815: fix ramp values
regulator: core: Fix modpost error "regulator_get_regmap" undefined
regulator: tps6594-regulator: Fix the number of irqs for TPS65224 and TPS6594
Linus Torvalds [Sat, 22 Jun 2024 20:58:47 +0000 (13:58 -0700)]
Merge tag 'spi-fix-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A number of fixes that have built up for SPI, a bunch of driver
specific ones including an unfortunate revert of an optimisation for
the i.MX driver which was causing issues with some configurations,
plus a couple of core fixes for the rarely used octal mode and for a
bad interaction between multi-CS support and target mode"
* tag 'spi-fix-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-imx: imx51: revert burst length calculation back to bits_per_word
spi: Fix SPI slave probe failure
spi: Fix OCTAL mode support
spi: stm32: qspi: Clamp stm32_qspi_get_mode() output to CCR_BUSWIDTH_4
spi: stm32: qspi: Fix dual flash mode sanity test in stm32_qspi_setup()
spi: cs42l43: Drop cs35l56 SPI speed down to 11MHz
spi: cs42l43: Correct SPI root clock speed
Linus Torvalds [Sat, 22 Jun 2024 20:55:56 +0000 (13:55 -0700)]
Merge tag 'nfsd-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix crashes triggered by administrative operations on the server
* tag 'nfsd-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: grab nfsd_mutex in nfsd_nl_rpc_status_get_dumpit()
nfsd: fix oops when reading pool_stats before server is started
Linus Torvalds [Sat, 22 Jun 2024 16:02:39 +0000 (09:02 -0700)]
Merge tag 'bcachefs-2024-06-22' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Lots of (mostly boring) fixes for syzbot bugs and rare(r) CI bugs.
The LRU_TIME_BITS fix was slightly more involved; we only have 48 bits
for the LRU position (we would prefer 64), so wraparound is possible
for the cached data LRUs on a filesystem that has done sufficient
(petabytes) reads; this is now handled.
One notable user reported bugfix, where we were forgetting to
correctly set the bucket data type, which should have been
BCH_DATA_need_gc_gens instead of BCH_DATA_free; this was causing us to
go emergency read-only on a filesystem that had seen heavy enough use
to see bucket gen wraparound.
We're now starting to fix simple (safe) errors without requiring user
intervention - i.e. a small incremental step towards full self
healing.
This is currently limited to just certain allocation information
counters, and the error is still logged in the superblock; see that
patch for more information. ("bcachefs: Fix safe errors by default")"
* tag 'bcachefs-2024-06-22' of https://evilpiepirate.org/git/bcachefs: (22 commits)
bcachefs: Move the ei_flags setting to after initialization
bcachefs: Fix a UAF after write_super()
bcachefs: Use bch2_print_string_as_lines for long err
bcachefs: Fix I_NEW warning in race path in bch2_inode_insert()
bcachefs: Replace bare EEXIST with private error codes
bcachefs: Fix missing alloc_data_type_set()
closures: Change BUG_ON() to WARN_ON()
bcachefs: fix alignment of VMA for memory mapped files on THP
bcachefs: Fix safe errors by default
bcachefs: Fix bch2_trans_put()
bcachefs: set_worker_desc() for delete_dead_snapshots
bcachefs: Fix bch2_sb_downgrade_update()
bcachefs: Handle cached data LRU wraparound
bcachefs: Guard against overflowing LRU_TIME_BITS
bcachefs: delete_dead_snapshots() doesn't need to go RW
bcachefs: Fix early init error path in journal code
bcachefs: Check for invalid btree IDs
bcachefs: Fix btree ID bitmasks
bcachefs: Fix shift overflow in read_one_super()
bcachefs: Fix a locking bug in the do_discard_fast() path
...
Linus Torvalds [Sat, 22 Jun 2024 15:16:17 +0000 (08:16 -0700)]
Merge tag 'ata-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- We currently enable DIPM (device initiated power management) in the
device (using a SET FEATURES call to the device), regardless if the
HBA supports any LPM states or not. It seems counter intuitive, and
potentially dangerous to enable a device side feature, when the HBA
does not have the corresponding support. Thus, make sure that we do
not enable DIPM if the HBA does not support any LPM states.
* tag 'ata-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: ahci: Do not enable LPM if no LPM states are supported by the HBA
Linus Torvalds [Sat, 22 Jun 2024 15:03:47 +0000 (08:03 -0700)]
Merge tag 'pwm/for-6.10-rc5-fixes-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fixes from Uwe Kleine-König:
"Three fixes for the pwm-stm32 driver.
The first patch prevents an integer wrap-around for small periods. In
the second patch the calculation of the prescaler is fixed which
resulted in values for the ARR register that don't fit into the
corresponding register bit field. The last commit improves an error
message that was wrongly copied from another error path"
* tag 'pwm/for-6.10-rc5-fixes-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: stm32: Fix error message to not describe the previous error path
pwm: stm32: Fix calculation of prescaler
pwm: stm32: Refuse too small period requests
Linus Torvalds [Sat, 22 Jun 2024 14:58:21 +0000 (07:58 -0700)]
Merge tag 'arm-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"There are seven oneline patches that each address a distinct problem
on the NXP i.MX platform, mostly the popular i.MX8M variant.
The only other two fixes are for error handling on the psci firmware
driver and SD card support on the milkv duo riscv board"
* tag 'arm-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
firmware: psci: Fix return value from psci_system_suspend()
riscv: dts: sophgo: disable write-protection for milkv duo
arm64: dts: imx8qm-mek: fix gpio number for reg_usdhc2_vmmc
arm64: dts: freescale: imx8mm-verdin: enable hysteresis on slow input pin
arm64: dts: imx93-11x11-evk: Remove the 'no-sdio' property
arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix BT shutdown GPIO
arm: dts: imx53-qsb-hdmi: Disable panel instead of deleting node
arm64: dts: imx8mp: Fix TC9595 input clock on DH i.MX8M Plus DHCOM SoM
arm64: dts: freescale: imx8mm-verdin: Fix GPU speed
Linus Torvalds [Sat, 22 Jun 2024 14:41:57 +0000 (07:41 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix dangling references to a redistributor region if the vgic was
prematurely destroyed.
- Properly mark FFA buffers as released, ensuring that both parties
can make forward progress.
x86:
- Allow getting/setting MSRs for SEV-ES guests, if they're using the
pre-6.9 KVM_SEV_ES_INIT API.
- Always sync pending posted interrupts to the IRR prior to IOAPIC
route updates, so that EOIs are intercepted properly if the old
routing table requested that.
Generic:
- Avoid __fls(0)
- Fix reference leak on hwpoisoned page
- Fix a race in kvm_vcpu_on_spin() by ensuring loads and stores are
atomic.
- Fix bug in __kvm_handle_hva_range() where KVM calls a function
pointer that was intended to be a marker only (nothing bad happens
but kind of a mine and also technically undefined behavior)
- Do not bother accounting allocations that are small and freed
before getting back to userspace.
Selftests:
- Fix compilation for RISC-V.
- Fix a "shift too big" goof in the KVM_SEV_INIT2 selftest.
- Compute the max mappable gfn for KVM selftests on x86 using
GuestMaxPhyAddr from KVM's supported CPUID (if it's available)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV-ES: Fix svm_get_msr()/svm_set_msr() for KVM_SEV_ES_INIT guests
KVM: Discard zero mask with function kvm_dirty_ring_reset
virt: guest_memfd: fix reference leak on hwpoisoned page
kvm: do not account temporary allocations to kmem
MAINTAINERS: Drop Wanpeng Li as a Reviewer for KVM Paravirt support
KVM: x86: Always sync PIR to IRR prior to scanning I/O APIC routes
KVM: Stop processing *all* memslots when "null" mmu_notifier handler is found
KVM: arm64: FFA: Release hyp rx buffer
KVM: selftests: Fix RISC-V compilation
KVM: arm64: Disassociate vcpus from redistributor region on teardown
KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin()
KVM: selftests: x86: Prioritize getting max_gfn from GuestPhysBits
KVM: selftests: Fix shift of 32 bit unsigned int more than 32 bits
Uwe Kleine-König [Fri, 21 Jun 2024 14:37:14 +0000 (16:37 +0200)]
pwm: stm32: Fix error message to not describe the previous error path
"Failed to lock the clock" is an appropriate error message for
clk_rate_exclusive_get() failing, but not for the clock running too
fast for the driver's calculations.
Uwe Kleine-König [Fri, 21 Jun 2024 14:37:13 +0000 (16:37 +0200)]
pwm: stm32: Fix calculation of prescaler
A small prescaler is beneficial, as this improves the resolution of the
duty_cycle configuration. However if the prescaler is too small, the
maximal possible period becomes considerably smaller than the requested
value.
One situation where this goes wrong is the following: With a parent
clock rate of 208877930 Hz and max_arr = 0xffff = 65535, a request for
period = 941243 ns currently results in PSC = 1. The value for ARR is
then calculated to
	ARR = 941243 * 208877930 / (1000000000 * 2) - 1 = 98301
This value is bigger than 65535 however and so doesn't fit into the
respective register field. In this particular case the PWM was
configured for a period of 313733.4806027616 ns (with ARR = 98301 &
0xffff). Even if ARR was configured to its maximal value, only period =
627495.6861167669 ns would be achievable.
Fix the calculation accordingly and adapt the comment to match the new
algorithm.
With the calculation fixed the above case results in PSC = 2 and so an
actual period of 941229.1667195285 ns.
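The figures quoted above can be reproduced with a few lines of integer arithmetic. The following is only a sketch: the helper names are made up for illustration, it assumes the counter period is (ARR + 1) * (PSC + 1) parent clock ticks (which is consistent with the numbers in this message), and the "smallest prescaler that fits" search is an assumption about the intent of the fix, not a copy of the driver's algorithm.

```python
NSEC_PER_SEC = 1_000_000_000
RATE_HZ = 208877930      # parent clock rate from the example above
MAX_ARR = 0xFFFF         # width of the ARR register field
PERIOD_NS = 941243       # requested period


def arr_for(period_ns, rate_hz, psc):
    """ARR needed for the requested period at a given prescaler (integer math)."""
    return period_ns * rate_hz // (NSEC_PER_SEC * (psc + 1)) - 1


def period_ns_for(arr, rate_hz, psc):
    """Period actually produced by a given ARR/PSC pair, in nanoseconds."""
    return (arr + 1) * (psc + 1) * NSEC_PER_SEC / rate_hz


def smallest_fitting_psc(period_ns, rate_hz, max_arr):
    """Hypothetical helper: smallest prescaler whose ARR fits the register."""
    psc = 0
    while arr_for(period_ns, rate_hz, psc) > max_arr:
        psc += 1
    return psc


# PSC = 1 gives an ARR that overflows the 16-bit field ...
overflowing_arr = arr_for(PERIOD_NS, RATE_HZ, 1)
# ... so the hardware only sees the low 16 bits.
truncated_arr = overflowing_arr & MAX_ARR
# Picking a prescaler that keeps ARR within range avoids the truncation.
fixed_psc = smallest_fitting_psc(PERIOD_NS, RATE_HZ, MAX_ARR)
fixed_arr = arr_for(PERIOD_NS, RATE_HZ, fixed_psc)
```

With these inputs the truncated ARR yields roughly a third of the requested period, while the larger prescaler lands within about 14 ns of the request at the cost of a coarser duty_cycle resolution.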
Linus Torvalds [Fri, 21 Jun 2024 21:28:28 +0000 (14:28 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two fixes: one in the ufs driver fixing an obvious memory leak and the
other (with a core flag based update) trying to prevent USB crashes by
stopping the core from issuing a request for the I/O Hints mode page"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: usb: uas: Do not query the IO Advice Hints Grouping mode page for USB/UAS devices
scsi: core: Introduce the BLIST_SKIP_IO_HINTS flag
scsi: ufs: core: Free memory allocated for model before reinit
Linus Torvalds [Fri, 21 Jun 2024 21:11:50 +0000 (14:11 -0700)]
Merge tag 'drm-fixes-2024-06-22' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Still pretty quiet, two weeks worth of amdgpu fixes, with one i915 and
one xe. I didn't get the drm-misc-fixes tree PR this week, but there
was only one fix queued and I think it can wait another week, so seems
pretty normal.
xe:
- Fix for invalid register access
i915:
- Fix conditions for joiner usage, it's not possible with eDP MSO
* tag 'drm-fixes-2024-06-22' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/vf: Don't touch GuC irq registers if using memory irqs
drm/amdgpu: init TA fw for psp v14
drm/amdgpu: cleanup MES11 command submission
drm/amdgpu: fix UBSAN warning in kv_dpm.c
drm/radeon: fix UBSAN warning in kv_dpm.c
drm/amd/display: Disable CONFIG_DRM_AMD_DC_FP for RISC-V with clang
drm/amd/display: Attempt to avoid empty TUs when endpoint is DPIA
drm/amd/display: change dram_clock_latency to 34us for dcn35
drm/amd/display: Change dram_clock_latency to 34us for dcn351
drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2
drm/amdgpu: Indicate CU havest info to CP
drm/amd/display: prevent register access while in IPS
drm/amdgpu: fix locking scope when flushing tlb
drm/amd/display: Remove redundant idle optimization check
drm/i915/mso: using joiner is not possible with eDP MSO
Linus Torvalds [Fri, 21 Jun 2024 21:06:14 +0000 (14:06 -0700)]
Merge tag 'ovl-fixes-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Fix two bugs, one originating in this cycle and one from 6.6"
* tag 'ovl-fixes-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix encoding fid for lower only root
ovl: fix copy-up in tmpfile
Linus Torvalds [Fri, 21 Jun 2024 20:55:38 +0000 (13:55 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Small bug fixes:
- Prevent a crash in bnxt if the en and rdma drivers disagree on the
MSI vectors
- Have rxe memcpy inline data from the correct address
- Fix rxe's validation of UD packets
- Several mlx5 mr cache issues: bad lock balancing on error, missing
propagation of the ATS property to the HW, wrong bucketing of freed
mrs in some cases
- Incorrect goto error unwind in mlx5 driver probe
- Missed userspace input validation in mlx5 SRQ create
- Incorrect uABI in MANA rejecting valid optional MR creation flags"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mana_ib: Ignore optional access flags for MRs
RDMA/mlx5: Add check for srq max_sge attribute
RDMA/mlx5: Fix unwind flow as part of mlx5_ib_stage_init_init
RDMA/mlx5: Ensure created mkeys always have a populated rb_key
RDMA/mlx5: Follow rb_key.ats when creating new mkeys
RDMA/mlx5: Remove extra unlock on error path
RDMA/rxe: Fix responder length checking for UD request packets
RDMA/rxe: Fix data copy for IB_SEND_INLINE
RDMA/bnxt_re: Fix the max msix vectors macro
Linus Torvalds [Fri, 21 Jun 2024 18:26:43 +0000 (11:26 -0700)]
Merge tag 'sound-6.10-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull more sound fixes from Takashi Iwai:
"A follow-up fix for a random build issue, as well as another trivial
HD-audio quirk"
* tag 'sound-6.10-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: Use imply for suggesting CONFIG_SERIAL_MULTI_INSTANTIATE
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14AHP9
Linus Torvalds [Fri, 21 Jun 2024 18:20:37 +0000 (11:20 -0700)]
Merge tag 'acpi-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These address a possible NULL pointer dereference in the ACPICA code
and quirk camera enumeration on multiple platforms where incorrect
data are present in the platform firmware.
Specifics:
- Undo an ACPICA code change that attempted to keep operation regions
within a page boundary, but allowed accesses to unmapped memory to
occur (Raju Rangoju)
- Ignore MIPI camera graph port nodes created with the help of the
information from the ACPI tables on all Dell Tiger, Alder and
Raptor Lake models as that information is reported to be invalid on
the platforms in question (Hans de Goede)
- Use new Intel CPU model matching macros in the MIPI DisCo for
Imaging part of ACPI device enumeration (Hans de Goede)"
* tag 'acpi-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: mipi-disco-img: Switch to new Intel CPU model defines
ACPI: scan: Ignore camera graph port nodes on all Dell Tiger, Alder and Raptor Lake models
ACPICA: Revert "ACPICA: avoid Info: mapping multiple BARs. Your kernel is fine."
Linus Torvalds [Fri, 21 Jun 2024 18:16:56 +0000 (11:16 -0700)]
Merge tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These fix the Mediatek lvts_thermal driver, the Intel int340x driver,
and the thermal core (two issues related to system suspend).
Specifics:
- Remove the filtered mode for mt8188 from lvts_thermal as it is not
supported on this platform and fail the lvts_thermal initialization
when the golden temperature is zero as that means the efuse data is
not correctly set (Julien Panis)
- Update the processor_thermal part of the Intel int340x driver to
support shared interrupts as the processor thermal device interrupt
may in fact be shared with PCI devices (Srinivas Pandruvada)
- Synchronize the suspend-prepare and post-suspend actions of the
thermal PM notifier to avoid a destructive race condition and
change the priority of that notifier to the minimum to avoid
interference between the work items spawned by it and the other
PM notifiers during system resume (Rafael Wysocki)"
* tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: processor_thermal: Support shared interrupts
thermal: core: Change PM notifier priority to the minimum
thermal: core: Synchronize suspend-prepare and post-suspend actions
thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data
thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
Kent Overstreet [Thu, 20 Jun 2024 17:20:49 +0000 (13:20 -0400)]
bcachefs: Fix I_NEW warning in race path in bch2_inode_insert()
discard_new_inode() is the correct interface for tearing down an inode
that was fully created but not made visible to other threads, but it
expects I_NEW to be set, which we don't use.
Reported-by: https://github.com/koverstreet/bcachefs/issues/690 Fixes: bcachefs: Fix race path in bch2_inode_insert() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 20 Jun 2024 14:04:35 +0000 (10:04 -0400)]
bcachefs: Fix missing alloc_data_type_set()
Incorrect bucket state transition in the discard path; when incrementing
a bucket's generation number that had already been discarded, we were
forgetting to check if it should be need_gc_gens, not free.
This was caught by the .invalid checks in the transaction commit path,
causing us to go emergency read only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Patrisious Haddad [Tue, 28 May 2024 12:52:56 +0000 (15:52 +0300)]
RDMA/mlx5: Add check for srq max_sge attribute
The max_sge attribute is passed by the user and is inserted and used
unchecked, so verify that the value doesn't exceed the maximum allowed
value before using it.
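As a rough illustration of the kind of check described, the sketch below rejects a user-supplied value that exceeds a device limit before anything is derived from it. The function name, the parameter names and the choice of -EINVAL are assumptions made for this example; the message does not say which error the driver returns.

```python
import errno


def validate_srq_max_sge(requested_max_sge, device_max_sge):
    """Return 0 if the user-supplied max_sge is usable, else a negative errno.

    Both names are hypothetical; device_max_sge stands for whatever upper
    bound the hardware reports.
    """
    if requested_max_sge > device_max_sge:
        # Refuse the request up front instead of using the value unchecked.
        return -errno.EINVAL
    return 0
```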
Jason Gunthorpe [Tue, 28 May 2024 12:52:53 +0000 (15:52 +0300)]
RDMA/mlx5: Follow rb_key.ats when creating new mkeys
When a cache ent already exists but doesn't have any mkeys in it, the
cache will automatically create a new one based on the specification in
the ent->rb_key.
ent->ats was missed when creating the new key and so ma_translation_mode
was not being set even though the ent requires it.
Jason Gunthorpe [Tue, 28 May 2024 12:52:52 +0000 (15:52 +0300)]
RDMA/mlx5: Remove extra unlock on error path
The below commit lifted the locking out of this function but left this
error path unlock behind resulting in unbalanced locking. Remove the
missed unlock too.
Michael Roth [Tue, 4 Jun 2024 23:35:10 +0000 (18:35 -0500)]
KVM: SEV-ES: Fix svm_get_msr()/svm_set_msr() for KVM_SEV_ES_INIT guests
With commit 27bd5fdc24c0 ("KVM: SEV-ES: Prevent MSR access post VMSA
encryption"), older VMMs like QEMU 9.0 and older will fail when booting
SEV-ES guests with something like the following error:
qemu-system-x86_64: error: failed to get MSR 0x174
qemu-system-x86_64: ../qemu.git/target/i386/kvm/kvm.c:3950: kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
This is because older VMMs might still call
svm_get_msr()/svm_set_msr() for SEV-ES guests after guest boot, even
though those interfaces were essentially just noops because the vCPU
state is encrypted and stored separately in the VMSA. Now those VMMs
will get an -EINVAL and generally crash.
Newer VMMs that are aware of KVM_SEV_INIT2, however, are already aware of
the stricter limitations on what vCPU state can be sync'd during
guest run-time, so newer QEMU for instance will work with both the legacy
KVM_SEV_ES_INIT interface and KVM_SEV_INIT2.
So when using KVM_SEV_INIT2 it's okay to assume userspace can deal with
-EINVAL, whereas for legacy KVM_SEV_ES_INIT the kernel might be dealing
with an older VMM, so it needs to assume that returning -EINVAL might
break the VMM.
Address this by only returning -EINVAL if the guest was started with
KVM_SEV_INIT2. Otherwise, just silently return.
Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nikunj A Dadhania <nikunj@amd.com> Reported-by: Srikanth Aithal <sraithal@amd.com> Closes: https://lore.kernel.org/lkml/37usuu4yu4ok7be2hqexhmcyopluuiqj3k266z4gajc2rcj4yo@eujb23qc3zcm/ Fixes: 27bd5fdc24c0 ("KVM: SEV-ES: Prevent MSR access post VMSA encryption") Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240604233510.764949-1-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
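The resulting policy can be summarized as a small decision function. This is a toy model of the behavior described above, not the kernel code: the Guest class, its field names and the function name are all hypothetical.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guest:
    is_sev_es: bool        # guest runs with SEV-ES
    vmsa_encrypted: bool   # vCPU state already lives in the encrypted VMSA
    used_sev_init2: bool   # started via KVM_SEV_INIT2 rather than KVM_SEV_ES_INIT


def blocked_msr_access_status(guest: Guest) -> Optional[int]:
    """Status for an MSR get/set that can no longer reach the vCPU state.

    Returns None when the access is not affected and should be handled
    normally; otherwise returns the value reported to userspace.
    """
    if not (guest.is_sev_es and guest.vmsa_encrypted):
        return None
    # New API: userspace is expected to cope with -EINVAL.
    # Legacy API: silently succeed so that older VMMs keep working.
    return -errno.EINVAL if guest.used_sev_init2 else 0
```

For example, a legacy guest with an encrypted VMSA gets 0 (a silent no-op), while the same access on a KVM_SEV_INIT2 guest gets -EINVAL.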
Rafael J. Wysocki [Fri, 21 Jun 2024 10:55:12 +0000 (12:55 +0200)]
Merge branch 'acpi-scan'
Merge ACPI device enumeration fixes for 6.10-rc5:
- Ignore MIPI camera graph port nodes created with the help of the
information from the ACPI tables on all Dell Tiger, Alder and Raptor
Lake models as that information is reported to be invalid on the
systems in question (Hans de Goede).
- Use new Intel CPU model matching macros in the MIPI DisCo for Imaging
part of ACPI device enumeration (Hans de Goede).
* acpi-scan:
ACPI: mipi-disco-img: Switch to new Intel CPU model defines
ACPI: scan: Ignore camera graph port nodes on all Dell Tiger, Alder and Raptor Lake models
Takashi Iwai [Fri, 21 Jun 2024 07:39:09 +0000 (09:39 +0200)]
ALSA: hda: Use imply for suggesting CONFIG_SERIAL_MULTI_INSTANTIATE
The recent fix introduced a reverse selection of
CONFIG_SERIAL_MULTI_INSTANTIATE, but its condition isn't always met.
Instead, use a weak reverse selection (imply) to suggest the config and
avoid such inconsistencies.
Arnd Bergmann [Thu, 20 Jun 2024 16:23:04 +0000 (18:23 +0200)]
mips: fix compat_sys_lseek syscall
This is almost compatible, but passing a negative offset should result
in an EINVAL error, whereas in mips o32 compat mode it would instead seek
to a large 32-bit byte offset.
Use compat_sys_lseek() to correctly sign-extend the argument.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
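The effect of the missing sign extension can be shown with plain integer arithmetic. The sketch below is a toy model only: the helper names are invented, seek_set() stands in for a SEEK_SET request, and it assumes the 32-bit offset reaches the 64-bit kernel with its upper bits clear.

```python
import errno


def as_unsigned_32(value):
    """Bit pattern of a 32-bit argument as seen without sign extension."""
    return value & 0xFFFFFFFF


def sign_extend_32(raw):
    """Interpret the low 32 bits of raw as a signed 32-bit integer."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def seek_set(offset):
    """Toy SEEK_SET: a negative offset is rejected with -EINVAL."""
    return -errno.EINVAL if offset < 0 else offset


raw = as_unsigned_32(-1)                 # what a 32-bit task passes for offset -1
buggy = seek_set(raw)                    # treated as a large positive offset
fixed = seek_set(sign_extend_32(raw))    # sign-extended first, so rejected
```

Without the sign extension the call "succeeds" at an offset just below 4 GiB; with it, the negative offset is recognized and rejected.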