www.infradead.org Git - users/willy/linux.git/log

ext4: make sure the first directory block is not a hole

The syzbot constructs a directory that has no dirblock but is non-inline,
i.e. the first directory block is a hole. And no errors are reported when
creating files in this directory in the following flow.

    ext4_mknod
     ...
      ext4_add_entry
        // Read block 0
        ext4_read_dirblock(dir, block, DIRENT)
          bh = ext4_bread(NULL, inode, block, 0)
          if (!bh && (type == INDEX || type == DIRENT_HTREE))
          // The first directory block is a hole
          // But type == DIRENT, so no error is reported.

After that, we get a directory block without '.' and '..' but with a valid
dentry. This may cause some code that relies on dot or dotdot (such as
make_indexed_dir()) to crash.

Therefore when ext4_read_dirblock() finds that the first directory block
is a hole report that the filesystem is corrupted and return an error to
avoid loading corrupted data from disk causing something bad.

Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
Fixes: 4e19d6b65fb4 ("ext4: allow directory holes")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240702132349.2600605-3-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: check dot and dotdot of dx_root before making dir indexed

Syzbot reports a issue as follows:
============================================
BUG: unable to handle page fault for address: ffffed11022e24fe
PGD 23ffee067 P4D 23ffee067 PUD 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 PID: 5079 Comm: syz-executor306 Not tainted 6.10.0-rc5-g55027e689933 #0
Call Trace:
<TASK>
make_indexed_dir+0xdaf/0x13c0 fs/ext4/namei.c:2341
ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2451
ext4_rename fs/ext4/namei.c:3936 [inline]
ext4_rename2+0x26e5/0x4370 fs/ext4/namei.c:4214
[...]
============================================

The immediate cause of this problem is that there is only one valid dentry
for the block to be split during do_split, so split==0 results in out of
bounds accesses to the map triggering the issue.

    do_split
      unsigned split
      dx_make_map
       count = 1
      split = count/2 = 0;
      continued = hash2 == map[split - 1].hash;
       ---> map[4294967295]

The maximum length of a filename is 255 and the minimum block size is 1024,
so it is always guaranteed that the number of entries is greater than or
equal to 2 when do_split() is called.

But syzbot's crafted image has no dot and dotdot in dir, and the dentry
distribution in dirblock is as follows:

  bus     dentry1          hole           dentry2           free
|xx--|xx-------------|...............|xx-------------|...............|
0   12 (8+248)=256  268     256     524 (8+256)=264 788     236     1024

So when renaming dentry1 increases its name_len length by 1, neither hole
nor free is sufficient to hold the new dentry, and make_indexed_dir() is
called.

In make_indexed_dir() it is assumed that the first two entries of the
dirblock must be dot and dotdot, so bus and dentry1 are left in dx_root
because they are treated as dot and dotdot, and only dentry2 is moved
to the new leaf block. That's why count is equal to 1.

Therefore add the ext4_check_dx_root() helper function to add more sanity
checks to dot and dotdot before starting the conversion to avoid the above
issue.

Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240702132349.2600605-2-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: sanity check for NULL pointer after ext4_force_shutdown

Test case: 2 threads write short inline data to a file.
In ext4_page_mkwrite the resulting inline data is converted.
Handling ext4_grp_locked_error with description "block bitmap
and bg descriptor inconsistent: X vs Y free clusters" calls
ext4_force_shutdown. The conversion clears
EXT4_STATE_MAY_INLINE_DATA but fails for
ext4_destroy_inline_data_nolock and ext4_mark_iloc_dirty due
to ext4_forced_shutdown. The restoration of inline data fails
for the same reason not setting EXT4_STATE_MAY_INLINE_DATA.
Without the flag set a regular process path in ext4_da_write_end
follows trying to dereference page folio private pointer that has
not been set. The fix calls early return with -EIO error shall the
pointer to private be NULL.

Sample crash report:

Unable to handle kernel paging request at virtual address dfff800000000004
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[dfff800000000004] address between user and kernel address ranges
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 20274 Comm: syz-executor185 Not tainted 6.9.0-rc7-syzkaller-gfda5695d692c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __block_commit_write+0x64/0x2b0 fs/buffer.c:2167
lr : __block_commit_write+0x3c/0x2b0 fs/buffer.c:2160
sp : ffff8000a1957600
x29: ffff8000a1957610 x28: dfff800000000000 x27: ffff0000e30e34b0
x26: 0000000000000000 x25: dfff800000000000 x24: dfff800000000000
x23: fffffdffc397c9e0 x22: 0000000000000020 x21: 0000000000000020
x20: 0000000000000040 x19: fffffdffc397c9c0 x18: 1fffe000367bd196
x17: ffff80008eead000 x16: ffff80008ae89e3c x15: 00000000200000c0
x14: 1fffe0001cbe4e04 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000001 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : 0000000000000004 x7 : 0000000000000000 x6 : 0000000000000000
x5 : fffffdffc397c9c0 x4 : 0000000000000020 x3 : 0000000000000020
x2 : 0000000000000040 x1 : 0000000000000020 x0 : fffffdffc397c9c0
Call trace:
__block_commit_write+0x64/0x2b0 fs/buffer.c:2167
block_write_end+0xb4/0x104 fs/buffer.c:2253
ext4_da_do_write_end fs/ext4/inode.c:2955 [inline]
ext4_da_write_end+0x2c4/0xa40 fs/ext4/inode.c:3028
generic_perform_write+0x394/0x588 mm/filemap.c:3985
ext4_buffered_write_iter+0x2c0/0x4ec fs/ext4/file.c:299
ext4_file_write_iter+0x188/0x1780
call_write_iter include/linux/fs.h:2110 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0x968/0xc3c fs/read_write.c:590
ksys_write+0x15c/0x26c fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__arm64_sys_write+0x7c/0x90 fs/read_write.c:652
__invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:48
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:133
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:152
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Code: 97f85911 f94002da 91008356 d343fec8 (38796908)
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
   0: 97f85911 bl 0xffffffffffe16444
   4: f94002da ldr x26, [x22]
   8: 91008356 add x22, x26, #0x20
   c: d343fec8 lsr x8, x22, #3
* 10: 38796908 ldrb w8, [x8, x25] <-- trapping instruction

Reported-by: syzbot+18df508cf00a0598d9a6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=18df508cf00a0598d9a6
Link: https://lore.kernel.org/all/000000000000f19a1406109eb5c5@google.com/T/
Signed-off-by: Wojciech Gładysz <wojciech.gladysz@infogain.com>
Link: https://patch.msgid.link/20240703070112.10235-1-wojciech.gladysz@infogain.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: increase maximum transaction size

Originally, we were quite conservative in limiting maximum transaction
size to a quarter of the journal because we were not accounting
transaction descriptor and revoke blocks. These days we do properly
account them and reserve space for them from the total transaction
credits. Thus there's no need to be so conservative and we can increase
the maximum transaction size to one third of the journal (even half
should work fine in principle but the performance will likely suffer in
that case). This also fixes failures to grow filesystems with tiny
journals.

Link: CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240701132800.7158-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: drop pointless shrinker batch initialization

In jbd2_journal_init_common() we set batch size of a shrinker shrinking
checkpointed buffers to journal->j_max_transaction_buffers. But that is
guaranteed to be 0 at that point so we effectively stay with the default
shrinker batch size of 128. It has been like this since introduction of
jbd2 shrinkers so just drop the pointless initialization.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240624170127.3253-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: avoid infinite transaction commit loop

Commit 9f356e5a4f12 ("jbd2: Account descriptor blocks into
t_outstanding_credits") started to account descriptor blocks into
transactions outstanding credits. However it didn't appropriately
decrease the maximum amount of credits available to userspace. Thus if
the filesystem requests a transaction smaller than
j_max_transaction_buffers but large enough that when descriptor blocks
are added the size exceeds j_max_transaction_buffers, we confuse
add_transaction_credits() into thinking previous handles have grown the
transaction too much and enter infinite journal commit loop in
start_this_handle() -> add_transaction_credits() trying to create
transaction with enough credits available.

Fix the problem by properly accounting for transaction space reserved
for descriptor blocks when verifying requested transaction handle size.

CC: stable@vger.kernel.org
Fixes: 9f356e5a4f12 ("jbd2: Account descriptor blocks into t_outstanding_credits")
Reported-by: Alexander Coffin <alex.coffin@maticrobots.com>
Link: https://lore.kernel.org/all/CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240624170127.3253-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: precompute number of transaction descriptor blocks

Instead of computing the number of descriptor blocks a transaction can
have each time we need it (which is currently when starting each
transaction but will become more frequent later) precompute the number
once during journal initialization together with maximum transaction
size. We perform the precomputation whenever journal feature set is
updated similarly as for computation of
journal->j_revoke_records_per_block.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240624170127.3253-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: make jbd2_journal_get_max_txn_bufs() internal

There's no reason to have jbd2_journal_get_max_txn_bufs() public
function. Currently all users are internal and can use
journal->j_max_transaction_buffers instead. This saves some unnecessary
recomputations of the limit as a bonus which becomes important as this
function gets more complex in the following patch.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: avoid mount failed when commit block is partial submitted

We encountered a problem that the file system could not be mounted in
the power-off scenario. The analysis of the file system mirror shows that
only part of the data is written to the last commit block.
The valid data of the commit block is concentrated in the first sector.
However, the data of the entire block is involved in the checksum calculation.
For different hardware, the minimum atomic unit may be different.
If the checksum of a committed block is incorrect, clear the data except the
'commit_header' and then calculate the checksum. If the checkusm is correct,
it is considered that the block is partially committed, Then continue to replay
journal.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240620072405.3533701-1-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: avoid writing unitialized memory to disk in EA inodes

If the extended attribute size is not a multiple of block size, the last
block in the EA inode will have uninitialized tail which will get
written to disk. We will never expose the data to userspace but still
this is not a good practice so just zero out the tail of the block as it
isn't going to cause a noticeable performance overhead.

Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
Reported-by: syzbot+9c1fe13fcb51574b249b@syzkaller.appspotmail.com
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240613150234.25176-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: don't track ranges in fast_commit if inode has inlined data

When fast-commit needs to track ranges, it has to handle inodes that have
inlined data in a different way because ext4_fc_write_inode_data(), in the
actual commit path, will attempt to map the required blocks for the range.
However, inodes that have inlined data will have it's data stored in
inode->i_block and, eventually, in the extended attribute space.

Unfortunately, because fast commit doesn't currently support extended
attributes, the solution is to mark this commit as ineligible.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039883
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Tested-by: Ben Hutchings <benh@debian.org>
Fixes: 9725958bb75c ("ext4: fast commit may miss tracking unwritten range during ftruncate")
Link: https://patch.msgid.link/20240618144312.17786-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix possible tid_t sequence overflows

In the fast commit code there are a few places where tid_t variables are
being compared without taking into account the fact that these sequence
numbers may wrap. Fix this issue by using the helper functions tid_gt()
and tid_geq().

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://patch.msgid.link/20240529092030.9557-3-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: use ext4_update_inode_fsync_trans() helper in inode creation

Call helper function ext4_update_inode_fsync_trans() instead of open
coding it in __ext4_new_inode(). This helper checks both that the handle
is valid *and* that it hasn't been aborted due to some fatal error in the
journalling layer, using is_handle_aborted().

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Link: https://patch.msgid.link/20240527161447.21434-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: add missing MODULE_DESCRIPTION()

Fix the 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/ext4/ext4-inode-test.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240527-md-fs-ext4-v1-1-07aad5936bb1@quicinc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: add missing MODULE_DESCRIPTION()

Fix the 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/jbd2/jbd2.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://patch.msgid.link/20240526-md-fs-jbd2-v1-1-7bba6665327d@quicinc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: use memtostr_pad() for s_volume_name

As with the other strings in struct ext4_super_block, s_volume_name is
not NUL terminated. The other strings were marked in commit 072ebb3bffe6
("ext4: add nonstring annotations to ext4.h"). Using strscpy() isn't
the right replacement for strncpy(); it should use memtostr_pad()
instead.

Reported-by: syzbot+50835f73143cc2905b9e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/00000000000019f4c00619192c05@google.com/
Fixes: 744a56389f73 ("ext4: replace deprecated strncpy with alternatives")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://patch.msgid.link/20240523225408.work.904-kees@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: speed up jbd2_transaction_committed()

jbd2_transaction_committed() is used to check whether a transaction with
the given tid has already committed, it holds j_state_lock in read mode
and check the tid of current running transaction and committing
transaction, but holding the j_state_lock is expensive.

We have already stored the sequence number of the most recently
committed transaction in journal t->j_commit_sequence, we could do this
check by comparing it with the given tid instead. If the given tid isn't
smaller than j_commit_sequence, we can ensure that the given transaction
has been committed. That way we could drop the expensive lock and
achieve about 10% ~ 20% performance gains in concurrent DIOs on may
virtual machine with 100G ramdisk.

fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
    -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
    -group_reporting

Before:
  overwrite       IOPS=88.2k, BW=344MiB/s
  read            IOPS=95.7k, BW=374MiB/s
  rand overwrite  IOPS=98.7k, BW=386MiB/s
  randread        IOPS=102k, BW=397MiB/s

After:
  overwrite       IOPS=105k, BW=410MiB/s
  read            IOPS=112k, BW=436MiB/s
  rand overwrite  IOPS=104k, BW=404MiB/s
  randread        IOPS=111k, BW=432MiB/s

CC: Dave Chinner <david@fromorbit.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/linux-ext4/ZjILCPNZRHeazSqV@dread.disaster.area/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://patch.msgid.link/20240520131831.2910790-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: make ext4_da_map_blocks() buffer_head unaware

After calling the ext4_da_map_blocks(), a delalloc extent state could
be identified through the EXT4_MAP_DELAYED flag in map. So factor out
buffer_head related handles in ext4_da_map_blocks(), make this function
buffer_head unaware and becomes a common helper, and also update the
stale function commtents, preparing for the iomap da write path in the
future.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240517124005.347221-11-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: make ext4_insert_delayed_block() insert multi-blocks

Rename ext4_insert_delayed_block() to ext4_insert_delayed_blocks(),
pass length parameter to make it insert multiple delalloc blocks at a
time. For non-bigalloc case, just reserve len blocks and insert delalloc
extent. For bigalloc case, we can ensure that the clusters in the middle
of a extent must be unallocated, we only need to check whether the start
and end clusters are delayed/allocated. We should subtract the space for
the start and/or end block(s) if they are allocated.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240517124005.347221-10-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: factor out a helper to check the cluster allocation state

Factor out a common helper ext4_clu_alloc_state(), check whether the
cluster containing a delalloc block to be added has been allocated or
has delalloc reservation, no logic changes.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240517124005.347221-9-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: make ext4_da_reserve_space() reserve multi-clusters

Add 'nr_resv' parameter to ext4_da_reserve_space(), which indicates the
number of clusters wants to reserve, make it reserve multiple clusters
at a time.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240517124005.347221-8-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: make ext4_es_insert_delayed_block() insert multi-blocks

Rename ext4_es_insert_delayed_block() to ext4_es_insert_delayed_extent()
and pass length parameter to make it insert multiple delalloc blocks at
a time. For the case of bigalloc, split the allocated parameter to
lclu_allocated and end_allocated. lclu_allocated indicates the
allocation state of the cluster which is containing the lblk,
end_allocated indicates the allocation state of the extent end, clusters
in the middle of delay allocated extent must be unallocated.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240517124005.347221-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: drop iblock parameter

The start block of the delalloc extent to be inserted is equal to
map->m_lblk, just drop the duplicate iblock input parameter.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://patch.msgid.link/20240517124005.347221-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: trim delalloc extent

In ext4_da_map_blocks(), we could find four kind of extents in the
extent status tree: hole, unwritten, written and delayed extent. Now we
only trim the map len if we found an unwritten extent or a written
extent. This is okay now since map->m_len is always set to one and we
always insert one delayed block at a time. But this will become isn't
okay for other two cases if ext4_insert_delayed_block() and
ext4_da_map_blocks() support inserting multiple map->len blocks later.

1. If we found a hole in the extent status tree which es->es_len is
   shorter than the length we want to write, we should trim the
   map->m_len to prevent adding extra delay more blocks than we
   expected. For example, assume we write data [A, C) to a file that
   contains a hole extent [A, B) and a written extent [B, D) in the
   cache.

                         A     B  C  D
   before da write:   ...hhhhhh|wwwwww....

   Then we will get extent [A, B), we should trim map->m_len to B-A
   before inserting new delalloc blocks, if not, the range [B, C) will
   be duplicated.

2. If we found a delayed extent in the extent status tree which
   es->es_len is shorter than the length we want to write, we should
   trim the map->m_len to es->es_len and return directly since the front
   part of this map has been delayed, we can't insert the delalloc
   extent that contains the latter part in this round, we should return
   the delayed length and the caller should increase the position and
   call ext4_da_map_blocks() again. For example, assume we write data
   [A, C) to a file that contains a delayed extent [A, B) in the cache.

                         A     B  C
   before da write:   ...dddddd|hhh....

   Then we will get delayed extent [A, B), we should also trim map->m_len
   to B-A and return, if not, we will incorrectly assume that the write
   is complete and won't insert [B, C).

So we need to always trim the map->m_len if the found es->es_len in the
extent status tree is shorter than the map->m_len, prearing for
inserting a extent with multiple delalloc blocks. This patch only does a
pre-fix, the handle is crude and ext4_da_map_blocks() deserve a cleanup,
we will do that later.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240517124005.347221-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: warn if delalloc counters are not zero on inactive

The per-inode i_reserved_data_blocks count the reserved delalloc blocks
in a regular file, it should be zero when destroying the file. The
per-fs s_dirtyclusters_counter count all reserved delalloc blocks in a
filesystem, it also should be zero when umounting the filesystem. Now we
have only an error message if the i_reserved_data_blocks is not zero,
which is unable to be simply captured, so add WARN_ON_ONCE to make it
more visable.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240517124005.347221-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: check the extent status again before inserting delalloc block

ext4_da_map_blocks looks up for any extent entry in the extent status
tree (w/o i_data_sem) and then the looks up for any ondisk extent
mapping (with i_data_sem in read mode).

If it finds a hole in the extent status tree or if it couldn't find any
entry at all, it then takes the i_data_sem in write mode to add a da
entry into the extent status tree. This can actually race with page
mkwrite & fallocate path.

Note that this is ok between
1. ext4 buffered-write path v/s ext4_page_mkwrite(), because of the
   folio lock
2. ext4 buffered write path v/s ext4 fallocate because of the inode
   lock.

But this can race between ext4_page_mkwrite() & ext4 fallocate path

ext4_page_mkwrite()             ext4_fallocate()
block_page_mkwrite()
  ext4_da_map_blocks()
   //find hole in extent status tree
                                 ext4_alloc_file_blocks()
                                  ext4_map_blocks()
                                   //allocate block and unwritten extent
   ext4_insert_delayed_block()
    ext4_da_reserve_space()
     //reserve one more block
    ext4_es_insert_delayed_block()
     //drop unwritten extent and add delayed extent by mistake

Then, the delalloc extent is wrong until writeback and the extra
reserved block can't be released any more and it triggers below warning:

EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!

Fix the problem by looking up extent status tree again while the
i_data_sem is held in write mode. If it still can't find any entry, then
we insert a new da entry into the extent status tree.

Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240517124005.347221-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: factor out a common helper to query extent map

Factor out a new common helper ext4_map_query_blocks() from the
ext4_da_map_blocks(), it query and return the extent map status on the
inode's extent path, no logic changes.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://patch.msgid.link/20240517124005.347221-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix infinite loop when replaying fast_commit

When doing fast_commit replay an infinite loop may occur due to an
uninitialized extent_status struct. ext4_ext_determine_insert_hole() does
not detect the replay and calls ext4_es_find_extent_range(), which will
return immediately without initializing the 'es' variable.

Because 'es' contains garbage, an integer overflow may happen causing an
infinite loop in this function, easily reproducible using fstest generic/039.

This commit fixes this issue by unconditionally initializing the structure
in function ext4_es_find_extent_range().

Thanks to Zhang Yi, for figuring out the real problem!

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240515082857.32730-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: remove unnecessary "should_sleep" in kjournald2

We only need to sleep if no running transaction is expired. Simply remove
unnecessary "should_sleep".

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240514112438.1269037-10-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: remove dead check of JBD2_UNMOUNT in kjournald2

We always set JBD2_UNMOUNT with j_state_lock held in journal_kill_thread.
In kjournald2, we check JBD2_UNMOUNT flag two times under the same
j_state_lock. Then the second check is unnecessary.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240514112438.1269037-9-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: remove dead equality check of j_commit_[sequence/request] in kjournald2

The j_commit_[sequence/request] are updated with j_state_lock held during
runtime. In kjournald2, two equality checks of j_commit_[sequence/request]
are under the same j_state_lock, then the second check is unnecessary.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240514112438.1269037-8-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: use bh_in instead of jh2bh(jh_in) to simplify code

We save jh2bh(jh_in) to bh_in, so use bh_in directly instead of
jh2bh(jh_in) to simplify the code.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240514112438.1269037-7-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: remove unneeded kmap to do escape in jbd2_journal_write_metadata_buffer

The data to do escape could be accessed directly from b_frozen_data,
just remove unneeded kmap.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240514112438.1269037-6-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: jump to new copy_done tag when b_frozen_data is created concurrently

If b_frozen_data is created concurrently, we can update new_folio and
new_offset with b_frozen_data and then move forward

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240514112438.1269037-5-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: remove unnedded "need_copy_out" in jbd2_journal_write_metadata_buffer

As we only need to copy out when we should do escape, need_copy_out
could be simply replaced by "do_escape".

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240514112438.1269037-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: remove unused return info from jbd2_journal_write_metadata_buffer

The done_copy_out info from jbd2_journal_write_metadata_buffer is not
used. Simply remove it.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240514112438.1269037-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: avoid memleak in jbd2_journal_write_metadata_buffer

The new_bh is from alloc_buffer_head, we should call free_buffer_head to
free it in error case.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240514112438.1269037-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix uninitialized variable in ext4_inlinedir_to_tree

Syzbot has found an uninit-value bug in ext4_inlinedir_to_tree

This error happens because ext4_inlinedir_to_tree does not
handle the case when ext4fs_dirhash returns an error

This can be avoided by checking the return value of ext4fs_dirhash
and propagating the error,
similar to how it's done with ext4_htree_store_dirent

Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com>
Reported-and-tested-by: syzbot+eaba5abe296837a640c0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=eaba5abe296837a640c0
Link: https://patch.msgid.link/20240501033017.220000-1-shenxiaxi26@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: use str_plural() to fix Coccinelle warning

Fixes the following Coccinelle/coccicheck warning reported by
string_choices.cocci:

opportunity for str_plural(dropped)

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Link: https://patch.msgid.link/20240402105157.254389-2-thorsten.blum@toblux.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: block_validity: Remove unnecessary ‘NULL’ values from new_node

new_node is assigned first, so it does not need to initialize the
assignment.

Signed-off-by: Li zeming <zeming@nfschina.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://patch.msgid.link/20240402022300.25858-1-zeming@nfschina.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Linux 6.10-rc5

Merge tag 'i2c-for-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"The core gains placeholders for recently added functions when
  CONFIG_I2C is not defined as well documentation fixes to start using
  inclusive terminology.

  The drivers get paths in DT bindings fixed as well as proper interrupt
  handling for the ocores driver"

* tag 'i2c-for-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  docs: i2c: summary: be clearer with 'controller/target' and 'adapter/client' pairs
  docs: i2c: summary: document 'local' and 'remote' targets
  docs: i2c: summary: document use of inclusive language
  docs: i2c: summary: update speed mode description
  docs: i2c: summary: update I2C specification link
  docs: i2c: summary: start sentences consistently.
  i2c: Add nop fwnode operations
  i2c: ocores: set IACK bit after core is enabled
  dt-bindings: i2c: google,cros-ec-i2c-tunnel: correct path to i2c-controller schema
  dt-bindings: i2c: atmel,at91sam: correct path to i2c-controller schema

Merge tag '6.10-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
"Five smb3 client fixes

   - three nets/fiolios cifs fixes

   - fix typo in module parameters description

   - fix incorrect swap warning"

* tag '6.10-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Move the 'pid' from the subreq to the req
  cifs: Only pick a channel once per read request
  cifs: Defer read completion
  cifs: fix typo in module parameter enable_gcm_256
  cifs: drop the incorrect assertion in cifs_swap_rw()

Merge tag 'fixes-2024-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fix from Mike Rapoport:
"Fix fragility in checks for unset node ID.

  Use numa_valid_node() function to verify that nid is a valid node
  ID instead of inconsistent comparisons with either NUMA_NO_NODE or
  MAX_NUMNODES"

* tag 'fixes-2024-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock: use numa_valid_node() helper to check for invalid node ID

Merge tag 'mips-fixes_6.10_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

- fix lseek in o32 compat mode

- fix for microMIPS MT ASE helpers

* tag 'mips-fixes_6.10_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mips: fix compat_sys_lseek syscall
MIPS: mipsmtregs: Fix target register for MFTC0

Merge tag 'x86_urgent_for_v6.10_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- An ARM-relevant fix to not free default RMIDs of a resource control
   group

- A randconfig build fix for the VMware virtual GPU driver

* tag 'x86_urgent_for_v6.10_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Don't try to free nonexistent RMIDs
  drm/vmwgfx: Fix missing HYPERVISOR_GUEST dependency

Merge tag 'powerpc-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- Prevent use-after-free in 64-bit KVM VFIO

- Add generated Power8 crypto asm to .gitignore

Thanks to Al Viro and Nathan Lynch.

* tag 'powerpc-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group()
powerpc/crypto: Add generated P8 asm to .gitignore

Merge tag 'i2c-host-fixes-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

This pull request fixes the paths of the dt-schema to their
complete locations for the ChromeOS EC tunnel driver and the
Atmel at91sam drivers.

Additionally, the OpenCores driver receives a fix for an issue
that dates back to version 2.6.18. Specifically, the interrupts
need to be acknowledged (clearing all pending interrupts) after
enabling the core.

Merge tag 'rust-fixes-6.10' of https://github.com/Rust-for-Linux/linux

Pull rust fix from Miguel Ojeda:

- Avoid unused import warning in 'rusttest'.

* tag 'rust-fixes-6.10' of https://github.com/Rust-for-Linux/linux:
rust: avoid unused import warning in `rusttest`

Merge tag 'regulator-fix-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A few driver specific fixes for incorrect device descriptions, plus a
  fix for a missing symbol export which causes build failures for some
  newly added drivers in other trees"

* tag 'regulator-fix-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: axp20x: AXP717: fix LDO supply rails and off-by-ones
  regulator: bd71815: fix ramp values
  regulator: core: Fix modpost error "regulator_get_regmap" undefined
  regulator: tps6594-regulator: Fix the number of irqs for TPS65224 and TPS6594

Merge tag 'spi-fix-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A number of fixes that have built up for SPI, a bunch of driver
  specific ones including an unfortunate revert of an optimisation for
  the i.MX driver which was causing issues with some configurations,
  plus a couple of core fixes for the rarely used octal mode and for a
  bad interaction between multi-CS support and target mode"

* tag 'spi-fix-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-imx: imx51: revert burst length calculation back to bits_per_word
  spi: Fix SPI slave probe failure
  spi: Fix OCTAL mode support
  spi: stm32: qspi: Clamp stm32_qspi_get_mode() output to CCR_BUSWIDTH_4
  spi: stm32: qspi: Fix dual flash mode sanity test in stm32_qspi_setup()
  spi: cs42l43: Drop cs35l56 SPI speed down to 11MHz
  spi: cs42l43: Correct SPI root clock speed

Merge tag 'nfsd-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Fix crashes triggered by administrative operations on the server

* tag 'nfsd-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: grab nfsd_mutex in nfsd_nl_rpc_status_get_dumpit()
nfsd: fix oops when reading pool_stats before server is started

Merge tag 'xfs-6.10-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Chandan Babu:

- Fix assertion failure due to a race between unlink and cluster buffer
instantiation.

* tag 'xfs-6.10-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix unlink vs cluster buffer instantiation race

Merge tag 'bcachefs-2024-06-22' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
"Lots of (mostly boring) fixes for syzbot bugs and rare(r) CI bugs.

  The LRU_TIME_BITS fix was slightly more involved; we only have 48 bits
  for the LRU position (we would prefer 64), so wraparound is possible
  for the cached data LRUs on a filesystem that has done sufficient
  (petabytes) reads; this is now handled.

  One notable user reported bugfix, where we were forgetting to
  correctly set the bucket data type, which should have been
  BCH_DATA_need_gc_gens instead of BCH_DATA_free; this was causing us to
  go emergency read-only on a filesystem that had seen heavy enough use
  to see bucket gen wraparoud.

  We're now starting to fix simple (safe) errors without requiring user
  intervention - i.e. a small incremental step towards full self
  healing.

  This is currently limited to just certain allocation information
  counters, and the error is still logged in the superblock; see that
  patch for more information. ("bcachefs: Fix safe errors by default")"

* tag 'bcachefs-2024-06-22' of https://evilpiepirate.org/git/bcachefs: (22 commits)
  bcachefs: Move the ei_flags setting to after initialization
  bcachefs: Fix a UAF after write_super()
  bcachefs: Use bch2_print_string_as_lines for long err
  bcachefs: Fix I_NEW warning in race path in bch2_inode_insert()
  bcachefs: Replace bare EEXIST with private error codes
  bcachefs: Fix missing alloc_data_type_set()
  closures: Change BUG_ON() to WARN_ON()
  bcachefs: fix alignment of VMA for memory mapped files on THP
  bcachefs: Fix safe errors by default
  bcachefs: Fix bch2_trans_put()
  bcachefs: set_worker_desc() for delete_dead_snapshots
  bcachefs: Fix bch2_sb_downgrade_update()
  bcachefs: Handle cached data LRU wraparound
  bcachefs: Guard against overflowing LRU_TIME_BITS
  bcachefs: delete_dead_snapshots() doesn't need to go RW
  bcachefs: Fix early init error path in journal code
  bcachefs: Check for invalid btree IDs
  bcachefs: Fix btree ID bitmasks
  bcachefs: Fix shift overflow in read_one_super()
  bcachefs: Fix a locking bug in the do_discard_fast() path
  ...

Merge tag 'ata-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Niklas Cassel:

- We currently enable DIPM (device initiated power management) in the
   device (using a SET FEATURES call to the device), regardless if the
   HBA supports any LPM states or not. It seems counter intuitive, and
   potentially dangerous to enable a device side feature, when the HBA
   does not have the corresponding support. Thus, make sure that we do
   not enable DIPM if the HBA does not support any LPM states.

* tag 'ata-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: ahci: Do not enable LPM if no LPM states are supported by the HBA

Merge tag 'pwm/for-6.10-rc5-fixes-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

Pull pwm fixes from Uwe Kleine-König:
"Three fixes for the pwm-stm32 driver.

  The first patch prevents an integer wrap-around for small periods. In
  the second patch the calculation of the prescaler is fixed which
  resulted in values for the ARR register that don't fit into the
  corresponding register bit field. The last commit improves an error
  message that was wrongly copied from another error path"

* tag 'pwm/for-6.10-rc5-fixes-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  pwm: stm32: Fix error message to not describe the previous error path
  pwm: stm32: Fix calculation of prescaler
  pwm: stm32: Refuse too small period requests

Merge tag 'arm-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
"There are seven oneline patches that each address a distinct problem
  on the NXP i.MX platform, mostly the popular i.MX8M variant.

  The only other two fixes are for error handling on the psci firmware
  driver and SD card support on the milkv duo riscv board"

* tag 'arm-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  firmware: psci: Fix return value from psci_system_suspend()
  riscv: dts: sophgo: disable write-protection for milkv duo
  arm64: dts: imx8qm-mek: fix gpio number for reg_usdhc2_vmmc
  arm64: dts: freescale: imx8mm-verdin: enable hysteresis on slow input pin
  arm64: dts: imx93-11x11-evk: Remove the 'no-sdio' property
  arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix BT shutdown GPIO
  arm: dts: imx53-qsb-hdmi: Disable panel instead of deleting node
  arm64: dts: imx8mp: Fix TC9595 input clock on DH i.MX8M Plus DHCOM SoM
  arm64: dts: freescale: imx8mm-verdin: Fix GPU speed

Merge tag 'loongarch-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Some hw breakpoint fixes, an objtool build warnging fix, and a trivial
  cleanup"

* tag 'loongarch-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Remove an unneeded semicolon
  LoongArch: Fix multiple hardware watchpoint issues
  LoongArch: Trigger user-space watchpoints correctly
  LoongArch: Fix watchpoint setting error
  LoongArch: Only allow OBJTOOL & ORC unwinder if toolchain supports -mthin-add-sub

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:

   - Fix dangling references to a redistributor region if the vgic was
     prematurely destroyed.

   - Properly mark FFA buffers as released, ensuring that both parties
     can make forward progress.

  x86:

   - Allow getting/setting MSRs for SEV-ES guests, if they're using the
     pre-6.9 KVM_SEV_ES_INIT API.

   - Always sync pending posted interrupts to the IRR prior to IOAPIC
     route updates, so that EOIs are intercepted properly if the old
     routing table requested that.

  Generic:

   - Avoid __fls(0)

   - Fix reference leak on hwpoisoned page

   - Fix a race in kvm_vcpu_on_spin() by ensuring loads and stores are
     atomic.

   - Fix bug in __kvm_handle_hva_range() where KVM calls a function
     pointer that was intended to be a marker only (nothing bad happens
     but kind of a mine and also technically undefined behavior)

   - Do not bother accounting allocations that are small and freed
     before getting back to userspace.

  Selftests:

   - Fix compilation for RISC-V.

   - Fix a "shift too big" goof in the KVM_SEV_INIT2 selftest.

   - Compute the max mappable gfn for KVM selftests on x86 using
     GuestMaxPhyAddr from KVM's supported CPUID (if it's available)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SEV-ES: Fix svm_get_msr()/svm_set_msr() for KVM_SEV_ES_INIT guests
  KVM: Discard zero mask with function kvm_dirty_ring_reset
  virt: guest_memfd: fix reference leak on hwpoisoned page
  kvm: do not account temporary allocations to kmem
  MAINTAINERS: Drop Wanpeng Li as a Reviewer for KVM Paravirt support
  KVM: x86: Always sync PIR to IRR prior to scanning I/O APIC routes
  KVM: Stop processing *all* memslots when "null" mmu_notifier handler is found
  KVM: arm64: FFA: Release hyp rx buffer
  KVM: selftests: Fix RISC-V compilation
  KVM: arm64: Disassociate vcpus from redistributor region on teardown
  KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin()
  KVM: selftests: x86: Prioritize getting max_gfn from GuestPhysBits
  KVM: selftests: Fix shift of 32 bit unsigned int more than 32 bits

pwm: stm32: Fix error message to not describe the previous error path

"Failed to lock the clock" is an appropriate error message for
clk_rate_exclusive_get() failing, but not for the clock running too
fast for the driver's calculations.

Adapt the error message accordingly.

Fixes: d44d635635a7 ("pwm: stm32: Fix for settings using period > UINT32_MAX")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/285182163211203fc823a65b180761f46e828dcb.1718979150.git.u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>

pwm: stm32: Fix calculation of prescaler

A small prescaler is beneficial, as this improves the resolution of the
duty_cycle configuration. However if the prescaler is too small, the
maximal possible period becomes considerably smaller than the requested
value.

One situation where this goes wrong is the following: With a parent
clock rate of 208877930 Hz and max_arr = 0xffff = 65535, a request for
period = 941243 ns currently results in PSC = 1. The value for ARR is
then calculated to

ARR = 941243 * 208877930 / (1000000000 * 2) - 1 = 98301

This value is bigger than 65535 however and so doesn't fit into the
respective register field. In this particular case the PWM was
configured for a period of 313733.4806027616 ns (with ARR = 98301 &
0xffff). Even if ARR was configured to its maximal value, only period =
627495.6861167669 ns would be achievable.

Fix the calculation accordingly and adapt the comment to match the new
algorithm.

With the calculation fixed the above case results in PSC = 2 and so an
actual period of 941229.1667195285 ns.

Fixes: 8002fbeef1e4 ("pwm: stm32: Calculate prescaler with a division instead of a loop")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/b4d96b79917617434a540df45f20cb5de4142f88.1718979150.git.u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>

docs: i2c: summary: be clearer with 'controller/target' and 'adapter/client' pairs

This not only includes rewording, but also where to put which emphasis
on terms in this document.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

docs: i2c: summary: document 'local' and 'remote' targets

Because Linux can be a target as well, add terminology to differentiate
between Linux being the target and Linux accessing targets.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

docs: i2c: summary: document use of inclusive language

We now have the updated I2C specs and our own Code of Conduct, so we
have all we need to switch over to the inclusive terminology. Define
them here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

docs: i2c: summary: update speed mode description

Fastest I2C mode is 5 MHz. Update the docs and reword the paragraph
slightly.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

docs: i2c: summary: update I2C specification link

Luckily, the specs are directly downloadable again, so update the link.
Also update its title to the original name "I²C".

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

docs: i2c: summary: start sentences consistently.

Change the first paragraphs to contain only one space after the end of
the previous sentence like in the rest of the document.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Two fixes: one in the ufs driver fixing an obvious memory leak and the
  other (with a core flag based update) trying to prevent USB crashes by
  stopping the core from issuing a request for the I/O Hints mode page"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: usb: uas: Do not query the IO Advice Hints Grouping mode page for USB/UAS devices
  scsi: core: Introduce the BLIST_SKIP_IO_HINTS flag
  scsi: ufs: core: Free memory allocated for model before reinit

Merge tag 'drm-fixes-2024-06-22' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Still pretty quiet, two weeks worth of amdgpu fixes, with one i915 and
  one xe. I didn't get the drm-misc-fixes tree PR this week, but there
  was only one fix queued and I think it can wait another week, so seems
  pretty normal.

  xe:
   - Fix for invalid register access

  i915:
   - Fix conditions for joiner usage, it's not possible with eDP MSO

  amdgpu:
   - Fix display idle optimization race
   - Fix GPUVM TLB flush locking scope
   - IPS fix
   - GFX 9.4.3 harvesting fix
   - Runtime pm fix for shared buffers
   - DCN 3.5.x fixes
   - USB4 fix
   - RISC-V clang fix
   - Silence UBSAN warnings
   - MES11 fix
   - PSP 14.0.x fix"

* tag 'drm-fixes-2024-06-22' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/vf: Don't touch GuC irq registers if using memory irqs
  drm/amdgpu: init TA fw for psp v14
  drm/amdgpu: cleanup MES11 command submission
  drm/amdgpu: fix UBSAN warning in kv_dpm.c
  drm/radeon: fix UBSAN warning in kv_dpm.c
  drm/amd/display: Disable CONFIG_DRM_AMD_DC_FP for RISC-V with clang
  drm/amd/display: Attempt to avoid empty TUs when endpoint is DPIA
  drm/amd/display: change dram_clock_latency to 34us for dcn35
  drm/amd/display: Change dram_clock_latency to 34us for dcn351
  drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2
  drm/amdgpu: Indicate CU havest info to CP
  drm/amd/display: prevent register access while in IPS
  drm/amdgpu: fix locking scope when flushing tlb
  drm/amd/display: Remove redundant idle optimization check
  drm/i915/mso: using joiner is not possible with eDP MSO

Merge tag 'ovl-fixes-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs fixes from Miklos Szeredi:
"Fix two bugs, one originating in this cycle and one from 6.6"

* tag 'ovl-fixes-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix encoding fid for lower only root
ovl: fix copy-up in tmpfile

Merge tag 'io_uring-6.10-20240621' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
"Just a single cleanup for the fixed buffer iov_iter import.

  More cosmetic than anything else, but let's get it cleaned up as it's
  confusing"

* tag 'io_uring-6.10-20240621' of git://git.kernel.dk/linux:
  io_uring/rsrc: fix incorrect assignment of iter->nr_segs in io_import_fixed

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"Small bug fixes:

   - Prevent a crash in bnxt if the en and rdma drivers disagree on the
     MSI vectors

   - Have rxe memcpy inline data from the correct address

   - Fix rxe's validation of UD packets

   - Several mlx5 mr cache issues: bad lock balancing on error, missing
     propagation of the ATS property to the HW, wrong bucketing of freed
     mrs in some cases

   - Incorrect goto error unwind in mlx5 driver probe

   - Missed userspace input validation in mlx5 SRQ create

   - Incorrect uABI in MANA rejecting valid optional MR creation flags"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/mana_ib: Ignore optional access flags for MRs
  RDMA/mlx5: Add check for srq max_sge attribute
  RDMA/mlx5: Fix unwind flow as part of mlx5_ib_stage_init_init
  RDMA/mlx5: Ensure created mkeys always have a populated rb_key
  RDMA/mlx5: Follow rb_key.ats when creating new mkeys
  RDMA/mlx5: Remove extra unlock on error path
  RDMA/rxe: Fix responder length checking for UD request packets
  RDMA/rxe: Fix data copy for IB_SEND_INLINE
  RDMA/bnxt_re: Fix the max msix vectors macro

Merge tag 'sound-6.10-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull  more sound fixes from Takashi Iwai:
"A follow-up fix for a random build issue, as well as another trivial
  HD-audio quirk"

* tag 'sound-6.10-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda: Use imply for suggesting CONFIG_SERIAL_MULTI_INSTANTIATE
  ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14AHP9

Merge tag 'acpi-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These address a possible NULL pointer dereference in the ACPICA code
  and quirk camera enumeration on multiple platforms where incorrect
  data are present in the platform firmware.

  Specifics:

   - Undo an ACPICA code change that attempted to keep operation regions
     within a page boundary, but allowed accesses to unmapped memory to
     occur (Raju Rangoju)

   - Ignore MIPI camera graph port nodes created with the help of the
     information from the ACPI tables on all Dell Tiger, Alder and
     Raptor Lake models as that information is reported to be invalid on
     the platforms in question (Hans de Goede)

   - Use new Intel CPU model matching macros in the MIPI DisCo for
     Imaging part of ACPI device enumeration (Hans de Goede)"

* tag 'acpi-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: mipi-disco-img: Switch to new Intel CPU model defines
  ACPI: scan: Ignore camera graph port nodes on all Dell Tiger, Alder and Raptor Lake models
  ACPICA: Revert "ACPICA: avoid Info: mapping multiple BARs. Your kernel is fine."

Merge tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
"These fix the Mediatek lvts_thermal driver, the Intel int340x driver,
  and the thermal core (two issues related to system suspend).

  Specifics:

   - Remove the filtered mode for mt8188 from lvts_thermal as it is not
     supported on this platform and fail the lvts_thermal initialization
     when the golden temperature is zero as that means the efuse data is
     not correctly set (Julien Panis)

   - Update the processor_thermal part of the Intel int340x driver to
     support shared interrupts as the processor thermal device interrupt
     may in fact be shared with PCI devices (Srinivas Pandruvada)

   - Synchronize the suspend-prepare and post-suspend actions of the
     thermal PM notifier to avoid a destructive race condition and
     change the priority of that notifier to the minimum to avoid
     interference between the work items spawned by it and the other
     PM notifiers during system resume (Rafael Wysocki)"

* tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: int340x: processor_thermal: Support shared interrupts
  thermal: core: Change PM notifier priority to the minimum
  thermal: core: Synchronize suspend-prepare and post-suspend actions
  thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data
  thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188

Merge tag 'dmaengine-fix-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:

- kmemleak, error path handling and missing kmem_cache_destroy() fixes
   for ioatdma driver

- use after free fix for idxd driver

- data synchronisation fix for xdma isr handling

- fsl driver channel constraints and linking two fsl module fixes

* tag 'dmaengine-fix-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: ioatdma: Fix missing kmem_cache_destroy()
  dt-bindings: dma: fsl-edma: fix dma-channels constraints
  dmaengine: fsl-edma: avoid linking both modules
  dmaengine: ioatdma: Fix kmemleak in ioat_pci_probe()
  dmaengine: ioatdma: Fix error path in ioat3_dma_probe()
  dmaengine: ioatdma: Fix leaking on version mismatch
  dmaengine: ti: k3-udma-glue: Fix of_k3_udma_glue_parse_chn_by_id()
  dmaengine: idxd: Fix possible Use-After-Free in irq_process_work_list
  dmaengine: xilinx: xdma: Fix data synchronisation in xdma_channel_isr()

Merge tag 'phy-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy fixes from Vinod Koul:

- Qualcomm QMP driver fixes for missing register offsets and correct N4
   offsets for registers

* tag 'phy-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: qcom: qmp-combo: Switch from V6 to V6 N4 register offsets
  phy: qcom-qmp: pcs: Add missing v6 N4 register offsets
  phy: qcom-qmp: qserdes-txrx: Add missing registers offsets

Merge tag 'soundwire-6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

Pull soundwire fix from Vinod Koul:

- Single fix for calling fwnode_handle_put() on the
returned fwnode pointer

* tag 'soundwire-6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: fix usages of device_get_named_child_node()

Merge tag 'kvm-riscv-fixes-6.10-2' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv fixes for 6.10, take #2

- Fix compilation for KVM selftests

pwm: stm32: Refuse too small period requests

If period_ns is small, prd might well become 0. Catch that case because
otherwise with

regmap_write(priv->regmap, TIM_ARR, prd - 1);

a few lines down quite a big period is configured.

Fixes: 7edf7369205b ("pwm: Add driver for STM32 plaftorm")
Cc: stable@vger.kernel.org
Reviewed-by: Trevor Gamblin <tgamblin@baylibre.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/b86f62f099983646f97eeb6bfc0117bb2d0c340d.1718979150.git.u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>

bcachefs: Move the ei_flags setting to after initialization

`inode->ei_flags` setting and cleaning should be done after initialization,
otherwise the operation is invalid.

Fixes: 9ca4853b98af ("bcachefs: Fix quota support for snapshots")
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a UAF after write_super()

write_super() may reallocate the superblock buffer - but
bch_sb_field_ext was referencing it; don't use it after the write_super
call.

Reported-by: syzbot+8992fc10a192067b8d8a@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use bch2_print_string_as_lines for long err

printk strings get truncated to 1024 bytes; if we have a long error
message (journal debug info) we need to use a helper.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix I_NEW warning in race path in bch2_inode_insert()

discard_new_inode() is the correct interface for tearing down an indoe
that was fully created but not made visible to other threads, but it
expects I_NEW to be set, which we don't use.

Reported-by: https://github.com/koverstreet/bcachefs/issues/690
Fixes: bcachefs: Fix race path in bch2_inode_insert()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Replace bare EEXIST with private error codes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix missing alloc_data_type_set()

Incorrect bucket state transition in the discard path; when incrementing
a bucket's generation number that had already been discarded, we were
forgetting to check if it should be need_gc_gens, not free.

This was caught by the .invalid checks in the transaction commit path,
causing us to go emergency read only.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

closures: Change BUG_ON() to WARN_ON()

If a BUG_ON() can be hit in the wild, it shouldn't be a BUG_ON()

For reference, this has popped up once in the CI, and we'll need more
info to debug it:

03240 ------------[ cut here ]------------
03240 kernel BUG at lib/closure.c:21!
03240 kernel BUG at lib/closure.c:21!
03240 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
03240 Modules linked in:
03240 CPU: 15 PID: 40534 Comm: kworker/u80:1 Not tainted 6.10.0-rc4-ktest-ga56da69799bd #25570
03240 Hardware name: linux,dummy-virt (DT)
03240 Workqueue: btree_update btree_interior_update_work
03240 pstate: 00001005 (nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
03240 pc : closure_put+0x224/0x2a0
03240 lr : closure_put+0x24/0x2a0
03240 sp : ffff0000d12071c0
03240 x29: ffff0000d12071c0 x28: dfff800000000000 x27: ffff0000d1207360
03240 x26: 0000000000000040 x25: 0000000000000040 x24: 0000000000000040
03240 x23: ffff0000c1f20180 x22: 0000000000000000 x21: ffff0000c1f20168
03240 x20: 0000000040000000 x19: ffff0000c1f20140 x18: 0000000000000001
03240 x17: 0000000000003aa0 x16: 0000000000003ad0 x15: 1fffe0001c326974
03240 x14: 0000000000000a1e x13: 0000000000000000 x12: 1fffe000183e402d
03240 x11: ffff6000183e402d x10: dfff800000000000 x9 : ffff6000183e402e
03240 x8 : 0000000000000001 x7 : 00009fffe7c1bfd3 x6 : ffff0000c1f2016b
03240 x5 : ffff0000c1f20168 x4 : ffff6000183e402e x3 : ffff800081391954
03240 x2 : 0000000000000001 x1 : 0000000000000000 x0 : 00000000a8000000
03240 Call trace:
03240  closure_put+0x224/0x2a0
03240  bch2_check_for_deadlock+0x910/0x1028
03240  bch2_six_check_for_deadlock+0x1c/0x30
03240  six_lock_slowpath.isra.0+0x29c/0xed0
03240  six_lock_ip_waiter+0xa8/0xf8
03240  __bch2_btree_node_lock_write+0x14c/0x298
03240  bch2_trans_lock_write+0x6d4/0xb10
03240  __bch2_trans_commit+0x135c/0x5520
03240  btree_interior_update_work+0x1248/0x1c10
03240  process_scheduled_works+0x53c/0xd90
03240  worker_thread+0x370/0x8c8
03240  kthread+0x258/0x2e8
03240  ret_from_fork+0x10/0x20
03240 Code: aa1303e0 d63f0020 a94363f7 17ffff8c (d4210000)
03240 ---[ end trace 0000000000000000 ]---
03240 Kernel panic - not syncing: Oops - BUG: Fatal exception
03240 SMP: stopping secondary CPUs
03241 SMP: failed to stop secondary CPUs 13,15
03241 Kernel Offset: disabled
03241 CPU features: 0x00,00000003,80000008,4240500b
03241 Memory Limit: none
03241 ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
03246 ========= FAILED TIMEOUT copygc_torture_no_checksum in 7200s

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

RDMA/mana_ib: Ignore optional access flags for MRs

Ignore optional ib_access_flags when an MR is created.

Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://lore.kernel.org/r/1717575368-14879-1-git-send-email-kotaranov@linux.microsoft.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mlx5: Add check for srq max_sge attribute

max_sge attribute is passed by the user, and is inserted and used
unchecked, so verify that the value doesn't exceed maximum allowed value
before using it.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/277ccc29e8d57bfd53ddeb2ac633f2760cf8cdd0.1716900410.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mlx5: Fix unwind flow as part of mlx5_ib_stage_init_init

Fix unwind flow as part of mlx5_ib_stage_init_init to use the correct
goto upon an error.

Fixes: 758ce14aee82 ("RDMA/mlx5: Implement MACsec gid addition and deletion")
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/aa40615116eda14ec9eca21d52017d632ea89188.1716900410.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mlx5: Ensure created mkeys always have a populated rb_key

cachable and mmkey.rb_key together are used by mlx5_revoke_mr() to put the
MR/mkey back into the cache. In all cases they should be set correctly.

alloc_cacheable_mr() was setting cachable but not filling rb_key,
resulting in cache_ent_find_and_store() bucketing them all into a 0 length
entry.

implicit_get_child_mr()/mlx5_ib_alloc_implicit_mr() failed to set cachable
or rb_key at all, so the cache was not working at all for implicit ODP.

Cc: stable@vger.kernel.org
Fixes: 8c1185fef68c ("RDMA/mlx5: Change check for cacheable mkeys")
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7778c02dfa0999a30d6746c79a23dd7140a9c729.1716900410.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mlx5: Follow rb_key.ats when creating new mkeys

When a cache ent already exists but doesn't have any mkeys in it the cache
will automatically create a new one based on the specification in the
ent->rb_key.

ent->ats was missed when creating the new key and so ma_translation_mode
was not being set even though the ent requires it.

Cc: stable@vger.kernel.org
Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/7c5613458ecb89fbe5606b7aa4c8d990bdea5b9a.1716900410.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mlx5: Remove extra unlock on error path

The below commit lifted the locking out of this function but left this
error path unlock behind resulting in unbalanced locking. Remove the
missed unlock too.

Cc: stable@vger.kernel.org
Fixes: 627122280c87 ("RDMA/mlx5: Add work to remove temporary entries from the cache")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/78090c210c750f47219b95248f9f782f34548bb1.1716900410.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

Merge tag 'kvm-x86-fixes-6.10-rcN' of https://github.com/kvm-x86/linux into HEAD

KVM fixes for 6.10

- Fix a "shift too big" goof in the KVM_SEV_INIT2 selftest.

- Compute the max mappable gfn for KVM selftests on x86 using GuestMaxPhyAddr
from KVM's supported CPUID (if it's available).

- Fix a race in kvm_vcpu_on_spin() by ensuring loads and stores are atomic.

- Fix technically benign bug in __kvm_handle_hva_range() where KVM consumes
the return from a void-returning function as if it were a boolean.

i2c: Add nop fwnode operations

Add nop variants of i2c_find_device_by_fwnode(),
i2c_find_adapter_by_fwnode() and i2c_get_adapter_by_fwnode() for use
without CONFIG_I2C.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

KVM: SEV-ES: Fix svm_get_msr()/svm_set_msr() for KVM_SEV_ES_INIT guests

With commit 27bd5fdc24c0 ("KVM: SEV-ES: Prevent MSR access post VMSA
encryption"), older VMMs like QEMU 9.0 and older will fail when booting
SEV-ES guests with something like the following error:

qemu-system-x86_64: error: failed to get MSR 0x174
qemu-system-x86_64: ../qemu.git/target/i386/kvm/kvm.c:3950: kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

This is because older VMMs that might still call
svm_get_msr()/svm_set_msr() for SEV-ES guests after guest boot even if
those interfaces were essentially just noops because of the vCPU state
being encrypted and stored separately in the VMSA. Now those VMMs will
get an -EINVAL and generally crash.

Newer VMMs that are aware of KVM_SEV_INIT2 however are already aware of
the stricter limitations of what vCPU state can be sync'd during
guest run-time, so newer QEMU for instance will work both for legacy
KVM_SEV_ES_INIT interface as well as KVM_SEV_INIT2.

So when using KVM_SEV_INIT2 it's okay to assume userspace can deal with
-EINVAL, whereas for legacy KVM_SEV_ES_INIT the kernel might be dealing
with either an older VMM and so it needs to assume that returning
-EINVAL might break the VMM.

Address this by only returning -EINVAL if the guest was started with
KVM_SEV_INIT2. Otherwise, just silently return.

Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Reported-by: Srikanth Aithal <sraithal@amd.com>
Closes: https://lore.kernel.org/lkml/37usuu4yu4ok7be2hqexhmcyopluuiqj3k266z4gajc2rcj4yo@eujb23qc3zcm/
Fixes: 27bd5fdc24c0 ("KVM: SEV-ES: Prevent MSR access post VMSA encryption")
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240604233510.764949-1-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'acpi-scan'

Merge ACPI device enumeration fixes for 6.10-rc5:

- Ignore MIPI camera graph port nodes created with the help of the
   information from the ACPI tables on all Dell Tiger, Alder and Raptor
   Lake models as that information is reported to be invalid on the
   systems in question (Hans de Goede).

- Use new Intel CPU model matching macros in the MIPI DisCo for Imaging
   part of ACPI device enumeration (Hans de Goede).

* acpi-scan:
  ACPI: mipi-disco-img: Switch to new Intel CPU model defines
  ACPI: scan: Ignore camera graph port nodes on all Dell Tiger, Alder and Raptor Lake models

ALSA: hda: Use imply for suggesting CONFIG_SERIAL_MULTI_INSTANTIATE

The recent fix introduced a reverse selection of
CONFIG_SERIAL_MULTI_INSTANTIATE, but its condition isn't always met.
Use a weak reverse selection to suggest the config for avoiding such
inconsistencies, instead.

Fixes: 9b1effff19cd ("ALSA: hda: cs35l56: Select SERIAL_MULTI_INSTANTIATE")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202406210732.ozgk8IMK-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202406211244.oLhoF3My-lkp@intel.com/
Reviewed-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20240621073915.19576-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

mips: fix compat_sys_lseek syscall

This is almost compatible, but passing a negative offset should result
in a EINVAL error, but on mips o32 compat mode would seek to a large
32-bit byte offset.

Use compat_sys_lseek() to correctly sign-extend the argument.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

MIPS: mipsmtregs: Fix target register for MFTC0

Target register of mftc0 should be __res instead of $1, this is
a leftover from old .insn code.

Fixes: dd6d29a61489 ("MIPS: Implement microMIPS MT ASE helpers")
Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>