The panic was caused by UAF issue w/ below race condition:
kworker
- writepages
- f2fs_write_cache_pages
- f2fs_write_single_data_page
- f2fs_do_write_data_page
- f2fs_inplace_write_data
- f2fs_merge_page_bio
- add_inu_page
: cache page #1 into bio & cache bio in
io->bio_list
- f2fs_write_single_data_page
- f2fs_do_write_data_page
- f2fs_inplace_write_data
- f2fs_merge_page_bio
- add_inu_page
: cache page #2 into bio which is linked
in io->bio_list
write
- f2fs_write_begin
: write page #1
- f2fs_folio_wait_writeback
- f2fs_submit_merged_ipu_write
- f2fs_submit_write_bio
: submit bio which inclues page #1 and #2
software IRQ
- f2fs_write_end_io
- fscrypt_free_bounce_page
: freed bounced page which belongs to page #2
- inc_page_count( , WB_DATA_TYPE(data_folio), false)
: data_folio points to fio->encrypted_page
the bounced page can be freed before
accessing it in f2fs_is_cp_guarantee()
It can reproduce w/ below testcase:
Run below script in shell #1:
for ((i=1;i>0;i++)) do xfs_io -f /mnt/f2fs/enc/file \
-c "pwrite 0 32k" -c "fdatasync"
Run below script in shell #2:
for ((i=1;i>0;i++)) do xfs_io -f /mnt/f2fs/enc/file \
-c "pwrite 0 32k" -c "fdatasync"
So, in f2fs_merge_page_bio(), let's avoid using fio->encrypted_page after
commit page into internal ipu cache.
Fixes: 0b20fcec8651 ("f2fs: cache global IPU bio") Reported-by: JY <JY.Ho@mediatek.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunji Kang [Wed, 24 Sep 2025 07:43:58 +0000 (16:43 +0900)]
f2fs: readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode
In f2fs_precache_extents(), For large files, It requires reading many
node blocks. Instead of reading each node block with synchronous I/O,
this patch applies readahead so that node blocks can be fetched in
advance.
It reduces the overhead of repeated sync reads and improves efficiency
when precaching extents of large files.
I created a file with the same largest extent and executed the test.
For this experiment, I set the file's largest extent with an offset of 0
and a size of 1GB. I configured the remaining area with 100MB extents.
When we get wrong extent info data, and look up extent_node in rb tree,
it will cause infinite loop (CONFIG_F2FS_CHECK_FS=n). Avoiding this by
return NULL and print some kernel messages in that case.
When the data layout is something like this:
dnode1: dnode2:
[0] A [0] NEW_ADDR
[1] A+1 [1] 0x0
...
[1016] A+1016
[1017] B (B!=A+1017) [1017] 0x0
During precache_extents, we map the last block(valid blkaddr) in dnode1:
map->m_flags |= F2FS_MAP_MAPPED;
map->m_pblk = blkaddr(valid blkaddr);
map->m_len = 1;
then we goto next_dnode, meet the first block in dnode2(hole), goto sync_out:
map->m_flags & F2FS_MAP_MAPPED == true, and we make zero-sized extent:
Rebased on patch[1], this patch can cover these cases to avoid zero-sized extent:
A,B,C is valid blkaddr
case1:
dnode1: dnode2:
[0] A [0] NEW_ADDR
[1] A+1 [1] 0x0
... ....
[1016] A+1016
[1017] B (B!=A+1017) [1017] 0x0
case2:
dnode1: dnode2:
[0] A [0] C (C!=B+1)
[1] A+1 [1] C+1
... ....
[1016] A+1016
[1017] B (B!=A+1017) [1017] 0x0
case3:
dnode1: dnode2:
[0] A [0] C (C!=B+2)
[1] A+1 [1] C+1
... ....
[1015] A+1015
[1016] B (B!=A+1016)
[1017] B+1 [1017] 0x0
f2fs: fix to mitigate overhead of f2fs_zero_post_eof_page()
f2fs_zero_post_eof_page() may cuase more overhead due to invalidate_lock
and page lookup, change as below to mitigate its overhead:
- check new_size before grabbing invalidate_lock
- lookup and invalidate pages only in range of [old_size, new_size]
Fixes: ba8dac350faf ("f2fs: fix to zero post-eof page") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The root cause is: fallocate on pinning file may race w/ block allocation
as above, result in do_garbage_collect() from fallocate() may migrate
segment which is just allocated by a log, the log will update segment type
in its in-memory structure, however GC will get segment type from on-disk
SSA block, once segment type changes by log, we can detect such
inconsistency, then shutdown filesystem.
In this case, on-disk SSA shows type of segno #173822 is 1 (SUM_TYPE_NODE),
however segno #173822 was just allocated as data type segment, so in-memory
SIT shows type of segno #173822 is 0 (SUM_TYPE_DATA).
Change as below to fix this issue:
- check whether current section is empty before gc
- add sanity checks on do_garbage_collect() to avoid any race case, result
in migrating segment used by log.
- btw, it fixes misc issue in printed logs: "SSA and SIT" -> "SIT and SSA".
Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During f2fs_evict_inode(), clear_inode() detects that we missed to truncate
all page cache before destorying inode, that is because in below path, we
will create page #0 in cache, but missed to drop it in error path, let's fix
it.
- evict
- f2fs_evict_inode
- f2fs_truncate
- f2fs_convert_inline_inode
- f2fs_grab_cache_folio
: create page #0 in cache
- f2fs_convert_inline_folio
: sanity check failed, return -EFSCORRUPTED
- clear_inode detects that inode->i_data.nrpages is not zero
Tracepoint output:
f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 0, len = 512, blkaddr = 1055744, c_len = 0
f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 513, len = 351, blkaddr = 17921, c_len = 0
f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 864, len = 160, blkaddr = 18272, c_len = 0
During precache_extents, there is off-by-one issue, we should update
map->m_next_extent to pgofs rather than pgofs + 1, if last blkaddr is
valid and not contiguous to previous extent.
Fixes: c4020b2da4c9 ("f2fs: support F2FS_IOC_PRECACHE_EXTENTS") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs: clean up error handing of f2fs_submit_page_read()
Below two functions should never fail, clean up error handling in their
callers:
1) f2fs_grab_read_bio() in f2fs_submit_page_read()
2) bio_add_folio() in f2fs_submit_page_read()
Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Liao Yuanhong [Thu, 28 Aug 2025 08:11:30 +0000 (16:11 +0800)]
f2fs: Use allocate_section_policy to control write priority in multi-devices setups
Introduces two new sys nodes: allocate_section_hint and
allocate_section_policy. The allocate_section_hint identifies the boundary
between devices, measured in sections; it defaults to the end of the device
for single storage setups, and the end of the first device for multiple
storage setups. The allocate_section_policy determines the write strategy,
with a default value of 0 for normal sequential write strategy. A value of
1 prioritizes writes before the allocate_section_hint, while a value of 2
prioritizes writes after it.
This strategy addresses the issue where, despite F2FS supporting multiple
devices, SOC vendors lack multi-devices support (currently only supporting
zoned devices). As a workaround, multiple storage devices are mapped to a
single dm device. Both this workaround and the F2FS multi-devices solution
may require prioritizing writing to certain devices, such as a device with
better performance or when switching is needed due to performance
degradation near a device's end. For scenarios with more than two devices,
sort them at mount time to utilize this feature.
When using this feature with a single storage device, it has almost no
impact. However, for configurations where multiple storage devices are
mapped to the same dm device using F2FS, utilizing this feature can provide
some optimization benefits. Therefore, I believe it should not be limited
to just multi-devices usage.
Bagas Sanjaya [Wed, 20 Aug 2025 04:34:32 +0000 (11:34 +0700)]
Documentation: f2fs: Reword title
"What is F2FS" is rather a mistitle for the whole f2fs docs, as it
implies the overview section (before "Background and design issues"
section) and the docs covers beyond that: from mount options to
filesystem implementation details.
w/ a fuzzed image, f2fs may encounter panic due to it detects inconsistent
truncation range in direct node in f2fs_truncate_hole().
The root cause is: a non-inode dnode may has the same footer.ino and
footer.nid, so the dnode will be parsed as an inode, then ADDRS_PER_PAGE()
may return wrong blkaddr count which may be 923 typically, by chance,
dn.ofs_in_node is equal to 923, then count can be calculated to 0 in below
statement, later it will trigger panic w/ f2fs_bug_on(, count == 0 || ...).
This patch introduces a new node_type NODE_TYPE_NON_INODE, then allowing
passing the new_type to sanity_check_node_footer in f2fs_get_node_folio()
to detect corruption that a non-inode dnode has the same footer.ino and
footer.nid.
Jaegeuk Kim [Mon, 11 Aug 2025 23:50:25 +0000 (23:50 +0000)]
f2fs: show the list of donation files
This patch introduces a proc entry to show the currently enrolled donation
files.
- "File path" indicates a file.
- "Status"
a. "Donated" means the file is registed in the donation list by
fadvise(offset, length, POSIX_FADV_NOREUSE)
b. "Evicted" means the donated pages were reclaimed.
- "Offset (kb)" and "Length (kb) show the registered donation range.
- "Cached pages (kb)" shows the amount of cached pages in the inode page cache.
Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Cc: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The direct reason is f2fs_check_quota_consistency() may suffer null-ptr-deref
issue in strcmp().
The bug can be reproduced w/ below scripts:
mkfs.f2fs -f /dev/vdb
mount -t f2fs -o usrquota /dev/vdb /mnt/f2fs
quotacheck -uc /mnt/f2fs/
umount /mnt/f2fs
mount -t f2fs -o usrjquota=aquota.user,jqfmt=vfsold /dev/vdb /mnt/f2fs
mount -t f2fs -o remount,usrjquota=,jqfmt=vfsold /dev/vdb /mnt/f2fs
umount /mnt/f2fs
So, before old_qname and new_qname comparison, we need to check whether
they are all valid pointers, fix it.
Reported-by: syzbot+d371efea57d5aeab877b@syzkaller.appspotmail.com Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Closes: https://lore.kernel.org/linux-f2fs-devel/689ff889.050a0220.e29e5.0037.GAE@google.com Cc: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Liao Yuanhong [Fri, 8 Aug 2025 09:48:01 +0000 (17:48 +0800)]
f2fs: Add bggc_io_aware to adjust the priority of BG_GC when issuing IO
Currently, we have encountered some issues while testing ZUFS. In
situations near the storage limit (e.g., 50GB remaining), and after
simulating fragmentation by repeatedly writing and deleting data, we found
that application installation and startup tests conducted after idling for
a few minutes take significantly longer several times that of traditional
UFS. Tracing the operations revealed that the majority of I/Os were issued
by background GC, which blocks normal I/O operations.
Under normal circumstances, ZUFS indeed requires more background GC and
employs a more aggressive GC strategy. However, I aim to find a way to
minimize the impact on regular I/O operations under these near-limit
conditions. To address this, I have introduced a bggc_io_aware feature,
which controls the prioritization of background GC in the presence of I/Os.
This switch can be adjusted at the framework level to implement different
strategies. If set to AWARE_ALL_IO, all background GC operations will be
skipped during active I/O issuance. The default option remains consistent
with the current strategy, ensuring no change in behavior.
Chao Yu [Thu, 7 Aug 2025 04:00:25 +0000 (12:00 +0800)]
f2fs: add timeout in f2fs_enable_checkpoint()
During f2fs_enable_checkpoint() in remount(), if we flush a large
amount of dirty pages into slow device, it may take long time which
will block write IO, let's add a timeout machanism during dirty
pages flush to avoid long time block in f2fs_enable_checkpoint().
Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 7 Aug 2025 02:44:31 +0000 (10:44 +0800)]
f2fs: fix to detect potential corrupted nid in free_nid_list
As reported, on-disk footer.ino and footer.nid is the same and
out-of-range, let's add sanity check on f2fs_alloc_nid() to detect
any potential corruption in free_nid_list.
Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 7 Aug 2025 01:48:35 +0000 (09:48 +0800)]
f2fs: fix to clear unusable_cap for checkpoint=enable
mount -t f2fs -o checkpoint=disable:10% /dev/vdb /mnt/f2fs/
mount -t f2fs -o remount,checkpoint=enable /dev/vdb /mnt/f2fs/
kernel log:
F2FS-fs (vdb): Adjust unusable cap for checkpoint=disable = 204440 / 10%
If we has assigned checkpoint=enable mount option, unusable_cap{,_perc}
parameters of checkpoint=disable should be reset, then calculation and
log print could be avoid in adjust_unusable_cap_perc().
Chao Yu [Wed, 6 Aug 2025 06:11:06 +0000 (14:11 +0800)]
f2fs: fix to zero data after EOF for compressed file correctly
generic/091 may fail, then it bisects to the bad commit ba8dac350faf
("f2fs: fix to zero post-eof page").
What will cause generic/091 to fail is something like below Testcase #1:
1. write 16k as compressed blocks
2. truncate to 12k
3. truncate to 20k
4. verify data in range of [12k, 16k], however data is not zero as
expected
Analisys:
in step 2), we will redirty all data pages from #0 to #3 in compressed
cluster, and zero page #3,
in step 3), f2fs_setattr() will call f2fs_zero_post_eof_page() to drop
all page cache post eof, includeing dirtied page #3,
in step 4) when we read data from page #3, it will decompressed cluster
and extra random data to page #3, finally, we hit the non-zeroed data
post eof.
However, the commit ba8dac350faf ("f2fs: fix to zero post-eof page") just
let the issue be reproduced easily, w/o the commit, it can reproduce this
bug w/ below Testcase #2:
1. write 16k as compressed blocks
2. truncate to 8k
3. truncate to 12k
4. truncate to 20k
5. verify data in range of [12k, 16k], however data is not zero as
expected
Anlysis:
in step 2), we will redirty all data pages from #0 to #3 in compressed
cluster, and zero page #2 and #3,
in step 3), we will truncate page #3 in page cache,
in step 4), expand file size,
in step 5), hit random data post eof w/ the same reason in Testcase #1.
Root Cause:
In f2fs_truncate_partial_cluster(), after we truncate partial data block
on compressed cluster, all pages in cluster including the one post eof
will be dirtied, after another tuncation, dirty page post eof will be
dropped, however on-disk compressed cluster is still valid, it may
include non-zero data post eof, result in exposing previous non-zero data
post eof while reading.
Fix:
In f2fs_truncate_partial_cluster(), let change as below to fix:
- call filemap_write_and_wait_range() to flush dirty page
- call truncate_pagecache() to drop pages or zero partial page post eof
- call f2fs_do_truncate_blocks() to truncate non-compress cluster to
last valid block
Fixes: 3265d3db1f16 ("f2fs: support partial truncation on compressed inode") Reported-by: Jan Prusakowski <jprusakowski@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Daniel Lee [Tue, 5 Aug 2025 06:52:27 +0000 (23:52 -0700)]
f2fs: add sysfs entry for effective lookup mode
This commit introduces a new read-only sysfs entry at
/sys/fs/f2fs/<device>/effective_lookup_mode.
This entry displays the actual directory lookup mode F2FS is
currently using. This is needed for debugging and verification,
as the behavior is determined by both on-disk flags and mount
options.
Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Daniel Lee [Tue, 5 Aug 2025 06:52:26 +0000 (23:52 -0700)]
f2fs: add lookup_mode mount option
For casefolded directories, f2fs may fall back to a linear search if
a hash-based lookup fails. This can cause severe performance
regressions.
While this behavior can be controlled by userspace tools (e.g. mkfs,
fsck) by setting an on-disk flag, a kernel-level solution is needed
to guarantee the lookup behavior regardless of the on-disk state.
This commit introduces the 'lookup_mode' mount option to provide this
kernel-side control.
The option accepts three values:
- perf: (Default) Enforces a hash-only lookup. The linear fallback
is always disabled.
- compat: Enables the linear search fallback for compatibility with
directory entries from older kernels.
- auto: Determines the mode based on the on-disk flag, preserving the
userspace-based behavior.
Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
checkpoint was blocked for 18643 ms
Step 0: 0 ms
Step 1: 38 ms
Step 2: 63 ms
Step 3: 4 ms
Step 4: 0 ms
Step 5: 0 ms
Step 6: 9 ms
Step 7: 0 ms
Step 8: 18277 ms
Step 9: 249 ms
Cc: Jan Prusakowski <jprusakowski@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Sun, 10 Aug 2025 05:51:37 +0000 (08:51 +0300)]
Merge tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp fixes from Borislav Petkov:
- Remove an obsolete comment and fix spelling
* tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Remove obsolete comment from takedown_cpu()
smp: Fix spelling in on_each_cpu_cond_mask()'s doc-comment
Linus Torvalds [Sun, 10 Aug 2025 05:46:47 +0000 (08:46 +0300)]
Merge tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix a wrong ioremap size in mvebu-gicp
- Remove yet another compile-test case for a driver which needs an
additional dependency
- Fix a lock inversion scenario in the IRQ unit test suite
- Remove an impossible flag situation in gic-v5
- Do not iounmap resources in gic-v5 which are managed by devm
- Make sure stale, left-over interrupts in mvebu-gicp are cleared on
driver init
- Fix a reference counting mishap in msi-lib
- Fix a dereference-before-null-ptr-check case in the riscv-imsic
irqchip driver
* tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mvebu-gicp: Use resource_size() for ioremap()
irqchip: Build IMX_MU_MSI only on ARM
genirq/test: Resolve irq lock inversion warnings
irqchip/gic-v5: Remove IRQD_RESEND_WHEN_IN_PROGRESS for ITS IRQs
irqchip/gic-v5: iwb: Fix iounmap probe failure path
irqchip/mvebu-gicp: Clear pending interrupts on init
irqchip/msi-lib: Fix fwnode refcount in msi_lib_irq_domain_select()
irqchip/riscv-imsic: Don't dereference before NULL pointer check
Linus Torvalds [Sun, 10 Aug 2025 05:15:32 +0000 (08:15 +0300)]
Merge tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Fix an interrupt vector setup race which leads to a non-functioning
device
- Add new Intel CPU models *and* a family: 0x12. Finally. Yippie! :-)
* tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Plug vector setup race
x86/cpu: Add new Intel CPU model numbers for Wildcatlake and Novalake
Len Brown [Sun, 10 Aug 2025 01:08:26 +0000 (21:08 -0400)]
tools/power turbostat: version 2025.09.09
Probe and display L3 Cache topology
Add ability to average an added counter
(useful for pre-integrated "counters", such as Watts)
Break the limit of 64 built-in counters.
Assorted bug fixes and minor feature tweaks
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/
may be readable by all, but
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz
may be readable only by root.
Non-root turbostat users see complaints in this scenario.
Fail probe of the interface if we can't read current_freq_khz.
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Original-patch-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Wed, 11 Jun 2025 06:50:26 +0000 (14:50 +0800)]
tools/power turbostat: Fix DMR support
Together with the RAPL MSRs, there are more MSRs gone on DMR, including
PLR (Perf Limit Reasons), and IRTL (Package cstate Interrupt Response
Time Limit) MSRs. The configurable TDP info should also be retrieved
from TPMI based Intel Speed Select Technology feature.
Remove the access of these MSRs for DMR. Improve the DMR platform
feature table to make it more readable at the same time.
Fixes: 83075bd59de2 ("tools/power turbostat: Add initial support for DMR") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Michael Hebenstreit [Fri, 8 Aug 2025 19:57:53 +0000 (15:57 -0400)]
tools/power turbostat: add format "average" for external attributes
External atributes with format "raw" are not printed in summary lines
for nodes/packages (or with option -S). The new format "average"
behaves like "raw" but also adds the summary data
Signed-off-by: Michael Hebenstreit <michael.hebenstreit@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 12 Jul 2025 20:16:56 +0000 (16:16 -0400)]
tools/power turbostat: Support more than 64 built-in-counters
We have out-grown the ability to use a 64-bit memory location
to inventory every possible built-in counter.
Leverage the the CPU_SET(3) macros to break this barrier.
Also, break the Joules & Watts counters into two,
since we can no longer 'or' them together...
Linus Torvalds [Sat, 9 Aug 2025 15:12:23 +0000 (18:12 +0300)]
Merge tag 'tty-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull TTY fix from Greg KH:
"Here is a single revert of one of the previous patches that went in
the last tty/serial merge that is breaking userspace on some platforms
(specifically powerpc, probably a few others.)
It accidentially changed the ioctl values of some tty ioctls, which
breaks xorg.
The revert has been in linux-next all this week with no reported
issues"
* tag 'tty-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: vt: use _IO() to define ioctl numbers"
Linus Torvalds [Sat, 9 Aug 2025 05:47:28 +0000 (08:47 +0300)]
Merge tag 'block-6.17-20250808' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
- MD pull request via Yu:
- mddev null-ptr-dereference fix, by Erkun
- md-cluster fail to remove the faulty disk regression fix, by
Heming
- minor cleanup, by Li Nan and Jinchao
- mdadm lifetime regression fix reported by syzkaller, by Yu Kuai
- MD pull request via Christoph
- add support for getting the FDP featuee in fabrics passthru path
(Nitesh Shetty)
- add capability to connect to an administrative controller
(Kamaljit Singh)
- fix a leak on sgl setup error (Keith Busch)
- initialize discovery subsys after debugfs is initialized
(Mohamed Khalfella)
- fix various comment typos (Bjorn Helgaas)
- remove unneeded semicolons (Jiapeng Chong)
- nvmet debugfs ordering issue fix
- Fix UAF in the tag_set in zloop
- Ensure sbitmap shallow depth covers entire set
- Reduce lock roundtrips in io context lookup
- Move scheduler tags alloc/free out of elevator and freeze lock, to
fix some lockdep found issues
- Improve robustness of queue limits checking
- Fix a regression with IO priorities, if no io context exists
* tag 'block-6.17-20250808' of git://git.kernel.dk/linux: (26 commits)
lib/sbitmap: make sbitmap_get_shallow() internal
lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
nvmet: exit debugfs after discovery subsystem exits
block, bfq: Reorder struct bfq_iocq_bfqq_data
md: make rdev_addable usable for rcu mode
md/raid1: remove struct pool_info and related code
md/raid1: change r1conf->r1bio_pool to a pointer type
block: ensure discard_granularity is zero when discard is not supported
zloop: fix KASAN use-after-free of tag set
block: Fix default IO priority if there is no IO context
nvme: fix various comment typos
nvme-auth: remove unneeded semicolon
nvme-pci: fix leak on sgl setup error
nvmet: initialize discovery subsys after debugfs is initialized
nvme: add capability to connect to an administrative controller
nvmet: add support for FDP in fabrics passthru path
md: rename recovery_cp to resync_offset
md/md-cluster: handle REMOVE message earlier
md: fix create on open mddev lifetime regression
block: fix potential deadlock while running nr_hw_queue update
...
Linus Torvalds [Sat, 9 Aug 2025 05:45:08 +0000 (08:45 +0300)]
Merge tag 'io_uring-6.17-20250808' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Allow vectorized payloads for send/send-zc - like sendmsg, but
without the hassle of a msghdr.
- Fix for an integer wrap that should go to stable, spotted by syzbot.
Nothing alarming here, as you need to be root to hit this.
Nevertheless, it should get fixed.
FWIW, kudos to the syzbot crew for having much nicer reproducers now,
and with nicely annotated source code as well. This is particularly
useful as syzbot uses the raw interface rather than liburing,
historically it's been difficult to turn a syzbot reproducer into a
meaningful test case. With the recent changes, not true anymore!
* tag 'io_uring-6.17-20250808' of git://git.kernel.dk/linux:
io_uring/memmap: cast nr_pages to size_t before shifting
io_uring/net: Allow to do vectorized send
Linus Torvalds [Sat, 9 Aug 2025 05:43:24 +0000 (08:43 +0300)]
Merge tag 'spi-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's one fix here for an issue with the CS42L43 where we were
allocating a single property for client devices as just that property
rather than a terminated array of properties like we are supposed to.
We also have an update to the MAINTAINERS file for some Renesas
devices"
* tag 'spi-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: cs42l43: Property entry should be a null-terminated array
MAINTAINERS: Add entries for the RZ/V2H(P) RSPI
Linus Torvalds [Sat, 9 Aug 2025 05:41:53 +0000 (08:41 +0300)]
Merge tag 'regulator-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"This fixes an issue with the newly added code for handling large
voltage changes on regulators which require that individual voltage
changes cover a limited range, the check for convergence was broken"
* tag 'regulator-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: correct convergence check in regulator_set_voltage()
Linus Torvalds [Sat, 9 Aug 2025 05:40:28 +0000 (08:40 +0300)]
Merge tag 'regmap-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"These patches fix a lockdep issue Russell King reported with nested
regmap-irqs (unusual since regmap is generally for devices on slow
buses so devices don't get nested), plus add a missing mutex free
which I noticed while implementing a fix for that issue"
* tag 'regmap-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: irq: Avoid lockdep warnings with nested regmap-irq chips
regmap: irq: Free the regmap-irq mutex
Linus Torvalds [Sat, 9 Aug 2025 05:37:17 +0000 (08:37 +0300)]
Merge tag 'mailbox-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
Pull mailbox updates from Jassi Brar:
- aspeed: add driver and bindings for ast2700
- broadcom: add driver and bindings for bcm74110
- mediatek: fix RPM api usage
- qcom: use dev_fwnode
- pcc: support shared buffer
- misc dt-bindings cleanup
* tag 'mailbox-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
mailbox/pcc: support mailbox management of the shared buffer
mailbox: bcm74110: Fix spelling mistake
mailbox: bcm74110: remove unneeded semicolon
mailbox: aspeed: add mailbox driver for AST27XX series SoC
dt-bindings: mailbox: Add ASPEED AST2700 series SoC
dt-bindings: mailbox: Drop consumers example DTS
dt-bindings: mailbox: nvidia,tegra186-hsp: Use generic node name
dt-bindings: mailbox: Correct example indentation
dt-bindings: mailbox: ti,secure-proxy: Add missing reg maxItems
dt-bindings: mailbox: amlogic,meson-gxbb-mhu: Add missing interrupts maxItems
dt-bindings: mailbox: qcom-ipcc: document the Milos Inter-Processor Communication Controller
mailbox: Add support for bcm74110
dt-bindings: mailbox: Add support for bcm74110
mailbox: Use dev_fwnode()
mailbox: mtk-cmdq: Switch to pm_runtime_put_autosuspend()
Linus Torvalds [Sat, 9 Aug 2025 05:15:43 +0000 (08:15 +0300)]
Merge tag 'gpio-updates-for-v6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"As discussed: there's a small commit that removes the legacy GPIO line
value setter callbacks as they're no longer used and a big, treewide
commit that renames the new ones to the old names across all GPIO
drivers at once.
While at it: there are also two fixes that I picked up over the course
of the merge window:
- remove unused, legacy GPIO line value setters from struct gpio_chip
- rename the new set callbacks back to the original names treewide
- fix interrupt handling in gpio-mlxbf2
- revert a buggy immutable irqchip conversion"
* tag 'gpio-updates-for-v6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
treewide: rename GPIO set callbacks back to their original names
gpio: remove legacy GPIO line value setter callbacks
gpio: mlxbf2: use platform_get_irq_optional()
Revert "gpio: pxa: Make irq_chip immutable"
Linus Torvalds [Sat, 9 Aug 2025 05:12:41 +0000 (08:12 +0300)]
Merge tag 'sound-fix-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
- Support for ASoC AMD ACP 7.2 with new IDs
- ASoC Intel AVS and SOF fixes
- Yet more kconfig adjustments for HD-audio codecs
- TAS2781 codec fixes
- Fixes for longstanding (rather minor) bugs in Intel LPE audio and
USB-audio drivers
* tag 'sound-fix-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/cirrus: Restrict prompt only for CONFIG_EXPERT
ALSA: hda/hdmi: Restrict prompt only for CONFIG_EXPERT
ALSA: hda/realtek: Restrict prompt only for CONFIG_EXPERT
ALSA: hda/ca0132: Fix missing error handling in ca0132_alt_select_out()
ASoC: SOF: Intel: hda-sdw-bpt: fix SND_SOF_SOF_HDA_SDW_BPT dependencies
ALSA: hda/tas2781: Support L"SmartAmpCalibrationData" to save calibrated data
ALSA: intel_hdmi: Fix off-by-one error in __hdmi_lpe_audio_probe()
ALSA: hda/realtek: add LG gram 16Z90R-A to alc269 fixup table
ALSA: usb-audio: Don't use printk_ratelimit for debug prints
ASoC: Intel: sof_sdw: Add quirk for Alienware Area 51 (2025) 0CCC SKU
ASoC: tas2781: Fix the wrong step for TLV on tas2781
ASoC: amd: acp: Add SoundWire SOF machine driver support for acp7.2 platform
ASoC: amd: acp: Add SoundWire legacy machine driver support for acp7.2 platform
ASoC: amd: ps: Add SoundWire pci and dma driver support for acp7.2 platform
ASoC: SOF: amd: Add sof audio support for acp7.2 platform
ASoC: Intel: avs: Fix uninitialized pointer error in probe()
ASoC: wm8962: Clear master mode when enter runtime suspend
ASoC: SOF: amd: acp-loader: Use GFP_KERNEL for DMA allocations in resume context
Linus Torvalds [Sat, 9 Aug 2025 04:58:55 +0000 (07:58 +0300)]
Merge tag 'soc-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"These are a few patches to fix up bits that went missing during the
merge window: The tegra and s3c patches address trivial regressions
from conflicts, the bcm7445 makes the dt conform to the binding that
was made stricter"
* tag 'soc-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: tegra: Remove numa-node-id properties
ARM: s3c/gpio: complete the conversion to new GPIO value setters
ARM: dts: broadcom: Fix bcm7445 memory controller compatible
Linus Torvalds [Sat, 9 Aug 2025 04:35:03 +0000 (07:35 +0300)]
Merge tag 'xtensa-20250808' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa update from Max Filippov:
- replace __ASSEMBLY__ with __ASSEMBLER__ in arch headers
* tag 'xtensa-20250808' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
xtensa: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
Linus Torvalds [Sat, 9 Aug 2025 04:26:19 +0000 (07:26 +0300)]
Merge tag 'v6.17-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression that broke hmac(sha3-224-s390)"
* tag 'v6.17-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: hash - Increase HASH_MAX_DESCSIZE for hmac(sha3-224-s390)
Linus Torvalds [Sat, 9 Aug 2025 04:20:44 +0000 (07:20 +0300)]
Merge tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- don't inherit NFS filesystem capabilities when crossing from one
filesystem to another
Bugfixes:
- NFS wakeup of __nfs_lookup_revalidate() needs memory barriers
- NFS improve bounds checking in nfs_fh_to_dentry()
- NFS Fix allocation errors when writing to a NFS file backed
loopback device
- NFSv4: More listxattr fixes
- SUNRPC: fix client handling of TLS alerts
- pNFS block/scsi layout fix for an uninitialised pointer
dereference
- pNFS block/scsi layout fixes for the extent encoding, stripe
mapping, and disk offset overflows
- pNFS layoutcommit work around for RPC size limitations
- pNFS/flexfiles avoid looping when handling fatal errors after
layoutget
- localio: fix various race conditions
Features and cleanups:
- Add NFSv4 support for retrieving the btime
- NFS: Allow folio migration for the case of mode == MIGRATE_SYNC
- NFS: Support using a kernel keyring to store TLS certificates
- NFSv4: Speed up delegation lookup using a hash table
- Assorted cleanups to remove unused variables and struct fields
- Assorted new tracepoints to improve debugging"
* tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
NFS/localio: nfs_close_local_fh() fix check for file closed
NFSv4: Remove duplicate lookups, capability probes and fsinfo calls
NFS: Fix the setting of capabilities when automounting a new filesystem
sunrpc: fix client side handling of tls alerts
nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()
NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY
NFSv4.2: another fix for listxattr
NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
SUNRPC: Silence warnings about parameters not being described
NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final()
NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate()
NFS: use a hash table for delegation lookup
NFS: track active delegations per-server
NFS: move the delegation_watermark module parameter
NFS: cleanup nfs_inode_reclaim_delegation
NFS: cleanup error handling in nfs4_server_common_setup
pNFS/flexfiles: don't attempt pnfs on fatal DS errors
NFS: drop __exit from nfs_exit_keyring
...
Linus Torvalds [Sat, 9 Aug 2025 04:12:43 +0000 (07:12 +0300)]
Merge tag 'v6.17rc-part2-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
"Non-smbdirect:
- Fix null ptr deref caused by delay in global spinlock
initialization
- Two fixes for native symlink creation with SMB3.1.1 POSIX
Extensions
- Fix for socket special file creation with SMB3.1.1 POSIX Exensions
- Reduce lock contention by splitting out mid_counter_lock
- move SMB1 transport code to separate file to reduce module size
when support for legacy servers is disabled
- Two cleanup patches: rename mid_lock to make it clearer what it
protects and one to convert mid flags to bool to make clearer
Smbdirect/RDMA restructuring and fixes:
- Fix for error handling in send done
- Remove unneeded empty packet queue
- Fix put_receive_buffer error path
- Two fixes to recv_done error paths
- Remove unused variable
- Improve response and recvmsg type handling
- Fix handling of incoming message type
- Two cleanup fixes for better handling smbdirect recv io
- Two cleanup fixes for socket spinlock
- Two patches that add socket reassembly struct
- Remove unused connection_status enum
- Use flag in common header for SMBDIRECT_RECV_IO_MAX_SGE
- Two cleanup patches to introduce and use smbdirect send io
- Two cleanup patches to introduce and use smbdirect send_io struct
- Fix to return error if rdma connect takes longer than 5 seconds
- Error logging improvements
- Fix redundand call to init_waitqueue_head
- Remove unneeded wait queue"
* tag 'v6.17rc-part2-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (33 commits)
smb: client: only use a single wait_queue to monitor smbdirect connection status
smb: client: don't call init_waitqueue_head(&info->conn_wait) twice in _smbd_get_connection
smb: client: improve logging in smbd_conn_upcall()
smb: client: return an error if rdma_connect does not return within 5 seconds
smb: client: make use of smbdirect_socket.{send,recv}_io.mem.{cache,pool}
smb: smbdirect: add smbdirect_socket.{send,recv}_io.mem.{cache,pool}
smb: client: make use of struct smbdirect_send_io
smb: smbdirect: introduce struct smbdirect_send_io
smb: client: make use of SMBDIRECT_RECV_IO_MAX_SGE
smb: smbdirect: add SMBDIRECT_RECV_IO_MAX_SGE
smb: client: remove unused enum smbd_connection_status
smb: client: make use of smbdirect_socket.recv_io.reassembly.*
smb: smbdirect: introduce smbdirect_socket.recv_io.reassembly.*
smb: client: make use of smb: smbdirect_socket.recv_io.free.{list,lock}
smb: smbdirect: introduce smbdirect_socket.recv_io.free.{list,lock}
smb: client: make use of struct smbdirect_recv_io
smb: smbdirect: introduce struct smbdirect_recv_io
smb: client: make use of smbdirect_socket->recv_io.expected
smb: smbdirect: introduce smbdirect_socket.recv_io.expected
smb: client: remove unused smbd_connection->fragment_reassembly_remaining
...
Linus Torvalds [Sat, 9 Aug 2025 03:52:37 +0000 (06:52 +0300)]
Merge tag 'v6.17rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix limiting repeated connections from same IP
- Fix for extracting shortname when name begins with a dot
- Four smbdirect fixes:
- three fixes to the receive path: potential unmap bug, potential
resource leaks and stale connections, and also potential use
after free race
- cleanup to remove unneeded queue
* tag 'v6.17rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
smb: server: Fix extension string in ksmbd_extract_shortname()
ksmbd: limit repeated connections from clients with the same IP
smb: server: let recv_done() avoid touching data_transfer after cleanup/move
smb: server: let recv_done() consistently call put_recvmsg/smb_direct_disconnect_rdma_connection
smb: server: make sure we call ib_dma_unmap_single() only if we called ib_dma_map_single already
smb: server: remove separate empty_recvmsg_queue
Zhang Rui [Tue, 17 Jun 2025 12:48:59 +0000 (20:48 +0800)]
tools/power turbostat: Fix bogus SysWatt for forked program
Similar to delta_cpu(), delta_platform() is called in turbostat main
loop. This ensures accurate SysWatt readings in periodic monitoring mode
$ sudo turbostat -S -q --show power -i 1
CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt
60 61 6.21 1.13 0.16 0.00 0.00 0.00 13.07
58 61 6.00 1.07 0.18 0.00 0.00 0.00 12.75
58 61 5.74 1.05 0.17 0.00 0.00 0.00 12.22
58 60 6.27 1.11 0.24 0.00 0.00 0.00 13.55
However, delta_platform() is missing for forked program and causes bogus
SysWatt reporting,
$ sudo turbostat -S -q --show power sleep 1
1.004736 sec
CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt
57 58 6.05 1.02 0.16 0.00 0.00 0.00 0.03
Add missing delta_platform() for forked program.
Fixes: e5f687b89bc2 ("tools/power turbostat: Add RAPL psys as a built-in counter") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Kernels configured with CONFIG_MULTIUSER=n have no cap_get_proc().
Check for ENOSYS to recognize this case, and continue on to
attempt to access the requested MSRs (such as temperature).
Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Len Brown <len.brown@intel.com>
Calvin Owens [Fri, 13 Jun 2025 16:54:23 +0000 (09:54 -0700)]
tools/power turbostat: Fix build with musl
turbostat.c: In function 'parse_int_file':
turbostat.c:5567:19: error: 'PATH_MAX' undeclared (first use in this function)
5567 | char path[PATH_MAX];
| ^~~~~~~~
turbostat.c: In function 'probe_graphics':
turbostat.c:6787:19: error: 'PATH_MAX' undeclared (first use in this function)
6787 | char path[PATH_MAX];
| ^~~~~~~~
Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Jens Axboe [Fri, 8 Aug 2025 12:35:14 +0000 (06:35 -0600)]
io_uring/memmap: cast nr_pages to size_t before shifting
If the allocated size exceeds UINT_MAX, then it's necessary to cast
the mr->nr_pages value to size_t to prevent it from overflowing. In
practice this isn't much of a concern as the required memory size will
have been validated upfront, and accounted to the user. And > 4GB sizes
will be necessary to make the lack of a cast a problem, which greatly
exceeds normal user locked_vm settings that are generally in the kb to
mb range. However, if root is used, then accounting isn't done, and
then it's possible to hit this issue.
Adam Young [Tue, 15 Jul 2025 00:10:07 +0000 (20:10 -0400)]
mailbox/pcc: support mailbox management of the shared buffer
Define a new, optional, callback that allows the driver to
specify how the return data buffer is allocated. If that callback
is set, mailbox/pcc.c is now responsible for reading from and
writing to the PCC shared buffer.
This also allows for proper checks of the Commnand complete flag
between the PCC sender and receiver.
For Type 4 channels, initialize the command complete flag prior
to accepting messages.
Since the mailbox does not know what memory allocation scheme
to use for response messages, the client now has an optional
callback that allows it to allocate the buffer for a response
message.
When an outbound message is written to the buffer, the mailbox
checks for the flag indicating the client wants an tx complete
notification via IRQ. Upon receipt of the interrupt It will
pair it with the outgoing message. The expected use is to
free the kernel memory buffer for the previous outgoing message.
Signed-off-by: Adam Young <admiyo@os.amperecomputing.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
- net: drop UFO packets (injected via virtio) in udp_rcv_segment()
- eth: mlx5: correctly set gso_segs when LRO is used, avoid false
positive checksum validation errors
- netpoll: prevent hanging NAPI when netcons gets enabled
- phy: mscc: fix parsing of unicast frames for PTP timestamping
- a number of device tree / OF reference leak fixes"
* tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
pptp: fix pptp_xmit() error path
net: ti: icssg-prueth: Fix skb handling for XDP_PASS
net: Update threaded state in napi config in netif_set_threaded
selftests: netdevsim: Xfail nexthop test on slow machines
eth: fbnic: Lock the tx_dropped update
eth: fbnic: Fix tx_dropped reporting
eth: fbnic: remove the debugging trick of super high page bias
net: ftgmac100: fix potential NULL pointer access in ftgmac100_phy_disconnect
dt-bindings: net: Replace bouncing Alexandru Tachici emails
dpll: zl3073x: ZL3073X_I2C and ZL3073X_SPI should depend on NET
net/sched: mqprio: fix stack out-of-bounds write in tc entry parsing
Revert "net: mdio_bus: Use devm for getting reset GPIO"
selftests: net: packetdrill: xfail all problems on slow machines
net/packet: fix a race in packet_set_ring() and packet_notifier()
benet: fix BUG when creating VFs
net: airoha: npu: Add missing MODULE_FIRMWARE macros
net: devmem: fix DMA direction on unmapping
ipa: fix compile-testing with qcom-mdt=m
eth: fbnic: unlink NAPIs from queues on error to open
net: Add locking to protect skb->dev access in ip_output
...
Linus Torvalds [Fri, 8 Aug 2025 03:56:55 +0000 (06:56 +0300)]
Merge tag 's390-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:
- Support MMIO read/write tracing
- Enable THP swapping and THP migration
- Unmask SLCF bit ("stateless command filtering") introduced with CEX8
cards, so that user space applications like lszcrypt could evaluate
and list this feature
- Fix the value of high_memory variable, so it considers possible
tailing offline memory blocks
- Make vmem_pte_alloc() consistent and always allocate memory of
PAGE_SIZE for page tables. This ensures a page table occupies the
whole page, as the rest of the code assumes
- Fix kernel image end address in the decompressor debug output
- Fix a typo in debug_sprintf_format_fn() comment
* tag 's390-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/debug: Fix typo in debug_sprintf_format_fn() comment
s390/boot: Fix startup debugging log
s390/mm: Allocate page table with PAGE_SIZE granularity
s390/mm: Enable THP_SWAP and THP_MIGRATION
s390: Support CONFIG_TRACE_MMIO_ACCESS
s390/mm: Set high_memory at the end of the identity mapping
s390/ap: Unmask SLCF bit in card and queue ap functions sysfs
Linus Torvalds [Fri, 8 Aug 2025 03:48:14 +0000 (06:48 +0300)]
Merge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"This is the fixes that built up in the merge window, mostly amdgpu and
xe with one i915 display fix, seems like things are pretty good for
rc1.
i915:
- DP LPFS fixes
xe:
- SRIOV: PF fixes and removal of need of module param
- Fix driver unbind around Devcoredump
- Mark xe driver as BROKEN if kernel page size is not 4kB
* tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits)
drm/amdgpu: add missing vram lost check for LEGACY RESET
drm/amdgpu/discovery: fix fw based ip discovery
drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
drm/amdgpu: Update SDMA firmware version check for user queue support
drm/amdgpu: Add NULL check for asic_funcs
drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
drm/amd/display: fix a Null pointer dereference vulnerability
drm/amd/display: Add primary plane to commits for correct VRR handling
drm/amdgpu: update mmhub 3.3 client id mappings
drm/amdgpu: update mmhub 3.0.1 client id mappings
drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
drm/amd/display: Don't overwrite dce60_clk_mgr
drm/amdkfd: Fix checkpoint-restore on multi-xcc
drm/amd: Restore cached manual clock settings during resume
drm/amd: Restore cached power limit during resume
drm/amdgpu: Update external revid for GC v9.5.0
drm/amdgpu: Update supported modes for GC v9.5.0
Mark xe driver as BROKEN if kernel page size is not 4kB
...
Linus Torvalds [Fri, 8 Aug 2025 03:43:20 +0000 (06:43 +0300)]
Merge tag 'fbdev-for-6.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes for 6.17-rc1:
- Revert a patch which broke VGA console
- Fix an out-of-bounds access bug which may happen during console
resizing when a console is mapped to a frame buffer
* tag 'fbdev-for-6.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
Revert "vgacon: Add check for vc_origin address range in vgacon_scroll()"
fbdev: Fix vmalloc out-of-bounds write in fast_imageblit
Linus Torvalds [Fri, 8 Aug 2025 03:36:48 +0000 (06:36 +0300)]
Merge tag 'loongarch-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Complete KSave registers definition
- Support the mem=<size> kernel parameter
- Support BPF dynamic modification & trampoline
- Add MMC/SDIO controller nodes in dts
- Some bug fixes and other small changes
* tag 'loongarch-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: vDSO: Remove -nostdlib complier flag
LoongArch: dts: Add eMMC/SDIO controller support to Loongson-2K2000
LoongArch: dts: Add SDIO controller support to Loongson-2K1000
LoongArch: dts: Add SDIO controller support to Loongson-2K0500
LoongArch: BPF: Set bpf_jit_bypass_spec_v1/v4()
LoongArch: BPF: Fix the tailcall hierarchy
LoongArch: BPF: Fix jump offset calculation in tailcall
LoongArch: BPF: Add struct ops support for trampoline
LoongArch: BPF: Add basic bpf trampoline support
LoongArch: BPF: Add dynamic code modification support
LoongArch: BPF: Rename and refactor validate_code()
LoongArch: Add larch_insn_gen_{beq,bne} helpers
LoongArch: Don't use %pK through printk() in unwinder
LoongArch: Avoid in-place string operation on FDT content
LoongArch: Support mem=<size> kernel parameter
LoongArch: Make relocate_new_kernel_size be a .quad value
LoongArch: Complete KSave registers definition
Thorsten Blum [Wed, 6 Aug 2025 01:03:49 +0000 (03:03 +0200)]
smb: server: Fix extension string in ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable@vger.kernel.org Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Tue, 5 Aug 2025 09:13:13 +0000 (18:13 +0900)]
ksmbd: limit repeated connections from clients with the same IP
Repeated connections from clients with the same IP address may exhaust
the max connections and prevent other normal client connections.
This patch limit repeated connections from clients with the same IP.
Reported-by: tianshuo han <hantianshuo233@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Dave Airlie [Thu, 7 Aug 2025 19:50:02 +0000 (05:50 +1000)]
Merge tag 'drm-xe-next-fixes-2025-08-06' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
- SRIOV: PF fixes and removal of need of module param (Michal)
- Fix driver unbind around Devcoredump (Bala)
- Mark xe driver as BROKEN if kernel page size is not 4kB (Simon)
Stefan Metzmacher [Thu, 7 Aug 2025 16:12:14 +0000 (18:12 +0200)]
smb: client: only use a single wait_queue to monitor smbdirect connection status
There's no need for separate conn_wait and disconn_wait queues.
This will simplify the move to common code, the server code
already a single wait_queue for this.
Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Stefan Metzmacher [Thu, 7 Aug 2025 16:12:13 +0000 (18:12 +0200)]
smb: client: don't call init_waitqueue_head(&info->conn_wait) twice in _smbd_get_connection
It is already called long before we may hit this cleanup code path.
Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Stefan Metzmacher [Thu, 7 Aug 2025 16:12:12 +0000 (18:12 +0200)]
smb: client: improve logging in smbd_conn_upcall()
Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Stefan Metzmacher [Thu, 7 Aug 2025 16:12:11 +0000 (18:12 +0200)]
smb: client: return an error if rdma_connect does not return within 5 seconds
This matches the timeout for tcp connections.
Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Nam Cao [Thu, 7 Aug 2025 08:10:51 +0000 (10:10 +0200)]
PCI: vmd: Fix wrong kfree() in vmd_msi_free()
vmd_msi_alloc() allocates struct vmd_irq and stashes it into
irq_data->chip_data associated with the VMD's interrupt domain.
vmd_msi_free() extracts the pointer by calling irq_get_chip_data() and
frees it.
irq_get_chip_data() returns the chip_data associated with the top interrupt
domain. This worked in the past because VMD's interrupt domain was the top
domain.
But d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()")
changed the interrupt domain hierarchy so VMD's interrupt domain is not the
top domain anymore. irq_get_chip_data() now returns the chip_data at the
MSI devices' interrupt domains. It is therefore broken for vmd_msi_free()
to kfree() this chip_data.
Fix by extracting the chip_data associated with the VMD's interrupt domain.
====================
perf/s390: Regression: Move uid filtering to BPF filters
v4: https://lore.kernel.org/bpf/20250806114227.14617-1-iii@linux.ibm.com/
v4 -> v5: Fix a typo in the commit message (Yonghong).
v3: https://lore.kernel.org/bpf/20250805130346.1225535-1-iii@linux.ibm.com/
v3 -> v4: Rename the new field to dont_enable (Alexei, Eduard).
Switch the Fixes: tag in patch 2 (Alexander, Thomas).
Fix typos in the cover letter (Thomas).
v2: https://lore.kernel.org/bpf/20250728144340.711196-1-tmricht@linux.ibm.com/
v2 -> v3: Use no_ioctl_enable in perf.
This series fixes a regression caused by moving UID filtering to BPF.
The regression affects all events that support auxiliary data, most
notably, "cycles" events on s390, but also PT events on Intel. The
symptom is missing events when UID filtering is enabled.
Patch 1 introduces a new option for the
bpf_program__attach_perf_event_opts() function.
Patch 2 makes use of it in perf, and also contains a lot of technical
details of why exactly the problem is occurring.
Thanks to Thomas Richter for the investigation and the initial version
of this fix, and to Jiri Olsa for suggestions.
This allocates the ring buffer for the event 'cycles'. With mmap()
the kernel creates the ring buffer:
perf_mmap(): kernel function to create the event's ring
| buffer to save the sampled data.
|
+-> ring_buffer_attach(): Allocates memory for ring buffer.
| The PMU has auxiliary data setup function. The
| has_aux(event) condition is true and the PMU's
| stop() is called to stop sampling. It is not
| restarted:
|
| if (has_aux(event))
| perf_event_stop(event, 0);
|
+-> cpumsf_pmu_stop():
Hardware sampling is stopped. No samples are generated and saved
anymore.
3. After the event 'cycles' has been mapped, the event is enabled a
second time in:
return immediately because event::state is already set to
PERF_EVENT_STATE_ACTIVE.
This happens on s390, because the event 'cycles' offers the possibility
to save auxilary data. The PMU callbacks setup_aux() and free_aux() are
defined. Without both callback functions, cpumsf_pmu_stop() is not
invoked and sampling continues.
To remedy this, remove the first invocation of
ioctl(..., PERF_EVENT_IOC_ENABLE, ...).
in step (1.) Create the event in step (1.) and enable it in step (3.)
after the ring buffer has been mapped.
Output after:
# ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.876 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 16200 (99.5%)
SAMPLE events: 16200
#
The software event succeeded both before and after the patch:
# ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
./perf test -w thloop 2
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 2.870 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 53506 (99.8%)
SAMPLE events: 53506
#
Fixes: b4c658d4d63d61 ("perf target: Remove uid from target") Suggested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Co-developed-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250806162417.19666-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ilya Leoshkevich [Wed, 6 Aug 2025 16:22:41 +0000 (18:22 +0200)]
libbpf: Add the ability to suppress perf event enablement
Automatically enabling a perf event after attaching a BPF prog to it is
not always desirable.
Add a new "dont_enable" field to struct bpf_perf_event_opts. While
introducing "enable" instead would be nicer in that it would avoid
a double negation in the implementation, it would make
DECLARE_LIBBPF_OPTS() less efficient.
Acked-by: Eduard Zingerman <eddyz87@gmail.com> Suggested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Co-developed-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250806162417.19666-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>