Baokun Li [Wed, 22 Jan 2025 11:05:29 +0000 (19:05 +0800)]
ext4: abort journal on data writeback failure if in data_err=abort mode
The data_err=abort was initially introduced to address users' worries
about data corruption spreading unnoticed. With direct writes, we can
rely on return values to confirm successful writes to disk. But with
buffered writes, a successful return only means the data has been written
to memory. Users have no way of knowing whether the data has actually been
written to disk unless they use fsync (which impacts performance and can
sometimes miss errors).
The current data_err=abort implementation relies on the ordered data list,
but past changes have inadvertently altered its behavior. For example, if
an extent is unwritten, we do not add the inode to the ordered data list.
Therefore, jbd2 will not wait for the data write-back of that inode to
complete and check for errors in the inode mapping. Moreover, the checks
performed by jbd2 can also miss errors.
Now, all buffered writes eventually call ext4_end_bio(), where I/O errors
are checked. Therefore, we can check for the data_err=abort mode at this
point and abort the journal in a kworker (due to the interrupt context).
So, when data_err=abort is enabled and ext4_end_bio() detects an I/O
error, the journal is aborted in ext4_end_io_end(), which satisfies users
who care about the contents of their files.
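The split between the completion handler and the worker described above can be modelled in user space roughly as follows. This is a minimal sketch under stated assumptions, not the ext4 code: `struct journal_model`, `struct io_end_model`, `model_end_bio()` and `model_end_io_end()` are hypothetical names, and the journal is reduced to a single boolean.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are real ext4 or jbd2 types. */
struct journal_model {
	bool aborted;
};

struct io_end_model {
	bool failed;	/* set by the completion handler on I/O error */
};

/*
 * Models the bio completion path. It runs in interrupt context in the
 * real kernel, so the model only records the error here.
 */
void model_end_bio(struct io_end_model *io, int bio_status)
{
	if (bio_status != 0)
		io->failed = true;
}

/*
 * Models the deferred worker. It runs in process context, so this is
 * where the journal abort is allowed to happen.
 */
void model_end_io_end(const struct io_end_model *io,
		      struct journal_model *journal,
		      bool data_err_abort)
{
	if (io->failed && data_err_abort)
		journal->aborted = true;
}
```

In this model a failed write aborts the journal only when the `data_err_abort` argument is true, which mirrors the mount-option check described in the commit message.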
Baokun Li [Wed, 22 Jan 2025 11:05:27 +0000 (19:05 +0800)]
ext4: reject the 'data_err=abort' option in nojournal mode
data_err=abort aborts the journal on I/O errors. However, this option is
meaningless if the journal is disabled, so it is rejected in nojournal mode
to reduce unnecessary checks. Also, this option is ignored upon remount.
Baokun Li [Wed, 22 Jan 2025 11:05:26 +0000 (19:05 +0800)]
ext4: do not convert the unwritten extents if data writeback fails
When dioread_nolock is turned on (the default), it will convert unwritten
extents to written at ext4_end_io_end(), even if the data writeback fails.
It leads to the possibility that stale data may be exposed when the
physical block corresponding to the file data is read-only (i.e., writes
return -EIO, but reads are normal).
Therefore a new ext4_io_end->flags EXT4_IO_END_FAILED is added, which
indicates that some bio write-back failed in the current ext4_io_end.
When this flag is set, the unwritten to written conversion is no longer
performed. Users can read the data normally until the caches are dropped;
after that, reads of the failed extents return all zeros.
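The flag logic described above can be sketched in user space as follows. This is an illustration only: the `MODEL_IO_END_*` values, `struct io_end_flags_model` and both functions are hypothetical and are not taken from the ext4 sources.

```c
#include <stdbool.h>

/* Hypothetical flag values chosen for the sketch. */
#define MODEL_IO_END_UNWRITTEN	0x1u	/* extent still needs conversion */
#define MODEL_IO_END_FAILED	0x2u	/* at least one bio failed */

struct io_end_flags_model {
	unsigned int flags;
};

/* Called once per completed bio; a non-zero status marks the io_end failed. */
void model_note_bio_result(struct io_end_flags_model *io, int status)
{
	if (status != 0)
		io->flags |= MODEL_IO_END_FAILED;
}

/* Convert unwritten -> written only if every bio in the io_end succeeded. */
bool model_should_convert(const struct io_end_flags_model *io)
{
	if (!(io->flags & MODEL_IO_END_UNWRITTEN))
		return false;
	return !(io->flags & MODEL_IO_END_FAILED);
}
```

Because the failure flag is sticky, one failed bio is enough to suppress the conversion for the whole io_end in this model, even if later bios succeed.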
Charles Han [Fri, 10 Jan 2025 09:24:21 +0000 (17:24 +0800)]
ext4: fix potential null dereference in ext4 kunit test
kunit_kzalloc() may return a NULL pointer; dereferencing it
without a NULL check may lead to a NULL pointer dereference.
Add a NULL check for grp.
Fixes: ac96b56a2fbd ("ext4: Add unit test for mb_mark_used")
Fixes: b7098e1fa7bc ("ext4: Add unit test for mb_free_blocks")
Signed-off-by: Charles Han <hanchunchao@inspur.com>
Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20250110092421.35619-1-hanchunchao@inspur.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Julian Sun [Tue, 7 Jan 2025 04:55:49 +0000 (12:55 +0800)]
ext4: Introduce a new helper function ext4_generic_write_inline_data()
A new function, ext4_generic_write_inline_data(), is introduced
to provide a generic implementation of the common logic found in
ext4_da_write_inline_data_begin() and ext4_try_to_write_inline_data().
This function will be utilized in the subsequent two patches.
Ojaswin Mujoo [Thu, 21 Nov 2024 12:38:55 +0000 (18:08 +0530)]
ext4: protect ext4_release_dquot against freezing
Protect ext4_release_dquot against freezing so that we
don't try to start a transaction when FS is frozen, leading
to warnings.
Further, avoid taking the freeze protection if a transaction
is already running so that we don't end up in a deadlock
as described in
46e294efc355 ext4: fix deadlock with fs freezing and EA inodes
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241121123855.645335-3-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sat, 8 Feb 2025 04:08:02 +0000 (23:08 -0500)]
ext4: introduce linear search for dentries
This patch addresses an issue where some files in case-insensitive
directories become inaccessible due to changes in how the kernel
function, utf8_casefold(), generates case-folded strings from the
commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code
points").
There are good reasons why this change should be made; it's actually
quite stupid that Unicode seems to think that the characters ❤ and ❤️
should be casefolded. Unfortunately, because of the backwards
compatibility issue, this commit was reverted in 231825b2e1ff.
This problem is addressed by instituting a brute-force linear fallback
if a lookup fails on a case-folded directory, which does result in a
performance hit when looking up files affected by the change in how
the kernel treats ignorable Unicode characters, or when attempting to
look up non-existent file names. So this fallback can be disabled by
setting an encoding flag if in the future, the system administrator or
the manufacturer of a mobile handset or tablet can be sure that there
was no opportunity for a kernel to insert file names with incompatible
encodings.
Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
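The fallback described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the ext4 implementation: `struct dirent_model` and the `model_*` functions are hypothetical, the stored hash is an opaque number, and ASCII `tolower()` stands in for real Unicode casefolding.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

struct dirent_model {
	unsigned int hash;	/* hash recorded when the entry was created */
	const char *name;
};

/* ASCII-only case-insensitive comparison; a stand-in for casefolding. */
bool model_names_match(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
		a++;
		b++;
	}
	return *a == *b;
}

/* Fast path: only entries whose stored hash equals `hash` are compared. */
const struct dirent_model *model_hashed_lookup(const struct dirent_model *dir,
					       size_t n, unsigned int hash,
					       const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (dir[i].hash == hash && model_names_match(dir[i].name, name))
			return &dir[i];
	return NULL;
}

/* Slow path: compare every entry and ignore the stored hash. */
const struct dirent_model *model_linear_lookup(const struct dirent_model *dir,
					       size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (model_names_match(dir[i].name, name))
			return &dir[i];
	return NULL;
}

/* Try the hashed lookup first; fall back to the scan only if allowed. */
const struct dirent_model *model_lookup(const struct dirent_model *dir,
					size_t n, unsigned int hash,
					const char *name, bool allow_fallback)
{
	const struct dirent_model *found;

	found = model_hashed_lookup(dir, n, hash, name);
	if (!found && allow_fallback)
		found = model_linear_lookup(dir, n, name);
	return found;
}
```

The model also shows where the cost comes from: a lookup of a name that does not exist always pays for the full scan when the fallback is enabled, which is why the commit message mentions a way to turn it off.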
Jan Kara [Tue, 21 Jan 2025 14:09:26 +0000 (15:09 +0100)]
jbd2: Avoid long replay times due to high number or revoke blocks
Some users are reporting journal replay takes a long time when there is
an excessive number of revoke blocks in the journal. Reported times are
like:
1048576 records - 95 seconds
2097152 records - 580 seconds
The problem is that hash chains in the revoke table get excessively
long in these cases. Fix the problem by sizing the revoke table
appropriately before the revoke pass.
Thanks to Alexey Zhuravlev <azhuravlev@ddn.com> for benchmarking the
patch with large numbers of revoke blocks [1].
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250121140925.17231-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
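To see why table sizing matters, the arithmetic can be sketched as below. This is a hedged illustration, not the jbd2 code: the function names, the power-of-two rounding and the `target_chain` parameter are assumptions made for the example, and the commit message does not say which sizing rule the patch actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Divide and round up; returns `a` unchanged when `b` is zero. */
size_t model_div_round_up(size_t a, size_t b)
{
	if (b == 0)
		return a;
	return a / b + (a % b != 0);
}

/*
 * Round n up to a power of two. Returns 1 for n <= 1 and stops at the
 * largest power of two that fits in size_t.
 */
size_t model_roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n && p <= SIZE_MAX / 2)
		p <<= 1;
	return p;
}

/*
 * Pick a bucket count so that `records` entries, spread evenly, give an
 * average chain length of at most `target_chain` (hypothetical policy).
 */
size_t model_revoke_table_size(size_t records, size_t target_chain)
{
	if (target_chain == 0)
		target_chain = 1;
	return model_roundup_pow2(model_div_round_up(records, target_chain));
}
```

For example, with 2097152 records a table of 256 buckets (an illustrative figure) gives an average chain of 8192 entries, whereas `model_revoke_table_size(2097152, 4)` picks 524288 buckets and keeps the average chain at 4.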
Zhang Yi [Fri, 20 Dec 2024 01:16:37 +0000 (09:16 +0800)]
ext4: move out common parts into ext4_fallocate()
Currently, all zeroing ranges, punch holes, collapse ranges, and insert
ranges first wait for all existing direct I/O workers to complete, and
then they acquire the mapping's invalidate lock before performing the
actual work. These common components are nearly identical, so we can
simplify the code by factoring them out into the ext4_fallocate().
Zhang Yi [Fri, 20 Dec 2024 01:16:36 +0000 (09:16 +0800)]
ext4: move out inode_lock into ext4_fallocate()
Currently, all five sub-functions of ext4_fallocate() acquire the
inode's i_rwsem at the beginning and release it before exiting. This
process can be simplified by factoring out the management of i_rwsem
into the ext4_fallocate() function.
Zhang Yi [Fri, 20 Dec 2024 01:16:35 +0000 (09:16 +0800)]
ext4: factor out ext4_do_fallocate()
Currently, the real work of a normal fallocate is open coded in
ext4_fallocate(). Factor out a new helper, ext4_do_fallocate(), to do
that work, as the other operations (e.g. ext4_zero_range()) called from
ext4_fallocate() already do. This makes the code clearer; no functional
changes.
Zhang Yi [Fri, 20 Dec 2024 01:16:34 +0000 (09:16 +0800)]
ext4: refactor ext4_insert_range()
Simplify ext4_insert_range() and align its code style with that of
ext4_collapse_range(). Refactor it by: a) renaming variables, b)
removing redundant input parameter checks and moving the remaining
checks under i_rwsem in preparation for future refactoring, and c)
renaming the three stale error tags.
Zhang Yi [Fri, 20 Dec 2024 01:16:33 +0000 (09:16 +0800)]
ext4: refactor ext4_collapse_range()
Simplify ext4_collapse_range() and align its code style with that of
ext4_zero_range() and ext4_punch_hole(). Refactor it by: a) renaming
variables, b) removing redundant input parameter checks and moving
the remaining checks under i_rwsem in preparation for future
refactoring, and c) renaming the three stale error tags.
Zhang Yi [Fri, 20 Dec 2024 01:16:32 +0000 (09:16 +0800)]
ext4: refactor ext4_zero_range()
The current implementation of ext4_zero_range() contains complex
position calculations and stale error tags. To improve the code's
clarity and maintainability, it is essential to clean up the code and
improve its readability. This can be achieved by: a) simplifying and
renaming variables, making the style the same as ext4_punch_hole(); b)
eliminating unnecessary position calculations, writing back all data in
data=journal mode, and dropping the page cache from the original offset to
the end, rather than using aligned blocks; c) renaming the stale out_mutex
tags.
Zhang Yi [Fri, 20 Dec 2024 01:16:31 +0000 (09:16 +0800)]
ext4: refactor ext4_punch_hole()
The current implementation of ext4_punch_hole() contains complex
position calculations and stale error tags. To improve the code's
clarity and maintainability, it is essential to clean up the code and
improve its readability. This can be achieved by: a) simplifying and
renaming variables; b) eliminating unnecessary position calculations;
c) writing back all data in data=journal mode, and dropping the page cache
from the original offset to the end, rather than using aligned blocks;
d) renaming the stale error tags.
Zhang Yi [Fri, 20 Dec 2024 01:16:30 +0000 (09:16 +0800)]
ext4: don't write back data before punch hole in nojournal mode
There is no need to write back all data before punching a hole in
non-journaled mode since it will be dropped soon after removing space.
Therefore, the call to filemap_write_and_wait_range() can be eliminated.
Besides, similar to ext4_zero_range(), we must address the case of
partially punched folios when block size < page size. It is essential to
remove writable userspace mappings to ensure that the folio can be
faulted again during subsequent mmap write access.
In journaled mode, we need to write dirty pages out before discarding
page cache in case of crash before committing the freeing data
transaction, which could expose old, stale data, even if synchronization
has been performed.
Zhang Yi [Fri, 20 Dec 2024 01:16:29 +0000 (09:16 +0800)]
ext4: don't explicit update times in ext4_fallocate()
After commit ad5cd4f4ee4d ("ext4: fix fallocate to use file_modified to
update permissions consistently"), we can update mtime and ctime
appropriately through file_modified() when doing zero range, collapse
range, insert range and punch hole, hence there is no need to explicitly
update times in those paths; just drop them.
Zhang Yi [Fri, 20 Dec 2024 01:16:28 +0000 (09:16 +0800)]
ext4: remove writable userspace mappings before truncating page cache
When zeroing a range of folios on a filesystem whose block size is
less than the page size, the file's mapped blocks within one page will
be marked as unwritten. We should remove writable userspace mappings to
ensure that ext4_page_mkwrite() can be called during subsequent write
access to these partial folios. Otherwise, data written by subsequent
mmap writes may not be saved to disk.
Fix this by introducing ext4_truncate_page_cache_block_range() to remove
writable userspace mappings when truncating a partial folio range.
Additionally, move the journal data mode-specific handlers and
truncate_pagecache_range() into this function, allowing it to serve as a
common helper that correctly manages the page cache in preparation for
block range manipulations.
Linus Torvalds [Sun, 9 Feb 2025 18:05:32 +0000 (10:05 -0800)]
Merge tag 'kbuild-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Suppress false-positive -Wformat-{overflow,truncation}-non-kprintf
warnings regardless of the W= option
- Avoid CONFIG_TRIM_UNUSED_KSYMS dropping symbols passed to symbol_get()
- Fix a build regression of the Debian linux-headers package
* tag 'kbuild-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: install-extmod-build: add missing quotation marks for CC variable
kbuild: fix misspelling in scripts/Makefile.lib
kbuild: keep symbols for symbol_get() even with CONFIG_TRIM_UNUSED_KSYMS
scripts/Makefile.extrawarn: Do not show clang's non-kprintf warnings at W=1
Linus Torvalds [Sun, 9 Feb 2025 17:47:06 +0000 (09:47 -0800)]
Merge tag 'pm-6.14-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a recently introduced kernel crash due to a NULL pointer
dereference during system-wide suspend (Rafael Wysocki)"
* tag 'pm-6.14-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: core: Restrict power.set_active propagation
Linus Torvalds [Sun, 9 Feb 2025 17:41:38 +0000 (09:41 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Correctly clean the BSS to the PoC before allowing EL2 to access it
on nVHE/hVHE/protected configurations
- Propagate ownership of debug registers in protected mode after the
rework that landed in 6.14-rc1
- Stop pretending that we can run the protected mode without a GICv3
being present on the host
- Fix a use-after-free situation that can occur if a vcpu fails to
initialise the NV shadow S2 MMU contexts
- Always evaluate the need to arm a background timer for fully
emulated guest timers
- Fix the emulation of EL1 timers in the absence of FEAT_ECV
- Correctly handle the EL2 virtual timer, especially when HCR_EL2.E2H==0
s390:
- move some of the guest page table (gmap) logic into KVM itself,
inching towards the final goal of completely removing gmap from the
non-kvm memory management code.
As an initial set of cleanups, move some code from mm/gmap into kvm
and start using __kvm_faultin_pfn() to fault-in pages as needed;
but especially stop abusing page->index and page->lru to aid in the
pgdesc conversion.
x86:
- Add missing check in the fix to defer starting the huge page
recovery vhost_task
- SRSO_USER_KERNEL_NO does not need SYNTHESIZED_F"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (31 commits)
KVM: x86/mmu: Ensure NX huge page recovery thread is alive before waking
KVM: remove kvm_arch_post_init_vm
KVM: selftests: Fix spelling mistake "initally" -> "initially"
kvm: x86: SRSO_USER_KERNEL_NO is not synthesized
KVM: arm64: timer: Don't adjust the EL2 virtual timer offset
KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECV
KVM: arm64: timer: Always evaluate the need for a soft timer
KVM: arm64: Fix nested S2 MMU structures reallocation
KVM: arm64: Fail protected mode init if no vgic hardware is present
KVM: arm64: Flush/sync debug state in protected mode
KVM: s390: selftests: Streamline uc_skey test to issue iske after sske
KVM: s390: remove the last user of page->index
KVM: s390: move PGSTE softbits
KVM: s390: remove useless page->index usage
KVM: s390: move gmap_shadow_pgt_lookup() into kvm
KVM: s390: stop using lists to keep track of used dat tables
KVM: s390: stop using page->index for non-shadow gmaps
KVM: s390: move some gmap shadowing functions away from mm/gmap.c
KVM: s390: get rid of gmap_translate()
KVM: s390: get rid of gmap_fault()
...
Commit 3775fc538f53 ("PM: sleep: core: Synchronize runtime PM status of
parents and children") exposed an issue related to simple_pm_bus_pm_ops
that uses pm_runtime_force_suspend() and pm_runtime_force_resume() as
bus type PM callbacks for the noirq phases of system-wide suspend and
resume.
The problem is that pm_runtime_force_suspend() does not distinguish
runtime-suspended devices from devices for which runtime PM has never
been enabled, so if it sees a device with runtime PM status set to
RPM_ACTIVE, it will assume that runtime PM is enabled for that device
and so it will attempt to suspend it with the help of its runtime PM
callbacks which may not be ready for that. As it turns out, this
causes simple_pm_bus_runtime_suspend() to crash due to a NULL pointer
dereference.
Another problem related to the above commit and simple_pm_bus_pm_ops is
that setting runtime PM status of a device handled by the latter to
RPM_ACTIVE will actually prevent it from being resumed because
pm_runtime_force_resume() only resumes devices with runtime PM status
set to RPM_SUSPENDED.
To mitigate these issues, do not allow power.set_active to propagate
beyond the parent of the device with DPM_FLAG_SMART_SUSPEND set that
will need to be resumed, which should be a sufficient stop-gap for the
time being, but they will need to be properly addressed in the future
because in general during system-wide resume it is necessary to resume
all devices in a dependency chain in which at least one device is going
to be resumed.
Fixes: 3775fc538f53 ("PM: sleep: core: Synchronize runtime PM status of parents and children")
Closes: https://lore.kernel.org/linux-pm/1c2433d4-7e0f-4395-b841-b8eac7c25651@nvidia.com/
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/6137505.lOV4Wx5bFT@rjwysocki.net
Linus Torvalds [Sat, 8 Feb 2025 22:12:17 +0000 (14:12 -0800)]
Merge tag 'hardening-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
"Address a KUnit stack initialization regression that got tickled on
m68k, and solve a Clang (v14 and earlier) bug found by 0day:
- Fix stackinit KUnit regression on m68k
- Use ARRAY_SIZE() for memtostr*()/strtomem*()"
* tag 'hardening-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
string.h: Use ARRAY_SIZE() for memtostr*()/strtomem*()
compiler.h: Introduce __must_be_byte_array()
compiler.h: Move C string helpers into C-only kernel section
stackinit: Fix comment for test_small_end
stackinit: Keep selftest union size small on m68k
Linus Torvalds [Sat, 8 Feb 2025 22:04:21 +0000 (14:04 -0800)]
Merge tag 'seccomp-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fix from Kees Cook:
"This is really a work-around for x86_64 having grown a syscall to
implement uretprobe, which has caused problems since v6.11.
This may change in the future, but for now, this fixes the unintended
seccomp filtering when uretprobe switched away from traps, and does so
with something that should be easy to backport.
- Allow uretprobe on x86_64 to avoid behavioral complications (Eyal
Birger)"
* tag 'seccomp-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: validate uretprobe syscall passes through seccomp
seccomp: passthrough uretprobe systemcall without filtering
Linus Torvalds [Sat, 8 Feb 2025 21:59:24 +0000 (13:59 -0800)]
Merge tag 'execve-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fix from Kees Cook:
"This is an alpha-specific fix, but since it touched ELF I was asked to
carry it.
- alpha/elf: Fix misc/setarch test of util-linux by removing 32bit
support (Eric W. Biederman)"
* tag 'execve-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
alpha/elf: Fix misc/setarch test of util-linux by removing 32bit support
Linus Torvalds [Sat, 8 Feb 2025 21:45:34 +0000 (13:45 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A number of fairly small fixes, mostly in drivers but two in the core
to change a retry for depopulation (a trendy new hdd thing that
reorganizes blocks away from failing elements) and one to fix a GFP_
annotation to avoid a lock dependency (the third core patch is all in
testing)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla1280: Fix kernel oops when debug level > 2
scsi: ufs: core: Fix error return with query response
scsi: storvsc: Set correct data length for sending SCSI command without payload
scsi: ufs: core: Fix use-after free in init error and remove paths
scsi: core: Do not retry I/Os during depopulation
scsi: core: Use GFP_NOIO to avoid circular locking dependency
scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed
scsi: ufs: core: Ensure clk_gating.lock is used only after initialization
scsi: ufs: core: Simplify temperature exception event handling
scsi: target: core: Add line break to status show
scsi: ufs: core: Fix the HIGH/LOW_TEMP Bit Definitions
scsi: core: Add passthrough tests for success and no failure definitions
Linus Torvalds [Sat, 8 Feb 2025 20:22:21 +0000 (12:22 -0800)]
Merge tag 'rust-fixes-6.14' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
- Do not export KASAN ODR symbols to avoid gendwarfksyms warnings
- Fix future Rust 1.86.0 (to be released 2025-04-03) x86_64 builds
- Clean future Rust 1.86.0 (to be released 2025-04-03) warning
- Fix future GCC 15 (to be released in a few months) builds
- Fix `rusttest` target in macOS
* tag 'rust-fixes-6.14' of https://github.com/Rust-for-Linux/linux:
x86: rust: set rustc-abi=x86-softfloat on rustc>=1.86.0
rust: kbuild: do not export generated KASAN ODR symbols
rust: kbuild: add -fzero-init-padding-bits to bindgen_skip_cflags
rust: init: use explicit ABI to clean warning in future compilers
rust: kbuild: use host dylib naming in rusttestlib-kernel
Linus Torvalds [Sat, 8 Feb 2025 20:18:02 +0000 (12:18 -0800)]
Merge tag 'ftrace-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fix from Steven Rostedt:
"Function graph fix of notrace functions.
When the function graph tracer was restructured to use the global
section of the meta data in the shadow stack, the bit logic was
changed. There's a TRACE_GRAPH_NOTRACE_BIT that is the bit number in
the mask that tells if the function graph tracer is currently in the
"notrace" mode. The TRACE_GRAPH_NOTRACE is the mask with that bit set.
But when the code was restructured, the TRACE_GRAPH_NOTRACE_BIT was
used when it should have been the TRACE_GRAPH_NOTRACE mask. This made
notrace not work properly"
* tag 'ftrace-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fgraph: Fix set_graph_notrace with setting TRACE_GRAPH_NOTRACE_BIT
Linus Torvalds [Sat, 8 Feb 2025 20:04:00 +0000 (12:04 -0800)]
Merge tag 'x86-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix a build regression on GCC 15 builds, caused by GCC changing the
default C version that is overridden in the main Makefile but not in
the x86 boot code Makefile"
* tag 'x86-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Use '-std=gnu11' to fix build with GCC 15
Linus Torvalds [Sat, 8 Feb 2025 19:55:03 +0000 (11:55 -0800)]
Merge tag 'timers-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Fix a PREEMPT_RT bug in the clocksource verification code that caused
false positive warnings.
Also fix a timer migration setup bug when new CPUs are added"
* tag 'timers-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Fix off-by-one root mis-connection
clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context
Linus Torvalds [Sat, 8 Feb 2025 19:16:22 +0000 (11:16 -0800)]
Merge tag 'sched-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix a cfs_rq->h_nr_runnable accounting bug that trips up a defensive
SCHED_WARN_ON() on certain workloads. The bug is believed to be
(accidentally) self-correcting, hence no behavioral side effects are
expected.
Also print se.slice in debug output, since this value can now be set
via the syscall ABI and can be useful to track"
* tag 'sched-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/debug: Provide slice length for fair tasks
sched/fair: Fix inaccurate h_nr_runnable accounting with delayed dequeue
Linus Torvalds [Sat, 8 Feb 2025 19:05:54 +0000 (11:05 -0800)]
Merge tag 'irq-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"Another followup fix for the procps genirq output formatting
regression caused by an optimization"
* tag 'irq-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Remove leading space from irq_chip::irq_print_chip() callbacks
Linus Torvalds [Sat, 8 Feb 2025 18:54:11 +0000 (10:54 -0800)]
Merge tag 'locking-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"Fix a dangling pointer bug in the futex code used by the uring code.
It isn't causing problems at the moment due to uring ABI limitations
leaving it essentially unused in current usages, but it is a good idea to
fix nevertheless"
* tag 'locking-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Pass in task to futex_queue()
Steven Rostedt [Sat, 8 Feb 2025 05:15:11 +0000 (00:15 -0500)]
fgraph: Fix set_graph_notrace with setting TRACE_GRAPH_NOTRACE_BIT
The code was restructured so that the function graph notrace code, which
avoids tracing a function and all its children, works by setting a
NOTRACE flag when the function that is not to be traced is hit.
There's a TRACE_GRAPH_NOTRACE_BIT which defines the bit in the flags and a
TRACE_GRAPH_NOTRACE which is the mask with that bit set. But the
restructuring used TRACE_GRAPH_NOTRACE_BIT when it should have used
TRACE_GRAPH_NOTRACE.
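The difference between a bit number and the mask built from it can be shown with a small stand-alone sketch. The names and the bit index below are hypothetical and chosen only for illustration; they are not the values used by the function graph tracer.

```c
#include <stdbool.h>

/* Hypothetical bit index and the mask derived from it. */
enum { MODEL_NOTRACE_BIT = 3 };
#define MODEL_NOTRACE	(1UL << MODEL_NOTRACE_BIT)

/* Buggy variant: ORs in the bit *number* (3, i.e. bits 0 and 1). */
unsigned long model_set_notrace_buggy(unsigned long flags)
{
	return flags | MODEL_NOTRACE_BIT;
}

/* Fixed variant: ORs in the mask, which sets bit 3 only. */
unsigned long model_set_notrace_fixed(unsigned long flags)
{
	return flags | MODEL_NOTRACE;
}

/* Readers test the mask, so only the fixed setter is ever observed. */
bool model_in_notrace(unsigned long flags)
{
	return (flags & MODEL_NOTRACE) != 0;
}
```

With these values the buggy setter turns on two unrelated low bits and leaves the mask bit clear, so a later mask test never sees the "notrace" state.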
Nathan Chancellor [Thu, 17 Oct 2024 17:09:22 +0000 (10:09 -0700)]
kbuild: Move -Wenum-enum-conversion to W=2
-Wenum-enum-conversion was strengthened in clang-19 to warn for C, which
caused the kernel to move it to W=1 in commit 75b5ab134bb5 ("kbuild:
Move -Wenum-{compare-conditional,enum-conversion} into W=1") because
there were numerous instances that would break builds with -Werror.
Unfortunately, this is not a full solution, as more and more developers,
subsystems, and distributors are building with W=1 as well, so they
continue to see the numerous instances of this warning.
Since the move to W=1, there have not been many new instances that have
appeared through various build reports and the ones that have appeared
seem to be following similar existing patterns, suggesting that most
instances of this warning will not be real issues. The only alternatives
for silencing this warning are adding casts (which is generally seen as
an ugly practice) or refactoring the enums to macro defines or a unified
enum (which may be undesirable because of type safety in other parts of
the code).
Move the warning to W=2, where warnings that occur frequently but may be
relevant should reside.
Linus Torvalds [Sat, 8 Feb 2025 03:23:06 +0000 (19:23 -0800)]
Merge tag 'v6.14rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Three DFS fixes: DFS mount fix, fix for noisy log msg and one to
remove some unused code
- SMB3 Lease fix
* tag 'v6.14rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: change lease epoch type from unsigned int to __u16
smb: client: get rid of kstrdup() in get_ses_refpath()
smb: client: fix noisy when tree connecting to DFS interlink targets
smb: client: don't trust DFSREF_STORAGE_SERVER bit
Linus Torvalds [Fri, 7 Feb 2025 20:21:54 +0000 (12:21 -0800)]
Merge tag 'drm-fixes-2025-02-08' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Just regular drm fixes, amdgpu, xe and i915 mostly, but a few
scattered fixes. I think one of the i915 fixes fixes some build combos
that Guenter was seeing.
amdgpu:
- Add new tiling flag for DCC write compress disable
- Add BO metadata flag for DCC
- Fix potential out of bounds access in display
- Seamless boot fix
- CONFIG_FRAME_WARN fix
- PSR1 fix
xe:
- OA uAPI related fixes
- Fix SRIOV migration initialization
- Restore devcoredump to a sane state
i915:
- Fix the build error with clamp after WARN_ON on gcc 13.x+
- HDCP related fixes
- PMU fix zero delta busyness issue
- Fix page cleanup on DMA remap failure
- Drop 64bpp YUV formats from ICL+ SDR planes
- GuC log related fix
- DisplayPort related fixes
ivpu:
- Fix error handling
komeda:
- add return check
zynqmp:
- fix locking in DP code
ast:
- fix AST DP timeout
cec:
- fix broken CEC adapter check"
* tag 'drm-fixes-2025-02-08' of https://gitlab.freedesktop.org/drm/kernel: (29 commits)
drm/i915/dp: Fix potential infinite loop in 128b/132b SST
Revert "drm/amd/display: Use HW lock mgr for PSR1"
drm/amd/display: Respect user's CONFIG_FRAME_WARN more for dml files
accel/amdxdna: Add MODULE_FIRMWARE() declarations
drm/i915/dp: Iterate DSC BPP from high to low on all platforms
drm/xe: Fix and re-enable xe_print_blob_ascii85()
drm/xe/devcoredump: Move exec queue snapshot to Contexts section
drm/xe/oa: Set stream->pollin in xe_oa_buffer_check_unlocked
drm/xe/pf: Fix migration initialization
drm/xe/oa: Preserve oa_ctrl unused bits
drm/amd/display: Fix seamless boot sequence
drm/amd/display: Fix out-of-bound accesses
drm/amdgpu: add a BO metadata flag to disable write compression for Vulkan
drm/i915/backlight: Return immediately when scale() finds invalid parameters
drm/i915/dp: Return min bpc supported by source instead of 0
drm/i915/dp: fix the Adaptive sync Operation mode for SDP
drm/i915/guc: Debug print LRC state entries only if the context is pinned
drm/i915: Drop 64bpp YUV formats from ICL+ SDR planes
drm/i915: Fix page cleanup on DMA remap failure
drm/i915/pmu: Fix zero delta busyness issue
...
Linus Torvalds [Fri, 7 Feb 2025 19:00:33 +0000 (11:00 -0800)]
Merge tag 'block-6.14-20250207' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- MD pull request via Song:
- fix an error handling path for md-linear
- NVMe pull request via Keith:
- Connection fixes for fibre channel transport (Daniel)
- Endian fixes (Keith, Christoph)
- Cleanup fix for host memory buffer (Francis)
- Platform specific power quirks (Georg)
- Target memory leak (Sagi)
- Use appropriate controller state accessor (Daniel)
- Fixup for a regression introduced last week, where sunvdc wasn't
updated for an API change, causing compilation failures on sparc64.
* tag 'block-6.14-20250207' of git://git.kernel.dk/linux:
drivers/block/sunvdc.c: update the correct AIP call
md: Fix linear_set_limits()
nvme-fc: use ctrl state getter
nvme: make nvme_tls_attrs_group static
nvmet: add a missing endianess conversion in nvmet_execute_admin_connect
nvmet: the result field in nvmet_alloc_ctrl_args is little endian
nvmet: fix a memory leak in controller identify
nvme-fc: do not ignore connectivity loss during connecting
nvme: handle connectivity loss in nvme_set_queue_count
nvme-fc: go straight to connecting state when initializing
nvme-pci: Add TUXEDO IBP Gen9 to Samsung sleep quirk
nvme-pci: Add TUXEDO InfinityFlex to Samsung sleep quirk
nvme-pci: remove redundant dma frees in hmb
nvmet: fix rw control endian access
WangYuli [Fri, 7 Feb 2025 07:08:55 +0000 (15:08 +0800)]
kbuild: install-extmod-build: add missing quotation marks for CC variable
While attempting to build Debian packages with CC="ccache gcc", I
saw the following error as builddeb builds linux-headers-$KERNELVERSION:
make HOSTCC=ccache gcc VPATH= srcroot=. -f ./scripts/Makefile.build obj=debian/linux-headers-6.14.0-rc1/usr/src/linux-headers-6.14.0-rc1/scripts
make[6]: *** No rule to make target 'gcc'. Stop.
Upon investigation, it seems that one instance of $(CC) variable reference
in ./scripts/package/install-extmod-build was missing quotation marks,
causing the above error.
Add the missing quotation marks around $(CC) to fix build.
Linus Torvalds [Fri, 7 Feb 2025 18:34:50 +0000 (10:34 -0800)]
Merge tag 'pm-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a handful of issues in the amd-pstate driver, the airoha
cpufreq driver build, a (recently added) possible NULL pointer
dereference in the cpufreq code and a possible memory leak in the
power capping subsystem:
- Fix cpufreq_policy reference counting and prevent max_perf from
going above the current limit in amd-pstate, and drop a redundant
goto label from it (Dhananjay Ugwekar)
- Prevent the per-policy boost_enabled flag in amd-pstate from
getting out of sync with the actual state after boot failures
(Lifeng Zheng)
- Fix a recently added possible NULL pointer dereference in the
cpufreq core (Aboorva Devarajan)
- Fix a build issue related to CONFIG_OF and COMPILE_TEST
dependencies in the airoha cpufreq driver (Arnd Bergmann)
- Fix a possible memory leak in the power capping subsystem (Joe
Hattori)"
* tag 'pm-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Fix cpufreq_policy ref counting
cpufreq: prevent NULL dereference in cpufreq_online()
cpufreq: airoha: modify CONFIG_OF dependency
cpufreq/amd-pstate: Fix max_perf updation with schedutil
cpufreq/amd-pstate: Remove the goto label in amd_pstate_update_limits
cpufreq/amd-pstate: Fix per-policy boost flag incorrect when fail
powercap: call put_device() on an error path in powercap_register_control_type()
Linus Torvalds [Fri, 7 Feb 2025 17:50:33 +0000 (09:50 -0800)]
Merge tag 'gpio-fixes-for-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix interrupt support in gpio-pca953x
- fix configfs attribute locking in gpio-sim
- limit the visibility of the GPIO_GRGPIO Kconfig symbol to OF systems
only
- update MAINTAINERS
* tag 'gpio-fixes-for-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: Use my kernel.org address for ACPI GPIO work
gpio: GPIO_GRGPIO should depend on OF
gpio: sim: lock hog configfs items if present
gpio: pca953x: Improve interrupt support
Linus Torvalds [Fri, 7 Feb 2025 17:22:31 +0000 (09:22 -0800)]
Merge tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix fsnotify FMODE_NONOTIFY* handling.
This also disables fsnotify on all pseudo files by default apart from
very select exceptions. This carries a regression risk so we need to
watch out and adapt accordingly. However, it is overall a significant
improvement over the current status quo where every rando file can
get fsnotify enabled.
- Cleanup and simplify lockref_init() after recent lockref changes.
- Fix vboxfs build with gcc-15.
- Add an assert into inode_set_cached_link() to catch corrupt links.
- Allow users to also use an empty string check to detect whether a
given mount option string was empty or not.
- Fix how security options were appended to statmount()'s ->mnt_opt
field.
- Fix statmount() selftests to always check the returned mask.
- Fix uninitialized value in vfs_statx_path().
- Fix pidfs_ioctl() sanity checks to guard against ioctl() overloading
and preserve extensibility.
* tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: sanity check the length passed to inode_set_cached_link()
pidfs: improve ioctl handling
fsnotify: disable pre-content and permission events by default
selftests: always check mask returned by statmount(2)
fsnotify: disable notification by default for all pseudo files
fs: fix adding security options to statmount.mnt_opt
fsnotify: use accessor to set FMODE_NONOTIFY_*
lockref: remove count argument of lockref_init
gfs2: switch to lockref_init(..., 1)
gfs2: use lockref_init for gl_lockref
statmount: let unset strings be empty
vboxsf: fix building with GCC 15
fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()
Linus Torvalds [Fri, 7 Feb 2025 17:16:07 +0000 (09:16 -0800)]
Merge tag 'bcachefs-2025-02-06.2' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Nothing major, things continue to be fairly quiet over here.
- add a SubmittingPatches to clarify that patches submitted for
bcachefs do, in fact, need to be tested
- discard path now correctly issues journal flushes when needed, this
fixes performance issues when the filesystem is nearly full and
we're bottlenecked on copygc
- fix a bug that could cause the pending rebalance work accounting to
be off when devices are being onlined/offlined; users should report
if they are still seeing this
- and a few more trivial ones"
* tag 'bcachefs-2025-02-06.2' of git://evilpiepirate.org/bcachefs:
bcachefs: bch2_bkey_sectors_need_rebalance() now only depends on bch_extent_rebalance
bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()
bcachefs: Fix discard path journal flushing
bcachefs: fix deadlock in journal_entry_open()
bcachefs: fix incorrect pointer check in __bch2_subvolume_delete()
bcachefs docs: SubmittingPatches.rst
Hector Martin [Thu, 6 Feb 2025 18:21:46 +0000 (03:21 +0900)]
MAINTAINERS: Remove myself
I no longer have any faith left in the kernel development process or
community management approach.
Apple/ARM platform development will continue downstream. If I feel like
sending some patches upstream in the future myself for whatever subtree
I may, or I may not. Anyone who feels like fighting the upstreaming
fight themselves is welcome to do so.
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Fri, 7 Feb 2025 12:06:31 +0000 (13:06 +0100)]
Merge branches 'acpi-property' and 'acpi-resource'
Merge a new ACPI IRQ override quirk for Eluktronics MECH-17 (Gannon
Kolding) and an acpi_data_prop_read() fix making it reflect the OF
counterpart behavior in error cases (Andy Shevchenko).
* acpi-property:
ACPI: property: Fix return value for nval == 0 in acpi_data_prop_read()
* acpi-resource:
ACPI: resource: IRQ override for Eluktronics MECH-17
Mateusz Guzik [Tue, 4 Feb 2025 21:32:07 +0000 (22:32 +0100)]
vfs: sanity check the length passed to inode_set_cached_link()
This costs a strlen() call when instantiating a symlink.
Preferably it would be hidden behind VFS_WARN_ON (or compatible), but
there is no such facility at the moment. With the facility in place the
call can be patched out in production kernels.
In the meantime, since the cost is being paid unconditionally, use the
result to fix up the bad caller.
This is not expected to persist in the long run (tm).
Sample splat:
bad length passed for symlink [/tmp/syz-imagegen43743633/file0/file0] (got 131109, expected 37)
[rest of WARN blurb goes here]
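The check described above can be sketched in userspace C. This is only an illustration, not the kernel code: struct demo_inode and demo_set_cached_link() are hypothetical stand-ins, and the warning is a plain fprintf() instead of a WARN splat.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the inode fields that cache a symlink target. */
struct demo_inode {
	const char *link;
	size_t linklen;
};

/*
 * Cache a symlink target. The caller-supplied length is compared against
 * strlen(); on a mismatch a warning is printed and the length is fixed up.
 * Returns 1 if a fixup was needed, 0 otherwise.
 */
static int demo_set_cached_link(struct demo_inode *inode, const char *link,
				size_t linklen)
{
	size_t actual = strlen(link);
	int fixed = 0;

	if (actual != linklen) {
		fprintf(stderr,
			"bad length passed for symlink [%s] (got %zu, expected %zu)\n",
			link, linklen, actual);
		linklen = actual;	/* reuse the strlen() result as the fixup */
		fixed = 1;
	}

	inode->link = link;
	inode->linklen = linklen;
	assert(inode->linklen == strlen(inode->link));
	return fixed;
}
```

A caller passing the right length sees no difference; a caller passing a bogus length gets the warning and a corrected linklen.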
Christian Brauner [Tue, 4 Feb 2025 13:51:20 +0000 (14:51 +0100)]
pidfs: improve ioctl handling
Pidfs supports extensible and non-extensible ioctls. The extensible
ioctls need to check for the ioctl number itself, not just the ioctl
command; otherwise both backward and forward compatibility are broken.
The pidfs ioctl handler also needs to look at the type of the ioctl
command to guard against cases where "[...] a daemon receives some
random file descriptor from a (potentially less privileged) client and
expects the FD to be of some specific type, it might call ioctl() on
this FD with some type-specific command and expect the call to fail if
the FD is of the wrong type; but due to the missing type check, the
kernel instead performs some action that userspace didn't expect."
(cf. [1])
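A rough userspace sketch of the two checks, assuming a Linux system with <linux/ioctl.h>. The DEMO_* magic, command number and structs are made up for illustration and are not the pidfs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/ioctl.h>

/* Hypothetical ioctl family: one magic byte and one extensible command. */
#define DEMO_MAGIC		0xFE
#define DEMO_GET_INFO_NR	11

/* Two revisions of the argument struct; the size is encoded in the command. */
struct demo_info_v1 { unsigned long long mask; };
struct demo_info_v2 { unsigned long long mask; unsigned long long extra; };

#define DEMO_GET_INFO_V1 _IOWR(DEMO_MAGIC, DEMO_GET_INFO_NR, struct demo_info_v1)
#define DEMO_GET_INFO_V2 _IOWR(DEMO_MAGIC, DEMO_GET_INFO_NR, struct demo_info_v2)

static bool demo_ioctl_valid(unsigned int cmd)
{
	/* Type check: reject commands that belong to another ioctl family. */
	if (_IOC_TYPE(cmd) != DEMO_MAGIC)
		return false;

	/*
	 * Extensible ioctl: match on the number only, so that callers built
	 * against an older or newer (differently sized) struct still match.
	 */
	return _IOC_NR(cmd) == DEMO_GET_INFO_NR;
}
```

Comparing the full command value would reject DEMO_GET_INFO_V2 on a handler that only knows DEMO_GET_INFO_V1, since the two differ in the encoded size.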
Christian Brauner [Tue, 4 Feb 2025 10:25:45 +0000 (11:25 +0100)]
Merge patch series "Fix for huge faults regression"
Amir Goldstein <amir73il@gmail.com> says:
The two Fix patches have been tested by Alex together and each one
independently.
I also verified that they pass the LTP inotify/fanotify tests.
* patches from https://lore.kernel.org/r/20250203223205.861346-1-amir73il@gmail.com:
fsnotify: disable pre-content and permission events by default
fsnotify: disable notification by default for all pseudo files
fsnotify: use accessor to set FMODE_NONOTIFY_*
Amir Goldstein [Mon, 3 Feb 2025 22:32:05 +0000 (23:32 +0100)]
fsnotify: disable pre-content and permission events by default
After introducing pre-content events, we had a regression related to
disabling huge faults on files that should never have pre-content events
enabled.
This happened because the default f_mode of allocated files (0) does
not disable pre-content events.
Pre-content events are disabled in file_set_fsnotify_mode_from_watchers()
but internal files may not get to call this helper.
Initialize f_mode to disable permission and pre-content events for all
files and if needed they will be enabled for the callers of
file_set_fsnotify_mode_from_watchers().
Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250203223205.861346-4-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Miklos Szeredi [Wed, 29 Jan 2025 16:06:41 +0000 (17:06 +0100)]
selftests: always check mask returned by statmount(2)
STATMOUNT_MNT_OPTS can actually be missing if there are no options. This
is a change of behavior since 75ead69a7173 ("fs: don't let statmount return
empty strings").
The other checks shouldn't actually trigger, but add them for correctness
and for easier debugging if the test fails.
Amir Goldstein [Mon, 3 Feb 2025 22:32:04 +0000 (23:32 +0100)]
fsnotify: disable notification by default for all pseudo files
Most pseudo files are not applicable for fsnotify events at all,
let alone to the new pre-content events.
Disable notifications to all files allocated with alloc_file_pseudo()
and enable legacy inotify events for the specific cases of pipe and
socket, which have known users of inotify events.
Pre-content events are also kept disabled for sockets and pipes.
Amir Goldstein [Mon, 3 Feb 2025 22:32:03 +0000 (23:32 +0100)]
fsnotify: use accessor to set FMODE_NONOTIFY_*
The FMODE_NONOTIFY_* bits are a 2-bit mode. Open coding manipulation
of those bits is risky. Use an accessor file_set_fsnotify_mode() to
set the mode.
Rename file_set_fsnotify_mode() => file_set_fsnotify_mode_from_watchers()
to make way for the simple accessor name.
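To illustrate why open coding is risky, here is a small sketch of such an accessor. The DEMO_* bit values and struct demo_file are hypothetical and do not match the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical bit values, chosen only for illustration. */
#define DEMO_NONOTIFY		0x1u
#define DEMO_NONOTIFY_PERM	0x2u
#define DEMO_NOTIFY_MASK	(DEMO_NONOTIFY | DEMO_NONOTIFY_PERM)

struct demo_file {
	unsigned int f_mode;
};

/* Replace the whole 2-bit field at once and leave all other bits alone. */
static void demo_set_fsnotify_mode(struct demo_file *file, unsigned int mode)
{
	assert((mode & ~DEMO_NOTIFY_MASK) == 0);
	file->f_mode &= ~DEMO_NOTIFY_MASK;
	file->f_mode |= mode;
}
```

An open-coded "f_mode |= DEMO_NONOTIFY_PERM" on a file that already has DEMO_NONOTIFY set would leave both bits set, which is a different mode than the one the caller asked for. The accessor clears the field first, so that cannot happen.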
Christian Brauner [Thu, 30 Jan 2025 15:04:42 +0000 (16:04 +0100)]
Merge patch series "further lockref cleanups"
Andreas Gruenbacher <agruenba@redhat.com> says:
Here's an updated version with an additional comment saying that
lockref_init() initializes count to 1.
* patches from https://lore.kernel.org/r/20250130135624.1899988-1-agruenba@redhat.com:
lockref: remove count argument of lockref_init
gfs2: switch to lockref_init(..., 1)
gfs2: use lockref_init for gl_lockref
Andreas Gruenbacher [Thu, 30 Jan 2025 13:56:22 +0000 (14:56 +0100)]
gfs2: switch to lockref_init(..., 1)
In qd_alloc(), initialize the lockref count to 1 to cover the common
case. Compensate for that in gfs2_quota_init() by adjusting the count
back down to 0; this only occurs when mounting the filesystem rw.
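The pattern can be sketched with a toy reference count. All demo_* names are hypothetical; this is not the gfs2 or lockref code, only the shape of "initialize to 1, compensate where 0 is wanted".

```c
#include <assert.h>

/* Toy stand-in for a lockref: only the count matters for this sketch. */
struct demo_lockref {
	int count;
};

/* The init helper takes no count argument and always starts at 1. */
static void demo_lockref_init(struct demo_lockref *ref)
{
	ref->count = 1;
}

struct demo_qd {
	struct demo_lockref ref;
};

/* Common case: the caller that allocates the object holds a reference. */
static void demo_qd_alloc(struct demo_qd *qd)
{
	demo_lockref_init(&qd->ref);
}

/* Init-time path: the object is kept around with no holder, so drop to 0. */
static void demo_quota_init_entry(struct demo_qd *qd)
{
	demo_qd_alloc(qd);
	assert(qd->ref.count == 1);
	qd->ref.count--;
}
```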
Miklos Szeredi [Thu, 30 Jan 2025 12:15:00 +0000 (13:15 +0100)]
statmount: let unset strings be empty
Just like it's normal for unset values to be zero, unset strings should be
empty instead of containing random values.
It seems to be a typical mistake that the mask returned by statmount is not
checked, which can result in various bugs.
With this fix, these bugs are prevented, since it is highly likely that
userspace would just want to turn the missing mask case into an empty
string anyway (most of the recently found cases are of this type).
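One way to get this behaviour is to reserve an empty string at offset 0 of the string area, so that any offset left at zero resolves to "". The sketch below assumes that layout; struct demo_statmount and its fields are hypothetical and much simpler than the real statmount buffer.

```c
#include <assert.h>
#include <string.h>

#define DEMO_HAS_OPTS 0x1u

struct demo_statmount {
	unsigned int mask;	/* which optional fields were filled in */
	unsigned int opts_off;	/* offset of the options string in str[] */
	char str[64];		/* string area; str[0] is kept as "" */
};

static void demo_fill(struct demo_statmount *sm, const char *opts)
{
	const size_t off = 1;	/* offset 0 is reserved for the empty string */

	memset(sm, 0, sizeof(*sm));
	if (opts && *opts) {
		size_t len = strlen(opts) + 1;

		assert(off + len <= sizeof(sm->str));
		memcpy(sm->str + off, opts, len);
		sm->opts_off = (unsigned int)off;
		sm->mask |= DEMO_HAS_OPTS;
	}
}

/* Safe even when the caller forgot to check mask: unset means "". */
static const char *demo_opts(const struct demo_statmount *sm)
{
	return sm->str + sm->opts_off;
}
```

Checking the mask is still the correct thing to do; the reserved empty string only makes the unchecked case harmless.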
Brahmajit Das [Tue, 21 Jan 2025 16:26:48 +0000 (21:56 +0530)]
vboxsf: fix building with GCC 15
Building with GCC 15 results in the following build error:
fs/vboxsf/super.c:24:54: error: initializer-string for array of ‘unsigned char’ is too long [-Werror=unterminated-string-initialization]
24 | static const unsigned char VBSF_MOUNT_SIGNATURE[4] = "\000\377\376\375";
| ^~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
This is due to GCC having enabled -Werror=unterminated-string-initialization[0]
by default. Initialize each array element of VBSF_MOUNT_SIGNATURE
separately so that no string initializer (and no implied NUL terminator)
is involved, thus satisfying GCC 15 and fixing the build error.
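A minimal sketch of the element-wise form, using a hypothetical demo_signature array rather than the vboxsf source. The bytes are written in hex; they are the same values as the octal escapes in the error message above.

```c
#include <assert.h>
#include <string.h>

/*
 * A 4-byte signature with no room for a NUL terminator. Listing the
 * elements one by one avoids the string-literal initializer that
 * -Wunterminated-string-initialization complains about.
 */
static const unsigned char demo_signature[4] = { 0x00, 0xff, 0xfe, 0xfd };

/* Compare exactly sizeof(demo_signature) bytes; no terminator is assumed. */
static int demo_signature_matches(const unsigned char *buf)
{
	return memcmp(buf, demo_signature, sizeof(demo_signature)) == 0;
}
```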
Su Hui [Sun, 19 Jan 2025 02:59:47 +0000 (10:59 +0800)]
fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()
Clang static checker(scan-build) warning:
fs/stat.c:287:21: warning: The left expression of the compound assignment is
an uninitialized value. The computed value will also be garbage.
287 | stat->result_mask |= STATX_MNT_ID_UNIQUE;
| ~~~~~~~~~~~~~~~~~ ^
fs/stat.c:290:21: warning: The left expression of the compound assignment is
an uninitialized value. The computed value will also be garbage.
290 | stat->result_mask |= STATX_MNT_ID;
When vfs_getattr() fails because of security_inode_getattr(), 'stat' is
uninitialized. In this case, there is a harmless garbage value problem in
vfs_statx_path(). It is better to return the error directly when
vfs_getattr() fails, which avoids the garbage value and makes the code clearer.
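The shape of the fix, sketched with hypothetical demo_* stand-ins (the real functions take different arguments): return as soon as the attribute lookup fails, before any "|=" touches the possibly uninitialized mask.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MNT_ID 0x1000u

struct demo_kstat {
	unsigned int result_mask;
};

/* Stand-in for the attribute lookup: on failure it leaves *stat untouched. */
static int demo_getattr(struct demo_kstat *stat, int fail)
{
	if (fail)
		return -EACCES;	/* e.g. denied by a security hook */
	stat->result_mask = 0x7ffu;
	return 0;
}

static int demo_statx_path(struct demo_kstat *stat, int fail)
{
	int error = demo_getattr(stat, fail);

	if (error)
		return error;	/* bail out before reading *stat */

	stat->result_mask |= DEMO_MNT_ID;
	return 0;
}
```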
Before attaching a new root to the old root, the children counter of the
new root is checked to verify that only the upcoming CPU's top group has
been connected to it. However since the recently added commit b729cc1ec21a
("timers/migration: Fix another race between hotplug and idle entry/exit")
this check is not valid anymore because the old root is pre-accounted
as a child to the new root. Therefore after connecting the upcoming
CPU's top group to the new root, the children count to be expected must
be 2 and not 1 anymore.
This omission results in the old root not being connected to the new
root. Then eventually the system may run with more than one top level,
which defeats the purpose of a single idle migrator.
Also the old root is pre-accounted but not connected upon the new root
creation. But it can be connected to the new root later on. Therefore
the old root may be accounted twice to the new root. The propagation of
such overcommit can end up creating a double final top-level root with a
groupmask incorrectly initialized. Although harmless given that the final
top level roots will never have a parent to walk up to, this oddity
opportunistically reported the core issue:
Geert Uytterhoeven [Wed, 5 Feb 2025 14:22:56 +0000 (15:22 +0100)]
genirq: Remove leading space from irq_chip::irq_print_chip() callbacks
The space separator was factored out from the multiple chip name prints,
but several irq_chip::irq_print_chip() callbacks still print a leading
space. Remove the superfluous double spaces.
Dave Airlie [Fri, 7 Feb 2025 04:47:11 +0000 (14:47 +1000)]
Merge tag 'drm-misc-fixes-2025-02-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A couple of fixes for ivpu to error handling, komeda for format
handling, AST DP timeout fix when enabling the output, locking fix for
zynqmp DP support, tiled format handling in drm/client, and refcounting
fix for bochs
Kent Overstreet [Sun, 26 Jan 2025 02:29:45 +0000 (21:29 -0500)]
bcachefs: bch2_bkey_sectors_need_rebalance() now only depends on bch_extent_rebalance
Previously, bch2_bkey_sectors_need_rebalance() called
bch2_target_accepts_data(), checking whether the target is writable.
However, this means that adding or removing devices from a target would
change the value of bch2_bkey_sectors_need_rebalance() for an existing
extent; this needs to be invariant so that the extent trigger can
correctly maintain rebalance_work accounting.
Instead, check target_accepts_data() in io_opts_to_rebalance_opts(),
before creating the bch_extent_rebalance entry.
This fixes (one?) cause of rebalance_work accounting being off.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 27 Jan 2025 06:21:44 +0000 (01:21 -0500)]
bcachefs: Fix discard path journal flushing
The discard path is supposed to issue journal flushes when there are too
many empty buckets that need a journal commit before they can be
written to again, but at some point this code seems to have been lost.
Bring it back with a new optimization to make sure we don't issue too
many journal flushes: the journal now tracks the sequence number of the
most recent flush in progress, which the discard path uses when deciding
which buckets need a journal flush.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Jeongjun Park [Sun, 2 Feb 2025 06:13:51 +0000 (15:13 +0900)]
bcachefs: fix deadlock in journal_entry_open()
In the previous commit b3d82c2f2761, code was added to prevent journal sequence
overflow. Among them, the code added to journal_entry_open() uses the
bch2_fs_fatal_err_on() function to handle errors.
However, __journal_res_get() calls journal_entry_open() while holding
journal->lock, but bch2_fs_fatal_err_on() internally tries to acquire
journal->lock, which results in a deadlock.
So we need to add a locked helper to handle fatal errors even when the
journal->lock is held.
Fixes: b3d82c2f2761 ("bcachefs: Guard against journal seq overflow")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
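The "locked helper" pattern can be sketched as follows. Everything here is a hypothetical model: the lock is a plain flag whose assert() stands in for the point where a real non-recursive lock would deadlock, and the demo_* functions only mirror the call structure described above.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_journal {
	bool lock_held;
	bool halted;
};

static void demo_lock(struct demo_journal *j)
{
	assert(!j->lock_held);	/* a real lock would deadlock here */
	j->lock_held = true;
}

static void demo_unlock(struct demo_journal *j)
{
	assert(j->lock_held);
	j->lock_held = false;
}

/* Locked variant: the caller must already hold the lock. */
static void demo_fatal_error_locked(struct demo_journal *j)
{
	assert(j->lock_held);
	j->halted = true;
}

/* Unlocked variant: takes the lock itself, so it must not be called with it held. */
static void demo_fatal_error(struct demo_journal *j)
{
	demo_lock(j);
	demo_fatal_error_locked(j);
	demo_unlock(j);
}

/* Runs with the lock held, so it has to use the _locked helper. */
static int demo_entry_open(struct demo_journal *j, bool seq_overflow)
{
	assert(j->lock_held);
	if (seq_overflow) {
		demo_fatal_error_locked(j);
		return -1;
	}
	return 0;
}

static int demo_res_get(struct demo_journal *j, bool seq_overflow)
{
	int ret;

	demo_lock(j);
	ret = demo_entry_open(j, seq_overflow);
	demo_unlock(j);
	return ret;
}
```

Calling demo_fatal_error() from demo_entry_open() would trip the assert in demo_lock(), which is the toy equivalent of the deadlock being fixed.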
Jeongjun Park [Fri, 31 Jan 2025 16:20:31 +0000 (01:20 +0900)]
bcachefs: fix incorrect pointer check in __bch2_subvolume_delete()
For some unknown reason, checks on struct bkey_s_c_snapshot and struct
bkey_s_c_snapshot_tree pointers are missing.
Therefore, I think it would be appropriate to fix the incorrect pointer checking
through this patch.
Fixes: 4bd06f07bcb5 ("bcachefs: Fixes for snapshot_tree.master_subvol")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kees Cook [Wed, 5 Feb 2025 21:45:26 +0000 (13:45 -0800)]
string.h: Use ARRAY_SIZE() for memtostr*()/strtomem*()
The destination argument of memtostr*() and strtomem*() must be a
fixed-size char array at compile time, so there is no need to use
__builtin_object_size() (which is useful for when an argument is
either a pointer or unknown). Instead use ARRAY_SIZE(), which has the
benefit of working around a bug in Clang (fixed[1] in 15+) that got
__builtin_object_size() wrong sometimes.
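A small sketch of sizing the destination with an ARRAY_SIZE()-style macro. DEMO_ARRAY_SIZE() and demo_strtomem_pad() are simplified, hypothetical versions written for this example; they are not the kernel's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Element count of a true array; for a char array this is its byte size. */
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Length of src, but never more than max (a local stand-in for strnlen()). */
static size_t demo_bounded_len(const char *src, size_t max)
{
	size_t len = 0;

	while (len < max && src[len] != '\0')
		len++;
	return len;
}

/* Copy src into dest without a terminator and fill the rest with pad. */
static void demo_copy_pad(char *dest, size_t dest_len, const char *src, int pad)
{
	size_t src_len = demo_bounded_len(src, dest_len);

	assert(src_len <= dest_len);
	memcpy(dest, src, src_len);
	memset(dest + src_len, pad, dest_len - src_len);
}

/* The destination size comes from the array type, not from object-size tracking. */
#define demo_strtomem_pad(dest, src, pad) \
	demo_copy_pad((dest), DEMO_ARRAY_SIZE(dest), (src), (pad))

struct demo_record {
	char name[8];	/* fixed-size, not NUL-terminated */
	int id;
};
```

Because the destination is required to be a fixed-size array, its size is known from the type alone, and over-long input is simply truncated to the array size.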
Kees Cook [Wed, 5 Feb 2025 20:48:07 +0000 (12:48 -0800)]
compiler.h: Introduce __must_be_byte_array()
In preparation for adding stricter type checking to the str/mem*()
helpers, provide a way to check that a variable is a byte array
via __must_be_byte_array().
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kees Cook <kees@kernel.org>
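For illustration, a byte-array check of this kind can be built from GCC/Clang builtins as below. The DEMO_* macros are a sketch under that assumption and are not claimed to match the definition added to compiler.h.

```c
#include <assert.h>

/* An array's type is not compatible with the pointer type it decays to. */
#define DEMO_IS_ARRAY(a) \
	(!__builtin_types_compatible_p(__typeof__(a), __typeof__(&(a)[0])))

/* A byte array is an array whose elements are one byte in size. */
#define DEMO_IS_BYTE_ARRAY(a) (DEMO_IS_ARRAY(a) && sizeof((a)[0]) == 1)

/* Compile-time enforcement: the build fails if 'a' is not a byte array. */
#define DEMO_MUST_BE_BYTE_ARRAY(a) \
	_Static_assert(DEMO_IS_BYTE_ARRAY(a), "must be byte array")
```

With these, DEMO_MUST_BE_BYTE_ARRAY(buf) compiles for "char buf[8]" but is rejected at build time for "char *buf" or "int buf[8]".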
Kees Cook [Wed, 5 Feb 2025 20:32:49 +0000 (12:32 -0800)]
compiler.h: Move C string helpers into C-only kernel section
The C kernel helpers for evaluating C strings were positioned where they
were visible to assembly inclusion, which was not intended. Move them
into the kernel and C-only area of the header so future changes won't
confuse the assembler.
Fixes: d7a516c6eeae ("compiler.h: Fix undefined BUILD_BUG_ON_ZERO()")
Fixes: 559048d156ff ("string: Check for "nonstring" attribute on strscpy() arguments")
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Alice Ryhl [Mon, 3 Feb 2025 08:40:57 +0000 (08:40 +0000)]
x86: rust: set rustc-abi=x86-softfloat on rustc>=1.86.0
When using Rust on the x86 architecture, we are currently using the
unstable target.json feature to specify the compilation target. Rustc is
going to change how softfloat is specified in the target.json file on
x86, thus update generate_rust_target.rs to specify softfloat using the
new option.
Note that if you enable this parameter with a compiler that does not
recognize it, then that triggers a warning but it does not break the
build.
[ For future reference, this solves the following error:
RUSTC L rust/core.o
error: Error loading target specification: target feature
`soft-float` is incompatible with the ABI but gets enabled in
target spec. Run `rustc --print target-list` for a list of
built-in targets
Eyal Birger [Sun, 2 Feb 2025 16:29:21 +0000 (08:29 -0800)]
selftests/seccomp: validate uretprobe syscall passes through seccomp
The uretprobe syscall is implemented as a performance enhancement on
x86_64 by having the kernel inject a call to it on function exit; user
programs cannot call this system call explicitly.
As such, this syscall is considered a kernel implementation detail and
should not be filtered by seccomp.
Enhance the seccomp bpf test suite to check that uretprobes can be
attached to processes without killing the process regardless of
seccomp policy.
Eyal Birger [Sun, 2 Feb 2025 16:29:20 +0000 (08:29 -0800)]
seccomp: passthrough uretprobe systemcall without filtering
When attaching uretprobes to processes running inside docker, the attached
process is segfaulted when encountering the retprobe.
The reason is that now that uretprobe is a system call the default seccomp
filters in docker block it as they only allow a specific set of known
syscalls. This is also true for other userspace applications which use seccomp
to control their syscall surface.
Since uretprobe is a "kernel implementation detail" system call which is
not used by userspace application code directly, it is impractical and
there's very little point in forcing all userspace applications to
explicitly allow it in order to avoid crashing tracked processes.
Pass this system call through seccomp without depending on configuration.
Note: uretprobe is currently only x86_64 and isn't expected to ever be
supported in i386.
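The idea can be modelled in a few lines of userspace C. This is a toy allow-list, not the seccomp implementation: the demo_* names are hypothetical, and DEMO_NR_URETPROBE is a placeholder constant for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder syscall number for this sketch. */
#define DEMO_NR_URETPROBE 335

enum demo_action { DEMO_ALLOW, DEMO_KILL };

/* A toy allow-list filter, like the ones container runtimes install. */
struct demo_filter {
	const int *allowed;
	int n_allowed;
};

static enum demo_action demo_run_filter(const struct demo_filter *f, int nr)
{
	for (int i = 0; i < f->n_allowed; i++)
		if (f->allowed[i] == nr)
			return DEMO_ALLOW;
	return DEMO_KILL;
}

/* Entry point: the kernel-internal syscall bypasses the user filter entirely. */
static enum demo_action demo_secure_computing(const struct demo_filter *f, int nr)
{
	if (nr == DEMO_NR_URETPROBE)
		return DEMO_ALLOW;
	return demo_run_filter(f, nr);
}
```

With the bypass in place, a filter that has never heard of the syscall no longer kills the traced process, while every other syscall is still subject to the filter.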
Linus Torvalds [Thu, 6 Feb 2025 20:32:03 +0000 (12:32 -0800)]
Merge tag 'pci-v6.14-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- When saving a device's state, always save the upstream bridge's PM L1
Substates configuration as well because the bridge never saves its
own state, and restoring a device needs the state for both ends; this
was a regression that caused link and power management errors after
suspend/resume (Ilpo Järvinen)
- Correct TPH Control Register write, where we wrote the ST Mode where
the TPH Requester Enable value was intended (Robin Murphy)
* tag 'pci-v6.14-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/TPH: Restore TPH Requester Enable correctly
PCI/ASPM: Fix L1SS saving
Linus Torvalds [Thu, 6 Feb 2025 20:25:35 +0000 (12:25 -0800)]
Merge tag 'for-linus-6.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Three fixes for xen_hypercall_hvm() that was introduced in the 6.13
cycle"
* tag 'for-linus-6.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: remove unneeded dummy push from xen_hypercall_hvm()
x86/xen: add FRAME_END to xen_hypercall_hvm()
x86/xen: fix xen_hypercall_hvm() to not clobber %rbx
Rafael J. Wysocki [Thu, 6 Feb 2025 19:39:43 +0000 (20:39 +0100)]
Merge tag 'amd-pstate-v6.14-2025-02-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge amd-pstate driver fixes for 6.14-rc2 from Mario Limonciello:
"* Fix some error cleanup paths with mutex use and boost
* Fix a ref counting issue
* Fix a schedutil issue"
* tag 'amd-pstate-v6.14-2025-02-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Fix cpufreq_policy ref counting
cpufreq/amd-pstate: Fix max_perf updation with schedutil
cpufreq/amd-pstate: Remove the goto label in amd_pstate_update_limits
cpufreq/amd-pstate: Fix per-policy boost flag incorrect when fail
Kees Cook [Tue, 4 Feb 2025 17:45:13 +0000 (09:45 -0800)]
stackinit: Keep selftest union size small on m68k
The stack frame on m68k is very sensitive to the size of what needs to
be stored. As was done for long string testing, reduce the size of the
large trailing struct in the union initialization testing.