gcc-15: disable '-Wunterminated-string-initialization' entirely for now
I had left the warning around but as a non-fatal error to get my gcc-15
builds going, but fixed up some of the most annoying warning cases so
that it wouldn't be *too* verbose.
Because I like the _concept_ of the warning, even if I detested the
implementation to shut it up.
It turns out the implementation to shut it up is even more broken than I
thought, and my "shut up most of the warnings" patch just caused fatal
errors on gcc-14 instead.
I had tested with clang, but when I upgrade my development environment,
I try to do it on all machines because I hate having different systems
to maintain, and hadn't realized that gcc-14 now had issues.
The ACPI case is literally why I wanted to have a *type* that doesn't
trigger the warning (see commit d5d45a7f2619: "gcc-15: make
'unterminated string initialization' just a warning"), instead of
marking individual places as "__nonstring".
But gcc-14 doesn't like that __nonstring location that shut gcc-15 up,
because it's on an array of char arrays, not on one single array:
drivers/acpi/tables.c:399:1: error: 'nonstring' attribute ignored on objects of type 'const char[][4]' [-Werror=attributes]
399 | static const char table_sigs[][ACPI_NAMESEG_SIZE] __initconst __nonstring = {
| ^~~~~~
and my attempts to nest it properly with a type had failed, because of
how gcc doesn't like marking the types as having attributes, only
symbols.
There may be some trick to it, but I was already annoyed by the bad
attribute design, now I'm just entirely fed up with it.
I wish gcc had a proper way to say "this type is a *byte* array, not a
string".
The obvious thing would be to distinguish between "char []" and an
explicitly signed "unsigned char []" (as opposed to an implicitly
unsigned char, which is typically an architecture-specific default, but
for the kernel is universal thanks to '-funsigned-char').
But any "we can typedef a 8-bit type to not become a string just because
it's an array" model would be fine.
But "__attribute__((nonstring))" is sadly not that sane model.
Reported-by: Chris Clayton <chris2553@googlemail.com> Fixes: 4b4bd8c50f48 ("gcc-15: acpi: sprinkle random '__nonstring' crumbles around") Fixes: d5d45a7f2619 ("gcc-15: make 'unterminated string initialization' just a warning") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The C sequence points are complicated things, and gcc-15 has apparently
added a warning for the case where an object is both used and modified
multiple times within the same sequence point.
That's a great warning.
Or rather, it would be a great warning, except gcc-15 seems to not
really be very exact about it, and doesn't notice that the modification
are to two entirely different members of the same object: the array
counter and the array entries.
So that seems kind of silly.
That said, the code that gcc complains about is unnecessarily
complicated, so moving the array counter update into a separate
statement seems like the most straightforward fix for these warnings:
drivers/net/wireless/intel/iwlwifi/mld/d3.c: In function ‘iwl_mld_set_netdetect_info’:
drivers/net/wireless/intel/iwlwifi/mld/d3.c:1102:66: error: operation on ‘netdetect_info->n_matches’ may be undefined [-Werror=sequence-point]
1102 | netdetect_info->matches[netdetect_info->n_matches++] = match;
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~
drivers/net/wireless/intel/iwlwifi/mld/d3.c:1120:58: error: operation on ‘match->n_channels’ may be undefined [-Werror=sequence-point]
1120 | match->channels[match->n_channels++] =
| ~~~~~~~~~~~~~~~~~^~
side note: the code at that second warning is actively buggy, and only
works on little-endian machines that don't do strict alignment checks.
The code casts an array of integers into an array of unsigned long in
order to use our bitmap iterators. That happens to work fine on any
sane architecture, but it's still wrong.
This does *not* fix that more serious problem. This only splits the two
assignments into two statements and fixes the compiler warning. I need
to get rid of the new warnings in order to be able to actually do any
build testing.
All of these cases are perfectly valid and good traditional C, but hit
by the "you're not NUL-terminating your byte array" warning.
And none of the cases want any terminating NUL character.
Mark them __nonstring to shut up gcc-15 (and in the case of the ak8974
magnetometer driver, I just removed the explicit array size and let gcc
expand the 3-byte and 6-byte arrays by one extra byte, because it was
the simpler change).
gcc-15: get rid of misc extra NUL character padding
This removes two cases of explicit NUL padding that now causes warnings
because of '-Wunterminated-string-initialization' being part of -Wextra
in gcc-15.
Gcc is being silly in this case when it says that it truncates a NUL
terminator, because in these cases there were _multiple_ NUL characters.
But we can get rid of the warning by just simplifying the two
initializers that trigger the warning for me, so this does exactly that.
I'm not sure why the power supply code did that odd
.attr_name = #_name "\0",
pattern: it was introduced in commit 2cabeaf15129 ("power: supply: core:
Cleanup power supply sysfs attribute list"), but that 'attr_name[]'
field is an explicitly sized character array in a statically initialized
variable, and a string initializer always has a terminating NUL _and_
statically initialized character arrays are zero-padded anyway, so it
really seems to be rather extraneous belt-and-suspenders.
The zero_uuid[16] initialization in drivers/md/bcache/super.c makes
perfect sense, but it isn't necessary for the same reasons, and not
worth the new gcc warning noise.
gcc-15: acpi: sprinkle random '__nonstring' crumbles around
This is not great: I'd much rather introduce a typedef that is a "ACPI
name byte buffer", and use that to mark these special 4-byte ACPI names
that do not use NUL termination.
But as noted in the previous commit ("gcc-15: make 'unterminated string
initialization' just a warning") gcc doesn't actually seem to support
that notion, so instead you have to just mark every single array
declaration individually.
So this is not pretty, but this gets rid of the bulk of the annoying
warnings during an allmodconfig build for me.
and we use this all over the kernel. And the warning is fine, but gcc
developers apparently never made a reasonable way to disable it. As is
(sadly) tradition with these things.
Yes, there's "__attribute__((nonstring))", and we have a macro to make
that absolutely disgusting syntax more palatable (ie the kernel syntax
for that monstrosity is just "__nonstring").
But that attribute is misdesigned. What you'd typically want to do is
tell the compiler that you are using a type that isn't a string but a
byte array, but that doesn't work at all:
warning: ‘nonstring’ attribute does not apply to types [-Wattributes]
and because of this fundamental mis-design, you then have to mark each
instance of that pattern.
This is particularly noticeable in our ACPI code, because ACPI has this
notion of a 4-byte "type name" that gets used all over, and is exactly
this kind of byte array.
This is a sad oversight, because the warning is useful, but really would
be so much better if gcc had also given a sane way to indicate that we
really just want a byte array type at a type level, not the broken "each
and every array definition" level.
So now instead of creating a nice "ACPI name" type using something like
typedef char acpi_name_t[4] __nonstring;
we have to do things like
char name[ACPI_NAMESEG_SIZE] __nonstring;
in every place that uses this concept and then happens to have the
typical initializers.
This is annoying me mainly because I think the warning _is_ a good
warning, which is why I'm not just turning it off in disgust. But it is
hampered by this bad implementation detail.
[ And obviously I'm doing this now because system upgrades for me are
something that happen in the middle of the release cycle: don't do it
before or during travel, or just before or during the busy merge
window period. ]
Merge tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"16 hotfixes. 2 are cc:stable and the remainder address post-6.14
issues or aren't considered necessary for -stable kernels.
All patches are basically for MM although five are alterations to
MAINTAINERS"
[ Basic counting skills are clearly not a strictly necessary requirement
for kernel maintainers. - Linus ]
* tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: add section for locking of mm's and VMAs
mm: vmscan: fix kswapd exit condition in defrag_mode
mm: vmscan: restore high-cpu watermark safety in kswapd
MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section
mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization
mm, hugetlb: increment the number of pages to be reset on HVO
writeback: fix false warning in inode_to_wb()
docs: ABI: replace mcroce@microsoft.com with new Meta address
mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
MAINTAINERS: add memory advice section
MAINTAINERS: add mmap trace events to MEMORY MAPPING
mm: memcontrol: fix swap counter leak from offline cgroup
MAINTAINERS: add MM subsection for the page allocator
MAINTAINERS: update SLAB ALLOCATOR maintainers
fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages
mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()
Merge tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Revert the hfs{plus} deprecation warning that's also included in this
pull request. The commit introducing the deprecation warning resides
rather early in this branch. So simply dropping it would've rebased
all other commits which I decided to avoid. Hence the revert in the
same branch
[ Background - the deprecation warning discussion resulted in people
stepping up, and so hfs{plus} will have a maintainer taking care of
it after all.. - Linus ]
- Switch CONFIG_SYSFS_SYCALL default to n and decouple from
CONFIG_EXPERT
- Fix an audit bug caused by changes to our kernel path lookup helpers
this cycle. Audit needs the parent path even if the dentry it tried
to look up is negative
- Ensure that the kernel path lookup helpers leave the passed in path
argument clean when they return an error. This is consistent with all
our other helpers
- Ensure that vfs_getattr_nosec() calls bdev_statx() so the relevant
information is available to kernel consumers as well
- Don't set a timer and call schedule() if the timer will expire
immediately in epoll
- Make netfs lookup tables with __nonstring
* tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Revert "hfs{plus}: add deprecation warning"
fs: move the bdex_statx call to vfs_getattr_nosec
netfs: Mark __nonstring lookup tables
eventpoll: Set epoll timeout if it's in the future
fs: ensure that *path_locked*() helpers leave passed path pristine
fs: add kern_path_locked_negative()
hfs{plus}: add deprecation warning
Kconfig: switch CONFIG_SYSFS_SYCALL default to n
* tag 'i2c-for-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: atr: Fix wrong include
i2c: cros-ec-tunnel: defer probe if parent EC is not present
Merge tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Initialize hash variables in ftrace subops logic
The fix that simplified the ftrace subops logic opened a path where
some variables could be used without being initialized, and done
subtly where the compiler did not catch it. Initialize those
variables to the EMPTY_HASH, which is the default hash.
- Reinitialize the hash pointers after they are freed
Some of the hash pointers in the subop logic were freed but may still
be referenced later. To prevent use-after-free bugs, initialize them
back to the EMPTY_HASH.
- Free the ftrace hashes when they are replaced
The fix that simplified the subops logic updated some hash pointers,
but left the original hash that they were pointing to where they are
no longer used. This caused a memory leak. Free the hashes that are
pointed to by the pointers when they are replaced.
- Fix size initialization of ftrace direct function hash
The ftrace direct function hash used by BPF initialized the hash size
incorrectly. It checked the size of items to a hard coded 32, which
made the hash bit size of 5. The hash size is supposed to be limited
by the bit size of the hash, as the bitmask is allowed to be greater
than 5. Rework the size check to first pass the number of elements to
fls() and then compare that to FTRACE_HASH_MAX_BITS before allocating
the hash.
- Fix format output of ftrace_graph_ent_entry event
The field depth of the ftrace_graph_ent_entry event is of size 4 but
the output showed it as unsigned long and use "%lu". Change it to
unsigned int and use "%u" in the print format that is displayed to
user space.
- Fix the trace event filter on strings
Events can be filtered on numbers or string values. The return value
checked from strncpy_from_kernel_nofault() and
strncpy_from_user_nofault() was used to determine if reading the
strings would fault or not. It would return fault if the value was
non zero, which is basically meant that it was always considering the
read as a fault.
- Add selftest to test trace event string filtering
In order to catch the breakage of the string filtering, add a self
test to make sure that it continues to work.
* tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: selftests: Add testing a user string to filters
tracing: Fix filter string testing
ftrace: Fix type of ftrace_graph_ent_entry.depth
ftrace: fix incorrect hash size in register_ftrace_direct()
ftrace: Free ftrace hashes after they are replaced in the subops code
ftrace: Reinitialize hash to EMPTY_HASH after freeing
ftrace: Initialize variables for ftrace_startup/shutdown_subops()
Merge tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- v6.15 libcrc clean-up makes invalid configurations possible
- Fix a potential deadlock introduced during the v6.15 merge window
* tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: decrease sc_count directly if fail to queue dl_recall
nfs: add missing selections of CONFIG_CRC32
Merge tag 'rust-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix missing KASAN LLVM flags on first build (and fix spurious
rebuilds) by skipping '--target'
- Fix Make < 4.3 build error by using '$(pound)'
- Fix UML build error by removing 'volatile' qualifier from io
helpers
- Fix UML build error by adding 'dma_{alloc,free}_attrs()' helpers
- Clean gendwarfksyms warnings by avoiding to export '__pfx' symbols
- Clean objtool warning by adding a new 'noreturn' function for
1.86.0
- Disable 'needless_continue' Clippy lint due to new 1.86.0 warnings
- Add missing 'ffi' crate to 'generate_rust_analyzer.py'
'pin-init' crate:
- Import a couple fixes from upstream"
* tag 'rust-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: helpers: Add dma_alloc_attrs() and dma_free_attrs()
rust: helpers: Remove volatile qualifier from io helpers
rust: kbuild: use `pound` to support GNU Make < 4.3
objtool/rust: add one more `noreturn` Rust function for Rust 1.86.0
rust: kasan/kbuild: fix missing flags on first build
rust: disable `clippy::needless_continue`
rust: kbuild: Don't export __pfx symbols
rust: pin-init: use Markdown autolinks in Rust comments
rust: pin-init: alloc: restrict `impl ZeroableOption` for `Box` to `T: Sized`
scripts: generate_rust_analyzer: Add ffi crate
Merge tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Easter rc3 pull request, fixes in all the usuals, amdgpu, xe, msm,
with some i915/ivpu/mgag200/v3d fixes, then a couple of bits in
dma-buf/gem.
Hopefully has no easter eggs in it.
dma-buf:
- Correctly decrement refcounter on errors
i915:
- Fix DP DSC configurations that require 3 DSC engines per pipe
xe:
- Fix LRC address being written too late for GuC
- Fix notifier vs folio deadlock
- Fix race betwen dma_buf unmap and vram eviction
- Fix debugfs handling PXP terminations unconditionally
msm:
- Display:
- Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in
case of multi-rect
- Fix to validate plane_state pointer before using it in
dpu_plane_virtual_atomic_check()
- Fix to make sure dereferencing dpu_encoder_phys happens after
making sure it is valid in _dpu_encoder_trigger_start()
- Remove the remaining intr_tear_rd_ptr which we initialized to
-1 because NO_IRQ indices start from 0 now
- GPU:
- Fix IB_SIZE overflow
ivpu:
- Fix debugging
- Fixes to frequency
- Support firmware API 3.28.3
- Flush jobs upon reset
mgag200:
- Set vblank start to correct values
v3d:
- Fix Indirect Dispatch"
* tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel: (26 commits)
drm/msm/a6xx+: Don't let IB_SIZE overflow
drm/xe/pxp: do not queue unneeded terminations from debugfs
drm/xe/dma_buf: stop relying on placement in unmap
drm/xe/userptr: fix notifier vs folio deadlock
drm/xe: Set LRC addresses before guc load
drm/mgag200: Fix value in <VBLKSTR> register
drm/gem: Internally test import_attach for imported objects
drm/amdgpu: Use the right function for hdp flush
drm/amd/display/dml2: use vzalloc rather than kzalloc
drm/amdgpu: Add back JPEG to video caps for carrizo and newer
drm/amdgpu: fix warning of drm_mm_clean
drm/amd: Forbid suspending into non-default suspend states
drm/amdgpu: use a dummy owner for sysfs triggered cleaner shaders v4
drm/i915/dp: Check for HAS_DSC_3ENGINES while configuring DSC slices
drm/i915/display: Add macro for checking 3 DSC engines
dma-buf/sw_sync: Decrement refcount on error in sw_sync_ioctl_get_deadline()
accel/ivpu: Add cmdq_id to job related logs
accel/ivpu: Show NPU frequency in sysfs
accel/ivpu: Fix the NPU's DPU frequency calculation
accel/ivpu: Update FW Boot API to version 3.28.3
...
Dave Airlie [Sat, 19 Apr 2025 05:09:29 +0000 (15:09 +1000)]
Merge tag 'drm-msm-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
Fixes for v6.15-rc3
Display:
- Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in
case of multi-rect
- Fix to validate plane_state pointer before using it in
dpu_plane_virtual_atomic_check()
- Fix to make sure dereferencing dpu_encoder_phys happens after
making sure it is valid in _dpu_encoder_trigger_start()
- Remove the remaining intr_tear_rd_ptr which we initialized
to -1 because NO_IRQ indices start from 0 now
Dave Airlie [Sat, 19 Apr 2025 04:59:47 +0000 (14:59 +1000)]
Merge tag 'drm-xe-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix LRC address being written too late for GuC
- Fix notifier vs folio deadlock
- Fix race betwen dma_buf unmap and vram eviction
- Fix debugfs handling PXP terminations unconditionally
Merge tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix hard link lease key problem when close is deferred
- Revert the socket lockdep/refcount workarounds done in cifs.ko now
that it is fixed at the socket layer
* tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
Revert "smb: client: fix TCP timers deadlock after rmmod"
Revert "smb: client: Fix netns refcount imbalance causing leaks and use-after-free"
smb3 client: fix open hardlink on deferred close file error
Rob Clark [Mon, 17 Mar 2025 15:00:06 +0000 (08:00 -0700)]
drm/msm/a6xx+: Don't let IB_SIZE overflow
IB_SIZE is only b0..b19. Starting with a6xx gen3, additional fields
were added above the IB_SIZE. Accidentially setting them can cause
badness. Fix this by properly defining the CP_INDIRECT_BUFFER packet
and using the generated builder macro to ensure unintended bits are not
set.
v2: add missing type attribute for IB_BASE
v3: fix offset attribute in xml
Reported-by: Connor Abbott <cwabbott0@gmail.com> Fixes: a83366ef19ea ("drm/msm/a6xx: add A640/A650 to gpulist") Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/643396/
Merge tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix hypercall detection on Xen guests
- Extend the AMD microcode loader SHA check to Zen5, to block loading
of any unreleased standalone Zen5 microcode patches
- Add new Intel CPU model number for Bartlett Lake
- Fix the workaround for AMD erratum 1054
- Fix buggy early memory acceptance between SEV-SNP guests and the EFI
stub
* tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/sev: Avoid shared GHCB page for early memory acceptance
x86/cpu/amd: Fix workaround for erratum 1054
x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores
x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches
x86/xen: Fix __xen_hypercall_setfunc()
Merge tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix a lockdep false positive in the i8253 driver"
* tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/i8253: Call clockevent_i8253_disable() with interrupts disabled
Merge tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event fixes from Ingo Molnar:
"Miscellaneous fixes and a hardware-enabling change:
- Fix Intel uncore PMU IIO free running counters on SPR, ICX and SNR
systems
- Fix Intel PEBS buffer overflow handling
- Fix skid in Intel PEBS sampling of user-space general purpose
registers
- Enable Panther Lake PMU support - similar to Lunar Lake"
* tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add Panther Lake support
perf/x86/intel: Allow to update user space GPRs from PEBS records
perf/x86/intel: Don't clear perf metrics overflow bit unconditionally
perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR
perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX
perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR
Merge tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc irq fixes from Ingo Molnar:
- Fix BCM2712 irqchip driver Kconfig dependencies required on the
Raspberry PI5
- Fix spurious interrupts on RZ/G3E SMARC EVK systems
- Fix crash regression on Sun/NIU hardware
- Apply MSI driver quirk for Sun Neptune chips
* tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-bcm2712-mip: Enable driver when ARCH_BCM2835 is enabled
irqchip/renesas-rzv2h: Prevent TINT spurious interrupt
net/niu: Niu requires MSIX ENTRY_DATA fields touch before entry reads
PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads
Merge tag 'core-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:
"Fix a genksyms related bug, triggered by recent changes to the percpu
code, and update the .clang-format file to not include obsolete
function names"
* tag 'core-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genksyms: Handle typeof_unqual keyword and __seg_{fs,gs} qualifiers
clang-format: Update the ForEachMacros list for v6.15-rc1
Merge tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- lib/prime_numbers: KUnit test should not select PRIME_NUMBERS (Geert
Uytterhoeven)
- ubsan: Fix panic from test_ubsan_out_of_bounds (Mostafa Saleh)
- ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP (Nathan
Chancellor)
- string: Add load_unaligned_zeropad() code path to sized_strscpy()
(Peter Collingbourne)
- kasan: Add strscpy() test to trigger tag fault on arm64 (Vincenzo
Frascino)
- Disable GCC randstruct for COMPILE_TEST
* tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib/prime_numbers: KUnit test should not select PRIME_NUMBERS
ubsan: Fix panic from test_ubsan_out_of_bounds
lib/Kconfig.ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP
hardening: Disable GCC randstruct for COMPILE_TEST
kasan: Add strscpy() test to trigger tag fault on arm64
string: Add load_unaligned_zeropad() code path to sized_strscpy()
Merge tag 'gpio-fixes-for-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fix from Bartosz Golaszewski:
- check for both the new AND old (deprecated) setter callback when
changing GPIO direction to output
* tag 'gpio-fixes-for-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Allow to use setters with return value for output-only gpios
Merge tag 'thermal-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"Add missing DVFS support flags for the Lunar Lake and Panther Lake
platforms to the int340x Intel thermal driver and fix DLVR support
for Panther Lake in it (Srinivas Pandruvada)"
* tag 'thermal-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: int340x: Fix Panther Lake DLVR support
thermal: intel: int340x: Add missing DVFS support flags
Merge tag 'pm-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These are mostly cpufreq fixes, some of which address recent
regressions and some address older issues that have come to light
during the last two weeks, and a runtime PM documentation correction:
- Fix the performance-to-frequency scaling factor computation on
systems using HWP in the intel_pstate driver after a recent
incorrect update of it (Rafael Wysocki)
- Fix the usage of the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag
in the schedutil cpufreq governor after a recent update of it that
has caused frequency limits changes to be missed sometimes (Rafael
Wysocki)
- Address some recently discovered synchronization issues related to
frequency limits changes in the schedutil cpufreq governor and in
the cpufreq core (Rafael Wysocki)
- Fix ITMT support in the amd-pstate cpufreq driver so that it is
enabled after asym priorities have been correctly initialized for
all CPUs (K Prateek Nayak)
- Fix changing min/max limits in the amd-pstate cpufreq driver while
on the performance governor (Dhananjay Ugwekar)
- Fix a function name in the runtime PM documentation that was
previously incorrectly updated by mistake (Sakari Ailus)"
* tag 'pm-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid using inconsistent policy->min and policy->max
cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
cpufreq/sched: Explicitly synchronize limits_changed flag handling
cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend()
cpufreq: intel_pstate: Fix hwp_get_cpu_scaling()
cpufreq/amd-pstate: Enable ITMT support after initializing core rankings
cpufreq/amd-pstate: Fix min_limit perf and freq updation for performance governor
Merge tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for an issue where C instructions ended up in non-C builds, due
to some broken inline assembly in the KGDB breakpoint insertion code
- A fix to avoid spurious printk messages about misaligned access
performance probing
- A fix for a handful of issues with /proc/iomem's reserved region
handling
- A pair of fixes for module relocation processing
- A few build-time fixes
* tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: KGDB: Remove ".option norvc/.option rvc" for kgdb_compiled_break
riscv: KGDB: Do not inline arch_kgdb_breakpoint()
riscv: Avoid fortify warning in syscall_get_arguments()
riscv: Provide all alternative macros all the time
riscv: module: Allocate PLT entries for R_RISCV_PLT32
riscv: module: Fix out-of-bounds relocation access
riscv: Properly export reserved regions in /proc/iomem
riscv: Fix unaligned access info messages
riscv: Avoid fortify warning in syscall_get_arguments()
Documentation: riscv: Fix typo MIMPLID -> MIMPID
riscv: Use kvmalloc_array on relocation_hashtable
Merge tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"Fixes dynevent_limitations.tc test failure on dash by detecting and
handling bash and dash differences in evaluating \\"
* tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Differentiate bash and dash in dynevent_limitations.tc
Merge tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix integer overflow in server disconnect deadtime calculation
- Three fixes for potential use after frees: one for oplocks, and one
for leases and one for kerberos authentication
- Fix to prevent attempted write to directory
- Fix locking warning for durable scavenger thread
* tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Prevent integer overflow in calculation of deadtime
ksmbd: fix the warning from __kernel_write_iter
ksmbd: fix use-after-free in smb_break_all_levII_oplock()
ksmbd: fix use-after-free in __smb2_lease_break_noti()
ksmbd: fix WARNING "do not call blocking ops when !TASK_RUNNING"
ksmbd: Fix dangling pointer in krb_authenticate
Merge tag 'block-6.15-20250417' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- MD pull via Yu:
- fix raid10 missing discard IO accounting (Yu Kuai)
- fix bitmap stats for bitmap file (Zheng Qixing)
- fix oops while reading all member disks failed during
check/repair (Meir Elisha)
- NVMe pull via Christoph:
- fix scan failure for non-ANA multipath controllers (Hannes
Reinecke)
- fix multipath sysfs links creation for some cases (Hannes
Reinecke)
- PCIe endpoint fixes (Damien Le Moal)
- use NULL instead of 0 in the auth code (Damien Le Moal)
- Various ublk fixes:
- Slew of selftest additions
- Improvements and fixes for IO cancelation
- Tweak to Kconfig verbiage
- Fix for page dirtying for blk integrity mapped pages
* tag 'block-6.15-20250417' of git://git.kernel.dk/linux: (38 commits)
selftests: ublk: add generic_06 for covering fault inject
ublk: simplify aborting ublk request
ublk: remove __ublk_quiesce_dev()
ublk: improve detection and handling of ublk server exit
ublk: move device reset into ublk_ch_release()
ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io
ublk: add ublk_force_abort_dev()
ublk: properly serialize all FETCH_REQs
selftests: ublk: move creating UBLK_TMP into _prep_test()
selftests: ublk: add test_stress_05.sh
selftests: ublk: support user recovery
selftests: ublk: support target specific command line
selftests: ublk: increase max nr_queues and queue depth
selftests: ublk: set queue pthread's cpu affinity
selftests: ublk: setup ring with IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN
selftests: ublk: add two stress tests for zero copy feature
selftests: ublk: run stress tests in parallel
selftests: ublk: make sure _add_ublk_dev can return in sub-shell
selftests: ublk: cleanup backfile automatically
selftests: ublk: add io_uring uapi header
...
Merge tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Correctly cap iov_iter->nr_segs for imports of registered buffers,
both kbuf and normal ones.
Three cleanups to make it saner first, then two fixes for each of the
buffer types.
This fixes a performance regression where partial buffer usage
doesn't trim the tail number of segments, leading the block layer to
iterate the IOs to check if it needs splitting.
- Two patches tweaking the newly introduced zero-copy rx API, mostly to
keep the API consistent once we add multiple interface queues per
ring support in the 6.16 release.
- zc rx unmapping fix for a dead device
* tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux:
io_uring/zcrx: fix late dma unmap for a dead dev
io_uring/rsrc: ensure segments counts are correct on kbuf buffers
io_uring/rsrc: send exact nr_segs for fixed buffer
io_uring/rsrc: refactor io_import_fixed
io_uring/rsrc: separate kbuf offset adjustments
io_uring/rsrc: don't skip offset calculation
io_uring/zcrx: add pp to ifq conversion helper
io_uring/zcrx: return ifq id to the user
x86/boot/sev: Avoid shared GHCB page for early memory acceptance
Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.
So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.
For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.
Sandipan Das [Fri, 18 Apr 2025 06:19:40 +0000 (11:49 +0530)]
x86/cpu/amd: Fix workaround for erratum 1054
Erratum 1054 affects AMD Zen processors that are a part of Family 17h
Models 00-2Fh and the workaround is to not set HWCR[IRPerfEn]. However,
when X86_FEATURE_ZEN1 was introduced, the condition to detect unaffected
processors was incorrectly changed in a way that the IRPerfEn bit gets
set only for unaffected Zen 1 processors.
Ensure that HWCR[IRPerfEn] is set for all unaffected processors. This
includes a subset of Zen 1 (Family 17h Models 30h and above) and all
later processors. Also clear X86_FEATURE_IRPERF on affected processors
so that the IRPerfCount register is not used by other entities like the
MSR PMU driver.
Pavel Begunkov [Fri, 18 Apr 2025 12:02:27 +0000 (13:02 +0100)]
io_uring/zcrx: fix late dma unmap for a dead dev
There is a problem with page pools not dma-unmapping immediately when
the device is going down, and delaying it until the page pool is
destroyed, which is not allowed (see links). That just got fixed for
normal page pools, and we need to address memory providers as well.
Unmap pages in the memory provider uninstall callback, and protect it
with a new lock. There is also a gap between when a dma mapping is
created and the mp is installed, so if the device is killed in between,
io_uring would be holding on to dma mappings to a dead device with no
one to call ->uninstall. Move it to page pool init and rely on
->is_mapped to make sure it's only done once.
Lorenzo Stoakes [Wed, 16 Apr 2025 10:38:37 +0000 (11:38 +0100)]
MAINTAINERS: add section for locking of mm's and VMAs
We place this under memory mapping as related to memory mapping
abstractions in the form of mm_struct and vm_area_struct (VMA). Now we
have separated out mmap/vma locking logic into the mmap_lock.c and
mmap_lock.h files, so this should encapsulate the majority of the mm
locking logic in the kernel.
Suren is best placed to maintain this logic as the core architect of VMA
locking as a whole.
Link: https://lkml.kernel.org/r/e6ed679a184ca444b20dfa77af96913fd8b5efa0.1744799282.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Weiner [Wed, 16 Apr 2025 13:45:40 +0000 (09:45 -0400)]
mm: vmscan: fix kswapd exit condition in defrag_mode
Vlastimil points out an issue with kswapd in defrag_mode not waking up
kcompactd reliably.
Background: When kswapd is woken for any higher-order request, it
initially checks those high-order watermarks to decide if work is
necesary. However, it cannot (efficiently) meet the contiguity goal of
such a request by itself. So once it has reclaimed a compaction gap, it
adjusts the request down to check for free order-0 pages, then wakes
kcompactd to coalesce them into larger blocks.
In defrag_mode, the initial watermark check needs to be analogously
against free pageblocks. However, once kswapd drops the high-order to
hand off contiguity work, it also needs to fall back to base page
watermarks - otherwise it'll keep reclaiming until blocks are freed.
While it appears kcompactd is woken up frequently enough to do most of the
compaction work, kswapd ends up overreclaiming by quite a bit:
Johannes Weiner [Wed, 16 Apr 2025 13:45:39 +0000 (09:45 -0400)]
mm: vmscan: restore high-cpu watermark safety in kswapd
Vlastimil points out that commit a211c6550efc ("mm: page_alloc:
defrag_mode kswapd/kcompactd watermarks") switched kswapd from
zone_watermark_ok_safe() to the standard, percpu-cached version of reading
free pages, thus dropping the watermark safety precautions for systems
with high CPU counts (e.g. >212 cpus on 64G). Restore them.
Since zone_watermark_ok_safe() is no longer the right interface, and this
was the last caller of the function anyway, open-code the
zone_page_state_snapshot() conditional and delete the function.
Lorenzo Stoakes [Wed, 16 Apr 2025 13:53:01 +0000 (14:53 +0100)]
MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section
Pedro has offered to review memory mapping code. He has good experience
in this area and has provided excellent feedback on memory mapping series
in the past so I feel he'll be a great addition.
Link: https://lkml.kernel.org/r/20250416135301.43513-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Tue, 15 Apr 2025 09:50:07 +0000 (11:50 +0200)]
mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization
In __folio_remove_rmap() for RMAP_LEVEL_PMD/RMAP_LEVEL_PUD and with
CONFIG_PAGE_MAPCOUNT we first decrement the folio mapcount (and recompute
mapped shared vs. mapped exclusively) to then adjust the entire mapcount.
This means that another process might stumble in do_wp_page() over a
PTE-mapped PMD folio that is indicated as "exclusively mapped", but still
has an entire mapcount (PMD mapping), because it is racing with the
process that is unmapping the folio (PMD mapping). Note that do_wp_page()
will back off once it detects the remaining folio reference from the
process that is in the process of unmapping the folio.
This will trigger the early VM_WARN_ON_ONCE(folio_entire_mapcount(folio))
check in do_wp_page(), that can easily be reproduced by looping a couple
of times over allocating a PMD THP, forking a child where we immediately
unmap it again, and writing in the parent concurrently to the THP.
While we could adjust the sequence in __folio_remove_rmap(), let's rater
move the mapcount sanity checks after the mapcount vs. refcount
stabilization phase. With this fix, a simple reproducer is happy.
While at it, convert the two VM_WARN_ON_ONCE() we are moving to
VM_WARN_ON_ONCE_FOLIO().
Link: https://lkml.kernel.org/r/20250415095007.569836-1-david@redhat.com Fixes: 1da190f4d0a6 ("mm: Copy-on-Write (COW) reuse support for PTE-mapped THP") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+5e8feb543ca8e12e0ede@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/67fab4fe.050a0220.2c5fcf.0011.GAE@google.com Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oscar Salvador [Tue, 15 Apr 2025 11:18:59 +0000 (13:18 +0200)]
mm, hugetlb: increment the number of pages to be reset on HVO
commit 4eeec8c89a0c ("mm: move hugetlb specific things in folio to
page[3]") shifted hugetlb specific stuff, and now mapping overlaps
_hugetlb_cgroup field.
Upon restoring the vmemmap for HVO, only the first two tail pages are
reset, and this causes the check in free_tail_page_prepare() to fail as it
finds an unexpected mapping value in some tails.
Increment the number of pages to be reset to 4 (head + 3 tail pages)
Link: https://lkml.kernel.org/r/20250415111859.376302-1-osalvador@suse.de Fixes: 4eeec8c89a0c ("mm: move hugetlb specific things in folio to page[3]") Signed-off-by: Oscar Salvador <osalvador@suse.de> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andreas Gruenbacher [Sat, 12 Apr 2025 16:39:12 +0000 (18:39 +0200)]
writeback: fix false warning in inode_to_wb()
inode_to_wb() is used also for filesystems that don't support cgroup
writeback. For these filesystems inode->i_wb is stable during the
lifetime of the inode (it points to bdi->wb) and there's no need to hold
locks protecting the inode->i_wb dereference. Improve the warning in
inode_to_wb() to not trigger for these filesystems.
Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 10 Apr 2025 03:57:14 +0000 (11:57 +0800)]
mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
Not like fault_in_readable() or fault_in_writeable(), in
fault_in_safe_writeable() local variable 'start' is increased page by page
to loop till the whole address range is handled. However, it mistakenly
calculates the size of the handled range with 'uaddr - start'.
Fix it here.
Andreas said:
: In gfs2, fault_in_iov_iter_writeable() is used in
: gfs2_file_direct_read() and gfs2_file_read_iter(), so this potentially
: affects buffered as well as direct reads. This bug could cause those
: gfs2 functions to spin in a loop.
Link: https://lkml.kernel.org/r/20250410035717.473207-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20250410035717.473207-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Fixes: fe673d3f5bf1 ("mm: gup: make fault_in_safe_writeable() use fixup_user_fault()") Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Yanjun.Zhu <yanjun.zhu@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Fri, 11 Apr 2025 07:27:24 +0000 (08:27 +0100)]
MAINTAINERS: add memory advice section
The madvise code straddles both VMA and page table manipulation. As a
result, separate it out into its own section and add maintainers/reviewers
as appropriate.
We additionally include the mman-common.h file as this contains the shared
madvise flags and it is important we maintain this alongside madvise.c.
Link: https://lkml.kernel.org/r/20250411072724.10841-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Jann Horn <jannh@google.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MAINTAINERS: add mmap trace events to MEMORY MAPPING
MEMORY MAPPING does not list the mmap.h trace point file, but does list
the mmap.c file. Couple the trace points with the users and authors of
the trace points for notifications of updates.
Link: https://lkml.kernel.org/r/20250411173328.8172-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 10 Apr 2025 08:18:12 +0000 (16:18 +0800)]
mm: memcontrol: fix swap counter leak from offline cgroup
commit 73f839b6d2ed addressed an issue regarding the swap counter leak
that occurred from an offline cgroup. However, commit 89ce924f0bd4
modified the parameter from @swap_memcg to @memcg (presumably this
alteration was introduced while resolving conflicts). Fix this problem by
reverting this minor change.
Link: https://lkml.kernel.org/r/20250410081812.10073-1-songmuchun@bytedance.com Fixes: 89ce924f0bd4 ("mm: memcontrol: move memsw charge callbacks to v1") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MAINTAINERS: add MM subsection for the page allocator
Add a subsection for the page allocator, including compaction as it's
crucial for high-order allocations and works together with the
anti-fragmentation features. Add reviewers (including myself) who
voluteered.
Link: https://lkml.kernel.org/r/20250410090021.72296-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Brendan Jackman <jackmanb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Christoph Lameter (Ampere) <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With permission, reduce the number of maintainers. Create a CREDITS entry
for Joonsoo (Pekka already has one). Thanks for all the work!
Link: https://lkml.kernel.org/r/20250410090021.72296-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Christoph Lameter (Ampere) <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The issue is that when we split a large ZONE_DEVICE folio to order-0 ones,
we don't reset the order/_nr_pages. As folio->_nr_pages overlays
page[1]->memcg_data, once page[1] is a folio, it suddenly looks like it
has folio->memcg_data set. And we never manually initialize
folio->memcg_data in fsdax code, because we never expect it to be set at
all.
When __lruvec_stat_mod_folio() then stumbles over such a folio, it tries
to use folio->memcg_data (because it's non-NULL) but it does not actually
point at a memcg, resulting in the problem.
Alison also observed that these folios sometimes have "locked" set, which
is rather concerning (folios locked from the beginning ...). The reason
is that the order for large folios is stored in page[1]->flags, which
become the folio->flags of a new small folio.
Let's fix it by adding a folio helper to clear order/_nr_pages for
splitting purposes.
Maybe we should reinitialize other large folio flags / folio members as
well when splitting, because they might similarly cause harm once page[1]
becomes a folio? At least other flags in PAGE_FLAGS_SECOND should not be
set for fsdax, so at least page[1]->flags might be as expected with this
fix.
From a quick glimpse, initializing ->mapping, ->pgmap and ->share should
re-initialize most things from a previous page[1] used by large folios
that fsdax cares about. For example folio->private might not get
reinitialized, but maybe that's not relevant -- no traces of it's use in
fsdax code. Needs a closer look.
Another thing that should be considered in the future is performing
similar checks as we perform in free_tail_page_prepare()
-- checking pincount etc.
-- when freeing a large fsdax folio.
Link: https://lkml.kernel.org/r/20250410091020.119116-1-david@redhat.com Fixes: 4996fc547f5b ("mm: let _folio_nr_pages overlay memcg_data in first tail page") Fixes: 38607c62b34b ("fs/dax: properly refcount fs dax pages") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Alison Schofield <alison.schofield@intel.com> Closes: https://lkml.kernel.org/r/Z_W9Oeg-D9FhImf3@aschofie-mobl2.lan Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: "Darrick J. Wong" <djwong@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kirill A. Shutemov [Sat, 29 Mar 2025 17:10:29 +0000 (19:10 +0200)]
mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()
When the last page in the zone is accepted, __accept_page() calls
static_branch_dec(). This function takes cpu_hotplug_lock, which can lead
to a deadlock if the allocation occurs during CPU bringup path as
_cpu_up() also takes the lock.
To prevent this deadlock, defer static_branch_dec() to a workqueue.
Call static_branch_dec() only when the workqueue is not yet initialized.
Workqueues are initialized before CPU bring up, so this will not conflict
with the first scenario.
Link: https://lkml.kernel.org/r/20250329171030.3942298-1-kirill.shutemov@linux.intel.com Fixes: 55ad43e8ba0f ("mm: add a helper to accept page") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Srikanth Aithal <sraithal@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Steven Rostedt [Thu, 17 Apr 2025 22:30:03 +0000 (18:30 -0400)]
tracing: Fix filter string testing
The filter string testing uses strncpy_from_kernel/user_nofault() to
retrieve the string to test the filter against. The if() statement was
incorrect as it considered 0 as a fault, when it is only negative that it
faulted.
drm/xe/pxp: do not queue unneeded terminations from debugfs
The PXP terminate debugfs currently unconditionally simulates a
termination, no matter what the HW status is. This is unneeded if PXP is
not in use and can cause errors if the HW init hasn't completed yet.
To solve these issues, we can simply limit the terminations to the cases
where PXP is fully initialized and in use.
v2: s/pxp_status/ready/ to avoid confusion with pxp->status (John)
Fixes: 385a8015b214 ("drm/xe/pxp: Add PXP debugfs support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4749 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250416201622.1295369-1-daniele.ceraolospurio@intel.com
(cherry picked from commit ba1f62a0cac84757ca35f4217e3cd3a2654233ae) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Matthew Auld [Thu, 10 Apr 2025 16:27:17 +0000 (17:27 +0100)]
drm/xe/dma_buf: stop relying on placement in unmap
The is_vram() is checking the current placement, however if we consider
exported VRAM with dynamic dma-buf, it looks possible for the xe driver
to async evict the memory, notifying the importer, however importer does
not have to call unmap_attachment() immediately, but rather just as
"soon as possible", like when the dma-resv idles. Following from this we
would then pipeline the move, attaching the fence to the manager, and
then update the current placement. But when the unmap_attachment() runs
at some later point we might see that is_vram() is now false, and take
the complete wrong path when dma-unmapping the sg, leading to
explosions.
To fix this check if the sgl was mapping a struct page.
v2:
- The attachment can be mapped multiple times it seems, so we can't
really rely on encoding something in the attachment->priv. Instead
see if the page_link has an encoded struct page. For vram we expect
this to be NULL.
Matthew Auld [Mon, 14 Apr 2025 13:25:40 +0000 (14:25 +0100)]
drm/xe/userptr: fix notifier vs folio deadlock
User is reporting what smells like notifier vs folio deadlock, where
migrate_pages_batch() on core kernel side is holding folio lock(s) and
then interacting with the mappings of it, however those mappings are
tied to some userptr, which means calling into the notifier callback and
grabbing the notifier lock. With perfect timing it looks possible that
the pages we pulled from the hmm fault can get sniped by
migrate_pages_batch() at the same time that we are holding the notifier
lock to mark the pages as accessed/dirty, but at this point we also want
to grab the folio locks(s) to mark them as dirty, but if they are
contended from notifier/migrate_pages_batch side then we deadlock since
folio lock won't be dropped until we drop the notifier lock.
Fortunately the mark_page_accessed/dirty is not really needed in the
first place it seems and should have already been done by hmm fault, so
just remove it.
Lucas De Marchi [Thu, 10 Apr 2025 04:59:34 +0000 (21:59 -0700)]
drm/xe: Set LRC addresses before guc load
The metadata saved in the ADS is read by GuC when it's initialized.
Saving the addresses to the LRCs when they are populated is too late as
GuC will keep using the old ones.
This was causing GuC to use the RCS LRC for any engine class. It's not a
big problem on a Linux-only scenario since the they are used by GuC only
on media engines when the watchdog is triggered. However, in a
virtualization scenario with Windows as the VF, it causes the wrong LRCs
to be loaded as the watchdog is used for all engines.
Fix it by letting guc_golden_lrc_init() initialize the metadata, like
other *_init() functions, and later guc_golden_lrc_populate() to copy
the LRCs to the right places. The former is called before the second GuC
load, while the latter is called after LRCs have been recorded.
Cc: Chee Yin Wong <chee.yin.wong@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: <stable@vger.kernel.org> # v6.11+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Chee Yin Wong <chee.yin.wong@intel.com> Link: https://lore.kernel.org/r/20250409-fix-guc-ads-v1-1-494135f7a5d0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit c31a0b6402d15b530514eee9925adfcb8cfbb1c9) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Merge tag 'bcachefs-2025-04-17' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Usual set of small fixes/logging improvements.
One bigger user reported fix, for inode <-> dirent inconsistencies
reported in fsck, after moving a subvolume that had been snapshotted"
* tag 'bcachefs-2025-04-17' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix snapshotting a subvolume, then renaming it
bcachefs: Add missing READ_ONCE() for metadata replicas
bcachefs: snapshot_node_missing is now autofix
bcachefs: Log message when incompat version requested but not enabled
bcachefs: Print version_incompat_allowed on startup
bcachefs: Silence extent_poisoned error messages
bcachefs: btree_root_unreadable_and_scan_found_nothing now AUTOFIX
bcachefs: fix bch2_dev_usage_full_read_fast()
bcachefs: Don't print data read retry success on non-errors
bcachefs: Add missing error handling
bcachefs: Prevent granting write refs when filesystem is read-only
Merge tag 'vfio-v6.15-rc3' of https://github.com/awilliam/linux-vfio
Pull vfio fix from Alex Williamson:
- Include devices where the platform indicates PCI INTx is not routed
by setting pdev->irq to zero in the expanded virtualization of the
PCI pin register. This provides consistency in the INFO and SET_IRQS
ioctls (Alex Williamson)
* tag 'vfio-v6.15-rc3' of https://github.com/awilliam/linux-vfio:
vfio/pci: Virtualize zero INTx PIN if no pdev->irq
Merge tag 'spi-fix-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few more device specific fixes plus one trivial quirk.
There's a couple of patches for Tegra which avoid some fairly
spectacular log spam if the hardware breaks in ways which were
actually seen in production, plus a fix for the i.MX driver to
propagate errors properly when setting up the hardware.
We also have a trivial patch marking the sun4i driver as being
compatible with GPIO chip selects"
* tag 'spi-fix-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-imx: Add check for spi_imx_setupxfer()
spi: tegra210-quad: add rate limiting and simplify timeout error message
spi: tegra210-quad: use WARN_ON_ONCE instead of WARN_ON for timeouts
spi: sun4i: add support for GPIO chip select lines
ftrace_graph_ent.depth is int, but ftrace_graph_ent_entry.depth is
unsigned long. This confuses trace-cmd on 64-bit big-endian systems and
makes it print a huge amount of spaces. Fix this by using unsigned int,
which has a matching size, instead.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/20250412221847.17310-2-iii@linux.ibm.com Fixes: ff5c9c576e75 ("ftrace: Add support for function argument to graph tracer") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
ftrace: fix incorrect hash size in register_ftrace_direct()
The maximum of the ftrace hash bits is made fls(32) in
register_ftrace_direct(), which seems illogical. So, we fix it by making
the max hash bits FTRACE_HASH_MAX_BITS instead.
Link: https://lore.kernel.org/20250413014444.36724-1-dongml2@chinatelecom.cn Fixes: d05cb470663a ("ftrace: Fix modification of direct_function hash while in use") Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Thu, 17 Apr 2025 17:59:39 +0000 (13:59 -0400)]
ftrace: Free ftrace hashes after they are replaced in the subops code
The subops processing creates new hashes when adding and removing subops.
There were some places that the old hashes that were replaced were not
freed and this caused some memory leaks.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250417135939.245b128d@gandalf.local.home Fixes: 0ae6b8ce200d ("ftrace: Fix accounting of subop hashes") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Thu, 17 Apr 2025 14:40:17 +0000 (10:40 -0400)]
ftrace: Initialize variables for ftrace_startup/shutdown_subops()
The reworking to fix and simplify the ftrace_startup_subops() and the
ftrace_shutdown_subops() made it possible for the filter_hash and
notrace_hash variables to be used uninitialized in a way that the compiler
did not catch it.
Initialize both filter_hash and notrace_hash to the EMPTY_HASH as that is
what they should be if they never are used.
- ipv6: add exception routes to GC list in rt6_insert_exception()
- netfilter: conntrack: fix erroneous removal of offload bit
- Bluetooth:
- fix sending MGMT_EV_DEVICE_FOUND for invalid address
- l2cap: process valid commands in too long frame
- btnxpuart: Revert baudrate change in nxp_shutdown
Previous releases - always broken:
- ethtool: fix memory corruption during SFP FW flashing
- eth:
- hibmcge: fixes for link and MTU handling, pause frames etc
- igc: fixes for PTM (PCIe timestamping)
- dsa: b53: enable BPDU reception for management port
Misc:
- fixes for Netlink protocol schemas"
* tag 'net-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
net: ethernet: mtk_eth_soc: revise QDMA packet scheduler settings
net: ethernet: mtk_eth_soc: correct the max weight of the queue limit for 100Mbps
net: ethernet: mtk_eth_soc: reapply mdc divider on reset
net: ti: icss-iep: Fix possible NULL pointer dereference for perout request
net: ti: icssg-prueth: Fix possible NULL pointer dereference inside emac_xmit_xdp_frame()
net: ti: icssg-prueth: Fix kernel warning while bringing down network interface
netfilter: conntrack: fix erronous removal of offload bit
net: don't try to ops lock uninitialized devs
ptp: ocp: fix start time alignment in ptp_ocp_signal_set
net: dsa: avoid refcount warnings when ds->ops->tag_8021q_vlan_del() fails
net: dsa: free routing table on probe failure
net: dsa: clean up FDB, MDB, VLAN entries on unbind
net: dsa: mv88e6xxx: fix -ENOENT when deleting VLANs and MST is unsupported
net: dsa: mv88e6xxx: avoid unregistering devlink regions which were never registered
net: txgbe: fix memory leak in txgbe_probe() error path
net: bridge: switchdev: do not notify new brentries as changed
net: b53: enable BPDU reception for management port
netlink: specs: rt-neigh: prefix struct nfmsg members with ndm
netlink: specs: rt-link: adjust mctp attribute naming
netlink: specs: rtnetlink: attribute naming corrections
...
Kent Overstreet [Thu, 17 Apr 2025 18:09:56 +0000 (14:09 -0400)]
bcachefs: Fix snapshotting a subvolume, then renaming it
Subvolume roots and the dirents that point to them are special; they
don't obey the normal snapshot versioning rules because they cross
snapshot boundaries.
We don't keep around older versions of subvolume dirents on rename - we
don't need to, because subvolume dirents are only visible in the parent
subvolume, and we wouldn't be able to match up the different dirent and
inode versions due to crossing the snapshot ID boundary.
That means that when we rename a subvolume, that's been snapshotted, the
older version of the subvolume root will become dangling - it won't have
a dirent that points to it.
That's expected, we just need to tell fsck that this is ok.
Fixes: https://github.com/koverstreet/bcachefs/issues/856 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
io_uring/rsrc: ensure segments counts are correct on kbuf buffers
kbuf imports have the front offset adjusted and segments removed, but
the tail segments are still included in the segment count that gets
passed in the iov_iter. As the segments aren't necessarily all the
same size, move importing to a separate helper and iterate the
mapped length to get an exact count.
Merge tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull fwctl fixes from Jason Gunthorpe:
"Three small changes from further build testing:
- Don't rely on the userspace uuid.h for the uapi header
- Fix sparse warnings in pds
- Typo in log message"
* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
fwctl: Fix repeated device word in log message
pds_fwctl: Fix type and endian complaints
fwctl/cxl: Fix uuid_t usage in uapi
Merge tag 'sound-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All are device-specific like quirks, new
IDs, and other safe (or rather boring) changes"
* tag 'sound-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
firmware: cs_dsp: test_bin_error: Fix uninitialized data used as fw version
ASoC: codecs: Add of_match_table for aw888081 driver
ASoC: fsl: fsl_qmc_audio: Reset audio data pointers on TRIGGER_START event
mailmap: Add entry for Srinivas Kandagatla
MAINTAINERS: use kernel.org alias
ASoC: cs42l43: Reset clamp override on jack removal
ALSA: hda/realtek - Fixed ASUS platform headset Mic issue
ALSA: hda/cirrus_scodec_test: Don't select dependencies
ALSA: azt2320: Replace deprecated strcpy() with strscpy()
ASoC: hdmi-codec: use RTD ID instead of DAI ID for ELD entry
ASoC: Intel: avs: Constrain path based on BE capabilities
ALSA: hda/tas2781: Remove unnecessary NULL check before release_firmware()
ASoC: Intel: avs: Fix null-ptr-deref in avs_component_probe()
ASoC: fsl_asrc_dma: get codec or cpu dai from backend
ASoC: qcom: Fix sc7280 lpass potential buffer overflow
ASoC: dwc: always enable/disable i2s irqs
ASoC: Intel: sof_sdw: Add quirk for Asus Zenbook S16
ASoC: codecs:lpass-wsa-macro: Fix logic of enabling vi channels
ASoC: codecs:lpass-wsa-macro: Fix vi feedback rate
Merge tag 'platform-drivers-x86-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers fixes from Ilpo Järvinen:
"Fixes:
- amd/pmf: Fix STT limits
- asus-laptop: Fix an uninitialized variable
- intel_pmc_ipc: Allow building without ACPI
- mlxbf-bootctl: Use sysfs_emit_at() in secure_boot_fuse_state_show()
- msi-wmi-platform: Add locking to workaround ACPI firmware bug
New HW support:
- alienware-wmi-wmax:
- Extended thermal control support to:
- Alienware Area-51m R2
- Alienware m16 R1
- Alienware m16 R2
- Dell G16 7630
- Dell G5 5505 SE
- G-Mode support to Alienware m16 R1
- x86-android-tablets: Add Vexia Edu Atla 10 tablet 5V data"
* tag 'platform-drivers-x86-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: msi-wmi-platform: Workaround a ACPI firmware bug
platform/x86: msi-wmi-platform: Rename "data" variable
platform/x86: alienware-wmi-wmax: Extend support to more laptops
platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1
platform/x86: amd: pmf: Fix STT limits
mlxbf-bootctl: use sysfs_emit_at() in secure_boot_fuse_state_show()
platform/x86: x86-android-tablets: Add Vexia Edu Atla 10 tablet 5V data
platform/x86: x86-android-tablets: Add "9v" to Vexia EDU ATLA 10 tablet symbols
asus-laptop: Fix an uninitialized variable
platform/x86: intel_pmc_ipc: add option to build without ACPI
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Small drivers fixes, except for ufs which has two large updates, one
for exposing the device level feature, which is a new addition to the
device spec and the other reworking the exynos driver to fix coherence
issues on some android phones"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: megaraid_sas: Driver version update to 07.734.00.00-rc1
scsi: megaraid_sas: Block zero-length ATA VPD inquiry
scsi: scsi_transport_srp: Replace min/max nesting with clamp()
scsi: ufs: core: Add device level exception support
scsi: ufs: core: Rename ufshcd_wb_presrv_usrspc_keep_vcc_on()
scsi: smartpqi: Use is_kdump_kernel() to check for kdump
scsi: pm80xx: Set phy_attached to zero when device is gone
scsi: ufs: exynos: gs101: Put UFS device in reset on .suspend()
scsi: ufs: exynos: Move phy calls to .exit() callback
scsi: ufs: exynos: Enable PRDT pre-fetching with UFSHCD_CAP_CRYPTO
scsi: ufs: exynos: Ensure consistent phy reference counts
scsi: ufs: exynos: Disable iocc if dma-coherent property isn't set
scsi: ufs: exynos: Move UFS shareability value to drvdata
scsi: ufs: exynos: Ensure pre_link() executes before exynos_ufs_phy_init()
scsi: iscsi: Fix missing scsi_host_put() in error path
scsi: ufs: core: Fix a race condition related to device commands
scsi: hisi_sas: Fix I/O errors caused by hardware port ID changes
scsi: hisi_sas: Enable force phy when SATA disk directly connected
Merge tag 'ata-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Damien Le Moal:
- Fix how sense data from the sense data for successfull NCQ commands
log page is used to fully initialize the result_tf of a completed
command, so that the sense data returned to the scsi layer is fully
initialized with all the device provided information (from Niklas)
* tag 'ata-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-sata: Save all fields from sense data descriptor
Merge tag 'xfs-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull XFS fixes from Carlos Maiolino:
"This mostly includes fixes and documentation for the zoned allocator
feature merged during previous merge window, but it also adds a sysfs
tunable for the zone garbage collector.
There is also a fix for a regression to the RT device that we'd like
to fix ASAP now that we're getting more users on the RT zoned
allocator"
* tag 'xfs-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: document zoned rt specifics in admin-guide
xfs: fix fsmap for internal zoned devices
xfs: Fix spelling mistake "drity" -> "dirty"
xfs: compute buffer address correctly in xmbuf_map_backing_mem
xfs: add tunable threshold parameter for triggering zone GC
xfs: mark xfs_buf_free as might_sleep()
xfs: remove the leftover xfs_{set,clear}_li_failed infrastructure
Merge tag 'for-6.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- handle encoded read ioctl returning EAGAIN so it does not mistakenly
free the work structure
- escape subvolume path in mount option list so it cannot be wrongly
parsed when the path contains ","
- remove folio size assertions when writing super block to device with
enabled large folios
* tag 'for-6.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: remove folio order ASSERT()s in super block writeback path
btrfs: correctly escape subvol in btrfs_show_options()
btrfs: ioctl: don't free iov when btrfs_encoded_read() returns -EAGAIN
Merge tag 'slab-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- Stable fix adding zero initialization of slab->obj_ext to prevent
crashes with allocation profiling (Suren Baghdasaryan)
* tag 'slab-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: ensure slab->obj_exts is clear in a newly allocated slab page
Rafael J. Wysocki [Thu, 17 Apr 2025 15:55:09 +0000 (17:55 +0200)]
Merge tag 'amd-pstate-v6.15-2025-04-15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge amd-pstate content for 6.15 (4/15/25) from Mario Limonciello:
"Add a fix for X3D processors where depending upon what BIOS was
set initially rankings might be set improperly.
Add a fix for changing min/max limits while on the performance
governor."
* tag 'amd-pstate-v6.15-2025-04-15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Enable ITMT support after initializing core rankings
cpufreq/amd-pstate: Fix min_limit perf and freq updation for performance governor
Rafael J. Wysocki [Wed, 16 Apr 2025 14:12:37 +0000 (16:12 +0200)]
cpufreq: Avoid using inconsistent policy->min and policy->max
Since cpufreq_driver_resolve_freq() can run in parallel with
cpufreq_set_policy() and there is no synchronization between them,
the former may access policy->min and policy->max while the latter
is updating them and it may see intermediate values of them due
to the way the update is carried out. Also the compiler is free
to apply any optimizations it wants both to the stores in
cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
which may result in additional inconsistencies.
To address this, use WRITE_ONCE() when updating policy->min and
policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
in cpufreq_set_policy() to avoid storing intermediate values in
policy->min and policy->max with the help of the observation that
their new values are expected to be properly ordered upfront.
Also modify cpufreq_driver_resolve_freq() to take the possible reverse
ordering of policy->min and policy->max, which may happen depending on
the ordering of operations when this function and cpufreq_set_policy()
run concurrently, into account by always honoring the max when it
turns out to be less than the min (in case it comes from thermal
throttling or similar).
Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements") Cc: 5.16+ <stable@vger.kernel.org> # 5.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/5907080.DvuYhMxLoT@rjwysocki.net
Rafael J. Wysocki [Tue, 15 Apr 2025 10:00:52 +0000 (12:00 +0200)]
cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
Notice that ignore_dl_rate_limit() need not piggy back on the
limits_changed handling to achieve its goal (which is to enforce a
frequency update before its due time).
Namely, if sugov_should_update_freq() is updated to check
sg_policy->need_freq_update and return 'true' if it is set when
sg_policy->limits_changed is not set, ignore_dl_rate_limit() may
set the former directly instead of setting the latter, so it can
avoid hitting the memory barrier in sugov_should_update_freq().
Rafael J. Wysocki [Tue, 15 Apr 2025 09:59:15 +0000 (11:59 +0200)]
cpufreq/sched: Explicitly synchronize limits_changed flag handling
The handling of the limits_changed flag in struct sugov_policy needs to
be explicitly synchronized to ensure that cpufreq policy limits updates
will not be missed in some cases.
Without that synchronization it is theoretically possible that
the limits_changed update in sugov_should_update_freq() will be
reordered with respect to the reads of the policy limits in
cpufreq_driver_resolve_freq() and in that case, if the limits_changed
update in sugov_limits() clobbers the one in sugov_should_update_freq(),
the new policy limits may not take effect for a long time.
Likewise, the limits_changed update in sugov_limits() may theoretically
get reordered with respect to the updates of the policy limits in
cpufreq_set_policy() and if sugov_should_update_freq() runs between
them, the policy limits change may be missed.
To ensure that the above situations will not take place, add memory
barriers preventing the reordering in question from taking place and
add READ_ONCE() and WRITE_ONCE() annotations around all of the
limits_changed flag updates to prevent the compiler from messing up
with that code.
Fixes: 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change") Cc: 5.3+ <stable@vger.kernel.org> # 5.3+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/3376719.44csPzL39Z@rjwysocki.net
Rafael J. Wysocki [Tue, 15 Apr 2025 09:58:08 +0000 (11:58 +0200)]
cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
Commit 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused
by need_freq_update") modified sugov_should_update_freq() to set the
need_freq_update flag only for drivers with CPUFREQ_NEED_UPDATE_LIMITS
set, but that flag generally needs to be set when the policy limits
change because the driver callback may need to be invoked for the new
limits to take effect.
However, if the return value of cpufreq_driver_resolve_freq() after
applying the new limits is still equal to the previously selected
frequency, the driver callback needs to be invoked only in the case
when CPUFREQ_NEED_UPDATE_LIMITS is set (which means that the driver
specifically wants its callback to be invoked every time the policy
limits change).
Update the code accordingly to avoid missing policy limits changes for
drivers without CPUFREQ_NEED_UPDATE_LIMITS.
Fixes: 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update") Closes: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/ Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/3010358.e9J7NaK4W3@rjwysocki.net
The QDMA packet scheduler suffers from a performance issue.
Fix this by picking up changes from MediaTek's SDK which change to use
Token Bucket instead of Leaky Bucket and fix the SPEED_1000 configuration.
net: ethernet: mtk_eth_soc: reapply mdc divider on reset
In the current method, the MDC divider was reset to the default setting
of 2.5MHz after the NETSYS SER. Therefore, we need to reapply the MDC
divider configuration function in mtk_hw_init() after reset.
Paolo Abeni [Thu, 17 Apr 2025 13:20:41 +0000 (15:20 +0200)]
Merge tag 'nf-25-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fix for net
The following batch contains one Netfilter fix for net:
1) conntrack offload bit is erroneously unset in a race scenario,
from Florian Westphal.
netfilter pull request 25-04-17
* tag 'nf-25-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: fix erronous removal of offload bit
====================
Pavel Begunkov [Thu, 17 Apr 2025 09:32:33 +0000 (10:32 +0100)]
io_uring/rsrc: refactor io_import_fixed
io_import_fixed is a mess. Even though we know the final len of the
iterator, we still assign offset + len and do some magic after to
correct for that.
Do offset calculation first and finalise it with iov_iter_bvec at the
end.
Pavel Begunkov [Thu, 17 Apr 2025 09:32:32 +0000 (10:32 +0100)]
io_uring/rsrc: separate kbuf offset adjustments
Kernel registered buffers are special because segments are not uniform
in size, and we have a bunch of optimisations based on that uniformity
for normal buffers. Handle kbuf separately, it'll be cleaner this way.
Pavel Begunkov [Thu, 17 Apr 2025 09:32:31 +0000 (10:32 +0100)]
io_uring/rsrc: don't skip offset calculation
Don't optimise for requests with offset=0. Large registered buffers are
the preference and hence the user is likely to pass an offset, and the
adjustments are not expensive and will be made even cheaper in following
patches.
Kan Liang [Tue, 15 Apr 2025 11:44:07 +0000 (11:44 +0000)]
perf/x86/intel: Add Panther Lake support
From PMU's perspective, Panther Lake is similar to the previous
generation Lunar Lake. Both are hybrid platforms, with e-core and
p-core.
The key differences are the ARCH PEBS feature and several new events.
The ARCH PEBS is supported in the following patches.
The new events will be supported later in perf tool.
Share the code path with the Lunar Lake. Only update the name.
Dapeng Mi [Tue, 15 Apr 2025 10:41:35 +0000 (10:41 +0000)]
perf/x86/intel: Allow to update user space GPRs from PEBS records
Currently when a user samples user space GPRs (--user-regs option) with
PEBS, the user space GPRs actually always come from software PMI
instead of from PEBS hardware. This leads to the sampled GPRs to
possibly be inaccurate for single PEBS record case because of the
skid between counter overflow and GPRs sampling on PMI.
For the large PEBS case, it is even worse. If user sets the
exclude_kernel attribute, large PEBS would be used to sample user space
GPRs, but since PEBS GPRs group is not really enabled, it leads to all
samples in the large PEBS record to share the same piece of user space
GPRs, like this reproducer shows:
So enable GPRs group for user space GPRs sampling and prioritize reading
GPRs from PEBS. If the PEBS sampled GPRs is not user space GPRs (single
PEBS record case), perf_sample_regs_user() modifies them to user space
GPRs.
[ mingo: Clarified the changelog. ]
Fixes: c22497f5838c ("perf/x86/intel: Support adaptive PEBS v4") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250415104135.318169-2-dapeng1.mi@linux.intel.com
Dapeng Mi [Tue, 15 Apr 2025 10:41:34 +0000 (10:41 +0000)]
perf/x86/intel: Don't clear perf metrics overflow bit unconditionally
The below code would always unconditionally clear other status bits like
perf metrics overflow bit once PEBS buffer overflows:
status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
This is incorrect. Perf metrics overflow bit should be cleared only when
fixed counter 3 in PEBS counter group. Otherwise perf metrics overflow
could be missed to handle.
Closes: https://lore.kernel.org/all/20250225110012.GK31462@noisy.programming.kicks-ass.net/ Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250415104135.318169-1-dapeng1.mi@linux.intel.com