Ard Biesheuvel [Tue, 7 Jan 2025 15:18:27 +0000 (16:18 +0100)]
x86/sev: Disable ftrace branch profiling in SEV startup code
Ftrace branch profiling inserts absolute references to its metadata at
call sites, and this implies that this kind of instrumentation cannot be
used while executing from the 1:1 mapping of memory.
Therefore, disable ftrace branch profiling in the SEV startup routines,
by disabling it for the entire SEV core source file.
David Woodhouse [Thu, 9 Jan 2025 14:04:20 +0000 (14:04 +0000)]
x86/kexec: Cope with relocate_kernel() not being at the start of the page
A few places in the kexec control code page make the assumption that the first
instruction of relocate_kernel is at the very start of the page.
To allow for Clang CFI information to be added to relocate_kernel(), as well
as the general principle of removing unwarranted assumptions, fix them to use
the external __relocate_kernel_start symbol that the linker adds. This means
using a separate addq and subq for calculating offsets, as the assembler can
no longer calculate the delta directly for itself and relocations aren't that
versatile. But those values can at least be used relative to a local label to
avoid absolute relocations.
Turn the jump from relocate_kernel() to identity_mapped() into a real indirect
'jmp *%rsi' too, while touching it. There was no real reason for it to be
a push+ret in the first place, and adding Clang CFI info will also give
objtool enough visibility to start complaining 'return with modified stack
frame' about it.
Rafael J. Wysocki [Thu, 9 Jan 2025 14:04:19 +0000 (14:04 +0000)]
kexec_core: Add and update comments regarding the KEXEC_JUMP flow
The KEXEC_JUMP flow is analogous to hibernation flows occurring before
and after creating an image and before and after jumping from the
restore kernel to the image one, which is why it uses the same device
callbacks as those hibernation flows.
Add comments explaining that to the code in question and update an
existing comment in it which appears a bit out of context.
David Woodhouse [Thu, 9 Jan 2025 14:04:18 +0000 (14:04 +0000)]
x86/kexec: Mark machine_kexec() with __nocfi
A recent commit caused the relocate_kernel() function to be invoked through
a function pointer, but it does not have CFI information. The resulting trap
occurs after the IDT and GDT have been invalidated, leading to a triple-fault
if CONFIG_CFI_CLANG is enabled.
Using SYM_TYPED_FUNC_START() to provide the CFI information looks like it will
require a prolonged battle with objtool. And is fairly pointless anyway, as
the actual signature comes from a __kcfi_typeid_… symbol emitted from the
C code based on the function prototype it thinks that relocate_kernel has,
rendering the check somewhat tautological.
The simple fix is just to mark machine_kexec() with __nocfi.
Fixes: eeebbde57113 ("x86/kexec: Invoke copy of relocate_kernel() instead of the original") Reported-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250109140757.2841269-7-dwmw2@infradead.org
Nathan Chancellor [Thu, 9 Jan 2025 14:04:17 +0000 (14:04 +0000)]
x86/kexec: Fix location of relocate_kernel with -ffunction-sections
After commit
cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section"),
kernels configured with an option that uses -ffunction-sections, such as
CONFIG_LTO_CLANG, crash when kexecing because the value of relocate_kernel
does not match the value of __relocate_kernel_start so incorrect code gets
copied via machine_kexec_prepare().
When -ffunction-sections is enabled, TEXT_MAIN matches on
'.text.[0-9a-zA-Z_]*' to coalesce the function specific functions back
into .text during link time after they have been optimized. Due to the
placement of TEXT_TEXT before KEXEC_RELOCATE_KERNEL in the x86 linker
script, the .text.relocate_kernel section ends up in .text instead of
.data.
Use a second dot in the relocate_kernel section name to avoid matching
on TEXT_MAIN, which matches a similar situation that happened in
commit
79cd2a11224e ("x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG"),
which allows kexec to function properly.
While .data.relocate_kernel still ends up in the .data section via
DATA_MAIN -> DATA_DATA, ensure it is located with the
.text.relocate_kernel section as intended by performing the same
transformation.
Fixes: cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section") Fixes: 8dbec5c77bc3 ("x86/kexec: Add data section to relocate_kernel") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250109140757.2841269-6-dwmw2@infradead.org
David Woodhouse [Thu, 9 Jan 2025 14:04:16 +0000 (14:04 +0000)]
x86/kexec: Fix stack and handling of re-entry point for ::preserve_context
A ::preserve_context kimage can be invoked more than once, and the entry point
can be different every time. When the callee returns to the kernel, it leaves
the address of its entry point for next time on the stack.
That being the case, one might reasonably assume that the caller would
allocate space for it on the stack frame before actually performing the 'call'
into the callee.
Apparently not, though. Ever since the kjump code was first added in 2009, it
has set up a *new* stack at the top of the swap_page scratch page, then just
performed the 'call' without allocating any space for the re-entry address to
be returned. It then reads the re-entry point for next time from 0(%rsp) which
is actually the first qword of the page *after* the swap page, which might not
exist at all! And if the callee has written to that, then it will have
corrupted memory it doesn't own.
Correct this by pushing the entry point of the callee onto the stack before
calling it. The callee may then adjust it, or not, as it sees fit, and
subsequent invocations should work correctly either way.
Remove a stray push of zero to the *relocate_kernel* stack, which may have
been intended for this purpose, but which was actually just noise.
Also, loading the stack for the callee relied on the address of the swap page
being in %r10 without ever documenting that fact. Recent code changes made
that no longer true, so load it directly from the local kexec_pa_swap_page
variable instead.
David Woodhouse [Thu, 9 Jan 2025 14:04:15 +0000 (14:04 +0000)]
x86/kexec: Use correct swap page in swap_pages function
The swap_pages function expects the swap page to be in %r10, but there
was no documentation to that effect. Once upon a time the setup code
used to load its value from a kernel virtual address and save it to an
address which is accessible in the identity-mapped page tables, and
*happened* to use %r10 to do so, with no comment that it was left there
on *purpose* instead of just being a scratch register. Once that was no
longer necessary, %r10 just holds whatever the kernel happened to leave
in it.
Now that the original value passed by the kernel is accessible via
%rip-relative addressing, load directly from there instead of using %r10
for it. But document the other parameters that the swap_pages function
*does* expect in registers.
David Woodhouse [Thu, 9 Jan 2025 14:04:14 +0000 (14:04 +0000)]
x86/kexec: Ensure preserve_context flag is set on return to kernel
The swap_pages() function will only actually *swap*, as its name implies, if
the preserve_context flag in the %r11 register is non-zero. On the way back
from a ::preserve_context kexec, ensure that the %r11 register is non-zero so
that the pages get swapped back.
David Woodhouse [Thu, 9 Jan 2025 14:04:13 +0000 (14:04 +0000)]
x86/kexec: Disable global pages before writing to control page
The kernel switches to a new set of page tables during kexec. The global
mappings (_PAGE_GLOBAL==1) can remain in the TLB after this switch. This
is generally not a problem because the new page tables use a different
portion of the virtual address space than the normal kernel mappings.
The critical exception to that generalisation (and the only mapping
which isn't an identity mapping) is the kexec control page itself —
which was ROX in the original kernel mapping, but should be RWX in the
new page tables. If there is a global TLB entry for that in its prior
read-only state, it definitely needs to be flushed before attempting to
write through that virtual mapping.
It would be possible to just avoid writing to the virtual address of the
page and defer all writes until they can be done through the identity
mapping. But there's no good reason to keep the old TLB entries around,
as they can cause nothing but trouble.
Clear the PGE bit in %cr4 early, before storing data in the control page.
Fixes: 5a82223e0743 ("x86/kexec: Mark relocate_kernel page as ROX instead of RWX") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219592 Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: "Ning, Hongyu" <hongyu.ning@linux.intel.com> Co-developed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: "Ning, Hongyu" <hongyu.ning@linux.intel.com> Link: https://lore.kernel.org/r/20250109140757.2841269-2-dwmw2@infradead.org
Ard Biesheuvel [Mon, 6 Jan 2025 15:57:46 +0000 (16:57 +0100)]
x86/sev: Don't hang but terminate on failure to remap SVSM CA
Commit
09d35045cd0f ("x86/sev: Avoid WARN()s and panic()s in early boot code")
replaced a panic() that could potentially hit before the kernel is even
mapped with a deadloop, to ensure that execution does not proceed when the
condition in question hits.
As Tom suggests, it is better to terminate and return to the hypervisor
in this case, using a newly invented failure code to describe the
failure condition.
Ard Biesheuvel [Wed, 1 Jan 2025 11:51:20 +0000 (12:51 +0100)]
x86/sev: Disable UBSAN on SEV code that may execute very early
Clang 14 and older may emit UBSAN instrumentation into code that is
inlined into functions marked with __no_sanitize_undefined¹. This may
result in faults when the code is executed very early, which may be the
case for functions annotated as __head. Now that this requirement is
strictly enforced, the build will fail in this case with the following
message
Absolute reference to symbol '.data' not permitted in .head.text
Work around this by disabling UBSAN instrumentation on all SEV core
code.
Ard Biesheuvel [Mon, 9 Dec 2024 09:41:06 +0000 (10:41 +0100)]
x86/boot/64: Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
In __startup_64(), the bool 'la57' can only assume the 'true' value if
CONFIG_X86_5LEVEL is enabled in the build, and generally, the compiler
can make this inference at build time, and elide any references to the
symbol 'level4_kernel_pgt', which may be undefined if 'la57' is false.
As it turns out, GCC 12 gets this wrong sometimes, and gives up with a
build error:
ld: arch/x86/kernel/head64.o: in function `__startup_64':
head64.c:(.head.text+0xbd): undefined reference to `level4_kernel_pgt'
even though the reference is in unreachable code. Fix this by
duplicating the IS_ENABLED(CONFIG_X86_5LEVEL) in the conditional that
tests the value of 'la57'.
Thomas Weißschuh [Mon, 2 Dec 2024 19:43:24 +0000 (20:43 +0100)]
x86/sysfs: Constify 'struct bin_attribute'
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.
David Woodhouse [Thu, 5 Dec 2024 15:05:19 +0000 (15:05 +0000)]
x86/kexec: Mark relocate_kernel page as ROX instead of RWX
All writes to the page now happen before it gets marked as executable
(or after it's already switched to the identmap page tables where it's
OK to be RWX).
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20241205153343.3275139-14-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:18 +0000 (15:05 +0000)]
x86/kexec: Clean up register usage in relocate_kernel()
The memory encryption flag is passed in %r8 because that's where the
calling convention puts it. Instead of moving it to %r12 and then using
%r8 for other things, just leave it in %r8 and use other registers
instead.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205153343.3275139-13-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:17 +0000 (15:05 +0000)]
x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page
All writes to the relocate_kernel control page are now done *after* the
%cr3 switch via simple %rip-relative addressing, which means the DATA()
macro with its pointer arithmetic can also now be removed.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205153343.3275139-12-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:16 +0000 (15:05 +0000)]
x86/kexec: Drop page_list argument from relocate_kernel()
The kernel's virtual mapping of the relocate_kernel page currently needs
to be RWX because it is written to before the %cr3 switch.
Now that the relocate_kernel page has its own .data section and local
variables, it can also have *global* variables. So eliminate the separate
page_list argument, and write the same information directly to variables
in the relocate_kernel page instead. This way, the relocate_kernel code
itself doesn't need to copy it.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205153343.3275139-11-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:14 +0000 (15:05 +0000)]
x86/kexec: Move relocate_kernel to kernel .data section
Now that the copy is executed instead of the original, the relocate_kernel
page can live in the kernel's .text section. This will allow subsequent
commits to actually add real data to it and clean up the code somewhat as
well as making the control page ROX.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205153343.3275139-9-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:13 +0000 (15:05 +0000)]
x86/kexec: Invoke copy of relocate_kernel() instead of the original
This currently calls set_memory_x() from machine_kexec_prepare() just
like the 32-bit version does. That's actually a bit earlier than I'd
like, as it leaves the page RWX all the time the image is even *loaded*.
Subsequent commits will eliminate all the writes to the page between the
point it's marked executable in machine_kexec_prepare() the time that
relocate_kernel() is running and has switched to the identmap %cr3, so
that it can be ROX. But that can't happen until it's moved to the .data
section of the kernel, and *that* can't happen until we start executing
the copy instead of executing it in place in the kernel .text. So break
the circular dependency in those commits by letting it be RWX for now.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205153343.3275139-8-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:12 +0000 (15:05 +0000)]
x86/kexec: Copy control page into place in machine_kexec_prepare()
There's no need for this to wait until the actual machine_kexec() invocation;
future changes will need to make the control page read-only and executable,
so all writes should be completed before machine_kexec_prepare() returns.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205153343.3275139-7-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:11 +0000 (15:05 +0000)]
x86/kexec: Allocate PGD for x86_64 transition page tables separately
Now that the following fix:
d0ceea662d45 ("x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables")
stops kernel_ident_mapping_init() from scribbling over the end of a
4KiB PGD by assuming the following 4KiB will be a userspace PGD,
there's no good reason for the kexec PGD to be part of a single
8KiB allocation with the control_code_page.
( It's not clear that that was the reason for x86_64 kexec doing it that
way in the first place either; there were no comments to that effect and
it seems to have been the case even before PTI came along. It looks like
it was just a happy accident which prevented memory corruption on kexec. )
Either way, it definitely isn't needed now. Just allocate the PGD
separately on x86_64, like i386 already does.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205153343.3275139-6-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:10 +0000 (15:05 +0000)]
x86/kexec: Only swap pages for ::preserve_context mode
There's no need to swap pages (which involves three memcopies for each
page) in the plain kexec case. Just do a single copy from source to
destination page.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205153343.3275139-5-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:08 +0000 (15:05 +0000)]
x86/kexec: Clean up and document register use in relocate_kernel_64.S
Add more comments explaining what each register contains, and save the
preserve_context flag to a non-clobbered register sooner, to keep things
simpler.
No change in behavior intended.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Kai Huang <kai.huang@intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205153343.3275139-3-dwmw2@infradead.org
David Woodhouse [Thu, 5 Dec 2024 15:05:07 +0000 (15:05 +0000)]
x86/kexec: Restore GDT on return from ::preserve_context kexec
The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.
Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.
Fernando Fernandez Mancera [Mon, 2 Dec 2024 14:58:45 +0000 (14:58 +0000)]
x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC
The rework of possible CPUs management erroneously disabled SMP when the
IO/APIC is disabled either by the 'noapic' command line parameter or during
IO/APIC setup. SMP is possible without IO/APIC.
Remove the ioapic_is_disabled conditions from the relevant possible CPU
management code paths to restore the orgininal behaviour.
Fixes: 7c0edad3643f ("x86/cpu/topology: Rework possible CPU management") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241202145905.1482-1-ffmancera@riseup.net
Ard Biesheuvel [Thu, 5 Dec 2024 11:28:12 +0000 (12:28 +0100)]
x86/boot: Reject absolute references in .head.text
The .head.text section used to contain asm code that bootstrapped the
page tables and switched to the kernel virtual address space before
executing C code. The asm code carefully avoided dereferencing absolute
symbol references, as those will fault before the page tables are
installed.
Today, the .head.text section contains lots of C code too, and getting
the compiler to reason about absolute addresses taken from, e.g.,
section markers such as _text[] or _end[] but never use such absolute
references to access global variables [*] is intractible.
So instead, forbid the use of absolute references in .head.text
entirely, and rely on explicit arithmetic involving VA-to-PA offsets
generated by the asm startup code to construct virtual addresses where
needed (e.g., to construct the page tables).
Note that the 'relocs' tool is only used on the core kernel image when
building a relocatable image, but this is the default, and so adding the
check there is sufficient to catch new occurrences of code that use
absolute references before the kernel mapping is up.
[*] it is feasible when using PIC codegen but there is strong pushback
to using this for all of the core kernel, and using it only for
.head.text is not straight-forward.
Ard Biesheuvel [Thu, 5 Dec 2024 11:28:11 +0000 (12:28 +0100)]
x86/boot: Move .head.text into its own output section
In order to be able to double check that vmlinux is emitted without
absolute symbol references in .head.text, it needs to be distinguishable
from the rest of .text in the ELF metadata.
Ard Biesheuvel [Thu, 5 Dec 2024 11:28:10 +0000 (12:28 +0100)]
x86/kernel: Move ENTRY_TEXT to the start of the image
Since commit:
7734a0f31e99 ("x86/boot: Robustify calling startup_{32,64}() from the decompressor code")
it is no longer necessary for .head.text to appear at the start of the
image. Since ENTRY_TEXT needs to appear PMD-aligned, it is easier to
just place it at the start of the image, rather than line it up with the
end of the .text section. The amount of padding required should be the
same, but this arrangement also permits .head.text to be split off and
emitted separately, which is needed by a subsequent change.
Ard Biesheuvel [Thu, 5 Dec 2024 11:28:09 +0000 (12:28 +0100)]
x86/boot: Disable UBSAN in early boot code
The early boot code runs from a 1:1 mapping of memory, and may execute
before the kernel virtual mapping is even up. This means absolute symbol
references cannot be permitted in this code.
UBSAN injects references to global data structures into the code, and
without -fPIC, those references are emitted as absolute references to
kernel virtual addresses. Accessing those will fault before the kernel
virtual mapping is up, so UBSAN needs to be disabled in early boot code.
Ard Biesheuvel [Thu, 5 Dec 2024 11:28:08 +0000 (12:28 +0100)]
x86/boot/64: Avoid intentional absolute symbol references in .head.text
The code in .head.text executes from a 1:1 mapping and cannot generally
refer to global variables using their kernel virtual addresses. However,
there are some occurrences of such references that are valid: the kernel
virtual addresses of _text and _end are needed to populate the page
tables correctly, and some other section markers are used in a similar
way.
To avoid the need for making exceptions to the rule that .head.text must
not contain any absolute symbol references, derive these addresses from
the RIP-relative 1:1 mapped physical addresses, which can be safely
determined using RIP_REL_REF().
Ard Biesheuvel [Thu, 5 Dec 2024 11:28:07 +0000 (12:28 +0100)]
x86/boot/64: Determine VA/PA offset before entering C code
Implicit absolute symbol references (e.g., taking the address of a
global variable) must be avoided in the C code that runs from the early
1:1 mapping of the kernel, given that this is a practice that violates
assumptions on the part of the toolchain. I.e., RIP-relative and
absolute references are expected to produce the same values, and so the
compiler is free to choose either. However, the code currently assumes
that RIP-relative references are never emitted here.
So an explicit virtual-to-physical offset needs to be used instead to
derive the kernel virtual addresses of _text and _end, instead of simply
taking the addresses and assuming that the compiler will not choose to
use a RIP-relative references in this particular case.
Currently, phys_base is already used to perform such calculations, but
it is derived from the kernel virtual address of _text, which is taken
using an implicit absolute symbol reference. So instead, derive this
VA-to-PA offset in asm code, and pass it to the C startup code.
Ard Biesheuvel [Thu, 5 Dec 2024 11:28:06 +0000 (12:28 +0100)]
x86/sev: Avoid WARN()s and panic()s in early boot code
Using WARN() or panic() while executing from the early 1:1 mapping is
unlikely to do anything useful: the string literals are passed using
their kernel virtual addresses which are not even mapped yet. But even
if they were, calling into the printk() machinery from the early 1:1
mapped code is not going to get very far.
So drop the WARN()s entirely, and replace panic() with a deadloop.
David Woodhouse [Wed, 4 Dec 2024 11:27:14 +0000 (11:27 +0000)]
x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables
The set_p4d() and set_pgd() functions (in 4-level or 5-level page table setups
respectively) assume that the root page table is actually a 8KiB allocation,
with the userspace root immediately after the kernel root page table (so that
the former can enforce NX on on all the subordinate page tables, which are
actually shared).
However, users of the kernel_ident_mapping_init() code do not give it an 8KiB
allocation for its PGD. Both swsusp_arch_resume() and acpi_mp_setup_reset()
allocate only a single 4KiB page. The kexec code on x86_64 currently gets
away with it purely by chance, because it allocates 8KiB for its "control
code page" and then actually uses the first half for the PGD, then copies the
actual trampoline code into the second half only after the identmap code has
finished scribbling over it.
Fix this by defining a _PAGE_NOPTISHADOW bit (which can use the same bit as
_PAGE_SAVED_DIRTY since one is only for the PGD/P4D root and the other is
exclusively for leaf PTEs.). This instructs __pti_set_user_pgtbl() not to
write to the userspace 'shadow' PGD.
Strictly, the _PAGE_NOPTISHADOW bit doesn't need to be written out to the
actual page tables; since __pti_set_user_pgtbl() returns the value to be
written to the kernel page table, it could be filtered out. But there seems
to be no benefit to actually doing so.
Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/412c90a4df7aef077141d9f68d19cbe5602d6c6d.camel@infradead.org Cc: stable@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com>
Linus Torvalds [Tue, 3 Dec 2024 19:02:17 +0000 (11:02 -0800)]
Merge tag 'for-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- add lockdep annotations for io_uring/encoded read integration, inode
lock is held when returning to userspace
- properly reflect experimental config option to sysfs
- handle NULL root in case the rescue mode accepts invalid/damaged tree
roots (rescue=ibadroot)
- regression fix of a deadlock between transaction and extent locks
- fix pending bio accounting bug in encoded read ioctl
- fix NOWAIT mode when checking references for NOCOW files
- fix use-after-free in a rb-tree cleanup in ref-verify debugging tool
* tag 'for-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix lockdep warnings on io_uring encoded reads
btrfs: ref-verify: fix use-after-free after invalid ref action
btrfs: add a sanity check for btrfs root in btrfs_search_slot()
btrfs: don't loop for nowait writes when checking for cross references
btrfs: sysfs: advertise experimental features only if CONFIG_BTRFS_EXPERIMENTAL=y
btrfs: fix deadlock between transaction commits and extent locks
btrfs: fix use-after-free in btrfs_encoded_read_endio()
Linus Torvalds [Tue, 3 Dec 2024 18:50:22 +0000 (10:50 -0800)]
Merge tag 'fs_for_v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota and udf fixes from Jan Kara:
"Two small UDF fixes for better handling of corrupted filesystem and a
quota fix to fix handling of filesystem freezing"
* tag 'fs_for_v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Verify inode link counts before performing rename
udf: Skip parent dir link count update if corrupted
quota: flush quota_release_work upon quota writeback
Linus Torvalds [Tue, 3 Dec 2024 18:46:49 +0000 (10:46 -0800)]
Merge tag 'xfs-fixes-6.13-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- Use xchg() in xlog_cil_insert_pcp_aggregate()
- Fix ABBA deadlock on a race between mount and log shutdown
- Fix quota softlimit incoherency on delalloc
- Fix sparse inode limits on runt AG
- remove unknown compat feature checks in SB write valdation
- Eliminate a lockdep false positive
* tag 'xfs-fixes-6.13-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay
xfs: Use xchg() in xlog_cil_insert_pcp_aggregate()
xfs: prevent mount and log shutdown race
xfs: delalloc and quota softlimit timers are incoherent
xfs: fix sparse inode limits on runt AG
xfs: remove unknown compat feature check in superblock write validation
xfs: eliminate lockdep false positives in xfs_attr_shortform_list
Masahiro Yamada [Tue, 3 Dec 2024 10:21:07 +0000 (19:21 +0900)]
module: Convert default symbol namespace to string literal
Commit cdd30ebb1b9f ("module: Convert symbol namespace to string
literal") only converted MODULE_IMPORT_NS() and EXPORT_SYMBOL_NS(),
leaving DEFAULT_SYMBOL_NAMESPACE as a macro expansion.
This commit converts DEFAULT_SYMBOL_NAMESPACE in the same way to avoid
annoyance for the default namespace as well.
Masahiro Yamada [Tue, 3 Dec 2024 10:21:05 +0000 (19:21 +0900)]
scripts/nsdeps: get 'make nsdeps' working again
Since commit cdd30ebb1b9f ("module: Convert symbol namespace to string
literal"), when MODULE_IMPORT_NS() is missing, 'make nsdeps' inserts
pointless code:
MODULE_IMPORT_NS("ns");
Here, "ns" is not a namespace, but the variable in the semantic patch.
It must not be quoted. Instead, a string literal must be passed to
Coccinelle.
Fixes: cdd30ebb1b9f ("module: Convert symbol namespace to string literal") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aruna Ramakrishna [Tue, 19 Nov 2024 17:45:20 +0000 (17:45 +0000)]
x86/pkeys: Ensure updated PKRU value is XRSTOR'd
When XSTATE_BV[i] is 0, and XRSTOR attempts to restore state component
'i' it ignores any value in the XSAVE buffer and instead restores the
state component's init value.
This means that if XSAVE writes XSTATE_BV[PKRU]=0 then XRSTOR will
ignore the value that update_pkru_in_sigframe() writes to the XSAVE buffer.
XSTATE_BV[PKRU] only gets written as 0 if PKRU is in its init state. On
Intel CPUs, basically never happens because the kernel usually
overwrites the init value (aside: this is why we didn't notice this bug
until now). But on AMD, the init tracker is more aggressive and will
track PKRU as being in its init state upon any wrpkru(0x0).
Unfortunately, sig_prepare_pkru() does just that: wrpkru(0x0).
This writes XSTATE_BV[PKRU]=0 which makes XRSTOR ignore the PKRU value
in the sigframe.
To fix this, always overwrite the sigframe XSTATE_BV with a value that
has XSTATE_BV[PKRU]==1. This ensures that XRSTOR will not ignore what
update_pkru_in_sigframe() wrote.
The problematic sequence of events is something like this:
Userspace does:
* wrpkru(0xffff0000) (or whatever)
* Hardware sets: XINUSE[PKRU]=1
Signal happens, kernel is entered:
* sig_prepare_pkru() => wrpkru(0x00000000)
* Hardware sets: XINUSE[PKRU]=0 (aggressive AMD init tracker)
* XSAVE writes most of XSAVE buffer, including
XSTATE_BV[PKRU]=XINUSE[PKRU]=0
* update_pkru_in_sigframe() overwrites PKRU in XSAVE buffer
... signal handling
* XRSTOR sees XSTATE_BV[PKRU]==0, ignores just-written value
from update_pkru_in_sigframe()
Fixes: 70044df250d0 ("x86/pkeys: Update PKRU to enable all pkeys before XSAVE") Suggested-by: Rudi Horn <rudi.horn@oracle.com> Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20241119174520.3987538-3-aruna.ramakrishna%40oracle.com
Aruna Ramakrishna [Tue, 19 Nov 2024 17:45:19 +0000 (17:45 +0000)]
x86/pkeys: Change caller of update_pkru_in_sigframe()
update_pkru_in_sigframe() will shortly need some information which
is only available inside xsave_to_user_sigframe(). Move
update_pkru_in_sigframe() inside the other function to make it
easier to provide it that information.
Peter Zijlstra [Mon, 2 Dec 2024 14:59:47 +0000 (15:59 +0100)]
module: Convert symbol namespace to string literal
Clean up the existing export namespace code along the same lines of
commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") and for the same reason, it is not desired for the
namespace argument to be a macro expansion itself.
Linus Torvalds [Sun, 1 Dec 2024 23:12:43 +0000 (15:12 -0800)]
Get rid of 'remove_new' relic from platform driver struct
The continual trickle of small conversion patches is grating on me, and
is really not helping. Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:
/*
* .remove_new() is a relic from a prototype conversion of .remove().
* New drivers are supposed to implement .remove(). Once all drivers are
* converted to not use .remove_new any more, it will be dropped.
*/
This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.
I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.
Then I just removed the old (sic) .remove_new member function, and this
is the end result. No more unnecessary conversion noise.
Linus Torvalds [Sun, 1 Dec 2024 21:38:24 +0000 (13:38 -0800)]
Merge tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c component probing support from Wolfram Sang:
"Add OF component probing.
Some devices are designed and manufactured with some components having
multiple drop-in replacement options. These components are often
connected to the mainboard via ribbon cables, having the same signals
and pin assignments across all options. These may include the display
panel and touchscreen on laptops and tablets, and the trackpad on
laptops. Sometimes which component option is used in a particular
device can be detected by some firmware provided identifier, other
times that information is not available, and the kernel has to try to
probe each device.
Instead of a delicate dance between drivers and device tree quirks,
this change introduces a simple I2C component probe function. For a
given class of devices on the same I2C bus, it will go through all of
them, doing a simple I2C read transfer and see which one of them
responds. It will then enable the device that responds"
* tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: fix typo in I2C OF COMPONENT PROBER
of: base: Document prefix argument for of_get_next_child_with_prefix()
i2c: Fix whitespace style issue
arm64: dts: mediatek: mt8173-elm-hana: Mark touchscreens and trackpads as fail
platform/chrome: Introduce device tree hardware prober
i2c: of-prober: Add GPIO support to simple helpers
i2c: of-prober: Add simple helpers for regulator support
i2c: Introduce OF component probe function
of: base: Add for_each_child_of_node_with_prefix()
of: dynamic: Add of_changeset_update_prop_string
Linus Torvalds [Sun, 1 Dec 2024 21:10:51 +0000 (13:10 -0800)]
Merge tag 'trace-printf-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bprintf() removal from Steven Rostedt:
- Remove unused bprintf() function, that was added with the rest of the
"bin-printf" functions.
These are functions that are used by trace_printk() that allows to
quickly save the format and arguments into the ring buffer without
the expensive processing of converting numbers to ASCII. Then on
output, at a much later time, the ring buffer is read and the string
processing occurs then. The bprintf() was added for consistency but
was never used. It can be safely removed.
* tag 'trace-printf-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
printf: Remove unused 'bprintf'
Linus Torvalds [Sun, 1 Dec 2024 20:41:21 +0000 (12:41 -0800)]
Merge tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Borislav Petkov:
- Fix a case where posix timers with a thread-group-wide target would
miss signals if some of the group's threads are exiting
- Fix a hang caused by ndelay() calling the wrong delay function
__udelay()
- Fix a wrong offset calculation in adjtimex(2) when using ADJ_MICRO
(microsecond resolution) and a negative offset
* tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-timers: Target group sigqueue to current task only if not exiting
delay: Fix ndelay() spuriously treated as udelay()
ntp: Remove invalid cast in time offset math
Linus Torvalds [Sun, 1 Dec 2024 20:37:58 +0000 (12:37 -0800)]
Merge tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Move the ->select callback to the correct ops structure in
irq-mvebu-sei to fix some Marvell Armada platforms
- Add a workaround for Hisilicon ITS erratum 162100801 which can cause
some virtual interrupts to get lost
- More platform_driver::remove() conversion
* tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Switch back to struct platform_driver::remove()
irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801
irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
Linus Torvalds [Sun, 1 Dec 2024 20:35:37 +0000 (12:35 -0800)]
Merge tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add a terminating zero end-element to the array describing AMD CPUs
affected by erratum 1386 so that the matching loop actually
terminates instead of going off into the weeds
- Update the boot protocol documentation to mention the fact that the
preferred address to load the kernel to is considered in the
relocatable kernel case too
- Flush the memory buffer containing the microcode patch after applying
microcode on AMD Zen1 and Zen2, to avoid unnecessary slowdowns
- Make sure the PPIN CPU feature flag is cleared on all CPUs if PPIN
has been disabled
* tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Terminate the erratum_1386_microcode array
x86/Documentation: Update algo in init_size description of boot protocol
x86/microcode/AMD: Flush patch buffer mapping after application
x86/mm: Carve out INVLPG inline asm for use by others
x86/cpu: Fix PPIN initialization
Linus Torvalds [Sun, 1 Dec 2024 17:23:33 +0000 (09:23 -0800)]
strscpy: write destination buffer only once
The point behind strscpy() was to once and for all avoid all the
problems with 'strncpy()' and later broken "fixed" versions like
strlcpy() that just made things worse.
So strscpy not only guarantees NUL-termination (unlike strncpy), it also
doesn't do unnecessary padding at the destination. But at the same time
also avoids byte-at-a-time reads and writes by _allowing_ some extra NUL
writes - within the size, of course - so that the whole copy can be done
with word operations.
It is also stable in the face of a mutable source string: it explicitly
does not read the source buffer multiple times (so an implementation
using "strnlen()+memcpy()" would be wrong), and does not read the source
buffer past the size (like the mis-design that is strlcpy does).
Finally, the return value is designed to be simple and unambiguous: if
the string cannot be copied fully, it returns an actual negative error,
making error handling clearer and simpler (and the caller already knows
the size of the buffer). Otherwise it returns the string length of the
result.
However, there was one final stability issue that can be important to
callers: the stability of the destination buffer.
In particular, the same way we shouldn't read the source buffer more
than once, we should avoid doing multiple writes to the destination
buffer: first writing a potentially non-terminated string, and then
terminating it with NUL at the end does not result in a stable result
buffer.
Yes, it gives the right result in the end, but if the rule for the
destination buffer was that it is _always_ NUL-terminated even when
accessed concurrently with updates, the final byte of the buffer needs
to always _stay_ as a NUL byte.
[ Note that "final byte is NUL" here is literally about the final byte
in the destination array, not the terminating NUL at the end of the
string itself. There is no attempt to try to make concurrent reads and
writes give any kind of consistent string length or contents, but we
do want to guarantee that there is always at least that final
terminating NUL character at the end of the destination array if it
existed before ]
This is relevant in the kernel for the tsk->comm[] array, for example.
Even without locking (for either readers or writers), we want to know
that while the buffer contents may be garbled, it is always a valid C
string and always has a NUL character at 'comm[TASK_COMM_LEN-1]' (and
never has any "out of thin air" data).
So avoid any "copy possibly non-terminated string, and terminate later"
behavior, and write the destination buffer only once.
Dr. David Alan Gilbert [Wed, 2 Oct 2024 17:31:47 +0000 (18:31 +0100)]
printf: Remove unused 'bprintf'
bprintf() is unused. Remove it. It was added in the commit 4370aa4aa753
("vsprintf: add binary printf") but as far as I can see was never used,
unlike the other two functions in that patch.
Link: https://lore.kernel.org/20241002173147.210107-1-linux@treblig.org Reviewed-by: Andy Shevchenko <andy@kernel.org> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Sun, 1 Dec 2024 02:23:05 +0000 (18:23 -0800)]
Merge tag 'pci-v6.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fix from Bjorn Helgaas:
- When removing a PCI device, only look up and remove a platform device
if there is an associated device node for which there could be a
platform device, to fix a merge window regression (Brian Norris)
* tag 'pci-v6.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/pwrctrl: Unregister platform device only if one actually exists
Linus Torvalds [Sun, 1 Dec 2024 02:14:56 +0000 (18:14 -0800)]
Merge tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull ima fix from Paul Moore:
"One small patch to fix a function parameter / local variable naming
snafu that went up to you in the current merge window"
* tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
ima: uncover hidden variable in ima_match_rules()
Linus Torvalds [Sat, 30 Nov 2024 23:47:29 +0000 (15:47 -0800)]
Merge tag 'block-6.13-20242901' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
- NVMe pull request via Keith:
- Use correct srcu list traversal (Breno)
- Scatter-gather support for metadata (Keith)
- Fabrics shutdown race condition fix (Nilay)
- Persistent reservations updates (Guixin)
- Add the required bits for MD atomic write support for raid0/1/10
- Correct return value for unknown opcode in ublk
- Fix deadlock with zone revalidation
- Fix for the io priority request vs bio cleanups
- Use the correct unsigned int type for various limit helpers
- Fix for a race in loop
- Cleanup blk_rq_prep_clone() to prevent uninit-value warning and make
it easier for actual humans to read
- Fix potential UAF when iterating tags
- A few fixes for bfq-iosched UAF issues
- Fix for brd discard not decrementing the allocated page count
- Various little fixes and cleanups
* tag 'block-6.13-20242901' of git://git.kernel.dk/linux: (36 commits)
brd: decrease the number of allocated pages which discarded
block, bfq: fix bfqq uaf in bfq_limit_depth()
block: Don't allow an atomic write be truncated in blkdev_write_iter()
mq-deadline: don't call req_get_ioprio from the I/O completion handler
block: Prevent potential deadlock in blk_revalidate_disk_zones()
block: Remove extra part pointer NULLify in blk_rq_init()
nvme: tuning pr code by using defined structs and macros
nvme: introduce change ptpl and iekey definition
block: return bool from get_disk_ro and bdev_read_only
block: remove a duplicate definition for bdev_read_only
block: return bool from blk_rq_aligned
block: return unsigned int from blk_lim_dma_alignment_and_pad
block: return unsigned int from queue_dma_alignment
block: return unsigned int from bdev_io_opt
block: req->bio is always set in the merge code
block: don't bother checking the data direction for merges
block: blk-mq: fix uninit-value in blk_rq_prep_clone and refactor
Revert "block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()"
md/raid10: Atomic write support
md/raid1: Atomic write support
...
Linus Torvalds [Sat, 30 Nov 2024 23:43:02 +0000 (15:43 -0800)]
Merge tag 'io_uring-6.13-20242901' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
- Remove a leftover struct from when the cqwait registered waiting was
transitioned to regions.
- Fix for an issue introduced in this merge window, where nop->fd might
be used uninitialized. Ensure it's always set.
- Add capping of the task_work run in local task_work mode, to prevent
bursty and long chains from adding too much latency.
- Work around xa_store() leaving ->head non-NULL if it encounters an
allocation error during storing. Just a debug trigger, and can go
away once xa_store() behaves in a more expected way for this
condition. Not a major thing as it basically requires fault injection
to trigger it.
- Fix a few mapping corner cases
- Fix KCSAN complaint on reading the table size post unlock. Again not
a "real" issue, but it's easy to silence by just keeping the reading
inside the lock that protects it.
* tag 'io_uring-6.13-20242901' of git://git.kernel.dk/linux:
io_uring/tctx: work around xa_store() allocation error issue
io_uring: fix corner case forgetting to vunmap
io_uring: fix task_work cap overshooting
io_uring: check for overflows in io_pin_pages
io_uring/nop: ensure nop->fd is always initialized
io_uring: limit local tw done
io_uring: add io_local_work_pending()
io_uring/region: return negative -E2BIG in io_create_region()
io_uring: protect register tracing
io_uring: remove io_uring_cqwait_reg_arg
Linus Torvalds [Sat, 30 Nov 2024 23:36:17 +0000 (15:36 -0800)]
Merge tag 'dma-mapping-6.13-2024-11-30' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- fix physical address calculation for struct dma_debug_entry (Fedor
Pchelkin)
* tag 'dma-mapping-6.13-2024-11-30' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: fix physical address calculation for struct dma_debug_entry
Linus Torvalds [Sat, 30 Nov 2024 22:51:08 +0000 (14:51 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
- ARM fixes
- RISC-V Svade and Svadu (accessed and dirty bit) extension support for
host and guest
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: riscv: selftests: Add Svade and Svadu Extension to get-reg-list test
RISC-V: KVM: Add Svade and Svadu Extensions Support for Guest/VM
dt-bindings: riscv: Add Svade and Svadu Entries
RISC-V: Add Svade and Svadu Extensions Support
KVM: arm64: Use MDCR_EL2.HPME to evaluate overflow of hyp counters
KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status
KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failure
KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes
KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition
KVM: arm64: vgic: Make vgic_get_irq() more robust
KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR
Linus Torvalds [Sat, 30 Nov 2024 22:45:29 +0000 (14:45 -0800)]
Merge tag 'sh-for-v6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
"Two small fixes.
The first one by Huacai Chen addresses a runtime warning when
CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected
which occurs because the cpuinfo code on sh incorrectly uses NR_CPUS
when iterating CPUs instead of the runtime limit nr_cpu_ids.
A second fix by Dan Carpenter fixes a use-after-free bug in
register_intc_controller() which occurred as a result of improper
error handling in the interrupt controller driver code when
registering an interrupt controller during plat_irq_setup() on sh"
* tag 'sh-for-v6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: intc: Fix use-after-free bug in register_intc_controller()
sh: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
Linus Torvalds [Sat, 30 Nov 2024 22:33:44 +0000 (14:33 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Deselect ARCH_CORRECT_STACKTRACE_ON_KRETPROBE so that tests depending
on it don't run (and fail) on arm64
- Fix lockdep assert in the Arm SMMUv3 PMU driver
- Fix the port and device ID bits setting in the Arm CMN perf driver
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
perf/arm-cmn: Ensure port and device id bits are set properly
perf/arm-smmuv3: Fix lockdep assert in ->event_init()
arm64: disable ARCH_CORRECT_STACKTRACE_ON_KRETPROBE tests
Patryk Wlazlyn [Wed, 2 Oct 2024 13:05:15 +0000 (15:05 +0200)]
tools/power turbostat: Add RAPL psys as a built-in counter
Introduce the counter as a part of global, platform counters structure.
We open the counter for only one cpu, but otherwise treat it as an
ordinary RAPL counter, allowing for grouped perf read.
The counter is disabled by default, because it's interpretation may
require additional, platform specific information, making it unsuitable
for general use.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Add '+' to optstring when early scanning for --no-msr and --no-perf.
It causes option processing to stop as soon as a nonoption argument is
encountered, effectively skipping child's arguments.
Fixes: 3e4048466c39 ("tools/power turbostat: Add --no-msr option") Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Thu, 24 Oct 2024 13:17:45 +0000 (15:17 +0200)]
tools/power turbostat: Force --no-perf in --dump mode
Force the --no-perf early to prevent using it as a source. User asks for
raw values, but perf returns them relative to the opening of the file
descriptor.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Thu, 14 Nov 2024 07:59:46 +0000 (15:59 +0800)]
tools/power turbostat: Add support for /sys/class/drm/card1
On some machines, the graphics device is enumerated as
/sys/class/drm/card1 instead of /sys/class/drm/card0. The current
implementation does not handle this scenario, resulting in the loss of
graphics C6 residency and frequency information.
Add support for /sys/class/drm/card1, ensuring that turbostat can
retrieve and display the graphics columns for these platforms.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Thu, 14 Nov 2024 07:59:45 +0000 (15:59 +0800)]
tools/power turbostat: Cache graphics sysfs file descriptors during probe
Snapshots of the graphics sysfs knobs are taken based on file
descriptors. To optimize this process, open the files and cache the file
descriptors during the graphics probe phase. As a result, the previously
cached pathnames become redundant and are removed.
This change aims to streamline the code without altering its functionality.
No functional change intended.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Currently, there is an inconsistency in how graphics sysfs knobs are
accessed: graphics residency sysfs knobs are opened and closed for each
read, while graphics frequency sysfs knobs are opened once and remain
open until turbostat exits. This inconsistency is confusing and adds
unnecessary code complexity.
Consolidate the access method by opening the sysfs files once and
reusing the file pointers for subsequent accesses. This approach
simplifies the code and ensures a consistent method for accessing
graphics sysfs knobs.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
In various generations, platforms often share a majority of features,
diverging only in a few specific aspects. The current approach of using
hardcoded values in 'platform_features' structure fails to effectively
represent these divergences.
To improve the description of platform divergence:
1. Each newly introduced 'platform_features' structure must have a base,
typically derived from the previous generation.
2. Platform feature values should be inherited from the base structure
rather than being hardcoded.
This approach ensures a more accurate and maintainable representation of
platform-specific features across different generations.
Converts `adl_features` and `lnl_features` to follow this new scheme.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Tue, 27 Aug 2024 05:07:51 +0000 (13:07 +0800)]
tools/power turbostat: Fix trailing '\n' parsing
parse_cpu_string() parses the string input either from command line or
from /sys/fs/cgroup/cpuset.cpus.effective to get a list of CPUs that
turbostat can run with.
The cpu string returned by /sys/fs/cgroup/cpuset.cpus.effective contains
a trailing '\n', but strtoul() fails to treat this as an error.
That says, for the code below
val = ("\n", NULL, 10);
val returns 0, and errno is also not set.
As a result, CPU0 is erroneously considered as allowed CPU and this
causes failures when turbostat tries to run on CPU0.
get_counters: Could not migrate to CPU 0
...
turbostat: re-initialized with num_cpus 8, allowed_cpus 5
get_counters: Could not migrate to CPU 0
Add a check to return immediately if '\n' or '\0' is detected.
Fixes: 8c3dd2c9e542 ("tools/power/turbostat: Abstrct function for parsing cpu string") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Tue, 20 Aug 2024 16:47:59 +0000 (18:47 +0200)]
tools/power turbostat: Allow using cpu device in perf counters on hybrid platforms
Intel hybrid platforms expose different perf devices for P and E cores.
Instead of one, "/sys/bus/event_source/devices/cpu" device, there are
"/sys/bus/event_source/devices/{cpu_core,cpu_atom}".
This, however makes it more complicated for the user,
because most of the counters are available on both and had to be
handled manually.
This patch allows users to use "virtual" cpu device that is seemingly
translated to cpu_core and cpu_atom perf devices, depending on the type
of a CPU we are opening the counter for.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Wed, 7 Aug 2024 11:43:39 +0000 (13:43 +0200)]
tools/power turbostat: Fix column printing for PMT xtal_time counters
If the very first printed column was for a PMT counter of type xtal_time
we would misalign the column header, because we were always printing the
delimiter.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Sat, 30 Nov 2024 21:41:50 +0000 (13:41 -0800)]
Merge tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add generic support for built-in boot DTB files
- Enable TAB cycling for dialog buttons in nconfig
- Fix issues in streamline_config.pl
- Refactor Kconfig
- Add support for Clang's AutoFDO (Automatic Feedback-Directed
Optimization)
- Add support for Clang's Propeller, a profile-guided optimization.
- Change the working directory to the external module directory for M=
builds
- Support building external modules in a separate output directory
- Enable objtool for *.mod.o and additional kernel objects
- Use lz4 instead of deprecated lz4c
- Work around a performance issue with "git describe"
- Refactor modpost
* tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (85 commits)
kbuild: rename .tmp_vmlinux.kallsyms0.syms to .tmp_vmlinux0.syms
gitignore: Don't ignore 'tags' directory
kbuild: add dependency from vmlinux to resolve_btfids
modpost: replace tdb_hash() with hash_str()
kbuild: deb-pkg: add python3:native to build dependency
genksyms: reduce indentation in export_symbol()
modpost: improve error messages in device_id_check()
modpost: rename alias symbol for MODULE_DEVICE_TABLE()
modpost: rename variables in handle_moddevtable()
modpost: move strstarts() to modpost.h
modpost: convert do_usb_table() to a generic handler
modpost: convert do_of_table() to a generic handler
modpost: convert do_pnp_device_entry() to a generic handler
modpost: convert do_pnp_card_entries() to a generic handler
modpost: call module_alias_printf() from all do_*_entry() functions
modpost: pass (struct module *) to do_*_entry() functions
modpost: remove DEF_FIELD_ADDR_VAR() macro
modpost: deduplicate MODULE_ALIAS() for all drivers
modpost: introduce module_alias_printf() helper
modpost: remove unnecessary check in do_acpi_entry()
...
Linus Torvalds [Sat, 30 Nov 2024 19:18:16 +0000 (11:18 -0800)]
Merge tag 'rtc-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"New drivers:
- Amlogic A4 and A5 RTC
- Marvell 88PM886 PMIC RTC
- Renesas RTCA-3 for Renesas RZ/G3S
Driver updates:
- ab-eoz9: fix temperature and alarm support
- cmos: improve locking behaviour
- isl12022: add alarm support
- m48t59: improve epoch handling
- mt6359: add range
- rzn1: fix BCD conversions and simplify driver"
* tag 'rtc-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (38 commits)
rtc: ab-eoz9: don't fail temperature reads on undervoltage notification
rtc: rzn1: reduce register access
rtc: rzn1: drop superfluous wday calculation
m68k: mvme147, mvme16x: Adopt rtc-m48t59 platform driver
rtc: brcmstb-waketimer: don't include 'pm_wakeup.h' directly
rtc: m48t59: Use platform_data struct for year offset value
rtc: ab-eoz9: fix abeoz9_rtc_read_alarm
rtc: rv3028: fix RV3028_TS_COUNT type
rtc: rzn1: update Michel's email
rtc: rzn1: fix BCD to rtc_time conversion errors
rtc: amlogic-a4: fix compile error
rtc: amlogic-a4: drop error messages
MAINTAINERS: Add an entry for Amlogic RTC driver
rtc: support for the Amlogic on-chip RTC
dt-bindings: rtc: Add Amlogic A4 and A5 RTC
rtc: add driver for Marvell 88PM886 PMIC RTC
rtc: check if __rtc_read_time was successful in rtc_timer_do_work()
rtc: pcf8563: Switch to regmap
rtc: pcf8563: Sort headers alphabetically
rtc: abx80x: Fix WDT bit position of the status register
...
Linus Torvalds [Sat, 30 Nov 2024 18:34:54 +0000 (10:34 -0800)]
Merge tag 'uml-for-linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Lots of cleanups, mostly from Benjamin Berg and Tiwei Bie
- Removal of unused code
- Fix for sparse warnings
- Cleanup around stub_exe()
* tag 'uml-for-linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (68 commits)
hostfs: Fix the NULL vs IS_ERR() bug for __filemap_get_folio()
um: move thread info into task
um: Always dump trace for specified task in show_stack
um: vector: Do not use drvdata in release
um: net: Do not use drvdata in release
um: ubd: Do not use drvdata in release
um: ubd: Initialize ubd's disk pointer in ubd_add
um: virtio_uml: query the number of vqs if supported
um: virtio_uml: fix call_fd IRQ allocation
um: virtio_uml: send SET_MEM_TABLE message with the exact size
um: remove broken double fault detection
um: remove duplicate UM_NSEC_PER_SEC definition
um: remove file sync for stub data
um: always include kconfig.h and compiler-version.h
um: set DONTDUMP and DONTFORK flags on KASAN shadow memory
um: fix sparse warnings in signal code
um: fix sparse warnings from regset refactor
um: Remove double zero check
um: fix stub exe build with CONFIG_GCOV
um: Use os_set_pdeathsig helper in winch thread/process
...
Linus Torvalds [Sat, 30 Nov 2024 18:32:47 +0000 (10:32 -0800)]
Merge tag 'ubifs-for-linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull JFFS2, UBI and UBIFS updates from Richard Weinberger:
"JFFS2:
- Bug fix for rtime compression
- Various cleanups
UBI:
- Cleanups for fastmap and wear leveling
UBIFS:
- Add support for FS_IOC_GETFSSYSFSPATH
- Remove dead ioctl code
- Fix UAF in ubifs_tnc_end_commit()"
* tag 'ubifs-for-linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: (25 commits)
ubifs: Fix uninitialized use of err in ubifs_jnl_write_inode()
jffs2: Prevent rtime decompress memory corruption
jffs2: remove redundant check on outpos > pos
fs: jffs2: Fix inconsistent indentation in jffs2_mark_node_obsolete
jffs2: Correct some typos in comments
jffs2: fix use of uninitialized variable
jffs2: Use str_yes_no() helper function
mtd: ubi: remove redundant check on bytes_left at end of function
mtd: ubi: fix unreleased fwnode_handle in find_volume_fwnode()
ubifs: authentication: Fix use-after-free in ubifs_tnc_end_commit
ubi: fastmap: Fix duplicate slab cache names while attaching
ubifs: xattr: remove unused anonymous enum
ubifs: Reduce kfree() calls in ubifs_purge_xattrs()
ubifs: Call iput(xino) only once in ubifs_purge_xattrs()
ubi: wl: Close down wear-leveling before nand is suspended
mtd: ubi: Rmove unused declaration in header file
ubifs: Correct the total block count by deducting journal reservation
ubifs: Convert to use ERR_CAST()
ubifs: add support for FS_IOC_GETFSSYSFSPATH
ubifs: remove unused ioctl flags GETFLAGS/SETFLAGS
...
* tag '9p-for-6.13-rc1' of https://github.com/martinetd/linux:
net/9p/usbg: allow building as standalone module
9p/xen: fix release of IRQ
9p/xen: fix init sequence
net/9p/usbg: fix handling of the failed kzalloc() memory allocation
fs/9p: replace functions v9fs_cache_{register|unregister} with direct calls
Linus Torvalds [Sat, 30 Nov 2024 18:22:38 +0000 (10:22 -0800)]
Merge tag 'ceph-for-6.13-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"A fix for the mount "device" string parser from Patrick and two cred
reference counting fixups from Max, marked for stable.
Also included a number of cleanups and a tweak to MAINTAINERS to avoid
unnecessarily CCing netdev list"
* tag 'ceph-for-6.13-rc1' of https://github.com/ceph/ceph-client:
ceph: fix cred leak in ceph_mds_check_access()
ceph: pass cred pointer to ceph_mds_auth_match()
ceph: improve caps debugging output
ceph: correct ceph_mds_cap_peer field name
ceph: correct ceph_mds_cap_item field name
ceph: miscellaneous spelling fixes
ceph: Use strscpy() instead of strcpy() in __get_snap_name()
ceph: Use str_true_false() helper in status_show()
ceph: requalify some char pointers as const
ceph: extract entity name from device id
MAINTAINERS: exclude net/ceph from networking
ceph: Remove fs/ceph deadcode
libceph: Remove unused ceph_crypto_key_encode
libceph: Remove unused ceph_osdc_watch_check
libceph: Remove unused pagevec functions
libceph: Remove unused ceph_pagelist functions
Linus Torvalds [Sat, 30 Nov 2024 18:17:53 +0000 (10:17 -0800)]
Merge tag 'nfs-for-6.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Bugfixes:
- nfs/localio: fix for a memory corruption in nfs_local_read_done
- Revert "nfs: don't reuse partially completed requests in
nfs_lock_and_join_requests"
- nfsv4:
- ignore SB_RDONLY when mounting nfs
- Fix a use-after-free problem in open()
- sunrpc:
- clear XPRT_SOCK_UPD_TIMEOUT when reseting the transport
- timeout and cancel TLS handshake with -ETIMEDOUT
- fix one UAF issue caused by sunrpc kernel tcp socket
- Fix a hang in TLS sock_close if sk_write_pending
- pNFS/blocklayout: Fix device registration issues
Features and cleanups:
- localio cleanups from Mike Snitzer
- Clean up refcounting on the nfs version modules
- __counted_by() annotations
- nfs: make processes that are waiting for an I/O lock killable"
* tag 'nfs-for-6.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
fs/nfs/io: make nfs_start_io_*() killable
nfs/blocklayout: Limit repeat device registration on failure
nfs/blocklayout: Don't attempt unregister for invalid block device
sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket
SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT
sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport
nfs: ignore SB_RDONLY when mounting nfs
Revert "nfs: don't reuse partially completed requests in nfs_lock_and_join_requests"
Revert "fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private"
nfs/localio: must clear res.replen in nfs_local_read_done
NFSv4.0: Fix a use-after-free problem in the asynchronous open()
NFSv4.0: Fix the wake up of the next waiter in nfs_release_seqid()
SUNRPC: Fix a hang in TLS sock_close if sk_write_pending
sunrpc: remove newlines from tracepoints
nfs: Annotate struct pnfs_commit_array with __counted_by()
nfs/localio: eliminate need for nfs_local_fsync_work forward declaration
nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx
nfs/localio: remove redundant suid/sgid handling
NFS: Implement get_nfs_version()
...
Linus Torvalds [Sat, 30 Nov 2024 18:14:42 +0000 (10:14 -0800)]
Merge tag '6.13-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- directory lease fixes
- password rotation fixes
- reconnect fix
- fix for SMB3.02 mounts
- DFS (global namespace) fixes
- fixes for special file handling (most relating to better handling
various types of symlinks)
- two minor cleanups
* tag '6.13-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (22 commits)
cifs: update internal version number
cifs: unlock on error in smb3_reconfigure()
cifs: during remount, make sure passwords are in sync
cifs: support mounting with alternate password to allow password rotation
smb: Initialize cfid->tcon before performing network ops
smb: During unmount, ensure all cached dir instances drop their dentry
smb: client: fix noisy message when mounting shares
smb: client: don't try following DFS links in cifs_tree_connect()
smb: client: allow reconnect when sending ioctl
smb: client: get rid of @nlsc param in cifs_tree_connect()
smb: client: allow more DFS referrals to be cached
cifs: Fix parsing reparse point with native symlink in SMB1 non-UNICODE session
cifs: Validate content of WSL reparse point buffers
cifs: Improve guard for excluding $LXDEV xattr
cifs: Add support for parsing WSL-style symlinks
cifs: Validate content of native symlink
cifs: Fix parsing native symlinks relative to the export
smb: client: fix NULL ptr deref in crypto_aead_setkey()
Update misleading comment in cifs_chan_update_iface
smb: client: change return value in open_cached_dir_by_dentry() if !cfids
...
Linus Torvalds [Sat, 30 Nov 2024 18:06:56 +0000 (10:06 -0800)]
Merge tag '6.13-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server updates from Steve French:
- fix use after free due to race in ksmd workqueue handler
- debugging improvements
- fix incorrectly formatted response when client attempts SMB1
- improve memory allocation to reduce chance of OOM
- improve delays between retries when killing sessions
* tag '6.13-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix use-after-free in SMB request handling
ksmbd: add debug print for pending request during server shutdown
ksmbd: add netdev-up/down event debug print
ksmbd: add debug prints to know what smb2 requests were received
ksmbd: add debug print for rdma capable
ksmbd: use msleep instaed of schedule_timeout_interruptible()
ksmbd: use __GFP_RETRY_MAYFAIL
ksmbd: fix malformed unsupported smb1 negotiate response
Brian Norris [Tue, 26 Nov 2024 21:04:34 +0000 (13:04 -0800)]
PCI/pwrctrl: Unregister platform device only if one actually exists
If a PCI device has an associated device_node with power supplies,
pci_bus_add_device() creates platform devices for use by pwrctrl. When the
PCI device is removed, pci_stop_dev() uses of_find_device_by_node() to
locate the related platform device, then unregisters it.
But when we remove a PCI device with no associated device node,
dev_of_node(dev) is NULL, and of_find_device_by_node(NULL) returns the
first device with "dev->of_node == NULL". The result is that we (a)
mistakenly unregister a completely unrelated platform device, leading to
issues like the first trace below, and (b) dereference the NULL pointer
from dev_of_node() when clearing OF_POPULATED, as in the second trace.
Unregister a platform device only if there is one associated with this PCI
device. This resolves issues seen when doing:
# echo 1 > /sys/bus/pci/devices/.../remove
Sample issue from unregistering the wrong platform device:
Linus Torvalds [Sat, 30 Nov 2024 17:03:16 +0000 (09:03 -0800)]
Merge tag 'tty-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver updates from Greg KH:
"Here is a small set of tty and serial driver updates for 6.13-rc1.
Nothing major at all this time, only some small changes:
- few device tree binding updates
- 8250_exar serial driver updates
- imx serial driver updates
- sprd_serial driver updates
- other tiny serial driver updates, full details in the shortlog
All of these have been in linux-next for a while with one reported
issue, but that commit has now been reverted"
* tag 'tty-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (37 commits)
Revert "serial: sh-sci: Clean sci_ports[0] after at earlycon exit"
serial: amba-pl011: fix build regression
dt-bindings: serial: Add a new compatible string for ums9632
serial: sprd: Add support for sc9632
tty/serial/altera_uart: unwrap error log string
tty/serial/altera_jtaguart: unwrap error log string
serial: amba-pl011: Fix RX stall when DMA is used
tty: ldsic: fix tty_ldisc_autoload sysctl's proc_handler
serial: 8250_fintek: Add support for F81216E
serial: sh-sci: Clean sci_ports[0] after at earlycon exit
tty: atmel_serial: Fix typo retreives to retrieves
tty: atmel_serial: Use devm_platform_ioremap_resource()
serial: 8250: omap: Move pm_runtime_get_sync
tty: serial: samsung: Add Exynos8895 compatible
dt-bindings: serial: samsung: Add samsung,exynos8895-uart compatible
serial: 8250_dw: Add Sophgo SG2044 quirk
dt-bindings: serial: snps-dw-apb-uart: Add Sophgo SG2044 uarts
dt-bindings: serial: snps,dw-apb-uart: merge duplicate compatible entry.
altera_jtaguart: Use dev_err() to report error attaching IRQ
altera_uart: Use dev_err() to report error attaching IRQ handler
...