Kunwu Chan [Sun, 26 Nov 2023 09:57:39 +0000 (17:57 +0800)]
powerpc/powernv: Add a null pointer check in opal_powercap_init()
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.
Fixes: b9ef7b4b867f ("powerpc: Convert to using %pOFn instead of device_node.name") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231126095739.1501990-1-chentao@kylinos.cn
Kunwu Chan [Fri, 8 Dec 2023 08:59:37 +0000 (16:59 +0800)]
powerpc/powernv: Add a null pointer check to scom_debug_init_one()
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.
Add a null pointer check, and release 'ent' to avoid memory leaks.
Fixes: bfd2f0d49aef ("powerpc/powernv: Get rid of old scom_controller abstraction") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231208085937.107210-1-chentao@kylinos.cn
Kunwu Chan [Mon, 4 Dec 2023 02:32:23 +0000 (10:32 +0800)]
powerpc/mm: Fix null-pointer dereference in pgtable_cache_add
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231204023223.2447523-1-chentao@kylinos.cn
Sathvika Vasireddy [Fri, 8 Dec 2023 16:30:43 +0000 (22:00 +0530)]
powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B
Commit d49a0626216b95 ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT")
introduced a generic function-alignment infrastructure. Move to using
FUNCTION_ALIGNMENT_4B on powerpc, to use the same alignment as that of
the existing _GLOBAL macro.
Nathan Lynch [Tue, 12 Dec 2023 17:02:00 +0000 (11:02 -0600)]
powerpc/selftests: Add test for papr-sysparm
Consistently testing system parameter access is a bit difficult by
nature -- the set of parameters available depends on the model and
system configuration, and updating a parameter should be considered a
destructive operation reserved for the admin.
So we validate some of the error paths and retrieve the SPLPAR
characteristics string, but not much else.
Nathan Lynch [Tue, 12 Dec 2023 17:01:59 +0000 (11:01 -0600)]
powerpc/selftests: Add test for papr-vpd
Add selftests for /dev/papr-vpd, exercising the common expected use
cases:
* Retrieve all VPD by passing an empty location code.
* Retrieve the "system VPD" by passing a location code derived from DT
root node properties, as done by the vpdupdate command.
The tests also verify that certain intended properties of the driver
hold:
* Passing an unterminated location code to PAPR_VPD_CREATE_HANDLE gets
EINVAL.
* Passing a NULL location code pointer to PAPR_VPD_CREATE_HANDLE gets
EFAULT.
* Closing the device node without first issuing a
PAPR_VPD_CREATE_HANDLE command to it succeeds.
* Releasing a handle without first consuming any data from it
succeeds.
* Re-reading the contents of a handle returns the same data as the
first time.
Some minimal validation of the returned data is performed.
The tests are skipped on systems where the papr-vpd driver does not
initialize, making this useful only on PowerVM LPARs at this point.
Nathan Lynch [Tue, 12 Dec 2023 17:01:58 +0000 (11:01 -0600)]
powerpc/pseries/papr-sysparm: Expose character device to user space
Until now the papr_sysparm APIs have been kernel-internal. But user
space needs access to PAPR system parameters too. The only method
available to user space today to get or set system parameters is using
sys_rtas() and /dev/mem to pass RTAS-addressable buffers between user
space and firmware. This is incompatible with lockdown and should be
deprecated.
So provide an alternative ABI to user space in the form of a
/dev/papr-sysparm character device with just two ioctl commands (get
and set). The data payloads involved are small enough to fit in the
ioctl argument buffer, making the code relatively simple.
Exposing the system parameters through sysfs has been considered but
it would be too awkward:
* The kernel currently does not have to contain an exhaustive list of
defined system parameters. This is a convenient property to maintain
because we don't have to update the kernel whenever a new parameter
is added to PAPR. Exporting a named attribute in sysfs for each
parameter would negate this.
* Some system parameters are text-based and some are not.
* Retrieval of at least one system parameter requires input data,
which a simple read-oriented interface can't support.
The ability to get and set system parameters will be exposed to user
space, so let's get a little more strict about malformed
papr_sysparm_buf objects.
* Create accessors for the length field of struct papr_sysparm_buf.
The length is always stored in MSB order and this is better than
spreading the necessary conversions all over.
* Reject attempts to submit invalid buffers to RTAS.
* Warn if RTAS returns a buffer with an invalid length, clamping the
returned length to a safe value that won't overrun the buffer.
These are meant as precautionary measures to mitigate both firmware
and kernel bugs in this area, should they arise, but I am not aware of
any.
When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
the file contains the result of a complete ibm,get-vpd sequence. The
file contents are immutable from the POV of user space. To get a new
view of the VPD, the client must create a new handle.
This design choice insulates user space from most of the complexities
that ibm,get-vpd brings:
* ibm,get-vpd must be called more than once to obtain complete
results.
* Only one ibm,get-vpd call sequence should be in progress at a time;
interleaved sequences will disrupt each other. Callers must have a
protocol for serializing their use of the function.
* A call sequence in progress may receive a "VPD changed, try again"
status, requiring the client to abandon the sequence and start
over.
The memory required for the VPD buffers seems acceptable, around 20KB
for all VPD on one of my systems. And the value of the
/rtas/ibm,vpd-size DT property (the estimated maximum size of VPD) is
consistently 300KB across various systems I've checked.
I've implemented support for this new ABI in the rtas_get_vpd()
function in librtas, which the vpdupdate command currently uses to
populate its VPD database. I've verified that an unmodified vpdupdate
binary generates an identical database when using a librtas.so that
prefers the new ABI.
Along with the papr-vpd.h header exposed to user space, this
introduces a common papr-miscdev.h uapi header to share a base ioctl
ID with similar drivers to come.
Nathan Lynch [Tue, 12 Dec 2023 17:01:55 +0000 (11:01 -0600)]
powerpc/rtas: Warn if per-function lock isn't held
If the function descriptor has a populated lock member, then callers
are required to hold it across calls. Now that the firmware activation
sequence is appropriately guarded, we can warn when the requirement
isn't satisfied.
__do_enter_rtas_trace() gets reorganized a bit as a result of
performing the function descriptor lookup unconditionally now.
Use rtas_ibm_activate_firmware_lock to prevent interleaving call
sequences of the ibm,activate-firmware RTAS function, which typically
requires multiple calls to complete the update. While the spec does
not specifically prohibit interleaved sequences, there's almost
certainly no advantage to allowing them.
On RTAS platforms there is a general restriction that the OS must not
enter RTAS on more than one CPU at a time. This low-level
serialization requirement is satisfied by holding a spin
lock (rtas_lock) across most RTAS function invocations.
However, some pseries RTAS functions require multiple successive calls
to complete a logical operation. Beginning a new call sequence for such a
function may disrupt any other sequences of that function already in
progress. Safe and reliable use of these functions effectively
requires higher-level serialization beyond what is already done at the
level of RTAS entry and exit.
Where a sequence-based RTAS function is invoked only through
sys_rtas(), with no in-kernel users, there is no issue as far as the
kernel is concerned. User space is responsible for appropriately
serializing its call sequences. (Whether user space code actually
takes measures to prevent sequence interleaving is another matter.)
Examples of such functions currently include ibm,platform-dump and
ibm,get-vpd.
But where a sequence-based RTAS function has both user space and
in-kernel uesrs, there is a hazard. Even if the in-kernel call sites
of such a function serialize their sequences correctly, a user of
sys_rtas() can invoke the same function at any time, potentially
disrupting a sequence in progress.
So in order to prevent disruption of kernel-based RTAS call sequences,
they must serialize not only with themselves but also with sys_rtas()
users, somehow. Preferably without adding more function-specific hacks
to sys_rtas(). This is a prerequisite for adding an in-kernel call
sequence of ibm,get-vpd, which is in a change to follow.
Note that it has never been feasible for the kernel to prevent
sys_rtas()-based sequences from being disrupted because control
returns to user space on every call. sys_rtas()-based users of these
functions have always been, and continue to be, responsible for
coordinating their call sequences with other users, even those which
may invoke the RTAS functions through less direct means than
sys_rtas(). This is an unavoidable consequence of exposing
sequence-based RTAS functions through sys_rtas().
* Add an optional mutex member to struct rtas_function.
* Statically define a mutex for each RTAS function with known call
sequence serialization requirements, and assign its address to the
.lock member of the corresponding function table entry, along with
justifying commentary.
* In sys_rtas(), if the table entry for the RTAS function being
called has a populated lock member, acquire it before taking
rtas_lock and entering RTAS.
* Kernel-based RTAS call sequences are expected to access the
appropriate mutex explicitly by name. For example, a user of the
ibm,activate-firmware RTAS function would do:
int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE);
int fwrc;
mutex_lock(&rtas_ibm_activate_firmware_lock);
do {
fwrc = rtas_call(token, 0, 1, NULL);
} while (rtas_busy_delay(fwrc));
mutex_unlock(&rtas_ibm_activate_firmware_lock);
There should be no perceivable change introduced here except that
concurrent callers of the same RTAS function via sys_rtas() may block
on a mutex instead of spinning on rtas_lock.
Nathan Lynch [Tue, 12 Dec 2023 17:01:52 +0000 (11:01 -0600)]
powerpc/rtas: Move token validation from block_rtas_call() to sys_rtas()
The rtas system call handler sys_rtas() delegates certain input
validation steps to a helper function: block_rtas_call(). One of these
steps ensures that the user-supplied token value maps to a known RTAS
function. This is done by performing a "reverse" token-to-function
lookup via rtas_token_to_function_untrusted() to obtain an
rtas_function object.
In changes to come, sys_rtas() itself will need the function
descriptor for the token. To prepare:
* Move the lookup and validation up into sys_rtas() and pass the
resulting rtas_function pointer to block_rtas_call(), which is
otherwise unconcerned with the token value.
* Change block_rtas_call() to report the RTAS function name instead of
the token value on validation failures, since it can now rely on
having a valid function descriptor.
One behavior change is that sys_rtas() now silently errors out when
passed a bad token, before calling block_rtas_call(). So we will no
longer log "RTAS call blocked - exploit attempt?" on invalid
tokens. This is consistent with how sys_rtas() currently handles other
"metadata" (nargs and nret), while block_rtas_call() is primarily
concerned with validating the arguments to be passed to specific RTAS
functions.
Nathan Lynch [Tue, 12 Dec 2023 17:01:51 +0000 (11:01 -0600)]
powerpc/rtas: Add function return status constants
Not all of the generic RTAS function statuses specified in PAPR have
symbolic constants and descriptions in rtas.h. Fix this, providing a
little more background, slightly updating the existing wording, and
improving the formatting.
Nathan Lynch [Tue, 12 Dec 2023 17:01:50 +0000 (11:01 -0600)]
powerpc/rtas: Fall back to linear search on failed token->function lookup
Enabling any of the powerpc:rtas_* tracepoints at boot is likely to
result in an oops on RTAS platforms. For example, booting a QEMU
pseries model with 'trace_event=powerpc:rtas_input' in the command
line leads to:
BUG: Kernel NULL pointer dereference on read at 0x00000008
Oops: Kernel access of bad area, sig: 7 [#1]
NIP [c00000000004231c] do_enter_rtas+0x1bc/0x460
LR [c00000000004231c] do_enter_rtas+0x1bc/0x460
Call Trace:
do_enter_rtas+0x1bc/0x460 (unreliable)
rtas_call+0x22c/0x4a0
rtas_get_boot_time+0x80/0x14c
read_persistent_clock64+0x124/0x150
read_persistent_wall_and_boot_offset+0x28/0x58
timekeeping_init+0x70/0x348
start_kernel+0xa0c/0xc1c
start_here_common+0x1c/0x20
(This is preceded by a warning for the failed lookup in
rtas_token_to_function().)
This happens when __do_enter_rtas_trace() attempts a token to function
descriptor lookup before the xarray containing the mappings has been
set up.
Fall back to linear scan of the table if rtas_token_to_function_xarray
is empty.
Add a convenience macro for iterating over every element of the
internal function table and convert the one site that can use it. An
additional user of the macro is anticipated in changes to follow.
Nathan Lynch [Tue, 12 Dec 2023 17:01:48 +0000 (11:01 -0600)]
powerpc/rtas: Avoid warning on invalid token argument to sys_rtas()
rtas_token_to_function() WARNs when passed an invalid token; it's
meant to catch bugs in kernel-based users of RTAS functions. However,
user space controls the token value passed to rtas_token_to_function()
by block_rtas_call(), so user space with sufficient privilege to use
sys_rtas() can trigger the warnings at will:
unexpected failed lookup for token 2048
WARNING: CPU: 20 PID: 2247 at arch/powerpc/kernel/rtas.c:556
rtas_token_to_function+0xfc/0x110
...
NIP rtas_token_to_function+0xfc/0x110
LR rtas_token_to_function+0xf8/0x110
Call Trace:
rtas_token_to_function+0xf8/0x110 (unreliable)
sys_rtas+0x188/0x880
system_call_exception+0x268/0x530
system_call_common+0x160/0x2c4
It's desirable to continue warning on bogus tokens in
rtas_token_to_function(). Currently it is used to look up RTAS
function descriptors when tracing, where we know there has to have
been a successful descriptor lookup by different means already, and it
would be a serious inconsistency for the reverse lookup to fail.
So instead of weakening rtas_token_to_function()'s contract by
removing the warnings, introduce rtas_token_to_function_untrusted(),
which has no opinion on failed lookups. Convert block_rtas_call() and
rtas_token_to_function() to use it.
Kajol Jain [Thu, 16 Nov 2023 12:20:32 +0000 (17:50 +0530)]
powerpc/hv-gpci: Add return value check in affinity_domain_via_partition_show function
To access hv-gpci kernel interface files data, the
"Enable Performance Information Collection" option has to be set
in hmc. Incase that option is not set and user try to read
the interface files, it should give error message as
operation not permitted.
Result of accessing added interface files with disabled
performance collection option:
[command]# cat processor_bus_topology
cat: processor_bus_topology: Operation not permitted
[command]# cat processor_config
cat: processor_config: Operation not permitted
[command]# cat affinity_domain_via_domain
cat: affinity_domain_via_domain: Operation not permitted
[command]# cat affinity_domain_via_virtual_processor
cat: affinity_domain_via_virtual_processor: Operation not permitted
[command]# cat affinity_domain_via_partition
Based on above result there is no error message when reading
affinity_domain_via_partition file because of missing
check for failed hcall. Fix this issue by adding
a check in the start of affinity_domain_via_partition_show
function, to return error incase hcall fails, with error type
other then H_PARAMETER.
Fixes: a15e0d6a6929 ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via partition information") Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231116122033.160964-1-kjain@linux.ibm.com
Michael Ellerman [Tue, 28 Nov 2023 13:27:48 +0000 (00:27 +1100)]
selftests/powerpc: Check all FPRs in fpu_syscall test
There is a selftest that checks if FPRs are corrupted across a fork, aka
clone. It was added as part of the series that optimised the clone path
to save the parent's FP state without "giving up" (turning off FP).
See commit 8792468da5e1 ("powerpc: Add the ability to save FPU without
giving it up").
The test encodes the assumption that FPRs 0-13 are volatile across the
syscall, by only checking the volatile FPRs are not changed by the fork.
There was also a comment in the fpu_preempt test alluding to that:
The check_fpu function in asm only checks the non volatile registers
as it is reused from the syscall test
It is true that the function call ABI treats f0-f13 as volatile,
however the syscall ABI has since been documented as *not* treating those
registers as volatile. See commit 7b8845a2a2ec ("powerpc/64: Document
the syscall ABI").
So change the test to check all FPRs are not corrupted by the syscall.
Note that this currently fails, because save_fpu() etc. do not restore
f0/vsr0.
Michael Ellerman [Tue, 28 Nov 2023 13:27:46 +0000 (00:27 +1100)]
selftests/powerpc: Generate better bit patterns for FPU tests
The fpu_preempt test randomly initialises an array of doubles to try and
detect FPU register corruption.
However the values it generates do not occupy the full range of values
possible in the 64-bit double, meaning some partial register corruption
could go undetected.
Without getting too carried away, add some better initialisation to
generate values that occupy more bits.
Michael Ellerman [Tue, 28 Nov 2023 13:27:45 +0000 (00:27 +1100)]
selftests/powerpc: Check all FPRs in fpu_preempt
There's a selftest that checks FPRs aren't corrupted by preemption, or
just process scheduling. However it only checks the non-volatile FPRs,
meaning corruption of the volatile FPRs could go undetected.
The check_fpu function it calls is used by several other tests, so for
now add a new routine to check all the FPRs. Increase the size of the
array of FPRs to 32, and initialise them all with random values.
Michael Ellerman [Tue, 28 Nov 2023 13:27:44 +0000 (00:27 +1100)]
selftests/powerpc: Fix error handling in FPU/VMX preemption tests
The FPU & VMX preemption tests do not check for errors returned by the
low-level asm routines, preempt_fpu() / preempt_vsx() respectively.
That means any register corruption detected by the asm routines does not
result in a test failure.
Fix it by returning the return value of the asm routines from the
pthread child routines.
Michael Ellerman [Wed, 6 Dec 2023 11:55:48 +0000 (22:55 +1100)]
powerpc/Makefile: Auto detect cross compiler
If no cross compiler is specified, try to auto detect one.
Look for various combinations, matching:
powerpc(64(le)?)?(-unknown)?-linux(-gnu)?-
There are more possibilities, but the above is known to find a compiler
on Fedora and Ubuntu (which use linux-gnu-), and also detects the
kernel.org cross compilers (which use linux-).
Michael Ellerman [Wed, 6 Dec 2023 11:55:47 +0000 (22:55 +1100)]
powerpc/Makefile: Default to ppc64le_defconfig when cross building
If the kernel is being cross compiled, there is no information from
uname on which defconfig is most appropriate, so the Makefile defaults
to ppc64.
However these days almost all distros that support powerpc are little
endian, so it's more likely that defaulting to ppc64le_defconfig will
produce something useful for a user.
Michael Ellerman [Wed, 6 Dec 2023 11:55:46 +0000 (22:55 +1100)]
powerpc/vdso: No need to undef powerpc for 64-bit build
The vdso Makefile adds -U$(ARCH) to CPPFLAGS for the vdso64.lds linker
script. ARCH is always powerpc, so it becomes -Upowerpc, which means
undefine the "powerpc" symbol.
But the 64-bit compiler doesn't define powerpc in the first place,
compare:
powerpc/book3s/hash: Drop _PAGE_PRIVILEGED from PAGE_NONE
There used to be a dependency on _PAGE_PRIVILEGED with pte_savedwrite.
But that got dropped by
commit 6a56ccbcf6c6 ("mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite")
With the change in this patch numa fault pte (pte_protnone()) gets mapped as regular user pte
with RWX cleared (no-access) whereas earlier it used to be mapped _PAGE_PRIVILEGED.
Hash fault handling code gets some WARN_ON added in this patch because
those functions are not expected to get called with _PAGE_READ cleared.
commit 18061c17c8ec ("powerpc/mm: Update PROTFAULT handling in the page
fault path") explains the details.
Nathan Lynch [Tue, 14 Nov 2023 17:01:55 +0000 (11:01 -0600)]
powerpc/pseries/memhp: Log more error conditions in add path
When an add operation for multiple LMBs fails, there is currently
little indication from the kernel of what went wrong. Be a little more
verbose about error conditions in the add paths.
Nathan Lynch [Tue, 14 Nov 2023 17:01:53 +0000 (11:01 -0600)]
powerpc/pseries/memhp: Fix access beyond end of drmem array
dlpar_memory_remove_by_index() may access beyond the bounds of the
drmem lmb array when the LMB lookup fails to match an entry with the
given DRC index. When the search fails, the cursor is left pointing to
&drmem_info->lmbs[drmem_info->n_lmbs], which is one element past the
last valid entry in the array. The debug message at the end of the
function then dereferences this pointer:
pr_debug("Failed to hot-remove memory at %llx\n",
lmb->base_addr);
This was found by inspection and confirmed with KASAN:
pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 1234
==================================================================
BUG: KASAN: slab-out-of-bounds in dlpar_memory+0x298/0x1658
Read of size 8 at addr c000000364e97fd0 by task bash/949
The buggy address belongs to the object at c000000364e80000
which belongs to the cache kmalloc-128k of size 131072
The buggy address is located 0 bytes to the right of
allocated 98256-byte region [c000000364e80000, c000000364e97fd0)
==================================================================
pseries-hotplug-mem: Failed to hot-remove memory at 0
Log failed lookups with a separate message and dereference the
cursor only when it points to a valid entry.
Randy Dunlap [Fri, 1 Dec 2023 05:51:59 +0000 (21:51 -0800)]
powerpc/44x: select I2C for CURRITUCK
Fix build errors when CURRITUCK=y and I2C is not builtin (=m or is
not set). Fixes these build errors:
powerpc-linux-ld: arch/powerpc/platforms/44x/ppc476.o: in function `avr_halt_system':
ppc476.c:(.text+0x58): undefined reference to `i2c_smbus_write_byte_data'
powerpc-linux-ld: arch/powerpc/platforms/44x/ppc476.o: in function `ppc47x_device_probe':
ppc476.c:(.init.text+0x18): undefined reference to `i2c_register_driver'
Fixes: 2a2c74b2efcb ("IBM Akebono: Add the Akebono platform") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: lore.kernel.org/r/202312010820.cmdwF5X9-lkp@intel.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231201055159.8371-1-rdunlap@infradead.org
Zhao Ke [Wed, 29 Nov 2023 07:58:45 +0000 (15:58 +0800)]
powerpc: Add PVN support for HeXin C2000 processor
HeXin Tech Co. has applied for a new PVN from the OpenPower Community
for its new processor C2000. The OpenPower has assigned a new PVN
and this newly assigned PVN is 0x0066, add pvr register related
support for this PVN.
Michael Ellerman [Thu, 30 Nov 2023 11:44:33 +0000 (22:44 +1100)]
powerpc: Fix build error due to is_valid_bugaddr()
With CONFIG_GENERIC_BUG=n the build fails with:
arch/powerpc/kernel/traps.c:1442:5: error: no previous prototype for ‘is_valid_bugaddr’ [-Werror=missing-prototypes]
1442 | int is_valid_bugaddr(unsigned long addr)
| ^~~~~~~~~~~~~~~~
The prototype is only defined, and the function is only needed, when
CONFIG_GENERIC_BUG=y, so move the implementation under that.
Michael Ellerman [Thu, 30 Nov 2023 11:44:32 +0000 (22:44 +1100)]
powerpc/mm: Fix build failures due to arch_reserved_kernel_pages()
With NUMA=n and FA_DUMP=y or PRESERVE_FA_DUMP=y the build fails with:
arch/powerpc/kernel/fadump.c:1739:22: error: no previous prototype for ‘arch_reserved_kernel_pages’ [-Werror=missing-prototypes]
1739 | unsigned long __init arch_reserved_kernel_pages(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
The prototype for arch_reserved_kernel_pages() is in include/linux/mm.h,
but it's guarded by __HAVE_ARCH_RESERVED_KERNEL_PAGES. The powerpc
headers define __HAVE_ARCH_RESERVED_KERNEL_PAGES in asm/mmzone.h, which
is not included into the generic headers when NUMA=n.
Move the definition of __HAVE_ARCH_RESERVED_KERNEL_PAGES into asm/mmu.h
which is included regardless of NUMA=n.
Additionally the ifdef around __HAVE_ARCH_RESERVED_KERNEL_PAGES needs to
also check for CONFIG_PRESERVE_FA_DUMP.
Michael Ellerman [Wed, 29 Nov 2023 13:19:19 +0000 (00:19 +1100)]
powerpc/64s: Fix CONFIG_NUMA=n build due to create_section_mapping()
With CONFIG_NUMA=n the build fails with:
arch/powerpc/mm/book3s64/pgtable.c:275:15: error: no previous prototype for ‘create_section_mapping’ [-Werror=missing-prototypes]
275 | int __meminit create_section_mapping(unsigned long start, unsigned long end,
| ^~~~~~~~~~~~~~~~~~~~~~
That happens because the prototype for create_section_mapping() is in
asm/mmzone.h, but asm/mmzone.h is only included by linux/mmzone.h
when CONFIG_NUMA=y.
In fact the prototype is only needed by arch/powerpc/mm code, so move
the prototype into arch/powerpc/mm/mmu_decl.h, which also fixes the
build error.
Michael Ellerman [Wed, 29 Nov 2023 13:19:18 +0000 (00:19 +1100)]
powerpc/44x: Make ppc44x_idle_init() static
The 44x/fsp2_defconfig build fails with:
arch/powerpc/platforms/44x/idle.c:30:12: error: no previous prototype for ‘ppc44x_idle_init’ [-Werror=missing-prototypes]
30 | int __init ppc44x_idle_init(void)
| ^~~~~~~~~~~~~~~~
Michael Ellerman [Wed, 29 Nov 2023 13:19:15 +0000 (00:19 +1100)]
powerpc/suspend: Add prototype for do_after_copyback()
With HIBERNATION=y the build breaks with:
arch/powerpc/kernel/swsusp_64.c:14:6: error: no previous prototype for ‘do_after_copyback’ [-Werror=missing-prototypes]
14 | void do_after_copyback(void)
| ^~~~~~~~~~~~~~~~~
do_after_copyback() is only called from asm, so there is no prototype,
nor any header where it makes sense to place one. Just add a prototype
in the C file to fix the build error.
Nathan Lynch [Tue, 28 Nov 2023 00:40:09 +0000 (18:40 -0600)]
powerpc/rtas_pci: rename and properly expose config access APIs
The rtas_read_config() and rtas_write_config() functions in
kernel/rtas_pci.c have external linkage and two users in arch/powerpc:
the rtas_pci code itself and the pseries platform's "enhanced error
handling" (EEH) support code.
The prototypes for these functions in asm/ppc-pci.h have until now
been guarded by CONFIG_EEH since the only external caller is the
pseries EEH code. However, this presumably has always generated
warnings when built with !CONFIG_EEH and -Wmissing-prototypes:
arch/powerpc/kernel/rtas_pci.c:46:5: error: no previous prototype for
function 'rtas_read_config' [-Werror,-Wmissing-prototypes]
46 | int rtas_read_config(struct pci_dn *pdn, int where,
int size, u32 *val)
arch/powerpc/kernel/rtas_pci.c:98:5: error: no previous prototype for
function 'rtas_write_config' [-Werror,-Wmissing-prototypes]
98 | int rtas_write_config(struct pci_dn *pdn, int where,
int size, u32 val)
The introduction of commit c6345dfa6e3e ("Makefile.extrawarn: turn on
missing-prototypes globally") forces the issue.
The efika and chrp platform code have (static) functions with the same
names but different signatures. We may as well eliminate the potential
for conflicts and confusion by renaming the globally visible versions
as their prototypes get moved out of the CONFIG_EEH-guarded region;
their current names are too generic anyway. Since they operate on
objects of the type 'struct pci_dn *', give them the slightly more
verbose prefix "rtas_pci_dn_" and fix up all the call sites.
Fixes: c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/linuxppc-dev/CA+G9fYt0LLXtjSz+Hkf3Fhm-kf0ZQanrhUS+zVZGa3O+Wt2+vg@mail.gmail.com/ Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231127-rtas-pci-rw-config-v1-1-385d29ace3df@linux.ibm.com
Stephen Rothwell [Mon, 27 Nov 2023 02:28:09 +0000 (13:28 +1100)]
powerpc: pmd_move_must_withdraw() is only needed for CONFIG_TRANSPARENT_HUGEPAGE
The linux-next build of powerpc64 allnoconfig fails with:
arch/powerpc/mm/book3s64/pgtable.c:557:5: error: no previous prototype for 'pmd_move_must_withdraw'
557 | int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
| ^~~~~~~~~~~~~~~~~~~~~~
Caused by commit:
c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally")
Fix it by moving the function definition under
CONFIG_TRANSPARENT_HUGEPAGE like the prototype. The function is only
called when CONFIG_TRANSPARENT_HUGEPAGE=y.
Naveen N Rao [Thu, 23 Nov 2023 07:17:05 +0000 (12:47 +0530)]
powerpc/lib: Validate size for vector operations
Some of the fp/vmx code in sstep.c assume a certain maximum size for the
instructions being emulated. The size of those operations however is
determined separately in analyse_instr().
Add a check to validate the assumption on the maximum size of the
operations, so as to prevent any unintended kernel stack corruption.
Signed-off-by: Naveen N Rao <naveen@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Build-tested-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231123071705.397625-1-naveen@kernel.org
Michael Ellerman [Mon, 20 Nov 2023 23:54:36 +0000 (10:54 +1100)]
powerpc/lib: Avoid array bounds warnings in vec ops
Building with GCC with -Warray-bounds enabled there are several warnings
in sstep.c along the lines of:
In function ‘do_byte_reverse’,
inlined from ‘do_vec_load’ at arch/powerpc/lib/sstep.c:691:3,
inlined from ‘emulate_loadstore’ at arch/powerpc/lib/sstep.c:3439:9:
arch/powerpc/lib/sstep.c:289:23: error: array subscript 2 is outside array bounds of ‘u8[16]’ {aka ‘unsigned char[16]’} [-Werror=array-bounds=]
289 | up[2] = byterev_8(up[1]);
| ~~~~~~^~~~~~~~~~~~~~~~~~
arch/powerpc/lib/sstep.c: In function ‘emulate_loadstore’:
arch/powerpc/lib/sstep.c:681:11: note: at offset 16 into object ‘u’ of size 16
681 | } u = {};
| ^
do_byte_reverse() supports a size up to 32 bytes, but in these cases the
caller is only passing a 16 byte buffer. In practice there is no bug,
do_vec_load() is only called from the LOAD_VMX case in emulate_loadstore().
That in turn is only reached when analyse_instr() recognises VMX ops,
and in all cases the size is no greater than 16:
Although the warning is incorrect, the code would be safer if it clamped
the size from the caller to the known size of the buffer. Do that using
min_t().
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Closes: https://lore.kernel.org/linuxppc-dev/YpbUcPrm61RLIiZF@debian.me/ Reported-by: Jan-Benedict Glaw <jbglaw@lug-owl.de> Closes: https://lore.kernel.org/linuxppc-dev/20221212215117.aa7255t7qd6yefk4@lug-owl.de/ Reported-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Closes: https://lore.kernel.org/linuxppc-dev/6a8bf78c-aedb-4d5a-b0aa-82a51a17b884@embeddedor.com/ Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Build-tested-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231120235436.1569255-1-mpe@ellerman.id.au
Kunwu Chan [Wed, 22 Nov 2023 03:06:51 +0000 (11:06 +0800)]
powerpc/xics: Check return value of kasprintf in icp_native_map_one_cpu
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.
Masahiro Yamada [Mon, 20 Nov 2023 23:23:32 +0000 (08:23 +0900)]
powerpc: add crtsavres.o to always-y instead of extra-y
crtsavres.o is linked to modules. However, as explained in commit d0e628cd817f ("kbuild: doc: clarify the difference between extra-y
and always-y"), 'make modules' does not build extra-y.
For example, the following command fails:
$ make ARCH=powerpc LLVM=1 KBUILD_MODPOST_WARN=1 mrproper ps3_defconfig modules
[snip]
LD [M] arch/powerpc/platforms/cell/spufs/spufs.ko
ld.lld: error: cannot open arch/powerpc/lib/crtsavres.o: No such file or directory
make[3]: *** [scripts/Makefile.modfinal:56: arch/powerpc/platforms/cell/spufs/spufs.ko] Error 1
make[2]: *** [Makefile:1844: modules] Error 2
make[1]: *** [/home/masahiro/workspace/linux-kbuild/Makefile:350: __build_one_by_one] Error 2
make: *** [Makefile:234: __sub-make] Error 2
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Fixes: baa25b571a16 ("powerpc/64: Do not link crtsavres.o in vmlinux") Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231120232332.4100288-1-masahiroy@kernel.org
Michael Ellerman [Wed, 25 Oct 2023 01:24:52 +0000 (12:24 +1100)]
powerpc: Make cpu_spec __ro_after_init
The cpu_spec is a struct holding various information about the CPU the
kernel is executing on. It's populated early in boot and must not change
after that.
In particular the cpu_features and mmu_features hold the set of
discovered CPU/MMU features and are used to set static keys for each
feature, and do binary patching of assembly. So any change to the
cpu_features/mmu_features later in boot will not be reflected in
the state of the static keys or patched code.
There is already logic to check that cpu_features/mmu_features don't
change, see check_features() in feature-fixups.c.
But as another layer of protection the entire cpu_spec should be read
only after init, annotate it as such.
Michael Ellerman [Tue, 24 Oct 2023 11:27:25 +0000 (22:27 +1100)]
powerpc/configs/64s: Enable CONFIG_MEM_SOFT_DIRTY
Enable CONFIG_MEM_SOFT_DIRTY to get some test coverage. Distros enable
it, and it has been broken previously. See commit 66b2ca086210
("powerpc/64s/radix: Fix soft dirty tracking").
Nathan Lynch [Mon, 6 Nov 2023 13:42:59 +0000 (07:42 -0600)]
powerpc/rtas: Remove 'extern' from function declarations in rtas.h
This header occasionally gains new function declarations without the
leading extern in accordance with current style rules. Leaving the
legacy externs in place is making the header more difficult to read
over time because of the inconsistency. Remove them.
Nathan Lynch [Mon, 6 Nov 2023 13:42:57 +0000 (07:42 -0600)]
powerpc/rtas: Move post_mobility_fixup() declaration to pseries
This is a pseries-specific function declaration that doesn't belong in
rtas.h. Move it to the pseries platform code and adjust
pseries/suspend.c accordingly.
Nathan Lynch [Mon, 6 Nov 2023 13:42:55 +0000 (07:42 -0600)]
powerpc/rtas: Drop declaration of undefined call_rtas() function
The call_rtas() function has never been a part of arch/powerpc, and
its implementation was removed from arch/ppc by 0a26b1364f14 ("ppc:
Remove CHRP, POWER3 and POWER4 support from arch/ppc").
Linus Torvalds [Sun, 19 Nov 2023 21:54:28 +0000 (13:54 -0800)]
Merge tag 'kbuild-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix section mismatch warning messages for riscv and loongarch
- Remove CONFIG_IA64 left-over from linux/export-internal.h
- Fix the location of the quotes for UIMAGE_NAME
- Fix a memory leak bug in Kconfig
* tag 'kbuild-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix memory leak from range properties
kbuild: Move the single quotes for image name
linux/export: clean up the IA-64 KSYM_FUNC macro
modpost: fix section mismatch message for RELA
Linus Torvalds [Sun, 19 Nov 2023 21:49:32 +0000 (13:49 -0800)]
Merge tag 'irq_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
- Flush the translation service tables to prevent unpredictable
behavior on non-coherent GIC devices
* tag 'irq_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Flush ITS tables correctly in non-coherent GIC designs
Linus Torvalds [Sun, 19 Nov 2023 21:46:17 +0000 (13:46 -0800)]
Merge tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Ignore invalid x2APIC entries in order to not waste per-CPU data
- Fix a back-to-back signals handling scenario when shadow stack is in
use
- A documentation fix
- Add Kirill as TDX maintainer
* tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/acpi: Ignore invalid x2APIC entries
x86/shstk: Delay signal entry SSP write until after user accesses
x86/Documentation: Indent 'note::' directive for protocol version number note
MAINTAINERS: Add Intel TDX entry
Linus Torvalds [Sun, 19 Nov 2023 21:35:07 +0000 (13:35 -0800)]
Merge tag 'timers_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Do the push of pending hrtimers away from a CPU which is being
offlined earlier in the offlining process in order to prevent a
deadlock
* tag 'timers_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimers: Push pending hrtimers away from outgoing CPU earlier
Linus Torvalds [Sun, 19 Nov 2023 21:32:00 +0000 (13:32 -0800)]
Merge tag 'sched_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Fix virtual runtime calculation when recomputing a sched entity's
weights
- Fix wrongly rejected unprivileged poll requests to the cgroup psi
pressure files
- Make sure the load balancing is done by only one CPU
* tag 'sched_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix the decision for load balance
sched: psi: fix unprivileged polling against cgroups
sched/eevdf: Fix vruntime adjustment on reweight
Linus Torvalds [Sat, 18 Nov 2023 23:20:58 +0000 (15:20 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Seven small fixes, six in drivers and one in sd.
The sd fix is so large because it changes a struct pointer to a struct
but otherwise is fairly simple"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: qcom-ufs: dt-bindings: Document the SM8650 UFS Controller
scsi: sd: Fix sshdr use in sd_suspend_common()
scsi: scsi_debug: Delete some bogus error checking
scsi: scsi_debug: Fix some bugs in sdebug_error_write()
scsi: ufs: core: Fix racing issue between ufshcd_mcq_abort() and ISR
scsi: ufs: core: Expand MCQ queue slot to DeviceQueueDepth + 1
scsi: qla2xxx: Fix system crash due to bad pointer access
Linus Torvalds [Sat, 18 Nov 2023 23:13:10 +0000 (15:13 -0800)]
Merge tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"On parisc we still sometimes need writeable stacks, e.g. if programs
aren't compiled with gcc-14. To avoid issues with the upcoming
systemd-254 we therefore have to disable prctl(PR_SET_MDWE) for now
(for parisc only).
The other two patches are minor: a bugfix for the soft power-off on
qemu with 64-bit kernel and prefer strscpy() over strlcpy():
- Fix power soft-off on qemu
- Disable prctl(PR_SET_MDWE) since parisc sometimes still needs
writeable stacks
- Use strscpy instead of strlcpy in show_cpuinfo()"
* tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
prctl: Disable prctl(PR_SET_MDWE) on parisc
parisc/power: Fix power soft-off when running on qemu
parisc: Replace strlcpy() with strscpy()
Linus Torvalds [Sat, 18 Nov 2023 19:28:28 +0000 (11:28 -0800)]
Merge tag 'xfs-6.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Fix deadlock arising due to intent items in AIL not being cleared
when log recovery fails
- Fix stale data exposure bug when remapping COW fork extents to data
fork
- Fix deadlock when data device flush fails
- Fix AGFL minimum size calculation
- Select DEBUG_FS instead of XFS_DEBUG when XFS_ONLINE_SCRUB_STATS is
selected
- Fix corruption of log inode's extent count field when NREXT64 feature
is enabled
* tag 'xfs-6.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: recovery should not clear di_flushiter unconditionally
xfs: inode recovery does not validate the recovered inode
xfs: fix again select in kconfig XFS_ONLINE_SCRUB_STATS
xfs: fix internal error from AGFL exhaustion
xfs: up(ic_sema) if flushing data device fails
xfs: only remap the written blocks in xfs_reflink_end_cow_extent
XFS: Update MAINTAINERS to catch all XFS documentation
xfs: abort intent items when recovery intents fail
xfs: factor out xfs_defer_pending_abort
Linus Torvalds [Sat, 18 Nov 2023 19:23:32 +0000 (11:23 -0800)]
Merge tag 'nfsd-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix several long-standing bugs in the duplicate reply cache
- Fix a memory leak
* tag 'nfsd-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix checksum mismatches in the duplicate reply cache
NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update()
NFSD: Update nfsd_cache_append() to use xdr_stream
nfsd: fix file memleak on client_opens_release
Linus Torvalds [Sat, 18 Nov 2023 19:18:46 +0000 (11:18 -0800)]
Merge tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- multichannel fixes (including a lock ordering fix and an important
refcounting fix)
- spnego fix
* tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix lock ordering while disabling multichannel
cifs: fix leak of iface for primary channel
cifs: fix check of rc in function generate_smb3signingkey
cifs: spnego: add ';' in HOST_KEY_LEN
Helge Deller [Sat, 18 Nov 2023 18:33:35 +0000 (19:33 +0100)]
prctl: Disable prctl(PR_SET_MDWE) on parisc
systemd-254 tries to use prctl(PR_SET_MDWE) for it's MemoryDenyWriteExecute
functionality, but fails on parisc which still needs executable stacks in
certain combinations of gcc/glibc/kernel.
Disable prctl(PR_SET_MDWE) by returning -EINVAL for now on parisc, until
userspace has catched up.
Signed-off-by: Helge Deller <deller@gmx.de> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Sam James <sam@gentoo.org> Closes: https://github.com/systemd/systemd/issues/29775 Tested-by: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/all/875y2jro9a.fsf@gentoo.org/ Cc: <stable@vger.kernel.org> # v6.3+
Linus Torvalds [Sat, 18 Nov 2023 18:02:16 +0000 (10:02 -0800)]
Merge tag 'for-6.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Various fixes for the DM delay target to address regressions
introduced during the 6.7 merge window
- Fixes to both DM bufio and the verity target for no-sleep mode,
to address sleeping while atomic issues
- Update DM crypt target in response to the treewide change that
made MAX_ORDER inclusive
* tag 'for-6.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-crypt: start allocating with MAX_ORDER
dm-verity: don't use blocking calls from tasklets
dm-bufio: fix no-sleep mode
dm-delay: avoid duplicate logic
dm-delay: fix bugs introduced by kthread mode
dm-delay: fix a race between delay_presuspend and delay_bio
Kees Cook [Thu, 16 Nov 2023 19:13:40 +0000 (11:13 -0800)]
parisc: Replace strlcpy() with strscpy()
strlcpy() reads the entire source buffer first. This read may exceed
the destination size limit. This is both inefficient and can lead
to linear read overflows if a source string is not NUL-terminated[1].
Additionally, it returns the size of the source string, not the
resulting size of the destination string. In an effort to remove strlcpy()
completely[2], replace strlcpy() here with strscpy().
Linus Torvalds [Sat, 18 Nov 2023 17:44:14 +0000 (09:44 -0800)]
Merge tag 'i2c-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Revert a not-working conversion to generic recovery for PXA,
use proper IO accessors for designware, and use proper PM level
for ocores to allow accessing interrupt providers late"
* tag 'i2c-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ocores: Move system PM hooks to the NOIRQ phase
i2c: designware: Fix corrupted memory seen in the ISR
Revert "i2c: pxa: move to generic GPIO recovery"
Linus Torvalds [Sat, 18 Nov 2023 17:09:17 +0000 (09:09 -0800)]
Merge tag 'turbostat-2023.11.07' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- Turbostat features are now table-driven (Rui Zhang)
- Add support for some new platforms (Sumeet Pawnikar, Rui Zhang)
- Gracefully run in configs when CPUs are limited (Rui Zhang, Srinivas
Pandruvada)
- misc minor fixes
[ This came in during the merge window, but sorting out the signed tag
took a while, so thus the late merge - Linus ]
* tag 'turbostat-2023.11.07' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (86 commits)
tools/power turbostat: version 2023.11.07
tools/power/turbostat: bugfix "--show IPC"
tools/power/turbostat: Add initial support for LunarLake
tools/power/turbostat: Add initial support for ArrowLake
tools/power/turbostat: Add initial support for GrandRidge
tools/power/turbostat: Add initial support for SierraForest
tools/power/turbostat: Add initial support for GraniteRapids
tools/power/turbostat: Add MSR_CORE_C1_RES support for spr_features
tools/power/turbostat: Move process to root cgroup
tools/power/turbostat: Handle cgroup v2 cpu limitation
tools/power/turbostat: Abstrct function for parsing cpu string
tools/power/turbostat: Handle offlined CPUs in cpu_subset
tools/power/turbostat: Obey allowed CPUs for system summary
tools/power/turbostat: Obey allowed CPUs for primary thread/core detection
tools/power/turbostat: Abstract several functions
tools/power/turbostat: Obey allowed CPUs during startup
tools/power/turbostat: Obey allowed CPUs when accessing CPU counters
tools/power/turbostat: Introduce cpu_allowed_set
tools/power/turbostat: Remove PC7/PC9 support on ADL/RPL
tools/power/turbostat: Enable MSR_CORE_C1_RES on recent Intel client platforms
...
Linus Torvalds [Fri, 17 Nov 2023 22:36:58 +0000 (14:36 -0800)]
Merge tag 'bcachefs-2023-11-17' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Lots of small fixes for minor nits and compiler warnings.
Bigger items:
- The six locks lost wakeup is finally fixed: six_read_trylock() was
checking for the waiting bit before decrementing the number of
readers - validated the fix with a torture test.
- Fix for a memory reclaim issue: when needing to reallocate a key
cache key, we now do our usual GFP_NOWAIT; unlock(); GFP_KERNEL
dance.
- Multiple deleted inodes btree fixes
- Fix an issue in fsck, where i_nlink would be recalculated
incorrectly for hardlinked files if a snapshot had ever been taken.
- Kill journal pre-reservations: This is a bigger patch than I would
normally send at this point, but it deletes code and it fixes some
of our tests that would sporadically die with the journal getting
stuck, and it's a performance improvement, too"
* tag 'bcachefs-2023-11-17' of https://evilpiepirate.org/git/bcachefs: (22 commits)
bcachefs: Fix missing locking for dentry->d_parent access
bcachefs: six locks: Fix lost wakeup
bcachefs: Fix no_data_io mode checksum check
bcachefs: Fix bch2_check_nlinks() for snapshots
bcachefs: Don't decrease BTREE_ITER_MAX when LOCKDEP=y
bcachefs: Disable debug log statements
bcachefs: Fix missing transaction commit
bcachefs: Fix error path in bch2_mount()
bcachefs: Fix potential sleeping during mount
bcachefs: Fix iterator leak in may_delete_deleted_inode()
bcachefs: Kill journal pre-reservations
bcachefs: Check for nonce offset inconsistency in data_update path
bcachefs: Make sure to drop/retake btree locks before reclaim
bcachefs: btree_trans->write_locked
bcachefs: Run btree key cache shrinker less aggressively
bcachefs: Split out btree_key_cache_types.h
bcachefs: Guard against insufficient devices to create stripes
bcachefs: Fix null ptr deref in bch2_backpointer_get_node()
bcachefs: Fix multiple -Warray-bounds warnings
bcachefs: Use DECLARE_FLEX_ARRAY() helper and fix multiple -Warray-bounds warnings
...
Linus Torvalds [Fri, 17 Nov 2023 22:19:46 +0000 (14:19 -0800)]
Merge tag 'mm-hotfixes-stable-2023-11-17-14-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Thirteen hotfixes. Seven are cc:stable and the remainder pertain to
post-6.6 issues or aren't considered suitable for backporting"
* tag 'mm-hotfixes-stable-2023-11-17-14-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: more ptep_get() conversion
parisc: fix mmap_base calculation when stack grows upwards
mm/damon/core.c: avoid unintentional filtering out of schemes
mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors
mm/damon/sysfs-schemes: handle tried region directory allocation failure
mm/damon/sysfs-schemes: handle tried regions sysfs directory allocation failure
mm/damon/sysfs: check error from damon_sysfs_update_target()
mm: fix for negative counter: nr_file_hugepages
selftests/mm: add hugetlb_fault_after_madv to .gitignore
selftests/mm: restore number of hugepages
selftests: mm: fix some build warnings
selftests: mm: skip whole test instead of failure
mm/damon/sysfs: eliminate potential uninitialized variable warning
Linus Torvalds [Fri, 17 Nov 2023 22:08:14 +0000 (14:08 -0800)]
Merge tag 'block-6.7-2023-11-17' of git://git.kernel.dk/linux
Pull block fix from Jens Axboe:
"Just a single fix from Christoph/Ming, fixing a case where integrity
IO could be called without having an appropriate queue reference"
* tag 'block-6.7-2023-11-17' of git://git.kernel.dk/linux:
blk-mq: make sure active queue usage is held for bio_integrity_prep()
Linus Torvalds [Fri, 17 Nov 2023 22:03:18 +0000 (14:03 -0800)]
Merge tag 'io_uring-6.7-2023-11-17' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"Just a single fixup for a change we made in this release, which caused
a regression in sometimes missing fdinfo output if the SQPOLL thread
had the lock held when fdinfo output was retrieved.
This brings us back on par with what we had before, where just the
main uring_lock will prevent that output. We'd love to get rid of that
too, but that is beyond the scope of this release and will have to
wait for 6.8"
* tag 'io_uring-6.7-2023-11-17' of git://git.kernel.dk/linux:
io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval
Linus Torvalds [Fri, 17 Nov 2023 21:58:26 +0000 (13:58 -0800)]
Merge tag 'drm-fixes-2023-11-17' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"This is a 'blast from the bast' fixes pull, because it contains a
bunch of AGP fixes for amdgpu. Otherwise nothing out of the ordinary.
Next week is back to Dave unless he's knocked out by some conference
bug.
- amdgpu: fixes all over, including a set of AGP fixes
- nouvea: GSP + other bugfixes
- ivpu build fix
- lenovo legion go panel orientation quirk"
* tag 'drm-fixes-2023-11-17' of git://anongit.freedesktop.org/drm/drm: (30 commits)
drm/amdgpu/gmc9: disable AGP aperture
drm/amdgpu/gmc10: disable AGP aperture
drm/amdgpu/gmc11: disable AGP aperture
drm/amdgpu: add a module parameter to control the AGP aperture
drm/amdgpu/gmc11: fix logic typo in AGP check
drm/amd/display: Fix encoder disable logic
drm/amd/display: Change the DMCUB mailbox memory location from FB to inbox
drm/amdgpu: add and populate the port num into xgmi topology info
drm/amd/display: Negate IPS allow and commit bits
drm/amd/pm: Don't send unload message for reset
drm/amdgpu: fix ras err_data null pointer issue in amdgpu_ras.c
drm/amd/display: Clear dpcd_sink_ext_caps if not set
drm/amd/display: Enable fast plane updates on DCN3.2 and above
drm/amd/display: fix NULL dereference
drm/amd/display: fix a NULL pointer dereference in amdgpu_dm_i2c_xfer()
drm/amd/display: Add null checks for 8K60 lightup
drm/amd/pm: Fill pcie error counters for gpu v1_4
drm/amd/pm: Update metric table for smu v13_0_6
drm/amdgpu: correct chunk_ptr to a pointer to chunk.
drm/amd/display: Fix DSC not Enabled on Direct MST Sink
...
Chuck Lever [Fri, 10 Nov 2023 16:28:45 +0000 (11:28 -0500)]
NFSD: Fix checksum mismatches in the duplicate reply cache
nfsd_cache_csum() currently assumes that the server's RPC layer has
been advancing rq_arg.head[0].iov_base as it decodes an incoming
request, because that's the way it used to work. On entry, it
expects that buf->head[0].iov_base points to the start of the NFS
header, and excludes the already-decoded RPC header.
These days however, head[0].iov_base now points to the start of the
RPC header during all processing. It no longer points at the NFS
Call header when execution arrives at nfsd_cache_csum().
In a retransmitted RPC the XID and the NFS header are supposed to
be the same as the original message, but the contents of the
retransmitted RPC header can be different. For example, for krb5,
the GSS sequence number will be different between the two. Thus if
the RPC header is always included in the DRC checksum computation,
the checksum of the retransmitted message might not match the
checksum of the original message, even though the NFS part of these
messages is identical.
The result is that, even if a matching XID is found in the DRC,
the checksum mismatch causes the server to execute the
retransmitted RPC transaction again.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 10 Nov 2023 16:28:33 +0000 (11:28 -0500)]
NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update()
The "statp + 1" pointer that is passed to nfsd_cache_update() is
supposed to point to the start of the egress NFS Reply header. In
fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.
But both krb5i and krb5p add fields between the RPC header's
accept_stat field and the start of the NFS Reply header. In those
cases, "statp + 1" points at the extra fields instead of the Reply.
The result is that nfsd_cache_update() caches what looks to the
client like garbage.
A connection break can occur for a number of reasons, but the most
common reason when using krb5i/p is a GSS sequence number window
underrun. When an underrun is detected, the server is obliged to
drop the RPC and the connection to force a retransmit with a fresh
GSS sequence number. The client presents the same XID, it hits in
the server's DRC, and the server returns the garbage cache entry.
The "statp + 1" argument has been used since the oldest changeset
in the kernel history repo, so it has been in nfsd_dispatch()
literally since before history began. The problem arose only when
the server-side GSS implementation was added twenty years ago.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 10 Nov 2023 16:28:39 +0000 (11:28 -0500)]
NFSD: Update nfsd_cache_append() to use xdr_stream
When inserting a DRC-cached response into the reply buffer, ensure
that the reply buffer's xdr_stream is updated properly. Otherwise
the server will send a garbage response.
Cc: stable@vger.kernel.org # v6.3+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Mahmoud Adam [Fri, 10 Nov 2023 18:21:04 +0000 (19:21 +0100)]
nfsd: fix file memleak on client_opens_release
seq_release should be called to free the allocated seq_file
Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Mahmoud Adam <mngyadam@amazon.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens") Reviewed-by: NeilBrown <neilb@suse.de> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Mikulas Patocka [Fri, 17 Nov 2023 17:38:33 +0000 (18:38 +0100)]
dm-crypt: start allocating with MAX_ORDER
Commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely")
changed the meaning of MAX_ORDER from exclusive to inclusive. So, we
can allocate compound pages with up to 1 << MAX_ORDER pages.
Reflect this change in dm-crypt and start trying to allocate compound
pages with MAX_ORDER.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka [Fri, 17 Nov 2023 17:37:25 +0000 (18:37 +0100)]
dm-verity: don't use blocking calls from tasklets
The commit 5721d4e5a9cd enhanced dm-verity, so that it can verify blocks
from tasklets rather than from workqueues. This reportedly improves
performance significantly.
However, dm-verity was using the flag CRYPTO_TFM_REQ_MAY_SLEEP from
tasklets which resulted in warnings about sleeping function being called
from non-sleeping context.
This commit fixes dm-verity so that it doesn't use the flags
CRYPTO_TFM_REQ_MAY_SLEEP and CRYPTO_TFM_REQ_MAY_BACKLOG from tasklets. The
crypto API would do GFP_ATOMIC allocation instead, it could return -ENOMEM
and we catch -ENOMEM in verity_tasklet and requeue the request to the
workqueue.
Mikulas Patocka [Fri, 17 Nov 2023 17:36:34 +0000 (18:36 +0100)]
dm-bufio: fix no-sleep mode
dm-bufio has a no-sleep mode. When activated (with the
DM_BUFIO_CLIENT_NO_SLEEP flag), the bufio client is read-only and we
could call dm_bufio_get from tasklets. This is used by dm-verity.
Unfortunately, commit 450e8dee51aa ("dm bufio: improve concurrent IO
performance") broke this and the kernel would warn that cache_get()
was calling down_read() from no-sleeping context. The bug can be
reproduced by using "veritysetup open" with the "--use-tasklets"
flag.
This commit fixes dm-bufio, so that the tasklet mode works again, by
expanding use of the 'no_sleep_enabled' static_key to conditionally
use either a rw_semaphore or rwlock_t (which are colocated in the
buffer_tree structure using a union).
Mikulas Patocka [Fri, 17 Nov 2023 17:24:04 +0000 (18:24 +0100)]
dm-delay: avoid duplicate logic
This is small refactoring of dm-delay - we avoid duplicate logic in
flush_delayed_bios and flush_delayed_bios_fast and join these two
functions into one.
We also add cond_resched() to flush_delayed_bios because the list may have
unbounded number of entries.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>