Siddharth Vadapalli [Wed, 4 Jan 2023 10:34:32 +0000 (16:04 +0530)]
net: ethernet: ti: am65-cpsw: Add support for SERDES configuration
Use PHY framework APIs to initialize the SERDES PHY connected to CPSW MAC.
Define the functions am65_cpsw_disable_phy(), am65_cpsw_enable_phy(),
am65_cpsw_disable_serdes_phy() and am65_cpsw_enable_serdes_phy().
Add new member "serdes_phy" to struct "am65_cpsw_slave_data" to store the
SERDES PHY for each port, if it exists. Use it later while disabling the
SERDES PHY for each port.
Power on and initialize the SerDes PHY in am65_cpsw_nuss_init_slave_ports()
by invoking am65_cpsw_enable_serdes_phy().
Power off the SerDes PHY in am65_cpsw_nuss_remove() by invoking
am65_cpsw_disable_serdes_phy().
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Siddharth Vadapalli [Wed, 4 Jan 2023 10:34:30 +0000 (16:04 +0530)]
dt-bindings: net: ti: k3-am654-cpsw-nuss: Add J721e CPSW9G support
Update bindings for TI K3 J721e SoC which contains 9 ports (8 external
ports) CPSW9G module and add compatible for it.
Changes made:
- Add new compatible ti,j721e-cpswxg-nuss for CPSW9G.
- Extend pattern properties for new compatible.
- Change maximum number of CPSW ports to 8 for new compatible.
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 5 Jan 2023 04:21:25 +0000 (20:21 -0800)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
bpf-next 2023-01-04
We've added 45 non-merge commits during the last 21 day(s) which contain
a total of 50 files changed, 1454 insertions(+), 375 deletions(-).
The main changes are:
1) Fixes, improvements and refactoring of parts of BPF verifier's
state equivalence checks, from Andrii Nakryiko.
2) Fix a few corner cases in libbpf's BTF-to-C converter in particular
around padding handling and enums, also from Andrii Nakryiko.
3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better
support decap on GRE tunnel devices not operating in collect metadata,
from Christian Ehrig.
4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks,
from Dave Marchevsky.
5) Remove the need for trace_printk_lock for bpf_trace_printk
and bpf_trace_vprintk helpers, from Jiri Olsa.
6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps,
from Maryam Tahhan.
7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du.
8) Bigger batch of improvements to BPF tracing code samples,
from Daniel T. Lee.
9) Add LoongArch support to libbpf's bpf_tracing helper header,
from Hengqi Chen.
10) Fix a libbpf compiler warning in perf_event_open_probe on arm32,
from Khem Raj.
11) Optimize bpf_local_storage_elem by removing 56 bytes of padding,
from Martin KaFai Lau.
12) Use pkg-config to locate libelf for resolve_btfids build,
from Shen Jiamin.
13) Various libbpf improvements around API documentation and errno
handling, from Xin Liu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
libbpf: Return -ENODATA for missing btf section
libbpf: Add LoongArch support to bpf_tracing.h
libbpf: Restore errno after pr_warn.
libbpf: Added the description of some API functions
libbpf: Fix invalid return address register in s390
samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs
samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro
samples/bpf: Change _kern suffix to .bpf with syscall tracing program
samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program
samples/bpf: Use kyscall instead of kprobe in syscall tracing program
bpf: rename list_head -> graph_root in field info types
libbpf: fix errno is overwritten after being closed.
bpf: fix regs_exact() logic in regsafe() to remap IDs correctly
bpf: perform byte-by-byte comparison only when necessary in regsafe()
bpf: reject non-exact register type matches in regsafe()
bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule
bpf: reorganize struct bpf_reg_state fields
bpf: teach refsafe() to take into account ID remapping
bpf: Remove unused field initialization in bpf's ctl_table
selftests/bpf: Add jit probe_mem corner case tests to s390x denylist
...
====================
David S. Miller [Wed, 4 Jan 2023 08:57:24 +0000 (08:57 +0000)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-01-03 (igc)
Muhammad Husaini Zulkifli says:
Improvements to the Time-Sensitive Networking (TSN) Qbv Scheduling
capabilities were included in this patch series for I226 SKU.
An overview of each patch series is given below:
Patch 1: To enable basetime scheduling in the future, remove the existing
restriction for i226 stepping while maintain the restriction for i225.
Patch 2: Remove the restriction which require a controller reset when
setting the basetime register for new i226 steps and enable the second
GCL configuration.
Patch 3: Remove the power reset adapter during disabling the tsn config.
---
Patches remaining from initial PR:
https://lore.kernel.org/netdev/20221205212414.3197525-1-anthony.l.nguyen@intel.com/
after sending net patches:
https://lore.kernel.org/netdev/20221215230758.3595578-1-anthony.l.nguyen@intel.com/
Note: patch 3 is an additional patch from the initial PR.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Muhammad Husaini Zulkifli [Wed, 14 Dec 2022 16:29:09 +0000 (00:29 +0800)]
igc: Remove reset adapter task for i226 during disable tsn config
I225 have limitation when programming the BaseTime register which required
a power cycle of the controller. This limitation already lifted in I226.
This patch removes the restriction so that when user configure/remove any
TSN mode, it would not go into power cycle reset adapter.
How to test:
Schedule any gate control list configuration or delete it.
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tan Tee Min [Wed, 14 Dec 2022 16:29:08 +0000 (00:29 +0800)]
igc: enable Qbv configuration for 2nd GCL
Make reset task only executes for i225 and Qbv disabling to allow
i226 configure for 2nd GCL without resetting the adapter.
In i226, Tx won't hang if there is a GCL is already running, so in
this case we don't need to set FutScdDis bit.
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Muhammad Husaini Zulkifli [Wed, 14 Dec 2022 16:29:07 +0000 (00:29 +0800)]
igc: remove I226 Qbv BaseTime restriction
Remove the Qbv BaseTime restriction for I226 so that the BaseTime can be
scheduled to the future time. A new register bit of Tx Qav Control
(Bit-7: FutScdDis) was introduced to allow I226 scheduling future time as
Qbv BaseTime and not having the Tx hang timeout issue.
Besides, according to datasheet section 7.5.2.9.3.3, FutScdDis bit has to
be configured first before the cycle time and base time.
Indeed the FutScdDis bit is only active on re-configuration, thus we have
to set the BASET_L to zero and then only set it to the desired value.
Please also note that the Qbv configuration flow is moved around based on
the Qbv programming guideline that is documented in the latest datasheet.
Co-developed-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Xin Liu [Sat, 24 Dec 2022 11:20:58 +0000 (19:20 +0800)]
libbpf: Added the description of some API functions
Currently, many API functions are not described in the document.
Add add API description of the following four API functions:
- libbpf_set_print;
- bpf_object__open;
- bpf_object__load;
- bpf_object__close.
Syscall tracing using kprobe is quite unstable. Since it uses the exact
name of the kernel function, the program might broke due to the rename
of a function. The problem can also be caused by a changes in the
arguments of the function to which the kprobe connects. This commit
enhances syscall tracing program with the following instruments.
In this patchset, ksyscall is used instead of kprobe. By using
ksyscall, libbpf will detect the appropriate kernel function name.
(e.g. sys_write -> __s390_sys_write). This eliminates the need to worry
about which wrapper function to attach in order to parse arguments.
Also ksyscall provides more fine method with attaching system call, the
coarse SYSCALL helper at trace_common.h can be removed.
Next, BPF_SYSCALL is used to reduce the inconvenience of parsing
arguments. Since the nature of SYSCALL_WRAPPER function wraps the
argument once, additional process of argument extraction is required
to properly parse the argument. The BPF_SYSCALL macro will reduces the
hassle of parsing arguments from pt_regs.
Lastly, vmlinux.h is applied to syscall tracing program. This change
allows the bpf program to refer to the internal structure as a single
"vmlinux.h" instead of including each header referenced by the bpf
program.
Additionally, this patchset changes the suffix of _kern to .bpf to make
use of the new compile rule (CLANG-BPF) which is more simple and neat.
By just changing the _kern suffix to .bpf will inherit the benefit of
the new CLANG-BPF compile target.
Also, this commit adds dummy gnu/stub.h to the samples/bpf directory.
This will fix the compiling problem with 'clang -target bpf'.
To fix the build error with the s390x, this patchset also includes the
fix of libbpf invalid return address register mapping in s390.
---
Changes in V2:
- add gnu/stub.h hack to fix compile error with 'clang -target bpf'
Changes in V3:
- fix libbpf invalid return address register mapping in s390
====================
Daniel T. Lee [Sat, 24 Dec 2022 07:15:27 +0000 (16:15 +0900)]
libbpf: Fix invalid return address register in s390
There is currently an invalid register mapping in the s390 return
address register. As the manual[1] states, the return address can be
found at r14. In bpf_tracing.h, the s390 registers were named
gprs(general purpose registers). This commit fixes the problem by
correcting the mistyped mapping.
Daniel T. Lee [Sat, 24 Dec 2022 07:15:26 +0000 (16:15 +0900)]
samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs
This commit enhances the syscall tracing programs by using the
BPF_SYSCALL macro to reduce the inconvenience of parsing arguments from
pt_regs. By simplifying argument extraction, bpf program will become
clear to understand.
Daniel T. Lee [Sat, 24 Dec 2022 07:15:25 +0000 (16:15 +0900)]
samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro
Currently, there is a problem with tracex2, as it doesn't print the
histogram properly and the results are misleading. (all results report
as 0)
The problem is caused by a change in arguments of the function to which
the kprobe connects. This tracex2 bpf program uses kprobe (attached
to __x64_sys_write) to figure out the size of the write system call. In
order to achieve this, the third argument 'count' must be intact.
The following is a prototype of the sys_write variant. (checked with
pfunct)
~/git/linux$ pfunct -P fs/read_write.o | grep sys_write
ssize_t ksys_write(unsigned int fd, const char * buf, size_t count);
long int __x64_sys_write(const struct pt_regs * regs);
... cross compile with s390x ...
long int __s390_sys_write(struct pt_regs * regs);
Since the nature of SYSCALL_WRAPPER function wraps the argument once,
additional process of argument extraction is required to properly parse
the argument.
In order to fix this problem, the BPF_SYSCALL macro has been used. This
reduces the hassle of parsing arguments from pt_regs. Since the macro
uses the CORE version of argument extraction, additional portability
comes too.
Daniel T. Lee [Sat, 24 Dec 2022 07:15:24 +0000 (16:15 +0900)]
samples/bpf: Change _kern suffix to .bpf with syscall tracing program
Currently old compile rule (CLANG-bpf) doesn't contains VMLINUX_H define
flag which is essential for the bpf program that includes "vmlinux.h".
Also old compile rule doesn't directly specify the compile target as bpf,
instead it uses bunch of extra options with clang followed by long chain
of commands. (e.g. clang | opt | llvm-dis | llc)
In Makefile, there is already new compile rule which is more simple and
neat. And it also has -D__VMLINUX_H__ option. By just changing the _kern
suffix to .bpf will inherit the benefit of the new CLANG-BPF compile
target.
Also, this commit adds dummy gnu/stub.h to the samples/bpf directory.
As commit 1c2dd16add7e ("selftests/bpf: get rid of -D__x86_64__") noted,
compiling with 'clang -target bpf' will raise an error with stubs.h
unless workaround (-D__x86_64) is used. This commit solves this problem
by adding dummy stub.h to make /usr/include/features.h to follow the
expected path as the same way selftests/bpf dealt with.
Daniel T. Lee [Sat, 24 Dec 2022 07:15:23 +0000 (16:15 +0900)]
samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program
This commit applies vmlinux.h to syscall tracing program. This change
allows the bpf program to refer to the internal structure as a single
"vmlinux.h" instead of including each header referenced by the bpf
program.
Daniel T. Lee [Sat, 24 Dec 2022 07:15:22 +0000 (16:15 +0900)]
samples/bpf: Use kyscall instead of kprobe in syscall tracing program
Syscall tracing using kprobe is quite unstable. Since it uses the exact
name of the kernel function, the program might broke due to the rename
of a function. The problem can also be caused by a changes in the
arguments of the function to which the kprobe connects.
In this commit, ksyscall is used instead of kprobe. By using ksyscall,
libbpf will detect the appropriate kernel function name.
(e.g. sys_write -> __s390_sys_write). This eliminates the need to worry
about which wrapper function to attach in order to parse arguments.
In addition, ksyscall provides more fine method with attaching system
call, the coarse SYSCALL helper at trace_common.h can be removed.
Dave Marchevsky [Sat, 17 Dec 2022 08:24:57 +0000 (00:24 -0800)]
bpf: rename list_head -> graph_root in field info types
Many of the structs recently added to track field info for linked-list
head are useful as-is for rbtree root. So let's do a mechanical renaming
of list_head-related types and fields:
include/linux/bpf.h:
struct btf_field_list_head -> struct btf_field_graph_root
list_head -> graph_root in struct btf_field union
kernel/bpf/btf.c:
list_head -> graph_root in struct btf_field_info
This is a nonfunctional change, functionality to actually use these
fields for rbtree will be added in further patches.
Xin Liu [Fri, 23 Dec 2022 13:36:18 +0000 (21:36 +0800)]
libbpf: fix errno is overwritten after being closed.
In the ensure_good_fd function, if the fcntl function succeeds but
the close function fails, ensure_good_fd returns a normal fd and
sets errno, which may cause users to misunderstand. The close
failure is not a serious problem, and the correct FD has been
handed over to the upper-layer application. Let's restore errno here.
Andrii Nakryiko [Fri, 23 Dec 2022 05:49:20 +0000 (21:49 -0800)]
bpf: fix regs_exact() logic in regsafe() to remap IDs correctly
Comparing IDs exactly between two separate states is not just
suboptimal, but also incorrect in some cases. So update regs_exact()
check to do byte-by-byte memcmp() only up to id/ref_obj_id. For id and
ref_obj_id perform proper check_ids() checks, taking into account idmap.
This change makes more states equivalent improving insns and states
stats across a bunch of selftest BPF programs:
Andrii Nakryiko [Fri, 23 Dec 2022 05:49:19 +0000 (21:49 -0800)]
bpf: perform byte-by-byte comparison only when necessary in regsafe()
Extract byte-by-byte comparison of bpf_reg_state in regsafe() into
a helper function, which makes it more convenient to use it "on demand"
only for registers that benefit from such checks, instead of doing it
all the time, even if result of such comparison is ignored.
Also, remove WARN_ON_ONCE(1)+return false dead code. There is no risk of
missing some case as compiler will warn about non-void function not
returning value in some branches (and that under assumption that default
case is removed in the future).
Andrii Nakryiko [Fri, 23 Dec 2022 05:49:18 +0000 (21:49 -0800)]
bpf: reject non-exact register type matches in regsafe()
Generalize the (somewhat implicit) rule of regsafe(), which states that
if register types in old and current states do not match *exactly*, they
can't be safely considered equivalent.
Andrii Nakryiko [Fri, 23 Dec 2022 05:49:17 +0000 (21:49 -0800)]
bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule
Make generic check to prevent XXX_OR_NULL and XXX register types to be
intermixed. While technically in some situations it could be safe, it's
impossible to enforce due to the loss of an ID when converting
XXX_OR_NULL to its non-NULL variant. So prevent this in general, not
just for PTR_TO_MAP_KEY and PTR_TO_MAP_VALUE.
PTR_TO_MAP_KEY_OR_NULL and PTR_TO_MAP_VALUE_OR_NULL checks, which were
previously special-cased, are simplified to generic check that takes
into account range_within() and tnum_in(). This is correct as BPF
verifier doesn't allow arithmetic on XXX_OR_NULL register types, so
var_off and ranges should stay zero. But even if in the future this
restriction is lifted, it's even more important to enforce that var_off
and ranges are compatible, otherwise it's possible to construct case
where this can be exploited to bypass verifier's memory range safety
checks.
Andrii Nakryiko [Fri, 23 Dec 2022 05:49:16 +0000 (21:49 -0800)]
bpf: reorganize struct bpf_reg_state fields
Move id and ref_obj_id fields after scalar data section (var_off and
ranges). This is necessary to simplify next patch which will change
regsafe()'s logic to be safer, as it makes the contents that has to be
an exact match (type-specific parts, off, type, and var_off+ranges)
a single sequential block of memory, while id and ref_obj_id should
always be remapped and thus can't be memcp()'ed.
There are few places that assume that var_off is after id/ref_obj_id to
clear out id/ref_obj_id with the single memset(0). These are changed to
explicitly zero-out id/ref_obj_id fields. Other places are adjusted to
preserve exact byte-by-byte comparison behavior.
Andrii Nakryiko [Fri, 23 Dec 2022 05:49:15 +0000 (21:49 -0800)]
bpf: teach refsafe() to take into account ID remapping
states_equal() check performs ID mapping between old and new states to
establish a 1-to-1 correspondence between IDs, even if their absolute
numberic values across two equivalent states differ. This is important
both for correctness and to avoid unnecessary work when two states are
equivalent.
With recent changes we partially fixed this logic by maintaining ID map
across all function frames. This patch also makes refsafe() check take
into account (and maintain) ID map, making states_equal() behavior more
optimal and correct.
Ricardo Ribalda [Wed, 21 Dec 2022 19:55:29 +0000 (20:55 +0100)]
bpf: Remove unused field initialization in bpf's ctl_table
Maxlen is used by standard proc_handlers such as proc_dointvec(), but in this
case we have our own proc_handler via bpf_stats_handler(). Therefore, remove
the initialization.
Dave Marchevsky [Fri, 16 Dec 2022 21:43:19 +0000 (13:43 -0800)]
selftests/bpf: Add verifier test exercising jit PROBE_MEM logic
This patch adds a test exercising logic that was fixed / improved in
the previous patch in the series, as well as general sanity checking for
jit's PROBE_MEM logic which should've been unaffected by the previous
patch.
The added verifier test does the following:
* Acquire a referenced kptr to struct prog_test_ref_kfunc using
existing net/bpf/test_run.c kfunc
* Helper returns ptr to a specific prog_test_ref_kfunc whose first
two fields - both ints - have been prepopulated w/ vals 42 and
108, respectively
* kptr_xchg the acquired ptr into an arraymap
* Do a direct map_value load of the just-added ptr
* Goal of all this setup is to get an unreferenced kptr pointing to
struct with ints of known value, which is the result of this step
* Using unreferenced kptr obtained in previous step, do loads of
prog_test_ref_kfunc.a (offset 0) and .b (offset 4)
* Then incr the kptr by 8 and load prog_test_ref_kfunc.a again (this
time at offset -8)
* Add all the loaded ints together and return
Before the PROBE_MEM fixes in previous patch, the loads at offset 0 and
4 would succeed, while the load at offset -8 would incorrectly fail
runtime check emitted by the JIT and 0 out dst reg as a result. This
confirmed by retval of 150 for this test before previous patch - since
second .a read is 0'd out - and a retval of 192 with the fixed logic.
The test exercises the two optimizations to fixed logic added in last
patch as well:
* First load, with insn "r8 = *(u32 *)(r9 + 0)" exercises "insn->off
is 0, no need to add / sub from src_reg" optimization
* Third load, with insn "r9 = *(u32 *)(r9 - 8)" exercises "src_reg ==
dst_reg, no need to restore src_reg after load" optimization
Dave Marchevsky [Fri, 16 Dec 2022 21:43:18 +0000 (13:43 -0800)]
bpf, x86: Improve PROBE_MEM runtime load check
This patch rewrites the runtime PROBE_MEM check insns emitted by the BPF
JIT in order to ensure load safety. The changes in the patch fix two
issues with the previous logic and more generally improve size of
emitted code. Paragraphs between this one and "FIX 1" below explain the
purpose of the runtime check and examine the current implementation.
When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the
address being loaded from is not necessarily valid. The BPF jit sets up
exception handlers for each such load which catch page faults and 0 out
the destination register.
Arbitrary register-relative loads can escape this exception handling
mechanism. Specifically, a load like dst_reg = *(src_reg + off) will not
trigger BPF exception handling if (src_reg + off) is outside of kernel
address space, resulting in an uncaught page fault. A concrete example
of such behavior is a program like:
struct result {
char space[40];
long a;
};
/* if err, returns ERR_PTR(-EINVAL) */
struct result *ptr = get_ptr_maybe_err();
long x = ptr->a;
If get_ptr_maybe_err returns ERR_PTR(-EINVAL) and the result isn't
checked for err, 'result' will be (u64)-EINVAL, a number close to
U64_MAX. The ptr->a load will be > U64_MAX and will wrap over to a small
positive u64, which will be in userspace and thus not covered by BPF
exception handling mechanism.
In order to prevent such loads from occurring, the BPF jit emits some
instructions which do runtime checking of (src_reg + off) and skip the
actual load if it's out of range. As an example, here are instructions
emitted for a %rdi = *(%rdi + 0x10) PROBE_MEM load:
The JIT considers kernel address space to start at MAX_TASK_SIZE +
PAGE_SIZE. Determining whether a load will be outside of kernel address
space should be a simple check:
(src_reg + off) >= MAX_TASK_SIZE + PAGE_SIZE
But because there is only one spare register when the checking logic is
emitted, this logic is split into two checks:
Check 1: src_reg >= (MAX_TASK_SIZE + PAGE_SIZE - off)
Check 2: src_reg + off doesn't wrap over U64_MAX and result in small pos u64
Emitted insns implementing Checks 1 and 2 are annotated in the above
example. Check 1 can be done with a single spare register since the
source reg by definition is the left-hand-side of the inequality.
Since adding 'off' to both sides of Check 1's inequality results in the
original inequality we want, it's equivalent to testing that inequality.
Except in the case where src_reg + off wraps past U64_MAX, which is why
Check 2 needs to actually add src_reg + off if Check 1 passes - again
using the single spare reg.
FIX 1: The Check 1 inequality listed above is not what current code is
doing. Current code is a bit more pessimistic, instead checking:
src_reg >= (MAX_TASK_SIZE + PAGE_SIZE + abs(off))
The 0x800000000010 in above example is from this current check. If Check
1 was corrected to use the correct right-hand-side, the value would be
0x7ffffffffff0. This patch changes the checking logic more broadly (FIX
2 below will elaborate), fixing this issue as a side-effect of the
rewrite. Regardless, it's important to understand why Check 1 should've
been doing MAX_TASK_SIZE + PAGE_SIZE - off before proceeding.
FIX 2: Current code relies on a 'jnc' to determine whether src_reg + off
addition wrapped over. For negative offsets this logic is incorrect.
Consider Check 2 insns emitted when off = -0x10:
2's complement representation of -0x10 is a large positive u64. Any
value of src_reg that passes Check 1 will result in carry flag being set
after (src_reg + off) addition. So a load with any negative offset will
always fail Check 2 at runtime and never do the actual load. This patch
fixes the negative offset issue by rewriting both checks in order to not
rely on carry flag.
The rewrite takes advantage of the fact that, while we only have one
scratch reg to hold arbitrary values, we know the offset at JIT time.
This we can use src_reg as a temporary scratch reg to hold src_reg +
offset since we can return it to its original value by later subtracting
offset. As a result we can directly check the original inequality we
care about:
(src_reg + off) >= MAX_TASK_SIZE + PAGE_SIZE
For a load like %rdi = *(%rsi + -0x10), this results in emitted code:
43: movabs $0x800000000000,%r11
4d: add $0xfffffffffffffff0,%rsi --- src_reg += off
54: cmp %r11,%rsi --- Check original inequality
57: jae 0x000000000000005d
59: xor %edi,%edi
5b: jmp 0x0000000000000061
5d: mov 0x0(%rdi),%rsi --- Actual Load
61: sub $0xfffffffffffffff0,%rsi --- src_reg -= off
Note that the actual load is always done with offset 0, since previous
insns have already done src_reg += off. Regardless of whether the new
check succeeds or fails, insn 61 is always executed, returning src_reg
to its original value.
Because the goal of these checks is to ensure that loaded-from address
will be protected by BPF exception handler, the new check can safely
ignore any wrapover from insn 4d. If such wrapped-over address passes
insn 54 + 57's cmp-and-jmp it will have such protection so the load can
proceed.
IMPROVEMENTS: The above improved logic is 8 insns vs original logic's 9,
and has 1 fewer jmp. The number of checking insns can be further
improved in common scenarios:
If src_reg == dst_reg, the actual load insn will clobber src_reg, so
there's no original src_reg state for the sub insn immediately following
the load to restore, so it can be omitted. In fact, it must be omitted
since it would incorrectly subtract from the result of the load if it
wasn't. So for src_reg == dst_reg, JIT emits these insns:
The only difference from larger example being the omitted sub, which
would've been insn 5a in this example.
If offset == 0, we can similarly omit the sub as in previous case, since
there's nothing added to subtract. For the same reason we can omit the
addition as well, resulting in JIT emitting these insns:
(Above counts don't include the 1 load insn, just checking around it)
Based on BPF bytecode + JITted x86 insn I saw while experimenting with
these improvements, I expect the src_reg == dst_reg case to occur most
often, followed by offset == 0, then the general case.
Linus Torvalds [Wed, 21 Dec 2022 16:13:01 +0000 (08:13 -0800)]
Merge tag 'fs.vfsuid.ima.v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfsuid cleanup from Christian Brauner:
"This moves the ima specific vfs{g,u}id_t comparison helpers out of the
header and into the one file in ima where they are used.
We shouldn't incentivize people to use them by placing them into the
header. As discussed and suggested by Linus in [1] let's just define
them locally in the one file in ima where they are used"
Linus Torvalds [Wed, 21 Dec 2022 16:02:30 +0000 (08:02 -0800)]
Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"Two remaining changes that are now possible after you merged a few
other trees:
- #include <asm/archrandom.h> can be removed from random.h now,
making the direct use of the arch_random_* API more of a private
implementation detail between the archs and random.c, rather than
something for general consumers.
- Two additional uses of prandom_u32_max() snuck in during the
initial phase of pulls, so these have been converted to
get_random_u32_below(), and now the deprecated prandom_u32_max()
alias -- which was just a wrapper around get_random_u32_below() --
can be removed.
In addition, there is one fix:
- Check efi_rt_services_supported() before attempting to use an EFI
runtime function.
This affected EFI systems that disable runtime services yet still
boot via EFI (e.g. the reporter's Lenovo Thinkpad X13s laptop), as
well systems where EFI runtime services have been forcibly
disabled, such as on PREEMPT_RT.
On those machines, a very early and hard to diagnose crash would
happen, preventing boot"
* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove prandom_u32_max()
efi: random: fix NULL-deref when refreshing seed
random: do not include <asm/archrandom.h> from random.h
Linus Torvalds [Wed, 21 Dec 2022 15:59:57 +0000 (07:59 -0800)]
Merge tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
"This fixes a lockdep false positive in synchronize_rcu() that can
otherwise occur during early boot.
The fix simply avoids invoking lockdep if the scheduler has not yet
been initialized, that is, during that portion of boot when interrupts
are disabled"
* tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu: Don't assert interrupts enabled too early in boot
Martin KaFai Lau [Wed, 21 Dec 2022 01:30:36 +0000 (17:30 -0800)]
bpf: Reduce smap->elem_size
'struct bpf_local_storage_elem' has an unused 56 byte padding at the
end due to struct's cache-line alignment requirement. This padding
space is overlapped by storage value contents, so if we use sizeof()
to calculate the total size, we overinflate it by 56 bytes. Use
offsetof() instead to calculate more exact memory use.
Changbin Du [Sat, 17 Dec 2022 22:35:08 +0000 (06:35 +0800)]
libbpf: Show error info about missing ".BTF" section
Show the real problem instead of just saying "No such file or directory".
Now will print below info:
libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux
Error: failed to load BTF from /home/changbin/work/linux/vmlinux: No such file or directory
Khem Raj [Mon, 19 Dec 2022 19:15:26 +0000 (11:15 -0800)]
libbpf: Fix build warning on ref_ctr_off for 32-bit architectures
Clang warns on 32-bit ARM on this comparision:
libbpf.c:10497:18: error: result of comparison of constant 4294967296 with expression of type 'size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS))
~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Typecast ref_ctr_off to __u64 in the check conditional, it is false on
32bit anyways.
Wei Fang [Mon, 19 Dec 2022 02:27:55 +0000 (10:27 +0800)]
net: fec: check the return value of build_skb()
The build_skb might return a null pointer but there is no check on the
return value in the fec_enet_rx_queue(). So a null pointer dereference
might occur. To avoid this, we check the return value of build_skb. If
the return value is a null pointer, the driver will recycle the page and
update the statistic of ndev. Then jump to rx_processing_done to clear
the status flags of the BD so that the hardware can recycle the BD.
Fixes: 95698ff6177b ("net: fec: using page pool to manage RX buffers") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20221219022755.1047573-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 20 Dec 2022 14:53:16 +0000 (08:53 -0600)]
Merge tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX/License additions from Greg KH:
"Here are two small updates for LICENSES and some kernel files that add
the Copyleft-next license and use it in a SPDX tag as a dual-license
for some kernel files.
These have been discussed thoroughly in public on the linux-spdx
mailing list, and have the needed acks on them, as well as having been
in linux-next with no reported issues for quite some time"
* tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
testing: use the copyleft-next-0.3.1 SPDX tag
LICENSES: Add the copyleft-next-0.3.1 license
Linus Torvalds [Tue, 20 Dec 2022 14:48:24 +0000 (08:48 -0600)]
Merge tag 'devicetree-for-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull more devicetree updates from Rob Herring:
"This is mostly a treewide clean-up from Krzysztof. There's also a
couple of fixes and things that fell thru the cracks.
I must say this has been a nice merge window without bindings dumped
in at the last minute introducing warnings.
Summary:
- Treewide dropping of redundant 'binding' or 'schema' from schema
titles. This will be followed up with a automated check to catch
these.
- Re-sort vendor-prefies
- Convert GPIO based watchdog to schema
- Handle all the variations for clocks, resets, power domains in i.MX
PCIe binding
- Document missing 'power-domains' property in mxsfb
- Fix error with path references in Tegra XUSB example
- Honor CONFIG_CMDLINE* even without /chosen node"
* tag 'devicetree-for-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: drop redundant part of title (manual)
dt-bindings: clock: drop redundant part of title
dt-bindings: drop redundant part of title (beginning)
dt-bindings: drop redundant part of title (end, part three)
dt-bindings: drop redundant part of title (end, part two)
dt-bindings: drop redundant part of title (end)
dt-bindings: clock: st,stm32mp1-rcc: add proper title
dt-bindings: memory-controllers: ti,gpmc-child: drop redundant part of title
dt-bindings: drop redundant part of title of shared bindings
dt-bindings: watchdog: gpio: Convert bindings to YAML
dt-bindings: imx6q-pcie: Handle more resets on legacy platforms
dt-bindings: imx6q-pcie: Handle various PD configurations
dt-bindings: imx6q-pcie: Handle various clock configurations
dt-bindings: hwmon: ntc-thermistor: drop Naveen Krishna Chatradhi from maintainers
dt-bindings: mxsfb: Document i.MX8M/i.MX6SX/i.MX6SL power-domains property
dt-bindings: vendor-prefixes: sort entries alphabetically
dt-bindings: usb: tegra-xusb: Remove path references
of: fdt: Honor CONFIG_CMDLINE* even without /chosen node
Linus Torvalds [Tue, 20 Dec 2022 14:43:53 +0000 (08:43 -0600)]
Merge tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"There is one noteable patch, which allows the parisc kernel to use the
same MADV_xxx constants as the other architectures going forward. With
that change only alpha has one entry left (MADV_DONTNEED is 6 vs 4 on
others) which is different. To prevent an ABI breakage, a wrapper is
included which translates old MADV values to the new ones, so existing
userspace isn't affected. Reason for that patch is, that some
applications wrongly used the standard MADV_xxx values even on some
non-x86 platforms and as such those programs failed to run correctly
on parisc (examples are qemu-user, tor browser and boringssl).
Then the kgdb console and the LED code received some fixes, and some
0-day warnings are now gone. Finally, the very last compile warning
which was visible during a kernel build is now fixed too (in the vDSO
code).
The majority of the patches are tagged for stable series and in
summary this patchset is quite small and drops more code than it adds:
Fixes:
- Fix potential null-ptr-deref in start_task()
- Fix kgdb console on serial port
- Add missing FORCE prerequisites in Makefile
- Drop PMD_SHIFT from calculation in pgtable.h
Enhancements:
- Implement a wrapper to align madvise() MADV_* constants with other
architectures
- If machine supports running MPE/XL, show the MPE model string
Cleanups:
- Drop duplicate kgdb console code
- Indenting fixes in setup_cmdline()"
* tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Show MPE/iX model string at bootup
parisc: Add missing FORCE prerequisites in Makefile
parisc: Move pdc_result struct to firmware.c
parisc: Drop locking in pdc console code
parisc: Drop duplicate kgdb_pdc console
parisc: Fix locking in pdc_iodc_print() firmware call
parisc: Drop PMD_SHIFT from calculation in pgtable.h
parisc: Align parisc MADV_XXX constants with all other architectures
parisc: led: Fix potential null-ptr-deref in start_task()
parisc: Fix inconsistent indenting in setup_cmdline()
Linus Torvalds [Tue, 20 Dec 2022 14:32:11 +0000 (08:32 -0600)]
Merge tag 'asm-generic-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are only three fairly simple patches.
The #include change to linux/swab.h addresses a userspace build issue,
and the change to the mmio tracing logic helps provide more useful
traces"
* tag 'asm-generic-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
uapi: Add missing _UAPI prefix to <asm-generic/types.h> include guard
asm-generic/io: Add _RET_IP_ to MMIO trace for more accurate debug info
include/uapi/linux/swab: Fix potentially missing __always_inline
Johan Hovold [Fri, 16 Dec 2022 09:15:14 +0000 (10:15 +0100)]
efi: random: fix NULL-deref when refreshing seed
Do not try to refresh the RNG seed in case the firmware does not support
setting variables.
This is specifically needed to prevent a NULL-pointer dereference on the
Lenovo X13s with some firmware revisions, or more generally, whenever
the runtime services have been disabled (e.g. efi=noruntime or with
PREEMPT_RT).
Fixes: e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized") Reported-by: Steev Klimaszewski <steev@kali.org> Reported-by: Bjorn Andersson <andersson@kernel.org> Tested-by: Steev Klimaszewski <steev@kali.org> Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Jason A. Donenfeld [Fri, 28 Oct 2022 23:42:02 +0000 (01:42 +0200)]
random: do not include <asm/archrandom.h> from random.h
The <asm/archrandom.h> header is a random.c private detail, not
something to be called by other code. As such, don't make it
automatically available by way of random.h.
Cc: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The networking code uses flags in sk_allocation to determine if it can use
current->task_frag, however in-kernel users of sockets may stop setting
sk_allocation when they convert to the preferred memalloc_nofs_save/restore,
as SUNRPC has done in commit a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save()
on all rpciod/xprtiod jobs").
This will cause corruption in current->task_frag when recursing into the
network layer for those subsystems during page fault or reclaim. The
corruption is difficult to diagnose because stack traces may not contain the
offending subsystem at all. The corruption is unlikely to show up in
testing because it requires memory pressure, and so subsystems that
convert to memalloc_nofs_save/restore are likely to continue to run into
this issue.
Guilluame Nault has done all of the hard work tracking this problem down and
finding the best fix for this issue. I'm just taking a turn posting another
fix.
====================
Benjamin Coddington [Fri, 16 Dec 2022 12:45:28 +0000 (07:45 -0500)]
net: simplify sk_page_frag
Now that in-kernel socket users that may recurse during reclaim have benn
converted to sk_use_task_frag = false, we can have sk_page_frag() simply
check that value.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Benjamin Coddington [Fri, 16 Dec 2022 12:45:27 +0000 (07:45 -0500)]
Treewide: Stop corrupting socket's task_frag
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current->task_frag. The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.
The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code. I believe this problem to
be much more pervasive than reports to the community may indicate.
Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false. Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.
CC: Philipp Reisner <philipp.reisner@linbit.com> CC: Lars Ellenberg <lars.ellenberg@linbit.com> CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com> CC: Jens Axboe <axboe@kernel.dk> CC: Josef Bacik <josef@toxicpanda.com> CC: Keith Busch <kbusch@kernel.org> CC: Christoph Hellwig <hch@lst.de> CC: Sagi Grimberg <sagi@grimberg.me> CC: Lee Duncan <lduncan@suse.com> CC: Chris Leech <cleech@redhat.com> CC: Mike Christie <michael.christie@oracle.com> CC: "James E.J. Bottomley" <jejb@linux.ibm.com> CC: "Martin K. Petersen" <martin.petersen@oracle.com> CC: Valentina Manea <valentina.manea.m@gmail.com> CC: Shuah Khan <shuah@kernel.org> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: David Howells <dhowells@redhat.com> CC: Marc Dionne <marc.dionne@auristor.com> CC: Steve French <sfrench@samba.org> CC: Christine Caulfield <ccaulfie@redhat.com> CC: David Teigland <teigland@redhat.com> CC: Mark Fasheh <mark@fasheh.com> CC: Joel Becker <jlbec@evilplan.org> CC: Joseph Qi <joseph.qi@linux.alibaba.com> CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Latchesar Ionkov <lucho@ionkov.net> CC: Dominique Martinet <asmadeus@codewreck.org> CC: Ilya Dryomov <idryomov@gmail.com> CC: Xiubo Li <xiubli@redhat.com> CC: Chuck Lever <chuck.lever@oracle.com> CC: Jeff Layton <jlayton@kernel.org> CC: Trond Myklebust <trond.myklebust@hammerspace.com> CC: Anna Schumaker <anna@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guillaume Nault [Fri, 16 Dec 2022 12:45:26 +0000 (07:45 -0500)]
net: Introduce sk_use_task_frag in struct sock.
Sockets that can be used while recursing into memory reclaim, like
those used by network block devices and file systems, mustn't use
current->task_frag: if the current process is already using it, then
the inner memory reclaim call would corrupt the task_frag structure.
To avoid this, sk_page_frag() uses ->sk_allocation to detect sockets
that mustn't use current->task_frag, assuming that those used during
memory reclaim had their allocation constraints reflected in
->sk_allocation.
This unfortunately doesn't cover all cases: in an attempt to remove all
usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
->sk_allocation, and used memalloc_nofs critical sections instead.
This breaks the sk_page_frag() heuristic since the allocation
constraints are now stored in current->flags, which sk_page_frag()
can't read without risking triggering a cache miss and slowing down
TCP's fast path.
This patch creates a new field in struct sock, named sk_use_task_frag,
which sockets with memory reclaim constraints can set to false if they
can't safely use current->task_frag. In such cases, sk_page_frag() now
always returns the socket's page_frag (->sk_frag). The first user is
sunrpc, which needs to avoid using current->task_frag but can keep
->sk_allocation set to GFP_KERNEL otherwise.
Eventually, it might be possible to simplify sk_page_frag() by only
testing ->sk_use_task_frag and avoid relying on the ->sk_allocation
heuristic entirely (assuming other sockets will set ->sk_use_task_frag
according to their constraints in the future).
The new ->sk_use_task_frag field is placed in a hole in struct sock and
belongs to a cache line shared with ->sk_shutdown. Therefore it should
be hot and shouldn't have negative performance impacts on TCP's fast
path (sk_shutdown is tested just before the while() loop in
tcp_sendmsg_locked()).
Matt Johnston [Thu, 15 Dec 2022 05:49:33 +0000 (13:49 +0800)]
mctp: Remove device type check at unregister
The unregister check could be incorrectly triggered if a netdev
changes its type after register. That is possible for a tun device
using TUNSETLINK ioctl, resulting in mctp unregister failing
and the netdev unregister waiting forever.
This was encountered by https://github.com/openthread/openthread/issues/8523
Neither check at register or unregister is required. They were added in
an attempt to track down mctp_ptr being set unexpectedly, which should
not happen in normal operation.
Arun Ramadoss [Tue, 13 Dec 2022 10:14:40 +0000 (15:44 +0530)]
net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq
KSZ swithes used interrupts for detecting the phy link up and down.
During registering the interrupt handler, it used IRQF_TRIGGER_FALLING
flag. But this flag has to be retrieved from device tree instead of hard
coding in the driver, so removing the flag.
Fixes: ff319a644829 ("net: dsa: microchip: move interrupt handling logic from lan937x to ksz_common") Reported-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Link: https://lore.kernel.org/r/20221213101440.24667-1-arun.ramadoss@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christian Ehrig [Sun, 18 Dec 2022 05:17:32 +0000 (06:17 +0100)]
selftests/bpf: Add BPF_F_NO_TUNNEL_KEY test
This patch adds a selftest simulating a GRE sender and receiver using
tunnel headers without tunnel keys. It validates if packets encapsulated
using BPF_F_NO_TUNNEL_KEY are decapsulated by a GRE receiver not
configured with tunnel keys.
Christian Ehrig [Sun, 18 Dec 2022 05:17:31 +0000 (06:17 +0100)]
bpf: Add flag BPF_F_NO_TUNNEL_KEY to bpf_skb_set_tunnel_key()
This patch allows to remove TUNNEL_KEY from the tunnel flags bitmap
when using bpf_skb_set_tunnel_key by providing a BPF_F_NO_TUNNEL_KEY
flag. On egress, the resulting tunnel header will not contain a tunnel
key if the protocol and implementation supports it.
At the moment bpf_tunnel_key wants a user to specify a numeric tunnel
key. This will wrap the inner packet into a tunnel header with the key
bit and value set accordingly. This is problematic when using a tunnel
protocol that supports optional tunnel keys and a receiving tunnel
device that is not expecting packets with the key bit set. The receiver
won't decapsulate and drop the packet.
RFC 2890 and RFC 2784 GRE tunnels are examples where this flag is
useful. It allows for generating packets, that can be decapsulated by
a GRE tunnel device not operating in collect metadata mode or not
expecting the key bit set.
Currently, compiling samples/bpf with LLVM emits several warning. They
are only small details, but they do not appear when compiled with GCC.
Detailed compilation command and warning logs can be found from bpf CI.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Daniel T. Lee [Sun, 18 Dec 2022 06:14:53 +0000 (15:14 +0900)]
samples/bpf: fix uninitialized warning with test_current_task_under_cgroup
Currently, compiling samples/bpf with LLVM warns about the uninitialized
use of variable with test_current_task_under_cgroup.
./samples/bpf/test_current_task_under_cgroup_user.c:57:6:
warning: variable 'cg2' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (setup_cgroup_environment())
^~~~~~~~~~~~~~~~~~~~~~~~~~
./samples/bpf/test_current_task_under_cgroup_user.c:106:8:
note: uninitialized use occurs here
close(cg2);
^~~
./samples/bpf/test_current_task_under_cgroup_user.c:57:2:
note: remove the 'if' if its condition is always false
if (setup_cgroup_environment())
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./samples/bpf/test_current_task_under_cgroup_user.c:19:9:
note: initialize the variable 'cg2' to silence this warning
int cg2, idx = 0, rc = 1;
^
= 0
1 warning generated.
This commit resolve this compiler warning by pre-initialize the variable
with error for safeguard.
Linus Torvalds [Mon, 19 Dec 2022 22:07:59 +0000 (16:07 -0600)]
Merge tag 'soc-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are a couple of build fixes from randconfig testing, plus a set
of Mediatek SoC specific fixes, all trivial"
* tag 'soc-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
soc: tegra: fix CPU_BIG_ENDIAN dependencies
ARM: disallow pre-ARMv5 builds with ld.lld
ARM: pxa: fix building with clang
MAINTAINERS: add related dts to IXP4xx
ARM: dts: spear: drop 0x from unit address
arm64: dts: mt8183: Fix Mali GPU clock
arm64: dts: mediatek: mt8195-demo: fix the memory size of node secmon
soc: mediatek: pm-domains: Fix the power glitch issue
Jiri Olsa [Thu, 15 Dec 2022 21:44:30 +0000 (22:44 +0100)]
bpf: Remove trace_printk_lock
Both bpf_trace_printk and bpf_trace_vprintk helpers use static buffer guarded
with trace_printk_lock spin lock.
The spin lock contention causes issues with bpf programs attached to
contention_begin tracepoint [1][2].
Andrii suggested we could get rid of the contention by using trylock, but we
could actually get rid of the spinlock completely by using percpu buffers the
same way as for bin_args in bpf_bprintf_prepare function.
Adding new return 'buf' argument to struct bpf_bprintf_data and making
bpf_bprintf_prepare to return also the buffer for printk helpers.
Jiri Olsa [Thu, 15 Dec 2022 21:44:29 +0000 (22:44 +0100)]
bpf: Do cleanup in bpf_bprintf_cleanup only when needed
Currently we always cleanup/decrement bpf_bprintf_nest_level variable
in bpf_bprintf_cleanup if it's > 0.
There's possible scenario where this could cause a problem, when
bpf_bprintf_prepare does not get bin_args buffer (because num_args is 0)
and following bpf_bprintf_cleanup call decrements bpf_bprintf_nest_level
variable, like:
in task context:
bpf_bprintf_prepare(num_args != 0) increments 'bpf_bprintf_nest_level = 1'
-> first irq :
bpf_bprintf_prepare(num_args == 0)
bpf_bprintf_cleanup decrements 'bpf_bprintf_nest_level = 0'
-> second irq:
bpf_bprintf_prepare(num_args != 0) bpf_bprintf_nest_level = 1
gets same buffer as task context above
Adding check to bpf_bprintf_cleanup and doing the real cleanup only if we
got bin_args data in the first place.
Linus Torvalds [Mon, 19 Dec 2022 18:33:32 +0000 (12:33 -0600)]
Merge tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support zstd-compressed debug info
- Allow W=1 builds to detect objects shared among multiple modules
- Add srcrpm-pkg target to generate a source RPM package
- Make the -s option detection work for future GNU Make versions
- Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y
- Allow W=1 builds to detect -Wundef warnings in any preprocessed files
- Raise the minimum supported version of binutils to 2.25
- Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used
- Use $(file ...) to read a file if GNU Make >= 4.2 is used
- Print error if GNU Make older than 3.82 is used
- Allow modpost to detect section mismatches with Clang LTO
- Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y
* tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
buildtar: fix tarballs with EFI_ZBOOT enabled
modpost: Include '.text.*' in TEXT_SECTIONS
padata: Mark padata_work_init() as __ref
kbuild: ensure Make >= 3.82 is used
kbuild: refactor the prerequisites of the modpost rule
kbuild: change module.order to list *.o instead of *.ko
kbuild: use .NOTINTERMEDIATE for future GNU Make versions
kconfig: refactor Makefile to reduce process forks
kbuild: add read-file macro
kbuild: do not sort after reading modules.order
kbuild: add test-{ge,gt,le,lt} macros
Documentation: raise minimum supported version of binutils to 2.25
kbuild: add -Wundef to KBUILD_CPPFLAGS for W=1 builds
kbuild: move -Werror from KBUILD_CFLAGS to KBUILD_CPPFLAGS
kbuild: Port silent mode detection to future gnu make.
init/version.c: remove #include <generated/utsrelease.h>
firmware_loader: remove #include <generated/utsrelease.h>
modpost: Mark uuid_le type to be suitable only for MEI
kbuild: add ability to make source rpm buildable using koji
kbuild: warn objects shared among multiple modules
...
Linus Torvalds [Mon, 19 Dec 2022 18:26:03 +0000 (12:26 -0600)]
Merge tag 'zstd-linus-v6.2' of https://github.com/terrelln/linux
Pull zstd updates from Nick Terrell:
"Update the kernel to upstream zstd v1.5.2 [0]. Specifically to the tag
v1.5.2-kernel [1] which includes several cherrypicked fixes for the
kernel on top of v1.5.2.
Excepting the MAINTAINERS change, all the changes in this can be
generated by:
git clone https://github.com/facebook/zstd
cd zstd/contrib/linux-kernel
git checkout v1.5.2-kernel
LINUX=/path/to/linux/repo make import
Additionally, this includes several minor typo fixes, which have all
been fixed upstream so they are maintained on the next import"
Arnd Bergmann [Mon, 19 Dec 2022 15:47:18 +0000 (16:47 +0100)]
Merge tag 'v6.1-dts64-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into arm/fixes
MT8183: fix phandle for GPU clock
MT8195 demo: fix size of secmon reserved memory area
* tag 'v6.1-dts64-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux:
arm64: dts: mt8183: Fix Mali GPU clock
arm64: dts: mediatek: mt8195-demo: fix the memory size of node secmon
Linus Torvalds [Mon, 19 Dec 2022 15:10:33 +0000 (09:10 -0600)]
Merge tag 'nfsd-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull more nfsd updates from Chuck Lever:
"This contains a number of crasher fixes that were not ready for the
initial pull request last week.
In particular, Jeff's patch attempts to address reference count
underflows in NFSD's filecache, which have been very difficult to
track down because there is no reliable reproducer.
Common failure modes:
https://bugzilla.kernel.org/show_bug.cgi?id=216691#c11
https://bugzilla.kernel.org/show_bug.cgi?id=216674#c6
https://bugzilla.redhat.com/show_bug.cgi?id=2138605
The race windows were found by inspection and the clean-ups appear
sensible and pass regression testing, so we include them here in the
hope that they address the problem. However we remain vigilant because
we don't have 100% certainty yet that the problem is fully addressed.
Summary:
- Address numerous reports of refcount underflows in NFSD's filecache
- Address a UAF in callback setup error handling
- Address a UAF during server-to-server copy"
* tag 'nfsd-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: fix use-after-free in __nfs42_ssc_open()
nfsd: under NFSv4.1, fix double svc_xprt_put on rpc_create failure
nfsd: rework refcounting in filecache
Helge Deller [Fri, 21 Oct 2022 08:03:52 +0000 (10:03 +0200)]
parisc: Show MPE/iX model string at bootup
Some (mostly 64-bit machines) machines allow to run MPE/iX and report the MPE
model string via firmware call. Enhance the pdc_model_sysmodel() function to
report that model string.
Note that some 32-bit machines like the B160L wrongly report success for the
firmware call, so include a check to prevent showing wrong info.
| drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c:502:65: error:
| array subscript ‘struct kvaser_cmd_ext[0]’ is partly outside array
| bounds of ‘unsigned char[32]’ [-Werror=array-bounds=]
| 502 | ret = le16_to_cpu(((struct kvaser_cmd_ext *)cmd)->len);
kvaser_usb_hydra_cmd_size() returns the size of given command. It
depends on the command number (cmd->header.cmd_no). For extended
commands (cmd->header.cmd_no == CMD_EXTENDED) the above shown code is
executed.
Help gcc to recognize that this code path is not taken in all cases,
by calling kvaser_usb_hydra_cmd_size() directly after assigning the
command number.
Fixes: aec5fb2268b7 ("can: kvaser_usb: Add support for Kvaser USB hydra family") Cc: Jimmy Assarsson <extja@kvaser.com> Cc: Anssi Hannula <anssi.hannula@bitwise.fi> Link: https://lore.kernel.org/all/20221219110104.1073881-1-mkl@pengutronix.de Tested-by: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Vincent Mailhol [Tue, 13 Dec 2022 05:11:36 +0000 (14:11 +0900)]
Documentation: devlink: add missing toc entry for etas_es58x devlink doc
toc entry is missing for etas_es58x devlink doc and triggers this warning:
Documentation/networking/devlink/etas_es58x.rst: WARNING: document isn't included in any toctree
Add the missing toc entry.
Fixes: 9f63f96aac92 ("Documentation: devlink: add devlink documentation for the etas_es58x driver") Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/all/20221213051136.721887-1-mailhol.vincent@wanadoo.fr Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Linus Torvalds [Mon, 19 Dec 2022 15:00:00 +0000 (09:00 -0600)]
Merge tag 'rtc-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Most of the changes are a rework of the cmos driver by Rafael and
fixes for issues found using static checkers. The removal of a driver
leads to a reduction of the number of LOC of the subsystem.
Removed driver:
- davinci
Updates:
- convert i2c drivers to .probe_new
- fix spelling mistakes and duplicated words in comments
- cmos: rework wake setup and ACPI event handling
- cros-ec: Limit RTC alarm range to fix alarmtimer
- ds1347: fix century register handling
- efi: wakeup support
- isl12022: temperature sensor support
- pcf85063: fix read_alarm and clkout
- pcf8523: use stop bit to detect invalid time
- pcf8563: use RTC_FEATURE_ALARM
- snvs: be more flexible on LPSRT reads
- many static checker fixes"
* tag 'rtc-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (48 commits)
rtc: ds1742: use devm_platform_get_and_ioremap_resource()
rtc: mxc_v2: Add missing clk_disable_unprepare()
rtc: rs5c313: correct some spelling mistakes
rtc: at91rm9200: Fix syntax errors in comments
rtc: remove duplicated words in comments
rtc: rv3028: Use IRQ flags obtained from device tree if available
rtc: ds1307: use sysfs_emit() to instead of scnprintf()
rtc: isl12026: drop obsolete dependency on COMPILE_TEST
dt-bindings: rtc: m41t80: Convert text schema to YAML one
rtc: pcf85063: fix pcf85063_clkout_control
rtc: rx6110: fix warning with !OF
rtc: rk808: reduce 'struct rk808' usage
rtc: msc313: Fix function prototype mismatch in msc313_rtc_probe()
dt-bindings: rtc: convert rtc-meson.txt to dt-schema
rtc: pic32: Move devm_rtc_allocate_device earlier in pic32_rtc_probe()
rtc: st-lpc: Add missing clk_disable_unprepare in st_rtc_probe()
rtc: pcf85063: Fix reading alarm
rtc: pcf8523: fix for stop bit
rtc: efi: Add wakeup support
rtc: pcf8563: clear RTC_FEATURE_ALARM if no irq
...
Linus Torvalds [Mon, 19 Dec 2022 14:54:17 +0000 (08:54 -0600)]
Merge tag 'dmaengine-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"New support:
- Qualcomm SDM670, SM6115 and SM6375 GPI controller support
- Ingenic JZ4755 dmaengine support
- Removal of iop-adma driver
Updates:
- Tegra support for dma-channel-mask
- at_hdmac cleanup and virt-chan support for this driver"
* tag 'dmaengine-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (46 commits)
dmaengine: Revert "dmaengine: remove s3c24xx driver"
dmaengine: tegra: Add support for dma-channel-mask
dt-bindings: dmaengine: Add dma-channel-mask to Tegra GPCDMA
dmaengine: idxd: Remove linux/msi.h include
dt-bindings: dmaengine: qcom: gpi: add compatible for SM6375
dmaengine: idxd: Fix crc_val field for completion record
dmaengine: at_hdmac: Convert driver to use virt-dma
dmaengine: at_hdmac: Remove unused member of at_dma_chan
dmaengine: at_hdmac: Rename "chan_common" to "dma_chan"
dmaengine: at_hdmac: Rename "dma_common" to "dma_device"
dmaengine: at_hdmac: Use bitfield access macros
dmaengine: at_hdmac: Keep register definitions and structures private to at_hdmac.c
dmaengine: at_hdmac: Set include entries in alphabetic order
dmaengine: at_hdmac: Use pm_ptr()
dmaengine: at_hdmac: Use devm_clk_get()
dmaengine: at_hdmac: Use devm_platform_ioremap_resource
dmaengine: at_hdmac: Use devm_kzalloc() and struct_size()
dmaengine: at_hdmac: Introduce atc_get_llis_residue()
dmaengine: at_hdmac: s/atc_get_bytes_left/atc_get_residue
dmaengine: at_hdmac: Pass residue by address to avoid unnecessary implicit casts
...
Linus Torvalds [Mon, 19 Dec 2022 14:47:33 +0000 (08:47 -0600)]
Merge tag 'soundwire-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
"This include bunch of Intel driver code reorganization and support for
qcom v1.7.0 controller:
- intel: reorganization of hw_ops callbacks, splitting files etc
- qcom: support for v1.7.0 qcom controllers"
* tag 'soundwire-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: intel: split auxdevice to different file
soundwire: intel: add in-band wake callbacks in hw_ops
soundwire: intel: add link power management callbacks in hw_ops
soundwire: intel: add bus management callbacks in hw_ops
soundwire: intel: add register_dai callback in hw_ops
soundwire: intel: add debugfs callbacks in hw_ops
soundwire: intel: start using hw_ops
dt-bindings: soundwire: Convert text bindings to DT Schema
soundwire: cadence: use dai_runtime_array instead of dma_data
soundwire: cadence: rename sdw_cdns_dai_dma_data as sdw_cdns_dai_runtime
soundwire: qcom: add support for v1.7 Soundwire Controller
dt-bindings: soundwire: qcom: add v1.7.0 support
soundwire: qcom: make reset optional for v1.6 controller
soundwire: qcom: remove unused SWRM_SPECIAL_CMD_ID
soundwire: dmi-quirks: add quirk variant for LAPBC710 NUC15
Linus Torvalds [Mon, 19 Dec 2022 14:34:39 +0000 (08:34 -0600)]
Merge tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core code:
- map/unmap_pages() cleanup
- SVA and IOPF refactoring
- Clean up and document return codes from device/domain attachment
AMD driver:
- Rework and extend parsing code for ivrs_ioapic, ivrs_hpet and
ivrs_acpihid command line options
- Some smaller cleanups
Intel driver:
- Blocking domain support
- Cleanups
S390 driver:
- Fixes and improvements for attach and aperture handling
PAMU driver:
- Resource leak fix and cleanup
Rockchip driver:
- Page table permission bit fix
Mediatek driver:
- Improve safety from invalid dts input
- Smaller fixes and improvements
Sun50i driver:
- Remove IOMMU_DOMAIN_IDENTITY as it has not been working forever
- Various other fixes"
* tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (74 commits)
iommu/mediatek: Fix forever loop in error handling
iommu/mediatek: Fix crash on isr after kexec()
iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY
iommu/amd: Fix typo in macro parameter name
iommu/mediatek: Remove unused "mapping" member from mtk_iommu_data
iommu/mediatek: Improve safety for mediatek,smi property in larb nodes
iommu/mediatek: Validate number of phandles associated with "mediatek,larbs"
iommu/mediatek: Add error path for loop of mm_dts_parse
iommu/mediatek: Use component_match_add
iommu/mediatek: Add platform_device_put for recovering the device refcnt
iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe()
iommu/vt-d: Use real field for indication of first level
iommu/vt-d: Remove unnecessary domain_context_mapped()
iommu/vt-d: Rename domain_add_dev_info()
iommu/vt-d: Rename iommu_disable_dev_iotlb()
iommu/vt-d: Add blocking domain support
iommu/vt-d: Add device_block_translation() helper
iommu/vt-d: Allocate pasid table in device probe path
iommu/amd: Check return value of mmu_notifier_register()
iommu/amd: Fix pci device refcount leak in ppr_notifier()
...
Linus Torvalds [Mon, 19 Dec 2022 14:23:27 +0000 (08:23 -0600)]
Merge tag 'loongarch-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Switch to relative exception tables
- Add unaligned access support
- Add alternative runtime patching mechanism
- Add FDT booting support from efi system table
- Add suspend/hibernation (ACPI S3/S4) support
- Add basic STACKPROTECTOR support
- Add ftrace (function tracer) support
- Update the default config file
* tag 'loongarch-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (24 commits)
LoongArch: Update Loongson-3 default config file
LoongArch: modules/ftrace: Initialize PLT at load time
LoongArch/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support
LoongArch/ftrace: Add HAVE_DYNAMIC_FTRACE_WITH_ARGS support
LoongArch/ftrace: Add HAVE_DYNAMIC_FTRACE_WITH_REGS support
LoongArch/ftrace: Add dynamic function graph tracer support
LoongArch/ftrace: Add dynamic function tracer support
LoongArch/ftrace: Add recordmcount support
LoongArch/ftrace: Add basic support
LoongArch: module: Use got/plt section indices for relocations
LoongArch: Add basic STACKPROTECTOR support
LoongArch: Add hibernation (ACPI S4) support
LoongArch: Add suspend (ACPI S3) support
LoongArch: Add processing ISA Node in DeviceTree
LoongArch: Add FDT booting support from efi system table
LoongArch: Use alternative to optimize libraries
LoongArch: Add alternative runtime patching mechanism
LoongArch: Add unaligned access support
LoongArch: BPF: Add BPF exception tables
LoongArch: Remove the .fixup section usage
...
Linus Torvalds [Mon, 19 Dec 2022 13:13:33 +0000 (07:13 -0600)]
Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add powerpc qspinlock implementation optimised for large system
scalability and paravirt. See the merge message for more details
- Enable objtool to be built on powerpc to generate mcount locations
- Use a temporary mm for code patching with the Radix MMU, so the
writable mapping is restricted to the patching CPU
- Add an option to build the 64-bit big-endian kernel with the ELFv2
ABI
- Sanitise user registers on interrupt entry on 64-bit Book3S
- Many other small features and fixes
Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn
Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET,
Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang,
Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A.
R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol
Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin,
Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika
Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng,
XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu,
and Wolfram Sang.
* tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
powerpc/code-patching: Fix oops with DEBUG_VM enabled
powerpc/qspinlock: Fix 32-bit build
powerpc/prom: Fix 32-bit build
powerpc/rtas: mandate RTAS syscall filtering
powerpc/rtas: define pr_fmt and convert printk call sites
powerpc/rtas: clean up includes
powerpc/rtas: clean up rtas_error_log_max initialization
powerpc/pseries/eeh: use correct API for error log size
powerpc/rtas: avoid scheduling in rtas_os_term()
powerpc/rtas: avoid device tree lookups in rtas_os_term()
powerpc/rtasd: use correct OF API for event scan rate
powerpc/rtas: document rtas_call()
powerpc/pseries: unregister VPA when hot unplugging a CPU
powerpc/pseries: reset the RCU watchdogs after a LPM
powerpc: Take in account addition CPU node when building kexec FDT
powerpc: export the CPU node count
powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
powerpc/dts/fsl: Fix pca954x i2c-mux node names
cxl: Remove unnecessary cxl_pci_window_alignment()
selftests/powerpc: Fix resource leaks
...
Linus Torvalds [Mon, 19 Dec 2022 13:03:44 +0000 (07:03 -0600)]
Merge tag 'mm-nonmm-stable-2022-12-17-20-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull fault-injection updates from Andrew Morton:
"Some fault-injection improvements from Wei Yongjun which enable
stacktrace filtering on x86_64"
* tag 'mm-nonmm-stable-2022-12-17-20-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
fault-injection: make stacktrace filter works as expected
fault-injection: make some stack filter attrs more readable
fault-injection: skip stacktrace filtering by default
fault-injection: allow stacktrace filter for x86-64
Linus Torvalds [Mon, 19 Dec 2022 12:58:57 +0000 (06:58 -0600)]
Merge tag 'mm-stable-2022-12-17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more mm updates from Andrew Morton:
- A few late-breaking minor fixups
- Two minor feature patches which were awkwardly dependent on mm-nonmm.
I need to set up a new branch to handle such things.
* tag 'mm-stable-2022-12-17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: zram: zsmalloc: Add an additional co-maintainer
mm/kmemleak: use %pK to display kernel pointers in backtrace
mm: use stack_depot for recording kmemleak's backtrace
maple_tree: update copyright dates for test code
maple_tree: fix mas_find_rev() comment
mm/gup_test: free memory allocated via kvcalloc() using kvfree()
Jeremy Kerr [Fri, 16 Dec 2022 03:44:09 +0000 (11:44 +0800)]
mctp: serial: Fix starting value for frame check sequence
RFC1662 defines the start state for the crc16 FCS to be 0xffff, but
we're currently starting at zero.
This change uses the correct start state. We're only early in the
adoption for the serial binding, so there aren't yet any other users to
interface to.
Fixes: a0c2ccd9b5ad ("mctp: Add MCTP-over-serial transport binding") Reported-by: Harsh Tyagi <harshtya@google.com> Tested-by: Harsh Tyagi <harshtya@google.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huanhuan Wang [Fri, 16 Dec 2022 14:31:01 +0000 (15:31 +0100)]
nfp: fix unaligned io read of capabilities word
The address of 32-bit extend capability is not qword aligned,
and may cause exception in some arch.
Fixes: 484963ce9f1e ("nfp: extend capability and control words") Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 16 Dec 2022 16:29:17 +0000 (16:29 +0000)]
net: stream: purge sk_error_queue in sk_stream_kill_queues()
Changheon Lee reported TCP socket leaks, with a nice repro.
It seems we leak TCP sockets with the following sequence:
1) SOF_TIMESTAMPING_TX_ACK is enabled on the socket.
Each ACK will cook an skb put in error queue, from __skb_tstamp_tx().
__skb_tstamp_tx() is using skb_clone(), unless
SOF_TIMESTAMPING_OPT_TSONLY was also requested.
2) If the application is also using MSG_ZEROCOPY, then we put in the
error queue cloned skbs that had a struct ubuf_info attached to them.
Whenever an struct ubuf_info is allocated, sock_zerocopy_alloc()
does a sock_hold().
As long as the cloned skbs are still in sk_error_queue,
socket refcount is kept elevated.
3) Application closes the socket, while error queue is not empty.
Since tcp_close() no longer purges the socket error queue,
we might end up with a TCP socket with at least one skb in
error queue keeping the socket alive forever.
This bug can be (ab)used to consume all kernel memory
and freeze the host.
We need to purge the error queue, with proper synchronization
against concurrent writers.
Fixes: 24bcbe1cc69f ("net: stream: don't purge sk_error_queue in sk_stream_kill_queues()") Reported-by: Changheon Lee <darklight2357@icloud.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Sun, 18 Dec 2022 18:08:40 +0000 (19:08 +0100)]
myri10ge: Fix an error handling path in myri10ge_probe()
Some memory allocated in myri10ge_probe_slices() is not released in the
error handling path of myri10ge_probe().
Add the corresponding kfree(), as already done in the remove function.
Fixes: 0dcffac1a329 ("myri10ge: add multislices support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Mon, 19 Dec 2022 08:22:15 +0000 (09:22 +0100)]
net: microchip: vcap: Fix initialization of value and mask
Fix the following smatch warning:
smatch warnings:
drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:103 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'value'.
drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:106 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'mask'.
In case the vcap field was VCAP_FIELD_U128 and the key was different
than IP6_S/DIP then the value and mask were not initialized, therefore
initialize them.
Fixes: 610c32b2ce66 ("net: microchip: vcap: Add vcap_get_rule") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Dec 2022 09:51:31 +0000 (09:51 +0000)]
Merge branch 'rxrpc-fixes'
David Howells says:
====================
rxrpc: Fixes for I/O thread conversion/SACK table expansion
Here are some fixes for AF_RXRPC:
(1) Fix missing unlock in rxrpc's sendmsg.
(2) Fix (lack of) propagation of security settings to rxrpc_call.
(3) Fix NULL ptr deref in rxrpc_unuse_local().
(4) Fix problem with kthread_run() not invoking the I/O thread function if
the kthread gets stopped first. Possibly this should actually be
fixed in the kthread code.
(5) Fix locking problem as putting a peer (which may be done from RCU) may
now invoke kthread_stop().
(6) Fix switched parameters in a couple of trace calls.
(7) Fix I/O thread's checking for kthread stop to make sure it completes
all outstanding work before returning so that calls are cleaned up.
(8) Fix an uninitialised var in the new rxperf test server.
(9) Fix the return value of rxrpc_new_incoming_call() so that the checks
on it work correctly.
The patches fix at least one syzbot bug[1] and probably some others that
don't have reproducers[2][3][4]. I think it also fixes another[5], but
that showed another failure during testing that was different to the
original.
There's also an outstanding bug in rxrpc_put_peer()[6] that is fixed by a
combination of several patches in my rxrpc-next branch, but I haven't
included that here.
====================
Tested-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: kafs-testing+fedora36_64checkkafs-build-164@auristor.com Signed-off-by: David S. Miller <davem@davemloft.net>