www.infradead.org Git - users/willy/xarray.git/log
Song Liu [Wed, 2 Mar 2022 17:51:26 +0000 (09:51 -0800)]
bpf, x86: Set header->size properly before freeing it

On do_jit failure path, the header is freed by bpf_jit_binary_pack_free.
While bpf_jit_binary_pack_free doesn't require proper ro_header->size,
bpf_prog_pack_free still uses it. Set header->size in bpf_int_jit_compile
before calling bpf_jit_binary_pack_free.
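To see why the size matters on the failure path, here is a small, hypothetical C model. It is not kernel code: it is a pack allocator that hands out fixed-size chunks tracked in a per-chunk flag array, and its free routine works out how many chunks to release from the header's size field. The names `pack_alloc()`, `pack_free()`, `struct hdr`, `CHUNK` and `NCHUNKS` are illustrative assumptions.

```c
/* Hypothetical model; names are illustrative, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK   64          /* assumed chunk size in bytes */
#define NCHUNKS 16          /* assumed pack capacity in chunks */

struct hdr {
	size_t size;        /* bytes reserved, including the header */
	size_t first;       /* index of the first chunk */
};

static bool used[NCHUNKS];  /* one flag per chunk, like a bitmap */

static size_t nchunks(size_t size)
{
	return (size + CHUNK - 1) / CHUNK;
}

/* Reserve contiguous chunks; returns -1 if no range is free. */
static int pack_alloc(struct hdr *h, size_t size)
{
	size_t n = nchunks(size);

	for (size_t start = 0; start + n <= NCHUNKS; start++) {
		size_t i = 0;

		while (i < n && !used[start + i])
			i++;
		if (i < n)
			continue;           /* range partly taken, try next */
		for (i = 0; i < n; i++)
			used[start + i] = true;
		h->size = size;
		h->first = start;
		return 0;
	}
	return -1;
}

/* Like the pack free path, this trusts h->size blindly. */
static void pack_free(struct hdr *h)
{
	size_t n = nchunks(h->size);

	for (size_t i = 0; i < n; i++)
		used[h->first + i] = false;
}

static size_t chunks_in_use(void)
{
	size_t c = 0;

	for (size_t i = 0; i < NCHUNKS; i++)
		c += used[i];
	return c;
}
```

If `size` does not describe the allocation when `pack_free()` runs (here it is simply zeroed), the wrong number of chunks is released. Setting the size before freeing, as the patch does, restores the accounting.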

Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc")
Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
Reported-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220302175126.247459-3-song@kernel.org
Song Liu [Wed, 2 Mar 2022 17:51:25 +0000 (09:51 -0800)]
x86: Disable HAVE_ARCH_HUGE_VMALLOC on 32-bit x86

The kernel test robot reported a kernel BUG like the following:

[ 44.587744][ T1] kernel BUG at arch/x86/mm/physaddr.c:76!
[ 44.590151][ T1] __vmalloc_area_node (mm/vmalloc.c:622 mm/vmalloc.c:2995)
[ 44.590151][ T1] __vmalloc_node_range (mm/vmalloc.c:3108)
[ 44.590151][ T1] __vmalloc_node (mm/vmalloc.c:3157)

which is triggered with HAVE_ARCH_HUGE_VMALLOC on 32-bit x86. Since BPF
only uses HAVE_ARCH_HUGE_VMALLOC for x86_64, turn it off for 32-bit x86.

Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220302175126.247459-2-song@kernel.org
Stanislav Fomichev [Mon, 28 Feb 2022 23:23:32 +0000 (15:23 -0800)]
bpf, test_run: Fix overflow in XDP frags bpf_test_finish

Syzkaller reports another issue:

WARNING: CPU: 0 PID: 10775 at include/linux/thread_info.h:230 check_copy_size include/linux/thread_info.h:230 [inline]
WARNING: CPU: 0 PID: 10775 at include/linux/thread_info.h:230 copy_to_user include/linux/uaccess.h:199 [inline]
WARNING: CPU: 0 PID: 10775 at include/linux/thread_info.h:230 bpf_test_finish.isra.0+0x4b2/0x680 net/bpf/test_run.c:171

This can happen when the userspace buffer is smaller than the linear head
plus the frags. Return -ENOSPC in this case.
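The bounds logic can be sketched in userspace C. This is a hypothetical model only: `copy_out()` and `struct frag` are illustrative names, and the kernel copies with copy_to_user() rather than memcpy(). The output buffer has to hold the head plus every fragment. The sketch copies whatever fits, never writes past the buffer, and returns -ENOSPC when data was truncated.

```c
/* Hypothetical model; names are illustrative, not kernel APIs. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct frag {
	const unsigned char *data;
	size_t len;
};

/*
 * Copy the linear head, then each fragment, into out[0..cap).
 * *total always receives the full size. Returns 0 if everything
 * fit, or -ENOSPC if the caller's buffer was too small.
 */
static int copy_out(unsigned char *out, size_t cap,
		    const unsigned char *head, size_t head_len,
		    const struct frag *frags, size_t nr_frags, size_t *total)
{
	size_t need = head_len, off, n;

	for (size_t i = 0; i < nr_frags; i++)
		need += frags[i].len;
	*total = need;

	n = head_len < cap ? head_len : cap;    /* clamp the head */
	memcpy(out, head, n);
	off = n;

	for (size_t i = 0; i < nr_frags && off < cap; i++) {
		n = frags[i].len < cap - off ? frags[i].len : cap - off;
		memcpy(out + off, frags[i].data, n);
		off += n;
	}
	return need > cap ? -ENOSPC : 0;
}
```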

Fixes: 7855e0db150a ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature")
Reported-by: syzbot+5f81df6205ecbbc56ab5@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/bpf/20220228232332.458871-1-sdf@google.com
Xu Kuohai [Tue, 1 Mar 2022 05:32:50 +0000 (00:32 -0500)]
selftests/bpf: Update btf_dump case for conflicting names

Update the btf_dump test case for conflicting names caused by forward declarations.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220301053250.1464204-3-xukuohai@huawei.com
Xu Kuohai [Tue, 1 Mar 2022 05:32:49 +0000 (00:32 -0500)]
libbpf: Skip forward declaration when counting duplicated type names

Currently, if a forward declaration appears in the BTF before the
definition, the definition is dumped with a conflicting name, e.g.:

    $ bpftool btf dump file vmlinux format raw | grep "'unix_sock'"
    [81287] FWD 'unix_sock' fwd_kind=struct
    [89336] STRUCT 'unix_sock' size=1024 vlen=14

    $ bpftool btf dump file vmlinux format c | grep "struct unix_sock"
    struct unix_sock;
    struct unix_sock___2 { <--- conflict, the "___2" is unexpected
    struct unix_sock___2 *unix_sk;

This causes a compilation error if the dump output is used as a header file.

Fix it by skipping forward declarations when counting duplicated type names.
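The counting rule can be sketched with a toy type table. This sketch does not use libbpf's BTF structures: `struct toy_type`, `enum kind` and `count_name()` are hypothetical, and a linear scan keeps it short. Only full definitions add to a name's count, so a forward declaration followed by its definition no longer produces a `___2` suffix.

```c
/* Hypothetical model; names are illustrative, not libbpf APIs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { KIND_FWD, KIND_STRUCT };

struct toy_type {
	enum kind kind;
	const char *name;
};

/*
 * Count definitions up to and including index 'upto' that share its
 * name, ignoring forward declarations. A count above 1 means the
 * emitted name needs a "___<n>" suffix to stay unique.
 */
static int count_name(const struct toy_type *types, size_t upto)
{
	const char *name = types[upto].name;
	int cnt = 0;

	for (size_t i = 0; i <= upto; i++) {
		if (types[i].kind == KIND_FWD)
			continue;   /* forward declarations never conflict */
		if (strcmp(types[i].name, name) == 0)
			cnt++;
	}
	return cnt;
}
```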

Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220301053250.1464204-2-xukuohai@huawei.com
Tiezhu Yang [Tue, 22 Feb 2022 09:57:05 +0000 (17:57 +0800)]
bpf: Add some description about BPF_JIT_ALWAYS_ON in Kconfig

When CONFIG_BPF_JIT_ALWAYS_ON is enabled, /proc/sys/net/core/bpf_jit_enable
is permanently set to 1, and writing any other value to it fails.

Add this description to the help text of BPF_JIT_ALWAYS_ON so that it is
clear how BPF_JIT_ALWAYS_ON differs from BPF_JIT_DEFAULT_ON.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/1645523826-18149-2-git-send-email-yangtiezhu@loongson.cn
Wan Jiabing [Mon, 28 Feb 2022 08:04:16 +0000 (16:04 +0800)]
bpf, docs: Add a missing colon in verifier.rst

Add a missing colon to fix the document style.

Fixes: 88691e9e1ef5 ("bpf, docs: Split general purpose eBPF documentation out of filter.rst")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220228080416.1689327-1-wanjiabing@vivo.com
Hao Luo [Thu, 24 Feb 2022 00:05:31 +0000 (16:05 -0800)]
bpf: Cache the last valid build_id

For binaries that are statically linked, consecutive stack frames are
likely to be in the same VMA and therefore have the same build id.

On a real-world workload, we observed that 66% of CPU cycles in
__bpf_get_stackid() were spent on build_id_parse() and find_vma().

As an optimization for this case, cache the previous frame's VMA: if the
new frame falls in the same VMA as the previous one, reuse the previous
frame's build id.

We hold the MM locks as a reader across the entire loop, so we don't
need to worry about the VMA going away.
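Here is a hypothetical sketch of the caching idea. `struct range`, `lookup_range()` and the `lookups` counter are illustrative names. `lookup_range()` stands in for find_vma() plus build_id_parse(). The sketch remembers the last range that resolved successfully and reuses its build ID whenever the next IP falls inside it, so the expensive path runs once per distinct mapping rather than once per frame.

```c
/* Hypothetical model; names are illustrative, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start, end;   /* [start, end) */
	int build_id;          /* stand-in for a real build ID */
};

static int lookups;            /* counts expensive lookups */

/* Expensive path: stands in for find_vma() + build_id_parse(). */
static const struct range *lookup_range(const struct range *r, size_t n,
					uint64_t ip)
{
	lookups++;
	for (size_t i = 0; i < n; i++)
		if (ip >= r[i].start && ip < r[i].end)
			return &r[i];
	return NULL;
}

/* Resolve build IDs for a stack, reusing the last valid range. */
static void resolve(const struct range *r, size_t n,
		    const uint64_t *ips, int *ids, size_t nr_ips)
{
	const struct range *prev = NULL;

	for (size_t i = 0; i < nr_ips; i++) {
		const struct range *cur;

		if (prev && ips[i] >= prev->start && ips[i] < prev->end) {
			ids[i] = prev->build_id;   /* cache hit */
			continue;
		}
		cur = lookup_range(r, n, ips[i]);
		ids[i] = cur ? cur->build_id : -1;
		if (cur)
			prev = cur;                /* cache only valid hits */
	}
}
```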

Tested through "stacktrace_build_id" and "stacktrace_build_id_nmi" in
test_progs.

Suggested-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/bpf/20220224000531.1265030-1-haoluo@google.com
Stijn Tintel [Fri, 25 Feb 2022 15:23:55 +0000 (17:23 +0200)]
libbpf: Fix BPF_MAP_TYPE_PERF_EVENT_ARRAY auto-pinning

When a BPF map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY doesn't have the
max_entries parameter set, the map will be created with max_entries set
to the number of available CPUs. When we try to reuse such a pinned map,
map_is_reuse_compat will return false, as max_entries in the map
definition differs from max_entries of the existing map, causing the
following error:

  libbpf: couldn't reuse pinned map at '/sys/fs/bpf/m_logging': parameter mismatch

Fix this by overwriting max_entries in the map definition. For this to
work, we need to do this in bpf_object__create_maps, before calling
bpf_object__reuse_map.
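The ordering of the fix can be sketched with a simplified model, assuming toy structures. `struct map_def`, `fixup_max_entries()` and `reuse_compat()` are hypothetical and are not the bodies of bpf_object__create_maps() or map_is_reuse_compat(). The idea is to fill in the implied max_entries (the CPU count) on the definition before comparing it with the pinned map, so the two compare equal.

```c
/* Hypothetical model; names are illustrative, not libbpf APIs. */
#include <assert.h>
#include <stdbool.h>

enum map_type { MAP_HASH, MAP_PERF_EVENT_ARRAY };

struct map_def {
	enum map_type type;
	unsigned int max_entries;
};

/* Apply the CPU-count default before any compatibility check. */
static void fixup_max_entries(struct map_def *def, unsigned int nr_cpus)
{
	if (def->type == MAP_PERF_EVENT_ARRAY && def->max_entries == 0)
		def->max_entries = nr_cpus;
}

/* Simplified compatibility check against an already pinned map. */
static bool reuse_compat(const struct map_def *def,
			 const struct map_def *pinned)
{
	return def->type == pinned->type &&
	       def->max_entries == pinned->max_entries;
}
```

In real libbpf code, `nr_cpus` would come from something like libbpf_num_possible_cpus(). Here it is just a parameter.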

Fixes: 57a00f41644f ("libbpf: Add auto-pinning of maps when loading BPF objects")
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220225152355.315204-1-stijn@linux-ipv6.be
Hou Tao [Thu, 17 Feb 2022 07:22:32 +0000 (15:22 +0800)]
bpf, selftests: Use raw_tp program for atomic test

Currently the atomic tests attach an fentry program and run it through
bpf_prog_test_run_opts(), but attaching an fentry program depends on the
BPF trampoline, which is only available on x86-64. Since many architectures
support atomics, use a raw_tp program instead.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220217072232.1186625-5-houtao1@huawei.com
Hou Tao [Thu, 17 Feb 2022 07:22:31 +0000 (15:22 +0800)]
bpf, arm64: Support more atomic operations

The "Atomics for eBPF" patch series added support for atomic[64]_fetch_add,
atomic[64]_[fetch_]{and,or,xor} and atomic[64]_{xchg|cmpxchg}, but only
for x86-64. Support these atomic operations on arm64 as well.

The implementation is mostly a mechanical translation of the code
snippets in atomic_ll_sc.h, atomic_lse.h and cmpxchg.h under
arch/arm64/include/asm.

When LSE atomics are unavailable, (BPF_ADD | BPF_FETCH) needs an extra
temporary register to save the value of the src register; instead of
adding TMP_REG_4, just use BPF_REG_AX. Also make emit_lse_atomic() an
empty inline function when CONFIG_ARM64_LSE_ATOMICS is disabled.
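The following C11 model illustrates the semantics only. It is not the JIT's output. It shows why the non-LSE (BPF_ADD | BPF_FETCH) path needs a scratch register: src is both the addend and the destination of the fetched old value. A compare-exchange retry loop stands in for the exclusive load/store sequence, and `tmp` plays the role the patch gives BPF_REG_AX. The helper name `fetch_add_llsc()` is hypothetical.

```c
/* Hypothetical model of BPF_ADD | BPF_FETCH; not JIT-generated code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Model of "src = atomic_fetch_add(dst, src)" built from a retry loop. */
static void fetch_add_llsc(_Atomic uint64_t *dst, uint64_t *src)
{
	uint64_t tmp = *src;   /* save the addend: *src is overwritten */
	uint64_t old = atomic_load_explicit(dst, memory_order_relaxed);

	/* On failure, 'old' is refreshed with the current value. */
	while (!atomic_compare_exchange_weak(dst, &old, old + tmp))
		;
	*src = old;            /* the fetched value lands in src */
}
```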

Both with cpus_have_cap(ARM64_HAS_LSE_ATOMICS) and without LSE atomics,
the following three tests were run and passed: "./test_verifier",
"./test_progs -t atomic" and "insmod ./test_bpf.ko".

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220217072232.1186625-4-houtao1@huawei.com
Daniel Borkmann [Mon, 28 Feb 2022 15:21:39 +0000 (16:21 +0100)]
Merge branch 'for-next/insn' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Will Deacon says:

====================
On Tue, Feb 22, 2022 at 10:38:02PM +0000, Will Deacon wrote:
> On Thu, 17 Feb 2022 15:22:28 +0800, Hou Tao wrote:
> > Atomics support in bpf has already been done by "Atomics for eBPF"
> > patch series [1], but it only adds support for x86, and this patchset
> > adds support for arm64.
> >
> > Patch #1 & patch #2 are arm64 related. Patch #1 moves the commonly used
> > macro AARCH64_BREAK_FAULT into insn-def.h for insn.h. Patch #2 adds
> > necessary encoder helpers for atomic operations.
> >
> > [...]
>
> Applied to arm64 (for-next/insn), thanks!
>
> [1/4] arm64: move AARCH64_BREAK_FAULT into insn-def.h
>       https://git.kernel.org/arm64/c/97e58e395e9c
> [2/4] arm64: insn: add encoders for atomic operations
>       https://git.kernel.org/arm64/c/fa1114d9eba5

Daniel -- let's give this a day or so in -next, then if nothing catches
fire you're more than welcome to pull this branch as a base for the rest
of the series.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220222224211.GB16976@willie-the-truck
Yuntao Wang [Sat, 26 Feb 2022 16:38:15 +0000 (00:38 +0800)]
bpftool: Remove redundant slashes

Because the OUTPUT variable ends with a slash but CURDIR doesn't, we add a
trailing slash to CURDIR when defining the _OUTPUT variable to keep the
_OUTPUT value consistent.

Since the _OUTPUT variable holds a value ending with a trailing slash,
there is no need to add another one when defining BOOTSTRAP_OUTPUT and
LIBBPF_OUTPUT variables. Likewise, when defining LIBBPF_INCLUDE and
LIBBPF_BOOTSTRAP_INCLUDE, we shouldn't add an extra slash either for the
same reason.

When building libbpf, the value of the DESTDIR argument should also not
end with a trailing slash.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220226163815.520133-1-ytcoode@gmail.com
Connor O'Brien [Wed, 23 Feb 2022 01:28:14 +0000 (01:28 +0000)]
bpf: Add config to allow loading modules with BTF mismatches

A BTF mismatch can occur for a separately-built module even when the ABI is
otherwise compatible and nothing else would prevent it from loading successfully.

Add a new Kconfig to control how mismatches are handled. By default, preserve
the current behavior of refusing to load the module. If MODULE_ALLOW_BTF_MISMATCH
is enabled, load the module but ignore its BTF information.

Suggested-by: Yonghong Song <yhs@fb.com>
Suggested-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/CAADnVQJ+OVPnBz8z3vNu8gKXX42jCUqfuvhWAyCQDu8N_yqqwQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20220223012814.1898677-1-connoro@google.com
Hou Tao [Sat, 26 Feb 2022 12:19:06 +0000 (20:19 +0800)]
bpf, arm64: Feed byte-offset into bpf line info

insn_to_jit_off passed to bpf_prog_fill_jited_linfo() is calculated in
units of instructions rather than bytes, but BPF line info requires byte
offsets.

bpf_prog_fill_jited_linfo() will be the last user of ctx.offset before
it is freed, so convert the offset into byte-offset before calling into
bpf_prog_fill_jited_linfo() in order to fix the line info dump on arm64.
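Every arm64 instruction is a fixed 4 bytes wide (AARCH64_INSN_SIZE in the kernel). The conversion is therefore a per-entry scale, which the following sketch shows. `INSN_SIZE` and `to_byte_offsets()` are hypothetical names for this example.

```c
/* Hypothetical helper; names are illustrative, not kernel APIs. */
#include <assert.h>
#include <stddef.h>

#define INSN_SIZE 4   /* every arm64 instruction is 4 bytes */

/* Convert offsets counted in instructions into byte offsets, in place. */
static void to_byte_offsets(int *offset, size_t len)
{
	for (size_t i = 0; i < len; i++)
		offset[i] *= INSN_SIZE;
}
```

Doing the conversion in place is only safe because, as noted above, line info is the last user of the offsets.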

Fixes: 37ab566c178d ("bpf: arm64: Enable arm64 jit to provide bpf_line_info")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220226121906.5709-3-houtao1@huawei.com
Hou Tao [Sat, 26 Feb 2022 12:19:05 +0000 (20:19 +0800)]
bpf, arm64: Call build_prologue() first in first JIT pass

BPF line info needs ctx->offset to be the instruction offset within the
whole JITed image rather than within the body alone, so also call
build_prologue() first in the first JIT pass.

Fixes: 37ab566c178d ("bpf: arm64: Enable arm64 jit to provide bpf_line_info")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220226121906.5709-2-houtao1@huawei.com
Yucong Sun [Fri, 25 Feb 2022 18:59:24 +0000 (10:59 -0800)]
bpf: Fix issue with bpf preload module taking over stdout/stdin of kernel.

In cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.")
BPF preload was switched from a user mode process to an in-kernel light
skeleton. However, in the kernel context, early in the boot sequence,
the first available FD can be 0, instead of 3 as is normal for a user
mode process. FDs 0 and 1 are then used for the loaded BPF programs,
which prevents the init process from setting up stdin/stdout/stderr on
FDs 0, 1 and 2 as expected.

Before the fix:

ls -lah /proc/1/fd/*

lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null
lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null
lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console
lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console
lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console

After the fix:

ls -lah /proc/1/fd/*

lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console
lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console
lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console

Fix by closing prog FDs after initialization. struct bpf_prog's
themselves are kept alive through direct kernel references taken with
bpf_link_get_from_fd().
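This small userspace demonstration shows the rule behind the bug. POSIX open() returns the lowest-numbered unused descriptor, so whichever code opens a file first after descriptor 0 becomes free takes the slot init would use for stdin. The helper `lowest_free_fd_demo()` is hypothetical and only illustrates that rule.

```c
/* Hypothetical demo helper illustrating lowest-free-FD allocation. */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Free descriptor 0, then open a file: POSIX requires open() to return
 * the lowest unused descriptor, so the result is 0 (or -1 on error).
 */
static int lowest_free_fd_demo(void)
{
	close(0);
	return open("/dev/null", O_RDONLY);
}
```

Closing the program FDs once they are no longer needed, as the fix does, hands those low-numbered slots back before init sets up its standard streams.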

Fixes: cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.")
Signed-off-by: Yucong Sun <fallentree@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220225185923.2535519-1-fallentree@fb.com
Delyan Kratunov [Wed, 23 Feb 2022 22:01:58 +0000 (22:01 +0000)]
bpftool: Bpf skeletons assert type sizes

When emitting type declarations in skeletons, bpftool will now also emit
static assertions on the size of the data/bss/rodata/etc fields. This
ensures that in situations where userspace and kernel types have the same
name but differ in size we do not silently produce incorrect results but
instead break the build.

This was reported in [1] and as expected the repro in [2] fails to build
on the new size assert after this change.

  [1]: Closes: https://github.com/libbpf/libbpf/issues/433
  [2]: https://github.com/fuweid/iovisor-bcc-pr-3777
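The following hypothetical illustration shows the kind of check involved. It is not bpftool's literal output: `struct my_bss` and the expected sizes are made up. A compile-time assertion pins the size of each field in the userspace view of a BPF global section, so a same-named type with a different size breaks the build instead of silently reading the wrong data.

```c
/* Hypothetical illustration; not the exact code bpftool generates. */
#include <assert.h>
#include <stdint.h>

/* Userspace view of a (hypothetical) BPF program's .bss section. */
struct my_bss {
	int32_t counter;
	uint64_t last_ts;
};

/* Sizes as the BPF object expects them (assumed values). */
_Static_assert(sizeof(((struct my_bss *)0)->counter) == 4,
	       "unexpected size of 'counter'");
_Static_assert(sizeof(((struct my_bss *)0)->last_ts) == 8,
	       "unexpected size of 'last_ts'");
```

Changing `counter` to `int64_t` here makes the first assertion fail at compile time.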

Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/bpf/f562455d7b3cf338e59a7976f4690ec5a0057f7f.camel@fb.com
Tom Rix [Sun, 20 Feb 2022 18:40:55 +0000 (10:40 -0800)]
bpf: Cleanup comments

Add leading space to spdx tag
Use // for spdx c file comment

Replacements
resereved to reserved
inbetween to in between
everytime to every time
intutivie to intuitive
currenct to current
encontered to encountered
referenceing to referencing
upto to up to
exectuted to executed

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220220184055.3608317-1-trix@redhat.com
Yuntao Wang [Wed, 23 Feb 2022 08:52:44 +0000 (16:52 +0800)]
libbpf: Simplify the find_elf_sec_sz() function

The check in the last return statement is unnecessary; we can just return
the ret variable.

But we can simplify the function further by returning 0 immediately if we
find the section size and -ENOENT otherwise.

Thus we can also remove the ret variable.
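The simplified shape can be sketched with a hypothetical section table standing in for libbpf's ELF section iteration. `struct sec` and `find_sec_sz()` are illustrative names. The function returns 0 as soon as the section is found and -ENOENT otherwise, with no `ret` variable.

```c
/* Hypothetical model; names are illustrative, not libbpf APIs. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sec {
	const char *name;
	uint32_t size;
};

static int find_sec_sz(const struct sec *secs, size_t n,
		       const char *name, uint32_t *size)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(secs[i].name, name) == 0) {
			*size = secs[i].size;
			return 0;           /* found: succeed immediately */
		}
	}
	return -ENOENT;                     /* not found */
}
```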

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220223085244.3058118-1-ytcoode@gmail.com
Mauricio Vásquez [Mon, 21 Feb 2022 12:56:17 +0000 (07:56 -0500)]
bpftool: Remove usage of reallocarray()

This commit fixes a compilation error on systems with glibc < 2.26 [0]:

```
In file included from main.h:14:0,
                 from gen.c:24:
linux/tools/include/tools/libc_compat.h:11:21: error: attempt to use poisoned "reallocarray"
 static inline void *reallocarray(void *ptr, size_t nmemb, size_t size)
```

This happens because gen.c pulls in <bpf/libbpf_internal.h>, and then
<tools/libc_compat.h> (through main.h). When
COMPAT_NEED_REALLOCARRAY is set, libc_compat.h defines reallocarray()
which libbpf_internal.h poisons with a GCC pragma.

This commit reuses libbpf_reallocarray() implemented in commit
029258d7b228 ("libbpf: Remove any use of reallocarray() in libbpf").
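Below is a sketch of an overflow-checked reallocarray() replacement in the spirit of libbpf_reallocarray(). The name `xreallocarray()` is hypothetical, and `__builtin_mul_overflow` assumes GCC 5+ or Clang. It gives the same protection as glibc's reallocarray() without depending on glibc 2.26.

```c
/* Hypothetical helper; modeled on, but not identical to, libbpf's. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Resize ptr to hold nmemb elements of size bytes each. Returns NULL,
 * leaving ptr untouched, if nmemb * size overflows or realloc fails.
 */
static void *xreallocarray(void *ptr, size_t nmemb, size_t size)
{
	size_t total;

	if (__builtin_mul_overflow(nmemb, size, &total))
		return NULL;
	return realloc(ptr, total);
}
```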

v1 -> v2:
- reuse libbpf_reallocarray() instead of reimplementing it

Fixes: a9caaba399f9 ("bpftool: Implement "gen min_core_btf" logic")
Reported-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220221125617.39610-1-mauricio@kinvolk.io
[0]: https://lore.kernel.org/bpf/3bf2bd49-9f2d-a2df-5536-bc0dde70a83b@isovalent.com/

Kui-Feng Lee [Thu, 17 Feb 2022 17:54:27 +0000 (09:54 -0800)]
scripts/pahole-flags.sh: Parse DWARF and generate BTF with multithreading.

Pass a "-j" argument to pahole if possible to reduce the time of
generating BTF info.

Since v1.22, pahole can parse DWARF and generate BTF with
multithreading to speed up the conversion. This shaves seconds off the
overall kernel build time.

v3 fixes whitespaces and improves the commit description.
v2 checks the version of pahole to enable multithreading only if possible.

[v2] https://lore.kernel.org/bpf/20220216193431.2691015-1-kuifeng@fb.com/
[v1] https://lore.kernel.org/bpf/20220216004616.2079689-1-kuifeng@fb.com/

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220217175427.649713-1-kuifeng@fb.com
Hou Tao [Thu, 17 Feb 2022 07:22:30 +0000 (15:22 +0800)]
arm64: insn: add encoders for atomic operations

This is a preparation patch for eBPF atomics support on arm64. eBPF
needs to support atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and
atomic[64]_{xchg|cmpxchg}. The ordering semantics of eBPF atomics are
the same as those of the implementations in the Linux kernel.

Add three helpers to support LDCLR/LDEOR/LDSET/SWP, CAS and DMB
instructions. STADD/STCLR/STEOR/STSET are simply encoded as aliases for
LDADD/LDCLR/LDEOR/LDSET with XZR as the destination register, so no extra
helper is added. atomic_fetch_add() and other atomic ops need support for
the STLXR instruction, so extend enum aarch64_insn_ldst_type to do that.

LDADD/LDEOR/LDSET/SWP and CAS instructions are only available when LSE
atomics are enabled, so just return AARCH64_BREAK_FAULT directly in
these newly added helpers if CONFIG_ARM64_LSE_ATOMICS is disabled.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220217072232.1186625-3-houtao1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Hou Tao [Thu, 17 Feb 2022 07:22:29 +0000 (15:22 +0800)]
arm64: move AARCH64_BREAK_FAULT into insn-def.h

If CONFIG_ARM64_LSE_ATOMICS is off, encoders for LSE-related instructions
can return AARCH64_BREAK_FAULT directly in insn.h. To access
AARCH64_BREAK_FAULT from insn.h, we cannot include debug-monitors.h there,
because debug-monitors.h already depends on insn.h. Instead, move
AARCH64_BREAK_FAULT into insn-def.h.

It will be used by the following patch to eliminate unnecessary LSE-related
encoders when CONFIG_ARM64_LSE_ATOMICS is off.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220217072232.1186625-2-houtao1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Yuntao Wang [Sun, 20 Feb 2022 07:27:50 +0000 (15:27 +0800)]
libbpf: Remove redundant check in btf_fixup_datasec()

The check 't->size && t->size != size' is redundant because if t->size
is nonzero, we will just skip straight to sorting variables.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220220072750.209215-1-ytcoode@gmail.com
Kumar Kartikeya Dwivedi [Sun, 20 Feb 2022 02:31:38 +0000 (08:01 +0530)]
selftests/bpf: Add test for reg2btf_ids out of bounds access

This test tries to pass a PTR_TO_BTF_ID_OR_NULL to the release function,
which would trigger an out-of-bounds access without the fix in commit
45ce4b4f9009 ("bpf: Fix crash due to out of bounds access into reg2btf_ids.").
After the fix, the verifier should index only with base_type(reg->type),
which should be less than __BPF_REG_TYPE_MAX, and should not permit any
type flags to be set in reg->type.
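Here is a hypothetical sketch of the indexing rule. The bit layout, enum values, table contents and `btf_id_for()` are illustrative and only loosely modeled on the kernel's base_type()/type_flag() helpers. Register types carry flag bits above a base-type field, so a table sized by base types has to be indexed with the masked value, and types that still carry flags are refused.

```c
/* Hypothetical model; values and names are illustrative only. */
#include <assert.h>
#include <stdint.h>

#define BASE_TYPE_BITS 8
#define BASE_TYPE_MASK ((1u << BASE_TYPE_BITS) - 1)
#define PTR_MAYBE_NULL (1u << BASE_TYPE_BITS)   /* a flag bit */

enum reg_type { NOT_INIT, PTR_TO_SOCKET, PTR_TO_BTF_ID, REG_TYPE_MAX };

static uint32_t base_type(uint32_t t) { return t & BASE_TYPE_MASK; }
static uint32_t type_flag(uint32_t t) { return t & ~BASE_TYPE_MASK; }

/* Sized by base types only: a flagged type would index past the end. */
static const int reg2btf_ids[REG_TYPE_MAX] = {
	[PTR_TO_SOCKET] = 101,
	[PTR_TO_BTF_ID] = 102,
};

/* Returns the table entry for a register type, or -1 if not allowed. */
static int btf_id_for(uint32_t type)
{
	if (type_flag(type))
		return -1;          /* e.g. an ..._OR_NULL type is rejected */
	if (base_type(type) >= REG_TYPE_MAX)
		return -1;
	return reg2btf_ids[base_type(type)];
}
```

Without the checks, `PTR_TO_BTF_ID | PTR_MAYBE_NULL` would be used directly as an index of 258 into a three-entry table.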

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220220023138.2224652-1-memxor@gmail.com
Andrii Nakryiko [Sun, 20 Feb 2022 04:27:20 +0000 (20:27 -0800)]
selftests/bpf: Fix btfgen tests

There turned out to be a few problems with btfgen selftests.

First, core_btfgen tests are failing in BPF CI due to the use of the
full-featured bpftool, which has extra dependencies on libbfd, libcap,
etc. These are present in BPF CI's build environment, but the shared
libraries are missing from the QEMU image in which test_progs runs.

To fix this problem, use the minimal bootstrap version of bpftool instead.
It only depends on libelf and libz, same as libbpf, so it doesn't add any
new requirements (and bootstrap bpftool still implements the entire
`bpftool gen` functionality, which is quite convenient).

The second problem is even more interesting. Both core_btfgen and core_reloc
reuse the same struct core_reloc_test_case array of test case
definitions. That in itself is not a problem, but the btfgen test replaces
test_case->btf_src_file with the path to a temporary file into which
bpftool writes the minimized BTF. This interferes with the original
core_reloc tests, depending on the order of test execution (core_btfgen
runs first in sequential mode and screws up the subsequent core_reloc run
by pointing it to an already deleted temporary file instead of the
original BTF files) and on whether the two runs share the same process
(in parallel mode they are likely to run in two separate processes and
so not interfere with each other).

To prevent this interference, create and use a local copy of the test
definition. Mark the original array as constant to catch accidental
modifications. Note that setup_type_id_case_local() and
setup_type_id_case_success() still modify the common test_case->output
memory area, but that is OK as each setup function has to re-initialize it
completely anyway. In sequential mode this leads to deterministic and
correct initialization. In parallel mode they will either each have
their own process, or, if core_reloc and core_btfgen happen to be run by
the same worker process, they will still run sequentially within that
worker process. If they are sharded across multiple processes, they
don't really share anything anyway.
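The following sketch shows the pattern with illustrative names. `struct test_case`, `btfgen_case()` and the file names are hypothetical, not the selftest's real definitions. The shared table is `const`, and the btfgen flavor changes only a local struct copy, so redirecting `btf_src_file` cannot affect the plain core_reloc run.

```c
/* Hypothetical model; names and file paths are illustrative only. */
#include <assert.h>
#include <string.h>

struct test_case {
	const char *name;
	const char *btf_src_file;
};

/* Shared definitions: const, so accidental writes fail to compile. */
static const struct test_case test_cases[] = {
	{ "ints", "original.btf" },
};

/* btfgen flavor: modify only a local copy of the definition. */
static struct test_case btfgen_case(const struct test_case *orig,
				    const char *tmp_btf)
{
	struct test_case local = *orig;   /* struct copy */

	local.btf_src_file = tmp_btf;     /* the shared entry stays intact */
	return local;
}
```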

Also, rename core_btfgen to core_reloc_btfgen to make it obvious that it
is just a "flavor" of the core_reloc test, not an independent set of
tests.

The last problem that needed solving was that the location of bpftool
differs between test_progs and test_progs' flavors (e.g.,
test_progs-no_alu32). To keep it simple, create a symlink to bpftool both
inside the selftests/bpf/ directory and in the selftests/bpf/<flavor>
subdirectory. That way, from inside the core_reloc test, the location of
bpftool is just "./bpftool".

v2->v3:
  - fix bpftool location relative to test_progs-no_alu32;
v1->v2:
  - fix corruption of core_reloc_test_case.

Fixes: 704c91e59fe0 ("selftests/bpf: Test "bpftool gen min_core_btf")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yucong Sun <sunyucong@gmail.com>
Link: https://lore.kernel.org/bpf/20220220042720.3336684-1-andrii@kernel.org
Souptick Joarder (HPE) [Sat, 19 Feb 2022 16:39:15 +0000 (22:09 +0530)]
bpf: Initialize ret to 0 inside btf_populate_kfunc_set()

The kernel test robot reported the following error:

kernel/bpf/btf.c:6718 btf_populate_kfunc_set()
error: uninitialized symbol 'ret'.

Initialize ret to 0.

Fixes: dee872e124e8 ("bpf: Populate kfunc BTF ID sets in struct btf")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Souptick Joarder (HPE) <jrdr.linux@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20220219163915.125770-1-jrdr.linux@gmail.com
Yonghong Song [Thu, 17 Feb 2022 19:40:05 +0000 (11:40 -0800)]
selftests/bpf: Fix a clang deprecated-declarations compilation error

When building the kernel and selftests with the clang compiler (LLVM=1),
  make -j LLVM=1
  make -C tools/testing/selftests/bpf -j LLVM=1

I hit the following selftests/bpf compilation error:
  In file included from test_cpp.cpp:3:
  /.../tools/testing/selftests/bpf/tools/include/bpf/libbpf.h:73:8:
    error: 'relaxed_core_relocs' is deprecated: libbpf v0.6+: field has no effect [-Werror,-Wdeprecated-declarations]
  struct bpf_object_open_opts {
         ^
  test_cpp.cpp:56:2: note: in implicit move constructor for 'bpf_object_open_opts' first required here
          LIBBPF_OPTS(bpf_object_open_opts, opts);
          ^
  /.../tools/testing/selftests/bpf/tools/include/bpf/libbpf_common.h:77:3: note: expanded from macro 'LIBBPF_OPTS'
                  (struct TYPE) {                                             \
                  ^
  /.../tools/testing/selftests/bpf/tools/include/bpf/libbpf.h:90:2: note: 'relaxed_core_relocs' has been explicitly marked deprecated here
          LIBBPF_DEPRECATED_SINCE(0, 6, "field has no effect")
          ^
  /.../tools/testing/selftests/bpf/tools/include/bpf/libbpf_common.h:24:4: note: expanded from macro 'LIBBPF_DEPRECATED_SINCE'
                  (LIBBPF_DEPRECATED("libbpf v" # major "." # minor "+: " msg))
                   ^
  /.../tools/testing/selftests/bpf/tools/include/bpf/libbpf_common.h:19:47: note: expanded from macro 'LIBBPF_DEPRECATED'
  #define LIBBPF_DEPRECATED(msg) __attribute__((deprecated(msg)))

There are two ways to fix the issue: one is to use GCC diagnostic ignore
pragmas, and the other is to open-code bpf_object_open_opts instead of
using LIBBPF_OPTS. Since LIBBPF_OPTS is generally preferred, this patch
fixes the issue by adding the proper GCC diagnostic ignore pragmas.
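Here is a minimal sketch of the pragma approach in plain C. It uses a made-up `struct opts` with a deprecated field, not libbpf's bpf_object_open_opts. The diagnostic state is pushed, -Wdeprecated-declarations is ignored for the single use, and the state is popped again so the warning stays active everywhere else. GCC and Clang both honor these pragmas.

```c
/* Hypothetical example; struct opts is not a libbpf type. */
#include <assert.h>

struct opts {
	int flags;
	int old_field __attribute__((deprecated("field has no effect")));
};

static int read_old_field(const struct opts *o)
{
	int v;

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
	v = o->old_field;   /* would warn (or fail under -Werror) */
#pragma GCC diagnostic pop
	return v;
}
```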

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220217194005.2765348-1-yhs@fb.com
Eric Dumazet [Fri, 18 Feb 2022 18:18:01 +0000 (10:18 -0800)]
bpf: Call maybe_wait_bpf_programs() only once from generic_map_delete_batch()

As stated in the comment found in maybe_wait_bpf_programs(),
the synchronize_rcu() barrier is only needed before returning
to userspace, not after each deletion in the batch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220218181801.2971275-1-eric.dumazet@gmail.com
Eric Dumazet [Thu, 17 Feb 2022 23:48:41 +0000 (15:48 -0800)]
ipv6: annotate some data-races around sk->sk_prot

IPv6 has this hack changing sk->sk_prot when an IPv6 socket
is 'converted' to an IPv4 one with IPV6_ADDRFORM option.

This operation is only performed for TCP and UDP, since
their 'struct proto' for the two network families are populated
in the same way and cannot disappear while a reader
might use and dereference sk->sk_prot.

On reflection, all reads of sk->sk_prot made without the socket
lock or RTNL held should use READ_ONCE().
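The following userspace approximation shows the annotation. It assumes simplified READ_ONCE()/WRITE_ONCE() macros defined as volatile accesses. The kernel's macros are more elaborate, and volatile alone is not a memory model. `struct proto`, `struct sock`, `do_recv()` and `convert_to_v4()` are simplified, hypothetical stand-ins. The reader loads sk_prot exactly once and uses only that snapshot, so the compiler cannot refetch it between uses while a concurrent IPV6_ADDRFORM switch swaps the pointer.

```c
/* Hypothetical, simplified model; not the kernel's definitions. */
#include <assert.h>

/* Simplified stand-ins for the kernel macros (volatile access). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct proto {
	int (*recvmsg)(void);
};

struct sock {
	struct proto *sk_prot;
};

static int v6_recv(void) { return 6; }
static int v4_recv(void) { return 4; }

static struct proto tcpv6_prot = { v6_recv };
static struct proto tcp_prot = { v4_recv };

/* Lockless reader: take one snapshot of sk_prot and use only that. */
static int do_recv(struct sock *sk)
{
	const struct proto *prot = READ_ONCE(sk->sk_prot);

	return prot->recvmsg();
}

/* Writer (the IPV6_ADDRFORM-style switch) publishes the new proto. */
static void convert_to_v4(struct sock *sk)
{
	WRITE_ONCE(sk->sk_prot, &tcp_prot);
}
```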

Note that other layers like MPTCP, XFRM and CHELSIO_TLS also
write to sk->sk_prot.

BUG: KCSAN: data-race in inet6_recvmsg / ipv6_setsockopt

write to 0xffff8881386f7aa8 of 8 bytes by task 26932 on cpu 0:
 do_ipv6_setsockopt net/ipv6/ipv6_sockglue.c:492 [inline]
 ipv6_setsockopt+0x3758/0x3910 net/ipv6/ipv6_sockglue.c:1019
 udpv6_setsockopt+0x85/0x90 net/ipv6/udp.c:1649
 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3489
 __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
 __do_sys_setsockopt net/socket.c:2191 [inline]
 __se_sys_setsockopt net/socket.c:2188 [inline]
 __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff8881386f7aa8 of 8 bytes by task 26911 on cpu 1:
 inet6_recvmsg+0x7a/0x210 net/ipv6/af_inet6.c:659
 ____sys_recvmsg+0x16c/0x320
 ___sys_recvmsg net/socket.c:2674 [inline]
 do_recvmmsg+0x3f5/0xae0 net/socket.c:2768
 __sys_recvmmsg net/socket.c:2847 [inline]
 __do_sys_recvmmsg net/socket.c:2870 [inline]
 __se_sys_recvmmsg net/socket.c:2863 [inline]
 __x64_sys_recvmmsg+0xde/0x160 net/socket.c:2863
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0xffffffff85e0e980 -> 0xffffffff85e01580

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 26911 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00316-g0457e5153e0e-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ibmvnic: Cleanup workaround doing an EOI after partition migration
Cédric Le Goater [Fri, 18 Feb 2022 08:07:08 +0000 (09:07 +0100)]
net/ibmvnic: Cleanup workaround doing an EOI after partition migration

There were a fair number of changes to work around a firmware bug that
leaves a pending interrupt after migration of the ibmvnic device:

commit 2df5c60e198c ("net/ibmvnic: Ignore H_FUNCTION return from H_EOI
            to tolerate XIVE mode")
commit 284f87d2f387 ("Revert "net/ibmvnic: Fix EOI when running in
            XIVE mode"")
commit 11d49ce9f794 ("net/ibmvnic: Fix EOI when running in XIVE mode.")
commit f23e0643cd0b ("ibmvnic: Clear pending interrupt after device reset")

Here is the final version, which takes the XIVE interrupt mode into account.

Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoteaming: deliver link-local packets with the link they arrive on
jeffreyji [Thu, 17 Feb 2022 21:23:12 +0000 (21:23 +0000)]
teaming: deliver link-local packets with the link they arrive on

The skb is ignored if the team port is disabled. We want the skb to be
delivered if it is a link-layer packet.

Issue is already fixed for bonding in
commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on")

changelog:

v2: change LLDP -> link layer in comments/commit description, comment format

Signed-off-by: jeffreyji <jeffreyji@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'qca8k-phylink'
David S. Miller [Fri, 18 Feb 2022 11:28:33 +0000 (11:28 +0000)]
Merge branch 'qca8k-phylink'

Russell King says:

====================
net: dsa: qca8k: convert to phylink_pcs and mark as non-legacy

This series adds support into DSA for the mac_select_pcs method, and
converts qca8k to make use of this, eventually marking qca8k as non-
legacy.

Patch 1 adds DSA support for mac_select_pcs.
Patches 2 and 3 move code around in qca8k to make patch 4 more
readable.
Patch 4 does a simple conversion to phylink_pcs.
Patch 5 moves the serdes configuration to phylink_pcs.
Patch 6 marks qca8k as non-legacy.

v2: fix dsa_phylink_mac_select_pcs() formatting and double-blank line
in patch 5
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: qca8k: mark as non-legacy
Russell King (Oracle) [Thu, 17 Feb 2022 18:31:01 +0000 (18:31 +0000)]
net: dsa: qca8k: mark as non-legacy

The qca8k driver does not make use of the speed, duplex, pause or
advertisement in its phylink_mac_config() implementation, so it can be
marked as a non-legacy driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: qca8k: move pcs configuration
Russell King (Oracle) [Thu, 17 Feb 2022 18:30:56 +0000 (18:30 +0000)]
net: dsa: qca8k: move pcs configuration

Move the PCS configuration to qca8k_pcs_config().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: qca8k: convert to use phylink_pcs
Russell King (Oracle) [Thu, 17 Feb 2022 18:30:51 +0000 (18:30 +0000)]
net: dsa: qca8k: convert to use phylink_pcs

Convert the qca8k driver to use the phylink_pcs support to talk to the
SGMII PCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: qca8k: move qca8k_phylink_mac_link_state()
Russell King (Oracle) [Thu, 17 Feb 2022 18:30:45 +0000 (18:30 +0000)]
net: dsa: qca8k: move qca8k_phylink_mac_link_state()

Move qca8k_phylink_mac_link_state() to separate the code movement from
code changes.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: qca8k: move qca8k_setup()
Russell King (Oracle) [Thu, 17 Feb 2022 18:30:40 +0000 (18:30 +0000)]
net: dsa: qca8k: move qca8k_setup()

Move qca8k_setup() to be later in the file to avoid needing prototypes
for called functions.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: add support for phylink mac_select_pcs()
Russell King (Oracle) [Thu, 17 Feb 2022 18:30:35 +0000 (18:30 +0000)]
net: dsa: add support for phylink mac_select_pcs()

Add DSA support for the phylink mac_select_pcs() method so DSA drivers
can provide phylink with the appropriate PCS for the PHY
interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: xilinx: cleanup comments
Tom Rix [Thu, 17 Feb 2022 16:05:18 +0000 (08:05 -0800)]
net: ethernet: xilinx: cleanup comments

Remove the second 'the'.
Replacements:
endiannes to endianness
areconnected to are connected
Mamagement to Management
undoccumented to undocumented
Xilink to Xilinx
strucutre to structure

Change kernel-doc comment style to c style for
/* Management ...

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: gro: Fix a 'directive in macro's argument list' sparse warning
Gal Pressman [Thu, 17 Feb 2022 08:07:55 +0000 (10:07 +0200)]
net: gro: Fix a 'directive in macro's argument list' sparse warning

Following the cited commit, sparse started complaining about:
../include/net/gro.h:58:1: warning: directive in macro's argument list
../include/net/gro.h:59:1: warning: directive in macro's argument list

Fix that by moving the defines out of the struct_group() macro.

Fixes: de5a1f3ce4c8 ("net: gro: minor optimization for dev_gro_receive()")
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agos390/qeth: Remove redundant 'flush_workqueue()' calls
Xu Wang [Wed, 16 Feb 2022 07:51:55 +0000 (07:51 +0000)]
s390/qeth: Remove redundant 'flush_workqueue()' calls

'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20220216075155.940-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: delete unused exported symbols for ethtool PHY stats
Vladimir Oltean [Wed, 16 Feb 2022 19:37:26 +0000 (21:37 +0200)]
net: dsa: delete unused exported symbols for ethtool PHY stats

Introduced in commit cf963573039a ("net: dsa: Allow providing PHY
statistics from CPU port"), it appears these were never used.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220216193726.2926320-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: add sanity check in proto_register()
Eric Dumazet [Wed, 16 Feb 2022 17:18:01 +0000 (09:18 -0800)]
net: add sanity check in proto_register()

prot->memory_allocated should only be set if prot->sysctl_mem
is also set.

This is a followup of commit 25206111512d ("crypto: af_alg - get
rid of alg_memory_allocated").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220216171801.3604366-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ll_temac: Use GFP_KERNEL instead of GFP_ATOMIC when possible
Christophe JAILLET [Wed, 16 Feb 2022 20:16:16 +0000 (21:16 +0100)]
net: ll_temac: Use GFP_KERNEL instead of GFP_ATOMIC when possible

XTE_MAX_JUMBO_FRAME_SIZE is over 9000 bytes and the default value for
'rx_bd_num' is RX_BD_NUM_DEFAULT (i.e. 1024).

So this loop allocates more than 9 MB of memory.

Previous memory allocations in this function already use GFP_KERNEL, so
use __netdev_alloc_skb_ip_align() and an explicit GFP_KERNEL instead of an
implicit GFP_ATOMIC.

This gives the allocations a better chance of succeeding.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/694abd65418b2b3974106a82d758e3474c65ae8f.1645042560.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: nixge: Use GFP_KERNEL instead of GFP_ATOMIC when possible
Christophe JAILLET [Wed, 16 Feb 2022 20:38:11 +0000 (21:38 +0100)]
net: nixge: Use GFP_KERNEL instead of GFP_ATOMIC when possible

NIXGE_MAX_JUMBO_FRAME_SIZE is over 9000 bytes and RX_BD_NUM is 128.

So this loop allocates more than 1 MB of memory.

Previous memory allocations in this function already use GFP_KERNEL, so
use __netdev_alloc_skb_ip_align() and an explicit GFP_KERNEL instead of an
implicit GFP_ATOMIC.

This gives the allocations a better chance of succeeding.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/28d2c8e05951ad02a57eb48333672947c8bb4f81.1645043881.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'mptcp-selftest-fine-tuning-and-cleanup'
Jakub Kicinski [Fri, 18 Feb 2022 04:00:02 +0000 (20:00 -0800)]
Merge branch 'mptcp-selftest-fine-tuning-and-cleanup'

Mat Martineau says:

====================
mptcp: Selftest fine-tuning and cleanup

Patch 1 adjusts the mptcp selftest timeout to account for slow machines
running debug builds.

Patch 2 simplifies one test function.

Patches 3-6 do some cleanup, like deleting unused variables and avoiding
extra work when only printing usage information.

Patch 7 improves the checksum tests by utilizing existing checksum MIBs.
====================

Link: https://lore.kernel.org/r/20220218030311.367536-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mptcp: add csum mib check for mptcp_connect
Geliang Tang [Fri, 18 Feb 2022 03:03:11 +0000 (19:03 -0800)]
selftests: mptcp: add csum mib check for mptcp_connect

This patch adds a check of the data checksum error MIB counters to the
script mptcp_connect.sh when data checksums are enabled.

In do_transfer(), the MIB counters are read twice, before and after
running the mptcp_connect commands. The later reading minus the earlier
one is the actual number of data checksum errors.

The output looks like this:

ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP   (duration    86ms) [ OK ]
ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP   (duration    66ms) [ FAIL ]
server got 1 data checksum error[s]

Fixes: 94d66ba1d8e48 ("selftests: mptcp: enable checksum in mptcp_connect.sh")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/255
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mptcp: join: check for tools only if needed
Matthieu Baerts [Fri, 18 Feb 2022 03:03:10 +0000 (19:03 -0800)]
selftests: mptcp: join: check for tools only if needed

This allows showing the 'help' menu even if these tools are not available.

While at it, also avoid launching the command and then checking $?.
Instead, the check is done directly in the 'if'.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mptcp: join: create tmp files only if needed
Matthieu Baerts [Fri, 18 Feb 2022 03:03:09 +0000 (19:03 -0800)]
selftests: mptcp: join: create tmp files only if needed

These tmp files are now only created when a test is launched.

This avoids 'dd' output when '-h' is used, for example.

While at it, also avoid creating netns that would be removed when
starting the first test.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mptcp: join: remove unused vars
Matthieu Baerts [Fri, 18 Feb 2022 03:03:08 +0000 (19:03 -0800)]
selftests: mptcp: join: remove unused vars

Shellcheck found that these variables were set but never used.

Note that rndh is no longer prefixed with '0-' but it doesn't change
anything.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mptcp: join: exit after usage()
Matthieu Baerts [Fri, 18 Feb 2022 03:03:07 +0000 (19:03 -0800)]
selftests: mptcp: join: exit after usage()

Exit with an error if the option is unknown.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mptcp: simplify pm_nl_change_endpoint
Geliang Tang [Fri, 18 Feb 2022 03:03:06 +0000 (19:03 -0800)]
selftests: mptcp: simplify pm_nl_change_endpoint

This patch simplifies pm_nl_change_endpoint() by using id-based address
lookups only, and drops the fragile parsing of 'addr' and 'id' from the
output of pm_nl_show_endpoints().

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mptcp: increase timeout to 20 minutes
Matthieu Baerts [Fri, 18 Feb 2022 03:03:05 +0000 (19:03 -0800)]
selftests: mptcp: increase timeout to 20 minutes

With the increased number of tests, one CI instance, using a debug kernel
config and not-so-recent hardware, takes around 10 minutes to execute the
slowest MPTCP test: mptcp_join.sh.

Even if most CIs don't take that long to execute these tests --
typically max 10 minutes to run all selftests -- it will help some of
them if the timeout is increased.

The timeout could be disabled but it is always good to have an extra
safeguard, just in case.

Please note that on slow public CIs with kernel debug settings, it has
been observed that it can easily take up to 45 minutes to execute all
tests in this very slow environment with other jobs running in parallel.
The slowest test, mptcp_join.sh, takes ~30 minutes in this case.

In such environments, the selftests timeout set in the 'settings' file
is disabled because these environments are known to be exceptionally
slow. It has been decided not to take such exceptional environments into
account and to set the timeout to 20 minutes.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Jakub Kicinski [Fri, 18 Feb 2022 01:23:51 +0000 (17:23 -0800)]
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2022-02-17

We've added 29 non-merge commits during the last 8 day(s) which contain
a total of 34 files changed, 1502 insertions(+), 524 deletions(-).

The main changes are:

1) Add BTFGen support to bpftool, which allows using CO-RE in kernels without
   BTF info, from Mauricio Vásquez, Rafael David Tinoco, Lorenzo Fontana and
   Leonardo Di Donato. (Details: https://lpc.events/event/11/contributions/948/)

2) Prepare light skeleton to be used in both kernel module and user space
   and convert bpf_preload.ko to use light skeleton, from Alexei Starovoitov.

3) Rework bpftool's versioning scheme and align with libbpf's version number;
   also add linked libbpf version info to "bpftool version", from Quentin Monnet.

4) Add minimal C++ specific additions to bpftool's skeleton codegen to
   facilitate use of C skeletons in C++ applications, from Andrii Nakryiko.

5) Add a BPF verifier sanity check for whether the relative offset of kfunc calls
   overflows desc->imm, and reject the BPF program if so, from Hou Tao.

6) Fix libbpf to use a dynamically allocated buffer for netlink messages to
   avoid receiving truncated messages on some archs, from Toke Høiland-Jørgensen.

7) Various follow-up fixes to the JIT bpf_prog_pack allocator, from Song Liu.

8) Various BPF selftest and vmtest.sh fixes, from Yucong Sun.

9) Fix bpftool pretty print handling on dumping map keys/values when no BTF
   is available, from Jiri Olsa and Yinjun Zhang.

10) Extend XDP frags selftest to check for invalid length, from Lorenzo Bianconi.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (29 commits)
  bpf: bpf_prog_pack: Set proper size before freeing ro_header
  selftests/bpf: Fix crash in core_reloc when bpftool btfgen fails
  selftests/bpf: Fix vmtest.sh to launch smp vm.
  libbpf: Fix memleak in libbpf_netlink_recv()
  bpftool: Fix C++ additions to skeleton
  bpftool: Fix pretty print dump for maps without BTF loaded
  selftests/bpf: Test "bpftool gen min_core_btf"
  bpftool: Gen min_core_btf explanation and examples
  bpftool: Implement btfgen_get_btf()
  bpftool: Implement "gen min_core_btf" logic
  bpftool: Add gen min_core_btf command
  libbpf: Expose bpf_core_{add,free}_cands() to bpftool
  libbpf: Split bpf_core_apply_relo()
  bpf: Reject kfunc calls that overflow insn->imm
  selftests/bpf: Add Skeleton templated wrapper as an example
  bpftool: Add C++-specific open/load/etc skeleton wrappers
  selftests/bpf: Fix GCC11 compiler warnings in -O2 mode
  bpftool: Fix the error when lookup in no-btf maps
  libbpf: Use dynamically allocated buffer when receiving netlink messages
  libbpf: Fix libbpf.map inheritance chain for LIBBPF_0.7.0
  ...
====================

Link: https://lore.kernel.org/r/20220217232027.29831-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agobpf: bpf_prog_pack: Set proper size before freeing ro_header
Song Liu [Thu, 17 Feb 2022 18:30:01 +0000 (10:30 -0800)]
bpf: bpf_prog_pack: Set proper size before freeing ro_header

bpf_prog_pack_free() uses header->size to decide whether the header
should be freed with module_memfree() or the bpf_prog_pack logic.
However, in the kvmalloc() failure path of bpf_jit_binary_pack_alloc(),
header->size is not set yet. As a result, bpf_prog_pack_free() may treat
a slice of a pack as a standalone kvmalloc'd header and call
module_memfree() on the whole pack. This in turn causes a use-after-free
by other users of the pack.

Fix this by setting ro_header->size before freeing ro_header.

Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com
Reported-by: syzbot+ecb1e7e51c52f68f7481@syzkaller.appspotmail.com
Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220217183001.1876034-1-song@kernel.org
3 years agoMerge branch 'prestera-route-offloading'
David S. Miller [Thu, 17 Feb 2022 20:45:31 +0000 (20:45 +0000)]
Merge branch 'prestera-route-offloading'

Yevhen Orlov says:

====================
net: marvell: prestera: add basic routes offloading

Add support for blackhole and local routes to the Marvell Prestera driver.
Subscribe to FIB notifications and handle add/del.

Add features:
 - Support route adding.
   e.g.: "ip route add blackhole 7.7.1.1/24"
   e.g.: "ip route add local 9.9.9.9 dev sw1p30"
 - Support "rt_trap", "rt_offload", "rt_offload_failed" flags
 - Handle the case when a route in the "local" table overlaps a route in the "main" table
   example:
        ip ro add blackhole 7.7.7.7
        ip ro add local 7.7.7.7 dev sw1p30
        # the blackhole route is de-offloaded and its rt_offload flag disappears

Limitations:
 - Only "blackhole" and "local" routes are supported. "nexthop" routes are
   TRAP for now and will be implemented soon.
 - Only "local" and "main" tables are supported
====================

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: marvell: prestera: handle fib notifications
Yevhen Orlov [Wed, 16 Feb 2022 01:05:57 +0000 (03:05 +0200)]
net: marvell: prestera: handle fib notifications

For now we support only TRAP or DROP, so we can only offload "local" or
"blackhole" routes.
Nexthop routes are TRAP for now; they will be implemented soon.

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: marvell: prestera: add hardware router objects accounting for lpm
Yevhen Orlov [Wed, 16 Feb 2022 01:05:56 +0000 (03:05 +0200)]
net: marvell: prestera: add hardware router objects accounting for lpm

Add a new router_hw object, "fib_node". For now it supports only DROP and
TRAP modes.

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: marvell: prestera: Add router LPM ABI
Yevhen Orlov [Wed, 16 Feb 2022 01:05:55 +0000 (03:05 +0200)]
net: marvell: prestera: Add router LPM ABI

Add functions to create/delete an LPM entry in hardware.
prestera_hw_lpm_add() takes the index of the allocated virtual router.
It also takes grp_id, which is the index of the allocated nexthop group.
The ABI to create nexthop groups will be added soon.

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 17 Feb 2022 20:22:28 +0000 (12:22 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Fast path bpf merge for some -next work.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 17 Feb 2022 20:01:54 +0000 (12:01 -0800)]
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2022-02-17

We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 8 files changed, 119 insertions(+), 15 deletions(-).

The main changes are:

1) Add schedule points in map batch ops, from Eric.

2) Fix bpf_msg_push_data with len 0, from Felix.

3) Fix crash due to incorrect copy_map_value, from Kumar.

4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.

5) Fix a bpf_timer initialization issue with clang, from Yonghong.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add schedule points in batch ops
  bpf: Fix crash due to out of bounds access into reg2btf_ids.
  selftests: bpf: Check bpf_msg_push_data return value
  bpf: Fix a bpf_timer initialization issue
  bpf: Emit bpf_timer in vmlinux BTF
  selftests/bpf: Add test for bpf_timer overwriting crash
  bpf: Fix crash due to incorrect copy_map_value
  bpf: Do not try bpf_msg_push_data with len 0
====================

Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 17 Feb 2022 19:44:20 +0000 (11:44 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 17 Feb 2022 19:33:59 +0000 (11:33 -0800)]
Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and netfilter.

  Current release - regressions:

   - dsa: lantiq_gswip: fix use after free in gswip_remove()

   - smc: avoid overwriting the copies of clcsock callback functions

  Current release - new code bugs:

   - iwlwifi:
      - fix use-after-free when no FW is present
      - mei: fix the pskb_may_pull check in ipv4
      - mei: retry mapping the shared area
      - mvm: don't feed the hardware RFKILL into iwlmei

  Previous releases - regressions:

   - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()

   - tipc: fix wrong publisher node address in link publications

   - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
     assertion

   - bgmac: make idm and nicpm resource optional again

   - atl1c: fix tx timeout after link flap

  Previous releases - always broken:

   - vsock: remove vsock from connected table when connect is
     interrupted by a signal

   - ping: change destination interface checks to match raw sockets

   - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
     semantics (and null-deref) after SO_RESERVE_MEM was added

   - ipv6: make exclusive flowlabel checks per-netns

   - bonding: force carrier update when releasing slave

   - sched: limit TC_ACT_REPEAT loops

   - bridge: multicast: notify switchdev driver whenever MC processing
     gets disabled because of max entries reached

   - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found

   - iwlwifi: fix locking when "HW not ready"

   - phy: mediatek: remove PHY mode check on MT7531

   - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN

   - dsa: lan9303:
      - fix polarity of reset during probe
      - fix accelerated VLAN handling"

* tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  bonding: force carrier update when releasing slave
  nfp: flower: netdev offload check for ip6gretap
  ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
  ipv4: fix data races in fib_alias_hw_flags_set
  net: dsa: lan9303: add VLAN IDs to master device
  net: dsa: lan9303: handle hwaccel VLAN tags
  vsock: remove vsock from connected table when connect is interrupted by a signal
  Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
  ping: fix the dif and sdif check in ping_lookup
  net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
  net: sched: limit TC_ACT_REPEAT loops
  tipc: fix wrong notification node addresses
  net: dsa: lantiq_gswip: fix use after free in gswip_remove()
  ipv6: per-netns exclusive flowlabel checks
  net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
  CDC-NCM: avoid overflow in sanity checking
  mctp: fix use after free
  net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
  bonding: fix data-races around agg_select_timer
  dpaa2-eth: Initialize mutex used in one step timestamping path
  ...

3 years agoselftests/bpf: Fix crash in core_reloc when bpftool btfgen fails
Yucong Sun [Thu, 17 Feb 2022 18:02:10 +0000 (10:02 -0800)]
selftests/bpf: Fix crash in core_reloc when bpftool btfgen fails

Avoid unnecessary goto cleanup, as there is nothing to clean up.

Signed-off-by: Yucong Sun <fallentree@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220217180210.2981502-1-fallentree@fb.com
3 years agoselftests/bpf: Fix vmtest.sh to launch smp vm.
Yucong Sun [Thu, 17 Feb 2022 15:52:12 +0000 (07:52 -0800)]
selftests/bpf: Fix vmtest.sh to launch smp vm.

Fix a typo in vmtest.sh to make sure it launches a proper VM with 8 CPUs.

Signed-off-by: Yucong Sun <fallentree@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220217155212.2309672-1-fallentree@fb.com
3 years agobonding: force carrier update when releasing slave
Zhang Changzhong [Wed, 16 Feb 2022 14:18:08 +0000 (22:18 +0800)]
bonding: force carrier update when releasing slave

In __bond_release_one(), bond_set_carrier() is only called when the bond
device has no slaves left. Therefore, if we remove the up slave from a
master with two slaves and keep the down slave, the master will remain up.

Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.

Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0

Fixes: ff59c4563a8d ("[PATCH] bonding: support carrier state for master")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agobpf: Add schedule points in batch ops
Eric Dumazet [Thu, 17 Feb 2022 18:19:02 +0000 (10:19 -0800)]
bpf: Add schedule points in batch ops

syzbot reported various soft lockups caused by bpf batch operations.

 INFO: task kworker/1:1:27 blocked for more than 140 seconds.
 INFO: task hung in rcu_barrier

Nothing prevents batch ops from processing huge amounts of data,
so we need to add schedule points in them.

Note that maybe_wait_bpf_programs(map) calls from
generic_map_delete_batch() can be factorized by moving
the call after the loop.

This will be done later in the -next tree once this fix is merged,
unless there is a strong opinion in favor of doing this optimization sooner.

Fixes: aa2e93b8e58e ("bpf: Add generic support for update and delete batch ops")
Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Brian Vazquez <brianvv@google.com>
Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com
3 years agofs/file_table: fix adding missing kmemleak_not_leak()
Luis Chamberlain [Tue, 15 Feb 2022 02:08:28 +0000 (18:08 -0800)]
fs/file_table: fix adding missing kmemleak_not_leak()

Commit b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl
to its own file") fixed a regression; however, it failed to add a
kmemleak_not_leak() call.

Fixes: b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file")
Reported-by: Tong Zhang <ztong0001@gmail.com>
Cc: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm...
Linus Torvalds [Thu, 17 Feb 2022 18:06:09 +0000 (10:06 -0800)]
Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix corrupt inject files when only last branch option is enabled with
   ARM CoreSight ETM

 - Fix use-after-free for realloc(..., 0) in libsubcmd, found by gcc 12

 - Defer freeing string after possible strlen() on it in the BPF loader,
   found by gcc 12

 - Avoid early exit in 'perf trace' due to SIGCHLD from non-workload
   processes

 - Fix arm64 perf_event_attr 'perf test's wrt --call-graph
   initialization

 - Fix libperf 32-bit build for 'perf test' wrt uint64_t printf

 - Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to
   the CPU iterator

 - Sync linux/perf_event.h UAPI with the kernel sources

 - Update Jiri Olsa's email address in MAINTAINERS
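
For context on the libsubcmd item: realloc(ptr, 0) may free ptr and
return NULL, so touching ptr again afterwards (for example, retrying
with a non-zero size) is a use-after-free. Below is a minimal sketch of
a wrapper that never reuses the old pointer after a zero-size request.
xrealloc_sketch() is a hypothetical name, and this is not necessarily
how libsubcmd fixed it.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: a zero-size request becomes free() plus a
 * 1-byte allocation, so 'ptr' is never used after it may have been
 * freed and success always yields a non-NULL pointer.
 */
static void *xrealloc_sketch(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);
        return malloc(1);
    }
    /*
     * For size > 0, realloc() leaves 'ptr' untouched on failure, so a
     * NULL return still means the caller owns (and must free) 'ptr'.
     */
    return realloc(ptr, size);
}
```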

* tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf bpf: Defer freeing string after possible strlen() on it
  perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
  libsubcmd: Fix use-after-free for realloc(..., 0)
  libperf: Fix perf_cpu_map__for_each_cpu macro
  perf cs-etm: Fix corrupt inject files when only last branch option is enabled
  perf cs-etm: No-op refactor of synth opt usage
  libperf: Fix 32-bit build for tests uint64_t printf
  tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  perf trace: Avoid early exit due SIGCHLD from non-workload processes
  MAINTAINERS: Update Jiri's email address

3 years agoMerge tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof...
Linus Torvalds [Thu, 17 Feb 2022 17:54:00 +0000 (09:54 -0800)]
Merge tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull module fix from Luis Chamberlain:
 "Fixes module decompression when CONFIG_SYSFS=n

  The only fix that has trickled down for the v5.17-rc cycle so far is
  the fix for module decompression when CONFIG_SYSFS=n. This was
  reported through 0-day"

* tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: fix building with sysfs disabled

3 years agonfp: flower: netdev offload check for ip6gretap
Danie du Toit [Thu, 17 Feb 2022 12:48:20 +0000 (14:48 +0200)]
nfp: flower: netdev offload check for ip6gretap

IPv6 GRE tunnels are not being offloaded because of a missing netdev
offload check. The functionality of IPv6 GRE tunnel offloading was
previously added, but this check was not included. Adding the
ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly.
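
The check below is a simplified model. It assumes a hypothetical device
struct that carries only its rtnl link kind string. Kernel helpers such
as netif_is_gretap() and netif_is_ip6gretap() compare that kind. The
kind list is illustrative, not the nfp driver's exact set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal device: only the rtnl link "kind" string. */
struct fake_netdev {
    const char *kind;
};

/* Tunnel kinds this sketch treats as offloadable. */
static const char *const offload_kinds[] = {
    "vxlan", "geneve", "gretap",
    "ip6gretap", /* the case that was missing */
};

static bool netdev_is_tunnel_to_offload(const struct fake_netdev *dev)
{
    size_t i;

    if (!dev->kind)
        return false;
    for (i = 0; i < sizeof(offload_kinds) / sizeof(offload_kinds[0]); i++)
        if (strcmp(dev->kind, offload_kinds[i]) == 0)
            return true;
    return false;
}
```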

Fixes: f7536ffb0986 ("nfp: flower: Allow ipv6gretap interface for offloading")
Signed-off-by: Danie du Toit <danie.dutoit@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
Eric Dumazet [Wed, 16 Feb 2022 17:32:17 +0000 (09:32 -0800)]
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt

Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to fi->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().
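
The pattern can be shown with a single-threaded userspace sketch. It
uses simplified READ_ONCE()/WRITE_ONCE() macros (volatile accesses via
GCC/Clang __typeof__, without the kernel's type checks) on byte-sized
flags. struct route_hw_flags and route_hw_flags_set() are hypothetical
and only loosely modeled on fib6_info_hw_flags_set().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified stand-ins for the kernel macros: the volatile access
 * forces exactly one load or store that the compiler cannot fuse,
 * split or re-read. The kernel versions also add type checks and mark
 * the access as intentional for KCSAN; none of that is reproduced.
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical subset of a route's hardware flags. */
struct route_hw_flags {
    uint8_t offload;
    uint8_t trap;
    uint8_t offload_failed;
};

/* Returns true if any flag changed (a caller might then notify). */
static bool route_hw_flags_set(struct route_hw_flags *f, bool offload,
                               bool trap, bool offload_failed)
{
    if (READ_ONCE(f->offload) == offload &&
        READ_ONCE(f->trap) == trap &&
        READ_ONCE(f->offload_failed) == offload_failed)
        return false;

    WRITE_ONCE(f->offload, offload);
    WRITE_ONCE(f->trap, trap);
    WRITE_ONCE(f->offload_failed, offload_failed);
    return true;
}
```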

BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt

read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
 fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
 fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
 fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
 fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
 __ip6_del_rt net/ipv6/route.c:3876 [inline]
 ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
 __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
 ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
 __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
 ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
 inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
 __sock_release net/socket.c:650 [inline]
 sock_close+0x6c/0x150 net/socket.c:1318
 __fput+0x295/0x520 fs/file_table.c:280
 ____fput+0x11/0x20 fs/file_table.c:313
 task_work_run+0x8e/0x110 kernel/task_work.c:164
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
 exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
 syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
 do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x44/0xae

write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
 fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
 nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
 nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
 nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
 nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
 nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

value changed: 0x22 -> 0x2a

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 0c5fcf9e249e ("IPv6: Add "offload failed" indication to routes")
Fixes: bb3c4ab93e44 ("ipv6: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoipv4: fix data races in fib_alias_hw_flags_set
Eric Dumazet [Wed, 16 Feb 2022 17:32:16 +0000 (09:32 -0800)]
ipv4: fix data races in fib_alias_hw_flags_set

fib_alias_hw_flags_set() can be used by concurrent threads,
and is only RCU protected.

We need to annotate accesses to the following fields of struct fib_alias:

    offload, trap, offload_failed

Because of READ_ONCE()/WRITE_ONCE() limitations (they cannot be
applied to bitfields), make these fields u8.
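
The fields must be u8 because a lockless accessor needs the field's
address, and a bitfield has none. The sketch below uses C11 relaxed
atomics as the closest portable userspace counterpart to
READ_ONCE()/WRITE_ONCE(). struct alias_hw_flags and its helpers are
hypothetical. Each access is individually safe, but the three-flag
update as a whole is not atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the change: each flag is a whole byte rather
 * than a 1-bit bitfield, so its address can be passed to an accessor.
 */
struct alias_hw_flags {
    _Atomic uint8_t offload;
    _Atomic uint8_t trap;
    _Atomic uint8_t offload_failed;
};

static inline uint8_t flag_read(_Atomic uint8_t *p)
{
    return atomic_load_explicit(p, memory_order_relaxed);
}

static inline void flag_write(_Atomic uint8_t *p, uint8_t v)
{
    atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Update all three flags; returns true if anything changed. */
static bool alias_hw_flags_set(struct alias_hw_flags *fa, bool offload,
                               bool trap, bool offload_failed)
{
    bool changed = flag_read(&fa->offload) != offload ||
                   flag_read(&fa->trap) != trap ||
                   flag_read(&fa->offload_failed) != offload_failed;

    if (changed) {
        flag_write(&fa->offload, offload);
        flag_write(&fa->trap, trap);
        flag_write(&fa->offload_failed, offload_failed);
    }
    return changed;
}
```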

BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set

read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
 fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
 fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

value changed: 0x00 -> 0x02

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 90b93f1b31f8 ("ipv4: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: lan9303: add VLAN IDs to master device
Mans Rullgard [Wed, 16 Feb 2022 20:48:18 +0000 (20:48 +0000)]
net: dsa: lan9303: add VLAN IDs to master device

If the master device does VLAN filtering, the IDs used by the switch
must be added for any frames to be received.  Do this in the
port_enable() function, and remove them in port_disable().

Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: lan9303: handle hwaccel VLAN tags
Mans Rullgard [Wed, 16 Feb 2022 12:46:34 +0000 (12:46 +0000)]
net: dsa: lan9303: handle hwaccel VLAN tags

Check for a hwaccel VLAN tag on rx and use it if present.  Otherwise,
use __skb_vlan_pop() like the other tag parsers do.  This fixes the case
where the VLAN tag has already been consumed by the master.
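
The rx logic can be modeled as follows, assuming a hypothetical struct
fake_skb rather than the kernel's sk_buff. The real sk_buff offers
helpers such as skb_vlan_tag_present() and __skb_vlan_pop(). The model
prefers the out-of-band (hwaccel) tag and otherwise pops an in-band
802.1Q tag from the frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MACS_LEN  12 /* destination + source MAC addresses */
#define VLAN_HLEN 4  /* 802.1Q TPID + TCI */

/* Hypothetical, heavily simplified packet; not the kernel's sk_buff. */
struct fake_skb {
    bool     hwaccel_tag_present; /* tag already stripped by the master */
    uint16_t hwaccel_tci;
    uint8_t  data[64];
    size_t   len;
};

/* Returns the VLAN ID carried by the frame, or -1 if untagged. */
static int rx_get_vid(struct fake_skb *skb)
{
    uint16_t tci;

    if (skb->hwaccel_tag_present) {
        /* The master already consumed the tag; use its copy. */
        tci = skb->hwaccel_tci;
        skb->hwaccel_tag_present = false;
    } else if (skb->len >= MACS_LEN + VLAN_HLEN &&
               skb->data[12] == 0x81 && skb->data[13] == 0x00) {
        /* In-band 802.1Q tag: read the TCI, then remove the 4 bytes. */
        tci = (uint16_t)(skb->data[14] << 8 | skb->data[15]);
        memmove(skb->data + MACS_LEN, skb->data + MACS_LEN + VLAN_HLEN,
                skb->len - MACS_LEN - VLAN_HLEN);
        skb->len -= VLAN_HLEN;
    } else {
        return -1;
    }
    return tci & 0x0fff; /* VID is the low 12 bits of the TCI */
}
```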

Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomm: don't try to NUMA-migrate COW pages that have other uses
Linus Torvalds [Thu, 17 Feb 2022 16:57:47 +0000 (08:57 -0800)]
mm: don't try to NUMA-migrate COW pages that have other uses

Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:

 "All the details are in the bug, but the bottom line is that somehow,
  this patch causes corruption when the numa balancing feature is
  enabled AND we don't use process affinity AND we use GUP to pin pages
  so our accelerator can DMA to/from system memory.

  Either disabling numa balancing, using process affinity to bind to
  specific numa-node or reverting this patch causes the bug to
  disappear"

and Oded bisected the issue to commit 09854ba94c6a ("mm: do_wp_page()
simplification").

Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW.  But it appears it
does.  Suspicious.

However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical.  It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.

The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings might be
paged out and the only reference to them would be in the page count.

Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.
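
The difference between the two counts can be modeled in a few lines.
struct fake_page and both helpers are hypothetical. Per the description
above, the model assumes that the COW-page check skips NUMA protection
whenever the relevant count is not 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two counters; not the kernel's struct page. */
struct fake_page {
    int mapcount; /* page-table mappings only */
    int refcount; /* every reference: mappings, GUP pins, ... */
};

/* Old test: counts mappings only, so a GUP pin goes unnoticed. */
static bool skip_numa_old(const struct fake_page *p)
{
    return p->mapcount != 1;
}

/* New test: any extra reference means the page can't be moved anyway. */
static bool skip_numa_new(const struct fake_page *p)
{
    return p->refcount != 1;
}
```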

Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information).  Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.

The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless.

Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215616
Link: https://lore.kernel.org/all/CAFCwf10eNmwq2wD71xjUhqkvv5+_pJMR1nPug2RqNDcFT4H86Q@mail.gmail.com/
Reported-and-tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agovsock: remove vsock from connected table when connect is interrupted by a signal
Seth Forshee [Thu, 17 Feb 2022 14:13:12 +0000 (08:13 -0600)]
vsock: remove vsock from connected table when connect is interrupted by a signal

vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.

Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.
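
The sketch below shows why an idempotent removal prevents the double
add. It uses a circular doubly linked list in the style of
<linux/list.h>, and all names are hypothetical. remove_connected() only
mirrors the idea that vsock_remove_connected() is harmless when the
socket is not in the table. Without the removal, a second add with
this sketch's list would leave the node pointing at itself and corrupt
the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static bool list_linked(const struct list_node *n) { return n->next != n; }

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink and re-initialise, so the node reads as "not in any list". */
static void list_del_init_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical analogue of removing a socket from the connected table. */
static void remove_connected(struct list_node *sk)
{
    if (list_linked(sk))
        list_del_init_node(sk);
}

static int list_length(const struct list_node *head)
{
    int n = 0;

    for (const struct list_node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```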

Note for backporting: this patch requires d5afa82c977e ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.

Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoRevert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
Jonas Gorski [Wed, 16 Feb 2022 18:46:34 +0000 (10:46 -0800)]
Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"

This reverts commit 3710e80952cf2dc48257ac9f145b117b5f74e0a5.

Since idm_base and nicpm_base are still optional resources not present
on all platforms, this breaks the driver for everything except Northstar
2 (which has both).

The same change was already reverted once with 755f5738ff98 ("net:
broadcom: fix a mistake about ioremap resource").

So let's do it again.
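
The sketch below shows, roughly, the optional-resource handling the
revert restores. It uses a hypothetical named-resource table rather than
the platform device API. A missing optional region is not an error,
while a missing required one is. The "regs" entry and its address are
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical named memory region of a platform device. */
struct fake_resource {
    const char *name;
    unsigned long start;
};

static const struct fake_resource *
get_resource_byname(const struct fake_resource *res, size_t n,
                    const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(res[i].name, name) == 0)
            return &res[i];
    return NULL;
}

/*
 * Look up a region by name. A missing optional region leaves *base at 0
 * and succeeds; a missing required region fails with -1.
 */
static int map_region(const struct fake_resource *res, size_t n,
                      const char *name, bool optional, unsigned long *base)
{
    const struct fake_resource *r = get_resource_byname(res, n, name);

    *base = 0;
    if (!r)
        return optional ? 0 : -1;
    *base = r->start;
    return 0;
}
```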

Fixes: 3710e80952cf ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
[florian: Added comments to explain the resources are optional]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoipv6/addrconf: ensure addrconf_verify_rtnl() has completed
Eric Dumazet [Wed, 16 Feb 2022 18:20:37 +0000 (10:20 -0800)]
ipv6/addrconf: ensure addrconf_verify_rtnl() has completed

Before freeing the hash table in addrconf_exit_net(),
we need to make sure the work queue has completed,
or risk NULL dereference or UAF.

Thus, use cancel_delayed_work_sync() to enforce this.
We do not hold RTNL in addrconf_exit_net(), making this safe.

Fixes: 8805d13ff1b2 ("ipv6/addrconf: use one delayed work per netns")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220216182037.3742-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: allow out-of-order netdev unregistration
Jakub Kicinski [Tue, 15 Feb 2022 22:53:10 +0000 (14:53 -0800)]
net: allow out-of-order netdev unregistration

Sprinkle for each loops to allow netdevices to be unregistered
out of order, as their refs are released.

This prevents problems caused by dependencies between netdevs
which want to release references in their ->priv_destructor.
See commit d6ff94afd90b ("vlan: move dev_put into vlan_dev_uninit")
for example.

Eric has removed the only known ordering requirement in
commit c002496babfd ("Merge branch 'ipv6-loopback'")
so let's try this and see if anything explodes...
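
The toy model below unregisters devices out of order. Each pass
finishes every device whose refcount has dropped to zero. That device's
destructor may then release a reference it holds on another device,
which becomes ready on a later pass. struct fake_dev and run_todo() are
hypothetical and much simpler than netdev_run_todo().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical device: refcount plus an optional ref held on another. */
struct fake_dev {
    int refcnt;
    int holds_ref_on;   /* index of the device we reference, or -1 */
    bool unregistered;
};

/*
 * Finish devices in whatever order their refcounts reach zero, recording
 * that order. Returns the number of passes, or -1 if no progress is
 * possible (the kernel instead keeps waiting for references to drop).
 */
static int run_todo(struct fake_dev *devs, size_t n, int order[MAX_DEVS])
{
    size_t done = 0;
    int passes = 0;

    assert(n <= MAX_DEVS);
    while (done < n) {
        bool progress = false;

        passes++;
        for (size_t i = 0; i < n; i++) {
            struct fake_dev *d = &devs[i];

            if (d->unregistered || d->refcnt != 0)
                continue;
            d->unregistered = true;
            order[done++] = (int)i;
            progress = true;
            /* Stand-in for ->priv_destructor dropping its reference. */
            if (d->holds_ref_on >= 0)
                devs[d->holds_ref_on].refcnt--;
        }
        if (!progress)
            return -1;
    }
    return passes;
}
```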

Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20220215225310.3679266-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: transition netdev reg state earlier in run_todo
Jakub Kicinski [Tue, 15 Feb 2022 22:53:09 +0000 (14:53 -0800)]
net: transition netdev reg state earlier in run_todo

In prep for unregistering netdevs out of order, move the netdev
state validation and change outside of the loop.

While at it modernize this code and use WARN() instead of
pr_err() + dump_stack().

Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20220215225310.3679266-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agolibbpf: Fix memleak in libbpf_netlink_recv()
Andrii Nakryiko [Thu, 17 Feb 2022 07:39:58 +0000 (23:39 -0800)]
libbpf: Fix memleak in libbpf_netlink_recv()

Ensure that libbpf_netlink_recv() frees dynamically allocated buffer in
all code paths.
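
The usual shape of such a fix is a single exit label that frees the
buffer on every path. The minimal sketch below uses a hypothetical
recv_all() that takes an array of strings in place of messages read
from a netlink socket.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical receive loop: every exit path, success or error, goes
 * through the 'done' label, which frees the dynamically grown buffer.
 */
static int recv_all(const char *const *msgs, size_t nmsgs, size_t *total)
{
    char *buf = NULL;
    size_t cap = 0;
    int err = 0;

    *total = 0;
    for (size_t i = 0; i < nmsgs; i++) {
        size_t len = strlen(msgs[i]);

        if (len == 0) {        /* simulate a malformed message */
            err = -EINVAL;
            goto done;         /* an early 'return' here would leak buf */
        }
        if (len + 1 > cap) {
            char *tmp = realloc(buf, len + 1);

            if (!tmp) {
                err = -ENOMEM;
                goto done;     /* buf is still valid and freed below */
            }
            buf = tmp;
            cap = len + 1;
        }
        memcpy(buf, msgs[i], len + 1);
        *total += len;
    }
done:
    free(buf);
    return err;
}
```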

Fixes: 9c3de619e13e ("libbpf: Use dynamically allocated buffer when receiving netlink messages")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20220217073958.276959-1-andrii@kernel.org
3 years agoping: fix the dif and sdif check in ping_lookup
Xin Long [Wed, 16 Feb 2022 05:20:52 +0000 (00:20 -0500)]
ping: fix the dif and sdif check in ping_lookup

When 'ping' is switched from RAW sockets to PING sockets with:

   # sysctl -w net.ipv4.ping_group_range="0 100"

another regression appears when matching sk_bound_dev_if against dif:
the RAW socket lookup uses inet_iif(), while the PING socket lookup
uses skb->dev->ifindex. As a result, the commands below fail:

  # ip link add dummy0 type dummy
  # ip link set dummy0 up
  # ip addr add 192.168.111.1/24 dev dummy0
  # ping -I dummy0 192.168.111.1 -c1

The issue was also reported on:

  https://github.com/iputils/iputils/issues/104

It was fixed in iputils the wrong way, by not binding to the device
when the destination IP is on that device, which causes some
kselftests to fail, as Jianlin noticed.

This patch uses inet(6)_iif() and inet(6)_sdif() to get dif and sdif
for PING sockets, keeping them consistent with RAW sockets.
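
The device-binding check used by socket lookups can be written as a
predicate. bound_dev_matches() is a hypothetical name. The test values
are assumptions chosen for illustration: dummy0 is ifindex 3, and the
looped-back reply hypothetically arrives with skb->dev set to lo
(ifindex 1). Under those assumptions, matching against
skb->dev->ifindex fails, while a dif of dummy0's index would match.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A socket bound to a device matches only if the packet's dif or sdif
 * equals that device; an unbound socket (index 0) matches any packet.
 */
static bool bound_dev_matches(int sk_bound_dev_if, int dif, int sdif)
{
    return !sk_bound_dev_if ||
           sk_bound_dev_if == dif ||
           sk_bound_dev_if == sdif;
}
```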

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
Daniele Palmas [Tue, 15 Feb 2022 11:13:35 +0000 (12:13 +0100)]
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990

Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990
0x1071 composition in order to avoid bind error.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ping6-SOL_IPV6'
David S. Miller [Thu, 17 Feb 2022 14:22:10 +0000 (14:22 +0000)]
Merge branch 'ping6-SOL_IPV6'

Jakub Kicinski says:

====================
net: ping6: support setting basic SOL_IPV6 options via cmsg

Support for IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG on ICMPv6
sockets and associated tests. I have no immediate plans to
implement IPV6_FLOWINFO and all the extension header stuff.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: basic test for IPV6_2292*
Jakub Kicinski [Thu, 17 Feb 2022 01:21:20 +0000 (17:21 -0800)]
selftests: net: basic test for IPV6_2292*

Add a basic test to make sure ping sockets don't crash
with IPV6_2292* options.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: test IPV6_HOPLIMIT
Jakub Kicinski [Thu, 17 Feb 2022 01:21:19 +0000 (17:21 -0800)]
selftests: net: test IPV6_HOPLIMIT

Test setting IPV6_HOPLIMIT via setsockopt and cmsg
across socket types.

Output without the kernel support (this series):

  Case HOPLIMIT ICMP cmsg - packet data returned 1, expected 0
  Case HOPLIMIT ICMP diff - packet data returned 1, expected 0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: test IPV6_TCLASS
Jakub Kicinski [Thu, 17 Feb 2022 01:21:18 +0000 (17:21 -0800)]
selftests: net: test IPV6_TCLASS

Test setting IPV6_TCLASS via setsockopt and cmsg
across socket types.

Output without the kernel support (this series):

  Case TCLASS ICMP cmsg - packet data returned 1, expected 0
  Case TCLASS ICMP cmsg - rejection returned 0, expected 1
  Case TCLASS ICMP diff - pass returned 1, expected 0
  Case TCLASS ICMP diff - packet data returned 1, expected 0
  Case TCLASS ICMP diff - rejection returned 0, expected 1

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: test IPV6_DONTFRAG
Jakub Kicinski [Thu, 17 Feb 2022 01:21:17 +0000 (17:21 -0800)]
selftests: net: test IPV6_DONTFRAG

Test setting IPV6_DONTFRAG via setsockopt and cmsg
across socket types.

Output without the kernel support (this series):

    Case DONTFRAG ICMP setsock returned 0, expected 1
    Case DONTFRAG ICMP cmsg returned 0, expected 1
    Case DONTFRAG ICMP both returned 0, expected 1
    Case DONTFRAG ICMP diff returned 0, expected 1
  FAIL - 4/24 cases failed

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ping6: support setting basic SOL_IPV6 options via cmsg
Jakub Kicinski [Thu, 17 Feb 2022 01:21:16 +0000 (17:21 -0800)]
net: ping6: support setting basic SOL_IPV6 options via cmsg

Support setting IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG
during sendmsg via SOL_IPV6 cmsgs.

tclass and dontfrag are initialized from struct ipv6_pinfo in
ipcm6_init_sk(), while hlimit is initialized to -1, so we need
to handle it being populated via cmsg explicitly.

Leave extension headers and flowlabel unimplemented.
Those are slightly more laborious to test and users
seem to primarily care about IPV6_TCLASS.
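
From userspace, such an option travels as a cmsg attached to sendmsg().
The minimal sketch below builds an IPV6_HOPLIMIT control message using
a hypothetical helper, set_hoplimit_cmsg(). The resulting msghdr could
then be passed to sendmsg() on a socket(AF_INET6, SOCK_DGRAM,
IPPROTO_ICMPV6) ping socket, assuming ping sockets are permitted for
the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill 'msg' with a control buffer holding one IPV6_HOPLIMIT cmsg.
 * 'ctrl' must be suitably aligned for struct cmsghdr and at least
 * CMSG_SPACE(sizeof(int)) bytes long.
 */
static void set_hoplimit_cmsg(struct msghdr *msg, void *ctrl,
                              size_t ctrl_len, int hoplimit)
{
    struct cmsghdr *cm;

    assert(ctrl_len >= CMSG_SPACE(sizeof(int)));
    memset(ctrl, 0, ctrl_len);
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(int));

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_IPV6;  /* same level value as SOL_IPV6 */
    cm->cmsg_type = IPV6_HOPLIMIT;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &hoplimit, sizeof(hoplimit));
}
```

The union in the usage below is the conventional way to get a control
buffer aligned for struct cmsghdr.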

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'switchdev-BRENTRY'
David S. Miller [Thu, 17 Feb 2022 14:17:10 +0000 (14:17 +0000)]
Merge branch 'switchdev-BRENTRY'

Vladimir Oltean says:

====================
Remove BRENTRY checks from switchdev drivers

As discussed here:
https://patchwork.kernel.org/project/netdevbpf/patch/20220214233111.1586715-2-vladimir.oltean@nxp.com/#24738869

no switchdev driver makes use of VLAN port objects that lack the
BRIDGE_VLAN_INFO_BRENTRY flag. Notifying them in the first place
seems more like an omission of commit 9c86ce2c1ae3 ("net: bridge: Notify
about bridge VLANs").

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag") that was just merged, the bridge no
longer notifies switchdev upon creation of these VLANs, so we can remove
the checks from drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ti: cpsw: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:52 +0000 (18:47 +0200)]
net: ti: cpsw: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ti: am65-cpsw-nuss: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:51 +0000 (18:47 +0200)]
net: ti: am65-cpsw-nuss: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sparx5: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:50 +0000 (18:47 +0200)]
net: sparx5: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: lan966x: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:49 +0000 (18:47 +0200)]
net: lan966x: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:48 +0000 (18:47 +0200)]
mlxsw: spectrum: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ptp-over-udp-dsa'
David S. Miller [Thu, 17 Feb 2022 14:06:51 +0000 (14:06 +0000)]
Merge branch 'ptp-over-udp-dsa'

Vladimir Oltean says:

====================
Support PTP over UDP with the ocelot-8021q DSA tagging protocol

The alternative tag_8021q-based tagger for Ocelot switches, added here:
https://patchwork.kernel.org/project/netdevbpf/cover/20210129010009.3959398-1-olteanv@gmail.com/

gained support for PTP over L2 here:
https://patchwork.kernel.org/project/netdevbpf/cover/20210213223801.1334216-1-olteanv@gmail.com/

mostly as a minimum viable requirement. That PTP support was mostly
self-contained code that installed some rules to replicate PTP packets
on the CPU queue, in felix_setup_mmio_filtering().

However, ocelot-8021q is starting to look more interesting for
general-purpose usage, so it is now time to reduce the technical debt
by integrating the PTP traps used by Felix for tag_8021q with the rest
of the Ocelot driver.

There is further consolidation of traps to be done. The cookies used by
MRP traps overlap with the cookies used for tag_8021q PTP traps, so
those features could not be used at the same time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: tag_ocelot_8021q: calculate TX checksum in software for deferred packets
Vladimir Oltean [Wed, 16 Feb 2022 14:30:14 +0000 (16:30 +0200)]
net: dsa: tag_ocelot_8021q: calculate TX checksum in software for deferred packets

DSA inherits NETIF_F_CSUM_MASK from master->vlan_features, and the
expectation is that TX checksumming is offloaded and not done in
software.

Normally the DSA master takes care of this, but packets handled by
ocelot_defer_xmit() are a very special exception, because they are
actually injected into the switch through register-based MMIO. So the
DSA master is not involved at all for these packets => no one calculates
the checksum.
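
For reference, the arithmetic involved is the RFC 1071 Internet
checksum. The kernel's generic software fallback for CHECKSUM_PARTIAL
packets is skb_checksum_help(). The sketch below shows only the ones'
complement sum itself, without the UDP pseudo-header, and
inet_checksum() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over 'len' bytes: the 16-bit ones'
 * complement of the ones' complement sum of big-endian 16-bit words.
 * An odd trailing byte is padded with zero. The 32-bit accumulator is
 * ample for packet-sized buffers.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)data[0] << 8 | data[1];
        data += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)data[0] << 8;
    /* Fold carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Usage: for the sample IPv4 header in the test, with its checksum field
zeroed, the result is 0xb861. Recomputing over the header with that
value filled in gives 0, which is how a receiver verifies it.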

This allows PTP over UDP to work using the ocelot-8021q tagging
protocol.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>