samples: bpf: Add BPF support for devmap_xmit tracepoint
This adds support for the devmap_xmit tracepoint, and its multi device
variant that can be used to obtain streams for each individual
net_device to net_device redirection. This is useful for decomposing
total xmit stats in xdp_monitor.
samples: bpf: Add cpumap tracepoint statistics support
This consolidates retrieval and printing into the XDP sample helper. For
the kthread stats, it expands xdp_stats separately with its own per-CPU
stats. For cpumap enqueue, we display FROM->TO stats also with its
per-CPU stats.
The help out explains in detail the various aspects of the output.
samples: bpf: Add BPF support for cpumap tracepoints
These are invoked in two places, when the XDP frame or SKB (for generic
XDP) enqueued to the ptr_ring (cpumap_enqueue) and when kthread processes
the frame after invoking the CPUMAP program for it (returning stats for
the batch).
We use cpumap_map_id to filter on the map_id as a way to avoid printing
incorrect stats for parallel sessions of xdp_redirect_cpu.
samples: bpf: Add BPF support for xdp_exception tracepoint
This would allow us to store stats for each XDP action, including their
per-CPU counts. Consolidating this here allows all redirect samples to
detect xdp_exception events.
samples: bpf: Add redirect tracepoint statistics support
This implements per-errno reporting (for the ones we explicitly
recognize), adds some help output, and implements the stats retrieval
and printing functions.
samples: bpf: Add BPF support for redirect tracepoint
This adds the shared BPF file that will be used going forward for
sharing tracepoint programs among XDP redirect samples.
Since vmlinux.h conflicts with tools/include for READ_ONCE/WRITE_ONCE
and ARRAY_SIZE, they are copied in to xdp_sample.bpf.h along with other
helpers that will be required.
samples: bpf: Add basic infrastructure for XDP samples
This file implements some common helpers to consolidate differences in
features and functionality between the various XDP samples and give them
a consistent look, feel, and reporting capabilities.
This commit only adds support for receive statistics, which does not
rely on any tracepoint, but on the XDP program installed on the device
by each XDP redirect sample.
Some of the key features are:
* A concise output format accompanied by helpful text explaining its
fields.
* An elaborate output format building upon the concise one, and folding
out details in case of errors and staying out of view otherwise.
* Printing driver names for devices redirecting packets.
* Getting mac address for interface.
* Printing summarized total statistics for the entire session.
* Ability to dynamically switch between concise and verbose mode, using
SIGQUIT (Ctrl + \).
In later patches, the support will be extended for each tracepoint with
its own custom output in concise and verbose mode.
tools: include: Add ethtool_drvinfo definition to UAPI header
Instead of copying the whole header in, just add the struct definitions
we need for now. In the future it can be synced as a copy of in-tree
header if required.
cookie_uid_helper_example.c: In function ‘main’:
cookie_uid_helper_example.c:178:69: warning: ‘ -j ACCEPT’ directive
writing 10 bytes into a region of size between 8 and 58
[-Wformat-overflow=]
178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
| ^~~~~~~~~~
/home/kkd/src/linux/samples/bpf/cookie_uid_helper_example.c:178:9: note:
‘sprintf’ output between 53 and 103 bytes into a destination of size 100
178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
179 | file);
| ~~~~~
Fix by using snprintf and a sufficiently sized buffer.
tracex4_user.c:35:15: warning: ‘write’ reading 12 bytes from a region of
size 11 [-Wstringop-overread]
35 | key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Andrey Ignatov [Fri, 20 Aug 2021 16:39:35 +0000 (09:39 -0700)]
bpf: Fix possible out of bound write in narrow load handling
Fix a verifier bug found by smatch static checker in [0].
This problem has never been seen in prod to my best knowledge. Fixing it
still seems to be a good idea since it's hard to say for sure whether
it's possible or not to have a scenario where a combination of
convert_ctx_access() and a narrow load would lead to an out of bound
write.
When narrow load is handled, one or two new instructions are added to
insn_buf array, but before it was only checked that
cnt >= ARRAY_SIZE(insn_buf)
And it's safe to add a new instruction to insn_buf[cnt++] only once. The
second try will lead to out of bound write. And this is what can happen
if `shift` is set.
Fix it by making sure that if the BPF_RSH instruction has to be added in
addition to BPF_AND then there is enough space for two more instructions
in insn_buf.
The full report [0] is below:
kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
Alexei Starovoitov [Tue, 24 Aug 2021 21:01:11 +0000 (14:01 -0700)]
Merge branch 'selftests/bpf: minor fixups'
Li Zhijian says:
====================
Fix a few issues reported by 0Day/LKP during runing selftests/bpf.
Changelog:
V2:
- folded previous similar standalone patch to [1/5], and add acked tag
from Song Liu
- add acked tag to [2/5], [3/5] from Song Liu
- [4/5]: move test_bpftool.py to TEST_PROGS_EXTENDED, files in TEST_GEN_PROGS_EXTENDED
are generated by make. Otherwise, it will break out-of-tree install:
'make O=/kselftest-build SKIP_TARGETS= V=1 -C tools/testing/selftests install INSTALL_PATH=/kselftest-install'
- [5/5]: new patch
====================
Li Zhijian [Fri, 20 Aug 2021 02:55:49 +0000 (10:55 +0800)]
selftests/bpf: Exit with KSFT_SKIP if no Makefile found
This would happend when we run the tests after install kselftests
root@lkp-skl-d01 ~# /kselftests/run_kselftest.sh -t bpf:test_doc_build.sh
TAP version 13
1..1
# selftests: bpf: test_doc_build.sh
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_ADDRESS = "en_US.UTF-8",
LC_NAME = "en_US.UTF-8",
LC_MONETARY = "en_US.UTF-8",
LC_PAPER = "en_US.UTF-8",
LC_IDENTIFICATION = "en_US.UTF-8",
LC_TELEPHONE = "en_US.UTF-8",
LC_MEASUREMENT = "en_US.UTF-8",
LC_TIME = "en_US.UTF-8",
LC_NUMERIC = "en_US.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
# skip: bpftool files not found!
#
ok 1 selftests: bpf: test_doc_build.sh # SKIP
Li Zhijian [Fri, 20 Aug 2021 01:55:54 +0000 (09:55 +0800)]
selftests/bpf: Make test_doc_build.sh work from script directory
Previously, it fails as below:
-------------
root@lkp-skl-d01 /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf# ./test_doc_build.sh
++ realpath --relative-to=/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf ./test_doc_build.sh
+ SCRIPT_REL_PATH=test_doc_build.sh
++ dirname test_doc_build.sh
+ SCRIPT_REL_DIR=.
++ realpath /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/./../../../../
+ KDIR_ROOT_DIR=/opt/rootfs/v5.14-rc4
+ cd /opt/rootfs/v5.14-rc4
+ for tgt in docs docs-clean
+ make -s -C /opt/rootfs/v5.14-rc4/. docs
make: *** No rule to make target 'docs'. Stop.
+ for tgt in docs docs-clean
+ make -s -C /opt/rootfs/v5.14-rc4/. docs-clean
make: *** No rule to make target 'docs-clean'. Stop.
-----------
Li Zhijian [Fri, 20 Aug 2021 01:55:53 +0000 (09:55 +0800)]
selftests/bpf: Enlarge select() timeout for test_maps
0Day robot observed that it's easily timeout on a heavy load host.
-------------------
# selftests: bpf: test_maps
# Fork 1024 tasks to 'test_update_delete'
# Fork 1024 tasks to 'test_update_delete'
# Fork 100 tasks to 'test_hashmap'
# Fork 100 tasks to 'test_hashmap_percpu'
# Fork 100 tasks to 'test_hashmap_sizes'
# Fork 100 tasks to 'test_hashmap_walk'
# Fork 100 tasks to 'test_arraymap'
# Fork 100 tasks to 'test_arraymap_percpu'
# Failed sockmap unexpected timeout
not ok 3 selftests: bpf: test_maps # exit=1
# selftests: bpf: test_lru_map
# nr_cpus:8
-------------------
Since this test will be scheduled by 0Day to a random host that could have
only a few cpus(2-8), enlarge the timeout to avoid a false NG report.
In practice, i tried to pin it to only one cpu by 'taskset 0x01 ./test_maps',
and knew 10S is likely enough, but i still perfer to a larger value 30.
Yucong Sun [Mon, 23 Aug 2021 21:36:29 +0000 (14:36 -0700)]
selftests/bpf: Reduce flakyness in timer_mim
This patch extends wait time in timer_mim. As observed in slow CI environment,
it is possible to have interrupt/preemption long enough to cause the test to
fail, almost 1 failure in 5 runs.
Alexei Starovoitov [Tue, 24 Aug 2021 00:50:24 +0000 (17:50 -0700)]
Merge branch 'Refactor cgroup_bpf internals to use more specific attach_type'
Dave Marchevsky says:
====================
The cgroup_bpf struct has a few arrays (effective, progs, and flags) of
size MAX_BPF_ATTACH_TYPE. These are meant to separate progs by their
attach type, currently represented by the bpf_attach_type enum.
There are some bpf_attach_type values which are not valid attach types
for cgroup bpf programs. Programs with these attach types will never be
handled by cgroup_bpf_{attach,detach} and thus will never be held in
cgroup_bpf structs. Even if such programs did make it into their
reserved slot in those arrays, they would never be executed.
Accordingly we can migrate to a new internal cgroup_bpf-specific enum
for these arrays, saving some bytes per cgroup and making it more
obvious which BPF programs belong there. netns_bpf_attach_type is an
existing example of this pattern, let's do similar for cgroup_bpf.
v1->v2: Address Daniel's comments
* Reverse xmas tree ordering for def changes
* Helper macro to reduce to_cgroup_bpf_attach_type boilerplate
* checkpatch.pl complains: "ERROR: Macros with complex values should
be enclosed in parentheses". Found some existing macros (do 'git grep
"define case"') which get same complaint. Think it's fine to keep
as-is since it's immediately undef'd.
* Remove CG_BPF_ prefix from cgroup_bpf_attach_type
* Although I agree that the prefix is redundant, the de-prefixed
names feel a bit too 'general' given the internal use of the enum.
e.g. when someone sees CGROUP_INET6_BIND it's not obvious that it
should only be used in certain ways internally.
* Don't feel strongly about this, just my thoughts as a noob to the
internals.
* Rebase onto latest bpf-next/master
* No significant conflicts, some small boilerplate adjustments
needed to catch up to Andrii's "bpf: Refactor BPF_PROG_RUN_ARRAY
family of macros into functions" change
====================
Dave Marchevsky [Thu, 19 Aug 2021 09:24:20 +0000 (02:24 -0700)]
bpf: Migrate cgroup_bpf to internal cgroup_bpf_attach_type enum
Add an enum (cgroup_bpf_attach_type) containing only valid cgroup_bpf
attach types and a function to map bpf_attach_type values to the new
enum. Inspired by netns_bpf_attach_type.
Then, migrate cgroup_bpf to use cgroup_bpf_attach_type wherever
possible. Functionality is unchanged as attach_type_to_prog_type
switches in bpf/syscall.c were preventing non-cgroup programs from
making use of the invalid cgroup_bpf array slots.
As a result struct cgroup_bpf uses 504 fewer bytes relative to when its
arrays were sized using MAX_BPF_ATTACH_TYPE.
bpf_cgroup_storage is notably not migrated as struct
bpf_cgroup_storage_key is part of uapi and contains a bpf_attach_type
member which is not meant to be opaque. Similarly, bpf_cgroup_link
continues to report its bpf_attach_type member to userspace via fdinfo
and bpf_link_info.
To ease disambiguation, bpf_attach_type variables are renamed from
'type' to 'atype' when changed to cgroup_bpf_attach_type.
Jiang Wang [Sat, 21 Aug 2021 18:07:36 +0000 (18:07 +0000)]
af_unix: Fix NULL pointer bug in unix_shutdown
Commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
introduced a bug for af_unix SEQPACKET type. In unix_shutdown, the
unhash function will call prot->unhash(), which is NULL for SEQPACKET.
And kernel will panic. On ARM32, it will show following messages: (it
likely affects x86 too).
Fix the bug by checking the prot->unhash is NULL or not first.
Prankur Gupta [Tue, 17 Aug 2021 22:42:21 +0000 (15:42 -0700)]
selftests/bpf: Add tests for {set|get} socket option from setsockopt BPF
Adding selftests for the newly added functionality to call bpf_setsockopt()
and bpf_getsockopt() from setsockopt BPF programs.
Test Details:
1. BPF Program
Checks for changes in IPV6_TCLASS(SOL_IPV6) via setsockopt
If the cca for the socket is not cubic do nothing
If the newly set value for IPV6_TCLASS is 45 (0x2d) (as per our use-case)
then change the cc from cubic to reno
2. User Space Program
Creates an AF_INET6 socket and set the cca for that to be "cubic"
Attach the program and set the IPV6_TCLASS to 0x2d using setsockopt
Verify the cca for the socket changed to reno
Prankur Gupta [Tue, 17 Aug 2021 22:42:20 +0000 (15:42 -0700)]
bpf: Add support for {set|get} socket options from setsockopt BPF
Add logic to call bpf_setsockopt() and bpf_getsockopt() from setsockopt BPF
programs. An example use case is when the user sets the IPV6_TCLASS socket
option, we would also like to change the tcp-cc for that socket.
We don't have any use case for calling bpf_setsockopt() from supposedly read-
only sys_getsockopt(), so it is made available to BPF_CGROUP_SETSOCKOPT only
at this point.
Stanislav Fomichev [Wed, 18 Aug 2021 23:52:15 +0000 (16:52 -0700)]
bpf: Use kvmalloc for map values in syscall
Use kvmalloc/kvfree for temporary value when manipulating a map via
syscall. kmalloc might not be sufficient for percpu maps where the value
is big (and further multiplied by hundreds of CPUs).
Can be reproduced with netcnt test on qemu with "-smp 255".
The oops can also be reproduced with the following steps:
./vmtest.sh -s
# at qemu shell
cd /root/bpf && while true; do ./test_progs -t send_signal
Further analysis showed that the failure is introduced with
commit b89fbfbb854c ("bpf: Implement minimal BPF perf link").
With the above commit, the following scenario becomes possible:
cpu1 cpu2
hrtimer_interrupt -> bpf_overflow_handler
(due to closing link_fd)
bpf_perf_link_release ->
perf_event_free_bpf_prog ->
perf_event_free_bpf_handler ->
WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler)
event->prog = NULL
bpf_prog_run(event->prog, &ctx)
In the above case, the event->prog is NULL for bpf_prog_run, hence
causing oops.
To fix the issue, check whether event->prog is NULL or not. If it
is, do not call bpf_prog_run. This seems working as the above
reproducible step runs more than one hour and I didn't see any
failures.
Daniel Borkmann [Thu, 19 Aug 2021 13:59:33 +0000 (15:59 +0200)]
bpf: Undo off-by-one in interpreter tail call count limit
The BPF interpreter as well as x86-64 BPF JIT were both in line by allowing
up to 33 tail calls (however odd that number may be!). Recently, this was
changed for the interpreter to reduce it down to 32 with the assumption that
this should have been the actual limit "which is in line with the behavior of
the x86 JITs" according to b61a28cf11d61 ("bpf: Fix off-by-one in tail call
count limiting").
Paul recently reported:
I'm a bit surprised by this because I had previously tested the tail call
limit of several JIT compilers and found it to be 33 (i.e., allowing chains
of up to 34 programs). I've just extended a test program I had to validate
this again on the x86-64 JIT, and found a limit of 33 tail calls again [1].
Also note we had previously changed the RISC-V and MIPS JITs to allow up to
33 tail calls [2, 3], for consistency with other JITs and with the interpreter.
We had decided to increase these two to 33 rather than decrease the other
JITs to 32 for backward compatibility, though that probably doesn't matter
much as I'd expect few people to actually use 33 tail calls.
Therefore, revert b61a28cf11d61 to re-align interpreter to limit a maximum of
33 tail calls. While it is unlikely to hit the limit for the vast majority,
programs in the wild could one way or another depend on this, so lets rather
be a bit more conservative, and lets align the small remainder of JITs to 33.
If needed in future, this limit could be slightly increased, but not decreased.
Fixes: b61a28cf11d61 ("bpf: Fix off-by-one in tail call count limiting") Reported-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/CAO5pjwTWrC0_dzTbTHFPSqDwA56aVH+4KFGVqdq8=ASs0MqZGQ@mail.gmail.com
Grant Seltzer [Wed, 18 Aug 2021 15:13:13 +0000 (11:13 -0400)]
libbpf: Rename libbpf documentation index file
This patch renames a documentation libbpf.rst to index.rst. In order
for readthedocs.org to pick this file up and properly build the
documentation site.
It also changes the title type of the ABI subsection in the
naming convention doc. This is so that readthedocs.org doesn't treat this
section as a separate document.
The bpf selftest send_signal() is flaky for its subtests trying to
send signals in softirq/nmi context. To reduce flakiness, the
signal-targetted process priority is boosted, which should minimize
preemption of that process and improve the possibility that
the underlying task in softirq/nmi context is the bpf_send_signal()
wanted task.
Patch #1 did a refactoring to use ASSERT_* instead of old CHECK macros.
Patch #2 did actual change of boosting priority.
Changelog:
v1 -> v2:
remove skip logic where the underlying task in interrupt context
is not the intended one.
====================
Yonghong Song [Tue, 17 Aug 2021 19:09:23 +0000 (12:09 -0700)]
selftests/bpf: Fix flaky send_signal test
libbpf CI has reported send_signal test is flaky although
I am not able to reproduce it in my local environment.
But I am able to reproduce with on-demand libbpf CI ([1]).
Through code analysis, the following is possible reason.
The failed subtest runs bpf program in softirq environment.
Since bpf_send_signal() only sends to a fork of "test_progs"
process. If the underlying current task is
not "test_progs", bpf_send_signal() will not be triggered
and the subtest will fail.
To reduce the chances where the underlying process is not
the intended one, this patch boosted scheduling priority to
-20 (highest allowed by setpriority() call). And I did
10 runs with on-demand libbpf CI with this patch and I
didn't observe any failures.
Andrii Nakryiko [Tue, 17 Aug 2021 18:16:07 +0000 (11:16 -0700)]
Merge branch 'selftests/bpf: Improve the usability of test_progs'
Yucong Sun says:
====================
This short series adds two new switches to test_progs, "-a" and "-d",
adding support for both exact string matching, as well as '*' wildcards.
It also cleans up the output to make it possible to generate
allowlist/denylist using common cli tools.
====================
Yucong Sun [Tue, 17 Aug 2021 04:47:32 +0000 (21:47 -0700)]
selftests/bpf: Support glob matching for test selector.
This patch adds '-a' and '-d' arguments supporting both exact string match as
well as using '*' wildcard in test/subtests selection. '-a' and '-t' can
co-exists, same as '-d' and '-b', in which case they just add to the list of
allowed or denied test selectors.
Caveat: Same as the current substring matching mechanism, test and subtest
selector applies independently, 'a*/b*' will execute all tests matching "a*",
and with subtest name matching "b*", but tests matching "a*" that has no
subtests will also be executed.
Yucong Sun [Tue, 17 Aug 2021 04:47:31 +0000 (21:47 -0700)]
selftests/bpf: Also print test name in subtest status message
This patch add test name in subtest status message line, making it possible to
grep ':OK' in the output to generate a list of passed test+subtest names, which
can be processed to generate argument list to be used with "-a", "-d" exact
string matching.
Yucong Sun [Tue, 17 Aug 2021 04:47:30 +0000 (21:47 -0700)]
selftests/bpf: Correctly display subtest skip status
In skip_account(), test->skip_cnt is set to 0 at the end, this makes next print
statement never display SKIP status for the subtest. This patch moves the
accounting logic after the print statement, fixing the issue.
This patch also added SKIP status display for normal tests.
Yucong Sun [Tue, 17 Aug 2021 04:47:29 +0000 (21:47 -0700)]
selftests/bpf: Skip loading bpf_testmod when using -l to list tests.
When using "-l", test_progs often is executed as non-root user,
load_bpf_testmod() will fail and output errors. This patch skips loading bpf
testmod when "-l" is specified, making output cleaner.
Yucong Sun [Tue, 17 Aug 2021 04:57:13 +0000 (21:57 -0700)]
selftests/bpf: Add exponential backoff to map_delete_retriable in test_maps
Using a fixed delay of 1 microsecond has proven flaky in slow CPU environment,
e.g. Github Actions CI system. This patch adds exponential backoff with a cap
of 50ms to reduce the flakiness of the test. Initial delay is chosen at random
in the range [0ms, 5ms).
Yucong Sun [Mon, 16 Aug 2021 17:52:50 +0000 (10:52 -0700)]
selftests/bpf: Add exponential backoff to map_update_retriable in test_maps
Using a fixed delay of 1 microsecond has proven flaky in slow CPU environment,
e.g. Github Actions CI system. This patch adds exponential backoff with a cap
of 50ms to reduce the flakiness of the test. Initial delay is chosen at random
in the range [0ms, 5ms).
Andrii Nakryiko [Tue, 17 Aug 2021 01:42:18 +0000 (18:42 -0700)]
Merge branch 'sockmap: add sockmap support for unix stream socket'
Jiang Wang says:
====================
This patch series add support for unix stream type
for sockmap. Sockmap already supports TCP, UDP,
unix dgram types. The unix stream support is similar
to unix dgram.
Also add selftests for unix stream type in sockmap tests.
====================
Jiang Wang [Mon, 16 Aug 2021 19:03:21 +0000 (19:03 +0000)]
af_unix: Add unix_stream_proto for sockmap
Previously, sockmap for AF_UNIX protocol only supports
dgram type. This patch add unix stream type support, which
is similar to unix_dgram_proto. To support sockmap, dgram
and stream cannot share the same unix_proto anymore, because
they have different implementations, such as unhash for stream
type (which will remove closed or disconnected sockets from the map),
so rename unix_proto to unix_dgram_proto and add a new
unix_stream_proto.
Also implement stream related sockmap functions.
And add dgram key words to those dgram specific functions.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com
Hengqi Chen [Sun, 15 Aug 2021 08:10:35 +0000 (16:10 +0800)]
selftests/bpf: Test btf__load_vmlinux_btf/btf__load_module_btf APIs
Add test for btf__load_vmlinux_btf/btf__load_module_btf APIs. The test
loads bpf_testmod module BTF and check existence of a symbol which is
known to exist.
grantseltzer [Tue, 10 Aug 2021 02:05:08 +0000 (22:05 -0400)]
bpf: Reconfigure libbpf docs to remove unversioned API
This removes the libbpf_api.rst file from the kernel documentation.
The intention for this file was to pull documentation from comments
above API functions in libbpf. However, due to limitations of the
kernel documentation system, this API documentation could not be
versioned, which is counterintuative to how users expect to use it.
There is also currently no doc comments, making this a blank page.
Once the kernel comment documentation is actually contributed, it
will still exist in the kernel repository, just in the code itself.
A seperate site is being spun up to generate documentaiton from those
comments in a way in which it can be versioned properly.
This also reconfigures the bpf documentation index page to make it
easier to sync to the previously mentioned documentaiton site.
Daniel Borkmann [Mon, 16 Aug 2021 22:45:08 +0000 (00:45 +0200)]
Merge branch 'bpf-perf-link'
Andrii Nakryiko says:
====================
This patch set implements an ability for users to specify custom black box u64
value for each BPF program attachment, bpf_cookie, which is available to BPF
program at runtime. This is a feature that's critically missing for cases when
some sort of generic processing needs to be done by the common BPF program
logic (or even exactly the same BPF program) across multiple BPF hooks (e.g.,
many uniformly handled kprobes) and it's important to be able to distinguish
between each BPF hook at runtime (e.g., for additional configuration lookup).
The choice of restricting this to a fixed-size 8-byte u64 value is an explicit
design decision. Making this configurable by users adds unnecessary complexity
(extra memory allocations, extra complications on the verifier side to validate
accesses to variable-sized data area) while not really opening up new
possibilities. If user's use case requires storing more data per attachment,
it's possible to use either global array, or ARRAY/HASHMAP BPF maps, where
bpf_cookie would be used as an index into respective storage, populated by
user-space code before creating BPF link. This gives user all the flexibility
and control while keeping BPF verifier and BPF helper API simple.
Currently, similar functionality can only be achieved through:
- code-generation and BPF program cloning, which is very complicated and
unmaintainable;
- on-the-fly C code generation and further runtime compilation, which is
what BCC uses and allows to do pretty simply. The big downside is a very
heavy-weight Clang/LLVM dependency and inefficient memory usage (due to
many BPF program clones and the compilation process itself);
- in some cases (kprobes and sometimes uprobes) it's possible to do function
IP lookup to get function-specific configuration. This doesn't work for
all the cases (e.g., when attaching uprobes to shared libraries) and has
higher runtime overhead and additional programming complexity due to
BPF_MAP_TYPE_HASHMAP lookups. Up until recently, before bpf_get_func_ip()
BPF helper was added, it was also very complicated and unstable (API-wise)
to get traced function's IP from fentry/fexit and kretprobe.
With libbpf and BPF CO-RE, runtime compilation is not an option, so to be able
to build generic tracing tooling simply and efficiently, ability to provide
additional bpf_cookie value for each *attachment* (as opposed to each BPF
program) is extremely important. Two immediate users of this functionality are
going to be libbpf-based USDT library (currently in development) and retsnoop
([0]), but I'm sure more applications will come once users get this feature in
their kernels.
To achieve above described, all perf_event-based BPF hooks are made available
through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows to use common
LINK_CREATE command for program attachments and generally brings
perf_event-based attachments into a common BPF link infrastructure.
With that, LINK_CREATE gets ability to pass throught bpf_cookie value during
link creation (BPF program attachment) time. bpf_get_attach_cookie() BPF
helper is added to allow fetching this value at runtime from BPF program side.
BPF cookie is stored either on struct perf_event itself and fetched from the
BPF program context, or is passed through ambient BPF run context, added in c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current").
On the libbpf side of things, BPF perf link is utilized whenever is supported
by the kernel instead of using PERF_EVENT_IOC_SET_BPF ioctl on perf_event FD.
All the tracing attach APIs are extended with OPTS and bpf_cookie is passed
through corresponding opts structs.
Last part of the patch set adds few self-tests utilizing new APIs.
There are also a few refactorings along the way to make things cleaner and
easier to work with, both in kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY), and
throughout libbpf and selftests.
Follow-up patches will extend bpf_cookie to fentry/fexit programs.
While adding uprobe_opts, also extend it with ref_ctr_offset for specifying
USDT semaphore (reference counter) offset. Update attach_probe selftests to
validate its functionality. This is another feature (along with bpf_cookie)
required for implementing libbpf-based USDT solution.
[0] https://github.com/anakryiko/retsnoop
v4->v5:
- rebase on latest bpf-next to resolve merge conflict;
- add ref_ctr_offset to uprobe_opts and corresponding selftest;
v3->v4:
- get rid of BPF_PROG_RUN macro in favor of bpf_prog_run() (Daniel);
- move #ifdef CONFIG_BPF_SYSCALL check into bpf_set_run_ctx (Daniel);
v2->v3:
- user_ctx -> bpf_cookie, bpf_get_user_ctx -> bpf_get_attach_cookie (Peter);
- fix BPF_LINK_TYPE_PERF_EVENT value fix (Jiri);
- use bpf_prog_run() from bpf_prog_run_pin_on_cpu() (Yonghong);
v1->v2:
- fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Andrii Nakryiko [Sun, 15 Aug 2021 07:06:09 +0000 (00:06 -0700)]
selftests/bpf: Add ref_ctr_offset selftests
Extend attach_probe selftests to specify ref_ctr_offset for uprobe/uretprobe
and validate that its value is incremented from zero.
Turns out that once uprobe is attached with ref_ctr_offset, uretprobe for the
same location/function *has* to use ref_ctr_offset as well, otherwise
perf_event_open() fails with -EINVAL. So this test uses ref_ctr_offset for
both uprobe and uretprobe, even though for the purpose of test uprobe would be
enough.
Andrii Nakryiko [Sun, 15 Aug 2021 07:06:08 +0000 (00:06 -0700)]
libbpf: Add uprobe ref counter offset support for USDT semaphores
When attaching to uprobes through perf subsystem, it's possible to specify
offset of a so-called USDT semaphore, which is just a reference counted u16,
used by kernel to keep track of how many tracers are attached to a given
location. Support for this feature was added in [0], so just wire this through
uprobe_opts. This is important to enable implementing USDT attachment and
tracing through libbpf's bpf_program__attach_uprobe_opts() API.
[0] a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe")
Andrii Nakryiko [Sun, 15 Aug 2021 07:06:07 +0000 (00:06 -0700)]
selftests/bpf: Add bpf_cookie selftests for high-level APIs
Add selftest with few subtests testing proper bpf_cookie usage.
Kprobe and uprobe subtests are pretty straightforward and just validate that
the same BPF program attached with different bpf_cookie will be triggered with
those different bpf_cookie values.
Tracepoint subtest is a bit more interesting, as it is the only
perf_event-based BPF hook that shares bpf_prog_array between multiple
perf_events internally. This means that the same BPF program can't be attached
to the same tracepoint multiple times. So we have 3 identical copies. This
arrangement allows to test bpf_prog_array_copy()'s handling of bpf_prog_array
list manipulation logic when programs are attached and detached. The test
validates that bpf_cookie isn't mixed up and isn't lost during such list
manipulations.
Perf_event subtest validates that two BPF links can be created against the
same perf_event (but not at the same time, only one BPF program can be
attached to perf_event itself), and that for each we can specify different
bpf_cookie value.
Andrii Nakryiko [Sun, 15 Aug 2021 07:06:06 +0000 (00:06 -0700)]
selftests/bpf: Extract uprobe-related helpers into trace_helpers.{c,h}
Extract two helpers used for working with uprobes into trace_helpers.{c,h} to
be re-used between multiple uprobe-using selftests. Also rename get_offset()
into more appropriate get_uprobe_offset().
Andrii Nakryiko [Sun, 15 Aug 2021 07:06:04 +0000 (00:06 -0700)]
libbpf: Add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs
Wire through bpf_cookie for all attach APIs that use perf_event_open under the
hood:
- for kprobes, extend existing bpf_kprobe_opts with bpf_cookie field;
- for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
pass bpf_cookie through opts.
For kernel that don't support BPF_LINK_CREATE for perf_events, and thus
bpf_cookie is not supported either, return error and log warning for user.
Andrii Nakryiko [Sun, 15 Aug 2021 07:06:03 +0000 (00:06 -0700)]
libbpf: Add bpf_cookie support to bpf_link_create() API
Add ability to specify bpf_cookie value when creating BPF perf link with
bpf_link_create() low-level API.
Given BPF_LINK_CREATE command is growing and keeps getting new fields that are
specific to the type of BPF_LINK, extend libbpf side of bpf_link_create() API
and corresponding OPTS struct to accomodate such changes. Add extra checks to
prevent using incompatible/unexpected combinations of fields.
Andrii Nakryiko [Sun, 15 Aug 2021 07:06:02 +0000 (00:06 -0700)]
libbpf: Use BPF perf link when supported by kernel
Detect kernel support for BPF perf link and prefer it when attaching to
perf_event, tracepoint, kprobe/uprobe. Underlying perf_event FD will be kept
open until BPF link is destroyed, at which point both perf_event FD and BPF
link FD will be closed.
This preserves current behavior in which perf_event FD is open for the
duration of bpf_link's lifetime and user is able to "disconnect" bpf_link from
underlying FD (with bpf_link__disconnect()), so that bpf_link__destroy()
doesn't close underlying perf_event FD.When BPF perf link is used, disconnect
will keep both perf_event and bpf_link FDs open, so it will be up to
(advanced) user to close them. This approach is demonstrated in bpf_cookie.c
selftests, added in this patch set.
Andrii Nakryiko [Sun, 15 Aug 2021 07:06:01 +0000 (00:06 -0700)]
libbpf: Remove unused bpf_link's destroy operation, but add dealloc
bpf_link->destroy() isn't used by any code, so remove it. Instead, add ability
to override deallocation procedure, with default doing plain free(link). This
is necessary for cases when we want to "subclass" struct bpf_link to keep
extra information, as is the case in the next patch adding struct
bpf_link_perf.
Andrii Nakryiko [Sun, 15 Aug 2021 07:06:00 +0000 (00:06 -0700)]
libbpf: Re-build libbpf.so when libbpf.map changes
Ensure libbpf.so is re-built whenever libbpf.map is modified. Without this,
changes to libbpf.map are not detected and versioned symbols mismatch error
will be reported until `make clean && make` is used, which is a suboptimal
developer experience.
Andrii Nakryiko [Sun, 15 Aug 2021 07:05:59 +0000 (00:05 -0700)]
bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value
Add new BPF helper, bpf_get_attach_cookie(), which can be used by BPF programs
to get access to a user-provided bpf_cookie value, specified during BPF
program attachment (BPF link creation) time.
Naming is hard, though. With the concept being named "BPF cookie", I've
considered calling the helper:
- bpf_get_cookie() -- seems too unspecific and easily mistaken with socket
cookie;
- bpf_get_bpf_cookie() -- too much tautology;
- bpf_get_link_cookie() -- would be ok, but while we create a BPF link to
attach BPF program to BPF hook, it's still an "attachment" and the
bpf_cookie is associated with BPF program attachment to a hook, not a BPF
link itself. Technically, we could support bpf_cookie with old-style
cgroup programs.So I ultimately rejected it in favor of
bpf_get_attach_cookie().
Currently all perf_event-backed BPF program types support
bpf_get_attach_cookie() helper. Follow-up patches will add support for
fentry/fexit programs as well.
While at it, mark bpf_tracing_func_proto() as static to make it obvious that
it's only used from within the kernel/trace/bpf_trace.c.
Andrii Nakryiko [Sun, 15 Aug 2021 07:05:58 +0000 (00:05 -0700)]
bpf: Allow to specify user-provided bpf_cookie for BPF perf links
Add ability for users to specify custom u64 value (bpf_cookie) when creating
BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
tracepoints).
This is useful for cases when the same BPF program is used for attaching and
processing invocation of different tracepoints/kprobes/uprobes in a generic
fashion, but such that each invocation is distinguished from each other (e.g.,
BPF program can look up additional information associated with a specific
kernel function without having to rely on function IP lookups). This enables
new use cases to be implemented simply and efficiently that previously were
possible only through code generation (and thus multiple instances of almost
identical BPF program) or compilation at runtime (BCC-style) on target hosts
(even more expensive resource-wise). For uprobes it is not even possible in
some cases to know function IP before hand (e.g., when attaching to shared
library without PID filtering, in which case base load address is not known
for a library).
This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given cgroup BPF programs
already use two 8-byte pointers for their needs and cgroup BPF programs don't
have (yet?) support for bpf_cookie, reuse that space through union of
cgroup_storage and new bpf_cookie field.
Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to this user-provided cookie value from inside a BPF program.
Generic perf_event BPF programs will access this value from perf_event itself
through passed in BPF program context.
Andrii Nakryiko [Sun, 15 Aug 2021 07:05:57 +0000 (00:05 -0700)]
bpf: Implement minimal BPF perf link
Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
the common BPF link infrastructure, allowing to list all active perf_event
based attachments, auto-detaching BPF program from perf_event when link's FD
is closed, get generic BPF link fdinfo/get_info functionality.
BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
are currently supported.
Force-detaching and atomic BPF program updates are not yet implemented, but
with perf_event-based BPF links we now have common framework for this without
the need to extend ioctl()-based perf_event interface.
One interesting consideration is a new value for bpf_attach_type, which
BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
define a single BPF_PERF_EVENT attach type for all of them and adjust
link_create()'s logic for checking correspondence between attach type and
program type.
The alternative would be to define three new attach types (e.g., BPF_KPROBE,
BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
libbpf. I chose to not do this to avoid unnecessary proliferation of
bpf_attach_type enum values and not have to deal with naming conflicts.
Andrii Nakryiko [Sun, 15 Aug 2021 07:05:56 +0000 (00:05 -0700)]
bpf: Refactor perf_event_set_bpf_prog() to use struct bpf_prog input
Make internal perf_event_set_bpf_prog() use struct bpf_prog pointer as an
input argument, which makes it easier to re-use for other internal uses
(coming up for BPF link in the next patch). BPF program FD is not as
convenient and in some cases it's not available. So switch to struct bpf_prog,
move out refcounting outside and let caller do bpf_prog_put() in case of an
error. This follows the approach of most of the other BPF internal functions.
Andrii Nakryiko [Sun, 15 Aug 2021 07:05:55 +0000 (00:05 -0700)]
bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions
Similar to BPF_PROG_RUN, turn BPF_PROG_RUN_ARRAY macros into proper functions
with all the same readability and maintainability benefits. Making them into
functions required shuffling around bpf_set_run_ctx/bpf_reset_run_ctx
functions. Also, explicitly specifying the type of the BPF prog run callback
required adjusting __bpf_prog_run_save_cb() to accept const void *, casted
internally to const struct sk_buff.
Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
the differences in bpf_run_ctx used for those two different use cases.
I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that
required including include/linux/cgroup-defs.h, which I wasn't sure is ok with
everyone.
The remaining generic BPF_PROG_RUN_ARRAY function will be extended to
pass-through user-provided context value in the next patch.
Andrii Nakryiko [Sun, 15 Aug 2021 07:05:54 +0000 (00:05 -0700)]
bpf: Refactor BPF_PROG_RUN into a function
Turn BPF_PROG_RUN into a proper always inlined function. No functional and
performance changes are intended, but it makes it much easier to understand
what's going on with how BPF programs are actually get executed. It's more
obvious what types and callbacks are expected. Also extra () around input
parameters can be dropped, as well as `__` variable prefixes intended to avoid
naming collisions, which makes the code simpler to read and write.
This refactoring also highlighted one extra issue. BPF_PROG_RUN is both
a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
BPF_PROG_RUN into a function causes naming conflict compilation error. So
rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of
BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly.
Andrii Nakryiko [Sun, 15 Aug 2021 07:13:33 +0000 (00:13 -0700)]
Merge branch 'BPF iterator for UNIX domain socket.'
Kuniyuki Iwashima says:
====================
This patch set adds BPF iterator support for UNIX domain socket. The first
patch implements it, and the second adds "%c" support for BPF_SEQ_PRINTF().
Thanks to Yonghong Song for the fix [0] for the LLVM code gen. The fix
prevents the LLVM compiler from transforming the loop exit condition '<' to
'!=', where the upper bound is not a constant. The transformation leads
the verifier to interpret it as an infinite loop.
And thanks to Andrii Nakryiko for its workaround [1].
Changelog:
v6:
- Align the header "Inde" column
- Change int vars to __u64 not to break test_progs-no_alu32
- Move the if statement into the for loop not to depend on the fix [0]
- Drop the README change
- Modify "%c" positive test patterns
v5:
https://lore.kernel.org/netdev/20210812164557.79046-1-kuniyu@amazon.co.jp/
- Align header line of bpf_iter_unix.c
- Add test for "%c"
v4:
https://lore.kernel.org/netdev/20210810092807.13190-1-kuniyu@amazon.co.jp/
- Check IS_BUILTIN(CONFIG_UNIX)
- Support "%c" in BPF_SEQ_PRINTF()
- Uncomment the code to print the name of the abstract socket
- Mention the LLVM fix in README.rst
- Remove the 'aligned' attribute in bpf_iter.h
- Keep the format string on a single line
v3:
https://lore.kernel.org/netdev/20210804070851.97834-1-kuniyu@amazon.co.jp/
- Export some functions for CONFIG_UNIX=m
v2:
https://lore.kernel.org/netdev/20210803011110.21205-1-kuniyu@amazon.co.jp/
- Implement bpf_iter specific seq_ops->stop()
- Add bpf_iter__unix in bpf_iter.h
- Move common definitions in selftest to bpf_tracing_net.h
- Include the code for abstract UNIX domain socket as comment in selftest
- Use ASSERT_OK_PTR() instead of CHECK()
- Make ternary operators on single line
The iterator can output almost the same result compared to /proc/net/unix.
The header line is aligned, and the Inode column uses "%8lu" because "%5lu"
can be easily overflown.
Kuniyuki Iwashima [Sat, 14 Aug 2021 01:57:16 +0000 (10:57 +0900)]
bpf: Support "%c" in bpf_bprintf_prepare().
/proc/net/unix uses "%c" to print a single-byte character to escape '\0' in
the name of the abstract UNIX domain socket. The following selftest uses
it, so this patch adds support for "%c". Note that it does not support
wide character ("%lc" and "%llc") for simplicity.
Kuniyuki Iwashima [Sat, 14 Aug 2021 01:57:15 +0000 (10:57 +0900)]
bpf: af_unix: Implement BPF iterator for UNIX domain socket.
This patch implements the BPF iterator for the UNIX domain socket.
Currently, the batch optimisation introduced for the TCP iterator in the
commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock") is not
used for the UNIX domain socket. It will require replacing the big lock
for the hash table with small locks for each hash list not to block other
processes.
Andrii Nakryiko [Sat, 14 Aug 2021 00:49:24 +0000 (17:49 -0700)]
Merge branch 'bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_CGROUP_SOCKOPT'
Stanislav Fomichev says:
====================
We'd like to be able to identify netns from setsockopt hooks
to be able to do the enforcement of some options only in the
"initial" netns (to give users the ability to create clear/isolated
sandboxes if needed without any enforcement by doing unshare(net)).
v3:
- remove extra 'ctx->skb == NULL' check (Martin KaFai Lau)
- rework test to make sure the helper is really called, not just
verified
Ilya Leoshkevich [Thu, 12 Aug 2021 22:48:14 +0000 (00:48 +0200)]
selftests/bpf: Fix test_core_autosize on big-endian machines
The "probed" part of test_core_autosize copies an integer using
bpf_core_read() into an integer of a potentially different size.
On big-endian machines a destination offset is required for this to
produce a sensible result.
Hao Luo [Thu, 12 Aug 2021 00:38:19 +0000 (17:38 -0700)]
libbpf: Support weak typed ksyms.
Currently weak typeless ksyms have default value zero, when they don't
exist in the kernel. However, weak typed ksyms are rejected by libbpf
if they can not be resolved. This means that if a bpf object contains
the declaration of a nonexistent weak typed ksym, it will be rejected
even if there is no program that references the symbol.
Nonexistent weak typed ksyms can also default to zero just like
typeless ones. This allows programs that access weak typed ksyms to be
accepted by verifier, if the accesses are guarded. For example,
extern const int bpf_link_fops3 __ksym __weak;
/* then in BPF program */
if (&bpf_link_fops3) {
/* use bpf_link_fops3 */
}
If actual use of nonexistent typed ksym is not guarded properly,
verifier would see that register is not PTR_TO_BTF_ID and wouldn't
allow to use it for direct memory reads or passing it to BPF helpers.
Jussi Maki [Wed, 11 Aug 2021 12:36:27 +0000 (12:36 +0000)]
selftests/bpf: Fix running of XDP bonding tests
An "innocent" cleanup in the last version of the XDP bonding patchset moved
the "test__start_subtest" calls to the test main function, but I forgot to
reverse the condition, which lead to all tests being skipped. Fix it.
Jussi Maki [Thu, 12 Aug 2021 14:52:41 +0000 (14:52 +0000)]
net, bonding: Disallow vlan+srcmac with XDP
The new vlan+srcmac xmit policy is not implementable with XDP since
in many cases the 802.1Q payload is not present in the packet. This
can be for example due to hardware offload or in the case of veth
due to use of skbuffs internally.
This also fixes the NULL deref with the vlan+srcmac xmit policy
reported by Jonathan Toppins by additionally checking the skb
pointer.
Fixes: a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff") Reported-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20210812145241.12449-1-joamaki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h 9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") 099fdeda659d ("bnxt_en: Event handler for PPS events")
kernel/bpf/helpers.c
include/linux/bpf-cgroup.h a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c 5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") 2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer")
MAINTAINERS 7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")
Linus Torvalds [Fri, 13 Aug 2021 02:24:03 +0000 (16:24 -1000)]
Merge tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, bpf, can and
ieee802154.
The size of this is pretty normal, but we got more fixes for 5.14
changes this week than last week. Nothing major but the trend is the
opposite of what we like. We'll see how the next week goes..
Current release - regressions:
- r8169: fix ASPM-related link-up regressions
- bridge: fix flags interpretation for extern learn fdb entries
- phy: micrel: fix link detection on ksz87xx switch
- Revert "tipc: Return the correct errno code"
- ptp: fix possible memory leak caused by invalid cast
Current release - new code bugs:
- bpf: add missing bpf_read_[un]lock_trace() for syscall program
- bpf: fix potentially incorrect results with bpf_get_local_storage()
- page_pool: mask the page->signature before the checking, avoid dma
mapping leaks
- netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps
- bnxt_en: fix firmware interface issues with PTP
- mlx5: Bridge, fix ageing time
Previous releases - regressions:
- linkwatch: fix failure to restore device state across
suspend/resume
- bareudp: fix invalid read beyond skb's linear data
Previous releases - always broken:
- bpf: fix integer overflow involving bucket_size
- ppp: fix issues when desired interface name is specified via
netlink
- wwan: mhi_wwan_ctrl: fix possible deadlock
- dsa: microchip: ksz8795: fix number of VLAN related bugs
- dsa: drivers: fix broken backpressure in .port_fdb_dump
- dsa: qca: ar9331: make proper initial port defaults
Misc:
- bpf: add lockdown check for probe_write_user helper
- netfilter: conntrack: remove offload_pickup sysctl before 5.14 is
out
- netfilter: conntrack: collect all entries in one cycle,
heuristically slow down garbage collection scans on idle systems to
prevent frequent wake ups"
* tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
vsock/virtio: avoid potential deadlock when vsock device remove
wwan: core: Avoid returning NULL from wwan_create_dev()
net: dsa: sja1105: unregister the MDIO buses during teardown
Revert "tipc: Return the correct errno code"
net: mscc: Fix non-GPL export of regmap APIs
net: igmp: increase size of mr_ifc_count
MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers
tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets
net: pcs: xpcs: fix error handling on failed to allocate memory
net: linkwatch: fix failure to restore device state across suspend/resume
net: bridge: fix memleak in br_add_if()
net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
net: bridge: fix flags interpretation for extern learn fdb entries
net: dsa: sja1105: fix broken backpressure in .port_fdb_dump
net: dsa: lantiq: fix broken backpressure in .port_fdb_dump
net: dsa: lan9303: fix broken backpressure in .port_fdb_dump
net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump
bpf, core: Fix kernel-doc notation
net: igmp: fix data-race in igmp_ifc_timer_expire()
net: Fix memory leak in ieee802154_raw_deliver
...
Linus Torvalds [Fri, 13 Aug 2021 02:16:01 +0000 (16:16 -1000)]
Merge tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A patch to avoid a soft lockup in ceph_check_delayed_caps() from Luis
and a reference handling fix from Jeff that should address some memory
corruption reports in the snaprealm area.
Both marked for stable"
* tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client:
ceph: take snap_empty_lock atomically with snaprealm refcount change
ceph: reduce contention in ceph_check_delayed_caps()
i915:
- GVT fix for Windows VM hang.
- Display fix of 12 BPC bits for display 12 and newer.
- Don't try to access some media register for fused off domains.
- Fix kerneldoc build warnings.
* tag 'drm-fixes-2021-08-13' of git://anongit.freedesktop.org/drm/drm:
drm/doc/rfc: drop lmem uapi section
drm/i915: Only access SFC_DONE when media domain is not fused off
drm/i915/display: Fix the 12 BPC bits for PIPE_MISC reg
drm/amd/display: use GFP_ATOMIC in amdgpu_dm_irq_schedule_work
drm/amd/display: Remove invalid assert for ODM + MPC case
drm/amd/pm: bug fix for the runtime pm BACO
drm/amdgpu: handle VCN instances when harvesting (v2)
drm/meson: fix colour distortion from HDR set during vendor u-boot
drm/i915/gvt: Fix cached atomics setting for Windows VM
drm/amdgpu: Add preferred mode in modeset when freesync video mode's enabled.
drm/amd/pm: Fix a memory leak in an error handling path in 'vangogh_tables_init()'
drm/amdgpu: don't enable baco on boco platforms in runpm
drm/amdgpu: set RAS EEPROM address from VBIOS
drm/amd/pm: update smu v13.0.1 firmware header
drm/mediatek: Fix cursor plane no update
drm/mediatek: mtk-dpi: Set out_fmt from config if not the last bridge
drm/mediatek: dpi: Fix NULL dereference in mtk_dpi_bridge_atomic_check
Alex Elder [Wed, 11 Aug 2021 14:18:02 +0000 (09:18 -0500)]
dt-bindings: net: qcom,ipa: make imem interconnect optional
On some newer SoCs, the interconnect between IPA and SoC internal
memory (imem) is not used. Update the binding to indicate that
having just the memory and config interconnects is another allowed
configuration.
It isn't required, but all callers of ipa_aggr_granularity_val()
pass a constant value (IPA_AGGR_GRANULARITY) as the usec argument.
Two of those callers are in ipa_validate_build(), with the result
being passed to BUILD_BUG_ON().
Evidently the "sparc64-linux-gcc" compiler (at least) doesn't always
inline ipa_aggr_granularity_val(), so the result of the function is
not constant at compile time, and that leads to build errors.
Define the function with the __always_inline attribute to avoid the
errors. We can see by inspection that the value passed is never
zero, so we can just remove its WARN_ON() call.
Fixes: 5bc5588466a1f ("net: ipa: use WARN_ON() rather than assertions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20210811135948.2634264-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Airlie [Thu, 12 Aug 2021 20:29:12 +0000 (06:29 +1000)]
Merge tag 'drm-intel-fixes-2021-08-12' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- GVT fix for Windows VM hang.
- Display fix of 12 BPC bits for display 12 and newer.
- Don't try to access some media register for fused off domains.
- Fix kerneldoc build warnings.
Jakub Kicinski [Thu, 12 Aug 2021 18:50:16 +0000 (11:50 -0700)]
Merge tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
ieee802154 for net 2021-08-12
Mostly fixes coming from bot reports. Dongliang Mu tackled some syzkaller
reports in hwsim again and Takeshi Misawa a memory leak in ieee802154 raw.
* tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
net: Fix memory leak in ieee802154_raw_deliver
ieee802154: hwsim: fix GPF in hwsim_new_edge_nl
ieee802154: hwsim: fix GPF in hwsim_set_edge_lqi
====================
lock_sock() may do initiative schedule when the 'sk' is owned by
other thread at the same time, we would receivce a warning message
that "scheduling while atomic".
Even worse, if the next task (selected by the scheduler) try to
release a 'sk', it need to request vsock_table_lock and the deadlock
occur, cause the system into softlockup state.
Call trace:
queued_spin_lock_slowpath
vsock_remove_bound
vsock_remove_sock
virtio_transport_release
__vsock_release
vsock_release
__sock_release
sock_close
__fput
____fput
So we should not require sk_lock in this case, just like the behavior
in vhost_vsock or vmci.
Linus Torvalds [Thu, 12 Aug 2021 17:20:16 +0000 (07:20 -1000)]
Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fix from Eric Biederman:
"This fixes the ucount sysctls on big endian architectures.
The counts were expanded to be longs instead of ints, and the sysctl
code was overlooked, so only the low 32bit were being processed. On
litte endian just processing the low 32bits is fine, but on 64bit big
endian processing just the low 32bits results in the high order bits
instead of the low order bits being processed and nothing works
proper.
This change took a little bit to mature as we have the SYSCTL_ZERO,
and SYSCTL_INT_MAX macros that are only usable for sysctls operating
on ints, but unfortunately are not obviously broken. Which resulted in
the versions of this change working on big endian and not on little
endian, because the int SYSCTL_ZERO when extended 64bit wound up being
0x100000000. So we only allowed values greater than 0x100000000 and
less than 0faff. Which unfortunately broken everything that tried to
set the sysctls. (First reported with the windows subsystem for
linux).
I have tested this on x86_64 64bit after first reproducing the
problems with the earlier version of this change, and then verifying
the problems do not exist when we use appropriate long min and max
values for extra1 and extra2"
* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: add missing data type changes
Andy Shevchenko [Wed, 11 Aug 2021 12:48:45 +0000 (15:48 +0300)]
wwan: core: Avoid returning NULL from wwan_create_dev()
Make wwan_create_dev() to return either valid or error pointer,
In some cases it may return NULL. Prevent this by converting
it to the respective error pointer.
Fixes: 9a44c1cc6388 ("net: Add a WWAN subsystem") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Link: https://lore.kernel.org/r/20210811124845.10955-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Thu, 12 Aug 2021 11:45:41 +0000 (12:45 +0100)]
Merge tag 'mlx5-updates-2021-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 updates 2021-08-11
This series provides misc updates to mlx5.
For more information please see tag log below.
Please pull and let me know if there is any problem.
mlx5-updates-2021-08-11
Misc. cleanup for mlx5.
1) Typos and use of netdev_warn()
2) smatch cleanup
3) Minor fix to inner TTC table creation
4) Dynamic capability cache allocation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>