Larysa Zaremba [Fri, 13 May 2022 12:17:43 +0000 (14:17 +0200)]
bpftool: Use sysfs vmlinux when dumping BTF by ID
Currently, dumping almost all BTFs specified by id requires
using the -B option to pass the base BTF. For kernel module
BTFs the vmlinux BTF sysfs path should work.
This patch simplifies dumping by ID usage by loading
vmlinux BTF from sysfs as base, if base BTF was not specified
and the ID corresponds to a kernel module BTF.
Joanne Koong [Mon, 9 May 2022 22:42:52 +0000 (15:42 -0700)]
bpf: Add MEM_UNINIT as a bpf_type_flag
Instead of having uninitialized versions of arguments as separate
bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version
of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag
modifier to denote that the argument is uninitialized.
Doing so cleans up some of the logic in the verifier. We no longer
need to do two checks against an argument type (eg "if
(base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) ==
ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized
versions of the same argument type will now share the same base type.
In the near future, MEM_UNINIT will be used by dynptr helper functions
as well.
Andrii Nakryiko [Fri, 13 May 2022 17:37:03 +0000 (10:37 -0700)]
selftests/bpf: Fix usdt_400 test case
usdt_400 test case relies on compiler using the same arg spec for
usdt_400 USDT. This assumption breaks with Clang (Clang generates
different arg specs with varying offsets relative to %rbp), so simplify
this further and hard-code the constant which will guarantee that arg
spec is the same across all 400 inlinings.
Andrii Nakryiko [Thu, 12 May 2022 22:07:12 +0000 (15:07 -0700)]
libbpf: Add safer high-level wrappers for map operations
Add high-level API wrappers for most common and typical BPF map
operations that works directly on instances of struct bpf_map * (so
you don't have to call bpf_map__fd()) and validate key/value size
expectations.
These helpers require users to specify key (and value, where
appropriate) sizes when performing lookup/update/delete/etc. This forces
user to actually think and validate (for themselves) those. This is
a good thing as user is expected by kernel to implicitly provide correct
key/value buffer sizes and kernel will just read/write necessary amount
of data. If it so happens that user doesn't set up buffers correctly
(which bit people for per-CPU maps especially) kernel either randomly
overwrites stack data or return -EFAULT, depending on user's luck and
circumstances. These high-level APIs are meant to prevent such
unpleasant and hard to debug bugs.
This patch also adds bpf_map_delete_elem_flags() low-level API and
requires passing flags to bpf_map__delete_elem() API for consistency
across all similar APIs, even though currently kernel doesn't expect
any extra flags for BPF_MAP_DELETE_ELEM operation.
List of map operations that get these high-level APIs:
Alexei Starovoitov [Fri, 13 May 2022 01:10:24 +0000 (18:10 -0700)]
bpf: Fix combination of jit blinding and pointers to bpf subprogs.
The combination of jit blinding and pointers to bpf subprogs causes:
[ 36.989548] BUG: unable to handle page fault for address: 0000000100000001
[ 36.990342] #PF: supervisor instruction fetch in kernel mode
[ 36.990968] #PF: error_code(0x0010) - not-present page
[ 36.994859] RIP: 0010:0x100000001
[ 36.995209] Code: Unable to access opcode bytes at RIP 0xffffffd7.
[ 37.004091] Call Trace:
[ 37.004351] <TASK>
[ 37.004576] ? bpf_loop+0x4d/0x70
[ 37.004932] ? bpf_prog_3899083f75e4c5de_F+0xe3/0x13b
The jit blinding logic didn't recognize that ld_imm64 with an address
of bpf subprogram is a special instruction and proceeded to randomize it.
By itself it wouldn't have been an issue, but jit_subprogs() logic
relies on two step process to JIT all subprogs and then JIT them
again when addresses of all subprogs are known.
Blinding process in the first JIT phase caused second JIT to miss
adjustment of special ld_imm64.
Fix this issue by ignoring special ld_imm64 instructions that don't have
user controlled constants and shouldn't be blinded.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220513011025.13344-1-alexei.starovoitov@gmail.com
Yuntao Wang [Sat, 30 Apr 2022 13:08:03 +0000 (21:08 +0800)]
bpf: Fix potential array overflow in bpf_trampoline_get_progs()
The cnt value in the 'cnt >= BPF_MAX_TRAMP_PROGS' check does not
include BPF_TRAMP_MODIFY_RETURN bpf programs, so the number of
the attached BPF_TRAMP_MODIFY_RETURN bpf programs in a trampoline
can exceed BPF_MAX_TRAMP_PROGS.
When this happens, the assignment '*progs++ = aux->prog' in
bpf_trampoline_get_progs() will cause progs array overflow as the
progs field in the bpf_tramp_progs struct can only hold at most
BPF_MAX_TRAMP_PROGS bpf programs.
Andrii Nakryiko [Wed, 11 May 2022 23:20:12 +0000 (16:20 -0700)]
selftests/bpf: make fexit_stress test run in serial mode
fexit_stress is attaching maximum allowed amount of fexit programs to
bpf_fentry_test1 kernel function, which is used by a bunch of other
parallel tests, thus pretty frequently interfering with their execution.
Given the test assumes nothing else is attaching to bpf_fentry_test1,
mark it serial.
Alexei Starovoitov [Thu, 12 May 2022 01:16:55 +0000 (18:16 -0700)]
Merge branch 'Introduce access remote cpu elem support in BPF percpu map'
Feng zhou says:
====================
From: Feng Zhou <zhoufeng.zf@bytedance.com>
Trace some functions, such as enqueue_task_fair, need to access the
corresponding cpu, not the current cpu, and bpf_map_lookup_elem percpu map
cannot do it. So add bpf_map_lookup_percpu_elem to accomplish it for
percpu_array_map, percpu_hash_map, lru_percpu_hash_map.
v1->v2: Addressed comments from Alexei Starovoitov.
- add a selftest for bpf_map_lookup_percpu_elem.
====================
Feng Zhou [Wed, 11 May 2022 09:38:53 +0000 (17:38 +0800)]
bpf: add bpf_map_lookup_percpu_elem for percpu map
Add new ebpf helpers bpf_map_lookup_percpu_elem.
The implementation method is relatively simple, refer to the implementation
method of map_lookup_elem of percpu map, increase the parameters of cpu, and
obtain it according to the specified cpu.
* Add Fixes tag to patch 1
* Fix test_progs-noalu32 failure in CI due to different alloc_insn (Alexei)
* Remove per-CPU struct, use global struct (Alexei)
====================
This uses the newly added SEC("?foo") naming to disable autoload of
programs, and then loads them one by one for the object and verifies
that loading fails and matches the returned error string from verifier.
This is similar to already existing verifier tests but provides coverage
for BPF C.
bpf: Prepare prog_test_struct kfuncs for runtime tests
In an effort to actually test the refcounting logic at runtime, add a
refcount_t member to prog_test_ref_kfunc and use it in selftests to
verify and test the whole logic more exhaustively.
The kfunc calls for prog_test_member do not require runtime refcounting,
as they are only used for verifier selftests, not during runtime
execution. Hence, their implementation now has a WARN_ON_ONCE as it is
not meant to be reachable code at runtime. It is strictly used in tests
triggering failure cases in the verifier. bpf_kfunc_call_memb_release is
called from map free path, since prog_test_member is embedded in map
value for some verifier tests, so we skip WARN_ON_ONCE for it.
Yonghong Song [Wed, 11 May 2022 18:47:35 +0000 (11:47 -0700)]
selftests/bpf: fix a few clang compilation errors
With latest clang, I got the following compilation errors:
.../prog_tests/test_tunnel.c:291:6: error: variable 'local_ip_map_fd' is used uninitialized
whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
if (attach_tc_prog(&tc_hook, -1, set_dst_prog_fd))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.../bpf/prog_tests/test_tunnel.c:312:6: note: uninitialized use occurs here
if (local_ip_map_fd >= 0)
^~~~~~~~~~~~~~~
...
.../prog_tests/kprobe_multi_test.c:346:6: error: variable 'err' is used uninitialized
whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
if (IS_ERR(map))
^~~~~~~~~~~
.../prog_tests/kprobe_multi_test.c:388:6: note: uninitialized use occurs here
if (err) {
^~~
Daniel Müller [Wed, 11 May 2022 17:22:49 +0000 (17:22 +0000)]
selftests/bpf: Enable CONFIG_FPROBE for self tests
Some of the BPF selftests are failing when running with a rather bare
bones configuration based on tools/testing/selftests/bpf/config.
Specifically, we see a bunch of failures due to errno 95:
Alexei Starovoitov [Wed, 11 May 2022 15:03:16 +0000 (08:03 -0700)]
Merge branch 'selftests: xsk: add busy-poll testing plus various fixes'
Magnus Karlsson says:
====================
This patch set adds busy-poll testing to the xsk selftests. It runs
exactly the same tests as with regular softirq processing, but with
busy-poll enabled. I have also included a number of fixes to the
selftests that have been bugging me for a while or was discovered
while implementing the busy-poll support. In summary these are:
* Fix the error reporting of failed tests. Each failed test used to be
reported as both failed and passed, messing up things.
* Added a summary test printout at the end of the test suite so that
users do not have to scroll up and look at the result of both the
softirq run and the busy_poll run.
* Added a timeout to the tests, so that if a test locks up, we report
a fail and still get to run all the other tests.
* Made the stats test just look and feel like all the other
tests. Makes the code simpler and the test reporting more
consistent. These are the 3 last commits.
* Replaced zero length packets with packets of 64 byte length. This so
that some of the tests will pass after commit 726e2c5929de84 ("veth:
Ensure eth header is in skb's linear part").
* Added clean-up of the veth pair when terminating the test run.
* Some smaller clean-ups of unused stuff.
Note, to pass the busy-poll tests commit 8de8b71b787f ("xsk: Fix
l2fwd for copy mode + busy poll combo") need to be present. It is
present in bpf but not yet in bpf-next.
Thanks: Magnus
====================
Acked-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Magnus Karlsson [Tue, 10 May 2022 11:56:04 +0000 (13:56 +0200)]
selftests: xsk: make stat tests not spin on getsockopt
Convert the stats tests from spinning on the getsockopt to just check
getsockopt once when the Rx thread has received all the packets. The
actual completion of receiving the last packet forms a natural point
in time when the receiver is ready to call the getsockopt to check the
stats. In the previous version , we just span on the getsockopt until
we received the right answer. This could be forever or just getting
the "correct" answer by shear luck.
The pacing_on variable can now be dropped since all test can now
handle pacing properly.
Magnus Karlsson [Tue, 10 May 2022 11:56:03 +0000 (13:56 +0200)]
selftests: xsk: make the stats tests normal tests
Make the stats tests look and feel just like normal tests instead of
bunched under the umbrella of TEST_STATS. This means we will always
run each of them even if one fails. Also gets rid of some special case
code.
Magnus Karlsson [Tue, 10 May 2022 11:56:02 +0000 (13:56 +0200)]
selftests: xsk: introduce validation functions
Introduce validation functions that can be optionally called by the Rx
and Tx threads. These are then used to replace the Rx and Tx stats
dispatchers. This so that we in the next commit can make the stats
tests proper normal tests and not be some special case, as today.
Magnus Karlsson [Tue, 10 May 2022 11:56:01 +0000 (13:56 +0200)]
selftests: xsk: cleanup veth pair at ctrl-c
Remove the veth pair when the tests are aborted by pressing
ctrl-c. Currently in this situation, the veth pair is left on the
system polluting the netdev space.
Magnus Karlsson [Tue, 10 May 2022 11:56:00 +0000 (13:56 +0200)]
selftests: xsk: add timeout to tests
Add a timeout to the tests so that if all packets have not been
received within 3 seconds, fail the ongoing test. Hinders a test from
dead-locking if there is something wrong.
Magnus Karlsson [Tue, 10 May 2022 11:55:59 +0000 (13:55 +0200)]
selftests: xsk: fix reporting of failed tests
Fix the reporting of failed tests as it was broken in several
ways. First, a failed test was reported as both failed and passed
messing up the count. Second, tests were not aborted after a failure
and could generate more "failures" messing up the count even
more. Third, the failure reporting from the application to the shell
script was wrong. It always reported pass. And finally, the handling
of the failures in the launch script was not correct.
Correct all this by propagating the failure up through the function
calls to a calling function that can abort the test. A receiver or
sender thread will mark the new variable in the test spec called fail,
if a test has failed. This is then picked up by the main thread when
everyone else has exited and this is then marked and propagated up to
the calling script.
Also add a summary function in the calling script so that a user
does not have to go through the sub tests to see if something has
failed.
Magnus Karlsson [Tue, 10 May 2022 11:55:58 +0000 (13:55 +0200)]
selftests: xsk: run all tests for busy-poll
Execute all xsk selftests for busy-poll mode too. Currently they were
only run for the standard interrupt driven softirq mode. Replace the
unused option queue-id with the new option busy-poll.
Magnus Karlsson [Tue, 10 May 2022 11:55:57 +0000 (13:55 +0200)]
selftests: xsk: do not send zero-length packets
Do not try to send packets of zero length since they are dropped by
veth after commit 726e2c5929de84 ("veth: Ensure eth header is in skb's
linear part"). Replace these two packets with packets of length 60 so
that they are not dropped.
Also clean up the confusing naming. MIN_PKT_SIZE was really
MIN_ETH_PKT_SIZE and PKT_SIZE was both MIN_ETH_SIZE and the default
packet size called just PKT_SIZE. Make it consistent by using the
right define in the right place.
Kui-Feng Lee [Tue, 10 May 2022 20:59:21 +0000 (13:59 -0700)]
bpf, x86: Attach a cookie to fentry/fexit/fmod_ret/lsm.
Pass a cookie along with BPF_LINK_CREATE requests.
Add a bpf_cookie field to struct bpf_tracing_link to attach a cookie.
The cookie of a bpf_tracing_link is available by calling
bpf_get_attach_cookie when running the BPF program of the attached
link.
The value of a cookie will be set at bpf_tramp_run_ctx by the
trampoline of the link.
Kui-Feng Lee [Tue, 10 May 2022 20:59:19 +0000 (13:59 -0700)]
bpf, x86: Generate trampolines from bpf_tramp_links
Replace struct bpf_tramp_progs with struct bpf_tramp_links to collect
struct bpf_tramp_link(s) for a trampoline. struct bpf_tramp_link
extends bpf_link to act as a linked list node.
arch_prepare_bpf_trampoline() accepts a struct bpf_tramp_links to
collects all bpf_tramp_link(s) that a trampoline should call.
Change BPF trampoline and bpf_struct_ops to pass bpf_tramp_links
instead of bpf_tramp_progs.
v3 changes:
- renamed kallsyms_lookup_names to ftrace_lookup_symbols
and moved it to ftrace.c [Masami]
- added ack [Andrii]
- couple small test fixes [Andrii]
v2 changes (first version [2]):
- removed the 2 seconds check [Alexei]
- moving/forcing symbols sorting out of kallsyms_lookup_names function [Alexei]
- skipping one array allocation and copy_from_user [Andrii]
- several small fixes [Masami,Andrii]
- build fix [kernel test robot]
Jiri Olsa [Tue, 10 May 2022 12:26:15 +0000 (14:26 +0200)]
bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link
Using kallsyms_lookup_names function to speed up symbols lookup in
kprobe multi link attachment and replacing with it the current
kprobe_multi_resolve_syms function.
Jiri Olsa [Tue, 10 May 2022 12:26:13 +0000 (14:26 +0200)]
ftrace: Add ftrace_lookup_symbols function
Adding ftrace_lookup_symbols function that resolves array of symbols
with single pass over kallsyms.
The user provides array of string pointers with count and pointer to
allocated array for resolved values.
int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt,
unsigned long *addrs)
It iterates all kallsyms symbols and tries to loop up each in provided
symbols array with bsearch. The symbols array needs to be sorted by
name for this reason.
We also check each symbol to pass ftrace_location, because this API
will be used for fprobe symbols resolving.
Alexei Starovoitov [Tue, 10 May 2022 18:20:45 +0000 (11:20 -0700)]
Merge branch 'bpf: bpf link iterator'
Dmitrii Dolgov says:
====================
Bpf links seem to be one of the important structures for which no
iterator is provided. Such iterator could be useful in those cases when
generic 'task/file' is not suitable or better performance is needed.
The implementation is mostly copied from prog iterator. This time tests were
executed, although I still had to exclude test_bpf_nf (failed to find BTF info
for global/extern symbol 'bpf_skb_ct_lookup') -- since it's unrelated, I hope
it's a minor issue.
Per suggestion from the previous discussion, there is a new patch for
converting CHECK to corresponding ASSERT_* macro. Such replacement is done only
if the final result would be the same, e.g. CHECK with important-looking custom
formatting strings are still in place -- from what I understand ASSERT_*
doesn't allow to specify such format.
The third small patch fixes what looks like a copy-paste error in the condition
checking.
====================
Dmitrii Dolgov [Tue, 10 May 2022 15:52:32 +0000 (17:52 +0200)]
selftests/bpf: Use ASSERT_* instead of CHECK
Replace usage of CHECK with a corresponding ASSERT_* macro for bpf_iter
tests. Only done if the final result is equivalent, no changes when
replacement means loosing some information, e.g. from formatting string.
Dmitrii Dolgov [Tue, 10 May 2022 15:52:30 +0000 (17:52 +0200)]
bpf: Add bpf_link iterator
Implement bpf_link iterator to traverse links via bpf_seq_file
operations. The changeset is mostly shamelessly copied from
commit a228a64fc1e4 ("bpf: Add bpf_prog iterator")
Alexei Starovoitov [Tue, 10 May 2022 17:49:03 +0000 (10:49 -0700)]
Merge branch 'Add source ip in bpf tunnel key'
Kaixi Fan says:
====================
From: Kaixi Fan <fankaixi.li@bytedance.com>
Now bpf code could not set tunnel source ip address of ip tunnel. So it
could not support flow based tunnel mode completely. Because flow based
tunnel mode could set tunnel source, destination ip address and tunnel
key simultaneously.
Flow based tunnel is useful for overlay networks. And by configuring tunnel
source ip address, user could make their networks more elastic.
For example, tunnel source ip could be used to select different egress
nic interface for different flows with same tunnel destination ip. Another
example, user could choose one of multiple ip address of the egress nic
interface as the packet's tunnel source ip.
Add tunnel and tunnel source testcases in test_progs. Other types of
tunnel testcases would be moved to test_progs step by step in the
future.
v6:
- use libbpf api to attach tc progs and remove some shell commands to reduce
test runtime based on Alexei Starovoitov's suggestion
v5:
- fix some code format errors
- use bpf kernel code at namespace at_ns0 to set tunnel metadata
v4:
- fix subject error of first patch
v3:
- move vxlan tunnel testcases to test_progs
- replace bpf_trace_printk with bpf_printk
- rename bpf kernel prog section name to tic
v2:
- merge vxlan tunnel and tunnel source ip testcases in test_tunnel.sh
====================
Kaixi Fan [Sat, 30 Apr 2022 07:48:43 +0000 (15:48 +0800)]
selftests/bpf: Move vxlan tunnel testcases to test_progs
Move vxlan tunnel testcases from test_tunnel.sh to test_progs.
And add vxlan tunnel source testcases also. Other tunnel testcases
will be moved to test_progs step by step in the future.
Rename bpf program section name as SEC("tc") because test_progs
bpf loader could not load sections with name SEC("gre_set_tunnel").
Because of this, add bpftool to load bpf programs in test_tunnel.sh.
KP Singh [Mon, 9 May 2022 21:49:05 +0000 (21:49 +0000)]
bpftool: bpf_link_get_from_fd support for LSM programs in lskel
bpf_link_get_from_fd currently returns a NULL fd for LSM programs.
LSM programs are similar to tracing programs and can also use
skel_raw_tracepoint_open.
Takshak Chahande [Tue, 10 May 2022 08:22:21 +0000 (01:22 -0700)]
selftests/bpf: Handle batch operations for map-in-map bpf-maps
This patch adds up test cases that handles 4 combinations:
a) outer map: BPF_MAP_TYPE_ARRAY_OF_MAPS
inner maps: BPF_MAP_TYPE_ARRAY and BPF_MAP_TYPE_HASH
b) outer map: BPF_MAP_TYPE_HASH_OF_MAPS
inner maps: BPF_MAP_TYPE_ARRAY and BPF_MAP_TYPE_HASH
Takshak Chahande [Tue, 10 May 2022 08:22:20 +0000 (01:22 -0700)]
bpf: Extend batch operations for map-in-map bpf-maps
This patch extends batch operations support for map-in-map map-types:
BPF_MAP_TYPE_HASH_OF_MAPS and BPF_MAP_TYPE_ARRAY_OF_MAPS
A usecase where outer HASH map holds hundred of VIP entries and its
associated reuse-ports per VIP stored in REUSEPORT_SOCKARRAY type
inner map, needs to do batch operation for performance gain.
This patch leverages the exiting generic functions for most of the batch
operations. As map-in-map's value contains the actual reference of the inner map,
for BPF_MAP_TYPE_HASH_OF_MAPS type, it needed an extra step to fetch the
map_id from the reference value.
Jerome Marchand [Sat, 7 May 2022 16:16:35 +0000 (18:16 +0200)]
samples: bpf: Don't fail for a missing VMLINUX_BTF when VMLINUX_H is provided
samples/bpf build currently always fails if it can't generate
vmlinux.h from vmlinux, even when vmlinux.h is directly provided by
VMLINUX_H variable, which makes VMLINUX_H pointless.
Only fails when neither method works.
Andrii Nakryiko [Tue, 10 May 2022 00:16:05 +0000 (17:16 -0700)]
Merge branch 'bpftool: fix feature output when helper probes fail'
Milan Landaverde says:
====================
Currently in bpftool's feature probe, we incorrectly tell the user that
all of the helper functions are supported for program types where helper
probing fails or is explicitly unsupported[1]:
$ bpftool feature probe
...
eBPF helpers supported for program type tracing:
- bpf_map_lookup_elem
- bpf_map_update_elem
- bpf_map_delete_elem
...
- bpf_redirect_neigh
- bpf_check_mtu
- bpf_sys_bpf
- bpf_sys_close
This patch adjusts bpftool to relay to the user when helper support
can't be determined:
$ bpftool feature probe
...
eBPF helpers supported for program type lirc_mode2:
Program type not supported
eBPF helpers supported for program type tracing:
Could not determine which helpers are available
eBPF helpers supported for program type struct_opts:
Could not determine which helpers are available
eBPF helpers supported for program type ext:
Could not determine which helpers are available
Rather than imply that no helpers are available for the program type, we
let the user know that helper function probing failed entirely.
Milan Landaverde [Wed, 4 May 2022 16:13:32 +0000 (12:13 -0400)]
bpftool: Output message if no helpers found in feature probing
Currently in libbpf, we have hardcoded program types that are not
supported for helper function probing (e.g. tracing, ext, lsm).
Due to this (and other legitimate failures), bpftool feature probe returns
empty for those program type helper functions.
Instead of implying to the user that there are no helper functions
available for a program type, we output a message to the user explaining
that helper function probing failed for that program type.
Milan Landaverde [Wed, 4 May 2022 16:13:31 +0000 (12:13 -0400)]
bpftool: Adjust for error codes from libbpf probes
Originally [1], libbpf's (now deprecated) probe functions returned a bool
to acknowledge support but the new APIs return an int with a possible
negative error code to reflect probe failure. This change decides for
bpftool to declare maps and helpers are not available on probe failures.
Andrii Nakryiko [Mon, 9 May 2022 00:41:47 +0000 (17:41 -0700)]
libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary
Kernel imposes a pretty particular restriction on ringbuf map size. It
has to be a power-of-2 multiple of page size. While generally this isn't
hard for user to satisfy, sometimes it's impossible to do this
declaratively in BPF source code or just plain inconvenient to do at
runtime.
One such example might be BPF libraries that are supposed to work on
different architectures, which might not agree on what the common page
size is.
Let libbpf find the right size for user instead, if it turns out to not
satisfy kernel requirements. If user didn't set size at all, that's most
probably a mistake so don't upsize such zero size to one full page,
though. Also we need to be careful about not overflowing __u32
max_entries.
Andrii Nakryiko [Mon, 9 May 2022 00:41:46 +0000 (17:41 -0700)]
libbpf: Provide barrier() and barrier_var() in bpf_helpers.h
Add barrier() and barrier_var() macros into bpf_helpers.h to be used by
end users. While a bit advanced and specialized instruments, they are
sometimes indispensable. Instead of requiring each user to figure out
exact asm volatile incantations for themselves, provide them from
bpf_helpers.h.
Also remove conflicting definitions from selftests. Some tests rely on
barrier_var() definition being nothing, those will still work as libbpf
does the #ifndef/#endif guarding for barrier() and barrier_var(),
allowing users to redefine them, if necessary.
Andrii Nakryiko [Mon, 9 May 2022 00:41:44 +0000 (17:41 -0700)]
libbpf: Complete field-based CO-RE helpers with field offset helper
Add bpf_core_field_offset() helper to complete field-based CO-RE
helpers. This helper can be useful for feature-detection and for some
more advanced cases of field reading (e.g., reading flexible array members).
Andrii Nakryiko [Mon, 9 May 2022 00:41:43 +0000 (17:41 -0700)]
selftests/bpf: Use both syntaxes for field-based CO-RE helpers
Excercise both supported forms of bpf_core_field_exists() and
bpf_core_field_size() helpers: variable-based field reference and
type/field name-based one.
Andrii Nakryiko [Mon, 9 May 2022 00:41:42 +0000 (17:41 -0700)]
libbpf: Improve usability of field-based CO-RE helpers
Allow to specify field reference in two ways:
- if user has variable of necessary type, they can use variable-based
reference (my_var.my_field or my_var_ptr->my_field). This was the
only supported syntax up till now.
- now, bpf_core_field_exists() and bpf_core_field_size() support also
specifying field in a fashion similar to offsetof() macro, by
specifying type of the containing struct/union separately and field
name separately: bpf_core_field_exists(struct my_type, my_field).
This forms is quite often more convenient in practice and it matches
type-based CO-RE helpers that support specifying type by its name
without requiring any variables.
Andrii Nakryiko [Mon, 9 May 2022 00:41:41 +0000 (17:41 -0700)]
libbpf: Make __kptr and __kptr_ref unconditionally use btf_type_tag() attr
It will be annoying and surprising for users of __kptr and __kptr_ref if
libbpf silently ignores them just because Clang used for compilation
didn't support btf_type_tag(). It's much better to get clear compiler
error than debug BPF verifier failures later on.
Andrii Nakryiko [Mon, 9 May 2022 00:41:40 +0000 (17:41 -0700)]
selftests/bpf: Prevent skeleton generation race
Prevent "classic" and light skeleton generation rules from stomping on
each other's toes due to the use of the same <obj>.linked{1,2,3}.o
naming pattern. There is no coordination and synchronizataion between
.skel.h and .lskel.h rules, so they can easily overwrite each other's
intermediate object files, leading to errors like:
/bin/sh: line 1: 170928 Bus error (core dumped)
/data/users/andriin/linux/tools/testing/selftests/bpf/tools/sbin/bpftool gen skeleton
/data/users/andriin/linux/tools/testing/selftests/bpf/test_ksyms_weak.linked3.o
name test_ksyms_weak
> /data/users/andriin/linux/tools/testing/selftests/bpf/test_ksyms_weak.skel.h
make: *** [Makefile:507: /data/users/andriin/linux/tools/testing/selftests/bpf/test_ksyms_weak.skel.h] Error 135
make: *** Deleting file '/data/users/andriin/linux/tools/testing/selftests/bpf/test_ksyms_weak.skel.h'
Fix by using different suffix for light skeleton rule.
Merge branch 'libbpf: allow to opt-out from BPF map creation'
Andrii Nakryiko says:
====================
Add bpf_map__set_autocreate() API which is a BPF map counterpart of
bpf_program__set_autoload() and serves similar goal of allowing to build more
flexible CO-RE applications. See patch #3 for example scenarios in which the
need for such API came up previously.
Patch #1 is a follow-up patch to previous patch set adding verifier log fixup
logic, making sure bpf_core_format_spec()'s return result is used for
something useful.
Patch #2 is a small refactoring to avoid unnecessary verbose memory management
around obj->maps array.
Patch #3 adds and API and corresponding BPF verifier log fix up logic to
provide human-comprehensible error message with useful details.
Patch #4 adds a simple selftest validating both the API itself and libbpf's
log fixup logic for it.
====================
Add bpf_map__set_autocreate() API that allows user to opt-out from
libbpf automatically creating BPF map during BPF object load.
This is a useful feature when building CO-RE-enabled BPF application
that takes advantage of some new-ish BPF map type (e.g., socket-local
storage) if kernel supports it, but otherwise uses some alternative way
(e.g., extra HASH map). In such case, being able to disable the creation
of a map that kernel doesn't support allows to successfully create and
load BPF object file with all its other maps and programs.
It's still up to user to make sure that no "live" code in any of their BPF
programs are referencing such map instance, which can be achieved by
guarding such code with CO-RE relocation check or by using .rodata
global variables.
If user fails to properly guard such code to turn it into "dead code",
libbpf will helpfully post-process BPF verifier log and will provide
more meaningful error and map name that needs to be guarded properly. As
such, instead of:
libbpf: Use libbpf_mem_ensure() when allocating new map
Reuse libbpf_mem_ensure() when adding a new map to the list of maps
inside bpf_object. It takes care of proper resizing and reallocating of
map array and zeroing out newly allocated memory.
selftests/bpf: Use target-less SEC() definitions in various tests
Add new or modify existing SEC() definitions to be target-less and
validate that libbpf handles such program definitions correctly.
For kprobe/kretprobe we also add explicit test that generic
bpf_program__attach() works in cases when kprobe definition contains
proper target. It wasn't previously tested as selftests code always
explicitly specified the target regardless.
libbpf: Support target-less SEC() definitions for BTF-backed programs
Similar to previous patch, support target-less definitions like
SEC("fentry"), SEC("freplace"), etc. For such BTF-backed program types
it is expected that user will specify BTF target programmatically at
runtime using bpf_program__set_attach_target() *before* load phase. If
not, libbpf will report this as an error.
Aslo use SEC_ATTACH_BTF flag instead of explicitly listing a set of
types that are expected to require attach_btf_id. This was an accidental
omission during custom SEC() support refactoring.
In a lot of cases the target of kprobe/kretprobe, tracepoint, raw
tracepoint, etc BPF program might not be known at the compilation time
and will be discovered at runtime. This was always a supported case by
libbpf, with APIs like bpf_program__attach_{kprobe,tracepoint,etc}()
accepting full target definition, regardless of what was defined in
SEC() definition in BPF source code.
Unfortunately, up till now libbpf still enforced users to specify at
least something for the fake target, e.g., SEC("kprobe/whatever"), which
is cumbersome and somewhat misleading.
This patch allows target-less SEC() definitions for basic tracing BPF
program types:
- kprobe/kretprobe;
- multi-kprobe/multi-kretprobe;
- tracepoints;
- raw tracepoints.
Such target-less SEC() definitions are meant to specify declaratively
proper BPF program type only. Attachment of them will have to be handled
programmatically using correct APIs. As such, skeleton's auto-attachment
of such BPF programs is skipped and generic bpf_program__attach() will
fail, if attempted, due to the lack of enough target information.
Liu Jian [Wed, 27 Apr 2022 11:51:50 +0000 (19:51 +0800)]
bpf, sockmap: Call skb_linearize only when required in sk_psock_skb_ingress_enqueue
The skb_to_sgvec fails only when the number of frag_list and frags
exceeds MAX_MSG_FRAGS. Therefore, we can call skb_linearize only
when the conversion fails.
samples/bpf: Detach xdp prog when program exits unexpectedly in xdp_rxq_info_user
When xdp_rxq_info_user program exits unexpectedly, it doesn't detach xdp
prog of device, and other xdp prog can't be attached to the device. So
call init_exit() to detach xdp prog when program exits unexpectedly.
bpf/selftests: Add granular subtest output for prog_test
Implement per subtest log collection for both parallel
and sequential test execution. This allows granular
per-subtest error output in the 'All error logs' section.
Add subtest log transfer into the protocol during the
parallel test execution.
Move all test log printing logic into dump_test_log
function. One exception is the output of test names when
verbose printing is enabled. Move test name/result
printing into separate functions to avoid repetition.
Print all successful subtest results in the log. Print
only failed test logs when test does not have subtests.
Or only failed subtests' logs when test has subtests.
Disable 'All error logs' output when verbose mode is
enabled. This functionality was already broken and is
causing confusion.
We've added 85 non-merge commits during the last 18 day(s) which contain
a total of 163 files changed, 4499 insertions(+), 1521 deletions(-).
The main changes are:
1) Teach libbpf to enhance BPF verifier log with human-readable and relevant
information about failed CO-RE relocations, from Andrii Nakryiko.
2) Add typed pointer support in BPF maps and enable it for unreferenced pointers
(via probe read) and referenced ones that can be passed to in-kernel helpers,
from Kumar Kartikeya Dwivedi.
3) Improve xsk to break NAPI loop when rx queue gets full to allow for forward
progress to consume descriptors, from Maciej Fijalkowski & Björn Töpel.
4) Fix a small RCU read-side race in BPF_PROG_RUN routines which dereferenced
the effective prog array before the rcu_read_lock, from Stanislav Fomichev.
5) Implement BPF atomic operations for RV64 JIT, and add libbpf parsing logic
for USDT arguments under riscv{32,64}, from Pu Lehui.
6) Implement libbpf parsing of USDT arguments under aarch64, from Alan Maguire.
7) Enable bpftool build for musl and remove nftw with FTW_ACTIONRETVAL usage
so it can be shipped under Alpine which is musl-based, from Dominique Martinet.
8) Clean up {sk,task,inode} local storage trace RCU handling as they do not
need to use call_rcu_tasks_trace() barrier, from KP Singh.
9) Improve libbpf API documentation and fix error return handling of various
API functions, from Grant Seltzer.
10) Enlarge offset check for bpf_skb_{load,store}_bytes() helpers given data
length of frags + frag_list may surpass old offset limit, from Liu Jian.
11) Various improvements to prog_tests in area of logging, test execution
and by-name subtest selection, from Mykola Lysenko.
12) Simplify map_btf_id generation for all map types by moving this process
to build time with help of resolve_btfids infra, from Menglong Dong.
13) Fix a libbpf bug in probing when falling back to legacy bpf_probe_read*()
helpers; the probing caused always to use old helpers, from Runqing Yang.
14) Add support for ARCompact and ARCv2 platforms for libbpf's PT_REGS
tracing macros, from Vladimir Isaev.
15) Cleanup BPF selftests to remove old & unneeded rlimit code given kernel
switched to memcg-based memory accouting a while ago, from Yafang Shao.
16) Refactor of BPF sysctl handlers to move them to BPF core, from Yan Zhu.
17) Fix BPF selftests in two occasions to work around regressions caused by latest
LLVM to unblock CI until their fixes are worked out, from Yonghong Song.
18) Misc cleanups all over the place, from various others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add libbpf's log fixup logic selftests
libbpf: Fix up verifier log for unguarded failed CO-RE relos
libbpf: Simplify bpf_core_parse_spec() signature
libbpf: Refactor CO-RE relo human description formatting routine
libbpf: Record subprog-resolved CO-RE relocations unconditionally
selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests
libbpf: Avoid joining .BTF.ext data with BPF programs by section name
libbpf: Fix logic for finding matching program for CO-RE relocation
libbpf: Drop unhelpful "program too large" guess
libbpf: Fix anonymous type check in CO-RE logic
bpf: Compute map_btf_id during build time
selftests/bpf: Add test for strict BTF type check
selftests/bpf: Add verifier tests for kptr
selftests/bpf: Add C tests for kptr
libbpf: Add kptr type tag macros to bpf_helpers.h
bpf: Make BTF type match stricter for release arguments
bpf: Teach verifier about kptr_get kfunc helpers
bpf: Wire up freeing of referenced kptr
bpf: Populate pairs of btf_id and destructor kfunc in btf
bpf: Adapt copy_map_value for multiple offset case
...
====================
net: dsa: ksz9477: move get_stats64 to ksz_common.c
The mib counters for the ksz9477 is same for the ksz9477 switch and
LAN937x switch. Hence moving it to ksz_common.c file in order to have it
generic function. The DSA hook get_stats64 now can call ksz_get_stats64.
David S. Miller [Wed, 27 Apr 2022 11:22:56 +0000 (12:22 +0100)]
Merge branch 'remove-virt_to_bus-drivers'
Jakub Kicinski says:
====================
net: remove non-Ethernet drivers using virt_to_bus()
Networking is currently the main offender in using virt_to_bus().
Frankly all the drivers which use it are super old and unlikely
to be used today. They are just an ongoing maintenance burden.
In other words this series is using virt_to_bus() as an excuse
to shed some old stuff. Having done the tree-wide dev_addr_set()
conversion recently I have limited sympathy for carrying dead
code.
Obviously please scream if any of these drivers _is_ in fact
still being used. Otherwise let's take the chance, we can always
apologize and revert if users show up later.
Also I should say thanks to everyone who contributed to this code!
The work continues to be appreciated although realistically in more
of a "history book" fashion...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 26 Apr 2022 17:54:36 +0000 (10:54 -0700)]
net: hamradio: remove support for DMA SCC devices
Another incarnation of Z8530, looks like? Again, no real changes
in the git history, and it needs VIRT_TO_BUS. Unlikely to have
users, let's spend less time refactoring dead code...
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 26 Apr 2022 17:54:35 +0000 (10:54 -0700)]
net: wan: remove support for Z85230-based devices
Looks like all the changes to this driver had been automated
churn since git era begun. The driver is using virt_to_bus(),
it's just a maintenance burden unlikely to have any users.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 26 Apr 2022 17:54:34 +0000 (10:54 -0700)]
net: wan: remove support for COSA and SRP synchronous serial boards
Looks like all the changes to this driver had been automated
churn since git era begun. The driver is using virt_to_bus()
so it should be updated to a proper DMA API or removed. Given
the latest "news" entry on the website is from 1999 I'm opting
for the latter.
I'm marking the allocated char device major number as [REMOVED],
I reckon we can't reuse it in case some SW out there assumes its
COSA?
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 26 Apr 2022 17:54:31 +0000 (10:54 -0700)]
net: atm: remove support for Fujitsu FireStream ATM devices
This driver received nothing but automated fixes (mostly spelling
and compiler warnings) since git era begun. Since it's using
virt_to_bus it's unlikely to be used on any modern platform.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 Apr 2022 11:03:18 +0000 (12:03 +0100)]
Merge branch 'lan966x-ptp-programmable-pins'
Horatiu Vultur says:
====================
net: lan966x: Add support for PTP programmable pins
Lan966x has 8 PTP programmable pins. The last pin is hardcoded to be used
by PHC0 and all the rest are shareable between the PHCs. The PTP pins can
implement both extts and perout functions.
v1->v2:
- use ptp_find_pin_unlocked instead of ptp_find_pin inside the irq handler.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the PTP programmable pins to implement also PTP_PF_EXTTS
function. The PTP pin can be configured to capture only on the rising
edge of the PPS signal. And once an event is seen then an interrupt is
generated and the local time counter is saved.
The interrupt is shared between all the pins.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lan966x has 8 PTP programmable pins, where the last pins is hardcoded to
be used by PHC0, which does the frame timestamping. All the rest of the
PTP pins can be shared between the PHCs and can have different functions
like perout or extts. For now add support for PTP_FS_PEROUT.
The HW is not able to support absolute start time but can use the nsec
for phase adjustment when generating PPS.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>