From a311a08a4237241fb5b9d219d3e33346de6e83e0 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 2 Oct 2024 08:02:13 -0700 Subject: [PATCH 01/16] iomap: constrain the file range passed to iomap_file_unshare File contents can only be shared (i.e. reflinked) below EOF, so it makes no sense to try to unshare ranges beyond EOF. Constrain the file range parameters here so that we don't have to do that in the callers. Fixes: 5f4e5752a8a3 ("fs: add iomap_file_dirty") Signed-off-by: Darrick J. Wong Link: https://lore.kernel.org/r/20241002150213.GC21853@frogsfrogsfrogs Reviewed-by: Christoph Hellwig Reviewed-by: Brian Foster Signed-off-by: Christian Brauner --- fs/dax.c | 6 +++++- fs/iomap/buffered-io.c | 6 +++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index becb4a6920c6..c62acd2812f8 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1305,11 +1305,15 @@ int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len, struct iomap_iter iter = { .inode = inode, .pos = pos, - .len = len, .flags = IOMAP_WRITE | IOMAP_UNSHARE | IOMAP_DAX, }; + loff_t size = i_size_read(inode); int ret; + if (pos < 0 || pos >= size) + return 0; + + iter.len = min(len, size - pos); while ((ret = iomap_iter(&iter, ops)) > 0) iter.processed = dax_unshare_iter(&iter); return ret; diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index c1c559e0cc07..78ebd265f425 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1375,11 +1375,15 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, struct iomap_iter iter = { .inode = inode, .pos = pos, - .len = len, .flags = IOMAP_WRITE | IOMAP_UNSHARE, }; + loff_t size = i_size_read(inode); int ret; + if (pos < 0 || pos >= size) + return 0; + + iter.len = min(len, size - pos); while ((ret = iomap_iter(&iter, ops)) > 0) iter.processed = iomap_unshare_iter(&iter); return ret; -- 2.50.1 From b63ad06ddddfe792f93df0c24adb66622bd7b8c9 Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Mon, 30 Sep 2024 11:39:54 -0400 Subject: [PATCH 02/16] doc: net: napi: Update documentation for napi_schedule_irqoff Since commit 8380c81d5c4f ("net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT"), napi_schedule_irqoff will do the right thing if IRQs are threaded. Therefore, there is no need to use IRQF_NO_THREAD. Signed-off-by: Sean Anderson Reviewed-by: Bagas Sanjaya Reviewed-by: Sebastian Andrzej Siewior Link: https://patch.msgid.link/20240930153955.971657-1-sean.anderson@linux.dev Signed-off-by: Paolo Abeni --- Documentation/networking/napi.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst index 7bf7b95c4f7a..dfa5d549be9c 100644 --- a/Documentation/networking/napi.rst +++ b/Documentation/networking/napi.rst @@ -144,9 +144,8 @@ IRQ should only be unmasked after a successful call to napi_complete_done(): napi_schedule_irqoff() is a variant of napi_schedule() which takes advantage of guarantees given by being invoked in IRQ context (no need to -mask interrupts). Note that PREEMPT_RT forces all interrupts -to be threaded so the interrupt may need to be marked ``IRQF_NO_THREAD`` -to avoid issues on real-time kernel configurations. +mask interrupts). napi_schedule_irqoff() will fall back to napi_schedule() if +IRQs are threaded (such as if ``PREEMPT_RT`` is enabled). Instance to queue mapping ------------------------- -- 2.50.1 From c6929644c1e0d6108e57061d427eb966e1746351 Mon Sep 17 00:00:00 2001 From: Ravikanth Tuniki Date: Tue, 1 Oct 2024 00:43:35 +0530 Subject: [PATCH 03/16] dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems Add missing reg minItems as based on current binding document only ethernet MAC IO space is a supported configuration. There is a bug in schema, current examples contain 64-bit addressing as well as 32-bit addressing. The schema validation does pass incidentally considering one 64-bit reg address as two 32-bit reg address entries. If we change axi_ethernet_eth1 example node reg addressing to 32-bit schema validation reports: Documentation/devicetree/bindings/net/xlnx,axi-ethernet.example.dtb: ethernet@40000000: reg: [[1073741824, 262144]] is too short To fix it add missing reg minItems constraints and to make things clearer stick to 32-bit addressing in examples. Fixes: cbb1ca6d5f9a ("dt-bindings: net: xlnx,axi-ethernet: convert bindings document to yaml") Signed-off-by: Ravikanth Tuniki Signed-off-by: Radhey Shyam Pandey Acked-by: Conor Dooley Link: https://patch.msgid.link/1727723615-2109795-1-git-send-email-radhey.shyam.pandey@amd.com Signed-off-by: Paolo Abeni --- Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml b/Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml index bbe89ea9590c..e95c21628281 100644 --- a/Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml +++ b/Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml @@ -34,6 +34,7 @@ properties: and length of the AXI DMA controller IO space, unless axistream-connected is specified, in which case the reg attribute of the node referenced by it is used. + minItems: 1 maxItems: 2 interrupts: @@ -181,7 +182,7 @@ examples: clock-names = "s_axi_lite_clk", "axis_clk", "ref_clk", "mgt_clk"; clocks = <&axi_clk>, <&axi_clk>, <&pl_enet_ref_clk>, <&mgt_clk>; phy-mode = "mii"; - reg = <0x00 0x40000000 0x00 0x40000>; + reg = <0x40000000 0x40000>; xlnx,rxcsum = <0x2>; xlnx,rxmem = <0x800>; xlnx,txcsum = <0x2>; -- 2.50.1 From 8beee4d8dee76b67c75dc91fd8185d91e845c160 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 30 Sep 2024 16:49:51 -0400 Subject: [PATCH 04/16] sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start In sctp_listen_start() invoked by sctp_inet_listen(), it should set the sk_state back to CLOSED if sctp_autobind() fails due to whatever reason. Otherwise, next time when calling sctp_inet_listen(), if sctp_sk(sk)->reuse is already set via setsockopt(SCTP_REUSE_PORT), sctp_sk(sk)->bind_hash will be dereferenced as sk_state is LISTENING, which causes a crash as bind_hash is NULL. KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:sctp_inet_listen+0x7f0/0xa20 net/sctp/socket.c:8617 Call Trace: __sys_listen_socket net/socket.c:1883 [inline] __sys_listen+0x1b7/0x230 net/socket.c:1894 __do_sys_listen net/socket.c:1902 [inline] Fixes: 5e8f3f703ae4 ("sctp: simplify sctp listening code") Reported-by: syzbot+f4e0f821e3a3b7cee51d@syzkaller.appspotmail.com Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Link: https://patch.msgid.link/a93e655b3c153dc8945d7a812e6d8ab0d52b7aa0.1727729391.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni --- net/sctp/socket.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 32f76f1298da..078bcb3858c7 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -8557,8 +8557,10 @@ static int sctp_listen_start(struct sock *sk, int backlog) */ inet_sk_set_state(sk, SCTP_SS_LISTENING); if (!ep->base.bind_addr.port) { - if (sctp_autobind(sk)) + if (sctp_autobind(sk)) { + inet_sk_set_state(sk, SCTP_SS_CLOSED); return -EAGAIN; + } } else { if (sctp_get_port(sk, inet_sk(sk)->inet_num)) { inet_sk_set_state(sk, SCTP_SS_CLOSED); -- 2.50.1 From 3840cbe24cf060ea05a585ca497814609f5d47d1 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Thu, 3 Oct 2024 07:29:05 -0400 Subject: [PATCH 05/16] sched: psi: fix bogus pressure spikes from aggregation race Brandon reports sporadic, non-sensical spikes in cumulative pressure time (total=) when reading cpu.pressure at a high rate. This is due to a race condition between reader aggregation and tasks changing states. While it affects all states and all resources captured by PSI, in practice it most likely triggers with CPU pressure, since scheduling events are so frequent compared to other resource events. The race context is the live snooping of ongoing stalls during a pressure read. The read aggregates per-cpu records for stalls that have concluded, but will also incorporate ad-hoc the duration of any active state that hasn't been recorded yet. This is important to get timely measurements of ongoing stalls. Those ad-hoc samples are calculated on-the-fly up to the current time on that CPU; since the stall hasn't concluded, it's expected that this is the minimum amount of stall time that will enter the per-cpu records once it does. The problem is that the path that concludes the state uses a CPU clock read that is not synchronized against aggregators; the clock is read outside of the seqlock protection. This allows aggregators to race and snoop a stall with a longer duration than will actually be recorded. With the recorded stall time being less than the last snapshot remembered by the aggregator, a subsequent sample will underflow and observe a bogus delta value, resulting in an erratic jump in pressure. Fix this by moving the clock read of the state change into the seqlock protection. This ensures no aggregation can snoop live stalls past the time that's recorded when the state concludes. Reported-by: Brandon Duffany Link: https://bugzilla.kernel.org/show_bug.cgi?id=219194 Link: https://lore.kernel.org/lkml/20240827121851.GB438928@cmpxchg.org/ Fixes: df77430639c9 ("psi: Reduce calls to sched_clock() in psi") Cc: stable@vger.kernel.org Signed-off-by: Johannes Weiner Reviewed-by: Chengming Zhou Signed-off-by: Linus Torvalds --- kernel/sched/psi.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index 020d58967d4e..84dad1511d1e 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -769,12 +769,13 @@ static void record_times(struct psi_group_cpu *groupc, u64 now) } static void psi_group_change(struct psi_group *group, int cpu, - unsigned int clear, unsigned int set, u64 now, + unsigned int clear, unsigned int set, bool wake_clock) { struct psi_group_cpu *groupc; unsigned int t, m; u32 state_mask; + u64 now; lockdep_assert_rq_held(cpu_rq(cpu)); groupc = per_cpu_ptr(group->pcpu, cpu); @@ -789,6 +790,7 @@ static void psi_group_change(struct psi_group *group, int cpu, * SOME and FULL time these may have resulted in. */ write_seqcount_begin(&groupc->seq); + now = cpu_clock(cpu); /* * Start with TSK_ONCPU, which doesn't have a corresponding @@ -899,18 +901,15 @@ void psi_task_change(struct task_struct *task, int clear, int set) { int cpu = task_cpu(task); struct psi_group *group; - u64 now; if (!task->pid) return; psi_flags_change(task, clear, set); - now = cpu_clock(cpu); - group = task_psi_group(task); do { - psi_group_change(group, cpu, clear, set, now, true); + psi_group_change(group, cpu, clear, set, true); } while ((group = group->parent)); } @@ -919,7 +918,6 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next, { struct psi_group *group, *common = NULL; int cpu = task_cpu(prev); - u64 now = cpu_clock(cpu); if (next->pid) { psi_flags_change(next, 0, TSK_ONCPU); @@ -936,7 +934,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next, break; } - psi_group_change(group, cpu, 0, TSK_ONCPU, now, true); + psi_group_change(group, cpu, 0, TSK_ONCPU, true); } while ((group = group->parent)); } @@ -974,7 +972,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next, do { if (group == common) break; - psi_group_change(group, cpu, clear, set, now, wake_clock); + psi_group_change(group, cpu, clear, set, wake_clock); } while ((group = group->parent)); /* @@ -986,7 +984,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next, if ((prev->psi_flags ^ next->psi_flags) & ~TSK_ONCPU) { clear &= ~TSK_ONCPU; for (; group; group = group->parent) - psi_group_change(group, cpu, clear, set, now, wake_clock); + psi_group_change(group, cpu, clear, set, wake_clock); } } } @@ -997,8 +995,8 @@ void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_st int cpu = task_cpu(curr); struct psi_group *group; struct psi_group_cpu *groupc; - u64 now, irq; s64 delta; + u64 irq; if (static_branch_likely(&psi_disabled)) return; @@ -1011,7 +1009,6 @@ void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_st if (prev && task_psi_group(prev) == group) return; - now = cpu_clock(cpu); irq = irq_time_read(cpu); delta = (s64)(irq - rq->psi_irq_time); if (delta < 0) @@ -1019,12 +1016,15 @@ void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_st rq->psi_irq_time = irq; do { + u64 now; + if (!group->enabled) continue; groupc = per_cpu_ptr(group->pcpu, cpu); write_seqcount_begin(&groupc->seq); + now = cpu_clock(cpu); record_times(groupc, now); groupc->times[PSI_IRQ_FULL] += delta; @@ -1223,11 +1223,9 @@ void psi_cgroup_restart(struct psi_group *group) for_each_possible_cpu(cpu) { struct rq *rq = cpu_rq(cpu); struct rq_flags rf; - u64 now; rq_lock_irq(rq, &rf); - now = cpu_clock(cpu); - psi_group_change(group, cpu, 0, 0, now, true); + psi_group_change(group, cpu, 0, 0, true); rq_unlock_irq(rq, &rf); } } -- 2.50.1 From d002b922c4d5d695d617ec262f3e07cd62ee866e Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 16 Sep 2024 19:59:22 +0000 Subject: [PATCH 06/16] selftests/bpf: Remove test_skb_cgroup_id.sh from TEST_PROGS MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit test_skb_cgroup_id.sh was deleted in https://git.kernel.org/bpf/bpf-next/c/f957c230e173 It has to be removed from TEST_PROGS variable in tools/testing/selftests/bpf/Makefile, otherwise install target fails. Signed-off-by: Ihor Solodrai Signed-off-by: Andrii Nakryiko Tested-by: Björn Töpel Link: https://lore.kernel.org/bpf/20240916195919.1872371-1-ihor.solodrai@pm.me Link: https://lore.kernel.org/bpf/Q3BN2kW9Kgy6LkrDOwnyY4Pv7_YF8fInLCd2_QA3LimKYM3wD64kRdnwp7blwG2dI_s7UGnfUae-4_dOmuTrxpYCi32G_KTzB3PfmxIerH8=@pm.me/ Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/Makefile | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index f04af11df8eb..df75f1beb731 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -132,7 +132,6 @@ TEST_PROGS := test_kmod.sh \ test_tunnel.sh \ test_lwt_seg6local.sh \ test_lirc_mode2.sh \ - test_skb_cgroup_id.sh \ test_flow_dissector.sh \ test_xdp_vlan_mode_generic.sh \ test_xdp_vlan_mode_native.sh \ -- 2.50.1 From fd4a0e67838c1e0fc4927fae113d785aa893997d Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 16 Sep 2024 19:59:27 +0000 Subject: [PATCH 07/16] selftests/bpf: Set vpath in Makefile to search for skels MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Auto-dependencies generated for %.test.o files refer to skels using filenames as opposed to full paths. This requires make to be able to link this name to an actual path, because not all generated skels are put in the working directory. In the original patch [1], this was mitigated by this target: $(notdir %.skel.h): $(TRUNNER_OUTPUT)/%.skel.h @true This turned out to be insufficient. First, %.lskel.h and %.subskel.h were missed, because a typical selftests/bpf build could find these files in the working directory. This error was detected by an out-of-tree build [2]. Second, even with missing rules added, this target causes unnecessary rebuilds in the out-of-tree case, as X.skel.h is searched for in the working directory, and not in the $(OUTPUT). Using vpath directive [3] is a better solution. Instead of introducing a separate target (X.skel.h in addition to $(TRUNNER_OUTPUT)/X.skel.h), make is instructed to search for skels in the output, which allows make to correctly detect that skel has already been generated. [1]: https://lore.kernel.org/bpf/VJihUTnvtwEgv_mOnpfy7EgD9D2MPNoHO-MlANeLIzLJPGhDeyOuGKIYyKgk0O6KPjfM-MuhtvPwZcngN8WFqbTnTRyCSMc2aMZ1ODm1T_g=@pm.me/ [2]: https://lore.kernel.org/bpf/CIjrhJwoIqMc2IhuppVqh4ZtJGbx8kC8rc9PHhAIU6RccnWT4I04F_EIr4GxQwxZe89McuGJlCnUk9UbkdvWtSJjAsd7mHmnTy9F8K2TLZM=@pm.me/ [3]: https://www.gnu.org/software/make/manual/html_node/Selective-Search.html Reported-by: Björn Töpel Signed-off-by: Ihor Solodrai Signed-off-by: Andrii Nakryiko Tested-by: Björn Töpel Link: https://lore.kernel.org/bpf/20240916195919.1872371-2-ihor.solodrai@pm.me Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/Makefile | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index df75f1beb731..365740f24d2e 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -622,10 +622,11 @@ $(TRUNNER_BPF_SKELS_LINKED): $(TRUNNER_OUTPUT)/%: $$$$(%-deps) $(BPFTOOL) | $(TR # When the compiler generates a %.d file, only skel basenames (not # full paths) are specified as prerequisites for corresponding %.o -# file. This target makes %.skel.h basename dependent on full paths, -# linking generated %.d dependency with actual %.skel.h files. -$(notdir %.skel.h): $(TRUNNER_OUTPUT)/%.skel.h - @true +# file. vpath directives below instruct make to search for skel files +# in TRUNNER_OUTPUT, if they are not present in the working directory. +vpath %.skel.h $(TRUNNER_OUTPUT) +vpath %.lskel.h $(TRUNNER_OUTPUT) +vpath %.subskel.h $(TRUNNER_OUTPUT) endif -- 2.50.1 From 4b7c05598a644782b8451e415bb56f31e5c9d3ee Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 24 Sep 2024 13:07:30 +0200 Subject: [PATCH 08/16] selftests/bpf: Fix uprobe consumer test With newly merged code the uprobe behaviour is slightly different and affects uprobe consumer test. We no longer need to check if the uprobe object is still preserved after removing last uretprobe, because it stays as long as there's pending/installed uretprobe instance. This allows to run uretprobe consumers registered 'after' uprobe was hit even if previous uretprobe got unregistered before being hit. The uprobe object will be now removed after the last uprobe ref is released and in such case it's held by ri->uprobe (return instance) which is released after the uretprobe is hit. Reported-by: Ihor Solodrai Signed-off-by: Jiri Olsa Signed-off-by: Daniel Borkmann Tested-by: Ihor Solodrai Closes: https://lore.kernel.org/bpf/w6U8Z9fdhjnkSp2UaFaV1fGqJXvfLEtDKEUyGDkwmoruDJ_AgF_c0FFhrkeKW18OqiP-05s9yDKiT6X-Ns-avN_ABf0dcUkXqbSJN1TQSXo=@pm.me/ Signed-off-by: Alexei Starovoitov --- .../testing/selftests/bpf/prog_tests/uprobe_multi_test.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c index 844f6fc8487b..c1ac813ff9ba 100644 --- a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c @@ -869,21 +869,14 @@ static void consumer_test(struct uprobe_multi_consumers *skel, fmt = "prog 0/1: uprobe"; } else { /* - * uprobe return is tricky ;-) - * * to trigger uretprobe consumer, the uretprobe needs to be installed, * which means one of the 'return' uprobes was alive when probe was hit: * * idxs: 2/3 uprobe return in 'installed' mask - * - * in addition if 'after' state removes everything that was installed in - * 'before' state, then uprobe kernel object goes away and return uprobe - * is not installed and we won't hit it even if it's in 'after' state. */ unsigned long had_uretprobes = before & 0b1100; /* is uretprobe installed */ - unsigned long probe_preserved = before & after; /* did uprobe go away */ - if (had_uretprobes && probe_preserved && test_bit(idx, after)) + if (had_uretprobes && test_bit(idx, after)) val++; fmt = "idx 2/3: uretprobe"; } -- 2.50.1 From 58dbb36930183aea41024d9c0b0ed97629473e20 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 24 Sep 2024 13:07:31 +0200 Subject: [PATCH 09/16] selftests/bpf: Bail out quickly from failing consumer test Let's bail out from consumer test after we hit first fail, so we don't pollute the log with many instances with possibly the same error. Signed-off-by: Jiri Olsa Signed-off-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov --- .../selftests/bpf/prog_tests/uprobe_multi_test.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c index c1ac813ff9ba..2c39902b8a09 100644 --- a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c @@ -836,10 +836,10 @@ uprobe_consumer_test(struct uprobe_multi_consumers *skel, return 0; } -static void consumer_test(struct uprobe_multi_consumers *skel, - unsigned long before, unsigned long after) +static int consumer_test(struct uprobe_multi_consumers *skel, + unsigned long before, unsigned long after) { - int err, idx; + int err, idx, ret = -1; printf("consumer_test before %lu after %lu\n", before, after); @@ -881,13 +881,17 @@ static void consumer_test(struct uprobe_multi_consumers *skel, fmt = "idx 2/3: uretprobe"; } - ASSERT_EQ(skel->bss->uprobe_result[idx], val, fmt); + if (!ASSERT_EQ(skel->bss->uprobe_result[idx], val, fmt)) + goto cleanup; skel->bss->uprobe_result[idx] = 0; } + ret = 0; + cleanup: for (idx = 0; idx < 4; idx++) uprobe_detach(skel, idx); + return ret; } static void test_consumers(void) @@ -939,9 +943,11 @@ static void test_consumers(void) for (before = 0; before < 16; before++) { for (after = 0; after < 16; after++) - consumer_test(skel, before, after); + if (consumer_test(skel, before, after)) + goto out; } +out: uprobe_multi_consumers__destroy(skel); } -- 2.50.1 From 7bae563c0dbe0039d80a103601f64dcdb48b1481 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 15 Sep 2024 18:21:54 +0200 Subject: [PATCH 10/16] bpf: Constify struct btf_kind_operations struct btf_kind_operations are not modified in BTF. Constifying this structures moves some data to a read-only section, so increase overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 184320 7091 548 191959 2edd7 kernel/bpf/btf.o After: ===== text data bss dec hex filename 184896 6515 548 191959 2edd7 kernel/bpf/btf.o Signed-off-by: Christophe JAILLET Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/r/9192ab72b2e9c66aefd6520f359a20297186327f.1726417289.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexei Starovoitov --- kernel/bpf/btf.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 75e4fe83c509..13dd1fa1d1b9 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -2808,7 +2808,7 @@ static void btf_ref_type_log(struct btf_verifier_env *env, btf_verifier_log(env, "type_id=%u", t->type); } -static struct btf_kind_operations modifier_ops = { +static const struct btf_kind_operations modifier_ops = { .check_meta = btf_ref_type_check_meta, .resolve = btf_modifier_resolve, .check_member = btf_modifier_check_member, @@ -2817,7 +2817,7 @@ static struct btf_kind_operations modifier_ops = { .show = btf_modifier_show, }; -static struct btf_kind_operations ptr_ops = { +static const struct btf_kind_operations ptr_ops = { .check_meta = btf_ref_type_check_meta, .resolve = btf_ptr_resolve, .check_member = btf_ptr_check_member, @@ -2858,7 +2858,7 @@ static void btf_fwd_type_log(struct btf_verifier_env *env, btf_verifier_log(env, "%s", btf_type_kflag(t) ? "union" : "struct"); } -static struct btf_kind_operations fwd_ops = { +static const struct btf_kind_operations fwd_ops = { .check_meta = btf_fwd_check_meta, .resolve = btf_df_resolve, .check_member = btf_df_check_member, @@ -3109,7 +3109,7 @@ static void btf_array_show(const struct btf *btf, const struct btf_type *t, __btf_array_show(btf, t, type_id, data, bits_offset, show); } -static struct btf_kind_operations array_ops = { +static const struct btf_kind_operations array_ops = { .check_meta = btf_array_check_meta, .resolve = btf_array_resolve, .check_member = btf_array_check_member, @@ -4185,7 +4185,7 @@ static void btf_struct_show(const struct btf *btf, const struct btf_type *t, __btf_struct_show(btf, t, type_id, data, bits_offset, show); } -static struct btf_kind_operations struct_ops = { +static const struct btf_kind_operations struct_ops = { .check_meta = btf_struct_check_meta, .resolve = btf_struct_resolve, .check_member = btf_struct_check_member, @@ -4353,7 +4353,7 @@ static void btf_enum_show(const struct btf *btf, const struct btf_type *t, btf_show_end_type(show); } -static struct btf_kind_operations enum_ops = { +static const struct btf_kind_operations enum_ops = { .check_meta = btf_enum_check_meta, .resolve = btf_df_resolve, .check_member = btf_enum_check_member, @@ -4456,7 +4456,7 @@ static void btf_enum64_show(const struct btf *btf, const struct btf_type *t, btf_show_end_type(show); } -static struct btf_kind_operations enum64_ops = { +static const struct btf_kind_operations enum64_ops = { .check_meta = btf_enum64_check_meta, .resolve = btf_df_resolve, .check_member = btf_enum_check_member, @@ -4534,7 +4534,7 @@ done: btf_verifier_log(env, ")"); } -static struct btf_kind_operations func_proto_ops = { +static const struct btf_kind_operations func_proto_ops = { .check_meta = btf_func_proto_check_meta, .resolve = btf_df_resolve, /* @@ -4592,7 +4592,7 @@ static int btf_func_resolve(struct btf_verifier_env *env, return 0; } -static struct btf_kind_operations func_ops = { +static const struct btf_kind_operations func_ops = { .check_meta = btf_func_check_meta, .resolve = btf_func_resolve, .check_member = btf_df_check_member, -- 2.50.1 From a1ec23b947538520b3182c598dc2bb9930d032b1 Mon Sep 17 00:00:00 2001 From: Zhang Jiao Date: Tue, 24 Sep 2024 12:55:34 +0800 Subject: [PATCH 11/16] selftests/bpf: Add missing va_end. There is no va_end after va_copy, just add it. Signed-off-by: Zhang Jiao Signed-off-by: Daniel Borkmann Acked-by: Eduard Zingerman Link: https://lore.kernel.org/r/20240924045534.8672-1-zhangjiao2@cmss.chinamobile.com Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/test_progs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c index c7a70e1a1085..7846f7f98908 100644 --- a/tools/testing/selftests/bpf/test_progs.c +++ b/tools/testing/selftests/bpf/test_progs.c @@ -868,6 +868,7 @@ static int libbpf_print_fn(enum libbpf_print_level level, va_copy(args2, args); vfprintf(libbpf_capture_stream, format, args2); + va_end(args2); } if (env.verbosity < VERBOSE_VERY && level == LIBBPF_DEBUG) -- 2.50.1 From 8b334d91834666dbc4c1c0b0abed3f855ed16cf3 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Wed, 18 Sep 2024 19:33:22 +0000 Subject: [PATCH 12/16] libbpf: Change log level of BTF loading error message Reduce log level of BTF loading error to INFO if BTF is not required. Andrii says: Nowadays the expectation is that the BPF program will have a valid .BTF section, so even though .BTF is "optional", I think it's fine to emit a warning for that case (any reasonably recent Clang will produce valid BTF). Ihor's patch is fixing the situation with an outdated host kernel that doesn't understand BTF. libbpf will try to "upload" the program's BTF, but if that fails and the BPF object doesn't use any features that require having BTF uploaded, then it's just an information message to the user, but otherwise can be ignored. Suggested-by: Andrii Nakryiko Signed-off-by: Ihor Solodrai Acked-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov --- tools/lib/bpf/libbpf.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 219facd0e66e..b8d72b5fbc79 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -3581,11 +3581,12 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj) report: if (err) { btf_mandatory = kernel_needs_btf(obj); - pr_warn("Error loading .BTF into kernel: %d. %s\n", err, - btf_mandatory ? "BTF is mandatory, can't proceed." - : "BTF is optional, ignoring."); - if (!btf_mandatory) + if (btf_mandatory) { + pr_warn("Error loading .BTF into kernel: %d. BTF is mandatory, can't proceed.\n", err); + } else { + pr_info("Error loading .BTF into kernel: %d. BTF is optional, ignoring.\n", err); err = 0; + } } return err; } -- 2.50.1 From a400d08b3014a4f4e939366bb6fd769b9caff4c9 Mon Sep 17 00:00:00 2001 From: Tao Chen Date: Wed, 25 Sep 2024 23:30:12 +0800 Subject: [PATCH 13/16] libbpf: Fix expected_attach_type set handling in program load callback Referenced commit broke the logic of resetting expected_attach_type to zero for allowed program types if kernel doesn't yet support such field. We do need to overwrite and preserve expected_attach_type for multi-uprobe though, but that can be done explicitly in libbpf_prepare_prog_load(). Fixes: 5902da6d8a52 ("libbpf: Add uprobe multi link support to bpf_program__attach_usdt") Suggested-by: Jiri Olsa Signed-off-by: Tao Chen Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20240925153012.212866-1-chen.dylane@gmail.com Signed-off-by: Alexei Starovoitov --- tools/lib/bpf/libbpf.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index b8d72b5fbc79..1ef221df0541 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -7353,8 +7353,14 @@ static int libbpf_prepare_prog_load(struct bpf_program *prog, opts->prog_flags |= BPF_F_XDP_HAS_FRAGS; /* special check for usdt to use uprobe_multi link */ - if ((def & SEC_USDT) && kernel_supports(prog->obj, FEAT_UPROBE_MULTI_LINK)) + if ((def & SEC_USDT) && kernel_supports(prog->obj, FEAT_UPROBE_MULTI_LINK)) { + /* for BPF_TRACE_UPROBE_MULTI, user might want to query expected_attach_type + * in prog, and expected_attach_type we set in kernel is from opts, so we + * update both. + */ prog->expected_attach_type = BPF_TRACE_UPROBE_MULTI; + opts->expected_attach_type = BPF_TRACE_UPROBE_MULTI; + } if ((def & SEC_ATTACH_BTF) && !prog->attach_btf_id) { int btf_obj_fd = 0, btf_type_id = 0, err; @@ -7444,6 +7450,7 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog load_attr.attach_btf_id = prog->attach_btf_id; load_attr.kern_version = kern_version; load_attr.prog_ifindex = prog->prog_ifindex; + load_attr.expected_attach_type = prog->expected_attach_type; /* specify func_info/line_info only if kernel supports them */ if (obj->btf && btf__fd(obj->btf) >= 0 && kernel_supports(obj, FEAT_BTF_FUNC)) { @@ -7475,9 +7482,6 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog insns_cnt = prog->insns_cnt; } - /* allow prog_prepare_load_fn to change expected_attach_type */ - load_attr.expected_attach_type = prog->expected_attach_type; - if (obj->gen_loader) { bpf_gen__prog_load(obj->gen_loader, prog->type, prog->name, license, insns, insns_cnt, &load_attr, -- 2.50.1 From 78971150660650cd22ef236c708aab3a7620e2fa Mon Sep 17 00:00:00 2001 From: Manu Bretelle Date: Tue, 24 Sep 2024 17:22:10 -0700 Subject: [PATCH 14/16] selftests/bpf: vm: Add support for VIRTIO_FS danobi/vmtest is going to migrate from using 9p to using virtio_fs to mount the local rootfs: https://github.com/danobi/vmtest/pull/88 BPF CI uses danobi/vmtest to run bpf selftests and will need to support VIRTIO_FS. This change enables new kconfigs to be able to support the upcoming danobi/vmtest. Tested by building a new kernel with those config and confirming it would successfully run with 9p (currently what is used by vmtest), and with virtio_fs (using a local build of vmtest). $ vmtest -k arch/x86/boot/bzImage "findmnt /" => bzImage ===> Booting ===> Setting up VM ===> Running command TARGET SOURCE FSTYPE OPTIONS / /dev/root 9p rw,relatime,cache=5,access=client,msize=512000,trans=virtio $ /home/chantra/local/danobi-vmtest/target/debug/vmtest -k arch/x86/boot/bzImage "findmnt /" => bzImage ===> Initializing host environment ===> Booting ===> Setting up VM ===> Running command TARGET SOURCE FSTYPE OPTIONS / rootfs virtiofs rw,relatime Changes in v2: * Sorted configs alphabetically Signed-off-by: Manu Bretelle Signed-off-by: Andrii Nakryiko Acked-by: Daniel Xu Link: https://lore.kernel.org/bpf/20240925002210.501266-1-chantr4@gmail.com Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/config.vm | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/config.vm b/tools/testing/selftests/bpf/config.vm index a9746ca78777..da543b24c144 100644 --- a/tools/testing/selftests/bpf/config.vm +++ b/tools/testing/selftests/bpf/config.vm @@ -1,12 +1,15 @@ -CONFIG_9P_FS=y CONFIG_9P_FS_POSIX_ACL=y CONFIG_9P_FS_SECURITY=y +CONFIG_9P_FS=y CONFIG_CRYPTO_DEV_VIRTIO=y -CONFIG_NET_9P=y +CONFIG_FUSE_FS=y +CONFIG_FUSE_PASSTHROUGH=y CONFIG_NET_9P_VIRTIO=y +CONFIG_NET_9P=y CONFIG_VIRTIO_BALLOON=y CONFIG_VIRTIO_BLK=y CONFIG_VIRTIO_CONSOLE=y +CONFIG_VIRTIO_FS=y CONFIG_VIRTIO_NET=y CONFIG_VIRTIO_PCI=y CONFIG_VIRTIO_VSOCKETS_COMMON=y -- 2.50.1 From 89abc4080301cad681a52217a2a622dcd8947193 Mon Sep 17 00:00:00 2001 From: Zhu Jun Date: Wed, 25 Sep 2024 03:00:05 -0700 Subject: [PATCH 15/16] tools/bpf: Remove unused variable from runqslower This variable is never referenced in the code, just remove it. Signed-off-by: Zhu Jun Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20240925100005.3989-1-zhujun2@cmss.chinamobile.com Signed-off-by: Alexei Starovoitov --- tools/bpf/runqslower/runqslower.bpf.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/bpf/runqslower/runqslower.bpf.c b/tools/bpf/runqslower/runqslower.bpf.c index 9a5c1f008fe6..fced54a3adf6 100644 --- a/tools/bpf/runqslower/runqslower.bpf.c +++ b/tools/bpf/runqslower/runqslower.bpf.c @@ -70,7 +70,6 @@ int handle__sched_switch(u64 *ctx) struct task_struct *next = (struct task_struct *)ctx[2]; struct runq_event event = {}; u64 *tsp, delta_us; - long state; u32 pid; /* ivcsw: treat like an enqueue event and store timestamp */ -- 2.50.1 From 90d0f736bd1cfb94e9dfcf1be716242c0b08dd7f Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Thu, 26 Sep 2024 10:38:23 +0800 Subject: [PATCH 16/16] libbpf: Remove unneeded semicolon Remove unneeded semicolon in zip_archive_open(). Signed-off-by: Chen Ni Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20240926023823.3632993-1-nichen@iscas.ac.cn Signed-off-by: Alexei Starovoitov --- tools/lib/bpf/zip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/bpf/zip.c b/tools/lib/bpf/zip.c index 3f26d629b2b4..88c376a8348d 100644 --- a/tools/lib/bpf/zip.c +++ b/tools/lib/bpf/zip.c @@ -223,7 +223,7 @@ struct zip_archive *zip_archive_open(const char *path) if (!archive) { munmap(data, size); return ERR_PTR(-ENOMEM); - }; + } archive->data = data; archive->size = size; -- 2.50.1