From 9937aae39bc09645cd67d53e0320926cd91570de Mon Sep 17 00:00:00 2001 From: Simon Horman Date: Wed, 9 Oct 2024 09:47:22 +0100 Subject: [PATCH 01/16] MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL] We already exclude wireless drivers from the netdev@ traffic, to delegate it to linux-wireless@, and avoid overwhelming netdev@. Many of the following wireless-related sections MAINTAINERS are already not included in the NETWORKING [GENERAL] section. For consistency, exclude those that are. * 802.11 (including CFG80211/NL80211) * MAC80211 * RFKILL Acked-by: Johannes Berg Signed-off-by: Simon Horman Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-1-f2c86e7309c8@kernel.org Signed-off-by: Jakub Kicinski --- MAINTAINERS | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 58380aeafbf0..89789bddfb6a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16197,8 +16197,19 @@ F: lib/random32.c F: net/ F: tools/net/ F: tools/testing/selftests/net/ +X: Documentation/networking/mac80211-injection.rst +X: Documentation/networking/mac80211_hwsim/ +X: Documentation/networking/regulatory.rst +X: include/net/cfg80211.h +X: include/net/ieee80211_radiotap.h +X: include/net/iw_handler.h +X: include/net/mac80211.h +X: include/net/wext.h X: net/9p/ X: net/bluetooth/ +X: net/mac80211/ +X: net/rfkill/ +X: net/wireless/ NETWORKING [IPSEC] M: Steffen Klassert -- 2.50.1 From 5404b5a2fea9831a1f5be4ab9a94de07d976b177 Mon Sep 17 00:00:00 2001 From: Simon Horman Date: Wed, 9 Oct 2024 09:47:23 +0100 Subject: [PATCH 02/16] MAINTAINERS: Add headers and mailing list to UDP section Add netdev mailing list and some more udp.h headers to the UDP section. This is now more consistent with the TCP section. Acked-by: Willem de Bruijn Signed-off-by: Simon Horman Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-2-f2c86e7309c8@kernel.org Signed-off-by: Jakub Kicinski --- MAINTAINERS | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 89789bddfb6a..baf2ba9dc070 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -24184,8 +24184,12 @@ F: drivers/usb/host/xhci* USER DATAGRAM PROTOCOL (UDP) M: Willem de Bruijn +L: netdev@vger.kernel.org S: Maintained F: include/linux/udp.h +F: include/net/udp.h +F: include/trace/events/udp.h +F: include/uapi/linux/udp.h F: net/ipv4/udp.c F: net/ipv6/udp.c -- 2.50.1 From f7345ccc62a4b880cf76458db5f320725f28e400 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Thu, 10 Oct 2024 18:36:09 +0200 Subject: [PATCH 03/16] rcu/nocb: Fix rcuog wake-up from offline softirq After a CPU has set itself offline and before it eventually calls rcutree_report_cpu_dead(), there are still opportunities for callbacks to be enqueued, for example from a softirq. When that happens on NOCB, the rcuog wake-up is deferred through an IPI to an online CPU in order not to call into the scheduler and risk arming the RT-bandwidth after hrtimers have been migrated out and disabled. But performing a synchronized IPI from a softirq is buggy as reported in the following scenario: WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single Modules linked in: rcutorture torture CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1 Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120 RIP: 0010:smp_call_function_single swake_up_one_online __call_rcu_nocb_wake __call_rcu_common ? rcu_torture_one_read call_timer_fn __run_timers run_timer_softirq handle_softirqs irq_exit_rcu ? tick_handle_periodic sysvec_apic_timer_interrupt Fix this with forcing deferred rcuog wake up through the NOCB timer when the CPU is offline. The actual wake up will happen from rcutree_report_cpu_dead(). Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202409231644.4c55582d-lkp@intel.com Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU") Reviewed-by: "Joel Fernandes (Google)" Signed-off-by: Frederic Weisbecker Signed-off-by: Neeraj Upadhyay --- kernel/rcu/tree_nocb.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index 97b99cd06923..16865475120b 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -554,13 +554,19 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, rcu_nocb_unlock(rdp); wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY, TPS("WakeLazy")); - } else if (!irqs_disabled_flags(flags)) { + } else if (!irqs_disabled_flags(flags) && cpu_online(rdp->cpu)) { /* ... if queue was empty ... */ rcu_nocb_unlock(rdp); wake_nocb_gp(rdp, false); trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeEmpty")); } else { + /* + * Don't do the wake-up upfront on fragile paths. + * Also offline CPUs can't call swake_up_one_online() from + * (soft-)IRQs. Rely on the final deferred wake-up from + * rcutree_report_cpu_dead() + */ rcu_nocb_unlock(rdp); wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE, TPS("WakeEmptyIsDeferred")); -- 2.50.1 From 6e0391e48cf9fb8b1b5e27c0cbbaf2e4639f2c33 Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Wed, 9 Oct 2024 13:41:31 -0700 Subject: [PATCH 04/16] of: Skip kunit tests when arm64+ACPI doesn't populate root node A root node is required to apply DT overlays. A root node is usually present after commit 7b937cc243e5 ("of: Create of_root if no dtb provided by firmware"), except for on arm64 systems booted with ACPI tables. In that case, the root node is intentionally not populated because it would "allow DT devices to be instantiated atop an ACPI base system"[1]. Introduce an OF function that skips the kunit test if the root node isn't populated. Limit the test to when both CONFIG_ARM64 and CONFIG_ACPI are set, because otherwise the lack of a root node is a bug. Make the function private and take a kunit test parameter so that it can't be abused to test for the presence of the root node in non-test code. Use this function to skip tests that require the root node. Currently that's the DT tests and any tests that apply overlays. Reported-by: Guenter Roeck Closes: https://lore.kernel.org/r/6cd337fb-38f0-41cb-b942-5844b84433db@roeck-us.net Link: https://lore.kernel.org/r/Zd4dQpHO7em1ji67@FVFF77S0Q05N.cambridge.arm.com [1] Fixes: 893ecc6d2d61 ("of: Add KUnit test to confirm DTB is loaded") Signed-off-by: Stephen Boyd Tested-by: Guenter Roeck Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20241009204133.1169931-1-sboyd@kernel.org Signed-off-by: Rob Herring (Arm) --- drivers/of/of_kunit_helpers.c | 15 +++++++++++++++ drivers/of/of_private.h | 3 +++ drivers/of/of_test.c | 3 +++ drivers/of/overlay_test.c | 3 +++ 4 files changed, 24 insertions(+) diff --git a/drivers/of/of_kunit_helpers.c b/drivers/of/of_kunit_helpers.c index 287d6c91bb37..7b3ed5a382aa 100644 --- a/drivers/of/of_kunit_helpers.c +++ b/drivers/of/of_kunit_helpers.c @@ -10,6 +10,19 @@ #include #include +#include "of_private.h" + +/** + * of_root_kunit_skip() - Skip test if the root node isn't populated + * @test: test to skip if the root node isn't populated + */ +void of_root_kunit_skip(struct kunit *test) +{ + if (IS_ENABLED(CONFIG_ARM64) && IS_ENABLED(CONFIG_ACPI) && !of_root) + kunit_skip(test, "arm64+acpi doesn't populate a root node"); +} +EXPORT_SYMBOL_GPL(of_root_kunit_skip); + #if defined(CONFIG_OF_OVERLAY) && defined(CONFIG_OF_EARLY_FLATTREE) static void of_overlay_fdt_apply_kunit_exit(void *ovcs_id) @@ -36,6 +49,8 @@ int of_overlay_fdt_apply_kunit(struct kunit *test, void *overlay_fdt, int ret; int *copy_id; + of_root_kunit_skip(test); + copy_id = kunit_kmalloc(test, sizeof(*copy_id), GFP_KERNEL); if (!copy_id) return -ENOMEM; diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h index 04aa2a91f851..c235d6c909a1 100644 --- a/drivers/of/of_private.h +++ b/drivers/of/of_private.h @@ -42,6 +42,9 @@ extern raw_spinlock_t devtree_lock; extern struct list_head aliases_lookup; extern struct kset *of_kset; +struct kunit; +extern void of_root_kunit_skip(struct kunit *test); + #if defined(CONFIG_OF_DYNAMIC) extern int of_property_notify(int action, struct device_node *np, struct property *prop, struct property *old_prop); diff --git a/drivers/of/of_test.c b/drivers/of/of_test.c index c85a258bc6ae..b0557ded838f 100644 --- a/drivers/of/of_test.c +++ b/drivers/of/of_test.c @@ -7,6 +7,8 @@ #include +#include "of_private.h" + /* * Test that the root node "/" can be found by path. */ @@ -36,6 +38,7 @@ static struct kunit_case of_dtb_test_cases[] = { static int of_dtb_test_init(struct kunit *test) { + of_root_kunit_skip(test); if (!IS_ENABLED(CONFIG_OF_EARLY_FLATTREE)) kunit_skip(test, "requires CONFIG_OF_EARLY_FLATTREE"); diff --git a/drivers/of/overlay_test.c b/drivers/of/overlay_test.c index 19695bdf77be..1f76d50fb16a 100644 --- a/drivers/of/overlay_test.c +++ b/drivers/of/overlay_test.c @@ -11,6 +11,8 @@ #include #include +#include "of_private.h" + static const char * const kunit_node_name = "kunit-test"; static const char * const kunit_compatible = "test,empty"; @@ -62,6 +64,7 @@ static void of_overlay_apply_kunit_cleanup(struct kunit *test) struct device *dev; struct device_node *np; + of_root_kunit_skip(test); if (!IS_ENABLED(CONFIG_OF_EARLY_FLATTREE)) kunit_skip(test, "requires CONFIG_OF_EARLY_FLATTREE for root node"); -- 2.50.1 From 8956c582ac6b1693a351230179f898979dd00bdf Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Sat, 5 Oct 2024 10:53:29 +0200 Subject: [PATCH 05/16] powerpc/8xx: Fix kernel DTLB miss on dcbz Following OOPS is encountered while loading test_bpf module on powerpc 8xx: [ 218.835567] BUG: Unable to handle kernel data access on write at 0xcb000000 [ 218.842473] Faulting instruction address: 0xc0017a80 [ 218.847451] Oops: Kernel access of bad area, sig: 11 [#1] [ 218.852854] BE PAGE_SIZE=16K PREEMPT CMPC885 [ 218.857207] SAF3000 DIE NOTIFICATION [ 218.860713] Modules linked in: test_bpf(+) test_module [ 218.865867] CPU: 0 UID: 0 PID: 527 Comm: insmod Not tainted 6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty #1280 [ 218.875546] Hardware name: MIAE 8xx 0x500000 CMPC885 [ 218.880521] NIP: c0017a80 LR: beab859c CTR: 000101d4 [ 218.885584] REGS: cac2bc90 TRAP: 0300 Not tainted (6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty) [ 218.894308] MSR: 00009032 CR: 55005555 XER: a0007100 [ 218.901290] DAR: cb000000 DSISR: c2000000 [ 218.901290] GPR00: 000185d1 cac2bd50 c21b9580 caf7c030 c3883fcc 00000008 cafffffc 00000000 [ 218.901290] GPR08: 00040000 18300000 20000000 00000004 99005555 100d815e ca669d08 00000369 [ 218.901290] GPR16: ca730000 00000000 ca2c004c 00000000 00000000 0000035d 00000311 00000369 [ 218.901290] GPR24: ca732240 00000001 00030ba3 c3800000 00000000 00185d48 caf7c000 ca2c004c [ 218.941087] NIP [c0017a80] memcpy+0x88/0xec [ 218.945277] LR [beab859c] test_bpf_init+0x22c/0x3c90 [test_bpf] [ 218.951476] Call Trace: [ 218.953916] [cac2bd50] [beab8570] test_bpf_init+0x200/0x3c90 [test_bpf] (unreliable) [ 218.962034] [cac2bde0] [c0004c04] do_one_initcall+0x4c/0x1fc [ 218.967706] [cac2be40] [c00a2ec4] do_init_module+0x68/0x360 [ 218.973292] [cac2be60] [c00a5194] init_module_from_file+0x8c/0xc0 [ 218.979401] [cac2bed0] [c00a5568] sys_finit_module+0x250/0x3f0 [ 218.985248] [cac2bf20] [c000e390] system_call_exception+0x8c/0x15c [ 218.991444] [cac2bf30] [c00120a8] ret_from_syscall+0x0/0x28 This happens in the main loop of memcpy() ==> c0017a80: 7c 0b 37 ec dcbz r11,r6 c0017a84: 80 e4 00 04 lwz r7,4(r4) c0017a88: 81 04 00 08 lwz r8,8(r4) c0017a8c: 81 24 00 0c lwz r9,12(r4) c0017a90: 85 44 00 10 lwzu r10,16(r4) c0017a94: 90 e6 00 04 stw r7,4(r6) c0017a98: 91 06 00 08 stw r8,8(r6) c0017a9c: 91 26 00 0c stw r9,12(r6) c0017aa0: 95 46 00 10 stwu r10,16(r6) c0017aa4: 42 00 ff dc bdnz c0017a80 Commit ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses") relies on re-reading DAR register to know if an error is due to a missing copy of a PMD entry in task's PGDIR, allthough DAR was already read in the exception prolog and copied into thread struct. This is because is it done very early in the exception and there are not enough registers available to keep a pointer to thread struct. However, dcbz instruction is buggy and doesn't update DAR register on fault. That is detected and generates a call to FixupDAR workaround which updates DAR copy in thread struct but doesn't fix DAR register. Let's fix DAR in addition to the update of DAR copy in thread struct. Fixes: ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses") Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://msgid.link/2b851399bd87e81c6ccb87ea3a7a6b32c7aa04d7.1728118396.git.christophe.leroy@csgroup.eu --- arch/powerpc/kernel/head_8xx.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 811a7130505c..56c5ebe21b99 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -494,6 +494,7 @@ FixupDAR:/* Entry point for dcbx workaround. */ bctr /* jump into table */ 152: mfdar r11 + mtdar r10 mtctr r11 /* restore ctr reg from DAR */ mfspr r11, SPRN_SPRG_THREAD stw r10, DAR(r11) -- 2.50.1 From a0cc649353bb726d4aa0db60dce467432197b746 Mon Sep 17 00:00:00 2001 From: Mathieu Desnoyers Date: Tue, 8 Oct 2024 21:28:01 -0400 Subject: [PATCH 06/16] selftests/rseq: Fix mm_cid test failure Adapt the rseq.c/rseq.h code to follow GNU C library changes introduced by: glibc commit 2e456ccf0c34 ("Linux: Make __rseq_size useful for feature detection (bug 31965)") Without this fix, rseq selftests for mm_cid fail: ./run_param_test.sh Default parameters Running test spinlock Running compare-twice test spinlock Running mm_cid test spinlock Error: cpu id getter unavailable Fixes: 18c2355838e7 ("selftests/rseq: Implement rseq mm_cid field support") Signed-off-by: Mathieu Desnoyers Cc: Peter Zijlstra CC: Boqun Feng CC: "Paul E. McKenney" Cc: Shuah Khan CC: Carlos O'Donell CC: Florian Weimer CC: linux-kselftest@vger.kernel.org CC: stable@vger.kernel.org Signed-off-by: Shuah Khan --- tools/testing/selftests/rseq/rseq.c | 110 +++++++++++++++++++--------- tools/testing/selftests/rseq/rseq.h | 10 +-- 2 files changed, 77 insertions(+), 43 deletions(-) diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c index 96e812bdf8a4..5b9772cdf265 100644 --- a/tools/testing/selftests/rseq/rseq.c +++ b/tools/testing/selftests/rseq/rseq.c @@ -60,12 +60,6 @@ unsigned int rseq_size = -1U; /* Flags used during rseq registration. */ unsigned int rseq_flags; -/* - * rseq feature size supported by the kernel. 0 if the registration was - * unsuccessful. - */ -unsigned int rseq_feature_size = -1U; - static int rseq_ownership; static int rseq_reg_success; /* At least one rseq registration has succeded. */ @@ -111,6 +105,43 @@ int rseq_available(void) } } +/* The rseq areas need to be at least 32 bytes. */ +static +unsigned int get_rseq_min_alloc_size(void) +{ + unsigned int alloc_size = rseq_size; + + if (alloc_size < ORIG_RSEQ_ALLOC_SIZE) + alloc_size = ORIG_RSEQ_ALLOC_SIZE; + return alloc_size; +} + +/* + * Return the feature size supported by the kernel. + * + * Depending on the value returned by getauxval(AT_RSEQ_FEATURE_SIZE): + * + * 0: Return ORIG_RSEQ_FEATURE_SIZE (20) + * > 0: Return the value from getauxval(AT_RSEQ_FEATURE_SIZE). + * + * It should never return a value below ORIG_RSEQ_FEATURE_SIZE. + */ +static +unsigned int get_rseq_kernel_feature_size(void) +{ + unsigned long auxv_rseq_feature_size, auxv_rseq_align; + + auxv_rseq_align = getauxval(AT_RSEQ_ALIGN); + assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE); + + auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE); + assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE); + if (auxv_rseq_feature_size) + return auxv_rseq_feature_size; + else + return ORIG_RSEQ_FEATURE_SIZE; +} + int rseq_register_current_thread(void) { int rc; @@ -119,7 +150,7 @@ int rseq_register_current_thread(void) /* Treat libc's ownership as a successful registration. */ return 0; } - rc = sys_rseq(&__rseq_abi, rseq_size, 0, RSEQ_SIG); + rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG); if (rc) { if (RSEQ_READ_ONCE(rseq_reg_success)) { /* Incoherent success/failure within process. */ @@ -140,28 +171,12 @@ int rseq_unregister_current_thread(void) /* Treat libc's ownership as a successful unregistration. */ return 0; } - rc = sys_rseq(&__rseq_abi, rseq_size, RSEQ_ABI_FLAG_UNREGISTER, RSEQ_SIG); + rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), RSEQ_ABI_FLAG_UNREGISTER, RSEQ_SIG); if (rc) return -1; return 0; } -static -unsigned int get_rseq_feature_size(void) -{ - unsigned long auxv_rseq_feature_size, auxv_rseq_align; - - auxv_rseq_align = getauxval(AT_RSEQ_ALIGN); - assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE); - - auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE); - assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE); - if (auxv_rseq_feature_size) - return auxv_rseq_feature_size; - else - return ORIG_RSEQ_FEATURE_SIZE; -} - static __attribute__((constructor)) void rseq_init(void) { @@ -178,28 +193,54 @@ void rseq_init(void) } if (libc_rseq_size_p && libc_rseq_offset_p && libc_rseq_flags_p && *libc_rseq_size_p != 0) { + unsigned int libc_rseq_size; + /* rseq registration owned by glibc */ rseq_offset = *libc_rseq_offset_p; - rseq_size = *libc_rseq_size_p; + libc_rseq_size = *libc_rseq_size_p; rseq_flags = *libc_rseq_flags_p; - rseq_feature_size = get_rseq_feature_size(); - if (rseq_feature_size > rseq_size) - rseq_feature_size = rseq_size; + + /* + * Previous versions of glibc expose the value + * 32 even though the kernel only supported 20 + * bytes initially. Therefore treat 32 as a + * special-case. glibc 2.40 exposes a 20 bytes + * __rseq_size without using getauxval(3) to + * query the supported size, while still allocating a 32 + * bytes area. Also treat 20 as a special-case. + * + * Special-cases are handled by using the following + * value as active feature set size: + * + * rseq_size = min(32, get_rseq_kernel_feature_size()) + */ + switch (libc_rseq_size) { + case ORIG_RSEQ_FEATURE_SIZE: + fallthrough; + case ORIG_RSEQ_ALLOC_SIZE: + { + unsigned int rseq_kernel_feature_size = get_rseq_kernel_feature_size(); + + if (rseq_kernel_feature_size < ORIG_RSEQ_ALLOC_SIZE) + rseq_size = rseq_kernel_feature_size; + else + rseq_size = ORIG_RSEQ_ALLOC_SIZE; + break; + } + default: + /* Otherwise just use the __rseq_size from libc as rseq_size. */ + rseq_size = libc_rseq_size; + break; + } return; } rseq_ownership = 1; if (!rseq_available()) { rseq_size = 0; - rseq_feature_size = 0; return; } rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer(); rseq_flags = 0; - rseq_feature_size = get_rseq_feature_size(); - if (rseq_feature_size == ORIG_RSEQ_FEATURE_SIZE) - rseq_size = ORIG_RSEQ_ALLOC_SIZE; - else - rseq_size = RSEQ_THREAD_AREA_ALLOC_SIZE; } static __attribute__((destructor)) @@ -209,7 +250,6 @@ void rseq_exit(void) return; rseq_offset = 0; rseq_size = -1U; - rseq_feature_size = -1U; rseq_ownership = 0; } diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h index d7364ea4d201..4e217b620e0c 100644 --- a/tools/testing/selftests/rseq/rseq.h +++ b/tools/testing/selftests/rseq/rseq.h @@ -68,12 +68,6 @@ extern unsigned int rseq_size; /* Flags used during rseq registration. */ extern unsigned int rseq_flags; -/* - * rseq feature size supported by the kernel. 0 if the registration was - * unsuccessful. - */ -extern unsigned int rseq_feature_size; - enum rseq_mo { RSEQ_MO_RELAXED = 0, RSEQ_MO_CONSUME = 1, /* Unused */ @@ -193,7 +187,7 @@ static inline uint32_t rseq_current_cpu(void) static inline bool rseq_node_id_available(void) { - return (int) rseq_feature_size >= rseq_offsetofend(struct rseq_abi, node_id); + return (int) rseq_size >= rseq_offsetofend(struct rseq_abi, node_id); } /* @@ -207,7 +201,7 @@ static inline uint32_t rseq_current_node_id(void) static inline bool rseq_mm_cid_available(void) { - return (int) rseq_feature_size >= rseq_offsetofend(struct rseq_abi, mm_cid); + return (int) rseq_size >= rseq_offsetofend(struct rseq_abi, mm_cid); } static inline uint32_t rseq_current_mm_cid(void) -- 2.50.1 From 4ee5ca9a29384fcf3f18232fdf8474166dea8dca Mon Sep 17 00:00:00 2001 From: Steven Rostedt Date: Thu, 10 Oct 2024 16:52:35 -0400 Subject: [PATCH 07/16] ftrace/selftest: Test combination of function_graph tracer and function profiler Masami reported a bug when running function graph tracing then the function profiler. The following commands would cause a kernel crash: # cd /sys/kernel/tracing/ # echo function_graph > current_tracer # echo 1 > function_profile_enabled In that order. Create a test to test this two to make sure this does not come back as a regression. Link: https://lore.kernel.org/172398528350.293426.8347220120333730248.stgit@devnote2 Link: https://lore.kernel.org/all/20241010165235.35122877@gandalf.local.home/ Acked-by: Masami Hiramatsu (Google) Signed-off-by: Steven Rostedt (Google) Signed-off-by: Shuah Khan --- .../ftrace/test.d/ftrace/fgraph-profiler.tc | 31 +++++++++++++++++++ 1 file changed, 31 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/fgraph-profiler.tc diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-profiler.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-profiler.tc new file mode 100644 index 000000000000..ffff8646733c --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-profiler.tc @@ -0,0 +1,31 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: ftrace - function profiler with function graph tracing +# requires: function_profile_enabled set_ftrace_filter function_graph:tracer + +# The function graph tracer can now be run along side of the function +# profiler. But there was a bug that caused the combination of the two +# to crash. It also required the function graph tracer to be started +# first. +# +# This test triggers that bug +# +# We need both function_graph and profiling to run this test + +fail() { # mesg + echo $1 + exit_fail +} + +echo "Enabling function graph tracer:" +echo function_graph > current_tracer +echo "enable profiler" + +# Older kernels do not allow function_profile to be enabled with +# function graph tracer. If the below fails, mark it as unsupported +echo 1 > function_profile_enabled || exit_unsupported + +# Let it run for a bit to make sure nothing explodes +sleep 1 + +exit 0 -- 2.50.1 From 8e929cb546ee42c9a61d24fae60605e9e3192354 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 13 Oct 2024 14:33:32 -0700 Subject: [PATCH 08/16] Linux 6.12-rc3 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index c5493c0c0ca1..8cf3cf528892 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 6 PATCHLEVEL = 12 SUBLEVEL = 0 -EXTRAVERSION = -rc2 +EXTRAVERSION = -rc3 NAME = Baby Opossum Posse # *DOCUMENTATION* -- 2.50.1 From 9803787a23c57328cd70c393a661266c396d12fb Mon Sep 17 00:00:00 2001 From: =?utf8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Fri, 4 Oct 2024 17:31:20 +0200 Subject: [PATCH 09/16] landlock: Improve documentation of previous limitations MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Improve consistency of previous limitations' subsection titles, and expand a bit the IOCTL section. This changes some HTML anchors and may break some external links though. Cc: Konstantin Meskhidze Cc: Tahera Fahimi Reviewed-by: Günther Noack Link: https://lore.kernel.org/r/20241004153122.501775-1-mic@digikod.net Signed-off-by: Mickaël Salaün --- Documentation/userspace-api/landlock.rst | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst index c8d3e46badc5..bb7480a05e2c 100644 --- a/Documentation/userspace-api/landlock.rst +++ b/Documentation/userspace-api/landlock.rst @@ -8,7 +8,7 @@ Landlock: unprivileged access control ===================================== :Author: Mickaël Salaün -:Date: September 2024 +:Date: October 2024 The goal of Landlock is to enable to restrict ambient rights (e.g. global filesystem or network access) for a set of processes. Because Landlock @@ -563,33 +563,34 @@ always allowed when using a kernel that only supports the first or second ABI. Starting with the Landlock ABI version 3, it is now possible to securely control truncation thanks to the new ``LANDLOCK_ACCESS_FS_TRUNCATE`` access right. -Network support (ABI < 4) -------------------------- +TCP bind and connect (ABI < 4) +------------------------------ Starting with the Landlock ABI version 4, it is now possible to restrict TCP bind and connect actions to only a set of allowed ports thanks to the new ``LANDLOCK_ACCESS_NET_BIND_TCP`` and ``LANDLOCK_ACCESS_NET_CONNECT_TCP`` access rights. -IOCTL (ABI < 5) ---------------- +Device IOCTL (ABI < 5) +---------------------- IOCTL operations could not be denied before the fifth Landlock ABI, so :manpage:`ioctl(2)` is always allowed when using a kernel that only supports an earlier ABI. Starting with the Landlock ABI version 5, it is possible to restrict the use of -:manpage:`ioctl(2)` using the new ``LANDLOCK_ACCESS_FS_IOCTL_DEV`` right. +:manpage:`ioctl(2)` on character and block devices using the new +``LANDLOCK_ACCESS_FS_IOCTL_DEV`` right. -Abstract UNIX socket scoping (ABI < 6) --------------------------------------- +Abstract UNIX socket (ABI < 6) +------------------------------ Starting with the Landlock ABI version 6, it is possible to restrict connections to an abstract :manpage:`unix(7)` socket by setting ``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET`` to the ``scoped`` ruleset attribute. -Signal scoping (ABI < 6) ------------------------- +Signal (ABI < 6) +---------------- Starting with the Landlock ABI version 6, it is possible to restrict :manpage:`signal(7)` sending by setting ``LANDLOCK_SCOPE_SIGNAL`` to the -- 2.50.1 From dad2f20715163e80aab284fb092efc8c18bf97c7 Mon Sep 17 00:00:00 2001 From: Daniel Burgener Date: Tue, 15 Oct 2024 13:26:46 -0400 Subject: [PATCH 10/16] landlock: Fix grammar issues in documentation MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Improve user space and kernel documentation. Signed-off-by: Daniel Burgener Link: https://lore.kernel.org/r/20241015172647.2007644-1-dburgener@linux.microsoft.com [mic: Extend commit message, reword ptrace restriction as discussed in the thread] Signed-off-by: Mickaël Salaün --- Documentation/security/landlock.rst | 14 ++--- Documentation/userspace-api/landlock.rst | 69 ++++++++++++------------ 2 files changed, 41 insertions(+), 42 deletions(-) diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst index 36f26501fd15..59ecdb1c0d4d 100644 --- a/Documentation/security/landlock.rst +++ b/Documentation/security/landlock.rst @@ -11,18 +11,18 @@ Landlock LSM: kernel documentation Landlock's goal is to create scoped access-control (i.e. sandboxing). To harden a whole system, this feature should be available to any process, -including unprivileged ones. Because such process may be compromised or +including unprivileged ones. Because such a process may be compromised or backdoored (i.e. untrusted), Landlock's features must be safe to use from the kernel and other processes point of view. Landlock's interface must therefore expose a minimal attack surface. Landlock is designed to be usable by unprivileged processes while following the system security policy enforced by other access control mechanisms (e.g. DAC, -LSM). Indeed, a Landlock rule shall not interfere with other access-controls -enforced on the system, only add more restrictions. +LSM). A Landlock rule shall not interfere with other access-controls enforced +on the system, only add more restrictions. Any user can enforce Landlock rulesets on their processes. They are merged and -evaluated according to the inherited ones in a way that ensures that only more +evaluated against inherited rulesets in a way that ensures that only more constraints can be added. User space documentation can be found here: @@ -43,7 +43,7 @@ Guiding principles for safe access controls only impact the processes requesting them. * Resources (e.g. file descriptors) directly obtained from the kernel by a sandboxed process shall retain their scoped accesses (at the time of resource - acquisition) whatever process use them. + acquisition) whatever process uses them. Cf. `File descriptor access rights`_. Design choices @@ -71,7 +71,7 @@ the same results, when they are executed under the same Landlock domain. Taking the ``LANDLOCK_ACCESS_FS_TRUNCATE`` right as an example, it may be allowed to open a file for writing without being allowed to :manpage:`ftruncate` the resulting file descriptor if the related file -hierarchy doesn't grant such access right. The following sequences of +hierarchy doesn't grant that access right. The following sequences of operations have the same semantic and should then have the same result: * ``truncate(path);`` @@ -81,7 +81,7 @@ Similarly to file access modes (e.g. ``O_RDWR``), Landlock access rights attached to file descriptors are retained even if they are passed between processes (e.g. through a Unix domain socket). Such access rights will then be enforced even if the receiving process is not sandboxed by Landlock. Indeed, -this is required to keep a consistent access control over the whole system, and +this is required to keep access controls consistent over the whole system, and this avoids unattended bypasses through file descriptor passing (i.e. confused deputy attack). diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst index bb7480a05e2c..d639c61cb472 100644 --- a/Documentation/userspace-api/landlock.rst +++ b/Documentation/userspace-api/landlock.rst @@ -10,11 +10,11 @@ Landlock: unprivileged access control :Author: Mickaël Salaün :Date: October 2024 -The goal of Landlock is to enable to restrict ambient rights (e.g. global +The goal of Landlock is to enable restriction of ambient rights (e.g. global filesystem or network access) for a set of processes. Because Landlock -is a stackable LSM, it makes possible to create safe security sandboxes as new -security layers in addition to the existing system-wide access-controls. This -kind of sandbox is expected to help mitigate the security impact of bugs or +is a stackable LSM, it makes it possible to create safe security sandboxes as +new security layers in addition to the existing system-wide access-controls. +This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves. @@ -86,8 +86,8 @@ to be explicit about the denied-by-default access rights. LANDLOCK_SCOPE_SIGNAL, }; -Because we may not know on which kernel version an application will be -executed, it is safer to follow a best-effort security approach. Indeed, we +Because we may not know which kernel version an application will be executed +on, it is safer to follow a best-effort security approach. Indeed, we should try to protect users as much as possible whatever the kernel they are using. @@ -129,7 +129,7 @@ version, and only use the available subset of access rights: LANDLOCK_SCOPE_SIGNAL); } -This enables to create an inclusive ruleset that will contain our rules. +This enables the creation of an inclusive ruleset that will contain our rules. .. code-block:: c @@ -219,42 +219,41 @@ If the ``landlock_restrict_self`` system call succeeds, the current thread is now restricted and this policy will be enforced on all its subsequently created children as well. Once a thread is landlocked, there is no way to remove its security policy; only adding more restrictions is allowed. These threads are -now in a new Landlock domain, merge of their parent one (if any) with the new -ruleset. +now in a new Landlock domain, which is a merger of their parent one (if any) +with the new ruleset. Full working code can be found in `samples/landlock/sandboxer.c`_. Good practices -------------- -It is recommended setting access rights to file hierarchy leaves as much as +It is recommended to set access rights to file hierarchy leaves as much as possible. For instance, it is better to be able to have ``~/doc/`` as a read-only hierarchy and ``~/tmp/`` as a read-write hierarchy, compared to ``~/`` as a read-only hierarchy and ``~/tmp/`` as a read-write hierarchy. Following this good practice leads to self-sufficient hierarchies that do not depend on their location (i.e. parent directories). This is particularly relevant when we want to allow linking or renaming. Indeed, having consistent -access rights per directory enables to change the location of such directory +access rights per directory enables changing the location of such directories without relying on the destination directory access rights (except those that are required for this operation, see ``LANDLOCK_ACCESS_FS_REFER`` documentation). Having self-sufficient hierarchies also helps to tighten the required access rights to the minimal set of data. This also helps avoid sinkhole directories, -i.e. directories where data can be linked to but not linked from. However, +i.e. directories where data can be linked to but not linked from. However, this depends on data organization, which might not be controlled by developers. In this case, granting read-write access to ``~/tmp/``, instead of write-only -access, would potentially allow to move ``~/tmp/`` to a non-readable directory +access, would potentially allow moving ``~/tmp/`` to a non-readable directory and still keep the ability to list the content of ``~/tmp/``. Layers of file path access rights --------------------------------- Each time a thread enforces a ruleset on itself, it updates its Landlock domain -with a new layer of policy. Indeed, this complementary policy is stacked with -the potentially other rulesets already restricting this thread. A sandboxed -thread can then safely add more constraints to itself with a new enforced -ruleset. +with a new layer of policy. This complementary policy is stacked with any +other rulesets potentially already restricting this thread. A sandboxed thread +can then safely add more constraints to itself with a new enforced ruleset. One policy layer grants access to a file path if at least one of its rules encountered on the path grants the access. A sandboxed thread can only access @@ -265,7 +264,7 @@ etc.). Bind mounts and OverlayFS ------------------------- -Landlock enables to restrict access to file hierarchies, which means that these +Landlock enables restricting access to file hierarchies, which means that these access rights can be propagated with bind mounts (cf. Documentation/filesystems/sharedsubtree.rst) but not with Documentation/filesystems/overlayfs.rst. @@ -278,21 +277,21 @@ access to multiple file hierarchies at the same time, whether these hierarchies are the result of bind mounts or not. An OverlayFS mount point consists of upper and lower layers. These layers are -combined in a merge directory, result of the mount point. This merge hierarchy -may include files from the upper and lower layers, but modifications performed -on the merge hierarchy only reflects on the upper layer. From a Landlock -policy point of view, each OverlayFS layers and merge hierarchies are -standalone and contains their own set of files and directories, which is -different from bind mounts. A policy restricting an OverlayFS layer will not -restrict the resulted merged hierarchy, and vice versa. Landlock users should -then only think about file hierarchies they want to allow access to, regardless -of the underlying filesystem. +combined in a merge directory, and that merged directory becomes available at +the mount point. This merge hierarchy may include files from the upper and +lower layers, but modifications performed on the merge hierarchy only reflect +on the upper layer. From a Landlock policy point of view, all OverlayFS layers +and merge hierarchies are standalone and each contains their own set of files +and directories, which is different from bind mounts. A policy restricting an +OverlayFS layer will not restrict the resulted merged hierarchy, and vice versa. +Landlock users should then only think about file hierarchies they want to allow +access to, regardless of the underlying filesystem. Inheritance ----------- Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain -restrictions from its parent. This is similar to the seccomp inheritance (cf. +restrictions from its parent. This is similar to seccomp inheritance (cf. Documentation/userspace-api/seccomp_filter.rst) or any other LSM dealing with task's :manpage:`credentials(7)`. For instance, one process's thread may apply Landlock rules to itself, but they will not be automatically applied to other @@ -311,8 +310,8 @@ Ptrace restrictions A sandboxed process has less privileges than a non-sandboxed process and must then be subject to additional restrictions when manipulating another process. To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target -process, a sandboxed process should have a subset of the target process rules, -which means the tracee must be in a sub-domain of the tracer. +process, a sandboxed process should have a superset of the target process's +access rights, which means the tracee must be in a sub-domain of the tracer. IPC scoping ----------- @@ -322,7 +321,7 @@ interactions between sandboxes. Each Landlock domain can be explicitly scoped for a set of actions by specifying it on a ruleset. For example, if a sandboxed process should not be able to :manpage:`connect(2)` to a non-sandboxed process through abstract :manpage:`unix(7)` sockets, we can -specify such restriction with ``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET``. +specify such a restriction with ``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET``. Moreover, if a sandboxed process should not be able to send a signal to a non-sandboxed process, we can specify this restriction with ``LANDLOCK_SCOPE_SIGNAL``. @@ -394,7 +393,7 @@ Backward and forward compatibility Landlock is designed to be compatible with past and future versions of the kernel. This is achieved thanks to the system call attributes and the associated bitflags, particularly the ruleset's ``handled_access_fs``. Making -handled access right explicit enables the kernel and user space to have a clear +handled access rights explicit enables the kernel and user space to have a clear contract with each other. This is required to make sure sandboxing will not get stricter with a system update, which could break applications. @@ -606,9 +605,9 @@ Build time configuration Landlock was first introduced in Linux 5.13 but it must be configured at build time with ``CONFIG_SECURITY_LANDLOCK=y``. Landlock must also be enabled at boot -time as the other security modules. The list of security modules enabled by +time like other security modules. The list of security modules enabled by default is set with ``CONFIG_LSM``. The kernel configuration should then -contains ``CONFIG_LSM=landlock,[...]`` with ``[...]`` as the list of other +contain ``CONFIG_LSM=landlock,[...]`` with ``[...]`` as the list of other potentially useful security modules for the running system (see the ``CONFIG_LSM`` help). @@ -670,7 +669,7 @@ Questions and answers What about user space sandbox managers? --------------------------------------- -Using user space process to enforce restrictions on kernel resources can lead +Using user space processes to enforce restrictions on kernel resources can lead to race conditions or inconsistent evaluations (i.e. `Incorrect mirroring of the OS code and state `_). -- 2.50.1 From 387285530d1d4bdba8c5dff5aeabd8d71638173f Mon Sep 17 00:00:00 2001 From: Matthieu Buffet Date: Sat, 19 Oct 2024 17:15:32 +0200 Subject: [PATCH 11/16] samples/landlock: Fix port parsing in sandboxer MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit If you want to specify that no port can be bind()ed, you would think (looking quickly at both help message and code) that setting LL_TCP_BIND="" would do it. However the code splits on ":" then applies atoi(), which does not allow checking for errors. Passing an empty string returns 0, which is interpreted as "allow bind(0)", which means bind to any ephemeral port. This bug occurs whenever passing an empty string or when leaving a trailing/leading colon, making it impossible to completely deny bind(). To reproduce: export LL_FS_RO="/" LL_FS_RW="" LL_TCP_BIND="" ./sandboxer strace -e bind nc -n -vvv -l -p 0 Executing the sandboxed command... bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 Listening on 0.0.0.0 37629 Use strtoull(3) instead, which allows error checking. Check that the entire string has been parsed correctly without overflows/underflows, but not that the __u64 (the type of struct landlock_net_port_attr.port) is a valid __u16 port: that is already done by the kernel. Fixes: 5e990dcef12e ("samples/landlock: Support TCP restrictions") Signed-off-by: Matthieu Buffet Link: https://lore.kernel.org/r/20241019151534.1400605-2-matthieu@buffet.re Signed-off-by: Mickaël Salaün --- samples/landlock/sandboxer.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c index f847e832ba14..4cbef9d2f15b 100644 --- a/samples/landlock/sandboxer.c +++ b/samples/landlock/sandboxer.c @@ -60,6 +60,25 @@ static inline int landlock_restrict_self(const int ruleset_fd, #define ENV_SCOPED_NAME "LL_SCOPED" #define ENV_DELIMITER ":" +static int str2num(const char *numstr, __u64 *num_dst) +{ + char *endptr = NULL; + int err = 0; + __u64 num; + + errno = 0; + num = strtoull(numstr, &endptr, 10); + if (errno != 0) + err = errno; + /* Was the string empty, or not entirely parsed successfully? */ + else if ((*numstr == '\0') || (*endptr != '\0')) + err = EINVAL; + else + *num_dst = num; + + return err; +} + static int parse_path(char *env_path, const char ***const path_list) { int i, num_paths = 0; @@ -160,7 +179,6 @@ static int populate_ruleset_net(const char *const env_var, const int ruleset_fd, char *env_port_name, *env_port_name_next, *strport; struct landlock_net_port_attr net_port = { .allowed_access = allowed_access, - .port = 0, }; env_port_name = getenv(env_var); @@ -171,7 +189,17 @@ static int populate_ruleset_net(const char *const env_var, const int ruleset_fd, env_port_name_next = env_port_name; while ((strport = strsep(&env_port_name_next, ENV_DELIMITER))) { - net_port.port = atoi(strport); + __u64 port; + + if (strcmp(strport, "") == 0) + continue; + + if (str2num(strport, &port)) { + fprintf(stderr, "Failed to parse port at \"%s\"\n", + strport); + goto out_free_name; + } + net_port.port = port; if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, &net_port, 0)) { fprintf(stderr, -- 2.50.1 From f51e55a0892bd2030c847d4583c12498bb93f812 Mon Sep 17 00:00:00 2001 From: Matthieu Buffet Date: Sat, 19 Oct 2024 17:15:33 +0200 Subject: [PATCH 12/16] samples/landlock: Refactor help message MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Help message is getting larger with each new supported feature (scopes, and soon UDP). Also the large number of calls to fprintf with environment variables make it hard to read. Refactor it away into a single simpler constant format string. Signed-off-by: Matthieu Buffet Link: https://lore.kernel.org/r/20241019151534.1400605-3-matthieu@buffet.re [mic: Move the small cleanups in the next commit] Signed-off-by: Mickaël Salaün --- samples/landlock/sandboxer.c | 79 +++++++++++++++++------------------- 1 file changed, 38 insertions(+), 41 deletions(-) diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c index 4cbef9d2f15b..dba143f62bf5 100644 --- a/samples/landlock/sandboxer.c +++ b/samples/landlock/sandboxer.c @@ -290,6 +290,43 @@ out_unset: #define LANDLOCK_ABI_LAST 6 +#define XSTR(s) #s +#define STR(s) XSTR(s) + +/* clang-format off */ + +static const char help[] = + "usage: " + ENV_FS_RO_NAME "=\"...\" " + ENV_FS_RW_NAME "=\"...\" " + ENV_TCP_BIND_NAME "=\"...\" " + ENV_TCP_CONNECT_NAME "=\"...\" " + ENV_SCOPED_NAME "=\"...\" %1$s [args]...\n" + "\n" + "Execute a command in a restricted environment.\n" + "\n" + "Environment variables containing paths and ports each separated by a colon:\n" + "* " ENV_FS_RO_NAME ": list of paths allowed to be used in a read-only way.\n" + "* " ENV_FS_RW_NAME ": list of paths allowed to be used in a read-write way.\n" + "\n" + "Environment variables containing ports are optional and could be skipped.\n" + "* " ENV_TCP_BIND_NAME ": list of ports allowed to bind (server).\n" + "* " ENV_TCP_CONNECT_NAME ": list of ports allowed to connect (client).\n" + "* " ENV_SCOPED_NAME ": list of scoped IPCs.\n" + "\n" + "example:\n" + ENV_FS_RO_NAME "=\"${PATH}:/lib:/usr:/proc:/etc:/dev/urandom\" " + ENV_FS_RW_NAME "=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" " + ENV_TCP_BIND_NAME "=\"9418\" " + ENV_TCP_CONNECT_NAME "=\"80:443\" " + ENV_SCOPED_NAME "=\"a:s\" " + "%1$s bash -i\n" + "\n" + "This sandboxer can use Landlock features up to ABI version " + STR(LANDLOCK_ABI_LAST) ".\n"; + +/* clang-format on */ + int main(const int argc, char *const argv[], char *const *const envp) { const char *cmd_path; @@ -308,47 +345,7 @@ int main(const int argc, char *const argv[], char *const *const envp) }; if (argc < 2) { - fprintf(stderr, - "usage: %s=\"...\" %s=\"...\" %s=\"...\" %s=\"...\" %s=\"...\" %s " - " [args]...\n\n", - ENV_FS_RO_NAME, ENV_FS_RW_NAME, ENV_TCP_BIND_NAME, - ENV_TCP_CONNECT_NAME, ENV_SCOPED_NAME, argv[0]); - fprintf(stderr, - "Execute a command in a restricted environment.\n\n"); - fprintf(stderr, - "Environment variables containing paths and ports " - "each separated by a colon:\n"); - fprintf(stderr, - "* %s: list of paths allowed to be used in a read-only way.\n", - ENV_FS_RO_NAME); - fprintf(stderr, - "* %s: list of paths allowed to be used in a read-write way.\n\n", - ENV_FS_RW_NAME); - fprintf(stderr, - "Environment variables containing ports are optional " - "and could be skipped.\n"); - fprintf(stderr, - "* %s: list of ports allowed to bind (server).\n", - ENV_TCP_BIND_NAME); - fprintf(stderr, - "* %s: list of ports allowed to connect (client).\n", - ENV_TCP_CONNECT_NAME); - fprintf(stderr, "* %s: list of scoped IPCs.\n", - ENV_SCOPED_NAME); - fprintf(stderr, - "\nexample:\n" - "%s=\"${PATH}:/lib:/usr:/proc:/etc:/dev/urandom\" " - "%s=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" " - "%s=\"9418\" " - "%s=\"80:443\" " - "%s=\"a:s\" " - "%s bash -i\n\n", - ENV_FS_RO_NAME, ENV_FS_RW_NAME, ENV_TCP_BIND_NAME, - ENV_TCP_CONNECT_NAME, ENV_SCOPED_NAME, argv[0]); - fprintf(stderr, - "This sandboxer can use Landlock features " - "up to ABI version %d.\n", - LANDLOCK_ABI_LAST); + fprintf(stderr, help, argv[0]); return 1; } -- 2.50.1 From 53b9d789df983790015ef04b0283ac5a33917cad Mon Sep 17 00:00:00 2001 From: Matthieu Buffet Date: Sat, 19 Oct 2024 17:15:34 +0200 Subject: [PATCH 13/16] samples/landlock: Clarify option parsing behaviour MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Clarify the distinction between filesystem variables (mandatory) and all others (optional). For optional variables, explain the difference between unset variables (no access check performed) and empty variables (nothing allowed for lists of allowed paths/ports, or no effect for lists of scopes). List the known LL_SCOPED values and their effect. Signed-off-by: Matthieu Buffet Link: https://lore.kernel.org/r/20241019151534.1400605-4-matthieu@buffet.re [mic: Add a missing colon] Signed-off-by: Mickaël Salaün --- samples/landlock/sandboxer.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c index dba143f62bf5..57565dfd74a2 100644 --- a/samples/landlock/sandboxer.c +++ b/samples/landlock/sandboxer.c @@ -296,25 +296,26 @@ out_unset: /* clang-format off */ static const char help[] = - "usage: " - ENV_FS_RO_NAME "=\"...\" " - ENV_FS_RW_NAME "=\"...\" " - ENV_TCP_BIND_NAME "=\"...\" " - ENV_TCP_CONNECT_NAME "=\"...\" " - ENV_SCOPED_NAME "=\"...\" %1$s [args]...\n" + "usage: " ENV_FS_RO_NAME "=\"...\" " ENV_FS_RW_NAME "=\"...\" " + "[other environment variables] %1$s [args]...\n" "\n" - "Execute a command in a restricted environment.\n" + "Execute the given command in a restricted environment.\n" + "Multi-valued settings (lists of ports, paths, scopes) are colon-delimited.\n" "\n" - "Environment variables containing paths and ports each separated by a colon:\n" - "* " ENV_FS_RO_NAME ": list of paths allowed to be used in a read-only way.\n" - "* " ENV_FS_RW_NAME ": list of paths allowed to be used in a read-write way.\n" + "Mandatory settings:\n" + "* " ENV_FS_RO_NAME ": paths allowed to be used in a read-only way\n" + "* " ENV_FS_RW_NAME ": paths allowed to be used in a read-write way\n" "\n" - "Environment variables containing ports are optional and could be skipped.\n" - "* " ENV_TCP_BIND_NAME ": list of ports allowed to bind (server).\n" - "* " ENV_TCP_CONNECT_NAME ": list of ports allowed to connect (client).\n" - "* " ENV_SCOPED_NAME ": list of scoped IPCs.\n" + "Optional settings (when not set, their associated access check " + "is always allowed, which is different from an empty string which " + "means an empty list):\n" + "* " ENV_TCP_BIND_NAME ": ports allowed to bind (server)\n" + "* " ENV_TCP_CONNECT_NAME ": ports allowed to connect (client)\n" + "* " ENV_SCOPED_NAME ": actions denied on the outside of the landlock domain\n" + " - \"a\" to restrict opening abstract unix sockets\n" + " - \"s\" to restrict sending signals\n" "\n" - "example:\n" + "Example:\n" ENV_FS_RO_NAME "=\"${PATH}:/lib:/usr:/proc:/etc:/dev/urandom\" " ENV_FS_RW_NAME "=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" " ENV_TCP_BIND_NAME "=\"9418\" " -- 2.50.1 From 0c0effb07f7d662af3e6f74da4d34241e412029b Mon Sep 17 00:00:00 2001 From: =?utf8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Sat, 9 Nov 2024 12:08:54 +0100 Subject: [PATCH 14/16] landlock: Refactor filesystem access mask management MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Replace get_raw_handled_fs_accesses() with a generic landlock_union_access_masks(), and replace get_fs_domain() with a generic landlock_get_applicable_domain(). These helpers will also be useful for other types of access. Cc: Mikhail Ivanov Reviewed-by: Günther Noack Link: https://lore.kernel.org/r/20241109110856.222842-2-mic@digikod.net [mic: Slightly improve doc as suggested by Günther] Signed-off-by: Mickaël Salaün --- security/landlock/fs.c | 31 ++++----------- security/landlock/ruleset.h | 74 ++++++++++++++++++++++++++++++++---- security/landlock/syscalls.c | 2 +- 3 files changed, 75 insertions(+), 32 deletions(-) diff --git a/security/landlock/fs.c b/security/landlock/fs.c index 7d79fc8abe21..e31b97a9f175 100644 --- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -388,38 +388,22 @@ static bool is_nouser_or_private(const struct dentry *dentry) unlikely(IS_PRIVATE(d_backing_inode(dentry)))); } -static access_mask_t -get_raw_handled_fs_accesses(const struct landlock_ruleset *const domain) -{ - access_mask_t access_dom = 0; - size_t layer_level; - - for (layer_level = 0; layer_level < domain->num_layers; layer_level++) - access_dom |= - landlock_get_raw_fs_access_mask(domain, layer_level); - return access_dom; -} - static access_mask_t get_handled_fs_accesses(const struct landlock_ruleset *const domain) { /* Handles all initially denied by default access rights. */ - return get_raw_handled_fs_accesses(domain) | + return landlock_union_access_masks(domain).fs | LANDLOCK_ACCESS_FS_INITIALLY_DENIED; } -static const struct landlock_ruleset * -get_fs_domain(const struct landlock_ruleset *const domain) -{ - if (!domain || !get_raw_handled_fs_accesses(domain)) - return NULL; - - return domain; -} +static const struct access_masks any_fs = { + .fs = ~0, +}; static const struct landlock_ruleset *get_current_fs_domain(void) { - return get_fs_domain(landlock_get_current_domain()); + return landlock_get_applicable_domain(landlock_get_current_domain(), + any_fs); } /* @@ -1517,7 +1501,8 @@ static int hook_file_open(struct file *const file) access_mask_t open_access_request, full_access_request, allowed_access, optional_access; const struct landlock_ruleset *const dom = - get_fs_domain(landlock_cred(file->f_cred)->domain); + landlock_get_applicable_domain( + landlock_cred(file->f_cred)->domain, any_fs); if (!dom) return 0; diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h index 61bdbc550172..631e24d4ffe9 100644 --- a/security/landlock/ruleset.h +++ b/security/landlock/ruleset.h @@ -11,6 +11,7 @@ #include #include +#include #include #include #include @@ -47,6 +48,15 @@ struct access_masks { access_mask_t scope : LANDLOCK_NUM_SCOPE; }; +union access_masks_all { + struct access_masks masks; + u32 all; +}; + +/* Makes sure all fields are covered. */ +static_assert(sizeof(typeof_member(union access_masks_all, masks)) == + sizeof(typeof_member(union access_masks_all, all))); + typedef u16 layer_mask_t; /* Makes sure all layers can be checked. */ static_assert(BITS_PER_TYPE(layer_mask_t) >= LANDLOCK_MAX_NUM_LAYERS); @@ -260,6 +270,61 @@ static inline void landlock_get_ruleset(struct landlock_ruleset *const ruleset) refcount_inc(&ruleset->usage); } +/** + * landlock_union_access_masks - Return all access rights handled in the + * domain + * + * @domain: Landlock ruleset (used as a domain) + * + * Returns: an access_masks result of the OR of all the domain's access masks. + */ +static inline struct access_masks +landlock_union_access_masks(const struct landlock_ruleset *const domain) +{ + union access_masks_all matches = {}; + size_t layer_level; + + for (layer_level = 0; layer_level < domain->num_layers; layer_level++) { + union access_masks_all layer = { + .masks = domain->access_masks[layer_level], + }; + + matches.all |= layer.all; + } + + return matches.masks; +} + +/** + * landlock_get_applicable_domain - Return @domain if it applies to (handles) + * at least one of the access rights specified + * in @masks + * + * @domain: Landlock ruleset (used as a domain) + * @masks: access masks + * + * Returns: @domain if any access rights specified in @masks is handled, or + * NULL otherwise. + */ +static inline const struct landlock_ruleset * +landlock_get_applicable_domain(const struct landlock_ruleset *const domain, + const struct access_masks masks) +{ + const union access_masks_all masks_all = { + .masks = masks, + }; + union access_masks_all merge = {}; + + if (!domain) + return NULL; + + merge.masks = landlock_union_access_masks(domain); + if (merge.all & masks_all.all) + return domain; + + return NULL; +} + static inline void landlock_add_fs_access_mask(struct landlock_ruleset *const ruleset, const access_mask_t fs_access_mask, @@ -295,19 +360,12 @@ landlock_add_scope_mask(struct landlock_ruleset *const ruleset, ruleset->access_masks[layer_level].scope |= mask; } -static inline access_mask_t -landlock_get_raw_fs_access_mask(const struct landlock_ruleset *const ruleset, - const u16 layer_level) -{ - return ruleset->access_masks[layer_level].fs; -} - static inline access_mask_t landlock_get_fs_access_mask(const struct landlock_ruleset *const ruleset, const u16 layer_level) { /* Handles all initially denied by default access rights. */ - return landlock_get_raw_fs_access_mask(ruleset, layer_level) | + return ruleset->access_masks[layer_level].fs | LANDLOCK_ACCESS_FS_INITIALLY_DENIED; } diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c index f5a0e7182ec0..c097d356fa45 100644 --- a/security/landlock/syscalls.c +++ b/security/landlock/syscalls.c @@ -329,7 +329,7 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset, return -ENOMSG; /* Checks that allowed_access matches the @ruleset constraints. */ - mask = landlock_get_raw_fs_access_mask(ruleset, 0); + mask = ruleset->access_masks[0].fs; if ((path_beneath_attr.allowed_access | mask) != mask) return -EINVAL; -- 2.50.1 From 8376226e5f53e78cd16a2b23577304e43acb3ba4 Mon Sep 17 00:00:00 2001 From: =?utf8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Sat, 9 Nov 2024 12:08:55 +0100 Subject: [PATCH 15/16] landlock: Refactor network access mask management MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Replace get_raw_handled_net_accesses() and get_current_net_domain() with a call to landlock_get_applicable_domain(). Cc: Konstantin Meskhidze Cc: Mikhail Ivanov Reviewed-by: Günther Noack Link: https://lore.kernel.org/r/20241109110856.222842-3-mic@digikod.net Signed-off-by: Mickaël Salaün --- security/landlock/net.c | 28 ++++++---------------------- 1 file changed, 6 insertions(+), 22 deletions(-) diff --git a/security/landlock/net.c b/security/landlock/net.c index c8bcd29bde09..d5dcc4407a19 100644 --- a/security/landlock/net.c +++ b/security/landlock/net.c @@ -39,27 +39,9 @@ int landlock_append_net_rule(struct landlock_ruleset *const ruleset, return err; } -static access_mask_t -get_raw_handled_net_accesses(const struct landlock_ruleset *const domain) -{ - access_mask_t access_dom = 0; - size_t layer_level; - - for (layer_level = 0; layer_level < domain->num_layers; layer_level++) - access_dom |= landlock_get_net_access_mask(domain, layer_level); - return access_dom; -} - -static const struct landlock_ruleset *get_current_net_domain(void) -{ - const struct landlock_ruleset *const dom = - landlock_get_current_domain(); - - if (!dom || !get_raw_handled_net_accesses(dom)) - return NULL; - - return dom; -} +static const struct access_masks any_net = { + .net = ~0, +}; static int current_check_access_socket(struct socket *const sock, struct sockaddr *const address, @@ -72,7 +54,9 @@ static int current_check_access_socket(struct socket *const sock, struct landlock_id id = { .type = LANDLOCK_KEY_NET_PORT, }; - const struct landlock_ruleset *const dom = get_current_net_domain(); + const struct landlock_ruleset *const dom = + landlock_get_applicable_domain(landlock_get_current_domain(), + any_net); if (!dom) return 0; -- 2.50.1 From 03197e40a22c2641a1f9d1744418cd29f4954b83 Mon Sep 17 00:00:00 2001 From: =?utf8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Sat, 9 Nov 2024 12:08:56 +0100 Subject: [PATCH 16/16] landlock: Optimize scope enforcement MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Do not walk through the domain hierarchy when the required scope is not supported by this domain. This is the same approach as for filesystem and network restrictions. Cc: Mikhail Ivanov Cc: Tahera Fahimi Reviewed-by: Günther Noack Link: https://lore.kernel.org/r/20241109110856.222842-4-mic@digikod.net Signed-off-by: Mickaël Salaün --- security/landlock/task.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/security/landlock/task.c b/security/landlock/task.c index 4acbd7c40eee..dc7dab78392e 100644 --- a/security/landlock/task.c +++ b/security/landlock/task.c @@ -204,12 +204,17 @@ static bool is_abstract_socket(struct sock *const sock) return false; } +static const struct access_masks unix_scope = { + .scope = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET, +}; + static int hook_unix_stream_connect(struct sock *const sock, struct sock *const other, struct sock *const newsk) { const struct landlock_ruleset *const dom = - landlock_get_current_domain(); + landlock_get_applicable_domain(landlock_get_current_domain(), + unix_scope); /* Quick return for non-landlocked tasks. */ if (!dom) @@ -225,7 +230,8 @@ static int hook_unix_may_send(struct socket *const sock, struct socket *const other) { const struct landlock_ruleset *const dom = - landlock_get_current_domain(); + landlock_get_applicable_domain(landlock_get_current_domain(), + unix_scope); if (!dom) return 0; @@ -243,6 +249,10 @@ static int hook_unix_may_send(struct socket *const sock, return 0; } +static const struct access_masks signal_scope = { + .scope = LANDLOCK_SCOPE_SIGNAL, +}; + static int hook_task_kill(struct task_struct *const p, struct kernel_siginfo *const info, const int sig, const struct cred *const cred) @@ -256,6 +266,7 @@ static int hook_task_kill(struct task_struct *const p, } else { dom = landlock_get_current_domain(); } + dom = landlock_get_applicable_domain(dom, signal_scope); /* Quick return for non-landlocked tasks. */ if (!dom) @@ -279,7 +290,8 @@ static int hook_file_send_sigiotask(struct task_struct *tsk, /* Lock already held by send_sigio() and send_sigurg(). */ lockdep_assert_held(&fown->lock); - dom = landlock_file(fown->file)->fown_domain; + dom = landlock_get_applicable_domain( + landlock_file(fown->file)->fown_domain, signal_scope); /* Quick return for unowned socket. */ if (!dom) -- 2.50.1