From be913e7c4034bd7a5cbfc3d53188344dc588d45c Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 20 Apr 2025 11:04:00 -0700 Subject: [PATCH 01/16] gcc-15: get rid of misc extra NUL character padding This removes two cases of explicit NUL padding that now causes warnings because of '-Wunterminated-string-initialization' being part of -Wextra in gcc-15. Gcc is being silly in this case when it says that it truncates a NUL terminator, because in these cases there were _multiple_ NUL characters. But we can get rid of the warning by just simplifying the two initializers that trigger the warning for me, so this does exactly that. I'm not sure why the power supply code did that odd .attr_name = #_name "\0", pattern: it was introduced in commit 2cabeaf15129 ("power: supply: core: Cleanup power supply sysfs attribute list"), but that 'attr_name[]' field is an explicitly sized character array in a statically initialized variable, and a string initializer always has a terminating NUL _and_ statically initialized character arrays are zero-padded anyway, so it really seems to be rather extraneous belt-and-suspenders. The zero_uuid[16] initialization in drivers/md/bcache/super.c makes perfect sense, but it isn't necessary for the same reasons, and not worth the new gcc warning noise. Signed-off-by: Linus Torvalds --- drivers/md/bcache/super.c | 2 +- drivers/power/supply/power_supply_sysfs.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index e42f1400cea9..813b38aec3e4 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -546,7 +546,7 @@ static struct uuid_entry *uuid_find(struct cache_set *c, const char *uuid) static struct uuid_entry *uuid_find_empty(struct cache_set *c) { - static const char zero_uuid[16] = "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"; + static const char zero_uuid[16] = { 0 }; return uuid_find(c, zero_uuid); } diff --git a/drivers/power/supply/power_supply_sysfs.c b/drivers/power/supply/power_supply_sysfs.c index edb058c19c9c..439dd0bf8644 100644 --- a/drivers/power/supply/power_supply_sysfs.c +++ b/drivers/power/supply/power_supply_sysfs.c @@ -33,7 +33,7 @@ struct power_supply_attr { [POWER_SUPPLY_PROP_ ## _name] = \ { \ .prop_name = #_name, \ - .attr_name = #_name "\0", \ + .attr_name = #_name, \ .text_values = _text, \ .text_values_len = _len, \ } -- 2.51.0 From 05e8d261a34e5c637e37be55c26e42cf5c75ee5c Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 20 Apr 2025 11:18:55 -0700 Subject: [PATCH 02/16] gcc-15: add '__nonstring' markers to byte arrays All of these cases are perfectly valid and good traditional C, but hit by the "you're not NUL-terminating your byte array" warning. And none of the cases want any terminating NUL character. Mark them __nonstring to shut up gcc-15 (and in the case of the ak8974 magnetometer driver, I just removed the explicit array size and let gcc expand the 3-byte and 6-byte arrays by one extra byte, because it was the simpler change). 
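As a compact illustration of what gcc-15 complains about and of the two kinds
of fix used in this patch (a sketch with illustrative identifiers, not code
from the tree):

  /* The initializer exactly fills the array, so the implicit NUL is
   * dropped; gcc-15's -Wunterminated-string-initialization (now part
   * of -Wextra) warns here. */
  static const char axis_warns[3] = "XYZ";

  /* Fix 1 (ak8974): drop the explicit size and let gcc allocate the
   * extra byte for the terminating NUL. */
  static const char axis_sized[] = "XYZ";

  /* Fix 2 (the other three files): keep the exact size and mark the
   * array as a byte array rather than a NUL-terminated string. */
  static const char axis_bytes[3] __attribute__((__nonstring__)) = "XYZ";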
Signed-off-by: Linus Torvalds
---
 drivers/iio/magnetometer/ak8974.c      | 4 ++--
 drivers/input/joystick/magellan.c      | 2 +-
 drivers/net/wireless/ath/carl9170/fw.c | 2 +-
 fs/cachefiles/key.c                    | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iio/magnetometer/ak8974.c b/drivers/iio/magnetometer/ak8974.c
index 08975c60e325..7bc341c69697 100644
--- a/drivers/iio/magnetometer/ak8974.c
+++ b/drivers/iio/magnetometer/ak8974.c
@@ -535,8 +535,8 @@ static int ak8974_detect(struct ak8974 *ak8974)
 			       fab_data2, sizeof(fab_data2));
 
 	for (i = 0; i < 3; ++i) {
-		static const char axis[3] = "XYZ";
-		static const char pgaxis[6] = "ZYZXYX";
+		static const char axis[] = "XYZ";
+		static const char pgaxis[] = "ZYZXYX";
 		unsigned offz = le16_to_cpu(fab_data2[i]) & 0x7F;
 		unsigned fine = le16_to_cpu(fab_data1[i]);
 		unsigned sens = le16_to_cpu(fab_data1[i + 3]);
diff --git a/drivers/input/joystick/magellan.c b/drivers/input/joystick/magellan.c
index 2eaa25c9c68c..d73389af4dd5 100644
--- a/drivers/input/joystick/magellan.c
+++ b/drivers/input/joystick/magellan.c
@@ -48,7 +48,7 @@ struct magellan {
 
 static int magellan_crunch_nibbles(unsigned char *data, int count)
 {
-	static unsigned char nibbles[16] = "0AB3D56GH9:K<M>?";
+	static unsigned char nibbles[16] __nonstring = "0AB3D56GH9:K<M>?";
diff --git a/drivers/net/wireless/ath/carl9170/fw.c b/drivers/net/wireless/ath/carl9170/fw.c
--- a/drivers/net/wireless/ath/carl9170/fw.c
+++ b/drivers/net/wireless/ath/carl9170/fw.c
@@ -15,7 +15,7 @@
-	const u8 otus_magic[4] = { OTUS_MAGIC };
+	const u8 otus_magic[4] __nonstring = { OTUS_MAGIC };
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c
--- a/fs/cachefiles/key.c
+++ b/fs/cachefiles/key.c
@@ -8,7 +8,7 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-static const char cachefiles_charmap[64] =
+static const char cachefiles_charmap[64] __nonstring =
 	"0123456789"			/* 0 - 9 */
 	"abcdefghijklmnopqrstuvwxyz"	/* 10 - 35 */
 	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"	/* 36 - 61 */
-- 
2.51.0

From ac71fabf15679fc7bc56c51bc92bd4b626564c37 Mon Sep 17 00:00:00 2001
From: Linus Torvalds
Date: Sun, 20 Apr 2025 11:30:11 -0700
Subject: [PATCH 03/16] gcc-15: work around sequence-point warning
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

The C sequence points are complicated things, and gcc-15 has apparently
added a warning for the case where an object is both used and modified
multiple times within the same sequence point.

That's a great warning. Or rather, it would be a great warning, except
gcc-15 seems to not really be very exact about it, and doesn't notice
that the modifications are to two entirely different members of the
same object: the array counter and the array entries.

So that seems kind of silly.

That said, the code that gcc complains about is unnecessarily
complicated, so moving the array counter update into a separate
statement seems like the most straightforward fix for these warnings:

  drivers/net/wireless/intel/iwlwifi/mld/d3.c: In function ‘iwl_mld_set_netdetect_info’:
  drivers/net/wireless/intel/iwlwifi/mld/d3.c:1102:66: error: operation on ‘netdetect_info->n_matches’ may be undefined [-Werror=sequence-point]
   1102 |                 netdetect_info->matches[netdetect_info->n_matches++] = match;
        |                                         ~~~~~~~~~~~~~~~~~~~~~~~~~^~
  drivers/net/wireless/intel/iwlwifi/mld/d3.c:1120:58: error: operation on ‘match->n_channels’ may be undefined [-Werror=sequence-point]
   1120 |                         match->channels[match->n_channels++] =
        |                                         ~~~~~~~~~~~~~~~~~^~

side note: the code at that second warning is actively buggy, and only
works on little-endian machines that don't do strict alignment checks.
The code casts an array of integers into an array of unsigned long in
order to use our bitmap iterators. That happens to work fine on any sane
architecture, but it's still wrong.

This does *not* fix that more serious problem. This only splits the two
assignments into two statements and fixes the compiler warning. I need
to get rid of the new warnings in order to be able to actually do any
build testing.
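For reference, the pattern at issue reduces to the following stand-alone
sketch (hypothetical names, not the iwlwifi code):

  struct match_list {
  	int entries[8];
  	unsigned int n;
  };

  /* gcc-15 may flag this as a sequence-point violation, even though the
   * increment modifies 'list->n' and the assignment modifies
   * 'list->entries[]' - two different members of the same object. */
  static void add_warns(struct match_list *list, int value)
  {
  	list->entries[list->n++] = value;
  }

  /* The workaround applied here: split the store and the counter update
   * into two statements. */
  static void add_quiet(struct match_list *list, int value)
  {
  	list->entries[list->n] = value;
  	list->n++;
  }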
Signed-off-by: Linus Torvalds --- drivers/net/wireless/intel/iwlwifi/mld/d3.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mld/d3.c b/drivers/net/wireless/intel/iwlwifi/mld/d3.c index 2c6e8ecd93b7..ee99298eebf5 100644 --- a/drivers/net/wireless/intel/iwlwifi/mld/d3.c +++ b/drivers/net/wireless/intel/iwlwifi/mld/d3.c @@ -1099,7 +1099,8 @@ iwl_mld_set_netdetect_info(struct iwl_mld *mld, if (!match) return; - netdetect_info->matches[netdetect_info->n_matches++] = match; + netdetect_info->matches[netdetect_info->n_matches] = match; + netdetect_info->n_matches++; /* We inverted the order of the SSIDs in the scan * request, so invert the index here. @@ -1116,9 +1117,11 @@ iwl_mld_set_netdetect_info(struct iwl_mld *mld, for_each_set_bit(j, (unsigned long *)&matches[i].matching_channels[0], - sizeof(matches[i].matching_channels)) - match->channels[match->n_channels++] = + sizeof(matches[i].matching_channels)) { + match->channels[match->n_channels] = netdetect_cfg->channels[j]->center_freq; + match->n_channels++; + } } } -- 2.51.0 From 9c32cda43eb78f78c73aee4aa344b777714e259b Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 20 Apr 2025 13:43:47 -0700 Subject: [PATCH 04/16] Linux 6.15-rc3 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 0a9992db4fe0..3dcad2319662 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 6 PATCHLEVEL = 15 SUBLEVEL = 0 -EXTRAVERSION = -rc2 +EXTRAVERSION = -rc3 NAME = Baby Opossum Posse # *DOCUMENTATION* -- 2.51.0 From f8693f6dffcd1865da7f5f4c3df1de445e519bec Mon Sep 17 00:00:00 2001 From: Andy Chiu Date: Tue, 8 Apr 2025 02:08:25 +0800 Subject: [PATCH 05/16] riscv: ftrace: support fastcc in Clang for WITH_ARGS MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Some caller-saved registers which are not defined as function arguments in the ABI can still be passed as arguments when the kernel is compiled with Clang. As a result, we must save and restore those registers to prevent ftrace from clobbering them. 
- [1]: https://reviews.llvm.org/D68559

Reported-by: Evgenii Shatokhin
Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/
Fixes: 7caa9765465f ("ftrace: riscv: move from REGS to ARGS")
Acked-by: Nathan Chancellor
Reviewed-by: Björn Töpel
Signed-off-by: Andy Chiu
Tested-by: Björn Töpel
Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 arch/riscv/include/asm/ftrace.h |  7 +++++++
 arch/riscv/kernel/asm-offsets.c |  7 +++++++
 arch/riscv/kernel/mcount-dyn.S  | 16 ++++++++++++++--
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index d627f63ee289..d8b2138bd9c6 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -146,6 +146,13 @@ struct __arch_ftrace_regs {
 			unsigned long a5;
 			unsigned long a6;
 			unsigned long a7;
+#ifdef CONFIG_CC_IS_CLANG
+			unsigned long t2;
+			unsigned long t3;
+			unsigned long t4;
+			unsigned long t5;
+			unsigned long t6;
+#endif
 		};
 	};
 };
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 16490755304e..7c43c8e26ae7 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -501,6 +501,13 @@ void asm_offsets(void)
 	DEFINE(FREGS_SP,	offsetof(struct __arch_ftrace_regs, sp));
 	DEFINE(FREGS_S0,	offsetof(struct __arch_ftrace_regs, s0));
 	DEFINE(FREGS_T1,	offsetof(struct __arch_ftrace_regs, t1));
+#ifdef CONFIG_CC_IS_CLANG
+	DEFINE(FREGS_T2,	offsetof(struct __arch_ftrace_regs, t2));
+	DEFINE(FREGS_T3,	offsetof(struct __arch_ftrace_regs, t3));
+	DEFINE(FREGS_T4,	offsetof(struct __arch_ftrace_regs, t4));
+	DEFINE(FREGS_T5,	offsetof(struct __arch_ftrace_regs, t5));
+	DEFINE(FREGS_T6,	offsetof(struct __arch_ftrace_regs, t6));
+#endif
 	DEFINE(FREGS_A0,	offsetof(struct __arch_ftrace_regs, a0));
 	DEFINE(FREGS_A1,	offsetof(struct __arch_ftrace_regs, a1));
 	DEFINE(FREGS_A2,	offsetof(struct __arch_ftrace_regs, a2));
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 745dd4c4a69c..e988bd26b28b 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -96,7 +96,13 @@
 	REG_S	x8, FREGS_S0(sp)
 #endif
 	REG_S	x6,  FREGS_T1(sp)
-
+#ifdef CONFIG_CC_IS_CLANG
+	REG_S	x7,  FREGS_T2(sp)
+	REG_S	x28, FREGS_T3(sp)
+	REG_S	x29, FREGS_T4(sp)
+	REG_S	x30, FREGS_T5(sp)
+	REG_S	x31, FREGS_T6(sp)
+#endif
 	// save the arguments
 	REG_S	x10, FREGS_A0(sp)
 	REG_S	x11, FREGS_A1(sp)
@@ -115,7 +121,13 @@
 	REG_L	x8, FREGS_S0(sp)
 #endif
 	REG_L	x6,  FREGS_T1(sp)
-
+#ifdef CONFIG_CC_IS_CLANG
+	REG_L	x7,  FREGS_T2(sp)
+	REG_L	x28, FREGS_T3(sp)
+	REG_L	x29, FREGS_T4(sp)
+	REG_L	x30, FREGS_T5(sp)
+	REG_L	x31, FREGS_T6(sp)
+#endif
 	// restore the arguments
 	REG_L	x10, FREGS_A0(sp)
 	REG_L	x11, FREGS_A1(sp)
-- 
2.51.0

From 54ecbc8d857122b32a98fe204cb61a06aabc063d Mon Sep 17 00:00:00 2001
From: Andy Chiu
Date: Tue, 8 Apr 2025 02:08:26 +0800
Subject: [PATCH 06/16] riscv: ftrace factor out code defined by !WITH_ARG

DYNAMIC_FTRACE selects DYNAMIC_FTRACE_WITH_ARGS and mcount-dyn.S in
riscv, so we can remove the WITH_ARGS ifdef guards when it is known
that DYNAMIC_FTRACE is true.
Signed-off-by: Andy Chiu
Link: https://lore.kernel.org/r/20250407180838.42877-2-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 arch/riscv/kernel/ftrace.c     | 15 ---------------
 arch/riscv/kernel/mcount-dyn.S | 34 ----------------------------------
 2 files changed, 49 deletions(-)

diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 674dcdfae7a1..1fd10555c580 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -210,7 +210,6 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 }
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
@@ -231,19 +230,5 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 	if (!function_graph_enter_regs(old, ip, frame_pointer, parent, fregs))
 		*parent = return_hooker;
 }
-#else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
-extern void ftrace_graph_call(void);
-int ftrace_enable_ftrace_graph_caller(void)
-{
-	return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
-				    (unsigned long)&prepare_ftrace_return, true, true);
-}
-
-int ftrace_disable_ftrace_graph_caller(void)
-{
-	return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
-				    (unsigned long)&prepare_ftrace_return, false, true);
-}
-#endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
 #endif /* CONFIG_DYNAMIC_FTRACE */
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index e988bd26b28b..3f06b40bb6c8 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -56,8 +56,6 @@
 	addi	sp, sp, ABI_SIZE_ON_STACK
 	.endm
 
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
-
 /**
 * SAVE_ABI_REGS - save regs against the ftrace_regs struct
 *
@@ -149,36 +147,6 @@
 	mv	a3, sp
 	.endm
 
-#endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
-
-#ifndef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
-SYM_FUNC_START(ftrace_caller)
-	SAVE_ABI
-
-	addi	a0, t0, -FENTRY_RA_OFFSET
-	la	a1, function_trace_op
-	REG_L	a2, 0(a1)
-	mv	a1, ra
-	mv	a3, sp
-
-SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
-	call	ftrace_stub
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	addi	a0, sp, ABI_RA
-	REG_L	a1, ABI_T0(sp)
-	addi	a1, a1, -FENTRY_RA_OFFSET
-#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-	mv	a2, s0
-#endif
-SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL)
-	call	ftrace_stub
-#endif
-	RESTORE_ABI
-	jr	t0
-SYM_FUNC_END(ftrace_caller)
-
-#else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
 SYM_FUNC_START(ftrace_caller)
 	mv	t1, zero
 	SAVE_ABI_REGS
@@ -194,8 +162,6 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
 	jr	t1
 SYM_FUNC_END(ftrace_caller)
 
-#endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
-
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 SYM_CODE_START(ftrace_stub_direct_tramp)
 	jr	t0
-- 
2.51.0

From c41bf4326c7b0af3f7004235ac91551833ec95c6 Mon Sep 17 00:00:00 2001
From: Andy Chiu
Date: Tue, 8 Apr 2025 02:08:27 +0800
Subject: [PATCH 07/16] riscv: ftrace: align patchable functions to 4 Byte
 boundary
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

We are changing ftrace code patching in order to remove the dependency
on stop_machine() and enable kernel preemption. This requires us to
align function entries on a 4-byte boundary. However, -falign-functions
alone on older versions of GCC was not strong enough to align all
functions. In fact, cold functions are not aligned after turning on
optimizations.
We consider this to be a bug in GCC and turn off guess-branch-probability
as a workaround to align all functions.

GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

The option -fmin-function-alignment is able to align all functions
properly on newer versions of gcc. So, we add a cc-option to test if
the toolchain supports it.

Suggested-by: Evgenii Shatokhin
Signed-off-by: Andy Chiu
Reviewed-by: Björn Töpel
Link: https://lore.kernel.org/r/20250407180838.42877-3-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 arch/riscv/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index bbec87b79309..7dbed10843d2 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -150,6 +150,7 @@ config RISCV
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_CONTIGUOUS if MMU
 	select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE)
+	select FUNCTION_ALIGNMENT_4B if HAVE_DYNAMIC_FTRACE && RISCV_ISA_C
 	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS if HAVE_DYNAMIC_FTRACE
 	select HAVE_FTRACE_GRAPH_FUNC
@@ -236,6 +237,7 @@ config CLANG_SUPPORTS_DYNAMIC_FTRACE
 config GCC_SUPPORTS_DYNAMIC_FTRACE
 	def_bool CC_IS_GCC
 	depends on $(cc-option,-fpatchable-function-entry=8)
+	depends on CC_HAS_MIN_FUNCTION_ALIGNMENT || !RISCV_ISA_C
 
 config HAVE_SHADOW_CALL_STACK
 	def_bool $(cc-option,-fsanitize=shadow-call-stack)
-- 
2.51.0

From 500e626c4a5bf2fa7cc6f1cd6dd86c77b5b127f2 Mon Sep 17 00:00:00 2001
From: Andy Chiu
Date: Tue, 8 Apr 2025 02:08:28 +0800
Subject: [PATCH 08/16] kernel: ftrace: export ftrace_sync_ipi

The following ftrace patch for riscv uses a data store to update the
ftrace function. Therefore, a remote fence is required to order it
against function_trace_op updates. The mechanism is similar to the fence
between function_trace_op and update_ftrace_func in the generic ftrace,
so we leverage the same ftrace_sync_ipi function.

[ alex: Fix build warning when !CONFIG_DYNAMIC_FTRACE ]

Signed-off-by: Andy Chiu
Link: https://lore.kernel.org/r/20250407180838.42877-4-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 include/linux/ftrace.h | 2 ++
 kernel/trace/ftrace.c  | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index fbabc3d848b3..30374478cb07 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -635,6 +635,8 @@ enum {
 #define ftrace_get_symaddr(fentry_ip)	(0)
 #endif
 
+void ftrace_sync_ipi(void *data);
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 void ftrace_arch_code_modify_prepare(void);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 61130bb34d6c..31e9fe3bf964 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -188,7 +188,7 @@ static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip,
 	op->saved_func(ip, parent_ip, op, fregs);
 }
 
-static void ftrace_sync_ipi(void *data)
+void ftrace_sync_ipi(void *data)
 {
 	/* Probably not needed, but do it anyway */
 	smp_rmb();
-- 
2.51.0

From b2137c3b6d7a45622967aac78b46a4bd2cff6c42 Mon Sep 17 00:00:00 2001
From: Andy Chiu
Date: Tue, 8 Apr 2025 02:08:29 +0800
Subject: [PATCH 09/16] riscv: ftrace: prepare ftrace for atomic code patching
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

We use an AUIPC+JALR pair to jump into a ftrace trampoline.
Since instruction fetch can break down to 4 bytes at a time, it is
impossible to update two instructions without a race. In order to
mitigate it, we initialize the patchable entry to AUIPC + NOP4. Then,
the run-time code patching can change the NOP4 to a JALR to
enable/disable ftrace for a function. This limits the reach of each
ftrace entry to +-2KB of displacement from ftrace_caller.

Starting from the trampoline, we add a level of indirection for it to
reach the ftrace caller target. Now, it loads the target address from a
memory location, then performs the jump. This enables the kernel to
update the target atomically.

The new don't-stop-the-world text patching changes only one RISC-V
instruction:

   |	-8: &ftrace_ops of the associated tracer function.
   |	<ftrace enable>:
   |	0: auipc  t0, hi(ftrace_caller)
   |	4: jalr   t0, lo(ftrace_caller)
   |
   |	-8: &ftrace_nop_ops
   |	<ftrace disable>:
   |	0: auipc  t0, hi(ftrace_caller)
   |	4: nop

This means that f+0x0 is fixed, and should not be claimed by ftrace,
e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the
offset and MCOUNT_INSN_SIZE accordingly.

[ alex: Fix build errors with !CONFIG_DYNAMIC_FTRACE ]

Co-developed-by: Björn Töpel
Signed-off-by: Björn Töpel
Signed-off-by: Andy Chiu
Link: https://lore.kernel.org/r/20250407180838.42877-5-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 arch/riscv/include/asm/ftrace.h |  49 +++++-------
 arch/riscv/kernel/ftrace.c      | 137 +++++++++++++++++---------------
 arch/riscv/kernel/mcount-dyn.S  |   9 +--
 3 files changed, 98 insertions(+), 97 deletions(-)

diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index d8b2138bd9c6..6a5c0a7fb826 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -20,10 +20,9 @@ extern void *return_address(unsigned int level);
 #define ftrace_return_address(n) return_address(n)
 
 void _mcount(void);
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
-	return addr;
-}
+unsigned long ftrace_call_adjust(unsigned long addr);
+unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip);
+#define ftrace_get_symaddr(fentry_ip) arch_ftrace_get_symaddr(fentry_ip)
 
 /*
  * Let's do like x86/arm64 and ignore the compat syscalls.
@@ -57,12 +56,21 @@ struct dyn_arch_ftrace {
  * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
  *          return address (original pc + 4)
  *
+ * The first 2 instructions for each tracable function is compiled to 2 nop
+ * instructions. Then, the kernel initializes the first instruction to auipc at
+ * boot time (<ftrace disable>). The second instruction is patched to jalr to
+ * start the trace.
+ *
+ * <compiled>:
+ * 0: nop
+ * 4: nop
+ *
  * <ftrace enable>:
- * 0: auipc  t0/ra, 0x?
- * 4: jalr   t0/ra, ?(t0/ra)
+ * 0: auipc  t0, 0x?
+ * 4: jalr   t0, ?(t0)
 *
 * <ftrace disable>:
- * 0: nop
+ * 0: auipc  t0, 0x?
 * 4: nop
 *
 * Dynamic ftrace generates probes to call sites, so we must deal with
@@ -75,10 +83,9 @@ struct dyn_arch_ftrace {
 #define AUIPC_OFFSET_MASK	(0xfffff000)
 #define AUIPC_PAD		(0x00001000)
 #define JALR_SHIFT		20
-#define JALR_RA			(0x000080e7)
-#define AUIPC_RA		(0x00000097)
 #define JALR_T0			(0x000282e7)
 #define AUIPC_T0		(0x00000297)
+#define JALR_RANGE		(JALR_SIGN_MASK - 1)
 
 #define to_jalr_t0(offset) \
 	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
@@ -96,26 +103,14 @@ do {									\
 	call[1] = to_jalr_t0(offset);					\
 } while (0)
 
-#define to_jalr_ra(offset) \
-	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
-
-#define to_auipc_ra(offset)						\
-	((offset & JALR_SIGN_MASK) ?					\
-	 (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :	\
-	 ((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
-
-#define make_call_ra(caller, callee, call)				\
-do {									\
-	unsigned int offset =						\
-		(unsigned long) (callee) - (unsigned long) (caller);	\
-	call[0] = to_auipc_ra(offset);					\
-	call[1] = to_jalr_ra(offset);					\
-} while (0)
-
 /*
- * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
+ * Only the jalr insn in the auipc+jalr is patched, so we make it 4
+ * bytes here.
  */
-#define MCOUNT_INSN_SIZE 8
+#define MCOUNT_INSN_SIZE	4
+#define MCOUNT_AUIPC_SIZE	4
+#define MCOUNT_JALR_SIZE	4
+#define MCOUNT_NOP4_SIZE	4
 
 #ifndef __ASSEMBLY__
 struct dyn_ftrace;
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 1fd10555c580..ea04f09f9d4d 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -8,11 +8,22 @@
 #include <linux/ftrace.h>
 #include <linux/uaccess.h>
 #include <linux/memory.h>
+#include <linux/irqflags.h>
 #include <linux/stop_machine.h>
 #include <asm/cacheflush.h>
 #include <asm/text-patching.h>
 
 #ifdef CONFIG_DYNAMIC_FTRACE
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+	return addr + MCOUNT_AUIPC_SIZE;
+}
+
+unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
+{
+	return fentry_ip - MCOUNT_AUIPC_SIZE;
+}
+
 void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
 {
 	mutex_lock(&text_mutex);
@@ -32,51 +43,32 @@ void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
 	mutex_unlock(&text_mutex);
 }
 
-static int ftrace_check_current_call(unsigned long hook_pos,
-				     unsigned int *expected)
+static int __ftrace_modify_call(unsigned long source, unsigned long target, bool validate)
 {
+	unsigned int call[2], offset;
 	unsigned int replaced[2];
-	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
 
-	/* we expect nops at the hook position */
-	if (!expected)
-		expected = nops;
+	offset = target - source;
+	call[1] = to_jalr_t0(offset);
 
-	/*
-	 * Read the text we want to modify;
-	 * return must be -EFAULT on read error
-	 */
-	if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
-			MCOUNT_INSN_SIZE))
-		return -EFAULT;
-
-	/*
-	 * Make sure it is what we expect it to be;
-	 * return must be -EINVAL on failed comparison
-	 */
-	if (memcmp(expected, replaced, sizeof(replaced))) {
-		pr_err("%p: expected (%08x %08x) but got (%08x %08x)\n",
-		       (void *)hook_pos, expected[0], expected[1], replaced[0],
-		       replaced[1]);
-		return -EINVAL;
+	if (validate) {
+		call[0] = to_auipc_t0(offset);
+		/*
+		 * Read the text we want to modify;
+		 * return must be -EFAULT on read error
+		 */
+		if (copy_from_kernel_nofault(replaced, (void *)source, 2 * MCOUNT_INSN_SIZE))
+			return -EFAULT;
+
+		if (replaced[0] != call[0]) {
+			pr_err("%p: expected (%08x) but got (%08x)\n",
+			       (void *)source, call[0], replaced[0]);
+			return -EINVAL;
+		}
 	}
 
-	return 0;
-}
-
-static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
-				bool enable, bool ra)
-{
-	unsigned int call[2];
-	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
-
-	if (ra)
-		make_call_ra(hook_pos, target, call);
-	else
-		make_call_t0(hook_pos, target, call);
-
-	/* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
-	if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
+	/* Replace the jalr at once. Return -EPERM on write error.
*/ + if (patch_insn_write((void *)(source + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE)) return -EPERM; return 0; @@ -84,22 +76,21 @@ static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target, int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) { - unsigned int call[2]; - - make_call_t0(rec->ip, addr, call); + unsigned long distance, orig_addr, pc = rec->ip - MCOUNT_AUIPC_SIZE; - if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE)) - return -EPERM; + orig_addr = (unsigned long)&ftrace_caller; + distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr; + if (distance > JALR_RANGE) + return -EINVAL; - return 0; + return __ftrace_modify_call(pc, addr, false); } -int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, - unsigned long addr) +int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr) { - unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4}; + u32 nop4 = RISCV_INSN_NOP4; - if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE)) + if (patch_insn_write((void *)rec->ip, &nop4, MCOUNT_NOP4_SIZE)) return -EPERM; return 0; @@ -114,21 +105,38 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, */ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec) { - int out; + unsigned long pc = rec->ip - MCOUNT_AUIPC_SIZE; + unsigned int nops[2], offset; + int ret; + + offset = (unsigned long) &ftrace_caller - pc; + nops[0] = to_auipc_t0(offset); + nops[1] = RISCV_INSN_NOP4; mutex_lock(&text_mutex); - out = ftrace_make_nop(mod, rec, MCOUNT_ADDR); + ret = patch_insn_write((void *)pc, nops, 2 * MCOUNT_INSN_SIZE); mutex_unlock(&text_mutex); - return out; + return ret; } +ftrace_func_t ftrace_call_dest = ftrace_stub; int ftrace_update_ftrace_func(ftrace_func_t func) { - int ret = __ftrace_modify_call((unsigned long)&ftrace_call, - (unsigned long)func, true, true); - - return ret; + WRITE_ONCE(ftrace_call_dest, func); + /* + * The data fence ensure that the update to ftrace_call_dest happens + * before the write to function_trace_op later in the generic ftrace. + * If the sequence is not enforced, then an old ftrace_call_dest may + * race loading a new function_trace_op set in ftrace_modify_all_code + * + * If we are in stop_machine, then we don't need to call remote fence + * as there is no concurrent read-side of ftrace_call_dest. 
+	 */
+	smp_wmb();
+	if (!irqs_disabled())
+		smp_call_function(ftrace_sync_ipi, NULL, 1);
+	return 0;
 }
 
 struct ftrace_modify_param {
@@ -166,23 +174,22 @@ void arch_ftrace_update_code(int command)
 	stop_machine(__ftrace_modify_code, &param, cpu_online_mask);
 }
-#endif
+#else /* CONFIG_DYNAMIC_FTRACE */
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+	return addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE */
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 		       unsigned long addr)
 {
+	unsigned long caller = rec->ip - MCOUNT_AUIPC_SIZE;
 	unsigned int call[2];
-	unsigned long caller = rec->ip;
-	int ret;
 
 	make_call_t0(caller, old_addr, call);
-	ret = ftrace_check_current_call(caller, call);
-
-	if (ret)
-		return ret;
-
-	return __ftrace_modify_call(caller, addr, true, false);
+	return __ftrace_modify_call(caller, addr, true);
 }
 #endif
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 3f06b40bb6c8..8aa554d56096 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -13,7 +13,6 @@
 
 	.text
 
-#define FENTRY_RA_OFFSET	8
 #define ABI_SIZE_ON_STACK	80
 #define ABI_A0			0
 #define ABI_A1			8
@@ -62,8 +61,7 @@
 * After the stack is established,
 *
 * 0(sp) stores the PC of the traced function which can be accessed
-* by &(fregs)->epc in tracing function. Note that the real
-* function entry address should be computed with -FENTRY_RA_OFFSET.
+* by &(fregs)->epc in tracing function.
 *
 * 8(sp) stores the function return address (i.e. parent IP) that
 * can be accessed by &(fregs)->ra in tracing function.
@@ -140,7 +138,7 @@
 	.endm
 
 	.macro PREPARE_ARGS
-	addi	a0, t0, -FENTRY_RA_OFFSET
+	addi	a0, t0, -MCOUNT_JALR_SIZE	// ip (callsite's jalr insn)
 	la	a1, function_trace_op
 	REG_L	a2, 0(a1)
 	mv	a1, ra
@@ -153,7 +151,8 @@ SYM_FUNC_START(ftrace_caller)
 	PREPARE_ARGS
 
 SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
-	call	ftrace_stub
+	REG_L	ra, ftrace_call_dest
+	jalr	ra, 0(ra)
 
 	RESTORE_ABI_REGS
 	bnez	t1, .Ldirect
-- 
2.51.0

From 5aa4ef95588456df5a5563dd23892827f97fb14f Mon Sep 17 00:00:00 2001
From: Andy Chiu
Date: Tue, 8 Apr 2025 02:08:30 +0800
Subject: [PATCH 10/16] riscv: ftrace: do not use stop_machine to update code

Now it is safe to remove the dependency on stop_machine() when patching
code in ftrace.

Signed-off-by: Andy Chiu
Link: https://lore.kernel.org/r/20250407180838.42877-6-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 arch/riscv/kernel/ftrace.c | 64 ++++++--------------------------------
 1 file changed, 10 insertions(+), 54 deletions(-)

diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index ea04f09f9d4d..b133c60808fe 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -24,23 +24,13 @@ unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
 	return fentry_ip - MCOUNT_AUIPC_SIZE;
 }
 
-void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
+void arch_ftrace_update_code(int command)
 {
 	mutex_lock(&text_mutex);
-
-	/*
-	 * The code sequences we use for ftrace can't be patched while the
-	 * kernel is running, so we need to use stop_machine() to modify them
-	 * for now. This doesn't play nice with text_mutex, we use this flag
-	 * to elide the check.
-	 */
-	riscv_patch_in_stop_machine = true;
-}
-
-void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
-{
-	riscv_patch_in_stop_machine = false;
+	command |= FTRACE_MAY_SLEEP;
+	ftrace_modify_all_code(command);
 	mutex_unlock(&text_mutex);
+	flush_icache_all();
 }
 
 static int __ftrace_modify_call(unsigned long source, unsigned long target, bool validate)
@@ -129,51 +119,17 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	 * before the write to function_trace_op later in the generic ftrace.
 	 * If the sequence is not enforced, then an old ftrace_call_dest may
 	 * race loading a new function_trace_op set in ftrace_modify_all_code
-	 *
-	 * If we are in stop_machine, then we don't need to call remote fence
-	 * as there is no concurrent read-side of ftrace_call_dest.
 	 */
 	smp_wmb();
-	if (!irqs_disabled())
-		smp_call_function(ftrace_sync_ipi, NULL, 1);
-	return 0;
-}
-
-struct ftrace_modify_param {
-	int command;
-	atomic_t cpu_count;
-};
-
-static int __ftrace_modify_code(void *data)
-{
-	struct ftrace_modify_param *param = data;
-
-	if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
-		ftrace_modify_all_code(param->command);
-		/*
-		 * Make sure the patching store is effective *before* we
-		 * increment the counter which releases all waiting CPUs
-		 * by using the release variant of atomic increment. The
-		 * release pairs with the call to local_flush_icache_all()
-		 * on the waiting CPU.
-		 */
-		atomic_inc_return_release(&param->cpu_count);
-	} else {
-		while (atomic_read(&param->cpu_count) <= num_online_cpus())
-			cpu_relax();
-
-		local_flush_icache_all();
-	}
-
+	/*
+	 * Updating ftrace does not take stop_machine path, so irqs should not
+	 * be disabled.
+	 */
+	WARN_ON(irqs_disabled());
+	smp_call_function(ftrace_sync_ipi, NULL, 1);
 	return 0;
 }
 
-void arch_ftrace_update_code(int command)
-{
-	struct ftrace_modify_param param = { command, ATOMIC_INIT(0) };
-
-	stop_machine(__ftrace_modify_code, &param, cpu_online_mask);
-}
 
 #else /* CONFIG_DYNAMIC_FTRACE */
 unsigned long ftrace_call_adjust(unsigned long addr)
 {
-- 
2.51.0

From d1049fc0de81bca3abbb35e8d4b8794170498b78 Mon Sep 17 00:00:00 2001
From: Andy Chiu
Date: Tue, 8 Apr 2025 02:08:31 +0800
Subject: [PATCH 11/16] riscv: vector: Support calling schedule() for
 preemptible Vector

Each function entry implies a call into the ftrace infrastructure, and
it may call into schedule() in some cases. So, it is possible for
preemptible kernel-mode Vector to implicitly call into schedule(). Since
all V-regs are caller-saved, it is possible to drop all V context when a
thread voluntarily calls schedule(). Besides, we currently don't pass
arguments through vector registers, so we don't have to save/restore
V-regs in the ftrace trampoline.

Signed-off-by: Andy Chiu
Link: https://lore.kernel.org/r/20250407180838.42877-7-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 arch/riscv/include/asm/processor.h |  5 +++++
 arch/riscv/include/asm/vector.h    | 22 +++++++++++++++++++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 5f56eb9d114a..9c1cc716b891 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -79,6 +79,10 @@ struct pt_regs;
 *   Thus, the task does not own preempt_v. Any use of Vector will have to
 *   save preempt_v, if dirty, and fallback to non-preemptible kernel-mode
 *   Vector.
+ * - bit 29: The thread voluntarily calls schedule() while holding an active
+ *   preempt_v. All preempt_v context should be dropped in such case because
+ *   V-regs are caller-saved. Only sstatus.VS=ON is persisted across a
+ *   schedule() call.
 * - bit 30: The in-kernel preempt_v context is saved, and requries to be
 *   restored when returning to the context that owns the preempt_v.
 * - bit 31: The in-kernel preempt_v context is dirty, as signaled by the
@@ -93,6 +97,7 @@ struct pt_regs;
 #define RISCV_PREEMPT_V			0x00000100
 #define RISCV_PREEMPT_V_DIRTY		0x80000000
 #define RISCV_PREEMPT_V_NEED_RESTORE	0x40000000
+#define RISCV_PREEMPT_V_IN_SCHEDULE	0x20000000
 
 /* CPU-specific state of a task */
 struct thread_struct {
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index e8a83f55be2b..45c9b426fcc5 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -120,6 +120,11 @@ static __always_inline void riscv_v_disable(void)
 	csr_clear(CSR_SSTATUS, SR_VS);
 }
 
+static __always_inline bool riscv_v_is_on(void)
+{
+	return !!(csr_read(CSR_SSTATUS) & SR_VS);
+}
+
 static __always_inline void __vstate_csr_save(struct __riscv_v_ext_state *dest)
 {
 	asm volatile (
@@ -366,6 +371,11 @@ static inline void __switch_to_vector(struct task_struct *prev,
 	struct pt_regs *regs;
 
 	if (riscv_preempt_v_started(prev)) {
+		if (riscv_v_is_on()) {
+			WARN_ON(prev->thread.riscv_v_flags & RISCV_V_CTX_DEPTH_MASK);
+			riscv_v_disable();
+			prev->thread.riscv_v_flags |= RISCV_PREEMPT_V_IN_SCHEDULE;
+		}
 		if (riscv_preempt_v_dirty(prev)) {
 			__riscv_v_vstate_save(&prev->thread.kernel_vstate,
 					      prev->thread.kernel_vstate.datap);
@@ -376,10 +386,16 @@ static inline void __switch_to_vector(struct task_struct *prev,
 		riscv_v_vstate_save(&prev->thread.vstate, regs);
 	}
 
-	if (riscv_preempt_v_started(next))
-		riscv_preempt_v_set_restore(next);
-	else
+	if (riscv_preempt_v_started(next)) {
+		if (next->thread.riscv_v_flags & RISCV_PREEMPT_V_IN_SCHEDULE) {
+			next->thread.riscv_v_flags &= ~RISCV_PREEMPT_V_IN_SCHEDULE;
+			riscv_v_enable();
+		} else {
+			riscv_preempt_v_set_restore(next);
+		}
+	} else {
 		riscv_v_vstate_set_restore(next, task_pt_regs(next));
+	}
 }
 
 void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
-- 
2.51.0

From ca358692de41b273468e625f96926fa53e13bd8c Mon Sep 17 00:00:00 2001
From: Andy Chiu
Date: Tue, 8 Apr 2025 02:08:32 +0800
Subject: [PATCH 12/16] riscv: add a data fence for CMODX in the kernel mode
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

The RISC-V spec explicitly calls out that a local fence.i is not enough
for the code modification to be visible from a remote hart. In fact, it
states:

  To make a store to instruction memory visible to all RISC-V harts, the
  writing hart also has to execute a data FENCE before requesting that
  all remote RISC-V harts execute a FENCE.I.

Although current riscv drivers for IPI use ordered MMIO when sending
IPIs in order to synchronize the action between previous csd writes,
riscv does not restrict itself to any particular flavor of IPI. Any
driver or firmware implementation that does not order data writes before
the IPI may pose a risk of a code-modifying race. Thus, add a fence here
to order data writes before making the IPI.
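Condensed, the ordering this establishes in flush_icache_all() is the
following (a sketch built from the identifiers in the diff below, not a
new API):

  /* Writer hart, after storing the modified instructions: */
  RISCV_FENCE(w, o);	/* order prior data writes before the MMIO
  			 * (device output) write that raises the IPI */
  if (riscv_use_sbi_for_rfence())
  	sbi_remote_fence_i(NULL);		/* firmware runs remote fence.i */
  else
  	on_each_cpu(ipi_remote_fence_i, NULL, 1);	/* each hart runs fence.i */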
Signed-off-by: Andy Chiu
Reviewed-by: Björn Töpel
Link: https://lore.kernel.org/r/20250407180838.42877-8-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 arch/riscv/mm/cacheflush.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index b81672729887..b2e4b81763f8 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -24,7 +24,20 @@ void flush_icache_all(void)
 
 	if (num_online_cpus() < 2)
 		return;
-	else if (riscv_use_sbi_for_rfence())
+
+	/*
+	 * Make sure all previous writes to the D$ are ordered before making
+	 * the IPI. The RISC-V spec states that a hart must execute a data
+	 * fence before triggering a remote fence.i in order to make the
+	 * modification visible for remote harts.
+	 *
+	 * IPIs on RISC-V are triggered by MMIO writes to either CLINT or
+	 * S-IMSIC, so the fence ensures previous data writes "happen before"
+	 * the MMIO.
+	 */
+	RISCV_FENCE(w, o);
+
+	if (riscv_use_sbi_for_rfence())
 		sbi_remote_fence_i(NULL);
 	else
 		on_each_cpu(ipi_remote_fence_i, NULL, 1);
-- 
2.51.0

From d0262e907e2991ae09ca476281fc8cae3ec57850 Mon Sep 17 00:00:00 2001
From: Andy Chiu
Date: Tue, 8 Apr 2025 02:08:33 +0800
Subject: [PATCH 13/16] riscv: ftrace: support PREEMPT
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

Now, we can safely enable dynamic ftrace with kernel preemption.

Signed-off-by: Andy Chiu
Reviewed-by: Björn Töpel
Link: https://lore.kernel.org/r/20250407180838.42877-9-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 arch/riscv/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 7dbed10843d2..dc0fc11b6e96 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -157,7 +157,7 @@ config RISCV
 	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
 	select HAVE_FUNCTION_GRAPH_TRACER if HAVE_DYNAMIC_FTRACE_WITH_ARGS
 	select HAVE_FUNCTION_GRAPH_FREGS
-	select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
+	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
 	select HAVE_EBPF_JIT if MMU
 	select HAVE_GUP_FAST if MMU
 	select HAVE_FUNCTION_ARG_ACCESS_API
-- 
2.51.0

From c217157bcd1df434fdeaeb24b7c25ec31e15cf4a Mon Sep 17 00:00:00 2001
From: Puranjay Mohan
Date: Tue, 8 Apr 2025 02:08:34 +0800
Subject: [PATCH 14/16] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.

The idea and most of the implementation is taken from arm64's
implementation of the same feature. The idea is to place a pointer to
the ftrace_ops as a literal at a fixed offset from the function entry
point, which can be recovered by the common ftrace trampoline.

We use -fpatchable-function-entry to reserve 8 bytes above the function
entry by emitting two 4-byte or four 2-byte nops depending on the
presence of CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime
with a pointer to the associated ftrace_ops for that callsite.
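To make the layout concrete, here is a sketch of a patchable entry in the
!CONFIG_RISCV_ISA_C case, where -fpatchable-function-entry=4,2 places two
of the four 4-byte nops before the symbol ('func' is a hypothetical
function; the patched-in values follow the description above):

  /*
   * func - 8:  nop   \  8-byte literal slot, patched at runtime to hold
   * func - 4:  nop   /  this callsite's 'struct ftrace_ops *'
   * func + 0:  nop      initialized at boot to: auipc t0, hi(ftrace_caller)
   * func + 4:  nop      toggled between nop and: jalr t0, lo(ftrace_caller)
   */

  /* Assuming kernel context, the literal slot relative to the entry: */
  static inline struct ftrace_ops **ops_slot(unsigned long func_entry)
  {
  	return (struct ftrace_ops **)(func_entry - 8);
  }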
Functions are aligned to 8 bytes to make sure that the accesses to this
literal are atomic.

This approach allows for directly invoking ftrace_ops::func even for
ftrace_ops which are dynamically-allocated (or part of a module),
without going via ftrace_ops_list_func.

We've benchmarked this with the ftrace_ops sample module on a Spacemit
K1 Jupiter:

Without this patch:

baseline (Linux rivos 6.14.0-09584-g7d06015d936c #3 SMP Sat Mar 29

+-----------------------+-----------------+----------------------------+
|   Number of tracers   | Total time (ns) |    Per-call average time   |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant |    100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
|        0 |          0 |         1357958 |         13 |             - |
|        0 |          1 |         1302375 |         13 |             - |
|        0 |          2 |         1302375 |         13 |             - |
|        0 |         10 |         1379084 |         13 |             - |
|        0 |        100 |         1302458 |         13 |             - |
|        0 |        200 |         1302333 |         13 |             - |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |        13677833 |        136 |           123 |
|        1 |          1 |        18500916 |        185 |           172 |
|        1 |          2 |        22856459 |        228 |           215 |
|        1 |         10 |        58824709 |        588 |           575 |
|        1 |        100 |       505141584 |       5051 |          5038 |
|        1 |        200 |      1580473126 |      15804 |         15791 |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |        13561000 |        135 |           122 |
|        2 |          0 |        19707292 |        197 |           184 |
|       10 |          0 |        67774750 |        677 |           664 |
|      100 |          0 |       714123125 |       7141 |          7128 |
|      200 |          0 |      1918065668 |      19180 |         19167 |
+----------+------------+-----------------+------------+---------------+

Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.

With this patch:

v4-rc4 (Linux rivos 6.14.0-09598-gd75747611c93 #4 SMP Sat Mar 29

+-----------------------+-----------------+----------------------------+
|   Number of tracers   | Total time (ns) |    Per-call average time   |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant |    100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
|        0 |          0 |         1459917 |         14 |             - |
|        0 |          1 |         1408000 |         14 |             - |
|        0 |          2 |         1383792 |         13 |             - |
|        0 |         10 |         1430709 |         14 |             - |
|        0 |        100 |         1383791 |         13 |             - |
|        0 |        200 |         1383750 |         13 |             - |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |         5238041 |         52 |            38 |
|        1 |          1 |         5228542 |         52 |            38 |
|        1 |          2 |         5325917 |         53 |            40 |
|        1 |         10 |         5299667 |         52 |            38 |
|        1 |        100 |         5245250 |         52 |            39 |
|        1 |        200 |         5238459 |         52 |            39 |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |         5239083 |         52 |            38 |
|        2 |          0 |        19449417 |        194 |           181 |
|       10 |          0 |        67718584 |        677 |           663 |
|      100 |          0 |       709840708 |       7098 |          7085 |
|      200 |          0 |      2203580626 |      22035 |         22022 |
+----------+------------+-----------------+------------+---------------+

Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.

As can be seen from the above:

 a) Whenever there is a single relevant tracer function associated with
    a tracee, the overhead of invoking the tracer is constant, and does
    not scale with the number of tracers which are *not* associated with
    that tracee.

 b) The overhead for a single relevant tracer has dropped to ~1/3 of the
    overhead prior to this series (from 122ns to 38ns). This is largely
    due to permitting calls to dynamically-allocated ftrace_ops without
    going through ftrace_ops_list_func.
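In rough C, the fast path enabled by the literal - the common trampoline
calling the callsite's own ops->func instead of walking the tracer list -
looks like this (a sketch mirroring the PREPARE_ARGS assembly below, not
the actual implementation):

  static void ftrace_caller_sketch(unsigned long t0, unsigned long ip,
  				 unsigned long parent_ip,
  				 struct ftrace_regs *fregs)
  {
  	/* t0 = func + 8 (set by the callsite's jalr), so the ops literal
  	 * written at func - 8 lives at t0 - 16. */
  	struct ftrace_ops *op = *(struct ftrace_ops **)(t0 - 16);

  	op->func(ip, parent_ip, op, fregs);	/* one direct call, no list walk */
  }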
Signed-off-by: Puranjay Mohan [update kconfig, asm, refactor] Signed-off-by: Andy Chiu Tested-by: Björn Töpel Link: https://lore.kernel.org/r/20250407180838.42877-10-andybnac@gmail.com Signed-off-by: Alexandre Ghiti Signed-off-by: Palmer Dabbelt --- arch/riscv/Kconfig | 2 + arch/riscv/Makefile | 4 +- arch/riscv/kernel/asm-offsets.c | 3 ++ arch/riscv/kernel/ftrace.c | 67 +++++++++++++++++++++++++++++++++ arch/riscv/kernel/mcount-dyn.S | 35 +++++++++++++++-- 5 files changed, 105 insertions(+), 6 deletions(-) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index dc0fc11b6e96..ec986c9120e3 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -99,6 +99,7 @@ config RISCV select EDAC_SUPPORT select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE) select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if DYNAMIC_FTRACE + select FUNCTION_ALIGNMENT_8B if DYNAMIC_FTRACE_WITH_CALL_OPS select GENERIC_ARCH_TOPOLOGY select GENERIC_ATOMIC64 if !64BIT select GENERIC_CLOCKEVENTS_BROADCAST if SMP @@ -152,6 +153,7 @@ config RISCV select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE) select FUNCTION_ALIGNMENT_4B if HAVE_DYNAMIC_FTRACE && RISCV_ISA_C select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS + select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if (DYNAMIC_FTRACE_WITH_ARGS && !CFI_CLANG) select HAVE_DYNAMIC_FTRACE_WITH_ARGS if HAVE_DYNAMIC_FTRACE select HAVE_FTRACE_GRAPH_FUNC select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index 539d2aef5cab..df57654a615e 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y) LDFLAGS_vmlinux += --no-relax KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY ifeq ($(CONFIG_RISCV_ISA_C),y) - CC_FLAGS_FTRACE := -fpatchable-function-entry=4 + CC_FLAGS_FTRACE := -fpatchable-function-entry=8,4 else - CC_FLAGS_FTRACE := -fpatchable-function-entry=2 + CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2 endif endif diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index 7c43c8e26ae7..2d96197a8abf 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -493,6 +493,9 @@ void asm_offsets(void) DEFINE(STACKFRAME_SIZE_ON_STACK, ALIGN(sizeof(struct stackframe), STACK_ALIGN)); OFFSET(STACKFRAME_FP, stackframe, fp); OFFSET(STACKFRAME_RA, stackframe, ra); +#ifdef CONFIG_FUNCTION_TRACER + DEFINE(FTRACE_OPS_FUNC, offsetof(struct ftrace_ops, func)); +#endif #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS DEFINE(FREGS_SIZE_ON_STACK, ALIGN(sizeof(struct __arch_ftrace_regs), STACK_ALIGN)); diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c index b133c60808fe..d56fc6e9fba0 100644 --- a/arch/riscv/kernel/ftrace.c +++ b/arch/riscv/kernel/ftrace.c @@ -16,6 +16,9 @@ #ifdef CONFIG_DYNAMIC_FTRACE unsigned long ftrace_call_adjust(unsigned long addr) { + if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)) + return addr + 8; + return addr + MCOUNT_AUIPC_SIZE; } @@ -64,9 +67,52 @@ static int __ftrace_modify_call(unsigned long source, unsigned long target, bool return 0; } +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS +static const struct ftrace_ops *riscv64_rec_get_ops(struct dyn_ftrace *rec) +{ + const struct ftrace_ops *ops = NULL; + + if (rec->flags & FTRACE_FL_CALL_OPS_EN) { + ops = ftrace_find_unique_ops(rec); + WARN_ON_ONCE(!ops); + } + + if (!ops) + ops = &ftrace_list_ops; + + return ops; +} + +static int ftrace_rec_set_ops(const struct dyn_ftrace 
*rec,
+			      const struct ftrace_ops *ops)
+{
+	unsigned long literal = rec->ip - 8;
+
+	return patch_text_nosync((void *)literal, &ops, sizeof(ops));
+}
+
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, &ftrace_nop_ops);
+}
+
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, riscv64_rec_get_ops(rec));
+}
+#else
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; }
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; }
+#endif
+
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	unsigned long distance, orig_addr, pc = rec->ip - MCOUNT_AUIPC_SIZE;
+	int ret;
+
+	ret = ftrace_rec_update_ops(rec);
+	if (ret)
+		return ret;
 
 	orig_addr = (unsigned long)&ftrace_caller;
 	distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
@@ -79,6 +125,11 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
 {
 	u32 nop4 = RISCV_INSN_NOP4;
+	int ret;
+
+	ret = ftrace_rec_set_nop_ops(rec);
+	if (ret)
+		return ret;
 
 	if (patch_insn_write((void *)rec->ip, &nop4, MCOUNT_NOP4_SIZE))
 		return -EPERM;
@@ -99,6 +150,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	unsigned int nops[2], offset;
 	int ret;
 
+	ret = ftrace_rec_set_nop_ops(rec);
+	if (ret)
+		return ret;
+
 	offset = (unsigned long) &ftrace_caller - pc;
 	nops[0] = to_auipc_t0(offset);
 	nops[1] = RISCV_INSN_NOP4;
@@ -113,6 +168,13 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 ftrace_func_t ftrace_call_dest = ftrace_stub;
 int ftrace_update_ftrace_func(ftrace_func_t func)
 {
+	/*
+	 * When using CALL_OPS, the function to call is associated with the
+	 * call site, and we don't have a global function pointer to update.
+	 */
+	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+		return 0;
+
 	WRITE_ONCE(ftrace_call_dest, func);
 	/*
 	 * The data fence ensure that the update to ftrace_call_dest happens
@@ -143,8 +205,13 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 {
 	unsigned long caller = rec->ip - MCOUNT_AUIPC_SIZE;
 	unsigned int call[2];
+	int ret;
 
 	make_call_t0(caller, old_addr, call);
+	ret = ftrace_rec_update_ops(rec);
+	if (ret)
+		return ret;
+
 	return __ftrace_modify_call(caller, addr, true);
 }
 #endif
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 8aa554d56096..699684eea7f0 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -139,10 +139,34 @@
 
 	.macro PREPARE_ARGS
 	addi	a0, t0, -MCOUNT_JALR_SIZE	// ip (callsite's jalr insn)
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	/*
+	 * When CALL_OPS is enabled (2 or 4) nops [8B] are placed before the
+	 * function entry, these are later overwritten with the pointer to the
+	 * associated struct ftrace_ops.
+	 *
+	 * -8: &ftrace_ops of the associated tracer function.
+	 * <ftrace enable>:
+	 * 0: auipc  t0/ra, 0x?
+	 * 4: jalr   t0/ra, ?(t0/ra)
+	 *
+	 * -8: &ftrace_nop_ops
+	 * <ftrace disable>:
+	 * 0: nop
+	 * 4: nop
+	 *
+	 * t0 is set to ip+8 after the jalr is executed at the callsite,
+	 * so we find the associated op at t0-16.
+ */ + mv a1, ra // parent_ip + REG_L a2, -16(t0) // op + REG_L ra, FTRACE_OPS_FUNC(a2) // op->func +#else la a1, function_trace_op - REG_L a2, 0(a1) - mv a1, ra - mv a3, sp + REG_L a2, 0(a1) // op + mv a1, ra // parent_ip +#endif + mv a3, sp // regs .endm SYM_FUNC_START(ftrace_caller) @@ -150,10 +174,13 @@ SYM_FUNC_START(ftrace_caller) SAVE_ABI_REGS PREPARE_ARGS +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS + jalr ra +#else SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL) REG_L ra, ftrace_call_dest jalr ra, 0(ra) - +#endif RESTORE_ABI_REGS bnez t1, .Ldirect jr t0 -- 2.51.0 From b21cdb9523e5561b97fd534dbb75d132c5c938ff Mon Sep 17 00:00:00 2001 From: Andy Chiu Date: Tue, 8 Apr 2025 02:08:35 +0800 Subject: [PATCH 15/16] riscv: ftrace: support direct call using call_ops MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit jump to FTRACE_ADDR if distance is out of reach Co-developed-by: Björn Töpel Signed-off-by: Björn Töpel Signed-off-by: Andy Chiu Link: https://lore.kernel.org/r/20250407180838.42877-11-andybnac@gmail.com Signed-off-by: Alexandre Ghiti Signed-off-by: Palmer Dabbelt --- arch/riscv/Kconfig | 2 +- arch/riscv/include/asm/ftrace.h | 6 ++++ arch/riscv/kernel/asm-offsets.c | 3 ++ arch/riscv/kernel/ftrace.c | 13 ++++----- arch/riscv/kernel/mcount-dyn.S | 51 +++++++++++++++++++++------------ 5 files changed, 48 insertions(+), 27 deletions(-) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index ec986c9120e3..8fdca6345fa3 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -152,7 +152,7 @@ config RISCV select HAVE_DMA_CONTIGUOUS if MMU select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE) select FUNCTION_ALIGNMENT_4B if HAVE_DYNAMIC_FTRACE && RISCV_ISA_C - select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS + select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if (DYNAMIC_FTRACE_WITH_ARGS && !CFI_CLANG) select HAVE_DYNAMIC_FTRACE_WITH_ARGS if HAVE_DYNAMIC_FTRACE select HAVE_FTRACE_GRAPH_FUNC diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h index 6a5c0a7fb826..22ebea3c2b26 100644 --- a/arch/riscv/include/asm/ftrace.h +++ b/arch/riscv/include/asm/ftrace.h @@ -130,6 +130,9 @@ struct __arch_ftrace_regs { unsigned long sp; unsigned long s0; unsigned long t1; +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS + unsigned long direct_tramp; +#endif union { unsigned long args[8]; struct { @@ -223,10 +226,13 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *op, struct ftrace_regs *fregs); #define ftrace_graph_func ftrace_graph_func +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, unsigned long addr) { arch_ftrace_regs(fregs)->t1 = addr; } +#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */ + #endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */ #endif /* __ASSEMBLY__ */ diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index 2d96197a8abf..b26334075697 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -495,6 +495,9 @@ void asm_offsets(void) OFFSET(STACKFRAME_RA, stackframe, ra); #ifdef CONFIG_FUNCTION_TRACER DEFINE(FTRACE_OPS_FUNC, offsetof(struct ftrace_ops, func)); +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS + DEFINE(FTRACE_OPS_DIRECT_CALL, offsetof(struct ftrace_ops, direct_call)); +#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS 
*/
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index d56fc6e9fba0..4c6c24380cfd 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -17,7 +17,7 @@
 unsigned long ftrace_call_adjust(unsigned long addr)
 {
 	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
-		return addr + 8;
+		return addr + 8 + MCOUNT_AUIPC_SIZE;
 
 	return addr + MCOUNT_AUIPC_SIZE;
 }
@@ -83,10 +83,9 @@ static const struct ftrace_ops *riscv64_rec_get_ops(struct dyn_ftrace *rec)
 	return ops;
 }
 
-static int ftrace_rec_set_ops(const struct dyn_ftrace *rec,
-			      const struct ftrace_ops *ops)
+static int ftrace_rec_set_ops(const struct dyn_ftrace *rec, const struct ftrace_ops *ops)
 {
-	unsigned long literal = rec->ip - 8;
+	unsigned long literal = ALIGN_DOWN(rec->ip - 12, 8);
 
 	return patch_text_nosync((void *)literal, &ops, sizeof(ops));
 }
@@ -117,7 +116,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	orig_addr = (unsigned long)&ftrace_caller;
 	distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
 	if (distance > JALR_RANGE)
-		return -EINVAL;
+		addr = FTRACE_ADDR;
 
 	return __ftrace_modify_call(pc, addr, false);
 }
@@ -204,15 +203,13 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 		       unsigned long addr)
 {
 	unsigned long caller = rec->ip - MCOUNT_AUIPC_SIZE;
-	unsigned int call[2];
 	int ret;
 
-	make_call_t0(caller, old_addr, call);
 	ret = ftrace_rec_update_ops(rec);
 	if (ret)
 		return ret;
 
-	return __ftrace_modify_call(caller, addr, true);
+	return __ftrace_modify_call(caller, FTRACE_ADDR, true);
 }
 #endif
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 699684eea7f0..48f6c4f7dca0 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -82,12 +82,9 @@
 * +++++++++
 **/
 	.macro SAVE_ABI_REGS
-	mv	t4, sp			// Save original SP in T4
 	addi	sp, sp, -FREGS_SIZE_ON_STACK
-
 	REG_S	t0,  FREGS_EPC(sp)
 	REG_S	x1,  FREGS_RA(sp)
-	REG_S	t4,  FREGS_SP(sp)	// Put original SP on stack
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	REG_S	x8, FREGS_S0(sp)
 #endif
@@ -108,9 +105,12 @@
 	REG_S	x15, FREGS_A5(sp)
 	REG_S	x16, FREGS_A6(sp)
 	REG_S	x17, FREGS_A7(sp)
+	mv	a0, sp
+	addi	a0, a0, FREGS_SIZE_ON_STACK
+	REG_S	a0, FREGS_SP(sp)	// Put original SP on stack
 	.endm
 
-	.macro RESTORE_ABI_REGS, all=0
+	.macro RESTORE_ABI_REGS
 	REG_L	t0, FREGS_EPC(sp)
 	REG_L	x1, FREGS_RA(sp)
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
@@ -139,6 +139,19 @@
 
 	.macro PREPARE_ARGS
 	addi	a0, t0, -MCOUNT_JALR_SIZE	// ip (callsite's jalr insn)
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	mv	a1, ra				// parent_ip
+	REG_L	a2, -16(t0)			// op
+	REG_L	ra, FTRACE_OPS_FUNC(a2)		// op->func
+#else
+	la	a1, function_trace_op
+	REG_L	a2, 0(a1)			// op
+	mv	a1, ra				// parent_ip
+#endif
+	mv	a3, sp				// regs
+	.endm
+
+SYM_FUNC_START(ftrace_caller)
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
 	/*
 	 * When CALL_OPS is enabled (2 or 4) nops [8B] are placed before the
 	 * function entry, these are later overwritten with the pointer to the
 	 * associated struct ftrace_ops.
 	 *
 	 * -8: &ftrace_ops of the associated tracer function.
 	 * <ftrace enable>:
 	 * 0: auipc  t0/ra, 0x?
 	 * 4: jalr   t0/ra, ?(t0/ra)
 	 *
 	 * -8: &ftrace_nop_ops
 	 * <ftrace disable>:
 	 * 0: nop
 	 * 4: nop
 	 *
 	 * t0 is set to ip+8 after the jalr is executed at the callsite,
 	 * so we find the associated op at t0-16.
 	 */
-	mv	a1, ra				// parent_ip
-	REG_L	a2, -16(t0)			// op
-	REG_L	ra, FTRACE_OPS_FUNC(a2)		// op->func
-#else
-	la	a1, function_trace_op
-	REG_L	a2, 0(a1)			// op
-	mv	a1, ra				// parent_ip
-#endif
-	mv	a3, sp				// regs
-	.endm
+	REG_L	t1, -16(t0) // op Should be SZ_REG instead of 16
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/*
+	 * If the op has a direct call, handle it immediately without
+	 * saving/restoring registers.
+	 */
+	REG_L	t1, FTRACE_OPS_DIRECT_CALL(t1)
+	bnez	t1, ftrace_caller_direct
+#endif
+#endif
 	SAVE_ABI_REGS
 	PREPARE_ARGS
 
@@ -182,10 +193,14 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
 	jalr	ra, 0(ra)
 #endif
 	RESTORE_ABI_REGS
-	bnez	t1, .Ldirect
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	bnez	t1, ftrace_caller_direct
+#endif
 	jr	t0
-.Ldirect:
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+SYM_INNER_LABEL(ftrace_caller_direct, SYM_L_LOCAL)
 	jr	t1
+#endif
SYM_FUNC_END(ftrace_caller)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
-- 
2.51.0

From d8ac85dad407f4cea65a970e9ba42201cb49686a Mon Sep 17 00:00:00 2001
From: Andy Chiu
Date: Tue, 8 Apr 2025 02:08:36 +0800
Subject: [PATCH 16/16] riscv: Documentation: add a description about dynamic
 ftrace

Add a section in cmodx to describe how dynamic ftrace works on riscv,
its limitations, and assumptions.

Signed-off-by: Andy Chiu
Link: https://lore.kernel.org/r/20250407180838.42877-12-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti
Signed-off-by: Palmer Dabbelt
---
 Documentation/arch/riscv/cmodx.rst | 46 +++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/Documentation/arch/riscv/cmodx.rst b/Documentation/arch/riscv/cmodx.rst
index 8c48bcff3df9..e009873b2d17 100644
--- a/Documentation/arch/riscv/cmodx.rst
+++ b/Documentation/arch/riscv/cmodx.rst
@@ -10,13 +10,45 @@ modified by the program itself. Instruction storage and the instruction cache
 program must enforce its own synchronization with the unprivileged fence.i
 instruction.
 
-However, the default Linux ABI prohibits the use of fence.i in userspace
-applications. At any point the scheduler may migrate a task onto a new hart. If
-migration occurs after the userspace synchronized the icache and instruction
-storage with fence.i, the icache on the new hart will no longer be clean. This
-is due to the behavior of fence.i only affecting the hart that it is called on.
-Thus, the hart that the task has been migrated to may not have synchronized
-instruction storage and icache.
+CMODX in the Kernel Space
+-------------------------
+
+Dynamic ftrace
+---------------------
+
+Essentially, dynamic ftrace directs the control flow by inserting a function
+call at each patchable function entry, and patches it dynamically at runtime to
+enable or disable the redirection. In the case of RISC-V, 2 instructions,
+AUIPC + JALR, are required to compose a function call. However, it is impossible
+to patch 2 instructions and expect that a concurrent read-side executes them
+without a race condition. This series makes atomic code patching possible in
+RISC-V ftrace. Kernel preemption makes things even worse as it allows the old
+state to persist across the patching process with stop_machine().
+
+In order to get rid of stop_machine() and run dynamic ftrace with full kernel
+preemption, we partially initialize each patchable function entry at boot-time,
+setting the first instruction to AUIPC, and the second to NOP. Now, atomic
+patching is possible because the kernel only has to update one instruction.
+According to Ziccif, as long as an instruction is naturally aligned, the ISA
+guarantees an atomic update.
+
+By pinning down the first instruction, AUIPC, the range of the ftrace
+trampoline is limited to +-2K from the predetermined target, ftrace_caller,
+due to the lack of immediate encoding space in RISC-V. To address the issue, we
+introduce CALL_OPS, where an 8B naturally aligned metadata is added in front of
+each patchable function.
+The metadata is resolved at the first trampoline, then the execution can be
+directed to another custom trampoline.
+
+CMODX in the User Space
+-----------------------
+
+Though fence.i is an unprivileged instruction, the default Linux ABI prohibits
+the use of fence.i in userspace applications. At any point the scheduler may
+migrate a task onto a new hart. If migration occurs after the userspace
+synchronized the icache and instruction storage with fence.i, the icache on the
+new hart will no longer be clean. This is due to the behavior of fence.i only
+affecting the hart that it is called on. Thus, the hart that the task has been
+migrated to may not have synchronized instruction storage and icache.
 
 There are two ways to solve this problem: use the riscv_flush_icache() syscall,
 or use the ``PR_RISCV_SET_ICACHE_FLUSH_CTX`` prctl() and emit fence.i in
-- 
2.51.0