====================
net: fib_rules: Add DSCP selector support
Currently, the kernel rejects IPv4 FIB rules that try to match on the
upper three DSCP bits:
# ip -4 rule add tos 0x1c table 100
# ip -4 rule add tos 0x3c table 100
Error: Invalid tos.
The reason for that is that historically users of the FIB lookup API
only populated the lower three DSCP bits in the TOS field of the IPv4
flow key ('flowi4_tos'), which fits the TOS definition from the initial
IPv4 specification (RFC 791).
This is not very useful nowadays and instead some users want to be able
to match on the six bits DSCP field, which replaced the TOS and IP
precedence fields over 25 years ago (RFC 2474). In addition, the current
behavior differs between IPv4 and IPv6 which does allow users to match
on the entire DSCP field using the TOS selector.
Recent patchsets made sure that callers of the FIB lookup API now
populate the entire DSCP field in the IPv4 flow key. Therefore, it is
now possible to extend FIB rules to match on DSCP.
This is done by adding a new DSCP attribute which is implemented for
both IPv4 and IPv6 to provide user space programs a consistent behavior
between both address families.
The behavior of the old TOS selector is unchanged and IPv4 FIB rules
using it will only match on the lower three DSCP bits. The kernel will
reject rules that try to use both selectors.
Patch #1 adds the new DSCP attribute but rejects its usage.
Patches #2-#3 implement IPv4 and IPv6 support.
Patch #4 allows user space to use the new attribute.
Test that locally generated traffic from a socket that specifies a DS
Field using the IP_TOS / IPV6_TCLASS socket options is correctly
redirected using a FIB rule that matches on DSCP. Add negative tests to
verify that the rule is not it when it should not. Test with both IPv4
and IPv6 and with both TCP and UDP sockets.
Now that both IPv4 and IPv6 support the new DSCP selector, enable user
space to configure FIB rules that make use of it by changing the policy
of the new DSCP attribute so that it accepts values in the range of [0,
63].
Use NLA_U8 rather than NLA_UINT as the field is of fixed size.
Implement support for the new DSCP selector that allows IPv6 FIB rules
to match on the entire DSCP field. This is done despite the fact that
the above can be achieved using the existing TOS selector, so that user
space program will be able to work with IPv4 and IPv6 rules in the same
way.
Differentiate between both selectors by adding a new bit in the IPv6 FIB
rule structure that is only set when the 'FRA_DSCP' attribute is
specified by user space. Reject rules that use both selectors.
Implement support for the new DSCP selector that allows IPv4 FIB rules
to match on the entire DSCP field, unlike the existing TOS selector that
only matches on the three lower DSCP bits.
Differentiate between both selectors by adding a new bit in the IPv4 FIB
rule structure (in an existing one byte hole) that is only set when the
'FRA_DSCP' attribute is specified by user space. Reject rules that use
both selectors.
The FIB rule TOS selector is implemented differently between IPv4 and
IPv6. In IPv4 it is used to match on the three "Type of Services" bits
specified in RFC 791, while in IPv6 is it is used to match on the six
DSCP bits specified in RFC 2474.
Add a new FIB rule attribute to allow matching on DSCP. The attribute
will be used to implement a 'dscp' selector in ip-rule with a consistent
behavior between IPv4 and IPv6.
For now, set the type of the attribute to 'NLA_REJECT' so that user
space will not be able to configure it. This restriction will be lifted
once both IPv4 and IPv6 support the new attribute.
net: ethtool: Enhance error messages sent to user space
During the firmware flashing process, notifications are sent to user
space to provide progress updates. When an error occurs, an error
message is sent to indicate what went wrong.
In some cases, appropriate error messages are missing.
Add relevant error messages where applicable, allowing user space to better
understand the issues encountered.
Fixes: e332bc67cf5e ("ipv6: Don't call with rt6_uncached_list_flush_dev") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20240913083147.3095442-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Su Hui [Thu, 12 Sep 2024 11:01:20 +0000 (19:01 +0800)]
net: tipc: avoid possible garbage value
Clang static checker (scan-build) warning:
net/tipc/bcast.c:305:4:
The expression is an uninitialized value. The computed value will also
be garbage [core.uninitialized.Assign]
305 | (*cong_link_cnt)++;
| ^~~~~~~~~~~~~~~~~~
tipc_rcast_xmit() will increase cong_link_cnt's value, but cong_link_cnt
is uninitialized. Although it won't really cause a problem, it's better
to fix it.
Fixes: dca4a17d24ee ("tipc: fix potential hanging after b/rcast changing") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Justin Stitt <justinstitt@google.com> Link: https://patch.msgid.link/20240912110119.2025503-1-suhui@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
btf: require pahole 1.21+ for DEBUG_INFO_BTF with default DWARF version
As described in commit 42d9b379e3e1 ("lib/Kconfig.debug: Allow BTF +
DWARF5 with pahole 1.21+"), the combination of CONFIG_DEBUG_INFO_BTF
and CONFIG_DEBUG_INFO_DWARF5 requires pahole 1.21+.
GCC 11+ and Clang 14+ default to DWARF 5 when the -g flag is passed.
For the same reason, the combination of CONFIG_DEBUG_INFO_BTF and
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is also likely to require
pahole 1.21+ these days. (At least, it is uncertain whether the actual
requirement is pahole 1.16+ or 1.21+.)
btf: move pahole check in scripts/link-vmlinux.sh to lib/Kconfig.debug
When DEBUG_INFO_DWARF5 is selected, pahole 1.21+ is required to enable
DEBUG_INFO_BTF.
When DEBUG_INFO_DWARF4 or DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is selected,
DEBUG_INFO_BTF can be enabled without pahole installed, but a build error
will occur in scripts/link-vmlinux.sh:
LD .tmp_vmlinux1
BTF: .tmp_vmlinux1: pahole (pahole) is not available
Failed to generate BTF for vmlinux
Try to disable CONFIG_DEBUG_INFO_BTF
We did not guard DEBUG_INFO_BTF by PAHOLE_VERSION when previously
discussed [1].
However, commit 613fe1692377 ("kbuild: Add CONFIG_PAHOLE_VERSION")
added CONFIG_PAHOLE_VERSION after all. Now several CONFIG options, as
well as the combination of DEBUG_INFO_BTF and DEBUG_INFO_DWARF5, are
guarded by PAHOLE_VERSION.
The remaining compile-time check in scripts/link-vmlinux.sh now appears
to be an awkward inconsistency.
The enetc driver uses ifdefs when checking whether
CONFIG_FSL_ENETC_PTP_CLOCK is enabled in a number of places. This works
if the driver is built-in but fails if the driver is available as a
kernel module. Replace the instances of ifdef with use of the IS_ENABLED
macro, that will evaluate as true when this feature is built as a kernel
module and follows the kernel's coding style.
fbnic: Set napi irq value after calling netif_napi_add
The driver calls netif_napi_set_irq() and then calls netif_napi_add(),
which calls netif_napi_add_weight(). At the end of
netif_napi_add_weight() is a call to netif_napi_set_irq(napi, -1), which
clears the previously set napi->irq value. Fix this by calling
netif_napi_set_irq() after calling netif_napi_add().
This was found when reviewing another patch and I have no way to test
this, but the fix seemed relatively straight forward.
Fixes: bc6107771bb4 ("eth: fbnic: Allocate a netdevice and napi vectors with queues") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20240912174922.10550-1-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
En-Wei reported that traffic breaks if cable is unplugged for more
than 3s and then re-plugged. This was supposed to be fixed by 621735f59064 ("r8169: fix rare issue with broken rx after link-down on
RTL8125"). But apparently this didn't fix the issue for everybody.
The 3s threshold rang a bell, as this is the delay after which ALDPS
kicks in. And indeed disabling ALDPS fixes the issue for this user.
Maybe this fixes the issue in general. In a follow-up step we could
remove the first fix attempt and see whether anybody complains.
Qianqiang Liu [Fri, 13 Sep 2024 01:47:32 +0000 (09:47 +0800)]
net: ag71xx: remove dead code path
The "err" is always zero, so the following branch can never be executed:
if (err) {
ndev->stats.rx_dropped++;
kfree_skb(skb);
}
Therefore, the "if" statement can be removed.
Use "ndev->stats.rx_errors" to count "napi_build_skb()" failure
Jakub Kicinski [Sat, 14 Sep 2024 02:50:25 +0000 (19:50 -0700)]
Merge tag 'for-net-next-2024-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- btusb: Add MediaTek MT7925-B22M support ID 0x13d3:0x3604
- btusb: Add Realtek RTL8852C support ID 0x0489:0xe122
- btrtl: Add the support for RTL8922A
- btusb: Add 2 USB HW IDs for MT7925 (0xe118/e)
- btnxpuart: Add support for ISO packets
- btusb: Add Mediatek MT7925 support ID 0x13d3:0x3608
- btsdio: Do not bind to non-removable CYW4373
- hci_uart: Add support for Amlogic HCI UART
* tag 'for-net-next-2024-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (27 commits)
Bluetooth: btintel_pcie: Allocate memory for driver private data
Bluetooth: btusb: Fix not handling ZPL/short-transfer
Bluetooth: btusb: Add 2 USB HW IDs for MT7925 (0xe118/e)
Bluetooth: btsdio: Do not bind to non-removable CYW4373
Bluetooth: hci_sync: Ignore errors from HCI_OP_REMOTE_NAME_REQ_CANCEL
Bluetooth: CMTP: Mark BT_CMTP as DEPRECATED
Bluetooth: replace deprecated strncpy with strscpy_pad
Bluetooth: hci_core: Fix sending MGMT_EV_CONNECT_FAILED
Bluetooth: btrtl: Set msft ext address filter quirk for RTL8852B
Bluetooth: Use led_set_brightness() in LED trigger activate() callback
Bluetooth: btrtl: Use kvmemdup to simplify the code
Bluetooth: btusb: Add Mediatek MT7925 support ID 0x13d3:0x3608
Bluetooth: btrtl: Add the support for RTL8922A
Bluetooth: hci_ldisc: Use speed set by btattach as oper_speed
Bluetooth: hci_conn: Remove redundant memset after kzalloc
Bluetooth: L2CAP: Remove unused declarations
dt-bindings: bluetooth: bring the HW description closer to reality for wcn6855
Bluetooth: btnxpuart: Add support for ISO packets
Bluetooth: hci_h4: Add support for ISO packets in h4_recv.h
Bluetooth: btusb: Add Realtek RTL8852C support ID 0x0489:0xe122
...
====================
During the introduction of struct bpf_net_context handling for
XDP-redirect, the netkit driver has been missed, which also requires it
because NETKIT_REDIRECT invokes skb_do_redirect() which is accessing the
per-CPU variables. Otherwise we see the following crash:
Set the bpf_net_context before invoking netkit_xmit() program within the
netkit driver.
Fixes: 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20240912155620.1334587-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maciej Fijalkowski [Wed, 11 Sep 2024 19:10:19 +0000 (21:10 +0200)]
xsk: fix batch alloc API on non-coherent systems
In cases when synchronizing DMA operations is necessary,
xsk_buff_alloc_batch() returns a single buffer instead of the requested
count. This puts the pressure on drivers that use batch API as they have
to check for this corner case on their side and take care of allocations
by themselves, which feels counter productive. Let us improve the core
by looping over xp_alloc() @max times when slow path needs to be taken.
Another issue with current interface, as spotted and fixed by Dries, was
that when driver called xsk_buff_alloc_batch() with @max == 0, for slow
path case it still allocated and returned a single buffer, which should
not happen. By introducing the logic from first paragraph we kill two
birds with one stone and address this problem as well.
Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool") Reported-and-tested-by: Dries De Winter <ddewinter@synamedia.com> Co-developed-by: Dries De Winter <ddewinter@synamedia.com> Signed-off-by: Dries De Winter <ddewinter@synamedia.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://patch.msgid.link/20240911191019.296480-1-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
The tiny patch set aims to fix two problems found during the development
of supporting dynptr key in hash table. Patch #1 fixes the missed
btf_record_free() when map creation fails and patch #2 fixes the missed
kfree() when there is no special field in the passed btf.
====================
Hou Tao [Thu, 12 Sep 2024 01:28:44 +0000 (09:28 +0800)]
bpf: Call the missed btf_record_free() when map creation fails
When security_bpf_map_create() in map_create() fails, map_create() will
call btf_put() and ->map_free() callback to free the map. It doesn't
free the btf_record of map value, so add the missed btf_record_free()
when map creation fails.
However btf_record_free() needs to be called after ->map_free() just
like bpf_map_free_deferred() did, because ->map_free() may use the
btf_record to free the special fields in preallocated map value. So
factor out bpf_map_free() helper to free the map, btf_record, and btf
orderly and use the helper in both map_create() and
bpf_map_free_deferred().
Merge tag 'pci-v6.11-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Prevent a possible deadlock (reported by lockdep) when a driver
relinquishes a pci_dev, another driver claims it, and one uses
managed pcim_enable_device() and the other doesn't (Philipp Stanner)
* tag 'pci-v6.11-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Fix potential deadlock in pcim_intx()
Daniel Borkmann [Fri, 13 Sep 2024 19:17:54 +0000 (21:17 +0200)]
selftests/bpf: Add a test case to write mtu result into .rodata
Add a test which attempts to call bpf_check_mtu() and writes the MTU
into .rodata section of the BPF program, and for comparison this adds
test cases also for .bss and .data section again. The bpf_check_mtu()
is a bit more special in that the passed mtu argument is read and
written by the helper (instead of just written to). Assert that writes
into .rodata remain rejected by the verifier.
Daniel Borkmann [Fri, 13 Sep 2024 19:17:53 +0000 (21:17 +0200)]
selftests/bpf: Add a test case to write strtol result into .rodata
Add a test case which attempts to write into .rodata section of the
BPF program, and for comparison this adds test cases also for .bss
and .data section.
Daniel Borkmann [Fri, 13 Sep 2024 19:17:51 +0000 (21:17 +0200)]
selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test
The assumption of 'in privileged mode reads from uninitialized stack locations
are permitted' is not quite correct since the verifier was probing for read
access rather than write access. Both tests need to be annotated as __success
for privileged and unprivileged.
Daniel Borkmann [Fri, 13 Sep 2024 19:17:50 +0000 (21:17 +0200)]
bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error
For all non-tracing helpers which formerly had ARG_PTR_TO_{LONG,INT} as input
arguments, zero the value for the case of an error as otherwise it could leak
memory. For tracing, it is not needed given CAP_PERFMON can already read all
kernel memory anyway hence bpf_get_func_arg() and bpf_get_func_ret() is skipped
in here.
Also, the MTU helpers mtu_len pointer value is being written but also read.
Technically, the MEM_UNINIT should not be there in order to always force init.
Removing MEM_UNINIT needs more verifier rework though: MEM_UNINIT right now
implies two things actually: i) write into memory, ii) memory does not have
to be initialized. If we lift MEM_UNINIT, it then becomes: i) read into memory,
ii) memory must be initialized. This means that for bpf_*_check_mtu() we're
readding the issue we're trying to fix, that is, it would then be able to
write back into things like .rodata BPF maps. Follow-up work will rework the
MEM_UNINIT semantics such that the intent can be better expressed. For now
just clear the *mtu_len on error path which can be lifted later again.
Daniel Borkmann [Fri, 13 Sep 2024 19:17:49 +0000 (21:17 +0200)]
bpf: Improve check_raw_mode_ok test for MEM_UNINIT-tagged types
When checking malformed helper function signatures, also take other argument
types into account aside from just ARG_PTR_TO_UNINIT_MEM.
This concerns (formerly) ARG_PTR_TO_{INT,LONG} given uninitialized memory can
be passed there, too.
The func proto sanity check goes back to commit 435faee1aae9 ("bpf, verifier:
add ARG_PTR_TO_RAW_STACK type"), and its purpose was to detect wrong func protos
which had more than just one MEM_UNINIT-tagged type as arguments.
The reason more than one is currently not supported is as we mark stack slots with
STACK_MISC in check_helper_call() in case of raw mode based on meta.access_size to
allow uninitialized stack memory to be passed to helpers when they just write into
the buffer.
Probing for base type as well as MEM_UNINIT tagging ensures that other types do not
get missed (as it used to be the case for ARG_PTR_TO_{INT,LONG}).
Daniel Borkmann [Fri, 13 Sep 2024 19:17:48 +0000 (21:17 +0200)]
bpf: Fix helper writes to read-only maps
Lonial found an issue that despite user- and BPF-side frozen BPF map
(like in case of .rodata), it was still possible to write into it from
a BPF program side through specific helpers having ARG_PTR_TO_{LONG,INT}
as arguments.
In check_func_arg() when the argument is as mentioned, the meta->raw_mode
is never set. Later, check_helper_mem_access(), under the case of
PTR_TO_MAP_VALUE as register base type, it assumes BPF_READ for the
subsequent call to check_map_access_type() and given the BPF map is
read-only it succeeds.
The helpers really need to be annotated as ARG_PTR_TO_{LONG,INT} | MEM_UNINIT
when results are written into them as opposed to read out of them. The
latter indicates that it's okay to pass a pointer to uninitialized memory
as the memory is written to anyway.
However, ARG_PTR_TO_{LONG,INT} is a special case of ARG_PTR_TO_FIXED_SIZE_MEM
just with additional alignment requirement. So it is better to just get
rid of the ARG_PTR_TO_{LONG,INT} special cases altogether and reuse the
fixed size memory types. For this, add MEM_ALIGNED to additionally ensure
alignment given these helpers write directly into the args via *<ptr> = val.
The .arg*_size has been initialized reflecting the actual sizeof(*<ptr>).
MEM_ALIGNED can only be used in combination with MEM_FIXED_SIZE annotated
argument types, since in !MEM_FIXED_SIZE cases the verifier does not know
the buffer size a priori and therefore cannot blindly write *<ptr> = val.
Daniel Borkmann [Fri, 13 Sep 2024 19:17:47 +0000 (21:17 +0200)]
bpf: Remove truncation test in bpf_strtol and bpf_strtoul helpers
Both bpf_strtol() and bpf_strtoul() helpers passed a temporary "long long"
respectively "unsigned long long" to __bpf_strtoll() / __bpf_strtoull().
Later, the result was checked for truncation via _res != ({unsigned,} long)_res
as the destination buffer for the BPF helpers was of type {unsigned,} long
which is 32bit on 32bit architectures.
Given the latter was a bug in the helper signatures where the destination buffer
got adjusted to {s,u}64, the truncation check can now be removed.
Daniel Borkmann [Fri, 13 Sep 2024 19:17:46 +0000 (21:17 +0200)]
bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit
The bpf_strtol() and bpf_strtoul() helpers are currently broken on 32bit:
The argument type ARG_PTR_TO_LONG is BPF-side "long", not kernel-side "long"
and therefore always considered fixed 64bit no matter if 64 or 32bit underlying
architecture.
This contract breaks in case of the two mentioned helpers since their BPF_CALL
definition for the helpers was added with {unsigned,}long *res. Meaning, the
transition from BPF-side "long" (BPF program) to kernel-side "long" (BPF helper)
breaks here.
Both helpers call __bpf_strtoll() with "long long" correctly, but later assigning
the result into 32-bit "*(long *)" on 32bit architectures. From a BPF program
point of view, this means upper bits will be seen as uninitialised.
Therefore, fix both BPF_CALL signatures to {s,u}64 types to fix this situation.
Now, changing also uapi/bpf.h helper documentation which generates bpf_helper_defs.h
for BPF programs is tricky: Changing signatures there to __{s,u}64 would trigger
compiler warnings (incompatible pointer types passing 'long *' to parameter of type
'__s64 *' (aka 'long long *')) for existing BPF programs.
Leaving the signatures as-is would be fine as from BPF program point of view it is
still BPF-side "long" and thus equivalent to __{s,u}64 on 64 or 32bit underlying
architectures.
Note that bpf_strtol() and bpf_strtoul() are the only helpers with this issue.
Yonghong Song [Fri, 13 Sep 2024 15:03:32 +0000 (08:03 -0700)]
selftests/bpf: Add tests for sdiv/smod overflow cases
Subtests are added to exercise the patched code which handles
- LLONG_MIN/-1
- INT_MIN/-1
- LLONG_MIN%-1
- INT_MIN%-1
where -1 could be an immediate or in a register.
Without the previous patch, all these cases will crash the kernel on
x86_64 platform.
Additional tests are added to use small values (e.g. -5/-1, 5%-1, etc.)
in order to exercise the additional logic with patched insns.
Yonghong Song [Fri, 13 Sep 2024 15:03:26 +0000 (08:03 -0700)]
bpf: Fix a sdiv overflow issue
Zac Ecob reported a problem where a bpf program may cause kernel crash due
to the following error:
Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
The failure is due to the below signed divide:
LLONG_MIN/-1 where LLONG_MIN equals to -9,223,372,036,854,775,808.
LLONG_MIN/-1 is supposed to give a positive number 9,223,372,036,854,775,808,
but it is impossible since for 64-bit system, the maximum positive
number is 9,223,372,036,854,775,807. On x86_64, LLONG_MIN/-1 will
cause a kernel exception. On arm64, the result for LLONG_MIN/-1 is
LLONG_MIN.
Further investigation found all the following sdiv/smod cases may trigger
an exception when bpf program is running on x86_64 platform:
- LLONG_MIN/-1 for 64bit operation
- INT_MIN/-1 for 32bit operation
- LLONG_MIN%-1 for 64bit operation
- INT_MIN%-1 for 32bit operation
where -1 can be an immediate or in a register.
On arm64, there are no exceptions:
- LLONG_MIN/-1 = LLONG_MIN
- INT_MIN/-1 = INT_MIN
- LLONG_MIN%-1 = 0
- INT_MIN%-1 = 0
where -1 can be an immediate or in a register.
Insn patching is needed to handle the above cases and the patched codes
produced results aligned with above arm64 result. The below are pseudo
codes to handle sdiv/smod exceptions including both divisor -1 and divisor 0
and the divisor is stored in a register.
sdiv:
tmp = rX
tmp += 1 /* [-1, 0] -> [0, 1]
if tmp >(unsigned) 1 goto L2
if tmp == 0 goto L1
rY = 0
L1:
rY = -rY;
goto L3
L2:
rY /= rX
L3:
smod:
tmp = rX
tmp += 1 /* [-1, 0] -> [0, 1]
if tmp >(unsigned) 1 goto L1
if tmp == 1 (is64 ? goto L2 : goto L3)
rY = 0;
goto L2
L1:
rY %= rX
L2:
goto L4 // only when !is64
L3:
wY = wY // only when !is64
L4:
dt-bindings: cpu: Drop duplicate nvidia,tegra186-ccplex-cluster.yaml
"nvidia,tegra186-ccplex-cluster" is also documented in
arm/tegra/nvidia,tegra-ccplex-cluster.yaml. As it covers Tegra234 as
well, drop nvidia,tegra186-ccplex-cluster.yaml.
dt-bindings: clock: mediatek: Drop duplicate mediatek,mt6795-sys-clock.yaml
The compatible strings for mt6795 clocks are also documented in other
schemas:
"mediatek,mt6795-apmixedsys" in clock/mediatek,apmixedsys.yaml
"mediatek,mt6795-topckgen" in clock/mediatek,topckgen.yaml
"mediatek,mt6795-pericfg" in clock/mediatek,pericfg.yaml
"mediatek,mt6795-infracfg" in clock/mediatek,infracfg.yaml
The only difference is #reset-cells is not allowed in some of these,
but that aligns with actual users in .dts files.
Geert Uytterhoeven [Wed, 11 Jan 2023 15:55:17 +0000 (16:55 +0100)]
dt-bindings: clk: vc5: Make SD/OE pin configuration properties not required
"make dtbs_check":
arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dtb: clock-generator@6a: 'idt,shutdown' is a required property
From schema: Documentation/devicetree/bindings/clock/idt,versaclock5.yaml
arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dtb: clock-generator@6a: 'idt,output-enable-active' is a required property
From schema: Documentation/devicetree/bindings/clock/idt,versaclock5.yaml
Versaclock 5 clock generators can have their configuration stored in
One-Time Programmable (OTP) memory. Hence there is no need to specify
DT properties for manual configuration if the OTP has been programmed
before. Likewise, the Linux driver does not touch the SD/OE bits if the
corresponding properties are not specified, cfr. commit d83e561d43bc71e5
("clk: vc5: Add properties for configuring SD/OE behavior").
Reflect this in the bindings by making the "idt,shutdown" and
"idt,output-enable-active" properties not required, just like the
various "idt,*" properties in the per-output child nodes.
Fixes: 275e4e2dc0411508 ("dt-bindings: clk: vc5: Add properties for configuring the SD/OE pin") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Link: https://lore.kernel.org/r/68037ad181991fe0b792f6d003e3e9e538d5ffd7.1673452118.git.geert+renesas@glider.be Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
drivers/of: Improve documentation for match_string
The description of the function now explicitly states that it's
an *exact* match for the given string (i.e. not a submatch). It also
better states all the possible return values.
Zhang Zekun [Fri, 30 Aug 2024 02:06:26 +0000 (10:06 +0800)]
of: property: Do some clean up with use of __free()
__free() provides a scoped of_node_put() functionality to put the
device_node automatically, and we don't need to call of_node_put()
directly. Let's simplify the code a bit with the use of __free().
Piotr Wojtaszczyk [Thu, 27 Jun 2024 15:00:20 +0000 (17:00 +0200)]
dt-bindings: dma: Add lpc32xx DMA mux binding
LPC32XX SoCs use pl080 dma controller which have few request signals
multiplexed between peripherals. This binding describes how devices can
use the multiplexed request signals.
The correct vendor prefix for Analog Devices is "adi", not "ad". Both
forms are in use. Add the "adi,ad7414" version and deprecate the
"ad,ad7414" version.
Keep them together even though it breaks strict alphabetical ordering.
dt-bindings: trivial-devices: Drop incorrect and duplicate at24 compatibles
"at,24c08" does not have a correct vendor prefix. The correct compatible
string would be "atmel,24c08" which is already documented in at24.yaml.
It is also unused anywhere, so just drop it.
"st,24c256" is already documented in at24.yaml, so drop it as well.
Simon Horman [Sun, 8 Sep 2024 20:25:16 +0000 (21:25 +0100)]
dt-bindings: wakeup-source: update reference to m8921-keypad.yaml
commit 53ed3233e6b5 ("dt-bindings: input: qcom,pm8921-keypad: convert to
YAML format") resulted in a renaming of the output .txt file from
qcom,pm8xxx-keypad.txt to qcom,pm8921-keypad.yaml.
This patch makes a corresponding update to the link to that .txt file
in wakeup-source.txt.
Flagged by make htmldocs:
Warning: Documentation/devicetree/bindings/power/wakeup-source.txt references a file that doesn't exist: Documentation/devicetree/bindings/input/qcom,pm8xxx-keypad.txt
The members "start" and "end" of struct resource are of type
"resource_size_t" which can be 32bit wide.
Values read from OF however are always 64bit wide.
Refactor the diff overflow checks into a helper function.
Also extend the checks to validate each calculation step.
s390/vdso: Wire up getrandom() vdso implementation
Provide the s390 specific vdso getrandom() architecture backend.
_vdso_rng_data required data is placed within the _vdso_data vvar page,
by using a hardcoded offset larger than vdso_data.
As required the chacha20 implementation does not write to the stack.
The implementation follows more or less the arm64 implementations and
makes use of vector instructions. It has a fallback to the getrandom()
system call for machines where the vector facility is not installed.
The check if the vector facility is installed, as well as an
optimization for machines with the vector-enhancements facility 2, is
implemented with alternatives, avoiding runtime checks.
Note that __kernel_getrandom() is implemented without the vdso user
wrapper which would setup a stack frame for odd cases (aka very old
glibc variants) where the caller has not done that. All callers of
__kernel_getrandom() are required to setup a stack frame, like the C ABI
requires it.
The vdso testcases vdso_test_getrandom and vdso_test_chacha pass.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Merge tag 'spi-fix-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few last minute fixes for v6.11, they're all individually
unremarkable and only last minute due to when they came in"
* tag 'spi-fix-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: nxp-fspi: fix the KASAN report out-of-bounds bug
spi: geni-qcom: Fix incorrect free_irq() sequence
spi: geni-qcom: Undo runtime PM changes at driver exit time
Mina Almasry [Fri, 13 Sep 2024 06:07:45 +0000 (06:07 +0000)]
memory-provider: disable building dmabuf mp on !CONFIG_PAGE_POOL
When CONFIG_TRACEPOINTS=y but CONFIG_PAGE_POOL=n, we end up with this
build failure that is reported by the 0-day bot:
ld: vmlinux.o: in function `mp_dmabuf_devmem_alloc_netmems':
>> (.text+0xc37286): undefined reference to `__tracepoint_page_pool_state_hold'
>> ld: (.text+0xc3729a): undefined reference to `__SCT__tp_func_page_pool_state_hold'
>> ld: vmlinux.o:(__jump_table+0x10c48): undefined reference to `__tracepoint_page_pool_state_hold'
>> ld: vmlinux.o:(.static_call_sites+0xb824): undefined reference to `__SCK__tp_func_page_pool_state_hold'
The root cause is that in this configuration, traces are enabled but the
page_pool specific trace_page_pool_state_hold is not registered.
There is no reason to build the dmabuf memory provider when
CONFIG_PAGE_POOL is not present, as it's really a provider to the
page_pool.
In fact the whole NET_DEVMEM is RX path-only at the moment, so we can
make the entire config dependent on the PAGE_POOL.
Note that this may need to be revisited after/while devmem TX is
added, as devmem TX likely does not need CONFIG_PAGE_POOL. For now this
build fix is sufficient.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409131239.ysHQh4Tv-lkp@intel.com/ Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Link: https://patch.msgid.link/20240913060746.2574191-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'soundwire-6.11-fixes_2' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fix from Vinod Koul:
- Revert of earlier fix sent for non-continuous port map programming
which caused regression on Intel platforms
* tag 'soundwire-6.11-fixes_2' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: stream: Revert "soundwire: stream: fix programming slave ports for non-continous port maps"
Merge tag 'drm-fixes-2024-09-13' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular fixes pull, the amdgpu JPEG engine fixes are probably the
biggest, they look to block some register accessing, otherwise there
are just minor fixes and regression fixes all over.
nouveau had a regression report going back a few kernels that finally
got fixed, Not entirely happy with so many changes so late, but they
all seem quite benign apart from the jpeg one.
dma-buf/heaps:
- fix off by one in CMA heap fault handler
syncobj:
- fix syncobj leak in drm_syncobj_eventfd_ioctl
amdgpu:
- Avoid races between set_drr() functions and dc_state_destruct()
- Fix regerssion related to zpos
- Fix regression related to overlay cursor
- SMU 14.x updates
- JPEG fixes
- Silence an UBSAN warning
amdkfd:
- Fetch cacheline size from IP discovery
i915:
- Prevent a possible int overflow in wq offsets
xe:
- Remove a double include
- Fix null checks and UAF
- Fix access_ok check in user_fence_create
- Fix compat IS_DISPLAY_STEP() range
- OA fix
- Fixes in show_meminfo
nouveau:
- fix GP10x regression on boot
stm:
- add COMMON_CLK dep
rockchip:
- iommu api change
tegra:
- iommu api change"
* tag 'drm-fixes-2024-09-13' of https://gitlab.freedesktop.org/drm/kernel: (25 commits)
drm/xe/client: add missing bo locking in show_meminfo()
drm/xe/client: fix deadlock in show_meminfo()
drm/xe/oa: Enable Xe2+ PES disaggregation
drm/xe/display: fix compat IS_DISPLAY_STEP() range end
drm/xe: Fix access_ok check in user_fence_create
drm/xe: Fix possible UAF in guc_exec_queue_process_msg
drm/xe: Remove fence check from send_tlb_invalidation
drm/xe/gt: Remove double include
drm/amd/display: Add all planes on CRTC to state for overlay cursor
drm/amdgpu/atomfirmware: Silence UBSAN warning
drm/amd/amdgpu: apply command submission parser for JPEG v1
drm/amd/amdgpu: apply command submission parser for JPEG v2+
drm/amd/pm: fix the pp_dpm_pcie issue on smu v14.0.2/3
drm/amd/pm: update the features set on smu v14.0.2/3
drm/amd/display: Do not reset planes based on crtc zpos_changed
drm/amd/display: Avoid race between dcn35_set_drr() and dc_state_destruct()
drm/amd/display: Avoid race between dcn10_set_drr() and dc_state_destruct()
drm/amdkfd: Add cache line size info
drm/tegra: Use iommu_paging_domain_alloc()
drm/rockchip: Use iommu_paging_domain_alloc()
...
_regulator_get() contains a lot of common code doing checks prior to the
regulator lookup and housekeeping work after the lookup. Almost all the
code could be shared with a OF-specific variant of _regulator_get().
Split out the common parts so that they can be reused. The OF-specific
version of _regulator_get() will be added in a subsequent patch.
No functional changes were made.
Mark Brown [Fri, 13 Sep 2024 15:59:45 +0000 (16:59 +0100)]
AMD SoundWire machine driver code refactor
Merge series from Vijendar Mukunda <Vijendar.Mukunda@amd.com>:
This patch series moves common Soundwire endpoint parsing and dai
creation logic to common placeholder from Intel generic SoundWire
machine driver code to make it generic. AMD SoundWire machine driver
code is refactored to use these functions for SoundWire endpoint
parsing and dai creation logic.
s390/vdso: Move vdso symbol handling to separate header file
The vdso.h header file, which is included at many places, includes
generated header files. This can easily lead to recursive header file
inclusions if the vdso code is changed.
Therefore move the vdso symbol code, which requires the generated
header files, to a separate header file, and include it at the two
locations which require it.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
s390/facility: Let test_facility() generate static branch if possible
Let test_facility() generate a branch instruction if the tested facility is
a constant, and where the result cannot be evaluated during compile
time. The branch instruction defaults to "false" and is patched to nop
(branch not taken) if the tested facility is available.
This avoids runtime checks and is similar to x86's static_cpu_has() and
arm64's alternative_has_cap_likely().
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Patch all alternatives which depend on facilities from the decompressor.
There is no technical reason which enforces to split patching of such
alternatives to the decompressor and the kernel.
This simplifies alternative handling a bit, since one alternative type is
removed.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
s390/facility: Disable compile time optimization for decompressor code
Disable compile time optimizations of test_facility() for the
decompressor. The decompressor should not contain any optimized code
depending on the architecture level set the kernel image is compiled
for to avoid unexpected operation exceptions.
Add a __DECOMPRESSOR check to test_facility() to enforce that
facilities are always checked during runtime for the decompressor.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64
Extend getrandom() vDSO implementation to VDSO64.
Tested on QEMU on both ppc64_defconfig and ppc64le_defconfig.
Results from a Power9 (PowerNV):
~ # ./vdso_test_getrandom bench-single
vdso: 25000000 times in 0.787943615 seconds
libc: 25000000 times in 14.101887252 seconds
syscall: 25000000 times in 14.047475082 seconds
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32
To be consistent with other VDSO functions, the function is called
__kernel_getrandom()
__arch_chacha20_blocks_nostack() fonction is implemented basically
with 32 bits operations. It performs 4 QUARTERROUND operations in
parallele. There are enough registers to avoid using the stack:
On input:
r3: output bytes
r4: 32-byte key input
r5: 8-byte counter input/output
r6: number of 64-byte blocks to write to output
During operation:
stack: pointer to counter (r5) and non-volatile registers (r14-131)
r0: counter of blocks (initialised with r6)
r4: Value '4' after key has been read, used for indexing
r5-r12: key
r14-r15: block counter
r16-r31: chacha state
At the end:
r0, r6-r12: Zeroised
r5, r14-r31: Restored
Performance on powerpc 885 (using kernel selftest):
~# ./vdso_test_getrandom bench-single
vdso: 25000000 times in 62.938002291 seconds
libc: 25000000 times in 535.581916866 seconds
syscall: 25000000 times in 531.525042806 seconds
Performance on powerpc 8321 (using kernel selftest):
~# ./vdso_test_getrandom bench-single
vdso: 25000000 times in 16.899318858 seconds
libc: 25000000 times in 131.050596522 seconds
syscall: 25000000 times in 129.794790389 seconds
This first patch adds support for VDSO32. As selftests cannot easily
be generated only for VDSO32, and because the following patch brings
support for VDSO64 anyway, this patch opts out all code in
__arch_chacha20_blocks_nostack() so that vdso_test_chacha will not
fail to compile and will not crash on PPC64/PPC64LE, allthough the
selftest itself will fail.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
In order to avoid two much duplication when we add new VDSO
functionnalities in C like getrandom, refactor common CFLAGS.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Commit 08c18b63d965 ("powerpc/vdso32: Add missing _restgpr_31_x to fix
build failure") added _restgpr_31_x to the vdso for gettimeofday, but
the work on getrandom shows that we will need more of those functions.
Remove _restgpr_31_x and link in crtsavres.o so that we get all
save/restore functions when optimising the kernel for size.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Commit 9651fcedf7b9 ("mm: add MAP_DROPPABLE for designating always
lazily freeable mappings") only adds VM_DROPPABLE for 64 bits
architectures.
In order to also use the getrandom vDSO implementation on powerpc/32,
use VM_ARCH_1 for VM_DROPPABLE on powerpc/32. This is possible because
VM_ARCH_1 is used for VM_SAO on powerpc and VM_SAO is only for
powerpc/64. It is used in combination with PROT_SAO in some parts of
code that are restricted to CONFIG_PPC64 through #ifdefs, it is
therefore possible to define VM_SAO for CONFIG_PPC64 only.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
powerpc/vdso: Fix VDSO data access when running in a non-root time namespace
When running in a non-root time namespace, the global VDSO data page
is replaced by a dedicated namespace data page and the global data
page is mapped next to it. Detailed explanations can be found at
commit 660fd04f9317 ("lib/vdso: Prepare for time namespace support").
When it happens, __kernel_get_syscall_map and __kernel_get_tbfreq
and __kernel_sync_dicache don't work anymore because they read 0
instead of the data they need.
To address that, clock_mode has to be read. When it is set to
VDSO_CLOCKMODE_TIMENS, it means it is a dedicated namespace data page
and the global data is located on the following page.
Add a macro called get_realdatapage which reads clock_mode and add
PAGE_SIZE to the pointer provided by get_datapage macro when
clock_mode is equal to VDSO_CLOCKMODE_TIMENS. Use this new macro
instead of get_datapage macro except for time functions as they handle
it internally.
Fixes: 74205b3fc2ef ("powerpc/vdso: Add support for time namespaces") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Closes: https://lore.kernel.org/all/ZtnYqZI-nrsNslwy@zx2c4.com/ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Jason A. Donenfeld [Tue, 3 Sep 2024 12:52:45 +0000 (14:52 +0200)]
selftests: vDSO: don't include generated headers for chacha test
It's not correct to use $(top_srcdir) for generated header files, for
builds that are done out of tree via O=, and $(objtree) isn't valid in
the selftests context. Instead, just obviate the need for these
generated header files by defining empty stubs in tools/include, which
is the same thing that's done for rwlock.h.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
arm64: vDSO: Wire up getrandom() vDSO implementation
Hook up the generic vDSO implementation to the aarch64 vDSO data page.
The _vdso_rng_data required data is placed within the _vdso_data vvar
page, by using a offset larger than the vdso_data.
The vDSO function requires a ChaCha20 implementation that does not write
to the stack, and that can do an entire ChaCha20 permutation. The one
provided uses NEON on the permute operation, with a fallback to the
syscall for chips that do not support AdvSIMD.
This also passes the vdso_test_chacha test along with
vdso_test_getrandom. The vdso_test_getrandom bench-single result on
Neoverse-N1 shows:
A small fixup to arch/arm64/include/asm/mman.h was required to avoid
pulling kernel code into the vDSO, similar to what's already done in
arch/arm64/include/asm/rwonce.h.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Mark Rutland [Tue, 3 Sep 2024 12:09:16 +0000 (12:09 +0000)]
arm64: alternative: make alternative_has_cap_likely() VDSO compatible
Currently alternative_has_cap_unlikely() can be used in VDSO code, but
alternative_has_cap_likely() cannot as it references alt_cb_patch_nops,
which is not available when linking the VDSO. This is unfortunate as it
would be useful to have alternative_has_cap_likely() available in VDSO
code.
... as removing duplicate NOPs within the kernel Image saved areasonable
amount of space.
Given the VDSO code will have nowhere near as many alternative branches
as the main kernel image, this isn't much of a concern, and a few extra
nops isn't a massive problem.
Change alternative_has_cap_likely() to only use alt_cb_patch_nops for
the main kernel image, and allow duplicate NOPs in VDSO code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
selftests: vDSO: also test counter in vdso_test_chacha
The chacha vDSO selftest doesn't check the way the counter is handled by
__arch_chacha20_blocks_nostack(). It indirectly checks that the counter
is writen on exit and read back on new entry, but it doesn't check that
the format is correct. When implementing this function on powerpc, I
missed a case where the counter was writen and read in wrong byte order.
Also, the counter uses two words, but the tests with a zero counter and
uses a small amount of blocks, so at the end the upper part of the
counter is always 0, so it is not checked.
Add a verification of counter's content in addition to the verification
of the output.
Also add two tests where the counter crosses the u32 upper limit. The
first test verifies that the function properly writes back the upper
word, the second test verifies that the function properly reads back the
upper word.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Without -O2, the generated code for testing chacha function is awful.
GCC even implements rol32() as a function of 20 instructions instead of
just using the rotlwi instruction.
~# time ./vdso_test_chacha
TAP version 13
1..1
ok 1 chacha: PASS
real 0m 37.16s
user 0m 36.89s
sys 0m 0.26s
Several other selftests directory add -O2, and the kernel is also
always built with optimisation active. Do the same for vDSO selftests.
With this patch the time is reduced by approximately 15%.
~# time ./vdso_test_chacha
TAP version 13
1..1
ok 1 chacha: PASS
real 0m 32.09s
user 0m 31.86s
sys 0m 0.22s
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Xi Ruoyao [Sun, 1 Sep 2024 06:13:11 +0000 (14:13 +0800)]
LoongArch: vDSO: Wire up getrandom() vDSO implementation
Hook up the generic vDSO implementation to the LoongArch vDSO data page
by providing the required __arch_chacha20_blocks_nostack,
__arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also
wire up the selftests.
Signed-off-by: Xi Ruoyao <xry111@xry111.site> Acked-by: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Xi Ruoyao [Sun, 1 Sep 2024 06:13:10 +0000 (14:13 +0800)]
random: vDSO: add a __vdso_getrandom prototype for all architectures
Without a prototype, we'll have to add a prototype for each architecture
implementing vDSO getrandom. As most architectures will likely have the
vDSO getrandom implemented in a near future, and we'd like to keep the
declarations compatible everywhere (to ease the libc implementor work),
we should really just have one copy of the prototype.
This also is what's already done inside of include/vdso/gettime.h for
those vDSO functions, so this continues that convention.
Suggested-by: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Xi Ruoyao <xry111@xry111.site> Acked-by: Huacai Chen <chenhuacai@kernel.org>
[Jason: rewrite docbook comment for prototype.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Jason A. Donenfeld [Sun, 1 Sep 2024 13:05:01 +0000 (15:05 +0200)]
selftests: vDSO: fix cross build for getrandom and chacha tests
Unlike the check for the standalone x86 test, the check for building the
vDSO getrandom and chacaha tests looks at the architecture for the host
rather than the architecture for the target when deciding if they should
be built. Since the chacha test includes some assembler code this means
that cross building with x86 as either the target or host is broken.
There's also some additional complications, where ARCH can legitimately
be either x86_64 or x86, but the source code we need to compile lives in
a directory path containing arch/x86. The standard SRCARCH variable
handles that. And actually, all these variables and proper substitutions
are already described in tools/scripts/Makefile.arch, so just include
that to handle it.
Similarly, ARCH=x86 can actually describe ARCH=x86_64,
just with CONFIG_64BIT, so we can't rely on ARCH for selecting
non-32-bit tests. For that, check against $(ARCH)$(CONFIG_X86_32). This
won't help for people manually running this inside the vDSO selftest
directory (which isn't really supported anyway and has problems on
various archs), but it should work for builds of the kselftests, where
the CONFIG_* variables are defined. On x86_64 machines,
$(ARCH)$(CONFIG_X86_32) will evaluate to x86. On arm64 machines, it will
evaluate to arm64. On 32-bit x86 machines, it will evaluate to x86y,
which won't match the filter list.
Reported-by: Mark Brown <broonie@kernel.org> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Christophe Leroy [Tue, 27 Aug 2024 07:31:47 +0000 (09:31 +0200)]
random: vDSO: minimize and simplify header includes
Depending on the architecture, building a 32-bit vDSO on a 64-bit kernel
is problematic when some system headers are included.
Minimise the amount of headers by moving needed items, such as
__{get,put}_unaligned_t, into dedicated common headers and in general
use more specific headers, similar to what was done in commit 8165b57bca21 ("linux/const.h: Extract common header for vDSO") and
commit 8c59ab839f52 ("lib/vdso: Enable common headers").
On some architectures this results in missing PAGE_SIZE, as was
described by commit 8b3843ae3634 ("vdso/datapage: Quick fix - use
asm/page-def.h for ARM64"), so define this if necessary, in the same way
as done prior by commit cffaefd15a8f ("vdso: Use CONFIG_PAGE_SHIFT in
vdso/datapage.h").
Removing linux/time64.h leads to missing 'struct timespec64' in
x86's asm/pvclock.h. Add a forward declaration of that struct in
that file.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Christophe Leroy [Thu, 22 Aug 2024 07:13:11 +0000 (09:13 +0200)]
random: vDSO: add __arch_get_k_vdso_rng_data() helper for data page access
_vdso_data is specific to x86 and __arch_get_k_vdso_data() is provided
so that all architectures can provide the requested pointer.
Do the same with _vdso_rng_data, provide __arch_get_k_vdso_rng_data()
and don't use x86 _vdso_rng_data directly.
Until now vdso/vsyscall.h was only included by time/vsyscall.c but now
it will also be included in char/random.c, leading to a duplicate
declaration of _vdso_data and _vdso_rng_data.
To fix this issue, move the declaration in a C file. vma.c looks like
the most appropriate candidate. We don't need to replace the definitions
in vsyscall.h by declarations as declarations are already in asm/vvar.h.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Christophe Leroy [Tue, 27 Aug 2024 07:31:50 +0000 (09:31 +0200)]
random: vDSO: don't use 64-bit atomics on 32-bit architectures
Performing SMP atomic operations on u64 fails on powerpc32:
CC drivers/char/random.o
In file included from <command-line>:
drivers/char/random.c: In function 'crng_reseed':
././include/linux/compiler_types.h:510:45: error: call to '__compiletime_assert_391' declared with attribute error: Need native word sized stores/loads for atomicity.
510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
././include/linux/compiler_types.h:491:25: note: in definition of macro '__compiletime_assert'
491 | prefix ## suffix(); \
| ^~~~~~
././include/linux/compiler_types.h:510:9: note: in expansion of macro '_compiletime_assert'
510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
././include/linux/compiler_types.h:513:9: note: in expansion of macro 'compiletime_assert'
513 | compiletime_assert(__native_word(t), \
| ^~~~~~~~~~~~~~~~~~
./arch/powerpc/include/asm/barrier.h:74:9: note: in expansion of macro 'compiletime_assert_atomic_type'
74 | compiletime_assert_atomic_type(*p); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/barrier.h:172:55: note: in expansion of macro '__smp_store_release'
172 | #define smp_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0)
| ^~~~~~~~~~~~~~~~~~~
drivers/char/random.c:286:9: note: in expansion of macro 'smp_store_release'
286 | smp_store_release(&__arch_get_k_vdso_rng_data()->generation, next_gen + 1);
| ^~~~~~~~~~~~~~~~~
The kernel-side generation counter in the random driver is handled as an
unsigned long, not as a u64, in base_crng and struct crng.
But on the vDSO side, it needs to be an u64, not just an unsigned long,
in order to support a 32-bit vDSO atop a 64-bit kernel.
On kernel side, however, it is an unsigned long, hence a 32-bit value on
32-bit architectures, so just cast it to unsigned long for the
smp_store_release(). A side effect is that on big endian architectures
the store will be performed in the upper 32 bits. It is not an issue on
its own because the vDSO site doesn't mind the value, as it only checks
differences. Just make sure that the vDSO side checks the full 64 bits.
For that, the local current_generation has to be u64 as well.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Jason A. Donenfeld [Wed, 28 Aug 2024 13:49:32 +0000 (15:49 +0200)]
selftests: vDSO: open code basic chacha instead of linking to libsodium
Linking to libsodium makes building this test annoying in cross
compilation environments and is just way too much. Since this is just a
basic correctness test, simply open code a simple, unoptimized, dumb
chacha, rather than linking to libsodium.
This also fixes a correctness issue on big endian systems. The kernel's
random.c doesn't bother doing a le32_to_cpu operation on the random
bytes that are passed as the key, and consequently neither does
vgetrandom-chacha.S. However, libsodium's chacha _does_ do this, since
it takes the key as an array of bytes. This meant that the test was
broken on big endian systems, which this commit rectifies.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Jason A. Donenfeld [Tue, 27 Aug 2024 15:14:18 +0000 (17:14 +0200)]
random: vDSO: move prototype of arch chacha function to vdso/getrandom.h
Having the prototype for __arch_chacha20_blocks_nostack in
arch/x86/include/asm/vdso/getrandom.h meant that the prototype and large
doc comment were cloned by every architecture, which has been causing
unnecessary churn. Instead move it into include/vdso/getrandom.h, where
it can be shared by all archs implementing it.
As a side bonus, this then lets us use that prototype in the
vdso_test_chacha self test, to ensure that it matches the source, and
indeed doing so turned up some inconsistencies, which are rectified
here.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Jason A. Donenfeld [Thu, 5 Sep 2024 17:17:24 +0000 (19:17 +0200)]
selftests: vDSO: ensure vgetrandom works in a time namespace
After verifying that vDSO getrandom does work, which ensures that the
RNG is initialized, test to see if it also works inside of a time
namespace. This is important to test, because the vvar pages get
swizzled around there. If the arch code isn't careful, the RNG will
appear uninitialized inside of a time namespace.
Because broken code makes the RNG appear uninitialized, test that
everything works by issuing a call to vgetrandom from a fork in a time
namespace, and use ptrace to ensure that the actual syscall getrandom
doesn't get called. If it doesn't get called, then the test succeeds.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Merge tag 'nvme-6.12-2024-09-13' of git://git.infradead.org/nvme into for-6.12/block
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.12
- A syntax cleanup (Shen)
- Fix a Kconfig linking error (Arnd)
- New queue-depth quirk (Keith)"
* tag 'nvme-6.12-2024-09-13' of git://git.infradead.org/nvme:
nvme-pci: qdepth 1 quirk
nvme-tcp: fix link failure for TCP auth
nvme: Convert comma to semicolon