====================
mlx4: Add support for netdev-genl API
There are no functional changes from v5, which I mistakenly sent right
after net-next was closed (oops). This revision, however, includes
Tariq's Reviewed-by tags of the v5 in each commit message. See the
changelog below.
This series adds support to mlx4 for the netdev-genl API which makes it
much easier for users and user programs to map NAPI IDs back to
ifindexes, queues, and IRQs. This is extremely useful for a number of
use cases, including epoll-based busy poll.
In addition, this series includes a patch to generate per-queue
statistics using the netlink API, as well.
To facilitate the stats, patch 1/3 adds a field "alloc_fail" to the ring
structure. This is incremented by the driver in an appropriate place and
used in patch 3/3 as alloc_fail.
Please note: I do not have access to mlx4 hardware, but I've been
working closely with Martin Karsten from University of Waterloo (CC'd)
who has very graciously tested my patches on their mlx4 hardware (hence
his Tested-by attribution in each commit). His latest research work is
particularly interesting [1] and this series helps to support that (and
future) work.
Martin re-test v4 using Jakub's suggested tool [2] and the
stats.pkt_byte_sum and stats.qstat_by_ifindex tests passed. He also
adjusted the queue count and re-ran test to confirm it still passed even
if the queue count was modified.
====================
Introduce switch mode support for ICSSG driver
This series adds support for switch-mode for ICSSG driver. This series
also introduces helper APIs to configure firmware maintained FDB
(Forwarding Database) and VLAN tables. These APIs are later used by ICSSG
driver in switch mode.
Now the driver will boot by default in dual EMAC mode. When first ICSSG
interface is added to bridge driver will still be in EMAC mode. As soon as
second ICSSG interface is added to same bridge, switch-mode will be
enabled and switch firmwares will be loaded to PRU cores. The driver will
remain in dual EMAC mode if ICSSG interfaces are added to two different
bridges or if two different interfaces (One ICSSG, one other) is added to
the same bridge. We'll only enable is_switch_mode flag when two ICSSG
interfaces are added to same bridge.
We start in dual MAC mode. Let's say lan0 and lan1 are ICSSG interfaces
ip link add name br0 type bridge
ip link set lan0 master br0
At this point, we get a CHANGEUPPER event. Only one port is a member of
the bridge, so we will still be in dual MAC mode.
ip link set lan1 master br0
We get a second CHANGEUPPER event, the second interface lan1 is also ICSSG
interface so we will set the is_switch_mode flag and when interfaces are
brought up again, ICSSG switch firmwares will be loaded to PRU Cores.
There are some other cases to consider as well.
ip link add name br0 type bridge
ip link add name br1 type bridge
ip link set lan0 master br0
ip link set ppp0 master br0
Here we are adding lan0 (ICSSG) and ppp0 (non ICSSG) to same bridge, as
they both are not ICSSG, we will still be running in dual EMAC mode.
ip link set lan1 master br1
ip link set vpn0 master br1
Here we are adding lan1 (ICSSG) and vpn0 (non ICSSG) to same bridge, as
they both are not ICSSG, we will still be running in dual EMAC mode.
This is v6 of the series.
Changes from v5 to v6:
*) Removed __packed from structures in icssg_config.h file.
*) Added RB tags of Andrew Lunn <andrew@lunn.ch> to patch 2/3 and patch
3/3 of this series.
Changes from v4 to v5:
*) Rebased on 6.10-rc1.
*) Dropped the RFC tag.
Changes from v3 to v4:
*) Added RFC tag as net-next is closed now.
*) Modified the driver to remove the need of bringing interfaces up / down
for enabling / disabling switch mode. Now switch mode can be enabled
without bringig interfaces up / down as requested by Andrew Lunn
<andrew@lunn.ch>
*) Modified commit message of patch 3/3.
Changes from v2 to v3:
*) Dropped RFC tag.
*) Used ether_addr_copy() instead of manually copying mac address using
for loop in patch 1/3 as suggested by Andrew Lunn <andrew@lunn.ch>
*) Added helper API icssg_fdb_setup() in patch 1/3 to reduce code
duplication as suggested by Andrew Lunn <andrew@lunn.ch>
*) In prueth_switchdev_stp_state_set() removed BR_STATE_LEARNING as
learning without forwarding is not supported by ICSSG firmware.
*) Used ether_addr_equal() wherever possible in patch 2/3 as suggested
by Andrew Lunn <andrew@lunn.ch>
*) Fixed typo "nit: s/prueth_switchdevice_nb/prueth_switchdev_nb/" in
patch 2/3 as suggested by Simon Horman <horms@kernel.org>
*) Squashed "#include "icssg_mii_rt.h" to patch 2/3 from patch 3/3 as
suggested by Simon Horman <horms@kernel.org>
*) Rebased on latest net-next/main.
Changes from v1 to v2:
*) Removed TAPRIO support patch from this series.
*) Stopped using devlink for enabling switch-mode as suggested by Andrew L
*) Added read_poll_timeout() in patch 1 / 3 as suggested by Andrew L.
net: ti: icssg-prueth: Add support for ICSSG switch firmware
Add support for ICSSG switch firmware using existing Dual EMAC driver
with switchdev.
Limitations:
VLAN offloading is limited to 0-256 IDs.
MDB/FDB static entries are limited to 511 entries and different FDBs can
hash to same bucket and thus may not completely offloaded
Example assuming ETH1 and ETH2 as ICSSG2 interfaces:
Switch to ICSSG Switch mode:
ip link add name br0 type bridge
ip link set dev eth1 master br0
ip link set dev eth2 master br0
ip link set dev br0 up
bridge vlan add dev br0 vid 1 pvid untagged self
Going back to Dual EMAC mode:
ip link set dev br0 down
ip link set dev eth1 nomaster
ip link set dev eth2 nomaster
ip link del name br0 type bridge
By default, Dual EMAC firmware is loaded, and can be changed to switch
mode by above steps
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net: ti: icssg-switch: Add switchdev based driver for ethernet switch support
ICSSG can operating in switch mode with 2 ext port and 1 host port with
VLAN/FDB/MDB and STP offloading. Add switchdev based driver to
support the same.
Driver itself will be integrated with icssg_prueth in future commits
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Russell King (Oracle) [Tue, 28 May 2024 15:15:42 +0000 (16:15 +0100)]
net: dsa: felix: provide own phylink MAC operations
Convert felix to provide its own phylink MAC operations, thus
avoiding the shim layer in DSA's port.c.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/E1sByYA-00EM0y-Jn@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In patch 2, by not using 'sanitize' for op docs, any formatting in the
.yaml gets passed straight through to the generated .rst which means
that basic rst (also markdown compatible) list formatting can be used in
the .yaml
====================
Donald Hunter [Tue, 28 May 2024 14:06:50 +0000 (15:06 +0100)]
doc: netlink: Don't 'sanitize' op docstrings in generated .rst
The doc strings for do/dump ops are emitted as toplevel .rst constructs
so they can be multi-line. Pass multi-line text straight through to the
.rst to retain any simple formatting from the .yaml
This fixes e.g. list formatting for the pin-get docs in dpll.yaml:
====================
net: ethernet: ti: am65-cpsw-nuss: support stacked switches
Currently an external Ethernet switch connected to a am65-cpsw-nuss CPU
port will not be probed successfully because of_find_net_device_by_node()
will not be able to find the netdev of the CPU port.
It's necessary to populate of_node of the struct device for the
am65-cpsw-nuss ports. DT nodes of the ports are already stored in per-port
private data, but because of some legacy reasons the naming ("phy_node")
was misleading.
====================
Rename phy_node to port_np to better reflect what it actually is,
because the new phylink API takes netdev node (or DSA port node),
and resolves the phandle internally.
Eric Dumazet [Tue, 28 May 2024 12:52:51 +0000 (12:52 +0000)]
tcp: fix race in tcp_write_err()
I noticed flakes in a packetdrill test, expecting an epoll_wait()
to return EPOLLERR | EPOLLHUP on a failed connect() attempt,
after multiple SYN retransmits. It sometimes return EPOLLERR only.
The issue is that tcp_write_err():
1) writes an error in sk->sk_err,
2) calls sk_error_report(),
3) then calls tcp_done().
tcp_done() is writing SHUTDOWN_MASK into sk->sk_shutdown,
among other things.
Problem is that the awaken user thread (from 2) sk_error_report())
might call tcp_poll() before tcp_done() has written sk->sk_shutdown.
tcp_poll() only sees a non zero sk->sk_err and returns EPOLLERR.
This patch fixes the issue by making sure to call sk_error_report()
after tcp_done().
tcp_write_err() also lacks an smp_wmb().
We can reuse tcp_done_with_error() to factor out the details,
as Neal suggested.
Breno Leitao [Tue, 28 May 2024 08:42:24 +0000 (01:42 -0700)]
netconsole: Do not shutdown dynamic configuration if cmdline is invalid
If a user provides an invalid netconsole configuration during boot time
(e.g., specifying an invalid ethX interface), netconsole will be
entirely disabled. Consequently, the user won't be able to create new
entries in /sys/kernel/config/netconsole/ as that directory does not
exist.
Apart from misconfiguration, another issue arises when ethX is loaded as
a module and the netconsole= line in the command line points to ethX,
resulting in an obvious failure. This renders netconsole unusable, as
/sys/kernel/config/netconsole/ will never appear. This is more annoying
since users reconfigure (or just toggle) the configuratin later (see
commit 5fbd6cdbe304b ("netconsole: Attach cmdline target to dynamic
target"))
Create /sys/kernel/config/netconsole/ even if the command line arguments
are invalid, so, users can create dynamic entries in netconsole.
Schema validation using rockchip,rk3308-gmac compatible fails with:
ethernet@ff4e0000: compatible: ['rockchip,rk3308-gmac'] does not contain items matching the given schema
from schema $id: http://devicetree.org/schemas/net/rockchip-dwmac.yaml#
ethernet@ff4e0000: Unevaluated properties are not allowed ('interrupt-names', 'interrupts', 'phy-mode',
'reg', 'reset-names', 'resets', 'snps,reset-active-low', 'snps,reset-delays-us',
'snps,reset-gpio' were unexpected)
from schema $id: http://devicetree.org/schemas/net/rockchip-dwmac.yaml#
Add rockchip,rk3308-gmac to snps,dwmac.yaml to fix DT schema validation.
Fixes: 2cc8c910f515 ("dt-bindings: net: rockchip-dwmac: add rk3308 gmac compatible") Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240528093751.3690231-1-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Sverdlin [Tue, 28 May 2024 07:31:13 +0000 (09:31 +0200)]
net: dsa: lan9303: imply SMSC_PHY
Both LAN9303 and LAN9354 have internal PHYs on both external ports.
Therefore a configuration without SMSC PHY support is non-practical at
least and leads to:
Vineeth Karumanchi [Tue, 28 May 2024 06:20:08 +0000 (11:50 +0530)]
net: phy: xilinx-gmii2rgmii: Adopt clock support
Add clock support to the gmii_to_rgmii IP.
Make clk optional to keep DTB backward compatibility.
Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vineeth Karumanchi [Tue, 28 May 2024 06:20:07 +0000 (11:50 +0530)]
dt-bindings: net: xilinx_gmii2rgmii: Add clock support
Add "clocks" bindings for the input clock.
Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Mon, 27 May 2024 19:26:44 +0000 (21:26 +0200)]
net: ethernet: cortina: Restore TSO support
An earlier commit deleted the TSO support in the Cortina Gemini
driver because the driver was confusing gso_size and MTU,
probably because what the Linux kernel calls "gso_size" was
called "MTU" in the datasheet.
Restore the functionality properly reading the gso_size from
the skbuff.
Tested with iperf3, running a server on a different machine
and client on the device with the cortina gemini ethernet:
Several such packets often follow after each other verifying
the segmentation into 0x05a8 (1448) byte packages also on the
reveiving end. As can be seen, the ethernet frames are
0x05ea (1514) in size.
Performance with iperf3 before this patch: ~15.5 Mbit/s
Performance with iperf3 after this patch: ~175 Mbit/s
This was running a 60 second test (twice) the best measurement
was 179 Mbit/s.
For comparison if I run iperf3 with UDP I get around 1.05 Mbit/s
both before and after this patch.
While this is a gigabit ethernet interface, the CPU is a cheap
D-Link DIR-685 router (based on the ARMv5 Faraday FA526 at
~50 MHz), and the software is not supposed to drive traffic,
as the device has a DSA chip, so this kind of numbers can be
expected.
Fixes: ac631873c9e7 ("net: ethernet: cortina: Drop TSO support") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 27 May 2024 19:20:16 +0000 (21:20 +0200)]
r8169: remove detection of chip version 11 (early RTL8168b)
This early RTL8168b version was the first PCIe chip version, and it's
quite quirky. Last sign of life is from more than 15 yrs ago.
Let's remove detection of this chip version, we'll see whether anybody
complains. If not, support for this chip version can be removed a few
kernel versions later.
Heiner Kallweit [Mon, 27 May 2024 19:16:56 +0000 (21:16 +0200)]
r8169: disable interrupt source RxOverflow
Vendor driver calls this bit RxDescUnavail. All we do in the interrupt
handler in this case is scheduling NAPI. If we should be out of
RX descriptors, then NAPI is scheduled anyway. Therefore remove this
interrupt source. Tested on RTL8168h.
We've added 23 non-merge commits during the last 11 day(s) which contain
a total of 45 files changed, 696 insertions(+), 277 deletions(-).
The main changes are:
1) Rename skb's mono_delivery_time to tstamp_type for extensibility
and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(),
from Abhishek Chauhan.
2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary
CT zones can be used from XDP/tc BPF netfilter CT helper functions,
from Brad Cowie.
3) Several tweaks to the instruction-set.rst IETF doc to address
the Last Call review comments, from Dave Thaler.
4) Small batch of riscv64 BPF JIT optimizations in order to emit more
compressed instructions to the JITed image for better icache efficiency,
from Xiao Wang.
5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h
diffing and forcing more natural type definitions ordering,
from Mykyta Yatsenko.
6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence
a syzbot/KCSAN race report for the tx_errors counter,
from Jiang Yunshui.
7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+
which started treating const structs as constants and thus breaking
full BTF program name resolution, from Ivan Babrou.
8) Fix up BPF program numbers in test_sockmap selftest in order to reduce
some of the test-internal array sizes, from Geliang Tang.
9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only
pahole, from Alan Maguire.
10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless
rebuilds in some corner cases, from Artem Savkov.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
bpf, net: Use DEV_STAT_INC()
bpf, docs: Fix instruction.rst indentation
bpf, docs: Clarify call local offset
bpf, docs: Add table captions
bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
bpf, docs: Use RFC 2119 language for ISA requirements
bpf, docs: Move sentence about returning R0 to abi.rst
bpf: constify member bpf_sysctl_kern:: Table
riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
riscv, bpf: Use STACK_ALIGN macro for size rounding up
riscv, bpf: Optimize zextw insn with Zba extension
selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets
net: Add additional bit to support clockid_t timestamp type
net: Rename mono_delivery_time to tstamp_type for scalabilty
selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs
net: netfilter: Make ct zone opts configurable for bpf ct helpers
selftests/bpf: Fix prog numbers in test_sockmap
bpf: Remove unused variable "prev_state"
bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer
bpf: Fix order of args in call to bpf_map_kvcalloc
...
====================
'mlx4_port_config was added by
commit ab9c17a009ee ("mlx4_core: Modify driver initialization flow to
accommodate SRIOV for Ethernet")
but remained unused.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Gou Hao [Sun, 26 May 2024 14:57:18 +0000 (22:57 +0800)]
net/core: move the lockdep-init of sk_callback_lock to sk_init_common()
In commit cdfbabfb2f0c ("net: Work around lockdep limitation in
sockets that use sockets"), it introduces 'af_kern_callback_keys'
to lockdep-init of sk_callback_lock according to 'sk_kern_sock',
it modifies sock_init_data() only, and sk_clone_lock() calls
sk_init_common() to initialize sk_callback_lock too, so the
lockdep-init of sk_callback_lock should be moved to sk_init_common().
yunshui [Thu, 23 May 2024 03:35:20 +0000 (11:35 +0800)]
bpf, net: Use DEV_STAT_INC()
syzbot/KCSAN reported that races happen when multiple CPUs updating
dev->stats.tx_error concurrently. Adopt SMP safe DEV_STATS_INC() to
update the dev->stats fields.
Dave Thaler [Sat, 25 May 2024 15:33:32 +0000 (08:33 -0700)]
bpf, docs: Clarify call local offset
In the Jump instructions section it explains that the offset is
"relative to the instruction following the jump instruction".
But the program-local section confusingly said "referenced by
offset from the call instruction, similar to JA".
This patch updates that sentence with consistent wording, saying
it's relative to the instruction following the call instruction.
Dave Thaler [Mon, 20 May 2024 21:52:55 +0000 (14:52 -0700)]
bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
imm is defined as a 32-bit signed integer.
{MOV, K, ALU64} says it does "dst = src" (where src is 'imm') and it
does do dst = (s64)imm, which in that sense does sign extend imm. The MOVSX
instruction is explained as sign extending, so added the example of
{MOV, K, ALU64} to make this more clear.
{JLE, K, JMP} says it does "PC += offset if dst <= src" (where src is 'imm',
and the comparison is unsigned). This was apparently ambiguous to some
readers as to whether the comparison was "dst <= (u64)(u32)imm" or
"dst <= (u64)(s64)imm" so added an example to make this more clear.
Dave Thaler [Fri, 17 May 2024 16:58:55 +0000 (09:58 -0700)]
bpf, docs: Use RFC 2119 language for ISA requirements
Per IETF convention and discussion at LSF/MM/BPF, use MUST etc.
keywords as requested by IETF Area Director review. Also as
requested, indicate that documenting BTF is out of scope of this
document and will be covered by a separate IETF specification.
Added paragraph about the terminology that is required IETF boilerplate
and must be worded exactly as such.
Dave Thaler [Fri, 17 May 2024 15:34:45 +0000 (08:34 -0700)]
bpf, docs: Move sentence about returning R0 to abi.rst
As discussed at LSF/MM/BPF, the sentence about using R0 for returning
values from calls is part of the calling convention and belongs in
abi.rst. Any further additions or clarifications to this text are left
for future patches on abi.rst. The current patch is simply to unblock
progression of instruction-set.rst to a standard.
In contrast, the restriction of register numbers to the range 0-10
is untouched, left in the instruction-set.rst definition of the
src_reg and dst_reg fields.
Thomas Weißschuh [Sat, 18 May 2024 14:58:47 +0000 (16:58 +0200)]
bpf: constify member bpf_sysctl_kern:: Table
The sysctl core is preparing to only expose instances of struct ctl_table
as "const". This will also affect the ctl_table argument of sysctl handlers,
for which bpf_sysctl_kern::table is also used.
As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent that change will be done in one commit.
To reduce the size of that final commit, switch this utility type which
is not bound by "typedef proc_handler" to "const struct ctl_table".
Xiao Wang [Sun, 19 May 2024 05:05:07 +0000 (13:05 +0800)]
riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
We could try to emit compressed insn for reg move operation during CMPXCHG
JIT, the instruction compression has no impact on the jump offsets of
following forward and backward jump instructions.
Xiao Wang [Thu, 16 May 2024 09:04:30 +0000 (17:04 +0800)]
riscv, bpf: Optimize zextw insn with Zba extension
The Zba extension provides add.uw insn which can be used to implement
zext.w with rs2 set as ZERO.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/bpf/20240516090430.493122-1-xiao.w.wang@intel.com
Martin KaFai Lau [Thu, 23 May 2024 21:13:57 +0000 (14:13 -0700)]
Merge branch 'Replace mono_delivery_time with tstamp_type'
Abhishek Chauhan says:
====================
Patch 1 :- This patch takes care of only renaming the mono delivery
timestamp to tstamp_type with no change in functionality of
existing available code in kernel also
Starts assigning tstamp_type with either mono or real and
introduces a new enum in the skbuff.h, again no change in functionality
of the existing available code in kernel , just making the code scalable.
Patch 2 :- Additional bit was added to support tai timestamp type to
avoid tstamp drops in the forwarding path when testing TC-ETF.
Patch is also updating bpf filter.c
Some updates to bpf header files with introduction to BPF_SKB_CLOCK_TAI
and documentation updates stating deprecation of BPF_SKB_TSTAMP_UNSPEC
and BPF_SKB_TSTAMP_DELIVERY_MONO
Patch 3:- Handles forwarding of UDP packets with TAI clock id tstamp_type
type with supported changes for tc_redirect/tc_redirect_dtime
to handle forwarding of UDP packets with TAI tstamp_type
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Abhishek Chauhan [Thu, 9 May 2024 21:18:34 +0000 (14:18 -0700)]
selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets
With changes in the design to forward CLOCK_TAI in the skbuff
framework, existing selftest framework needs modification
to handle forwarding of UDP packets with CLOCK_TAI as clockid.
Abhishek Chauhan [Thu, 9 May 2024 21:18:33 +0000 (14:18 -0700)]
net: Add additional bit to support clockid_t timestamp type
tstamp_type is now set based on actual clockid_t compressed
into 2 bits.
To make the design scalable for future needs this commit bring in
the change to extend the tstamp_type:1 to tstamp_type:2 to support
other clockid_t timestamp.
We now support CLOCK_TAI as part of tstamp_type as part of this
commit with existing support CLOCK_MONOTONIC and CLOCK_REALTIME.
Abhishek Chauhan [Thu, 9 May 2024 21:18:32 +0000 (14:18 -0700)]
net: Rename mono_delivery_time to tstamp_type for scalabilty
mono_delivery_time was added to check if skb->tstamp has delivery
time in mono clock base (i.e. EDT) otherwise skb->tstamp has
timestamp in ingress and delivery_time at egress.
Renaming the bitfield from mono_delivery_time to tstamp_type is for
extensibilty for other timestamps such as userspace timestamp
(i.e. SO_TXTIME) set via sock opts.
As we are renaming the mono_delivery_time to tstamp_type, it makes
sense to start assigning tstamp_type based on enum defined
in this commit.
Earlier we used bool arg flag to check if the tstamp is mono in
function skb_set_delivery_time, Now the signature of the functions
accepts tstamp_type to distinguish between mono and real time.
Also skb_set_delivery_type_by_clockid is a new function which accepts
clockid to determine the tstamp_type.
In future tstamp_type:1 can be extended to support userspace timestamp
by increasing the bitfield.
Linus Torvalds [Thu, 23 May 2024 19:49:37 +0000 (12:49 -0700)]
Merge tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Quite smaller than usual. Notably it includes the fix for the unix
regression from the past weeks. The TCP window fix will require some
follow-up, already queued.
Current release - regressions:
- af_unix: fix garbage collection of embryos
Previous releases - regressions:
- af_unix: fix race between GC and receive path
- ipv6: sr: fix missing sk_buff release in seg6_input_core
- tcp: remove 64 KByte limit for initial tp->rcv_wnd value
- eth: r8169: fix rx hangup
- eth: lan966x: remove ptp traps in case the ptp is not enabled
- eth: ixgbe: fix link breakage vs cisco switches
- eth: ice: prevent ethtool from corrupting the channels
Previous releases - always broken:
- openvswitch: set the skbuff pkt_type for proper pmtud support
- tcp: Fix shift-out-of-bounds in dctcp_update_alpha()
Misc:
- a bunch of selftests stabilization patches"
* tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits)
r8169: Fix possible ring buffer corruption on fragmented Tx packets.
idpf: Interpret .set_channels() input differently
ice: Interpret .set_channels() input differently
nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()
net: relax socket state check at accept time.
tcp: remove 64 KByte limit for initial tp->rcv_wnd value
net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()
tls: fix missing memory barrier in tls_init
net: fec: avoid lock evasion when reading pps_enable
Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI"
testing: net-drv: use stats64 for testing
net: mana: Fix the extra HZ in mana_hwc_send_request
net: lan966x: Remove ptp traps in case the ptp is not enabled.
openvswitch: Set the skbuff pkt_type for proper pmtud support.
selftest: af_unix: Make SCM_RIGHTS into OOB data.
af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
tcp: Fix shift-out-of-bounds in dctcp_update_alpha().
selftests/net: use tc rule to filter the na packet
ipv6: sr: fix memleak in seg6_hmac_init_algo
af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
...
Linus Torvalds [Thu, 23 May 2024 19:36:38 +0000 (12:36 -0700)]
Merge tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Minor last minute fixes:
- Fix a very tight race between the ring buffer readers and resizing
the ring buffer
- Correct some stale comments in the ring buffer code
- Fix kernel-doc in the rv code
- Add a MODULE_DESCRIPTION to preemptirq_delay_test"
* tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Update rv_en(dis)able_monitor doc to match kernel-doc
tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test
ring-buffer: Fix a race between readers and resize checks
ring-buffer: Correct stale comments related to non-consuming readers
Linus Torvalds [Thu, 23 May 2024 19:32:15 +0000 (12:32 -0700)]
Merge tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing tool fix from Steven Rostedt:
"Fix printf format warnings in latency-collector.
Use the printf format string with %s to take a string instead of
taking in a string directly"
* tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/latency-collector: Fix -Wformat-security compile warns
Linus Torvalds [Thu, 23 May 2024 19:28:01 +0000 (12:28 -0700)]
Merge tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing cleanup from Steven Rostedt:
"Remove second argument of __assign_str()
The __assign_str() macro logic of the TRACE_EVENT() macro was
optimized so that it no longer needs the second argument. The
__assign_str() is always matched with __string() field that takes a
field name and the source for that field:
__string(field, source)
The TRACE_EVENT() macro logic will save off the source value and then
use that value to copy into the ring buffer via the __assign_str().
Before commit c1fa617caeb0 ("tracing: Rework __assign_str() and
__string() to not duplicate getting the string"), the __assign_str()
needed the second argument which would perform the same logic as the
__string() source parameter did. Not only would this add overhead, but
it was error prone as if the __assign_str() source produced something
different, it may not have allocated enough for the string in the ring
buffer (as the __string() source was used to determine how much to
allocate)
Now that the __assign_str() just uses the same string that was used in
__string() it no longer needs the source parameter. It can now be
removed"
* tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/treewide: Remove second parameter of __assign_str()
Linus Torvalds [Thu, 23 May 2024 19:09:22 +0000 (12:09 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The major fix here is for a filesystem corruption issue reported on
Apple M1 as a result of buggy management of the floating point
register state introduced in 6.8. I initially reverted one of the
offending patches, but in the end Ard cooked a proper fix so there's a
revert+reapply in the series.
Aside from that, we've got some CPU errata workarounds and misc other
fixes.
- Fix broken FP register state tracking which resulted in filesystem
corruption when dm-crypt is used
- Workarounds for Arm CPU errata affecting the SSBS Spectre
mitigation
- Fix lockdep assertion in DMC620 memory controller PMU driver
- Fix alignment of BUG table when CONFIG_DEBUG_BUGVERBOSE is
disabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/fpsimd: Avoid erroneous elide of user state reload
Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY
perf/arm-dmc620: Fix lockdep assert in ->event_init()
Revert "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
arm64: errata: Add workaround for Arm errata 3194386 and 3312417
arm64: cputype: Add Neoverse-V3 definitions
arm64: cputype: Add Cortex-X4 definitions
arm64: barrier: Restore spec_bar() macro
Linus Torvalds [Thu, 23 May 2024 19:04:36 +0000 (12:04 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- virtio-net is finally supported in vduse
- virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster
And fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
virtio-pci: Check if is_avq is NULL
virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
MAINTAINERS: add Eugenio Pérez as reviewer
vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
vp_vdpa: don't allocate unused msix vectors
sound: virtio: drop owner assignment
fuse: virtio: drop owner assignment
scsi: virtio: drop owner assignment
rpmsg: virtio: drop owner assignment
nvdimm: virtio_pmem: drop owner assignment
wifi: mac80211_hwsim: drop owner assignment
vsock/virtio: drop owner assignment
net: 9p: virtio: drop owner assignment
net: virtio: drop owner assignment
net: caif: virtio: drop owner assignment
misc: nsm: drop owner assignment
iommu: virtio: drop owner assignment
drm/virtio: drop owner assignment
gpio: virtio: drop owner assignment
firmware: arm_scmi: virtio: drop owner assignment
...
Fix the following -Wformat-security compile warnings adding missing
format arguments:
latency-collector.c: In function ‘show_available’:
latency-collector.c:938:17: warning: format not a string literal and
no format arguments [-Wformat-security]
938 | warnx(no_tracer_msg);
| ^~~~~
latency-collector.c:943:17: warning: format not a string literal and
no format arguments [-Wformat-security]
943 | warnx(no_latency_tr_msg);
| ^~~~~
latency-collector.c: In function ‘find_default_tracer’:
latency-collector.c:986:25: warning: format not a string literal and
no format arguments [-Wformat-security]
986 | errx(EXIT_FAILURE, no_tracer_msg);
|
^~~~
latency-collector.c: In function ‘scan_arguments’:
latency-collector.c:1881:33: warning: format not a string literal and
no format arguments [-Wformat-security]
1881 | errx(EXIT_FAILURE, no_tracer_msg);
| ^~~~
Ken Milmore [Tue, 21 May 2024 22:45:50 +0000 (23:45 +0100)]
r8169: Fix possible ring buffer corruption on fragmented Tx packets.
An issue was found on the RTL8125b when transmitting small fragmented
packets, whereby invalid entries were inserted into the transmit ring
buffer, subsequently leading to calls to dma_unmap_single() with a null
address.
This was caused by rtl8169_start_xmit() not noticing changes to nr_frags
which may occur when small packets are padded (to work around hardware
quirks) in rtl8169_tso_csum_v2().
To fix this, postpone inspecting nr_frags until after any padding has been
applied.
The ice and idpf drivers can trigger a crash with AF_XDP due to incorrect
interpretation of the asymmetric Tx and Rx parameters in their
.set_channels() implementations:
1. ethtool -l <IFNAME> -> combined: 40
2. Attach AF_XDP to queue 30
3. ethtool -L <IFNAME> rx 15 tx 15
combined number is not specified, so command becomes {rx_count = 15,
tx_count = 15, combined_count = 40}.
4. ethnl_set_channels checks, if there are any AF_XDP of queues from the
new (combined_count + rx_count) to the old one, so from 55 to 40, check
does not trigger.
5. the driver interprets `rx 15 tx 15` as 15 combined channels and deletes
the queue that AF_XDP is attached to.
This is fundamentally a problem with interpreting a request for asymmetric
queues as symmetric combined queues.
Fix the ice and idpf drivers to stop interpreting such requests as a
request for combined queues. Due to current driver design for both ice and
idpf, it is not possible to support requests of the same count of Tx and Rx
queues with independent interrupts, (i.e. ethtool -L <IFNAME> rx 15 tx 15)
so such requests are now rejected.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
====================
Larysa Zaremba [Tue, 21 May 2024 19:39:54 +0000 (12:39 -0700)]
idpf: Interpret .set_channels() input differently
Unlike ice, idpf does not check, if user has requested at least 1 combined
channel. Instead, it relies on a check in the core code. Unfortunately, the
check does not trigger for us because of the hacky .set_channels()
interpretation logic that is not consistent with the core code.
This naturally leads to user being able to trigger a crash with an invalid
input. This is how:
1. ethtool -l <IFNAME> -> combined: 40
2. ethtool -L <IFNAME> rx 0 tx 0
combined number is not specified, so command becomes {rx_count = 0,
tx_count = 0, combined_count = 40}.
3. ethnl_set_channels checks, if there is at least 1 RX and 1 TX channel,
comparing (combined_count + rx_count) and (combined_count + tx_count)
to zero. Obviously, (40 + 0) is greater than zero, so the core code
deems the input OK.
4. idpf interprets `rx 0 tx 0` as 0 channels and tries to proceed with such
configuration.
The issue has to be solved fundamentally, as current logic is also known to
cause AF_XDP problems in ice [0].
Interpret the command in a way that is more consistent with ethtool
manual [1] (--show-channels and --set-channels) and new ice logic.
Considering that in the idpf driver only the difference between RX and TX
queues forms dedicated channels, change the correct way to set number of
channels to:
Larysa Zaremba [Tue, 21 May 2024 19:39:53 +0000 (12:39 -0700)]
ice: Interpret .set_channels() input differently
A bug occurs because a safety check guarding AF_XDP-related queues in
ethnl_set_channels(), does not trigger. This happens, because kernel and
ice driver interpret the ethtool command differently.
How the bug occurs:
1. ethtool -l <IFNAME> -> combined: 40
2. Attach AF_XDP to queue 30
3. ethtool -L <IFNAME> rx 15 tx 15
combined number is not specified, so command becomes {rx_count = 15,
tx_count = 15, combined_count = 40}.
4. ethnl_set_channels checks, if there are any AF_XDP of queues from the
new (combined_count + rx_count) to the old one, so from 55 to 40, check
does not trigger.
5. ice interprets `rx 15 tx 15` as 15 combined channels and deletes the
queue that AF_XDP is attached to.
Interpret the command in a way that is more consistent with ethtool
manual [0] (--show-channels and --set-channels).
Considering that in the ice driver only the difference between RX and TX
queues forms dedicated channels, change the correct way to set number of
channels to:
Ryosuke Yasuoka [Tue, 21 May 2024 15:34:42 +0000 (00:34 +0900)]
nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()
When nci_rx_work() receives a zero-length payload packet, it should not
discard the packet and exit the loop. Instead, it should continue
processing subsequent packets.
Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240521153444.535399-1-ryasuoka@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The reproducer invokes shutdown() before entering the listener status.
After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
TCP_SYN_RECV sockets"), the above causes the child to reach the accept
syscall in FIN_WAIT1 status.
Eric noted we can relax the existing assertion in __inet_accept()
Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490 Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jason Xing [Tue, 21 May 2024 13:42:20 +0000 (21:42 +0800)]
tcp: remove 64 KByte limit for initial tp->rcv_wnd value
Recently, we had some servers upgraded to the latest kernel and noticed
the indicator from the user side showed worse results than before. It is
caused by the limitation of tp->rcv_wnd.
In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
CDN teams would not benefit from this change because they cannot have a
large window to receive a big packet, which will be slowed down especially
in long RTT. Small rcv_wnd means slow transfer speed, to some extent. It's
the side effect for the latency/time-sensitive users.
To avoid future confusion, current change doesn't affect the initial
receive window on the wire in a SYN or SYN+ACK packet which are set within
65535 bytes according to RFC 7323 also due to the limit in
__tcp_transmit_skb():
th->window = htons(min(tp->rcv_wnd, 65535U));
In one word, __tcp_transmit_skb() already ensures that constraint is
respected, no matter how large tp->rcv_wnd is. The change doesn't violate
RFC.
Let me provide one example if with or without the patch:
Before:
client --- SYN: rwindow=65535 ---> server
client <--- SYN+ACK: rwindow=65535 ---- server
client --- ACK: rwindow=65536 ---> server
Note: for the last ACK, the calculation is 512 << 7.
After:
client --- SYN: rwindow=65535 ---> server
client <--- SYN+ACK: rwindow=65535 ---- server
client --- ACK: rwindow=175232 ---> server
Note: I use the following command to make it work:
ip route change default via [ip] dev eth0 metric 100 initrwnd 120
For the last ACK, the calculation is 1369 << 7.
When we apply such a patch, having a large rcv_wnd if the user tweak this
knob can help transfer data more rapidly and save some rtts.
Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20240521134220.12510-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Romain Gantois [Tue, 21 May 2024 12:44:11 +0000 (14:44 +0200)]
net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()
In the prueth_probe() function, if one of the calls to emac_phy_connect()
fails due to of_phy_connect() returning NULL, then the subsequent call to
phy_attached_info() will dereference a NULL pointer.
Check the return code of emac_phy_connect and fail cleanly if there is an
error.
Dae R. Jeong [Tue, 21 May 2024 10:34:38 +0000 (19:34 +0900)]
tls: fix missing memory barrier in tls_init
In tls_init(), a write memory barrier is missing, and store-store
reordering may cause NULL dereference in tls_{setsockopt,getsockopt}.
CPU0 CPU1
----- -----
// In tls_init()
// In tls_ctx_create()
ctx = kzalloc()
ctx->sk_proto = READ_ONCE(sk->sk_prot) -(1)
// In update_sk_prot()
WRITE_ONCE(sk->sk_prot, tls_prots) -(2)
// In sock_common_setsockopt()
READ_ONCE(sk->sk_prot)->setsockopt()
// In tls_{setsockopt,getsockopt}()
ctx->sk_proto->setsockopt() -(3)
In the above scenario, when (1) and (2) are reordered, (3) can observe
the NULL value of ctx->sk_proto, causing NULL dereference.
To fix it, we rely on rcu_assign_pointer() which implies the release
barrier semantic. By moving rcu_assign_pointer() after ctx->sk_proto is
initialized, we can ensure that ctx->sk_proto are visible when
changing sk->sk_prot.
Wei Fang [Tue, 21 May 2024 02:38:00 +0000 (10:38 +0800)]
net: fec: avoid lock evasion when reading pps_enable
The assignment of pps_enable is protected by tmreg_lock, but the read
operation of pps_enable is not. So the Coverity tool reports a lock
evasion warning which may cause data race to occur when running in a
multithread environment. Although this issue is almost impossible to
occur, we'd better fix it, at least it seems more logically reasonable,
and it also prevents Coverity from continuing to issue warnings.
According to the commit, it implements a manual AN-37 for some
"troublesome" Juniper MX5 switches. This appears to be a workaround for a
particular switch.
It has been reported that this causes a severe breakage for other switches,
including a Cisco 3560CX-12PD-S.
The code appears to be a workaround for a specific switch which fails to
link in SFI mode. It expects to see AN-37 auto negotiation in order to
link. The Cisco switch is not expecting AN-37 auto negotiation. When the
device starts the manual AN-37, the Cisco switch decides that the port is
confused and stops attempting to link with it. This persists until a power
cycle. A simple driver unload and reload does not resolve the issue, even
if loading with a version of the driver which lacks this workaround.
The authors of the workaround commit have not responded with
clarifications, and the result of the workaround is complete failure to
connect with other switches.
This appears to be a case where the driver can either "correctly" link with
the Juniper MX5 switch, at the cost of bricking the link with the Cisco
switch, or it can behave properly for the Cisco switch, but fail to link
with the Junipir MX5 switch. I do not know enough about the standards
involved to clearly determine whether either switch is at fault or behaving
incorrectly. Nor do I know whether there exists some alternative fix which
corrects behavior with both switches.
Joe Damato [Mon, 20 May 2024 23:58:43 +0000 (23:58 +0000)]
testing: net-drv: use stats64 for testing
Testing a network device that has large numbers of bytes/packets may
overflow. Using stats64 when comparing fixes this problem.
I tripped on this while iterating on a qstats patch for mlx5. See below
for confirmation without my added code that this is a bug.
Before this patch (with added debugging output):
$ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
KTAP version 1
1..4
ok 1 stats.check_pause
ok 2 stats.check_fec
rstat: 481708634 qstat: 666201639514 key: tx-bytes
not ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
Note the huge delta above ^^^ in the rtnl vs qstats.
After this patch:
$ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
KTAP version 1
1..4
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
It looks like rtnl_fill_stats in net/core/rtnetlink.c will attempt to
copy the 64bit stats into a 32bit structure which is probably why this
behavior is occurring.
To show this is happening, you can get the underlying stats that the
stats.py test uses like this:
Note that the above was taken from a system with an mlx5 NIC, which only
exposes ndo_get_stats64.
Based on the ethtool output and qstat output, it appears that stats.py
should be updated to use the 'stats64' structure for accurate
comparisons when packet/byte counters get very large.
To confirm that this was not related to the qstats code I was iterating
on, I booted a kernel without my driver changes and re-ran the test
which shows the qstats are skipped (as they don't exist for mlx5):
NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
KTAP version 1
1..4
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum # SKIP qstats not supported by the device
ok 4 stats.qstat_by_ifindex # SKIP No ifindex supports qstats
Linus Torvalds [Thu, 23 May 2024 01:59:29 +0000 (18:59 -0700)]
Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-mm updates from Andrew Morton:
- A series ("kbuild: enable more warnings by default") from Arnd
Bergmann which enables a number of additional build-time warnings. We
fixed all the fallout which we could find, there may still be a few
stragglers.
- Samuel Holland has developed the series "Unified cross-architecture
kernel-mode FPU API". This does a lot of consolidation of
per-architecture kernel-mode FPU usage and enables the use of newer
AMD GPUs on RISC-V.
- Tao Su has fixed some selftests build warnings in the series
"Selftests: Fix compilation warnings due to missing _GNU_SOURCE
definition".
- This pull also includes a nilfs2 fixup from Ryusuke Konishi.
* tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
nilfs2: make block erasure safe in nilfs_finish_roll_forward()
selftests/harness: use 1024 in place of LINE_MAX
Revert "selftests/harness: remove use of LINE_MAX"
selftests/fpu: allow building on other architectures
selftests/fpu: move FP code to a separate translation unit
drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
drm/amd/display: only use hard-float, not altivec on powerpc
riscv: add support for kernel-mode FPU
x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
x86/fpu: fix asm/fpu/types.h include guard
kbuild: enable -Wcast-function-type-strict unconditionally
kbuild: enable -Wformat-truncation on clang
...
Linus Torvalds [Thu, 23 May 2024 00:32:04 +0000 (17:32 -0700)]
Merge tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more mm updates from Andrew Morton:
"A series from Dave Chinner which cleans up and fixes the handling of
nested allocations within stackdepot and page-owner"
* tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/page-owner: use gfp_nested_mask() instead of open coded masking
stackdepot: use gfp_nested_mask() instead of open coded masking
mm: lift gfp_kmemleak_mask() to gfp.h
Steven Rostedt (Google) [Thu, 16 May 2024 17:34:54 +0000 (13:34 -0400)]
tracing/treewide: Remove second parameter of __assign_str()
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.
This means that with:
__string(field, mystring)
Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.
There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:
git grep -l __assign_str | while read a ; do
sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
mv /tmp/test-file $a;
done
I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.
Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
Brad Cowie [Wed, 22 May 2024 05:07:11 +0000 (17:07 +1200)]
net: netfilter: Make ct zone opts configurable for bpf ct helpers
Add ct zone id and direction to bpf_ct_opts so that arbitrary ct zones
can be used for xdp/tc bpf ct helper functions bpf_{xdp,skb}_ct_alloc
and bpf_{xdp,skb}_ct_lookup.
Use '%pD' to print out the filename, and print out the actual offset
within the file too, rather than just what the virtual address of the
mapping is (which doesn't tell you anything about any mapping offsets).
Also, use the exact vma_lookup() instead of find_vma() - the latter
looks up any vma _after_ the address, which is of questionable value
(yes, maybe you fell off the beginning, but you'd be more likely to fall
off the end).
Linus Torvalds [Wed, 22 May 2024 21:13:22 +0000 (14:13 -0700)]
Merge local branch 'x86-codegen'
Merge trivial x86 code generation annoyances
- Introduce helper macros for clang asm input problems
- use said macros to improve trivially stupid code generation issues in
bitops and array_index_mask_nospec
- also improve codegen with 32-bit array index comparisons
None of these really matter, but I look at code generation and profiles
fairly regularly, and these misfeatures caused the generated code to
look really odd and distract from the real issues.
* branch 'x86-codegen' of local tree:
x86: improve bitop code generation with clang
x86: improve array_index_mask_nospec() code generation
clang: work around asm input constraint problems
Don't force the inputs to be 'unsigned long', when the comparison can
easily be done in 32-bit if that's more appropriate.
Note that while we can look at the inputs to choose an appropriate size
for the compare instruction, the output is fixed at 'unsigned long'.
That's not technically optimal either, since a 32-bit 'sbbl' would often
be sufficient.
But for the outgoing mask we don't know how the mask ends up being used
(ie we have uses that have an incoming 32-bit array index, but end up
using the mask for other things). That said, it only costs the extra
REX prefix to always generate the 64-bit mask.
[ A 'sbbl' also always technically generates a 64-bit mask, but with the
upper 32 bits clear: that's fine for when the incoming index that will
be masked is already 32-bit, but not if you use the mask to mask a
pointer afterwards, like the file table lookup does ]
Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Work around clang problems with asm constraints that have multiple
possibilities, particularly "g" and "rm".
Clang seems to turn inputs like that into the most generic form, which
is the memory input - but to make matters worse, clang won't even use a
possible original memory location, but will spill the value to stack,
and use the stack for the asm input.
Linus Torvalds [Wed, 22 May 2024 19:26:46 +0000 (12:26 -0700)]
Merge tag 'char-misc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc and other driver subsystem updates from Greg KH:
"Here is the big set of char/misc and other driver subsystem updates
for 6.10-rc1. Nothing major here, just lots of new drivers and updates
for apis and new hardware types. Included in here are:
- big IIO driver updates with more devices and drivers added
- fpga driver updates
- hyper-v driver updates
- uio_pruss driver removal, no one uses it, other drivers control the
same hardware now
- binder minor updates
- mhi driver updates
- excon driver updates
- counter driver updates
- accessability driver updates
- coresight driver updates
- other hwtracing driver updates
- nvmem driver updates
- slimbus driver updates
- spmi driver updates
- other smaller misc and char driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (319 commits)
misc: ntsync: mark driver as "broken" to prevent from building
spmi: pmic-arb: Add multi bus support
spmi: pmic-arb: Register controller for bus instead of arbiter
spmi: pmic-arb: Make core resources acquiring a version operation
spmi: pmic-arb: Make the APID init a version operation
spmi: pmic-arb: Fix some compile warnings about members not being described
dt-bindings: spmi: Deprecate qcom,bus-id
dt-bindings: spmi: Add X1E80100 SPMI PMIC ARB schema
spmi: pmic-arb: Replace three IS_ERR() calls by null pointer checks in spmi_pmic_arb_probe()
spmi: hisi-spmi-controller: Do not override device identifier
dt-bindings: spmi: hisilicon,hisi-spmi-controller: clean up example
dt-bindings: spmi: hisilicon,hisi-spmi-controller: fix binding references
spmi: make spmi_bus_type const
extcon: adc-jack: Document missing struct members
extcon: realtek: Remove unused of_gpio.h
extcon: usbc-cros-ec: Convert to platform remove callback returning void
extcon: usb-gpio: Convert to platform remove callback returning void
extcon: max77843: Convert to platform remove callback returning void
extcon: max3355: Convert to platform remove callback returning void
extcon: intel-mrfld: Convert to platform remove callback returning void
...
Linus Torvalds [Wed, 22 May 2024 19:13:40 +0000 (12:13 -0700)]
Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the small set of driver core and kernfs changes for 6.10-rc1.
Nothing major here at all, just a small set of changes for some driver
core apis, and minor fixups. Included in here are:
- sysfs_bin_attr_simple_read() helper added and used
- device_show_string() helper added and used
All usages of these were acked by the various maintainers. Also in
here are:
- kernfs minor cleanup
- removed unused functions
- typo fix in documentation
- pay attention to sysfs_create_link() failures in module.c finally
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
device property: Fix a typo in the description of device_get_child_node_count()
kernfs: mount: Remove unnecessary ‘NULL’ values from knparent
scsi: Use device_show_string() helper for sysfs attributes
platform/x86: Use device_show_string() helper for sysfs attributes
perf: Use device_show_string() helper for sysfs attributes
IB/qib: Use device_show_string() helper for sysfs attributes
hwmon: Use device_show_string() helper for sysfs attributes
driver core: Add device_show_string() helper for sysfs attributes
treewide: Use sysfs_bin_attr_simple_read() helper
sysfs: Add sysfs_bin_attr_simple_read() helper
module: don't ignore sysfs_create_link() failures
driver core: Remove unused platform_notify, platform_notify_remove
Linus Torvalds [Wed, 22 May 2024 19:11:48 +0000 (12:11 -0700)]
Merge tag 'staging-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
"Here is the big set of staging driver changes for 6.10-rc1. Not a lot
of cleanups happening this kernel release, intern applications must be
out of sync at the moment. But we did delete two drivers, wlan-ng and
pi433, as they are no longer in use and the developers involved wanted
them just gone entirely, allowing us to drop 19k lines from the tree.
Other than the normal coding style cleanups here, there has been a lot
of work on the vc04_services code, with the intent to finally get that
out of staging hopefully soon. It's getting closer, which is nice to
see.
All of these have been in linux-next for a while with no reported
issues"
Linus Torvalds [Wed, 22 May 2024 18:53:02 +0000 (11:53 -0700)]
Merge tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty/serial driver changes for 6.10-rc1.
Included in here are:
- Usual good set of api cleanups and evolution by Jiri Slaby to make
the serial interfaces move out of the 1990's by using kfifos
instead of hand-rolling their own logic.
- 8250_exar driver updates
- max3100 driver updates
- sc16is7xx driver updates
- exar driver updates
- sh-sci driver updates
- tty ldisc api addition to help refuse bindings
- other smaller serial driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (113 commits)
serial: Clear UPF_DEAD before calling tty_port_register_device_attr_serdev()
serial: imx: Raise TX trigger level to 8
serial: 8250_pnp: Simplify "line" related code
serial: sh-sci: simplify locking when re-issuing RXDMA fails
serial: sh-sci: let timeout timer only run when DMA is scheduled
serial: sh-sci: describe locking requirements for invalidating RXDMA
serial: sh-sci: protect invalidating RXDMA on shutdown
tty: add the option to have a tty reject a new ldisc
serial: core: Call device_set_awake_path() for console port
dt-bindings: serial: brcm,bcm2835-aux-uart: convert to dtschema
tty: serial: uartps: Add support for uartps controller reset
arm64: zynqmp: Add resets property for UART nodes
dt-bindings: serial: cdns,uart: Add optional reset property
serial: 8250_pnp: Switch to DEFINE_SIMPLE_DEV_PM_OPS()
serial: 8250_exar: Keep the includes sorted
serial: 8250_exar: Make type of bit the same in exar_ee_*_bit()
serial: 8250_exar: Use BIT() in exar_ee_read()
serial: 8250_exar: Switch to use dev_err_probe()
serial: 8250_exar: Return directly from switch-cases
serial: 8250_exar: Decrease indentation level
...
Linus Torvalds [Wed, 22 May 2024 18:40:09 +0000 (11:40 -0700)]
Merge tag 'usb-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
"Here is the big set of USB and Thunderbolt changes for 6.10-rc1.
Nothing hugely earth-shattering, just constant forward progress for
hardware support of new devices and cleanups over the drivers.
Included in here are:
- Thunderbolt / USB 4 driver updates
- typec driver updates
- dwc3 driver updates
- gadget driver updates
- uss720 driver id additions and fixes (people use USB->arallel port
devices still!)
- onboard-hub driver rename and additions for new hardware
- xhci driver updates
- other small USB driver updates and additions for quirks and api
changes
All of these have been in linux-next for a while with no reported
problems"
* tag 'usb-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (154 commits)
drm/bridge: aux-hpd-bridge: correct devm_drm_dp_hpd_bridge_add() stub
usb: fotg210: Add missing kernel doc description
usb: dwc3: core: Fix unused variable warning in core driver
usb: typec: tipd: rely on i2c_get_match_data()
usb: typec: tipd: fix event checking for tps6598x
usb: typec: tipd: fix event checking for tps25750
dt-bindings: usb: qcom,dwc3: fix interrupt max items
usb: fotg210: Use *-y instead of *-objs in Makefile
usb: phy: tegra: Replace of_gpio.h by proper one
usb: typec: ucsi: displayport: Fix potential deadlock
usb: typec: qcom-pmic-typec: split HPD bridge alloc and registration
usb: musc: Remove unused list 'buffers'
usb: dwc3: Wait unconditionally after issuing EndXfer command
usb: gadget: u_audio: Clear uac pointer when freed.
usb: gadget: u_audio: Fix race condition use of controls after free during gadget unbind.
dt-bindings: usb: dwc3: Add QDU1000 compatible
usb: core: Remove the useless struct usb_devmap which is just a bitmap
MAINTAINERS: Remove {ehci,uhci}-platform.c from ARM/VT8500 entry
USB: usb_parse_endpoint: ignore reserved bits
usb: xhci: compact 'trb_in_td()' arguments
...
Linus Torvalds [Wed, 22 May 2024 17:49:54 +0000 (10:49 -0700)]
Merge tag 'leds-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
"Core Frameworks:
- Ensure seldom updated triggers have a brightness value before first
update
New Device Support:
- Add support for Simatic IPC Device BX_59A to IPC LEDs Core
- Add support for Qualcomm PMI8950 PWM to LPG Core
New Functionality:
- Add a bunch of new LED function identifiers
- Add support for High Resolution Timers in LED Trigger Patten
Fix-ups:
- Shift out Audio Trigger to the Sound subsystem
- Convert suitable calls to devm_* managed resources
- Device Tree binding adaptions/conversions/creation
- Remove superfluous code/variables/attributes and simplify overall
- Use/convert to new/better APIs/helpers/MACROs instead of
hand-rolling implementations
Bug Fixes:
- Repair enabling Torch Mode from V4L2 on the second LED
- Ensure PWM is disabled when suspending"
* tag 'leds-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (28 commits)
leds: mt6370: Remove unused field 'reg_cfgs' from 'struct mt6370_priv'
leds: lp50xx: Remove unused field 'num_of_banked_leds' from 'struct lp50xx'
leds: lp50xx: Remove unused field 'bank_modules' from 'struct lp50xx_led'
leds: aat1290: Remove unused field 'torch_brightness' from 'struct aat1290_led'
leds: sun50i-a100: Use match_string() helper to simplify the code
leds: pwm: Disable PWM when going to suspend
leds: trigger: pattern: Add support for hrtimer
leds: mt6360: Fix the second LED can not enable torch mode by V4L2
dt-bindings: leds: leds-qcom-lpg: Add support for PMI8950 PWM
leds: qcom-lpg: Add support for PMI8950 PWM
leds: apu: Remove duplicate DMI lookup data
leds: trigger: netdev: Remove not needed call to led_set_brightness in deactivate
dt-bindings: leds: Add LED_FUNCTION_SPEED_* for link speed on LAN/WAN
dt-bindings: leds: Add LED_FUNCTION_MOBILE for mobile network
leds: simatic-ipc-leds-gpio: Add support for module BX-59A
dt-bindings: leds: qcom-lpg: Document PM6150L compatible
dt-bindings: leds: pca963x: Convert text bindings to YAML
leds: an30259a: Use devm_mutex_init() for mutex initialization
leds: mlxreg: Use devm_mutex_init() for mutex initialization
leds: nic78bx: Use devm API to cleanup module's resources
...