Russell King (Oracle) [Sat, 26 Aug 2023 10:02:51 +0000 (11:02 +0100)]
net: stmmac: clarify difference between "interface" and "phy_interface"
Clarify the difference between "interface" and "phy_interface" in
struct plat_stmmacenet_data, both by adding a comment, and also
renaming "interface" to be "mac_interface". The difference between
these are:
MAC ----- optional PCS ----- SerDes ----- optional PHY ----- Media
^ ^
mac_interface phy_interface
Note that phylink currently only deals with phy_interface.
====================
devlink: finish file split and get retire leftover.c
This patchset finishes a move Jakub started and Moshe continued in the
past. I was planning to do this for a long time, so here it is, finally.
This patchset does not change any behaviour. It just splits leftover.c
into per-object files and do necessary changes, like declaring functions
used from other code, on the way.
The last 3 patches are pushing the rest of the code into appropriate
existing files.
====================
Jiri Pirko [Mon, 28 Aug 2023 06:16:51 +0000 (08:16 +0200)]
devlink: use tracepoint_enabled() helper
In preparation for the trap code move, use tracepoint_enabled() helper
instead of trace_devlink_trap_report_enabled() which would not be
defined in that scope.
Jiri Pirko [Mon, 28 Aug 2023 06:16:46 +0000 (08:16 +0200)]
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
Since both dpipe and resource code is using this helper, in preparation
for code split to separate files, move
devlink_dpipe_send_and_alloc_skb() helper into netlink.c. Rename it on
the way.
Jiri Pirko [Mon, 28 Aug 2023 06:16:43 +0000 (08:16 +0200)]
devlink: push object register/unregister notifications into separate helpers
In preparations of leftover.c split to individual files, avoid need to
have object structures exposed in devl_internal.h and allow to have them
maintained in object files.
The register/unregister notifications need to know the structures
to iterate lists. To avoid the need, introduce per-object
register/unregister notification helpers and use them.
Eric Dumazet [Mon, 28 Aug 2023 08:47:32 +0000 (08:47 +0000)]
inet: fix IP_TRANSPARENT error handling
My recent patch forgot to change error handling for IP_TRANSPARENT
socket option.
WARNING: bad unlock balance detected! 6.5.0-rc7-syzkaller-01717-g59da9885767a #0 Not tainted
-------------------------------------
syz-executor151/5028 is trying to release lock (sk_lock-AF_INET) at:
[<ffffffff88213983>] sockopt_release_sock+0x53/0x70 net/core/sock.c:1073
but there are no more locks to release!
other info that might help us debug this:
1 lock held by syz-executor151/5028:
Zhengchao Shao [Sat, 26 Aug 2023 02:23:30 +0000 (10:23 +0800)]
selftests: bonding: create directly devices in the target namespaces
If failed to set link1_1 to netns client, we should delete link1_1 in the
cleanup path. But if set link1_1 to netns client successfully, delete
link1_1 will report warning. So it will be safer creating directly the
devices in the target namespaces.
Reported-by: Hangbin Liu <liuhangbin@gmail.com> Closes: https://lore.kernel.org/all/ZNyJx1HtXaUzOkNA@Laptop-X1/ Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mikhail Kobuk [Fri, 25 Aug 2023 19:04:41 +0000 (22:04 +0300)]
ethernet: tg3: remove unreachable code
'tp->irq_max' value is either 1 [L16336] or 5 [L16354], as indicated in
tg3_get_invariants(). Therefore, 'i' can't exceed 4 in tg3_init_one()
that makes (i <= 4) always true. Moreover, 'intmbx' value set at the
last iteration is not used later in it's scope.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 78f90dcf184b ("tg3: Move napi_add calls below tg3_get_invariants") Signed-off-by: Mikhail Kobuk <m.kobuk@ispras.ru> Reviewed-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 25 Aug 2023 13:49:46 +0000 (15:49 +0200)]
net: Make consumed action consistent in sch_handle_egress
While looking at TC_ACT_* handling, the TC_ACT_CONSUMED is only handled in
sch_handle_ingress but not sch_handle_egress. This was added via cd11b164073b
("net/tc: introduce TC_ACT_REINSERT.") and e5cf1baf92cb ("act_mirred: use
TC_ACT_REINSERT when possible") and later got renamed into TC_ACT_CONSUMED
via 720f22fed81b ("net: sched: refactor reinsert action").
The initial work was targeted for ovs back then and only needed on ingress,
and the mirred action module also restricts it to only that. However, given
it's an API contract it would still make sense to make this consistent to
sch_handle_ingress and handle it on egress side in the same way, that is,
setting return code to "success" and returning NULL back to the caller as
otherwise an action module sitting on egress returning TC_ACT_CONSUMED could
lead to an UAF when untreated.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ip link add dev dummy0 type dummy
ip link set dev dummy0 up
tc qdisc add dev eth0 clsact
tc filter add dev eth0 egress protocol ip prio 1 u32 match ip protocol 1 0xff action mirred egress redirect dev dummy0
ping 1.1.1.1
<stolen>
After the fix, there are no kmemleak reports with the reproducer. This is
in line with what is also done on the ingress side, and from debugging the
skb_unref(skb) on dummy xmit and sch_handle_egress() side, it is visible
that these are two different skbs with both skb_unref(skb) as true. The two
seen skbs are due to mirred doing a skb_clone() internally as use_reinsert
is false in tcf_mirred_act() for egress. This was initially reported by Gal.
Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") Reported-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jann Horn [Fri, 25 Aug 2023 13:32:41 +0000 (15:32 +0200)]
dccp: Fix out of bounds access in DCCP error handler
There was a previous attempt to fix an out-of-bounds access in the DCCP
error handlers, but that fix assumed that the error handlers only want
to access the first 8 bytes of the DCCP header. Actually, they also look
at the DCCP sequence number, which is stored beyond 8 bytes, so an
explicit pskb_may_pull() is required.
Fixes: 6706a97fec96 ("dccp: fix out of bound access in dccp_v4_err()") Fixes: 1aa9d1a0e7ee ("ipv6: dccp: fix out of bound access in dccp_v6_err()") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
octeontx2-af: misc MAC block changes
This series of patches adds recent changes added in MAC (CGX/RPM) block.
Patch1: Adds new LMAC mode supported by CN10KB silicon
Patch2: In a scenario where system boots with no cgx devices, currently
AF driver treats this as error as a result no interfaces will work.
This patch relaxes this check, such that non cgx mapped netdev
devices will work.
Patch3: This patch adds required lmac validation in MAC block APIs.
Patch4: Prints error message incase, no netdev is mapped with given
cgx,lmac pair.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 25 Aug 2023 10:40:22 +0000 (16:10 +0530)]
octeontx2-af: print error message incase of invalid pf mapping
During AF driver initialization, it creates a mapping between pf to
cgx,lmac pair. Whenever there is a physical link change, using this
mapping driver forwards the message to the associated netdev.
This patch prints error message incase of cgx,lmac pair is not
associated with any pf netdev.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 25 Aug 2023 10:40:21 +0000 (16:10 +0530)]
octeontx2-af: Add validation of lmac
With the addition of new MAC blocks like CN10K RPM and CN10KB
RPM_USX, LMACs are noncontiguous. Though in most of the functions,
lmac validation checks exist but in few functions they are missing.
This patch adds the same.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Fri, 25 Aug 2023 10:40:20 +0000 (16:10 +0530)]
octeontx2-af: Don't treat lack of CGX interfaces as error
Don't treat lack of CGX LMACs on the system as a error.
Instead ignore it so that LBK VFs are created and can be used.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 25 Aug 2023 10:40:19 +0000 (16:10 +0530)]
octeontx2-af: CN10KB: Add USGMII LMAC mode
Upon physical link change, firmware reports to the kernel about the
change along with the details like speed, lmac_type_id, etc.
Kernel derives lmac_type based on lmac_type_id received from firmware.
This patch extends current lmac list with new USGMII mode supported
by CN10KB RPM block.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Radoslaw Tyl [Thu, 24 Aug 2023 20:46:19 +0000 (13:46 -0700)]
igb: set max size RX buffer when store bad packet is enabled
Increase the RX buffer size to 3K when the SBP bit is on. The size of
the RX buffer determines the number of pages allocated which may not
be sufficient for receive frames larger than the set MTU size.
Cc: stable@vger.kernel.org Fixes: 89eaefb61dc9 ("igb: Support RX-ALL feature flag.") Reported-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com> Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Thu, 24 Aug 2023 16:50:59 +0000 (09:50 -0700)]
netrom: Deny concurrent connect().
syzkaller reported null-ptr-deref [0] related to AF_NETROM.
This is another self-accept issue from the strace log. [1]
syz-executor creates an AF_NETROM socket and calls connect(), which
is blocked at that time. Then, sk->sk_state is TCP_SYN_SENT and
sock->state is SS_CONNECTING.
Another thread calls connect() concurrently, which finally fails
with -EINVAL. However, the problem here is the socket state is
reset even while the first connect() is blocked.
As sk->state is TCP_CLOSE and sock->state is SS_UNCONNECTED, the
following listen() succeeds. Then, the first connect() looks up
itself as a listener and puts skb into the queue with skb->sk itself.
As a result, the next accept() gets another FD of itself as 3, and
the first connect() finishes.
Finally, the leader thread close()s all FDs. Since the three FDs
reference the same socket, nr_release() does the cleanup for it
three times, and the remaining accept4() causes the following fault.
Pranavi Somisetty [Thu, 24 Aug 2023 11:44:56 +0000 (17:14 +0530)]
dt-bindings: net: xilinx_gmii2rgmii: Convert to json schema
Convert the Xilinx GMII to RGMII Converter device tree binding
documentation to json schema.
This converter is usually used as gem <---> gmii2rgmii <---> external phy
and, it's phy-handle should point to the phandle of the external phy.
Signed-off-by: Pranavi Somisetty <pranavi.somisetty@amd.com> Signed-off-by: Harini Katakam <harini.katakam@amd.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
tls: expand tls_cipher_size_desc to simplify getsockopt/setsockopt
Commit 2d2c5ea24243 ("net/tls: Describe ciphers sizes by const
structs") introduced tls_cipher_size_desc to describe the size of the
fields of the per-cipher crypto_info structs, and commit ea7a9d88ba21
("net/tls: Use cipher sizes structs") used it, but only in
tls_device.c and tls_device_fallback.c, and skipped converting similar
code in tls_main.c and tls_sw.c.
This series expands tls_cipher_size_desc (renamed to tls_cipher_desc
to better fit this expansion) to fully describe a cipher:
- offset of the fields within the per-cipher crypto_info
- size of the full struct (for copies to/from userspace)
- offload flag
- algorithm name used by SW crypto
With these additions, we can remove ~350L of
switch (crypto_info->cipher_type) { ... }
from tls_set_device_offload, tls_sw_fallback_init,
do_tls_getsockopt_conf, do_tls_setsockopt_conf, tls_set_sw_offload
(mainly do_tls_getsockopt_conf and tls_set_sw_offload).
This series also adds the ARIA ciphers to the tls selftests, and some
more getsockopt/setsockopt tests to cover more of the code changed by
this series.
====================
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:19 +0000 (23:35 +0200)]
tls: use tls_cipher_desc to simplify do_tls_getsockopt_conf
Every cipher uses the same code to update its crypto_info struct based
on the values contained in the cctx, with only the struct type and
size/offset changing. We can get those from tls_cipher_desc, and use
a single pair of memcpy and final copy_to_user.
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:18 +0000 (23:35 +0200)]
tls: get crypto_info size from tls_cipher_desc in do_tls_setsockopt_conf
We can simplify do_tls_setsockopt_conf using tls_cipher_desc. Also use
get_cipher_desc's result to check if the cipher_type coming from
userspace is valid.
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:17 +0000 (23:35 +0200)]
tls: expand use of tls_cipher_desc in tls_sw_fallback_init
tls_sw_fallback_init already gets the key and tag size from
tls_cipher_desc. We can now also check that the cipher type is valid,
and stop hard-coding the algorithm name passed to crypto_alloc_aead.
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:15 +0000 (23:35 +0200)]
tls: expand use of tls_cipher_desc in tls_set_device_offload
tls_set_device_offload is already getting iv and rec_seq sizes from
tls_cipher_desc. We can now also check if the cipher_type coming from
userspace is valid and can be offloaded.
We can also remove the runtime check on rec_seq, since we validate it
at compile time.
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:11 +0000 (23:35 +0200)]
tls: reduce size of tls_cipher_size_desc
tls_cipher_size_desc indexes ciphers by their type, but we're not
using indices 0..50 of the array. Each struct tls_cipher_size_desc is
20B, so that's a lot of unused memory. We can reindex the array
starting at the lowest used cipher_type.
Introduce the get_cipher_size_desc helper to find the right item and
avoid out-of-bounds accesses, and make tls_cipher_size_desc's size
explicit so that gcc reminds us to update TLS_CIPHER_MIN/MAX when we
add a new cipher.
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:07 +0000 (23:35 +0200)]
selftests: tls: add getsockopt test
The kernel accepts fetching either just the version and cipher type,
or exactly the per-cipher struct. Also check that getsockopt returns
what we just passed to the kernel.
====================
tools/net/ynl: Add support for netlink-raw families
This patchset adds support for netlink-raw families such as rtnetlink.
Patch 1 fixes a typo in existing schemas
Patch 2 contains the schema definition
Patches 3 & 4 update the schema documentation
Patches 5 - 9 extends ynl
Patches 10 - 12 add several netlink-raw specs
The netlink-raw schema is very similar to genetlink-legacy and I thought
about making the changes there and symlinking to it. On balance I
thought that might be problematic for accurate schema validation.
rtnetlink doesn't seem to fit into unified or directional message
enumeration models. It seems like an 'explicit' model would be useful,
to force the schema author to specify the message ids directly.
There is not yet support for notifications because ynl currently doesn't
support defining 'event' properties on a 'do' operation. The message ids
are shared so ops need to be both sync and async. I plan to look at this
in a future patch.
The link and route messages contain different nested attributes
dependent on the type of link or route. Decoding these will need some
kind of attr-space selection that uses the value of another attribute as
the selector key. These nested attributes have been left with type
'binary' for now.
====================
Donald Hunter [Fri, 25 Aug 2023 12:27:45 +0000 (13:27 +0100)]
doc/netlink: Add a schema for netlink-raw families
This schema is largely a copy of the genetlink-legacy schema with the
following modifications:
- change the schema id to netlink-raw
- add a top-level protonum property, e.g. 0 (for NETLINK_ROUTE)
- change the protocol enumeration to netlink-raw, removing the
genetlink options.
- replace doc references to generic netlink with raw netlink
- add a value property to mcast-group definitions
====================
{devlink,mlx5}: Add port function attributes for ipsec
From Dima:
Introduce hypervisor-level control knobs to set the functionality of PCI
VF devices passed through to guests. The administrator of a hypervisor
host may choose to change the settings of a port function from the
defaults configured by the device firmware.
The software stack has two types of IPsec offload - crypto and packet.
Specifically, the ip xfrm command has sub-commands for "state" and
"policy" that have an "offload" parameter. With ip xfrm state, both
crypto and packet offload types are supported, while ip xfrm policy can
only be offloaded in packet mode.
The series introduces two new boolean attributes of a port function:
ipsec_crypto and ipsec_packet. The goal is to provide a similar level of
granularity for controlling VF IPsec offload capabilities, which would
be aligned with the software model. This will allow users to decide if
they want both types of offload enabled for a VF, just one of them, or
none at all (which is the default).
At a high level, the difference between the two knobs is that with
ipsec_crypto, only XFRM state can be offloaded. Specifically, only the
crypto operation (Encrypt/Decrypt) is offloaded. With ipsec_packet, both
XFRM state and policy can be offloaded. Furthermore, in addition to
crypto operation offload, IPsec encapsulation is also offloaded. For
XFRM state, choosing between crypto and packet offload types is
possible. From the HW perspective, different resources may be required
for each offload type.
Examples of when a user prefers to enable IPsec packet offload for a VF
when using switchdev mode:
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet disable
$ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet enable
This enables the corresponding IPsec capability of the function before
it's enumerated, so when the driver reads the capability from the device
firmware, it is enabled. The driver is then able to configure
corresponding features and ops of the VF net device to support IPsec
state and policy offloading.
Dima Chumak [Fri, 25 Aug 2023 06:28:36 +0000 (23:28 -0700)]
net/mlx5: Implement devlink port function cmds to control ipsec_packet
Implement devlink port function commands to enable / disable IPsec
packet offloads. This is used to control the IPsec capability of the
device.
When ipsec_offload is enabled for a VF, it prevents adding IPsec packet
offloads on the PF, because the two cannot be active simultaneously due
to HW constraints. Conversely, if there are any active IPsec packet
offloads on the PF, it's not allowed to enable ipsec_packet on a VF,
until PF IPsec offloads are cleared.
Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dima Chumak [Fri, 25 Aug 2023 06:28:35 +0000 (23:28 -0700)]
net/mlx5: Implement devlink port function cmds to control ipsec_crypto
Implement devlink port function commands to enable / disable IPsec
crypto offloads. This is used to control the IPsec capability of the
device.
When ipsec_crypto is enabled for a VF, it prevents adding IPsec crypto
offloads on the PF, because the two cannot be active simultaneously due
to HW constraints. Conversely, if there are any active IPsec crypto
offloads on the PF, it's not allowed to enable ipsec_crypto on a VF,
until PF IPsec offloads are cleared.
Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Fri, 25 Aug 2023 06:28:34 +0000 (23:28 -0700)]
net/mlx5: Provide an interface to block change of IPsec capabilities
mlx5 HW can't perform IPsec offload operation simultaneously both on PF
and VFs at the same time. While the previous patches added devlink knobs
to change IPsec capabilities dynamically, there is a need to add a logic
to block such IPsec capabilities for the cases when IPsec is already
configured.
Leon Romanovsky [Fri, 25 Aug 2023 06:28:32 +0000 (23:28 -0700)]
net/mlx5e: Rewrite IPsec vs. TC block interface
In the commit 366e46242b8e ("net/mlx5e: Make IPsec offload work together
with eswitch and TC"), new API to block IPsec vs. TC creation was introduced.
Internally, that API used devlink lock to avoid races with userspace, but it is
not really needed as dev->priv.eswitch is stable and can't be changed. So remove
dependency on devlink lock and move block encap code back to its original place.
Dima Chumak [Fri, 25 Aug 2023 06:28:30 +0000 (23:28 -0700)]
devlink: Expose port function commands to control IPsec packet offloads
Expose port function commands to enable / disable IPsec packet offloads,
this is used to control the port IPsec capabilities.
When IPsec packet is disabled for a function of the port (default),
function cannot offload IPsec packet operations (encapsulation and XFRM
policy offload). When enabled, IPsec packet operations can be offloaded
by the function of the port, which includes crypto operation
(Encrypt/Decrypt), IPsec encapsulation and XFRM state and policy
offload.
Example of a PCI VF port which supports IPsec packet offloads:
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_packet disable
$ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_packet enable
Dima Chumak [Fri, 25 Aug 2023 06:28:29 +0000 (23:28 -0700)]
devlink: Expose port function commands to control IPsec crypto offloads
Expose port function commands to enable / disable IPsec crypto offloads,
this is used to control the port IPsec capabilities.
When IPsec crypto is disabled for a function of the port (default),
function cannot offload any IPsec crypto operations (Encrypt/Decrypt and
XFRM state offloading). When enabled, IPsec crypto operations can be
offloaded by the function of the port.
Example of a PCI VF port which supports IPsec crypto offloads:
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable
$ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable
David S. Miller [Sun, 27 Aug 2023 06:13:24 +0000 (07:13 +0100)]
Merge branch 'iep-drver-timestamping-support'
MD Danish Anwar says:
====================
Introduce IEP driver and packet timestamping support
This series introduces Industrial Ethernet Peripheral (IEP) driver to
support timestamping of ethernet packets and thus support PTP and PPS
for PRU ICSSG ethernet ports.
This series also adds 10M full duplex support for ICSSG ethernet driver.
There are two IEP instances. IEP0 is used for packet timestamping while IEP1
is used for 10M full duplex support.
This is v7 of the series [v1]. It addresses comments made on [v6].
This series is based on linux-next(#next-20230823).
Changes from v6 to v7:
*) Dropped blank line in example section of patch 1.
*) Patch 1 previously had three examples, removed two examples and kept only
one example as asked by Krzysztof.
*) Added Jacob Keller's RB tag in patch 5.
*) Dropped Roger's RB tags from the patches that he has authored (Patch 3 and 4)
Changes from v5 to v6:
*) Added description of IEP in commit messages of patch 2 as asked by Rob.
*) Described the items constraints properly for iep property in patch 2 as
asked by Rob.
*) Added Roger and Simon's RB tags.
Changes from v4 to v5:
*) Added comments on why we are using readl / writel instead of regmap_read()
/ write() in icss_iep_gettime() / settime() APIs as asked by Roger.
*) Added Conor's RB tag in patch 1 and 2.
Change from v3 to v4:
*) Changed compatible in iep dt bindings. Now each SoC has their own compatible
in the binding with "ti,am654-icss-iep" as a fallback as asked by Conor.
*) Addressed Andew's comments and removed helper APIs icss_iep_readl() /
writel(). Now the settime/gettime APIs directly use readl() / writel().
*) Moved selecting TI_ICSS_IEP in Kconfig from patch 3 to patch 4.
*) Removed forward declaration of icss_iep_of_match in patch 3.
*) Replaced use of of_device_get_match_data() to device_get_match_data() in
patch 3.
*) Removed of_match_ptr() from patch 3 as it is not needed.
Changes from v2 to v3:
*) Addressed Roger's comment and moved IEP1 related changes in patch 5.
*) Addressed Roger's comment and moved icss_iep.c / .h changes from patch 4
to patch 3.
*) Added support for multiple timestamping in patch 4 as asked by Roger.
*) Addressed Andrew's comment and added comment in case SPEED_10 in
icssg_config_ipg() API.
*) Kept compatible as "ti,am654-icss-iep" for all TI K3 SoCs
Changes from v1 to v2:
*) Addressed Simon's comment to fix reverse xmas tree declaration. Some APIs
in patch 3 and 4 were not following reverse xmas tree variable declaration.
Fixed it in this version.
*) Addressed Conor's comments and removed unsupported SoCs from compatible
comment in patch 1.
*) Addded patch 2 which was not part of v1. Patch 2, adds IEP node to dt
bindings for ICSSG.
Grygorii Strashko [Thu, 24 Aug 2023 11:46:18 +0000 (17:16 +0530)]
net: ti: icssg-prueth: am65x SR2.0 add 10M full duplex support
For AM65x SR2.0 it's required to enable IEP1 in raw 64bit mode which is
used by PRU FW to monitor the link and apply w/a for 10M link issue.
Note. No public errata available yet.
Without this w/a the PRU FW will stuck if link state changes under TX
traffic pressure.
Hence, add support for 10M full duplex for AM65x SR2.0:
- add new IEP API to enable IEP, but without PTP support
- add pdata quirk_10m_link_issue to enable 10M link issue w/a.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Thu, 24 Aug 2023 11:46:17 +0000 (17:16 +0530)]
net: ti: icssg-prueth: add packet timestamping and ptp support
Add packet timestamping TS and PTP PHC clock support.
For AM65x and AM64x:
- IEP1 is not used
- IEP0 is configured in shadow mode with 1ms cycle and shared between
Linux and FW. It provides time and TS in number cycles, so special
conversation in ns is required.
- IEP0 shared between PRUeth ports.
- IEP0 supports PPS, periodic output.
- IEP0 settime() and enabling PPS required FW interraction.
- RX TS provided with each packet in CPPI5 descriptor.
- TX TS returned through separate ICSSG hw queues for each port. TX TS
readiness is signaled by INTC IRQ. Only one packet at time can be requested
for TX TS.
Signed-off-by: Roger Quadros <rogerq@ti.com> Co-developed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Thu, 24 Aug 2023 11:46:16 +0000 (17:16 +0530)]
net: ti: icss-iep: Add IEP driver
Add a driver for Industrial Ethernet Peripheral (IEP) block of PRUSS to
support timestamping of ethernet packets and thus support PTP and PPS
for PRU ethernet ports.
Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add IEP property in ICSSG hardware DT binding document.
ICSSG uses IEP (Industrial Ethernet Peripheral) to support timestamping
of ethernet packets, PTP and PPS.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Add a DT binding document for the ICSS Industrial Ethernet Peripheral(IEP)
hardware. IEP supports packet timestamping, PTP and PPS.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 27 Aug 2023 05:56:54 +0000 (06:56 +0100)]
Merge branch 'sfc-pedit-offloads'
Pieter Jansen van Vuuren says:
====================
sfc: introduce eth, ipv4 and ipv6 pedit offloads
This set introduces mac source and destination pedit set action offloads.
It also adds offload for ipv4 ttl and ipv6 hop limit pedit set action as
well pedit add actions that would result in the same semantics as
decrementing the ttl and hop limit.
v2:
- fix 'efx_tc_mangle' kdoc which was orphaned when adding 'efx_tc_pedit_add'.
- add description of 'match' in 'efx_tc_mangle' kdoc.
- correct some inconsistent kdoc indentation.
Pieter Jansen van Vuuren [Thu, 24 Aug 2023 11:28:42 +0000 (12:28 +0100)]
sfc: extend pedit add action to handle decrement ipv6 hop limit
Extend the pedit add actions to handle this case for ipv6. Similar to ipv4
dec ttl, decrementing ipv6 hop limit can be achieved by adding 0xff to the
hop limit field.
Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Thu, 24 Aug 2023 11:28:41 +0000 (12:28 +0100)]
sfc: introduce pedit add actions on the ipv4 ttl field
Introduce pedit add actions and use it to achieve decrement ttl offload.
Decrement ttl can be achieved by adding 0xff to the ttl field.
Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Thu, 24 Aug 2023 11:28:40 +0000 (12:28 +0100)]
sfc: add decrement ipv6 hop limit by offloading set hop limit actions
Offload pedit set ipv6 hop limit, where the hop limit has already been
matched and the new value is one less, by translating it to a decrement.
Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Thu, 24 Aug 2023 11:28:39 +0000 (12:28 +0100)]
sfc: add decrement ttl by offloading set ipv4 ttl actions
Offload pedit set ipv4 ttl field, where the ttl field has already been
matched and the new value is one less, by translating it to a decrement.
Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Thu, 24 Aug 2023 11:28:38 +0000 (12:28 +0100)]
sfc: add mac source and destination pedit action offload
Introduce the first pedit set offload functionality for the sfc driver.
In addition to this, add offload functionality for both mac source and
destination pedit set actions.
Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Thu, 24 Aug 2023 11:28:37 +0000 (12:28 +0100)]
sfc: introduce ethernet pedit set action infrastructure
Introduce the initial ethernet pedit set action infrastructure in
preparation for adding mac src and dst pedit action offloads.
Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 26 Aug 2023 02:09:45 +0000 (19:09 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-08-24 (igc, e1000e)
This series contains updates to igc and e1000e drivers.
Vinicius adds support for utilizing multiple PTP registers on igc.
Sasha reduces interval time for PTM on igc and adds new device support
on e1000e.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
e1000e: Add support for the next LOM generation
igc: Decrease PTM short interval from 10 us to 1 us
igc: Add support for multiple in-flight TX timestamps
====================
Shannon Nelson [Thu, 24 Aug 2023 16:17:54 +0000 (09:17 -0700)]
pds_core: pass opcode to devcmd_wait
Don't rely on the PCI memory for the devcmd opcode because we
read a 0xff value if the PCI bus is broken, which can cause us
to report a bogus dev_cmd opcode later.
Fixes: 523847df1b37 ("pds_core: add devcmd device interfaces") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230824161754.34264-6-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 24 Aug 2023 16:17:53 +0000 (09:17 -0700)]
pds_core: check for work queue before use
Add a check that the wq exists before queuing up work for a
failed devcmd, as the PF is responsible for health and the VF
doesn't have a wq.
Fixes: c2dbb0904310 ("pds_core: health timer and workqueue") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230824161754.34264-5-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 24 Aug 2023 16:17:50 +0000 (09:17 -0700)]
pds_core: protect devlink callbacks from fw_down state
Don't access structs that have been cleared when in the fw_down
state and the various structs have been cleaned and are waiting
to recover. This caused a panic on rmmod when already in fw_down
and devlink_param_unregister() tried to check the parameters.
Fixes: 40ced8944536 ("pds_core: devlink params for enabling VIF support") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230824161754.34264-2-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Budimir Markovic [Thu, 24 Aug 2023 08:49:05 +0000 (01:49 -0700)]
net/sched: sch_hfsc: Ensure inner classes have fsc curve
HFSC assumes that inner classes have an fsc curve, but it is currently
possible for classes without an fsc curve to become parents. This leads
to bugs including a use-after-free.
Don't allow non-root classes without HFSC_FSC to become parents.
Alex Austin [Thu, 24 Aug 2023 16:46:57 +0000 (17:46 +0100)]
sfc: Check firmware supports Ethernet PTP filter
Not all firmware variants support RSS filters. Do not fail all PTP
functionality when raw ethernet PTP filters fail to insert.
Fixes: e4616f64726b ("sfc: support PTP over Ethernet") Signed-off-by: Alex Austin <alex.austin@amd.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Link: https://lore.kernel.org/r/20230824164657.42379-1-alex.austin@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>