]> www.infradead.org Git - users/hch/dma-mapping.git/log
users/hch/dma-mapping.git
2 years agonet/mlx5e: Improve IPsec flow steering autogroup
Leon Romanovsky [Fri, 2 Dec 2022 20:14:49 +0000 (22:14 +0200)]
net/mlx5e: Improve IPsec flow steering autogroup

Flow steering API separates newly created rules based on their
match criteria. Right now, all IPsec tables are created with one
group and suffers from not-optimal FS performance.

Count number of different match criteria for relevant tables, and
set proper value at the table creation.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Configure IPsec packet offload flow steering
Leon Romanovsky [Fri, 2 Dec 2022 20:14:48 +0000 (22:14 +0200)]
net/mlx5e: Configure IPsec packet offload flow steering

In packet offload mode, the HW is responsible to handle ESP headers,
SPI numbers and trailers (ICV) together with different logic for
RX and TX paths.

In order to support packet offload mode, special logic is added
to flow steering rules.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Use same coding pattern for Rx and Tx flows
Leon Romanovsky [Fri, 2 Dec 2022 20:14:47 +0000 (22:14 +0200)]
net/mlx5e: Use same coding pattern for Rx and Tx flows

Remove intermediate variable in favor of having similar coding style
for Rx and Tx add rule functions.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Add XFRM policy offload logic
Leon Romanovsky [Fri, 2 Dec 2022 20:14:46 +0000 (22:14 +0200)]
net/mlx5e: Add XFRM policy offload logic

Implement mlx5 flow steering logic and mlx5 IPsec code support
XFRM policy offload.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Create IPsec policy offload tables
Leon Romanovsky [Fri, 2 Dec 2022 20:14:45 +0000 (22:14 +0200)]
net/mlx5e: Create IPsec policy offload tables

Add empty table to be used for IPsec policy offload.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoMerge branch 'mlx5 IPsec packet offload support (Part I)'
Steffen Klassert [Thu, 8 Dec 2022 09:25:30 +0000 (10:25 +0100)]
Merge branch 'mlx5 IPsec packet offload support (Part I)'

Leon Romanovsky says:

============
This series follows previously sent "Extend XFRM core to allow packet
offload configuration" series [1].

It is first part with refactoring to mlx5 allow us natively extend
mlx5 IPsec logic to support both crypto and packet offloads.
============

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Generalize creation of default IPsec miss group and rule
Leon Romanovsky [Fri, 2 Dec 2022 20:10:37 +0000 (22:10 +0200)]
net/mlx5e: Generalize creation of default IPsec miss group and rule

Create general function that sets miss group and rule to forward all
not-matched traffic to the next table.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Group IPsec miss handles into separate struct
Leon Romanovsky [Fri, 2 Dec 2022 20:10:36 +0000 (22:10 +0200)]
net/mlx5e: Group IPsec miss handles into separate struct

Move miss handles into dedicated struct, so we can reuse it in next
patch when creating IPsec policy flow table.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Make clear what IPsec rx_err does
Leon Romanovsky [Fri, 2 Dec 2022 20:10:35 +0000 (22:10 +0200)]
net/mlx5e: Make clear what IPsec rx_err does

Reuse existing struct what holds all information about modify
header pointer and rule. This helps to reduce ambiguity from the
name _err_ that doesn't describe the real purpose of that flow
table, rule and function - to copy status result from HW to
the stack.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Flatten the IPsec RX add rule path
Leon Romanovsky [Fri, 2 Dec 2022 20:10:34 +0000 (22:10 +0200)]
net/mlx5e: Flatten the IPsec RX add rule path

Rewrote the IPsec RX add rule path to be less convoluted
and don't rely on pre-initialized variables. The code now has clean
linear flow with clean separation between error and success paths.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Refactor FTE setup code to be more clear
Leon Romanovsky [Fri, 2 Dec 2022 20:10:33 +0000 (22:10 +0200)]
net/mlx5e: Refactor FTE setup code to be more clear

The policy offload logic needs to set flow steering rule that match
on saddr and daddr too, so factor out this code to separate functions,
together with code alignment to netdev coding pattern of relying on
family type.

As part of this change, let's separate more logic from setup_fte_common
to make sure that the function names describe that is done in the
function better than general *common* name.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Move IPsec flow table creation to separate function
Leon Romanovsky [Fri, 2 Dec 2022 20:10:32 +0000 (22:10 +0200)]
net/mlx5e: Move IPsec flow table creation to separate function

Even now, to support IPsec crypto, the RX and TX paths use same
logic to create flow tables. In the following patches, we will
add more tables to support IPsec packet offload. So reuse existing
code and rewrite it to support IPsec packet offload from the beginning.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Create hardware IPsec packet offload objects
Leon Romanovsky [Fri, 2 Dec 2022 20:10:31 +0000 (22:10 +0200)]
net/mlx5e: Create hardware IPsec packet offload objects

Create initial hardware IPsec packet offload object and connect it
to advanced steering operation (ASO) context and queue, so the data
path can communicate with the stack.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Create Advanced Steering Operation object for IPsec
Leon Romanovsky [Fri, 2 Dec 2022 20:10:30 +0000 (22:10 +0200)]
net/mlx5e: Create Advanced Steering Operation object for IPsec

Setup the ASO (Advanced Steering Operation) object that is needed
for IPsec to interact with SW stack about various fast changing
events: replay window, lifetime limits,  e.t.c

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Remove accesses to priv for low level IPsec FS code
Leon Romanovsky [Fri, 2 Dec 2022 20:10:29 +0000 (22:10 +0200)]
net/mlx5e: Remove accesses to priv for low level IPsec FS code

mlx5 priv structure is driver main structure that holds high level data.
That information is not needed for IPsec flow steering logic and the
pointer to mlx5e_priv was not supposed to be passed in the first place.

This change "cleans" the logic to rely on internal to IPsec structures
without touching global mlx5e_priv.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Use mlx5 print routines for low level IPsec code
Leon Romanovsky [Fri, 2 Dec 2022 20:10:28 +0000 (22:10 +0200)]
net/mlx5e: Use mlx5 print routines for low level IPsec code

Low level mlx5 code needs to use mlx5_core print routines and
not netdev ones, as the failures are relevant to the HW itself
and not to its netdev.

This change allows us to remove access to mlx5 priv structure, which
holds high level driver data that isn't needed for mlx5 IPsec code.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Create symmetric IPsec RX and TX flow steering structs
Leon Romanovsky [Fri, 2 Dec 2022 20:10:27 +0000 (22:10 +0200)]
net/mlx5e: Create symmetric IPsec RX and TX flow steering structs

Remove AF family obfuscation by creating symmetric structs for RX and
TX IPsec flow steering chains. This simplifies to us low level IPsec
FS creation logic without need to dig into multiple levels of structs.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Remove extra layers of defines
Leon Romanovsky [Fri, 2 Dec 2022 20:10:26 +0000 (22:10 +0200)]
net/mlx5e: Remove extra layers of defines

Instead of performing redefinition of XFRM core defines to same
values but with MLX5_* prefix, cache the input values as is by making
sure that the proper storage objects are used.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Store replay window in XFRM attributes
Leon Romanovsky [Fri, 2 Dec 2022 20:10:25 +0000 (22:10 +0200)]
net/mlx5e: Store replay window in XFRM attributes

As a preparation for future extension of IPsec hardware object to allow
configuration of packet offload mode, extend the XFRM validator to check
replay window values.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5e: Advertise IPsec packet offload support
Leon Romanovsky [Fri, 2 Dec 2022 20:10:24 +0000 (22:10 +0200)]
net/mlx5e: Advertise IPsec packet offload support

Add needed capabilities check to determine if device supports IPsec
packet offload mode.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5: Add HW definitions for IPsec packet offload
Leon Romanovsky [Fri, 2 Dec 2022 20:10:23 +0000 (22:10 +0200)]
net/mlx5: Add HW definitions for IPsec packet offload

Add all needed bits to support IPsec packet offload mode.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet/mlx5: Return ready to use ASO WQE
Leon Romanovsky [Fri, 2 Dec 2022 20:10:22 +0000 (22:10 +0200)]
net/mlx5: Return ready to use ASO WQE

There is no need in hiding returned ASO WQE type by providing void*,
use the real type instead. Do it together with zeroing that memory,
so ASO WQE will be ready to use immediately.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoMerge branch 'Extend XFRM core to allow packet offload configuration'
Steffen Klassert [Tue, 6 Dec 2022 12:41:23 +0000 (13:41 +0100)]
Merge branch 'Extend XFRM core to allow packet offload configuration'

Leon Romanovsky says:

============
The following series extends XFRM core code to handle a new type of IPsec
offload - packet offload.

In this mode, the HW is going to be responsible for the whole data path,
so both policy and state should be offloaded.

IPsec packet offload is an improved version of IPsec crypto mode,
In packet mode, HW is responsible to trim/add headers in addition to
decrypt/encrypt. In this mode, the packet arrives to the stack as already
decrypted and vice versa for TX (exits to HW as not-encrypted).

Devices that implement IPsec packet offload mode offload policies too.
In the RX path, it causes the situation that HW can't effectively
handle mixed SW and HW priorities unless users make sure that HW offloaded
policies have higher priorities.

It means that we don't need to perform any search of inexact policies
and/or priority checks if HW policy was discovered. In such situation,
the HW will catch the packets anyway and HW can still implement inexact
lookups.

In case specific policy is not found, we will continue with packet lookup
and check for existence of HW policies in inexact list.

HW policies are added to the head of SPD to ensure fast lookup, as XFRM
iterates over all policies in the loop.

This simple solution allows us to achieve same benefits of separate HW/SW
policies databases without over-engineering the code to iterate and manage
two databases at the same path.

To not over-engineer the code, HW policies are treated as SW ones and
don't take into account netdev to allow reuse of the same priorities for
policies databases without over-engineering the code to iterate and manage
two databases at the same path.

To not over-engineer the code, HW policies are treated as SW ones and
don't take into account netdev to allow reuse of the same priorities for
different devices.
 * No software fallback
 * Fragments are dropped, both in RX and TX
 * No sockets policies
 * Only IPsec transport mode is implemented

================================================================================
Rekeying:

In order to support rekeying, as XFRM core is skipped, the HW/driver should
do the following:
 * Count the handled packets
 * Raise event that limits are reached
 * Drop packets once hard limit is occurred.

The XFRM core calls to newly introduced xfrm_dev_state_update_curlft()
function in order to perform sync between device statistics and internal
structures. On HW limit event, driver calls to xfrm_state_check_expire()
to allow XFRM core take relevant decisions.

This separation between control logic (in XFRM) and data plane allows us
to packet reuse SW stack.

================================================================================
Configuration:

iproute2: https://lore.kernel.org/netdev/cover.1652179360.git.leonro@nvidia.com/

Packet offload mode:
  ip xfrm state offload packet dev <if-name> dir <in|out>
  ip xfrm policy .... offload packet dev <if-name>
Crypto offload mode:
  ip xfrm state offload crypto dev <if-name> dir <in|out>
or (backward compatibility)
  ip xfrm state offload dev <if-name> dir <in|out>

================================================================================
Performance results:

TCP multi-stream, using iperf3 instance per-CPU.
+----------------------+--------+--------+--------+--------+---------+---------+
|                      | 1 CPU  | 2 CPUs | 4 CPUs | 8 CPUs | 16 CPUs | 32 CPUs |
|                      +--------+--------+--------+--------+---------+---------+
|                      |                   BW (Gbps)                           |
+----------------------+--------+--------+-------+---------+---------+---------+
| Baseline             | 27.9   | 59     | 93.1  | 92.8    | 93.7    | 94.4    |
+----------------------+--------+--------+-------+---------+---------+---------+
| Software IPsec       | 6      | 11.9   | 23.3  | 45.9    | 83.8    | 91.8    |
+----------------------+--------+--------+-------+---------+---------+---------+
| IPsec crypto offload | 15     | 29.7   | 58.5  | 89.6    | 90.4    | 90.8    |
+----------------------+--------+--------+-------+---------+---------+---------+
| IPsec packet offload | 28     | 57     | 90.7  | 91      | 91.3    | 91.9    |
+----------------------+--------+--------+-------+---------+---------+---------+

IPsec packet offload mode behaves as baseline and reaches linerate with same amount
of CPUs.

Setups details (similar for both sides):
* NIC: ConnectX6-DX dual port, 100 Gbps each.
  Single port used in the tests.
* CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz

================================================================================
Series together with mlx5 part:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=xfrm-next

================================================================================
Changelog:

v10:
 * Added forgotten xdo_dev_state_del. Patch #4.
 * Moved changelog in cover letter to the end.
 * Added "if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {" line to newly
   added netronome IPsec support. Patch #2.
v9: https://lore.kernel.org/all/cover.1669547603.git.leonro@nvidia.com
 * Added acquire support
v8: https://lore.kernel.org/all/cover.1668753030.git.leonro@nvidia.com
 * Removed not-related blank line
 * Fixed typos in documentation
v7: https://lore.kernel.org/all/cover.1667997522.git.leonro@nvidia.com
As was discussed in IPsec workshop:
 * Renamed "full offload" to be "packet offload".
 * Added check that offloaded SA and policy have same device while sending packet
 * Added to SAD same optimization as was done for SPD to speed-up lookups.
v6: https://lore.kernel.org/all/cover.1666692948.git.leonro@nvidia.com
 * Fixed misplaced "!" in sixth patch.
v5: https://lore.kernel.org/all/cover.1666525321.git.leonro@nvidia.com
 * Rebased to latest ipsec-next.
 * Replaced HW priority patch with solution which mimics separated SPDs
   for SW and HW. See more description in this cover letter.
 * Dropped RFC tag, usecase, API and implementation are clear.
v4: https://lore.kernel.org/all/cover.1662295929.git.leonro@nvidia.com
 * Changed title from "PATCH" to "PATCH RFC" per-request.
 * Added two new patches: one to update hard/soft limits and another
   initial take on documentation.
 * Added more info about lifetime/rekeying flow to cover letter, see
   relevant section.
 * perf traces for crypto mode will come later.
v3: https://lore.kernel.org/all/cover.1661260787.git.leonro@nvidia.com
 * I didn't hear any suggestion what term to use instead of
   "packet offload", so left it as is. It is used in commit messages
   and documentation only and easy to rename.
 * Added performance data and background info to cover letter
 * Reused xfrm_output_resume() function to support multiple XFRM transformations
 * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
 * Documentation is in progress, but not part of this series yet.
v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
 * Rebased to latest 6.0-rc1
 * Add an extra check in TX datapath patch to validate packets before
   forwarding to HW.
 * Added policy cleanup logic in case of netdev down event
v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
 * Moved comment to be before if (...) in third patch.
v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
-----------------------------------------------------------------------
============

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoxfrm: document IPsec packet offload mode
Leon Romanovsky [Fri, 2 Dec 2022 18:41:34 +0000 (20:41 +0200)]
xfrm: document IPsec packet offload mode

Extend XFRM device offload API description with newly
added packet offload mode.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoxfrm: add support to HW update soft and hard limits
Leon Romanovsky [Fri, 2 Dec 2022 18:41:33 +0000 (20:41 +0200)]
xfrm: add support to HW update soft and hard limits

Both in RX and TX, the traffic that performs IPsec packet offload
transformation is accounted by HW. It is needed to properly handle
hard limits that require to drop the packet.

It means that XFRM core needs to update internal counters with the one
that accounted by the HW, so new callbacks are introduced in this patch.

In case of soft or hard limit is occurred, the driver should call to
xfrm_state_check_expire() that will perform key rekeying exactly as
done by XFRM core.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoxfrm: speed-up lookup of HW policies
Leon Romanovsky [Fri, 2 Dec 2022 18:41:32 +0000 (20:41 +0200)]
xfrm: speed-up lookup of HW policies

Devices that implement IPsec packet offload mode should offload SA and
policies too. In RX path, it causes to the situation that HW will always
have higher priority over any SW policies.

It means that we don't need to perform any search of inexact policies
and/or priority checks if HW policy was discovered. In such situation,
the HW will catch the packets anyway and HW can still implement inexact
lookups.

In case specific policy is not found, we will continue with packet lookup and
check for existence of HW policies in inexact list.

HW policies are added to the head of SPD to ensure fast lookup, as XFRM
iterates over all policies in the loop.

The same solution of adding HW SAs at the begging of the list is applied
to SA database too. However, we don't need to change lookups as they are
sorted by insertion order and not priority.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoxfrm: add RX datapath protection for IPsec packet offload mode
Leon Romanovsky [Fri, 2 Dec 2022 18:41:31 +0000 (20:41 +0200)]
xfrm: add RX datapath protection for IPsec packet offload mode

Traffic received by device with enabled IPsec packet offload should
be forwarded to the stack only after decryption, packet headers and
trailers removed.

Such packets are expected to be seen as normal (non-XFRM) ones, while
not-supported packets should be dropped by the HW.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoxfrm: add TX datapath support for IPsec packet offload mode
Leon Romanovsky [Fri, 2 Dec 2022 18:41:30 +0000 (20:41 +0200)]
xfrm: add TX datapath support for IPsec packet offload mode

In IPsec packet mode, the device is going to encrypt and encapsulate
packets that are associated with offloaded policy. After successful
policy lookup to indicate if packets should be offloaded or not,
the stack forwards packets to the device to do the magic.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoxfrm: add an interface to offload policy
Leon Romanovsky [Fri, 2 Dec 2022 18:41:29 +0000 (20:41 +0200)]
xfrm: add an interface to offload policy

Extend netlink interface to add and delete XFRM policy from the device.
This functionality is a first step to implement packet IPsec offload solution.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoxfrm: allow state packet offload mode
Leon Romanovsky [Fri, 2 Dec 2022 18:41:28 +0000 (20:41 +0200)]
xfrm: allow state packet offload mode

Allow users to configure xfrm states with packet offload mode.
The packet mode must be requested both for policy and state, and
such requires us to do not implement fallback.

We explicitly return an error if requested packet mode can't
be configured.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agoxfrm: add new packet offload flag
Leon Romanovsky [Fri, 2 Dec 2022 18:41:27 +0000 (20:41 +0200)]
xfrm: add new packet offload flag

In the next patches, the xfrm core code will be extended to support
new type of offload - packet offload. In that mode, both policy and state
should be specially configured in order to perform whole offloaded data
path.

Full offload takes care of encryption, decryption, encapsulation and
other operations with headers.

As this mode is new for XFRM policy flow, we can "start fresh" with flag
bits and release first and second bit for future use.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 years agonet: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill
Lorenzo Bianconi [Thu, 1 Dec 2022 15:26:53 +0000 (16:26 +0100)]
net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[    9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[    9.054665] Call trace:
[    9.057096]  dump_backtrace+0x0/0x154
[    9.060751]  show_stack+0x14/0x1c
[    9.064052]  dump_stack_lvl+0x64/0x7c
[    9.067702]  dump_stack+0x14/0x2c
[    9.071001]  ___might_sleep+0xec/0x120
[    9.074736]  __might_sleep+0x4c/0x9c
[    9.078296]  __alloc_pages+0x184/0x2e4
[    9.082030]  page_frag_alloc_align+0x98/0x1ac
[    9.086369]  mtk_wed_wo_queue_refill+0x134/0x234
[    9.090974]  mtk_wed_wo_init+0x174/0x2c0
[    9.094881]  mtk_wed_attach+0x7c8/0x7e0
[    9.098701]  mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[    9.103940]  mt7915_pci_probe+0xec/0x3bc [mt7915e]
[    9.108727]  pci_device_probe+0xac/0x13c
[    9.112638]  really_probe.part.0+0x98/0x2f4
[    9.116807]  __driver_probe_device+0x94/0x13c
[    9.121147]  driver_probe_device+0x40/0x114
[    9.125314]  __driver_attach+0x7c/0x180
[    9.129133]  bus_for_each_dev+0x5c/0x90
[    9.132953]  driver_attach+0x20/0x2c
[    9.136513]  bus_add_driver+0x104/0x1fc
[    9.140333]  driver_register+0x74/0x120
[    9.144153]  __pci_register_driver+0x40/0x50
[    9.148407]  mt7915_init+0x5c/0x1000 [mt7915e]
[    9.152848]  do_one_initcall+0x40/0x25c
[    9.156669]  do_init_module+0x44/0x230
[    9.160403]  load_module+0x1f30/0x2750
[    9.164135]  __do_sys_init_module+0x150/0x200
[    9.168475]  __arm64_sys_init_module+0x18/0x20
[    9.172901]  invoke_syscall.constprop.0+0x4c/0xe0
[    9.177589]  do_el0_svc+0x48/0xe0
[    9.180889]  el0_svc+0x14/0x50
[    9.183929]  el0t_64_sync_handler+0x9c/0x120
[    9.188183]  el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: use 2-arg optimal variant of kfree_rcu()
Eric Dumazet [Fri, 2 Dec 2022 05:28:47 +0000 (05:28 +0000)]
tcp: use 2-arg optimal variant of kfree_rcu()

kfree_rcu(1-arg) should be avoided as much as possible,
since this is only possible from sleepable contexts,
and incurr extra rcu barriers.

I wish the 1-arg variant of kfree_rcu() would
get a distinct name, like kfree_rcu_slow()
to avoid it being abused.

Fixes: 459837b522f7 ("net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20221202052847.2623997-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'wireless-next-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Sat, 3 Dec 2022 04:33:29 +0000 (20:33 -0800)]
Merge tag 'wireless-next-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.2

Third set of patches for v6.2. mt76 has a new driver for mt7996 Wi-Fi 7
devices and iwlwifi also got initial Wi-Fi 7 support. Otherwise
smaller features and fixes.

Major changes:

ath10k
 - store WLAN firmware version in SMEM image table

mt76
 - mt7996: new driver for MediaTek Wi-Fi 7 (802.11be) devices
 - mt7986, mt7915: enable Wireless Ethernet Dispatch (WED) offload support
 - mt7915: add ack signal support
 - mt7915: enable coredump support
 - mt7921: remain_on_channel support
 - mt7921: channel context support

iwlwifi
 - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
 - 320 MHz channels support

* tag 'wireless-next-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (144 commits)
  wifi: ath10k: fix QCOM_SMEM dependency
  wifi: mt76: mt7921e: add pci .shutdown() support
  wifi: mt76: mt7915: mmio: fix naming convention
  wifi: mt76: mt7996: add support to configure spatial reuse parameter set
  wifi: mt76: mt7996: enable ack signal support
  wifi: mt76: mt7996: enable use_cts_prot support
  wifi: mt76: mt7915: rely on band_idx of mt76_phy
  wifi: mt76: mt7915: enable per bandwidth power limit support
  wifi: mt76: mt7915: introduce mt7915_get_power_bound()
  mt76: mt7915: Fix PCI device refcount leak in mt7915_pci_init_hif2()
  wifi: mt76: do not send firmware FW_FEATURE_NON_DL region
  wifi: mt76: mt7921: Add missing __packed annotation of struct mt7921_clc
  wifi: mt76: fix coverity overrun-call in mt76_get_txpower()
  wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices
  wifi: mt76: mt76x0: remove dead code in mt76x0_phy_get_target_power
  wifi: mt76: mt7915: fix band_idx usage
  wifi: mt76: mt7915: enable .sta_set_txpwr support
  wifi: mt76: mt7915: add basedband Txpower info into debugfs
  wifi: mt76: mt7915: add support to configure spatial reuse parameter set
  wifi: mt76: mt7915: add missing MODULE_PARM_DESC
  ...
====================

Link: https://lore.kernel.org/r/20221202214254.D0D3DC433C1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agowifi: ath10k: fix QCOM_SMEM dependency
Kalle Valo [Fri, 2 Dec 2022 10:30:27 +0000 (12:30 +0200)]
wifi: ath10k: fix QCOM_SMEM dependency

Nathan noticed that when HWSPINLOCK is disabled there's a Kconfig warning:

  WARNING: unmet direct dependencies detected for QCOM_SMEM
    Depends on [n]: (ARCH_QCOM [=y] || COMPILE_TEST [=n]) && HWSPINLOCK [=n]
    Selected by [m]:
    - ATH10K_SNOC [=m] && NETDEVICES [=y] && WLAN [=y] && WLAN_VENDOR_ATH [=y] && ATH10K [=m] && (ARCH_QCOM [=y] || COMPILE_TEST [=n])

The problem here is that QCOM_SMEM depends on HWSPINLOCK so we cannot select
QCOM_SMEM and instead we neeed to use 'depends on'.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/all/Y4YsyaIW+CPdHWv3@dev-arch.thelio-3990X/
Fixes: 4d79f6f34bbb ("wifi: ath10k: Store WLAN firmware version in SMEM image table")
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221202103027.25974-1-kvalo@kernel.org
2 years agotsnep: Rework RX buffer allocation
Gerhard Engleder [Wed, 30 Nov 2022 19:37:08 +0000 (20:37 +0100)]
tsnep: Rework RX buffer allocation

Refill RX queue in batches of descriptors to improve performance. Refill
is allowed to fail as long as a minimum number of descriptors is active.
Thus, a limited number of failed RX buffer allocations is now allowed
for normal operation. Previously every failed allocation resulted in a
dropped frame.

If the minimum number of active descriptors is reached, then RX buffers
are still reused and frames are dropped. This ensures that the RX queue
never runs empty and always continues to operate.

Prework for future XDP support.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotsnep: Throttle interrupts
Gerhard Engleder [Wed, 30 Nov 2022 19:37:07 +0000 (20:37 +0100)]
tsnep: Throttle interrupts

Without interrupt throttling, iperf server mode generates a CPU load of
100% (A53 1.2GHz). Also the throughput suffers with less than 900Mbit/s
on a 1Gbit/s link. The reason is a high interrupt load with interrupts
every ~20us.

Reduce interrupt load by throttling of interrupts. Interrupt delay
default is 64us. For iperf server mode the CPU load is significantly
reduced to ~20% and the throughput reaches the maximum of 941MBit/s.
Interrupts are generated every ~140us.

RX and TX coalesce can be configured with ethtool. RX coalesce has
priority over TX coalesce if the same interrupt is used.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotsnep: Add ethtool::get_channels support
Gerhard Engleder [Wed, 30 Nov 2022 19:37:06 +0000 (20:37 +0100)]
tsnep: Add ethtool::get_channels support

Allow user space to read number of TX and RX queue. This is useful for
device dependent qdisc configurations like TAPRIO with hardware offload.
Also ethtool::get_per_queue_coalesce / set_per_queue_coalesce requires
that interface.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotsnep: Consistent naming of struct net_device
Gerhard Engleder [Wed, 30 Nov 2022 19:37:05 +0000 (20:37 +0100)]
tsnep: Consistent naming of struct net_device

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoDocumentation: bonding: correct xmit hash steps
Jonathan Toppins [Wed, 30 Nov 2022 20:12:07 +0000 (15:12 -0500)]
Documentation: bonding: correct xmit hash steps

Correct xmit hash steps for layer3+4 as introduced by commit
49aefd131739 ("bonding: do not discard lowest hash bit for non layer3+4
hashing").

Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoDocumentation: bonding: update miimon default to 100
Jonathan Toppins [Wed, 30 Nov 2022 20:12:06 +0000 (15:12 -0500)]
Documentation: bonding: update miimon default to 100

With commit c1f897ce186a ("bonding: set default miimon value for non-arp
modes if not set") the miimon default was changed from zero to 100 if
arp_interval is also zero. Document this fact in bonding.rst.

Fixes: c1f897ce186a ("bonding: set default miimon value for non-arp modes if not set")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: thunderbolt: Use bitwise types in the struct thunderbolt_ip_frame_header
Andy Shevchenko [Wed, 30 Nov 2022 12:36:13 +0000 (14:36 +0200)]
net: thunderbolt: Use bitwise types in the struct thunderbolt_ip_frame_header

The main usage of the struct thunderbolt_ip_frame_header is to handle
the packets on the media layer. The header is bound to the protocol
in which the byte ordering is crucial. However the data type definition
doesn't use that and sparse is unhappy, for example (17 altogether):

  .../thunderbolt.c:718:23: warning: cast to restricted __le32

  .../thunderbolt.c:966:42: warning: incorrect type in assignment (different base types)
  .../thunderbolt.c:966:42:    expected unsigned int [usertype] frame_count
  .../thunderbolt.c:966:42:    got restricted __le32 [usertype]

Switch to the bitwise types in the struct thunderbolt_ip_frame_header to
reduce this, but not completely solving (9 left), because the same data
type is used for Rx header handled locally (in CPU byte order).

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: thunderbolt: Switch from __maybe_unused to pm_sleep_ptr() etc
Andy Shevchenko [Wed, 30 Nov 2022 12:36:12 +0000 (14:36 +0200)]
net: thunderbolt: Switch from __maybe_unused to pm_sleep_ptr() etc

Letting the compiler remove these functions when the kernel is built
without CONFIG_PM_SLEEP support is simpler and less heavier for builds
than the use of __maybe_unused attributes.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: devlink: convert port_list into xarray
Jiri Pirko [Wed, 30 Nov 2022 08:52:50 +0000 (09:52 +0100)]
net: devlink: convert port_list into xarray

Some devlink instances may contain thousands of ports. Storing them in
linked list and looking them up is not scalable. Convert the linked list
into xarray.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'hsr'
Jakub Kicinski [Fri, 2 Dec 2022 04:26:24 +0000 (20:26 -0800)]
Merge branch 'hsr'

Sebastian Andrzej Siewior says:

====================
I started playing with HSR and run into a problem. Tested latest
upstream -rc and noticed more problems. Now it appears to work.
For testing I have a small three node setup with iperf and ping. While
iperf doesn't complain ping reports missing packets and duplicates.
====================

Link: https://lore.kernel.org/r/20221129164815.128922-1-bigeasy@linutronix.de/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: Add a basic HSR test.
Sebastian Andrzej Siewior [Tue, 29 Nov 2022 16:48:15 +0000 (17:48 +0100)]
selftests: Add a basic HSR test.

This test adds a basic HSRv0 network with 3 nodes. In its current shape
it sends and forwards packets, announcements and so merges nodes based
on MAC A/B information.
It is able to detect duplicate packets and packetloss should any occur.

Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agohsr: Use a single struct for self_node.
Sebastian Andrzej Siewior [Tue, 29 Nov 2022 16:48:14 +0000 (17:48 +0100)]
hsr: Use a single struct for self_node.

self_node_db is a list_head with one entry of struct hsr_node. The
purpose is to hold the two MAC addresses of the node itself.
It is convenient to recycle the structure. However having a list_head
and fetching always the first entry is not really optimal.

Created a new data strucure contaning the two MAC addresses named
hsr_self_node. Access that structure like an RCU protected pointer so
it can be replaced on the fly without blocking the reader.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agohsr: Synchronize sequence number updates.
Sebastian Andrzej Siewior [Tue, 29 Nov 2022 16:48:13 +0000 (17:48 +0100)]
hsr: Synchronize sequence number updates.

hsr_register_frame_out() compares new sequence_nr vs the old one
recorded in hsr_node::seq_out and if the new sequence_nr is higher then
it will be written to hsr_node::seq_out as the new value.

This operation isn't locked so it is possible that two frames with the
same sequence number arrive (via the two slave devices) and are fed to
hsr_register_frame_out() at the same time. Both will pass the check and
update the sequence counter later to the same value. As a result the
content of the same packet is fed into the stack twice.

This was noticed by running ping and observing DUP being reported from
time to time.

Instead of using the hsr_priv::seqnr_lock for the whole receive path (as
it is for sending in the master node) add an additional lock that is only
used for sequence number checks and updates.

Add a per-node lock that is used during sequence number reads and
updates.

Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agohsr: Synchronize sending frames to have always incremented outgoing seq nr.
Sebastian Andrzej Siewior [Tue, 29 Nov 2022 16:48:12 +0000 (17:48 +0100)]
hsr: Synchronize sending frames to have always incremented outgoing seq nr.

Sending frames via the hsr (master) device requires a sequence number
which is tracked in hsr_priv::sequence_nr and protected by
hsr_priv::seqnr_lock. Each time a new frame is sent, it will obtain a
new id and then send it via the slave devices.
Each time a packet is sent (via hsr_forward_do()) the sequence number is
checked via hsr_register_frame_out() to ensure that a frame is not
handled twice. This make sense for the receiving side to ensure that the
frame is not injected into the stack twice after it has been received
from both slave ports.

There is no locking to cover the sending path which means the following
scenario is possible:

  CPU0 CPU1
  hsr_dev_xmit(skb1) hsr_dev_xmit(skb2)
   fill_frame_info()             fill_frame_info()
    hsr_fill_frame_info()         hsr_fill_frame_info()
     handle_std_frame()            handle_std_frame()
      skb1's sequence_nr = 1
                                    skb2's sequence_nr = 2
   hsr_forward_do()              hsr_forward_do()

                                   hsr_register_frame_out(, 2)  // okay, send)

    hsr_register_frame_out(, 1) // stop, lower seq duplicate

Both skbs (or their struct hsr_frame_info) received an unique id.
However since skb2 was sent before skb1, the higher sequence number was
recorded in hsr_register_frame_out() and the late arriving skb1 was
dropped and never sent.

This scenario has been observed in a three node HSR setup, with node1 +
node2 having ping and iperf running in parallel. From time to time ping
reported a missing packet. Based on tracing that missing ping packet did
not leave the system.

It might be possible (didn't check) to drop the sequence number check on
the sending side. But if the higher sequence number leaves on wire
before the lower does and the destination receives them in that order
and it will drop the packet with the lower sequence number and never
inject into the stack.
Therefore it seems the only way is to lock the whole path from obtaining
the sequence number and sending via dev_queue_xmit() and assuming the
packets leave on wire in the same order (and don't get reordered by the
NIC).

Cover the whole path for the master interface from obtaining the ID
until after it has been forwarded via hsr_forward_skb() to ensure the
skbs are sent to the NIC in the order of the assigned sequence numbers.

Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agohsr: Disable netpoll.
Sebastian Andrzej Siewior [Tue, 29 Nov 2022 16:48:11 +0000 (17:48 +0100)]
hsr: Disable netpoll.

The hsr device is a software device. Its
net_device_ops::ndo_start_xmit() routine will process the packet and
then pass the resulting skb to dev_queue_xmit().
During processing, hsr acquires a lock with spin_lock_bh()
(hsr_add_node()) which needs to be promoted to the _irq() suffix in
order to avoid a potential deadlock.
Then there are the warnings in dev_queue_xmit() (due to
local_bh_disable() with disabled interrupts) left.

Instead trying to address those (there is qdisc and…) for netpoll sake,
just disable netpoll on hsr.

Disable netpoll on hsr and replace the _irqsave() locking with _bh().

Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agohsr: Avoid double remove of a node.
Sebastian Andrzej Siewior [Tue, 29 Nov 2022 16:48:10 +0000 (17:48 +0100)]
hsr: Avoid double remove of a node.

Due to the hashed-MAC optimisation one problem become visible:
hsr_handle_sup_frame() walks over the list of available nodes and merges
two node entries into one if based on the information in the supervision
both MAC addresses belong to one node. The list-walk happens on a RCU
protected list and delete operation happens under a lock.

If the supervision arrives on both slave interfaces at the same time
then this delete operation can occur simultaneously on two CPUs. The
result is the first-CPU deletes the from the list and the second CPUs
BUGs while attempting to dereference a poisoned list-entry. This happens
more likely with the optimisation because a new node for the mac_B entry
is created once a packet has been received and removed (merged) once the
supervision frame has been received.

Avoid removing/ cleaning up a hsr_node twice by adding a `removed' field
which is set to true after the removal and checked before the removal.

Fixes: f266a683a4804 ("net/hsr: Better frame dispatch")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agohsr: Add a rcu-read lock to hsr_forward_skb().
Sebastian Andrzej Siewior [Tue, 29 Nov 2022 16:48:09 +0000 (17:48 +0100)]
hsr: Add a rcu-read lock to hsr_forward_skb().

hsr_forward_skb() a skb and keeps information in an on-stack
hsr_frame_info. hsr_get_node() assigns hsr_frame_info::node_src which is
from a RCU list. This pointer is used later in hsr_forward_do().
I don't see a reason why this pointer can't vanish midway since there is
no guarantee that hsr_forward_skb() is invoked from an RCU read section.

Use rcu_read_lock() to protect hsr_frame_info::node_src from its
assignment until it is no longer used.

Fixes: f266a683a4804 ("net/hsr: Better frame dispatch")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoRevert "net: hsr: use hlist_head instead of list_head for mac addresses"
Sebastian Andrzej Siewior [Tue, 29 Nov 2022 16:48:08 +0000 (17:48 +0100)]
Revert "net: hsr: use hlist_head instead of list_head for mac addresses"

The hlist optimisation (which not only uses hlist_head instead of
list_head but also splits hsr_priv::node_db into an array of 256 slots)
does not consider the "node merge":
Upon starting the hsr network (with three nodes) a packet that is
sent from node1 to node3 will also be sent from node1 to node2 and then
forwarded to node3.
As a result node3 will receive 2 packets because it is not able
to filter out the duplicate. Each packet received will create a new
struct hsr_node with macaddress_A only set the MAC address it received
from (the two MAC addesses from node1).
At some point (early in the process) two supervision frames will be
received from node1. They will be processed by hsr_handle_sup_frame()
and one frame will leave early ("Node has already been merged") and does
nothing. The other frame will be merged as portB and have its MAC
address written to macaddress_B and the hsr_node (that was created for
it as macaddress_A) will be removed.
From now on HSR is able to identify a duplicate because both packets
sent from one node will result in the same struct hsr_node because
hsr_get_node() will find the MAC address either on macaddress_A or
macaddress_B.

Things get tricky with the optimisation: If sender's MAC address is
saved as macaddress_A then the lookup will work as usual. If the MAC
address has been merged into macaddress_B of another hsr_node then the
lookup won't work because it is likely that the data structure is in
another bucket. This results in creating a new struct hsr_node and not
recognising a possible duplicate.

A way around it would be to add another hsr_node::mac_list_B and attach
it to the other bucket to ensure that this hsr_node will be looked up
either via macaddress_A _or_ macaddress_B.

I however prefer to revert it because it sounds like an academic problem
rather than real life workload plus it adds complexity. I'm not an HSR
expert with what is usual size of a network but I would guess 40 to 60
nodes. With 10.000 nodes and assuming 60us for pass-through (from node
to node) then it would take almost 600ms for a packet to almost wrap
around which sounds a lot.

Revert the hash MAC addresses optimisation.

Fixes: 4acc45db71158 ("net: hsr: use hlist_head instead of list_head for mac addresses")
Cc: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosctp: delete free member from struct sctp_sched_ops
Xin Long [Wed, 30 Nov 2022 23:04:31 +0000 (18:04 -0500)]
sctp: delete free member from struct sctp_sched_ops

After commit 9ed7bfc79542 ("sctp: fix memory leak in
sctp_stream_outq_migrate()"), sctp_sched_set_sched() is the only
place calling sched->free(), and it can actually be replaced by
sched->free_sid() on each stream, and yet there's already a loop
to traverse all streams in sctp_sched_set_sched().

This patch adds a function sctp_sched_free_sched() where it calls
sched->free_sid() for each stream to replace sched->free() calls
in sctp_sched_set_sched() and then deletes the unused free member
from struct sctp_sched_ops.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/e10aac150aca2686cb0bd0570299ec716da5a5c0.1669849471.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'mptcp-pm-listener-events-selftests-cleanup'
Jakub Kicinski [Fri, 2 Dec 2022 04:06:10 +0000 (20:06 -0800)]
Merge branch 'mptcp-pm-listener-events-selftests-cleanup'

Matthieu Baerts says:

====================
mptcp: PM listener events + selftests cleanup

Thanks to the patch 6/11, the MPTCP path manager now sends Netlink events
when MPTCP listening sockets are created and closed. The reason why it is
needed is explained in the linked ticket [1]:

  MPTCP for Linux, when not using the in-kernel PM, depends on the
  userspace PM to create extra listening sockets before announcing
  addresses and ports. Let's call these "PM listeners".

  With the existing MPTCP netlink events, a userspace PM can create
  PM listeners at startup time, or in response to an incoming connection.
  Creating sockets in response to connections is not optimal: ADD_ADDRs
  can't be sent until the sockets are created and listen()ed, and if all
  connections are closed then it may not be clear to the userspace
  PM daemon that PM listener sockets should be cleaned up.

  Hence this feature request: to add MPTCP netlink events for listening
  socket close & create, so PM listening sockets can be managed based
  on application activity.

  [1] https://github.com/multipath-tcp/mptcp_net-next/issues/313

Selftests for these new Netlink events have been added in patches 9,11/11.

The remaining patches introduce different cleanups and small improvements
in MPTCP selftests to ease the maintenance and the addition of new tests.
====================

Link: https://lore.kernel.org/r/20221130140637.409926-1-matthieu.baerts@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: listener test for in-kernel PM
Geliang Tang [Wed, 30 Nov 2022 14:06:33 +0000 (15:06 +0100)]
selftests: mptcp: listener test for in-kernel PM

This patch adds test coverage for listening sockets created by the
in-kernel path manager in mptcp_join.sh.

It adds the listener event checking in the existing "remove single
address with port" test. The output looks like this:

 003 remove single address with port syn[ ok ] - synack[ ok ] - ack[ ok ]
                                     add[ ok ] - echo  [ ok ] - pt [ ok ]
                                     syn[ ok ] - synack[ ok ] - ack[ ok ]
                                     syn[ ok ] - ack   [ ok ]
                                     rm [ ok ] - rmsf  [ ok ]   invert
                                     CREATE_LISTENER 10.0.2.1:10100[ ok ]
                                     CLOSE_LISTENER 10.0.2.1:10100 [ ok ]

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: make evts global in mptcp_join
Geliang Tang [Wed, 30 Nov 2022 14:06:32 +0000 (15:06 +0100)]
selftests: mptcp: make evts global in mptcp_join

This patch moves evts_ns1 and evts_ns2 out of do_transfer() as two global
variables in mptcp_join.sh. Init them in init() and remove them in
cleanup().

Add a new helper reset_with_events() to save the outputs of 'pm_nl_ctl
events' command in them. And a new helper kill_events_pids() to kill
pids of 'pm_nl_ctl events' command. Use these helpers in userspace pm
tests.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: listener test for userspace PM
Geliang Tang [Wed, 30 Nov 2022 14:06:31 +0000 (15:06 +0100)]
selftests: mptcp: listener test for userspace PM

This patch adds test coverage for listening sockets created by userspace
processes.

It adds a new test named test_listener() and a new verifying helper
verify_listener_events(). The new output looks like this:

 CREATE_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1)              [OK]
 DESTROY_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1)             [OK]
 MP_PRIO TX                                                   [OK]
 MP_PRIO RX                                                   [OK]
 CREATE_LISTENER 10.0.2.2:37106       [OK]
 CLOSE_LISTENER 10.0.2.2:37106       [OK]

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: make evts global in userspace_pm
Geliang Tang [Wed, 30 Nov 2022 14:06:30 +0000 (15:06 +0100)]
selftests: mptcp: make evts global in userspace_pm

This patch makes server_evts and client_evts global in userspace_pm.sh,
then these two variables could be used in test_announce(), test_remove()
and test_subflows(). The local variable 'evts' in these three functions
then could be dropped.

Also move local variable 'file' as a global one.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: enhance userspace pm tests
Geliang Tang [Wed, 30 Nov 2022 14:06:29 +0000 (15:06 +0100)]
selftests: mptcp: enhance userspace pm tests

Some userspace pm tests failed since pm listener events have been added.
Now MPTCP_EVENT_LISTENER_CREATED event becomes the first item in the
events list like this:

 type:15,family:2,sport:10006,saddr4:0.0.0.0
 type:1,token:3701282876,server_side:1,family:2,saddr4:10.0.1.1,...

And no token value in this MPTCP_EVENT_LISTENER_CREATED event.

This patch fixes this by specifying the type 1 item to search for token
values.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: add pm listener events
Geliang Tang [Wed, 30 Nov 2022 14:06:28 +0000 (15:06 +0100)]
mptcp: add pm listener events

This patch adds two new MPTCP netlink event types for PM listening
socket create and close, named MPTCP_EVENT_LISTENER_CREATED and
MPTCP_EVENT_LISTENER_CLOSED.

Add a new function mptcp_event_pm_listener() to push the new events
with family, port and addr to userspace.

Invoke mptcp_event_pm_listener() with MPTCP_EVENT_LISTENER_CREATED in
mptcp_listen() and mptcp_pm_nl_create_listen_socket(), invoke it with
MPTCP_EVENT_LISTENER_CLOSED in __mptcp_close_ssk().

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: declare var as local
Matthieu Baerts [Wed, 30 Nov 2022 14:06:27 +0000 (15:06 +0100)]
selftests: mptcp: declare var as local

Just to avoid classical Bash pitfall where variables are accidentally
overridden by other functions because the proper scope has not been
defined.

That's also what is done in other MPTCP selftests scripts where all non
local variables are defined at the beginning of the script and the
others are defined with the "local" keyword.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: clearly declare global ns vars
Matthieu Baerts [Wed, 30 Nov 2022 14:06:26 +0000 (15:06 +0100)]
selftests: mptcp: clearly declare global ns vars

It is clearer to declare these global variables at the beginning of the
file as it is done in other MPTCP selftests rather than in functions in
the middle of the script.

So for uniformity reason, we can do the same here in mptcp_sockopt.sh.

Suggested-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: uniform 'rndh' variable
Matthieu Baerts [Wed, 30 Nov 2022 14:06:25 +0000 (15:06 +0100)]
selftests: mptcp: uniform 'rndh' variable

The definition of 'rndh' was probably copied from one script to another
but some times, 'sec' was not defined, not used and/or not spelled
properly.

Here all the 'rndh' are now defined the same way.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: removed defined but unused vars
Matthieu Baerts [Wed, 30 Nov 2022 14:06:24 +0000 (15:06 +0100)]
selftests: mptcp: removed defined but unused vars

Some variables were set but never used.

This was not causing any issues except adding some confusion and having
shellcheck complaining about them.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: run mptcp_inq from a clean netns
Matthieu Baerts [Wed, 30 Nov 2022 14:06:23 +0000 (15:06 +0100)]
selftests: mptcp: run mptcp_inq from a clean netns

A new "sandbox" net namespace is available where no other netfilter
rules have been added.

Use this new netns instead of re-using "ns1" and clean it.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agobnxt: report FEC block stats via standard interface
Jakub Kicinski [Wed, 30 Nov 2022 01:31:08 +0000 (17:31 -0800)]
bnxt: report FEC block stats via standard interface

I must have missed that these stats are only exposed
via the unstructured ethtool -S when they got merged.
Plumb in the structured form.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20221130013108.90062-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'remove-label-cpu-from-dsa-dt-binding'
Jakub Kicinski [Thu, 1 Dec 2022 23:57:04 +0000 (15:57 -0800)]
Merge branch 'remove-label-cpu-from-dsa-dt-binding'

Arınç ÜNAL says:

====================
remove label = "cpu" from DSA dt-binding

With this patch series, we're completely getting rid of 'label = "cpu";'
which is not used by the DSA dt-binding at all.

Information for taking the patches for maintainers:
Patch 1: netdev maintainers (based off netdev/net-next.git main)
Patch 2-3: SoC maintainers (based off soc/soc.git soc/dt)
Patch 4: MIPS maintainers (based off mips/linux.git mips-next)
Patch 5: PowerPC maintainers (based off powerpc/linux.git next-test)

I've been meaning to submit this for a few months. Find the relevant
conversation here:
https://lore.kernel.org/netdev/20220913155408.GA3802998-robh@kernel.org/

Here's how I did it, for the interested (or suggestions):

Find the platforms which have got 'label = "cpu";' defined.
grep -rnw . -e 'label = "cpu";'

Remove the line where 'label = "cpu";' is included.
sed -i /'label = "cpu";'/,+d arch/arm/boot/dts/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/freescale/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/marvell/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/mediatek/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/rockchip/*
sed -i /'label = "cpu";'/,+d arch/mips/boot/dts/qca/*
sed -i /'label = "cpu";'/,+d arch/mips/boot/dts/ralink/*
sed -i /'label = "cpu";'/,+d arch/powerpc/boot/dts/turris1x.dts
sed -i /'label = "cpu";'/,+d Documentation/devicetree/bindings/net/qca,ar71xx.yaml

Restore the symlink files which typechange after running sed.
====================

Link: https://lore.kernel.org/r/20221130141040.32447-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: net: qca,ar71xx: remove label = "cpu" from examples
Arınç ÜNAL [Wed, 30 Nov 2022 14:10:36 +0000 (17:10 +0300)]
dt-bindings: net: qca,ar71xx: remove label = "cpu" from examples

This is not used by the DSA dt-binding, so remove it from the examples.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-tcp-dynamically-disable-tcp-md5-static-key'
Jakub Kicinski [Thu, 1 Dec 2022 23:53:08 +0000 (15:53 -0800)]
Merge branch 'net-tcp-dynamically-disable-tcp-md5-static-key'

Dmitry Safonov says:

====================
net/tcp: Dynamically disable TCP-MD5 static key

The static key introduced by commit 6015c71e656b ("tcp: md5: add
tcp_md5_needed jump label") is a fast-path optimization aimed at
avoiding a cache line miss.
Once an MD5 key is introduced in the system the static key is enabled
and never disabled. Address this by disabling the static key when
the last tcp_md5sig_info in system is destroyed.

Previously it was submitted as a part of TCP-AO patches set [1].
Now in attempt to split 36 patches submission, I send this independently.

Version 5:
https://lore.kernel.org/all/20221122185534.308643-1-dima@arista.com/T/#u
Version 4:
https://lore.kernel.org/all/20221115211905.1685426-1-dima@arista.com/T/#u
Version 3:
https://lore.kernel.org/all/20221111212320.1386566-1-dima@arista.com/T/#u
Version 2:
https://lore.kernel.org/all/20221103212524.865762-1-dima@arista.com/T/#u
Version 1:
https://lore.kernel.org/all/20221102211350.625011-1-dima@arista.com/T/#u

[1]: https://lore.kernel.org/all/20221027204347.529913-1-dima@arista.com/T/#u
====================

Link: https://lore.kernel.org/r/20221123173859.473629-1-dima@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/tcp: Separate initialization of twsk
Dmitry Safonov [Wed, 23 Nov 2022 17:38:59 +0000 (17:38 +0000)]
net/tcp: Separate initialization of twsk

Convert BUG_ON() to WARN_ON_ONCE() and warn as well for unlikely
static key int overflow error-path.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/tcp: Do cleanup on tcp_md5_key_copy() failure
Dmitry Safonov [Wed, 23 Nov 2022 17:38:58 +0000 (17:38 +0000)]
net/tcp: Do cleanup on tcp_md5_key_copy() failure

If the kernel was short on (atomic) memory and failed to allocate it -
don't proceed to creation of request socket. Otherwise the socket would
be unsigned and userspace likely doesn't expect that the TCP is not
MD5-signed anymore.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction
Dmitry Safonov [Wed, 23 Nov 2022 17:38:57 +0000 (17:38 +0000)]
net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction

To do that, separate two scenarios:
- where it's the first MD5 key on the system, which means that enabling
  of the static key may need to sleep;
- copying of an existing key from a listening socket to the request
  socket upon receiving a signed TCP segment, where static key was
  already enabled (when the key was added to the listening socket).

Now the life-time of the static branch for TCP-MD5 is until:
- last tcp_md5sig_info is destroyed
- last socket in time-wait state with MD5 key is closed.

Which means that after all sockets with TCP-MD5 keys are gone, the
system gets back the performance of disabled md5-key static branch.

While at here, provide static_key_fast_inc() helper that does ref
counter increment in atomic fashion (without grabbing cpus_read_lock()
on CONFIG_JUMP_LABEL=y). This is needed to add a new user for
a static_key when the caller controls the lifetime of another user.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/tcp: Separate tcp_md5sig_info allocation into tcp_md5sig_info_add()
Dmitry Safonov [Wed, 23 Nov 2022 17:38:56 +0000 (17:38 +0000)]
net/tcp: Separate tcp_md5sig_info allocation into tcp_md5sig_info_add()

Add a helper to allocate tcp_md5sig_info, that will help later to
do/allocate things when info allocated, once per socket.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agojump_label: Prevent key->enabled int overflow
Dmitry Safonov [Wed, 23 Nov 2022 17:38:55 +0000 (17:38 +0000)]
jump_label: Prevent key->enabled int overflow

1. With CONFIG_JUMP_LABEL=n static_key_slow_inc() doesn't have any
   protection against key->enabled refcounter overflow.
2. With CONFIG_JUMP_LABEL=y static_key_slow_inc_cpuslocked()
   still may turn the refcounter negative as (v + 1) may overflow.

key->enabled is indeed a ref-counter as it's documented in multiple
places: top comment in jump_label.h, Documentation/staging/static-keys.rst,
etc.

As -1 is reserved for static key that's in process of being enabled,
functions would break with negative key->enabled refcount:
- for CONFIG_JUMP_LABEL=n negative return of static_key_count()
  breaks static_key_false(), static_key_true()
- the ref counter may become 0 from negative side by too many
  static_key_slow_inc() calls and lead to use-after-free issues.

These flaws result in that some users have to introduce an additional
mutex and prevent the reference counter from overflowing themselves,
see bpf_enable_runtime_stats() checking the counter against INT_MAX / 2.

Prevent the reference counter overflow by checking if (v + 1) > 0.
Change functions API to return whether the increment was successful.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'locking/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Jakub Kicinski [Thu, 1 Dec 2022 23:36:27 +0000 (15:36 -0800)]
Merge branch 'locking/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull in locking/core from tip (just a single patch) to avoid a conflict
with a jump_label change needed by a TCP cleanup.

Link: https://lore.kernel.org/all/Y4B17nBArWS1Iywo@hirez.programming.kicks-ass.net/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'iwlwifi-next-for-kalle-2022-11-28' of http://git.kernel.org/pub/scm/linux...
Kalle Valo [Thu, 1 Dec 2022 18:03:07 +0000 (20:03 +0200)]
Merge tag 'iwlwifi-next-for-kalle-2022-11-28' of http://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next

This is the second pull request intended for v6.2

It contains two patch-sets sent before with the following content:
* iwlwifi EHT adjustments
* double-free fix in tx path
* iwlmei PLDR flow fixes
* iwlmei smatch fixes
* a logging data improvement

2 years agoMerge tag 'mt76-for-kvalo-2022-12-01' of https://github.com/nbd168/wireless
Kalle Valo [Thu, 1 Dec 2022 17:58:20 +0000 (19:58 +0200)]
Merge tag 'mt76-for-kvalo-2022-12-01' of https://github.com/nbd168/wireless

mt76 patches for 6.2

- fixes
- WED support for mt7986 + mt7915 for flow offloading
- new driver for the mt7996 wifi-7 chipset

2 years agowifi: mt76: mt7921e: add pci .shutdown() support
Leon Yen [Thu, 1 Dec 2022 10:38:42 +0000 (18:38 +0800)]
wifi: mt76: mt7921e: add pci .shutdown() support

Some combinations of hosts cannnot detect mt7921e after reboot. The
interoperability issue is caused by the status mismatch between host
and chip fw. In such cases, the driver should stop chip activities
and reset chip to default state before reboot.

Suggested-by: angelogioacchino.delregno@collabora.com
Co-developed-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: Leon Yen <Leon.Yen@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: mmio: fix naming convention
Lorenzo Bianconi [Thu, 1 Dec 2022 08:51:55 +0000 (09:51 +0100)]
wifi: mt76: mt7915: mmio: fix naming convention

Rename mt7915_wed_release_rx_buf in mt7915_mmio_wed_release_rx_buf,
mt7915_wed_init_rx_buf in mt7915_mmio_wed_init_rx_buf and
mt7915_wed_release_rx_buf in mt7915_mmio_wed_release_rx_buf

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7996: add support to configure spatial reuse parameter set
Ryder Lee [Thu, 1 Dec 2022 08:03:32 +0000 (16:03 +0800)]
wifi: mt76: mt7996: add support to configure spatial reuse parameter set

The SPR parameter set comprises OBSS PD threshold for SRG and
non SRG and Bitmap of BSS color and partial BSSID. This adds
support to configure fields of SPR element to firmware.

User can disable firmware SR algorithms by turning sr_scene_detect off.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7996: enable ack signal support
Ryder Lee [Thu, 1 Dec 2022 03:44:43 +0000 (11:44 +0800)]
wifi: mt76: mt7996: enable ack signal support

This reports signal strength of ACK packets from the peer as measured
at each interface.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7996: enable use_cts_prot support
Ryder Lee [Thu, 1 Dec 2022 03:44:42 +0000 (11:44 +0800)]
wifi: mt76: mt7996: enable use_cts_prot support

This adds selectable RTC/CTS enablement for each interface.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: rely on band_idx of mt76_phy
Ryder Lee [Thu, 1 Dec 2022 03:44:41 +0000 (11:44 +0800)]
wifi: mt76: mt7915: rely on band_idx of mt76_phy

The commit dc44c45c8cd0 added band_idx into mt76_phy, so switching to
rely on that.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: enable per bandwidth power limit support
Ryder Lee [Wed, 23 Nov 2022 19:59:11 +0000 (03:59 +0800)]
wifi: mt76: mt7915: enable per bandwidth power limit support

This power should override the per bandwidth max power that the
device emits.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: introduce mt7915_get_power_bound()
Ryder Lee [Wed, 23 Nov 2022 19:59:10 +0000 (03:59 +0800)]
wifi: mt76: mt7915: introduce mt7915_get_power_bound()

Add a helper for common boundary check. This is a preliminary patch
to add per bandwidth power control.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agomt76: mt7915: Fix PCI device refcount leak in mt7915_pci_init_hif2()
Xiongfeng Wang [Fri, 25 Nov 2022 02:58:31 +0000 (10:58 +0800)]
mt76: mt7915: Fix PCI device refcount leak in mt7915_pci_init_hif2()

As comment of pci_get_device() says, it returns a pci_device with its
refcount increased. We need to call pci_dev_put() to decrease the
refcount. Save the return value of pci_get_device() and call
pci_dev_put() to decrease the refcount.

Fixes: 9093cfff72e3 ("mt76: mt7915: add support for using a secondary PCIe link for gen1")
Fixes: 2e30db0dde61 ("mt76: mt7915: add device id for mt7916")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: do not send firmware FW_FEATURE_NON_DL region
Deren Wu [Thu, 24 Nov 2022 14:20:38 +0000 (22:20 +0800)]
wifi: mt76: do not send firmware FW_FEATURE_NON_DL region

skip invalid section to avoid potential risks

Fixes: 23bdc5d8cadf ("wifi: mt76: mt7921: introduce Country Location Control support")
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7921: Add missing __packed annotation of struct mt7921_clc
Deren Wu [Mon, 28 Nov 2022 07:04:21 +0000 (15:04 +0800)]
wifi: mt76: mt7921: Add missing __packed annotation of struct mt7921_clc

Add __packed annotation to avoid potential CLC parsing error

Fixes: 23bdc5d8cadf ("wifi: mt76: mt7921: introduce Country Location Control support")
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: fix coverity overrun-call in mt76_get_txpower()
Deren Wu [Sun, 27 Nov 2022 02:35:37 +0000 (10:35 +0800)]
wifi: mt76: fix coverity overrun-call in mt76_get_txpower()

Make sure the nss is valid for nss_delta array. Return zero
if the index is invalid.

Coverity message:
Event overrun-call: Overrunning callee's array of size 4 by passing
argument "n_chains" (which evaluates to 15) in call to
"mt76_tx_power_nss_delta".
int delta = mt76_tx_power_nss_delta(n_chains);

Fixes: 07cda406308b ("mt76: fix rounding issues on converting per-chain and combined txpower")
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices
Shayne Chen [Tue, 22 Nov 2022 08:45:46 +0000 (16:45 +0800)]
wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices

The driver first supports Filogic 680 PCI device, which is a Wi-Fi 7
chipset supporting concurrent tri-band operation at 6 GHz, 5 GHz, and
2.4 GHz with 4x4 antennas on each band.

Currently, mt7996 only supports tri-band HE or older mode.
EHT mode and more variants of Filogic 680 support will be introduced
in further patches.

Reviewed-by: Ryder Lee <ryder.lee@mediatek.com>
Co-developed-by: Peter Chiu <chui-hao.chiu@mediatek.com>
Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com>
Co-developed-by: Bo Jiao <Bo.Jiao@mediatek.com>
Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com>
Co-developed-by: Howard Hsu <howard-yh.hsu@mediatek.com>
Signed-off-by: Howard Hsu <howard-yh.hsu@mediatek.com>
Co-developed-by: MeiChia Chiu <meichia.chiu@mediatek.com>
Signed-off-by: MeiChia Chiu <meichia.chiu@mediatek.com>
Co-developed-by: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
Signed-off-by: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
Co-developed-by: Money Wang <Money.Wang@mediatek.com>
Signed-off-by: Money Wang <Money.Wang@mediatek.com>
Co-developed-by: Evelyn Tsai <evelyn.tsai@mediatek.com>
Signed-off-by: Evelyn Tsai <evelyn.tsai@mediatek.com>
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt76x0: remove dead code in mt76x0_phy_get_target_power
Lorenzo Bianconi [Tue, 22 Nov 2022 13:52:08 +0000 (14:52 +0100)]
wifi: mt76: mt76x0: remove dead code in mt76x0_phy_get_target_power

tx_rate can't be greater than 3 in mt76x0_phy_get_target_power routine
for cck rates. Get rid of dead code.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: fix band_idx usage
Ryder Lee [Tue, 22 Nov 2022 07:53:12 +0000 (15:53 +0800)]
wifi: mt76: mt7915: fix band_idx usage

The commit 006b9d4ad5bf introduced phy->band_idx to accommodate the
band definition change for mt7986 so that the band_idx of main_phy
can be 0 or 1. Accordingly, the source of band_idx 1 has switched to
"phy != &dev->phy" or "dev->phy.band_idx = 1".

We still use ext_phy to represent band 1 somewhere in driver, so fix it.
Also, band_idx sounds more reasonable than dbdc_idx, so change it.

Fixes: 006b9d4ad5bf ("mt76: mt7915: introduce band_idx in mt7915_phy")
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: enable .sta_set_txpwr support
Ryder Lee [Tue, 22 Nov 2022 07:53:11 +0000 (15:53 +0800)]
wifi: mt76: mt7915: enable .sta_set_txpwr support

This adds support for adjusting the Txpower level while pushing
traffic to an associated station. The allowed range is from 0 to
the maximum power of channel.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: add basedband Txpower info into debugfs
Ryder Lee [Tue, 22 Nov 2022 07:53:10 +0000 (15:53 +0800)]
wifi: mt76: mt7915: add basedband Txpower info into debugfs

This helps user to debug Txpower propagation path easily.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: add support to configure spatial reuse parameter set
Ryder Lee [Thu, 17 Nov 2022 17:09:47 +0000 (01:09 +0800)]
wifi: mt76: mt7915: add support to configure spatial reuse parameter set

The SPR parameter set comprises OBSS PD threshold for SRG and
non SRG and Bitmap of BSS color and partial BSSID. This adds
support to configure fields of SPR element to firmware.

User can disable firmware SR algorithms by turning sr_scene_detect off.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: add missing MODULE_PARM_DESC
Ryder Lee [Thu, 17 Nov 2022 17:09:46 +0000 (01:09 +0800)]
wifi: mt76: mt7915: add missing MODULE_PARM_DESC

Add documentation for module_param so that they're visible with
modinfo command.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: enable WED RX stats
Sujuan Chen [Sat, 12 Nov 2022 15:40:41 +0000 (16:40 +0100)]
wifi: mt76: mt7915: enable WED RX stats

Introduce the capability to report WED RX stats to mac80211.

Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: mt7915: enable WED RX support
Lorenzo Bianconi [Sat, 12 Nov 2022 15:40:40 +0000 (16:40 +0100)]
wifi: mt76: mt7915: enable WED RX support

Enable RX Wireless Ethernet Dispatch available on MT7986 Soc in oreder
to offlad traffic received by WLAN NIC and forwarded to LAN/WAN one.

Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2 years agowifi: mt76: connac: introduce mt76_connac_mcu_sta_wed_update utility routine
Sujuan Chen [Sat, 12 Nov 2022 15:40:39 +0000 (16:40 +0100)]
wifi: mt76: connac: introduce mt76_connac_mcu_sta_wed_update utility routine

This is a preliminary patch to introduce WED RX support for mt7915.

Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>