]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
3 years agoxfrm: rate limit SA mapping change message to user space
Antony Antony [Wed, 22 Dec 2021 13:11:18 +0000 (14:11 +0100)]
xfrm: rate limit SA mapping change message to user space

Kernel generates mapping change message, XFRM_MSG_MAPPING,
when a source port chage is detected on a input state with UDP
encapsulation set.  Kernel generates a message for each IPsec packet
with new source port.  For a high speed flow per packet mapping change
message can be excessive, and can overload the user space listener.

Introduce rate limiting for XFRM_MSG_MAPPING message to the user space.

The rate limiting is configurable via netlink, when adding a new SA or
updating it. Use the new attribute XFRMA_MTIMER_THRESH in seconds.

v1->v2 change:
update xfrm_sa_len()

v2->v3 changes:
use u32 insted unsigned long to reduce size of struct xfrm_state
fix xfrm_ompat size Reported-by: kernel test robot <lkp@intel.com>
accept XFRM_MSG_MAPPING only when XFRMA_ENCAP is present

Co-developed-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agoxfrm: Add support for SM4 symmetric cipher algorithm
Xu Jia [Wed, 22 Dec 2021 09:06:59 +0000 (17:06 +0800)]
xfrm: Add support for SM4 symmetric cipher algorithm

This patch adds SM4 encryption algorithm entry to ealg_list.

Signed-off-by: Xu Jia <xujia39@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agoxfrm: Add support for SM3 secure hash
Xu Jia [Wed, 22 Dec 2021 09:06:58 +0000 (17:06 +0800)]
xfrm: Add support for SM3 secure hash

This patch allows IPsec to use SM3 HMAC authentication algorithm.

Signed-off-by: Xu Jia <xujia39@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agoxfrm: update SA curlft.use_time
Antony Antony [Fri, 17 Dec 2021 17:02:09 +0000 (18:02 +0100)]
xfrm: update SA curlft.use_time

SA use_time was only updated once, for the first packet.
with this fix update the use_time for every packet.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agonet: xfrm: drop check of pols[0] for the second time
Jean Sacren [Tue, 30 Nov 2021 07:03:12 +0000 (00:03 -0700)]
net: xfrm: drop check of pols[0] for the second time

!pols[0] is checked earlier.  If we don't return, pols[0] is always
true.  We should drop the check of pols[0] for the second time and the
binary is also smaller.

Before:
   text    data     bss     dec     hex filename
  48395     957     240   49592    c1b8 net/xfrm/xfrm_policy.o

After:
   text    data     bss     dec     hex filename
  48379     957     240   49576    c1a8 net/xfrm/xfrm_policy.o

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agoxfrm: Remove duplicate assignment
luo penghao [Thu, 4 Nov 2021 06:26:21 +0000 (06:26 +0000)]
xfrm: Remove duplicate assignment

The statement in the switch is repeated with the statement at the
beginning of the while loop, so this statement is meaningless.

The clang_analyzer complains as follows:

net/xfrm/xfrm_policy.c:3392:2 warning:

Value stored to 'exthdr' is never read

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agoipv6/esp6: Remove structure variables and alignment statements
luo penghao [Thu, 4 Nov 2021 03:19:31 +0000 (03:19 +0000)]
ipv6/esp6: Remove structure variables and alignment statements

The definition of this variable is just to find the length of the
structure after aligning the structure. The PTR alignment function
is to optimize the size of the structure. In fact, it doesn't seem
to be of much use, because both members of the structure are of
type u32.
So I think that the definition of the variable and the
corresponding alignment can be deleted, the value of extralen can
be directly passed in the size of the structure.

The clang_analyzer complains as follows:

net/ipv6/esp6.c:117:27 warning:

Value stored to 'extra' during its initialization is never read

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agoMerge branch 'lan78xx-napi'
David S. Miller [Thu, 18 Nov 2021 12:11:51 +0000 (12:11 +0000)]
Merge branch 'lan78xx-napi'

John Efstathiades says:

===================
lan78xx NAPI Performance Improvements

This patch set introduces a set of changes to the lan78xx driver
that were originally developed as part of an investigation into
the performance of TCP and UDP transfers on an Android system.
The changes increase the throughput of both UDP and TCP transfers
and reduce the overall CPU load.

These improvements are also seen on a standard Linux kernel. Typical
results are included at the end of this document.

The changes to the driver evolved over time. The patches presented
here attempt to organise the changes in to coherent blocks that
affect logically connected parts of the driver. The patches do not
reflect the way in which the code evolved during the performance
investigation.

Each patch produces a working driver that has an incremental
improvement but patches 2, 3 and 6 should be considered a single
update.

The changes affect the following parts of the driver:

1. Deferred URB processing

The deferred URB processing that was originally done by a tasklet
is now done by a NAPI polling routine. The NAPI cycle has a fixed
work budget that controls how many received frames are passed to
the network stack.

Patch 6 introduces the NAPI polling but depends on preceding patches.

The new NAPI polling routine is also responsible for submitting
Rx and Tx URBs to the USB host controller.

Moving the URB processing to a NAPI-based system "smoothed"
incoming and outgoing data flows on the Android system under
investigation. However, taken in isolation, moving from a tasklet
approach to a NAPI approach made little or no difference to the
overall performance.

2. URB buffer management

The driver creates a pool of Tx and a pool of Rx URB buffers. Each
buffer is large enough to accommodate a packet with the maximum MTU
data. URBs are allocated from these pools as required.

Patch 2 introduces the new Tx buffer pool.
Patch 3 introduces the new Rx buffer pool.

3. Tx pending data

SKBs containing data to be transmitted are added to a queue. The
driver tracks free Tx URBs and the corresponding free Tx URB space.
When new Tx URBs are submitted, pending data is copied into the
URB buffer until the URB buffer is filled or there is no more
pending data. This maximises utilisation the LAN78xx internal
USB and network frame buffers.

New Tx URBs are submitted to the USB host controller as part of the
NAPI polling cycle.

Patch 2 introduces these changes.

4. Rx URB completion

A new URB is no longer submitted as part of the URB completion
callback.
New URBs are submitted during the NAPI polling cycle.

Patch 3 introduces these changes.

5. Rx URB processing

Completed URBs are put on to queue for processing (as is done in the
current driver). Network packets in completed URBs are copied from
the URB buffer in to dynamically allocated SKBs and passed to
the network stack.

The emptied URBs are resubmitted to the USB host controller.

Patch 3 introduces this change. Patch 6 updates the change to use
NAPI SKBs.

Each packet passed to the network stack is a single NAPI work item.
If the NAPI work budget is exhausted the remaining packets in the
URB are put onto an overflow queue that is processed at the start
of the next NAPI cycle.

Patch 6 introduces this change.

6. Driver-specific hard_header_len

The driver-specific hard_header_len adjustment was removed as it
broke generic receive offload (GRO) processing. Moreover, it was no
longer required due the change in Tx pending data management (see
point 3. above).

Patch 5 introduces this change.

The modification has been tested on four different target machines:

Target           |    CPU     |   ARCH  | cores | kernel |  RAM  |
-----------------+------------+---------+-------+--------+-------|
Raspberry Pi 4B  | Cortex-A72 | aarch64 |   4   | 64-bit |  2 GB |
Nitrogen8M SBC   | Cortex-A53 | aarch64 |   4   | 64-bit |  2 GB |
Compaq Pressario | Pentium D  | i686    |   2   | 32-bit |  4 GB |
Dell T3620       | Core i3    | x86_64  |  2+2  | 64-bit | 16 GB |

The targets, apart from the Compaq, each have an on-chip USB3 host
controller. A PCIe-based USB3 host controller card was added to the
Compaq to provide the necessary USB3 host interface.

The network throughput was measured using iperf3. The peer device was
a second Dell T3620 fitted with an Intel i210 network interface. The
target machine and the peer device were connected via a Netgear GS105
gigabit switch.

The CPU load was measured using mpstat running on the target machine.

The tables below summarise the throughput and CPU load improvements
achieved by the updated driver.

The bandwidth is the average bandwidth reported by iperf3 at the end
of a 60-second test.

The percentage idle figure is the average idle reported across all
CPU cores on the target machine for the duration of the test.

TCP Rx (target receiving, peer transmitting)

                 |   Standard Driver  |   NAPI Driver      |
Target           | Bandwidth | % Idle | Bandwidth | % Idle |
-----------------+-----------+--------+--------------------|
RPi4 Model B     |    941    |  74.9  |    941    |  91.5  |
Nitrogen8M       |    941    |  76.2  |    941    |  92.7  |
Compaq Pressario |    941    |  44.5  |    941    |  82.1  |
Dell T3620       |    941    |  88.9  |    941    |  98.3  |

TCP Tx (target transmitting, peer receiving)

                 |   Standard Driver  |   NAPI Driver      |
Target           | Bandwidth | % Idle | Bandwidth | % Idle |
-----------------+-----------+--------+--------------------|
RPi4 Model B     |    683    |  80.1  |    942    |  97.6  |
Nitrogen8M       |    942    |  97.8  |    942    |  97.3  |
Compaq Pressario |    939    |  80.0  |    942    |  91.2  |
Dell T3620       |    942    |  95.3  |    942    |  97.6  |

UDP Rx (target receiving, peer transmitting)

                 |   Standard Driver  |   NAPI Driver      |
Target           | Bandwidth | % Idle | Bandwidth | % Idle |
-----------------+-----------+--------+--------------------|
RPi4 Model B     |     -     |    -   | 958 (0%)  |  76.2  |
Nitrogen8M       | 690 (25%) |  57.7  | 937 (0%)  |  68.5  |
Compaq Pressario | 958 (0%)  |  50.2  | 958 (0%)  |  61.6  |
Dell T3620       | 958 (0%)  |  89.6  | 958 (0%)  |  85.3  |

The figure in brackets is the percentage packet loss.

UDP Tx (target transmitting, peer receiving)

                 |   Standard Driver  |   NAPI Driver      |
Target           | Bandwidth | % Idle | Bandwidth | % Idle |
-----------------+-----------+--------+--------------------|
RPi4 Model B     |    370    |  75.0  |    886    |  78.9  |
Nitrogen8M       |    710    |  75.0  |    958    |  85.3  |
Compaq Pressario |    958    |  65.5  |    958    |  76.6  |
Dell T3620       |    958    |  97.0  |    958    |  97.3  |
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolan78xx: Introduce NAPI polling support
John Efstathiades [Thu, 18 Nov 2021 11:01:39 +0000 (11:01 +0000)]
lan78xx: Introduce NAPI polling support

This patch introduces a NAPI-style approach for processing completed
Rx URBs that contributes to improving driver throughput and reducing
CPU load.

Packets in completed URBs are copied to NAPI SKBs and passed to the
network stack for processing. Each frame passed to the stack is one
work item in the NAPI budget.

If the NAPI budget is consumed and frames remain, they are added to
an overflow queue that is processed at the start of the next NAPI
polling cycle.

The NAPI handler is also responsible for copying pending Tx data to
Tx URBs and submitting them to the USB host controller for
transmission.

Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolan78xx: Remove hardware-specific header update
John Efstathiades [Thu, 18 Nov 2021 11:01:38 +0000 (11:01 +0000)]
lan78xx: Remove hardware-specific header update

Remove hardware-specific header length adjustment as it is no longer
required. It also breaks generic receive offload (GRO) processing of
received TCP frames that results in a TCP ACK being sent for each
received frame.

Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolan78xx: Re-order rx_submit() to remove forward declaration
John Efstathiades [Thu, 18 Nov 2021 11:01:37 +0000 (11:01 +0000)]
lan78xx: Re-order rx_submit() to remove forward declaration

Move position of rx_submit() to remove forward declaration of
rx_complete() which is now no longer required.

Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolan78xx: Introduce Rx URB processing improvements
John Efstathiades [Thu, 18 Nov 2021 11:01:36 +0000 (11:01 +0000)]
lan78xx: Introduce Rx URB processing improvements

This patch introduces a new approach to allocating and managing
Rx URBs that contributes to improving driver throughput and reducing
CPU load.

A pool of Rx URBs is created during driver instantiation. All the
URBs are initially submitted to the USB host controller for
processing.

The default URB buffer size is different for each USB bus speed.
The chosen sizes provide good USB utilisation with little impact on
overall packet latency.

Completed URBs are processed in the driver bottom half. The URB
buffer contents are copied to a dynamically allocated SKB, which is
then passed to the network stack. The URB is then re-submitted to
the USB host controller.

NOTE: the call to skb_copy() in rx_process() that copies the URB
contents to a new SKB is a temporary change to make this patch work
in its own right. This call will be removed when the NAPI processing
is introduced by patch 6 in this patch set.

Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolan78xx: Introduce Tx URB processing improvements
John Efstathiades [Thu, 18 Nov 2021 11:01:35 +0000 (11:01 +0000)]
lan78xx: Introduce Tx URB processing improvements

This patch introduces a new approach to allocating and managing
Tx URBs that contributes to improving driver throughput and reducing
CPU load.

A pool of Tx URBs is created during driver instantiation. A URB is
allocated from the pool when there is data to transmit. The URB is
released back to the pool when the data has been transmitted by the
device.

The default URB buffer size is different for each USB bus speed.
The chosen sizes provide good USB utilisation with little impact on
overall packet latency.

SKBs to be transmitted are added to a pending queue for processing.
The driver tracks the available Tx URB buffer space and copies as
much pending data as possible into each free URB. Each full URB
is then submitted to the USB host controller for transmission.

Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolan78xx: Fix memory allocation bug
John Efstathiades [Thu, 18 Nov 2021 11:01:34 +0000 (11:01 +0000)]
lan78xx: Fix memory allocation bug

Fix memory allocation that fails to check for NULL return.

Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'dsa-felix-psfp'
David S. Miller [Thu, 18 Nov 2021 12:07:24 +0000 (12:07 +0000)]
Merge branch 'dsa-felix-psfp'

Xiaoliang Yang says:

====================
net: dsa: felix: psfp support on vsc9959

VSC9959 hardware supports Per-Stream Filtering and Policing(PSFP).
This patch series add PSFP support on tc flower offload of ocelot
driver. Use chain 30000 to distinguish PSFP from VCAP blocks. Add gate
and police set to support PSFP in VSC9959 driver.

v6-v7 changes:
 - Add a patch to restrict psfp rules on ingress port.
 - Using stats.drops to show the packet count discarded by the rule.

v5->v6 changes:
 - Modify ocelot_mact_lookup() parameters.
 - Use parameters ssid and sfid instead of streamdata in
   ocelot_mact_learn_streamdata() function.
 - Serialize STREAMDATA and MAC table write.

v4->v5 changes:
 - Add MAC table lock patch, and move stream data write in
   ocelot_mact_learn_streamdata().
 - Add two sections of VCAP policers to Seville platform.

v3->v4 changes:
 - Introduce vsc9959_psfp_sfi_table_get() function in patch where it is
   used to fix compile warning.

v2->v3 changes:
 - Reorder first two patches. Export struct ocelot_mact_entry, then add
   ocelot_mact_lookup() and ocelot_mact_write() functions.
 - Add PSFP list to struct ocelot, and init it by using
   ocelot->ops->psfp_init().

v1->v2 changes:
 - Use tc flower offload of ocelot driver to support PSFP add and delete.
 - Add PSFP tables add/del functions in felix_vsc9959.c.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: felix: restrict psfp rules on ingress port
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:04 +0000 (18:12 +0800)]
net: dsa: felix: restrict psfp rules on ingress port

PSFP rules take effect on the streams from any port of VSC9959 switch.
This patch use ingress port to limit the rule only active on this port.

Each stream can only match two ingress source ports in VSC9959. Streams
from lowest port gets the configuration of SFID pointed by MAC Table
lookup and streams from highest port gets the configuration of (SFID+1)
pointed by MAC Table lookup. This patch defines the PSFP rule on highest
port as dummy rule, which means that it does not modify the MAC table.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: felix: use vcap policer to set flow meter for psfp
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:03 +0000 (18:12 +0800)]
net: dsa: felix: use vcap policer to set flow meter for psfp

This patch add police action to set flow meter table which is defined
in IEEE802.1Qci. Flow metering is two rates two buckets and three color
marker to policing the frames, we only enable one rate one bucket in
this patch.

Flow metering shares a same policer pool with VCAP policers, so the PSFP
policer calls ocelot_vcap_policer_add() and ocelot_vcap_policer_del() to
set flow meter police.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: use index to set vcap policer
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:02 +0000 (18:12 +0800)]
net: mscc: ocelot: use index to set vcap policer

Policer was previously automatically assigned from the highest index to
the lowest index from policer pool. But police action of tc flower now
uses index to set an police entry. This patch uses the police index to
set vcap policers, so that one policer can be shared by multiple rules.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: felix: add stream gate settings for psfp
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:01 +0000 (18:12 +0800)]
net: dsa: felix: add stream gate settings for psfp

This patch adds stream gate settings for PSFP. Use SGI table to store
stream gate entries. Disable the gate entry when it is not used by any
stream.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: felix: support psfp filter on vsc9959
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:00 +0000 (18:12 +0800)]
net: dsa: felix: support psfp filter on vsc9959

VSC9959 supports Per-Stream Filtering and Policing(PSFP) that complies
with the IEEE 802.1Qci standard. The stream is identified by Null stream
identification(DMAC and VLAN ID) defined in IEEE802.1CB.

For PSFP, four tables need to be set up: stream table, stream filter
table, stream gate table, and flow meter table. Identify the stream by
parsing the tc flower keys and add it to the stream table. The stream
filter table is automatically maintained, and its index is determined by
SGID(flow gate index) and FMID(flow meter index).

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: add gate and police action offload to PSFP
Xiaoliang Yang [Thu, 18 Nov 2021 10:11:59 +0000 (18:11 +0800)]
net: mscc: ocelot: add gate and police action offload to PSFP

PSFP support gate and police action. This patch add the gate and police
action to flower parse action, check chain ID to determine which block
to offload. Adding psfp callback functions to add, delete and update gate
and police in PSFP table if hardware supports it.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: set vcap IS2 chain to goto PSFP chain
Xiaoliang Yang [Thu, 18 Nov 2021 10:11:58 +0000 (18:11 +0800)]
net: mscc: ocelot: set vcap IS2 chain to goto PSFP chain

Some chips in the ocelot series such as VSC9959 support Per-Stream
Filtering and Policing(PSFP), which is processing after VCAP blocks.
We set this block on chain 30000 and set vcap IS2 chain to goto PSFP
chain if hardware support.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: add MAC table stream learn and lookup operations
Xiaoliang Yang [Thu, 18 Nov 2021 10:11:57 +0000 (18:11 +0800)]
net: mscc: ocelot: add MAC table stream learn and lookup operations

ocelot_mact_learn_streamdata() can be used in VSC9959 to overwrite an
FDB entry with stream data. The stream data includes SFID and SSID which
can be used for PSFP and FRER set.

ocelot_mact_lookup() can be used to check if the given {DMAC, VID} FDB
entry is exist, and also can retrieve the DEST_IDX and entry type for
the FDB entry.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomctp/test: Update refcount checking in route fragment tests
Jeremy Kerr [Thu, 18 Nov 2021 06:57:23 +0000 (14:57 +0800)]
mctp/test: Update refcount checking in route fragment tests

In 99ce45d5e, we moved a route refcount decrement from
mctp_do_fragment_route into the caller. This invalidates the assumption
that the route test makes about refcount behaviour, so the route tests
fail.

This change fixes the test case to suit the new refcount behaviour.

Fixes: 99ce45d5e7db ("mctp: Implement extended addressing")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: ah6: use swap() to make code cleaner
Yao Jing [Thu, 18 Nov 2021 06:10:18 +0000 (06:10 +0000)]
ipv6: ah6: use swap() to make code cleaner

Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yao Jing <yao.jing2@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: add missing htmldocs for skb->ll_node and sk->defer_list
Eric Dumazet [Thu, 18 Nov 2021 01:57:29 +0000 (17:57 -0800)]
tcp: add missing htmldocs for skb->ll_node and sk->defer_list

Add missing entries to fix these "make htmldocs" warnings.

./include/linux/skbuff.h:953: warning: Function parameter or member 'll_node' not described in 'sk_buff'
./include/net/sock.h:540: warning: Function parameter or member 'defer_list' not described in 'sock'

Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Thu, 18 Nov 2021 11:49:52 +0000 (11:49 +0000)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
10GbE Intel Wired LAN Driver Updates 2021-11-17

Radoslaw Tyl says:

The change is a consequence of errors reported by the ixgbevf driver
while starting several virtual guests at the same time on ESX host.
During this, VF was not able to communicate correctly with the PF,
as a result reported "PF still in reset state. Is the PF interface up?"
and then goes to locked state. The only thing left was to reload
the VF driver on the guest OS.

The background of the problem is that the current PFU and VFU
semaphore locking mechanism between sender and receiver may cause
overriding Mailbox memory (VFMBMEM), in such scenario receiver of
the original message will read the invalid, corrupted or one (or more)
message may be lost.

This change is actually as a support for communication with PF ESX
driver and does not contains changes and support for ixgbe driver.
For maintain backward compatibility, previous communication method
has been preserved in the form of LEGACY functions.

In the future there is a plan to add a support for a 1.5 mailbox API
communication also to ixgbe driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mdio: Replaced BUG_ON() with WARN()
Florian Fainelli [Wed, 17 Nov 2021 17:36:29 +0000 (09:36 -0800)]
net: mdio: Replaced BUG_ON() with WARN()

Killing the kernel because a certain MDIO bus object is not in the
desired state at various points in the registration or unregistration
paths is excessive and is not helping in troubleshooting or fixing
issues. Replace the BUG_ON() with WARN() and print out the MDIO bus name
to facilitate debugging.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'dpaa2-phylink'
David S. Miller [Thu, 18 Nov 2021 11:38:45 +0000 (11:38 +0000)]
Merge branch 'dpaa2-phylink'

Russell King says:

====================
net: dpaa2: phylink validate implementation updates

This series converts dpaa2 to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate(). Previous behaviour should be
preserved.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dpaa2-mac: use phylink_generic_validate()
Russell King (Oracle) [Wed, 17 Nov 2021 17:24:13 +0000 (17:24 +0000)]
net: dpaa2-mac: use phylink_generic_validate()

DPAA2 has no special behaviour in its validation implementation, so can
be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dpaa2-mac: remove interface checks in dpaa2_mac_validate()
Russell King (Oracle) [Wed, 17 Nov 2021 17:24:07 +0000 (17:24 +0000)]
net: dpaa2-mac: remove interface checks in dpaa2_mac_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dpaa2-mac: populate supported_interfaces member
Russell King [Wed, 17 Nov 2021 17:24:02 +0000 (17:24 +0000)]
net: dpaa2-mac: populate supported_interfaces member

Populate the phy interface mode bitmap for the Freescale DPAA2 driver
with interfaces modes supported by the MAC.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ag71xx-phylink'
David S. Miller [Thu, 18 Nov 2021 11:36:48 +0000 (11:36 +0000)]
Merge branch 'ag71xx-phylink'

Russell King says:

====================
net: ag71xx: phylink validate implementation updates

This series converts ag71xx to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().

The question over the port linkmode restriction has been answered by
Oleksij - there is no reason for this restriction, so we can go the
whole hog with this conversion. Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ag71xx: use phylink_generic_validate()
Russell King (Oracle) [Wed, 17 Nov 2021 16:46:31 +0000 (16:46 +0000)]
net: ag71xx: use phylink_generic_validate()

ag71xx apparently only supports MII port type, which makes it different
from other implementations. However, Oleksij says there is no special
reason for this.

Convert the driver to use phylink_generic_validate(), which will allow
all ethtool port linkmodes instead of only MII, giving the driver
consistent behaviour with other drivers.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ag71xx: remove interface checks in ag71xx_mac_validate()
Russell King (Oracle) [Wed, 17 Nov 2021 16:46:25 +0000 (16:46 +0000)]
net: ag71xx: remove interface checks in ag71xx_mac_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ag71xx: populate supported_interfaces member
Russell King [Wed, 17 Nov 2021 16:46:20 +0000 (16:46 +0000)]
net: ag71xx: populate supported_interfaces member

Populate the phy_interface_t bitmap for the Atheros ag71xx driver with
interfaces modes supported by the MAC.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: dwmac-qcom-ethqos: add platform level clocks management
Bhupesh Sharma [Wed, 17 Nov 2021 11:05:38 +0000 (16:35 +0530)]
net: stmmac: dwmac-qcom-ethqos: add platform level clocks management

Split clocks settings from init callback into clks_config callback,
which could support platform level clock management.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv4/raw: support binding to nonlocal addresses
Riccardo Paolo Bestetti [Wed, 17 Nov 2021 09:00:11 +0000 (10:00 +0100)]
ipv4/raw: support binding to nonlocal addresses

Add support to inet v4 raw sockets for binding to nonlocal addresses
through the IP_FREEBIND and IP_TRANSPARENT socket options, as well as
the ipv4.ip_nonlocal_bind kernel parameter.

Add helper function to inet_sock.h to check for bind address validity on
the base of the address type and whether nonlocal address are enabled
for the socket via any of the sockopts/sysctl, deduplicating checks in
ipv4/ping.c, ipv4/af_inet.c, ipv6/af_inet6.c (for mapped v4->v6
addresses), and ipv4/raw.c.

Add test cases with IP[V6]_FREEBIND verifying that both v4 and v6 raw
sockets support binding to nonlocal addresses after the change. Add
necessary support for the test cases to nettest.

Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20211117090010.125393-1-pbl@bestov.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: add missing include in include/net/gro.h
Eric Dumazet [Wed, 17 Nov 2021 10:01:30 +0000 (02:01 -0800)]
net: add missing include in include/net/gro.h

This is needed for some arches, as reported by Geert Uytterhoeven,
Randy Dunlap and Stephen Rothwell

Fixes: 4721031c3559 ("net: move gro definitions to include/net/gro.h")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20211117100130.2368319-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agostmmac: fix build due to brainos in trans_start changes
Alexander Lobakin [Wed, 17 Nov 2021 15:29:17 +0000 (16:29 +0100)]
stmmac: fix build due to brainos in trans_start changes

txq_trans_cond_update() takes netdev_tx_queue *nq,
not nq->trans_start.

Fixes: 5337824f4dc4 ("net: annotate accesses to queue->trans_start")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211117152917.3739-1-alexandr.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoixgbevf: Add support for new mailbox communication between PF and VF
Radoslaw Tyl [Wed, 30 Jun 2021 08:15:32 +0000 (10:15 +0200)]
ixgbevf: Add support for new mailbox communication between PF and VF

Provide improved mailbox communication, between PF and VF,
which is defined as API version 1.5.

Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoixgbevf: Mailbox improvements
Radoslaw Tyl [Wed, 30 Jun 2021 08:15:31 +0000 (10:15 +0200)]
ixgbevf: Mailbox improvements

Improve reliability of the mailbox communication and remove
its potential flaws that may lead to the undefined or faulty behavior.

Recently some users reported issues on ESX with 10G Intel NICs which were
found to be caused by incorrect implementation of the PF-VF mailbox
communication.

Technical investigation highlighted areas to improve in the communication
between PF or VF that wants to send the message (sender) and the other
part which receives the message (receiver):

 - Locking the mailbox when the sender wants to send a message
 - Releasing the mailbox when the communication ends
 - Returning the result of the mailbox message execution

Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoixgbevf: Add legacy suffix to old API mailbox functions
Radoslaw Tyl [Wed, 30 Jun 2021 08:15:30 +0000 (10:15 +0200)]
ixgbevf: Add legacy suffix to old API mailbox functions

Add legacy suffix to mailbox functions which should be backwards compatible
with older PF drivers. Communication during API negotiation always has to
be done using the earlier implementation.

Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoixgbevf: Improve error handling in mailbox
Radoslaw Tyl [Wed, 30 Jun 2021 08:15:29 +0000 (10:15 +0200)]
ixgbevf: Improve error handling in mailbox

Add new handling for error codes:
 IXGBE_ERR_CONFIG - ixgbe_mbx_operations is not correctly set
 IXGBE_ERR_TIMEOUT - mailbox operation, e.g. poll for message, timeout

Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoixgbevf: Rename MSGTYPE to SUCCESS and FAILURE
Radoslaw Tyl [Wed, 30 Jun 2021 08:15:28 +0000 (10:15 +0200)]
ixgbevf: Rename MSGTYPE to SUCCESS and FAILURE

There is name similarity within IXGBE_VT_MSGTYPE_ACK and
PFMAILBOX.ACK / VFMAILBOX.ACK. MSGTYPE macros are renamed to SUCCESS and
FAILURE because they are not specified in datasheet and now will be
easily distinguishable.

Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoMerge branch 'dev_watchdog-less-intrusive'
David S. Miller [Wed, 17 Nov 2021 14:56:16 +0000 (14:56 +0000)]
Merge branch 'dev_watchdog-less-intrusive'

Eric Dumazet says:

====================
net: make dev_watchdog() less intrusive

dev_watchdog() is used on many NIC to periodically monitor TX queues
to detect hangs.

Problem is : It stops all queues, then check them, then 'unfreeze' them.

Not only this stops feeding the NIC, it also migrates all qdiscs
to be serviced on the cpu calling netif_tx_unlock(), causing
a potential latency artifact.

With many TX queues, this is becoming more visible.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: no longer stop all TX queues in dev_watchdog()
Eric Dumazet [Wed, 17 Nov 2021 03:29:24 +0000 (19:29 -0800)]
net: no longer stop all TX queues in dev_watchdog()

There is no reason for stopping all TX queues from dev_watchdog()

Not only this stops feeding the NIC, it also migrates all qdiscs
to be serviced on the cpu calling netif_tx_unlock(), causing
a potential latency artifact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: do not inline netif_tx_lock()/netif_tx_unlock()
Eric Dumazet [Wed, 17 Nov 2021 03:29:23 +0000 (19:29 -0800)]
net: do not inline netif_tx_lock()/netif_tx_unlock()

These are not fast path, there is no point in inlining them.

Also provide netif_freeze_queues()/netif_unfreeze_queues()
so that we can use them from dev_watchdog() in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: annotate accesses to queue->trans_start
Eric Dumazet [Wed, 17 Nov 2021 03:29:22 +0000 (19:29 -0800)]
net: annotate accesses to queue->trans_start

In following patches, dev_watchdog() will no longer stop all queues.
It will read queue->trans_start locklessly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: use an atomic_long_t for queue->trans_timeout
Eric Dumazet [Wed, 17 Nov 2021 03:29:21 +0000 (19:29 -0800)]
net: use an atomic_long_t for queue->trans_timeout

tx_timeout_show() assumed dev_watchdog() would stop all
the queues, to fetch queue->trans_timeout under protection
of the queue->_xmit_lock.

As we want to no longer disrupt transmits, we use an
atomic_long_t instead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: david decotigny <david.decotigny@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'for-net-next-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 17 Nov 2021 14:52:44 +0000 (14:52 +0000)]
Merge tag 'for-net-next-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - Add support for AOSP Bluetooth Quality Report
 - Enables AOSP extension for Mediatek Chip (MT7921 & MT7922)
 - Rework of HCI command execution serialization
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ti: cpsw: Enable PHY timestamping
Kurt Kanzenbach [Tue, 16 Nov 2021 08:03:25 +0000 (09:03 +0100)]
net: ethernet: ti: cpsw: Enable PHY timestamping

If the used PHYs also support hardware timestamping, all configuration requests
should be forwared to the PHYs instead of being processed by the MAC driver
itself.

This enables PHY timestamping in combination with the cpsw driver.

Tested with an am335x based board with two DP83640 PHYs connected to the cpsw
switch.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoDocumentation: networking: net_failover: Fix documentation
Vasudev Kamath [Tue, 16 Nov 2021 07:21:48 +0000 (12:51 +0530)]
Documentation: networking: net_failover: Fix documentation

Update net_failover documentation with missing and incomplete
details to get a proper working setup.

Signed-off-by: Vasudev Kamath <vasudev@copyninja.info>
Reviewed-by: Krishna Kumar <krikku@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ocelot_net-phylink'
David S. Miller [Wed, 17 Nov 2021 11:25:45 +0000 (11:25 +0000)]
Merge branch 'ocelot_net-phylink'

Russell King says:

====================
net: ocelot_net: phylink validate implementation updates

This series converts ocelot_net to fill in the supported_interfaces
member of phylink_config, cleans up the validate() implementation,
and then converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ocelot_net: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:09:41 +0000 (10:09 +0000)]
net: ocelot_net: use phylink_generic_validate()

ocelot_net has no special behaviour in its validation implementation, so
can be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ocelot_net: remove interface checks in macb_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:09:36 +0000 (10:09 +0000)]
net: ocelot_net: remove interface checks in macb_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ocelot_net: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 10:09:31 +0000 (10:09 +0000)]
net: ocelot_net: populate supported_interfaces member

Populate the phy interface mode bitmap for the MSCC Ocelot driver with
the interface modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mtk_eth_soc-phylink'
David S. Miller [Wed, 17 Nov 2021 11:23:39 +0000 (11:23 +0000)]
Merge branch 'mtk_eth_soc-phylink'

Russell King says:

====================
net: mtk_eth_soc: phylink validate implementation updates

This series converts mtk_eth_soc to fill in the supported_interfaces
member of phylink_config, cleans up the validate() implementation, and
then converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mtk_eth_soc: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:06:58 +0000 (10:06 +0000)]
net: mtk_eth_soc: use phylink_generic_validate()

mtk_eth_soc has no special behaviour in its validation implementation,
so can be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mtk_eth_soc: drop use of phylink_helper_basex_speed()
Russell King (Oracle) [Tue, 16 Nov 2021 10:06:53 +0000 (10:06 +0000)]
net: mtk_eth_soc: drop use of phylink_helper_basex_speed()

Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function, and we can also get rid of our hack to indicate
both 1000base-X and 2500base-X if the comphy is present to make that
work. Remove this hack and use of phylink_helper_basex_speed().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mtk_eth_soc: remove interface checks in mtk_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:06:48 +0000 (10:06 +0000)]
net: mtk_eth_soc: remove interface checks in mtk_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mtk_eth_soc: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 10:06:43 +0000 (10:06 +0000)]
net: mtk_eth_soc: populate supported_interfaces member

Populate the phy interface mode bitmap for the Mediatek driver with
interfaces modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'sparx5-phylink'
David S. Miller [Wed, 17 Nov 2021 11:21:42 +0000 (11:21 +0000)]
Merge branch 'sparx5-phylink'

Russell King says:

====================
net: sparx5: phylink validate implementation updates

This series converts sparx5 to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sparx5: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:02:11 +0000 (10:02 +0000)]
net: sparx5: use phylink_generic_validate()

Sparx5 has no special behaviour in its validation implementation, so can
be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sparx5: clean up sparx5_phylink_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:02:06 +0000 (10:02 +0000)]
net: sparx5: clean up sparx5_phylink_validate()

sparx5_phylink_validate() no longer needs to check for
PHY_INTERFACE_MODE_NA as phylink will walk the supported interface
types to discover the link mode capabilities. Neither is it necessary
to check the device capabilities as we will not be called for
unsupported interface modes. Remove these checks.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sparx5: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 10:02:01 +0000 (10:02 +0000)]
net: sparx5: populate supported_interfaces member

Populate the phy_interface_t bitmap for the Microchip Sparx5 driver
with interfaces modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'enetc-phylink'
David S. Miller [Wed, 17 Nov 2021 11:19:28 +0000 (11:19 +0000)]
Merge branch 'enetc-phylink'

Russell King says:

====================
net: enetc: phylink validate implementation updates

This series converts enetc to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: enetc: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 09:59:08 +0000 (09:59 +0000)]
net: enetc: use phylink_generic_validate()

enetc has no special behaviour in its validation implementation, so can
be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: enetc: remove interface checks in enetc_pl_mac_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 09:59:03 +0000 (09:59 +0000)]
net: enetc: remove interface checks in enetc_pl_mac_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: enetc: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 09:58:58 +0000 (09:58 +0000)]
net: enetc: populate supported_interfaces member

Populate the phy_interface_t bitmap for the Freescale enetc driver with
interfaces modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'xilinx-phylink'
David S. Miller [Wed, 17 Nov 2021 11:17:44 +0000 (11:17 +0000)]
Merge branch 'xilinx-phylink'

Russell King says:

====================
net: xilinx: phylink validate implementation updates

This series converts axienet to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: axienet: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 09:55:32 +0000 (09:55 +0000)]
net: axienet: use phylink_generic_validate()

axienet has no special behaviour in its validation implementation, so
can be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: axienet: remove interface checks in axienet_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 09:55:27 +0000 (09:55 +0000)]
net: axienet: remove interface checks in axienet_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: axienet: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 09:55:22 +0000 (09:55 +0000)]
net: axienet: populate supported_interfaces member

Populate the phy_interface_t bitmap for the Xilinx axienet driver with
interfaces modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'mlx5-updates-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 17 Nov 2021 11:03:43 +0000 (11:03 +0000)]
Merge tag 'mlx5-updates-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-11-16

Updates for mlx5 driver:

1) Support ethtool cq mode
2) Static allocation of mod header object for the common case
3) TC support for when local and remote VTEPs are in the same
4) Create E-Switch QoS objects on demand to save on resources
5) Minor code improvements
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/mlx5: E-switch, Create QoS on demand
Dmytro Linkin [Tue, 21 Sep 2021 16:08:38 +0000 (19:08 +0300)]
net/mlx5: E-switch, Create QoS on demand

Don't create eswitch QoS (root TSAR) on switch mode change. Create it on
first child TSAR object creation - vport or rate group. Keep track
root TSAR references and release root TSAR with last object deletion.
No need to check for QoS is enabled when installing tc matchall filter.
Remove related helper function due to no users of it.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Enable vport QoS on demand
Dmytro Linkin [Tue, 21 Sep 2021 15:45:42 +0000 (18:45 +0300)]
net/mlx5: E-switch, Enable vport QoS on demand

Vports' QoS is not commonly used but consume SW/HW resources, which
becomes an issue on BlueField SoC systems.
Don't enable QoS on vports by default on eswitch mode change and enable
when it's going to be used by one of the top level users:
- configuring TC matchall filter with police action;
- setting rate with legacy NDO API;
- calling devlink ops->rate_leaf_*() callbacks.

Disable vport QoS on vport cleanup.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, move offloads mode callbacks to offloads file
Parav Pandit [Thu, 21 Oct 2021 15:21:30 +0000 (18:21 +0300)]
net/mlx5: E-switch, move offloads mode callbacks to offloads file

eswitch.c is mainly for common code between legacy and offloads mode.
MAC address get and set via devlink is applicable only in offloads mode.

Hence, move it to eswitch_offloads.c file.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Reuse mlx5_eswitch_set_vport_mac
Parav Pandit [Thu, 21 Oct 2021 15:17:52 +0000 (18:17 +0300)]
net/mlx5: E-switch, Reuse mlx5_eswitch_set_vport_mac

mlx5_eswitch_set_vport_mac() routine already does necessary checks which
are duplicated in implementation of
mlx5_devlink_port_function_hw_addr_set().

Hence, reuse mlx5_eswitch_set_vport_mac() and cut down the code.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Remove vport enabled check
Parav Pandit [Wed, 20 Oct 2021 04:56:01 +0000 (07:56 +0300)]
net/mlx5: E-switch, Remove vport enabled check

An eswitch vport of the devlink port is always enabled before a
devlink port is registered. And a eswitch vport is always disabled
after a devlink port is unregistered.
Hence avoid the vport enabled check in the devlink callback routine.
Such check is only applicable in the legacy SR-IOV callbacks.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Sunil Sudhakar Rani <sunrani@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Specify out ifindex when looking up decap route
Chris Mi [Tue, 26 Oct 2021 09:08:24 +0000 (17:08 +0800)]
net/mlx5e: Specify out ifindex when looking up decap route

There is a use case that the local and remote VTEPs are in the same
host. Currently, the out ifindex is not specified when looking up the
decap route for offloads. So in this case, a local route is returned
and the route dev is lo.

Actual tunnel interface can be created with a parameter "dev" [1],
which specifies the physical device to use for tunnel endpoint
communication. Pass this parameter to driver when looking up decap
route for offloads. So that a unicast route will be returned.

[1] ip link add name vxlan1 type vxlan id 100 dev enp4s0f0 remote 1.1.1.1 dstport 4789

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Move comment about mod header flag to correct place
Roi Dayan [Wed, 10 Nov 2021 14:19:41 +0000 (16:19 +0200)]
net/mlx5e: TC, Move comment about mod header flag to correct place

Move the comment to the correct place where the driver actually
removes the flag and not in the check that maybe pedit actions exists.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Move kfree() calls after destroying all resources
Roi Dayan [Mon, 1 Nov 2021 16:13:02 +0000 (18:13 +0200)]
net/mlx5e: TC, Move kfree() calls after destroying all resources

When deleting fdb/nic flow rules first release all resources
and then call the kfree() calls instead of sparse them around
the function.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Destroy nic flow counter if exists
Roi Dayan [Mon, 1 Nov 2021 16:02:00 +0000 (18:02 +0200)]
net/mlx5e: TC, Destroy nic flow counter if exists

Counter is only added if counter flag exists.
So check the counter fag exists for deleting the counter.
This is the same as in add/del fdb flow.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: TC, using swap() instead of tmp variable
Yihao Han [Wed, 3 Nov 2021 06:21:09 +0000 (23:21 -0700)]
net/mlx5: TC, using swap() instead of tmp variable

swap() was used instead of the tmp variable to swap values

Signed-off-by: Yihao Han <hanyihao@vivo.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: CT: Allow static allocation of mod headers
Paul Blakey [Wed, 25 Aug 2021 13:46:41 +0000 (16:46 +0300)]
net/mlx5: CT: Allow static allocation of mod headers

As each CT rule uses at least 4 modify header actions, each rule
causes at least 3 reallocations by the mod header actions api.

Allow initial static allocation of the mod acts array, and use it for
CT rules. If the static allocation is exceeded go back to dynamic
allocation.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
3 years agonet/mlx5e: Refactor mod header management API
Paul Blakey [Mon, 5 Jul 2021 08:31:47 +0000 (11:31 +0300)]
net/mlx5e: Refactor mod header management API

For all mod hdr related functions to reside in a single self contained
component (mod_hdr.c), refactor alloc() and add get_id() so that user
won't rely on internal implementation, and move both to mod_hdr
component.

Rename the prefix to mlx5e_mod_hdr_* as other mod hdr functions.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Avoid printing health buffer when firmware is unavailable
Aya Levin [Tue, 9 Nov 2021 13:44:58 +0000 (15:44 +0200)]
net/mlx5: Avoid printing health buffer when firmware is unavailable

Use firmware version field as an indication to health buffer's sanity.
When firmware version is 0xFFFFFFFF, deduce that firmware is unavailable
and avoid printing the health buffer to dmesg as it doesn't provide
debug info.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Fix format-security build warnings
Saeed Mahameed [Wed, 3 Nov 2021 21:01:05 +0000 (14:01 -0700)]
net/mlx5: Fix format-security build warnings

Treat the string as an argument to avoid this.

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:482:5:
error: format string is not a string literal (potentially insecure)
                         name);
                         ^~~~
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:2079:4:
error: format string is not a string literal (potentially insecure)
                        ptp_ch_stats_desc[i].format);
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
3 years agonet/mlx5e: Support ethtool cq mode
Saeed Mahameed [Wed, 15 Sep 2021 06:26:17 +0000 (23:26 -0700)]
net/mlx5e: Support ethtool cq mode

Add support for ethtool coalesce cq mode set and get.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
3 years agonet: document SMII and correct phylink's new validation mechanism
Russell King (Oracle) [Mon, 15 Nov 2021 17:11:17 +0000 (17:11 +0000)]
net: document SMII and correct phylink's new validation mechanism

SMII has not been documented in the kernel, but information on this PHY
interface mode has been recently found. Document it, and correct the
recently introduced phylink handling for this interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1mmfVl-0075nP-14@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'r8169-disable-detection-of-further-chip-versions-that-didn-t-make-it...
Jakub Kicinski [Wed, 17 Nov 2021 03:10:34 +0000 (19:10 -0800)]
Merge branch 'r8169-disable-detection-of-further-chip-versions-that-didn-t-make-it-to-the-mass-market'

Heiner Kallweit says:

====================
r8169: disable detection of further chip versions that didn't make it to the mass market

There's no sign of life from further chip versions. Seems they didn't
make it to the mass market. Let's disable detection and if nobody
complains remove support a few kernel versions later.
====================

Link: https://lore.kernel.org/r/7708d13a-4a2b-090d-fadf-ecdd0fff5d2e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agor8169: disable detection of chip version 41
Heiner Kallweit [Mon, 15 Nov 2021 20:52:35 +0000 (21:52 +0100)]
r8169: disable detection of chip version 41

It seems this chip version never made it to the wild. Therefore
disable detection and if nobody complains remove support completely
later.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agor8169: disable detection of chip version 45
Heiner Kallweit [Mon, 15 Nov 2021 20:51:52 +0000 (21:51 +0100)]
r8169: disable detection of chip version 45

It seems this chip version never made it to the wild. Therefore
disable detection and if nobody complains remove support completely
later.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agor8169: disable detection of chip versions 49 and 50
Heiner Kallweit [Mon, 15 Nov 2021 20:51:14 +0000 (21:51 +0100)]
r8169: disable detection of chip versions 49 and 50

It seems these chip versions never made it to the wild. Therefore
disable detection and if nobody complains remove support completely
later.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agor8169: enable ASPM L1/L1.1 from RTL8168h
Heiner Kallweit [Mon, 15 Nov 2021 20:17:56 +0000 (21:17 +0100)]
r8169: enable ASPM L1/L1.1 from RTL8168h

With newer chip versions ASPM-related issues seem to occur only if
L1.2 is enabled. I have a test system with RTL8168h that gives a
number of rx_missed errors when running iperf and L1.2 is enabled.
With L1.2 disabled (and L1 + L1.1 active) everything is fine.
See also [0]. Can't test this, but L1 + L1.1 being active should be
sufficient to reach higher package power saving states.

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942830

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/36feb8c4-a0b6-422a-899c-e61f2e869dfe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'net-better-packing-of-global-vars'
Jakub Kicinski [Wed, 17 Nov 2021 03:07:57 +0000 (19:07 -0800)]
Merge branch 'net-better-packing-of-global-vars'

Eric Dumazet says:

====================
net: better packing of global vars

First two patches avoid holes in data section,
and last patch makes sure some siphash keys are contained
in a single cache line.
====================

Link: https://lore.kernel.org/r/20211115172303.3732746-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: align static siphash keys
Eric Dumazet [Mon, 15 Nov 2021 17:23:03 +0000 (09:23 -0800)]
net: align static siphash keys

siphash keys use 16 bytes.

Define siphash_aligned_key_t macro so that we can make sure they
are not crossing a cache line boundary.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: use .data.once section in netdev_level_once()
Eric Dumazet [Mon, 15 Nov 2021 17:23:02 +0000 (09:23 -0800)]
net: use .data.once section in netdev_level_once()

Same rationale than prior patch : using the dedicated
section avoid holes and pack all these bool values.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoonce: use __section(".data.once")
Eric Dumazet [Mon, 15 Nov 2021 17:23:01 +0000 (09:23 -0800)]
once: use __section(".data.once")

.data.once contains nicely packed bool variables.
It is used already by DO_ONCE_LITE().

Using it also in DO_ONCE() removes holes in .data section.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>