www.infradead.org Git - users/hch/dma-mapping.git/log
3 years ago net: dsa: qca8k: drop dsa_switch_ops from qca8k_priv
Ansuel Smith [Fri, 15 Apr 2022 23:30:15 +0000 (01:30 +0200)]
net: dsa: qca8k: drop dsa_switch_ops from qca8k_priv

Now that dsa_switch_ops is not switch specific anymore, we can drop it
from qca8k_priv and use the static ops directly for the dsa_switch
pointer.

Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: dsa: qca8k: rework and simplify mdiobus logic
Ansuel Smith [Fri, 15 Apr 2022 23:30:14 +0000 (01:30 +0200)]
net: dsa: qca8k: rework and simplify mdiobus logic

In an attempt to reduce the size of qca8k_priv, rework and simplify the
mdiobus logic.
We now always declare an mdiobus instead of relying on the DSA
phy_read/write ops, even if an mdio node is not present. This makes the
qca8k ops static and no longer switch specific. For a legacy
implementation, where the ports have no phy map declared in the dts with
an mdio node, we declare a 'qca8k-legacy' mdiobus. The conversion logic
still applies, as the legacy read and write ops are used instead of the
internal ones.
Also drop legacy_phy_port_mapping, as we now declare the mdiobus with ops
that already handle the workaround.

Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: dsa: qca8k: drop port_sts from qca8k_priv
Ansuel Smith [Fri, 15 Apr 2022 23:30:13 +0000 (01:30 +0200)]
net: dsa: qca8k: drop port_sts from qca8k_priv

port_sts is a thing of the past for this driver. It was present in the
initial implementation of this driver, and parts of the original struct
were dropped over time. Using an array of int to store whether a port is
enabled, just to handle PM operations, seems overkill. Switch to a
simple u8 to store the port status, where each bit corresponds to a port
(bit set: port is enabled; bit not set: port is disabled).
Also add some comments to better describe why we need to track port
status.
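
As a rough illustration of the bit-per-port idea, and not the driver's
actual C code, the bookkeeping can be modelled in user-space Python.
NUM_PORTS and both helper names are assumptions made for this sketch.

```python
# Toy model of tracking port state in a single 8-bit mask.
# NUM_PORTS and the helper names are hypothetical, not taken from qca8k.
NUM_PORTS = 7


def set_port_enabled(mask: int, port: int, enabled: bool) -> int:
    """Return the 8-bit mask with the bit for `port` set or cleared."""
    if not 0 <= port < NUM_PORTS:
        raise ValueError(f"port {port} out of range")
    if enabled:
        return (mask | (1 << port)) & 0xFF
    return mask & ~(1 << port) & 0xFF


def port_is_enabled(mask: int, port: int) -> bool:
    """True when the bit for `port` is set in the mask."""
    return bool(mask & (1 << port))
```

One byte is enough here because the mask only needs one bit per port.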

Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: dsa: qca8k: drop MTU tracking from qca8k_priv
Ansuel Smith [Fri, 15 Apr 2022 23:30:12 +0000 (01:30 +0200)]
net: dsa: qca8k: drop MTU tracking from qca8k_priv

DSA sets the MTU of the CPU port based on the largest MTU of all the
slave ports. Based on this, we can drop the MTU array from qca8k_priv and
run the port_change_mtu logic only when DSA changes the MTU of the CPU
port, as the switch has a global MTU setting that applies to every port.
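
The reasoning above can be sketched in Python under stated assumptions:
the function names and the `state` dictionary are hypothetical, and the
real driver programs a hardware register rather than a dictionary.

```python
# Toy model: with one global MTU setting, only the CPU port's MTU matters,
# because DSA already sets it to the largest MTU of the user ports.
# All names here are hypothetical illustrations.

def cpu_port_mtu(user_port_mtus):
    """MTU DSA would request on the CPU port: the largest user-port MTU."""
    return max(user_port_mtus)


def port_change_mtu(state, port, new_mtu, cpu_port):
    """Update the global MTU only when the CPU port changes; ignore others."""
    if port != cpu_port:
        return False  # nothing to program for a user port
    state["global_mtu"] = new_mtu
    return True
```

No per-port array is needed because the largest value is already carried
by the CPU port request.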

Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for...
Arun Ajith S [Fri, 15 Apr 2022 08:34:02 +0000 (08:34 +0000)]
net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131

On routers, add a new neighbour cache entry in STALE state when
receiving an unsolicited (gratuitous) neighbour advertisement with the
target link-layer-address option specified.
This is similar to the arp_accept configuration for IPv4.
A new sysctl endpoint is created to turn on this behaviour:
/proc/sys/net/ipv6/conf/interface/accept_unsolicited_na.
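
A minimal sketch of toggling the knob from user space, assuming that
"interface" in the path above is a placeholder for a real interface
name. The helper names are made up for illustration, and writing the
file needs root privileges on a kernel that has this sysctl.

```python
from pathlib import Path

# Base directory of the per-interface IPv6 sysctls, as in the path above.
IPV6_CONF = Path("/proc/sys/net/ipv6/conf")


def accept_unsolicited_na_path(ifname: str) -> Path:
    """Build the sysctl path for a given interface name."""
    return IPV6_CONF / ifname / "accept_unsolicited_na"


def set_accept_unsolicited_na(ifname: str, enabled: bool) -> None:
    """Write 1 or 0 to the knob (requires root and kernel support)."""
    accept_unsolicited_na_path(ifname).write_text("1\n" if enabled else "0\n")
```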

Signed-off-by: Arun Ajith S <aajith@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago ipv6: fix NULL deref in ip6_rcv_core()
Eric Dumazet [Wed, 13 Apr 2022 20:56:53 +0000 (13:56 -0700)]
ipv6: fix NULL deref in ip6_rcv_core()

idev can be NULL, as the surrounding code suggests.

Fixes: 4daf841a2ef3 ("net: ipv6: add skb drop reasons to ip6_rcv_core()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Cc: Jiang Biao <benbjiang@tencent.com>
Cc: Hao Peng <flyingpeng@tencent.com>
Link: https://lore.kernel.org/r/20220413205653.1178458-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago net_sched: make qdisc_reset() smaller
Eric Dumazet [Thu, 14 Apr 2022 01:10:04 +0000 (18:10 -0700)]
net_sched: make qdisc_reset() smaller

For some unknown reason qdisc_reset() is using
a convoluted way of freeing two lists of skbs.

Use __skb_queue_purge() instead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220414011004.2378350-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago octeon_ep: Remove custom driver version
Leon Romanovsky [Thu, 14 Apr 2022 07:52:42 +0000 (10:52 +0300)]
octeon_ep: Remove custom driver version

A review comment [1] pointed out that new code is not supposed
to set a driver version and should rely on the kernel version instead.

As an outcome of that comment, all the dance around reporting such a
driver version to FW should be removed too, because in the upstream
kernel the whole driver has the same version, so reading it from or
writing it to FW always gives the same result.

[1] https://lore.kernel.org/all/YladGTmon1x3dfxI@unreal

Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/5d76f3116ee795071ec044eabb815d6c2bdc7dbd.1649922731.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago Merge branch 'ibmvnic-use-a-set-of-ltbs-per-pool'
Jakub Kicinski [Fri, 15 Apr 2022 21:02:09 +0000 (14:02 -0700)]
Merge branch 'ibmvnic-use-a-set-of-ltbs-per-pool'

Sukadev Bhattiprolu says:

====================
ibmvnic: Use a set of LTBs per pool

ibmvnic uses a single large long term buffer (LTB) per rx or tx
pool (queue). This has two limitations.

First, if we need to free/allocate an LTB (e.g., during a reset) under
low memory conditions, the allocation can fail.

Second, the kernel limits the size of a single LTB (DMA buffer) to 16MB
(based on MAX_ORDER). With jumbo frames (mtu = 9000) we can only have
about 1763 buffers per LTB (16MB / 9588 bytes per frame) even though
the max supported buffers is 4096. (The 9588 instead of 9088 comes from
IBMVNIC_BUFFER_HLEN.)

To overcome these limitations, allow creating a set of LTBs per queue.
====================

Link: https://lore.kernel.org/r/20220413171026.1264294-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago ibmvnic: Allow multiple ltbs in txpool ltb_set
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:26 +0000 (13:10 -0400)]
ibmvnic: Allow multiple ltbs in txpool ltb_set

Allow multiple LTBs in the txpool's ltb_set, i.e., rather than using
a single large LTB, use several smaller LTBs.

The first n-1 LTBs will all be of the same size. The size of the last
LTB in the set depends on the number of buffers and buffer (mtu) size.
This strategy hopefully allows more reuse of the initial LTBs and also
reduces the chances of an allocation failure (of the large LTB) when
the system is low on memory.
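
The sizing rule (first n-1 LTBs equal, last one sized by what is left)
can be sketched as below. This is an illustration only: the function
name is hypothetical, and it assumes each LTB holds a whole number of
buffers, which may differ from how the driver rounds sizes.

```python
def ltb_set_sizes(num_buffers: int, buffer_size: int, max_ltb_size: int):
    """Return the byte size of each LTB needed to hold num_buffers buffers.

    Every LTB except possibly the last holds as many whole buffers as fit
    in max_ltb_size; the last LTB holds whatever buffers remain.
    """
    bufs_per_ltb = max_ltb_size // buffer_size
    if bufs_per_ltb == 0:
        raise ValueError("buffer does not fit in one LTB")
    sizes = []
    remaining = num_buffers
    while remaining > 0:
        count = min(remaining, bufs_per_ltb)
        sizes.append(count * buffer_size)
        remaining -= count
    return sizes
```

Because only the last entry depends on the total buffer count, the
leading LTBs keep the same size across reallocations with the same
buffer size, which is what makes their reuse plausible.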

Suggested-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago ibmvnic: Allow multiple ltbs in rxpool ltb_set
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:25 +0000 (13:10 -0400)]
ibmvnic: Allow multiple ltbs in rxpool ltb_set

Allow multiple LTBs in the rxpool's ltb_set. The first n-1 LTBs will all
be of the same size. The size of the last LTB in the set depends on the
number of buffers and buffer (mtu) size.

Having a set of LTBs per pool provides a couple of benefits.

First, with the current value of IBMVNIC_MAX_LTB_SIZE of 16MB and an
MTU of 9000, we need an LTB (DMA buffer) of that size, but the allocation
can fail in low memory conditions. With a set of LTBs per pool, we can
use several smaller (8MB) LTBs and hopefully have fewer allocation
failures. (See also comments in ibmvnic.h on the trade-off with smaller
LTBs.)

Second, since the kernel limits the size of the DMA buffer to 16MB (based
on MAX_ORDER), with a single DMA buffer per pool, the pool is also limited
to 16MB. This in turn limits the number of buffers per pool to 1763 when
the MTU is 9000. With a set of LTBs per pool, we can have up to the max
of 4096 buffers per pool even when the MTU is 9000.

Suggested-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago ibmvnic: convert rxpool ltb to a set of ltbs
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:24 +0000 (13:10 -0400)]
ibmvnic: convert rxpool ltb to a set of ltbs

Define and use interfaces that treat the long term buffer (LTB) of an
rxpool as a set of LTBs rather than a single LTB. The set only has one
LTB for now.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago ibmvnic: define map_txpool_buf_to_ltb()
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:23 +0000 (13:10 -0400)]
ibmvnic: define map_txpool_buf_to_ltb()

Define a helper to map a given txpool buffer into its corresponding long
term buffer (LTB) and offset. Currently there is just one LTB per txpool
so the mapping is trivial. When we add support for multiple LTBs per
txpool, this helper will be more useful.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago ibmvnic: define map_rxpool_buf_to_ltb()
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:22 +0000 (13:10 -0400)]
ibmvnic: define map_rxpool_buf_to_ltb()

Define a helper to map a given rx pool buffer into its corresponding long
term buffer (LTB) and offset. Currently there is just one LTB per pool so
the mapping is trivial. When we add support for multiple LTBs per pool,
this helper will be more useful.
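
A hedged sketch of what such a mapping can look like once a pool has
several LTBs. The function below is a hypothetical Python stand-in, not
the driver's helper, and it assumes buffers never straddle two LTBs.

```python
def map_buf_to_ltb(bufidx: int, buffer_size: int, ltb_sizes):
    """Map a pool-wide buffer index to (ltb_index, byte_offset_in_ltb).

    ltb_sizes lists the byte size of each LTB in the set, in order.
    """
    for ltb_index, ltb_size in enumerate(ltb_sizes):
        nbufs = ltb_size // buffer_size  # whole buffers held by this LTB
        if bufidx < nbufs:
            return ltb_index, bufidx * buffer_size
        bufidx -= nbufs  # skip past this LTB's buffers
    raise IndexError("buffer index beyond the LTB set")
```

With a single LTB in the set the loop returns on its first iteration,
which matches the "trivial" mapping the message describes.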

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago ibmvnic: rename local variable index to bufidx
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:21 +0000 (13:10 -0400)]
ibmvnic: rename local variable index to bufidx

The local variable 'index' is heavily used in some functions and is
easily confused with other "index" fields like pool->index,
->consumer_index, etc. Rename it to bufidx to better reflect that it is
the index of a buffer in the pool.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago Merge branch 'net-ethool-add-support-to-get-set-tx-push-by-ethtool-g-g'
Jakub Kicinski [Fri, 15 Apr 2022 18:41:56 +0000 (11:41 -0700)]
Merge branch 'net-ethool-add-support-to-get-set-tx-push-by-ethtool-g-g'

Jie Wang says:

====================
net: ethool: add support to get/set tx push by ethtool -G/g

These three patches add tx push in ring params and adapt the set and get APIs
of ring params.
====================

Link: https://lore.kernel.org/r/20220412020121.14140-1-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago net: hns3: add tx push support in hns3 ring param process
Jie Wang [Tue, 12 Apr 2022 02:01:21 +0000 (10:01 +0800)]
net: hns3: add tx push support in hns3 ring param process

This patch adds the tx push param to the hns3 ring params and adapts the
set and get APIs of the ring params, so users can set it with ethtool -G
and read it with ethtool -g.

Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago net: ethtool: move checks before rtnl_lock() in ethnl_set_rings
Jie Wang [Tue, 12 Apr 2022 02:01:20 +0000 (10:01 +0800)]
net: ethtool: move checks before rtnl_lock() in ethnl_set_rings

Currently these two checks in ethnl_set_rings are done after rtnl_lock(),
so the lock is taken and work is done needlessly if the request is
invalid.

So this patch moves these checks before rtnl_lock() to avoid that cost.

Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago net: ethtool: extend ringparam set/get APIs for tx_push
Jie Wang [Tue, 12 Apr 2022 02:01:19 +0000 (10:01 +0800)]
net: ethtool: extend ringparam set/get APIs for tx_push

Currently tx push is a standard driver feature which controls use of a fast
path descriptor push. So this patch extends the ringparam APIs and data
structures to support set/get tx push by ethtool -G/g.

Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago octeon_ep: fix error return code in octep_probe()
Yang Yingliang [Fri, 15 Apr 2022 02:39:57 +0000 (10:39 +0800)]
octeon_ep: fix error return code in octep_probe()

If register_netdev() fails, octep_probe() should return the error
code.

Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'emaclite-cleanups'
David S. Miller [Fri, 15 Apr 2022 10:46:29 +0000 (11:46 +0100)]
Merge branch 'emaclite-cleanups'

Radhey Shyam Pandey says:

====================
net: emaclite: Trivial code cleanup

This patchset fixes coding style issues, removes the BUFFER_ALIGN
macro and also updates the copyright text.

I had to resend it, as the earlier series didn't reach the mailing list
due to a configuration issue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: emaclite: Remove custom BUFFER_ALIGN macro
Shravya Kumbham [Thu, 14 Apr 2022 12:37:11 +0000 (18:07 +0530)]
net: emaclite: Remove custom BUFFER_ALIGN macro

The BUFFER_ALIGN macro is used to calculate the number of bytes
required for the next alignment. Instead of this, we can directly
use skb_reserve(skb, NET_IP_ALIGN) to align the protocol header
on at least a 4-byte boundary, where NET_IP_ALIGN is defined as 2
by default. So remove BUFFER_ALIGN and its related defines, since
skb_reserve() alone does the job.
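
The alignment arithmetic behind this can be checked with a short Python
sketch. It assumes the receive buffer itself starts on a 4-byte
boundary and that the frame carries a plain 14-byte Ethernet header; the
function name is hypothetical.

```python
ETH_HLEN = 14      # length of an untagged Ethernet header in bytes
NET_IP_ALIGN = 2   # default value, as stated in the commit message


def ip_header_offset(reserved: int) -> int:
    """Offset of the IP header from the (aligned) start of the buffer."""
    return reserved + ETH_HLEN


# Without reserving headroom the IP header is misaligned by 2 bytes;
# reserving NET_IP_ALIGN bytes first puts it on a 4-byte boundary.
misaligned = ip_header_offset(0) % 4
aligned = ip_header_offset(NET_IP_ALIGN) % 4
```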

Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: emaclite: Update copyright text to correct format
Michal Simek [Thu, 14 Apr 2022 12:37:10 +0000 (18:07 +0530)]
net: emaclite: Update copyright text to correct format

Based on recommended guidance, the term Copyright should also be present
in front of (c), so align the driver to match this pattern.
This helps automated tools with source code scanning.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: emaclite: Fix coding style
Radhey Shyam Pandey [Thu, 14 Apr 2022 12:37:09 +0000 (18:07 +0530)]
net: emaclite: Fix coding style

Make coding style changes to fix checkpatch script warnings.
There is no functional change. This fixes the checks and warnings below:

CHECK: Blank lines aren't necessary after an open brace '{'
CHECK: spinlock_t definition without comment
CHECK: Please don't use multiple blank lines
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
CHECK: braces {} should be used on all arms of this statement
CHECK: Unbalanced braces around else statement
CHECK: Alignment should match open parenthesis
WARNING: Missing a blank line after declarations

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: ethernet: ti: davinci_emac: using pm_runtime_resume_and_get instead of pm_runtim...
Minghao Chi [Thu, 14 Apr 2022 09:08:00 +0000 (09:08 +0000)]
net: ethernet: ti: davinci_emac: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Use pm_runtime_resume_and_get() to replace pm_runtime_get_sync() and
pm_runtime_put_noidle(). This change just simplifies the code; there are
no actual functional changes.
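
Why the two calls collapse into one can be shown with a toy Python model
of the usage counter. This is a simulation, not kernel code: the Device
class and the -5 error value are assumptions, and the three functions
only mimic the reference-counting behaviour of their kernel namesakes.

```python
class Device:
    """Toy device: tracks a usage count and whether resume succeeds."""

    def __init__(self, resume_ok: bool):
        self.usage_count = 0
        self.resume_ok = resume_ok


def pm_runtime_get_sync(dev: Device) -> int:
    dev.usage_count += 1  # the count is bumped even when resume fails
    return 0 if dev.resume_ok else -5


def pm_runtime_put_noidle(dev: Device) -> None:
    dev.usage_count -= 1  # drop the reference without idling the device


def pm_runtime_resume_and_get(dev: Device) -> int:
    ret = pm_runtime_get_sync(dev)
    if ret < 0:
        pm_runtime_put_noidle(dev)  # undo the reference on failure
        return ret
    return 0
```

With the combined helper, an error path can simply return the error,
since no reference is left behind on failure.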

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago octeon_ep: Fix spelling mistake "inerrupts" -> "interrupts"
Colin Ian King [Thu, 14 Apr 2022 08:08:34 +0000 (09:08 +0100)]
octeon_ep: Fix spelling mistake "inerrupts" -> "interrupts"

There is a spelling mistake in a dev_info message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'mlxsw-line-card-prep'
David S. Miller [Fri, 15 Apr 2022 10:06:13 +0000 (11:06 +0100)]
Merge branch 'mlxsw-line-card-prep'

Ido Schimmel says:

====================
mlxsw: Preparations for line cards support

Currently, mlxsw registers thermal zones as well as hwmon entries for
objects such as transceiver modules and gearboxes. In upcoming modular
systems, these objects are no longer found on the main board (i.e., slot
0), but on pluggable line cards. This patchset prepares mlxsw for such
systems in terms of hwmon, thermal and cable access support.

Patches #1-#3 gradually prepare mlxsw for transceiver modules access
support for line cards by splitting some of the internal structures and
some APIs.

Patches #4-#5 gradually prepare mlxsw for hwmon support for line cards
by splitting some of the internal structures and augmenting them with a
slot index.

Patches #6-#7 do the same for thermal zones.

Patch #8 selects cooling device for binding to a thermal zone by exact
name match to prevent binding to non-relevant devices.

Patch #9 replaces internal define for thermal zone name length with a
common define.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: core_thermal: Use common define for thermal zone name length
Vadim Pasternak [Wed, 13 Apr 2022 15:17:33 +0000 (18:17 +0300)]
mlxsw: core_thermal: Use common define for thermal zone name length

Replace the internal define 'MLXSW_THERMAL_ZONE_MAX_NAME' with the
common 'THERMAL_NAME_LENGTH'.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: core_thermal: Use exact name of cooling devices for binding
Vadim Pasternak [Wed, 13 Apr 2022 15:17:32 +0000 (18:17 +0300)]
mlxsw: core_thermal: Use exact name of cooling devices for binding

Modular systems support additional cooling devices "mlxreg_fan1",
"mlxreg_fan2", etc. Thermal zones in the "mlxsw" driver should still be
bound to the same device as before, called "mlxreg_fan". Use an exact
match on the cooling device name to avoid binding to the new additional
cooling devices.
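
The difference an exact match makes can be illustrated as follows. The
sketch assumes the earlier comparison behaved like a prefix match, which
the message implies but does not state; the names are hypothetical.

```python
ALLOWED_CDEV = "mlxreg_fan"  # the cooling device the zones should bind to


def prefix_match(cdev_name: str) -> bool:
    """Assumed old behaviour: any name starting with the allowed name."""
    return cdev_name.startswith(ALLOWED_CDEV)


def exact_match(cdev_name: str) -> bool:
    """New behaviour: only the exact device name is accepted."""
    return cdev_name == ALLOWED_CDEV
```

A prefix comparison would also accept "mlxreg_fan1" and "mlxreg_fan2",
which is the unwanted binding the exact comparison rules out.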

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: core_thermal: Add line card id prefix to line card thermal zone name
Vadim Pasternak [Wed, 13 Apr 2022 15:17:31 +0000 (18:17 +0300)]
mlxsw: core_thermal: Add line card id prefix to line card thermal zone name

Add prefix "lc#n" to thermal zones associated with the thermal objects
found on line cards.

For example, the thermal zone for module #9 located on line card #7 will
have type:
mlxsw-lc7-module9.
And thermal zone for gearbox #3 located at line card #5 will have type:
mlxsw-lc5-gearbox3.
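
A small sketch of the naming scheme, reproducing the two examples above.
The function name is hypothetical, and the form used for main-board
zones (no "lc" prefix when the slot is 0) is an assumption, since the
message only shows line-card names.

```python
def thermal_zone_type(kind: str, index: int, slot: int = 0) -> str:
    """Build a thermal zone type string for a module or gearbox."""
    if slot == 0:
        # Assumed main-board form: no line card prefix.
        return f"mlxsw-{kind}{index}"
    return f"mlxsw-lc{slot}-{kind}{index}"
```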

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: core_thermal: Extend internal structures to support multi thermal areas
Vadim Pasternak [Wed, 13 Apr 2022 15:17:30 +0000 (18:17 +0300)]
mlxsw: core_thermal: Extend internal structures to support multi thermal areas

Introduce an intermediate level for thermal zone areas.
Currently all thermal zones are associated with thermal objects located
on the main board. Such objects are created during driver
initialization and removed during driver de-initialization.

For line cards in a modular system, the thermal zones are to be
associated with a specific line card. They should be created whenever a
new line card becomes available (inserted, validated, powered and
enabled) and removed when the line card becomes unavailable.
The thermal objects found on line card #n are accessed by setting the
slot index to #n, while to access objects found on the main board the
slot index should be set to the default value of zero.

Each thermal area contains the set of thermal zones associated with a
particular slot index.
Thus the introduction of thermal zone areas allows the same APIs to be
used for the main board and line cards, by adding a slot index argument.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: core_hwmon: Introduce slot parameter in hwmon interfaces
Vadim Pasternak [Wed, 13 Apr 2022 15:17:29 +0000 (18:17 +0300)]
mlxsw: core_hwmon: Introduce slot parameter in hwmon interfaces

Add 'slot' parameter to 'mlxsw_hwmon_dev' structure. Use this parameter
in mlxsw_reg_mtmp_pack(), mlxsw_reg_mtbr_pack(), mlxsw_reg_mgpir_pack()
and mlxsw_reg_mtmp_slot_index_set() routines.
For main board it'll always be zero, for line cards it'll be set to
the physical slot number in modular systems.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: core_hwmon: Extend internal structures to support multi hwmon objects
Vadim Pasternak [Wed, 13 Apr 2022 15:17:28 +0000 (18:17 +0300)]
mlxsw: core_hwmon: Extend internal structures to support multi hwmon objects

Currently, mlxsw supports a single hwmon device and registers it with
attributes corresponding to the various objects found on the main
board such as fans and gearboxes.

Line cards can have the same objects, but unlike the main board they
can be added and removed while the system is running. The various
hwmon objects found on these line cards should be created when the
line card becomes available and destroyed when the line card becomes
unavailable.

The above can be achieved by representing each line card as a
different hwmon device and registering / unregistering it when the
line card becomes available / unavailable.

Prepare for multi hwmon device support by splitting
'struct mlxsw_hwmon' into 'struct mlxsw_hwmon' and
'struct mlxsw_hwmon_dev'. The first will hold information relevant to
all hwmon devices, whereas the second will hold per-hwmon device
information.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: core: Move port module events enablement to a separate function
Vadim Pasternak [Wed, 13 Apr 2022 15:17:27 +0000 (18:17 +0300)]
mlxsw: core: Move port module events enablement to a separate function

Use a separate function for enablement of port module events such as
plug/unplug and temperature threshold crossing. The motivation is to
reuse the function for line cards.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: core: Extend port module data structures for line cards
Vadim Pasternak [Wed, 13 Apr 2022 15:17:26 +0000 (18:17 +0300)]
mlxsw: core: Extend port module data structures for line cards

The port module core is tasked with module operations such as setting
power mode policy and reset. The per-module information is currently
stored in one large array suited for non-modular systems where only the
main board is present (i.e., slot index 0).

As a preparation for line cards support, allocate a per line card array
according to the queried number of slots in the system. For each line
card, allocate a module array according to the queried maximum number of
modules per-slot.
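
The layout change can be pictured as moving from one flat array to a
table indexed by slot and then by module. The Python sketch below is an
assumption-laden illustration: the function name, the per-module record
and the choice to add one row for the main board (slot 0) on top of the
line card slots are all hypothetical.

```python
def alloc_module_table(num_line_card_slots: int, max_modules_per_slot: int):
    """Allocate per-slot module records; row 0 stands for the main board."""
    return [
        # A fresh dict per module so that rows never share state.
        [{"power_mode_policy": None} for _ in range(max_modules_per_slot)]
        for _ in range(num_line_card_slots + 1)
    ]
```

Module information is then looked up as table[slot_index][module], with
slot_index 0 keeping the behaviour of non-modular systems.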

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: core: Extend interfaces for cable info access with slot argument
Vadim Pasternak [Wed, 13 Apr 2022 15:17:25 +0000 (18:17 +0300)]
mlxsw: core: Extend interfaces for cable info access with slot argument

Extend all cable info APIs with a 'slot_index' argument.

For the main board, slot will always be set to zero and these APIs will
work as before. If cable information must be read from cages located
on line cards, slot should be set to the physical slot number where the
line card is located in modular systems.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: ethernet: ti: cpsw_priv: using pm_runtime_resume_and_get instead of pm_runtime_g...
Minghao Chi [Wed, 13 Apr 2022 09:38:36 +0000 (09:38 +0000)]
net: ethernet: ti: cpsw_priv: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Using pm_runtime_resume_and_get() is more appropriate
for simplifying the code.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: stmmac: stmmac_main: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Minghao Chi [Wed, 13 Apr 2022 09:38:01 +0000 (09:38 +0000)]
net: stmmac: stmmac_main: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Using pm_runtime_resume_and_get() is more appropriate
for simplifying the code.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: ethernet: ti: cpsw_new: use pm_runtime_resume_and_get() instead of pm_runtime_ge...
Minghao Chi [Wed, 13 Apr 2022 09:35:29 +0000 (09:35 +0000)]
net: ethernet: ti: cpsw_new: use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()

Using pm_runtime_resume_and_get() is more appropriate
for simplifying the code.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago geneve: avoid indirect calls in GRO path, when possible
Paolo Abeni [Wed, 13 Apr 2022 08:44:40 +0000 (10:44 +0200)]
geneve: avoid indirect calls in GRO path, when possible

In the most common setups, geneve tunnels use an inner
ethernet encapsulation. In the GRO path, when that is the case,
we can call the relevant GRO helper directly and avoid
a few indirect calls.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'mneta-page_pool_get_stats'
David S. Miller [Fri, 15 Apr 2022 09:43:48 +0000 (10:43 +0100)]
Merge branch 'mneta-page_pool_get_stats'

Lorenzo Bianconi says:

====================
net: mvneta: add support for page_pool_get_stats

Introduce page_pool stats ethtool APIs in order to avoid duplicated
code in drivers.

Changes since v4:
- rebase on top of net-next

Changes since v3:
- get rid of wrong for loop in page_pool_ethtool_stats_get()
- add API stubs when page_pool_stats are not compiled in

Changes since v2:
- remove enum list of page_pool stats in page_pool.h
- remove leftover change in mvneta.c for ethtool_stats array allocation

Changes since v1:
- move stats accounting to page_pool code
- move stats string management to page_pool code
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: mvneta: add support for page_pool_get_stats
Lorenzo Bianconi [Tue, 12 Apr 2022 16:31:59 +0000 (18:31 +0200)]
net: mvneta: add support for page_pool_get_stats

Introduce support for the page_pool stats API into the mvneta driver.
Report page_pool stats through ethtool.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: page_pool: introduce ethtool stats
Lorenzo Bianconi [Tue, 12 Apr 2022 16:31:58 +0000 (18:31 +0200)]
net: page_pool: introduce ethtool stats

Introduce page_pool APIs to report stats through ethtool and reduce
duplicated code in each driver.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Paolo Abeni [Fri, 15 Apr 2022 07:26:00 +0000 (09:26 +0200)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

3 years ago Merge tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 14 Apr 2022 18:58:19 +0000 (11:58 -0700)]
Merge tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from wireless and netfilter.

  Current release - regressions:

   - smc: fix af_ops of child socket pointing to released memory

   - wifi: ath9k: fix usage of driver-private space in tx_info

  Previous releases - regressions:

   - ipv6: fix panic when forwarding a pkt with no in6 dev

   - sctp: use the correct skb for security_sctp_assoc_request

   - smc: fix NULL pointer dereference in smc_pnet_find_ib()

   - sched: fix initialization order when updating chain 0 head

   - phy: don't defer probe forever if PHY IRQ provider is missing

   - dsa: revert "net: dsa: setup master before ports"

   - dsa: felix: fix tagging protocol changes with multiple CPU ports

   - eth: ice:
      - fix use-after-free when freeing @rx_cpu_rmap
      - revert "iavf: fix deadlock occurrence during resetting VF
        interface"

   - eth: lan966x: stop processing the MAC entry is port is wrong

  Previous releases - always broken:

   - sched:
      - flower: fix parsing of ethertype following VLAN header
      - taprio: check if socket flags are valid

   - nfc: add flush_workqueue to prevent uaf

   - veth: ensure eth header is in skb's linear part

   - eth: stmmac: fix altr_tse_pcs function when using a fixed-link

   - eth: macb: restart tx only if queue pointer is lagging

   - eth: macvlan: fix leaking skb in source mode with nodst option"

* tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
  rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
  net: dsa: felix: fix tagging protocol changes with multiple CPU ports
  tun: annotate access to queue->trans_start
  nfc: nci: add flush_workqueue to prevent uaf
  net: dsa: realtek: don't parse compatible string for RTL8366S
  net: dsa: realtek: fix Kconfig to assure consistent driver linkage
  net: ftgmac100: access hardware register after clock ready
  Revert "net: dsa: setup master before ports"
  macvlan: Fix leaking skb in source mode with nodst option
  netfilter: nf_tables: nft_parse_register can return a negative value
  net: lan966x: Stop processing the MAC entry is port is wrong.
  net: lan966x: Fix when a port's upper is changed.
  net: lan966x: Fix IGMP snooping when frames have vlan tag
  net: lan966x: Update lan966x_ptp_get_nominal_value
  sctp: Initialize daddr on peeled off socket
  net/smc: Fix af_ops of child socket pointing to released memory
  net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
  net/smc: use memcpy instead of snprintf to avoid out of bounds read
  net: macb: Restart tx only if queue pointer is lagging
  ...

3 years agoMerge tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 14 Apr 2022 18:08:12 +0000 (11:08 -0700)]
Merge tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "This became an unexpectedly large pull request due to various
  regression fixes in the previous kernels.

  The majority of fixes are a series of patches to address the
  regression at probe errors in devres'ed drivers, while there are yet
  more fixes for the x86 SG allocations and for USB-audio buffer
  management. In addition, a few HD-audio quirks and other small fixes
  are found"

* tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits)
  ALSA: usb-audio: Limit max buffer and period sizes per time
  ALSA: memalloc: Add fallback SG-buffer allocations for x86
  ALSA: nm256: Don't call card private_free at probe error path
  ALSA: mtpav: Don't call card private_free at probe error path
  ALSA: rme9652: Fix the missing snd_card_free() call at probe error
  ALSA: hdspm: Fix the missing snd_card_free() call at probe error
  ALSA: hdsp: Fix the missing snd_card_free() call at probe error
  ALSA: oxygen: Fix the missing snd_card_free() call at probe error
  ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
  ALSA: cmipci: Fix the missing snd_card_free() call at probe error
  ALSA: aw2: Fix the missing snd_card_free() call at probe error
  ALSA: als300: Fix the missing snd_card_free() call at probe error
  ALSA: lola: Fix the missing snd_card_free() call at probe error
  ALSA: bt87x: Fix the missing snd_card_free() call at probe error
  ALSA: sis7019: Fix the missing error handling
  ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
  ALSA: via82xx: Fix the missing snd_card_free() call at probe error
  ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
  ALSA: rme96: Fix the missing snd_card_free() call at probe error
  ALSA: rme32: Fix the missing snd_card_free() call at probe error
  ...

3 years agoMerge tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 14 Apr 2022 17:58:27 +0000 (10:58 -0700)]
Merge tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more code and warning fixes.

  There's one feature ioctl removal patch slated for 5.18 that did not
  make it to the main pull request. It's just a one-liner and the ioctl
  has a v2 that's in use for a long time, no point to postpone it to
  5.19.

  Late update:

   - remove balance v1 ioctl, superseded by v2 in 2012

  Fixes:

   - add back cgroup attribution for compressed writes

   - add super block write start/end annotations to asynchronous balance

   - fix root reference count on an error handling path

   - in zoned mode, activate zone at the chunk allocation time to avoid
     ENOSPC due to timing issues

   - fix delayed allocation accounting for direct IO

  Warning fixes:

   - simplify assertion condition in zoned check

   - remove an unused variable"

* tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix btrfs_submit_compressed_write cgroup attribution
  btrfs: fix root ref counts in error handling in btrfs_get_root_ref
  btrfs: zoned: activate block group only for extent allocation
  btrfs: return allocated block group from do_chunk_alloc()
  btrfs: mark resumed async balance as writing
  btrfs: remove support of balance v1 ioctl
  btrfs: release correct delalloc amount in direct IO write path
  btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
  btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range

3 years agoMerge tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 14 Apr 2022 17:51:20 +0000 (10:51 -0700)]
Merge tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull fscache fixes from David Howells:
 "Here's a collection of fscache and cachefiles fixes and misc small
  cleanups. The two main fixes are:

   - Add a missing unmark of the inode in-use mark in an error path.

   - Fix a KASAN slab-out-of-bounds error when setting the xattr on a
     cachefiles volume due to the wrong length being given to memcpy().

  In addition, there's the removal of an unused parameter, removal of an
  unused Kconfig option, conditionalising a bit of procfs-related stuff
  and some doc fixes"

* tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  fscache: remove FSCACHE_OLD_API Kconfig option
  fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
  fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
  fscache: Remove the cookie parameter from fscache_clear_page_bits()
  docs: filesystems: caching/backend-api.rst: fix an object withdrawn API
  docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use
  cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
  cachefiles: unmark inode in use in error path

3 years agoMerge branch 'rndis_host-handle-bogus-mac-addresses-in-zte-rndis-devices'
Paolo Abeni [Thu, 14 Apr 2022 13:08:16 +0000 (15:08 +0200)]
Merge branch 'rndis_host-handle-bogus-mac-addresses-in-zte-rndis-devices'

Lech Perczak says:

====================
rndis_host: handle bogus MAC addresses in ZTE RNDIS devices

When porting support of ZTE MF286R to OpenWrt [1], it was discovered
that its built-in LTE modem fails to adjust its target MAC address
when a random MAC address is assigned to the interface, due to detection of
the "locally-administered address" bit. This leads to dropping of ingress
traffic at the host. The modem uses RNDIS as its primary interface,
with some variants exposing both RNDIS and CDC-ECM simultaneously.

Then it was discovered that the cdc_ether driver contains a fixup for that
exact issue, which also appears on CDC ECM interfaces [2].
I discussed how to proceed with Bjørn Mork at the OpenWrt forum [3].
The first approach would be to trust the locally-administered MAC
again and add a quirk for the problematic ZTE devices, as suggested by
Kristian Evensen before [4], but reusing the fixup from cdc_ether looks
like a safer and more generic solution.

Finally, according to Bjørn's suggestion, limit the scope of bogus MAC
address detection to ZTE devices, the same way as it is done in cdc_ether,
as this trait wasn't really observed outside of ZTE devices.
Do that for both flavours of RNDIS devices, with interface classes
02/02/ff and e0/01/03, as both types are reported by different modems.

[1] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=7ac8da00609f42b8aba74b7efc6b0d055b7cef3e
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfe9b9d2df669a57a95d641ed46eb018e204c6ce
[3] https://forum.openwrt.org/t/problem-with-modem-in-zte-mf286r/120988
[4] https://lore.kernel.org/all/CAKfDRXhDp3heiD75Lat7cr1JmY-kaJ-MS0tt7QXX=s8RFjbpUQ@mail.gmail.com/T/

Cc: Bjørn Mork <bjorn@mork.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
v3: Fixed wrong identifier commit description and whitespace in patch 2.

v2: ensure that MAC fixup is applied to all Ethernet frames in RNDIS
batch, by introducing a driver flag, and integrating the fixup inside
rndis_rx_fixup().
====================

Link: https://lore.kernel.org/r/20220413014416.2306843-1-lech.perczak@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agorndis_host: limit scope of bogus MAC address detection to ZTE devices
Lech Perczak [Wed, 13 Apr 2022 01:44:16 +0000 (03:44 +0200)]
rndis_host: limit scope of bogus MAC address detection to ZTE devices

Reporting of bogus MAC addresses and ignoring configuration of new
destination address wasn't observed outside of a range of ZTE devices,
among which this seems to be the common bug. Align rndis_host driver
with implementation found in cdc_ether, which also limits this workaround
to ZTE devices.

Suggested-by: Bjørn Mork <bjorn@mork.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agorndis_host: enable the bogus MAC fixup for ZTE devices from cdc_ether
Lech Perczak [Wed, 13 Apr 2022 01:44:15 +0000 (03:44 +0200)]
rndis_host: enable the bogus MAC fixup for ZTE devices from cdc_ether

Certain ZTE modems, namely MF823, MF831, MF910 and the built-in modem from
MF286R, expose both CDC-ECM and RNDIS network interfaces.
They have a trait of ignoring the locally-administered MAC address
configured on the interface both in CDC-ECM and RNDIS part,
and this leads to dropping of incoming traffic by the host.
However, the workaround was only present in CDC-ECM, and MF286R
explicitly requires it in RNDIS mode.

Re-use the workaround in rndis_host as well, to fix operation of MF286R
module, some versions of which expose only the RNDIS interface. Do so by
introducing new flag, RNDIS_DRIVER_DATA_DST_MAC_FIXUP, and testing for it
in rndis_rx_fixup. This is required, as RNDIS uses frame batching, and all
of the packets inside the batch need the fixup. This might introduce a
performance penalty, because test is done for every returned Ethernet
frame.
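
The per-frame check described above can be sketched in userspace C. This
is only an illustration, not the driver code: fixup_dst_mac() is a
hypothetical name, and the sketch assumes the fixup keys on the
locally-administered bit (0x02 in the first octet) of the destination
address and rewrites it to the address configured on the host interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6   /* bytes in a MAC address */
#define ETH_HLEN 14  /* dst MAC + src MAC + EtherType */

/*
 * Hypothetical userspace model of the destination-MAC fixup. If the
 * frame's destination address has the locally-administered bit set,
 * replace it with the address configured on the host interface.
 * Returns 1 if the frame was rewritten, 0 otherwise.
 */
int fixup_dst_mac(uint8_t *frame, size_t len, const uint8_t host_addr[ETH_ALEN])
{
	assert(frame != NULL && host_addr != NULL);

	/* Too short for an Ethernet header, or dst is a universal address. */
	if (len < ETH_HLEN || !(frame[0] & 0x02))
		return 0;

	memcpy(frame, host_addr, ETH_ALEN);
	return 1;
}
```

Because RNDIS batches frames, a caller would run this once per frame in
the batch, which is where the per-frame cost mentioned above comes from.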

Apply the workaround to both "flavors" of RNDIS interfaces, as older ZTE
modems, like MF823 found in the wild, report the USB_CLASS_COMM class
interfaces, while MF286R reports USB_CLASS_WIRELESS_CONTROLLER.

Suggested-by: Bjørn Mork <bjorn@mork.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agocdc_ether: export usbnet_cdc_zte_rx_fixup
Lech Perczak [Wed, 13 Apr 2022 01:44:14 +0000 (03:44 +0200)]
cdc_ether: export usbnet_cdc_zte_rx_fixup

Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
introduces a workaround for certain ZTE modems reporting invalid MAC
addresses over CDC-ECM.
The same issue was present on their RNDIS interface, which was fixed in
commit a5a18bdf7453 ("rndis_host: Set valid random MAC on buggy devices").

However, internal modem of ZTE MF286R router, on its RNDIS interface, also
exhibits a second issue fixed already in CDC-ECM, of the device not
respecting configured random MAC address. In order to share the fixup for
this with rndis_host driver, export the workaround function, which will
be re-used in the following commit in rndis_host.

Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Bjørn Mork <bjorn@mork.no>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
Jeremy Linton [Tue, 12 Apr 2022 21:04:20 +0000 (16:04 -0500)]
net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"

It turns out after digging deeper into this bug, that it was being
triggered by GCC12 failing to call the bcmgenet_enable_dma()
routine. Given that a gcc12 fix has been merged [1] and the genet
driver now works properly when built with gcc12, this commit should
be reverted.

[1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aabb9a261ef060cf24fd626713f1d7d9df81aa57

Fixes: 8d3ea3d402db ("net: bcmgenet: Use stronger register read/writes to assure ordering")
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220412210420.1129430-1-jeremy.linton@arm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agortnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
Petr Machata [Tue, 12 Apr 2022 20:25:06 +0000 (22:25 +0200)]
rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies

When L3 stats are disabled, rtnl_offload_xstats_get_size_stats() returns
size of 0, which is supposed to be an indication that the corresponding
attribute should not be emitted. However, instead, the current code
reserves a 0-byte attribute.

The reason this does not trigger a report on a KASAN kernel is that
netdev_offload_xstats_get(), which is supposed to fill in the data, never
ends up getting called, because rtnl_offload_xstats_get_stats() notices
that the stats are not actually used and skips the call.

Thus a zero-length IFLA_OFFLOAD_XSTATS_L3_STATS attribute ends up in a
response, confusing the userspace.

Fix by skipping the L3-stats related block in rtnl_offload_xstats_fill().
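
The "size 0 means do not emit" rule can be illustrated with a small
userspace sketch. The attribute layout and put_attr_if_used() are
hypothetical and much simpler than netlink; the point is only that a
zero size skips the attribute instead of reserving an empty one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte attribute header: payload length, then type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

/*
 * Append an attribute to buf unless its payload size is 0.
 * Returns the new write offset, or -1 if the attribute does not fit.
 */
long put_attr_if_used(uint8_t *buf, size_t cap, size_t off,
		      uint16_t type, const void *payload, size_t size)
{
	struct attr_hdr hdr = { .len = (uint16_t)size, .type = type };

	assert(buf != NULL);

	/* A size of 0 means "stats disabled": emit nothing at all. */
	if (size == 0)
		return (long)off;

	if (off > cap || size > UINT16_MAX || sizeof(hdr) + size > cap - off)
		return -1;

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), payload, size);
	return (long)(off + sizeof(hdr) + size);
}
```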

Fixes: 0e7788fd7622 ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/591b58e7623edc3eb66dd1fcfa8c8f133d090974.1649794741.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: felix: fix tagging protocol changes with multiple CPU ports
Vladimir Oltean [Tue, 12 Apr 2022 17:22:09 +0000 (20:22 +0300)]
net: dsa: felix: fix tagging protocol changes with multiple CPU ports

When the device tree has 2 CPU ports defined, a single one is active
(has any dp->cpu_dp pointers point to it). Yet the second one is still a
CPU port, and DSA still calls ->change_tag_protocol on it.

On the NXP LS1028A, the CPU ports are ports 4 and 5. Port 4 is the
active CPU port and port 5 is inactive.

After the following commands:

 # Initial setting
 cat /sys/class/net/eno2/dsa/tagging
 ocelot
 echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
 echo ocelot > /sys/class/net/eno2/dsa/tagging

traffic is now broken, because the driver has moved the NPI port from
port 4 to port 5, unbeknown to DSA.

The problem can be avoided by detecting that the second CPU port is
unused, and not doing anything for it. Further rework will be needed
when proper support for multiple CPU ports is added.

Treat this as a bug and prepare current kernels to work in single-CPU
mode with multiple-CPU DT blobs.
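
A rough userspace model of "detect that the second CPU port is unused
and do nothing for it" follows. The struct and function names are
hypothetical simplifications, not the DSA or felix API; "active" is
taken to mean that at least one user port points at the CPU port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum port_type { PORT_UNUSED, PORT_USER, PORT_CPU };

/* Hypothetical, simplified stand-in for a DSA port. */
struct port {
	enum port_type type;
	int cpu_port;	/* for user ports: index of the CPU port they use */
};

/* A CPU port is active only if at least one user port points to it. */
bool cpu_port_is_active(const struct port *ports, size_t n, int cpu)
{
	for (size_t i = 0; i < n; i++)
		if (ports[i].type == PORT_USER && ports[i].cpu_port == cpu)
			return true;
	return false;
}

/*
 * Apply a tagging protocol change. Inactive CPU ports are skipped, so
 * the NPI port (tracked in *npi) never moves to a port nobody uses.
 * Returns the number of CPU ports that were reconfigured.
 */
int change_tag_protocol(const struct port *ports, size_t n, int *npi)
{
	int changed = 0;

	assert(ports != NULL && npi != NULL);
	for (size_t i = 0; i < n; i++) {
		if (ports[i].type != PORT_CPU)
			continue;
		if (!cpu_port_is_active(ports, n, (int)i))
			continue;	/* second, unused CPU port: do nothing */
		*npi = (int)i;
		changed++;
	}
	return changed;
}
```

With the LS1028A layout from above (user ports pointing at port 4, port
5 unused), only port 4 is reconfigured and the NPI port stays on it.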

Fixes: adb3dccf090b ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220412172209.2531865-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agotun: annotate access to queue->trans_start
Antoine Tenart [Tue, 12 Apr 2022 13:58:52 +0000 (15:58 +0200)]
tun: annotate access to queue->trans_start

Commit 5337824f4dc4 ("net: annotate accesses to queue->trans_start")
introduced a new helper, txq_trans_cond_update, to update
queue->trans_start using WRITE_ONCE. One snippet in drivers/net/tun.c
was missed, as it was introduced roughly at the same time.

Fixes: 5337824f4dc4 ("net: annotate accesses to queue->trans_start")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220412135852.466386-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonfc: nci: add flush_workqueue to prevent uaf
Lin Ma [Tue, 12 Apr 2022 16:04:30 +0000 (00:04 +0800)]
nfc: nci: add flush_workqueue to prevent uaf

Our detector found a concurrent use-after-free bug when detaching an
NCI device. The main reason for this bug is the unexpected scheduling
between the two deferred-execution mechanisms in use (timer and workqueue).

The race can be demonstrated below:

Thread-1                           Thread-2
                                 | nci_dev_up()
                                 |   nci_open_device()
                                 |     __nci_request(nci_reset_req)
                                 |       nci_send_cmd
                                 |         queue_work(cmd_work)
nci_unregister_device()          |
  nci_close_device()             | ...
    del_timer_sync(cmd_timer)[1] |
...                              | Worker
nci_free_device()                | nci_cmd_work()
  kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]

In short, the cleanup routine assumes that cmd_timer has already been
detached by [1], but mod_timer can re-attach the timer [2] even though
the device has already been released [3], resulting in a UAF.

This UAF is easy to trigger. The crash trace produced by the POC is shown below:

[   66.703713] ==================================================================
[   66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
[   66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
[   66.703974]
[   66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
[   66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
[   66.703974] Call Trace:
[   66.703974]  <TASK>
[   66.703974]  dump_stack_lvl+0x57/0x7d
[   66.703974]  print_report.cold+0x5e/0x5db
[   66.703974]  ? enqueue_timer+0x448/0x490
[   66.703974]  kasan_report+0xbe/0x1c0
[   66.703974]  ? enqueue_timer+0x448/0x490
[   66.703974]  enqueue_timer+0x448/0x490
[   66.703974]  __mod_timer+0x5e6/0xb80
[   66.703974]  ? mark_held_locks+0x9e/0xe0
[   66.703974]  ? try_to_del_timer_sync+0xf0/0xf0
[   66.703974]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
[   66.703974]  ? queue_work_on+0x61/0x80
[   66.703974]  ? lockdep_hardirqs_on+0xbf/0x130
[   66.703974]  process_one_work+0x8bb/0x1510
[   66.703974]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   66.703974]  ? pwq_dec_nr_in_flight+0x230/0x230
[   66.703974]  ? rwlock_bug.part.0+0x90/0x90
[   66.703974]  ? _raw_spin_lock_irq+0x41/0x50
[   66.703974]  worker_thread+0x575/0x1190
[   66.703974]  ? process_one_work+0x1510/0x1510
[   66.703974]  kthread+0x2a0/0x340
[   66.703974]  ? kthread_complete_and_exit+0x20/0x20
[   66.703974]  ret_from_fork+0x22/0x30
[   66.703974]  </TASK>
[   66.703974]
[   66.703974] Allocated by task 267:
[   66.703974]  kasan_save_stack+0x1e/0x40
[   66.703974]  __kasan_kmalloc+0x81/0xa0
[   66.703974]  nci_allocate_device+0xd3/0x390
[   66.703974]  nfcmrvl_nci_register_dev+0x183/0x2c0
[   66.703974]  nfcmrvl_nci_uart_open+0xf2/0x1dd
[   66.703974]  nci_uart_tty_ioctl+0x2c3/0x4a0
[   66.703974]  tty_ioctl+0x764/0x1310
[   66.703974]  __x64_sys_ioctl+0x122/0x190
[   66.703974]  do_syscall_64+0x3b/0x90
[   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   66.703974]
[   66.703974] Freed by task 406:
[   66.703974]  kasan_save_stack+0x1e/0x40
[   66.703974]  kasan_set_track+0x21/0x30
[   66.703974]  kasan_set_free_info+0x20/0x30
[   66.703974]  __kasan_slab_free+0x108/0x170
[   66.703974]  kfree+0xb0/0x330
[   66.703974]  nfcmrvl_nci_unregister_dev+0x90/0xd0
[   66.703974]  nci_uart_tty_close+0xdf/0x180
[   66.703974]  tty_ldisc_kill+0x73/0x110
[   66.703974]  tty_ldisc_hangup+0x281/0x5b0
[   66.703974]  __tty_hangup.part.0+0x431/0x890
[   66.703974]  tty_release+0x3a8/0xc80
[   66.703974]  __fput+0x1f0/0x8c0
[   66.703974]  task_work_run+0xc9/0x170
[   66.703974]  exit_to_user_mode_prepare+0x194/0x1a0
[   66.703974]  syscall_exit_to_user_mode+0x19/0x50
[   66.703974]  do_syscall_64+0x48/0x90
[   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae

To fix the UAF, this patch adds flush_workqueue() to ensure that
nci_cmd_work has finished before the following del_timer_sync().
This combination guarantees that the timer is actually detached.
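
The ordering argument can be checked with a deterministic,
single-threaded model. Everything below is hypothetical (no real timers
or workqueues are involved); the flags simply track whether the worker
arms the timer before or after the device is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of the device state in the race. */
struct dev_model {
	bool work_pending;	/* cmd_work is queued but has not run yet */
	bool timer_armed;	/* cmd_timer is pending */
	bool freed;		/* the device structure has been released */
	bool uaf;		/* set if the timer was armed after the free */
};

/* The worker body: sending a command re-arms the command timer. */
void run_cmd_work(struct dev_model *d)
{
	assert(d != NULL);
	if (!d->work_pending)
		return;
	d->work_pending = false;
	if (d->freed)
		d->uaf = true;	/* models mod_timer() on freed memory */
	d->timer_armed = true;
}

/*
 * Teardown sequence; with_flush selects the fixed ordering. The worker
 * runs "late" (after teardown) unless the flush forced it to run first.
 */
bool teardown_hits_uaf(bool with_flush)
{
	struct dev_model d = { .work_pending = true };

	if (with_flush)
		run_cmd_work(&d);	/* flush: worker finishes, may arm timer */
	d.timer_armed = false;		/* del_timer_sync() */
	d.freed = true;			/* kfree() */
	run_cmd_work(&d);		/* late worker, no-op if already flushed */
	return d.uaf;
}
```

In this model teardown_hits_uaf(false) reports the bug and
teardown_hits_uaf(true) does not, matching steps [1] to [3] above.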

Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: realtek: don't parse compatible string for RTL8366S
Alvin Šipraga [Tue, 12 Apr 2022 15:57:49 +0000 (17:57 +0200)]
net: dsa: realtek: don't parse compatible string for RTL8366S

This switch is not even supported, but if someone were to actually put
this compatible string "realtek,rtl8366s" in their device tree, they
would be greeted with a kernel panic because the probe function would
dereference NULL. So let's just remove it.

Link: https://lore.kernel.org/all/CACRpkdYdKZs0WExXc3=0yPNOwP+oOV60HRz7SRoGjZvYHaT=1g@mail.gmail.com/
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: realtek: fix Kconfig to assure consistent driver linkage
Alvin Šipraga [Tue, 12 Apr 2022 15:55:27 +0000 (17:55 +0200)]
net: dsa: realtek: fix Kconfig to assure consistent driver linkage

The kernel test robot reported a build failure:

or1k-linux-ld: drivers/net/dsa/realtek/realtek-smi.o:(.rodata+0x16c): undefined reference to `rtl8366rb_variant'

... with the following build configuration:

CONFIG_NET_DSA_REALTEK=y
CONFIG_NET_DSA_REALTEK_SMI=y
CONFIG_NET_DSA_REALTEK_RTL8365MB=y
CONFIG_NET_DSA_REALTEK_RTL8366RB=m

The problem here is that the realtek-smi interface driver gets built-in,
while the rtl8366rb switch subdriver gets built as a module, hence the
symbol rtl8366rb_variant is not reachable when defining the OF device
table in the interface driver.

The Kconfig dependencies don't help in this scenario because they just
say that the subdriver(s) depend on at least one interface driver. In
fact, the subdrivers don't depend on the interface drivers at all, and
can be built even in their absence. Somewhat strangely, the
interface drivers can also be built in the absence of any subdriver,
BUT, if a subdriver IS enabled, then it must be reachable according to
the linkage of the interface driver: effectively what the IS_REACHABLE()
macro achieves. If it is not reachable, the above kind of linker error
will be observed.

Rather than papering over the above build error by simply using
IS_REACHABLE(), we can do a little better and admit that it is actually
the interface drivers that have a dependency on the subdrivers. So this
patch does exactly that. Specifically, we ensure that:

1. The interface drivers' Kconfig symbols must have a value no greater
   than the value of any subdriver Kconfig symbols.

2. The subdrivers should by default enable both interface drivers, since
   most users probably want at least one of them; those interface
   drivers can be explicitly disabled however.

What this doesn't do is prevent a user from building only a subdriver,
without any interface driver. To that end, add an additional line of
help in the menu to guide users in the right direction.
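
Constraint 1 can be modelled with Kconfig's tristate arithmetic
(n < m < y, "!x" is 2 - x, "||" is max, several "depends on" lines
combine as min). The sketch below assumes the constraint is written
with the common "depends on SUB || !SUB" idiom; the function names are
hypothetical and this is C, not Kconfig.

```c
#include <assert.h>

/* Kconfig tristate values: n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig expression semantics: !x is 2 - x, || is max. */
enum tristate tri_not(enum tristate x) { return (enum tristate)(2 - x); }
enum tristate tri_or(enum tristate a, enum tristate b) { return a > b ? a : b; }

/*
 * Upper bound that "depends on SUB || !SUB" places on an interface
 * driver: y when SUB is disabled or built in, m when SUB is a module.
 */
enum tristate iface_limit(enum tristate sub)
{
	return tri_or(sub, tri_not(sub));
}

/* Highest value the interface driver may take given two subdrivers. */
enum tristate iface_max(enum tristate sub_a, enum tristate sub_b)
{
	enum tristate la = iface_limit(sub_a), lb = iface_limit(sub_b);

	assert(la >= TRI_M && lb >= TRI_M);	/* X || !X is never n */
	return la < lb ? la : lb;	/* separate "depends on" lines are ANDed */
}
```

For the failing configuration above (RTL8365MB=y, RTL8366RB=m),
iface_max() yields m, so the interface driver can no longer be built in
while one of its enabled subdrivers is a module.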

Link: https://lore.kernel.org/all/202204110757.XIafvVnj-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Fixes: aac94001067d ("net: dsa: realtek: add new mdio interface for drivers")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonfp: update nfp_X logging definitions
Dylan Muller [Tue, 12 Apr 2022 15:26:00 +0000 (17:26 +0200)]
nfp: update nfp_X logging definitions

Previously it was not possible to determine which code path was responsible
for generating a certain message after a call to the nfp_X messaging
definitions for cases of duplicate strings. We therefore modify nfp_err,
nfp_warn, nfp_info, nfp_dbg and nfp_printk to print the corresponding file
and line number where the nfp_X definition is used.
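
As a userspace illustration of the idea (not the nfp macros
themselves), a logging wrapper can capture the call site with __FILE__
and __LINE__. log_at() and log_at_impl() are hypothetical names; the
message is formatted into a caller-supplied buffer so it is easy to
inspect.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Format "file:line: message" into buf. Returns the length written
 * (as snprintf would report it) or a negative value on error.
 */
int log_at_impl(char *buf, size_t size, const char *file, int line,
		const char *fmt, ...)
{
	va_list ap;
	int n, m;

	assert(buf != NULL);
	n = snprintf(buf, size, "%s:%d: ", file, line);
	if (n < 0 || (size_t)n >= size)
		return n;	/* error, or no room left for the message */

	va_start(ap, fmt);
	m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}

/* The macro records where it was expanded, so identical strings differ. */
#define log_at(buf, size, ...) \
	log_at_impl((buf), (size), __FILE__, __LINE__, __VA_ARGS__)
```

Two calls with the same format string on different lines now produce
distinguishable output, which is the property the patch is after.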

Signed-off-by: Dylan Muller <dylan.muller@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'wireless-2022-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Wed, 13 Apr 2022 12:13:34 +0000 (13:13 +0100)]
Merge tag 'wireless-2022-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v5.18

First set of fixes for v5.18. Maintainers file updates, two
compilation warning fixes, one revert for ath11k and smaller fixes to
drivers and stack. All the usual stuff.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ip-ingress-skb-reason'
David S. Miller [Wed, 13 Apr 2022 12:09:57 +0000 (13:09 +0100)]
Merge branch 'ip-ingress-skb-reason'

Menglong Dong says:

====================
net: ip: add skb drop reasons to ip ingress

In the series "net: use kfree_skb_reason() for ip/udp packet receive",
skb drop reasons are added to the basic ingress path of IPv4. And in
the series "net: use kfree_skb_reason() for ip/neighbour", the egress
paths of IPv4 and IPv6 are handled. Related links:

https://lore.kernel.org/netdev/20220205074739.543606-1-imagedong@tencent.com/
https://lore.kernel.org/netdev/20220226041831.2058437-1-imagedong@tencent.com/

It seems we still have a lot of work to do in the IP layer, including the
IPv6 basic ingress path, IPv4/IPv6 forwarding, IPv6 exthdrs, fragment and
defrag, etc.

In this series, skb drop reasons are added to the basic ingress path of
the IPv6 protocol and to IPv4/IPv6 packet forwarding. The following
functions, which are used for IPv6 packet receiving, are handled:

  ip6_pkt_drop()
  ip6_rcv_core()
  ip6_protocol_deliver_rcu()

And the following functions, which are used for IPv6 TLV parsing, are handled:

  ip6_parse_tlv()
  ipv6_hop_ra()
  ipv6_hop_ioam()
  ipv6_hop_jumbo()
  ipv6_hop_calipso()
  ipv6_dest_hao()

Besides, ip_forward() and ip6_forward(), which are used for IPv4/IPv6
forwarding, are also handled. The following new drop reasons are added:

  /* host unreachable, corresponding to IPSTATS_MIB_INADDRERRORS */
  SKB_DROP_REASON_IP_INADDRERRORS
  /* network unreachable, corresponding to IPSTATS_MIB_INNOROUTES */
  SKB_DROP_REASON_IP_INNOROUTES
  /* packet size is too big, corresponding to
   * IPSTATS_MIB_INTOOBIGERRORS
   */
  SKB_DROP_REASON_PKT_TOO_BIG

In order to simplify the definition and assignment for
'enum skb_drop_reason', some helper functions are introduced in the 1st
patch. I'm not sure if this is necessary, but it makes the code simpler.
For example, we can replace the code:

  if (reason == SKB_DROP_REASON_NOT_SPECIFIED)
          reason = SKB_DROP_REASON_IP_INHDR;

with:

  SKB_DR_OR(reason, IP_INHDR);
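
A self-contained sketch of how such helpers could be defined is shown
below. The macro bodies are an assumption reconstructed from the
before/after snippet above, the enum is a tiny illustrative subset, and
classify() is a hypothetical caller; the kernel's definitions may differ.

```c
#include <assert.h>

/* Tiny subset of the enum, for illustration only. */
enum skb_drop_reason {
	SKB_DROP_REASON_NOT_SPECIFIED,
	SKB_DROP_REASON_IP_INHDR,
	SKB_DROP_REASON_PKT_TOO_BIG,
};

/* Declare a reason variable initialised to NOT_SPECIFIED. */
#define SKB_DR(name) \
	enum skb_drop_reason name = SKB_DROP_REASON_NOT_SPECIFIED

/* Set the reason only if no more specific one was recorded earlier. */
#define SKB_DR_OR(name, reason)					\
	do {							\
		if ((name) == SKB_DROP_REASON_NOT_SPECIFIED)	\
			(name) = SKB_DROP_REASON_##reason;	\
	} while (0)

/* Hypothetical caller: a specific reason wins over the fallback. */
enum skb_drop_reason classify(int too_big)
{
	SKB_DR(reason);

	assert(too_big == 0 || too_big == 1);
	if (too_big)
		reason = SKB_DROP_REASON_PKT_TOO_BIG;
	SKB_DR_OR(reason, IP_INHDR);	/* fallback only */
	return reason;
}
```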

In the 6th patch, the statistics for skb in ipv6_hop_jumbo() are removed,
as I think they are redundant. There are two call chains for
ipv6_hop_jumbo(). The first one is:

  ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()

On this call chain, the drop statistics will be done in
ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo()
returns false.

The second call chain is:

  ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()

And the drop statistics will also be done in ip6_rcv_core() with
'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false.

Therefore, the statistics in ipv6_hop_jumbo() are redundant: the drop
is counted twice. The counters updated in ipv6_hop_jumbo() are almost
the same as those updated by the callers, except for
'IPSTATS_MIB_INTRUNCATEDPKTS', which it seems we have to ignore.

======================================================================

Here is a basic test of an IPv6 forwarding packet drop, monitored with
the 'dropwatch' tool:

  drop at: ip6_forward+0x81a/0xb70 (0xffffffff86c73f8a)
  origin: software
  input port ifindex: 7
  timestamp: Wed Apr 13 11:51:06 2022 130010176 nsec
  protocol: 0x86dd
  length: 94
  original length: 94
  drop reason: IP_INADDRERRORS

The root cause in this case is that IPv6 does not allow forwarding a
packet with a link-local saddr, which results in the 'IP_INADDRERRORS'
drop reason.
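
For reference, a link-local source address is one inside fe80::/10. The
check can be sketched in a few lines of userspace C; both function
names are hypothetical and may_forward() covers only this one rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the 16-byte IPv6 address lies in fe80::/10 (link-local). */
bool ipv6_is_link_local(const uint8_t addr[16])
{
	assert(addr != NULL);
	return addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80;
}

/* Simplified forwarding decision: link-local sources are not forwarded. */
bool may_forward(const uint8_t saddr[16])
{
	return !ipv6_is_link_local(saddr);
}
```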
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipv6: add skb drop reasons to ip6_protocol_deliver_rcu()
Menglong Dong [Wed, 13 Apr 2022 08:16:00 +0000 (16:16 +0800)]
net: ipv6: add skb drop reasons to ip6_protocol_deliver_rcu()

Replace kfree_skb() used in ip6_protocol_deliver_rcu() with
kfree_skb_reason().

No new reasons are added.

Some paths are ignored, as they are not common, such as encapsulation
on non-final protocol.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipv6: add skb drop reasons to ip6_rcv_core()
Menglong Dong [Wed, 13 Apr 2022 08:15:59 +0000 (16:15 +0800)]
net: ipv6: add skb drop reasons to ip6_rcv_core()

Replace kfree_skb() used in ip6_rcv_core() with kfree_skb_reason().
No new drop reasons are added.

It seems we now use 'SKB_DROP_REASON_IP_INHDR' for too many cases during
ipv6 header parsing or checking, just like 'IPSTATS_MIB_INHDRERRORS'
does. Will it be too general, making it hard to know what happened?

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipv6: add skb drop reasons to TLV parse
Menglong Dong [Wed, 13 Apr 2022 08:15:58 +0000 (16:15 +0800)]
net: ipv6: add skb drop reasons to TLV parse

Replace kfree_skb() used in TLV encoded option header parsing with
kfree_skb_reason(). Following functions are involved:

ip6_parse_tlv()
ipv6_hop_ra()
ipv6_hop_ioam()
ipv6_hop_jumbo()
ipv6_hop_calipso()
ipv6_dest_hao()

Most skb drops during this process are regarded as 'InHdrErrors',
as 'IPSTATS_MIB_INHDRERRORS' is used when ip6_parse_tlv() fails,
so we use 'SKB_DROP_REASON_IP_INHDR' correspondingly.

However, 'IP_INHDR' is a relatively general reason. Therefore, we
can use other reasons with higher priority in some cases. For example,
'SKB_DROP_REASON_UNHANDLED_PROTO' is used for unknown TLV options.
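
The reason selection can be sketched as a minimal TLV walker. This is a
simplified userspace model, not ip6_parse_tlv(): the enum names are
hypothetical, no real option is handled (every non-padding type is
treated as unknown), and the two high-order bits of an unknown option
type decide whether it is skipped (00) or the packet is discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum drop_reason {		/* hypothetical, mirrors names in the text */
	REASON_NONE,		/* options parsed, packet accepted */
	REASON_IP_INHDR,	/* malformed option area */
	REASON_UNHANDLED_PROTO,	/* unknown option that must not be skipped */
};

#define OPT_PAD1 0	/* single padding byte, no length field */
#define OPT_PADN 1	/* padding with a length field */

/* Walk a TLV option area and report why the packet would be dropped. */
enum drop_reason parse_tlv(const uint8_t *opt, size_t len)
{
	size_t off = 0;

	assert(opt != NULL || len == 0);
	while (off < len) {
		uint8_t type = opt[off];
		size_t optlen;

		if (type == OPT_PAD1) {
			off++;
			continue;
		}
		if (len - off < 2)
			return REASON_IP_INHDR;	/* no room for a length byte */
		optlen = (size_t)opt[off + 1] + 2;
		if (optlen > len - off)
			return REASON_IP_INHDR;	/* option overruns the area */
		if (type != OPT_PADN && (type & 0xc0) != 0)
			return REASON_UNHANDLED_PROTO;	/* more specific reason */
		off += optlen;
	}
	return REASON_NONE;
}
```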

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipv6: remove redundant statistics in ipv6_hop_jumbo()
Menglong Dong [Wed, 13 Apr 2022 08:15:57 +0000 (16:15 +0800)]
net: ipv6: remove redundant statistics in ipv6_hop_jumbo()

There are two call chains for ipv6_hop_jumbo(). The first one is:

ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()

On this call chain, the drop statistics will be done in
ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo()
returns false.

The second call chain is:

ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()

And the drop statistics will also be done in ip6_rcv_core() with
'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false.

Therefore, the statistics in ipv6_hop_jumbo() are redundant: the drop
is counted twice. The counters updated in ipv6_hop_jumbo() are almost
the same as those updated by the callers, except for
'IPSTATS_MIB_INTRUNCATEDPKTS', which it seems we have to ignore.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: icmp: introduce function icmpv6_param_prob_reason()
Menglong Dong [Wed, 13 Apr 2022 08:15:56 +0000 (16:15 +0800)]
net: icmp: introduce function icmpv6_param_prob_reason()

In order to add the skb drop reasons support to icmpv6_param_prob(),
introduce the function icmpv6_param_prob_reason() and make
icmpv6_param_prob() an inline call to it. This new function will be
used in the following patches.
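
The "new function plus inline wrapper" pattern looks roughly like this
in a userspace sketch. struct pkt, param_prob_reason() and param_prob()
are hypothetical stand-ins for the skb and the two ICMPv6 functions;
the wrapper passes a "not specified" reason so old callers behave as
before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum drop_reason { REASON_NOT_SPECIFIED, REASON_IP_INHDR };

/* Hypothetical stand-in for an skb; it only records why it was dropped. */
struct pkt {
	enum drop_reason dropped_with;
	int pos;	/* offset reported in the parameter problem message */
};

/* New entry point: callers that know why the packet is bad pass a reason. */
void param_prob_reason(struct pkt *p, uint8_t code, int pos,
		       enum drop_reason reason)
{
	assert(p != NULL);
	(void)code;	/* the ICMPv6 code is not modelled here */
	p->pos = pos;
	p->dropped_with = reason;
}

/* Old entry point kept as an inline wrapper for existing callers. */
static inline void param_prob(struct pkt *p, uint8_t code, int pos)
{
	param_prob_reason(p, code, pos, REASON_NOT_SPECIFIED);
}
```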

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ip: add skb drop reasons to ip forwarding
Menglong Dong [Wed, 13 Apr 2022 08:15:55 +0000 (16:15 +0800)]
net: ip: add skb drop reasons to ip forwarding

Replace kfree_skb() which is used in ip6_forward() and ip_forward()
with kfree_skb_reason().

The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for
the case that the length of the packet exceeds MTU and can't
fragment.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipv6: add skb drop reasons to ip6_pkt_drop()
Menglong Dong [Wed, 13 Apr 2022 08:15:54 +0000 (16:15 +0800)]
net: ipv6: add skb drop reasons to ip6_pkt_drop()

Replace kfree_skb() used in ip6_pkt_drop() with kfree_skb_reason().
No new reason is added.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipv4: add skb drop reasons to ip_error()
Menglong Dong [Wed, 13 Apr 2022 08:15:53 +0000 (16:15 +0800)]
net: ipv4: add skb drop reasons to ip_error()

Eventually, I found the handler function for a failed input route
lookup: ip_error().

The drop reasons used in ip_error() correspond almost directly to
IPSTATS_MIB_*, and the following new reasons are introduced:

SKB_DROP_REASON_IP_INADDRERRORS
SKB_DROP_REASON_IP_INNOROUTES

Wouldn't the names SKB_DROP_REASON_IP_HOSTUNREACH and
SKB_DROP_REASON_IP_NETUNREACH be more accurate? To keep them corresponding
to IPSTATS_MIB_*, we leave the names as they are.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoskb: add some helpers for skb drop reasons
Menglong Dong [Wed, 13 Apr 2022 08:15:52 +0000 (16:15 +0800)]
skb: add some helpers for skb drop reasons

In order to simplify the definition and assignment of
'enum skb_drop_reason' variables, introduce some helpers.

SKB_DR() is used to define a variable of type 'enum skb_drop_reason'
with the 'SKB_DROP_REASON_NOT_SPECIFIED' initial value.

SKB_DR_SET() is used to set the value of the variable. It may seem
a little useless, but it makes the code shorter.

SKB_DR_OR() is used to set the value of the variable if it is not set
yet, which means its value is SKB_DROP_REASON_NOT_SPECIFIED.
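
As a rough illustration of how the three helpers behave, here is a
userspace sketch. The enum is a cut-down stand-in for the kernel's
'enum skb_drop_reason', the macro bodies are an assumption based only on
the description above, and classify() is a hypothetical caller.

```c
/* Cut-down stand-in for the kernel's 'enum skb_drop_reason'. */
enum skb_drop_reason {
	SKB_DROP_REASON_NOT_SPECIFIED,
	SKB_DROP_REASON_PKT_TOO_BIG,
	SKB_DROP_REASON_IP_INNOROUTES,
};

/* Define a variable with the NOT_SPECIFIED initial value. */
#define SKB_DR(name) \
	enum skb_drop_reason name = SKB_DROP_REASON_NOT_SPECIFIED

/* Unconditionally set the reason. */
#define SKB_DR_SET(name, reason) \
	(name = SKB_DROP_REASON_##reason)

/* Set the reason only if none has been recorded yet. */
#define SKB_DR_OR(name, reason)					\
	do {							\
		if (name == SKB_DROP_REASON_NOT_SPECIFIED)	\
			SKB_DR_SET(name, reason);		\
	} while (0)

/* Hypothetical caller: a reason that was already set is not overwritten. */
static enum skb_drop_reason classify(int too_big, int no_route)
{
	SKB_DR(reason);

	if (too_big)
		SKB_DR_SET(reason, PKT_TOO_BIG);
	if (no_route)
		SKB_DR_OR(reason, IP_INNOROUTES);
	return reason;
}
```

With both conditions true, classify() keeps PKT_TOO_BIG, because
SKB_DR_OR() only fills in a reason when none has been set.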

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'octeon_ep-driver'
David S. Miller [Wed, 13 Apr 2022 11:56:32 +0000 (12:56 +0100)]
Merge branch 'octeon_ep-driver'

Veerasenareddy Burru says:

====================
Add octeon_ep driver

This driver implements networking functionality of Marvell's Octeon
PCI Endpoint NIC.

This driver supports the following devices:
 * Network controller: Cavium, Inc. Device b200

V4 -> V5:
   - Fix warnings reported by clang.
   - Address comments from community reviews.

V3 -> V4:
   - Fix warnings and errors reported by "make W=1 C=1".

V2 -> V3:
   - Fix warnings and errors reported by kernel test robot:
     "Reported-by: kernel test robot <lkp@intel.com>"

V1 -> V2:
    - Address review comments on original patch series.
    - Divide PATCH 1/4 from the original series into 4 patches in
      v2 patch series: PATCH 1/7 to PATCH 4/7.
    - Fix clang build errors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteon_ep: add ethtool support for Octeon PCI Endpoint NIC
Veerasenareddy Burru [Wed, 13 Apr 2022 03:35:03 +0000 (20:35 -0700)]
octeon_ep: add ethtool support for Octeon PCI Endpoint NIC

Add support for the following ethtool commands:

ethtool -i|--driver devname
ethtool devname
ethtool -s devname [speed N] [autoneg on|off] [advertise N]
ethtool -S|--statistics devname

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteon_ep: add Tx/Rx processing and interrupt support
Veerasenareddy Burru [Wed, 13 Apr 2022 03:35:02 +0000 (20:35 -0700)]
octeon_ep: add Tx/Rx processing and interrupt support

Add support to enable MSI-X and register interrupts.
Add support to process Tx and Rx traffic. Includes processing
Tx completions and Rx refill.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteon_ep: add support for ndo ops
Veerasenareddy Burru [Wed, 13 Apr 2022 03:35:01 +0000 (20:35 -0700)]
octeon_ep: add support for ndo ops

Add support for ndo ops to set MAC address, change MTU, get stats.
Add control path support to set MAC address, change MTU, get stats,
set speed, get and set link mode.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteon_ep: add Tx/Rx ring resource setup and cleanup
Veerasenareddy Burru [Wed, 13 Apr 2022 03:35:00 +0000 (20:35 -0700)]
octeon_ep: add Tx/Rx ring resource setup and cleanup

Implement Tx/Rx ring resource allocation and cleanup.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteon_ep: Add mailbox for control commands
Veerasenareddy Burru [Wed, 13 Apr 2022 03:34:59 +0000 (20:34 -0700)]
octeon_ep: Add mailbox for control commands

Add mailbox between host and NIC to send control commands from host to
NIC and receive responses and notifications from NIC to host driver,
like link status update.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteon_ep: add hardware configuration APIs
Veerasenareddy Burru [Wed, 13 Apr 2022 03:34:58 +0000 (20:34 -0700)]
octeon_ep: add hardware configuration APIs

Implement hardware resource init and shutdown helper APIs.
This includes hardware Tx/Rx queue init/enable/disable/reset and a
non-queue interrupt handler that decodes the non-queue interrupt type.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteon_ep: Add driver framework and device initialization
Veerasenareddy Burru [Wed, 13 Apr 2022 03:34:57 +0000 (20:34 -0700)]
octeon_ep: Add driver framework and device initialization

Add driver framework and device setup and initialization for Octeon
PCI Endpoint NIC.

Add implementation to load module, initialize, register network device,
cleanup and unload module.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'br-flush-filtering'
David S. Miller [Wed, 13 Apr 2022 11:46:26 +0000 (12:46 +0100)]
Merge branch 'br-flush-filtering'

Nikolay Aleksandrov says:

====================
net: bridge: add flush filtering support

This patch-set adds support to specify filtering conditions for a bulk
delete (flush) operation. This version uses a new nlmsghdr delete flag
called NLM_F_BULK in combination with a new ndo_fdb_del_bulk op which is
used to signal that the driver supports bulk deletes (that avoids
pushing common mac address checks to ndo_fdb_del implementations and
also has a different prototype and parsed attribute expectations, more
info in patch 03). The new delete flag can be used for any RTM_DEL*
type, implementations just need to be careful with older kernels which
are doing non-strict attribute parses. A new rtnl flag
(RTNL_FLAG_BULK_DEL_SUPPORTED) is used to show that the delete supports
NLM_F_BULK. A proper error is returned if bulk delete is not supported.
For old kernels I use the fact that mac address attribute (lladdr) is
mandatory in the classic fdb del case, but it's not allowed if bulk
deleting so older kernels will error out.

Patch 01 and 02 are minor rtnetlink cleanups to make the code easier to
read. They remove hardcoded values and use names instead. Patch 03 uses
BIT() for rtnl flags.
Patch 04 adds the new NLM_F_BULK delete request modifier, patch 05 adds
the new bulk delete flag and checks for it if the delete requests have
NLM_F_BULK set, it also warns if rtnl register is called with a non-delete
kind and the bulk delete flag is set.
Patch 06 adds the new ndo_fdb_del_bulk call. Patch 07 adds NLM_F_BULK
support to rtnl_fdb_del, on such request strict parsing is used only for
the supported attributes, and if the ndo is implemented it's called, the
NTF_SELF/MASTER rules are the same as for the standard rtnl_fdb_del.
Patch 08 implements bridge-specific minimal ndo_fdb_del_bulk call which
uses the current br_fdb_flush to delete all entries. Patch 09 adds
filtering support to the new bridge flush op which supports target
ifindex (port or bridge), vlan id and flags/state mask. Patch 10 adds
ndm state and flags mask attributes which will be used for filtering.
Patch 11 converts ndm state/flags and their masks to bridge-private flags
and fills them in the filter descriptor for matching. Finally patch 12
fills in the target ifindex (after validating it) and vlan id (already
validated by rtnl_fdb_flush) for matching.

Flush filtering is needed because user-space applications need a quick way
to delete only a specific set of entries, e.g. mlag implementations need a
way to flush only dynamic entries excluding externally learned ones or only
externally learned ones without static entries etc. Also apps usually want
to target only a specific vlan or port/vlan combination. The current 2 flush
operations (per port and bridge-wide) are not extensible and cannot
provide such filtering.

I decided against embedding new attrs into the old flush attributes for
multiple reasons - proper error handling on unsupported attributes,
older kernels silently flushing all, need for a second mechanism to
signal that the attribute should be parsed (e.g. using boolopts),
special treatment for permanent entries.

Examples:
$ bridge fdb flush dev bridge vlan 100 static
< flush all static entries on vlan 100 >
$ bridge fdb flush dev bridge vlan 1 dynamic
< flush all dynamic entries on vlan 1 >
$ bridge fdb flush dev bridge port ens16 vlan 1 dynamic
< flush all dynamic entries on port ens16 and vlan 1 >
$ bridge fdb flush dev ens16 vlan 1 dynamic master
< as above: flush all dynamic entries on port ens16 and vlan 1 >
$ bridge fdb flush dev bridge nooffloaded nopermanent self
< flush all non-offloaded and non-permanent entries >
$ bridge fdb flush dev bridge static noextern_learn
< flush all static entries which are not externally learned >
$ bridge fdb flush dev bridge permanent
< flush all permanent entries >
$ bridge fdb flush dev bridge port bridge permanent
< flush all permanent entries pointing to the bridge itself >

Example of a flush call with unsupported netlink attribute (NDA_DST):
$ bridge fdb flush dev bridge vlan 100 dynamic dst
Error: Unsupported attribute.

Example of a flush call on an older kernel:
$ bridge fdb flush dev bridge dynamic
Error: invalid address.

Example of calling PF_UNSPEC RTM_DELNEIGH which doesn't support bulk delete
with NLM_F_BULK set (ip neigh is changed to add the flag):
$ ip n del 192.168.122.5 lladdr 00:11:22:33:44:55 dev ens3
Error: Bulk delete is not supported.

Note that all flags have their negated version (static vs nostatic etc)
and there are some tricky cases to handle like "static" which in flag
terms means fdbs that have NUD_NOARP but *not* NUD_PERMANENT, so the
mask matches on both but we need only NUD_NOARP to be set. That's
because permanent entries have both set so we can't just match on
NUD_NOARP. Also note that this flush operation doesn't treat permanent
entries in a special way (fdb_delete vs fdb_delete_local), it will
delete them regardless if any port is using them. We can extend the api
with a flag to do that if needed in the future.
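
The "static" case above can be sketched as a mask/value comparison. This
is only an illustration: struct state_filter and state_matches() are
hypothetical names, and the NUD_* values are the ones I believe the uapi
<linux/neighbour.h> header uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Neighbour state bits (assumed to match uapi <linux/neighbour.h>). */
#define NUD_NOARP	0x40
#define NUD_PERMANENT	0x80

/* Hypothetical filter: which bits to examine and what they must equal. */
struct state_filter {
	uint16_t mask;
	uint16_t value;
};

/* An entry matches when all masked bits have exactly the wanted values. */
static bool state_matches(uint16_t state, const struct state_filter *f)
{
	return (state & f->mask) == f->value;
}

/* "static": look at both bits, but require only NUD_NOARP to be set. */
static const struct state_filter filter_static = {
	.mask  = NUD_NOARP | NUD_PERMANENT,
	.value = NUD_NOARP,
};
```

Matching on NUD_NOARP alone would also select permanent entries, which
is why the mask has to cover both bits.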

Patch-sets (in order):
 - Initial bulk del infra and fdb flush filtering (this set)
 - iproute2 support
 - selftests

v4: Add and check for rtnl del bulk supported flag when using
    NLM_F_BULK (new patch 05), patches 01 - 03 are also new minor cleanups
    to remove use of raw values and make code easier to read, don't
    rename br_fdb_flush in patch 08, set port ifindex as flush target if
    NDA_IFINDEX is missing and flush was called with port netdev and
    NTF_MASTER (patch 12).

v3: Add NLM_F_BULK delete modifier and ndo_fdb_del_bulk callback,
    patches 01 - 03 and 06 are new. Patch 04 is changed to implement
    bulk_del instead of flush, patches 05, 07 and 08 are adjusted to
    use NDA_ attributes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bridge: fdb: add support for flush filtering based on ifindex and vlan
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:52:02 +0000 (13:52 +0300)]
net: bridge: fdb: add support for flush filtering based on ifindex and vlan

Add support for fdb flush filtering based on destination ifindex and
vlan id. The ifindex must either match a port's device ifindex or the
bridge's. The vlan support is trivial since it's already validated by
rtnl_fdb_del, we just need to fill it in.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bridge: fdb: add support for flush filtering based on ndm flags and state
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:52:01 +0000 (13:52 +0300)]
net: bridge: fdb: add support for flush filtering based on ndm flags and state

Add support for fdb flush filtering based on ndm flags and state. NDM
state and flags are mapped to bridge-specific flags and matched
according to the specified masks. NTF_USE is used to represent
added_by_user flag since it sets it on fdb add and we don't have a 1:1
mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are
ignored.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add ndm flags and state mask attributes
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:52:00 +0000 (13:52 +0300)]
net: rtnetlink: add ndm flags and state mask attributes

Add ndm flags/state masks which will be used for bulk delete filtering.
All of these are used by the bridge and vxlan drivers. Also minimal attr
policy validation is added, it is up to ndo_fdb_del_bulk implementers to
further validate them.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bridge: fdb: add support for fine-grained flushing
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:59 +0000 (13:51 +0300)]
net: bridge: fdb: add support for fine-grained flushing

Add the ability to specify exactly which fdbs to be flushed. They are
described by a new structure - net_bridge_fdb_flush_desc. Currently it
can match on port/bridge ifindex, vlan id and fdb flags. It is used to
describe the existing dynamic fdb flush operation. Note that this flush
operation doesn't treat permanent entries in a special way (fdb_delete vs
fdb_delete_local), it will delete them regardless if any port is using
them, so currently it can't directly replace deletes which need to handle
that case, although we can extend it later for that too.
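
A minimal sketch of how such a descriptor could be matched against an
entry. The struct layouts, field names and fdb_flush_matches() below are
hypothetical simplifications for illustration, not the definitions added
by this patch; in particular, treating a zero ifindex or vlan id as
"don't care" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified fdb entry. */
struct fdb_entry {
	int		ifindex;	/* port (or bridge) the entry points to */
	uint16_t	vlan_id;
	unsigned long	flags;
};

/* Hypothetical flush descriptor; zero ifindex/vlan_id means "don't care". */
struct fdb_flush_desc {
	int		ifindex;
	uint16_t	vlan_id;
	unsigned long	flags;		/* required values of the masked bits */
	unsigned long	flags_mask;	/* which flag bits to compare */
};

/* Return true if the entry should be flushed under this descriptor. */
static bool fdb_flush_matches(const struct fdb_entry *f,
			      const struct fdb_flush_desc *desc)
{
	if (desc->ifindex && desc->ifindex != f->ifindex)
		return false;
	if (desc->vlan_id && desc->vlan_id != f->vlan_id)
		return false;
	return (f->flags & desc->flags_mask) == desc->flags;
}
```

An all-zero descriptor matches every entry in this sketch, which
corresponds to an unfiltered flush.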

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bridge: fdb: add ndo_fdb_del_bulk
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:58 +0000 (13:51 +0300)]
net: bridge: fdb: add ndo_fdb_del_bulk

Add a minimal ndo_fdb_del_bulk implementation which flushes all entries.
Support for more fine-grained filtering will be added in the following
patches.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add NLM_F_BULK support to rtnl_fdb_del
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:57 +0000 (13:51 +0300)]
net: rtnetlink: add NLM_F_BULK support to rtnl_fdb_del

When NLM_F_BULK is specified in an fdb del message we need to handle it
differently. First, since this is a new call, we can strictly validate the
passed attributes. At first only ifindex and vlan are allowed, as these
will be the initially supported filter attributes; any other attribute
is rejected. The mac address is no longer mandatory, and in fact it cannot
be specified with a bulk request (the attribute is not allowed), which is
what makes older kernels error out. Then we dispatch the call to
ndo_fdb_del_bulk if the device supports it. The del bulk
callback can do further validation of the attributes if necessary.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: add ndo_fdb_del_bulk
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:56 +0000 (13:51 +0300)]
net: add ndo_fdb_del_bulk

Add a new netdev op called ndo_fdb_del_bulk, it will be later used for
driver-specific bulk delete implementation dispatched from rtnetlink. The
first user will be the bridge, we need it to signal to rtnetlink from
the driver that we support bulk delete operation (NLM_F_BULK).

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add bulk delete support flag
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:55 +0000 (13:51 +0300)]
net: rtnetlink: add bulk delete support flag

Add a new rtnl flag (RTNL_FLAG_BULK_DEL_SUPPORTED) which is used to
verify that the delete operation allows bulk object deletion. Also emit
a warning if anyone tries to set it for non-delete kind.
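
The check described above might look roughly like the sketch below.
rtnl_check_bulk() is a hypothetical helper, the bit position of
RTNL_FLAG_BULK_DEL_SUPPORTED and the -EOPNOTSUPP return value are
assumptions, and NLM_F_BULK uses the value I believe the uapi header
defines.

```c
#include <errno.h>

#define NLM_F_BULK			0x200		/* assumed uapi value */
#define RTNL_FLAG_BULK_DEL_SUPPORTED	(1U << 1)	/* assumed bit position */

/*
 * Hypothetical pre-dispatch check. Netlink modifier flags are specific to
 * the message kind, so NLM_F_BULK is only interpreted for delete requests.
 */
static int rtnl_check_bulk(int is_del_kind, unsigned int nlmsg_flags,
			   unsigned int handler_flags)
{
	if (is_del_kind && (nlmsg_flags & NLM_F_BULK) &&
	    !(handler_flags & RTNL_FLAG_BULK_DEL_SUPPORTED))
		return -EOPNOTSUPP;	/* bulk delete is not supported */
	return 0;
}
```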

Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: netlink: add NLM_F_BULK delete request modifier
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:54 +0000 (13:51 +0300)]
net: netlink: add NLM_F_BULK delete request modifier

Add a new delete request modifier called NLM_F_BULK which, when
supported, would cause the request to delete multiple objects. The flag
is a convenient way to signal that a multiple delete operation is
requested which can be gradually added to different delete requests. In
order to make sure older kernels will error out if the operation is not
supported instead of doing something unintended we have to break a
required condition when implementing support for this flag, e.g. for
neighbors we will omit the mandatory mac address attribute.
Initially it will be used to add flush with filtering support for bridge
fdbs, but it also opens the door to add similar support to others.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: use BIT for flag values
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:53 +0000 (13:51 +0300)]
net: rtnetlink: use BIT for flag values

Use BIT to define flag values.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add helper to extract msg type's kind
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:52 +0000 (13:51 +0300)]
net: rtnetlink: add helper to extract msg type's kind

Add a helper which extracts the msg type's kind using the kind mask (0x3).
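
A sketch of what such a helper can look like. The enum and helper names
here are illustrative, and the RTM_*NEIGH values are the ones I believe
the uapi <linux/rtnetlink.h> header assigns; the only fact taken from
this commit is the 0x3 mask.

```c
/* The low two bits of an rtnetlink message type encode its kind. */
enum rtnl_kinds {
	RTNL_KIND_NEW,
	RTNL_KIND_DEL,
	RTNL_KIND_GET,
	RTNL_KIND_SET,
};

#define RTNL_KIND_MASK	0x3

/* Message types used for illustration (assumed uapi values). */
#define RTM_NEWNEIGH	28
#define RTM_DELNEIGH	29
#define RTM_GETNEIGH	30

/* Extract the kind (new/del/get/set) from a message type. */
static enum rtnl_kinds rtnl_msgtype_kind(int msgtype)
{
	return (enum rtnl_kinds)(msgtype & RTNL_KIND_MASK);
}
```

This works because each message family starts at a multiple of four, so
the kind is always the offset within the family.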

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add msg kind names
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:51 +0000 (13:51 +0300)]
net: rtnetlink: add msg kind names

Add rtnl kind names instead of using raw values. We'll need to
check for DEL kind later to validate bulk flag support.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ftgmac100: access hardware register after clock ready
Dylan Hung [Tue, 12 Apr 2022 11:48:59 +0000 (19:48 +0800)]
net: ftgmac100: access hardware register after clock ready

AST2600 MAC register 0x58 is writable only when the MAC clock is
enabled.  Usually, the MAC clock is enabled by the bootloader so
register 0x58 is set normally when the bootloader is involved.  To make
ast2600 ftgmac100 work without the bootloader, postpone the register
write until the clock is ready.

Fixes: 137d23cea1c0 ("net: ftgmac100: Fix Aspeed ast2600 TX hang issue")
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-ti-storm-prevention-support'
David S. Miller [Wed, 13 Apr 2022 11:42:40 +0000 (12:42 +0100)]
Merge branch 'net-ti-storm-prevention-support'

Grygorii Strashko says:

====================
net: ethernet: ti: enable bc/mc storm prevention support

This series first adds support for the ALE feature that rate limits the number of
ingress broadcast(BC)/multicast(MC) packets per second, whose main purpose is BC/MC
storm prevention.

It then enables the corresponding ingress broadcast(BC)/multicast(MC) packet
rate limiting for the TI CPSW switchdev and AM65x/J721E CPSW_NUSS drivers by
implementing HW offload for simple tc-flower with policer action with matches
on dst_mac/mask:
 - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate
limiting (exact match)
 - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC
packets rate limiting

The CPSW supports MC/BC packets rate limiting in packets/sec and affects
all ingress MC/BC packets and serves as BC/MC storm prevention feature.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police pkts_rate 1000 pkts_burst 1 drop

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
  action police pkts_rate 20000 pkts_burst 1 drop

  pkts_burst - not used.

The solution was inspired by a patch from Vladimir Oltean [1].

Changes in v3:
  - comments applied
  - policer validation added

Changes in v2:
 - switch to packet-per-second policing introduced by
   commit 2ffe0395288a ("net/sched: act_police: add support for packet-per-second policing") [2]

v2: https://patchwork.kernel.org/project/netdevbpf/cover/20211101170122.19160-1-grygorii.strashko@ti.com/
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20201114035654.32658-1-grygorii.strashko@ti.com/

[1] https://lore.kernel.org/patchwork/patch/1217254/
[2] https://patchwork.kernel.org/project/netdevbpf/cover/20210312140831.23346-1-simon.horman@netronome.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ti: cpsw_new: enable bc/mc storm prevention support
Grygorii Strashko [Tue, 12 Apr 2022 10:29:29 +0000 (13:29 +0300)]
net: ethernet: ti: cpsw_new: enable bc/mc storm prevention support

This patch enables support for ingress broadcast(BC)/multicast(MC) packets
rate limiting in TI CPSW switchdev driver (the corresponding ALE support
was added in previous patch) by implementing HW offload for simple
tc-flower with policer action with matches on dst_mac:
 - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate
limiting (exact match)
 - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC
packets rate limiting

The CPSW supports MC/BC packets rate limiting in packets/sec and affects
all ingress MC/BC packets and serves as BC/MC storm prevention feature.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police pkts_rate 1000 pkts_burst 1 drop

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
  action police pkts_rate 20000 pkts_burst 1 drop

  pkts_burst - not used.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ti: am65-cpsw: enable bc/mc storm prevention support
Grygorii Strashko [Tue, 12 Apr 2022 10:29:28 +0000 (13:29 +0300)]
net: ethernet: ti: am65-cpsw: enable bc/mc storm prevention support

This patch enables support for ingress broadcast(BC)/multicast(MC) packets
rate limiting in TI AM65x CPSW driver (the corresponding ALE support was
added in previous patch) by implementing HW offload for simple tc-flower
with policer action with matches on dst_mac/mask:
 - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate
limiting (exact match)
 - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC
packets rate limiting

The CPSW supports MC/BC packets rate limiting in packets/sec and affects
all ingress MC/BC packets and serves as BC/MC storm prevention feature.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police pkts_rate 1000 pkts_burst 1 drop

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
  action police pkts_rate 20000 pkts_burst 1 drop

  pkts_burst - not used.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrivers: net: cpsw: ale: add broadcast/multicast rate limit support
Grygorii Strashko [Tue, 12 Apr 2022 10:29:27 +0000 (13:29 +0300)]
drivers: net: cpsw: ale: add broadcast/multicast rate limit support

The CPSW ALE supports a feature to rate limit the number of ingress
broadcast(BC)/multicast(MC) packets per second, whose main purpose is BC/MC
storm prevention.

The ALE BC/MC packet rate limit configuration consists of two parts:
- global
  ALE_CONTROL.ENABLE_RATE_LIMIT bit 0 which enables rate limiting globally
  ALE_PRESCALE.PRESCALE specifies rate limiting interval
- per-port
  ALE_PORTCTLx.BCAST/MCAST_LIMIT specifies number of BC/MC packets allowed
  per rate limiting interval.
  When port.BCAST/MCAST_LIMIT is 0 rate limiting is disabled for the port.

When BC/MC packet rate limiting is enabled the number of allowed packets
per second is defined as:
  number_of_packets/sec = (Fclk / ALE_PRESCALE) * port.BCAST/MCAST_LIMIT

Since the ALE_PRESCALE configuration is common to all ports, a 1ms
interval is selected and configured during ALE initialization, while
port.BCAST/MCAST_LIMIT is configured per port.
This makes it possible to achieve:
 - min number_of_packets = 1000 when port.BCAST/MCAST_LIMIT = 1
 - max number_of_packets = 1000 * 255 = 255000
   when port.BCAST/MCAST_LIMIT = 0xFF
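
The arithmetic above can be sketched as follows, assuming the 1ms
interval and an 8-bit limit field. The function and macro names are
hypothetical, and rounding a requested rate down to a multiple of 1000
is an illustrative choice, not necessarily what the driver does.

```c
#include <stdint.h>

#define ALE_INTERVALS_PER_SEC	1000U	/* 1ms rate limiting interval */
#define ALE_LIMIT_MAX		0xFFU	/* 8-bit per-port limit field */

/* Packets per second allowed for a given per-port limit value. */
static uint32_t ale_limit_to_pps(uint8_t limit)
{
	return (uint32_t)limit * ALE_INTERVALS_PER_SEC;
}

/*
 * Hypothetical inverse: convert a requested rate to a limit value,
 * rounding down and clamping to the field width. A result of 0 means
 * rate limiting is disabled for the port.
 */
static uint8_t ale_pps_to_limit(uint32_t pps)
{
	uint32_t limit = pps / ALE_INTERVALS_PER_SEC;

	return limit > ALE_LIMIT_MAX ? ALE_LIMIT_MAX : (uint8_t)limit;
}
```

For instance, a requested 20000 packets/sec maps to a limit value of 20,
while anything below 1000 packets/sec cannot be represented with a 1ms
interval.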

The ALE_CONTROL.ENABLE_RATE_LIMIT bit can also be set once during ALE
initialization, as rate limiting only takes effect for ports with a non-zero
port.BCAST/MCAST_LIMIT value.

This patch implements above logic in ALE and adds new ALE APIs
 cpsw_ale_rx_ratelimit_bc();
 cpsw_ale_rx_ratelimit_mc();

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phylink: remove phylink_helper_basex_speed()
Russell King (Oracle) [Tue, 12 Apr 2022 10:24:00 +0000 (11:24 +0100)]
net: phylink: remove phylink_helper_basex_speed()

As there are now no users of phylink_helper_basex_speed(), we can
remove this obsolete functionality.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoRevert "net: dsa: setup master before ports"
Vladimir Oltean [Tue, 12 Apr 2022 09:44:26 +0000 (12:44 +0300)]
Revert "net: dsa: setup master before ports"

This reverts commit 11fd667dac315ea3f2469961f6d2869271a46cae.

dsa_slave_change_mtu() updates the MTU of the DSA master and of the
associated CPU port, but only if it detects a change to the master MTU.

The blamed commit in the Fixes: tag below addressed a regression where
dsa_slave_change_mtu() would return early and not do anything due to
ds->ops->port_change_mtu() not being implemented.

However, that commit also had the effect that the master MTU got set up
to the correct value by dsa_master_setup(), but the associated CPU port's
MTU did not get updated. This causes breakage for drivers that rely on
the ->port_change_mtu() DSA call to account for the tagging overhead on
the CPU port, and don't set up the initial MTU during the setup phase.

Things actually worked before because they were in a fragile equilibrium
where dsa_slave_change_mtu() was called before dsa_master_setup() was.
So dsa_slave_change_mtu() could actually detect a change and update the
CPU port MTU too.

Restore the code to the way things used to work by reverting the reorder
of dsa_tree_setup_master() and dsa_tree_setup_ports(). That change did
not have a concrete motivation going for it anyway, it just looked
better.

Fixes: 066dfc429040 ("Revert "net: dsa: stop updating master MTU from master.c"")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomacvlan: Fix leaking skb in source mode with nodst option
Martin Willi [Tue, 12 Apr 2022 09:34:57 +0000 (11:34 +0200)]
macvlan: Fix leaking skb in source mode with nodst option

The MACVLAN receive handler clones skbs to all matching source MACVLAN
interfaces, before it passes the packet along to match on destination
based MACVLANs.

When using the MACVLAN nodst mode, passing the packet to destination based
MACVLANs is omitted and the handler returns with RX_HANDLER_CONSUMED.
However, the passed skb is not freed, leaking for any packet processed
with the nodst option.

Properly free the skb when consuming packets to fix that leak.

Fixes: 427f0c8c194b ("macvlan: Add nodst option to macvlan type source")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>