]> www.infradead.org Git - users/hch/block.git/log
users/hch/block.git
3 years agoMerge branch 'br-flush-filtering'
David S. Miller [Wed, 13 Apr 2022 11:46:26 +0000 (12:46 +0100)]
Merge branch 'br-flush-filtering'

Nikolay Aleksandrov says:

====================
net: bridge: add flush filtering support

This patch-set adds support to specify filtering conditions for a bulk
delete (flush) operation. This version uses a new nlmsghdr delete flag
called NLM_F_BULK in combination with a new ndo_fdb_del_bulk op which is
used to signal that the driver supports bulk deletes (that avoids
pushing common mac address checks to ndo_fdb_del implementations and
also has a different prototype and parsed attribute expectations, more
info in patch 03). The new delete flag can be used for any RTM_DEL*
type, implementations just need to be careful with older kernels which
are doing non-strict attribute parses. A new rtnl flag
(RTNL_FLAG_BULK_DEL_SUPPORTED) is used to show that the delete supports
NLM_F_BULK. A proper error is returned if bulk delete is not supported.
For old kernels I use the fact that mac address attribute (lladdr) is
mandatory in the classic fdb del case, but it's not allowed if bulk
deleting so older kernels will error out.

Patch 01 and 02 are minor rtnetlink cleanups to make the code easier to
read. They remove hardcoded values and use names instead. Patch 03 uses
BIT() for rtnl flags.
Patch 04 adds the new NLM_F_BULK delete request modifier, patch 05 adds
the new bulk delete flag and checks for it if the delete requests have
NLM_F_BULK set, it also warns if rtnl register is called with a non-delete
kind and the bulk delete flag is set.
Patch 06 adds the new ndo_fdb_del_bulk call. Patch 07 adds NLM_F_BULK
support to rtnl_fdb_del, on such request strict parsing is used only for
the supported attributes, and if the ndo is implemented it's called, the
NTF_SELF/MASTER rules are the same as for the standard rtnl_fdb_del.
Patch 08 implements bridge-specific minimal ndo_fdb_del_bulk call which
uses the current br_fdb_flush to delete all entries. Patch 09 adds
filtering support to the new bridge flush op which supports target
ifindex (port or bridge), vlan id and flags/state mask. Patch 10 adds
ndm state and flags mask attributes which will be used for filtering.
Patch 11 converts ndm state/flags and their masks to bridge-private flags
and fills them in the filter descriptor for matching. Finally patch 12
fills in the target ifindex (after validating it) and vlan id (already
validated by rtnl_fdb_flush) for matching. Flush filtering is needed
because user-space applications need a quick way to delete only a
specific set of entries, e.g. mlag implementations need a way to flush only
dynamic entries excluding externally learned ones or only externally
learned ones without static entries etc. Also apps usually want to target
only a specific vlan or port/vlan combination. The current 2 flush
operations (per port and bridge-wide) are not extensible and cannot
provide such filtering.

I decided against embedding new attrs into the old flush attributes for
multiple reasons - proper error handling on unsupported attributes,
older kernels silently flushing all, need for a second mechanism to
signal that the attribute should be parsed (e.g. using boolopts),
special treatment for permanent entries.

Examples:
$ bridge fdb flush dev bridge vlan 100 static
< flush all static entries on vlan 100 >
$ bridge fdb flush dev bridge vlan 1 dynamic
< flush all dynamic entries on vlan 1 >
$ bridge fdb flush dev bridge port ens16 vlan 1 dynamic
< flush all dynamic entries on port ens16 and vlan 1 >
$ bridge fdb flush dev ens16 vlan 1 dynamic master
< as above: flush all dynamic entries on port ens16 and vlan 1 >
$ bridge fdb flush dev bridge nooffloaded nopermanent self
< flush all non-offloaded and non-permanent entries >
$ bridge fdb flush dev bridge static noextern_learn
< flush all static entries which are not externally learned >
$ bridge fdb flush dev bridge permanent
< flush all permanent entries >
$ bridge fdb flush dev bridge port bridge permanent
< flush all permanent entries pointing to the bridge itself >

Example of a flush call with unsupported netlink attribute (NDA_DST):
$ bridge fdb flush dev bridge vlan 100 dynamic dst
Error: Unsupported attribute.

Example of a flush call on an older kernel:
$ bridge fdb flush dev bridge dynamic
Error: invalid address.

Example of calling PF_UNSPEC RTM_DELNEIGH which doesn't support bulk delete
with NLM_F_BULK set (ip neigh is changed to add the flag):
$ ip n del 192.168.122.5 lladdr 00:11:22:33:44:55 dev ens3
Error: Bulk delete is not supported.

Note that all flags have their negated version (static vs nostatic etc)
and there are some tricky cases to handle like "static" which in flag
terms means fdbs that have NUD_NOARP but *not* NUD_PERMANENT, so the
mask matches on both but we need only NUD_NOARP to be set. That's
because permanent entries have both set so we can't just match on
NUD_NOARP. Also note that this flush operation doesn't treat permanent
entries in a special way (fdb_delete vs fdb_delete_local), it will
delete them regardless if any port is using them. We can extend the api
with a flag to do that if needed in the future.

Patch-sets (in order):
 - Initial bulk del infra and fdb flush filtering (this set)
 - iproute2 support
 - selftests

v4: Add and check for rtnl del bulk supported flag when using
    NLM_F_BULK (new patch 05), patches 01 - 03 are also new minor cleanups
    to remove use of raw values and make code easier to read, don't
    rename br_fdb_flush in patch 08, set port ifindex as flush target if
    NDA_IFINDEX is missing and flush was called with port netdev and
    NTF_MASTER (patch 12).

v3: Add NLM_F_BULK delete modifier and ndo_fdb_del_bulk callback,
    patches 01 - 03 and 06 are new. Patch 04 is changed to implement
    bulk_del instead of flush, patches 05, 07 and 08 are adjusted to
    use NDA_ attributes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bridge: fdb: add support for flush filtering based on ifindex and vlan
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:52:02 +0000 (13:52 +0300)]
net: bridge: fdb: add support for flush filtering based on ifindex and vlan

Add support for fdb flush filtering based on destination ifindex and
vlan id. The ifindex must either match a port's device ifindex or the
bridge's. The vlan support is trivial since it's already validated by
rtnl_fdb_del, we just need to fill it in.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bridge: fdb: add support for flush filtering based on ndm flags and state
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:52:01 +0000 (13:52 +0300)]
net: bridge: fdb: add support for flush filtering based on ndm flags and state

Add support for fdb flush filtering based on ndm flags and state. NDM
state and flags are mapped to bridge-specific flags and matched
according to the specified masks. NTF_USE is used to represent
added_by_user flag since it sets it on fdb add and we don't have a 1:1
mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are
ignored.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add ndm flags and state mask attributes
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:52:00 +0000 (13:52 +0300)]
net: rtnetlink: add ndm flags and state mask attributes

Add ndm flags/state masks which will be used for bulk delete filtering.
All of these are used by the bridge and vxlan drivers. Also minimal attr
policy validation is added, it is up to ndo_fdb_del_bulk implementers to
further validate them.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bridge: fdb: add support for fine-grained flushing
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:59 +0000 (13:51 +0300)]
net: bridge: fdb: add support for fine-grained flushing

Add the ability to specify exactly which fdbs to be flushed. They are
described by a new structure - net_bridge_fdb_flush_desc. Currently it
can match on port/bridge ifindex, vlan id and fdb flags. It is used to
describe the existing dynamic fdb flush operation. Note that this flush
operation doesn't treat permanent entries in a special way (fdb_delete vs
fdb_delete_local), it will delete them regardless if any port is using
them, so currently it can't directly replace deletes which need to handle
that case, although we can extend it later for that too.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bridge: fdb: add ndo_fdb_del_bulk
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:58 +0000 (13:51 +0300)]
net: bridge: fdb: add ndo_fdb_del_bulk

Add a minimal ndo_fdb_del_bulk implementation which flushes all entries.
Support for more fine-grained filtering will be added in the following
patches.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add NLM_F_BULK support to rtnl_fdb_del
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:57 +0000 (13:51 +0300)]
net: rtnetlink: add NLM_F_BULK support to rtnl_fdb_del

When NLM_F_BULK is specified in a fdb del message we need to handle it
differently. First since this is a new call we can strictly validate the
passed attributes, at first only ifindex and vlan are allowed as these
will be the initially supported filter attributes, any other attribute
is rejected. The mac address is no longer mandatory, but we use it
to error out in older kernels because it cannot be specified with bulk
request (the attribute is not allowed) and then we have to dispatch
the call to ndo_fdb_del_bulk if the device supports it. The del bulk
callback can do further validation of the attributes if necessary.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: add ndo_fdb_del_bulk
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:56 +0000 (13:51 +0300)]
net: add ndo_fdb_del_bulk

Add a new netdev op called ndo_fdb_del_bulk, it will be later used for
driver-specific bulk delete implementation dispatched from rtnetlink. The
first user will be the bridge, we need it to signal to rtnetlink from
the driver that we support bulk delete operation (NLM_F_BULK).

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add bulk delete support flag
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:55 +0000 (13:51 +0300)]
net: rtnetlink: add bulk delete support flag

Add a new rtnl flag (RTNL_FLAG_BULK_DEL_SUPPORTED) which is used to
verify that the delete operation allows bulk object deletion. Also emit
a warning if anyone tries to set it for non-delete kind.

Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: netlink: add NLM_F_BULK delete request modifier
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:54 +0000 (13:51 +0300)]
net: netlink: add NLM_F_BULK delete request modifier

Add a new delete request modifier called NLM_F_BULK which, when
supported, would cause the request to delete multiple objects. The flag
is a convenient way to signal that a multiple delete operation is
requested which can be gradually added to different delete requests. In
order to make sure older kernels will error out if the operation is not
supported instead of doing something unintended we have to break a
required condition when implementing support for this flag, f.e. for
neighbors we will omit the mandatory mac address attribute.
Initially it will be used to add flush with filtering support for bridge
fdbs, but it also opens the door to add similar support to others.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: use BIT for flag values
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:53 +0000 (13:51 +0300)]
net: rtnetlink: use BIT for flag values

Use BIT to define flag values.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add helper to extract msg type's kind
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:52 +0000 (13:51 +0300)]
net: rtnetlink: add helper to extract msg type's kind

Add a helper which extracts the msg type's kind using the kind mask (0x3).

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtnetlink: add msg kind names
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:51 +0000 (13:51 +0300)]
net: rtnetlink: add msg kind names

Add rtnl kind names instead of using raw values. We'll need to
check for DEL kind later to validate bulk flag support.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-ti-storm-prevention-support'
David S. Miller [Wed, 13 Apr 2022 11:42:40 +0000 (12:42 +0100)]
Merge branch 'net-ti-storm-prevention-support'

Grygorii Strashko says:

====================
net: ethernet: ti: enable bc/mc storm prevention support

This series first adds supports for the ALE feature to rate limit number ingress
broadcast(BC)/multicast(MC) packets per/sec which main purpose is BC/MC storm
prevention.

And then enables corresponding support for ingress broadcast(BC)/multicast(MC)
packets rate limiting for TI CPSW switchdev and AM65x/J221E CPSW_NUSS drivers by
implementing HW offload for simple tc-flower with policer action with matches
on dst_mac/mask:
 - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate
limiting (exact match)
 - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC
packets rate limiting

The CPSW supports MC/BC packets rate limiting in packets/sec and affects
all ingress MC/BC packets and serves as BC/MC storm prevention feature.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police pkts_rate 1000 pkts_burst 1 drop

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
  action police rate pkts_rate 20000 pkts_burst 1 drop

  pkts_burst - not used.

The solution inspired patch from Vladimir Oltean [1].

Changes in v3:
  - comments applied
  - policer validation added

Changes in v2:
 - switch to packet-per-second policing introduced by
   commit 2ffe0395288a ("net/sched: act_police: add support for packet-per-second policing") [2]

v2: https://patchwork.kernel.org/project/netdevbpf/cover/20211101170122.19160-1-grygorii.strashko@ti.com/
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20201114035654.32658-1-grygorii.strashko@ti.com/

[1] https://lore.kernel.org/patchwork/patch/1217254/
[2] https://patchwork.kernel.org/project/netdevbpf/cover/20210312140831.23346-1-simon.horman@netronome.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ti: cpsw_new: enable bc/mc storm prevention support
Grygorii Strashko [Tue, 12 Apr 2022 10:29:29 +0000 (13:29 +0300)]
net: ethernet: ti: cpsw_new: enable bc/mc storm prevention support

This patch enables support for ingress broadcast(BC)/multicast(MC) packets
rate limiting in TI CPSW switchdev driver (the corresponding ALE support
was added in previous patch) by implementing HW offload for simple
tc-flower with policer action with matches on dst_mac:
 - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate
limiting (exact match)
 - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC
packets rate limiting

The CPSW supports MC/BC packets rate limiting in packets/sec and affects
all ingress MC/BC packets and serves as BC/MC storm prevention feature.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police pkts_rate 1000 pkts_burst 1 drop

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
  action police rate pkts_rate 10000 pkts_burst 1 drop

  pkts_burst - not used.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ti: am65-cpsw: enable bc/mc storm prevention support
Grygorii Strashko [Tue, 12 Apr 2022 10:29:28 +0000 (13:29 +0300)]
net: ethernet: ti: am65-cpsw: enable bc/mc storm prevention support

This patch enables support for ingress broadcast(BC)/multicast(MC) packets
rate limiting in TI AM65x CPSW driver (the corresponding ALE support was
added in previous patch) by implementing HW offload for simple tc-flower
with policer action with matches on dst_mac/mask:
 - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate
limiting (exact match)
 - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC
packets rate limiting

The CPSW supports MC/BC packets rate limiting in packets/sec and affects
all ingress MC/BC packets and serves as BC/MC storm prevention feature.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police pkts_rate 1000 pkts_burst 1 drop

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
  action police rate pkts_rate 20000 pkts_burst 1 drop

  pkts_burst - not used.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrivers: net: cpsw: ale: add broadcast/multicast rate limit support
Grygorii Strashko [Tue, 12 Apr 2022 10:29:27 +0000 (13:29 +0300)]
drivers: net: cpsw: ale: add broadcast/multicast rate limit support

The CPSW ALE supports feature to rate limit number ingress
broadcast(BC)/multicast(MC) packets per/sec which main purpose is BC/MC
storm prevention.

The ALE BC/MC packet rate limit configuration consist of two parts:
- global
  ALE_CONTROL.ENABLE_RATE_LIMIT bit 0 which enables rate limiting globally
  ALE_PRESCALE.PRESCALE specifies rate limiting interval
- per-port
  ALE_PORTCTLx.BCASTMCAST/_LIMIT specifies number of BC/MC packets allowed
  per rate limiting interval.
  When port.BCASTMCAST/_LIMIT is 0 rate limiting is disabled for Port.

When BC/MC packet rate limiting is enabled the number of allowed packets
per/sec is defined as:
  number_of_packets/sec = (Fclk / ALE_PRESCALE) * port.BCASTMCAST/_LIMIT

Hence, the ALE_PRESCALE configuration is common for all ports the 1ms
interval is selected and configured during ALE initialization while
port.BCAST/MCAST_LIMIT are configured per-port.
This allows to achieve:
 - min number_of_packets = 1000 when port.BCAST/MCAST_LIMIT = 1
 - max number_of_packets = 1000 * 255 = 255000
   when port.BCAST/MCAST_LIMIT = 0xFF

The ALE_CONTROL.ENABLE_RATE_LIMIT can also be enabled once during ALE
initialization as rate limiting enabled by non zero port.BCASTMCAST/_LIMIT
values.

This patch implements above logic in ALE and adds new ALE APIs
 cpsw_ale_rx_ratelimit_bc();
 cpsw_ale_rx_ratelimit_mc();

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phylink: remove phylink_helper_basex_speed()
Russell King (Oracle) [Tue, 12 Apr 2022 10:24:00 +0000 (11:24 +0100)]
net: phylink: remove phylink_helper_basex_speed()

As there are now no users of phylink_helper_basex_speed(), we can
remove this obsolete functionality.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: mtk_eth_soc: use after free in __mtk_ppe_check_skb()
Dan Carpenter [Tue, 12 Apr 2022 09:24:19 +0000 (12:24 +0300)]
net: ethernet: mtk_eth_soc: use after free in __mtk_ppe_check_skb()

The __mtk_foe_entry_clear() function frees "entry" so we have to use
the _safe() version of hlist_for_each_entry() to prevent a use after
free.

Fixes: 33fc42de3327 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ti: am65-cpsw-nuss: using pm_runtime_resume_and_get instead of pm_runt...
Minghao Chi [Tue, 12 Apr 2022 09:05:15 +0000 (09:05 +0000)]
net: ethernet: ti: am65-cpsw-nuss: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Using pm_runtime_resume_and_get is more appropriate
for simplifing code

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoNFC: NULL out the dev->rfkill to prevent UAF
Lin Ma [Tue, 12 Apr 2022 05:32:08 +0000 (13:32 +0800)]
NFC: NULL out the dev->rfkill to prevent UAF

Commit 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")
assumes the device_is_registered() in function nfc_dev_up() will help
to check when the rfkill is unregistered. However, this check only
take effect when device_del(&dev->dev) is done in nfc_unregister_device().
Hence, the rfkill object is still possible be dereferenced.

The crash trace in latest kernel (5.18-rc2):

[   68.760105] ==================================================================
[   68.760330] BUG: KASAN: use-after-free in __lock_acquire+0x3ec1/0x6750
[   68.760756] Read of size 8 at addr ffff888009c93018 by task fuzz/313
[   68.760756]
[   68.760756] CPU: 0 PID: 313 Comm: fuzz Not tainted 5.18.0-rc2 #4
[   68.760756] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   68.760756] Call Trace:
[   68.760756]  <TASK>
[   68.760756]  dump_stack_lvl+0x57/0x7d
[   68.760756]  print_report.cold+0x5e/0x5db
[   68.760756]  ? __lock_acquire+0x3ec1/0x6750
[   68.760756]  kasan_report+0xbe/0x1c0
[   68.760756]  ? __lock_acquire+0x3ec1/0x6750
[   68.760756]  __lock_acquire+0x3ec1/0x6750
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  ? register_lock_class+0x18d0/0x18d0
[   68.760756]  lock_acquire+0x1ac/0x4f0
[   68.760756]  ? rfkill_blocked+0xe/0x60
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  ? mutex_lock_io_nested+0x12c0/0x12c0
[   68.760756]  ? nla_get_range_signed+0x540/0x540
[   68.760756]  ? _raw_spin_lock_irqsave+0x4e/0x50
[   68.760756]  _raw_spin_lock_irqsave+0x39/0x50
[   68.760756]  ? rfkill_blocked+0xe/0x60
[   68.760756]  rfkill_blocked+0xe/0x60
[   68.760756]  nfc_dev_up+0x84/0x260
[   68.760756]  nfc_genl_dev_up+0x90/0xe0
[   68.760756]  genl_family_rcv_msg_doit+0x1f4/0x2f0
[   68.760756]  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x230/0x230
[   68.760756]  ? security_capable+0x51/0x90
[   68.760756]  genl_rcv_msg+0x280/0x500
[   68.760756]  ? genl_get_cmd+0x3c0/0x3c0
[   68.760756]  ? lock_acquire+0x1ac/0x4f0
[   68.760756]  ? nfc_genl_dev_down+0xe0/0xe0
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  netlink_rcv_skb+0x11b/0x340
[   68.760756]  ? genl_get_cmd+0x3c0/0x3c0
[   68.760756]  ? netlink_ack+0x9c0/0x9c0
[   68.760756]  ? netlink_deliver_tap+0x136/0xb00
[   68.760756]  genl_rcv+0x1f/0x30
[   68.760756]  netlink_unicast+0x430/0x710
[   68.760756]  ? memset+0x20/0x40
[   68.760756]  ? netlink_attachskb+0x740/0x740
[   68.760756]  ? __build_skb_around+0x1f4/0x2a0
[   68.760756]  netlink_sendmsg+0x75d/0xc00
[   68.760756]  ? netlink_unicast+0x710/0x710
[   68.760756]  ? netlink_unicast+0x710/0x710
[   68.760756]  sock_sendmsg+0xdf/0x110
[   68.760756]  __sys_sendto+0x19e/0x270
[   68.760756]  ? __ia32_sys_getpeername+0xa0/0xa0
[   68.760756]  ? fd_install+0x178/0x4c0
[   68.760756]  ? fd_install+0x195/0x4c0
[   68.760756]  ? kernel_fpu_begin_mask+0x1c0/0x1c0
[   68.760756]  __x64_sys_sendto+0xd8/0x1b0
[   68.760756]  ? lockdep_hardirqs_on+0xbf/0x130
[   68.760756]  ? syscall_enter_from_user_mode+0x1d/0x50
[   68.760756]  do_syscall_64+0x3b/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   68.760756] RIP: 0033:0x7f67fb50e6b3
...
[   68.760756] RSP: 002b:00007f67fa91fe90 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[   68.760756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f67fb50e6b3
[   68.760756] RDX: 000000000000001c RSI: 0000559354603090 RDI: 0000000000000003
[   68.760756] RBP: 00007f67fa91ff00 R08: 00007f67fa91fedc R09: 000000000000000c
[   68.760756] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe824d496e
[   68.760756] R13: 00007ffe824d496f R14: 00007f67fa120000 R15: 0000000000000003

[   68.760756]  </TASK>
[   68.760756]
[   68.760756] Allocated by task 279:
[   68.760756]  kasan_save_stack+0x1e/0x40
[   68.760756]  __kasan_kmalloc+0x81/0xa0
[   68.760756]  rfkill_alloc+0x7f/0x280
[   68.760756]  nfc_register_device+0xa3/0x1a0
[   68.760756]  nci_register_device+0x77a/0xad0
[   68.760756]  nfcmrvl_nci_register_dev+0x20b/0x2c0
[   68.760756]  nfcmrvl_nci_uart_open+0xf2/0x1dd
[   68.760756]  nci_uart_tty_ioctl+0x2c3/0x4a0
[   68.760756]  tty_ioctl+0x764/0x1310
[   68.760756]  __x64_sys_ioctl+0x122/0x190
[   68.760756]  do_syscall_64+0x3b/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   68.760756]
[   68.760756] Freed by task 314:
[   68.760756]  kasan_save_stack+0x1e/0x40
[   68.760756]  kasan_set_track+0x21/0x30
[   68.760756]  kasan_set_free_info+0x20/0x30
[   68.760756]  __kasan_slab_free+0x108/0x170
[   68.760756]  kfree+0xb0/0x330
[   68.760756]  device_release+0x96/0x200
[   68.760756]  kobject_put+0xf9/0x1d0
[   68.760756]  nfc_unregister_device+0x77/0x190
[   68.760756]  nfcmrvl_nci_unregister_dev+0x88/0xd0
[   68.760756]  nci_uart_tty_close+0xdf/0x180
[   68.760756]  tty_ldisc_kill+0x73/0x110
[   68.760756]  tty_ldisc_hangup+0x281/0x5b0
[   68.760756]  __tty_hangup.part.0+0x431/0x890
[   68.760756]  tty_release+0x3a8/0xc80
[   68.760756]  __fput+0x1f0/0x8c0
[   68.760756]  task_work_run+0xc9/0x170
[   68.760756]  exit_to_user_mode_prepare+0x194/0x1a0
[   68.760756]  syscall_exit_to_user_mode+0x19/0x50
[   68.760756]  do_syscall_64+0x48/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae

This patch just add the null out of dev->rfkill to make sure such
dereference cannot happen. This is safe since the device_lock() already
protect the check/write from data race.

Fixes: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: exthdrs: use swap() instead of open coding it
Guo Zhengkui [Tue, 12 Apr 2022 03:20:58 +0000 (11:20 +0800)]
ipv6: exthdrs: use swap() instead of open coding it

Address the following coccicheck warning:
net/ipv6/exthdrs.c:620:44-45: WARNING opportunity for swap()

by using swap() for the swapping of variable values and drop
the tmp (`addr`) variable that is not needed any more.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: fib_rule_tests: add support to select a test to run
Alaa Mohamed [Tue, 12 Apr 2022 02:04:31 +0000 (04:04 +0200)]
selftests: net: fib_rule_tests: add support to select a test to run

Add boilerplate test loop in test to run all tests
in fib_rule_tests.sh

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: mtk_eth_soc: use standard property for cci-control-port
Lorenzo Bianconi [Mon, 11 Apr 2022 10:13:25 +0000 (12:13 +0200)]
net: ethernet: mtk_eth_soc: use standard property for cci-control-port

Rely on standard cci-control-port property to identify CCI port
reference.
Update mt7622 dts binding.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Wed, 13 Apr 2022 10:57:55 +0000 (11:57 +0100)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
40GbE Intel Wired LAN Driver Updates 2022-04-12

This series contains updates to i40e and ice drivers.

Joe Damato adds TSO support for MPLS packets on i40e and ice drivers. He
also adds tracking and reporting of tx_stopped statistic for i40e.

Nabil S. Alramli adds reporting of tx_restart to ethtool for i40e.

Mateusz adds new device id support for i40e.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'tls-rx-refactor-part-3'
David S. Miller [Wed, 13 Apr 2022 10:45:39 +0000 (11:45 +0100)]
Merge branch 'tls-rx-refactor-part-3'

Jakub Kicinski says:

====================
tls: rx: random refactoring part 3

TLS Rx refactoring. Part 3 of 3. This set is mostly around rx_list
and async processing. The last two patches are minor optimizations.
A couple of features to follow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: only copy IV from the packet for TLS 1.2
Jakub Kicinski [Mon, 11 Apr 2022 19:19:17 +0000 (12:19 -0700)]
tls: rx: only copy IV from the packet for TLS 1.2

TLS 1.3 and ChaChaPoly don't carry IV in the packet.
The code before this change would copy out iv_size
worth of whatever followed the TLS header in the packet
and then for TLS 1.3 | ChaCha overwrite that with
the sequence number. Waste of cycles especially
with TLS 1.2 being close to dead and TLS 1.3 being
the common case.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: use MAX_IV_SIZE for allocations
Jakub Kicinski [Mon, 11 Apr 2022 19:19:16 +0000 (12:19 -0700)]
tls: rx: use MAX_IV_SIZE for allocations

IVs are 8 or 16 bytes, no point reading out the exact value
for quantities this small.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: use async as an in-out argument
Jakub Kicinski [Mon, 11 Apr 2022 19:19:15 +0000 (12:19 -0700)]
tls: rx: use async as an in-out argument

Propagating EINPROGRESS thru multiple layers of functions is
error prone. Use darg->async as an in/out argument, like we
use darg->zc today. On input it tells the code if async is
allowed, on output if it took place.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: return the already-copied data on crypto error
Jakub Kicinski [Mon, 11 Apr 2022 19:19:14 +0000 (12:19 -0700)]
tls: rx: return the already-copied data on crypto error

async crypto handler will report the socket error no need
to report it again. We can, however, let the data we already
copied be reported to user space but we need to make sure
the error will be reported next time around.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: treat process_rx_list() errors as transient
Jakub Kicinski [Mon, 11 Apr 2022 19:19:13 +0000 (12:19 -0700)]
tls: rx: treat process_rx_list() errors as transient

process_rx_list() only fails if it can't copy data to user
space. There is no point recording the error onto sk->sk_err
or giving up on the data which was read partially. Treat
the return value like a normal socket partial read.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: assume crypto always calls our callback
Jakub Kicinski [Mon, 11 Apr 2022 19:19:12 +0000 (12:19 -0700)]
tls: rx: assume crypto always calls our callback

If crypto didn't always invoke our callback for async
we'd not be clearing skb->sk and would crash in the
skb core when freeing it. This if must be dead code.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: don't handle TLS 1.3 in the async crypto callback
Jakub Kicinski [Mon, 11 Apr 2022 19:19:11 +0000 (12:19 -0700)]
tls: rx: don't handle TLS 1.3 in the async crypto callback

Async crypto never worked with TLS 1.3 and was explicitly disabled in
commit 8497ded2d16c ("net/tls: Disable async decrytion for tls1.3").
There's no need for us to handle TLS 1.3 padding in the async cb.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: move counting TlsDecryptErrors for sync
Jakub Kicinski [Mon, 11 Apr 2022 19:19:10 +0000 (12:19 -0700)]
tls: rx: move counting TlsDecryptErrors for sync

Move counting TlsDecryptErrors to tls_do_decryption()
where differences between sync and async crypto are
reconciled.

No functional changes, this code just always gave
me a pause.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: reuse leave_on_list label for psock
Jakub Kicinski [Mon, 11 Apr 2022 19:19:09 +0000 (12:19 -0700)]
tls: rx: reuse leave_on_list label for psock

The code is identical, we can save a few LoC.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotls: rx: consistently use unlocked accessors for rx_list
Jakub Kicinski [Mon, 11 Apr 2022 19:19:08 +0000 (12:19 -0700)]
tls: rx: consistently use unlocked accessors for rx_list

rx_list is protected by the socket lock, no need to take
the built-in spin lock on accesses.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoixp4xx_eth: fix error check return value of platform_get_irq()
Lv Ruyi [Tue, 12 Apr 2022 08:51:26 +0000 (08:51 +0000)]
ixp4xx_eth: fix error check return value of platform_get_irq()

platform_get_irq() return negative value on failure, so null check of
return value is incorrect. Fix it by comparing whether it is less than
zero.

Fixes: 9055a2f59162 ("ixp4xx_eth: make ptp support a platform driver")
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220412085126.2532924-1-lv.ruyi@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ethernet: ti: cpsw: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Minghao Chi [Tue, 12 Apr 2022 08:28:47 +0000 (08:28 +0000)]
net: ethernet: ti: cpsw: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Using pm_runtime_resume_and_get() to replace pm_runtime_get_sync and
pm_runtime_put_noidle. This change is just to simplify the code, no
actual functional changes.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Link: https://lore.kernel.org/r/20220412082847.2532584-1-chi.minghao@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agofou: Remove XRFM from NET_FOU Kconfig
Coco Li [Mon, 11 Apr 2022 21:37:17 +0000 (14:37 -0700)]
fou: Remove XRFM from NET_FOU Kconfig

XRFM is no longer needed for configuring FOU tunnels
(CONFIG_NET_FOU_IP_TUNNELS), remove from Kconfig.

Also remove the xrfm.h dependency in fou.c. It was
added in '23461551c006 ("fou: Support for foo-over-udp RX path")'
for depencies of udp_del_offload and udp_offloads, which were removed in
'd92283e338f6 ("fou: change to use UDP socket GRO")'.

Built and installed kernel and setup GUE/FOU tunnels.

Signed-off-by: Coco Li <lixiaoyan@google.com>
Link: https://lore.kernel.org/r/20220411213717.3688789-1-lixiaoyan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoi40e: Add Ethernet Connection X722 for 10GbE SFP+ support
Mateusz Palczewski [Tue, 29 Mar 2022 07:35:43 +0000 (09:35 +0200)]
i40e: Add Ethernet Connection X722 for 10GbE SFP+ support

Add support for Ethernet Connection X722 for 10GbE SFP+ cards.
Make possible for the driver to bind to the card.

Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoi40e: Add vsi.tx_restart to i40e ethtool stats
Nabil S. Alramli [Thu, 24 Mar 2022 20:03:38 +0000 (20:03 +0000)]
i40e: Add vsi.tx_restart to i40e ethtool stats

Add vsi.tx_restart to the i40e driver ethtool statistics

Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoi40e: Add tx_stopped stat
Joe Damato [Thu, 24 Mar 2022 19:46:58 +0000 (12:46 -0700)]
i40e: Add tx_stopped stat

Track TX queue stop events and export the new stat with ethtool.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoice: Add mpls+tso support
Joe Damato [Fri, 18 Mar 2022 04:12:12 +0000 (21:12 -0700)]
ice: Add mpls+tso support

Attempt to add mpls+tso support.

I don't have ice hardware available to test myself, but I just implemented
this feature in i40e and thought it might be useful to implement for ice
while this is fresh in my brain.

Hoping some one at intel will be able to test this on my behalf.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoMerge branch 'mlxsw-extend-device-registers-for-line-cards-support'
Jakub Kicinski [Tue, 12 Apr 2022 16:51:43 +0000 (09:51 -0700)]
Merge branch 'mlxsw-extend-device-registers-for-line-cards-support'

Ido Schimmel says:

====================
mlxsw: Extend device registers for line cards support

This patch set prepares mlxsw for line cards support by extending device
registers with a slot index, which allows accessing components found on
a line card at a given slot. Currently, only slot index 0 (main board)
is used.

No user visible changes that I am aware of.
====================

Link: https://lore.kernel.org/r/20220411144657.2655752-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: reg: Add new field to Management General Peripheral Information Register
Vadim Pasternak [Mon, 11 Apr 2022 14:46:57 +0000 (17:46 +0300)]
mlxsw: reg: Add new field to Management General Peripheral Information Register

Add new field 'max_modules_per_slot' to provide maximum number of
modules that can be connected per slot. This field will always be zero,
if 'slot_index' in query request is set to non-zero value, otherwise
value in this field will provide maximum modules number, which can be
equipped on device inserted at any slot.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: core_env: Pass slot index during PMAOS register write call
Vadim Pasternak [Mon, 11 Apr 2022 14:46:56 +0000 (17:46 +0300)]
mlxsw: core_env: Pass slot index during PMAOS register write call

Pass the slot index down to PMAOS pack helper alongside with the module.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: reg: Extend MGPIR register with new slot fields
Vadim Pasternak [Mon, 11 Apr 2022 14:46:55 +0000 (17:46 +0300)]
mlxsw: reg: Extend MGPIR register with new slot fields

Extend MGPIR (Management General Peripheral Information Register) with
new fields specifying the slot number and number of the slots available
on system. The purpose of these fields is:
- to support access to MPGIR register on modular system for getting the
  number of cages, equipped on the line card, inserted at specified
  slot. In case slot number is set zero, MGPIR will provide the
  information for the main board. For Top of the Rack (non-modular)
  system it will provide the same as before.
- to provide the number of slots supported by system. This data is
  relevant only in case slot number is set zero.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: reg: Extend PMMP register with new slot number field
Vadim Pasternak [Mon, 11 Apr 2022 14:46:54 +0000 (17:46 +0300)]
mlxsw: reg: Extend PMMP register with new slot number field

Extend PMMP (Port Module Memory Map Properties Register) with new
field specifying the slot number. The purpose of this field is to
enable overriding the cable/module memory map advertisement.

For non-modular systems the 'module' number uniquely identifies the
transceiver location. For modular systems the transceivers are
identified by two indexes:
- 'slot_index', specifying the slot number, where line card is located;
- 'module', specifying cage transceiver within the line card.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: reg: Extend MCION register with new slot number field
Vadim Pasternak [Mon, 11 Apr 2022 14:46:53 +0000 (17:46 +0300)]
mlxsw: reg: Extend MCION register with new slot number field

Extend MCION (Management Cable IO and Notifications Register) with new
field specifying the slot number. The purpose of this field is to
support access to MCION register for query cage transceiver on modular
system.

For non-modular systems the 'module' number uniquely identifies the
transceiver location. For modular systems the transceivers are
identified by two indexes:
- 'slot_index', specifying the slot number, where line card is located;
- 'module', specifying cage transceiver within the line card.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: reg: Extend MCIA register with new slot number field
Vadim Pasternak [Mon, 11 Apr 2022 14:46:52 +0000 (17:46 +0300)]
mlxsw: reg: Extend MCIA register with new slot number field

Extend MCIA (Management Cable Info Access Register) with new field
specifying the slot number. The purpose of this field is to support
access to MCIA register for reading cage cable information on modular
system. For non-modular systems the 'module' number uniquely identifies
the transceiver location. For modular systems the transceivers are
identified by two indexes:
- 'slot_index', specifying the slot number, where line card is located;
- 'module', specifying cage transceiver within the line card.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: reg: Extend MTBR register with new slot number field
Vadim Pasternak [Mon, 11 Apr 2022 14:46:51 +0000 (17:46 +0300)]
mlxsw: reg: Extend MTBR register with new slot number field

Extend MTBR (Management Temperature Bulk Register) with new field
specifying the slot number. The purpose of this field is to support
access to MTBR register for reading temperature sensors on modular
system. For non-modular systems the 'sensor_index' uniquely identifies
the cage sensors. For modular systems the sensors are identified by two
indexes:
- 'slot_index', specifying the slot number, where line card is located;
- 'sensor_index', specifying cage sensor within the line card.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: reg: Extend MTMP register with new slot number field
Vadim Pasternak [Mon, 11 Apr 2022 14:46:50 +0000 (17:46 +0300)]
mlxsw: reg: Extend MTMP register with new slot number field

Extend MTMP (Management Temperature Register) with new field specifying
the slot index. The purpose of this field is to support access to MTMP
register for reading temperature sensors on modular systems.
For non-modular systems the 'sensor_index' uniquely identifies the cage
sensors, while 'slot_index' is always 0. For modular systems the
sensors are identified by:
- 'slot_index', specifying the slot index, where line card is located;
- 'sensor_index', specifying cage sensor within the line card.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoi40e: Add support for MPLS + TSO
Joe Damato [Thu, 3 Mar 2022 01:29:07 +0000 (17:29 -0800)]
i40e: Add support for MPLS + TSO

This change adds support for TSO of MPLS packets.

In my tests with tcpdump it seems to work. Note this test setup has
a 9000 byte MTU:

MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234:
  Flags [P.], seq 593345:644401, ack 0, win 420,
  options [nop,nop,TS val 45022534 ecr 1722291395], length 51056

IP dstip.1234 > srcip.50086: Flags [.], ack 593345, win 122,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 602289, win 105,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 620177, win 71,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234:
  Flags [P.], seq 644401:655953, ack 0, win 420,
  options [nop,nop,TS val 45022534 ecr 1722291395], length 11552

IP dstip.1234 > srcip.50086: Flags [.], ack 638065, win 37,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 644401, win 25,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 653345, win 8,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 655953, win 3,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

Signed-off-by: Joe Damato <jdamato@fastly.com>
Co-developed-by: Mike Gallo <mgallo@fastly.com>
Signed-off-by: Mike Gallo <mgallo@fastly.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agopage_pool: Add recycle stats to page_pool_put_page_bulk
Lorenzo Bianconi [Mon, 11 Apr 2022 14:05:26 +0000 (16:05 +0200)]
page_pool: Add recycle stats to page_pool_put_page_bulk

Add missing recycle stats to page_pool_put_page_bulk routine.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/3712178b51c007cfaed910ea80e68f00c916b1fa.1649685634.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: remove noblock parameter from recvmsg() entities
Oliver Hartkopp [Mon, 11 Apr 2022 12:49:55 +0000 (14:49 +0200)]
net: remove noblock parameter from recvmsg() entities

The internal recvmsg() functions have two parameters 'flags' and 'noblock'
that were merged inside skb_recv_datagram(). As a follow up patch to commit
f4b41f062c42 ("net: remove noblock parameter from skb_recv_datagram()")
this patch removes the separate 'noblock' parameter for recvmsg().

Analogue to the referenced patch for skb_recv_datagram() the 'flags' and
'noblock' parameters are unnecessarily split up with e.g.

err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
                           flags & ~MSG_DONTWAIT, &addr_len);

or in

err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg,
                      sk, msg, size, flags & MSG_DONTWAIT,
                      flags & ~MSG_DONTWAIT, &addr_len);

instead of simply using only flags all the time and check for MSG_DONTWAIT
where needed (to preserve for the formerly separated no(n)block condition).

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: usb: qmi_wwan: add Telit 0x1057 composition
Daniele Palmas [Mon, 11 Apr 2022 13:59:43 +0000 (15:59 +0200)]
net: usb: qmi_wwan: add Telit 0x1057 composition

Add the following Telit FN980 composition:

0x1057: tty, adb, rmnet, tty, tty, tty, tty, tty

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20220411135943.4067264-1-dnlplm@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agoMerge branch 'sfc-remove-some-global-definitions'
Paolo Abeni [Tue, 12 Apr 2022 10:13:32 +0000 (12:13 +0200)]
Merge branch 'sfc-remove-some-global-definitions'

Martin Habets says:

====================
sfc: Remove some global definitions

These are some small cleanups to remove definitions that need not
be defined in .h files.
====================

Link: https://lore.kernel.org/r/164967635861.17602.16525009567130361754.stgit@palantir17.mph.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agosfc: Remove global definition of efx_reset_type_names
Martin Habets [Mon, 11 Apr 2022 11:27:20 +0000 (12:27 +0100)]
sfc: Remove global definition of efx_reset_type_names

The strings are only used in efx_common.c so the definitions
can be static in there.

Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agosfc: Remove duplicate definition of efx_xmit_done
Martin Habets [Mon, 11 Apr 2022 11:27:15 +0000 (12:27 +0100)]
sfc: Remove duplicate definition of efx_xmit_done

It is defined both in efx.h and tx_common.h.
Remove the definition in efx.h.

Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agosfc: efx_default_channel_type APIs can be static
Martin Habets [Mon, 11 Apr 2022 11:27:10 +0000 (12:27 +0100)]
sfc: efx_default_channel_type APIs can be static

This means we can remove them from efx_channel.h and avoid
naming conflicts later.
efx_channel_dummy_op_void() cannot be static as it is
used in ef100_nic.c.

Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agoMerge branch 'net-dsa-mt7530-updates-for-phylink-changes'
Paolo Abeni [Tue, 12 Apr 2022 08:33:17 +0000 (10:33 +0200)]
Merge branch 'net-dsa-mt7530-updates-for-phylink-changes'

Russell King says:

====================
net: dsa: mt7530: updates for phylink changes

This revised series is a partial conversion of the mt7530 DSA driver to
the modern phylink infrastructure. This driver has some exceptional
cases which prevent - at the moment - its full conversion (particularly
with the Autoneg bit) to using phylink_generic_validate().

Patch 1 fixes the incorrect test highlighted in the first RFC series.

Patch 2 fixes the incorrect assumption that RGMII is unable to support
1000BASE-X.

Patch 3 populates the supported_interfaces for each port

Patch 4 removes the interface checks that become unnecessary as a result
of patch 3.

Patch 5 removes use of phylink_helper_basex_speed() which is no longer
required by phylink.

Patch 6 becomes possible after patch 5, only indicating the ethtool
modes that can be supported with a particular interface mode - this
involves removing some modes and adding others as per phylink
documentation.

Patch 7 switches the driver to use phylink_get_linkmodes(), which moves
the driver as close as we can to phylink_generic_validate() due to the
Autoneg bit issue mentioned above.

Patch 8 converts the driver to the phylink pcs support, removing a bunch
of driver private indirected methods. We include TRGMII as a PCS even
though strictly TRGMII does not have a PCS. This is convenient to allow
the change in patch 9 to be made.

Patch 9 moves the special autoneg handling to the PCS validate method,
which means we can convert the MAC side to the generic validator.

Patch 10 marks the driver as non-legacy.

The series was posted on 23 February, and a ping sent on 3 March, but
no feedback has been received. The previous posting also received no
feedback on the actual patches either.

v2:
- fix build issue in patch 5
- add Marek's tested-by

 drivers/net/dsa/mt7530.c | 330 +++++++++++++++++++++--------------------------
 drivers/net/dsa/mt7530.h |  26 ++--
 2 files changed, 159 insertions(+), 197 deletions(-)
====================

Link: https://lore.kernel.org/r/YlP4vGKVrlIJUUHK@shell.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: mt7530: mark as non-legacy
Russell King (Oracle) [Mon, 11 Apr 2022 09:46:37 +0000 (10:46 +0100)]
net: dsa: mt7530: mark as non-legacy

The mt7530 driver does not make use of the speed, duplex, pause or
advertisement in its phylink_mac_config() implementation, so it can be
marked as a non-legacy driver.

Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: mt7530: move autoneg handling to PCS validation
Russell King (Oracle) [Mon, 11 Apr 2022 09:46:32 +0000 (10:46 +0100)]
net: dsa: mt7530: move autoneg handling to PCS validation

Move the autoneg bit handling to the PCS validation, which allows us to
get rid of mt753x_phylink_validate() and rely on the default
phylink_generic_validate() implementation for the MAC side.

Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: mt7530: partially convert to phylink_pcs
Russell King (Oracle) [Mon, 11 Apr 2022 09:46:27 +0000 (10:46 +0100)]
net: dsa: mt7530: partially convert to phylink_pcs

Partially convert the mt7530 driver to use phylink's PCS support. This
is a partial implementation as we don't move anything into the
pcs_config method yet - this driver supports SGMII or 1000BASE-X
without in-band.

Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: mt7530: switch to use phylink_get_linkmodes()
Russell King (Oracle) [Mon, 11 Apr 2022 09:46:22 +0000 (10:46 +0100)]
net: dsa: mt7530: switch to use phylink_get_linkmodes()

Switch mt7530 to use phylink_get_linkmodes() to generate the ethtool
linkmodes that can be supported. We are unable to use the generic
helper for this as pause modes are dependent on the interface as
the Autoneg bit depends on the interface mode.

Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: mt7530: only indicate linkmodes that can be supported
Russell King (Oracle) [Mon, 11 Apr 2022 09:46:17 +0000 (10:46 +0100)]
net: dsa: mt7530: only indicate linkmodes that can be supported

Now that mt7530 is not using the basex helper, it becomes unnecessary to
indicate support for both 1000baseX and 2500baseX when one of the 803.3z
PHY interface modes is being selected. Ensure that the driver indicates
only those linkmodes that can actually be supported by the PHY interface
mode.

Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: mt7530: drop use of phylink_helper_basex_speed()
Russell King (Oracle) [Mon, 11 Apr 2022 09:46:12 +0000 (10:46 +0100)]
net: dsa: mt7530: drop use of phylink_helper_basex_speed()

Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function.

Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: mt7530: remove interface checks
Russell King (Oracle) [Mon, 11 Apr 2022 09:46:06 +0000 (10:46 +0100)]
net: dsa: mt7530: remove interface checks

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.

Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: mt7530: populate supported_interfaces and mac_capabilities
Russell King (Oracle) [Mon, 11 Apr 2022 09:46:01 +0000 (10:46 +0100)]
net: dsa: mt7530: populate supported_interfaces and mac_capabilities

Populate the supported interfaces and MAC capabilities for mt7530,
mt7531 and mt7621 DSA switches. Filling this in will enable phylink
to pre-check the PHY interface mode against the the supported
interfaces bitmap prior to calling the validate function, and will
eventually allow us to convert to using the generic validation.

Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: dsa: mt7530: 1G can also support 1000BASE-X link mode
Russell King (Oracle) [Mon, 11 Apr 2022 09:45:56 +0000 (10:45 +0100)]
net: dsa: mt7530: 1G can also support 1000BASE-X link mode

When using an external PHY connected using RGMII to mt7531 port 5, the
PHY can be used to used support 1000BASE-X connections. Moreover, if
1000BASE-T is supported, then we should allow 1000BASE-X as well, since
which are supported is a property of the PHY.

Therefore, it makes no sense to exclude this from the linkmodes when
1000BASE-T is supported.

Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch")
Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agoMerge branch 'net-bridge-add-support-for-host-l2-mdb-entries'
Paolo Abeni [Tue, 12 Apr 2022 08:06:57 +0000 (10:06 +0200)]
Merge branch 'net-bridge-add-support-for-host-l2-mdb-entries'

Joachim Wiberg says:

====================
net: bridge: add support for host l2 mdb entries

Fix to an obvious omissions for layer-2 host mdb entries, this v2 adds
the missing selftest and some minor style fixes.

Note: this patch revealed some worrying problems in how the bridge
      forwards unknown BUM traffic and also how unknown multicast is
      forwarded when a IP multicast router is known, which a another
      (RFC) patch series intend to address.  That series will build
      on this selftest, hence the name of the test.

v2:
  - Add braces to other if/else clauses (Jakub)
  - Add selftest to verify add/del of mac/ipv4/ipv6 mdb entries (Jakub)
====================

Link: https://lore.kernel.org/r/20220411084054.298807-1-troglobit@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agoselftests: forwarding: new test, verify host mdb entries
Joachim Wiberg [Mon, 11 Apr 2022 08:40:54 +0000 (10:40 +0200)]
selftests: forwarding: new test, verify host mdb entries

Boiler plate for testing static mdb entries.  This first test verifies
adding and removing host mdb entries for all supported types: IPv4,
IPv6, and MAC multicast.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: bridge: add support for host l2 mdb entries
Joachim Wiberg [Mon, 11 Apr 2022 08:40:53 +0000 (10:40 +0200)]
net: bridge: add support for host l2 mdb entries

This patch expands on the earlier work on layer-2 mdb entries by adding
support for host entries.  Due to the fact that host joined entries do
not have any flag field, we infer the permanent flag when reporting the
entries to userspace, which otherwise would be listed as 'temp'.

Before patch:

    ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent
    Error: bridge: Flags are not allowed for host groups.
    ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee
    Error: bridge: Only permanent L2 entries allowed.

After patch:

    ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent
    ~# bridge mdb show
    dev br0 port br0 grp 01:00:00:c0:ff:ee permanent vid 1

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agosfc: Fix spelling mistake "writting" -> "writing"
Lv Ruyi [Mon, 11 Apr 2022 03:25:46 +0000 (03:25 +0000)]
sfc: Fix spelling mistake "writting" -> "writing"

There are some spelling mistakes in the comment. Fix it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Link: https://lore.kernel.org/r/20220411032546.2517628-1-lv.ruyi@zte.com.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet/cadence: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Minghao Chi [Mon, 11 Apr 2022 01:38:12 +0000 (01:38 +0000)]
net/cadence: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Using pm_runtime_resume_and_get is more appropriate
for simplifing code

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220411013812.2517212-1-chi.minghao@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agosfc: ef10: Fix assigning negative value to unsigned variable
Haowen Bai [Mon, 11 Apr 2022 01:32:37 +0000 (09:32 +0800)]
sfc: ef10: Fix assigning negative value to unsigned variable

fix warning reported by smatch:
251 drivers/net/ethernet/sfc/ef10.c:2259 efx_ef10_tx_tso_desc()
warn: assigning (-208) to unsigned variable 'ip_tot_len'

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/1649640757-30041-1-git-send-email-baihaowen@meizu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: bridge: offload BR_HAIRPIN_MODE, BR_ISOLATED, BR_MULTICAST_TO_UNICAST
Arınç ÜNAL [Sun, 10 Apr 2022 13:42:27 +0000 (16:42 +0300)]
net: bridge: offload BR_HAIRPIN_MODE, BR_ISOLATED, BR_MULTICAST_TO_UNICAST

Add BR_HAIRPIN_MODE, BR_ISOLATED and BR_MULTICAST_TO_UNICAST port flags to
BR_PORT_FLAGS_HW_OFFLOAD so that switchdev drivers which have an offloaded
data plane have a chance to reject these bridge port flags if they don't
support them yet.

It makes the code path go through the
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS driver handlers, which return
-EINVAL for everything they don't recognize.

For drivers that don't catch SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS at
all, switchdev will return -EOPNOTSUPP for those which is then ignored, but
those are in the minority.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220410134227.18810-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'net-lan966x-add-support-for-fdma'
Jakub Kicinski [Tue, 12 Apr 2022 03:50:04 +0000 (20:50 -0700)]
Merge branch 'net-lan966x-add-support-for-fdma'

Horatiu Vultur says:

====================
net: lan966x: Add support for FDMA

Currently when injecting or extracting a frame from CPU, the frame
is given to the HW each word at a time. There is another way to
inject/extract frames from CPU using FDMA(Frame Direct Memory Access).
In this way the entire frame is given to the HW. This improves both
RX and TX bitrate.
====================

Tested-by: Michael Walle <michael@walle.cc> # on kontron-kswitch-d10
Link: https://lore.kernel.org/r/20220408070357.559899-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: lan966x: Update FDMA to change MTU.
Horatiu Vultur [Fri, 8 Apr 2022 07:03:57 +0000 (09:03 +0200)]
net: lan966x: Update FDMA to change MTU.

When changing the MTU, it is required to change also the size of the
DBs. In case those frames will arrive to CPU.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: lan966x: Add FDMA functionality
Horatiu Vultur [Fri, 8 Apr 2022 07:03:56 +0000 (09:03 +0200)]
net: lan966x: Add FDMA functionality

Ethernet frames can be extracted or injected to or from the device's
DDR memory. There is one channel for injection and one channel for
extraction. Each of these channels contain a linked list of DCBs which
contains DB. The DCB contains only 1 DB for both the injection and
extraction. Each DB contains a frame. Every time when a frame is received
or transmitted an interrupt is generated.

It is not possible to use both the FDMA and the manual
injection/extraction of the frames. Therefore the FDMA has priority over
the manual because of better performance values.

FDMA:
iperf -c 192.168.1.1
[  5]   0.00-10.02  sec   420 MBytes   352 Mbits/sec    0 sender
[  5]   0.00-10.03  sec   420 MBytes   351 Mbits/sec      receiver

iperf -c 192.168.1.1 -R
[  5]   0.00-10.01  sec   528 MBytes   442 Mbits/sec    0 sender
[  5]   0.00-10.00  sec   524 MBytes   440 Mbits/sec      receiver

Manual:
iperf -c 192.168.1.1
[  5]   0.00-10.02  sec  93.8 MBytes  78.5 Mbits/sec    0 sender
[  5]   0.00-10.03  sec  93.8 MBytes  78.4 Mbits/sec      receiver

ipers -c 192.168.1.1 -R
[  5]   0.00-10.03  sec   121 MBytes   101 Mbits/sec    0 sender
[  5]   0.00-10.01  sec   118 MBytes  99.0 Mbits/sec      receiver

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: lan966x: Expose functions that are needed by FDMA
Horatiu Vultur [Fri, 8 Apr 2022 07:03:55 +0000 (09:03 +0200)]
net: lan966x: Expose functions that are needed by FDMA

Expose the following functions 'lan966x_hw_offload',
'lan966x_ifh_get_src_port' and 'lan966x_ifh_get_timestamp' in
lan966x_main.h so they can be accessed by FDMA.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: lan966x: Add registers that are used for FDMA.
Horatiu Vultur [Fri, 8 Apr 2022 07:03:54 +0000 (09:03 +0200)]
net: lan966x: Add registers that are used for FDMA.

Add the registers that are used to configure the FDMA.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: calxedaxgmac: Fix typo (doubled "the")
Jonathan Neuschäfer [Sat, 9 Apr 2022 18:21:45 +0000 (20:21 +0200)]
net: calxedaxgmac: Fix typo (doubled "the")

Fix a doubled word in the comment above xgmac_poll.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20220409182147.2509788-1-j.neuschaefer@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ethernet: ti: am65-cpsw: Fix build error without PHYLINK
YueHaibing [Sat, 9 Apr 2022 10:59:31 +0000 (18:59 +0800)]
net: ethernet: ti: am65-cpsw: Fix build error without PHYLINK

If PHYLINK is n, build fails:

drivers/net/ethernet/ti/am65-cpsw-ethtool.o: In function `am65_cpsw_set_link_ksettings':
am65-cpsw-ethtool.c:(.text+0x118): undefined reference to `phylink_ethtool_ksettings_set'
drivers/net/ethernet/ti/am65-cpsw-ethtool.o: In function `am65_cpsw_get_link_ksettings':
am65-cpsw-ethtool.c:(.text+0x138): undefined reference to `phylink_ethtool_ksettings_get'
drivers/net/ethernet/ti/am65-cpsw-ethtool.o: In function `am65_cpsw_set_eee':
am65-cpsw-ethtool.c:(.text+0x158): undefined reference to `phylink_ethtool_set_eee'

Select PHYLINK for TI_K3_AM65_CPSW_NUSS to fix this.

Fixes: e8609e69470f ("net: ethernet: ti: am65-cpsw: Convert to PHYLINK")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220409105931.9080-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Tue, 12 Apr 2022 03:34:01 +0000 (20:34 -0700)]
Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Leon Romanovsky says:

====================
Mellanox shared branch that includes:

 * Removal of FPGA TLS code https://lore.kernel.org/all/cover.1649073691.git.leonro@nvidia.com

  Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code
  is unmaintained, untested and not in-use by any upstream/distro oriented
  customers. In order to reduce code complexity, drop the kernel code,
  clean build config options and delete useless kTLS vs. TLS separation.

  [1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf

 * Removal of FPGA IPsec code https://lore.kernel.org/all/cover.1649232994.git.leonro@nvidia.com

  Together with FPGA TLS, the IPsec went to EOL state in the November of
  2019 [1]. Exactly like FPGA TLS, no active customers exist for this
  upstream code and all the complexity around that area can be deleted.

  [2] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf

 * Fix to undefined behavior from Borislav https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (23 commits)
  net/mlx5: Remove not-implemented IPsec capabilities
  net/mlx5: Remove ipsec_ops function table
  net/mlx5: Reduce kconfig complexity while building crypto support
  net/mlx5: Move IPsec file to relevant directory
  net/mlx5: Remove not-needed IPsec config
  net/mlx5: Align flow steering allocation namespace to common style
  net/mlx5: Unify device IPsec capabilities check
  net/mlx5: Remove useless IPsec device checks
  net/mlx5: Remove ipsec vs. ipsec offload file separation
  RDMA/core: Delete IPsec flow action logic from the core
  RDMA/mlx5: Drop crypto flow steering API
  RDMA/mlx5: Delete never supported IPsec flow action
  net/mlx5: Remove FPGA ipsec specific statistics
  net/mlx5: Remove XFRM no_trailer flag
  net/mlx5: Remove not-used IDA field from IPsec struct
  net/mlx5: Delete metadata handling logic
  net/mlx5_fpga: Drop INNOVA IPsec support
  IB/mlx5: Fix undefined behavior due to shift overflowing the constant
  net/mlx5: Cleanup kTLS function names and their exposure
  net/mlx5: Remove tls vs. ktls separation as it is the same
  ...
====================

Link: https://lore.kernel.org/r/20220409055303.1223644-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: stmmac: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Minghao Chi [Fri, 8 Apr 2022 08:12:50 +0000 (08:12 +0000)]
net: stmmac: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Using pm_runtime_resume_and_get is more appropriate
for simplifing code

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Link: https://lore.kernel.org/r/20220408081250.2494588-1-chi.minghao@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agohv_netvsc: Add support for XDP_REDIRECT
Haiyang Zhang [Thu, 7 Apr 2022 20:21:34 +0000 (13:21 -0700)]
hv_netvsc: Add support for XDP_REDIRECT

Handle XDP_REDIRECT action in netvsc driver.
Also, transparently pass ndo_xdp_xmit to VF when available.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1649362894-20077-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'ipv4-convert-several-tos-fields-to-dscp_t'
Jakub Kicinski [Tue, 12 Apr 2022 00:38:02 +0000 (17:38 -0700)]
Merge branch 'ipv4-convert-several-tos-fields-to-dscp_t'

Guillaume Nault says:

====================
ipv4: Convert several tos fields to dscp_t

Continue the work started with commit a410a0cf9885 ("ipv6: Define
dscp_t and stop taking ECN bits into account in fib6-rules") and
convert more structure fields and variables to dscp_t. This series
focuses on struct fib_rt_info, struct fib_entry_notifier_info and their
users (networking drivers).

The purpose of dscp_t is to ensure that ECN bits don't influence IP
route lookups. It does so by ensuring that dscp_t variables have the
ECN bits cleared.

Notes:
  * This series is entirely about type annotation and isn't supposed
to have any user visible effect.

  * The first two patches have to introduce a few dsfield <-> dscp
conversions in the affected drivers, but those are then removed when
converting the internal driver structures (patches 3-5). In the end,
drivers don't have to handle any conversion.
====================

Link: https://lore.kernel.org/r/cover.1649445279.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: marvell: prestera: Use dscp_t in struct prestera_kern_fib_cache
Guillaume Nault [Fri, 8 Apr 2022 20:08:50 +0000 (22:08 +0200)]
net: marvell: prestera: Use dscp_t in struct prestera_kern_fib_cache

Use the new dscp_t type to replace the kern_tos field of struct
prestera_kern_fib_cache. This ensures ECN bits are ignored and makes it
compatible with the dscp fields of struct fib_entry_notifier_info and
struct fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: Use dscp_t in struct mlxsw_sp_fib4_entry
Guillaume Nault [Fri, 8 Apr 2022 20:08:46 +0000 (22:08 +0200)]
mlxsw: Use dscp_t in struct mlxsw_sp_fib4_entry

Use the new dscp_t type to replace the tos field of struct
mlxsw_sp_fib4_entry. This ensures ECN bits are ignored and makes it
compatible with the dscp fields of fib_entry_notifier_info and
fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonetdevsim: Use dscp_t in struct nsim_fib4_rt
Guillaume Nault [Fri, 8 Apr 2022 20:08:43 +0000 (22:08 +0200)]
netdevsim: Use dscp_t in struct nsim_fib4_rt

Use the new dscp_t type to replace the tos field of struct
nsim_fib4_rt. This ensures ECN bits are ignored and makes it compatible
with the dscp fields of struct fib_entry_notifier_info and struct
fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoipv4: Use dscp_t in struct fib_entry_notifier_info
Guillaume Nault [Fri, 8 Apr 2022 20:08:40 +0000 (22:08 +0200)]
ipv4: Use dscp_t in struct fib_entry_notifier_info

Use the new dscp_t type to replace the tos field of struct
fib_entry_notifier_info. This ensures ECN bits are ignored and makes it
compatible with the dscp field of struct fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoipv4: Use dscp_t in struct fib_rt_info
Guillaume Nault [Fri, 8 Apr 2022 20:08:37 +0000 (22:08 +0200)]
ipv4: Use dscp_t in struct fib_rt_info

Use the new dscp_t type to replace the tos field of struct fib_rt_info.
This ensures ECN bits are ignored and makes it compatible with the
fa_dscp field of struct fib_alias.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ethernet: ti: cpsw: drop CPSW_HEADROOM define
Grygorii Strashko [Fri, 8 Apr 2022 13:48:38 +0000 (16:48 +0300)]
net: ethernet: ti: cpsw: drop CPSW_HEADROOM define

Since commit 1771afd47430 ("net: cpsw: avoid alignment faults by taking
NET_IP_ALIGN into account") the TI CPSW driver was switched to use correct
define CPSW_HEADROOM_NA to avoid alignment faults, but there are two places
left where CPSW_HEADROOM is still used (without causing issues).

Hence, completely drop CPSW_HEADROOM define and use CPSW_HEADROOM_NA
everywhere to avoid further mistakes in code.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mptcp-next'
David S. Miller [Mon, 11 Apr 2022 10:55:54 +0000 (11:55 +0100)]
Merge branch 'mptcp-next'

Mat Martineau says:

====================
mptcp: Miscellaneous changes for 5.19

Four separate groups of patches here:

Patch 1 optimizes flag checking when releasing mptcp socket locks.

Patches 2 and 3 update the packet scheduler when subflow priorities
change.

Patch 4 adds some pernet helper functions for MPTCP.

Patches 5-8 add diag support for MPTCP listeners, including a selftest.

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests/mptcp: add diag listen tests
Florian Westphal [Fri, 8 Apr 2022 19:46:01 +0000 (12:46 -0700)]
selftests/mptcp: add diag listen tests

Check dumping of mptcp listener sockets:
1. filter by dport should not return any results
2. filter by sport should return listen sk
3. filter by saddr+sport should return listen sk
4. no filter should return listen sk

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: listen diag dump support
Florian Westphal [Fri, 8 Apr 2022 19:46:00 +0000 (12:46 -0700)]
mptcp: listen diag dump support

makes 'ss -Ml' show mptcp listen sockets.

Iterate over the tcp listen sockets and pick those that have mptcp ulp
info attached.

mptcp_diag_get_info() is modified to prefer msk->first for mptcp sockets
in listen state.  This reports accurate number for recv and send queue
(pending / max connection backlog counters).

Sample output:
ss -Mil
State        Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN       0      20     127.0.0.1:12000     0.0.0.0:*
         subflows_max:2

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: remove locking in mptcp_diag_fill_info
Florian Westphal [Fri, 8 Apr 2022 19:45:59 +0000 (12:45 -0700)]
mptcp: remove locking in mptcp_diag_fill_info

Problem is that listener iteration would call this from atomic context
so this locking is not allowed.

One way is to drop locks before calling the helper, but afaics the lock
isn't really needed, all values are fetched via READ_ONCE().

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: diag: switch to context structure
Florian Westphal [Fri, 8 Apr 2022 19:45:58 +0000 (12:45 -0700)]
mptcp: diag: switch to context structure

Raw access to cb->arg[] is deprecated, use a context structure.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: add pm_nl_pernet helpers
Geliang Tang [Fri, 8 Apr 2022 19:45:57 +0000 (12:45 -0700)]
mptcp: add pm_nl_pernet helpers

This patch adds two pm_nl_pernet related helpers, named pm_nl_get_pernet()
and pm_nl_get_pernet_from_msk() to get pm_nl_pernet from 'net' or 'msk'.
Use these helpers instead of using net_generic() directly.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>