Leon Romanovsky [Sat, 25 Sep 2021 11:22:47 +0000 (14:22 +0300)]
octeontx2: Move devlink registration to be last devlink command
This change prevents from users to access device before devlink is fully
configured. This change allows us to delete call to devlink_params_publish()
and impossible check during unregister flow.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Sat, 25 Sep 2021 11:22:45 +0000 (14:22 +0300)]
net: hinic: Open device for the user access when it is ready
Move devlink registration to be the last command in device activation,
so it opens the driver to accept such devlink commands from the user
when it is fully initialized.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Sat, 25 Sep 2021 11:22:43 +0000 (14:22 +0300)]
liquidio: Overcome missing device lock protection in init/remove flows
The liquidio driver is broken by design. It initialize PCI devices
in separate delayed works. It causes to the situation where device lock
is dropped during initialize and remove sequences.
That lock is part of driver/core and needed to protect from races during
init, destroy and bus invocations.
In addition to lack of locking protection, it has incorrect order of
destroy flows and very questionable synchronization scheme based on
atomic_t.
This change doesn't fix that driver but makes sure that rest of the
netdev subsystem doesn't suffer from such basic protection by adding
device_lock over devlink_*() APIs and by moving devlink_register()
to be last command in setup_nic_devices().
Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Sat, 25 Sep 2021 11:22:42 +0000 (14:22 +0300)]
bnxt_en: Register devlink instance at the end devlink configuration
Move devlink_register() to be last command in devlink configuration
sequence, so no user space access will be possible till devlink instance
is fully operable. As part of this change, the devlink_params_publish
call is removed as not needed.
This change fixes forgotten devlink_params_unpublish() too.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Sat, 25 Sep 2021 11:22:41 +0000 (14:22 +0300)]
devlink: Notify users when objects are accessible
The devlink core code notified users about add/remove objects without
relation if this object can be accessible or not. In this patch we unify
such user visible notifications in one place.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-10 and later warn about a theoretical array overrun when
accessing priv->int_name_rx_irq[i] with an out of bounds value
of 'i':
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_request_irq_multi_msi':
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3528:17: error: 'snprintf' argument 4 may overlap destination object 'dev' [-Werror=restrict]
3528 | snprintf(int_name, int_name_len, "%s:%s-%d", dev->name, "tx", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3404:60: note: destination object referenced by 'restrict'-qualified argument 1 was declared here
3404 | static int stmmac_request_irq_multi_msi(struct net_device *dev)
| ~~~~~~~~~~~~~~~~~~~^~~
The warning is a bit strange since it's not actually about the array
bounds but rather about possible string operations with overlapping
arguments, but it's not technically wrong.
Avoid the warning by adding an extra bounds check.
Fixes: 8532f613bc78 ("net: stmmac: introduce MSI Interrupt routines for mac, safety, RX & TX") Link: https://lore.kernel.org/all/20210421134743.3260921-1-arnd@kernel.org/ Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Li [Sun, 26 Sep 2021 07:42:12 +0000 (15:42 +0800)]
net: sparx5: fix resource_size.cocci warnings
Use resource_size function on resource object
instead of explicit computation.
Clean up coccicheck warning:
./drivers/net/ethernet/microchip/sparx5/sparx5_main.c:237:19-22: ERROR:
Missing resource_size with iores [ idx ]
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cai Huoqing [Sun, 26 Sep 2021 06:52:14 +0000 (14:52 +0800)]
ibmveth: Use dma_alloc_coherent() instead of kmalloc/dma_map_single()
Replacing kmalloc/kfree/dma_map_single/dma_unmap_single()
with dma_alloc_coherent/dma_free_coherent() helps to reduce
code size, and simplify the code, and coherent DMA will not
clear the cache every time.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cai Huoqing [Sat, 25 Sep 2021 12:46:28 +0000 (20:46 +0800)]
net: cisco: Fix a function name in comments
Use dma_alloc_coherent() instead of pci_alloc_consistent(),
because only dma_alloc_coherent() is called here.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mianhan Liu [Sat, 25 Sep 2021 14:21:40 +0000 (22:21 +0800)]
net/ipv4/tcp_nv.c: remove superfluous header files from tcp_nv.c
tcp_nv.c hasn't use any macro or function declared in mm.h. Thus, these files
can be removed from tcp_nv.c safely without affecting the compilation
of the net module.
Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 24 Sep 2021 20:24:53 +0000 (13:24 -0700)]
net: make napi_disable() symmetric with enable
Commit 3765996e4f0b ("napi: fix race inside napi_enable") fixed
an ordering bug in napi_enable() and made the napi_enable() diverge
from napi_disable(). The state transitions done on disable are
not symmetric to enable.
There is no known bug in napi_disable() this is just refactoring.
Eric suggests we can also replace msleep(1) with a more opportunistic
usleep_range().
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Min Li [Fri, 24 Sep 2021 19:01:32 +0000 (15:01 -0400)]
ptp: clockmatrix: use rsmu driver to access i2c/spi bus
rsmu (Renesas Synchronization Management Unit ) driver is located in
drivers/mfd and responsible for creating multiple devices including
clockmatrix phc, which will then use the exposed regmap and mutex
handle to access i2c/spi bus.
Signed-off-by: Min Li <min.li.xe@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 24 Sep 2021 10:04:27 +0000 (12:04 +0200)]
selftests: net: fib_nexthops: Wait before checking reported idle time
The purpose of this test is to verify that after a short activity passes,
the reported time is reasonable: not zero (which could be reported by
mistake), and not something outrageous (which would be indicative of an
issue in used units).
However, the idle time is reported in units of clock_t, or hundredths of
second. If the initial sequence of commands is very quick, it is possible
that the idle time is reported as just flat-out zero. When this test was
recently enabled in our nightly regression, we started seeing spurious
failures for exactly this reason.
Therefore buffer the delay leading up to the test with a sleep, to make
sure there is no legitimate way of reporting 0.
Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 26 Sep 2021 10:26:17 +0000 (11:26 +0100)]
Merge branch 'octeontx2-af-kpu'
Kiran Kumar K says:
====================
adding KPU profile changes for GTPU and custom
Adding changes to limit the KPU processing for GTPU headers to parse
packet up to L4 and added changes to variable length headers to parse LA
as part of PKIND action.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
octeontx2-af: Optimize KPU1 processing for variable-length headers
Optimized KPU1 entry processing for variable-length custom L2 headers
of size 24B, 90B by
- Moving LA LTYPE parsing for 24B and 90B headers to PKIND.
- Removing LA flags assignment for 24B and 90B headers.
- Reserving a PKIND 55 to parse variable length headers.
Also, new mailbox(NPC_SET_PKIND) added to configure PKIND with
corresponding variable-length offset, mask, and shift count
(NPC_AF_KPUX_ENTRYX_ACTION0).
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
With current KPU profile, while parsing GTPU packets, GTPU payload
is also being parsed and GTPU PDU payload is being treated as IPV4
data, which is not correct. In case of GTPU packets, parsing should
be stopped after identifying the GTPU. Adding changes to limit KPU
profile parsing for GTPU payload.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 25 Sep 2021 10:36:51 +0000 (11:36 +0100)]
Merge branch 'mptcp-fixes'
Mat Martineau says:
====================
mptcp: Miscellaneous fixes
Here are five changes we've collected and tested in the mptcp-tree:
Patch 1 changes handling of the MPTCP-level snd_next value during the
recovery phase after a subflow link failure.
Patches 2 and 3 are some small refactoring changes to replace some
open-coded bits.
Patch 4 removes an unused field in struct mptcp_sock.
Patch 5 restarts the MPTCP retransmit timer when there is
not-yet-transmitted data to send and all previously sent data has been
acknowledged. This prevents some sending stalls.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The retransmit head will be NULL in case there is no in-flight data
(meaning all data injected into network has been acked).
In that case the retransmit timer is stopped.
This is only correct if there is no more pending, not-yet-sent data.
If there is, the retransmit timer needs to set the PENDING bit again so
that mptcp tries to send the remaining (new) data once a subflow can accept
more data.
Also, mptcp_subflow_get_retrans() has to be called unconditionally.
This function checks for subflows that have become unresponsive and marks
them as stale, so in the case where the rtx queue is empty, subflows
will never be marked stale which prevents available backup subflows from
becoming eligible for transmit.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/226 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
will update tx_pending_data multiple times when a subflow is declared
stale while earlier recovery is still in progress.
This means that tx_pending_data will still be positive even after
all data as has been transmitted.
Rather than fix it, remove this field: there are no consumers.
The outstanding data byte count can be computed either via
"msk->write_seq - rtx_head->data_seq" or
"msk->write_seq - msk->snd_una".
The latter is more recent/accurate estimate as rtx_head adjustment
is deferred until mptcp lock can be acquired.
Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 24 Sep 2021 21:12:36 +0000 (14:12 -0700)]
mptcp: use lockdep_assert_held_once() instead of open-coding it
We have a few more places where the mptcp code duplicates
lockdep_assert_held_once(). Let's use the existing macro and
avoid a bunch of compiler's conditional.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since OPTIONS_MPTCP_MPC has been defined, use it instead of open-coding.
Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When recovering after a link failure, snd_nxt should not be set to a
lower value. Else, update of snd_nxt is broken because:
msk->snd_nxt += ret; (where ret is number of bytes sent)
assumes that snd_nxt always moves forward.
After reduction, its possible that snd_nxt update gets out of sync:
dfrag we just sent might have had a data sequence number even past
recovery_snd_nxt.
This change factors the common msk state update to a helper
and updates snd_nxt based on the current dfrag data sequence number.
The conditional is required for the recovery phase where we may
re-transmit old dfrags that are before current snd_nxt.
After this change, snd_nxt only moves forward and covers all in-sequence
data that was transmitted.
recovery_snd_nxt is retained to detect when recovery has completed.
Fixes: 1e1d9d6f119c5 ("mptcp: handle pending data on closed subflow") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Wed, 23 Jun 2021 08:36:46 +0000 (11:36 +0300)]
net/mlx5e: loopback test is not supported in switchdev mode
In switchdev mode we insert steering rules to eswitch that
make sure packets can't be looped back.
Modify the self tests infra and have flags per test.
Add a flag for tests that needs to be skipped in switchdev mode.
Before this commit:
$ ethtool --test enp8s0f0
The test result is FAIL
The test extra info:
Link Test 0
Speed Test 0
Health Test 0
Loopback Test 1
After this commit:
$ ethtool --test enp8s0f0
The test result is PASS
The test extra info:
Link Test 0
Speed Test 0
Health Test 0
Example output in dmesg:
enp8s0f0: Self test begin..
enp8s0f0: [0] Link Test start..
enp8s0f0: [0] Link Test end: result(0)
enp8s0f0: [1] Speed Test start..
enp8s0f0: [1] Speed Test end: result(0)
enp8s0f0: [2] Health Test start..
enp8s0f0: [2] Health Test end: result(0)
enp8s0f0: Self test out: status flags(0x1)
Roi Dayan [Thu, 12 Aug 2021 06:38:32 +0000 (09:38 +0300)]
net/mlx5e: Check action fwd/drop flag exists also for nic flows
The driver should add offloaded rules with either a fwd or drop action.
The check existed in parsing fdb flows but not when parsing nic flows.
Move the test into actions_match_supported() which is called for
checking nic flows and fdb flows.
Roi Dayan [Thu, 12 Aug 2021 06:37:19 +0000 (09:37 +0300)]
net/mlx5e: Remove incorrect addition of action fwd flag
A user is expected to explicit request a fwd or drop action.
It is not correct to implicit add a fwd action for the user,
when modify header action flag exists.
Roi Dayan [Wed, 11 Aug 2021 11:14:49 +0000 (14:14 +0300)]
net/mlx5e: Use correct return type
modify_header_match_supported() should return type bool but
it returns the value returned by is_action_keys_supported()
which is type int.
is_action_keys_supported() always returns either -EOPNOTSUPP
or 0 and it shouldn't change as the purpose of the function
is checking for support. so just make the function return
a bool type.
rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies()
Directly using _usecs_to_jiffies() might be unsafe, so it's
better to use usecs_to_jiffies() instead.
Because we can see that the result of _usecs_to_jiffies()
could be larger than MAX_JIFFY_OFFSET values without the
check of the input.
Fixes: c410bf01933e ("Fix the excessive initial retransmission timeout") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: tracking packets with CE marks in BW rate sample
In order to track CE marks per rate sample (one round trip), TCP needs a
per-skb header field to record the tp->delivered_ce count when the skb
was sent. To make space, we replace the "last_in_flight" field which is
used exclusively for NV congestion control. The stat needed by NV can be
alternatively approximated by existing stats tcp_sock delivered and
mss_cache.
This patch counts the number of packets delivered which have CE marks in
the rate sample, using similar approach of delivery accounting.
Cc: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Luke Hsiao <lukehsiao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Sep 2021 13:12:57 +0000 (14:12 +0100)]
Merge branch 'devlink-fixes'
Leon Romanovsky says:
====================
Batch of devlink related fixes
I'm asking to apply this batch of devlink fixes to net-next and not to
net, because most if not all fixes are for old code or/and can be considered
as cleanup.
It will cancel the need to deal with merge conflicts for my next devlink series :).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Thu, 23 Sep 2021 18:12:53 +0000 (21:12 +0300)]
qed: Don't ignore devlink allocation failures
devlink is a software interface that doesn't depend on any hardware
capabilities. The failure in SW means memory issues, wrong parameters,
programmer error e.t.c.
Like any other such interface in the kernel, the returned status of
devlink APIs should be checked and propagated further and not ignored.
Fixes: 755f982bb1ff ("qed/qede: make devlink survive recovery") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Thu, 23 Sep 2021 18:12:52 +0000 (21:12 +0300)]
ice: Delete always true check of PF pointer
PF pointer is always valid when PCI core calls its .shutdown() and
.remove() callbacks. There is no need to check it again.
Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Thu, 23 Sep 2021 18:12:49 +0000 (21:12 +0300)]
bnxt_en: Properly remove port parameter support
This driver doesn't have any port parameters and registers
devlink port parameters with empty table. Remove the useless
calls to devlink_port_params_register and _unregister.
Fixes: da203dfa89ce ("Revert "devlink: Add a generic wake_on_lan port parameter"") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Thu, 23 Sep 2021 18:12:48 +0000 (21:12 +0300)]
bnxt_en: Check devlink allocation and registration status
devlink is a software interface that doesn't depend on any hardware
capabilities. The failure in SW means memory issues, wrong parameters,
programmer error e.t.c.
Like any other such interface in the kernel, the returned status of
devlink APIs should be checked and propagated further and not ignored.
Fixes: 4ab0c6a8ffd7 ("bnxt_en: add support to enable VF-representors") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 23 Sep 2021 15:35:41 +0000 (18:35 +0300)]
net: dsa: felix: accept "ethernet-ports" OF node name
Since both forms are accepted, let's search for both when we
pre-validate the PHY modes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Sep 2021 09:26:52 +0000 (10:26 +0100)]
Merge branch 'mlxsw-next'
Ido Schimmel says:
====================
mlxsw: Add support for IP-in-IP with IPv6 underlay
Currently, mlxsw only supports IP-in-IP with IPv4 underlay. Traffic
routed through 'gre' netdevs is encapsulated with IPv4 and GRE headers.
Similarly, incoming IPv4 GRE packets are decapsulated and routed in the
overlay VRF (which can be the same as the underlay VRF).
This patchset adds support for IPv6 underlay using the 'ip6gre' netdev.
Due to architectural differences between Spectrum-1 and later ASICs,
this functionality is only supported on Spectrum-2 onwards (the software
data path is used for Spectrum-1).
Patchset overview:
Patches #1-#5 are preparations.
Patches #6-#9 add and extend required device registers.
Amit Cohen [Thu, 23 Sep 2021 12:37:00 +0000 (15:37 +0300)]
mlxsw: Add support for IP-in-IP with IPv6 underlay for Spectrum-2 and above
Currently, mlxsw driver supports IP-in-IP only with IPv4 underlay.
Add support for IPv6 underlay for Spectrum-2 and above.
Most of the configurations are same to IPv4, the main difference between
IPv4 and IPv6 is related to saving IP addresses.
IPv6 addresses are saved as part of KVD and the relevant registers hold
pointer to them.
Add API for that as part of ipip_ops, so then only Spectrum-2 and above
will save IPv6 addresses in this way.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:59 +0000 (15:36 +0300)]
mlxsw: spectrum_router: Increase parsing depth for IPv6 decapsulation
The Spectrum ASIC has a configurable limit on how deep into the packet
it parses. By default, the limit is 96 bytes.
For IP-in-IP packets, with IPv6 outer and inner headers, the default
parsing depth is not enough and without increasing it such packets cannot
be properly decapsulated.
Use the existing API to set parsing depth, call it once for each
decapsulation entry when it is created/destroyed.
There is no need to protect the code with new mutex because 'router->lock'
is already taken in these code paths.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:58 +0000 (15:36 +0300)]
mlxsw: Add IPV6_ADDRESS kvdl entry type
Add support for allocating and freeing KVD entries for IPv6 addresses.
These addresses are programmed by the RIPS register and referenced by
the RATR and RTDP registers for IPv6 underlay encapsulation and
decapsulation, respectively.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:57 +0000 (15:36 +0300)]
mlxsw: spectrum_ipip: Add mlxsw_sp_ipip_gre6_ops
Add operations for IP-in-IP GRE6. For now the function can_offload()
returns false and the other functions warn as they should never be called.
A later patch will add dedicated operations for Spectrum-2 which will
support IPv6 underlay, so use 'mlxsw_sp1_' prefix for the new operations.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:56 +0000 (15:36 +0300)]
mlxsw: Create separate ipip_ops_arr for different ASICs
Currently, there is support for IP-in-IP only with IPv4 underlay for all
supported Spectrum ASICs.
The next patches will add support for IPv6 underlay only for Spectrum-2
and above.
Add infrastructure for splitting IP-in-IP support between different
ASICs - create separate ipip_ops_arr and add ipips_init function to set the
right ops.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:55 +0000 (15:36 +0300)]
mlxsw: reg: Add support for ritr_loopback_ipip6_pack()
The RITR register is used to configure the router interface table.
For IP-in-IP, it stores the underlay source IP address for encapsulation
and also the ingress RIF for the underlay lookup.
Add support for IPv6 IP-in-IP configuration.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:54 +0000 (15:36 +0300)]
mlxsw: reg: Add support for ratr_ipip6_entry_pack()
The RATR register is used to configure the Router Adjacency (next-hop)
Table.
For IP-in-IP entry, underlay destination IPv4 is saved as part of this
register and underlay destination IPv6 is saved by RIPS register and RATR
saves pointer to it.
Add function for setting IPv6 IP-in-IP configuration.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:53 +0000 (15:36 +0300)]
mlxsw: reg: Add support for rtdp_ipip6_pack()
The RTDP register is used for configuring the tunnel decapsulation
properties of NVE and IP-in-IP.
Linux tunnels verify packets before decapsulation based on the packet's
source IP, which must match tunnel remote IP.
RTDP is used to configure decapsulation so that it filters out packets that
are not IPv6 or have the wrong source IP or wrong GRE key.
For IP-in-IP entry, source IPv4 is saved as part of this register and
source IPv6 is saved by RIPS register and RTDP saves pointer to it.
Create common function for configuring both IPv4 and IPv6 and add
dedicated functions for each protocol.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:52 +0000 (15:36 +0300)]
mlxsw: reg: Add Router IP version Six Register
The RIPS register is used to store IPv6 addresses for use by the NVE and
IP-in-IP.
For IPv6 underlay support, RATR register needs to hold a pointer to the
remote IPv6 address for encapsulation and RTDP register needs to hold a
pointer to the local IPv6 address for decapsulation check.
Add the required register for saving IPv6 addresses.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:51 +0000 (15:36 +0300)]
mlxsw: Take tunnel's type into account when searching underlay device
The function __mlxsw_sp_ipip_netdev_ul_dev_get() returns the underlay
device that corresponds to the overlay device that it gets.
Currently, this function assumes that the tunnel is IPv4 GRE, because it
is the only one that is supported by mlxsw driver.
This assumption will no longer be correct when IPv6 GRE support is added,
resulting in wrong underlay device being returned.
Instead, check 'ol_dev->type' and return the underlay device accordingly.
Move the function to spectrum_ipip.c because spectrum_router.c should not
be aware to tunnel type.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:50 +0000 (15:36 +0300)]
mlxsw: spectrum_ipip: Create common function for mlxsw_sp_ipip_ol_netdev_change_gre()
The function mlxsw_sp_ipip_ol_netdev_change_gre4() contains code that
can be shared between IPv4 and IPv6.
The only difference is the way that arguments are taken from tunnel
parameters, which are different between IPv4 and IPv6.
For that, add structure 'mlxsw_sp_ipip_parms' to hold all the required
parameters for the function and save it as part of
'struct mlxsw_sp_ipip_entry' instead of the existing structure that is
not shared between IPv4 and IPv6. Add new operation as part of
'mlxsw_sp_ipip_ops' to initialize the new structure.
Then mlxsw_sp_ipip_ol_netdev_change_gre{4,6}() will prepare the new
structure and both will call the same function.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:49 +0000 (15:36 +0300)]
mlxsw: spectrum_router: Fix arguments alignment
Suppress the following checkpatch.pl check [1] by adding a variable to
store the IP-in-IP options. Noticed while adding equivalent IPv6 code in
subsequent patches.
[1]
CHECK: Alignment should match open parenthesis
+ mlxsw_reg_ritr_loopback_ipip4_pack(ritr_pl, lb_cf.lb_ipipt,
+
+ MLXSW_REG_RITR_LOOPBACK_IPIP_OPTIONS_GRE_KEY_PRESET,
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:48 +0000 (15:36 +0300)]
mlxsw: spectrum_ipip: Pass IP tunnel parameters by reference and as 'const'
Currently, there are several functions that handle 'struct ip_tunnel_parm'
and 'struct __ip6_tnl_parm'.
Change the mentioned functions to get IP tunnel parameters by reference
and as 'const'.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 23 Sep 2021 12:36:47 +0000 (15:36 +0300)]
mlxsw: spectrum_router: Create common function for fib_entry_type_unset() code
mlxsw_sp_fib4_entry_type_unset() is not specific for IPv4 FIB entry,
move the code to mlxsw_sp_fib_entry_type_unset(), and call this function
from mlxsw_sp_fib4_entry_type_unset() so then it will be used for IPv6
also.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
Previous releases - regressions:
- introduce a shutdown method to mdio device drivers, and make DSA
switch drivers compatible with masters disappearing on shutdown;
preventing infinite reference wait
- fix issues in mdiobus users related to ->shutdown vs ->remove
- virtio-net: fix pages leaking when building skb in big mode
- xen-netback: correct success/error reporting for the
SKB-with-fraglist
- dsa: tear down devlink port regions when tearing down the devlink
port on error
- nexthop: fix division by zero while replacing a resilient group
- hns3: check queue, vf, vlan ids range before using
Previous releases - always broken:
- napi: fix race against netpoll causing NAPI getting stuck
- mlx4_en: ensure link operstate is updated even if link comes up
before netdev registration
- bnxt_en: fix TX timeout when TX ring size is set to the smallest
- enetc: fix illegal access when reading affinity_hint; prevent oops
on sysfs access
- core: correct the sock::sk_lock.owned lockdep annotations"
* tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
atlantic: Fix issue in the pm resume flow.
net/mlx4_en: Don't allow aRFS for encapsulated packets
net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
nfc: st-nci: Add SPI ID matching DT compatible
MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
nexthop: Fix memory leaks in nexthop notification chain listeners
mptcp: ensure tx skbs always have the MPTCP ext
qed: rdma - don't wait for resources under hw error recovery flow
s390/qeth: fix deadlock during failing recovery
s390/qeth: Fix deadlock in remove_discipline
s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
net: dsa: realtek: register the MDIO bus under devres
net: dsa: don't allocate the slave_mii_bus using devres
Doc: networking: Fox a typo in ice.rst
net: dsa: fix dsa_tree_setup error path
net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
net/smc: add missing error check in smc_clc_prfx_set()
net: hns3: fix a return value error in hclge_get_reset_status()
net: hns3: check vlan id before using it
...
Prior to the commit 7e1c0d6f5820 ("memcg: switch lruvec stats to rstat")
and the commit aa48e47e3906 ("memcg: infrastructure to flush memcg
stats"), each lruvec memcg stats can be off by (nr_cgroups * nr_cpus *
32) at worst and for unbounded amount of time. The commit aa48e47e3906
moved the lruvec stats to rstat infrastructure and the commit 7e1c0d6f5820 bounded the error for all the lruvec stats to (nr_cpus *
32) at worst for at most 2 seconds. More specifically it decoupled the
number of stats and the number of cgroups from the error rate.
However this reduction in error comes with the cost of triggering the
slowpath of stats update more frequently. Previously in the slowpath
the kernel adds the stats up the memcg tree. After aa48e47e3906, the
kernel triggers the asyn lruvec stats flush through queue_work(). This
causes regression reports from 0day kernel bot [1] as well as from
phoronix test suite [2].
We tried two options to fix the regression:
1) Increase the threshold to trigger the slowpath in lruvec stats
update codepath from 32 to 512.
2) Remove the slowpath from lruvec stats update codepath and instead
flush the stats in the page refault codepath. The assumption is that
the kernel timely flush the stats, so, the update tree would be
small in the refault codepath to not cause the preformance impact.
Following are the results of will-it-scale/page_fault[1|2|3] benchmark
on four settings i.e. (1) 5.15-rc1 as baseline (2) 5.15-rc1 with aa48e47e3906 and 7e1c0d6f5820 reverted (3) 5.15-rc1 with option-1
(4) 5.15-rc1 with option-2.
From the above result, it seems like the option-2 not only solves the
regression but also improves the performance for at least these
benchmarks.
Feng Tang (intel) ran the aim7 benchmark with these two options and
confirms that option-1 reduces the regression but option-2 removes the
regression.
Michael Larabel (phoronix) ran multiple benchmarks with these options
and reported the results at [3] and it shows for most benchmarks
option-2 removes the regression introduced by the commit aa48e47e3906
("memcg: infrastructure to flush memcg stats").
Based on the experiment results, this patch proposed the option-2 as the
solution to resolve the regression.
After fixing hibernation resume flow, another usecase was found which
should be explicitly handled - resume when device is in "down" state.
Invoke aq_nic_init jointly with aq_nic_start only if ndev was already
up during suspend/hibernate. We still need to perform nic_deinit() if
caller requests for it, to handle the freeze/resume scenarios.
Fixes: 57f780f1c433 ("atlantic: Fix driver resume flow.") Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aya Levin [Thu, 23 Sep 2021 06:51:45 +0000 (09:51 +0300)]
net/mlx4_en: Don't allow aRFS for encapsulated packets
Driver doesn't support aRFS for encapsulated packets, return early error
in such a case.
Fixes: 1eb8c695bda9 ("net/mlx4_en: Add accelerated RFS support") Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 23 Sep 2021 02:03:38 +0000 (19:03 -0700)]
net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
The blamed commit made the fatally incorrect assumption that ports which
aren't in the FORWARDING STP state should not have packets forwarded
towards them, and that is all that needs to be done.
However, that logic alone permits BLOCKING ports to forward to
FORWARDING ports, which of course allows packet storms to occur when
there is an L2 loop.
The ocelot_get_bridge_fwd_mask should not only ask "what can the bridge
do for you", but "what can you do for the bridge". This way, only
FORWARDING ports forward to the other FORWARDING ports from the same
bridging domain, and we are still compatible with the idea of multiple
bridges.
Fixes: df291e54ccca ("net: ocelot: support multiple bridges") Suggested-by: Colin Foster <colin.foster@in-advantage.com> Reported-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes multiple CLS_REPLACE calls are issued for the same connection.
rhashtable_insert_fast does not check for these duplicates, so multiple
hardware flow entries can be created.
Fix this by checking for an existing entry early
Fixes: 502e84e2382d ("net: ethernet: mtk_eth_soc: add flow offloading support") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 22 Sep 2021 18:36:55 +0000 (21:36 +0300)]
net: dsa: sja1105: stop using priv->vlan_aware
Now that the sja1105 driver is finally sane enough again to stop having
a ternary VLAN awareness state, we can remove priv->vlan_aware and query
DSA for the ds->vlan_filtering value (for SJA1105, VLAN filtering is a
global property).
Also drop the paranoid checking that DSA calls ->port_vlan_filtering
multiple times without the VLAN awareness state changing. It doesn't,
the same check is present inside dsa_port_vlan_filtering too.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Brown [Wed, 22 Sep 2021 18:30:37 +0000 (19:30 +0100)]
nfc: st-nci: Add SPI ID matching DT compatible
Currently autoloading for SPI devices does not use the DT ID table, it uses
SPI modalises. Supporting OF modalises is going to be difficult if not
impractical, an attempt was made but has been reverted, so ensure that
module autoloading works for this driver by adding the part name used in
the compatible to the list of SPI IDs.
Fixes: 96c8395e2166 ("spi: Revert modalias changes") Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Sep 2021 11:50:26 +0000 (12:50 +0100)]
Merge branch 'remove-sk-skb-caches'
Paolo Abeni says:
====================
net: remove sk skb caches
Eric noted we would be better off reverting the sk
skb caches.
MPTCP relies on such a feature, so we need a
little refactor of the MPTCP tx path before the mentioned
revert.
The first patch exposes additional TCP helpers. The 2nd patch
changes the MPTCP code to do locally the whole skb allocation
and updating, so it does not rely anymore on core TCP helpers
for that nor the sk skb cache.
As a side effect, we can make the tcp_build_frag helper static.
Finally, we can pull Eric's revert.
RFC -> v1:
- drop driver specific patch - no more needed after helper rename
- rename skb_entail -> tcp_skb_entail (Eric)
- preserve the tcp_build_frag helpwe, just make it static (Eric)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 22 Sep 2021 17:26:43 +0000 (19:26 +0200)]
tcp: remove sk_{tr}x_skb_cache
This reverts the following patches :
- commit 2e05fcae83c4 ("tcp: fix compile error if !CONFIG_SYSCTL")
- commit 4f661542a402 ("tcp: fix zerocopy and notsent_lowat issues")
- commit 472c2e07eef0 ("tcp: add one skb cache for tx")
- commit 8b27dae5a2e8 ("tcp: add one skb cache for rx")
Having a cache of one skb (in each direction) per TCP socket is fragile,
since it can cause a significant increase of memory needs,
and not good enough for high speed flows anyway where more than one skb
is needed.
We want instead to add a generic infrastructure, with more flexible
per-cpu caches, for alien NUMA nodes.
Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 22 Sep 2021 17:26:41 +0000 (19:26 +0200)]
mptcp: stop relying on tcp_tx_skb_cache
We want to revert the skb TX cache, but MPTCP is currently
using it unconditionally.
Rework the MPTCP tx code, so that tcp_tx_skb_cache is not
needed anymore: do the whole coalescing check, skb allocation
skb initialization/update inside mptcp_sendmsg_frag(), quite
alike the current TCP code.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 22 Sep 2021 17:26:40 +0000 (19:26 +0200)]
tcp: expose the tcp_mark_push() and tcp_skb_entail() helpers
the tcp_skb_entail() helper is actually skb_entail(), renamed
to provide proper scope.
The two helper will be used by the next patch.
RFC -> v1:
- rename skb_entail to tcp_skb_entail (Eric)
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 22 Sep 2021 11:12:17 +0000 (13:12 +0200)]
mptcp: ensure tx skbs always have the MPTCP ext
Due to signed/unsigned comparison, the expression:
info->size_goal - skb->len > 0
evaluates to true when the size goal is smaller than the
skb size. That results in lack of tx cache refill, so that
the skb allocated by the core TCP code lacks the required
MPTCP skb extensions.
Due to the above, syzbot is able to trigger the following WARN_ON():
Fix the issue rewriting the relevant expression to avoid
sign-related problems - note: size_goal is always >= 0.
Additionally, ensure that the skb in the tx cache always carries
the relevant extension.
Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com Fixes: 1094c6fe7280 ("mptcp: fix possible divide by zero") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 22 Sep 2021 15:10:29 +0000 (18:10 +0300)]
net: dsa: sja1105: don't keep a persistent reference to the reset GPIO
The driver only needs the reset GPIO for a very brief period, so instead
of using devres and keeping the descriptor pointer inside priv, just use
that descriptor inside the sja1105_hw_reset function and then let go of
it.
Also use gpiod_get_optional while at it, and error out on real errors
(bad flags etc).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Sep 2021 11:45:07 +0000 (12:45 +0100)]
Merge branch 'ja1105-deps'
Vladimir Oltean says:
====================
Fix circular dependency between sja1105 and tag_sja1105
As discussed here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
DSA tagging protocols cannot use symbols exported by switch drivers.
Eliminate the two instances of that from tag_sja1105, and that allows us
to have a working setup with modules again.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 22 Sep 2021 14:37:26 +0000 (17:37 +0300)]
net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver
It's nice to be able to test a tagging protocol with dsa_loop, but not
at the cost of losing the ability of building the tagging protocol and
switch driver as modules, because as things stand, there is a circular
dependency between the two. Tagging protocol drivers cannot depend on
switch drivers, that is a hard fact.
The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.
Currently the "sja1105" and "sja1110" tagging protocols only operate
with the sja1105 switch driver, just like any other tagging protocol and
switch combination. The only way to mix and match them is by modifying
the code, and this applies to dsa_loop as well (by default that uses
DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
practice there isn't one.
Until we extend dsa_loop to allow user space configuration, treat the
problem as a non-issue and just say that DSA ports found by tag_sja1105
are always sja1105 ports, which is in fact true. But keep the
dsa_port_is_sja1105 function so that it's easy to patch it during
testing, and rely on dead code elimination.
Vladimir Oltean [Wed, 22 Sep 2021 14:37:25 +0000 (17:37 +0300)]
net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver
The problem is that DSA tagging protocols really must not depend on the
switch driver, because this creates a circular dependency at insmod
time, and the switch driver will effectively not load when the tagging
protocol driver is missing.
The code was structured in the way it was for a reason, though. The DSA
driver-facing API for PTP timestamping relies on the assumption that
two-step TX timestamps are provided by the hardware in an out-of-band
manner, typically by raising an interrupt and making that timestamp
available inside some sort of FIFO which is to be accessed over
SPI/MDIO/etc.
So the API puts .port_txtstamp into dsa_switch_ops, because it is
expected that the switch driver needs to save some state (like put the
skb into a queue until its TX timestamp arrives).
On SJA1110, TX timestamps are provided by the switch as Ethernet
packets, so this makes them be received and processed by the tagging
protocol driver. This in itself is great, because the timestamps are
full 64-bit and do not require reconstruction, and since Ethernet is the
fastest I/O method available to/from the switch, PTP timestamps arrive
very quickly, no matter how bottlenecked the SPI connection is, because
SPI interaction is not needed at all.
DSA's code structure and strict isolation between the tagging protocol
driver and the switch driver break the natural code organization.
When the tagging protocol driver receives a packet which is classified
as a metadata packet containing timestamps, it passes those timestamps
one by one to the switch driver, which then proceeds to compare them
based on the recorded timestamp ID that was generated in .port_txtstamp.
The communication between the tagging protocol and the switch driver is
done through a method exported by the switch driver, sja1110_process_meta_tstamp.
To satisfy build requirements, we force a dependency to build the
tagging protocol driver as a module when the switch driver is a module.
However, as explained in the first paragraph, that causes the circular
dependency.
To solve this, move the skb queue from struct sja1105_private :: struct
sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
The latter is a data structure for which hacks have already been put
into place to be able to create persistent storage per switch that is
accessible from the tagging protocol driver (see sja1105_setup_ports).
With the skb queue directly accessible from the tagging protocol driver,
we can now move sja1110_process_meta_tstamp into the tagging driver
itself, and avoid exporting a symbol.
Vladimir Oltean [Wed, 22 Sep 2021 13:57:03 +0000 (16:57 +0300)]
net: dsa: sja1105: remove sp->dp
It looks like this field was never used since its introduction in commit 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through
standalone ports") remove it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
nexthop: Fix memory leaks in nexthop notification chain listeners
syzkaller discovered memory leaks [1] that can be reduced to the
following commands:
# ip nexthop add id 1 blackhole
# devlink dev reload pci/0000:06:00.0
As part of the reload flow, mlxsw will unregister its netdevs and then
unregister from the nexthop notification chain. Before unregistering
from the notification chain, mlxsw will receive delete notifications for
nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw
will not receive notifications for nexthops using netdevs that are not
dismantled as part of the reload flow. For example, the blackhole
nexthop above that internally uses the loopback netdev as its nexthop
device.
One way to fix this problem is to have listeners flush their nexthop
tables after unregistering from the notification chain. This is
error-prone as evident by this patch and also not symmetric with the
registration path where a listener receives a dump of all the existing
nexthops.
Therefore, fix this problem by replaying delete notifications for the
listener being unregistered. This is symmetric to the registration path
and also consistent with the netdev notification chain.
The above means that unregister_nexthop_notifier(), like
register_nexthop_notifier(), will have to take RTNL in order to iterate
over the existing nexthops and that any callers of the function cannot
hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN
driver. To avoid a deadlock, change the latter to unregister its nexthop
listener without holding RTNL, making it symmetric to the registration
path.
Fixes: 2a014b200bbd ("mlxsw: spectrum_router: Add support for nexthop objects") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
init: Revert accidental changes to print irqs_disabled()
Commit f8ade8dddb16 ("xsurf100: drop include of lib8390.c") accidentally
changed init/main.c. Revert that part.
Fixes: f8ade8dddb16 ("xsurf100: drop include of lib8390.c") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konrad's new job role is putting a serious cramp on him
being a responsive maintainer and as such he is handing off
the reins to Juergen, Roger, and Stefano.
Merge tag 'spi-fix-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi modalias fix from Mark Brown:
"Fix modalias issues
As reported by Russell King the change to use OF style modaliases for
DT enumerated broke at least the spi-nor driver, the patch here
reverts that change to fix the regression.
Sadly this will mean that anything that started loading since the
change to OF modaliases will run into issues, there doesn't seem to be
any approach which doesn't cause some problems and thi seems like the
least bad approach - gory details are in the commit log for the
change.
I'm currently working through the SPI drivers to add ID tables and
missing IDs to tables which should address things from the other end,
this seems more straightforward and robust than any other options"
* tag 'spi-fix-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Revert modalias changes