Matthieu Baerts [Wed, 30 Nov 2022 14:06:27 +0000 (15:06 +0100)]
selftests: mptcp: declare var as local
Just to avoid classical Bash pitfall where variables are accidentally
overridden by other functions because the proper scope has not been
defined.
That's also what is done in other MPTCP selftests scripts where all non
local variables are defined at the beginning of the script and the
others are defined with the "local" keyword.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts [Wed, 30 Nov 2022 14:06:26 +0000 (15:06 +0100)]
selftests: mptcp: clearly declare global ns vars
It is clearer to declare these global variables at the beginning of the
file as it is done in other MPTCP selftests rather than in functions in
the middle of the script.
So for uniformity reason, we can do the same here in mptcp_sockopt.sh.
Suggested-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
remove label = "cpu" from DSA dt-binding
With this patch series, we're completely getting rid of 'label = "cpu";'
which is not used by the DSA dt-binding at all.
Information for taking the patches for maintainers:
Patch 1: netdev maintainers (based off netdev/net-next.git main)
Patch 2-3: SoC maintainers (based off soc/soc.git soc/dt)
Patch 4: MIPS maintainers (based off mips/linux.git mips-next)
Patch 5: PowerPC maintainers (based off powerpc/linux.git next-test)
I've been meaning to submit this for a few months. Find the relevant
conversation here:
https://lore.kernel.org/netdev/20220913155408.GA3802998-robh@kernel.org/
Here's how I did it, for the interested (or suggestions):
Find the platforms which have got 'label = "cpu";' defined.
grep -rnw . -e 'label = "cpu";'
Remove the line where 'label = "cpu";' is included.
sed -i /'label = "cpu";'/,+d arch/arm/boot/dts/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/freescale/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/marvell/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/mediatek/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/rockchip/*
sed -i /'label = "cpu";'/,+d arch/mips/boot/dts/qca/*
sed -i /'label = "cpu";'/,+d arch/mips/boot/dts/ralink/*
sed -i /'label = "cpu";'/,+d arch/powerpc/boot/dts/turris1x.dts
sed -i /'label = "cpu";'/,+d Documentation/devicetree/bindings/net/qca,ar71xx.yaml
Restore the symlink files which typechange after running sed.
====================
The static key introduced by commit 6015c71e656b ("tcp: md5: add
tcp_md5_needed jump label") is a fast-path optimization aimed at
avoiding a cache line miss.
Once an MD5 key is introduced in the system the static key is enabled
and never disabled. Address this by disabling the static key when
the last tcp_md5sig_info in system is destroyed.
Previously it was submitted as a part of TCP-AO patches set [1].
Now in attempt to split 36 patches submission, I send this independently.
Version 5:
https://lore.kernel.org/all/20221122185534.308643-1-dima@arista.com/T/#u
Version 4:
https://lore.kernel.org/all/20221115211905.1685426-1-dima@arista.com/T/#u
Version 3:
https://lore.kernel.org/all/20221111212320.1386566-1-dima@arista.com/T/#u
Version 2:
https://lore.kernel.org/all/20221103212524.865762-1-dima@arista.com/T/#u
Version 1:
https://lore.kernel.org/all/20221102211350.625011-1-dima@arista.com/T/#u
Dmitry Safonov [Wed, 23 Nov 2022 17:38:58 +0000 (17:38 +0000)]
net/tcp: Do cleanup on tcp_md5_key_copy() failure
If the kernel was short on (atomic) memory and failed to allocate it -
don't proceed to creation of request socket. Otherwise the socket would
be unsigned and userspace likely doesn't expect that the TCP is not
MD5-signed anymore.
Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmitry Safonov [Wed, 23 Nov 2022 17:38:57 +0000 (17:38 +0000)]
net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction
To do that, separate two scenarios:
- where it's the first MD5 key on the system, which means that enabling
of the static key may need to sleep;
- copying of an existing key from a listening socket to the request
socket upon receiving a signed TCP segment, where static key was
already enabled (when the key was added to the listening socket).
Now the life-time of the static branch for TCP-MD5 is until:
- last tcp_md5sig_info is destroyed
- last socket in time-wait state with MD5 key is closed.
Which means that after all sockets with TCP-MD5 keys are gone, the
system gets back the performance of disabled md5-key static branch.
While at here, provide static_key_fast_inc() helper that does ref
counter increment in atomic fashion (without grabbing cpus_read_lock()
on CONFIG_JUMP_LABEL=y). This is needed to add a new user for
a static_key when the caller controls the lifetime of another user.
Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmitry Safonov [Wed, 23 Nov 2022 17:38:55 +0000 (17:38 +0000)]
jump_label: Prevent key->enabled int overflow
1. With CONFIG_JUMP_LABEL=n static_key_slow_inc() doesn't have any
protection against key->enabled refcounter overflow.
2. With CONFIG_JUMP_LABEL=y static_key_slow_inc_cpuslocked()
still may turn the refcounter negative as (v + 1) may overflow.
key->enabled is indeed a ref-counter as it's documented in multiple
places: top comment in jump_label.h, Documentation/staging/static-keys.rst,
etc.
As -1 is reserved for static key that's in process of being enabled,
functions would break with negative key->enabled refcount:
- for CONFIG_JUMP_LABEL=n negative return of static_key_count()
breaks static_key_false(), static_key_true()
- the ref counter may become 0 from negative side by too many
static_key_slow_inc() calls and lead to use-after-free issues.
These flaws result in that some users have to introduce an additional
mutex and prevent the reference counter from overflowing themselves,
see bpf_enable_runtime_stats() checking the counter against INT_MAX / 2.
Prevent the reference counter overflow by checking if (v + 1) > 0.
Change functions API to return whether the increment was successful.
Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Juhee Kang [Tue, 29 Nov 2022 16:12:44 +0000 (01:12 +0900)]
r8169: use tp_to_dev instead of open code
The open code is defined as a helper function(tp_to_dev) on r8169_main.c,
which the open code is &tp->pci_dev->dev. The helper function was added
in commit 1e1205b7d3e9 ("r8169: add helper tp_to_dev"). And then later,
commit f1e911d5d0df ("r8169: add basic phylib support") added
r8169_phylink_handler function but it didn't use the helper function.
Thus, tp_to_dev() replaces the open code. This patch doesn't change logic.
But fixing this appears to be not so simple. The patch set represents my
attempt to address it.
In short, the problem is that dpaa2_mac_connect() and dpaa2_mac_disconnect()
call 2 phylink functions in a row, one takes rtnl_lock() itself -
phylink_create(), and one which requires rtnl_lock() to be held by the
caller - phylink_fwnode_phy_connect(). The existing approach in the
drivers is too simple. We take rtnl_lock() when calling dpaa2_mac_connect(),
which is what results in the deadlock.
Fixing just that creates another problem. The drivers make use of
rtnl_lock() for serializing with other code paths too. I think I've
found all those code paths, and established other mechanisms for
serializing with them.
====================
Vladimir Oltean [Tue, 29 Nov 2022 14:12:21 +0000 (16:12 +0200)]
net: dpaa2-mac: move rtnl_lock() only around phylink_{,dis}connect_phy()
After the introduction of a private mac_lock that serializes access to
priv->mac (and port_priv->mac in the switch), the only remaining purpose
of rtnl_lock() is to satisfy the locking requirements of
phylink_fwnode_phy_connect() and phylink_disconnect_phy().
But the functions these live in, dpaa2_mac_connect() and
dpaa2_mac_disconnect(), have contradictory locking requirements.
While phylink_fwnode_phy_connect() wants rtnl_lock() to be held,
phylink_create() wants it to not be held.
Move the rtnl_lock() from top-level (in the dpaa2-eth and dpaa2-switch
drivers) to only surround the phylink calls that require it, in the
dpaa2-mac library code.
This is possible because dpaa2_mac_connect() and dpaa2_mac_disconnect()
run unlocked, and there isn't any danger of an AB/BA deadlock between
the rtnl_mutex and other private locks.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:20 +0000 (16:12 +0200)]
net: dpaa2-switch: serialize changes to priv->mac with a mutex
The dpaa2-switch driver uses a DPMAC in the same way as the dpaa2-eth
driver, so we need to duplicate the locking solution established by the
previous change to the switch driver as well.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:19 +0000 (16:12 +0200)]
net: dpaa2-eth: serialize changes to priv->mac with a mutex
The dpaa2 architecture permits dynamic connections between objects on
the fsl-mc bus, specifically between a DPNI object (represented by a
struct net_device) and a DPMAC object (represented by a struct phylink).
The DPNI driver is notified when those connections are created/broken
through the dpni_irq0_handler_thread() method. To ensure that ethtool
operations, as well as netdev up/down operations serialize with the
connection/disconnection of the DPNI with a DPMAC,
dpni_irq0_handler_thread() takes the rtnl_lock() to block those other
operations from taking place.
There is code called by dpaa2_mac_connect() which wants to acquire the
rtnl_mutex once again, see phylink_create() -> phylink_register_sfp() ->
sfp_bus_add_upstream() -> rtnl_lock(). So the strategy doesn't quite
work out, even though it's fairly simple.
Create a different strategy, where all code paths in the dpaa2-eth
driver access priv->mac only while they are holding priv->mac_lock.
The phylink instance is not created or connected to the PHY under the
priv->mac_lock, but only assigned to priv->mac then. This will eliminate
the reliance on the rtnl_mutex.
Add lockdep annotations and put comments where holding the lock is not
necessary, and priv->mac can be dereferenced freely.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:18 +0000 (16:12 +0200)]
net: dpaa2-eth: connect to MAC before requesting the "endpoint changed" IRQ
dpaa2_eth_connect_mac() is called both from dpaa2_eth_probe() and from
dpni_irq0_handler_thread().
It could happen that the DPNI gets connected to a DPMAC on the fsl-mc
bus exactly during probe, as soon as the "endpoint change" interrupt is
requested in dpaa2_eth_setup_irqs(). This will cause the
dpni_irq0_handler_thread() to register a phylink instance for that DPMAC.
Then, the probing function will also try to register a phylink instance
for the same DPMAC, operation which should fail (and this will fail the
probing of the driver).
Reorder dpaa2_eth_setup_irqs() and dpaa2_eth_connect_mac(), such that
dpni_irq0_handler_thread() never races with the DPMAC-related portion of
the probing path.
Also reorder dpaa2_eth_disconnect_mac() to be in the mirror position of
dpaa2_eth_connect_mac() in the teardown path.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:16 +0000 (16:12 +0200)]
net: dpaa2: publish MAC stringset to ethtool -S even if MAC is missing
DPNIs and DPSW objects can connect and disconnect at runtime from DPMAC
objects on the same fsl-mc bus. The DPMAC object also holds "ethtool -S"
unstructured counters. Those counters are only shown for the entity
owning the netdev (DPNI, DPSW) if it's connected to a DPMAC.
The ethtool stringset code path is split into multiple callbacks, but
currently, connecting and disconnecting the DPMAC takes the rtnl_lock().
This blocks the entire ethtool code path from running, see
ethnl_default_doit() -> rtnl_lock() -> ops->prepare_data() ->
strset_prepare_data().
This is going to be a problem if we are going to no longer require
rtnl_lock() when connecting/disconnecting the DPMAC, because the DPMAC
could appear between ops->get_sset_count() and ops->get_strings().
If it appears out of the blue, we will provide a stringset into an array
that was dimensioned thinking the DPMAC wouldn't be there => array
accessed out of bounds.
There isn't really a good way to work around that, and I don't want to
put too much pressure on the ethtool framework by playing locking games.
Just make the DPMAC counters be always available. They'll be zeroes if
the DPNI or DPSW isn't connected to a DPMAC.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:15 +0000 (16:12 +0200)]
net: dpaa2-switch: assign port_priv->mac after dpaa2_mac_connect() call
The dpaa2-switch has the exact same locking requirements when connected
to a DPMAC, so it needs port_priv->mac to always point either to NULL,
or to a DPMAC with a fully initialized phylink instance.
Make the same preparatory change in the dpaa2-switch driver as in the
dpaa2-eth one.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:14 +0000 (16:12 +0200)]
net: dpaa2-eth: assign priv->mac after dpaa2_mac_connect() call
There are 2 requirements for correct code:
- Any time the driver accesses the priv->mac pointer at runtime, it
either holds NULL to indicate a DPNI-DPNI connection (or unconnected
DPNI), or a struct dpaa2_mac whose phylink instance was fully
initialized (created and connected to the PHY). No changes are made to
priv->mac while it is being used. Currently, rtnl_lock() watches over
the call to dpaa2_eth_connect_mac(), so it serves the purpose of
serializing this with all readers of priv->mac.
- dpaa2_mac_connect() should run unlocked, because inside it are 2
phylink calls with incompatible locking requirements: phylink_create()
requires that the rtnl_mutex isn't held, and phylink_fwnode_phy_connect()
requires that the rtnl_mutex is held. The only way to solve those
contradictory requirements is to let dpaa2_mac_connect() take
rtnl_lock() when it needs to.
To solve both requirements, we need to identify the writer side of the
priv->mac pointer, which can be wrapped in a mutex private to the driver
in a future patch. The dpaa2_mac_connect() cannot be part of the writer
side critical section, because of an AB/BA deadlock with rtnl_lock().
So the strategy needs to be that where we prepare the DPMAC by calling
dpaa2_mac_connect(), and only make priv->mac point to it once it's fully
prepared. This ensures that the writer side critical section has the
absolute minimum surface it can.
The reverse strategy is adopted in the dpaa2_eth_disconnect_mac() code
path. This makes sure that priv->mac is NULL when we start tearing down
the DPMAC that we disconnected from, and concurrent code will simply not
see it.
No locking changes in this patch (concurrent code is still blocked by
the rtnl_mutex).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:12 +0000 (16:12 +0200)]
net: dpaa2-mac: absorb phylink_start() call into dpaa2_mac_start()
The phylink handling is intended to be hidden inside the dpaa2_mac
object. Move the phylink_start() call into dpaa2_mac_start(), and
phylink_stop() into dpaa2_mac_stop().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:11 +0000 (16:12 +0200)]
net: dpaa2: replace dpaa2_mac_is_type_fixed() with dpaa2_mac_is_type_phy()
dpaa2_mac_is_type_fixed() is a header with no implementation and no
callers, which is referenced from the documentation though. It can be
deleted.
On the other hand, it would be useful to reuse the code between
dpaa2_eth_is_type_phy() and dpaa2_switch_port_is_type_phy(). That common
code should be called dpaa2_mac_is_type_phy(), so let's create that.
The removal and the addition are merged into the same patch because,
in fact, is_type_phy() is the logical opposite of is_type_fixed().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:10 +0000 (16:12 +0200)]
net: dpaa2-eth: don't use -ENOTSUPP error code
dpaa2_eth_setup_dpni() is called from the probe path and
dpaa2_eth_set_link_ksettings() is propagated to user space.
include/linux/errno.h says that ENOTSUPP is "Defined for the NFSv3
protocol". Conventional wisdom has it to not use it in networking
drivers. Replace it with -EOPNOTSUPP.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dan Carpenter [Tue, 29 Nov 2022 09:43:47 +0000 (12:43 +0300)]
net: microchip: sparx5: Fix error handling in vcap_show_admin()
If vcap_dup_rule() fails that leads to an error pointer dereference
side the call to vcap_free_rule(). Also it only returns an error if the
very last call to vcap_read_rule() fails and it returns success for
other errors.
I've changed it to just stop printing after the first error and return
an error code.
Fixes: 3a7921560d2f ("net: microchip: sparx5: Add VCAP rule debugFS support for the VCAP API") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Link: https://lore.kernel.org/r/Y4XUUx9kzurBN+BV@kili Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dan Carpenter [Mon, 28 Nov 2022 11:06:14 +0000 (14:06 +0300)]
bonding: uninitialized variable in bond_miimon_inspect()
The "ignore_updelay" variable needs to be initialized to false.
Fixes: f8a65ab2f3ff ("bonding: fix link recovery in mode 2 when updelay is nonzero") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/Y4SWJlh3ohJ6EPTL@kili Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 1 Dec 2022 06:05:19 +0000 (22:05 -0800)]
Merge tag 'mlx5-updates-2022-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-11-29
Misc update for mlx5 driver
1) Various trivial cleanups
2) Maor Dickman, Adds support for trap offload with additional actions
3) From Tariq, UMR (device memory registrations) cleanups,
UMR WQE must be aligned to 64B per device spec, (not a bug fix).
* tag 'mlx5-updates-2022-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Support devlink reload of IPsec core
net/mlx5e: TC, Add offload support for trap with additional actions
net/mlx5e: Do early return when setup vports dests for slow path flow
net/mlx5: Remove redundant check
net/mlx5e: Delete always true DMA check
net/mlx5e: Don't access directly DMA device pointer
net/mlx5e: Don't use termination table when redundant
net/mlx5: Fix orthography errors in documentation
net/mlx5: Use generic definition for UMR KLM alignment
net/mlx5: Generalize name of UMR alignment definition
net/mlx5: Remove unused UMR MTT definitions
net/mlx5e: Add padding when needed in UMR WQEs
net/mlx5: Remove unused ctx variables
net/mlx5e: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
net/mlx5e: Remove unneeded io-mapping.h #include
====================
Xiaolei Wang [Wed, 30 Nov 2022 02:12:16 +0000 (10:12 +0800)]
net: phy: Add link between phy dev and mac dev
If the external phy used by current mac interface is
managed by another mac interface, it means that this
network port cannot work independently, especially
when the system suspends and resumes, the following
trace may appear, so we should create a device link
between phy dev and mac dev.
WARNING: CPU: 0 PID: 24 at drivers/net/phy/phy.c:983 phy_error+0x20/0x68
Modules linked in:
CPU: 0 PID: 24 Comm: kworker/0:2 Not tainted 6.1.0-rc3-00011-g5aaef24b5c6d-dirty #34
Hardware name: Freescale i.MX6 SoloX (Device Tree)
Workqueue: events_power_efficient phy_state_machine
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x90
dump_stack_lvl from __warn+0xb4/0x24c
__warn from warn_slowpath_fmt+0x5c/0xd8
warn_slowpath_fmt from phy_error+0x20/0x68
phy_error from phy_state_machine+0x22c/0x23c
phy_state_machine from process_one_work+0x288/0x744
process_one_work from worker_thread+0x3c/0x500
worker_thread from kthread+0xf0/0x114
kthread from ret_from_fork+0x14/0x28
Exception stack(0xf0951fb0 to 0xf0951ff8)
====================
net: devlink: return the driver name in devlink_nl_info_fill
The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.
The goal of this series is to have the devlink core to report this
information instead of the drivers.
The first patch fulfills the actual goal of this series: modify
devlink core to report the driver name and clean-up all drivers. Both
have to be done in an atomic change to avoid attribute
duplication. This same patch also removes the
devlink_info_driver_name_put() function to prevent future drivers from
reporting the driver name themselves.
The second patch allows the core to call devlink_nl_info_fill() even
if the devlink_ops::info_get() callback is NULL. This leads to the
third and final patch which cleans up the drivers which have an empty
info_get().
====================
Vincent Mailhol [Tue, 29 Nov 2022 09:51:39 +0000 (18:51 +0900)]
net: devlink: make the devlink_ops::info_get() callback optional
Some drivers only reported the driver name in their
devlink_ops::info_get() callback. Now that the core provides this
information, the callback became empty. For such drivers, just
removing the callback would prevent the core from executing
devlink_nl_info_fill() meaning that "devlink dev info" would not
return anything.
Make the callback function optional by executing
devlink_nl_info_fill() even if devlink_ops::info_get() is NULL.
N.B.: the drivers with devlink support which previously did not
implement devlink_ops::info_get() will now also be able to report
the driver name.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vincent Mailhol [Tue, 29 Nov 2022 09:51:38 +0000 (18:51 +0900)]
net: devlink: let the core report the driver name instead of the drivers
The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.
In order to factorize code, make devlink_nl_info_fill() add the driver
name attribute.
Now that the core sets the driver name attribute, drivers are not
supposed to call devlink_info_driver_name_put() anymore. Remove
devlink_info_driver_name_put() and clean-up all the drivers using this
function in their callback.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Tested-by: Ido Schimmel <idosch@nvidia.com> # mlxsw Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 1 Dec 2022 04:54:39 +0000 (20:54 -0800)]
Merge branch 'support-direct-read-from-region'
Jacob Keller says:
====================
support direct read from region
A long time ago when initially implementing devlink regions in ice I
proposed the ability to allow reading from a region without taking a
snapshot [1]. I eventually dropped this work from the original series due to
size. Then I eventually lost track of submitting this follow up.
This can be useful when interacting with some region that has some
definitive "contents" from which snapshots are made. For example the ice
driver has regions representing the contents of the device flash.
If userspace wants to read the contents today, it must first take a snapshot
and then read from that snapshot. This makes sense if you want to read a
large portion of data or you want to be sure reads are consistently from the
same recording of the flash.
However if user space only wants to read a small chunk, it must first
generate a snapshot of the entire contents, perform a read from the
snapshot, and then delete the snapshot after reading.
For such a use case, a direct read from the region makes more sense. This
can be achieved by allowing the devlink region read command to work without
a snapshot. Instead the portion to be read can be forwarded directly to the
driver via a new .read callback.
This avoids the need to read the entire region contents into memory first
and avoids the software overhead of creating a snapshot and then deleting
it.
This series implements such behavior and hooks up the ice NVM and shadow RAM
regions to allow it.
Jacob Keller [Mon, 28 Nov 2022 20:36:47 +0000 (12:36 -0800)]
ice: implement direct read for NVM and Shadow RAM regions
Implement the .read handler for the NVM and Shadow RAM regions. This
enables user space to read a small chunk of the flash without needing the
overhead of creating a full snapshot.
Update the documentation for ice to detail which regions have direct read
support.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:46 +0000 (12:36 -0800)]
ice: document 'shadow-ram' devlink region
78ad87da9978 ("ice: devlink: add shadow-ram region to snapshot Shadow RAM")
added support for the 'shadow-ram' devlink region, but did not document it
in the ice devlink documentation. Fix this.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:45 +0000 (12:36 -0800)]
ice: use same function to snapshot both NVM and Shadow RAM
The ice driver supports a region for both the flat NVM contents as well as
the Shadow RAM which is a layer built on top of the flash during device
initialization.
These regions use an almost identical read function, except that the NVM
needs to set the direct flag when reading, while Shadow RAM needs to read
without the direct flag set. They each call ice_read_flat_nvm with the only
difference being whether to set the direct flash flag.
The NVM region read function also was fixed to read the NVM in blocks to
avoid a situation where the firmware reclaims the lock due to taking too
long.
Note that the region snapshot function takes the ops pointer so the
function can easily determine which region to read. Make use of this and
re-use the NVM snapshot function for both the NVM and Shadow RAM regions.
This makes Shadow RAM benefit from the same block approach as the NVM
region. It also reduces code in the ice driver.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:44 +0000 (12:36 -0800)]
devlink: support directly reading from region memory
To read from a region, user space must currently request a new snapshot of
the region and then read from that snapshot. This can sometimes be overkill
if user space only reads a tiny portion. They first create the snapshot,
then request a read, then destroy the snapshot.
For regions which have a single underlying "contents", it makes sense to
allow supporting direct reading of the region data.
Extend the DEVLINK_CMD_REGION_READ to allow direct reading from a region if
requested via the new DEVLINK_ATTR_REGION_DIRECT. If this attribute is set,
then perform a direct read instead of using a snapshot. Direct read is
mutually exclusive with DEVLINK_ATTR_REGION_SNAPSHOT_ID, and care is taken
to ensure that we reject commands which provide incorrect attributes.
Regions must enable support for direct read by implementing the .read()
callback function. If a region does not support such direct reads, a
suitable extended error message is reported.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:43 +0000 (12:36 -0800)]
devlink: refactor region_read_snapshot_fill to use a callback function
The devlink_nl_region_read_snapshot_fill is used to copy the contents of
a snapshot into a message for reporting to userspace via the
DEVLINK_CMG_REGION_READ netlink message.
A future change is going to add support for directly reading from
a region. Almost all of the logic for this new capability is identical.
To help reduce code duplication and make this logic more generic,
refactor the function to take a cb and cb_priv pointer for doing the
actual copy.
Add a devlink_region_snapshot_fill implementation that will simply copy
the relevant chunk of the region. This does require allocating some
storage for the chunk as opposed to simply passing the correct address
forward to the devlink_nl_cmg_region_read_chunk_fill function.
A future change to implement support for directly reading from a region
without a snapshot will provide a separate implementation that calls the
newly added devlink region operation.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:41 +0000 (12:36 -0800)]
devlink: find snapshot in devlink_nl_cmd_region_read_dumpit
The snapshot pointer is obtained inside of the function
devlink_nl_region_read_snapshot_fill. Simplify this function by locating
the snapshot upfront in devlink_nl_cmd_region_read_dumpit instead. This
aligns with how other netlink attributes are handled, and allows us to
exit slightly earlier if an invalid snapshot ID is provided.
It also allows us to pass the snapshot pointer directly to the
devlink_nl_region_read_snapshot_fill, and remove the now unused attrs
parameter.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Horatiu Vultur [Mon, 28 Nov 2022 14:29:59 +0000 (15:29 +0100)]
net: microchip: vcap: Change how the rule id is generated
Currently whenever a new rule id is generated, it picks up the next
number bigger than previous id. So it would always be 1, 2, 3, etc.
When the rule with id 1 will be deleted and a new rule will be added,
it will have the id 4 and not id 1.
In theory this can be a problem if at some point a rule will be added
and removed ~0 times. Then no more rules can be added because there
are no more ids.
Change this such that when a new rule is added, search for an empty
rule id starting with value of 1 as value 0 is reserved.
Christophe JAILLET [Fri, 25 Nov 2022 12:24:00 +0000 (13:24 +0100)]
octeontx2-af: Fix the size of memory allocated for the 'id_bmap' bitmap
This allocation is really spurious.
The size of the bitmap is 'tot_ids' and it is used as such in the driver.
So we could expect something like:
table->id_bmap = devm_kcalloc(rvu->dev, BITS_TO_LONGS(table->tot_ids),
sizeof(long), GFP_KERNEL);
However, when the bitmap is allocated, we allocate:
BITS_TO_LONGS(table->tot_ids) * table->tot_ids ~=
table->tot_ids / 32 * table->tot_ids ~=
table->tot_ids^2 / 32
It is proportional to the square of 'table->tot_ids' which seems to
potentially be big.
Allocate the expected amount of memory, and switch to the bitmap API to
have it more straightforward.
Christophe JAILLET [Fri, 25 Nov 2022 12:23:59 +0000 (13:23 +0100)]
octeontx2-af: Use the bitmap API to allocate bitmaps
Use devm_bitmap_zalloc() instead of hand-writing it.
This also makes the comment "Allocate bitmap for 32 entry mcam" more
explicit because now 32 is really used in the allocation function, instead
of an obscure 'sizeof(long)'.
Christophe JAILLET [Fri, 25 Nov 2022 12:23:57 +0000 (13:23 +0100)]
octeontx2-af: Fix a potentially spurious error message
When this error message is displayed, we know that the all the bits in the
bitmap are set.
So, bitmap_weight() will return the number of bits of the bitmap, which is
'table->tot_ids'.
It is unlikely that a bit will be cleared between mutex_unlock() and
dev_err(), but, in order to simplify the code and avoid this possibility,
just take 'table->tot_ids'.
Leon Romanovsky [Mon, 22 Aug 2022 10:38:55 +0000 (13:38 +0300)]
net/mlx5e: Support devlink reload of IPsec core
Change IPsec initialization flow to allow future creation of hardware
resources that should be released and allocated during devlink reload
operation. As part of that change, update function signature to be
void as no callers are actually interested in it.
Maor Dickman [Thu, 24 Nov 2022 12:21:03 +0000 (14:21 +0200)]
net/mlx5e: TC, Add offload support for trap with additional actions
TC trap action offload is currently supported only when trap is the sole action
in the flow.
This patch remove this limitation by changing trap action offload to not use
MLX5_ATTR_FLAG_SLOW_PATH flag and instead set the flow destination table
explicitly to be the slow table. This will allow offload of the additional
actions.
TC flow example:
tc filter add dev $REP2 protocol ip prio 2 root \
flower skip_sw dst_mac $mac0 \
action mirred egress redirect dev $REP3 \
action pedit ex munge eth dst set $mac2 pipe \
action trap
Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Mon, 21 Nov 2022 10:14:50 +0000 (12:14 +0200)]
net/mlx5e: Do early return when setup vports dests for slow path flow
Adding flow flag cases in setup vport dests before the slow path
case is incorrect as the slow path should take precedence.
Current code doesn't show this importance so make the slow path
case return early and separate from the other cases and remove
the redundant comparison of it in the sample case.
Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Leon Romanovsky [Thu, 13 Oct 2022 12:48:43 +0000 (15:48 +0300)]
net/mlx5: Remove redundant check
If ASO failed in creation, it won't be called to destroy either.
The kernel coding pattern is to make sure that callers are calling
to destroy only for valid objects.
Roi Dayan [Wed, 16 Nov 2022 07:47:37 +0000 (09:47 +0200)]
net/mlx5e: Don't use termination table when redundant
Current code used termination table for each vport destination
while it's only required for hairpin, i.e. uplink to uplink, or
when vlan push on rx action being used.
Fix to skip using termination table for vport destinations that
do not require it.
Tariq Toukan [Mon, 31 Oct 2022 12:24:02 +0000 (14:24 +0200)]
net/mlx5: Use generic definition for UMR KLM alignment
MLX5_UMR_KLM_ALIGNMENT is in units of number of entries, while
MLX5_UMR_MTT_ALIGNMENT (generalized and renamed to
MLX5_UMR_FLEX_ALIGNMENT) is in byte units. This is misleading and
confusing.
Replace this KLM definition with one based on the generic definition.
Tariq Toukan [Mon, 31 Oct 2022 12:18:22 +0000 (14:18 +0200)]
net/mlx5e: Add padding when needed in UMR WQEs
Per the device spec, MTTs/KLMs list in a UMR WQE must be aligned to 64B.
Per our SW design, the MTT/KLMs list would need alignment only if it's
too small, for example on PPC when PAGE_SIZE is 64KB, and only 4 pages
are needed to cover a MPWQE of size 256KB.
Padding, if needed, is taken into account when calculating the UMR WQE
fields (ds_cnt and xlt_octowords), however no entries are provided,
instead garbage is passed.
No real harm though, as these parts act as gaps between the RX MPWQEs
and not used by any of them. Hence, in practice, device does not try to
write any incoming packet to them. Still, prefer providing clean padding
marking the end of the list, and do not map garbage into the RQ memory
region.
Gustavo A. R. Silva [Mon, 26 Sep 2022 21:50:42 +0000 (16:50 -0500)]
net/mlx5e: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
Zero-length arrays are deprecated and we are moving towards adopting
C99 flexible-array members, instead. So, replace zero-length arrays
declarations in anonymous union with the new DECLARE_FLEX_ARRAY()
helper macro.
This helper allows for flexible-array members in unions.
Jakub Kicinski [Wed, 30 Nov 2022 04:50:50 +0000 (20:50 -0800)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next 2022-11-26
1) Remove redundant variable in esp6.
From Colin Ian King.
2) Update x->lastused for every packet. It was used only for
outgoing mobile IPv6 packets, but showed to be usefull
to check if the a SA is still in use in general.
From Antony Antony.
3) Remove unused variable in xfrm_byidx_resize.
From Leon Romanovsky.
4) Finalize extack support for xfrm.
From Sabrina Dubroca.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: add extack to xfrm_set_spdinfo
xfrm: add extack to xfrm_alloc_userspi
xfrm: add extack to xfrm_do_migrate
xfrm: add extack to xfrm_new_ae and xfrm_replay_verify_len
xfrm: add extack to xfrm_del_sa
xfrm: add extack to xfrm_add_sa_expire
xfrm: a few coding style clean ups
xfrm: Remove not-used total variable
xfrm: update x->lastused for every packet
esp6: remove redundant variable err
====================
====================
net: pcs: altera-tse: simplify and clean-up the driver
This small series does a bit of code cleanup in the altera TSE pcs
driver, removing unused register definitions, handling 1000BaseX speed
configuration correctly according to the datasheet, and making use of
proper poll_timeout helpers.
No functional change is introduced.
====================
Maxime Chevallier [Fri, 25 Nov 2022 13:17:59 +0000 (14:17 +0100)]
net: pcs: altera-tse: use read_poll_timeout to wait for reset
Software resets on the TSE PCS don't clear registers, but rather reset
all internal state machines regarding AN, comma detection and
encoding/decoding. Use read_poll_timeout to wait for the reset to clear
instead of manually polling the register.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
mptcp: MSG_FASTOPEN and TFO listener side support
Before this series, only the initiator of a connection was able to combine
both TCP FastOpen and MPTCP when using TCP_FASTOPEN_CONNECT socket option.
These new patches here add (in theory) the full support of TFO with MPTCP,
which means:
- MSG_FASTOPEN sendmsg flag support (patch 1/8)
- TFO support for the listener side (patches 2-5/8)
- TCP_FASTOPEN socket option (patch 6/8)
- TCP_FASTOPEN_KEY socket option (patch 7/8)
To support TFO for the server side, a few preparation patches are needed
(patches 2 to 5/8). Some of them were inspired by a previous work from
Benjamin Hesmans.
Note that TFO support with MPTCP has been validated with selftests
(patch 8/8) but also with Packetdrill tests running with a modified
but still very WIP version supporting MPTCP. Both the modified tool
and the tests are available online:
Dmytro Shytyi [Fri, 25 Nov 2022 22:29:54 +0000 (23:29 +0100)]
selftests: mptcp: mptfo Initiator/Listener
This patch first adds TFO support in mptcp_connect.c.
This can be enabled via a new option: -o MPTFO.
Once enabled, the TCP_FASTOPEN socket option is enabled for the server
side and a sendto() with MSG_FASTOPEN is used instead of a connect() for
the client side.
Note that the first SYN has a limit of bytes it can carry. In other
words, it is allowed to send less data than the provided one. We then
need to track more status info to properly allow the next sendmsg()
starting from the next part of the data to send the rest.
Also in TFO scenarios, we need to completely spool the partially xmitted
buffer -- and account for that -- before starting sendfile/mmap xmit,
otherwise the relevant tests will fail.
Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts [Fri, 25 Nov 2022 22:29:53 +0000 (23:29 +0100)]
mptcp: add support for TCP_FASTOPEN_KEY sockopt
The goal of this socket option is to set different keys per listener,
see commit 1fba70e5b6be ("tcp: socket option to set TCP fast open key")
for more details about this socket option.
The only thing to do here with MPTCP is to relay the request to the
first subflow like it is already done for the other TCP_FASTOPEN* socket
options.
Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmytro Shytyi [Fri, 25 Nov 2022 22:29:51 +0000 (23:29 +0100)]
mptcp: add subflow_v(4,6)_send_synack()
The send_synack() needs to be overridden for MPTCP to support TFO for
two reasons:
- There is not be enough space in the TCP options if the TFO cookie has
to be added in the SYN+ACK with other options: MSS (4), SACK OK (2),
Timestamps (10), Window Scale (3+1), TFO (10+2), MP_CAPABLE (12).
MPTCPv1 specs -- RFC 8684, section B.1 [1] -- suggest to drop the TCP
timestamps option in this case.
- The data received in the SYN has to be handled: the SKB can be
dequeued from the subflow sk and transferred to the MPTCP sk. Counters
need to be updated accordingly and the application can be notified at
the end because some bytes have been received.
Dmytro Shytyi [Fri, 25 Nov 2022 22:29:50 +0000 (23:29 +0100)]
mptcp: implement delayed seq generation for passive fastopen
With fastopen in place, the first subflow socket is created before the
MPC handshake completes, and we need to properly initialize the sequence
numbers at MPC ACK reception.
Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 25 Nov 2022 22:29:49 +0000 (23:29 +0100)]
mptcp: consolidate initial ack seq generation
Currently the initial ack sequence is generated on demand whenever
it's requested and the remote key is handy. The relevant code is
scattered in different places and can lead to multiple, unneeded,
crypto operations.
This change consolidates the ack sequence generation code in a single
helper, storing the sequence number at the subflow level.
The above additionally saves a few conditional in fast-path and will
simplify the upcoming fast-open implementation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 25 Nov 2022 22:29:48 +0000 (23:29 +0100)]
mptcp: track accurately the incoming MPC suboption type
Currently in the receive path we don't need to discriminate
between MPC SYN, MPC SYN-ACK and MPC ACK, but soon the fastopen
code will need that info to properly track the fully established
status.
Track the exact MPC suboption type into the receive opt bitmap.
No functional change intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmytro Shytyi [Fri, 25 Nov 2022 22:29:47 +0000 (23:29 +0100)]
mptcp: add MSG_FASTOPEN sendmsg flag support
Since commit 54f1944ed6d2 ("mptcp: factor out mptcp_connect()"), all the
infrastructure is now in place to support the MSG_FASTOPEN flag, we
just need to call into the fastopen path in mptcp_sendmsg().
Co-developed-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 29 Nov 2022 17:52:10 +0000 (09:52 -0800)]
Merge tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and wifi.
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF"
* tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
MAINTAINERS: Update maintainer list for chelsio drivers
ionic: update MAINTAINERS entry
sctp: fix memory leak in sctp_stream_outq_migrate()
packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
net/mlx5: Lag, Fix for loop when checking lag
Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path"
net: marvell: prestera: Fix a NULL vs IS_ERR() check in some functions
net: tun: Fix use-after-free in tun_detach()
net: mdiobus: fix unbalanced node reference count
net: hsr: Fix potential use-after-free
tipc: re-fetch skb cb after tipc_msg_validate
mptcp: fix sleep in atomic at close time
mptcp: don't orphan ssk in mptcp_close()
dsa: lan9303: Correct stat name
ipv4: Fix route deletion when nexthop info is not specified
net: wwan: iosm: fix incorrect skb length
net: wwan: iosm: fix crash in peek throughput test
net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type
net: wwan: iosm: fix kernel test robot reported error
...
Yuan Can [Tue, 29 Nov 2022 01:39:34 +0000 (01:39 +0000)]
udp_tunnel: Add checks for nla_nest_start() in __udp_tunnel_nic_dump_write()
As the nla_nest_start() may fail with NULL returned, the return value
should be checked.
Note that this is not a real bug, nothing will break here.
The next nla_put() will fail as well and we'll bail (and
nla_nest_cancel() can handle NULL). But we keep getting
those "fixes" so whatever.
Yoshihiro Shimoda [Mon, 28 Nov 2022 06:56:04 +0000 (15:56 +0900)]
net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
After system resumed on some environment board, the promiscuous mode
is disabled because the SoC turned off. So, call ravb_set_rx_mode() in
the ravb_resume() to fix the issue.
Shannon Nelson [Tue, 29 Nov 2022 01:17:34 +0000 (17:17 -0800)]
ionic: update MAINTAINERS entry
Now that Pensando is a part of AMD we need to update
a couple of addresses. We're keeping the mailing list
address for the moment, but that will likely change in
the near future.
Willem de Bruijn [Mon, 28 Nov 2022 16:18:12 +0000 (11:18 -0500)]
packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
CHECKSUM_COMPLETE signals that skb->csum stores the sum over the
entire packet. It does not imply that an embedded l4 checksum
field has been validated.
Chris Mi [Tue, 29 Nov 2022 09:30:06 +0000 (01:30 -0800)]
net/mlx5: Lag, Fix for loop when checking lag
The cited commit adds a for loop to check if each port supports lag
or not. But dev is not initialized correctly. Fix it by initializing
dev for each iteration.
Fixes: e87c6a832f88 ("net/mlx5: E-switch, Fix duplicate lag creation") Signed-off-by: Chris Mi <cmi@nvidia.com> Reported-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221129093006.378840-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cited commit removed the validity checks which initialized the
window_sz and never removed the use of the now uninitialized variable,
so now we are left with wrong value in the window size and the following
clang warning: [-Wuninitialized]
drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c:232:45:
warning: variable 'window_sz' is uninitialized when used here
MLX5_SET(macsec_aso, aso_ctx, window_size, window_sz);
Revet at this time to address the clang issue due to lack of time to
test the proper solution.
Fixes: c0071be0e16c ("net/mlx5e: MACsec, remove replay window size limitation in offload path") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reported-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221129093006.378840-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yu Xiao [Fri, 25 Nov 2022 11:30:30 +0000 (12:30 +0100)]
nfp: ethtool: support reporting link modes
Add support for reporting link modes,
including `Supported link modes` and `Advertised link modes`,
via ethtool $DEV.
A new command `SPCODE_READ_MEDIA` is added to read info from
management firmware. Also, the mapping table `nfp_eth_media_table`
associates the link modes between NFP and kernel. Both of them
help to support this ability.
Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20221125113030.141642-1-simon.horman@corigine.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Fri, 25 Nov 2022 09:50:08 +0000 (10:50 +0100)]
net: lan966x: add tc matchall goto action
Extend matchall with action goto. This is needed to enable the lookup in
the VCAP. It is needed to connect chain 0 to a chain that is recognized
by the HW.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>