www.infradead.org Git - nvme.git/log
11 months ago  net: lan969x: add lan969x ops to match data
Daniel Machon [Wed, 23 Oct 2024 22:01:28 +0000 (00:01 +0200)]
net: lan969x: add lan969x ops to match data

Add a bunch of small lan969x ops in bulk. These ops are explained in
detail in a previous series [1].

[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-8-d3290f581663@microchip.com/

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-9-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: lan969x: add constants to match data
Daniel Machon [Wed, 23 Oct 2024 22:01:27 +0000 (00:01 +0200)]
net: lan969x: add constants to match data

Add the lan969x constants to match data. These are already used
throughout the Sparx5 code (introduced in earlier series [1]), so no
need to update any code use.

[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-0-d3290f581663@microchip.com/

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-8-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: lan969x: add register diffs to match data
Daniel Machon [Wed, 23 Oct 2024 22:01:26 +0000 (00:01 +0200)]
net: lan969x: add register diffs to match data

Add new file lan969x_regs.c that defines all the register differences
for lan969x, and add it to the lan969x match data.

GW_DEV2G5_PHASE_DETECTOR_CTRL, FP_DEV2G5_PHAD_CTRL_PHAD_ENA and
FP_DEV2G5_PHAD_CTRL_PHAD_FAILED are required by the new register macros
which were introduced earlier. Add these for Sparx5 also.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-7-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: lan969x: add match data for lan969x
Daniel Machon [Wed, 23 Oct 2024 22:01:25 +0000 (00:01 +0200)]
net: lan969x: add match data for lan969x

Add match data for lan969x, with initial fields for iomap, iomap_size
and ioranges. Add new Kconfig symbol CONFIG_LAN969X_CONFIG for compiling
the lan969x driver.

It has been decided to give lan969x its own Kconfig symbol, as a
considerable amount of code is needed, besides the Sparx5 code, to add
full chip support (and more will be added in future series). Also this
makes it possible to compile Sparx5 without lan969x.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-6-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: sparx5: add registers required by lan969x
Daniel Machon [Wed, 23 Oct 2024 22:01:24 +0000 (00:01 +0200)]
net: sparx5: add registers required by lan969x

Lan969x will require a few additional registers for certain operations.
Some are shared, some are not. Add these.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-5-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: sparx5: add sparx5 context pointer to a few functions
Daniel Machon [Wed, 23 Oct 2024 22:01:23 +0000 (00:01 +0200)]
net: sparx5: add sparx5 context pointer to a few functions

In preparation for lan969x, add the sparx5 context pointer to certain
IFH (Internal Frame Header) functions. This is required, as the
is_sparx5() function will be used here in a subsequent patch.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-4-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: sparx5: change frequency calculation for SDLB's
Daniel Machon [Wed, 23 Oct 2024 22:01:22 +0000 (00:01 +0200)]
net: sparx5: change frequency calculation for SDLB's

In preparation for lan969x, rework the function that calculates the SDLB
(Service Dual Leaky Bucket) clock. This is required, as the
HSCH_SYS_CLK_PER register is Sparx5-exclusive. Instead derive the clock
from the core clock, using the sparx5_clk_period() function. The clock
stays the same before and after this patch, only now,
sparx5_sdlb_clk_hz_get() can be used for lan969x too.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-3-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
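
Editorial sketch for the commit above: one way a clock frequency could be derived from a core clock period instead of being read back from a chip-specific register. This is a Python model of the arithmetic only, not the driver code; the function name and the period values used in the usage note are illustrative assumptions.

```python
# Model of deriving a clock frequency (Hz) from a clock period (ps).
# Names and values are hypothetical; this is not copied from the driver.

def clk_hz_from_period_ps(period_ps: int) -> int:
    """Return the clock frequency in Hz for a period given in picoseconds."""
    if period_ps < 100:
        raise ValueError("period must be at least 100 ps")
    # Express the period in units of 100 ps, using integer arithmetic only.
    per_100ps = period_ps // 100
    # 1 / (n * 100 ps) = 10^10 / n Hz = 10^7 / n kHz
    khz = (10 * 1000 * 1000) // per_100ps
    return khz * 1000
```

Under these assumptions, a period of 1600 ps maps to 625 MHz and a period of 4000 ps maps to 250 MHz. Because the input is the core clock period rather than a register value, the same helper would work for any chip whose core clock is known.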
11 months ago  net: sparx5: change spx5_wr to spx5_rmw in cal update()
Daniel Machon [Wed, 23 Oct 2024 22:01:21 +0000 (00:01 +0200)]
net: sparx5: change spx5_wr to spx5_rmw in cal update()

In preparation for lan969x, use spx5_rmw() for enabling the update of
the calendar. This is required to not overwrite the DSM_TAXI_CAL_CFG
register, as an additional write will be added before this one, in a
subsequent patch.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-2-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
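
Editorial sketch for the commit above: why a read-modify-write matters once an earlier write has already set other fields in the same register. This is a Python model with a fake 32-bit register; the class, the helper names and the bit layout are hypothetical and do not reflect the real DSM_TAXI_CAL_CFG layout.

```python
# Model of a plain register write versus a masked read-modify-write.
# FakeRegister, reg_write and reg_rmw are hypothetical illustration names.

MASK32 = 0xFFFFFFFF


class FakeRegister:
    """A 32-bit register held in memory."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & MASK32


def reg_write(reg: FakeRegister, value: int) -> None:
    """Overwrite the whole register, discarding all previous field values."""
    reg.value = value & MASK32


def reg_rmw(reg: FakeRegister, value: int, mask: int) -> None:
    """Update only the bits selected by mask and keep every other bit."""
    reg.value = ((reg.value & ~mask) | (value & mask)) & MASK32


def demo() -> tuple[int, int]:
    """Return the register value after an rmw update and after a plain write."""
    reg = FakeRegister()
    reg_write(reg, 0xA0)      # an earlier write sets a field in bits 7:4
    reg_rmw(reg, 0x1, 0x1)    # enabling bit 0 keeps bits 7:4 intact
    after_rmw = reg.value
    reg_write(reg, 0x1)       # a plain write of the enable bit clears bits 7:4
    return after_rmw, reg.value
```

In this model the masked update leaves the earlier field in place, while the plain write silently resets it.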
11 months ago  net: sparx5: add support for lan969x targets and core clock
Daniel Machon [Wed, 23 Oct 2024 22:01:20 +0000 (00:01 +0200)]
net: sparx5: add support for lan969x targets and core clock

In preparation for lan969x, add lan969x targets to
sparx5_target_chiptype and set the core clock frequency for these
throughout. Lan969x only supports a core clock frequency of 328MHz.

Also, set the policer update interval (pol_upd_int) matching the 328 MHz
frequency of the lan969x targets.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-1-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  gve: change to use page_pool_put_full_page when recycling pages
Harshitha Ramamurthy [Wed, 23 Oct 2024 22:11:41 +0000 (15:11 -0700)]
gve: change to use page_pool_put_full_page when recycling pages

The driver currently uses page_pool_put_page() to recycle
page pool pages. Since gve uses split pages, if the fragment
being recycled is not the last fragment in the page, there
is no dma sync operation. When the last fragment is recycled,
dma sync is performed by page pool infra according to the
value passed as dma_sync_size which right now is set to the
size of fragment.

But the correct thing to do is to dma sync the entire page when
the last fragment is recycled. Hence change to using
page_pool_put_full_page().

Link: https://lore.kernel.org/netdev/89d7ce83-cc1d-4791-87b5-6f7af29a031d@huawei.com/
Suggested-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Fixes: ebdfae0d377b ("gve: adopt page pool for DQ RDA mode")
Link: https://patch.msgid.link/20241023221141.3008011-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  Merge branch 'refactoring-rvu-nic-driver'
Jakub Kicinski [Thu, 31 Oct 2024 00:50:39 +0000 (17:50 -0700)]
Merge branch 'refactoring-rvu-nic-driver'

Geetha sowjanya says:

====================
Refactoring RVU NIC driver

This is a preparation patchset for the follow-up "Introducing RVU representors
driver" patches. The RVU representor driver creates a representor netdev
for each RVU device when switchdev mode is enabled.

The RVU representor and the NIC have a similar set of HW resources
(NIX_LF, RQ/SQ/CQ), and the representor implements a subset of the NIC
functionality. This patch set groups the HW resource and queue configuration
code into a single API and exports the existing functions, so that code can
be shared between the NIC and representor drivers.

These patches are part of "Introduce RVU representors" patchset.
https://lore.kernel.org/all/ZsdJ-w00yCI4NQ8T@nanopsycho.orion/T/
As suggested by Jiri Pirko, these are submitted as a separate patchset.
====================

Link: https://patch.msgid.link/20241023161843.15543-1-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  octeontx2-pf: Move shared APIs to header file
Geetha sowjanya [Wed, 23 Oct 2024 16:18:43 +0000 (21:48 +0530)]
octeontx2-pf: Move shared APIs to header file

Move the mbox, HW resource and interrupt configuration functions to a common
header file, so that they can be used later by the RVU representor driver.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-5-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  octeontx2-pf: Reuse PF max mtu value
Geetha sowjanya [Wed, 23 Oct 2024 16:18:42 +0000 (21:48 +0530)]
octeontx2-pf: Reuse PF max mtu value

Reuse the maximum supported HW MTU value that is fetched during probe,
instead of fetching it through mbox each time the MTU is changed, as the
value is fixed for the interface.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-4-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  octeontx2-pf: Add new APIs for queue memory alloc/free.
Geetha sowjanya [Wed, 23 Oct 2024 16:18:41 +0000 (21:48 +0530)]
octeontx2-pf: Add new APIs for queue memory alloc/free.

Group the queue (RX/TX/CQ) memory allocation and free code into single APIs.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-3-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  octeontx2-pf: Define common API for HW resources configuration
Geetha sowjanya [Wed, 23 Oct 2024 16:18:40 +0000 (21:48 +0530)]
octeontx2-pf: Define common API for HW resources configuration

Define a new API, "otx2_init_rsrc", and move the NIX/NPA HW block
resource configuration code under this API, so that it can be used
by the RVU representor driver, which has resources similar to those
of the RVU NIC.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-2-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  Merge branch 'mirroring-to-dsa-cpu-port'
Jakub Kicinski [Thu, 31 Oct 2024 00:33:57 +0000 (17:33 -0700)]
Merge branch 'mirroring-to-dsa-cpu-port'

Vladimir Oltean says:

====================
Mirroring to DSA CPU port

Users of the NXP LS1028A SoC (drivers/net/dsa/ocelot L2 switch inside)
have requested to mirror packets from the ingress of a switch port to
software. Both port-based and flow-based mirroring are required.

The simplest way I could come up with was to set up tc mirred actions
towards a dummy net_device, and make the offloading of that be accepted
by the driver. Currently, the pattern in drivers is to reject mirred
towards ports they don't know about, but I'm now permitting that,
precisely by mirroring "to the CPU".

For testers, this series depends on commit 34d35b4edbbe ("net/sched:
act_api: deny mismatched skip_sw/skip_hw flags for actions created by
classifiers") from net/main, which is absent from net-next as of the
day of posting (Oct 23). Without the bug fix it is possible to create
invalid configurations which are not rejected by the kernel.

v2:  https://lore.kernel.org/20241017165215.3709000-1-vladimir.oltean@nxp.com
RFC: https://lore.kernel.org/20240913152915.2981126-1-vladimir.oltean@nxp.com

For historical purposes, link to a much older (and much different) attempt:
https://lore.kernel.org/20191002233750.13566-1-olteanv@gmail.com
====================

Link: https://patch.msgid.link/20241023135251.1752488-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: mscc: ocelot: allow tc-flower mirred action towards foreign interfaces
Vladimir Oltean [Wed, 23 Oct 2024 13:52:51 +0000 (16:52 +0300)]
net: mscc: ocelot: allow tc-flower mirred action towards foreign interfaces

Debugging certain flows in the offloaded switch data path can be done by
installing two tc-mirred filters for mirroring: one in the hardware data
path, which copies the frames to the CPU, and one which takes the frame
from there and mirrors it to a virtual interface like a dummy device,
where it can be seen with tcpdump.

The effect of having 2 filters run on the same packet can be obtained by
default using tc, by not specifying either the 'skip_sw' or 'skip_hw'
keywords.

Instead of refusing to offload mirroring/redirecting packets towards
interfaces that aren't switch ports, just treat every other destination
for what it is: something that is handled in software, behind the CPU
port.

Usage:

$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress protocol ip flower action mirred ingress mirror dev dummy0

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: dsa: allow matchall mirroring rules towards the CPU
Vladimir Oltean [Wed, 23 Oct 2024 13:52:50 +0000 (16:52 +0300)]
net: dsa: allow matchall mirroring rules towards the CPU

If the CPU bandwidth capacity permits, it may be useful to mirror the
entire ingress of a user port to software.

This is in fact possible to express even if there is no net_device
representation for the CPU port. In fact, that approach was already
exhausted and that representation wouldn't have even helped [1].

The idea behind implementing this is that currently, we refuse to
offload any mirroring towards a non-DSA target net_device. But if we
acknowledge the fact that to reach any foreign net_device, the switch
must send the packet to the CPU anyway, then we can simply offload just
that part, and let the software do the rest. There is only one condition
we need to uphold: the filter needs to be present in the software data
path as well (no skip_sw).

There are 2 actions to consider: FLOW_ACTION_MIRRED (redirect to egress
of target interface) and FLOW_ACTION_MIRRED_INGRESS (redirect to ingress
of target interface). We don't have the ability/API to offload
FLOW_ACTION_MIRRED_INGRESS when the target port is also a DSA user port,
but we could also permit that through mirred to the CPU + software.

Example:

$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress matchall action mirred ingress mirror dev dummy0

Any DSA driver with a ds->ops->port_mirror_add() implementation can now
make use of this with no additional change.

[1] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
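
Editorial sketch for the commit above: the offload decision it describes, reduced to a small Python model. The data class, the function name and the string results are hypothetical, and the use of EOPNOTSUPP as the rejection code is an assumption; the model also ignores the FLOW_ACTION_MIRRED_INGRESS case towards another user port.

```python
# Model of deciding where the hardware should mirror a packet.
# MirrorRule and mirror_destination are hypothetical illustration names.
import errno
from dataclasses import dataclass


@dataclass
class MirrorRule:
    target_is_switch_port: bool  # True if the mirror target is a port of this switch
    skip_sw: bool                # True if the filter is absent from the software path


def mirror_destination(rule: MirrorRule) -> str:
    """Return "port" or "cpu", or raise OSError if the rule cannot work."""
    if rule.target_is_switch_port:
        # The hardware can mirror directly to the target port.
        return "port"
    if rule.skip_sw:
        # A foreign target needs the software filter to finish the job.
        raise OSError(errno.EOPNOTSUPP,
                      "mirroring to a foreign interface needs the software path")
    # Offload only the first half: copy matching packets to the CPU.
    return "cpu"
```

The point of the model is the middle branch: a rule towards a foreign interface is only accepted when software also runs the filter, because hardware can do no more than deliver the packet to the CPU.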
11 months ago  net: dsa: add more extack messages in dsa_user_add_cls_matchall_mirred()
Vladimir Oltean [Wed, 23 Oct 2024 13:52:49 +0000 (16:52 +0300)]
net: dsa: add more extack messages in dsa_user_add_cls_matchall_mirred()

Do not leave -EOPNOTSUPP errors without an explanation. It is confusing
for the user to figure out what is wrong otherwise.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: dsa: use "extack" as argument to flow_action_basic_hw_stats_check()
Vladimir Oltean [Wed, 23 Oct 2024 13:52:48 +0000 (16:52 +0300)]
net: dsa: use "extack" as argument to flow_action_basic_hw_stats_check()

We already have an "extack" stack variable in
dsa_user_add_cls_matchall_police() and
dsa_user_add_cls_matchall_mirred(), there is no need to retrieve
it again from cls->common.extack.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: dsa: clean up dsa_user_add_cls_matchall()
Vladimir Oltean [Wed, 23 Oct 2024 13:52:47 +0000 (16:52 +0300)]
net: dsa: clean up dsa_user_add_cls_matchall()

The body is a bit hard to read, hard to extend, and has duplicated
conditions.

Clean up the "if (many conditions) else if (many conditions, some
of them repeated)" pattern by:

- Moving the repeated conditions out
- Replacing the repeated tests for the same variable with a switch/case
- Moving the protocol check inside the dsa_user_add_cls_matchall_mirred()
  function call.

This is pure refactoring, no logic has been changed, though some tests
were reordered. The order does not matter - they are independent things
to be tested for.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: sched: propagate "skip_sw" flag to struct flow_cls_common_offload
Vladimir Oltean [Wed, 23 Oct 2024 13:52:46 +0000 (16:52 +0300)]
net: sched: propagate "skip_sw" flag to struct flow_cls_common_offload

Background: switchdev ports offload the Linux bridge, and most of the
packets they handle will never see the CPU. The ports between which
there exists no hardware data path are considered 'foreign' to switchdev.
These can either be normal physical NICs without switchdev offload, or
incompatible switchdev ports, or virtual interfaces like veth/dummy/etc.

In some cases, an offloaded filter can only do half the work, and the
rest must be handled by software. Redirecting/mirroring from the ingress
of a switchdev port towards a foreign interface is one example of
combined hardware/software data path. The most that the switchdev port
can do is to extract the matching packets from its offloaded data path
and send them to the CPU. From there on, the software filter runs
(a second time, after the first run in hardware) on the packet and
performs the mirred action.

It makes sense for switchdev drivers which allow this kind of "half
offloading" to sense the "skip_sw" flag of the filter/action pair, and
deny attempts from the user to install a filter that does not run in
software, because that simply won't work.

In fact, a mirred action on a switchdev port towards a dummy interface
appears to be a valid way of (selectively) monitoring offloaded traffic
that flows through it. IFF_PROMISC was also discussed years ago, but
(despite initial disagreement) there seems to be consensus that this
flag should not affect the destination taken by packets, but merely
whether or not the NIC discards packets with unknown MAC DA for local
processing.

[1] https://lore.kernel.org/netdev/20190830092637.7f83d162@ceranb/
[2] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/netdev/ZxUo0Dc0M5Y6l9qF@shredder.mtl.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241023135251.1752488-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  Merge branch 'ptp-driver-for-s390-clocks'
Jakub Kicinski [Thu, 31 Oct 2024 00:02:43 +0000 (17:02 -0700)]
Merge branch 'ptp-driver-for-s390-clocks'

Sven Schnelle says:

====================
PtP driver for s390 clocks

These patches add support for using the s390 physical and TOD clock as ptp
clock. To do so, the first patch adds a clock id to the s390 TOD clock,
while the second patch adds the PtP driver itself.
====================

Link: https://patch.msgid.link/20241023065601.449586-1-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  s390/time: Add PtP driver
Sven Schnelle [Wed, 23 Oct 2024 06:56:01 +0000 (08:56 +0200)]
s390/time: Add PtP driver

Add a small PtP driver which allows user space to get
the values of the physical and TOD clock. This allows
programs like chrony to use STP as clock source and
steer the kernel clock. The physical clock can be used
as a debugging aid to get the clock without any additional
offsets like STP steering or LPAR offset.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://patch.msgid.link/20241023065601.449586-3-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  s390/time: Add clocksource id to TOD clock
Sven Schnelle [Wed, 23 Oct 2024 06:56:00 +0000 (08:56 +0200)]
s390/time: Add clocksource id to TOD clock

To allow specifying the clock source in the upcoming PtP driver,
add a clocksource ID to the s390 TOD clock.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://patch.msgid.link/20241023065601.449586-2-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  tests: hsr: Increase timeout to 50 seconds
Yunshui Jiang [Mon, 28 Oct 2024 08:27:56 +0000 (16:27 +0800)]
tests: hsr: Increase timeout to 50 seconds

The HSR test, hsr_ping.sh, actually needs more than 6 minutes to run: around
375s, and even more on a debug kernel or a kernel with other network
security limits. The timeout setting for the kselftest is currently 45
seconds, which is way too short to integrate the hsr tests into the
run_kselftest infrastructure. However, a timeout of hundreds of seconds is
quite a long time, especially in a CI/CD environment. It seems that we need
to accelerate the test and balance that against the timeout setting.

The most time-consuming function is do_ping_long, where the ping command
sends 10 packets to the given address. The default interval between two ping
packets is 1s according to the ping manual. There isn't any operation
between pings, thus we could pass -i 0.1 to ping to make it 10 times
faster.

Even with this short interval, the test still needs about 46.4
seconds to finish because of the two HSR interfaces, each of which is
tested by calling the do_ping function 12 times and the do_ping_long
function 19 times, and sleeping for 3s.

So, an explicit setting is also needed to slightly increase the
timeout. And to leave us some slack, use 50 as default timeout.

Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn>
Link: https://patch.msgid.link/20241028082757.2945232-1-jiangyunshui@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
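
Editorial sketch for the commit above: a rough lower bound on the test run time, computed from the figures in the message. It is a Python model under stated assumptions: only the waits between packets in do_ping_long and the 3s sleep are counted, the sleep is assumed to happen once per interface, and round-trip times, the do_ping calls and setup are ignored. The function names are hypothetical.

```python
# Lower-bound estimate of the hsr_ping.sh run time, in milliseconds.
# Integer milliseconds are used to avoid floating-point rounding.

def ping_wait_ms(packets: int, interval_ms: int) -> int:
    """Waiting time of one ping run: one interval between consecutive packets."""
    return (packets - 1) * interval_ms


def hsr_lower_bound_ms(interval_ms: int, interfaces: int = 2,
                       long_pings: int = 19, packets: int = 10,
                       sleep_ms: int = 3000) -> int:
    """Sum the do_ping_long waits and the sleep for every HSR interface."""
    per_interface = long_pings * ping_wait_ms(packets, interval_ms) + sleep_ms
    return interfaces * per_interface
```

With the default 1000 ms interval the model gives 348000 ms, and with 100 ms it gives 40200 ms. Both sit below the measured 375s and 46.4s, as expected for a lower bound, and the second figure shows why a timeout slightly above 45 seconds is still needed.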
11 months ago  Merge branch 'tcp-warn-once'
David S. Miller [Wed, 30 Oct 2024 13:26:56 +0000 (13:26 +0000)]
Merge branch 'tcp-warn-once'

Jason Xing says:

====================
tcp: add tcp_warn_once() common helper

Paolo Abeni suggested we can introduce a new helper to cover more cases
in the future for better debug.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 months ago  tcp: add more warn of socket in tcp_send_loss_probe()
Jason Xing [Wed, 23 Oct 2024 08:14:52 +0000 (16:14 +0800)]
tcp: add more warn of socket in tcp_send_loss_probe()

Print two more fields in the helper, which here covers tcp_send_loss_probe().

Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Cc: Neal Cardwell <ncardwell@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months ago  tcp: add a common helper to debug the underlying issue
Jason Xing [Wed, 23 Oct 2024 08:14:51 +0000 (16:14 +0800)]
tcp: add a common helper to debug the underlying issue

Following commit c8770db2d544 ("tcp: check skb is non-NULL
in tcp_rto_delta_us()"), we decided to add a helper so that it's
easier to get a verbose warning in either case.

Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Cc: Neal Cardwell <ncardwell@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months ago  Documentation: networking: Add missing PHY_GET command in the message list
Kory Maincent [Mon, 28 Oct 2024 13:23:51 +0000 (14:23 +0100)]
Documentation: networking: Add missing PHY_GET command in the message list

ETHTOOL_MSG_PHY_GET/GET_REPLY/NTF is missing in the ethtool message list.
Add it to the ethtool netlink documentation.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241028132351.75922-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  Merge tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Wed, 30 Oct 2024 01:50:57 +0000 (18:50 -0700)]
Merge tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.13

The first -next "new features" pull request for v6.13. This is a big
one as we have not been able to send one earlier. We also have some
patches affecting other subsystems: in staging we deleted the rtl8192e
driver and in debugfs added a new interface to save struct
file_operations memory; both were acked by GregKH.

Because of the lib80211/libipw move there were quite a lot of
conflicts and to solve those we decided to merge net-next into
wireless-next.

Major changes:

cfg80211/mac80211
 * stop exporting wext symbols
 * new mac80211 op to indicate that a new interface is to be added
 * support radio separation of multi-band devices

Wireless Extensions
 * move wext spy implementation to libiw
 * remove iw_public_data from struct net_device

brcmfmac
 * optional LPO clock support

ipw2x00
 * move remaining lib80211 code into libiw

wilc1000
 * WILC3000 support

rtw89
 * RTL8852BE and RTL8852BE-VT BT-coexistence improvements

* tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (126 commits)
  mac80211: Remove NOP call to ieee80211_hw_config
  wifi: iwlwifi: work around -Wenum-compare-conditional warning
  wifi: mac80211: re-order assigning channel in activate links
  wifi: mac80211: convert debugfs files to short fops
  debugfs: add small file operations for most files
  wifi: mac80211: remove misleading j_0 construction parts
  wifi: mac80211_hwsim: use hrtimer_active()
  wifi: mac80211: refactor BW limitation check for CSA parsing
  wifi: mac80211: filter on monitor interfaces based on configured channel
  wifi: mac80211: refactor ieee80211_rx_monitor
  wifi: mac80211: add support for the monitor SKIP_TX flag
  wifi: cfg80211: add monitor SKIP_TX flag
  wifi: mac80211: add flag to opt out of virtual monitor support
  wifi: cfg80211: pass net_device to .set_monitor_channel
  wifi: mac80211: remove status->ampdu_delimiter_crc
  wifi: cfg80211: report per wiphy radio antenna mask
  wifi: mac80211: use vif radio mask to limit creating chanctx
  wifi: mac80211: use vif radio mask to limit ibss scan frequencies
  wifi: cfg80211: add option for vif allowed radios
  wifi: iwlwifi: allow IWL_FW_CHECK() with just a string
  ...

====================

Link: https://patch.msgid.link/20241025170705.5F6B2C4CEC3@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  Merge branch 'devlink-minor-cleanup'
Jakub Kicinski [Tue, 29 Oct 2024 23:52:59 +0000 (16:52 -0700)]
Merge branch 'devlink-minor-cleanup'

Przemek Kitszel says:

====================
devlink: minor cleanup

(Patch 1, 2) Add one helper shortcut to put u64 values into skb.
(Patch 3, 4) Minor cleanup for error codes.
(Patch 5, 6, 7) Remove some devlink_resource_*() usage, and the functions
themselves, by replacing devlink_* variants with devl_* ones.

v2: fix metadata (cc list, target tree) - Jiri; rebase; tags collected

v1: https://lore.kernel.org/20241018102009.10124-1-przemyslaw.kitszel@intel.com
====================

Link: https://patch.msgid.link/20241023131248.27192-1-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  devlink: remove unused devlink_resource_register()
Przemek Kitszel [Wed, 23 Oct 2024 13:09:07 +0000 (15:09 +0200)]
devlink: remove unused devlink_resource_register()

Remove unused devlink_resource_register(); all the drivers use
devl_resource_register() variant instead.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-8-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  devlink: remove unused devlink_resource_occ_get_register() and _unregister()
Przemek Kitszel [Wed, 23 Oct 2024 13:09:06 +0000 (15:09 +0200)]
devlink: remove unused devlink_resource_occ_get_register() and _unregister()

Remove the unused devlink_resource_occ_get_register() and
devlink_resource_occ_get_unregister() functions; current devlink resource
users are fine with devl_ variants of the two.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-7-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  net: dsa: replace devlink resource registration calls by devl_ variants
Przemek Kitszel [Wed, 23 Oct 2024 13:09:05 +0000 (15:09 +0200)]
net: dsa: replace devlink resource registration calls by devl_ variants

Replace devlink_resource_register(), devlink_resource_occ_get_register(),
and devlink_resource_occ_get_unregister() calls with their respective devl_*
variants. The mentioned functions have no direct users in any drivers, and
are going to be removed in subsequent patches.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-6-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months ago  devlink: region: snapshot IDs: consolidate error values
Przemek Kitszel [Wed, 23 Oct 2024 13:09:04 +0000 (15:09 +0200)]
devlink: region: snapshot IDs: consolidate error values

Consolidate the error codes for a message size that is too big.

The current code returns -EINVAL when tailroom in the skb msg
would be exhausted precisely when it's time to nest, and returns -EMSGSIZE
in all other "not enough space" conditions.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-5-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodevlink: devl_resource_register(): differentiate error codes
Przemek Kitszel [Wed, 23 Oct 2024 13:09:03 +0000 (15:09 +0200)]
devlink: devl_resource_register(): differentiate error codes

Differentiate error codes of devl_resource_register().

Replace one of -EINVAL exit paths by -EEXIST. This should aid developers
introducing new resources and registering them in the wrong order.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-4-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodevlink: use devlink_nl_put_u64() helper
Przemek Kitszel [Wed, 23 Oct 2024 13:09:02 +0000 (15:09 +0200)]
devlink: use devlink_nl_put_u64() helper

Use the devlink_nl_put_u64() shortcut added by the previous commit
throughout devlink/.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-3-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodevlink: introduce devlink_nl_put_u64()
Przemek Kitszel [Wed, 23 Oct 2024 13:09:01 +0000 (15:09 +0200)]
devlink: introduce devlink_nl_put_u64()

Add devlink_nl_put_u64() that abstracts padding for u64 values.
All u64 values are passed with the very same padding option.
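
As a rough userspace illustration of why u64 attributes need a padding option at all: a netlink attribute header is 4 bytes long, so a u64 payload only lands on an 8-byte boundary if an empty pad attribute is emitted first whenever the write offset is itself 8-byte aligned. The sketch below models just that offset arithmetic. model_put_u64() and MODEL_NLA_HDRLEN are hypothetical names for illustration and are not the kernel helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NLA_HDRLEN 4u	/* size of one netlink attribute header */

/*
 * Hypothetical model: append one u64 attribute at message offset 'off'.
 * Stores the payload offset in *payload_off and returns the new end of
 * the message. An empty pad attribute is inserted first when the payload
 * would otherwise not be 8-byte aligned.
 */
size_t model_put_u64(size_t off, size_t *payload_off)
{
	assert(off % 4 == 0);		/* netlink attributes are 4-byte aligned */

	/* A header at an 8-aligned offset would put the payload at off + 4. */
	if (off % 8 == 0)
		off += MODEL_NLA_HDRLEN;	/* empty pad attribute */

	off += MODEL_NLA_HDRLEN;		/* header of the real attribute */
	*payload_off = off;
	return off + sizeof(uint64_t);
}
```

Hiding the pad decision behind one helper means call sites never have to name the pad attribute themselves, which is the point of the abstraction described above.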

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-2-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agor8169: fix inconsistent indenting in rtl8169_get_eth_mac_stats
Heiner Kallweit [Thu, 24 Oct 2024 20:48:59 +0000 (22:48 +0200)]
r8169: fix inconsistent indenting in rtl8169_get_eth_mac_stats

This fixes an inconsistent indenting introduced with e3fc5139bd8f
("r8169: implement additional ethtool stats ops").

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410220413.1gAxIJ4t-lkp@intel.com/
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20fd6f39-3c1b-4af0-9adc-7d1f49728fad@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agosocket: Print pf->create() when it does not clear sock->sk on failure.
Kuniyuki Iwashima [Thu, 24 Oct 2024 20:14:58 +0000 (13:14 -0700)]
socket: Print pf->create() when it does not clear sock->sk on failure.

I suggested putting DEBUG_NET_WARN_ON_ONCE() in __sock_create() to
catch possible use-after-free.

But the warning itself was not useful because our interest is in
the callee rather than the caller.

Let's define DEBUG_NET_WARN_ONCE() and print the name of pf->create()
and the socket identifier.

While at it, we enclose DEBUG_NET_WARN_ON_ONCE() in parentheses too
to avoid a checkpatch error.

Note that %pf or %pF were obsoleted and will be removed later as per
comment in lib/vsprintf.c.

Link: https://lore.kernel.org/netdev/202410231427.633734b3-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241024201458.49412-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agor8169: add support for RTL8125D
Heiner Kallweit [Thu, 24 Oct 2024 20:42:33 +0000 (22:42 +0200)]
r8169: add support for RTL8125D

This adds support for new chip version RTL8125D, which can be found on
boards like Gigabyte X870E AORUS ELITE WIFI7. Firmware rtl8125d-1.fw
for this chip version is available in linux-firmware already.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/d0306912-e88e-4c25-8b5d-545ae8834c0c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: qlogic: use ethtool string helpers
Rosen Penev [Thu, 24 Oct 2024 19:55:34 +0000 (12:55 -0700)]
net: qlogic: use ethtool string helpers

The ethtool string helpers are the preferred way to copy ethtool strings.

Avoids manually incrementing the pointer. Cleans up the code quite well.
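
A minimal userspace sketch of the pattern, assuming fixed 32-byte name slots as with ETH_GSTRING_LEN. model_puts() and model_fill_names() are hypothetical stand-ins written for this illustration; they only mimic the idea of a helper that copies one string and advances the caller's pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_GSTRING_LEN 32	/* mirrors ETH_GSTRING_LEN */

/* Hypothetical helper: copy one name into the current slot and advance. */
void model_puts(unsigned char **data, const char *str)
{
	assert(data && *data && str);
	memset(*data, 0, MODEL_GSTRING_LEN);
	snprintf((char *)*data, MODEL_GSTRING_LEN, "%s", str);
	*data += MODEL_GSTRING_LEN;
}

/* Fill 'n' per-queue names; returns the number of bytes written. */
size_t model_fill_names(unsigned char *buf, int n)
{
	unsigned char *p = buf;
	char name[MODEL_GSTRING_LEN];

	for (int i = 0; i < n; i++) {
		snprintf(name, sizeof(name), "rx_queue_%d_packets", i);
		model_puts(&p, name);	/* no manual "p += 32" at the call site */
	}
	return (size_t)(p - buf);
}
```

Because the helper owns the pointer arithmetic, a call site cannot forget the increment or use the wrong stride.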

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241024195534.176410-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: marvell: use ethtool string helpers
Rosen Penev [Thu, 24 Oct 2024 19:58:33 +0000 (12:58 -0700)]
net: marvell: use ethtool string helpers

The ethtool string helpers are the preferred way to copy ethtool strings.

Avoids manually incrementing the pointer. Cleans up the code quite well.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241024195833.176843-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomlx5: simplify EQ interrupt polling logic
Caleb Sander Mateos [Wed, 23 Oct 2024 20:51:12 +0000 (14:51 -0600)]
mlx5: simplify EQ interrupt polling logic

Use a while loop in mlx5_eq_comp_int() and mlx5_eq_async_int() to
clarify the EQE polling logic. This consolidates the next_eqe_sw() calls
for the first and subsequent iterations. It also avoids a goto. Turn the
num_eqes < MLX5_EQ_POLLING_BUDGET check into a break condition.
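
The resulting loop shape can be sketched in userspace as follows. This is only a model under stated assumptions: struct model_eq, model_next_eqe(), model_eq_poll() and the budget value of 4 are hypothetical and do not reproduce the driver code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_POLLING_BUDGET 4	/* hypothetical stand-in for the real budget */

struct model_eq {
	const int *entries;	/* pending event entries */
	size_t count;		/* number of pending entries */
	size_t ci;		/* consumer index */
};

/* Return the next pending entry, or NULL when the queue is drained. */
const int *model_next_eqe(struct model_eq *eq)
{
	return eq->ci < eq->count ? &eq->entries[eq->ci] : NULL;
}

/* Poll up to the budget with one loop and no goto; returns entries handled. */
int model_eq_poll(struct model_eq *eq, long *sum)
{
	const int *eqe;
	int num_eqes = 0;

	assert(eq && sum);
	while ((eqe = model_next_eqe(eq))) {
		*sum += *eqe;		/* "handle" the entry */
		eq->ci++;
		if (++num_eqes >= MODEL_POLLING_BUDGET)
			break;		/* budget check as a break condition */
	}
	return num_eqes;
}
```

The single call to model_next_eqe() in the loop condition serves both the first and every later iteration.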

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023205113.255866-1-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomlx5: fix typo in "mlx5_cqwq_get_cqe_enahnced_comp"
Caleb Sander Mateos [Wed, 23 Oct 2024 16:48:38 +0000 (10:48 -0600)]
mlx5: fix typo in "mlx5_cqwq_get_cqe_enahnced_comp"

"enahnced" looks to be a misspelling of "enhanced".
Rename "mlx5_cqwq_get_cqe_enahnced_comp" to
"mlx5_cqwq_get_cqe_enhanced_comp".

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241023164840.140535-1-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoamd-xgbe: use ethtool string helpers
Rosen Penev [Tue, 22 Oct 2024 23:32:03 +0000 (16:32 -0700)]
amd-xgbe: use ethtool string helpers

The ethtool string helpers are the preferred way to copy ethtool strings.

Avoids manually incrementing the pointer.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241022233203.9670-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: mana: use ethtool string helpers
Rosen Penev [Tue, 22 Oct 2024 20:49:08 +0000 (13:49 -0700)]
net: mana: use ethtool string helpers

The ethtool string helpers are the preferred way to copy ethtool strings.

Avoids manually incrementing the data pointer.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Link: https://patch.msgid.link/20241022204908.511021-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoibmvnic: use ethtool string helpers
Rosen Penev [Tue, 22 Oct 2024 20:32:40 +0000 (13:32 -0700)]
ibmvnic: use ethtool string helpers

They are the preferred way to copy ethtool strings.

Avoids manually incrementing the data pointer.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Tested-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241022203240.391648-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: ftgmac100: refactor getting phy device handle
Jacky Chou [Tue, 22 Oct 2024 08:42:14 +0000 (16:42 +0800)]
net: ftgmac100: refactor getting phy device handle

Consolidate the handling of dedicated PHY and fixed-link phy by taking
advantage of logic in of_phy_get_and_connect() which handles both of
these cases, rather than open coding the same logic in ftgmac100_probe().

Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241022084214.1261174-1-jacky_chou@aspeedtech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'net-phylink-simplify-sfp-phy-attachment'
Jakub Kicinski [Tue, 29 Oct 2024 18:57:35 +0000 (11:57 -0700)]
Merge branch 'net-phylink-simplify-sfp-phy-attachment'

Russell King says:

====================
net: phylink: simplify SFP PHY attachment

These three patches simplify how we attach SFP PHYs.

The first patch notices that at the two sites where we call
sfp_select_interface(), if that fails, we always print the same error.
Move this into its own function.

The second patch adds an additional level of validation, checking that
the returned interface is one that is supported by the MAC/PCS.

The last patch simplifies how SFP PHYs are attached, reducing the
number of times that we do validation in this path.
====================

Link: https://patch.msgid.link/Zxj8_clRmDA_G7uH@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phylink: simplify how SFP PHYs are attached
Russell King (Oracle) [Wed, 23 Oct 2024 13:41:57 +0000 (14:41 +0100)]
net: phylink: simplify how SFP PHYs are attached

There are a few issues with how SFP PHYs are attached:

a) The phylink_sfp_connect_phy() and phylink_sfp_config_phy() code
   validates the configuration three times:

   1. To discover the support/advertising masks that the PHY/PCS/MAC
      can support in order to select an interface.
   2. To validate the selected interface.
   3. When the PHY is brought up after being attached, another validation
      is done.

   This is needlessly complex.

b) The configuration is set prior to the PHY being attached, which
   means we don't have the PHY available in phylink_major_config()
   for phylink_pcs_neg_mode() to make decisions upon.

We have already added an extra step to validate the selected interface,
so we can now move the attachment and bringup of the PHY earlier,
inside phylink_sfp_config_phy(). This results in the validation at
step 2 above becoming entirely unnecessary, so remove that too.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1t3bcb-000c8H-3e@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phylink: validate sfp_select_interface() returned interface
Russell King (Oracle) [Wed, 23 Oct 2024 13:41:51 +0000 (14:41 +0100)]
net: phylink: validate sfp_select_interface() returned interface

Validate that the returned interface from sfp_select_interface() is
supportable by the MAC/PCS. If it isn't, print an error and return
the NA interface type. This is a preparatory step to reorganising
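
A minimal sketch of that check, assuming the supported interfaces are kept as a bitmask indexed by interface type. The enum values and model_validate_iface() are hypothetical names for illustration, not the phylink code.

```c
#include <assert.h>
#include <stdio.h>

enum model_iface {
	MODEL_IFACE_NA = 0,
	MODEL_IFACE_SGMII,
	MODEL_IFACE_1000BASEX,
	MODEL_IFACE_10GBASER,
	MODEL_IFACE_MAX
};

/* Return 'sel' if the MAC/PCS supports it, otherwise report and return NA. */
enum model_iface model_validate_iface(enum model_iface sel,
				      unsigned long supported)
{
	assert(sel < MODEL_IFACE_MAX);

	if (sel != MODEL_IFACE_NA && !(supported & (1UL << sel))) {
		fprintf(stderr, "selected interface %d not supported by MAC/PCS\n",
			(int)sel);
		return MODEL_IFACE_NA;
	}
	return sel;
}
```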
how a PHY on a SFP module is handled.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1t3bcV-000c8B-Vz@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phylink: add common validation for sfp_select_interface()
Russell King (Oracle) [Wed, 23 Oct 2024 13:41:46 +0000 (14:41 +0100)]
net: phylink: add common validation for sfp_select_interface()

Whenever we call sfp_select_interface(), we check the returned value
and print an error. There are two cases where this happens with the
same message. Provide a common function to do this.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1t3bcQ-000c85-S4@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phylink: simplify phylink_parse_fixedlink()
Russell King (Oracle) [Tue, 22 Oct 2024 14:17:07 +0000 (15:17 +0100)]
net: phylink: simplify phylink_parse_fixedlink()

phylink_parse_fixedlink() wants to preserve the pause, asym_pause and
autoneg bits in pl->supported. Rather than reading the bits into
separate bools, zeroing pl->supported, and then setting them if they
were previously set, use a mask and linkmode_and() to achieve the same
result.
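
The equivalence of the two approaches can be sketched on a plain 64-bit bitmap. The bit names and the model_* functions below are hypothetical and stand in for the kernel's link mode bitmaps; the model only shows that "read three bools, zero, set again" and "AND with a mask" give the same result.

```c
#include <assert.h>
#include <stdint.h>

enum {
	MODEL_BIT_AUTONEG,
	MODEL_BIT_PAUSE,
	MODEL_BIT_ASYM_PAUSE,
	MODEL_BIT_1000FULL,
	MODEL_BIT_10000FULL,
	MODEL_BIT_COUNT
};

#define MODEL_BIT(n)	(UINT64_C(1) << (n))
#define MODEL_KEEP_MASK	(MODEL_BIT(MODEL_BIT_AUTONEG) | \
			 MODEL_BIT(MODEL_BIT_PAUSE) | \
			 MODEL_BIT(MODEL_BIT_ASYM_PAUSE))

/* Old style: remember three bits in bools, zero the map, set them again. */
uint64_t model_keep_bools(uint64_t supported)
{
	int autoneg = !!(supported & MODEL_BIT(MODEL_BIT_AUTONEG));
	int pause = !!(supported & MODEL_BIT(MODEL_BIT_PAUSE));
	int asym = !!(supported & MODEL_BIT(MODEL_BIT_ASYM_PAUSE));

	supported = 0;
	if (autoneg)
		supported |= MODEL_BIT(MODEL_BIT_AUTONEG);
	if (pause)
		supported |= MODEL_BIT(MODEL_BIT_PAUSE);
	if (asym)
		supported |= MODEL_BIT(MODEL_BIT_ASYM_PAUSE);
	return supported;
}

/* New style: one mask and one AND. */
uint64_t model_keep_mask(uint64_t supported)
{
	return supported & MODEL_KEEP_MASK;
}

/* Exhaustively compare both styles over every combination of model bits. */
void model_selfcheck(void)
{
	for (uint64_t v = 0; v < MODEL_BIT(MODEL_BIT_COUNT); v++)
		assert(model_keep_bools(v) == model_keep_mask(v));
}
```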

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1t3Fh5-000aQi-Nk@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'mlx5e-update-features-on-config-changes'
Jakub Kicinski [Tue, 29 Oct 2024 18:48:29 +0000 (11:48 -0700)]
Merge branch 'mlx5e-update-features-on-config-changes'

Tariq Toukan says:

====================
mlx5e update features on config changes

This small patchset by Dragos adds a call to netdev_update_features()
in configuration changes that could impact the features status.
====================

Link: https://patch.msgid.link/20241024164134.299646-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Update features on ring size change
Dragos Tatulea [Thu, 24 Oct 2024 16:41:33 +0000 (19:41 +0300)]
net/mlx5e: Update features on ring size change

When the ring size changes successfully, trigger
netdev_update_features() to enable features in wanted state if
applicable.

An example of such scenario:
$ ip link set dev eth1 up
$ ethtool --set-ring eth1 rx 8192
$ ip link set dev eth1 mtu 9000
$ ethtool --features eth1 rx-gro-hw on --> fails
$ ethtool --set-ring eth1 rx 1024

With this patch, HW GRO will be turned on automatically because
it is set in the device's wanted_features.
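
The relationship between wanted and active features can be modelled roughly as below. This is a sketch under loud assumptions: the feature bits, struct model_netdev and model_update_features() are hypothetical, and whether the current ring size and MTU permit HW GRO is passed in as a plain flag because the actual driver condition is not described here.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_F_HW_GRO	(1u << 0)
#define MODEL_F_RXCSUM	(1u << 1)

struct model_netdev {
	unsigned int wanted;	/* what the user asked for */
	unsigned int active;	/* what is currently enabled */
};

/*
 * Recompute the active features from the wanted ones. 'hw_gro_ok' says
 * whether the current configuration allows HW GRO (an input here; the
 * real rule is driver specific).
 */
void model_update_features(struct model_netdev *dev, bool hw_gro_ok)
{
	unsigned int allowed = ~0u;

	assert(dev);
	if (!hw_gro_ok)
		allowed &= ~MODEL_F_HW_GRO;
	dev->active = dev->wanted & allowed;
}
```

Calling the update function again after a configuration change is what lets a previously refused, but still wanted, feature become active without the user asking twice.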

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241024164134.299646-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Update features on MTU change
Dragos Tatulea [Thu, 24 Oct 2024 16:41:32 +0000 (19:41 +0300)]
net/mlx5e: Update features on MTU change

When the MTU changes successfully, trigger netdev_update_features() to
enable features in wanted state if applicable.

An example of such scenario:
$ ip link set dev eth1 up
$ ethtool --set-ring eth1 rx 8192
$ ip link set dev eth1 mtu 9000
$ ethtool --features eth1 rx-gro-hw on --> fails
$ ip link set dev eth1 mtu 7000

With this patch, HW GRO will be turned on automatically because
it is set in the device's wanted_features.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241024164134.299646-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agowwan: core: Pass string literal as format argument of dev_set_name()
Simon Horman [Wed, 23 Oct 2024 12:15:28 +0000 (13:15 +0100)]
wwan: core: Pass string literal as format argument of dev_set_name()

Both gcc-14 and clang-18 report that passing a non-string literal as the
format argument of dev_set_name() is potentially insecure.

E.g. clang-18 says:

drivers/net/wwan/wwan_core.c:442:34: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
  442 |         return dev_set_name(&port->dev, buf);
      |                                         ^~~
drivers/net/wwan/wwan_core.c:442:34: note: treat the string as an argument to avoid this
  442 |         return dev_set_name(&port->dev, buf);
      |                                         ^
      |                                         "%s",

It is always the case where the contents of buf are safe to pass as the
format argument. That is, in my understanding, it never contains any
format escape sequences.

But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.
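
The same fix can be illustrated in userspace with snprintf(). model_set_name() is a hypothetical stand-in for a printf-style name setter; it is not the wwan code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical printf-style name setter; returns the length of the name. */
int model_set_name(char *dst, size_t len, const char *buf)
{
	assert(dst && buf && len > 0);

	/*
	 * Pass "%s" as the format so that 'buf' is treated purely as data.
	 * Writing snprintf(dst, len, buf) instead would interpret any '%'
	 * sequence inside 'buf' and triggers -Wformat-security.
	 */
	return snprintf(dst, len, "%s", buf);
}
```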

Compile tested only.
No functional change intended.

Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://patch.msgid.link/20241023-wwan-fmt-v1-1-521b39968639@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoselftests: tc-testing: Fix typo error
Karan Sanghavi [Tue, 22 Oct 2024 18:30:52 +0000 (18:30 +0000)]
selftests: tc-testing: Fix typo error

Correct the typos in the json files:

- "diffferent" is corrected to "different".
- "muliple" and "miltiple" are corrected to "multiple".

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Karan Sanghavi <karansanghvi98@gmail.com>
Link: https://patch.msgid.link/20241022-multiple_spell_error-v2-1-7e5036506fe5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agortnetlink: Fix kdoc of rtnl_af_register().
Kuniyuki Iwashima [Tue, 22 Oct 2024 21:03:20 +0000 (14:03 -0700)]
rtnetlink: Fix kdoc of rtnl_af_register().

Commit 26eebdc4b005 ("rtnetlink: Return int from rtnl_af_register().")
made rtnl_af_register() return int again, and kdoc needs to be fixed up.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241022210320.86111-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'ipv4-prepare-core-ipv4-files-to-future-flowi4_tos-conversion'
Jakub Kicinski [Tue, 29 Oct 2024 18:21:25 +0000 (11:21 -0700)]
Merge branch 'ipv4-prepare-core-ipv4-files-to-future-flowi4_tos-conversion'

Guillaume Nault says:

====================
ipv4: Prepare core ipv4 files to future .flowi4_tos conversion.

Continue preparing users of ->flowi4_tos (struct flowi4) to the future
conversion of this field (from __u8 to dscp_t). The objective is to
have type annotation to properly separate DSCP bits from ECN ones. This
way we'll ensure that ECN doesn't interfere with DSCP and avoid
regressions where it breaks routing decisions (fib rules in particular).

This series concentrates on some easy IPv4 conversions where
->flowi4_tos is set directly from an IPv4 header, so we can get the
DSCP value using the ip4h_dscp() helper function.
====================

Link: https://patch.msgid.link/cover.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoipv4: Prepare ip_rt_get_source() to future .flowi4_tos conversion.
Guillaume Nault [Tue, 22 Oct 2024 09:48:23 +0000 (11:48 +0200)]
ipv4: Prepare ip_rt_get_source() to future .flowi4_tos conversion.

Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.
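
A small userspace model of this round trip, assuming the DSCP occupies the upper six bits of the DS field (mask 0xfc) and ECN the lower two. The model_* names and the wrapper struct are hypothetical; the struct merely imitates a distinct type that keeps DSCP values apart from raw bytes.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DSCP_MASK	0xfc	/* upper six bits of the DS field */
#define MODEL_ECN_MASK	0x03	/* lower two bits */

/* Distinct type so that a raw byte with ECN bits cannot be passed by accident. */
typedef struct { uint8_t v; } model_dscp_t;

model_dscp_t model_dsfield_to_dscp(uint8_t dsfield)
{
	model_dscp_t d = { (uint8_t)(dsfield & MODEL_DSCP_MASK) };
	return d;
}

uint8_t model_dscp_to_dsfield(model_dscp_t dscp)
{
	return dscp.v;
}

/* Derive the flow's tos byte from the header tos byte, dropping ECN bits. */
uint8_t model_flow_tos(uint8_t hdr_tos)
{
	uint8_t tos = model_dscp_to_dsfield(model_dsfield_to_dscp(hdr_tos));

	assert((tos & MODEL_ECN_MASK) == 0);
	return tos;
}
```

Once the destination field itself has the DSCP type, the final conversion back to a byte is the only step that needs to go.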

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/0a13a200f31809841975e38633914af1061e0c04.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoipv4: Prepare ipmr_rt_fib_lookup() to future .flowi4_tos conversion.
Guillaume Nault [Tue, 22 Oct 2024 09:48:15 +0000 (11:48 +0200)]
ipv4: Prepare ipmr_rt_fib_lookup() to future .flowi4_tos conversion.

Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/462402a097260357a7aba80228612305f230b6a9.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoipv4: Prepare icmp_reply() to future .flowi4_tos conversion.
Guillaume Nault [Tue, 22 Oct 2024 09:48:08 +0000 (11:48 +0200)]
ipv4: Prepare icmp_reply() to future .flowi4_tos conversion.

Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/61b7563563f8b0a562b5b62032fe5260034d0aac.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoipv4: Prepare fib_compute_spec_dst() to future .flowi4_tos conversion.
Guillaume Nault [Tue, 22 Oct 2024 09:48:00 +0000 (11:48 +0200)]
ipv4: Prepare fib_compute_spec_dst() to future .flowi4_tos conversion.

Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/a0eba69cce94f747e4c7516184a85ffd0abbe3f0.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'ibm-emac-more-cleanups'
Paolo Abeni [Tue, 29 Oct 2024 14:33:24 +0000 (15:33 +0100)]
Merge branch 'ibm-emac-more-cleanups'

Rosen Penev says:

====================
ibm: emac: more cleanups

Tested on Cisco MX60W.

v2: fixed build errors. Also added extra commits to clean the driver up
further.
v3: Added tested message. Removed bad alloc_netdev_dummy commit.
v4: removed modules changes from patchset. Added fix for if MAC not
found.
v5: added of_find_matching_node commit.
v6: resend after net-next merge.
v7: removed of_find_matching_node commit. Adjusted mutex_init patch.
v8: removed patch removing custom init/exit. Needs more work.
====================

Link: https://patch.msgid.link/20241022002245.843242-1-rosenp@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: ibm: emac: generate random MAC if not found
Rosen Penev [Tue, 22 Oct 2024 00:22:45 +0000 (17:22 -0700)]
net: ibm: emac: generate random MAC if not found

On this Cisco MX60W, u-boot sets the local-mac-address property.
Unfortunately, the MAC it sets by default is wrong; the correct one is
located on a UBI partition, which means nvmem needs to be used to grab it.

In the case where that fails, EMAC fails to initialize instead of
generating a random MAC as many other drivers do.

Match behavior with other drivers to have a working ethernet interface.
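
The fallback can be sketched in userspace as follows. All model_* names are hypothetical; the sketch only shows the general idea of using a found address when it is valid and otherwise generating a random unicast, locally administered one instead of failing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_ETH_ALEN 6

/* Fill 'addr' with a random unicast, locally administered MAC address. */
void model_random_mac(uint8_t addr[MODEL_ETH_ALEN])
{
	for (int i = 0; i < MODEL_ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;	/* clear the multicast bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}

/* Valid means: not all zeroes and not a multicast address. */
int model_mac_is_valid(const uint8_t addr[MODEL_ETH_ALEN])
{
	uint8_t any = 0;

	for (int i = 0; i < MODEL_ETH_ALEN; i++)
		any |= addr[i];
	return any != 0 && !(addr[0] & 0x01);
}

/* Use 'found' when valid; otherwise fall back. Returns 1 if random was used. */
int model_setup_mac(uint8_t dst[MODEL_ETH_ALEN], const uint8_t *found)
{
	if (found && model_mac_is_valid(found)) {
		memcpy(dst, found, MODEL_ETH_ALEN);
		return 0;
	}
	model_random_mac(dst);
	assert(model_mac_is_valid(dst));
	return 1;
}
```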

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: ibm: emac: use devm for mutex_init
Rosen Penev [Tue, 22 Oct 2024 00:22:44 +0000 (17:22 -0700)]
net: ibm: emac: use devm for mutex_init

It seems that, since inception, mutex_destroy() was never called for
these mutexes in _remove. Instead of handling this manually, just use
devm for simplicity.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: ibm: emac: use platform_get_irq
Rosen Penev [Tue, 22 Oct 2024 00:22:43 +0000 (17:22 -0700)]
net: ibm: emac: use platform_get_irq

No need for irq_of_parse_and_map since we have platform_device.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: ibm: emac: use devm_platform_ioremap_resource
Rosen Penev [Tue, 22 Oct 2024 00:22:42 +0000 (17:22 -0700)]
net: ibm: emac: use devm_platform_ioremap_resource

No need to have a struct resource. Gets rid of the TODO.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: ibm: emac: use netif_receive_skb_list
Rosen Penev [Tue, 22 Oct 2024 00:22:41 +0000 (17:22 -0700)]
net: ibm: emac: use netif_receive_skb_list

Small rx improvement. Would use napi_gro_receive instead but that's a
lot more involved than netif_receive_skb_list because of how the
function is implemented.

Before:

> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 51556 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   559 MBytes   467 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 48228 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.03 sec   558 MBytes   467 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 47600 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   557 MBytes   466 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 37252 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.05 sec   559 MBytes   467 Mbits/sec

After:

> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 40786 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.05 sec   572 MBytes   478 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 52482 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   571 MBytes   477 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 48370 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   572 MBytes   478 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 46086 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.05 sec   571 MBytes   476 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 46062 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   572 MBytes   478 Mbits/sec

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge branch 'ipv4-convert-rtm_-new-del-addr-and-more-to-per-netns-rtnl'
Paolo Abeni [Tue, 29 Oct 2024 10:55:28 +0000 (11:55 +0100)]
Merge branch 'ipv4-convert-rtm_-new-del-addr-and-more-to-per-netns-rtnl'

Kuniyuki Iwashima says:

====================
ipv4: Convert RTM_{NEW,DEL}ADDR and more to per-netns RTNL.

The IPv4 address hash table and GC are already namespacified.

This series converts RTM_NEWADDR/RTM_DELADDR and some more
RTNL users to per-netns RTNL.

Changes:
  v2:
    * Add patch 1 to address sparse warning for CONFIG_DEBUG_NET_SMALL_RTNL=n
    * Add Eric's tags to patch 2-12

  v1: https://lore.kernel.org/netdev/20241018012225.90409-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20241021183239.79741-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoipv4: Convert devinet_ioctl to per-netns RTNL.
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:39 +0000 (11:32 -0700)]
ipv4: Convert devinet_ioctl to per-netns RTNL.

ioctl(SIOCGIFCONF) calls dev_ifconf() that operates on the current netns.

Let's use per-netns RTNL helpers in dev_ifconf() and inet_gifconf().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoipv4: Convert devinet_ioctl() to per-netns RTNL except for SIOCSIFFLAGS.
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:38 +0000 (11:32 -0700)]
ipv4: Convert devinet_ioctl() to per-netns RTNL except for SIOCSIFFLAGS.

Basically, devinet_ioctl() operates on a single netns.

However, ioctl(SIOCSIFFLAGS) will trigger the netdev notifier
that could touch another netdev in different netns.

Let's use per-netns RTNL helper in devinet_ioctl() and place
ASSERT_RTNL() for SIOCSIFFLAGS.

We will remove ASSERT_RTNL() once RTM_SETLINK and RTM_DELLINK
are converted.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoipv4: Convert devinet_sysctl_forward() to per-netns RTNL.
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:37 +0000 (11:32 -0700)]
ipv4: Convert devinet_sysctl_forward() to per-netns RTNL.

devinet_sysctl_forward() touches only a single netns.

Let's use rtnl_net_trylock() and __in_dev_get_rtnl_net().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agortnetlink: Define rtnl_net_trylock().
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:36 +0000 (11:32 -0700)]
rtnetlink: Define rtnl_net_trylock().

We will need the per-netns version of rtnl_trylock().

rtnl_net_trylock() calls __rtnl_net_lock() only when rtnl_trylock()
successfully holds RTNL.
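
The two-level control flow can be modelled in userspace with C11 atomic flags. This is a sketch only: struct model_net and the model_* functions are hypothetical and do not reproduce the kernel locking primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_net {
	atomic_flag lock;	/* stands in for the per-netns lock */
};

static atomic_flag model_rtnl = ATOMIC_FLAG_INIT;	/* the global lock */

bool model_rtnl_trylock(void)
{
	return !atomic_flag_test_and_set(&model_rtnl);
}

void model_rtnl_unlock(void)
{
	atomic_flag_clear(&model_rtnl);
}

/* Take the per-netns lock only after the global trylock succeeded. */
bool model_rtnl_net_trylock(struct model_net *net)
{
	bool taken;

	if (!model_rtnl_trylock())
		return false;

	/* In this model every per-netns holder also holds the global lock. */
	taken = !atomic_flag_test_and_set(&net->lock);
	assert(taken);
	(void)taken;
	return true;
}

void model_rtnl_net_unlock(struct model_net *net)
{
	atomic_flag_clear(&net->lock);
	model_rtnl_unlock();
}
```

On failure the per-netns lock is never touched, so the caller has nothing to undo.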

When RTNL is removed, we will use mutex_trylock() for per-netns RTNL.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoipv4: Convert check_lifetime() to per-netns RTNL.
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:35 +0000 (11:32 -0700)]
ipv4: Convert check_lifetime() to per-netns RTNL.

Since commit 1675f385213e ("ipv4: Namespacify IPv4 address GC."),
check_lifetime() works on a per-netns basis.

Let's use rtnl_net_lock() and rtnl_net_dereference().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoipv4: Convert RTM_DELADDR to per-netns RTNL.
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:34 +0000 (11:32 -0700)]
ipv4: Convert RTM_DELADDR to per-netns RTNL.

Let's push down RTNL into inet_rtm_deladdr() as rtnl_net_lock().

Now, ip_mc_autojoin_config() is always called under per-netns RTNL,
so ASSERT_RTNL() can be replaced with ASSERT_RTNL_NET().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoipv4: Use per-netns RTNL helpers in inet_rtm_newaddr().
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:33 +0000 (11:32 -0700)]
ipv4: Use per-netns RTNL helpers in inet_rtm_newaddr().

inet_rtm_to_ifa() and find_matching_ifa() are called
under rtnl_net_lock().

__in_dev_get_rtnl() and in_dev_for_each_ifa_rtnl() there
can use per-netns RTNL helpers.

Let's define and use __in_dev_get_rtnl_net() and
in_dev_for_each_ifa_rtnl_net().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoipv4: Convert RTM_NEWADDR to per-netns RTNL.
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:32 +0000 (11:32 -0700)]
ipv4: Convert RTM_NEWADDR to per-netns RTNL.

The address hash table and GC are already namespacified.

Let's push down RTNL into inet_rtm_newaddr() as rtnl_net_lock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoipv4: Don't allocate ifa for 0.0.0.0 in inet_rtm_newaddr().
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:31 +0000 (11:32 -0700)]
ipv4: Don't allocate ifa for 0.0.0.0 in inet_rtm_newaddr().

When we pass 0.0.0.0 to __inet_insert_ifa(), it frees ifa and returns 0.

We can do this check much earlier for RTM_NEWADDR even before allocating
struct in_ifaddr.

Let's move the validation to

  1. inet_insert_ifa() for ioctl()
  2. inet_rtm_newaddr() for RTM_NEWADDR

Now, we can remove the same check in find_matching_ifa().
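
The idea of checking for 0.0.0.0 before allocating anything can be sketched in userspace as follows. This is only an illustration under assumptions: `addr_model`, `add_addr_model()` and the `allocations` counter are hypothetical names, not kernel symbols.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an address entry. */
struct addr_model {
	uint32_t local;
};

static int allocations;	/* counts allocations, for illustration only */

/*
 * Returns 0 on success (including the 0.0.0.0 no-op) and -1 on
 * allocation failure. *out is NULL when nothing was allocated.
 */
static int add_addr_model(uint32_t local, struct addr_model **out)
{
	struct addr_model *a;

	*out = NULL;
	if (local == 0)		/* 0.0.0.0: nothing to insert, return early */
		return 0;

	a = calloc(1, sizeof(*a));
	if (!a)
		return -1;

	allocations++;
	a->local = local;
	*out = a;
	return 0;
}
```

With the check placed first, the 0.0.0.0 case never pays for an allocation that would be freed immediately.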

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoipv4: Factorise RTM_NEWADDR validation to inet_validate_rtm().
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:30 +0000 (11:32 -0700)]
ipv4: Factorise RTM_NEWADDR validation to inet_validate_rtm().

rtm_to_ifaddr() validates some attributes, looks up a netdev,
allocates struct in_ifaddr, and validates IFA_CACHEINFO.

There is no reason to delay IFA_CACHEINFO validation.

We will push RTNL down to inet_rtm_newaddr(), and then we want
to complete rtnetlink validation before rtnl_net_lock().

Let's factorise the validation parts.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agortnetlink: Define RTNL_FLAG_DOIT_PERNET for per-netns RTNL doit().
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:29 +0000 (11:32 -0700)]
rtnetlink: Define RTNL_FLAG_DOIT_PERNET for per-netns RTNL doit().

We will push RTNL down to each doit() as rtnl_net_lock().

We can use RTNL_FLAG_DOIT_UNLOCKED to call doit() without RTNL, but doit()
will still hold RTNL.

Let's define RTNL_FLAG_DOIT_PERNET as an alias of RTNL_FLAG_DOIT_UNLOCKED.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agortnetlink: Make per-netns RTNL dereference helpers to macro.
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:28 +0000 (11:32 -0700)]
rtnetlink: Make per-netns RTNL dereference helpers to macro.

When CONFIG_DEBUG_NET_SMALL_RTNL is off, rtnl_net_dereference() is the
static inline wrapper of rtnl_dereference() returning a plain (void *)
pointer to make sure net is always evaluated as requested in [0].

But, it makes sparse complain [1] when the pointer has __rcu annotation:

  net/ipv4/devinet.c:674:47: sparse: warning: incorrect type in argument 2 (different address spaces)
  net/ipv4/devinet.c:674:47: sparse:    expected void *p
  net/ipv4/devinet.c:674:47: sparse:    got struct in_ifaddr [noderef] __rcu *

Also, if we evaluate net as (void *) in a macro, then the compiler
in turn fails to build due to -Werror=unused-value.

  #define rtnl_net_dereference(net, p)                  \
        ({                                              \
                (void *)net;                            \
                rtnl_dereference(p);                    \
        })

  net/ipv4/devinet.c: In function ‘inet_rtm_deladdr’:
  ./include/linux/rtnetlink.h:154:17: error: statement with no effect [-Werror=unused-value]
    154 |                 (void *)net;                            \
  net/ipv4/devinet.c:674:21: note: in expansion of macro ‘rtnl_net_dereference’
    674 |              (ifa = rtnl_net_dereference(net, *ifap)) != NULL;
        |                     ^~~~~~~~~~~~~~~~~~~~

Let's go back to the original simplest macro.

Note that checkpatch complains about this approach, but it's one-shot and
less noisy than the other two.

  WARNING: Argument 'net' is not used in function-like macro
  #76: FILE: include/linux/rtnetlink.h:142:
  +#define rtnl_net_dereference(net, p) \
  + rtnl_dereference(p)

Fixes: 844e5e7e656d ("rtnetlink: Add assertion helpers for per-netns RTNL.")
Link: https://lore.kernel.org/netdev/20241004132145.7fd208e9@kernel.org/ [0]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410200325.SaEJmyZS-lkp@intel.com/ [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoneighbour: use kvzalloc()/kvfree()
Eric Dumazet [Tue, 22 Oct 2024 15:00:59 +0000 (15:00 +0000)]
neighbour: use kvzalloc()/kvfree()

The mm layer provides convenient functions, so we no longer have
to work around old limitations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gilad Naaman <gnaaman@drivenets.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241022150059.1345406-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonetlink: specs: Add missing phy-ntf command to ethtool spec
Kory Maincent [Tue, 22 Oct 2024 15:14:18 +0000 (17:14 +0200)]
netlink: specs: Add missing phy-ntf command to ethtool spec

ETHTOOL_MSG_PHY_NTF description is missing in the ethtool netlink spec.
Add it to the spec.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241022151418.875424-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agovsock: do not leave dangling sk pointer in vsock_create()
Eric Dumazet [Tue, 22 Oct 2024 13:48:19 +0000 (13:48 +0000)]
vsock: do not leave dangling sk pointer in vsock_create()

syzbot was able to trigger the following warning after recent
core network cleanup.

On error vsock_create() frees the allocated sk object, but sock_init_data()
has already attached it to the provided sock object.

We must clear sock->sk to avoid possible use-after-free later.

WARNING: CPU: 0 PID: 5282 at net/socket.c:1581 __sock_create+0x897/0x950 net/socket.c:1581
Modules linked in:
CPU: 0 UID: 0 PID: 5282 Comm: syz.2.43 Not tainted 6.12.0-rc2-syzkaller-00667-g53bac8330865 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:__sock_create+0x897/0x950 net/socket.c:1581
Code: 7f 06 01 65 48 8b 34 25 00 d8 03 00 48 81 c6 b0 08 00 00 48 c7 c7 60 0b 0d 8d e8 d4 9a 3c 02 e9 11 f8 ff ff e8 0a ab 0d f8 90 <0f> 0b 90 e9 82 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c c7 f8 ff
RSP: 0018:ffffc9000394fda8 EFLAGS: 00010293
RAX: ffffffff89873c46 RBX: ffff888079f3c818 RCX: ffff8880314b9e00
RDX: 0000000000000000 RSI: 00000000ffffffed RDI: 0000000000000000
RBP: ffffffff8d3337f0 R08: ffffffff8987384e R09: ffffffff8989473a
R10: dffffc0000000000 R11: fffffbfff203a276 R12: 00000000ffffffed
R13: ffff888079f3c8c0 R14: ffffffff898736e7 R15: dffffc0000000000
FS:  00005555680ab500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22b11196d0 CR3: 00000000308c0000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  sock_create net/socket.c:1632 [inline]
  __sys_socket_create net/socket.c:1669 [inline]
  __sys_socket+0x150/0x3c0 net/socket.c:1716
  __do_sys_socket net/socket.c:1730 [inline]
  __se_sys_socket net/socket.c:1728 [inline]
  __x64_sys_socket+0x7a/0x90 net/socket.c:1728
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f22b117dff9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff56aec0e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000029
RAX: ffffffffffffffda RBX: 00007f22b1335f80 RCX: 00007f22b117dff9
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000028
RBP: 00007f22b11f0296 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f22b1335f80 R14: 00007f22b1335f80 R15: 00000000000012dd
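
The error-path pattern described in this message (free the object, then clear the pointer that was already attached to the caller's structure) can be sketched in userspace like this. The types and `create_model()` are hypothetical and only mirror the shape of the problem, not the vsock code.

```c
#include <stdlib.h>

struct sk_model {
	int state;
};

struct sock_model {
	struct sk_model *sk;
};

/* Hypothetical create() that may fail after sk was attached to sock. */
static int create_model(struct sock_model *sock, int fail_late)
{
	struct sk_model *sk = calloc(1, sizeof(*sk));

	if (!sk)
		return -1;

	sock->sk = sk;		/* attach, as an init helper would */

	if (fail_late) {
		free(sk);
		sock->sk = NULL;	/* do not leave a dangling pointer */
		return -1;
	}
	return 0;
}
```

Without the `sock->sk = NULL` line, a caller inspecting `sock->sk` after the failure would see a pointer to freed memory.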

Fixes: 48156296a08c ("net: warn, if pf->create does not clear sock->sk on error")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20241022134819.1085254-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5: unique names for per device caches
Sebastian Ott [Wed, 23 Oct 2024 13:41:46 +0000 (15:41 +0200)]
net/mlx5: unique names for per device caches

Add the device name to the per device kmem_cache names to
ensure their uniqueness. This fixes warnings like this:
"kmem_cache of name 'mlx5_fs_fgs' already exists".
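
A minimal sketch of building a per-device cache name follows. The `"%s-%s"` layout, the helper name `cache_name_model()` and the example device string are assumptions made for illustration; the actual format used by the driver may differ.

```c
#include <stdio.h>

/*
 * Builds "<base>-<dev>" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int cache_name_model(char *buf, size_t len,
			    const char *base, const char *dev)
{
	int n = snprintf(buf, len, "%s-%s", base, dev);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, a base of `mlx5_fs_fgs` and a hypothetical device name `0000:01:00.0` yield `mlx5_fs_fgs-0000:01:00.0`, so two devices no longer register caches with the same name.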

Signed-off-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241023134146.28448-1-sebott@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'bonding-returns-detailed-error-about-xdp-failures'
Jakub Kicinski [Mon, 28 Oct 2024 23:09:43 +0000 (16:09 -0700)]
Merge branch 'bonding-returns-detailed-error-about-xdp-failures'

Hangbin Liu says:

====================
Bonding: returns detailed error about XDP failures

Based on the discussion in [1], this patch set returns detailed errors
about XDP failures and updates the bonding documentation on XDP support.

[1] https://lore.kernel.org/8088f2a7-3ab1-4a1e-996d-c15703da13cc@blackwall.org
====================

Link: https://patch.msgid.link/20241021031211.814-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoDocumentation: bonding: add XDP support explanation
Hangbin Liu [Mon, 21 Oct 2024 03:12:11 +0000 (03:12 +0000)]
Documentation: bonding: add XDP support explanation

Document which bonding modes have native XDP support.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241021031211.814-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agobonding: return detailed error when loading native XDP fails
Hangbin Liu [Mon, 21 Oct 2024 03:12:10 +0000 (03:12 +0000)]
bonding: return detailed error when loading native XDP fails

Bonding only supports native XDP for specific modes, which can lead to
confusion for users regarding why XDP loads successfully at times and
fails at others. This patch enhances error handling by returning detailed
error messages, providing users with clearer insights into the specific
reasons for the failure when loading native XDP.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241021031211.814-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'mptcp-various-small-improvements'
Jakub Kicinski [Mon, 28 Oct 2024 22:55:48 +0000 (15:55 -0700)]
Merge branch 'mptcp-various-small-improvements'

Matthieu Baerts says:

====================
mptcp: various small improvements

The following patches are not related to each other.

- Patch 1: Avoid sending advertisements on stale subflows, reducing
  the risk of losing them.

- Patch 2: Annotate data-races around subflow->fully_established, using
  READ/WRITE_ONCE().

- Patch 3: A small clean-up on the PM side, avoiding a bit of duplicated
  code.

- Patch 4: Use "Middlebox interference" MP_TCPRST code in reaction to a
  packet received without MPTCP options in the middle of a connection.
====================

Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-0-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomptcp: use "middlebox interference" RST when no DSS
Davide Caratti [Mon, 21 Oct 2024 15:14:06 +0000 (17:14 +0200)]
mptcp: use "middlebox interference" RST when no DSS

RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of
fully established subflow that carries data at TCP level with no DSS
sub-option.

This is generally the case when mpext is NULL or mpext->use_map is 0:
use a dedicated value of 'mapping_status' and use it before closing the
socket in subflow_check_data_avail().
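
The decision described above can be sketched roughly as follows. This is a simplified model, not the MPTCP code: the enum, struct and function names are hypothetical, and the assumption that other failures map to the "MPTCP-specific error" code (0x01) is made only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RST_REASON_MPTCP_ERROR	0x01	/* RFC 8684: MPTCP-specific error */
#define RST_REASON_MIDDLEBOX	0x06	/* RFC 8684: middlebox interference */

/* Hypothetical mapping states, with a dedicated "no DSS" value. */
enum mapping_status_model {
	MAP_OK,
	MAP_INVALID,
	MAP_NODSS,
};

/* Hypothetical stand-in for the per-packet MPTCP extension. */
struct mpext_model {
	bool use_map;
};

/* Data at TCP level without a usable DSS mapping is flagged as MAP_NODSS. */
static enum mapping_status_model
classify_model(const struct mpext_model *mpext, bool has_data)
{
	if (has_data && (!mpext || !mpext->use_map))
		return MAP_NODSS;
	return MAP_OK;
}

/* Picks the reset reason to report before closing the subflow. */
static uint8_t reset_reason_model(enum mapping_status_model status)
{
	return status == MAP_NODSS ? RST_REASON_MIDDLEBOX
				   : RST_REASON_MPTCP_ERROR;
}
```

Keeping a dedicated status value lets the close path distinguish "options were stripped in transit" from other mapping failures.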

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/518
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-4-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomptcp: implement mptcp_pm_connection_closed
Geliang Tang [Mon, 21 Oct 2024 15:14:05 +0000 (17:14 +0200)]
mptcp: implement mptcp_pm_connection_closed

The MPTCP path manager event handler mptcp_pm_connection_closed
interface has been added in the commit 1b1c7a0ef7f3 ("mptcp: Add path
manager interface") but it was an empty function from then on.

Given its name, it makes sense to invoke mptcp_event() with the
MPTCP_EVENT_CLOSED event type from it. This also removes a bit of
duplicated code.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-3-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomptcp: annotate data-races around subflow->fully_established
Gang Yan [Mon, 21 Oct 2024 15:14:04 +0000 (17:14 +0200)]
mptcp: annotate data-races around subflow->fully_established

We introduce the same handling for potential data races with the
'fully_established' flag in subflow as previously done for
msk->fully_established.

Additionally, we make a crucial change: convert the subflow's
'fully_established' from 'bit_field' to 'bool' type. This is
necessary because methods for avoiding data races don't work well
with 'bit_field'. Specifically, 'READ_ONCE' needs to know the size
of the variable being accessed, which is not available for a
'bit_field'. Also, 'test_bit' expects an address, which a
'bit_field' cannot provide.
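
To illustrate why a plain bool works where a bit-field does not, here is a small userspace sketch. The `READ_ONCE_MODEL`/`WRITE_ONCE_MODEL` macros are simplified, hypothetical stand-ins (the kernel's real macros do more), and `subflow_model` is not the real subflow context.

```c
#include <stdbool.h>

/* Simplified stand-ins: a volatile access through the object's address. */
#define READ_ONCE_MODEL(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_MODEL(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

struct subflow_model {
	/* A bit-field has no address, so &s->other_flag would not compile. */
	unsigned int other_flag : 1;
	/* A plain bool can be accessed through a pointer. */
	bool fully_established;
};

static void set_established_model(struct subflow_model *s)
{
	WRITE_ONCE_MODEL(s->fully_established, true);
}

static bool is_established_model(const struct subflow_model *s)
{
	return READ_ONCE_MODEL(s->fully_established);
}
```

Both macros take the address of their argument, which is exactly the operation C forbids on a bit-field member.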

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/516
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-2-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomptcp: pm: send ACK on non-stale subflows
Matthieu Baerts (NGI0) [Mon, 21 Oct 2024 15:14:03 +0000 (17:14 +0200)]
mptcp: pm: send ACK on non-stale subflows

If a subflow is considered "stale", it is better to avoid using it to
send an ACK carrying an ADD_ADDR or RM_ADDR. Another subflow, if any,
will then be selected.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-1-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'net-systemport-minor-io-macros-changes'
Jakub Kicinski [Mon, 28 Oct 2024 22:54:42 +0000 (15:54 -0700)]
Merge branch 'net-systemport-minor-io-macros-changes'

Florian Fainelli says:

====================
net: systemport: Minor IO macros changes

This patch series addresses the warning initially reported by Vladimir
here:

https://lore.kernel.org/all/20241014150139.927423-1-vladimir.oltean@nxp.com/

and follows up on his suggestion to move the IO macros to the
header file.
====================

Link: https://patch.msgid.link/20241021174935.57658-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: systemport: Move IO macros to header file
Florian Fainelli [Mon, 21 Oct 2024 17:49:35 +0000 (10:49 -0700)]
net: systemport: Move IO macros to header file

Move the BCM_SYSPORT_IO_MACRO() definition and its use to bcmsysport.h
where it is more appropriate and where static inline helpers are
acceptable. While at it, make sure that the macro 'offset' argument does
not trigger a checkpatch warning due to possible argument re-use.
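
For readers unfamiliar with this kind of accessor-generating macro, here is a userspace sketch of the pattern. It is not the driver's definition: `priv_model`, `IO_MACRO_MODEL`, the block names and the offsets are hypothetical, and an array stands in for the memory-mapped registers.

```c
#include <stdint.h>

/* Models a small window of 32-bit registers. */
struct priv_model {
	uint32_t regs[64];
};

/*
 * Generates <name>_readl() and <name>_writel() for a register block
 * located at byte offset 'offset'. The argument is parenthesised and
 * used once per generated function.
 */
#define IO_MACRO_MODEL(name, offset)					\
static inline uint32_t name##_readl(struct priv_model *p, uint32_t off)	\
{									\
	return p->regs[((offset) + off) / 4];				\
}									\
static inline void name##_writel(struct priv_model *p, uint32_t val,	\
				 uint32_t off)				\
{									\
	p->regs[((offset) + off) / 4] = val;				\
}

IO_MACRO_MODEL(umac, 0x40)
IO_MACRO_MODEL(tdma, 0x80)
```

When such static inline helpers live in a header, compilers generally do not report the unused ones with -Wunused-function, which is one reason a header is the more natural home for the macro.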

Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021174935.57658-3-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: systemport: Remove unused txchk accessors
Florian Fainelli [Mon, 21 Oct 2024 17:49:34 +0000 (10:49 -0700)]
net: systemport: Remove unused txchk accessors

Vladimir reported the following warning with clang-16 and W=1:

warning: unused function 'txchk_readl' [-Wunused-function]
BCM_SYSPORT_IO_MACRO(txchk, SYS_PORT_TXCHK_OFFSET);
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'

warning: unused function 'txchk_writel' [-Wunused-function]
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'

warning: unused function 'tbuf_readl' [-Wunused-function]
BCM_SYSPORT_IO_MACRO(tbuf, SYS_PORT_TBUF_OFFSET);
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'

warning: unused function 'tbuf_writel' [-Wunused-function]
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'

The TXCHK and TBUF blocks are not being accessed, so remove the IO
macros used to access those blocks. No functional impact.

Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021174935.57658-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>