Daniel Machon [Wed, 23 Oct 2024 22:01:23 +0000 (00:01 +0200)]
net: sparx5: add sparx5 context pointer to a few functions
In preparation for lan969x, add the sparx5 context pointer to certain
IFH (Internal Frame Header) functions. This is required, as the
is_sparx5() function will be used here in a subsequent patch.
Daniel Machon [Wed, 23 Oct 2024 22:01:22 +0000 (00:01 +0200)]
net: sparx5: change frequency calculation for SDLB's
In preparation for lan969x, rework the function that calculates the SDLB
(Service Dual Leacky Bucket) clock. This is required, as the
HSCH_SYS_CLK_PER register is Sparx5-exclusive. Instead derive the clock
from the core clock, using the sparx5_clk_period() function. The clock
stays the same before and after this patch, only now,
sparx5_sdlb_clk_hz_get() can be used for lan969x too.
Daniel Machon [Wed, 23 Oct 2024 22:01:21 +0000 (00:01 +0200)]
net: sparx5: change spx5_wr to spx5_rmw in cal update()
In preparation for lan969x, use spx5_rmw() for enabling the update of
the calendar. This is required to not overwrite the DSM_TAXI_CAL_CFG
register, as an additional write will be added before this one, in a
subsequent patch.
Daniel Machon [Wed, 23 Oct 2024 22:01:20 +0000 (00:01 +0200)]
net: sparx5: add support for lan969x targets and core clock
In preparation for lan969x, add lan969x targets to
sparx5_target_chiptype and set the core clock frequency for these
throughout. Lan969x only supports a core clock frequency of 328MHz.
Also, set the policer update internal (pol_upd_int) matching the 328 MHz
frequency of the lan969x targets.
Harshitha Ramamurthy [Wed, 23 Oct 2024 22:11:41 +0000 (15:11 -0700)]
gve: change to use page_pool_put_full_page when recycling pages
The driver currently uses page_pool_put_page() to recycle
page pool pages. Since gve uses split pages, if the fragment
being recycled is not the last fragment in the page, there
is no dma sync operation. When the last fragment is recycled,
dma sync is performed by page pool infra according to the
value passed as dma_sync_size which right now is set to the
size of fragment.
But the correct thing to do is to dma sync the entire page when
the last fragment is recycled. Hence change to using
page_pool_put_full_page().
Jakub Kicinski [Thu, 31 Oct 2024 00:50:39 +0000 (17:50 -0700)]
Merge branch 'refactoring-rvu-nic-driver'
Geetha sowjanya says:
====================
Refactoring RVU NIC driver
This is a preparation pathset for follow-up "Introducing RVU representors
driver" patches. The RVU representor driver creates representor netdev
of each rvu device when switch dev mode is enabled.
RVU representor and NIC have a similar set of HW resources(NIX_LF,RQ/SQ/CQ)
and implements a subset of NIC functionality.
This patch set groups hw resources and queue configuration code into single
API and export the existing functions so, that code can be shared between
NIC and representor drivers.
These patches are part of "Introduce RVU representors" patchset.
https://lore.kernel.org/all/ZsdJ-w00yCI4NQ8T@nanopsycho.orion/T/
As suggested by "Jiri Pirko", submitting as separate patchset.
====================
Geetha sowjanya [Wed, 23 Oct 2024 16:18:42 +0000 (21:48 +0530)]
octeontx2-pf: Reuse PF max mtu value
Reuse the maximum support HW MTU value that is fetch during probe.
Instead of fetching through mbox each time mtu is changed as the
value is fixed for interface.
Geetha sowjanya [Wed, 23 Oct 2024 16:18:40 +0000 (21:48 +0530)]
octeontx2-pf: Define common API for HW resources configuration
Define new API "otx2_init_rsrc" and move the HW blocks
NIX/NPA resources configuration code under this API. So, that
it can be used by the RVU representor driver that has similar
resources of RVU NIC.
Jakub Kicinski [Thu, 31 Oct 2024 00:33:57 +0000 (17:33 -0700)]
Merge branch 'mirroring-to-dsa-cpu-port'
Vladimir Oltean says:
====================
Mirroring to DSA CPU port
Users of the NXP LS1028A SoC (drivers/net/dsa/ocelot L2 switch inside)
have requested to mirror packets from the ingress of a switch port to
software. Both port-based and flow-based mirroring is required.
The simplest way I could come up with was to set up tc mirred actions
towards a dummy net_device, and make the offloading of that be accepted
by the driver. Currently, the pattern in drivers is to reject mirred
towards ports they don't know about, but I'm now permitting that,
precisely by mirroring "to the CPU".
For testers, this series depends on commit 34d35b4edbbe ("net/sched:
act_api: deny mismatched skip_sw/skip_hw flags for actions created by
classifiers") from net/main, which is absent from net-next as of the
day of posting (Oct 23). Without the bug fix it is possible to create
invalid configurations which are not rejected by the kernel.
For historical purposes, link to a much older (and much different) attempt:
https://lore.kernel.org/20191002233750.13566-1-olteanv@gmail.com
====================
Vladimir Oltean [Wed, 23 Oct 2024 13:52:51 +0000 (16:52 +0300)]
net: mscc: ocelot: allow tc-flower mirred action towards foreign interfaces
Debugging certain flows in the offloaded switch data path can be done by
installing two tc-mirred filters for mirroring: one in the hardware data
path, which copies the frames to the CPU, and one which takes the frame
from there and mirrors it to a virtual interface like a dummy device,
where it can be seen with tcpdump.
The effect of having 2 filters run on the same packet can be obtained by
default using tc, by not specifying either the 'skip_sw' or 'skip_hw'
keywords.
Instead of refusing to offload mirroring/redirecting packets towards
interfaces that aren't switch ports, just treat every other destination
for what it is: something that is handled in software, behind the CPU
port.
Usage:
$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress protocol ip flower action mirred ingress mirror dev dummy0
Vladimir Oltean [Wed, 23 Oct 2024 13:52:50 +0000 (16:52 +0300)]
net: dsa: allow matchall mirroring rules towards the CPU
If the CPU bandwidth capacity permits, it may be useful to mirror the
entire ingress of a user port to software.
This is in fact possible to express even if there is no net_device
representation for the CPU port. In fact, that approach was already
exhausted and that representation wouldn't have even helped [1].
The idea behind implementing this is that currently, we refuse to
offload any mirroring towards a non-DSA target net_device. But if we
acknowledge the fact that to reach any foreign net_device, the switch
must send the packet to the CPU anyway, then we can simply offload just
that part, and let the software do the rest. There is only one condition
we need to uphold: the filter needs to be present in the software data
path as well (no skip_sw).
There are 2 actions to consider: FLOW_ACTION_MIRRED (redirect to egress
of target interface) and FLOW_ACTION_MIRRED_INGRESS (redirect to ingress
of target interface). We don't have the ability/API to offload
FLOW_ACTION_MIRRED_INGRESS when the target port is also a DSA user port,
but we could also permit that through mirred to the CPU + software.
Example:
$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress matchall action mirred ingress mirror dev dummy0
Any DSA driver with a ds->ops->port_mirror_add() implementation can now
make use of this with no additional change.
Vladimir Oltean [Wed, 23 Oct 2024 13:52:48 +0000 (16:52 +0300)]
net: dsa: use "extack" as argument to flow_action_basic_hw_stats_check()
We already have an "extack" stack variable in
dsa_user_add_cls_matchall_police() and
dsa_user_add_cls_matchall_mirred(), there is no need to retrieve
it again from cls->common.extack.
Vladimir Oltean [Wed, 23 Oct 2024 13:52:47 +0000 (16:52 +0300)]
net: dsa: clean up dsa_user_add_cls_matchall()
The body is a bit hard to read, hard to extend, and has duplicated
conditions.
Clean up the "if (many conditions) else if (many conditions, some
of them repeated)" pattern by:
- Moving the repeated conditions out
- Replacing the repeated tests for the same variable with a switch/case
- Moving the protocol check inside the dsa_user_add_cls_matchall_mirred()
function call.
This is pure refactoring, no logic has been changed, though some tests
were reordered. The order does not matter - they are independent things
to be tested for.
Vladimir Oltean [Wed, 23 Oct 2024 13:52:46 +0000 (16:52 +0300)]
net: sched: propagate "skip_sw" flag to struct flow_cls_common_offload
Background: switchdev ports offload the Linux bridge, and most of the
packets they handle will never see the CPU. The ports between which
there exists no hardware data path are considered 'foreign' to switchdev.
These can either be normal physical NICs without switchdev offload, or
incompatible switchdev ports, or virtual interfaces like veth/dummy/etc.
In some cases, an offloaded filter can only do half the work, and the
rest must be handled by software. Redirecting/mirroring from the ingress
of a switchdev port towards a foreign interface is one example of
combined hardware/software data path. The most that the switchdev port
can do is to extract the matching packets from its offloaded data path
and send them to the CPU. From there on, the software filter runs
(a second time, after the first run in hardware) on the packet and
performs the mirred action.
It makes sense for switchdev drivers which allow this kind of "half
offloading" to sense the "skip_sw" flag of the filter/action pair, and
deny attempts from the user to install a filter that does not run in
software, because that simply won't work.
In fact, a mirred action on a switchdev port towards a dummy interface
appears to be a valid way of (selectively) monitoring offloaded traffic
that flows through it. IFF_PROMISC was also discussed years ago, but
(despite initial disagreement) there seems to be consensus that this
flag should not affect the destination taken by packets, but merely
whether or not the NIC discards packets with unknown MAC DA for local
processing.
Jakub Kicinski [Thu, 31 Oct 2024 00:02:43 +0000 (17:02 -0700)]
Merge branch 'ptp-driver-for-s390-clocks'
Sven Schnelle says:
====================
PtP driver for s390 clocks
these patches add support for using the s390 physical and TOD clock as ptp
clock. To do so, the first patch adds a clock id to the s390 TOD clock,
while the second patch adds the PtP driver itself.
====================
Sven Schnelle [Wed, 23 Oct 2024 06:56:01 +0000 (08:56 +0200)]
s390/time: Add PtP driver
Add a small PtP driver which allows user space to get
the values of the physical and tod clock. This allows
programs like chrony to use STP as clock source and
steer the kernel clock. The physical clock can be used
as a debugging aid to get the clock without any additional
offsets like STP steering or LPAR offset.
Yunshui Jiang [Mon, 28 Oct 2024 08:27:56 +0000 (16:27 +0800)]
tests: hsr: Increase timeout to 50 seconds
The HSR test, hsr_ping.sh, actually needs 7 min to run. Around 375s to
be exact, and even more on a debug kernel or kernel with other network
security limits. The timeout setting for the kselftest is currently 45
seconds, which is way too short to integrate hsr tests to run_kselftest
infrastructure. However, timeout of hundreds of seconds is quite a long
time, especially in a CI/CD environment. It seems that we need
accelerate the test and balance with timeout setting.
The most time-consuming func is do_ping_long, where ping command sends
10 packages to the given address. The default interval between two ping
packages is 1s according to the ping Mannual. There isn't any operation
between pings thus we could pass -i 0.1 to ping to make it 10 times
faster.
While even with this short interval, the test still need about 46.4
seconds to finish because of the two HSR interfaces, each of which is
tested by calling do_ping func 12 times and do_ping_long func 19 times
and sleep for 3s.
So, an explicit setting is also needed to slightly increase the
timeout. And to leave us some slack, use 50 as default timeout.
Jason Xing [Wed, 23 Oct 2024 08:14:52 +0000 (16:14 +0800)]
tcp: add more warn of socket in tcp_send_loss_probe()
Add two fields to print in the helper which here covers tcp_send_loss_probe().
Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Cc: Neal Cardwell <ncardwell@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Xing [Wed, 23 Oct 2024 08:14:51 +0000 (16:14 +0800)]
tcp: add a common helper to debug the underlying issue
Following the commit c8770db2d544 ("tcp: check skb is non-NULL
in tcp_rto_delta_us()"), we decided to add a helper so that it's
easier to get verbose warning on either cases.
Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Cc: Neal Cardwell <ncardwell@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 30 Oct 2024 01:50:57 +0000 (18:50 -0700)]
Merge tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.13
The first -next "new features" pull request for v6.13. This is a big
one as we have not been able to send one earlier. We have also some
patches affecting other subsystems: in staging we deleted the rtl8192e
driver and in debugfs added a new interface to save struct
file_operations memory; both were acked by GregKH.
Because of the lib80211/libipw move there were quite a lot of
conflicts and to solve those we decided to merge net-next into
wireless-next.
Major changes:
cfg80211/mac80211
* stop exporting wext symbols
* new mac80211 op to indicate that a new interface is to be added
* support radio separation of multi-band devices
Wireless Extensions
* move wext spy implementation to libiw
* remove iw_public_data from struct net_device
brcmfmac
* optional LPO clock support
ipw2x00
* move remaining lib80211 code into libiw
wilc1000
* WILC3000 support
rtw89
* RTL8852BE and RTL8852BE-VT BT-coexistence improvements
* tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (126 commits)
mac80211: Remove NOP call to ieee80211_hw_config
wifi: iwlwifi: work around -Wenum-compare-conditional warning
wifi: mac80211: re-order assigning channel in activate links
wifi: mac80211: convert debugfs files to short fops
debugfs: add small file operations for most files
wifi: mac80211: remove misleading j_0 construction parts
wifi: mac80211_hwsim: use hrtimer_active()
wifi: mac80211: refactor BW limitation check for CSA parsing
wifi: mac80211: filter on monitor interfaces based on configured channel
wifi: mac80211: refactor ieee80211_rx_monitor
wifi: mac80211: add support for the monitor SKIP_TX flag
wifi: cfg80211: add monitor SKIP_TX flag
wifi: mac80211: add flag to opt out of virtual monitor support
wifi: cfg80211: pass net_device to .set_monitor_channel
wifi: mac80211: remove status->ampdu_delimiter_crc
wifi: cfg80211: report per wiphy radio antenna mask
wifi: mac80211: use vif radio mask to limit creating chanctx
wifi: mac80211: use vif radio mask to limit ibss scan frequencies
wifi: cfg80211: add option for vif allowed radios
wifi: iwlwifi: allow IWL_FW_CHECK() with just a string
...
Jakub Kicinski [Tue, 29 Oct 2024 23:52:59 +0000 (16:52 -0700)]
Merge branch 'devlink-minor-cleanup'
Przemek Kitszel says:
====================
devlink: minor cleanup
(Patch 1, 2) Add one helper shortcut to put u64 values into skb.
(Patch 3, 4) Minor cleanup for error codes.
(Patch 5, 6, 7) Remove some devlink_resource_*() usage and functions
itself via replacing devlink_* variants by devl_* ones.
Przemek Kitszel [Wed, 23 Oct 2024 13:09:06 +0000 (15:09 +0200)]
devlink: remove unused devlink_resource_occ_get_register() and _unregister()
Remove not used devlink_resource_occ_get_register() and
devlink_resource_occ_get_unregister() functions; current devlink resource
users are fine with devl_ variants of the two.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20241023131248.27192-7-przemyslaw.kitszel@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Przemek Kitszel [Wed, 23 Oct 2024 13:09:05 +0000 (15:09 +0200)]
net: dsa: replace devlink resource registration calls by devl_ variants
Replace devlink_resource_register(), devlink_resource_occ_get_register(),
and devlink_resource_occ_get_unregister() calls by respective devl_*
variants. Mentioned functions have no direct users in any drivers, and are
going to be removed in subsequent patches.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20241023131248.27192-6-przemyslaw.kitszel@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current code is written to return -EINVAL when tailroom in the skb msg
would be exhausted precisely when it's time to nest, and return -EMSGSIZE
in all other "not enough space" conditions.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20241023131248.27192-5-przemyslaw.kitszel@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Thu, 24 Oct 2024 20:42:33 +0000 (22:42 +0200)]
r8169: add support for RTL8125D
This adds support for new chip version RTL8125D, which can be found on
boards like Gigabyte X870E AORUS ELITE WIFI7. Firmware rtl8125d-1.fw
for this chip version is available in linux-firmware already.
Use a while loop in mlx5_eq_comp_int() and mlx5_eq_async_int() to
clarify the EQE polling logic. This consolidates the next_eqe_sw() calls
for the first and subequent iterations. It also avoids a goto. Turn the
num_eqes < MLX5_EQ_POLLING_BUDGET check into a break condition.
Rosen Penev [Tue, 22 Oct 2024 23:32:03 +0000 (16:32 -0700)]
amd-xgbe: use ethtool string helpers
The latter is the preferred way to copy ethtool strings.
Avoids manually incrementing the pointer.
Signed-off-by: Rosen Penev <rosenp@gmail.com> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241022233203.9670-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Consolidate the handling of dedicated PHY and fixed-link phy by taking
advantage of logic in of_phy_get_and_connect() which handles both of
these cases, rather than open coding the same logic in ftgmac100_probe().
These two patches simplify how we attach SFP PHYs.
The first patch notices that at the two sites where we call
sfp_select_interface(), if that fails, we always print the same error.
Move this into its own function.
The second patch adds an additional level of validation, checking that
the returned interface is one that is supported by the MAC/PCS.
The last patch simplifies how SFP PHYs are attached, reducing the
number of times that we do validation in this path.
====================
Russell King (Oracle) [Wed, 23 Oct 2024 13:41:57 +0000 (14:41 +0100)]
net: phylink: simplify how SFP PHYs are attached
There are a few issues with how SFP PHYs are attached:
a) The phylink_sfp_connect_phy() and phylink_sfp_config_phy() code
validates the configuration three times:
1. To discover the support/advertising masks that the PHY/PCS/MAC
can support in order to select an interface.
2. To validate the selected interface.
3. When the PHY is brought up after being attached, another validation
is done.
This is needlessly complex.
b) The configuration is set prior to the PHY being attached, which
means we don't have the PHY available in phylink_major_config()
for phylink_pcs_neg_mode() to make decisions upon.
We have already added an extra step to validate the selected interface,
so we can now move the attachment and bringup of the PHY earlier,
inside phylink_sfp_config_phy(). This results in the validation at
step 2 above becoming entirely unnecessary, so remove that too.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1t3bcb-000c8H-3e@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 23 Oct 2024 13:41:51 +0000 (14:41 +0100)]
net: phylink: validate sfp_select_interface() returned interface
Validate that the returned interface from sfp_select_interface() is
supportable by the MAC/PCS. If it isn't, print an error and return
the NA interface type. This is a preparatory step to reorganising
how a PHY on a SFP module is handled.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1t3bcV-000c8B-Vz@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 23 Oct 2024 13:41:46 +0000 (14:41 +0100)]
net: phylink: add common validation for sfp_select_interface()
Whenever we call sfp_select_interface(), we check the returned value
and print an error. There are two cases where this happens with the
same message. Provide a common function to do this.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1t3bcQ-000c85-S4@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Tue, 22 Oct 2024 14:17:07 +0000 (15:17 +0100)]
net: phylink: simplify phylink_parse_fixedlink()
phylink_parse_fixedlink() wants to preserve the pause, asym_pause and
autoneg bits in pl->supported. Rather than reading the bits into
separate bools, zeroing pl->supported, and then setting them if they
were previously set, use a mask and linkmode_and() to achieve the same
result.
====================
mlx5e update features on config changes
This small patchset by Dragos adds a call to netdev_update_features()
in configuration changes that could impact the features status.
====================
Dragos Tatulea [Thu, 24 Oct 2024 16:41:33 +0000 (19:41 +0300)]
net/mlx5e: Update features on ring size change
When the ring size changes successfully, trigger
netdev_update_features() to enable features in wanted state if
applicable.
An example of such scenario:
$ ip link set dev eth1 up
$ ethtool --set-ring eth1 rx 8192
$ ip link set dev eth1 mtu 9000
$ ethtool --features eth1 rx-gro-hw on --> fails
$ ethtool --set-ring eth1 rx 1024
With this patch, HW GRO will be turned on automatically because
it is set in the device's wanted_features.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241024164134.299646-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dragos Tatulea [Thu, 24 Oct 2024 16:41:32 +0000 (19:41 +0300)]
net/mlx5e: Update features on MTU change
When the MTU changes successfully, trigger netdev_update_features() to
enable features in wanted state if applicable.
An example of such scenario:
$ ip link set dev eth1 up
$ ethtool --set-ring eth1 rx 8192
$ ip link set dev eth1 mtu 9000
$ ethtool --features eth1 rx-gro-hw on --> fails
$ ip link set dev eth1 mtu 7000
With this patch, HW GRO will be turned on automatically because
it is set in the device's wanted_features.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241024164134.299646-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 23 Oct 2024 12:15:28 +0000 (13:15 +0100)]
wwan: core: Pass string literal as format argument of dev_set_name()
Both gcc-14 and clang-18 report that passing a non-string literal as the
format argument of dev_set_name() is potentially insecure.
E.g. clang-18 says:
drivers/net/wwan/wwan_core.c:442:34: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
442 | return dev_set_name(&port->dev, buf);
| ^~~
drivers/net/wwan/wwan_core.c:442:34: note: treat the string as an argument to avoid this
442 | return dev_set_name(&port->dev, buf);
| ^
| "%s",
It is always the case where the contents of mod is safe to pass as the
format argument. That is, in my understanding, it never contains any
format escape sequences.
But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.
Compile tested only.
No functional change intended.
====================
ipv4: Prepare core ipv4 files to future .flowi4_tos conversion.
Continue preparing users of ->flowi4_tos (struct flowi4) to the future
conversion of this field (from __u8 to dscp_t). The objective is to
have type annotation to properly separate DSCP bits from ECN ones. This
way we'll ensure that ECN doesn't interfere with DSCP and avoid
regressions where it break routing descisions (fib rules in particular).
This series concentrates on some easy IPv4 conversions where
->flowi4_tos is set directly from an IPv4 header, so we can get the
DSCP value using the ip4h_dscp() helper function.
====================
Paolo Abeni [Tue, 29 Oct 2024 14:33:24 +0000 (15:33 +0100)]
Merge branch 'ibm-emac-more-cleanups'
Rosen Penev says:
====================
ibm: emac: more cleanups
Tested on Cisco MX60W.
v2: fixed build errors. Also added extra commits to clean the driver up
further.
v3: Added tested message. Removed bad alloc_netdev_dummy commit.
v4: removed modules changes from patchset. Added fix for if MAC not
found.
v5: added of_find_matching_node commit.
v6: resend after net-next merge.
v7: removed of_find_matching_node commit. Adjusted mutex_init patch.
v8: removed patch removing custom init/exit. Needs more work.
====================
Rosen Penev [Tue, 22 Oct 2024 00:22:45 +0000 (17:22 -0700)]
net: ibm: emac: generate random MAC if not found
On this Cisco MX60W, u-boot sets the local-mac-address property.
Unfortunately by default, the MAC is wrong and is actually located on a
UBI partition. Which means nvmem needs to be used to grab it.
In the case where that fails, EMAC fails to initialize instead of
generating a random MAC as many other drivers do.
Match behavior with other drivers to have a working ethernet interface.
Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Rosen Penev [Tue, 22 Oct 2024 00:22:41 +0000 (17:22 -0700)]
net: ibm: emac: use netif_receive_skb_list
Small rx improvement. Would use napi_gro_receive instead but that's a
lot more involved than netif_receive_skb_list because of how the
function is implemented.
Before:
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.101 port 51556 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.04 sec 559 MBytes 467 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.101 port 48228 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.03 sec 558 MBytes 467 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.101 port 47600 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.04 sec 557 MBytes 466 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.101 port 37252 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.05 sec 559 MBytes 467 Mbits/sec
After:
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.101 port 40786 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.05 sec 572 MBytes 478 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.101 port 52482 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.04 sec 571 MBytes 477 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.101 port 48370 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.04 sec 572 MBytes 478 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.101 port 46086 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.05 sec 571 MBytes 476 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.101 port 46062 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.04 sec 572 MBytes 478 Mbits/sec
Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima [Mon, 21 Oct 2024 18:32:28 +0000 (11:32 -0700)]
rtnetlink: Make per-netns RTNL dereference helpers to macro.
When CONFIG_DEBUG_NET_SMALL_RTNL is off, rtnl_net_dereference() is the
static inline wrapper of rtnl_dereference() returning a plain (void *)
pointer to make sure net is always evaluated as requested in [0].
But, it makes sparse complain [1] when the pointer has __rcu annotation:
net/ipv4/devinet.c: In function ‘inet_rtm_deladdr’:
./include/linux/rtnetlink.h:154:17: error: statement with no effect [-Werror=unused-value]
154 | (void *)net; \
net/ipv4/devinet.c:674:21: note: in expansion of macro ‘rtnl_net_dereference’
674 | (ifa = rtnl_net_dereference(net, *ifap)) != NULL;
| ^~~~~~~~~~~~~~~~~~~~
Let's go back to the original simplest macro.
Note that checkpatch complains about this approach, but it's one-shot and
less noisy than the other two.
WARNING: Argument 'net' is not used in function-like macro
#76: FILE: include/linux/rtnetlink.h:142:
+#define rtnl_net_dereference(net, p) \
+ rtnl_dereference(p)
Fixes: 844e5e7e656d ("rtnetlink: Add assertion helpers for per-netns RTNL.") Link: https://lore.kernel.org/netdev/20241004132145.7fd208e9@kernel.org/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410200325.SaEJmyZS-lkp@intel.com/ [1] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Sebastian Ott [Wed, 23 Oct 2024 13:41:46 +0000 (15:41 +0200)]
net/mlx5: unique names for per device caches
Add the device name to the per device kmem_cache names to
ensure their uniqueness. This fixes warnings like this:
"kmem_cache of name 'mlx5_fs_fgs' already exists".
Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241023134146.28448-1-sebott@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Mon, 21 Oct 2024 03:12:10 +0000 (03:12 +0000)]
bonding: return detailed error when loading native XDP fails
Bonding only supports native XDP for specific modes, which can lead to
confusion for users regarding why XDP loads successfully at times and
fails at others. This patch enhances error handling by returning detailed
error messages, providing users with clearer insights into the specific
reasons for the failure when loading native XDP.
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241021031211.814-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 28 Oct 2024 22:55:48 +0000 (15:55 -0700)]
Merge branch 'mptcp-various-small-improvements'
Matthieu Baerts says:
====================
mptcp: various small improvements
The following patches are not related to each other.
- Patch 1: Avoid sending advertisements on stale subflows, reducing
risks on loosing them.
- Patch 2: Annotate data-races around subflow->fully_established, using
READ/WRITE_ONCE().
- Patch 3: A small clean-up on the PM side, avoiding a bit of duplicated
code.
- Patch 4: Use "Middlebox interference" MP_TCPRST code in reaction to a
packet received without MPTCP options in the middle of a connection.
====================
Davide Caratti [Mon, 21 Oct 2024 15:14:06 +0000 (17:14 +0200)]
mptcp: use "middlebox interference" RST when no DSS
RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of
fully established subflow that carries data at TCP level with no DSS
sub-option.
This is generally the case when mpext is NULL or mpext->use_map is 0:
use a dedicated value of 'mapping_status' and use it before closing the
socket in subflow_check_data_avail().
Geliang Tang [Mon, 21 Oct 2024 15:14:05 +0000 (17:14 +0200)]
mptcp: implement mptcp_pm_connection_closed
The MPTCP path manager event handler mptcp_pm_connection_closed
interface has been added in the commit 1b1c7a0ef7f3 ("mptcp: Add path
manager interface") but it was an empty function from then on.
With such name, it sounds good to invoke mptcp_event with the
MPTCP_EVENT_CLOSED event type from it. It also removes a bit of
duplicated code.
Gang Yan [Mon, 21 Oct 2024 15:14:04 +0000 (17:14 +0200)]
mptcp: annotate data-races around subflow->fully_established
We introduce the same handling for potential data races with the
'fully_established' flag in subflow as previously done for
msk->fully_established.
Additionally, we make a crucial change: convert the subflow's
'fully_established' from 'bit_field' to 'bool' type. This is
necessary because methods for avoiding data races don't work well
with 'bit_field'. Specifically, the 'READ_ONCE' needs to know
the size of the variable being accessed, which is not supported in
'bit_field'. Also, 'test_bit' expect the address of 'bit_field'.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/516 Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-2-1ef02746504a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the subflow is considered as "staled", it is better to avoid it to
send an ACK carrying an ADD_ADDR or RM_ADDR. Another subflow, if any,
will then be selected.
Florian Fainelli [Mon, 21 Oct 2024 17:49:35 +0000 (10:49 -0700)]
net: systemport: Move IO macros to header file
Move the BCM_SYSPORT_IO_MACRO() definition and its use to bcmsysport.h
where it is more appropriate and where static inline helpers are
acceptable. While at it, make sure that the macro 'offset' argument does
not trigger a checkpatch warning due to possible argument re-use.
Leo Stone [Mon, 21 Oct 2024 17:46:44 +0000 (10:46 -0700)]
selftest/tcp-ao: Add filter tests
Add tests that check if getsockopt(TCP_AO_GET_KEYS) returns the right
keys when using different filters.
Sample output:
> # ok 114 filter keys: by sndid, rcvid, address
> # ok 115 filter keys: by is_current
> # ok 116 filter keys: by is_rnext
> # ok 117 filter keys: by sndid, rcvid
> # ok 118 filter keys: correct nkeys when in.nkeys < matches
Jakub Kicinski [Wed, 16 Oct 2024 01:11:44 +0000 (18:11 -0700)]
configs/debug: make sure PROVE_RCU_LIST=y takes effect
Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
added CONFIG_PROVE_RCU_LIST=y to the common CI config,
but RCU_EXPERT is not set, and it's a dependency for
CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
indicate that it does catch bugs.
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241016011144.3058445-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Javier Carrasco [Sat, 19 Oct 2024 20:16:49 +0000 (22:16 +0200)]
net: dsa: mv88e6xxx: fix unreleased fwnode_handle in setup_port()
'ports_fwnode' is initialized via device_get_named_child_node(), which
requires a call to fwnode_handle_put() when the variable is no longer
required to avoid leaking memory.
Add the missing fwnode_handle_put() after 'ports_fwnode' has been used
and is no longer required.
Fixes: 94a2a84f5e9e ("net: dsa: mv88e6xxx: Support LED control") Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>