]> www.infradead.org Git - users/hch/configfs.git/log
users/hch/configfs.git
10 months agovirtio_net: add support for Byte Queue Limits
Jiri Pirko [Tue, 18 Jun 2024 14:44:56 +0000 (16:44 +0200)]
virtio_net: add support for Byte Queue Limits

Add support for Byte Queue Limits (BQL).

Tested on qemu emulated virtio_net device with 1, 2 and 4 queues.
Tested with fq_codel and pfifo_fast. Super netperf with 50 threads is
running in background. Netperf TCP_RR results:

NOBQL FQC 1q:  159.56  159.33  158.50  154.31    agv: 157.925
NOBQL FQC 2q:  184.64  184.96  174.73  174.15    agv: 179.62
NOBQL FQC 4q:  994.46  441.96  416.50  499.56    agv: 588.12
NOBQL PFF 1q:  148.68  148.92  145.95  149.48    agv: 148.2575
NOBQL PFF 2q:  171.86  171.20  170.42  169.42    agv: 170.725
NOBQL PFF 4q: 1505.23 1137.23 2488.70 3507.99    agv: 2159.7875
  BQL FQC 1q: 1332.80 1297.97 1351.41 1147.57    agv: 1282.4375
  BQL FQC 2q:  768.30  817.72  864.43  974.40    agv: 856.2125
  BQL FQC 4q:  945.66  942.68  878.51  822.82    agv: 897.4175
  BQL PFF 1q:  149.69  151.49  149.40  147.47    agv: 149.5125
  BQL PFF 2q: 2059.32  798.74 1844.12  381.80    agv: 1270.995
  BQL PFF 4q: 1871.98 4420.02 4916.59 13268.16   agv: 6119.1875

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240618144456.1688998-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: smc9194: add missing MODULE_DESCRIPTION() macro
Jeff Johnson [Tue, 18 Jun 2024 17:56:28 +0000 (10:56 -0700)]
net: smc9194: add missing MODULE_DESCRIPTION() macro

With ARCH=m68k, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/smsc/smc9194.o

Add the missing invocation of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-smsc-v1-1-ad3d7200421e@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: ethernet: mac89x0: add missing MODULE_DESCRIPTION() macro
Jeff Johnson [Tue, 18 Jun 2024 17:41:54 +0000 (10:41 -0700)]
net: ethernet: mac89x0: add missing MODULE_DESCRIPTION() macro

With ARCH=m68k, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/cirrus/mac89x0.o

Add the missing invocation of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-cirrus-v1-1-07f5bd0b64cb@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: amd: add missing MODULE_DESCRIPTION() macros
Jeff Johnson [Tue, 18 Jun 2024 17:26:20 +0000 (10:26 -0700)]
net: amd: add missing MODULE_DESCRIPTION() macros

With ARCH=m68k, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/a2065.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/ariadne.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/atarilance.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/hplance.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/7990.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/mvme147.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/sun3lance.o

Add the missing invocation of the MODULE_DESCRIPTION() macro to all
files which have a MODULE_LICENSE().

This includes drivers/net/ethernet/amd/lance.c which, although it did
not produce a warning with the m68k allmodconfig configuration, may
cause this warning with other configurations.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-amd-v1-1-50ee7a9ad50e@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: arcnet: com20020-isa: add missing MODULE_DESCRIPTION() macro
Jeff Johnson [Tue, 18 Jun 2024 16:53:44 +0000 (09:53 -0700)]
net: arcnet: com20020-isa: add missing MODULE_DESCRIPTION() macro

With ARCH=m68k, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020-isa.o

Add the missing invocation of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-arcnet-v1-1-90e42bc58102@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: mt7530: add support for bridge port isolation
Matthias Schiffer [Tue, 18 Jun 2024 07:17:13 +0000 (09:17 +0200)]
net: dsa: mt7530: add support for bridge port isolation

Remove a pair of ports from the port matrix when both ports have the
isolated flag set.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: dsa: mt7530: factor out bridge join/leave logic
Matthias Schiffer [Tue, 18 Jun 2024 07:17:12 +0000 (09:17 +0200)]
net: dsa: mt7530: factor out bridge join/leave logic

As preparation for implementing bridge port isolation, move the logic to
add and remove bits in the port matrix into a new helper
mt7530_update_port_member(), which is called from
mt7530_port_bridge_join() and mt7530_port_bridge_leave().

Another part of the preparation is using dsa_port_offloads_bridge_dev()
instead of dsa_port_offloads_bridge() to check for bridge membership, as
we don't have a struct dsa_bridge in mt7530_port_bridge_flags().

The port matrix setting is slightly streamlined, now always first setting
the mt7530_port's pm field and then writing the port matrix from that
field into the hardware register, instead of duplicating the bit
manipulation for both the struct field and the register.

mt7530_port_bridge_join() was previously using |= to update the port
matrix with the port bitmap, which was unnecessary, as pm would only
have the CPU port set before joining a bridge; a simple assignment can
be used for both joining and leaving (and will also work when individual
bits are added/removed in port_bitmap with regard to the previous port
matrix, which is what happens with port isolation).

No functional change intended.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agoMerge branch 'net-drop-rx-socket-tracepoint'
David S. Miller [Wed, 19 Jun 2024 11:44:23 +0000 (12:44 +0100)]
Merge branch 'net-drop-rx-socket-tracepoint'

Yan Zhai says:

====================
net: pass receive socket to drop tracepoint

We set up our production packet drop monitoring around the kfree_skb
tracepoint. While this tracepoint is extremely valuable for diagnosing
critical problems, it also has some limitation with drops on the local
receive path: this tracepoint can only inspect the dropped skb itself,
but such skb might not carry enough information to:

1. determine in which netns/container this skb gets dropped
2. determine by which socket/service this skb oughts to be received

The 1st issue is because skb->dev is the only member field with valid
netns reference. But skb->dev can get cleared or reused. For example,
tcp_v4_rcv will clear skb->dev and in later processing it might be reused
for OFO tree.

The 2nd issue is because there is no reference on an skb that reliably
points to a receiving socket. skb->sk usually points to the local
sending socket, and it only points to a receive socket briefly after
early demux stage, yet the socket can get stolen later. For certain drop
reason like TCP OFO_MERGE, Zerowindow, UDP at PROTO_MEM error, etc, it
is hard to infer which receiving socket is impacted. This cannot be
overcome by simply looking at the packet header, because of
complications like sk lookup programs. In the past, single purpose
tracepoints like trace_udp_fail_queue_rcv_skb, trace_sock_rcvqueue_full,
etc are added as needed to provide more visibility. This could be
handled in a more generic way.

In this change set we propose a new 'sk_skb_reason_drop' call as a drop-in
replacement for kfree_skb_reason at various local input path. It accepts
an extra receiving socket argument. Both issues above can be resolved
via this new argument.

V4->V5: rename rx_skaddr to rx_sk to be more clear visually, suggested
by Jesper Dangaard Brouer.

V3->V4: adjusted the TP_STRUCT field order to align better, suggested by
Steven Rostedt.

V2->V3: fixed drop_monitor function signatures; fixed a few uninitialized sks;
Added a few missing report tags from test bots (also noticed by Dan
Carpenter and Simon Horman).

V1->V2: instead of using skb->cb, directly add the needed argument to
trace_kfree_skb tracepoint. Also renamed functions as Eric Dumazet
suggested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agoaf_packet: use sk_skb_reason_drop to free rx packets
Yan Zhai [Mon, 17 Jun 2024 18:09:27 +0000 (11:09 -0700)]
af_packet: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202406011859.Aacus8GV-lkp@intel.com/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agoudp: use sk_skb_reason_drop to free rx packets
Yan Zhai [Mon, 17 Jun 2024 18:09:24 +0000 (11:09 -0700)]
udp: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202406011751.NpVN0sSk-lkp@intel.com/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agotcp: use sk_skb_reason_drop to free rx packets
Yan Zhai [Mon, 17 Jun 2024 18:09:20 +0000 (11:09 -0700)]
tcp: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202406011539.jhwBd7DX-lkp@intel.com/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: raw: use sk_skb_reason_drop to free rx packets
Yan Zhai [Mon, 17 Jun 2024 18:09:16 +0000 (11:09 -0700)]
net: raw: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agoping: use sk_skb_reason_drop to free rx packets
Yan Zhai [Mon, 17 Jun 2024 18:09:13 +0000 (11:09 -0700)]
ping: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: introduce sk_skb_reason_drop function
Yan Zhai [Mon, 17 Jun 2024 18:09:09 +0000 (11:09 -0700)]
net: introduce sk_skb_reason_drop function

Long used destructors kfree_skb and kfree_skb_reason do not pass
receiving socket to packet drop tracepoints trace_kfree_skb.
This makes it hard to track packet drops of a certain netns (container)
or a socket (user application).

The naming of these destructors are also not consistent with most sk/skb
operating functions, i.e. functions named "sk_xxx" or "skb_xxx".
Introduce a new functions sk_skb_reason_drop as drop-in replacement for
kfree_skb_reason on local receiving path. Callers can now pass receiving
sockets to the tracepoints.

kfree_skb and kfree_skb_reason are still usable but they are now just
inline helpers that call sk_skb_reason_drop.

Note it is not feasible to do the same to consume_skb. Packets not
dropped can flow through multiple receive handlers, and have multiple
receiving sockets. Leave it untouched for now.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: add rx_sk to trace_kfree_skb
Yan Zhai [Mon, 17 Jun 2024 18:09:04 +0000 (11:09 -0700)]
net: add rx_sk to trace_kfree_skb

skb does not include enough information to find out receiving
sockets/services and netns/containers on packet drops. In theory
skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
stack for OOO packet lookup. Similarly, skb->sk often identifies a local
sender, and tells nothing about a receiver.

Allow passing an extra receiving socket to the tracepoint to improve
the visibility on receiving drops.

Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agoMerge branch 'am65x-ptp'
David S. Miller [Wed, 19 Jun 2024 09:56:10 +0000 (10:56 +0100)]
Merge branch 'am65x-ptp'

Diogo Ivo says

====================
Enable PTP timestamping/PPS for AM65x SR1.0 devices

This patch series enables support for PTP in AM65x SR1.0 devices.

This feature relies heavily on the Industrial Ethernet Peripheral
(IEP) hardware module, which implements a hardware counter through
which time is kept. This hardware block is the basis for exposing
a PTP hardware clock to userspace and for issuing timestamps for
incoming/outgoing packets, allowing for time synchronization.

The IEP also has compare registers that fire an interrupt when the
counter reaches the value stored in a compare register. This feature
allows us to support PPS events in the kernel.

The changes are separated into five patches:
 - PATCH 01/05: Register SR1.0 devices with the IEP infrastructure to
expose a PHC clock to userspace, allowing time to be
adjusted using standard PTP tools. The code for issuing/
collecting packet timestamps is already present in the
current state of the driver, so only this needs to be
done.
 - PATCH 02/05: Remove unnecessary spinlock synchronization.
 - PATCH 03/05: Document IEP interrupt in DT binding.
 - PATCH 04/05: Add support for IEP compare event/interrupt handling
to enable PPS events.
 - PATCH 05/05: Add the interrupts to the IOT2050 device tree.

Currently every compare event generates two interrupts, the first
corresponding to the actual event and the second being a spurious
but otherwise harmless interrupt. The root cause of this has been
identified and has been solved in the platform's SDK. A forward port
of the SDK's patches also fixes the problem in upstream but is not
included here since it's upstreaming is out of the scope of this
series. If someone from TI would be willing to chime in and help
get the interrupt changes upstream that would be great!

Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
---
Changes in v4:
- Remove unused 'flags' variables in patch 02/05
- Add patch 03/05 describing IEP interrupt in DT binding
- Link to v3: https://lore.kernel.org/r/20240607-iep-v3-0-4824224105bc@siemens.com

Changes in v3:
- Collect Reviewed-by tags
- Add patch 02/04 removing spinlocks from IEP driver
- Use mutex-based synchronization when accessing HW registers
- Link to v2: https://lore.kernel.org/r/20240604-iep-v2-0-ea8e1c0a5686@siemens.com

Changes in v2:
- Collect Reviewed-by tags
- PATCH 01/03: Limit line length to 80 characters
- PATCH 02/03: Proceed with limited functionality if getting IRQ fails,
       limit line length to 80 characters
- Link to v1: https://lore.kernel.org/r/20240529-iep-v1-0-7273c07592d3@siemens.com
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agoarm64: dts: ti: iot2050: Add IEP interrupts for SR1.0 devices
Diogo Ivo [Mon, 17 Jun 2024 15:21:44 +0000 (16:21 +0100)]
arm64: dts: ti: iot2050: Add IEP interrupts for SR1.0 devices

Add the interrupts needed for PTP Hardware Clock support via IEP
in SR1.0 devices.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: ti: icss-iep: Enable compare events
Diogo Ivo [Mon, 17 Jun 2024 15:21:43 +0000 (16:21 +0100)]
net: ti: icss-iep: Enable compare events

The IEP module supports compare events, in which a value is written to a
hardware register and when the IEP counter reaches the written value an
interrupt is generated. Add handling for this interrupt in order to
support PPS events.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agodt-bindings: net: Add IEP interrupt
Diogo Ivo [Mon, 17 Jun 2024 15:21:42 +0000 (16:21 +0100)]
dt-bindings: net: Add IEP interrupt

The IEP interrupt is used in order to support both capture events, where
an incoming external signal gets timestamped on arrival, and compare
events, where an interrupt is generated internally when the IEP counter
reaches a programmed value.

Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: ti: icss-iep: Remove spinlock-based synchronization
Diogo Ivo [Mon, 17 Jun 2024 15:21:41 +0000 (16:21 +0100)]
net: ti: icss-iep: Remove spinlock-based synchronization

As all sources of concurrency in hardware register access occur in
non-interrupt context eliminate spinlock-based synchronization and
rely on the mutex-based synchronization that is already present.

Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: ti: icssg-prueth: Enable PTP timestamping support for SR1.0 devices
Diogo Ivo [Mon, 17 Jun 2024 15:21:40 +0000 (16:21 +0100)]
net: ti: icssg-prueth: Enable PTP timestamping support for SR1.0 devices

Enable PTP support for AM65x SR1.0 devices by registering with the IEP
infrastructure in order to expose a PTP clock to userspace.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agords:Simplify the allocation of slab caches
Hongfu Li [Mon, 17 Jun 2024 07:54:35 +0000 (15:54 +0800)]
rds:Simplify the allocation of slab caches

Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Signed-off-by: Hongfu Li <lihongfu@kylinos.cn>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: mana: Add support for page sizes other than 4KB on ARM64
Haiyang Zhang [Mon, 17 Jun 2024 20:17:26 +0000 (13:17 -0700)]
net: mana: Add support for page sizes other than 4KB on ARM64

As defined by the MANA Hardware spec, the queue size for DMA is 4KB
minimal, and power of 2. And, the HWC queue size has to be exactly
4KB.

To support page sizes other than 4KB on ARM64, define the minimal
queue size as a macro separately from the PAGE_SIZE, which we always
assumed it to be 4KB before supporting ARM64.

Also, add MANA specific macros and update code related to size
alignment, DMA region calculations, etc.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1718655446-6576-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoMerge branch 'net-mlx4_en-use-ethtool_puts-sprintf'
Jakub Kicinski [Wed, 19 Jun 2024 01:19:22 +0000 (18:19 -0700)]
Merge branch 'net-mlx4_en-use-ethtool_puts-sprintf'

Kamal Heib says:

====================
net/mlx4_en: Use ethtool_puts/sprintf

This patchset updates the mlx4_en driver to use the ethtool_puts and
ethtool_sprintf helper functions.

Signed-off-by: Kamal Heib <kheib@redhat.com>
====================

Link: https://lore.kernel.org/r/20240617172329.239819-1-kheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet/mlx4_en: Use ethtool_puts/sprintf to fill stats strings
Kamal Heib [Mon, 17 Jun 2024 17:23:29 +0000 (13:23 -0400)]
net/mlx4_en: Use ethtool_puts/sprintf to fill stats strings

Use the ethtool_puts/ethtool_sprintf helper to print the stats strings
into the ethtool strings interface.

Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240617172329.239819-4-kheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet/mlx4_en: Use ethtool_puts to fill selftest strings
Kamal Heib [Mon, 17 Jun 2024 17:23:28 +0000 (13:23 -0400)]
net/mlx4_en: Use ethtool_puts to fill selftest strings

Use the ethtool_puts helper to print the selftest strings into the
ethtool strings interface.

Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240617172329.239819-3-kheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet/mlx4_en: Use ethtool_puts to fill priv flags strings
Kamal Heib [Mon, 17 Jun 2024 17:23:27 +0000 (13:23 -0400)]
net/mlx4_en: Use ethtool_puts to fill priv flags strings

Use the ethtool_puts helper to print the priv flags strings into the
ethtool strings interface.

Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240617172329.239819-2-kheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: microchip: Constify struct vcap_operations
Christophe JAILLET [Sun, 16 Jun 2024 07:57:56 +0000 (09:57 +0200)]
net: microchip: Constify struct vcap_operations

"struct vcap_operations" are not modified in these drivers.

Constifying this structure moves some data to a read-only section, so
increase overall security.

In order to do it, "struct vcap_control" also needs to be adjusted to this
new const qualifier.

As an example, on a x86_64, with allmodconfig:
Before:
======
   text    data     bss     dec     hex filename
  15176    1094      16   16286    3f9e drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.o

After:
=====
   text    data     bss     dec     hex filename
  15268     998      16   16282    3f9a drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/d8e76094d2e98ebb5bfc8205799b3a9db0b46220.1718524644.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agoMerge branch 'introduce-phy-mode-10g-qxgmii'
Paolo Abeni [Tue, 18 Jun 2024 11:28:29 +0000 (13:28 +0200)]
Merge branch 'introduce-phy-mode-10g-qxgmii'

Luo Jie says:

====================
Introduce PHY mode 10G-QXGMII

This patch series adds 10G-QXGMII mode for PHY driver. The patch
series is split from the QCA8084 PHY driver patch series below.
https://lore.kernel.org/all/20231215074005.26976-1-quic_luoj@quicinc.com/

Per Andrew Lunn’s advice, submitting this patch series for acceptance
as they already include the necessary 'Reviewed-by:' tags. This way,
they need not wait for QCA8084 series patches to conclude review.

Changes in v2:
* remove PHY_INTERFACE_MODE_10G_QXGMII from workaround of
  validation in the phylink_validate_phy. 10G_QXGMII will
  be set into phy->possible_interfaces in its .config_init
  method of PHY driver that supports it.
====================

Link: https://lore.kernel.org/r/20240615120028.2384732-1-quic_luoj@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agodt-bindings: net: ethernet-controller: add 10g-qxgmii mode
Vladimir Oltean [Sat, 15 Jun 2024 12:00:28 +0000 (20:00 +0800)]
dt-bindings: net: ethernet-controller: add 10g-qxgmii mode

Add the new interface mode 10g-qxgmii, which is similar to
usxgmii but extend to 4 channels to support maximum of 4
ports with the link speed 10M/100M/1G/2.5G.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: phy: introduce core support for phy-mode = "10g-qxgmii"
Vladimir Oltean [Sat, 15 Jun 2024 12:00:27 +0000 (20:00 +0800)]
net: phy: introduce core support for phy-mode = "10g-qxgmii"

10G-QXGMII is a MAC-to-PHY interface defined by the USXGMII multiport
specification. It uses the same signaling as USXGMII, but it multiplexes
4 ports over the link, resulting in a maximum speed of 2.5G per port.

Some in-tree SoCs like the NXP LS1028A use "usxgmii" when they mean
either the single-port USXGMII or the quad-port 10G-QXGMII variant, and
they could get away just fine with that thus far. But there is a need to
distinguish between the 2 as far as SerDes drivers are concerned.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: stmmac: Enable TSO on VLANs
Furong Xu [Sat, 15 Jun 2024 09:56:11 +0000 (17:56 +0800)]
net: stmmac: Enable TSO on VLANs

The TSO engine works well when the frames are not VLAN Tagged.
But it will produce broken segments when frames are VLAN Tagged.

The first segment is all good, while the second segment to the
last segment are broken, they lack of required VLAN tag.

An example here:
========
// 1st segment of a VLAN Tagged TSO frame, nothing wrong.
MacSrc > MacDst, ethertype 802.1Q (0x8100), length 1518: vlan 100, p 1, ethertype IPv4 (0x0800), HostA:42643 > HostB:5201: Flags [.], seq 1:1449

// 2nd to last segments of a VLAN Tagged TSO frame, VLAN tag is missing.
MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [.], seq 1449:2897
MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [.], seq 2897:4345
MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [.], seq 4345:5793
MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [P.], seq 5793:7241

// normal VLAN Tagged non-TSO frame, nothing wrong.
MacSrc > MacDst, ethertype 802.1Q (0x8100), length 1022: vlan 100, p 1, ethertype IPv4 (0x0800), HostA:42643 > HostB:5201: Flags [P.], seq 7241:8193
MacSrc > MacDst, ethertype 802.1Q (0x8100), length 70: vlan 100, p 1, ethertype IPv4 (0x0800), HostA:42643 > HostB:5201: Flags [F.], seq 8193
========

When transmitting VLAN Tagged TSO frames, never insert VLAN tag by HW,
always insert VLAN tag to SKB payload, then TSO works well on VLANs for
all MAC cores.

Tested on DWMAC CORE 5.10a, DWMAC CORE 5.20a and DWXGMAC CORE 3.20a

Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240615095611.517323-1-0x1207@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: Move dev_set_hwtstamp_phylib to net/core/dev.h
Kory Maincent [Wed, 12 Jun 2024 15:04:02 +0000 (17:04 +0200)]
net: Move dev_set_hwtstamp_phylib to net/core/dev.h

This declaration was added to the header to be called from ethtool.
ethtool is separated from core for code organization but it is not really
a separate entity, it controls very core things.
As ethtool is an internal stuff it is not wise to have it in netdevice.h.
Move the declaration to net/core/dev.h instead.

Remove the EXPORT_SYMBOL_GPL call as ethtool can not be built as a module.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240612-feature_ptp_netnext-v15-2-b2a086257b63@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dwc-xlgmac: fix missing MODULE_DESCRIPTION() warning
Jeff Johnson [Sun, 16 Jun 2024 20:01:48 +0000 (13:01 -0700)]
net: dwc-xlgmac: fix missing MODULE_DESCRIPTION() warning

With ARCH=hexagon, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/synopsys/dwc-xlgmac.o

With most other ARCH settings the MODULE_DESCRIPTION() is provided by
the macro invocation in dwc-xlgmac-pci.c. However, for hexagon, the
PCI bus is not enabled, and hence CONFIG_DWC_XLGMAC_PCI is not set.
As a result, dwc-xlgmac-pci.c is not compiled, and hence is not linked
into dwc-xlgmac.o.

To avoid this issue, relocate the MODULE_DESCRIPTION() and other
related macros from dwc-xlgmac-pci.c to dwc-xlgmac-common.c, since
that file already has an existing MODULE_LICENSE() and it is
unconditionally linked into dwc-xlgmac.o.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240616-md-hexagon-drivers-net-ethernet-synopsys-v1-1-55852b60aef8@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: mana: Use mana_cleanup_port_context() for rxq cleanup
Shradha Gupta [Fri, 14 Jun 2024 07:19:08 +0000 (00:19 -0700)]
net: mana: Use mana_cleanup_port_context() for rxq cleanup

To cleanup rxqs in port context structures, instead of duplicating the
code, use existing function mana_cleanup_port_context() which does
the exact cleanup that's needed.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Link: https://lore.kernel.org/r/1718349548-28697-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agofou: remove warn in gue_gro_receive on unsupported protocol
Willem de Bruijn [Fri, 14 Jun 2024 12:25:18 +0000 (08:25 -0400)]
fou: remove warn in gue_gro_receive on unsupported protocol

Drop the WARN_ON_ONCE inn gue_gro_receive if the encapsulated type is
not known or does not have a GRO handler.

Such a packet is easily constructed. Syzbot generates them and sets
off this warning.

Remove the warning as it is expected and not actionable.

The warning was previously reduced from WARN_ON to WARN_ON_ONCE in
commit 270136613bf7 ("fou: Do WARN_ON_ONCE in gue_gro_receive for bad
proto callbacks").

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240614122552.1649044-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoMerge branch 'net-smc-IPPROTO_SMC'
David S. Miller [Mon, 17 Jun 2024 12:14:09 +0000 (13:14 +0100)]
Merge branch 'net-smc-IPPROTO_SMC'

D. Wythe says:

====================
Introduce IPPROTO_SMC

This patch allows to create smc socket via AF_INET,
similar to the following code,

/* create v4 smc sock */
v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);

/* create v6 smc sock */
v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);

There are several reasons why we believe it is appropriate here:

1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
address. There is no AF_SMC address at all.

2. Create smc socket in the AF_INET(6) path, which allows us to reuse
the infrastructure of AF_INET(6) path, such as common ebpf hooks.
Otherwise, smc have to implement it again in AF_SMC path. Such as:
  1. Replace IPPROTO_TCP with IPPROTO_SMC in the socket() syscall
     initiated by the user, without the use of LD-PRELOAD.
  2. Select whether immediate fallback is required based on peer's port/ip
     before connect().

A very significant result is that we can now use eBPF to implement smc_run
instead of LD_PRELOAD, who is completely ineffective in scenarios of static
linking.

Another potential value is that we are attempting to optimize the
performance of fallback socks, where merging socks is an important part,
and it relies on the creation of SMC sockets under the AF_INET path.
(More information :
https://lore.kernel.org/netdev/1699442703-25015-1-git-send-email-alibuda@linux.alibaba.com/T/)

v2 -> v1:

- Code formatting, mainly including alignment and annotation repair.
- move inet_smc proto ops to inet_smc.c, avoiding af_smc.c becoming too bulky.
- Fix the issue where refactoring affects the initialization order.
- Fix compile warning (unused out_inet_prot) while CONFIG_IPV6 was not set.

v3 -> v2:

- Add Alibaba's copyright information to the newfile

v4 -> v3:

- Fix some spelling errors
- Align function naming style with smc_sock_init() to smc_sk_init()
- Reversing the order of the conditional checks on clcsock to make the code more intuitive

v5 -> v4:

- Fix some spelling errors
- Added comment, "/* CONFIG_IPV6 */", after the final #endif directive.
- Rename smc_inet.h and smc_inet.c to smc_inet.h and smc_inet.c
- Encapsulate the initialization and destruction of inet_smc in inet_smc.c,
  rather than implementing it directly in af_smc.c.
- Remove useless header files in smc_inet.h
- Make smc_inet_prot_xxx and smc_inet_sock_init() to be static, since it's
  only used in smc_inet.c

v6 -> v5:

- Wrapping lines to not exceed 80 characters
- Combine initialization and error handling of smc_inet6 into the same #if
  macro block.

v7 -> v6:

- Modify the value of IPPROTO_SMC to 256 so that it does not affect IPPROTO-MAX

v8 -> v7:

- Remove useless declarations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet/smc: Introduce IPPROTO_SMC
D. Wythe [Thu, 13 Jun 2024 18:00:30 +0000 (02:00 +0800)]
net/smc: Introduce IPPROTO_SMC

This patch allows to create smc socket via AF_INET,
similar to the following code,

/* create v4 smc sock */
v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);

/* create v6 smc sock */
v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);

There are several reasons why we believe it is appropriate here:

1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
address. There is no AF_SMC address at all.

2. Create smc socket in the AF_INET(6) path, which allows us to reuse
the infrastructure of AF_INET(6) path, such as common ebpf hooks.
Otherwise, smc have to implement it again in AF_SMC path.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet/smc: expose smc proto operations
D. Wythe [Thu, 13 Jun 2024 18:00:29 +0000 (02:00 +0800)]
net/smc: expose smc proto operations

Externalize smc proto operations (smc_xxx) to allow
access from files other than af_smc.c

This is in preparation for the subsequent implementation
of the AF_INET version of SMC.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet/smc: refactoring initialization of smc sock
D. Wythe [Thu, 13 Jun 2024 18:00:28 +0000 (02:00 +0800)]
net/smc: refactoring initialization of smc sock

This patch aims to isolate the shared components of SMC socket
allocation by introducing smc_sk_init() for sock initialization
and __smc_create_clcsk() for the initialization of clcsock.

This is in preparation for the subsequent implementation of the
AF_INET version of SMC.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: make for_each_netdev_dump() a little more bug-proof
Jakub Kicinski [Thu, 13 Jun 2024 21:33:16 +0000 (14:33 -0700)]
net: make for_each_netdev_dump() a little more bug-proof

I find the behavior of xa_for_each_start() slightly counter-intuitive.
It doesn't end the iteration by making the index point after the last
element. IOW calling xa_for_each_start() again after it "finished"
will run the body of the loop for the last valid element, instead
of doing nothing.

This works fine for netlink dumps if they terminate correctly
(i.e. coalesce or carefully handle NLM_DONE), but as we keep getting
reminded legacy dumps are unlikely to go away.

Fixing this generically at the xa_for_each_start() level seems hard -
there is no index reserved for "end of iteration".
ifindexes are 31b wide, tho, and iterator is ulong so for
for_each_netdev_dump() it's safe to go to the next element.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agoMerge branch 'mlx5-genl-queue-stats'
David S. Miller [Mon, 17 Jun 2024 10:37:19 +0000 (11:37 +0100)]
Merge branch 'mlx5-genl-queue-stats'

Joe Damato says:

====================
mlx5: Add netdev-genl queue stats

Welcome to v5.

Switched from RFC to just a v5, because I think this is pretty close.
Minor changes from v4 summarized below in the changelog.

Note that my NIC does not seem to support PTP and I couldn't get the
mlnx-tools mlnx_qos script to work, so I was only able to test the
following cases:

- device up at boot
- adjusting queue counts
- device down (e.g. ip link set dev eth4 down)

Please see the commit message of patch 2/2 for more details on output
and test cases.

rfcv4 thread:
  https://lore.kernel.org/linux-kernel/20240604004629.299699-1-jdamato@fastly.com/T/

rfcv4 -> v5:
 - Patch 1/2: change variable name 'mlx5e_qid' to 'txq_ix'.
 - Patch 2/2:
    - remove logic in mlx5e_get_queue_stats_rx for PTP. PTP RX are
      always reported in base.
    - report PTP TX in mlx5e_get_base_stats only if:
      - PTP has ever been opened, and
      - either PTP is NULL (closed) or the MLX5E_PTP_STATE_TX bit in its
        state is not set

    Otherwise, PTP TX will be reported when the txq_ix is passed into
    mlx5e_get_queue_stats_tx

rfcv3 -> rfcv4:
 - Patch 1/2 now creates a mapping (priv->txq2sq_stats) which maps txq
   indices to sq_stats structures so stats can be accessed directly.
   This mapping is kept up to date along side txq2sq.

 - Patch 2/2:
   - All mutex_lock/unlock on state_lock has been dropped.
   - mlx5e_get_queue_stats_rx now uses ASSERT_RTNL() and has a special
     case for PTP. If PTP was ever opened, is currently opened, and the
     channel index matches, stats for PTP RX are output.
   - mlx5e_get_queue_stats_tx rewritten to use priv->txq2sq_stats. No
     corner cases are needed here because any txq idx (passed in as i)
     will have an up to date mapping in priv->txq2sq_stats.
   - mlx5e_get_base_stats:
     - in the RX case:
       - iterates from [params.num_channels, stats_nch) collecting
         stats.
       - if ptp was ever opened but is currently closed, add the PTP
         stats.
     - in the TX case:
       - handle 2 cases:
         - the channel is available, so sum only the unavailable TCs
           [mlx5e_get_dcb_num_tc, max_opened_tc).
         - the channel is unavailable, so sum all TCs [0, max_opened_tc).
       - if ptp was ever opened but is currently closed, add the PTP
         sq stats.

v2 -> rfcv3:
 - Added patch 1/2 which creates some helpers for computing the txq_ix
   and ch_ix/tc_ix.

 - Patch 2/2 modified in several ways:
   - Fixed variable declarations in mlx5e_get_queue_stats_rx to be at
     the start of the function.
   - mlx5e_get_queue_stats_tx rewritten to access sq stats directly by
     using the helpers added in the previous patch.
   - mlx5e_get_base_stats modified in several ways:
     - Took the state_lock when accessing priv->channels.
     - For the base RX stats, code was simplified to call
       mlx5e_get_queue_stats_rx instead of repeating the same code.
     - For the base TX stats, I attempted to implement what I think
       Tariq suggested in the previous thread:
         - for available channels, only unavailable TC stats are summed
 - for unavailable channels, all stats for TCs up to
   max_opened_tc are summed.

v1 - > v2:
  - Essentially a full rewrite after comments from Jakub, Tariq, and
    Zhu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet/mlx5e: Add per queue netdev-genl stats
Joe Damato [Wed, 12 Jun 2024 20:08:57 +0000 (20:08 +0000)]
net/mlx5e: Add per queue netdev-genl stats

./cli.py --spec netlink/specs/netdev.yaml \
         --dump qstats-get --json '{"scope": "queue"}'

...snip

 {'ifindex': 7,
  'queue-id': 62,
  'queue-type': 'rx',
  'rx-alloc-fail': 0,
  'rx-bytes': 105965251,
  'rx-packets': 179790},
 {'ifindex': 7,
  'queue-id': 0,
  'queue-type': 'tx',
  'tx-bytes': 9402665,
  'tx-packets': 17551},

...snip

Also tested with the script tools/testing/selftests/drivers/net/stats.py
in several scenarios to ensure stats tallying was correct:

- on boot (default queue counts)
- adjusting queue count up or down (ethtool -L eth0 combined ...)

The tools/testing/selftests/drivers/net/stats.py brings the device up,
so to test with the device down, I did the following:

$ ip link show eth4
7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..]
  [..snip..]

$ cat /proc/net/dev | grep eth4
eth4: 235710489  434811 [..snip rx..] 2878744 21227  [..snip tx..]

$ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
           --dump qstats-get --json '{"ifindex": 7}'
[{'ifindex': 7,
  'rx-alloc-fail': 0,
  'rx-bytes': 235710489,
  'rx-packets': 434811,
  'tx-bytes': 2878744,
  'tx-packets': 21227}]

Compare the values in /proc/net/dev match the output of cli for the same
device, even while the device is down.

Note that while the device is down, per queue stats output nothing
(because the device is down there are no queues):

$ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
           --dump qstats-get --json '{"scope": "queue", "ifindex": 7}'
[]

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet/mlx5e: Add txq to sq stats mapping
Joe Damato [Wed, 12 Jun 2024 20:08:56 +0000 (20:08 +0000)]
net/mlx5e: Add txq to sq stats mapping

mlx5 currently maps txqs to an sq via priv->txq2sq. It is useful to map
txqs to sq_stats, as well, for direct access to stats.

Add priv->txq2sq_stats and insert mappings. The mappings will be used
next to tabulate stats information.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agonet: micro-optimize skb_datagram_iter
Sagi Grimberg [Thu, 13 Jun 2024 11:35:04 +0000 (14:35 +0300)]
net: micro-optimize skb_datagram_iter

We only use the mapping in a single context in a short and contained scope,
so kmap_local_page is sufficient and cheaper. This will also allow
skb_datagram_iter to be called from softirq context.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20240613113504.1079860-1-sagi@grimberg.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoMerge branch 'mlxsw-handle-mtu-values'
Jakub Kicinski [Sat, 15 Jun 2024 02:30:41 +0000 (19:30 -0700)]
Merge branch 'mlxsw-handle-mtu-values'

Petr Machata says:

====================
mlxsw: Handle MTU values

Amit Cohen writes:

The driver uses two values for maximum MTU, but neither is accurate.
In addition, the value which is configured to hardware is not calculated
correctly. Handle these issues and expose accurate values for minimum
and maximum MTU per netdevice.

Add test cases to check that the exposed values are really supported.

Patch set overview:
Patches #1-#3 set the driver to use accurate values for MTU
Patch #4 aligns the driver to always use the same value for maximum MTU
Patch #5 adds a test
====================

Link: https://lore.kernel.org/r/cover.1718275854.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoselftests: forwarding: Add test for minimum and maximum MTU
Amit Cohen [Thu, 13 Jun 2024 14:07:58 +0000 (16:07 +0200)]
selftests: forwarding: Add test for minimum and maximum MTU

Add cases to check minimum and maximum MTU which are exposed via
"ip -d link show". Test configuration and traffic. Use VLAN devices as
usually VLAN header (4 bytes) is not included in the MTU, and drivers
should configure hardware correctly to send maximum MTU payload size
in VLAN tagged packets.

$ ./min_max_mtu.sh
TEST: ping [ OK ]
TEST: ping6 [ OK ]
TEST: Test maximum MTU configuration [ OK ]
TEST: Test traffic, packet size is maximum MTU [ OK ]
TEST: Test minimum MTU configuration [ OK ]
TEST: Test traffic, packet size is minimum MTU [ OK ]

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/89de8be8989db7a97f3b39e3c9da695673e78d2e.1718275854.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agomlxsw: Use the same maximum MTU value throughout the driver
Amit Cohen [Thu, 13 Jun 2024 14:07:57 +0000 (16:07 +0200)]
mlxsw: Use the same maximum MTU value throughout the driver

Currently, the driver uses two different values for maximum MTU, one is
stored in mlxsw_port->dev->max_mtu and the second is stored in
mlxsw_port->max_mtu. The second one is set to value which is queried from
firmware. This value was never tested, and unfortunately is not really
supported. That means that with the existing code, user can set MTU to
X, which is not really supported by firmware and which is bigger than
buffer size which is allocated in pci.

To make the driver consistent, use only mlxsw_port->dev->max_mtu for
maximum MTU value, for buffers headroom add Ethernet frame headers, which
are not included in mlxsw_port->dev->max_mtu. Remove mlxsw_port->max_mtu.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/89fa6f804386b918d337e736e14ac291bb947483.1718275854.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agomlxsw: spectrum: Set more accurate values for netdevice min/max MTU
Amit Cohen [Thu, 13 Jun 2024 14:07:56 +0000 (16:07 +0200)]
mlxsw: spectrum: Set more accurate values for netdevice min/max MTU

Currently, the driver uses ETH_MAX_MTU as maximum MTU of netdevices,
instead, use the accurate value which is supported by the driver.
Subtract Ethernet headers which are taken into account by hardware for
MTU checking, as described in the previous patch.

Set minimum MTU to ETH_MIN_MTU, as zero MTU is not really supported.

With this change:
a. The stack will do the MTU checking, so we can remove it from the driver.
b. User space will be able to query the actual MTU limits.
   Before this patch:
   $ ip -j -d link show dev swp1 | jq | grep mtu
    "mtu": 1500,
    "min_mtu": 0,
    "max_mtu": 65535,

   With this patch:
   $ ip -j -d link show dev swp1 | jq | grep mtu
    "mtu": 1500,
    "min_mtu": 68,
    "max_mtu": 10218,

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/be8232e38c196ecb607f82c5e000ea427ce22abb.1718275854.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agomlxsw: Adjust MTU value to hardware check
Amit Cohen [Thu, 13 Jun 2024 14:07:55 +0000 (16:07 +0200)]
mlxsw: Adjust MTU value to hardware check

Ethernet frame consists of - Ethernet header, payload, FCS. The MTU value
which is used by user is the size of the payload, which means that when
user sets MTU to X, the total frame size will be larger due to the addition
of the Ethernet header and FCS.

Spectrum ASICs take into account Ethernet header and FCS as part of packet
size for MTU check. Adjust MTU value when user sets MTU, to configure the
MTU size which is required by hardware. The Tx header length which was used
by the driver is not relevant for such calculation, take into account
Ethernet header (with VLAN extension) and FCS.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/f3203c2477bb8ed18b1e79642fa3e3713e1e55bb.1718275854.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agomlxsw: port: Edit maximum MTU value
Amit Cohen [Thu, 13 Jun 2024 14:07:54 +0000 (16:07 +0200)]
mlxsw: port: Edit maximum MTU value

Currently mlxsw driver supports up to 10000 bytes for maximum MTU, this
value is not accurate, we can support up to 10K bytes. Change the value to
the maximum supported MTU by firmware.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/666f51681234aeef09d771833ccb6e94bd323c88.1718275854.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoatm: clean up a put_user() calls
Dan Carpenter [Thu, 13 Jun 2024 18:21:42 +0000 (21:21 +0300)]
atm: clean up a put_user() calls

Unlike copy_from_user(), put_user() and get_user() return -EFAULT on
error.  Use the error code directly instead of setting it.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/04a018e8-7433-4f67-8ddd-9357a0114f87@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoMerge branch 'net-stmmac-provide-platform-select_pcs-method'
Jakub Kicinski [Sat, 15 Jun 2024 02:06:42 +0000 (19:06 -0700)]
Merge branch 'net-stmmac-provide-platform-select_pcs-method'

Russell King says:

====================
net: stmmac: provide platform select_pcs method

This series adds a select_pcs() method to the stmmac platform data to
allow platforms that need to provide their own PCSes to do so, moving
the decision making into platform code.

This avoids questions such as "what should the priority of XPCS vs
some other platform PCS be?" and when we provide a PCS for the
internal PCS, how that interacts with both the XPCS and platform
provided PCS.

Note that if a platform implements the select_pcs() method, then the
return values are:
- a phylink_pcs pointer - the PCS to be used.
- NULL - no phylink_pcs to be used.
Otherwise (if not implemented or returns an error-pointer), then
allow the the stmmac internal PCS to be used if appropriate (once
that patch set is merged.)

Patch 1 introduces the new method.
Patch 2 converts Intel mGBE to use this to provide the XPCS and
 removes the XPCS decision making from core code.
Patch 3 provides an implementation for rzn1 to return its PCS.
Patch 4 does the same for socfpga.
Patch 5 removes the core code returning priv->hw->phylink_pcs.

No functional change is anticipated. Once this has been merged, it
will be expected that platforms should populate all three PCS
methods or none of the PCS methods.
====================

Link: https://lore.kernel.org/r/ZmrLbdwv6ALoy+gs@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: stmmac: clean up stmmac_mac_select_pcs()
Russell King (Oracle) [Thu, 13 Jun 2024 10:36:27 +0000 (11:36 +0100)]
net: stmmac: clean up stmmac_mac_select_pcs()

Since all platform providers of PCS now populate the select_pcs()
method, there is no need for the common code to look at
priv->hw->phylink_pcs, so remove it.

Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1sHhoh-00FetT-3S@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: stmmac: dwmac-socfpga: provide select_pcs() implementation
Russell King (Oracle) [Thu, 13 Jun 2024 10:36:21 +0000 (11:36 +0100)]
net: stmmac: dwmac-socfpga: provide select_pcs() implementation

Provide a .select_pcs() implementation which returns the phylink PCS
that was created in the .pcs_init() method.

Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1sHhob-00FetN-Vp@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: stmmac: dwmac-rzn1: provide select_pcs() implementation
Russell King (Oracle) [Thu, 13 Jun 2024 10:36:16 +0000 (11:36 +0100)]
net: stmmac: dwmac-rzn1: provide select_pcs() implementation

Provide a .select_pcs() implementation which returns the phylink PCS
that was created in the .pcs_init() method.

Tested-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1sHhoW-00FetH-GD@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: stmmac: dwmac-intel: provide a select_pcs() implementation
Russell King (Oracle) [Thu, 13 Jun 2024 10:36:11 +0000 (11:36 +0100)]
net: stmmac: dwmac-intel: provide a select_pcs() implementation

Move the code returning the XPCS into dwmac-intel, which is the only
user of XPCS. Fill in the select_pcs() implementation only when we are
going to setup the XPCS, thus when it should be present.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1sHhoR-00FetB-CP@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: stmmac: add select_pcs() platform method
Russell King (Oracle) [Thu, 13 Jun 2024 10:36:06 +0000 (11:36 +0100)]
net: stmmac: add select_pcs() platform method

Allow platform drivers to provide their logic to select an appropriate
PCS.

Tested-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1sHhoM-00Fesu-8E@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoMerge branch 'mlx5-misc-patches-2023-06-13'
Jakub Kicinski [Sat, 15 Jun 2024 01:53:26 +0000 (18:53 -0700)]
Merge branch 'mlx5-misc-patches-2023-06-13'

Tariq Toukan says:

====================
mlx5 misc patches 2023-06-13

This patchset contains small code cleanups and enhancements from the
team to the mlx5 core and Eth drivers.

Series generated against:
commit 3ec8d7572a69 ("CDC-NCM: add support for Apple's private interface")
====================

Link: https://lore.kernel.org/r/20240613210036.1125203-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet/mlx5e: Support SWP-mode offload L4 csum calculation
Rahul Rameshbabu [Thu, 13 Jun 2024 21:00:36 +0000 (00:00 +0300)]
net/mlx5e: Support SWP-mode offload L4 csum calculation

Calculate the pseudo-header checksum for both IPSec transport mode and
IPSec tunnel mode for mlx5 devices that do not implement a pure hardware
checksum offload for L4 checksum calculation. Introduce a capability bit
that identifies such mlx5 devices.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet/mlx5e: Use tcp_v[46]_check checksum helpers
Gal Pressman [Thu, 13 Jun 2024 21:00:35 +0000 (00:00 +0300)]
net/mlx5e: Use tcp_v[46]_check checksum helpers

Use the tcp specific helpers to calculate the tcp pseudo header checksum
instead of the csum_*_magic ones.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet/mlx5e: Fix outdated comment in features check
Gal Pressman [Thu, 13 Jun 2024 21:00:34 +0000 (00:00 +0300)]
net/mlx5e: Fix outdated comment in features check

The code no longer treats only UDP tunnels, adjust the outdated comment.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet/mlx5: Replace strcpy with strscpy
Moshe Shemesh [Thu, 13 Jun 2024 21:00:33 +0000 (00:00 +0300)]
net/mlx5: Replace strcpy with strscpy

Replace deprecated strcpy with strscpy.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet/mlx5: CT: Separate CT and CT-NAT tuple entries
Chris Mi [Thu, 13 Jun 2024 21:00:32 +0000 (00:00 +0300)]
net/mlx5: CT: Separate CT and CT-NAT tuple entries

Currently a ct entry is stored in both ct and ct-nat tables. ct
action is directed to the ct table, while ct nat action is directed
to the nat table. ct-nat entries perform the nat header rewrites,
if required. The current design assures that a ct action will match
in hardware even if the tuple has nat configured, it will just not
execute it. However, storing each connection in two tables increases
the system's memory consumption while reducing its insertion rate.

Offload a connection to either ct or the ct-nat table. Add a miss
fall-through rule from ct-nat table to the ct table allowing ct(nat)
action on non-natted connections.

ct action on natted connections, by default, will be handled by the
software miss path.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet/mlx5: Correct TASR typo into TSAR
Cosmin Ratiu [Thu, 13 Jun 2024 21:00:31 +0000 (00:00 +0300)]
net/mlx5: Correct TASR typo into TSAR

TSAR is the correct spelling (Transmit Scheduling ARbiter).

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: phy: realtek: add support for rtl8224 2.5Gbps PHY
Chris Packham [Tue, 11 Jun 2024 05:34:14 +0000 (17:34 +1200)]
net: phy: realtek: add support for rtl8224 2.5Gbps PHY

The Realtek RTL8224 PHY is a 2.5Gbps capable PHY. It only uses the
clause 45 MDIO interface and can leverage the support that has already
been added for the other 822x PHYs.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Link: https://lore.kernel.org/r/20240611053415.2111723-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoeth: lan966x: don't clear unsupported stats
Jakub Kicinski [Thu, 13 Jun 2024 00:32:22 +0000 (17:32 -0700)]
eth: lan966x: don't clear unsupported stats

Commit 12c2d0a5b8e2 ("net: lan966x: add ethtool configuration and statistics")
added support for various standard stats. We should not clear the stats
which are not collected by the device. Core code uses a special
initializer to detect when device does not report given stat.

Acked-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20240613003222.3327368-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: qrtr: ns: Ignore ENODEV failures in ns
Chris Lew [Wed, 12 Jun 2024 06:31:56 +0000 (12:01 +0530)]
net: qrtr: ns: Ignore ENODEV failures in ns

Ignore the ENODEV failures returned by kernel_sendmsg(). These errors
indicate that either the local port has been closed or the remote has
gone down. Neither of these scenarios are fatal and will eventually be
handled through packets that are later queued on the control port.

Signed-off-by: Chris Lew <quic_clew@quicinc.com>
Signed-off-by: Sarannya Sasikumar <quic_sarannya@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240612063156.1377210-1-quic_sarannya@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agoMerge branch 'series-to-deliver-ethernet-for-stm32mp13'
Paolo Abeni [Fri, 14 Jun 2024 08:50:58 +0000 (10:50 +0200)]
Merge branch 'series-to-deliver-ethernet-for-stm32mp13'

Christophe Roullier says:

====================
Series to deliver Ethernet for STM32MP13

STM32MP13 is STM32 SOC with 2 GMACs instances
    GMAC IP version is SNPS 4.20.
    GMAC IP configure with 1 RX and 1 TX queue.
    DMA HW capability register supported
    RX Checksum Offload Engine supported
    TX Checksum insertion supported
    Wake-Up On Lan supported
    TSO supported
Rework dwmac glue to simplify management for next stm32 (integrate RFC from Marek)

V2: - Remark from Rob Herring (add Krzysztof's ack in patch 02/11, update in yaml)
      Remark from Serge Semin (upate commits msg)
V3: - Remove PHY regulator patch and Ethernet2 DT because need to clarify how to
      manage PHY regulator (in glue or PHY side)
    - Integrate RFC from Marek
    - Remark from Rob Herring in YAML documentation
V4: - Remark from Marek (remove max-speed, extra space in DT, update commit msg)
    - Remark from Rasmus (add sign-off, add base-commit)
    - Remark from Sai Krishna Gajula
V5: - Fix warning during build CHECK_DTBS
    - Remark from Marek (glue + DT update)
    - Remark from Krzysztof about YAML (Make it symmetric)
V6: - Replace pr_debug by dev_dbg
    - Split serie driver/DTs separately
V7: - Remark from Marek (update sysconfig register mask)
====================

Link: https://lore.kernel.org/r/20240611083606.733453-1-christophe.roullier@foss.st.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: stmmac: dwmac-stm32: add management of stm32mp13 for stm32
Christophe Roullier [Tue, 11 Jun 2024 08:36:06 +0000 (10:36 +0200)]
net: stmmac: dwmac-stm32: add management of stm32mp13 for stm32

Add Ethernet support for STM32MP13.
STM32MP13 is STM32 SOC with 2 GMACs instances.
GMAC IP version is SNPS 4.20.
GMAC IP configure with 1 RX and 1 TX queue.
DMA HW capability register supported
RX Checksum Offload Engine supported
TX Checksum insertion supported
Wake-Up On Lan supported
TSO supported

Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: stmmac: dwmac-stm32: Mask support for PMCR configuration
Christophe Roullier [Tue, 11 Jun 2024 08:36:05 +0000 (10:36 +0200)]
net: stmmac: dwmac-stm32: Mask support for PMCR configuration

Add possibility to have second argument in syscon property to manage
mask. This mask will be used to address right BITFIELDS of PMCR register.

Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: stmmac: dwmac-stm32: Fix Mhz to MHz
Marek Vasut [Tue, 11 Jun 2024 08:36:04 +0000 (10:36 +0200)]
net: stmmac: dwmac-stm32: Fix Mhz to MHz

Trivial, fix up the comments using 'Mhz' to 'MHz'.
No functional change.

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: stmmac: dwmac-stm32: Clean up the debug prints
Marek Vasut [Tue, 11 Jun 2024 08:36:03 +0000 (10:36 +0200)]
net: stmmac: dwmac-stm32: Clean up the debug prints

Use dev_err()/dev_dbg() and phy_modes() to print PHY mode instead of
pr_debug() and hand-written PHY mode decoding. This way, each debug
print has associated device with it and duplicated mode decoding is
removed.

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: stmmac: dwmac-stm32: Extract PMCR configuration
Marek Vasut [Tue, 11 Jun 2024 08:36:02 +0000 (10:36 +0200)]
net: stmmac: dwmac-stm32: Extract PMCR configuration

Pull the PMCR clock mux configuration into a separate function. This is
the final change of three, which moves external clock rate validation,
external clock selector decoding, and clock mux configuration into
separate functions. This should make the code easier to understand.
No functional change intended.

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: stmmac: dwmac-stm32: Separate out external clock selector
Marek Vasut [Tue, 11 Jun 2024 08:36:01 +0000 (10:36 +0200)]
net: stmmac: dwmac-stm32: Separate out external clock selector

Pull the external clock selector into a separate function, to avoid
conflating it with external clock rate validation and clock mux
register configuration. This should make the code easier to read and
understand.

The dwmac->enable_eth_ck variable in the end indicates whether the MAC
clock are supplied by external oscillator (true) or internal RCC clock
IP (false). The dwmac->enable_eth_ck value is set based on multiple DT
properties, some of them deprecated, some of them specific to bus mode.

The following DT properties and variables are taken into account. In
each case, if the property is present or true, MAC clock is supplied
by external oscillator.
- "st,ext-phyclk", assigned to variable dwmac->ext_phyclk
  - Used in any mode (MII/RMII/GMII/RGMII)
  - The only non-deprecated DT property of the three
- "st,eth-clk-sel", assigned to variable dwmac->eth_clk_sel_reg
  - Valid only in GMII/RGMII mode
  - Deprecated property, backward compatibility only
- "st,eth-ref-clk-sel", assigned to variable dwmac->eth_ref_clk_sel_reg
  - Valid only in RMII mode
  - Deprecated property, backward compatibility only

The stm32mp1_select_ethck_external() function handles the aforementioned
DT properties and sets dwmac->enable_eth_ck accordingly.

The stm32mp1_set_mode() is adjusted to call stm32mp1_select_ethck_external()
first and then only use dwmac->enable_eth_ck to determine hardware clock mux
settings.

No functional change intended.

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: stmmac: dwmac-stm32: Separate out external clock rate validation
Marek Vasut [Tue, 11 Jun 2024 08:36:00 +0000 (10:36 +0200)]
net: stmmac: dwmac-stm32: Separate out external clock rate validation

Pull the external clock frequency validation into a separate function,
to avoid conflating it with external clock DT property decoding and
clock mux register configuration. This should make the code easier to
read and understand.

This does change the code behavior slightly. The clock mux PMCR register
setting now depends solely on the DT properties which configure the clock
mux between external clock and internal RCC generated clock. The mux PMCR
register settings no longer depend on the supplied clock frequency, that
supplied clock frequency is now only validated, and if the clock frequency
is invalid for a mode, it is rejected.

Previously, the code would switch the PMCR register clock mux to internal
RCC generated clock if external clock couldn't provide suitable frequency,
without checking whether the RCC generated clock frequency is correct. Such
behavior is risky at best, user should have configured their clock correctly
in the first place, so this behavior is removed here.

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agodt-bindings: net: add STM32MP13 compatible in documentation for stm32
Christophe Roullier [Tue, 11 Jun 2024 08:35:59 +0000 (10:35 +0200)]
dt-bindings: net: add STM32MP13 compatible in documentation for stm32

New STM32 SOC have 2 GMACs instances.
GMAC IP version is SNPS 4.20.

Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 months agonet: hsr: Send supervisory frames to HSR network with ProxyNodeTable data
Lukasz Majewski [Mon, 10 Jun 2024 13:39:14 +0000 (15:39 +0200)]
net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data

This patch provides support for sending supervision HSR frames with
MAC addresses stored in ProxyNodeTable when RedBox (i.e. HSR-SAN) is
enabled.

Supervision frames with RedBox MAC address (appended as second TLV)
are only send for ProxyNodeTable nodes.

This patch series shall be tested with hsr_redbox.sh script.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 months agoMerge branch 'net-dsa-lantiq_gswip-code-improvements'
Jakub Kicinski [Fri, 14 Jun 2024 00:07:13 +0000 (17:07 -0700)]
Merge branch 'net-dsa-lantiq_gswip-code-improvements'

Martin Schiller says:

====================
net: dsa: lantiq_gswip: code improvements

This patchset for the lantiq_gswip driver is a collection of minor fixes
and coding improvements by Martin Blumenstingl without any real changes
in the actual functionality.
====================

Link: https://lore.kernel.org/r/20240611135434.3180973-1-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: Improve error message in gswip_port_fdb()
Martin Blumenstingl [Tue, 11 Jun 2024 13:54:34 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: Improve error message in gswip_port_fdb()

Print that no FID is found for bridge %s instead of the incorrect
message that the port is not part of a bridge.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20240611135434.3180973-13-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: Update comments in gswip_port_vlan_filtering()
Martin Blumenstingl [Tue, 11 Jun 2024 13:54:33 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: Update comments in gswip_port_vlan_filtering()

Update the comments in gswip_port_vlan_filtering() so it's clear that
there are two separate cases, one for "tag based VLAN" and another one
for "port based VLAN".

Suggested-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20240611135434.3180973-12-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: Remove dead code from gswip_add_single_port_br()
Martin Schiller [Tue, 11 Jun 2024 13:54:32 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: Remove dead code from gswip_add_single_port_br()

The port validation in gswip_add_single_port_br() is superfluous and
can be omitted.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240611135434.3180973-11-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: Consistently use macros for the mac bridge table
Martin Blumenstingl [Tue, 11 Jun 2024 13:54:31 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: Consistently use macros for the mac bridge table

Only bits [5:0] in mac_bridge.key[3] are reserved for the FID.
Also, for dynamic (learned) entries, bits [7:4] in mac_bridge.val[0]
represents the port.

Introduce new macros GSWIP_TABLE_MAC_BRIDGE_KEY3_FID and
GSWIP_TABLE_MAC_BRIDGE_VAL0_PORT macro and use it throughout the driver.
Also rename and update GSWIP_TABLE_MAC_BRIDGE_VAL1_STATIC to use the
BIT() macro. This makes the driver code easier to understand.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240611135434.3180973-10-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: Change literal 6 to ETH_ALEN
Martin Blumenstingl [Tue, 11 Jun 2024 13:54:30 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: Change literal 6 to ETH_ALEN

The addr variable in gswip_port_fdb_dump() stores a mac address. Use
ETH_ALEN to make this consistent across other drivers.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20240611135434.3180973-9-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: Use dsa_is_cpu_port() in gswip_port_change_mtu()
Martin Blumenstingl [Tue, 11 Jun 2024 13:54:29 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: Use dsa_is_cpu_port() in gswip_port_change_mtu()

Make the check for the CPU port in gswip_port_change_mtu() consistent
with other areas of the driver by using dsa_is_cpu_port().

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20240611135434.3180973-8-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: do also enable or disable cpu port
Martin Schiller [Tue, 11 Jun 2024 13:54:28 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: do also enable or disable cpu port

Before commit 74be4babe72f ("net: dsa: do not enable or disable non user
ports"), gswip_port_enable/disable() were also executed for the cpu port
in gswip_setup() which disabled the cpu port during initialization.

Let's restore this by removing the dsa_is_user_port checks. Also, let's
clean up the gswip_port_enable() function so that we only have to check
for the cpu port once. The operation reordering done here is safe.

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240611135434.3180973-7-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: Don't manually call gswip_port_enable()
Martin Blumenstingl [Tue, 11 Jun 2024 13:54:27 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: Don't manually call gswip_port_enable()

We don't need to manually call gswip_port_enable() from within
gswip_setup() for the CPU port. DSA does this automatically for us.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20240611135434.3180973-6-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: Use dev_err_probe where appropriate
Martin Blumenstingl [Tue, 11 Jun 2024 13:54:26 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: Use dev_err_probe where appropriate

dev_err_probe() can be used to simplify the existing code. Also it means
we get rid of the following warning which is seen whenever the PMAC
(Ethernet controller which connects to GSWIP's CPU port) has not been
probed yet:
  gswip 1e108000.switch: dsa switch register failed: -517

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20240611135434.3180973-5-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: add terminating \n where missing
Martin Schiller [Tue, 11 Jun 2024 13:54:25 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: add terminating \n where missing

Some dev_err are missing the terminating \n. Let's add that.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240611135434.3180973-4-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agonet: dsa: lantiq_gswip: Only allow phy-mode = "internal" on the CPU port
Martin Blumenstingl [Tue, 11 Jun 2024 13:54:24 +0000 (15:54 +0200)]
net: dsa: lantiq_gswip: Only allow phy-mode = "internal" on the CPU port

Add the CPU port to gswip_xrx200_phylink_get_caps() and
gswip_xrx300_phylink_get_caps(). It connects through a SoC-internal bus,
so the only allowed phy-mode is PHY_INTERFACE_MODE_INTERNAL.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20240611135434.3180973-3-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agodt-bindings: net: dsa: lantiq,gswip: convert to YAML schema
Martin Schiller [Tue, 11 Jun 2024 13:54:23 +0000 (15:54 +0200)]
dt-bindings: net: dsa: lantiq,gswip: convert to YAML schema

Convert the lantiq,gswip bindings to YAML format.

Also add this new file to the MAINTAINERS file.

Furthermore, the CPU port has to specify a phy-mode and either a phy or
a fixed-link. Since GSWIP is connected using a SoC internal protocol
there's no PHY involved. Add phy-mode = "internal" and a fixed-link to
the example code to describe the communication between the PMAC
(Ethernet controller) and GSWIP switch.

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240611135434.3180973-2-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoMerge branch 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma...
Jakub Kicinski [Thu, 13 Jun 2024 23:45:09 +0000 (16:45 -0700)]
Merge branch 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Leon Romanovsky says:

====================
net: mana: Allow variable size indirection table

Like we talked, I created new shared branch for this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=mana-shared

* 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  net: mana: Allow variable size indirection table
====================

Link: https://lore.kernel.org/all/20240612183051.GE4966@unreal
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 13 Jun 2024 20:13:38 +0000 (13:13 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts, no adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoMerge tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 13 Jun 2024 18:11:53 +0000 (11:11 -0700)]
Merge tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth and netfilter.

  Slim pickings this time, probably a combination of summer, DevConf.cz,
  and the end of first half of the year at corporations.

  Current release - regressions:

   - Revert "igc: fix a log entry using uninitialized netdev", it traded
     lack of netdev name in a printk() for a crash

  Previous releases - regressions:

   - Bluetooth: L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ

   - geneve: fix incorrectly setting lengths of inner headers in the
     skb, confusing the drivers and causing mangled packets

   - sched: initialize noop_qdisc owner to avoid false-positive
     recursion detection (recursing on CPU 0), which bubbles up to user
     space as a sendmsg() error, while noop_qdisc should silently drop

   - netdevsim: fix backwards compatibility in nsim_get_iflink()

  Previous releases - always broken:

   - netfilter: ipset: fix race between namespace cleanup and gc in the
     list:set type"

* tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
  bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
  af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
  bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
  gve: Clear napi->skb before dev_kfree_skb_any()
  ionic: fix use after netif_napi_del()
  Revert "igc: fix a log entry using uninitialized netdev"
  net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
  net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
  net/ipv6: Fix the RT cache flush via sysctl using a previous delay
  net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters
  gve: ignore nonrelevant GSO type bits when processing TSO headers
  net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP
  netfilter: Use flowlabel flow key when re-routing mangled packets
  netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
  netfilter: nft_inner: validate mandatory meta and payload
  tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
  mailmap: map Geliang's new email address
  mptcp: pm: update add_addr counters after connect
  mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
  mptcp: ensure snd_una is properly initialized on connect
  ...

10 months agoMerge tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Thu, 13 Jun 2024 18:07:32 +0000 (11:07 -0700)]
Merge tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Bugfixes:
   - NFSv4.2: Fix a memory leak in nfs4_set_security_label
   - NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
   - NFS: Add appropriate memory barriers to the sillyrename code
   - Propagate readlink errors in nfs_symlink_filler
   - NFS: don't invalidate dentries on transient errors
   - NFS: fix unnecessary synchronous writes in random write workloads
   - NFSv4.1: enforce rootpath check when deciding whether or not to trunk

  Other:
   - Change email address for Trond Myklebust due to email server concerns"

* tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: add barriers when testing for NFS_FSDATA_BLOCKED
  SUNRPC: return proper error from gss_wrap_req_priv
  NFSv4.1 enforce rootpath check in fs_location query
  NFS: abort nfs_atomic_open_v23 if name is too long.
  nfs: don't invalidate dentries on transient errors
  nfs: Avoid flushing many pages with NFS_FILE_SYNC
  nfs: propagate readlink errors in nfs_symlink_filler
  MAINTAINERS: Change email address for Trond Myklebust
  NFSv4: Fix memory leak in nfs4_set_security_label

10 months agoMerge tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Thu, 13 Jun 2024 17:09:29 +0000 (10:09 -0700)]
Merge tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fixes from Mike Rapoport:
 "Fix validation of NUMA coverage.

  memblock_validate_numa_coverage() was checking for a unset node ID
  using NUMA_NO_NODE, but x86 used MAX_NUMNODES when no node ID was
  specified by buggy firmware.

  Update memblock to substitute MAX_NUMNODES with NUMA_NO_NODE in
  memblock_set_node() and use NUMA_NO_NODE in x86::numa_init()"

* tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node()
  memblock: make memblock_set_node() also warn about use of MAX_NUMNODES

10 months agobnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
Aleksandr Mishin [Tue, 11 Jun 2024 08:25:46 +0000 (11:25 +0300)]
bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()

In case of token is released due to token->state == BNXT_HWRM_DEFERRED,
released token (set to NULL) is used in log messages. This issue is
expected to be prevented by HWRM_ERR_CODE_PF_UNAVAILABLE error code. But
this error code is returned by recent firmware. So some firmware may not
return it. This may lead to NULL pointer dereference.
Adjust this issue by adding token pointer check.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 8fa4219dba8e ("bnxt_en: add dynamic debug support for HWRM messages")
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agoaf_unix: Read with MSG_PEEK loops if the first unread byte is OOB
Rao Shoaib [Tue, 11 Jun 2024 08:46:39 +0000 (01:46 -0700)]
af_unix: Read with MSG_PEEK loops if the first unread byte is OOB

Read with MSG_PEEK flag loops if the first byte to read is an OOB byte.
commit 22dd70eb2c3d ("af_unix: Don't peek OOB data without MSG_OOB.")
addresses the loop issue but does not address the issue that no data
beyond OOB byte can be read.

>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'

>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c2.setsockopt(SOL_SOCKET, SO_OOBINLINE, 1)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'
>>>

Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240611084639.2248934-1-Rao.Shoaib@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agobnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
Michael Chan [Wed, 12 Jun 2024 23:17:36 +0000 (16:17 -0700)]
bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response

Firmware interface 1.10.2.118 has increased the size of
HWRM_PORT_PHY_QCFG response beyond the maximum size that can be
forwarded.  When the VF's link state is not the default auto state,
the PF will need to forward the response back to the VF to indicate
the forced state.  This regression may cause the VF to fail to
initialize.

Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum
96 bytes.  The SPEEDS2_SUPPORTED flag needs to be cleared because the
new speeds2 fields are beyond the legacy structure.  Also modify
bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96
bytes to make this failure more obvious.

Fixes: 84a911db8305 ("bnxt_en: Update firmware interface to 1.10.2.118")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240612231736.57823-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 months agogve: Clear napi->skb before dev_kfree_skb_any()
Ziwei Xiao [Wed, 12 Jun 2024 00:16:54 +0000 (00:16 +0000)]
gve: Clear napi->skb before dev_kfree_skb_any()

gve_rx_free_skb incorrectly leaves napi->skb referencing an skb after it
is freed with dev_kfree_skb_any(). This can result in a subsequent call
to napi_get_frags returning a dangling pointer.

Fix this by clearing napi->skb before the skb is freed.

Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path")
Cc: stable@vger.kernel.org
Reported-by: Shailend Chand <shailend@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://lore.kernel.org/r/20240612001654.923887-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>