]> www.infradead.org Git - users/hch/block.git/log
users/hch/block.git
16 months agonet: count drops due to missing qdisc as dev->tx_drops
Jakub Kicinski [Wed, 29 May 2024 16:25:27 +0000 (09:25 -0700)]
net: count drops due to missing qdisc as dev->tx_drops

Catching and debugging missing qdiscs is pretty tricky. When qdisc
is deleted we replace it with a noop qdisc, which silently drops
all the packets. Since the noop qdisc has a single static instance
we can't count drops at the qdisc level. Count them as dev->tx_drops.

  ip netns add red
  ip link add type veth peer netns red
  ip            link set dev veth0 up
  ip -netns red link set dev veth0 up
  ip            a a dev veth0 10.0.0.1/24
  ip -netns red a a dev veth0 10.0.0.2/24
  ping -c 2 10.0.0.2
  #  2 packets transmitted, 2 received, 0% packet loss, time 1031ms
  ip -s link show dev veth0
  #  TX:  bytes packets errors dropped carrier collsns
  #        1314      17      0       0       0       0

  tc qdisc replace dev veth0 root handle 1234: mq
  tc qdisc replace dev veth0 parent 1234:1 pfifo
  tc qdisc del dev veth0 parent 1234:1
  ping -c 2 10.0.0.2
  #  2 packets transmitted, 0 received, 100% packet loss, time 1034ms
  ip -s link show dev veth0
  #  TX:  bytes packets errors dropped carrier collsns
  #        1314      17      0       3       0       0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240529162527.3688979-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
16 months agor8152: Wake up the system if the we need a reset
Douglas Anderson [Thu, 30 May 2024 23:43:09 +0000 (16:43 -0700)]
r8152: Wake up the system if the we need a reset

If we get to the end of the r8152's suspend() routine and we find that
the USB device is INACCESSIBLE then it means that some of our
preparation for suspend didn't take place. We need a USB reset to get
ourselves back in a consistent state so we can try again and that
can't happen during system suspend. Call pm_wakeup_event() to wake the
system up in this case.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/66590f25.170a0220.8b5ad.1752@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agor8152: If inaccessible at resume time, issue a reset
Douglas Anderson [Thu, 30 May 2024 23:43:08 +0000 (16:43 -0700)]
r8152: If inaccessible at resume time, issue a reset

If we happened to get a USB transfer error during the transition to
suspend then the usb_queue_reset_device() that r8152_control_msg()
calls will get dropped on the floor. This is because
usb_lock_device_for_reset() (which usb_queue_reset_device() uses)
silently fails if it's called when a device is suspended or if too
much time passes.

Let's resolve this by resetting the device ourselves in r8152's
resume() function.

NOTE: due to timing, it's _possible_ that we could end up with two USB
resets: the one queued previously and the one called from the resume()
patch. This didn't happen in test cases I ran, though it's conceivably
possible. We can't easily know if this happened since
usb_queue_reset_device() can just silently drop the reset request. In
any case, it's not expected that this is a problem since the two
resets can't run at the same time (because of the device lock) and it
should be OK to reset the device twice. If somehow the double-reset
causes problems we could prevent resets from being queued up while
suspend is running.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/66590f22.170a0220.8b5ad.1750@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'Felix-DSA-probing-cleanup'
David S. Miller [Mon, 3 Jun 2024 12:06:16 +0000 (13:06 +0100)]
Merge branch 'Felix-DSA-probing-cleanup'

Vladimir Oltean says:

====================
Probing cleanup for the Felix DSA driver

This is a follow-up to Russell King's request for code consolidation
among felix_vsc9959, seville_vsc9953 and ocelot_ext, stated here:
https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Details are in individual patches. Testing was done on NXP LS1028A
(felix_vsc9959).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet: dsa: ocelot: unexport felix_phylink_mac_ops and felix_switch_ops
Vladimir Oltean [Thu, 30 May 2024 16:33:33 +0000 (19:33 +0300)]
net: dsa: ocelot: unexport felix_phylink_mac_ops and felix_switch_ops

Now that the common felix_register_switch() from the umbrella driver
is the only entity that accesses these data structures, we can remove
them from the list of the exported symbols.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet: dsa: ocelot: common probing code
Vladimir Oltean [Thu, 30 May 2024 16:33:32 +0000 (19:33 +0300)]
net: dsa: ocelot: common probing code

Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init code, which could be
made common [1].

[1]: https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Here, we take the following common steps:
- "felix" and "ds" structure allocation
- "felix", "ocelot" and "ds" basic structure initialization
- dsa_register_switch() call

and we make a common function out of them.

For every driver except felix_vsc9959, this is also the entire probing
procedure. For felix_vsc9959, we also need to do some PCI-specific
stuff, which can easily be reordered to be done before, and unwound on
failure.

We also have to convert the bus-specific platform_set_drvdata() and
pci_set_drvdata() calls into dev_set_drvdata(). But this should have no
impact on the behavior.

Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet: dsa: ocelot: use ds->num_tx_queues = OCELOT_NUM_TC for all models
Vladimir Oltean [Thu, 30 May 2024 16:33:31 +0000 (19:33 +0300)]
net: dsa: ocelot: use ds->num_tx_queues = OCELOT_NUM_TC for all models

Russell King points out that seville_vsc9953 populates
felix->info->num_tx_queues = 8, but this doesn't make it all the way
into ds->num_tx_queues (which is how the user interface netdev queues
get allocated) [1].

[1]: https://lore.kernel.org/all/20240415160150.yejcazpjqvn7vhxu@skbuf/

When num_tx_queues=0 for seville, this is implicitly converted to 1 by
dsa_user_create(), and this is good enough for basic operation for a
switch port. The tc qdisc offload layer works with netdev TX queues,
so for QoS offload we need to pretend we have multiple TX queues. The
VSC9953, like ocelot_ext, doesn't export QoS offload, so it doesn't
really matter. But we can definitely set num_tx_queues=8 for all
switches.

The felix->info->num_tx_queues construct itself seems unnecessary.
It was introduced by commit de143c0e274b ("net: dsa: felix: Configure
Time-Aware Scheduler via taprio offload") at a time when vsc9959
(LS1028A) was the only switch supported by the driver.

8 traffic classes, and 1 queue per traffic class, is a common
architectural feature of all switches in the family. So they could
all just set OCELOT_NUM_TC and be fine.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet: dsa: ocelot: move devm_request_threaded_irq() to felix_setup()
Vladimir Oltean [Thu, 30 May 2024 16:33:30 +0000 (19:33 +0300)]
net: dsa: ocelot: move devm_request_threaded_irq() to felix_setup()

The current placement of devm_request_threaded_irq() is inconvenient.
It is between the allocation of the "felix" structure and
dsa_register_switch(), both of which we'd like to refactor into a
function that's common for all switches. But the IRQ is specific to
felix_vsc9959.

A closer inspection of the felix_irq_handler() code suggests that
it does things that depend on the data structures having been fully
initialized. For example, ocelot_get_txtstamp() takes
&port->tx_skbs.lock, which has only been initialized in
ocelot_init_port() which has not run yet.

It is not one of those IRQF_SHARED IRQs, so CONFIG_DEBUG_SHIRQ_FIXME
shouldn't apply here, and thus, it doesn't really matter, because in
practice, the IRQ will not be triggered so early. Nonetheless, it is a
good practice for the driver to be prepared for it to fire as soon as it
is requested.

Create a new felix->info method for running custom code for vsc9959 from
within felix_setup(), and move the request_irq() call there. The
ocelot_ext should have an IRQ as well, so this should be a step in the
right direction for that model (VSC7512) as well.

Some minor changes are made while moving the code. Casts from void *
aren't necessary, so drop them, and rename felix_irq_handler() to the
more specific vsc9959_irq_handler().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet: dsa: ocelot: consistently use devres in felix_pci_probe()
Vladimir Oltean [Thu, 30 May 2024 16:33:29 +0000 (19:33 +0300)]
net: dsa: ocelot: consistently use devres in felix_pci_probe()

Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init and teardown code,
which could be made common [1]. The teardown code could even be
simplified away if we made use of devres, something which is used here
and there in the felix driver, just not very consistently.

[1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Prepare the ground in the felix_vsc9959 driver, by allocating the data
structures using devres and deleting the kfree() calls. This also
deletes the "Failed to allocate ..." message, since memory allocation
errors are extremely loud anyway, and it's hard to miss them.

Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet: dsa: ocelot: delete open coded status = "disabled" parsing
Vladimir Oltean [Thu, 30 May 2024 16:33:28 +0000 (19:33 +0300)]
net: dsa: ocelot: delete open coded status = "disabled" parsing

Since commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
status"), PCI device drivers with OF bindings no longer need this check.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet: dsa: ocelot: use devres in seville_probe()
Vladimir Oltean [Thu, 30 May 2024 16:33:27 +0000 (19:33 +0300)]
net: dsa: ocelot: use devres in seville_probe()

Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init and teardown code,
which could be made common [1]. The teardown code could even be
simplified away if we made use of devres, something which is used here
and there in the felix driver, just not very consistently.

[1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Prepare the ground in the seville_vsc9953 driver, by allocating the data
structures using devres and deleting the kfree() calls. This also
deletes the "Failed to allocate ..." message, since memory allocation
errors are extremely loud anyway, and it's hard to miss them.

Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet: dsa: ocelot: use devres in ocelot_ext_probe()
Vladimir Oltean [Thu, 30 May 2024 16:33:26 +0000 (19:33 +0300)]
net: dsa: ocelot: use devres in ocelot_ext_probe()

Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init and teardown code,
which could be made common [1]. The teardown code could even be
simplified away if we made use of devres, something which is used here
and there in the felix driver, just not very consistently.

[1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Prepare the ground in the ocelot_ext driver, by allocating the data
structures using devres and deleting the kfree() calls. This also
deletes the "Failed to allocate ..." message, since memory allocation
errors are extremely loud anyway, and it's hard to miss them.

Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agoMerge branch 'net-smc-snd_buf-rcv_buf'
David S. Miller [Mon, 3 Jun 2024 11:12:42 +0000 (12:12 +0100)]
Merge branch 'net-smc-snd_buf-rcv_buf'

Guangguan Wang says:

====================
net/smc: Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB

SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
The maximum bytes of snd_buf and rcv_buf can be calculated by 2^SMCR_
RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary is 512KB.
TCP's snd_buf and rcv_buf max size is configured by net.ipv4.tcp_w/rmem[2]
whose default value is 4MB or 6MB, is much larger than SMC-R's upper
boundary.

In some scenarios, such as Recommendation System, the communication
pattern is mainly large size send/recv, where the size of snd_buf and
rcv_buf greatly affects performance. Due to the upper boundary
disadvantage, SMC-R performs poor than TCP in those scenarios. So it
is time to enlarge the upper boundary size of SMC-R's snd_buf and rcv_buf,
so that the SMC-R's snd_buf and rcv_buf can be configured to larger size
for performance gain in such scenarios.

The SMC-R rcv_buf's size will be transferred to peer by the field
rmbe_size in clc accept and confirm message. The length of the field
rmbe_size is four bits, which means the maximum value of SMCR_RMBE_SIZES
is 15. In case of frequently adjusting the value of SMCR_RMBE_SIZES
in different scenarios, set the value of SMCR_RMBE_SIZES to the maximum
value 15, which means the upper boundary of SMC-R's snd_buf and rcv_buf
is 512MB. As the real memory usage is determined by the value of
net.smc.w/rmem, not by the upper boundary, set the value of SMCR_RMBE_SIZES
to the maximum value has no side affects.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet/smc: change SMCR_RMBE_SIZES from 5 to 15
Guangguan Wang [Mon, 3 Jun 2024 03:00:19 +0000 (11:00 +0800)]
net/smc: change SMCR_RMBE_SIZES from 5 to 15

SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
The maximum bytes of snd_buf and rcv_buf can be calculated by 2^SMCR_
RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary is 512KB.
TCP's snd_buf and rcv_buf max size is configured by net.ipv4.tcp_w/rmem[2]
whose default value is 4MB or 6MB, is much larger than SMC-R's upper
boundary.

In some scenarios, such as Recommendation System, the communication
pattern is mainly large size send/recv, where the size of snd_buf and
rcv_buf greatly affects performance. Due to the upper boundary
disadvantage, SMC-R performs poor than TCP in those scenarios. So it
is time to enlarge the upper boundary size of SMC-R's snd_buf and rcv_buf,
so that the SMC-R's snd_buf and rcv_buf can be configured to larger size
for performance gain in such scenarios.

The SMC-R rcv_buf's size will be transferred to peer by the field
rmbe_size in clc accept and confirm message. The length of the field
rmbe_size is four bits, which means the maximum value of SMCR_RMBE_SIZES
is 15. In case of frequently adjusting the value of SMCR_RMBE_SIZES
in different scenarios, set the value of SMCR_RMBE_SIZES to the maximum
value 15, which means the upper boundary of SMC-R's snd_buf and rcv_buf
is 512MB. As the real memory usage is determined by the value of
net.smc.w/rmem, not by the upper boundary, set the value of SMCR_RMBE_SIZES
to the maximum value has no side affects.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agonet/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN...
Guangguan Wang [Mon, 3 Jun 2024 03:00:18 +0000 (11:00 +0800)]
net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined

SG_MAX_SINGLE_ALLOC is used to limit maximum number of entries that
will be allocated in one piece of scatterlist. When the entries of
scatterlist exceeds SG_MAX_SINGLE_ALLOC, sg chain will be used. From
commit 7c703e54cc71 ("arch: switch the default on ARCH_HAS_SG_CHAIN"),
we can know that the macro CONFIG_ARCH_NO_SG_CHAIN is used to identify
whether sg chain is supported. So, SMC-R's rmb buffer should be limited
by SG_MAX_SINGLE_ALLOC only when the macro CONFIG_ARCH_NO_SG_CHAIN is
defined.

Fixes: a3fe3d01bd0d ("net/smc: introduce sg-logic for RMBs")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 months agoaf_unix: Remove dead code in unix_stream_read_generic().
Kuniyuki Iwashima [Wed, 29 May 2024 14:46:48 +0000 (07:46 -0700)]
af_unix: Remove dead code in unix_stream_read_generic().

When splice() support was added in commit 2b514574f7e8 ("net:
af_unix: implement splice for stream af_unix sockets"), we had
to release unix_sk(sk)->readlock (current iolock) before calling
splice_to_pipe().

Due to the unlock, commit 73ed5d25dce0 ("af-unix: fix use-after-free
with concurrent readers while splicing") added a safeguard in
unix_stream_read_generic(); we had to bump the skb refcount before
calling ->recv_actor() and then check if the skb was consumed by a
concurrent reader.

However, the pipe side locking was refactored, and since commit
25869262ef7a ("skb_splice_bits(): get rid of callback"), we can
call splice_to_pipe() without releasing unix_sk(sk)->iolock.

Now, the skb is always alive after the ->recv_actor() callback,
so let's remove the unnecessary drop_skb logic.

This is mostly the revert of commit 73ed5d25dce0 ("af-unix: fix
use-after-free with concurrent readers while splicing").

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240529144648.68591-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'lan78xx-enable-125-mhz-clk-and-auto-speed-configuration-for-lan7801...
Jakub Kicinski [Sat, 1 Jun 2024 23:24:36 +0000 (16:24 -0700)]
Merge branch 'lan78xx-enable-125-mhz-clk-and-auto-speed-configuration-for-lan7801-if-no-eeprom-is-detected'

Rengarajan says:

====================
lan78xx: Enable 125 MHz CLK and Auto Speed configuration for LAN7801 if NO EEPROM is detected

This patch series adds the support for 125 MHz clock, Auto speed and
auto duplex configuration for LAN7801 in the absence of EEPROM.
====================

Link: https://lore.kernel.org/r/20240529140256.1849764-1-rengarajan.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agolan78xx: Enable Auto Speed and Auto Duplex configuration for LAN7801 if NO EEPROM...
Rengarajan S [Wed, 29 May 2024 14:02:56 +0000 (19:32 +0530)]
lan78xx: Enable Auto Speed and Auto Duplex configuration for LAN7801 if NO EEPROM is detected

Enabled ASD/ADD configuration for LAN7801 in the absence of EEPROM.
After the lite reset these contents go back to defaults where ASD/
ADD is disabled. The check is already available for LAN7800.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Rengarajan S <rengarajan.s@microchip.com>
Link: https://lore.kernel.org/r/20240529140256.1849764-3-rengarajan.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agolan78xx: Enable 125 MHz CLK configuration for LAN7801 if NO EEPROM is detected
Rengarajan S [Wed, 29 May 2024 14:02:55 +0000 (19:32 +0530)]
lan78xx: Enable 125 MHz CLK configuration for LAN7801 if NO EEPROM is detected

The 125MHz and 25MHz clock configurations are enabled in the initialization
regardless of EEPROM (125MHz is needed for RGMII 1000Mbps operation). After
a lite reset (lan78xx_reset), these contents go back to defaults(all 0, so
no 125MHz or 25MHz clock).

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Rengarajan S <rengarajan.s@microchip.com>
Link: https://lore.kernel.org/r/20240529140256.1849764-2-rengarajan.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'net-ethernet-cortina-use-phylib-for-rx-and-tx-pause'
Jakub Kicinski [Sat, 1 Jun 2024 23:07:31 +0000 (16:07 -0700)]
Merge branch 'net-ethernet-cortina-use-phylib-for-rx-and-tx-pause'

Linus Walleij says:

====================
net: ethernet: cortina: Use phylib for RX and TX pause

This patch series switches the Cortina Gemini ethernet
driver to use phylib to set up RX and TX pause for the
PHY.

v3: https://lore.kernel.org/r/20240513-gemini-ethernet-fix-tso-v3-0-b442540cc140@linaro.org
v2: https://lore.kernel.org/r/20240511-gemini-ethernet-fix-tso-v2-0-2ed841574624@linaro.org
v1: https://lore.kernel.org/r/20240509-gemini-ethernet-fix-tso-v1-0-10cd07b54d1c@linaro.org
====================

Link: https://lore.kernel.org/r/20240529-gemini-phylib-fixes-v4-0-16487ca4c2fe@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: ethernet: cortina: Implement .set_pauseparam()
Linus Walleij [Wed, 29 May 2024 14:00:02 +0000 (16:00 +0200)]
net: ethernet: cortina: Implement .set_pauseparam()

The Cortina Gemini ethernet can very well set up TX or RX
pausing, so add this functionality to the driver in a
.set_pauseparam() callback. Essentially just call down to
phylib and let phylib deal with this, .adjust_link()
will respect the setting from phylib.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240529-gemini-phylib-fixes-v4-3-16487ca4c2fe@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: ethernet: cortina: Use negotiated TX/RX pause
Linus Walleij [Wed, 29 May 2024 14:00:01 +0000 (16:00 +0200)]
net: ethernet: cortina: Use negotiated TX/RX pause

Instead of directly poking into registers of the PHY, use
the existing function to query phylib about this directly.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240529-gemini-phylib-fixes-v4-2-16487ca4c2fe@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: ethernet: cortina: Rename adjust link callback
Linus Walleij [Wed, 29 May 2024 14:00:00 +0000 (16:00 +0200)]
net: ethernet: cortina: Rename adjust link callback

The callback passed to of_phy_get_and_connect() in the
Cortina Gemini driver is called "gmac_speed_set" which is
archaic, rename it to "gmac_adjust_link" following the
pattern of most other drivers.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240529-gemini-phylib-fixes-v4-1-16487ca4c2fe@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'net-visibility-of-memory-limits-in-netns'
Jakub Kicinski [Sat, 1 Jun 2024 23:03:23 +0000 (16:03 -0700)]
Merge branch 'net-visibility-of-memory-limits-in-netns'

Matteo Croce says:

====================
net: visibility of memory limits in netns

Some programs need to know the size of the network buffers to operate
correctly, export the following sysctls read-only in network namespaces:

- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_default
- net.core.wmem_max
====================

Link: https://lore.kernel.org/r/20240530232722.45255-1-technoboy85@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoselftests: net: tests net.core.{r,w}mem_{default,max} sysctls in a netns
Matteo Croce [Thu, 30 May 2024 23:27:22 +0000 (01:27 +0200)]
selftests: net: tests net.core.{r,w}mem_{default,max} sysctls in a netns

Add a selftest which checks that the sysctl is present in a netns,
that the value is read from the init one, and that it's readonly.

Signed-off-by: Matteo Croce <teknoraver@meta.com>
Link: https://lore.kernel.org/r/20240530232722.45255-3-technoboy85@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: make net.core.{r,w}mem_{default,max} namespaced
Matteo Croce [Thu, 30 May 2024 23:27:21 +0000 (01:27 +0200)]
net: make net.core.{r,w}mem_{default,max} namespaced

The following sysctl are global and can't be read from a netns:

net.core.rmem_default
net.core.rmem_max
net.core.wmem_default
net.core.wmem_max

Make the following sysctl parameters available readonly from within a
network namespace, allowing a container to read them.

Signed-off-by: Matteo Croce <teknoraver@meta.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/r/20240530232722.45255-2-technoboy85@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agobnxt_en: add timestamping statistics support
Vadim Fedorenko [Thu, 30 May 2024 20:47:51 +0000 (13:47 -0700)]
bnxt_en: add timestamping statistics support

The ethtool_ts_stats structure was introduced earlier this year. Now
it's time to support this group of counters in more drivers.
This patch adds support to bnxt driver.

Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240530204751.99636-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'ice-introduce-eth56g-phy-model-for-e825c-products'
Jakub Kicinski [Sat, 1 Jun 2024 22:51:53 +0000 (15:51 -0700)]
Merge branch 'ice-introduce-eth56g-phy-model-for-e825c-products'

Jacob Keller says:

====================
ice: Introduce ETH56G PHY model for E825C products

E825C products have a different PHY model than E822, E823 and E810 products.
This PHY is ETH56G and its support is necessary to have functional PTP stack
for E825C products.

This series refactors the ice driver to add support for the new PHY model.

Karol introduces the ice_ptp_hw structure. This is used to replace some
hard-coded values relating to the PHY quad and port numbers, as well as to
hold the phy_model type.

Jacob refactors the driver code that converts between the ice_ptp_tmr_cmd
enumeration and hardware register values to better re-use logic and reduce
duplication when introducing another PHY type.

Sergey introduces functions to help enable and disable the Tx timestamp
interrupts. This makes the ice_ptp.c code more generic and encapsulates the
PHY specifics into ice_ptp_hw.c

Karol introduces helper functions to clear the valid bits for Tx and Rx
timestamps. This enables informing hardware to discard stale timestamps
after performing clock operations.

Sergey moves the Clock Generation Unit (CGU) logic out of the E822 specific
area of the ice_ptp_hw.c file as it will be re-used for other device PHY
models.

Jacob introduces a helper function for obtaining the base increment values,
moving this logic out of ice_ptp.c and into the ice_ptp_hw.c file to better
encapsulate hardware differences.

Sergey builds on these refactors to introduce the new ETH56G PHY model used
by the E825C products. This includes introducing the required helpers,
constants, and PHY model checks.

Karol simplifies the CGU logic by using anonymous structures, dropping an
unnecessary ".field" name for accessing the CGU data.

Michal Michalik updates the CGU logic to support the E825C hardware,
ensuring that the clock generation is configured properly.

Grzegorz Nitka adds support to read the NAC topology data from the device.
This is in preparation for supporting devices which combine two NACs
together, connecting all ports to the same clock source. This enables the
driver to determine if its operating on such a device, or if its operating
on the standard 1-NAC configuration.

Grzsecgorz Nitka adjusts the PTP initialization to prepare for the 2x50G
E825C devices, introducing special mapping for the PHY ports to prepare for
support of the 2-NAC devices.

With this, the ice driver is capable of handling PTP for the single-NAC
E825C devices. Complete support for the 2-NAC devices requirs some work on
how the ports connect to the clock owner. During review of this work, it
was pointed out that our existing use of auxiliary bus is disliked, and
Jiri requested that we change it. We are currently working on developing a
replacement solution for the auxiliary bus implementation and have dropped
the relevant changes out of this series. A future series will refactor the
port to clock connection, at which time we will finish the support for
2-NAC E825C devices.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
====================

Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-0-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Adjust PTP init for 2x50G E825C devices
Grzegorz Nitka [Tue, 28 May 2024 23:04:01 +0000 (16:04 -0700)]
ice: Adjust PTP init for 2x50G E825C devices

>From FW/HW perspective, 2 port topology in E825C devices requires
merging of 2 port mapping internally and breakout mapping externally.
As a consequence, it requires different port numbering from PTP code
perspective.
For that topology, pf_id can not be used to index PTP ports. Even if
the 2nd port is identified as port with pf_id = 1, all PHY operations
need to be performed as it was port 2. Thus, special mapping is needed
for the 2nd port.
This change adds detection of 2x50G topology and applies 'custom'
mapping on the 2nd port.

Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-11-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Add NAC Topology device capability parser
Grzegorz Nitka [Tue, 28 May 2024 23:04:00 +0000 (16:04 -0700)]
ice: Add NAC Topology device capability parser

Add new device capability ICE_AQC_CAPS_NAC_TOPOLOGY which allows to
determine the mode of operation (1 or 2 NAC).
Define a new structure to store data from new capability and
corresponding parser code.

Co-developed-by: Prathisna Padmasanan <prathisna.padmasanan@intel.com>
Signed-off-by: Prathisna Padmasanan <prathisna.padmasanan@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Pawel Kaminski <pawel.kaminski@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-10-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Add support for E825-C TS PLL handling
Michal Michalik [Tue, 28 May 2024 23:03:59 +0000 (16:03 -0700)]
ice: Add support for E825-C TS PLL handling

The CGU layout of E825-C is a little different than E822/E823. Add
support the new hardware adding relevant functions.

Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-9-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Change CGU regs struct to anonymous
Karol Kolacinski [Tue, 28 May 2024 23:03:58 +0000 (16:03 -0700)]
ice: Change CGU regs struct to anonymous

Simplify the code by using anonymous struct in CGU registers instead of
naming each structure 'field'.

Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-8-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Introduce ETH56G PHY model for E825C products
Sergey Temerkhanov [Tue, 28 May 2024 23:03:57 +0000 (16:03 -0700)]
ice: Introduce ETH56G PHY model for E825C products

E825C products feature a new PHY model - ETH56G.

Introduces all necessary PHY definitions, functions etc. for ETH56G PHY,
analogous to E82X and E810 ones with addition of a few HW-specific
functionalities for ETH56G like one-step timestamping.

It ensures correct PTP initialization and operation for E825C products.

Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Co-developed-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-7-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Introduce ice_get_base_incval() helper
Jacob Keller [Tue, 28 May 2024 23:03:56 +0000 (16:03 -0700)]
ice: Introduce ice_get_base_incval() helper

Add a new helper for getting base clock increment value for specific HW.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-6-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Move CGU block
Sergey Temerkhanov [Tue, 28 May 2024 23:03:55 +0000 (16:03 -0700)]
ice: Move CGU block

Move CGU block to the beginning of ice_ptp_hw.c

Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-5-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Add PHY OFFSET_READY register clearing
Karol Kolacinski [Tue, 28 May 2024 23:03:54 +0000 (16:03 -0700)]
ice: Add PHY OFFSET_READY register clearing

Add a possibility to mark all transmitted/received timestamps as invalid
by clearing PHY OFFSET_READY registers.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-4-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Implement Tx interrupt enablement functions
Sergey Temerkhanov [Tue, 28 May 2024 23:03:53 +0000 (16:03 -0700)]
ice: Implement Tx interrupt enablement functions

Introduce functions enabling/disabling Tx TS interrupts
for the E822 and ETH56G PHYs

Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-3-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Introduce helper to get tmr_cmd_reg values
Jacob Keller [Tue, 28 May 2024 23:03:52 +0000 (16:03 -0700)]
ice: Introduce helper to get tmr_cmd_reg values

Multiple places in the driver code need to convert enum ice_ptp_tmr_cmd
values into register bits for both the main timer and the PHY port
timers. The main MAC register has one bit scheme for timer commands,
while the PHY commands use a different scheme.

The E810 and E830 devices use the same scheme for port commands as used
for the main timer. However, E822 and ETH56G hardware has a separate
scheme used by the PHY.

Introduce helper functions to convert the timer command enumeration into
the register values, reducing some code duplication, and making it
easier to later refactor the individual port write commands.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-2-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: Introduce ice_ptp_hw struct
Karol Kolacinski [Tue, 28 May 2024 23:03:51 +0000 (16:03 -0700)]
ice: Introduce ice_ptp_hw struct

Create new ice_ptp_hw struct and use it for all HW and PTP-related
fields from struct ice_hw.
Replace definitions with struct fields, which values are set accordingly
to a specific device.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-1-c082739bb6f6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: validate SO_TXTIME clockid coming from userspace
Abhishek Chauhan [Wed, 29 May 2024 18:31:30 +0000 (11:31 -0700)]
net: validate SO_TXTIME clockid coming from userspace

Currently there are no strict checks while setting SO_TXTIME
from userspace. With the recent development in skb->tstamp_type
clockid with unsupported clocks results in warn_on_once, which causes
unnecessary aborts in some systems which enables panic on warns.

Add validation in setsockopt to support only CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_TAI to be set from userspace.

Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
Link: https://lore.kernel.org/lkml/6bdba7b6-fd22-4ea5-a356-12268674def1@quicinc.com/
Fixes: 1693c5db6ab8 ("net: Add additional bit to support clockid_t timestamp type")
Reported-by: syzbot+d7b227731ec589e7f4f0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d7b227731ec589e7f4f0
Reported-by: syzbot+30a35a2e9c5067cc43fa@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=30a35a2e9c5067cc43fa
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240529183130.1717083-1-quic_abchauha@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'doc-mptcp-new-general-doc-and-fixes'
Jakub Kicinski [Sat, 1 Jun 2024 22:18:03 +0000 (15:18 -0700)]
Merge branch 'doc-mptcp-new-general-doc-and-fixes'

Matthieu Baerts says:

====================
doc: mptcp: new general doc and fixes

A general documentation about MPTCP was missing since its introduction
in v5.6. The last patch adds a new 'mptcp' page in the 'networking'
documentation.

The first patch is a fix for a missing sysctl entry introduced in v6.10
rc0, and the second one reorder the sysctl entries.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================

v2: https://lore.kernel.org/r/20240528-upstream-net-20240520-mptcp-doc-v2-0-47f2d5bc2ef3@kernel.org
v1: https://lore.kernel.org/r/20240520-upstream-net-20240520-mptcp-doc-v1-0-e3ad294382cb@kernel.org

Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-0-e94cdd9f2673@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agodoc: new 'mptcp' page in 'networking'
Matthieu Baerts (NGI0) [Thu, 30 May 2024 14:07:32 +0000 (16:07 +0200)]
doc: new 'mptcp' page in 'networking'

A general documentation about MPTCP was missing since its introduction
in v5.6.

Most of what is there comes from our recently updated mptcp.dev website,
with additional links to resources from the kernel documentation.

This is a first version, mainly targeting app developers and users.

Link: https://www.mptcp.dev
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-3-e94cdd9f2673@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agodoc: mptcp: alphabetical order
Matthieu Baerts (NGI0) [Thu, 30 May 2024 14:07:31 +0000 (16:07 +0200)]
doc: mptcp: alphabetical order

Similar to what is done in other 'sysctl' pages: it looks clearer from a
readability perspective.

This might cause troubles in the short/mid-term with the backports, but
by not putting new entries at the end, this can help to reduce conflicts
in case of backports in the long term. We don't change the information
here too often, so it looks OK to do that.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-2-e94cdd9f2673@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agodoc: mptcp: add missing 'available_schedulers' entry
Matthieu Baerts (NGI0) [Thu, 30 May 2024 14:07:30 +0000 (16:07 +0200)]
doc: mptcp: add missing 'available_schedulers' entry

This sysctl knob has been added recently, but the documentation has not
been updated.

This knob is used to show the available schedulers choices that are
registered, similar to 'net.ipv4.tcp_available_congestion_control'.

Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-1-e94cdd9f2673@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: smc91x: Fix pointer types
Thorsten Blum [Wed, 29 May 2024 14:39:02 +0000 (16:39 +0200)]
net: smc91x: Fix pointer types

Use void __iomem pointers as parameters for mcf_insw() and mcf_outsw()
to align with the parameter types of readw() and writew() to fix the
following warnings reported by kernel test robot:

drivers/net/ethernet/smsc/smc91x.c:590:9: sparse: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/smsc/smc91x.c:590:9: sparse:    expected void *a
drivers/net/ethernet/smsc/smc91x.c:590:9: sparse:    got void [noderef] __iomem *
drivers/net/ethernet/smsc/smc91x.c:590:9: sparse: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/smsc/smc91x.c:590:9: sparse:    expected void *a
drivers/net/ethernet/smsc/smc91x.c:590:9: sparse:    got void [noderef] __iomem *
drivers/net/ethernet/smsc/smc91x.c:590:9: sparse: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/smsc/smc91x.c:590:9: sparse:    expected void *a
drivers/net/ethernet/smsc/smc91x.c:590:9: sparse:    got void [noderef] __iomem *
drivers/net/ethernet/smsc/smc91x.c:483:17: sparse: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/smsc/smc91x.c:483:17: sparse:    expected void *a
drivers/net/ethernet/smsc/smc91x.c:483:17: sparse:    got void [noderef] __iomem *

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405160853.3qyaSj8w-lkp@intel.com/
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240529143859.108201-4-thorsten.blum@toblux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: qstat: extend kdoc about get_base_stats
Jakub Kicinski [Wed, 29 May 2024 16:29:22 +0000 (09:29 -0700)]
net: qstat: extend kdoc about get_base_stats

mlx5 has a dedicated queue for PTP packets. Clarify that
this sort of queues can also be accounted towards the base.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://lore.kernel.org/r/20240529162922.3690698-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 31 May 2024 21:09:20 +0000 (14:09 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/ti/icssg/icssg_classifier.c
  abd5576b9c57 ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
  56a5cf538c3f ("net: ti: icssg-prueth: Fix start counter for ft1 filter")
https://lore.kernel.org/all/20240531123822.3bb7eadf@canb.auug.org.au/

No other adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agotools: ynl: make the attr and msg helpers more C++ friendly
Jakub Kicinski [Wed, 29 May 2024 19:20:31 +0000 (12:20 -0700)]
tools: ynl: make the attr and msg helpers more C++ friendly

Folks working on a C++ codegen would like to reuse the attribute
helpers directly. Add the few necessary casts, it's not too ugly.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240529192031.3785761-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'net-phylink-rearrange-ovr_an_inband-support'
Jakub Kicinski [Fri, 31 May 2024 01:32:21 +0000 (18:32 -0700)]
Merge branch 'net-phylink-rearrange-ovr_an_inband-support'

Russell King says:

====================
net: phylink: rearrange ovr_an_inband support

This series addresses the use of the ovr_an_inband flag, which is used
by two drivers to indicate to phylink that they wish to use inband mode
without firmware specifying inband mode.

The issue with ovr_an_inband is that it overrides not only PHY mode,
but also fixed-link mode. Both of the drivers that set this flag
contain code to detect when fixed-link mode will be used, and then
either avoid setting it or explicitly clear the flag. This is
wasteful when phylink already knows this.

Therefore, the approach taken in this patch set is to replace the
ovr_an_inband flag with a default_an_inband flag which means that
phylink defaults to MLO_AN_INBAND instead of MLO_AN_PHY, and will
allow that default to be overriden if firmware specifies a fixed-link.
This allows users of ovr_an_inband to be simplified.

What's more is this requires minimal changes in phylink to allow this
new mode of operation.

This series changes phylink, and also updates the two drivers
(fman_memac and stmmac), and then removes the unnecessary complexity
from the drivers.

This series may depend on the stmmac cleanup series I've posted
earlier - this is something I have not checked, but I currently have
these patches on top of that series.
====================

Link: https://lore.kernel.org/r/ZlctinnTT8Xhemsm@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: stmmac: dwmac-intel: remove checking for fixed link
Russell King (Oracle) [Wed, 29 May 2024 13:29:45 +0000 (14:29 +0100)]
net: stmmac: dwmac-intel: remove checking for fixed link

With the new default_an_inband functionality in phylink, there is no
need to check for a fixed link when this flag is set, since a fixed
link will now override default_an_inband.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/E1sCJNB-00EcrJ-7L@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: stmmac: rename xpcs_an_inband to default_an_inband
Russell King (Oracle) [Wed, 29 May 2024 13:29:40 +0000 (14:29 +0100)]
net: stmmac: rename xpcs_an_inband to default_an_inband

Rename xpcs_an_inband to default_an_inband to reflect the change in
phylink and its changed functionality.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/E1sCJN6-00EcrD-43@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: fman_memac: remove the now unnecessary checking for fixed-link
Russell King (Oracle) [Wed, 29 May 2024 13:29:35 +0000 (14:29 +0100)]
net: fman_memac: remove the now unnecessary checking for fixed-link

Since default_an_inband can be overriden by a fixed-link specification,
there is no need for memac to be checking for this before setting
default_an_inband. Remove this code and update the comment.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/E1sCJN1-00Ecr7-02@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: phylink: rename ovr_an_inband to default_an_inband
Russell King (Oracle) [Wed, 29 May 2024 13:29:29 +0000 (14:29 +0100)]
net: phylink: rename ovr_an_inband to default_an_inband

Since ovr_an_inband no longer overrides every MLO_AN_xxx mode, rename
it to reflect what it now does - it changes the default mode from
MLO_AN_PHY to MLO_AN_INBAND. Fix up the two users of this.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/E1sCJMv-00Ecr1-Sk@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: phylink: move test for ovr_an_inband
Russell King (Oracle) [Wed, 29 May 2024 13:29:24 +0000 (14:29 +0100)]
net: phylink: move test for ovr_an_inband

Of the two users of phylink_config->ovr_an_inband, both manually check
for a fixed link before setting this flag (or clearing it if they find
a fixed link.) This is unnecessary complication.

Test ovr_an_inband before checking for the fixed-link properties, which
will allow ovr_an_inband to be overriden by a fixed link specification.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/E1sCJMq-00Ecqv-P8@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: phylink: rearrange phylink_parse_mode()
Russell King (Oracle) [Wed, 29 May 2024 13:29:19 +0000 (14:29 +0100)]
net: phylink: rearrange phylink_parse_mode()

Of the two users of phylink_config->ovr_an_inband, both manually check
for a fixed link before setting this flag (or clearing it if they find
a fixed link.) This is unnecessary complication.

Rearrange phylink_parse_mode() a little so we can change how
phylink_config->ovr_an_inband works. This will allow the flag to be
tested before checking for the fixed link properties in the next patch.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/E1sCJMl-00Ecqp-K0@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agodt-bindings: net: ti: icssg_prueth: Add documentation for PA_STATS support
MD Danish Anwar [Wed, 29 May 2024 11:52:25 +0000 (17:22 +0530)]
dt-bindings: net: ti: icssg_prueth: Add documentation for PA_STATS support

Add documentation for ti,pa-stats property which is syscon regmap for
PA_STATS registers. This will be used to dump statistics maintained by
ICSSG firmware.

Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240529115225.630535-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'net-stmmac-cleanups'
Jakub Kicinski [Fri, 31 May 2024 01:30:12 +0000 (18:30 -0700)]
Merge branch 'net-stmmac-cleanups'

Russell King says:

====================
net: stmmac: cleanups

This series removes various redundant items in the stmmac driver:

- the unused TBI and RTBI PCS flags
- the NULL pointer initialisations for PCS methods in dwxgmac2
- the stmmac_pcs_rane() method which is never called, and it's
  associated implementations
- the redundant netif_carrier_off()s

Finally, it replaces asm/io.h with the preferred linux/io.h.
====================

Link: https://lore.kernel.org/r/Zlbp7xdUZAXblOZJ@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: stmmac: ethqos: clean up setting serdes speed
Russell King (Oracle) [Wed, 29 May 2024 08:40:59 +0000 (09:40 +0100)]
net: stmmac: ethqos: clean up setting serdes speed

There are four repititions of the same sequence of code, three of which
are identical. Pull these out into a separate function to improve
readability.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/E1sCErj-00EOQ9-Vh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: stmmac: include linux/io.h rather than asm/io.h
Russell King (Oracle) [Wed, 29 May 2024 08:40:54 +0000 (09:40 +0100)]
net: stmmac: include linux/io.h rather than asm/io.h

Include linux/io.h instead of asm/io.h since linux/ includes are
preferred.

Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/E1sCEre-00EOQ3-SR@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: stmmac: remove unnecessary netif_carrier_off()
Russell King (Oracle) [Wed, 29 May 2024 08:40:49 +0000 (09:40 +0100)]
net: stmmac: remove unnecessary netif_carrier_off()

It is incorrect to call netif_carrier_off(), or in fact any driver
teardown, before unregister_netdev() has been called.

unregister_netdev() unpublishes the network device from userspace, and
takes the interface down if it was up prior to returning. Therefore,
once the call has returned, we are guaranteed that .ndo_stop() will
have been called for an interface that was up. Phylink will take the
carrier down via phylink_stop(), making any manipulation of the carrier
in the remove path unnecessary.

In the stmmac_release() path, the netif_carrier_off() call follows the
call to phylink_stop(), so this call is redundant.

Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/E1sCErZ-00EOPx-PF@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: stmmac: remove pcs_rane() method
Russell King (Oracle) [Wed, 29 May 2024 08:40:44 +0000 (09:40 +0100)]
net: stmmac: remove pcs_rane() method

The pcs_rane() method is not called, so lets just remove this
redundant code.

Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/E1sCErU-00EOPr-MC@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: stmmac: dwxgmac2: remove useless NULL pointer initialisations
Russell King (Oracle) [Wed, 29 May 2024 08:40:39 +0000 (09:40 +0100)]
net: stmmac: dwxgmac2: remove useless NULL pointer initialisations

Remove useless NULL pointer initialisations for "PCS" methods from the
dwxgmac2 code.

Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/E1sCErP-00EOPl-IT@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: stmmac: Drop TBI/RTBI PCS flags
Serge Semin [Wed, 29 May 2024 08:40:34 +0000 (09:40 +0100)]
net: stmmac: Drop TBI/RTBI PCS flags

First of all the flags are never set by any of the driver parts. If nobody
have them set then the respective statements will always have the same
result. Thus the statements can be simplified or even dropped with no risk
to break things.

Secondly shall any of the TBI or RTBI flag is set the MDIO-bus
registration will be bypassed. Why? It really seems weird. It's perfectly
fine to have a TBI/RTBI-capable PHY configured over the MDIO bus
interface.

Based on the notes above the TBI/RTBI PCS flags can be freely dropped thus
simplifying the driver code.

Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/E1sCErK-00EOPf-EP@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoipv6: sr: restruct ifdefines
Hangbin Liu [Wed, 29 May 2024 04:09:08 +0000 (12:09 +0800)]
ipv6: sr: restruct ifdefines

There are too many ifdef in IPv6 segment routing code that may cause logic
problems. like commit 160e9d275218 ("ipv6: sr: fix invalid unregister error
path"). To avoid this, the init functions are redefined for both cases. The
code could be more clear after all fidefs are removed.

Suggested-by: Simon Horman <horms@kernel.org>
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240529040908.3472952-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: fjes: correct TRACE_INCLUDE_PATH
Jakub Kicinski [Wed, 29 May 2024 02:33:22 +0000 (19:33 -0700)]
net: fjes: correct TRACE_INCLUDE_PATH

A comment in define_trace.h clearly states:

 TRACE_INCLUDE_PATH if the path is something other than core kernel
                                                           vvvvvvvvvvvvvv
 include/trace then this macro can define the path to use. Note, the path
 is relative to define_trace.h, not the file including it. Full path names
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 for out of tree modules must be used.

fjes uses path relative to itself. Which (somehow) works most of
the time. Except when the kernel tree is "nested" in another
kernel tree, and ../drivers/net/fjes actually exists. In which
case build will use the header file from the wrong directory.

I've been trying to figure out why net NIPA builder is constantly
failing for the last 5 days, with:

include/trace/../../../drivers/net/fjes/fjes_trace.h:88:17: error: ‘__assign_str’ undeclared (first use in this function)
   88 |                 __assign_str(err, err);
      |                 ^~~~~~~~~~~~

when the line in the tree clearly has only one "err". NIPA does
indeed have "nested" trees, because it uses git work-trees and
the tree on the "outside" is not very up to date.

Link: https://lore.kernel.org/r/20240529023322.3467755-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'ionic-updates-for-v6-11'
Jakub Kicinski [Fri, 31 May 2024 01:10:38 +0000 (18:10 -0700)]
Merge branch 'ionic-updates-for-v6-11'

Shannon Nelson says:

====================
ionic: updates for v6.11

These are a few minor fixes for the ionic driver to clean
up a some little things that have been waiting for attention.
These were originally sent for net, but now respun for net-next.

v1: https://lore.kernel.org/netdev/20240521013715.12098-1-shannon.nelson@amd.com/
====================

Link: https://lore.kernel.org/r/20240529000259.25775-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoionic: fix up ionic_if.h kernel-doc issues
Shannon Nelson [Wed, 29 May 2024 00:02:59 +0000 (17:02 -0700)]
ionic: fix up ionic_if.h kernel-doc issues

All the changes here are whitespace and comments, no code
or definitions changed.  Not all issues were addressed, but
it is much better than it was.  Mostly fixed was a lot
of "Excess union member" and ''rsvd' not described' warnings.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240529000259.25775-8-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoionic: only sync frag_len in first buffer of xdp
Shannon Nelson [Wed, 29 May 2024 00:02:58 +0000 (17:02 -0700)]
ionic: only sync frag_len in first buffer of xdp

We don't want to try to sync more length than might be
in the first frag of an Rx skb, so make sure to use
the frag_len rather than the full len.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240529000259.25775-7-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoionic: Use netdev_name() function instead of netdev->name
Brett Creeley [Wed, 29 May 2024 00:02:57 +0000 (17:02 -0700)]
ionic: Use netdev_name() function instead of netdev->name

There is no reason not to use netdev_name() in these places, so do just
as the title states.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20240529000259.25775-6-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoionic: Mark error paths in the data path as unlikely
Brett Creeley [Wed, 29 May 2024 00:02:56 +0000 (17:02 -0700)]
ionic: Mark error paths in the data path as unlikely

As the title states, mark unlikely error paths in the data path as
unlikely.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240529000259.25775-5-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoionic: Pass ionic_txq_desc to ionic_tx_tso_post
Brett Creeley [Wed, 29 May 2024 00:02:55 +0000 (17:02 -0700)]
ionic: Pass ionic_txq_desc to ionic_tx_tso_post

Pass the ionic_txq_desc instead of re-referencing it from the q->txq
array since the caller to ionic_tx_tso_post will always have the
current ionic_txq_desc pointer already.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240529000259.25775-4-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoionic: Reset LIF device while restarting LIF
Shannon Nelson [Wed, 29 May 2024 00:02:54 +0000 (17:02 -0700)]
ionic: Reset LIF device while restarting LIF

Recovery from broken states can be hard.  If the LIF reset in
the fw_down path didn't work because the PCI link was broken,
the FW won't be in the right state for proper restart.  We can
fire another LIF reset in the fw_up path to be sure things
are clean on restart.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240529000259.25775-3-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoionic: fix potential irq name truncation
Shannon Nelson [Wed, 29 May 2024 00:02:53 +0000 (17:02 -0700)]
ionic: fix potential irq name truncation

Address a warning about potential string truncation based on the
string buffer sizes.  We can add some hints to the string format
specifier to set limits on the resulting possible string to
squelch the complaints.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240529000259.25775-2-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agohns3: avoid linking objects into multiple modules
Arnd Bergmann [Tue, 28 May 2024 16:15:25 +0000 (18:15 +0200)]
hns3: avoid linking objects into multiple modules

Each object file contains information about which module it gets linked
into, so linking the same file into multiple modules now causes a warning:

scripts/Makefile.build:254: drivers/net/ethernet/hisilicon/hns3/Makefile: hns3_common/hclge_comm_cmd.o is added to multiple modules: hclge hclgevf
scripts/Makefile.build:254: drivers/net/ethernet/hisilicon/hns3/Makefile: hns3_common/hclge_comm_rss.o is added to multiple modules: hclge hclgevf
scripts/Makefile.build:254: drivers/net/ethernet/hisilicon/hns3/Makefile: hns3_common/hclge_comm_tqp_stats.o is added to multiple modules: hclge hclgevf

Change the way that hns3 is built by moving the three common files into a
separate module with exported symbols instead.

Fixes: 5f20be4e90e6 ("net: hns3: refactor hns3 makefile to support hns3_common module")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528161603.2443125-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoethernet: octeontx2: avoid linking objects into multiple modules
Arnd Bergmann [Tue, 28 May 2024 15:25:05 +0000 (17:25 +0200)]
ethernet: octeontx2: avoid linking objects into multiple modules

Each object file contains information about which module it gets linked
into, so linking the same file into multiple modules now causes a warning:

scripts/Makefile.build:254: drivers/net/ethernet/marvell/octeontx2/nic/Makefile: otx2_devlink.o is added to multiple modules: rvu_nicpf rvu_nicvf

Change the way that octeontx2 ethernet is built by moving the common
file into a separate module with exported symbols instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528152527.2148092-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'mlx4-add-support-for-netdev-genl-api'
Jakub Kicinski [Fri, 31 May 2024 00:14:47 +0000 (17:14 -0700)]
Merge branch 'mlx4-add-support-for-netdev-genl-api'

Joe Damato says:

====================
mlx4: Add support for netdev-genl API

There are no functional changes from v5, which I mistakenly sent right
after net-next was closed (oops). This revision, however, includes
Tariq's Reviewed-by tags of the v5 in each commit message. See the
changelog below.

This series adds support to mlx4 for the netdev-genl API which makes it
much easier for users and user programs to map NAPI IDs back to
ifindexes, queues, and IRQs. This is extremely useful for a number of
use cases, including epoll-based busy poll.

In addition, this series includes a patch to generate per-queue
statistics using the netlink API, as well.

To facilitate the stats, patch 1/3 adds a field "alloc_fail" to the ring
structure. This is incremented by the driver in an appropriate place and
used in patch 3/3 as alloc_fail.

Please note: I do not have access to mlx4 hardware, but I've been
working closely with Martin Karsten from University of Waterloo (CC'd)
who has very graciously tested my patches on their mlx4 hardware (hence
his Tested-by attribution in each commit). His latest research work is
particularly interesting [1] and this series helps to support that (and
future) work.

Martin re-test v4 using Jakub's suggested tool [2] and the
stats.pkt_byte_sum and stats.qstat_by_ifindex tests passed. He also
adjusted the queue count and re-ran test to confirm it still passed even
if the queue count was modified.

[1]: https://dl.acm.org/doi/pdf/10.1145/3626780
[2]: https://lore.kernel.org/lkml/20240423175718.4ad4dc5a@kernel.org/
====================

Link: https://lore.kernel.org/r/20240528181139.515070-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet/mlx4: support per-queue statistics via netlink
Joe Damato [Tue, 28 May 2024 18:11:38 +0000 (18:11 +0000)]
net/mlx4: support per-queue statistics via netlink

Make mlx4 compatible with the newly added netlink queue stats API.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528181139.515070-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet/mlx4: link NAPI instances to queues and IRQs
Joe Damato [Tue, 28 May 2024 18:11:37 +0000 (18:11 +0000)]
net/mlx4: link NAPI instances to queues and IRQs

Make mlx4 compatible with the newly added netlink queue GET APIs.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528181139.515070-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet/mlx4: Track RX allocation failures in a stat
Joe Damato [Tue, 28 May 2024 18:11:36 +0000 (18:11 +0000)]
net/mlx4: Track RX allocation failures in a stat

mlx4_en_alloc_frags currently returns -ENOMEM when mlx4_alloc_page
fails but does not increment a stat field when this occurs.

A new field called alloc_fail has been added to struct mlx4_en_rx_ring
which is now incremented in mlx4_en_rx_ring when -ENOMEM occurs.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528181139.515070-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 30 May 2024 15:33:04 +0000 (08:33 -0700)]
Merge tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - gro: initialize network_offset in network layer

   - tcp: reduce accepted window in NEW_SYN_RECV state

  Current release - new code bugs:

   - eth: mlx5e: do not use ptp structure for tx ts stats when not
     initialized

   - eth: ice: check for unregistering correct number of devlink params

  Previous releases - regressions:

   - bpf: Allow delete from sockmap/sockhash only if update is allowed

   - sched: taprio: extend minimum interval restriction to entire cycle
     too

   - netfilter: ipset: add list flush to cancel_gc

   - ipv4: fix address dump when IPv4 is disabled on an interface

   - sock_map: avoid race between sock_map_close and sk_psock_put

   - eth: mlx5: use mlx5_ipsec_rx_status_destroy to correctly delete
     status rules

  Previous releases - always broken:

   - core: fix __dst_negative_advice() race

   - bpf:
       - fix multi-uprobe PID filtering logic
       - fix pkt_type override upon netkit pass verdict

   - netfilter: tproxy: bail out if IP has been disabled on the device

   - af_unix: annotate data-race around unix_sk(sk)->addr

   - eth: mlx5e: fix UDP GSO for encapsulated packets

   - eth: idpf: don't enable NAPI and interrupts prior to allocating Rx
     buffers

   - eth: i40e: fully suspend and resume IO operations in EEH case

   - eth: octeontx2-pf: free send queue buffers incase of leaf to inner

   - eth: ipvlan: dont Use skb->sk in ipvlan_process_v{4,6}_outbound"

* tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  netdev: add qstat for csum complete
  ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
  net: ena: Fix redundant device NUMA node override
  ice: check for unregistering correct number of devlink params
  ice: fix 200G PHY types to link speed mapping
  i40e: Fully suspend and resume IO operations in EEH case
  i40e: factoring out i40e_suspend/i40e_resume
  e1000e: move force SMBUS near the end of enable_ulp function
  net: dsa: microchip: fix RGMII error in KSZ DSA driver
  ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
  net: fix __dst_negative_advice() race
  nfc/nci: Add the inconsistency check between the input data length and count
  MAINTAINERS: dwmac: starfive: update Maintainer
  net/sched: taprio: extend minimum interval restriction to entire cycle too
  net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()
  netfilter: nft_fib: allow from forward/input without iif selector
  netfilter: tproxy: bail out if IP has been disabled on the device
  netfilter: nft_payload: skbuff vlan metadata mangle support
  net: ti: icssg-prueth: Fix start counter for ft1 filter
  sock_map: avoid race between sock_map_close and sk_psock_put
  ...

16 months agoMerge branch 'introduce-switch-mode-support-for-icssg-driver'
Paolo Abeni [Thu, 30 May 2024 13:37:29 +0000 (15:37 +0200)]
Merge branch 'introduce-switch-mode-support-for-icssg-driver'

MD Danish Anwar says:

====================
Introduce switch mode support for ICSSG driver

This series adds support for switch-mode for ICSSG driver. This series
also introduces helper APIs to configure firmware maintained FDB
(Forwarding Database) and VLAN tables. These APIs are later used by ICSSG
driver in switch mode.

Now the driver will boot by default in dual EMAC mode. When first ICSSG
interface is added to bridge driver will still be in EMAC mode. As soon as
second ICSSG interface is added to same bridge, switch-mode will be
enabled and switch firmwares will be loaded to PRU cores. The driver will
remain in dual EMAC mode if ICSSG interfaces are added to two different
bridges or if two different interfaces (One ICSSG, one other) is added to
the same bridge. We'll only enable is_switch_mode flag when two ICSSG
interfaces are added to same bridge.

We start in dual MAC mode. Let's say lan0 and lan1 are ICSSG interfaces

ip link add name br0 type bridge
ip link set lan0 master br0

At this point, we get a CHANGEUPPER event. Only one port is a member of
the bridge, so we will still be in dual MAC mode.

ip link set lan1 master br0

We get a second CHANGEUPPER event, the second interface lan1 is also ICSSG
interface so we will set the is_switch_mode flag and when interfaces are
brought up again, ICSSG switch firmwares will be loaded to PRU Cores.

There are some other cases to consider as well.

ip link add name br0 type bridge
ip link add name br1 type bridge

ip link set lan0 master br0
ip link set ppp0 master br0

Here we are adding lan0 (ICSSG) and ppp0 (non ICSSG) to same bridge, as
they both are not ICSSG, we will still be running in dual EMAC mode.

ip link set lan1 master br1
ip link set vpn0 master br1

Here we are adding lan1 (ICSSG) and vpn0 (non ICSSG) to same bridge, as
they both are not ICSSG, we will still be running in dual EMAC mode.

This is v6 of the series.

Changes from v5 to v6:
*) Removed __packed from structures in icssg_config.h file.
*) Added RB tags of Andrew Lunn <andrew@lunn.ch> to patch 2/3 and patch
   3/3 of this series.

Changes from v4 to v5:
*) Rebased on 6.10-rc1.
*) Dropped the RFC tag.

Changes from v3 to v4:
*) Added RFC tag as net-next is closed now.
*) Modified the driver to remove the need of bringing interfaces up / down
   for enabling / disabling switch mode. Now switch mode can be enabled
   without bringig interfaces up / down as requested by Andrew Lunn
   <andrew@lunn.ch>
*) Modified commit message of patch 3/3.

Changes from v2 to v3:
*) Dropped RFC tag.
*) Used ether_addr_copy() instead of manually copying mac address using
   for loop in patch 1/3 as suggested by Andrew Lunn <andrew@lunn.ch>
*) Added helper API icssg_fdb_setup() in patch 1/3 to reduce code
   duplication as suggested by Andrew Lunn <andrew@lunn.ch>
*) In prueth_switchdev_stp_state_set() removed BR_STATE_LEARNING as
   learning without forwarding is not supported by ICSSG firmware.
*) Used ether_addr_equal() wherever possible in patch 2/3 as suggested
   by Andrew Lunn <andrew@lunn.ch>
*) Fixed typo "nit: s/prueth_switchdevice_nb/prueth_switchdev_nb/" in
   patch 2/3 as suggested by Simon Horman <horms@kernel.org>
*) Squashed "#include "icssg_mii_rt.h" to patch 2/3 from patch 3/3 as
   suggested by Simon Horman <horms@kernel.org>
*) Rebased on latest net-next/main.

Changes from v1 to v2:
*) Removed TAPRIO support patch from this series.
*) Stopped using devlink for enabling switch-mode as suggested by Andrew L
*) Added read_poll_timeout() in patch 1 / 3 as suggested by Andrew L.

v1 https://lore.kernel.org/all/20230830110847.1219515-4-danishanwar@ti.com/
v2 https://lore.kernel.org/all/20240118071005.1514498-1-danishanwar@ti.com/
v3 https://lore.kernel.org/all/20240327114054.1907278-1-danishanwar@ti.com/
v4 https://lore.kernel.org/all/20240515060320.2783244-1-danishanwar@ti.com/
v5 https://lore.kernel.org/all/20240527052738.152821-1-danishanwar@ti.com/

Thanks and Regards,
Md Danish Anwar
====================

Link: https://lore.kernel.org/r/20240528113734.379422-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
16 months agonet: ti: icssg-prueth: Add support for ICSSG switch firmware
MD Danish Anwar [Tue, 28 May 2024 11:37:34 +0000 (17:07 +0530)]
net: ti: icssg-prueth: Add support for ICSSG switch firmware

Add support for ICSSG switch firmware using existing Dual EMAC driver
with switchdev.

Limitations:
VLAN offloading is limited to 0-256 IDs.
MDB/FDB static entries are limited to 511 entries and different FDBs can
hash to same bucket and thus may not completely offloaded

Example assuming ETH1 and ETH2 as ICSSG2 interfaces:

Switch to ICSSG Switch mode:
 ip link add name br0 type bridge
 ip link set dev eth1 master br0
 ip link set dev eth2 master br0
 ip link set dev br0 up
 bridge vlan add dev br0 vid 1 pvid untagged self

Going back to Dual EMAC mode:

 ip link set dev br0 down
 ip link set dev eth1 nomaster
 ip link set dev eth2 nomaster
 ip link del name br0 type bridge

By default, Dual EMAC firmware is loaded, and can be changed to switch
mode by above steps

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
16 months agonet: ti: icssg-switch: Add switchdev based driver for ethernet switch support
MD Danish Anwar [Tue, 28 May 2024 11:37:33 +0000 (17:07 +0530)]
net: ti: icssg-switch: Add switchdev based driver for ethernet switch support

ICSSG can operating in switch mode with 2 ext port and 1 host port with
VLAN/FDB/MDB and STP offloading. Add switchdev based driver to
support the same.

Driver itself will be integrated with icssg_prueth in future commits

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
16 months agonet: ti: icssg-prueth: Add helper functions to configure FDB
MD Danish Anwar [Tue, 28 May 2024 11:37:32 +0000 (17:07 +0530)]
net: ti: icssg-prueth: Add helper functions to configure FDB

Introduce helper functions to configure firmware FDB tables, VLAN tables
and Port VLAN ID settings to aid adding Switch mode support.

Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
16 months agonetdev: add qstat for csum complete
Jakub Kicinski [Wed, 29 May 2024 16:35:47 +0000 (09:35 -0700)]
netdev: add qstat for csum complete

Recent commit 0cfe71f45f42 ("netdev: add queue stats") added
a lot of useful stats, but only those immediately needed by virtio.
Presumably virtio does not support CHECKSUM_COMPLETE,
so statistic for that form of checksumming wasn't included.
Other drivers will definitely need it, in fact we expect it
to be needed in net-next soon (mlx5). So let's add the definition
of the counter for CHECKSUM_COMPLETE to uAPI in net already,
so that the counters are in a more natural order (all subsequent
counters have not been present in any released kernel, yet).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Fixes: 0cfe71f45f42 ("netdev: add queue stats")
Link: https://lore.kernel.org/r/20240529163547.3693194-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
16 months agoipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
Yue Haibing [Wed, 29 May 2024 09:56:33 +0000 (17:56 +0800)]
ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound

Raw packet from PF_PACKET socket ontop of an IPv6-backed ipvlan device will
hit WARN_ON_ONCE() in sk_mc_loop() through sch_direct_xmit() path.

WARNING: CPU: 2 PID: 0 at net/core/sock.c:775 sk_mc_loop+0x2d/0x70
Modules linked in: sch_netem ipvlan rfkill cirrus drm_shmem_helper sg drm_kms_helper
CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 6.9.0+ #279
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:sk_mc_loop+0x2d/0x70
Code: fa 0f 1f 44 00 00 65 0f b7 15 f7 96 a3 4f 31 c0 66 85 d2 75 26 48 85 ff 74 1c
RSP: 0018:ffffa9584015cd78 EFLAGS: 00010212
RAX: 0000000000000011 RBX: ffff91e585793e00 RCX: 0000000002c6a001
RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff91e589c0f000
RBP: ffff91e5855bd100 R08: 0000000000000000 R09: 3d00545216f43d00
R10: ffff91e584fdcc50 R11: 00000060dd8616f4 R12: ffff91e58132d000
R13: ffff91e584fdcc68 R14: ffff91e5869ce800 R15: ffff91e589c0f000
FS:  0000000000000000(0000) GS:ffff91e898100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f788f7c44c0 CR3: 0000000008e1a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
 ? __warn (kernel/panic.c:693)
 ? sk_mc_loop (net/core/sock.c:760)
 ? report_bug (lib/bug.c:201 lib/bug.c:219)
 ? handle_bug (arch/x86/kernel/traps.c:239)
 ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
 ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
 ? sk_mc_loop (net/core/sock.c:760)
 ip6_finish_output2 (net/ipv6/ip6_output.c:83 (discriminator 1))
 ? nf_hook_slow (net/netfilter/core.c:626)
 ip6_finish_output (net/ipv6/ip6_output.c:222)
 ? __pfx_ip6_finish_output (net/ipv6/ip6_output.c:215)
 ipvlan_xmit_mode_l3 (drivers/net/ipvlan/ipvlan_core.c:602) ipvlan
 ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:226) ipvlan
 dev_hard_start_xmit (net/core/dev.c:3594)
 sch_direct_xmit (net/sched/sch_generic.c:343)
 __qdisc_run (net/sched/sch_generic.c:416)
 net_tx_action (net/core/dev.c:5286)
 handle_softirqs (kernel/softirq.c:555)
 __irq_exit_rcu (kernel/softirq.c:589)
 sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043)

The warning triggers as this:
packet_sendmsg
   packet_snd //skb->sk is packet sk
      __dev_queue_xmit
         __dev_xmit_skb //q->enqueue is not NULL
             __qdisc_run
               sch_direct_xmit
                 dev_hard_start_xmit
                   ipvlan_start_xmit
                      ipvlan_xmit_mode_l3 //l3 mode
                        ipvlan_process_outbound //vepa flag
                          ipvlan_process_v6_outbound
                            ip6_local_out
                                __ip6_finish_output
                                  ip6_finish_output2 //multicast packet
                                    sk_mc_loop //sk->sk_family is AF_PACKET

Call ip{6}_local_out() with NULL sk in ipvlan as other tunnels to fix this.

Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240529095633.613103-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
16 months agoMerge tag 'nf-24-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 30 May 2024 08:14:56 +0000 (10:14 +0200)]
Merge tag 'nf-24-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 syzbot reports that nf_reinject() could be called without
         rcu_read_lock() when flushing pending packets at nfnetlink
         queue removal, from Eric Dumazet.

Patch #2 flushes ipset list:set when canceling garbage collection to
         reference to other lists to fix a race, from Jozsef Kadlecsik.

Patch #3 restores q-in-q matching with nft_payload by reverting
         f6ae9f120dad ("netfilter: nft_payload: add C-VLAN support").

Patch #4 fixes vlan mangling in skbuff when vlan offload is present
         in skbuff, without this patch nft_payload corrupts packets
         in this case.

Patch #5 fixes possible nul-deref in tproxy no IP address is found in
         netdevice, reported by syzbot and patch from Florian Westphal.

Patch #6 removes a superfluous restriction which prevents loose fib
         lookups from input and forward hooks, from Eric Garver.

My assessment is that patches #1, #2 and #5 address possible kernel
crash, anything else in this batch fixes broken features.

netfilter pull request 24-05-29

* tag 'nf-24-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_fib: allow from forward/input without iif selector
  netfilter: tproxy: bail out if IP has been disabled on the device
  netfilter: nft_payload: skbuff vlan metadata mangle support
  netfilter: nft_payload: restore vlan q-in-q match support
  netfilter: ipset: Add list flush to cancel_gc
  netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()
====================

Link: https://lore.kernel.org/r/20240528225519.1155786-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
16 months agonet: ena: Fix redundant device NUMA node override
Shay Agroskin [Tue, 28 May 2024 17:09:12 +0000 (20:09 +0300)]
net: ena: Fix redundant device NUMA node override

The driver overrides the NUMA node id of the device regardless of
whether it knows its correct value (often setting it to -1 even though
the node id is advertised in 'struct device'). This can lead to
suboptimal configurations.

This patch fixes this behavior and makes the shared memory allocation
functions use the NUMA node id advertised by the underlying device.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20240528170912.1204417-1-shayagr@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'intel-wired-lan-driver-updates-2024-05-28-e1000e-i40e-ice'
Jakub Kicinski [Thu, 30 May 2024 01:57:02 +0000 (18:57 -0700)]
Merge branch 'intel-wired-lan-driver-updates-2024-05-28-e1000e-i40e-ice'

Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2024-05-28 (e1000e, i40e, ice) [part]

This series includes a variety of fixes that have been accumulating on the
Intel Wired LAN dev-queue.

Hui Wang provides a fix for suspend/resume on e1000e due to failure
to correctly setup the SMBUS in enable_ulp().

Thinh Tran provides a fix for EEH I/O suspend/resume on i40e to
ensure that I/O operations can continue after a resume. To avoid duplicate
code, the common logic is factored out of i40e_suspend and i40e_resume.

Paul Greenwalt provides a fix to correctly map the 200G PHY types to link
speeds in the ice driver.

Dave Ertman provides a fix correcting devlink parameter unregistration in
the event that the driver loads in safe mode and some of the parameters
were not registered.
====================

Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-0-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: check for unregistering correct number of devlink params
Dave Ertman [Tue, 28 May 2024 22:06:11 +0000 (15:06 -0700)]
ice: check for unregistering correct number of devlink params

On module load, the ice driver checks for the lack of a specific PF
capability to determine if it should reduce the number of devlink params
to register.  One situation when this test returns true is when the
driver loads in safe mode.  The same check is not present on the unload
path when devlink params are unregistered.  This results in the driver
triggering a WARN_ON in the kernel devlink code.

The current check and code path uses a reduction in the number of elements
reported in the list of params.  This is fragile and not good for future
maintaining.

Change the parameters to be held in two lists, one always registered and
one dependent on the check.

Add a symmetrical check in the unload path so that the correct parameters
are unregistered as well.

Fixes: 109eb2917284 ("ice: Add tx_scheduling_layers devlink param")
CC: Lukasz Czapnik <lukasz.czapnik@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-8-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoice: fix 200G PHY types to link speed mapping
Paul Greenwalt [Tue, 28 May 2024 22:06:08 +0000 (15:06 -0700)]
ice: fix 200G PHY types to link speed mapping

Commit 24407a01e57c ("ice: Add 200G speed/phy type use") added support
for 200G PHY speeds, but did not include the mapping of 200G PHY types
to link speed. As a result the driver is returning UNKNOWN link speed
when setting 200G ethtool advertised link modes.

To fix this add 200G PHY types to link speed mapping to
ice_get_link_speed_based_on_phy_type().

Fixes: 24407a01e57c ("ice: Add 200G speed/phy type use")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-5-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoi40e: Fully suspend and resume IO operations in EEH case
Thinh Tran [Tue, 28 May 2024 22:06:06 +0000 (15:06 -0700)]
i40e: Fully suspend and resume IO operations in EEH case

When EEH events occurs, the callback functions in the i40e, which are
managed by the EEH driver, will completely suspend and resume all IO
operations.

- In the PCI error detected callback, replaced i40e_prep_for_reset()
  with i40e_io_suspend(). The change is to fully suspend all I/O
  operations
- In the PCI error slot reset callback, replaced pci_enable_device_mem()
  with pci_enable_device(). This change enables both I/O and memory of
  the device.
- In the PCI error resume callback, replaced i40e_handle_reset_warning()
  with i40e_io_resume(). This change allows the system to resume I/O
  operations

Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Robert Thomas <rob.thomas@ibm.com>
Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-3-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoi40e: factoring out i40e_suspend/i40e_resume
Thinh Tran [Tue, 28 May 2024 22:06:05 +0000 (15:06 -0700)]
i40e: factoring out i40e_suspend/i40e_resume

Two new functions, i40e_io_suspend() and i40e_io_resume(), have been
introduced.  These functions were factored out from the existing
i40e_suspend() and i40e_resume() respectively.  This factoring was
done due to concerns about the logic of the I40E_SUSPENSED state, which
caused the device to be unable to recover.  The functions are now used
in the EEH handling for device suspend/resume callbacks.

The function i40e_enable_mc_magic_wake() has been moved ahead of
i40e_io_suspend() to ensure it is declared before being used.

Tested-by: Robert Thomas <rob.thomas@ibm.com>
Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-2-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoe1000e: move force SMBUS near the end of enable_ulp function
Hui Wang [Tue, 28 May 2024 22:06:04 +0000 (15:06 -0700)]
e1000e: move force SMBUS near the end of enable_ulp function

The commit 861e8086029e ("e1000e: move force SMBUS from enable ulp
function to avoid PHY loss issue") introduces a regression on
PCH_MTP_I219_LM18 (PCIID: 0x8086550A). Without the referred commit, the
ethernet works well after suspend and resume, but after applying the
commit, the ethernet couldn't work anymore after the resume and the
dmesg shows that the NIC link changes to 10Mbps (1000Mbps originally):

    [   43.305084] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx

Without the commit, the force SMBUS code will not be executed if
"return 0" or "goto out" is executed in the enable_ulp(), and in my
case, the "goto out" is executed since FWSM_FW_VALID is set. But after
applying the commit, the force SMBUS code will be ran unconditionally.

Here move the force SMBUS code back to enable_ulp() and put it
immediately ahead of hw->phy.ops.release(hw), this could allow the
longest settling time as possible for interface in this function and
doesn't change the original code logic.

The issue was found on a Lenovo laptop with the ethernet hw as below:
00:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:550a]
(rev 20).

And this patch is verified (cable plug and unplug, system suspend
and resume) on Lenovo laptops with ethernet hw: [8086:550a],
[8086:550b], [8086:15bb], [8086:15be], [8086:1a1f], [8086:1a1c] and
[8086:0dc7].

Fixes: 861e8086029e ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue")
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-1-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: dsa: microchip: fix RGMII error in KSZ DSA driver
Tristram Ha [Tue, 28 May 2024 21:34:26 +0000 (14:34 -0700)]
net: dsa: microchip: fix RGMII error in KSZ DSA driver

The driver should return RMII interface when XMII is running in RMII mode.

Fixes: 0ab7f6bf1675 ("net: dsa: microchip: ksz9477: use common xmii function")
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Acked-by: Jerry Ray <jerry.ray@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/1716932066-3342-1-git-send-email-Tristram.Ha@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoipv4: correctly iterate over the target netns in inet_dump_ifaddr()
Alexander Mikhalitsyn [Tue, 28 May 2024 20:30:30 +0000 (22:30 +0200)]
ipv4: correctly iterate over the target netns in inet_dump_ifaddr()

A recent change to inet_dump_ifaddr had the function incorrectly iterate
over net rather than tgt_net, resulting in the data coming for the
incorrect network namespace.

Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Closes: https://github.com/lxc/incus/issues/892
Bisected-by: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Tested-by: Stéphane Graber <stgraber@stgraber.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240528203030.10839-1-aleksandr.mikhalitsyn@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: smc91x: Remove commented out code
Thorsten Blum [Tue, 28 May 2024 16:00:37 +0000 (18:00 +0200)]
net: smc91x: Remove commented out code

Remove commented out code

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Link: https://lore.kernel.org/r/20240528160036.404946-2-thorsten.blum@toblux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: dsa: felix: provide own phylink MAC operations
Russell King (Oracle) [Tue, 28 May 2024 15:15:42 +0000 (16:15 +0100)]
net: dsa: felix: provide own phylink MAC operations

Convert felix to provide its own phylink MAC operations, thus
avoiding the shim layer in DSA's port.c.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/E1sByYA-00EM0y-Jn@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agonet: dsa: remove mac_prepare()/mac_finish() shims
Russell King (Oracle) [Tue, 28 May 2024 15:05:09 +0000 (16:05 +0100)]
net: dsa: remove mac_prepare()/mac_finish() shims

No DSA driver makes use of the mac_prepare()/mac_finish() shimmed
operations anymore, so we can remove these.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1sByNx-00ELW1-Vp@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
16 months agoMerge branch 'doc-netlink-fixes-for-ynl-doc-generator'
Jakub Kicinski [Thu, 30 May 2024 01:10:27 +0000 (18:10 -0700)]
Merge branch 'doc-netlink-fixes-for-ynl-doc-generator'

Donald Hunter says:

====================
doc: netlink: Fixes for ynl doc generator

Several fixes to ynl-gen-rst to resolve issues that can be seen
in:

https://docs.kernel.org/6.10-rc1/networking/netlink_spec/dpll.html#device-id-get
https://docs.kernel.org/6.10-rc1/networking/netlink_spec/dpll.html#lock-status

In patch 2, by not using 'sanitize' for op docs, any formatting in the
.yaml gets passed straight through to the generated .rst which means
that basic rst (also markdown compatible) list formatting can be used in
the .yaml
====================

Link: https://lore.kernel.org/r/20240528140652.9445-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>