If caam module is built without OF support, the compiler returns the
following warning:
drivers/crypto/caam/ctrl.c:83:34: warning: 'imx8m_machine_match' defined but not used [-Wunused-const-variable=]
imx8m_machine_match is only referenced by of_match_node(), which is set
to NULL if CONFIG_OF is not set, as of commit 5762c20593b6b ("dt: Add
empty of_match_node() macro"):
#define of_match_node(_matches, _node) NULL
Do not create imx8m_machine_match if CONFIG_OF is not set.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407011309.cpTuOGdg-lkp@intel.com/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20240702185557.3699991-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: dsa: microchip: lan937x: Add error handling in lan937x_setup
Introduce error handling for lan937x_cfg function calls in lan937x_setup.
This change ensures that if any lan937x_cfg or ksz_rmw32 calls fails, the
function will return the appropriate error code.
====================
selftests: openvswitch: Address some flakes in the CI environment
These patches aim to make using the openvswitch testsuite more reliable.
These should address the major sources of flakiness in the openvswitch
test suite allowing the CI infrastructure to exercise the openvswitch
module for patch series. There should be no change for users who simply
run the tests (except that patch 3/3 does make some of the debugging a bit
easier by making some output more verbose).
====================
selftests: openvswitch: Be more verbose with selftest debugging.
The openvswitch selftest is difficult to debug for anyone that isn't
directly familiar with the openvswitch module and the specifics of the
test cases. Many times when something fails, the debug log will be
sparsely populated and it takes some time to understand where a failure
occured.
Increase the amount of details logged to the debug log by trapping all
'info' logs, and all 'ovs_sbx' commands.
selftests: openvswitch: Attempt to autoload module.
Previously, the openvswitch.sh test suites would not attempt to autoload
the openvswitch module. The idea was that a user who is manually running
tests might not even have the OVS module loaded or configured for their
own development. However, if the kernel module is configured, and the
module can be autoloaded then we should just attempt to load it and run
the tests. This is especially true in the CI environments, where the CI
tests should be able to rely on auto loading to get the test suite running.
selftests: openvswitch: Bump timeout to 15 minutes.
We found that since some tests rely on the TCP SYN timeouts to cause flow
misses, the default test suite timeout of 45 seconds is quick to be
exceeded. Bump the timeout to 15 minutes.
Jakub Kicinski [Tue, 2 Jul 2024 16:41:57 +0000 (09:41 -0700)]
net: ethtool: fix compat with old RSS context API
Device driver gets access to rxfh_dev, while rxfh is just a local
copy of user space params. We need to check what RSS context ID
driver assigned in rxfh_dev, not rxfh.
Using rxfh leads to trying to store all contexts at index 0xffffffff.
From the user perspective it leads to "driver chose duplicate ID"
warnings when second context is added and inability to access any
contexts even tho they were successfully created - xa_load() for
the actual context ID will return NULL, and syscall will return -ENOENT.
Looks like a rebasing mistake, since rxfh_dev was added relatively
recently by commit fb6e30a72539 ("net: ethtool: pass a pointer to
parameters to get/set_rxfh ethtool ops").
Jakub Kicinski [Tue, 2 Jul 2024 23:37:28 +0000 (16:37 -0700)]
selftests: drv-net: rss_ctx: allow more noise on default context
As predicted by David running the test on a machine with a single
interface is a bit unreliable. We try to send 20k packets with
iperf and expect fewer than 10k packets on the default context.
The test isn't very quick, iperf will usually send 100k packets
by the time we stop it. So we're off by 5x on the number of iperf
packets but still expect default context to only get the hardcoded
10k. The intent is to make sure we get noticeably less traffic
on the default context. Use half of the resulting iperf traffic
instead of the hard coded 10k.
Xin Long [Mon, 1 Jul 2024 17:48:49 +0000 (13:48 -0400)]
sctp: cancel a blocking accept when shutdown a listen socket
As David Laight noticed,
"In a multithreaded program it is reasonable to have a thread blocked in
accept(). With TCP a subsequent shutdown(listen_fd, SHUT_RDWR) causes
the accept to fail. But nothing happens for SCTP."
sctp_disconnect() is eventually called when shutdown a listen socket,
but nothing is done in this function. This patch sets RCV_SHUTDOWN
flag in sk->sk_shutdown there, and adds the check (sk->sk_shutdown &
RCV_SHUTDOWN) to break and return in sctp_accept().
Note that shutdown() is only supported on TCP-style SCTP socket.
Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lucas Stach [Mon, 1 Jul 2024 08:53:43 +0000 (10:53 +0200)]
net: dsa: microchip: lan937x: disable VPHY support
As described by the microchip article "LAN937X - The required
configuration for the external MAC port to operate at RGMII-to-RGMII
1Gbps link speed." [1]:
"When VPHY is enabled, the auto-negotiation process following IEEE 802.3
standard will be triggered and will result in RGMII-to-RGMII signal
failure on the interface because VPHY will try to poll the PHY status
that is not available in the scenario of RGMII-to-RGMII connection
(normally the link partner is usually an external processor).
Note that when VPHY fails on accessing PHY registers, it will fall back
to 100Mbps speed, it indicates disabling VPHY is optional if you only
need the port to link at 100Mbps speed.
Again, VPHY must and can only be disabled by writing VPHY_DISABLE bit in
the register below as there is no strapping pin for the control."
This patch was tested on LAN9372, so far it seems to not to affect VPHY
based clock crossing optimization for the ports with integrated PHYs.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lucas Stach [Mon, 1 Jul 2024 08:53:42 +0000 (10:53 +0200)]
net: dsa: microchip: lan937x: disable in-band status support for RGMII interfaces
This driver do not support in-band mode and in case of CPU<->Switch
link, this mode is not working any way. So, disable it otherwise ingress
path of the switch MAC will stay disabled.
Note: lan9372 manual do not document 0xN301 BIT(2) for the RGMII mode
and recommend[1] to disable in-band link status update for the RGMII RX
path by clearing 0xN302 BIT(0). But, 0xN301 BIT(2) seems to work too, so
keep it unified with other KSZ switches.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lucas Stach [Mon, 1 Jul 2024 08:53:41 +0000 (10:53 +0200)]
net: dsa: microchip: lan9371/2: add 100BaseTX PHY support
On the LAN9371 and LAN9372, the 4th internal PHY is a 100BaseTX PHY
instead of a 100BaseT1 PHY. The 100BaseTX PHYs have a different base
register offset.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mina Almasry [Fri, 28 Jun 2024 00:32:42 +0000 (00:32 +0000)]
page_pool: convert to use netmem
Abstract the memory type from the page_pool so we can later add support
for new memory types. Convert the page_pool to use the new netmem type
abstraction, rather than use struct page directly.
As of this patch the netmem type is a no-op abstraction: it's always a
struct page underneath. All the page pool internals are converted to
use struct netmem instead of struct page, and the page pool now exports
2 APIs:
1. The existing struct page API.
2. The new struct netmem API.
Keeping the existing API is transitional; we do not want to refactor all
the current drivers using the page pool at once.
The netmem abstraction is currently a no-op. The page_pool uses
page_to_netmem() to convert allocated pages to netmem, and uses
netmem_to_page() to convert the netmem back to pages to pass to mm APIs,
Follow up patches to this series add non-paged netmem support to the
page_pool. This change is factored out on its own to limit the code
churn to this 1 patch, for ease of code review.
====================
Fixes for stm32-dwmac driver fails to probe
Mark Brown found issue during stm32-dwmac probe:
For the past few days networking has been broken on the Avenger 96, a
stm32mp157a based platform. The stm32-dwmac driver fails to probe:
<6>[ 1.894271] stm32-dwmac 5800a000.ethernet: IRQ eth_wake_irq not found
<6>[ 1.899694] stm32-dwmac 5800a000.ethernet: IRQ eth_lpi not found
<6>[ 1.905849] stm32-dwmac 5800a000.ethernet: IRQ sfty not found
<3>[ 1.912304] stm32-dwmac 5800a000.ethernet: Unable to parse OF data
<3>[ 1.918393] stm32-dwmac 5800a000.ethernet: probe with driver stm32-dwmac failed with error -75
which looks a bit odd given the commit contents but I didn't look at the
driver code at all.
Full boot log here:
https://lava.sirena.org.uk/scheduler/job/467150
A working equivalent is here:
https://lava.sirena.org.uk/scheduler/job/466518
I delivered 2 fixes to solve issue.
====================
net: stmmac: dwmac-stm32: update err status in case different of stm32mp13
The mask parameter of syscfg property is mandatory for MP13 but
optional for all other cases.
The function should not return error code because for non-MP13
the missing syscfg phandle in DT is not considered an error.
So reset err to 0 in that case to support existing DTs without
syscfg phandle.
Fixes: 50bbc0393114 ("net: stmmac: dwmac-stm32: add management of stm32mp13 for stm32") Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Reviewed-by: Marek Vasut <marex@denx.de> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net: stmmac: dwmac-stm32: Add test to verify if ETHCK is used before checking clk rate
When we want to use clock from RCC to clock Ethernet PHY (with ETHCK)
we need to check if value of clock rate is authorized.
If ETHCK is unused, the ETHCK frequency is 0Hz and validation fails.
It makes no sense to validate unused ETHCK, so skip the validation.
Fixes: 582ac134963e ("net: stmmac: dwmac-stm32: Separate out external clock rate validation") Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Reviewed-by: Marek Vasut <marex@denx.de> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Serge Semin [Fri, 28 Jun 2024 15:45:12 +0000 (18:45 +0300)]
dt-bindings: net: dwmac: Validate PBL for all IP-cores
Indeed the maximum DMA burst length can be programmed not only for DW
xGMACs, Allwinner EMACs and Spear SoC GMAC, but in accordance with
[1, 2, 3] for Generic DW *MAC IP-cores. Moreover the STMMAC driver parses
the property and then apply the configuration for all supported DW MAC
devices. All of that makes the property being available for all IP-cores
the bindings supports. Let's make sure the PBL-related properties are
validated for all of them by the common DW *MAC DT schema.
[1] DesignWare Cores Ethernet MAC Universal Databook, Revision 3.73a,
October 2013, p.378.
Sebastian Andrzej Siewior [Fri, 28 Jun 2024 10:18:56 +0000 (12:18 +0200)]
net: Move flush list retrieval to where it is used.
The bpf_net_ctx_get_.*_flush_list() are used at the top of the function.
This means the variable is always assigned even if unused. By moving the
function to where it is used, it is possible to delay the initialisation
until it is unavoidable.
Not sure how much this gains in reality but by looking at bq_enqueue()
(in devmap.c) gcc pushes one register less to the stack. \o/.
Move flush list retrieval to where it is used.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Sebastian Andrzej Siewior [Fri, 28 Jun 2024 10:18:55 +0000 (12:18 +0200)]
net: Optimize xdp_do_flush() with bpf_net_context infos.
Every NIC driver utilizing XDP should invoke xdp_do_flush() after
processing all packages. With the introduction of the bpf_net_context
logic the flush lists (for dev, CPU-map and xsk) are lazy initialized
only if used. However xdp_do_flush() tries to flush all three of them so
all three lists are always initialized and the likely empty lists are
"iterated".
Without the usage of XDP but with CONFIG_DEBUG_NET the lists are also
initialized due to xdp_do_check_flushed().
Jakub suggest to utilize the hints in bpf_net_context and avoid invoking
the flush function. This will also avoiding initializing the lists which
are otherwise unused.
Introduce bpf_net_ctx_get_all_used_flush_lists() to return the
individual list if not-empty. Use the logic in xdp_do_flush() and
xdp_do_check_flushed(). Remove the not needed .*_check_flush().
Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Sebastian Andrzej Siewior [Fri, 28 Jun 2024 10:18:54 +0000 (12:18 +0200)]
net: Remove task_struct::bpf_net_context init on fork.
There is no clone() invocation within a bpf_net_ctx_…() block. Therefore
the task_struct::bpf_net_context has always to be NULL and an explicit
initialisation is not required.
Remove the NULL assignment in the clone() path.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
page_pool: bnxt_en: unlink old page pool in queue api using helper
56ef27e3 unexported page_pool_unlink_napi() and renamed it to
page_pool_disable_direct_recycling(). This is because there was no
in-tree user of page_pool_unlink_napi().
Since then Rx queue API and an implementation in bnxt got merged. In the
bnxt implementation, it broadly follows the following steps: allocate
new queue memory + page pool, stop old rx queue, swap, then destroy old
queue memory + page pool.
The existing NAPI instance is re-used so when the old page pool that is
no longer used but still linked to this shared NAPI instance is
destroyed, it will trigger warnings.
In my initial patches I unlinked a page pool from a NAPI instance
directly. Instead, export page_pool_disable_direct_recycling() and call
that instead to avoid having a driver touch a core struct.
====================
David Wei [Thu, 27 Jun 2024 03:02:00 +0000 (20:02 -0700)]
bnxt_en: unlink page pool when stopping Rx queue
Have bnxt call page_pool_disable_direct_recycling() to unlink the old
page pool when resetting a queue prior to destroying it, instead of
touching a netdev core struct directly.
Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
56ef27e3 unexported page_pool_unlink_napi() and renamed it to
page_pool_disable_direct_recycling(). This is because there was no
in-tree user of page_pool_unlink_napi().
Since then Rx queue API and an implementation in bnxt got merged. In the
bnxt implementation, it broadly follows the following steps: allocate
new queue memory + page pool, stop old rx queue, swap, then destroy old
queue memory + page pool.
The existing NAPI instance is re-used so when the old page pool that is
no longer used but still linked to this shared NAPI instance is
destroyed, it will trigger warnings.
In my initial patches I unlinked a page pool from a NAPI instance
directly. Instead, export page_pool_disable_direct_recycling() and call
that instead to avoid having a driver touch a core struct.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 2 Jul 2024 10:06:52 +0000 (12:06 +0200)]
Merge branch 'zerocopy-tx-cleanups'
Pavel Begunkov says:
====================
zerocopy tx cleanups
Assorted zerocopy send path cleanups, the main part of which is
moving some net stack specific accounting out of io_uring back
to net/ in Patch 4.
====================
Pavel Begunkov [Thu, 27 Jun 2024 12:59:45 +0000 (13:59 +0100)]
net: limit scope of a skb_zerocopy_iter_stream var
skb_zerocopy_iter_stream() only uses @orig_uarg in the !link_skb path,
and we can move the local variable in the appropriate block.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pavel Begunkov [Thu, 27 Jun 2024 12:59:44 +0000 (13:59 +0100)]
io_uring/net: move charging socket out of zc io_uring
Currently, io_uring's io_sg_from_iter() duplicates the part of
__zerocopy_sg_from_iter() charging pages to the socket. It'd be too easy
to miss while changing it in net/, the chunk is not the most
straightforward for outside users and full of internal implementation
details. io_uring is not a good place to keep it, deduplicate it by
moving out of the callback into __zerocopy_sg_from_iter().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pavel Begunkov [Thu, 27 Jun 2024 12:59:43 +0000 (13:59 +0100)]
net: batch zerocopy_fill_skb_from_iter accounting
Instead of accounting every page range against the socket separately, do
it in batch based on the change in skb->truesize. It's also moved into
__zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
simpler and responsible for setting frags but not the accounting.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pavel Begunkov [Thu, 27 Jun 2024 12:59:42 +0000 (13:59 +0100)]
net: split __zerocopy_sg_from_iter()
Split a function out of __zerocopy_sg_from_iter() that only cares about
the traditional path with refcounted pages and doesn't need to know
about ->sg_from_iter. A preparation patch, we'll improve on the function
later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pavel Begunkov [Thu, 27 Jun 2024 12:59:41 +0000 (13:59 +0100)]
net: always try to set ubuf in skb_zerocopy_iter_stream
skb_zcopy_set() does nothing if there is already a ubuf_info associated
with an skb, and since ->link_skb should have set it several lines above
the check here essentially does nothing and can be removed. It's also
safer this way, because even if the callback is faulty we'll
have it set.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Russell King (Oracle) [Fri, 28 Jun 2024 10:32:11 +0000 (11:32 +0100)]
net: phy: fix potential use of NULL pointer in phy_suspend()
phy_suspend() checks the WoL status, and then dereferences
phydrv->flags if (and only if) we decided that WoL has been enabled
on either the PHY or the netdev.
We then check whether phydrv was NULL, but we've potentially already
dereferenced the pointer.
If phydrv is NULL, then phy_ethtool_get_wol() will return an error
and leave wol.wolopts set to zero. However, if netdev->wol_enabled
is true, then we would dereference a NULL pointer.
Checking the PHY drivers, the only place that phydev->wol_enabled is
checked by them is in their suspend/resume callbacks and nowhere else
(which is correct, because phylib only updates this in phy_suspend()).
So, move the NULL pointer check earlier to avoid a NULL pointer
dereference. Leave the check for phydrv->suspend in place as a driver
may populate the .resume method but not the .suspend method.
Geert Uytterhoeven contributes 3 patches with small improvements and
cleanups for the rcar_canfd driver.
A patch by Christophe JAILLET constifies the struct m_can_ops in the
m_can driver to reduce the code size.
The last 9 patches are by me an work around erratum DS80000789E 6 of
mcp2518fd.
* tag 'linux-can-next-for-6.11-20240629' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd
can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum
can: mcp251xfd: rx: add workaround for erratum DS80000789E 6 of mcp2518fd
can: mcp251xfd: rx: prepare to workaround broken RX FIFO head index erratum
can: mcp251xfd: mcp251xfd_handle_rxif_ring_uinc(): factor out in separate function
can: mcp251xfd: clarify the meaning of timestamp
can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()
can: mcp251xfd: update errata references
can: mcp251xfd: properly indent labels
can: gs_usb: add VID/PID for Xylanta SAINT3 product family
can: m_can: Constify struct m_can_ops
can: rcar_canfd: Remove superfluous parentheses in address calculations
can: rcar_canfd: Improve printing of global operational state
can: rcar_canfd: Simplify clock handling
====================
Heng Qi [Fri, 28 Jun 2024 04:40:18 +0000 (12:40 +0800)]
net: ethtool: Fix the panic caused by dev being null when dumping coalesce
syzbot reported a general protection fault caused by a null pointer
dereference in coalesce_fill_reply(). The issue occurs when req_base->dev
is null, leading to an invalid memory access.
This panic occurs if dumping coalesce when no device name is specified.
Fixes: f750dfe825b9 ("ethtool: provide customized dim profile management") Reported-by: syzbot+e77327e34cdc8c36b7d3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e77327e34cdc8c36b7d3 Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 Jul 2024 10:23:22 +0000 (11:23 +0100)]
Merge branch 'bnxt_en-ptp' into main
Michael Chan says:
====================
bnxt_en: PTP updates for net-next
The first 5 patches implement the PTP feature on the new BCM5760X
chips. The main new hardware feature is the new TX timestamp
completion which enables the driver to retrieve the TX timestamp
in NAPI without deferring to the PTP worker.
The last 5 patches increase the number of TX PTP packets in-flight
from 1 to 4 on the older BCM5750X chips. On these older chips, we
need to call firmware in the PTP worker to retrieve the timestamp.
We use an arry to keep track of the in-flight TX PTP packets.
v2: Patch #2: Fix the unwind of txr->is_ts_pkt when bnxt_start_xmit() aborts.
Patch #4: Set the SKBTX_IN_PROGRESS flag for timestamp packets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Fri, 28 Jun 2024 19:30:05 +0000 (12:30 -0700)]
bnxt_en: Remove atomic operations on ptp->tx_avail
Now that we require the spinlock to protect ptp->txts_prod, change
ptp->tx_avail to non-atomic and protect it under the same spinlock.
Add a new helper function bnxt_ptp_get_txts_prod() to decrement
ptp->tx_avail under spinlock and return the producer.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Fri, 28 Jun 2024 19:30:04 +0000 (12:30 -0700)]
bnxt_en: Increase the max total outstanding PTP TX packets to 4
Start accepting up to 4 TX TS requests on BCM5750X (P5) chips.
These PTP TX packets will be queued in the ptp->txts_req[] array
waiting for the TX timestamp to complete. The entries in the
array will be managed by a producer and consumer index. The
producer index is updated under spinlock since multiple TX rings
can try to send PTP packets at the same time.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Fri, 28 Jun 2024 19:30:03 +0000 (12:30 -0700)]
bnxt_en: Let bnxt_stamp_tx_skb() return error code
Change the function bnxt_stamp_tx_skb() to return 0 for suceess
or -EAGAIN if the timestamp is still pending in firmware. The
calling PTP aux worker will reschedule based on the return code.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Fri, 28 Jun 2024 19:30:02 +0000 (12:30 -0700)]
bnxt_en: Remove an impossible condition check for PTP TX pending SKB
In the current 5750X PTP code paths, there is always at most one TX
SKB requested for timestamp and we won't accept another one until we
have retrieved the timestamp or it has timed out. Remove the
unnecessary check in bnxt_get_tx_ts_p5() for a pending SKB and change
the function to void.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Fri, 28 Jun 2024 19:30:01 +0000 (12:30 -0700)]
bnxt_en: Refactor all PTP TX timestamp fields into a struct
On the older 5750X (P5) chips, we currently support only 1 TX PTP
packet in-flight waiting for the timestamp. Refactor the
datastructures to prepare to support up to 4 TX PTP packets.
Combine all fields required for PTP TX timestamp query into one
structure. An array of this structure will be added in follow-on
patches to support multiple outstanding TX timestamps.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Fri, 28 Jun 2024 19:30:00 +0000 (12:30 -0700)]
bnxt_en: Add BCM5760X specific PHC registers mapping
BCM5760X firmware will advertise direct 64-bit PHC registers access
for the driver from BAR0.
Make the necessary changes in handling HWRM_PORT_MAC_PTP_QCFG's
response and PHC register mapping for 5760X chips.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 28 Jun 2024 19:29:59 +0000 (12:29 -0700)]
bnxt_en: Add TX timestamp completion logic
The new BCM5760X chips will return the timestamp of TX packets in a
new completion. Add logic in __bnxt_poll_work() to handle this
completion type to retrieve the timestamp. This feature eliminates
the limit on the number of in-flight PTP TX packets.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 28 Jun 2024 19:29:58 +0000 (12:29 -0700)]
bnxt_en: Allow some TX packets to be unprocessed in NAPI
The driver's current logic will always free all the TX SKBs up to
txr->tx_hw_cons within NAPI. In the next patches, we'll be adding
logic to handle TX timestamp completion and we may need to hold
some remaining TX SKBs if we don't have the timestamp completions
yet.
Modify __bnxt_poll_work_done() to clear each event bit separately to
allow bnapi->tx_int() to decide whether to clear BNXT_TX_CMP_EVENT or
not. bnapi->tx_int() will not clear BNXT_TX_CMP_EVENT if some TX
SKBs are held waiting for TX timestamps. Note that legacy chips will
never hold any SKBs this way. The SKB is always deferred to the PTP
worker slow path to retrieve the timestamp from firmware. On the new
P7 chips, the timestamp is returned by the hardware directly and we
can retrieve it directly from NAPI.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 28 Jun 2024 19:29:57 +0000 (12:29 -0700)]
bnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bd
Remove the unused is_gso field and add the is_ts_pkt field to struct
bnxt_sw_tx_bd. This field will mark the TX BD that has requested
HW TX timestamp. The field needs to be cleared if the timestamp packet
is later aborted. This field will be useful when processing the
new TX timestamp completion from the hardware in the next patches.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 28 Jun 2024 19:29:56 +0000 (12:29 -0700)]
bnxt_en: Add new TX timestamp completion definitions
The new BCM5760X chips will generate this new TX timestamp completion
when a TX packet's timestamp has been taken right before transmission.
The driver logic to retrieve the timestamp will be added in the next
few patches.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Dabilpuram [Fri, 28 Jun 2024 16:31:26 +0000 (22:01 +0530)]
octeontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAM
Octeontx2 hardware uses Near Data Cache(NDC) block to cache
contexts in it so that access to LLC/DRAM can be avoided.
It is recommended in HRM to sync the NDC contents before
releasing/resetting LF resources. Hence implement NDC_SYNC
mailbox and sync contexts during driver teardown.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
FUJITA Tomonori [Fri, 28 Jun 2024 13:41:16 +0000 (22:41 +0900)]
net: tn40xx: add initial ethtool_ops support
Call phylink_ethtool_ksettings_get() for get_link_ksettings method and
ethtool_op_get_link() for get_link method.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
struct nft_trans_elem can now be allocated from kmalloc-96 instead of
kmalloc-128 slab.
Series from Florian Westphal. For the record, I have mangled patch #1
to add nft_trans_container_*() and use if for every transaction object.
I have also added BUILD_BUG_ON to ensure struct nft_trans always comes
at the beginning of the container transaction object. And few minor
cleanups, any new bugs are of my own.
Patch #12 simplify check for SCTP GSO in IPVS, from Ismael Luceno.
Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang.
Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting
default conntrack timeouts via nfnetlink_cttimeout API, from
Lin Ma.
Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use
larger secctx names than the existing 256 bytes length.
Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving
nfnetlink_queue, from Florian Westphal.
Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 Jul 2024 08:44:27 +0000 (09:44 +0100)]
Merge branch 'tcp_metrics-netlink-specs' into main
Jakub Kicinski says:
====================
tcp_metrics: add netlink protocol spec in YAML
Add a netlink protocol spec for the tcp_metrics generic netlink family.
First patch adjusts the uAPI header guards to make it easier to build
tools/ with non-system headers.
Jakub Kicinski [Thu, 27 Jun 2024 21:35:51 +0000 (14:35 -0700)]
tcp_metrics: add netlink protocol spec in YAML
Add a protocol spec for tcp_metrics, so that it's accessible via YNL.
Useful at the very least for testing fixes.
In this episode of "10,000 ways to complicate netlink" the metric
nest has defines which are off by 1. iproute2 does:
struct rtattr *m[TCP_METRIC_MAX + 1 + 1];
parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a);
for (i = 0; i < TCP_METRIC_MAX + 1; i++) {
// ...
attr = m[i + 1];
This is too weird to support in YNL, add a new set of defines
with _correct_ values to the official kernel header.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 27 Jun 2024 21:35:50 +0000 (14:35 -0700)]
tcp_metrics: add UAPI to the header guard
tcp_metrics' header lacks the customary _UAPI in the header guard.
This makes YNL build rules work less seamlessly.
We can easily fix that on YNL side, but this could also be
problematic if we ever needed to create a kernel-only tcp_metrics.h.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2024 20:42:17 +0000 (22:42 +0200)]
net: phy: realtek: Add support for PHY LEDs on RTL8211F
Realtek RTL8211F Ethernet PHY supports 3 LED pins which are used to
indicate link status and activity. Add minimal LED controller driver
supporting the most common uses with the 'netdev' trigger.
Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ethtool: track custom RSS contexts in the core
Make the core responsible for tracking the set of custom RSS contexts,
their IDs, indirection tables, hash keys, and hash functions; this
lets us get rid of duplicative code in drivers, and will allow us to
support netlink dumps later.
This series only moves the sfc EF10 & EF100 driver over to the new API;
other drivers (mvpp2, octeontx2, mlx5, sfc/siena, bnxt_en) can be converted
afterwards and the legacy API removed.
====================
Edward Cree [Thu, 27 Jun 2024 15:33:53 +0000 (16:33 +0100)]
net: ethtool: use the tracking array for get_rxfh on custom RSS contexts
On 'ethtool -x' with rss_context != 0, instead of calling the driver to
read the RSS settings for the context, just get the settings from the
rss_ctx xarray, and return them to the user with no driver involvement.
Edward Cree [Thu, 27 Jun 2024 15:33:52 +0000 (16:33 +0100)]
sfc: use new rxfh_context API
The core is now responsible for allocating IDs and a memory region for
us to store our state (struct efx_rss_context_priv), so we no longer
need efx_alloc_rss_context_entry() and friends.
Since the contexts are now maintained by the core, use the core's lock
(net_dev->ethtool->rss_lock), rather than our own mutex (efx->rss_lock),
to serialise access against changes; and remove the now-unused
efx->rss_lock from struct efx_nic.
Edward Cree [Thu, 27 Jun 2024 15:33:51 +0000 (16:33 +0100)]
net: ethtool: add a mutex protecting RSS contexts
While this is not needed to serialise the ethtool entry points (which
are all under RTNL), drivers may have cause to asynchronously access
dev->ethtool->rss_ctx; taking dev->ethtool->rss_lock allows them to
do this safely without needing to take the RTNL.
Edward Cree [Thu, 27 Jun 2024 15:33:49 +0000 (16:33 +0100)]
net: ethtool: let the core choose RSS context IDs
Add a new API to create/modify/remove RSS contexts, that passes in the
newly-chosen context ID (not as a pointer) rather than leaving the
driver to choose it on create. Also pass in the ctx, allowing drivers
to easily use its private data area to store their hardware-specific
state.
Keep the existing .set_rxfh API for now as a fallback, but deprecate it
for custom contexts (rss_context != 0).
Edward Cree [Thu, 27 Jun 2024 15:33:48 +0000 (16:33 +0100)]
net: ethtool: record custom RSS contexts in the XArray
Since drivers are still choosing the context IDs, we have to force the
XArray to use the ID they've chosen rather than picking one ourselves,
and handle the case where they give us an ID that's already in use.
Edward Cree [Thu, 27 Jun 2024 15:33:47 +0000 (16:33 +0100)]
net: ethtool: attach an XArray of custom RSS contexts to a netdevice
Each context stores the RXFH settings (indir, key, and hfunc) as well
as optionally some driver private data.
Delete any still-existing contexts at netdev unregister time.
Jakub Kicinski [Thu, 27 Jun 2024 18:55:01 +0000 (11:55 -0700)]
selftests: drv-net: add ability to schedule cleanup with defer()
This implements what I was describing in [1]. When writing a test
author can schedule cleanup / undo actions right after the creation
completes, eg:
cmd("touch /tmp/file")
defer(cmd, "rm /tmp/file")
defer() takes the function name as first argument, and the rest are
arguments for that function. defer()red functions are called in
inverse order after test exits. It's also possible to capture them
and execute earlier (in which case they get automatically de-queued).
As a nice safety all exceptions from defer()ed calls are captured,
printed, and ignored (they do make the test fail, however).
This addresses the common problem of exceptions in cleanup paths
often being unhandled, leading to potential leaks.
There is a global action queue, flushed by ksft_run(). We could support
function level defers too, I guess, but there's no immediate need..
Jakub Kicinski [Thu, 27 Jun 2024 18:55:00 +0000 (11:55 -0700)]
selftests: net: ksft: avoid continue when handling results
Exception handlers print the result and use continue
to skip the non-exception result printing. This makes
inserting common post-test code hard. Refactor to
avoid the continues and have only one ktap_result() call.
Jon Kohler [Thu, 27 Jun 2024 20:20:13 +0000 (13:20 -0700)]
enic: add ethtool get_channel support
Add .get_channel to enic_ethtool_ops to enable basic ethtool -l
support to get the current channel configuration.
Note that the driver does not support dynamically changing queue
configuration, so .set_channel is intentionally unused. Instead, users
should use Cisco's hardware management tools (UCSM/IMC) to modify
virtual interface card configuration out of band.
====================
Lift UDP_SEGMENT restriction for egress via device w/o csum offload
This is a follow-up to an earlier question [1] if we can make UDP GSO work
with any egress device, even those with no checksum offload capability.
That's the default setup for TUN/TAP.
Because there is a change in behavior - sendmsg() does no longer return
EIO error - I'm submitting through net-next tree, rather than net,
as per Willem's advice.
Jakub Sitnicki [Wed, 26 Jun 2024 17:51:27 +0000 (19:51 +0200)]
selftests/net: Add test coverage for UDP GSO software fallback
Extend the existing test to exercise UDP GSO egress through devices with
various offload capabilities, including lack of checksum offload, which is
the default case for TUN/TAP devices.
Test against a dummy device because it is simpler to set up then TUN/TAP.
This is due to a check in the udp stack if the egress device offers
checksum offload. While TUN/TAP devices, by default, don't advertise this
capability because it requires support from the TUN/TAP reader.
However, the GSO stack has a software fallback for checksum calculation,
which we can use. This way we don't force UDP_SEGMENT users to handle the
EIO error and implement a segmentation fallback.
Lift the restriction so that UDP_SEGMENT can be used with any egress
device. We also need to adjust the UDP GSO code to match the GSO stack
expectation about ip_summed field, as set in commit 8d63bee643f1 ("net:
avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
the bad offload check.
Users should, however, expect a potential performance impact when
batch-sending packets with UDP_SEGMENT without checksum offload on the
egress device. In such case the packet payload is read twice: first during
the sendmsg syscall when copying data from user memory, and then in the GSO
stack for checksum computation. This double memory read can be less
efficient than a regular sendmsg where the checksum is calculated during
the initial data copy from user memory.
Marc Kleine-Budde [Fri, 28 Jun 2024 21:49:37 +0000 (23:49 +0200)]
Merge patch series "can: mcp251xfd: workaround for erratum DS80000789E 6 of mcp2518fd"
Marc Kleine-Budde <mkl@pengutronix.de> says:
This patch series tries to work around erratum DS80000789E 6 of the
mcp2518fd, found by Stefan Althöfer, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.
Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.
In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value.
For the RX FIDO this caused old, already processed CAN frames or new,
incompletely written CAN frames to be (re-)processed.
To work around this issue, keep a per FIFO timestamp of the last valid
received CAN frame and compare against the timestamp of every received
CAN frame.
Further tests showed that this workaround can recognize old CAN
frames, but a small time window remains in which partially written CAN
frames are not recognized but then processed. These CAN frames have
the correct data and time stamps, but the DLC has not yet been
updated.
For the TEF FIFO the original driver already detects the error, update
the error handling with the knowledge that it is causes by this erratum.
Marc Kleine-Budde [Sun, 22 Jan 2023 21:35:03 +0000 (22:35 +0100)]
can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd
This patch updates the workaround for a problem similar to erratum
DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.
Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.
In the bad case, the driver reads a too large head index. As the FIFO
is implemented as a ring buffer, this results in re-handling old CAN
transmit complete events.
Every transmit complete event contains with a sequence number that
equals to the sequence number of the corresponding TX request. This
way old TX complete events can be detected.
If the original driver detects a non matching sequence number, it
prints an info message and tries again later. As wrong sequence
numbers can be explained by the erratum DS80000789E 6, demote the info
message to debug level, streamline the code and update the comments.
Keep the behavior: If an old CAN TX complete event is detected, abort
the iteration and mark the number of valid CAN TX complete events as
processed in the chip by incrementing the FIFO's tail index.
Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com> Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sun, 22 Jan 2023 20:30:41 +0000 (21:30 +0100)]
can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum
This is a preparatory patch to work around a problem similar to
erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.
Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.
When handling the TEF interrupt, the driver reads the FIFO header
index from the TEF FIFO STA register of the chip.
In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old CAN transmit complete events that were already processed to be
re-processed.
Instead of reading and trusting the head index, read the head index
and calculate the number of CAN frames that were supposedly received -
replace mcp251xfd_tef_ring_update() with mcp251xfd_get_tef_len().
The mcp251xfd_handle_tefif() function reads the CAN transmit complete
events from the chip, iterates over them and pushes them into the
network stack. The original driver already contains code to detect old
CAN transmit complete events, that will be updated in the next patch.
Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com> Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Wed, 11 Jan 2023 10:53:50 +0000 (11:53 +0100)]
can: mcp251xfd: rx: add workaround for erratum DS80000789E 6 of mcp2518fd
This patch tries to works around erratum DS80000789E 6 of the
mcp2518fd, the other variants of the chip family (mcp2517fd and
mcp251863) are probably also affected.
In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old, already processed CAN frames or new, incompletely written CAN
frames to be (re-)processed.
To work around this issue, keep a per FIFO timestamp [1] of the last
valid received CAN frame and compare against the timestamp of every
received CAN frame. If an old CAN frame is detected, abort the
iteration and mark the number of valid CAN frames as processed in the
chip by incrementing the FIFO's tail index.
Further tests showed that this workaround can recognize old CAN
frames, but a small time window remains in which partially written CAN
frames [2] are not recognized but then processed. These CAN frames
have the correct data and time stamps, but the DLC has not yet been
updated.
[1] As the raw timestamp overflows every 107 seconds (at the usual
clock rate of 40 MHz) convert it to nanoseconds with the
timecounter framework and use this to detect stale CAN frames.
Link: https://lore.kernel.org/all/BL3PR11MB64844C1C95CA3BDADAE4D8CCFBC99@BL3PR11MB6484.namprd11.prod.outlook.com Reported-by: Stefan Althöfer <Stefan.Althoefer@janztec.com> Closes: https://lore.kernel.org/all/FR0P281MB1966273C216630B120ABB6E197E89@FR0P281MB1966.DEUP281.PROD.OUTLOOK.COM Tested-by: Stefan Althöfer <Stefan.Althoefer@janztec.com> Tested-by: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Wed, 11 Jan 2023 20:07:03 +0000 (21:07 +0100)]
can: mcp251xfd: rx: prepare to workaround broken RX FIFO head index erratum
This is a preparatory patch to work around erratum DS80000789E 6 of
the mcp2518fd, the other variants of the chip family (mcp2517fd and
mcp251863) are probably also affected.
When handling the RX interrupt, the driver iterates over all pending
FIFOs (which are implemented as ring buffers in hardware) and reads
the FIFO header index from the RX FIFO STA register of the chip.
In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old CAN frames that were already processed, or new, incompletely
written CAN frames to be (re-)processed.
Instead of reading and trusting the head index, read the head index
and calculate the number of CAN frames that were supposedly received -
replace mcp251xfd_rx_ring_update() with mcp251xfd_get_rx_len().
The mcp251xfd_handle_rxif_ring() function reads the received CAN
frames from the chip, iterates over them and pushes them into the
network stack. Prepare that the iteration can be stopped if an old CAN
frame is detected. The actual code to detect old or incomplete frames
and abort will be added in the next patch.
Link: https://lore.kernel.org/all/BL3PR11MB64844C1C95CA3BDADAE4D8CCFBC99@BL3PR11MB6484.namprd11.prod.outlook.com Reported-by: Stefan Althöfer <Stefan.Althoefer@janztec.com> Closes: https://lore.kernel.org/all/FR0P281MB1966273C216630B120ABB6E197E89@FR0P281MB1966.DEUP281.PROD.OUTLOOK.COM Tested-by: Stefan Althöfer <Stefan.Althoefer@janztec.com> Tested-by: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Wed, 11 Jan 2023 19:25:48 +0000 (20:25 +0100)]
can: mcp251xfd: mcp251xfd_handle_rxif_ring_uinc(): factor out in separate function
This is a preparation patch.
Sending the UINC messages followed by incrementing the tail pointer
will be called in more than one place in upcoming patches, so factor
this out into a separate function.
Also make mcp251xfd_handle_rxif_ring_uinc() safe to be called with a
"len" of 0.
Tested-by: Stefan Althöfer <Stefan.Althoefer@janztec.com> Tested-by: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Wed, 11 Jan 2023 10:48:16 +0000 (11:48 +0100)]
can: mcp251xfd: clarify the meaning of timestamp
The mcp251xfd chip is configured to provide a timestamp with each
received and transmitted CAN frame. The timestamp is derived from the
internal free-running timer, which can also be read from the TBC
register via SPI. The timer is 32 bits wide and is clocked by the
external oscillator (typically 20 or 40 MHz).
To avoid confusion, we call this timestamp "timestamp_raw" or "ts_raw"
for short.
Using the timecounter framework, the "ts_raw" is converted to 64 bit
nanoseconds since the epoch. This is what we call "timestamp".
This is a preparation for the next patches which use the "timestamp"
to work around a bug where so far only the "ts_raw" is used.
Tested-by: Stefan Althöfer <Stefan.Althoefer@janztec.com> Tested-by: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Wed, 11 Jan 2023 11:10:04 +0000 (12:10 +0100)]
can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()
The mcp251xfd wakes up from Low Power or Sleep Mode when SPI activity
is detected. To avoid this, make sure that the timestamp worker is
stopped before shutting down the chip.
Split the starting of the timestamp worker out of
mcp251xfd_timestamp_init() into the separate function
mcp251xfd_timestamp_start().
Call mcp251xfd_timestamp_init() before mcp251xfd_chip_start(), move
mcp251xfd_timestamp_start() to mcp251xfd_chip_start(). In this way,
mcp251xfd_timestamp_stop() can be called unconditionally by
mcp251xfd_chip_stop().
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sun, 5 May 2024 12:40:10 +0000 (14:40 +0200)]
can: mcp251xfd: update errata references
Since the errata references have been added to the driver, new errata
sheets have been published. Update the references for the mcp2517fd
and mcp2518fd. For completeness add references for the mcp251863.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Przemek Kitszel [Mon, 17 Jun 2024 13:24:07 +0000 (15:24 +0200)]
ice: do not init struct ice_adapter more times than needed
Allocate and initialize struct ice_adapter object only once per physical
card instead of once per port. This is not a big deal by now, but we want
to extend this struct more and more in the near future. Our plans include
PTP stuff and a devlink instance representing whole-device/physical card.
Transactions requiring to be sleep-able (like those doing user (here ice)
memory allocation) must be performed with an additional (on top of xarray)
mutex. Adding it here removes need to xa_lock() manually.
Since this commit is a reimplementation of ice_adapter_get(), a rather new
scoped_guard() wrapper for locking is used to simplify the logic.
It's worth to mention that xa_insert() use gives us both slot reservation
and checks if it is already filled, what simplifies code a tiny bit.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Piotr Gardocki [Fri, 14 Jun 2024 10:38:11 +0000 (12:38 +0200)]
ice: Distinguish driver reset and removal for AQ shutdown
Admin queue command for shutdown AQ contains a flag to indicate driver
unload. However, the flag is always set in the driver, even for resets. It
can cause the firmware to consider driver as unloaded once the PF reset is
triggered on all ports of device, which could lead to unexpected results.
Add an additional function parameter to functions that shutdown AQ,
indicating whether the driver is actually unloading.
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Paul Greenwalt [Thu, 6 Jun 2024 13:48:46 +0000 (09:48 -0400)]
ice: Allow different FW API versions based on MAC type
Allow the driver to be compatible with different FW API versions based
on the device's MAC type. Currently, E810 is only compatible with one
FW API version. Now the driver can be compatible with different FW API
versions for both E810 and E830. For example, E810 FW API version is
1.5.0 and E830 is 1.7.0.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Phil Sutter [Wed, 26 Jun 2024 22:35:05 +0000 (00:35 +0200)]
netfilter: xt_recent: Lift restrictions on max hitcount value
Support tracking of up to 65535 packets per table entry instead of just
255 to better facilitate longer term tracking or higher throughput
scenarios.
Note how this aligns sizes of struct recent_entry's 'nstamps' and
'index' fields when 'nstamps' was larger before. This is unnecessary as
the value of 'nstamps' grows along with that of 'index' after being
initialized to 1 (see recent_entry_update()). Its value will thus never
exceed that of 'index' and therefore does not need to provide space for
larger values.
Florian Westphal [Tue, 25 Jun 2024 19:07:44 +0000 (21:07 +0200)]
selftests: netfilter: nft_queue.sh: add test for disappearing listener
If userspace program exits while the queue its subscribed to has packets
those need to be discarded.
commit dc21c6cc3d69 ("netfilter: nfnetlink_queue: acquire rcu_read_lock()
in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be
triggered in this case.
Add a test case to cover this.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Fri, 28 Jun 2024 09:55:38 +0000 (10:55 +0100)]
Merge branch 'net-selftests-mirroring-cleanup' into main
Petr Machata says:
====================
selftest: Clean-up and stabilize mirroring tests
The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.
The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.
As a result, the selftests are noisy.
mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But that only works up to a point, and on
busy systems won't be always enough.
In this patch set, clean up and stabilize the mirroring selftests. The
original intention was to port the tests over to UDP, but the logic of
ICMP ends up being so entangled in the mirroring selftests that the
changes feel overly invasive. Instead, ICMP is kept, but where possible,
we match on ICMP message type, thus filtering out hits by other ICMP
messages.
Where this is not practical (where the counter tap is put on a device
that carries encapsulated packets), switch the counter condition to _at
least_ X observed packets. This is less robust, but barely so --
probably the only scenario that this would not catch is something like
erroneous packet duplication, which would hopefully get caught by the
numerous other tests in this extensive suite.
- Patches #1 to #3 clean up parameters at various helpers.
- Patches #4 to #6 stabilize the mirroring selftests as described above.
- Mirroring tests currently allow testing SW datapath even on HW
netdevices by trapping traffic to the SW datapath. This complicates
the tests a bit without a good reason: to test SW datapath, just run
the selftests on the veth topology. Thus in patch #7, drop support for
this dual SW/HW testing.
- At this point, some cleanups were either made possible by the previous
patches, or were always possible. In patches #8 to #11, realize these
cleanups.
- In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 27 Jun 2024 14:48:49 +0000 (16:48 +0200)]
selftests: mlxsw: mirror_gre: Obey TESTS
This test is unusual in that overriding TESTS does not change the tests to
be run. Split the individual tests into several functions and invoke them
through tests_run() as appropriate.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 27 Jun 2024 14:48:48 +0000 (16:48 +0200)]
selftests: libs: Drop unused functions
Nothing calls these.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 27 Jun 2024 14:48:47 +0000 (16:48 +0200)]
selftests: libs: Drop slow_path_trap_install()/_uninstall()
These functions are not used anymore.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 27 Jun 2024 14:48:46 +0000 (16:48 +0200)]
selftests: mirror_gre_lag_lacp: Drop unnecessary code
The selftest does not use functions from mirror_gre_lib, ditch the import.
It does not use arping either, so drop the require_command as well.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 27 Jun 2024 14:48:45 +0000 (16:48 +0200)]
selftests: mlxsw: mirror_gre: Simplify
After the previous patch, the function test_span_failable() is always
called with should_fail=1. Drop the argument and streamline the code.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>