Sebastian Andrzej Siewior [Fri, 6 Sep 2024 13:25:31 +0000 (15:25 +0200)]
net: hsr: Use the seqnr lock for frames received via interlink port.
syzbot reported that the seqnr_lock is not acquire for frames received
over the interlink port. In the interlink case a new seqnr is generated
and assigned to the frame.
Frames, which are received over the slave port have already a sequence
number assigned so the lock is not required.
Acquire the hsr_priv::seqnr_lock during in the invocation of
hsr_forward_skb() if a packet has been received from the interlink port.
Reported-by: syzbot+3d602af7549af539274e@syzkaller.appspotmail.com Closes: https://groups.google.com/g/syzkaller-bugs/c/KppVvGviGg4/m/EItSdCZdBAAJ Fixes: 5055cccfc2d1c ("net: hsr: Provide RedBox support (HSR-SAN)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lukasz Majewski <lukma@denx.de> Tested-by: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20240906132816.657485-2-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Sep 2024 20:46:56 +0000 (13:46 -0700)]
Merge tag 'wireless-next-2024-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.12
The last -next "new features" pull request for v6.12. The stack now
supports DFS on MLO but otherwise nothing really standing out.
Major changes:
cfg80211/mac80211
* EHT rate support in AQL airtime
* DFS support for MLO
rtw89
* complete BT-coexistence code for RTL8852BT
* RTL8922A WoWLAN net-detect support
* tag 'wireless-next-2024-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits)
wifi: brcmfmac: cfg80211: Convert comma to semicolon
wifi: rsi: Remove an unused field in struct rsi_debugfs
wifi: libertas: Cleanup unused declarations
wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_bus_probe()
wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_sdio_probe()
wifi: wilc1000: fix potential RCU dereference issue in wilc_parse_join_bss_param
wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext()
wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop()
wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors
wifi: cfg80211: fix kernel-doc for per-link data
wifi: mt76: mt7925: replace chan config with extend txpower config for clc
wifi: mt76: mt7925: fix a potential array-index-out-of-bounds issue for clc
wifi: mt76: mt7615: check devm_kasprintf() returned value
wifi: mt76: mt7925: convert comma to semicolon
wifi: mt76: mt7925: fix a potential association failure upon resuming
wifi: mt76: Avoid multiple -Wflex-array-member-not-at-end warnings
wifi: mt76: mt7921: Check devm_kasprintf() returned value
wifi: mt76: mt7915: check devm_kasprintf() returned value
wifi: mt76: mt7915: avoid long MCU command timeouts during SER
wifi: mt76: mt7996: fix uninitialized TLV data
...
====================
David S. Miller [Wed, 11 Sep 2024 10:06:12 +0000 (11:06 +0100)]
Merge branch 'lan743x-phylink'
Raju Lakkaraju says:
====================
Add support to PHYLINK for LAN743x/PCI11x1x chips
This is the follow-up patch series of
https://lkml.iu.edu/hypermail/linux/kernel/2310.2/02078.html
Divide the PHYLINK adaptation and SFP modifications into two separate patch
series.
The current patch series focuses on transitioning the LAN743x driver's PHY
support from phylib to phylink.
Tested on PCI11010 Rev-1 Evaluation board
Change List:
============
V5 -> V6:
- Remove the lan743x_find_max_speed( ) function. Not require
- Add EEE enable check before calling lan743x_mac_eee_enable( ) function
V4 -> V5:
- Remove the fixed_phy_unregister( ) function. Not require
- Remove the "phydev->eee_enabled" check to update the MAC EEE
enable/disable
- Call lan743x_mac_eee_enable() with true after update tx_lpi_timer.
- Add phy_support_eee() to initialize the EEE flags
V3 -> V4:
- Add fixed-link patch along with this series.
Note: Note: This code was developed by Mr.Russell King
Ref:
https://lore.kernel.org/netdev/LV8PR11MB8700C786F5F1C274C73036CC9F8E2@LV8PR11MB8700.namprd11.prod.outlook.com/T/#me943adf54f1ea082edf294aba448fa003a116815
- Change phylink fixed-link function header's string from "Returns" to
"Returns:"
- Remove the EEE private variable from LAN743x adapter strcture and fix the
EEE's set/get functions
- set the individual caps (i.e. _RGMII, _RGMII_ID, _RGMII_RXID and
__RGMII_TXID) replace with phy_interface_set_rgmii( ) function
- Change lan743x_set_eee( ) to lan743x_mac_eee_enable( )
V2 -> V3:
- Remove the unwanted parens in each of these if() sub-blocks
- Replace "to_net_dev(config->dev)" with "netdev".
- Add GMII_ID/RGMII_TXID/RGMII_RXID in supported_interfaces
- Fix the lan743x_phy_handle_exists( ) return type
V1 -> V2:
- Fix the Russell King's comments i.e. remove the speed, duplex update in
lan743x_phylink_mac_config( )
- pre-March 2020 legacy support has been removed
V0 -> V1:
- Integrate with Synopsys DesignWare XPCS drivers
- Based on external review comments,
- Changes made to SGMII interface support only 1G/100M/10M bps speed
- Changes made to 2500Base-X interface support only 2.5Gbps speed
- Add check for not is_sgmii_en with is_sfp_support_en support
- Change the "pci11x1x_strap_get_status" function return type from void to
int
- Add ethtool phylink wol, eee, pause get/set functions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: lan743x: Add support to ethtool phylink get and set settings
Add support to ethtool phylink functions:
- get/set settings like speed, duplex etc
- get/set the wake-on-lan (WOL)
- get/set the energy-efficient ethernet (EEE)
- get/set the pause
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: lan743x: Create separate Link Speed Duplex state function
Create separate Link Speed Duplex (LSD) update state function from
lan743x_sgmii_config () to use as subroutine.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: lan743x: Create separate PCS power reset function
Create separate PCS power reset function from lan743x_sgmii_config () to use
as subroutine.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 6 Sep 2024 10:35:07 +0000 (16:05 +0530)]
net: phylink: Add phylink_set_fixed_link() to configure fixed link state in phylink
The function allows for the configuration of a fixed link state for a given
phylink instance. This addition is particularly useful for network devices that
operate with a fixed link configuration, where the link parameters do not change
dynamically. By using `phylink_set_fixed_link()`, drivers can easily set up
the fixed link state during initialization or configuration changes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 11 Sep 2024 03:05:09 +0000 (20:05 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: support devlink subfunction
Michal Swiatkowski says:
Currently ice driver does not allow creating more than one networking
device per physical function. The only way to have more hardware backed
netdev is to use SR-IOV.
Following patchset adds support for devlink port API. For each new
pcisf type port, driver allocates new VSI, configures all resources
needed, including dynamically MSIX vectors, program rules and registers
new netdev.
This series supports only one Tx/Rx queue pair per subfunction.
Example commands:
devlink port add pci/0000:31:00.1 flavour pcisf pfnum 1 sfnum 1000
devlink port function set pci/0000:31:00.1/1 hw_addr 00:00:00:00:03:14
devlink port function set pci/0000:31:00.1/1 state active
devlink port function del pci/0000:31:00.1/1
Make the port representor and eswitch code generic to support
subfunction representor type.
VSI configuration is slightly different between VF and SF. It needs to
be reflected in the code.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: subfunction activation and base devlink ops
ice: basic support for VLAN in subfunctions
ice: support subfunction devlink Tx topology
ice: implement netdevice ops for SF representor
ice: check if SF is ready in ethtool ops
ice: don't set target VSI for subfunction
ice: create port representor for SF
ice: make representor code generic
ice: implement netdev for subfunction
ice: base subfunction aux driver
ice: allocate devlink for subfunction
ice: treat subfunction VSI the same as PF VSI
ice: add basic devlink subfunctions support
ice: export ice ndo_ops functions
ice: add new VSI type for subfunctions
====================
Jakub Kicinski [Wed, 11 Sep 2024 03:01:15 +0000 (20:01 -0700)]
Merge tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2024-08-29
HW-Managed Flow Steering in mlx5 driver
Yevgeny Kliteynik says:
=======================
1. Overview
-----------
ConnectX devices support packet matching, modification, and redirection.
This functionality is referred as Flow Steering.
To configure a steering rule, the rule is written to the device-owned
memory. This memory is accessed and cached by the device when processing
a packet.
The first implementation of Flow Steering was done in FW, and it is
referred in the mlx5 driver as Device-Managed Flow Steering (DMFS).
Later we introduced SW-managed Flow Steering (SWS or SMFS), where the
driver is writing directly to the device's configuration memory (ICM)
through RC QP using RDMA operations (RDMA-read and RDAM-write), thus
achieving higher rates of rule insertion/deletion.
Now we introduce a new flow steering implementation: HW-Managed Flow
Steering (HWS or HMFS).
In this new approach, the driver is configuring steering rules directly
to the HW using the WQs with a special new type of WQE. This way we can
reach higher rule insertion/deletion rate with much lower CPU utilization
compared to SWS.
The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
- HW calculates CRC for each entry
- HW handles tree hash collisions
- HW & FW manage objects refcount
+ HW keeps cache coherency:
- HW provides tree access locking and synchronization
- HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
- Dedicated HW components that handle insertion
2. Performance
--------------
Measuring Connection Tracking with simple IPv4 flows w/o NAT, we
are able to get ~5 times more flows offloaded per second using HWS.
3. Configuration
----------------
The enablement of HWS mode in eswitch manager is done using the same
devlink param that is already used for switching between FW-managed
steering and SW-managed steering modes:
# devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs
4. Upstream Submission
----------------------
HWS support consists of 3 main components:
+ Steering:
- The lower layer that exposes HWS API to upper layers and implements
all the management of flow steering building blocks
+ FS-Core
- Implementation of fs_hws layer to enable fs_core to use HWS instead
of FW or SW steering
- Create HW steering action pools to utilize the ability of HWS to
share steering actions among different rules
- Add support for configuring HWS mode through devlink command,
similar to configuring SWS mode
+ Connection Tracking
- Implementation of CT support for HW steering
- Hooks up the CT ops for the new steering mode and uses the HWS API
to implement connection tracking.
Because of the large number of patches, we need to perform the submission
in several separate patch series. This series is the first submission that
lays the ground work for the next submissions, where an actual user of HWS
will be added.
5. Patches in this series
-------------------------
This patch series contains implementation of the first bullet from above.
=======================
* tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: HWS, added API and enabled HWS support
net/mlx5: HWS, added send engine and context handling
net/mlx5: HWS, added debug dump and internal headers
net/mlx5: HWS, added backward-compatible API handling
net/mlx5: HWS, added memory management handling
net/mlx5: HWS, added vport handling
net/mlx5: HWS, added modify header pattern and args handling
net/mlx5: HWS, added FW commands handling
net/mlx5: HWS, added matchers functionality
net/mlx5: HWS, added definers handling
net/mlx5: HWS, added rules handling
net/mlx5: HWS, added tables handling
net/mlx5: HWS, added actions handling
net/mlx5: Added missing definitions in preparation for HW Steering
net/mlx5: Added missing mlx5_ifc definition for HW Steering
====================
1) Remove an unneeded WARN_ON on packet offload.
From Patrisious Haddad.
2) Add a copy from skb_seq_state to buffer function.
This is needed for the upcomming IPTFS patchset.
From Christian Hopps.
3) Spelling fix in xfrm.h.
From Simon Horman.
4) Speed up xfrm policy insertions.
From Florian Westphal.
5) Add and revert a patch to support xfrm interfaces
for packet offload. This patch was just half cooked.
6) Extend usage of the new xfrm_policy_is_dead_or_sk helper.
From Florian Westphal.
7) Update comments on sdb and xfrm_policy.
From Florian Westphal.
8) Fix a null pointer dereference in the new policy insertion
code From Florian Westphal.
9) Fix an uninitialized variable in the new policy insertion
code. From Nathan Chancellor.
* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
xfrm: policy: fix null dereference
Revert "xfrm: add SA information to the offloaded packet"
xfrm: minor update to sdb and xfrm_policy comments
xfrm: policy: use recently added helper in more places
xfrm: add SA information to the offloaded packet
xfrm: policy: remove remaining use of inexact list
xfrm: switch migrate to xfrm_policy_lookup_bytype
xfrm: policy: don't iterate inexact policies twice at insert time
selftests: add xfrm policy insertion speed test script
xfrm: Correct spelling in xfrm.h
net: add copy from skb_seq_state to buffer function
xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================
Jakub Kicinski [Wed, 11 Sep 2024 01:42:48 +0000 (18:42 -0700)]
Merge branch 'bnxt_en-msix-improvements'
Michael Chan says:
====================
bnxt_en: MSIX improvements
This patchset makes some improvements related to MSIX. The first
patch adjusts the default MSIX vectors assigned for RoCE. On the
PF, the number of MSIX is increased to 64 from the current 9. The
second patch allocates additional MSIX vectors ahead of time when
changing ethtool channels if dynamic MSIX is supported. The 3rd
patch makes sure that the IRQ name is not truncated.
====================
bnxt_en: resize bnxt_irq name field to fit format string
The name field of struct bnxt_irq is written using snprintf in
bnxt_setup_msix(). Make the field large enough to fit the maximal
formatted string to prevent truncation. Truncated IRQ names are
less meaningful to the user. For example, "enp4s0f0np0-TxRx-0"
gets truncated to "enp4s0f0np0-TxRx-" with the existing code.
Make sure we have space for the extra characters added to the IRQ
names:
- the characters introduced by the static format string: hyphens
- the maximal static substituted ring type string: "TxRx"
- the maximum length of an integer formatted as a string, even
though reasonable ring numbers would never be as long as this.
Michael Chan [Mon, 9 Sep 2024 20:27:36 +0000 (13:27 -0700)]
bnxt_en: Add MSIX check in bnxt_check_rings()
bnxt_check_rings() is called to ensure that we have the hardware ring
resources before committing to reinitialize with the new number of
rings. MSIX vectors are never checked at this point, because up
until recently we must first disable MSIX before we can allocate the
new set of MSIX vectors.
Now that we support dynamic MSIX allocation, check to make sure we
can dynamically allocate the new MSIX vectors as the last step in
bnxt_check_rings() if dynamic MSIX is supported.
For example, the IOMMU group may limit the number of MSIX vectors
for the device. With this patch, the ring change will fail more
gracefully when there is not enough MSIX vectors.
It is also better to move bnxt_check_rings() to be called as the last
step when changing ethtool rings.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909202737.93852-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "amlogic,tx-delay-ns" property schema has unnecessary type reference
as it's a standard unit suffix, and the constraints are in freeform
text rather than schema.
The partial rx checksum feature computes a checksum over the entire
packet, regardless of the L3 protocol. Remove the check for IPv4.
Additionally, testing with csum.py (from kselftests) shows no anomalies
with 64-byte packets, so we can remove that check as well.
Sean Anderson [Mon, 9 Sep 2024 16:10:14 +0000 (12:10 -0400)]
net: xilinx: axienet: Enable NETIF_F_HW_CSUM for partial tx checksumming
Partial tx chechsumming is completely generic and does not depend on the
L3/L4 protocol. Signal this to the net subsystem by enabling the
more-generic offload feature (instead of restricting ourselves to
TCP/UDP over IPv4 checksumming only like is necessary with full
checksumming).
Dave Taht [Mon, 9 Sep 2024 09:16:28 +0000 (11:16 +0200)]
sch_cake: constify inverse square root cache
sch_cake uses a cache of the first 16 values of the inverse square root
calculation for the Cobalt AQM to save some cycles on the fast path.
This cache is populated when the qdisc is first loaded, but there's
really no reason why it can't just be pre-populated. So change it to be
pre-populated with constants, which also makes it possible to constify
it.
This gives a modest space saving for the module (not counting debug data):
.text: -224 bytes
.rodata: +80 bytes
.bss: -64 bytes
Total: -192 bytes
Signed-off-by: Dave Taht <dave.taht@gmail.com>
[ fixed up comment, rewrote commit message ] Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20240909091630.22177-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net-timestamp: introduce a flag to filter out rx software and hardware report
When one socket is set SOF_TIMESTAMPING_RX_SOFTWARE which means the
whole system turns on the netstamp_needed_key button, other sockets
that only have SOF_TIMESTAMPING_SOFTWARE will be affected and then
print the rx timestamp information even without setting
SOF_TIMESTAMPING_RX_SOFTWARE generation flag.
How to solve it without breaking users?
We introduce a new flag named SOF_TIMESTAMPING_OPT_RX_FILTER. Using
it together with SOF_TIMESTAMPING_SOFTWARE can stop reporting the
rx software timestamp.
Similarly, we also filter out the hardware case where one process
enables the rx hardware generation flag, then another process only
passing SOF_TIMESTAMPING_RAW_HARDWARE gets the timestamp. So we can set
both SOF_TIMESTAMPING_RAW_HARDWARE and SOF_TIMESTAMPING_OPT_RX_FILTER
to stop reporting rx hardware timestamp after this patch applied.
Jason Xing [Mon, 9 Sep 2024 01:56:11 +0000 (09:56 +0800)]
net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flag
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive
path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter
out rx software timestamp report, especially after a process turns on
netstamp_needed_key which can time stamp every incoming skb.
Previously, we found out if an application starts first which turns on
netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE
could also get rx timestamp. Now we handle this case by introducing this
new flag without breaking users.
Quoting Willem to explain why we need the flag:
"why a process would want to request software timestamp reporting, but
not receive software timestamp generation. The only use I see is when
the application does request
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE."
Similarly, this new flag could also be used for hardware case where we
can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive
hardware receive timestamp.
Another thing about errqueue in this patch I have a few words to say:
In this case, we need to handle the egress path carefully, or else
reporting the tx timestamp will fail. Egress path and ingress path will
finally call sock_recv_timestamp(). We have to distinguish them.
Errqueue is a good indicator to reflect the flow direction.
Jason Xing [Sun, 8 Sep 2024 12:41:41 +0000 (20:41 +0800)]
net-timestamp: correct the use of SOF_TIMESTAMPING_RAW_HARDWARE
SOF_TIMESTAMPING_RAW_HARDWARE is a report flag which passes the
timestamps generated by either SOF_TIMESTAMPING_TX_HARDWARE or
SOF_TIMESTAMPING_RX_HARDWARE to the userspace all the time.
Drop driver defined stmmac_fpe_state, and switch to common
ethtool_mm_verify_status for local TX verification status.
Local side and remote side verification processes are completely
independent. There is no reason at all to keep a local state and
a remote state.
Add a spinlock to avoid races among ISR, timer, link update
and register configuration.
This patch is based on Vladimir Oltean's proposal.
Vladimir Oltean says:
====================
In the INITIAL state, the timer sends MPACKET_VERIFY. Eventually the
stmmac_fpe_event_status() IRQ fires and advances the state to VERIFYING,
then rearms the timer after verify_time ms. If a subsequent IRQ comes in
and modifies the state to SUCCEEDED after getting MPACKET_RESPONSE, the
timer sees this. It must enable the EFPE bit now. Otherwise, it
decrements the verify_limit counter and tries again. Eventually it
moves the status to FAILED, from which the IRQ cannot move it anywhere
else, except for another stmmac_fpe_apply() call.
====================
Alexander Dahl [Fri, 6 Sep 2024 06:22:56 +0000 (08:22 +0200)]
net: mdiobus: Debug print fwnode handle instead of raw pointer
Was slightly misleading before, because printed is pointer to fwnode,
not to phy device, as placement in message suggested. Include header
for dev_dbg() declaration while at it.
D. Wythe [Fri, 6 Sep 2024 02:35:35 +0000 (10:35 +0800)]
net/smc: add sysctl for smc_limit_hs
In commit 48b6190a0042 ("net/smc: Limit SMC visits when handshake workqueue congested"),
we introduce a mechanism to put constraint on SMC connections visit
according to the pressure of SMC handshake process.
At that time, we believed that controlling the feature through netlink
was sufficient. However, most people have realized now that netlink is
not convenient in container scenarios, and sysctl is a more suitable
approach.
In addition, since commit 462791bbfa35 ("net/smc: add sysctl interface for SMC")
had introcuded smc_sysctl_net_init(), it is reasonable for us to
initialize limit_smc_hs in it instead of initializing it in
smc_pnet_net_int().
Lee Trager [Thu, 5 Sep 2024 23:37:51 +0000 (16:37 -0700)]
eth: fbnic: Add devlink firmware version info
This adds support to show firmware version information for both stored and
running firmware versions. The version and commit is displayed separately
to aid monitoring tools which only care about the version.
====================
net: lan966x: use the newly introduced FDMA library
This patch series is the second of a 2-part series [1], that adds a new
common FDMA library for Microchip switch chips Sparx5 and lan966x. These
chips share the same FDMA engine, and as such will benefit from a common
library with a common implementation. This also has the benefit of
removing a lot of open-coded bookkeeping and duplicate code for the two
drivers.
In this second series, the FDMA library will be taken into use by the
lan966x switch driver.
###################
# Example of use: #
###################
- Initialize the rx and tx fdma structs with values for: number of
DCB's, number of DB's, channel ID, DB size (data buffer size), and
total size of the requested memory. Also provide two callbacks:
nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr.
- Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent().
- Initialize the DCB's with fdma_dcb_init().
- Add new DCB's with fdma_dcb_add().
- Free memory with fdma_free_phys() or fdma_free_coherent().
Patch #2: includes the fdma_api.h header and removes old symbols.
Patch #3: replaces old rx and tx variables with equivalent ones from the
fdma struct. Only the variables that can be changed without
breaking traffic is changed in this patch.
Patch #4: uses the library for allocation of rx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #5: uses the library for adding DCB's in the rx path.
Patch #6: uses the library for freeing rx buffers.
Patch #7: uses the library for allocation of tx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #8: uses the library for adding DCB's in the tx path.
Patch #9: uses the library helpers in the tx path.
Patch #10: ditch last_in_use variable and use library instead.
Daniel Machon [Thu, 5 Sep 2024 08:06:38 +0000 (10:06 +0200)]
net: lan966x: ditch tx->last_in_use variable
This variable is used in the tx path to determine the last used DCB. The
library has the variable last_dcb for the exact same purpose. Ditch the
last_in_use variable throughout.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:36 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the tx path
Use the fdma_dcb_add() function to add DCB's in the tx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:33 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the rx path
Use the fdma_dcb_add() function to add DCB's in the rx path. This gets
rid of the open-coding of nextptr and dataptr handling and the functions
for adding DCB's.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:31 +0000 (10:06 +0200)]
net: lan966x: replace a few variables with new equivalent ones
Replace the old rx and tx variables: channel_id, FDMA_DCB_MAX,
FDMA_RX_DCB_MAX_DBS, FDMA_TX_DCB_MAX_DBS, dcb_index and db_index with
the equivalents from the FDMA rx and tx structs. These variables are not
entangled in any buffer allocation and can therefore be replaced in
advance.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
ionic: convert Rx queue buffers to use page_pool
Our home-grown buffer management needs to go away and we need to play
nicely with the page_pool infrastructure. This patchset cleans up some
of our API use and converts the Rx traffic queues to use page_pool.
The first few patches are for tidying up things, then a small XDP
configuration refactor, adding page_pool support, and finally adding
support to hot swap an XDP program without having to reconfigure
anything.
The result is code that more closely follows current patterns, as well as
a either a performance boost or equivalent performance as seen with
iperf testing:
Using examples of other driver(s), add the ability to hot-swap an XDP
program without having to reconfigure the queues. To prevent the
q->xdp_prog to be read/written more than once use READ_ONCE() and
WRITE_ONCE() on the q->xdp_prog.
The q->xdp_prog was being checked in multiple different for loops in the
hot path. The change to allow xdp_prog hot swapping created the
possibility for many READ_ONCE(q->xdp_prog) calls during a single napi
callback. Refactor the Rx napi handling to allow a previous
READ_ONCE(q->xdp_prog) (or NULL for hwstamp_rxq) to be passed into the
relevant functions.
Also, move other Rx related hotpath handling into the newly created
ionic_rx_cq_service() function to reduce the scope of the xdp_prog
local variable and put all Rx handling in one function similar to Tx.
Shannon Nelson [Fri, 6 Sep 2024 23:26:22 +0000 (16:26 -0700)]
ionic: convert Rx queue buffers to use page_pool
Our home-grown buffer management needs to go away and we need
to be playing nicely with the page_pool infrastructure. This
converts the Rx traffic queues to use page_pool.
Also, since ionic_rx_buf_size() was removed, redefine
IONIC_PAGE_SIZE to account for IONIC_MAX_BUF_LEN being the
largest allowed buffer to prevent overflowing u16 variables,
which could happen when PAGE_SIZE is defined as >= 64KB.
include/linux/minmax.h:93:37: warning: conversion from 'long unsigned int' to 'u16' {aka 'short unsigned int'} changes value from '65536' to '0' [-Woverflow]
ionic: Fully reconfigure queues when going to/from a NULL XDP program
Currently when going to/from a NULL XDP program the driver uses
ionic_stop_queues_reconfig() and then ionic_start_queues_reconfig() in
order to re-register the xdp_rxq_info and re-init the queues. This is
fine until page_pool(s) are used in an upcoming patch.
In preparation for adding page_pool support make sure to completely
rebuild the queues when going to/from a NULL XDP program. Without this
change the call to mem_allocator_disconnect() never happens when going
to a NULL XDP program, which eventually results in
xdp_rxq_info_reg_mem_model() failing with -ENOSPC due to the mem_id_pool
ida having no remaining space.
Shannon Nelson [Fri, 6 Sep 2024 23:26:20 +0000 (16:26 -0700)]
ionic: always use rxq_info
Instead of setting up and tearing down the rxq_info only when the XDP
program is loaded or unloaded, we will build the rxq_info whether or not
XDP is in use. This is the more common use pattern and better supports
future conversion to page_pool. Since the rxq_info wants the napi_id
we re-order things slightly to tie this into the queue init and deinit
functions where we do the add and delete of napi.
Shannon Nelson [Fri, 6 Sep 2024 23:26:19 +0000 (16:26 -0700)]
ionic: use per-queue xdp_prog
We originally were using a per-interface xdp_prog variable to track
a loaded XDP program since we knew there would never be support for a
per-queue XDP program. With that, we only built the per queue rxq_info
struct when an XDP program was loaded and removed it on XDP program unload,
and used the pointer as an indicator in the Rx hotpath to know to how build
the buffers. However, that's really not the model generally used, and
makes a conversion to page_pool Rx buffer cacheing a little problematic.
This patch converts the driver to use the more common approach of using
a per-queue xdp_prog pointer to work out buffer allocations and need
for bpf_prog_run_xdp(). We jostle a couple of fields in the queue struct
in order to keep the new xdp_prog pointer in a warm cacheline.
Shannon Nelson [Fri, 6 Sep 2024 23:26:18 +0000 (16:26 -0700)]
ionic: rename ionic_xdp_rx_put_bufs
We aren't "putting" buf, we're just unlinking them from our tracking in
order to let the XDP_TX and XDP_REDIRECT tx clean paths take care of the
pages when they are done with them. This rename clears up the intent.
Gal Pressman [Fri, 6 Sep 2024 14:46:32 +0000 (17:46 +0300)]
ptp: ptp_ines: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-17-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:31 +0000 (17:46 +0300)]
ixp4xx_eth: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20240906144632.404651-16-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:30 +0000 (17:46 +0300)]
net: stmmac: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-15-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:29 +0000 (17:46 +0300)]
sfc/siena: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://patch.msgid.link/20240906144632.404651-14-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:28 +0000 (17:46 +0300)]
sfc: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://patch.msgid.link/20240906144632.404651-13-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:27 +0000 (17:46 +0300)]
qede: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-12-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:26 +0000 (17:46 +0300)]
net: mscc: ocelot: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-11-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:25 +0000 (17:46 +0300)]
net/funeth: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-10-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:24 +0000 (17:46 +0300)]
enic: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-9-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:23 +0000 (17:46 +0300)]
net: thunderx: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-8-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:22 +0000 (17:46 +0300)]
liquidio: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-7-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:21 +0000 (17:46 +0300)]
net: macb: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://patch.msgid.link/20240906144632.404651-6-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:20 +0000 (17:46 +0300)]
amd-xgbe: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://patch.msgid.link/20240906144632.404651-5-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:19 +0000 (17:46 +0300)]
bonding: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-4-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:18 +0000 (17:46 +0300)]
tg3: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240906144632.404651-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:17 +0000 (17:46 +0300)]
bnxt_en: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240906144632.404651-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pa_stats is optional in dt bindings, make it optional in driver as well.
Currently if pa_stats syscon regmap is not found driver returns -ENODEV.
Fix this by not returning an error in case pa_stats is not found and
continue generating ethtool stats without pa_stats.
Fixes: 550ee90ac61c ("net: ti: icssg-prueth: Add support for PA Stats") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240906093649.870883-1-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Fri, 6 Sep 2024 07:36:09 +0000 (08:36 +0100)]
net: ibm: emac: Use __iomem annotation for emac_[xg]aht_base
dev->emacp contains an __iomem pointer and values derived
from it are used as __iomem pointers. So use this annotation
in the return type for helpers that derive pointers from dev->emacp.
Flagged by Sparse as:
.../core.c:444:36: warning: incorrect type in argument 1 (different address spaces)
.../core.c:444:36: expected unsigned int volatile [noderef] [usertype] __iomem *addr
.../core.c:444:36: got unsigned int [usertype] *
.../core.c: note: in included file:
.../core.h:416:25: warning: cast removes address space '__iomem' of expression
Compile tested only.
No functional change intended.
Willem de Bruijn [Thu, 5 Sep 2024 23:15:52 +0000 (19:15 -0400)]
selftests/net: integrate packetdrill with ksft
Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.
Florian recently added support for packetdrill tests in nf_conntrack,
in commit a8a388c2aae49 ("selftests: netfilter: add packetdrill based
conntrack tests").
This patch takes a slightly different approach. It relies on
ksft_runner.sh to run every *.pkt file in the directory.
Any future imports of packetdrill tests should require no additional
coding. Just add the *.pkt files.
Initially import only two features/directories from github. One with a
single script, and one with two. This was the only reason to pick
tcp/inq and tcp/md5.
The path replaces the directory hierarchy in github with a flat space
of files: $(subst /,_,$(wildcard tcp/**/*.pkt)). This is the most
straightforward option to integrate with kselftests. The Linked thread
reviewed two ways to maintain the hierarchy: TEST_PROGS_RECURSE and
PRESERVE_TEST_DIRS. But both introduce significant changes to
kselftest infra and with that risk to existing tests.
Implementation notes:
- restore alphabetical order when adding the new directory to
tools/testing/selftests/Makefile
- imported *.pkt files and support verbatim from the github project,
except for
- update `source ./defaults.sh` path (to adjust for flat dir)
- add SPDX headers
- remove one author statement
- Acknowledgment: drop an e (checkpatch)
Tested:
make -C tools/testing/selftests \
TARGETS=net/packetdrill \
run_tests
make -C tools/testing/selftests \
TARGETS=net/packetdrill \
install INSTALL_PATH=$KSFT_INSTALL_PATH
# in virtme-ng
./run_kselftest.sh -c net/packetdrill
./run_kselftest.sh -t net/packetdrill:tcp_inq_client.pkt
Willem de Bruijn [Thu, 5 Sep 2024 23:15:51 +0000 (19:15 -0400)]
selftests: support interpreted scripts with ksft_runner.sh
Support testcases that are themselves not executable, but need an
interpreter to run them.
If a test file is not executable, but an executable file
ksft_runner.sh exists in the TARGET dir, kselftest will run
./ksft_runner.sh ./$BASENAME_TEST
Packetdrill may add hundreds of packetdrill scripts for testing. These
scripts must be passed to the packetdrill process.
Have kselftest run each test directly, as it already solves common
runner requirements like parallel execution and isolation (netns).
A previous RFC added a wrapper in between, which would have to
reimplement such functionality.
Sven Eckelmann [Thu, 5 Sep 2024 19:49:38 +0000 (12:49 -0700)]
net: ag71xx: disable napi interrupts during probe
ag71xx_probe is registering ag71xx_interrupt as handler for gmac0/gmac1
interrupts. The handler is trying to use napi_schedule to handle the
processing of packets. But the netif_napi_add for this device is
called a lot later in ag71xx_probe.
It can therefore happen that a still running gmac0/gmac1 is triggering the
interrupt handler with a bit from AG71XX_INT_POLL set in
AG71XX_REG_INT_STATUS. The handler will then call napi_schedule and the
napi code will crash the system because the ag->napi is not yet
initialized.
The gmcc0/gmac1 must be brought in a state in which it doesn't signal a
AG71XX_INT_POLL related status bits as interrupt before registering the
interrupt handler. ag71xx_hw_start will take care of re-initializing the
AG71XX_REG_INT_ENABLE.
This will become relevant when dual GMAC devices get added here.
====================
af_unix: Correct manage_oob() when OOB follows a consumed OOB.
Recently syzkaller reported UAF of OOB skb.
The bug was introduced by commit 93c99f21db36 ("af_unix: Don't stop
recv(MSG_DONTWAIT) if consumed OOB skb is at the head.") but uncovered
by another recent commit 8594d9b85c07 ("af_unix: Don't call skb_get()
for OOB skb.").
syzbot reported use-after-free in unix_stream_recv_urg(). [0]
The scenario is
1. send(MSG_OOB)
2. recv(MSG_OOB)
-> The consumed OOB remains in recv queue
3. send(MSG_OOB)
4. recv()
-> manage_oob() returns the next skb of the consumed OOB
-> This is also OOB, but unix_sk(sk)->oob_skb is not cleared
5. recv(MSG_OOB)
-> unix_sk(sk)->oob_skb is used but already freed
The recent commit 8594d9b85c07 ("af_unix: Don't call skb_get() for OOB
skb.") uncovered the issue.
If the OOB skb is consumed and the next skb is peeked in manage_oob(),
we still need to check if the skb is OOB.
Let's do so by falling back to the following checks in manage_oob()
and add the test case in selftest.
Note that we need to add a similar check for SIOCATMARK.
[0]:
BUG: KASAN: slab-use-after-free in unix_stream_read_actor+0xa6/0xb0 net/unix/af_unix.c:2959
Read of size 4 at addr ffff8880326abcc4 by task syz-executor178/5235
The buggy address belongs to the object at ffff8880326abc80
which belongs to the cache skbuff_head_cache of size 240
The buggy address is located 68 bytes inside of
freed 240-byte region [ffff8880326abc80, ffff8880326abd70)
Memory state around the buggy address: ffff8880326abb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880326abc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff8880326abc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8880326abd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc ffff8880326abd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: 93c99f21db36 ("af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.") Reported-by: syzbot+8811381d455e3e9ec788@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8811381d455e3e9ec788 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240905193240.17565-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first patch is by Rob Herring and simplifies the DT parsing in the
cc770 driver.
The next 2 patches both target the rockchip_canfd driver added in the
last pull request to net-next. The first one is by Nathan Chancellor
and fixes the return type of rkcanfd_start_xmit(), the second one is
by me and fixes a 64 bit division on 32 bit platforms in
rkcanfd_timestamp_init().
* tag 'linux-can-next-for-6.12-20240909' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: rockchip_canfd: rkcanfd_timestamp_init(): fix 64 bit division on 32 bit platforms
can: rockchip_canfd: fix return type of rkcanfd_start_xmit()
net: can: cc770: Simplify parsing DT properties
====================
Jakub Kicinski [Fri, 6 Sep 2024 16:10:59 +0000 (09:10 -0700)]
net: remove dev_pick_tx_cpu_id()
dev_pick_tx_cpu_id() has been introduced with two users by
commit a4ea8a3dacc3 ("net: Add generic ndo_select_queue functions").
The use in AF_PACKET has been removed in 2019 by
commit b71b5837f871 ("packet: rework packet_pick_tx_queue() to use common code selection")
The other user was a Netlogic XLP driver, removed in 2021 by
commit 47ac6f567c28 ("staging: Remove Netlogic XLP network driver").
It's relatively unlikely that any modern driver will need an
.ndo_select_queue implementation which picks purely based on CPU ID
and skips XPS, delete dev_pick_tx_cpu_id()
====================
selftests: mptcp: add time per subtests in TAP output
Patches here add 'time=<N>ms' in the diagnostic data of the TAP output,
e.g.
ok 1 - pm_netlink: defaults addr list # time=9ms
This addition is useful to quickly identify which subtests are taking a
longer time than the others, or more than expected.
Note that there are no specific formats to follow to show this time
according to the TAP 13, TAP 14 and KTAP specifications, but we follow
the format being parsed by NIPA [1].
Patch 1 modifies mptcp_lib.sh to add this support to all MPTCP
selftests.
Patch 2 removes the now duplicated info in mptcp_connect.sh
Patch 3 slightly improves the precision of the first subtests in all
MPTCP subtests.
Patches 4 and 5 remove duplicated spaces in TAP output, for the TAP
parsers that cannot handle them properly.