After the blamed commit, rtm_to_fib_config() now calls
lwtunnel_valid_encap_type{_attr}() without RTNL held,
triggering an unlock balance warning in __rtnl_unlock(),
as reported by syzbot [1].
IPv6 and rtm_to_nh_config() are not yet converted.
Add a temporary @rtnl_is_held parameter to lwtunnel_valid_encap_type()
and lwtunnel_valid_encap_type_attr().
While we are at it, replace the two rcu_dereference() calls
in lwtunnel_valid_encap_type() with the more appropriate
rcu_access_pointer().
[1]
syz-executor245/5836 is trying to release lock (rtnl_mutex) at:
[<ffffffff89d0e38c>] __rtnl_unlock+0x6c/0xf0 net/core/rtnetlink.c:142
but there are no more locks to release!
other info that might help us debug this:
no locks held by syz-executor245/5836.
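The conditional locking introduced by the lwtunnel change can be modelled in user space. This is only a sketch: the lock counter, the module-load stub and every sketch_* name are hypothetical, and it assumes the unlock/relock pair sits around a module load, which the commit text does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex: counts how often it is held. */
static int rtnl_depth;

static void sketch_rtnl_lock(void)
{
	rtnl_depth++;
}

static void sketch_rtnl_unlock(void)
{
	/* Releasing a lock that is not held is the imbalance syzbot hit. */
	assert(rtnl_depth > 0);
	rtnl_depth--;
}

/* Hypothetical stand-in for a module load that may sleep. */
static int modules_requested;

static void sketch_request_module(void)
{
	modules_requested++;
}

/*
 * Drop the lock around the module load only when the caller states that
 * it really holds the lock; otherwise leave the lock state untouched.
 */
static void sketch_load_encap_ops(bool rtnl_is_held)
{
	if (rtnl_is_held)
		sketch_rtnl_unlock();
	sketch_request_module();
	if (rtnl_is_held)
		sketch_rtnl_lock();
}
```

A caller that does not hold the lock passes false and the lock depth stays at zero; a caller that holds it passes true and gets the lock back on return.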
Heiner Kallweit [Mon, 3 Mar 2025 20:19:25 +0000 (21:19 +0100)]
net: phy: remove remaining PHY package related definitions from phy.h
Move definition of struct phy_package_shared to phy_package.c, and
move remaining PHY package related declarations from phy.h to
phylib.h, thus making them accessible for PHY drivers only.
Heiner Kallweit [Mon, 3 Mar 2025 20:18:46 +0000 (21:18 +0100)]
net: phy: move PHY package related code from phy.h to phy_package.c
Move PHY package related inline functions from phy.h to phy_package.c.
While doing so remove locked versions phy_package_read() and
phy_package_write() which have no user.
Heiner Kallweit [Mon, 3 Mar 2025 20:15:09 +0000 (21:15 +0100)]
net: phy: add getters for public members in struct phy_package_shared
Add getters for public members, this prepares for making struct
phy_package_shared private to phylib. Declare the getters in a new header
file phylib.h, which will be used by PHY drivers only.
====================
Enable SGMII and 2500BASEX interface mode switching for Intel platforms
During the interface mode change, the 'phylink_major_config' function will
be triggered in phylink. The modification of the following functions will
support the switching between SGMII and 2500BASE-X interface modes for
the Intel platform:
- xpcs_switch_interface_mode: Re-initiates clause 37 auto-negotiation for
the SGMII interface mode to perform auto-negotiation.
- mac_finish: Configures the SerDes according to the interface mode.
With the above changes, the code will work as follows during the interface
mode change. The PCS will reconfigure according to the pcs_neg_mode and the
selected interface mode. Then, the MAC driver will perform SerDes
configuration in 'mac_finish' based on the selected interface mode. During
the SerDes configuration, the selected interface mode will identify TSN
lane registers from FIA and then send IPC commands to the Power Management
Controller (PMC) through the PMC driver/API. The PMC will act as a proxy to
program the PLL registers.
====================
net: stmmac: configure SerDes according to the interface mode
Intel platform will configure the SerDes through PMC API based on the
provided interface mode.
This patch adds the following new functions:
- intel_tsn_lane_is_available(): This new function reads FIA lane
ownership registers and common lane registers through IPC commands
to know which lane the mGbE port is assigned to.
- intel_mac_finish(): To configure the SerDes based on the assigned
lane and latest interface mode, it sends IPC command to the PMC through
PMC driver/API. The PMC acts as a proxy for R/W on behalf of the driver.
- intel_set_reg_access(): Set the register access to the available TSN
interface.
David E. Box [Thu, 27 Feb 2025 12:15:19 +0000 (20:15 +0800)]
arch: x86: add IPC mailbox accessor function and add SoC register access
- Exports intel_pmc_ipc() for host access to the PMC IPC mailbox
- Enables the host to access specific SoC registers through the PMC
firmware using IPC commands. This access method is necessary for
registers that are not available through direct Memory-Mapped I/O (MMIO),
which is used for other accessible parts of the PMC.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Chao Qin <chao.qin@intel.com>
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250227121522.1802832-4-yong.liang.choong@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The xpcs_switch_interface_mode function was introduced to handle
interface switching.
According to the XPCS datasheet, a soft reset is required to initiate
Clause 37 auto-negotiation when the XPCS switches interface modes.
When the interface mode switches from 2500BASE-X to SGMII,
re-initiating Clause 37 auto-negotiation is required for the SGMII
interface mode to function properly.
net: phylink: use pl->link_interface in phylink_expects_phy()
The phylink_expects_phy() function allows MAC drivers to check if they are
expecting a PHY to attach. The checking condition in phylink_expects_phy()
aims to achieve the same result as the checking condition in
phylink_attach_phy().
However, the checking condition in phylink_expects_phy() uses
pl->link_config.interface, while phylink_attach_phy() uses
pl->link_interface.
Initially, both pl->link_interface and pl->link_config.interface are set
to SGMII, and pl->cfg_link_an_mode is set to MLO_AN_INBAND.
When the interface switches from SGMII to 2500BASE-X,
pl->link_config.interface is updated by phylink_major_config().
At this point, pl->cfg_link_an_mode remains MLO_AN_INBAND, and
pl->link_config.interface is set to 2500BASE-X.
Subsequently, when the STMMAC interface is taken down
administratively and brought back up, it is blocked by
phylink_expects_phy().
Since phylink_expects_phy() and phylink_attach_phy() aim to achieve the
same result, phylink_expects_phy() should check pl->link_interface,
which never changes, instead of pl->link_config.interface, which is
updated by phylink_major_config().
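The difference between the two fields can be illustrated with a reduced model. The enums, the struct and the exact shape of the condition are assumptions made for this sketch; they are not copied from phylink.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced versions of the kernel enums and struct phylink. */
enum sketch_an_mode { SK_AN_PHY, SK_AN_FIXED, SK_AN_INBAND };
enum sketch_iface { SK_SGMII, SK_1000BASEX, SK_2500BASEX };

struct sketch_phylink {
	enum sketch_an_mode cfg_link_an_mode;
	enum sketch_iface link_interface;        /* set once, never changes */
	enum sketch_iface link_config_interface; /* updated on major config */
};

/* In this sketch, 1000BASE-X and 2500BASE-X count as 802.3z modes. */
static bool sketch_is_8023z(enum sketch_iface iface)
{
	return iface == SK_1000BASEX || iface == SK_2500BASEX;
}

/* Assumed condition: no PHY is expected for fixed links or in-band 802.3z. */
static bool sketch_expects_phy(const struct sketch_phylink *pl,
			       enum sketch_iface iface)
{
	assert(pl);
	if (pl->cfg_link_an_mode == SK_AN_FIXED ||
	    (pl->cfg_link_an_mode == SK_AN_INBAND && sketch_is_8023z(iface)))
		return false;
	return true;
}
```

With link_interface still SGMII and the link_config copy switched to 2500BASE-X, passing the link_config value reports that no PHY is expected, while passing link_interface still reports that one is.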
====================
Permission checks for dynamic POSIX clocks
Dynamic clocks - such as PTP clocks - extend beyond the standard POSIX
clock API by using ioctl calls. While file permissions are enforced for
standard POSIX operations, they are not implemented for ioctl calls,
since the POSIX layer cannot differentiate between calls which modify
the clock's state (like enabling PPS output generation) and those that
don't (such as retrieving the clock's PPS capabilities).
On the other hand, drivers implementing the dynamic clocks lack the
necessary context to enforce permission checks themselves.
Additionally, the POSIX clock layer requires the WRITE permission even for
readonly adjtime() operations before invoking the callback.
Add a struct file pointer to the POSIX clock context and use it to
implement the appropriate permission checks on PTP chardevs. Permit
readonly adjtime() for dynamic clocks. Add a readonly option to testptp.
Changes in v4:
- Allow readonly adjtime() for dynamic clocks, as suggested by Thomas
Changes in v3:
- Reword the log message for commit against posix-clock and fix
documentation of struct posix_clock_context, as suggested by Thomas
Changes in v2:
- Store file pointer in POSIX clock context rather than fmode in the PTP
clock's private data, as suggested by Richard.
- Move testptp.c changes into separate patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wojtek Wasko [Mon, 3 Mar 2025 16:13:45 +0000 (18:13 +0200)]
testptp: Add option to open PHC in readonly mode
PTP Hardware Clocks no longer require WRITE permission to perform
readonly operations, such as listing device capabilities or listening to
EXTTS events once they have been enabled by a process with WRITE
permissions.
Add '-r' option to testptp to open the PHC in readonly mode instead of
the default read-write mode. Skip enabling EXTTS if readonly mode is
requested.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Wojtek Wasko <wwasko@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many devices implement highly accurate clocks, which the kernel manages
as PTP Hardware Clocks (PHCs). Userspace applications rely on these
clocks to timestamp events, trace workload execution, correlate
timescales across devices, and keep various clocks in sync.
The kernel’s current implementation of PTP clocks does not enforce file
permission checks for most device operations except for POSIX clock
operations, where file mode is verified in the POSIX layer before
forwarding the call to the PTP subsystem. Consequently, it is common
practice to not give unprivileged userspace applications any access to
PTP clocks whatsoever by giving the PTP chardevs 600 permissions. An
example of users running into this limitation is documented in [1].
Additionally, the POSIX layer requires WRITE permission even for readonly
adjtime() calls, which the PTP layer uses to return the current frequency
offset applied to the PHC.
Add permission checks for functions that modify the state of a PTP
device. Continue enforcing permission checks for POSIX clock operations
(settime, adjtime) in the POSIX layer. Only require WRITE access for
dynamic clocks adjtime() if any flags are set in the modes field.
Changes in v4:
- Require FMODE_WRITE in adjtime() only for calls modifying the clock in
any way.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Wojtek Wasko <wwasko@nvidia.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
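The adjtime() rule from the PTP permission commit can be written as a small predicate. This is a sketch under stated assumptions: SKETCH_FMODE_WRITE, struct sketch_file and the -EACCES return value are placeholders chosen for illustration, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag standing in for the kernel's FMODE_WRITE. */
#define SKETCH_FMODE_WRITE 0x2u

/* Hypothetical stand-in for the struct file kept in the clock context. */
struct sketch_file {
	unsigned int f_mode;
};

/*
 * Return 0 if the adjtime() call may proceed, -EACCES otherwise.
 * modes == 0 is treated as a readonly query that needs no write access.
 */
static int sketch_adjtime_check(const struct sketch_file *fp,
				unsigned int modes)
{
	assert(fp);
	if (modes != 0 && !(fp->f_mode & SKETCH_FMODE_WRITE))
		return -EACCES;
	return 0;
}
```

A descriptor opened readonly can therefore still query the current frequency offset, while any call that sets a bit in modes needs write access.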
Wojtek Wasko [Mon, 3 Mar 2025 16:13:43 +0000 (18:13 +0200)]
posix-clock: Store file pointer in struct posix_clock_context
File descriptor based pc_clock_*() operations of dynamic posix clocks
have access to the file pointer and implement permission checks in the
generic code before invoking the relevant dynamic clock callback.
Character device operations (open, read, poll, ioctl) do not implement a
generic permission control and the dynamic clock callbacks have no
access to the file pointer to implement them.
Extend struct posix_clock_context with a struct file pointer and
initialize it in posix_clock_open(), so that all dynamic clock callbacks
can access it.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wojtek Wasko <wwasko@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 3 Mar 2025 12:02:12 +0000 (15:02 +0300)]
net: Prevent use after free in netif_napi_set_irq_locked()
cpu_rmap_put() will call kfree() when the last reference is dropped,
so it could result in a use after free when we dereference the same
pointer on the next line. Move the cpu_rmap_put() call after the dereference.
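The ordering problem can be shown with a generic refcounted object. The struct and the sketch_* helpers are hypothetical; they only mimic the "put may free" behaviour described for cpu_rmap_put().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct cpu_rmap. */
struct sketch_rmap {
	int refcount;
	int size;
};

static struct sketch_rmap *sketch_rmap_alloc(int size)
{
	struct sketch_rmap *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->refcount = 1;
	r->size = size;
	return r;
}

/* Drop one reference; the object is freed when the last one goes away. */
static void sketch_rmap_put(struct sketch_rmap *r)
{
	assert(r && r->refcount > 0);
	if (--r->refcount == 0)
		free(r);
}

/* Safe ordering: read what is needed first, drop the reference last. */
static int sketch_read_then_put(struct sketch_rmap *r)
{
	int size = r->size;	/* dereference while a reference is held */

	sketch_rmap_put(r);	/* r must not be touched after this line */
	return size;
}
```

Swapping the two statements in sketch_read_then_put() would read r->size from memory that may already have been freed.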
Sean Anderson [Mon, 3 Mar 2025 23:18:32 +0000 (18:18 -0500)]
net: cadence: macb: Synchronize standard stats
The new stats calculations add several additional calls to
macb/gem_update_stats() and accesses to bp->hw_stats. These are
protected by a spinlock since commit fa52f15c745c ("net: cadence: macb:
Synchronize stats calculations"), which was applied in parallel. Add
some locking now that net has been merged into net-next.
Eric Dumazet [Sun, 2 Mar 2025 12:42:37 +0000 (12:42 +0000)]
tcp: use RCU lookup in __inet_hash_connect()
When __inet_hash_connect() has to try many 4-tuples before
finding an available one, we see a high spinlock cost from
the many spin_lock_bh(&head->lock) performed in its loop.
This patch adds an RCU lookup to avoid the spinlock cost.
check_established() gets a new @rcu_lookup argument.
First reason is to not make any changes while head->lock
is not held.
Second reason is to not make this RCU lookup a second time
after the spinlock has been acquired.
Eric Dumazet [Sun, 2 Mar 2025 12:42:34 +0000 (12:42 +0000)]
tcp: use RCU in __inet{6}_check_established()
When __inet_hash_connect() has to try many 4-tuples before
finding an available one, we see a high spinlock cost from
__inet_check_established() and/or __inet6_check_established().
This patch adds an RCU lookup to avoid the spinlock
acquisition when the 4-tuple is found in the hash table.
Note that there are still spin_lock_bh() calls in
__inet_hash_connect() to protect inet_bind_hashbucket;
this will be fixed later in this series.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250302124237.3913746-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Markus Elfring [Thu, 13 Apr 2023 15:00:11 +0000 (17:00 +0200)]
tipc: Reduce scope for the variable “fdefq” in tipc_link_tnl_prepare()
The address of a data structure member was determined before
a corresponding null pointer check in the implementation of
the function “tipc_link_tnl_prepare”.
Thus avoid the risk for undefined behaviour by moving the definition
for the local variable “fdefq” into an if branch at the end.
This issue was detected by using the Coccinelle software.
Jakub Kicinski [Fri, 28 Feb 2025 21:29:55 +0000 (13:29 -0800)]
selftests: drv-net: use env.rpath in the HDS test
Commit 29b036be1b0b ("selftests: drv-net: test XDP, HDS auto and
the ioctl path") added a new test case in the net tree. Now that
this code has made its way to net-next, convert it to use the env.rpath()
helper instead of manually computing the relative path.
Daniel Golle [Sat, 1 Mar 2025 01:41:18 +0000 (01:41 +0000)]
dsa: mt7530: Utilize REGMAP_IRQ for interrupt handling
Replace the custom IRQ chip handler and mask/unmask functions with
REGMAP_IRQ. This significantly simplifies the code and allows for the
removal of almost all interrupt-related functions from mt7530.c.
Tested on MT7988A built-in switch (MMIO) as well as MT7531AE IC (MDIO).
Qingfang Deng [Sat, 1 Mar 2025 13:55:16 +0000 (21:55 +0800)]
ppp: use IFF_NO_QUEUE in virtual interfaces
For PPPoE, PPTP, and PPPoL2TP, the start_xmit() function directly
forwards packets to the underlying network stack and never returns
anything other than 1. So these interfaces do not require a qdisc,
and the IFF_NO_QUEUE flag should be set.
Introduce a direct_xmit flag in struct ppp_channel to indicate when
IFF_NO_QUEUE should be applied. The flag is set in ppp_connect_channel()
for relevant protocols.
While at it, remove the unused latency member from struct ppp_channel.
====================
eth: fbnic: Cleanup macros and string function
We have received some feedback that the macros we use for reading FW mailbox
attributes are too large in scope and confusing to understand. Additionally,
the string function did not report errors, allowing it to silently succeed.
This patch set fixes these issues.
====================
Lee Trager [Fri, 28 Feb 2025 19:15:28 +0000 (11:15 -0800)]
eth: fbnic: Replace firmware field macros
Replace the firmware field macros with new macros which follow typical
kernel standards. No variables need to be predefined for use, and
results are now returned. These macros are prefixed with fta, short for
fbnic TLV attribute.
Lee Trager [Fri, 28 Feb 2025 19:15:27 +0000 (11:15 -0800)]
eth: fbnic: Update fbnic_tlv_attr_get_string() to work like nla_strscpy()
Allow fbnic_tlv_attr_get_string() to return an error code. If the
source mailbox attribute is missing, return -EINVAL. Like nla_strscpy(), return
-E2BIG when the source string is larger than the destination buffer. In this
case the amount of data copied is equal to dstsize.
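The return convention described for fbnic_tlv_attr_get_string() can be sketched as a stand-alone copy helper. The function name, the use of a NULL source to model a missing attribute, and the zero padding of the destination are assumptions for illustration; this is not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Copy a possibly unterminated attribute payload into dst.
 * src == NULL models a missing attribute (assumption of this sketch).
 * Returns the copied length, -EINVAL, or -E2BIG on truncation.
 */
static int sketch_attr_get_string(const char *src, size_t srclen,
				  char *dst, size_t dstsize)
{
	size_t len;
	int ret;

	assert(dst);
	if (!src)
		return -EINVAL;
	if (dstsize == 0)
		return -E2BIG;

	/* Ignore one trailing NUL in the source, if present. */
	if (srclen > 0 && src[srclen - 1] == '\0')
		srclen--;

	if (srclen >= dstsize) {
		len = dstsize - 1;	/* truncate, keep room for the NUL */
		ret = -E2BIG;
	} else {
		len = srclen;
		ret = (int)len;
	}

	memcpy(dst, src, len);
	memset(dst + len, 0, dstsize - len);	/* terminate and zero pad */
	return ret;
}
```

On truncation the whole destination buffer is filled, counting the terminating NUL, and the caller still learns about the overflow through -E2BIG.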
The aim of this series is to modernize the device tree bindings for the
Freescale "Gianfar" ethernet controller (a.k.a. TSEC, Triple Speed
Ethernet Controller) by converting them to YAML.
J. Neuschäfer [Fri, 28 Feb 2025 17:32:51 +0000 (18:32 +0100)]
dt-bindings: net: fsl,gianfar-mdio: Update information about TBI
When this binding was originally written, all known TSEC Ethernet
controllers had a Ten-Bit Interface (TBI). However, some datasheets such
as for the MPC8315E suggest that this is not universally true:
  The eTSECs do not support TBI, GMII, and FIFO operating modes, so all
  references to these interfaces and features should be ignored for this
  device.
J. Neuschäfer [Fri, 28 Feb 2025 17:32:50 +0000 (18:32 +0100)]
dt-bindings: net: Convert fsl,gianfar-{mdio,tbi} to YAML
Move the information related to the Freescale Gianfar (TSEC) MDIO bus
and the Ten-Bit Interface (TBI) from fsl-tsec-phy.txt to a new binding
file in YAML format, fsl,gianfar-mdio.yaml.
====================
net: phy: nxp-c45-tja11xx: add support for TJA1121
This patch series adds .match_phy_device for the existing TJAs
to differentiate between TJA1103/TJA1104 and TJA1120/TJA1121.
TJA1103 and TJA1104 share the same PHY_ID but TJA1104 has MACsec
capabilities while TJA1103 doesn't.
Also add support for TJA1121 which is based on TJA1120 hardware
with additional MACsec IP.
====================
Andrei Botila [Fri, 28 Feb 2025 15:43:19 +0000 (17:43 +0200)]
net: phy: nxp-c45-tja11xx: add match_phy_device to TJA1103/TJA1104
Add .match_phy_device for the existing TJAs to differentiate between
TJA1103 and TJA1104.
TJA1103 and TJA1104 share the same PHY_ID but TJA1104 has MACsec
capabilities while TJA1103 doesn't.
mptcp: pm: exit early with ADD_ADDR echo if possible
When the userspace PM is used, or when the in-kernel limits are reached,
there will be no need to schedule the PM worker to signal new addresses.
That corresponds to pm->work_pending set to 0.
In this case, mptcp_pm_add_addr_echoed() can exit early, which avoids
holding the PM lock and iterating over the list of announced addresses
when the worker would not be scheduled anyway. This is similar to what
is done when a connection or a subflow has been established.
Geliang Tang [Fri, 28 Feb 2025 14:38:38 +0000 (15:38 +0100)]
mptcp: pm: in-kernel: reduce parameters of set_flags
The number of parameters in mptcp_nl_set_flags() can be reduced:
it only needs a "local" parameter instead of both "local->addr"
and "local->flags".
Geliang Tang [Fri, 28 Feb 2025 14:38:37 +0000 (15:38 +0100)]
mptcp: pm: in-kernel: avoid access entry without lock
In mptcp_pm_nl_set_flags(), "entry" is copied to "local" when pernet->lock
is held to avoid direct access to entry without pernet->lock.
Therefore, "local->flags" should be passed to mptcp_nl_set_flags instead
of "entry->flags" when pernet->lock is not held, so as to avoid access to
entry.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Fixes: 145dc6cc4abd ("mptcp: pm: change to fullmesh only for 'subflow'")
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-3-f933c4275676@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gang Yan [Fri, 28 Feb 2025 14:38:36 +0000 (15:38 +0100)]
selftests: mptcp: add a test for mptcp_diag_dump_one
This patch introduces a new 'chk_diag' test in diag.sh. It retrieves
the token for a specified MPTCP socket (msk) using the 'ss' command and
then accesses the 'mptcp_diag_dump_one' in kernel via ./mptcp_diag
to verify if the correct token is returned.
Gang Yan [Fri, 28 Feb 2025 14:38:35 +0000 (15:38 +0100)]
selftests: mptcp: Add a tool to get specific msk_info
This patch enables the retrieval of the mptcp_info structure corresponding
to a specified MPTCP socket (msk). When multiple MPTCP connections are
present, specific information can be obtained for a given connection
through the 'mptcp_diag_dump_one' by using the 'token' associated with
the msk.
Jakub Kicinski [Tue, 4 Mar 2025 16:50:40 +0000 (08:50 -0800)]
Merge tag 'wireless-next-2025-03-04-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
First 6.15 material:
* cfg80211/mac80211
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
* rtw88
- preparation for RTL8814AU support
* rtw89
- use wiphy_lock/wiphy_work
- preparations for MLO
- BT-Coex improvements
- regulatory support in firmware files
* iwlwifi
- preparations for the new iwlmld sub-driver
* tag 'wireless-next-2025-03-04-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (128 commits)
wifi: iwlwifi: remove mld/roc.c
wifi: mac80211: refactor populating mesh related fields in sinfo
wifi: cfg80211: reorg sinfo structure elements for mesh
wifi: iwlwifi: Fix spelling mistake "Increate" -> "Increase"
wifi: iwlwifi: add Debug Host Command APIs
wifi: iwlwifi: add IWL_MAX_NUM_IGTKS macro
wifi: iwlwifi: add OMI bandwidth reduction APIs
wifi: iwlwifi: remove mvm prefix from iwl_mvm_d3_end_notif
wifi: iwlwifi: remember if the UATS table was read successfully
wifi: iwlwifi: export iwl_get_lari_config_bitmap
wifi: iwlwifi: add support for external 32 KHz clock
wifi: iwlwifi: mld: add a debug level for EHT prints
wifi: iwlwifi: mld: add a debug level for PTP prints
wifi: iwlwifi: remove mvm prefix from iwl_mvm_esr_mode_notif
wifi: iwlwifi: use 0xff instead of 0xffffffff for invalid
wifi: iwlwifi: location api cleanup
wifi: cfg80211: expose update timestamp to drivers
wifi: mac80211: add ieee80211_iter_chan_contexts_mtx
wifi: mac80211: fix integer overflow in hwmp_route_info_get()
wifi: mac80211: Fix possible integer promotion issue
...
====================
====================
netconsole: Add taskname sysdata support
This patchset introduces a new feature to the netconsole extradata
subsystem that enables the inclusion of the current task's name in the
sysdata output of netconsole messages.
This enhancement is particularly valuable for large-scale deployments,
such as Meta's, where netconsole collects messages from millions of
servers and stores them in a data warehouse for analysis. Engineers
often rely on these messages to investigate issues and assess kernel
health.
One common challenge we face is determining the context in which
a particular message was generated. By including the task name
(task->comm) with each message, this feature provides a direct answer to
the frequently asked question: "What was running when this message was
generated?"
This added context will significantly improve our ability to diagnose
and troubleshoot issues, making it easier to interpret output of
netconsole.
The patchset consists of seven patches that implement the following changes:
* Refactor CPU number formatting into a separate function
* Prefix CPU_NR sysdata feature with SYSDATA_
* Patch to convert a bitwise operation into boolean
* Add configfs controls for taskname sysdata feature
* Add taskname to extradata entry count
* Add support for including task name in netconsole's extra data output
* Document the task name feature in Documentation/networking/netconsole.rst
* Add test coverage for the task name feature to the existing sysdata selftest script
These changes allow users to enable or disable the task name feature via
configfs and provide additional context for kernel messages by showing
which task generated each console message.
I have tested these patches on some servers and they seem to work as
expected.
Breno Leitao [Fri, 28 Feb 2025 12:50:24 +0000 (04:50 -0800)]
netconsole: selftest: add task name append testing
Add test coverage for the netconsole task name feature to the existing
sysdata selftest script. This extends the test infrastructure to verify
that task names are correctly appended when enabled and absent when
disabled.
The test validates that:
- Task names appear in the expected format "taskname=<name>"
- Task names are included when the feature is enabled
- Task names are excluded when the feature is disabled
- The feature works correctly alongside other sysdata fields like CPU
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Breno Leitao [Fri, 28 Feb 2025 12:50:23 +0000 (04:50 -0800)]
netconsole: docs: document the task name feature
Add documentation for the netconsole task name feature in
Documentation/networking/netconsole.rst. This explains how to enable
task name via configfs and demonstrates the output format.
The documentation includes:
- How to enable/disable the feature via taskname_enabled
- The format of the task name in the output
- An example showing the task name appearing in messages
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Breno Leitao [Fri, 28 Feb 2025 12:50:22 +0000 (04:50 -0800)]
netconsole: add task name to extra data fields
This is the core patch for this whole patchset. Add support for
including the current task's name in netconsole's extra data output.
This adds a new append_taskname() function that writes the task name
(from current->comm) into the target's extradata buffer, similar to how
CPU numbers are handled.
The task name is included when the SYSDATA_TASKNAME field is set,
appearing in the format "taskname=<name>" in the output. This additional
context can help with debugging by showing which task generated each
console message.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
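The "taskname=<name>" entry can be produced with a bounded append. The helper below is a user-space sketch: its name, its signature and the trailing newline separator are assumptions, and only the "taskname=<name>" format comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Append "taskname=<name>\n" at buf + offset. Returns the number of bytes
 * added, or 0 if the entry does not fit; in that case the buffer is left
 * terminated at the old offset.
 */
static size_t sketch_append_taskname(char *buf, size_t bufsize, size_t offset,
				     const char *comm)
{
	int n;

	assert(buf && comm && offset <= bufsize);
	n = snprintf(buf + offset, bufsize - offset, "taskname=%s\n", comm);
	if (n < 0 || (size_t)n >= bufsize - offset) {
		if (offset < bufsize)
			buf[offset] = '\0';	/* undo the partial write */
		return 0;
	}
	return (size_t)n;
}
```

Returning the number of bytes added lets a caller chain several such helpers, one per enabled sysdata field, while tracking the running offset.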
Breno Leitao [Fri, 28 Feb 2025 12:50:21 +0000 (04:50 -0800)]
netconsole: add configfs controls for taskname sysdata feature
Add configfs interface to enable/disable the taskname sysdata feature.
This adds the following functionality:
The implementation follows the same pattern as the existing CPU number
feature, ensuring consistent behavior and error handling across sysdata
features.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Breno Leitao [Fri, 28 Feb 2025 12:50:20 +0000 (04:50 -0800)]
netconsole: add taskname to extradata entry count
Add a new SYSDATA_TASKNAME feature flag to track when taskname append is
enabled. Add a check in count_extradata_entries() to include taskname in
the total, counting it as an entry in extradata. This function is used to
check that the number of extradata items does not overflow.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Breno Leitao [Fri, 28 Feb 2025 12:50:19 +0000 (04:50 -0800)]
netconsole: refactor CPU number formatting into separate function
Extract CPU number formatting logic from prepare_extradata() into a new
append_cpu_nr() function.
This refactoring improves code organization by isolating CPU number
formatting into its own function while reducing the complexity of
prepare_extradata().
The change prepares the codebase for the upcoming taskname feature by
establishing a consistent pattern for handling sysdata features.
The CPU number formatting logic itself remains unchanged; only its
location has moved to improve maintainability.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Breno Leitao [Fri, 28 Feb 2025 12:50:18 +0000 (04:50 -0800)]
netconsole: Make boolean comparison consistent
Convert the current state assignment to use explicit boolean conversion,
making the code more robust and easier to read. This change adds a
double-negation operator to ensure consistent boolean conversion as
suggested by Paolo[1].
This approach aligns with the existing pattern used in
sysdata_cpu_nr_enabled_show().
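The effect of the double negation is easy to see in isolation. The bit definitions below are hypothetical and only loosely mirror the feature names used in this series.

```c
#include <assert.h>

/* Hypothetical feature bits for this sketch. */
#define SKETCH_SYSDATA_CPU_NR   (1u << 0)
#define SKETCH_SYSDATA_TASKNAME (1u << 1)

/* !! collapses any non-zero masked value to exactly 1. */
static int sketch_feature_enabled(unsigned int fields, unsigned int bit)
{
	assert(bit != 0);
	return !!(fields & bit);
}
```

Without the double negation, masking with SKETCH_SYSDATA_TASKNAME would yield 2 rather than 1, so a comparison against a 0/1 boolean could give the wrong answer.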
Breno Leitao [Fri, 28 Feb 2025 12:50:17 +0000 (04:50 -0800)]
netconsole: prefix CPU_NR sysdata feature with SYSDATA_
Rename the CPU_NR enum value to SYSDATA_CPU_NR to establish a consistent
naming convention for sysdata features. This change prepares for
upcoming additions to the sysdata feature set by clearly grouping
related features under the SYSDATA prefix.
This change is purely cosmetic and does not modify any functionality.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
Support some enhanced features for the HIBMCGE driver
This patch set mainly implements some enhanced features:
statistics, diagnosis, and ioctl support to
improve fault locating efficiency;
abnormal irq and MAC link exception handling
to enhance driver robustness;
and rx checksum offload to improve performance
(tx checksum offload has already been implemented).
Jijie Shao [Fri, 28 Feb 2025 11:54:10 +0000 (19:54 +0800)]
net: hibmcge: Add support for BMC diagnose feature
The MAC hardware is on the BMC side, and the driver is on the host side.
When the driver is abnormal, the BMC cannot directly detect the
exception cause.
Therefore, this patch implements the BMC diagnosis feature.
When users query driver diagnosis information on the BMC, the driver
detects the query request in the scheduled task and reports
driver statistics and link status to the BMC through the bar space.
The BMC collects logs to analyze exception causes.
Currently, the scheduled task is executed every 30 seconds.
To respond quickly to user query requests,
this patch changes the scheduled task to run once every second.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Fri, 28 Feb 2025 11:54:09 +0000 (19:54 +0800)]
net: hibmcge: Add support for mac link exception handling feature
If the rate changes frequently, the PHY link may be OK
while the MAC link fails.
As a result, the network port is unavailable.
According to the chip documentation,
a core_reset is needed to fix the fault.
In hw_adjust_link(), a core_reset is added to try to
ensure that the MAC link status is normal.
In addition, MAC link failure detection is added.
If the MAC link still fails after core_reset, the driver invokes
phy_stop() and phy_start() to re-link.
Because of phydev->lock, the re-link cannot be triggered
in adjust_link(). Therefore, this operation
is invoked in a scheduled task.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Fri, 28 Feb 2025 11:54:08 +0000 (19:54 +0800)]
net: hibmcge: Add support for abnormal irq handling feature
The hardware error is reported by an interrupt
and needs to be fixed by a function reset.
However, the whole reset flow takes a long time
and should not run in the irq handler,
so it is done in a scheduled task.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Fri, 28 Feb 2025 11:54:07 +0000 (19:54 +0800)]
net: hibmcge: Add support for checksum offload
This patch implements the rx checksum offload feature.
The tx checksum offload processing in .ndo_start_xmit()
has already been accepted. This patch also adds the tx checksum
feature, including NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Fri, 28 Feb 2025 11:54:06 +0000 (19:54 +0800)]
net: hibmcge: Add support for dump statistics
The driver supports many hw statistics. This patch supports
dumping statistics through ethtool_ops and ndo.get_stats64().
The hw statistics registers are of type u32.
To prevent the statistics registers from overflowing,
the driver dumps the statistics every 30 seconds
in a scheduled task.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
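One common way to keep 32-bit hardware counters from overflowing is to fold them into 64-bit software totals on every poll. The sketch below assumes free-running counters that are not cleared on read; the hibmcge registers may behave differently, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software mirror of one 32-bit hardware counter. */
struct sketch_stat {
	uint32_t last_raw;	/* register value seen at the previous poll */
	uint64_t total;		/* 64-bit running total exposed to users */
};

/*
 * Fold the latest register reading into the 64-bit total. Unsigned
 * subtraction yields the right delta across a single 32-bit wrap.
 */
static void sketch_stat_update(struct sketch_stat *s, uint32_t raw)
{
	assert(s);
	s->total += (uint32_t)(raw - s->last_raw);
	s->last_raw = raw;
}
```

The scheme stays correct as long as the poll interval is short enough that a counter cannot wrap more than once between two polls.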
====================
Introduce flowtable hw offloading in airoha_eth driver
Introduce netfilter flowtable integration in airoha_eth driver to
offload 5-tuple flower rules learned by the PPE module if the user
accelerates them using a nft configuration similar to the one reported
below:
Packet Processor Engine (PPE) module available on EN7581 SoC populates
the PPE table with 5-tuples flower rules learned from traffic forwarded
between the GDM ports connected to the Packet Switch Engine (PSE) module.
airoha_eth driver configures and collects data from the PPE module via a
Network Processor Unit (NPU) RISC-V module available on the EN7581 SoC.
Move airoha_eth driver in a dedicated folder
(drivers/net/ethernet/airoha).
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:22 +0000 (11:54 +0100)]
net: airoha: Add loopback support for GDM2
Enable hw redirection for traffic received on GDM2 port to GDM{3,4}.
This is required to apply Qdisc offloading (HTB or ETS) for traffic to
and from GDM{3,4} port.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:21 +0000 (11:54 +0100)]
net: airoha: Introduce flowtable offload support
Introduce netfilter flowtable integration in order to allow airoha_eth
driver to offload 5-tuple flower rules learned by the PPE module if the
user accelerates them using a nft configuration similar to the one reported
below:
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:20 +0000 (11:54 +0100)]
net: airoha: Introduce Airoha NPU support
Packet Processor Engine (PPE) module available on EN7581 SoC populates
the PPE table with 5-tuples flower rules learned from traffic forwarded
between the GDM ports connected to the Packet Switch Engine (PSE) module.
The airoha_eth driver can enable hw acceleration of learned 5-tuples
rules if the user configures them in netfilter flowtable (netfilter
flowtable support will be added with subsequent patches).
airoha_eth driver configures and collects data from the PPE module via a
Network Processor Unit (NPU) RISC-V module available on the EN7581 SoC.
Introduce basic support for Airoha NPU module.
Tested-by: Sayantan Nandy <sayantan.nandy@airoha.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Introduce the airoha,npu property for the NPU node available on
EN7581 SoC. The airoha Network Processor Unit (NPU) is used to
offload network traffic forwarded between Packet Switch Engine
(PSE) ports.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:18 +0000 (11:54 +0100)]
dt-bindings: net: airoha: Add the NPU node for EN7581 SoC
This patch adds the NPU document binding for EN7581 SoC.
The Airoha Network Processor Unit (NPU) provides a configuration interface
to implement wired and wireless hardware flow offloading by programming
the Packet Processor Engine (PPE) flow table.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:16 +0000 (11:54 +0100)]
net: airoha: Move REG_GDM_FWD_CFG() initialization in airoha_dev_init()
Move REG_GDM_FWD_CFG() register initialization in airoha_dev_init
routine. Moreover, always send traffic to the PPE module so that it is
processed by the hw accelerator.
This is a preliminary patch to enable netfilter flowtable hw offloading
on EN7581 SoC.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:15 +0000 (11:54 +0100)]
net: airoha: Enable support for multiple net_devices
In the current codebase airoha_eth driver supports just a single
net_device connected to the Packet Switch Engine (PSE) lan port (GDM1).
As shown in commit 23020f049327 ("net: airoha: Introduce ethernet
support for EN7581 SoC"), PSE can switch packets between four GDM ports.
Enable the capability to create a net_device for each GDM port of the
PSE module. Moreover, since the QDMA blocks can be shared between
net_devices, do not stop TX/RX DMA in airoha_dev_stop() if there are
active net_devices for this QDMA block.
This is a preliminary patch to enable flowtable hw offloading for EN7581
SoC.
Co-developed-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
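The rule "do not stop TX/RX DMA while other net_devices still use the QDMA block" amounts to a user count on the shared block. The following is a minimal user-space sketch of that idea; struct qdma_model, model_dev_open() and model_dev_stop() are hypothetical and do not reflect the actual airoha_eth data structures or locking.

```c
#include <stdbool.h>

/* Hypothetical model of a QDMA block shared by several net_devices. */
struct qdma_model {
	int users;          /* number of net_devices currently open */
	bool dma_running;   /* true while TX/RX DMA is enabled */
};

/* Open path: the first user starts DMA, later users just take a reference. */
static void model_dev_open(struct qdma_model *q)
{
	if (q->users++ == 0)
		q->dma_running = true;
}

/* Stop path: DMA is stopped only when the last user goes away. */
static void model_dev_stop(struct qdma_model *q)
{
	if (--q->users == 0)
		q->dma_running = false;
}
```

With two devices open, stopping one leaves DMA running; only the second stop disables it.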
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:14 +0000 (11:54 +0100)]
net: dsa: mt7530: Enable Rx sptag for EN7581 SoC
Packet Processor Engine (PPE) module used for hw acceleration on EN7581
mac block, in order to properly parse packets, requires DSA untagged
packets on TX side and read DSA tag from DMA descriptor on RX side.
For this reason, enable RX Special Tag (SPTAG) for EN7581 SoC.
This is a preliminary patch to enable netfilter flowtable hw offloading
on EN7581 SoC.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:13 +0000 (11:54 +0100)]
net: airoha: Move DSA tag in DMA descriptor
Packet Processor Engine (PPE) module reads DSA tags from the DMA descriptor
and requires untagged DSA packets to properly parse them. Move DSA tag
in the DMA descriptor on TX side and read DSA tag from DMA descriptor
on RX side. In order to avoid skb reallocation, store tag in skb_dst on
RX side.
This is a preliminary patch to enable netfilter flowtable hw offloading
on EN7581 SoC.
Tested-by: Sayantan Nandy <sayantan.nandy@airoha.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:12 +0000 (11:54 +0100)]
net: airoha: Move register definitions in airoha_regs.h
Move common airoha_eth register definitions in airoha_regs.h in order
to reuse them for Packet Processor Engine (PPE) codebase.
PPE module is used to enable support for flowtable hw offloading in
airoha_eth driver.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:10 +0000 (11:54 +0100)]
net: airoha: Move definitions in airoha_eth.h
Move common airoha_eth definitions in airoha_eth.h in order to reuse
them for Packet Processor Engine (PPE) codebase.
PPE module is used to enable support for flowtable hw offloading in
airoha_eth driver.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 28 Feb 2025 10:54:09 +0000 (11:54 +0100)]
net: airoha: Move airoha_eth driver in a dedicated folder
The airoha_eth driver has no codebase shared with mtk_eth_soc one.
Moreover, the upcoming features (flowtable hw offloading, PCS, ..) will
not reuse any code from MediaTek driver. Move the Airoha driver in a
dedicated folder.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Nicolas Dichtel [Fri, 28 Feb 2025 10:20:57 +0000 (11:20 +0100)]
net: advertise netns_immutable property via netlink
Since commit 05c1280a2bcf ("netdev_features: convert NETIF_F_NETNS_LOCAL to
dev->netns_local"), there is no way to see if the netns_immutable property
is set on a device. Let's add a netlink attribute to advertise it.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
Some pktgen fixes/improvements (part II)
While taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds
access in get_imix_entries' ([2], [3]) and doing some tests and code review
I detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).
This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):
So fix the out-of-bounds access (and some minor findings) and add a simple
proc_net_pktgen selftest...
Patch set split into part I (already applied to net-next):
- net: pktgen: replace ENOTSUPP with EOPNOTSUPP
- net: pktgen: enable 'param=value' parsing
- net: pktgen: fix hex32_arg parsing for short reads
- net: pktgen: fix 'rate 0' error handling (return -EINVAL)
- net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
- net: pktgen: fix ctrl interface command parsing
- net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
and part II (this one):
- net: pktgen: use defines for the various dec/hex number parsing digits lengths
- net: pktgen: fix mix of int/long
- net: pktgen: remove extra tmp variable (re-use len instead)
- net: pktgen: remove some superfluous variable initializing
- net: pktgen: fix mpls maximum labels list parsing
- net: pktgen: fix access outside of user given buffer in pktgen_if_write()
- net: pktgen: fix mpls reset parsing
- net: pktgen: remove all superfluous index assignements
- selftest: net: add proc_net_pktgen
Peter Seiderer [Thu, 27 Feb 2025 13:56:01 +0000 (14:56 +0100)]
net: pktgen: fix access outside of user given buffer in pktgen_if_write()
Honour the user given buffer size for the hex32_arg(), num_arg(),
strn_len(), get_imix_entries() and get_labels() calls (otherwise they will
access memory outside of the user given buffer).
Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
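Honouring the user given buffer size means a parser must stop at whichever limit comes first: the maximum digit count or the number of bytes actually written. The sketch below models that in user space; bounded_num_arg() is a hypothetical helper, not the pktgen num_arg() implementation, and it reads a plain array rather than a __user buffer.

```c
#include <stddef.h>

/*
 * Parse up to maxlen decimal digits from buf, never reading past len bytes.
 * Returns the number of bytes consumed and stores the value in *num.
 */
static size_t bounded_num_arg(const char *buf, size_t len, size_t maxlen,
			      unsigned long *num)
{
	size_t i;

	*num = 0;
	if (maxlen > len)
		maxlen = len;   /* clamp to the caller-supplied buffer size */

	for (i = 0; i < maxlen; i++) {
		char c = buf[i];

		if (c < '0' || c > '9')
			break;
		*num = *num * 10 + (unsigned long)(c - '0');
	}
	return i;
}
```

Without the clamp, a short write such as "123" into a buffer that still holds older, longer data could pick up the stale trailing digits.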