Russell King (Oracle) [Wed, 8 Jan 2025 16:48:19 +0000 (16:48 +0000)]
net: stmmac: convert to use phy_eee_rx_clock_stop()
Convert stmmac to use phy_eee_rx_clock_stop() to set the PHY receive
clock stop in LPI setting, rather than calling the legacy
phy_init_eee() function.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZEB-0002Ke-RZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 8 Jan 2025 16:48:14 +0000 (16:48 +0000)]
net: stmmac: report EEE error statistics if EEE is supported
Report the number of EEE error statistics in the xstats even when EEE
is not enabled in hardware, but is supported. The PHY maintains this
counter even when EEE is not enabled.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZE6-0002KY-Nx@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 8 Jan 2025 16:48:09 +0000 (16:48 +0000)]
net: stmmac: remove priv->tx_lpi_enabled
Through using phylib's EEE state, priv->tx_lpi_enabled has become a
write-only variable. Remove it.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZE1-0002KS-K1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 8 Jan 2025 16:48:04 +0000 (16:48 +0000)]
net: stmmac: clean up stmmac_disable_eee_mode()
stmmac_disable_eee_mode() is now only called from stmmac_xmit() when
both priv->tx_path_in_lpi_mode and priv->eee_sw_timer_en are true.
Therefore:
if (!priv->eee_sw_timer_en)
in stmmac_disable_eee_mode() will never be true, so this is dead code.
Remove it, and rename the function to indicate that it now only deals
with software based EEE mode.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDw-0002KL-Gg@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 8 Jan 2025 16:47:59 +0000 (16:47 +0000)]
net: stmmac: remove redundant code from ethtool EEE ops
Setting edata->tx_lpi_enabled in stmmac_ethtool_op_get_eee() gets
overwritten by phylib, so there's no point setting this.
In stmmac_ethtool_op_set_eee(), now that stmmac is using the result of
phylib's evaluation of EEE, there is no need to handle anything in the
ethtool EEE ops other than calling through to the appropriate phylink
function, which will pass on to phylib the users request.
As stmmac_disable_eee_mode() is now no longer called from outside
stmmac_main.c, make it static.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDr-0002KF-Cv@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 8 Jan 2025 16:47:54 +0000 (16:47 +0000)]
net: stmmac: make EEE depend on phy->enable_tx_lpi
Make stmmac EEE depend on phylib's evaluation of user settings and PHY
negotiation, as indicated by phy->enable_tx_lpi. This will ensure when
phylib has evaluated that the user has disabled LPI, phy_init_eee()
will not be called, and priv->eee_active will be false, causing LPI/EEE
to be disabled.
This is an interim measure - phy_init_eee() will be removed in a later
patch.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDm-0002K9-9w@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 8 Jan 2025 16:47:49 +0000 (16:47 +0000)]
net: stmmac: use unsigned int for eee_timer
Since eee_timer is used to initialise priv->tx_lpi_timer, this also
should be unsigned to avoid a negative number being interpreted as a
very large positive number. Note that this makes the check for negative
numbers passed in as a module parameter redundant, and passing a
negative number will now produce a large delay rather than the
default. Since the default is used without an argument, passing a
negative number would be quite obscure. However, if users do, then
this will need to be revisited.
Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDh-0002K3-6y@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 8 Jan 2025 16:47:44 +0000 (16:47 +0000)]
net: stmmac: use correct type for tx_lpi_timer
The ethtool interface uses u32 for tx_lpi_timer, and so does phylib.
Use u32 to store this internally within stmmac rather than "int"
which could misinterpret large values.
Correct "value" in dwmac4_set_eee_lpi_entry_timer() to use u32
rather than int, which is derived from tx_lpi_timer. Even though this
path won't be used with values larger than STMMAC_ET_MAX, this brings
consistency of type usage to the stmmac code for this variable.
We leave eee_timer unchanged for now, with the assumption that values
up to INT_MAX will safely fit in a u32.
Russell King (Oracle) [Wed, 8 Jan 2025 16:47:38 +0000 (16:47 +0000)]
net: stmmac: move tx_lpi_timer tracking to phylib
When stmmac_ethtool_op_get_eee() is called, stmmac sets the tx_lpi_timer
and tx_lpi_enabled members, and then calls into phylink and thus phylib.
phylib overwrites these members.
phylib will also cause a link down/link up transition when settings
that impact the MAC have been changed.
Convert stmmac to use the tx_lpi_timer setting in struct phy_device,
updating priv->tx_lpi_timer each time when the link comes up, rather
than trying to maintain this user setting itself. We initialise the
phylib tx_lpi_timer setting by doing a get_ee-modify-set_eee sequence
with the last known priv->tx_lpi_timer value. In order for this to work
correctly, we also need this member to be initialised earlier.
As stmmac_eee_init() is no longer called outside of stmmac_main.c, make
it static.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDW-0002Jr-W3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 8 Jan 2025 16:47:33 +0000 (16:47 +0000)]
net: phy: add configuration of rx clock stop mode
Add a function to allow configuration of the PCS's clock stop enable
bit, used to configure whether the xMII receive clock can be stopped
during LPI mode.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDR-0002Jl-Ry@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
netconsole: selftest for userdata overflow
Implement comprehensive testing for netconsole userdata entry handling,
demonstrating correct behavior when creating maximum entries and
preventing unauthorized overflow.
Refactor existing test infrastructure to support modular, reusable
helper functions that validate strict entry limit enforcement.
Also, add a warning if update_userdata() sees more than
MAX_USERDATA_ITEMS entries. This shouldn't happen and it is a bug that
shouldn't be silently ignored.
Breno Leitao [Wed, 8 Jan 2025 11:50:27 +0000 (03:50 -0800)]
netconsole: selftest: Delete all userdata keys
Modify the cleanup function to remove all userdata keys created during the
test, instead of just deleting a single predefined key. This ensures a
more thorough cleanup of temporary resources.
Move the KEY_PATH variable definition inside the set_user_data function
to reduce global variables and improve encapsulation. The KEY_PATH
variable is now dynamically created when setting user data.
This change has no effect on the current test, while improving an
upcoming test that would create several userdata entries.
Breno Leitao [Wed, 8 Jan 2025 11:50:26 +0000 (03:50 -0800)]
netconsole: selftest: Split the helpers from the selftest
Split helper functions from the netconsole basic test into a separate
library file to enable reuse across different netconsole tests. This
change only moves the existing helper functions to lib/sh/lib_netcons.sh
while preserving the same test functionality.
The helpers provide common functions for:
- Setting up network namespaces and interfaces
- Managing netconsole dynamic targets
- Setting user data
- Handling test dependencies
- Cleanup operations
Do not make any change in the code, other than the mechanical
separation.
This series adds an install target for ynl. The python code
is moved to a subdirectory, so it can be used as a package
with flat layout, as well as directly from the tree.
To try the install as a non-root user you can run:
$ mkdir /tmp/myroot
$ make DESTDIR=/tmp/myroot install
Jakub Kicinski [Wed, 8 Jan 2025 20:07:58 +0000 (12:07 -0800)]
tools: ynl-gen-c: improve support for empty nests
Empty nests are the same size as a flag at the netlink level
(just a 4 byte nlattr without a payload). They are sometimes
useful in case we want to only communicate a presence of
something but may want to add more details later.
This may be the case in the upcoming io_uring ZC patches,
for example.
Improve handling of nested empty structs. We already support
empty structs since a lot of netlink replies are empty, but
for nested ones we need minor tweaks to avoid pointless empty
lines and unused variables.
* tag 'net-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
sctp: sysctl: udp_port: avoid using current->nsproxy
sctp: sysctl: auth_enable: avoid using current->nsproxy
sctp: sysctl: rto_min/max: avoid using current->nsproxy
sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
mptcp: sysctl: blackhole timeout: avoid using current->nsproxy
mptcp: sysctl: sched: avoid using current->nsproxy
mptcp: sysctl: avail sched: remove write access
MAINTAINERS: remove Lars Povlsen from Microchip Sparx5 SoC
MAINTAINERS: remove Noam Dagan from AMAZON ETHERNET
MAINTAINERS: remove Ying Xue from TIPC
MAINTAINERS: remove Mark Lee from MediaTek Ethernet
MAINTAINERS: mark stmmac ethernet as an Orphan
MAINTAINERS: remove Andy Gospodarek from bonding
MAINTAINERS: update maintainers for Microchip LAN78xx
MAINTAINERS: mark Synopsys DW XPCS as Orphan
net/mlx5: Fix variable not being completed when function returns
rtase: Fix a check for error in rtase_alloc_msix()
net: stmmac: dwmac-tegra: Read iommu stream id from device tree
...
John Daley [Tue, 7 Jan 2025 21:41:59 +0000 (13:41 -0800)]
enic: Fix typo in comment in table indexed by link speed
The RX adaptive interrupt moderation table is indexed by link speed
range, where the last row of the table is the catch-all for all link
speeds greater than 10Gbps. The comment said 10 - 40Gbps, but since
there are now adapters with link speeds than 40Gbps, the comment is now
wrong and should indicate it applies to all speeds greater than 10Gbps.
Co-developed-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Co-developed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: John Daley <johndale@cisco.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107214159.18807-4-johndale@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
John Daley [Tue, 7 Jan 2025 21:41:58 +0000 (13:41 -0800)]
enic: Obtain the Link speed only after the link comes up
The link speed is obtained in the RX adaptive coalescing function. It
was being called at probe time when the link may not be up. Change the
call to run after the Link comes up.
The impact of not getting the correct link speed was that the low end of
the adaptive interrupt range was always being set to 0 which could have
caused a slight increase in the number of RX interrupts.
Co-developed-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Co-developed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: John Daley <johndale@cisco.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107214159.18807-3-johndale@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
John Daley [Tue, 7 Jan 2025 21:41:57 +0000 (13:41 -0800)]
enic: Move RX coalescing set function
Move the function used for setting the RX coalescing range to before
the function that checks the link status. It needs to be called from
there instead of from the probe function.
There is no functional change.
Co-developed-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Co-developed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: John Daley <johndale@cisco.com> Link: https://patch.msgid.link/20250107214159.18807-2-johndale@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Krzysztof Kozlowski [Wed, 8 Jan 2025 12:02:42 +0000 (13:02 +0100)]
dt-bindings: net: qcom,ipa: Use recommended MBN firmware format in DTS example
All Qualcomm firmwares uploaded to linux-firmware are in MBN format,
instead of split MDT. No functional changes, just correct the DTS
example so people will not rely on unaccepted files.
Linus Torvalds [Thu, 9 Jan 2025 18:16:45 +0000 (10:16 -0800)]
Merge tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes.
Besides the one-liners in Btrfs there's fix to the io_uring and
encoded read integration (added in this development cycle). The update
to io_uring provides more space for the ongoing command that is then
used in Btrfs to handle some cases.
- io_uring and encoded read:
- provide stable storage for io_uring command data
- make a copy of encoded read ioctl call, reuse that in case the
call would block and will be called again
- properly initialize zlib context for hardware compression on s390
- fix max extent size calculation on filesystems with non-zoned
devices
- fix crash in scrub on crafted image due to invalid extent tree"
* tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path
btrfs: zoned: calculate max_extent_size properly on non-zoned setup
btrfs: avoid NULL pointer dereference if no valid extent tree
btrfs: don't read from userspace twice in btrfs_uring_encoded_read()
io_uring: add io_uring_cmd_get_async_data helper
io_uring/cmd: add per-op data to struct io_uring_cmd_data
io_uring/cmd: rename struct uring_cache to io_uring_cmd_data
Jakub Kicinski [Thu, 9 Jan 2025 16:54:49 +0000 (08:54 -0800)]
Merge tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix imbalance between flowtable BIND and UNBIND calls to configure
hardware offload, this fixes a possible kmemleak.
2) Clamp maximum conntrack hashtable size to INT_MAX to fix a possible
WARN_ON_ONCE splat coming from kvmalloc_array(), only possible from
init_netns.
* tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: clamp maximum hashtable size to INT_MAX
netfilter: nf_tables: imbalance in flowtable binding
====================
====================
net: sysctl: avoid using current->nsproxy
As pointed out by Al Viro and Eric Dumazet in [1], using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns as it is usually done. This could cause
unexpected issues when other operations are done on the wrong netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' or 'pernet' structure can be obtained from the table->data
using container_of().
Note that table->data could also be used directly in more places, but
that would increase the size of this fix to replace all accesses via
'net'. Probably best to avoid that for fixes.
Patches 2-9 remove access of net via current->nsproxy in sysfs handlers
in MPTCP, SCTP and RDS. There are multiple patches doing almost the same
thing, but the reason is to ease the backports.
Patch 1 is not directly linked to this, but it is a small fix for MPTCP
available_schedulers sysctl knob to explicitly mark it as read-only.
Please note that this series does not address Al's comment [2]. In SCTP,
some sysctl knobs set other sysfs-exposed variables for the min/max: two
processes could then write two linked values at the same time, resulting
in new values being outside the new boundaries. It would be great if
SCTP developers can look at this problem.
rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.probe_interval' is
used.
sctp: sysctl: udp_port: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
sctp: sysctl: auth_enable: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
sctp: sysctl: rto_min/max: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.
sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.sctp_hmac_alg' is
used.
mptcp: sysctl: blackhole timeout: avoid using current->nsproxy
As mentioned in the previous commit, using the 'net' structure via
'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'pernet' structure can be obtained from the table->data using
container_of().
mptcp: sysctl: sched: avoid using current->nsproxy
Using the 'net' structure via 'current' is not recommended for different
reasons.
First, if the goal is to use it to read or write per-netns data, this is
inconsistent with how the "generic" sysctl entries are doing: directly
by only using pointers set to the table entry, e.g. table->data. Linked
to that, the per-netns data should always be obtained from the table
linked to the netns it had been created for, which may not coincide with
the reader's or writer's netns.
Another reason is that access to current->nsproxy->netns can oops if
attempted when current->nsproxy had been dropped when the current task
is exiting. This is what syzbot found, when using acct(2):
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
Here with 'net.mptcp.scheduler', the 'net' structure is not really
needed, because the table->data already has a pointer to the current
scheduler, the only thing needed from the per-netns data.
Simply use 'data', instead of getting (most of the time) the same thing,
but from a longer and indirect way.
Fixes: 6963c508fd7a ("mptcp: only allow set existing scheduler for net.mptcp.scheduler") Cc: stable@vger.kernel.org Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-2-5df34b2083e8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
MAINTAINERS: spring 2025 cleanup of networking maintainers
Annual cleanup of inactive maintainers. To identify inactive maintainers
we use Jon Corbet's maintainer analysis script from gitdm, and some manual
scanning of lore.
Jakub Kicinski [Wed, 8 Jan 2025 15:52:42 +0000 (07:52 -0800)]
MAINTAINERS: remove Lars Povlsen from Microchip Sparx5 SoC
We have not seen emails or tags from Lars in almost 4 years.
Steen and Daniel are pretty active, but the review coverage
isn't stellar (35% of changes go in without a review tag).
Subsystem ARM/Microchip Sparx5 SoC support
Changes 28 / 79 (35%)
Last activity: 2024-11-24
Lars Povlsen <lars.povlsen@microchip.com>:
Steen Hegelund <Steen.Hegelund@microchip.com>:
Tags 6c7c4b91aa43 2024-04-08 00:00:00 15
Daniel Machon <daniel.machon@microchip.com>:
Author 48ba00da2eb4 2024-04-09 00:00:00 2
Tags f164b296638d 2024-11-24 00:00:00 6
Top reviewers:
[7]: horms@kernel.org
[1]: jacob.e.keller@intel.com
[1]: jensemil.schulzostergaard@microchip.com
[1]: horatiu.vultur@microchip.com
INACTIVE MAINTAINER Lars Povlsen <lars.povlsen@microchip.com>
Jakub Kicinski [Wed, 8 Jan 2025 15:52:41 +0000 (07:52 -0800)]
MAINTAINERS: remove Noam Dagan from AMAZON ETHERNET
Noam Dagan was added to ENA reviewers in 2021, we have not seen
a single email from this person to any list, ever (according to lore).
Git history mentions the name in 2 SoB tags from 2020.
Jakub Kicinski [Wed, 8 Jan 2025 15:52:40 +0000 (07:52 -0800)]
MAINTAINERS: remove Ying Xue from TIPC
There is a steady stream of fixes for TIPC, even tho the development
has slowed down a lot. Over last 2 years we have merged almost 70
TIPC patches, but we haven't heard from Ying Xue once:
Jakub Kicinski [Wed, 8 Jan 2025 15:52:38 +0000 (07:52 -0800)]
MAINTAINERS: mark stmmac ethernet as an Orphan
I tried a couple of things to reinvigorate the stmmac maintainers
over the last few years but with little effect. The maintainers
are not active, let the MAINTAINERS file reflect reality.
The Synopsys IP this driver supports is very popular we need
a solid maintainer to deal with the complexity of the driver.
gitdm missingmaints says:
Subsystem STMMAC ETHERNET DRIVER
Changes 344 / 978 (35%)
Last activity: 2020-05-01
Alexandre Torgue <alexandre.torgue@foss.st.com>:
Tags 1bb694e20839 2020-05-01 00:00:00 1
Jose Abreu <joabreu@synopsys.com>:
Top reviewers:
[75]: horms@kernel.org
[49]: andrew@lunn.ch
[46]: fancer.lancer@gmail.com
INACTIVE MAINTAINER Jose Abreu <joabreu@synopsys.com>
Jakub Kicinski [Wed, 8 Jan 2025 15:52:36 +0000 (07:52 -0800)]
MAINTAINERS: update maintainers for Microchip LAN78xx
Woojung Huh seems to have only replied to the list 35 times
in the last 5 years, and didn't provide any reviews in 3 years.
The LAN78XX driver has seen quite a bit of activity lately.
gitdm missingmaints says:
Subsystem USB LAN78XX ETHERNET DRIVER
Changes 35 / 91 (38%)
(No activity)
Top reviewers:
[23]: andrew@lunn.ch
[3]: horms@kernel.org
[2]: mateusz.polchlopek@intel.com
INACTIVE MAINTAINER Woojung Huh <woojung.huh@microchip.com>
Move Woojung to CREDITS and add new maintainers who are more
likely to review LAN78xx patches.
Jakub Kicinski [Wed, 8 Jan 2025 15:52:35 +0000 (07:52 -0800)]
MAINTAINERS: mark Synopsys DW XPCS as Orphan
There's not much review support from Jose, there is a sharp
drop in his participation around 4 years ago.
The DW XPCS IP is very popular and the driver requires active
maintenance.
gitdm missingmaints says:
Subsystem SYNOPSYS DESIGNWARE ETHERNET XPCS DRIVER
Changes 33 / 94 (35%)
(No activity)
Top reviewers:
[16]: andrew@lunn.ch
[12]: vladimir.oltean@nxp.com
[2]: f.fainelli@gmail.com
INACTIVE MAINTAINER Jose Abreu <Jose.Abreu@synopsys.com>
Chenguang Zhao [Wed, 8 Jan 2025 03:00:09 +0000 (11:00 +0800)]
net/mlx5: Fix variable not being completed when function returns
When cmd_alloc_index(), fails cmd_work_handler() needs
to complete ent->slotted before returning early.
Otherwise the task which issued the command may hang:
mlx5_core 0000:01:00.0: cmd_work_handler:877:(pid 3880418): failed to allocate command entry
INFO: task kworker/13:2:4055883 blocked for more than 120 seconds.
Not tainted 4.19.90-25.44.v2101.ky10.aarch64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/13:2 D 0 4055883 2 0x00000228
Workqueue: events mlx5e_tx_dim_work [mlx5_core]
Call trace:
__switch_to+0xe8/0x150
__schedule+0x2a8/0x9b8
schedule+0x2c/0x88
schedule_timeout+0x204/0x478
wait_for_common+0x154/0x250
wait_for_completion+0x28/0x38
cmd_exec+0x7a0/0xa00 [mlx5_core]
mlx5_cmd_exec+0x54/0x80 [mlx5_core]
mlx5_core_modify_cq+0x6c/0x80 [mlx5_core]
mlx5_core_modify_cq_moderation+0xa0/0xb8 [mlx5_core]
mlx5e_tx_dim_work+0x54/0x68 [mlx5_core]
process_one_work+0x1b0/0x448
worker_thread+0x54/0x468
kthread+0x134/0x138
ret_from_fork+0x10/0x18
Fixes: 485d65e13571 ("net/mlx5: Add a timeout to acquire the command queue semaphore") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250108030009.68520-1-zhaochenguang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Wed, 8 Jan 2025 09:15:53 +0000 (12:15 +0300)]
rtase: Fix a check for error in rtase_alloc_msix()
The pci_irq_vector() function never returns zero. It returns negative
error codes or a positive non-zero IRQ number. Fix the error checking to
test for negatives.
Fixes: a36e9f5cfe9e ("rtase: Add support for a pci table in this module") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/f2ecc88d-af13-4651-9820-7cc665230019@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parker Newman [Tue, 7 Jan 2025 21:24:59 +0000 (16:24 -0500)]
net: stmmac: dwmac-tegra: Read iommu stream id from device tree
Nvidia's Tegra MGBE controllers require the IOMMU "Stream ID" (SID) to be
written to the MGBE_WRAP_AXI_ASID0_CTRL register.
The current driver is hard coded to use MGBE0's SID for all controllers.
This causes softirq time outs and kernel panics when using controllers
other than MGBE0.
Example dmesg errors when an ethernet cable is connected to MGBE1:
This bug has existed since the dwmac-tegra driver was added in Dec 2022
(See Fixes tag below for commit hash).
The Tegra234 SOC has 4 MGBE controllers, however Nvidia's Developer Kit
only uses MGBE0 which is why the bug was not found previously. Connect Tech
has many products that use 2 (or more) MGBE controllers.
The solution is to read the controller's SID from the existing "iommus"
device tree property. The 2nd field of the "iommus" device tree property
is the controller's SID.
Device tree snippet from tegra234.dtsi showing MGBE1's "iommus" property:
Nvidia's arm-smmu driver reads the "iommus" property and stores the SID in
the MGBE device's "fwspec" struct. The dwmac-tegra driver can access the
SID using the tegra_dev_iommu_get_stream_id() helper function found in
linux/iommu.h.
Calling tegra_dev_iommu_get_stream_id() should not fail unless the "iommus"
property is removed from the device tree or the IOMMU is disabled.
While the Tegra234 SOC technically supports bypassing the IOMMU, it is not
supported by the current firmware, has not been tested and not recommended.
More detailed discussion with Thierry Reding from Nvidia linked below.
Even though we fixed a logic error in the commit cited below, syzbot
still managed to trigger an underflow of the per-host bulk flow
counters, leading to an out of bounds memory access.
To avoid any such logic errors causing out of bounds memory accesses,
this commit factors out all accesses to the per-host bulk flow counters
to a series of helpers that perform bounds-checking before any
increments and decrements. This also has the benefit of improving
readability by moving the conditional checks for the flow mode into
these helpers, instead of having them spread out throughout the
code (which was the cause of the original logic error).
As part of this change, the flow quantum calculation is consolidated
into a helper function, which means that the dithering applied to the
ost load scaling is now applied both in the DRR rotation and when a
sparse flow's quantum is first initiated. The only user-visible effect
of this is that the maximum packet size that can be sent while a flow
stays sparse will now vary with +/- one byte in some cases. This should
not make a noticeable difference in practice, and thus it's not worth
complicating the code to preserve the old behaviour.
Fixes: 546ea84d07e3 ("sched: sch_cake: fix bulk flow accounting logic for host fairness") Reported-by: syzbot+f63600d288bfb7057424@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Dave Taht <dave.taht@gmail.com> Link: https://patch.msgid.link/20250107120105.70685-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: make sure we retain NAPI ordering on netdev->napi_list
I promised Eric to remove the rtnl protection of the NAPI list,
when I sat down to implement it over the break I realized that
the recently added NAPI ID retention will break the list ordering
assumption we have in netlink dump. The ordering used to happen
"naturally", because we'd always add NAPIs that the head of the
list, and assign a new monotonically increasing ID.
Before the first patch of this series we'd still only add at
the head of the list but now the newly added NAPI may inherit
from its config an ID lower than something else already on the list.
The fix is in the first patch, the rest is netdevsim churn to test it.
I'm posting this for net-next, because AFAICT the problem can't
be triggered in net, given the very limited queue API adoption.
v2:
- [patch 2] allocate the array with kcalloc() instead of kvcalloc()
- [patch 2] set GFP_KERNEL_ACCOUNT when allocating queues
- [patch 6] don't null-check page pool before page_pool_destroy()
- [patch 6] controled -> controlled
- [patch 7] change mode to 0200
- [patch 7] reorder removal to be inverse of add
- [patch 7] fix the spaces vs tabs
v1: https://lore.kernel.org/20250103185954.1236510-1-kuba@kernel.org
====================
Jakub Kicinski [Tue, 7 Jan 2025 16:08:46 +0000 (08:08 -0800)]
selftests: net: test listing NAPI vs queue resets
Test listing netdevsim NAPIs before and after a single queue
has been reset (and NAPIs re-added).
Start from resetting the middle queue because edge cases
(first / last) may actually be less likely to trigger bugs.
# ./tools/testing/selftests/net/nl_netdev.py
KTAP version 1
1..4
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 7 Jan 2025 16:08:45 +0000 (08:08 -0800)]
netdevsim: add debugfs-triggered queue reset
Support triggering queue reset via debugfs for an upcoming test.
Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 7 Jan 2025 16:08:44 +0000 (08:08 -0800)]
netdevsim: add queue management API support
Add queue management API support. We need a way to reset queues
to test NAPI reordering, the queue management API provides a
handy scaffolding for that.
Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 7 Jan 2025 16:08:43 +0000 (08:08 -0800)]
netdevsim: add queue alloc/free helpers
We'll need the code to allocate and free queues in the queue management
API, factor it out.
Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 7 Jan 2025 16:08:42 +0000 (08:08 -0800)]
netdevsim: allocate rqs individually
Make nsim->rqs an array of pointers and allocate them individually
so that we can swap them out one by one.
Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 7 Jan 2025 16:08:41 +0000 (08:08 -0800)]
netdevsim: support NAPI config
Link the NAPI instances to their configs. This will be needed to test
that NAPI config doesn't break list ordering.
Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 7 Jan 2025 16:08:40 +0000 (08:08 -0800)]
netdev: define NETDEV_INTERNAL
Linus suggested during one of past maintainer summits (in context of
a DMA_BUF discussion) that symbol namespaces can be used to prevent
unwelcome but in-tree code from using all exported functions.
Create a namespace for netdev.
Export netdev_rx_queue_restart(), drivers may want to use it since
it gives them a simple and safe way to restart a queue to apply
config changes. But it's both too low level and too actively developed
to be used outside netdev.
Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 7 Jan 2025 16:08:39 +0000 (08:08 -0800)]
net: make sure we retain NAPI ordering on netdev->napi_list
Netlink code depends on NAPI instances being sorted by ID on
the netdev list for dump continuation. We need to be able to
find the position on the list where we left off if dump does
not fit in a single skb, and in the meantime NAPI instances
can come and go.
This was trivially true when we were assigning a new ID to every
new NAPI instance. Since we added the NAPI config API, we try
to retain the ID previously used for the same queue, but still
add the new NAPI instance at the start of the list.
This is fine if we reset the entire netdev and all NAPIs get
removed and added back. If driver replaces a NAPI instance
during an operation like DEVMEM queue reset, or recreates
a subset of NAPI instances in other ways we may end up with
broken ordering, and therefore Netlink dumps with either
missing or duplicated entries.
At this stage the problem is theoretical. Only two drivers
support queue API, bnxt and gve. gve recreates NAPIs during
queue reset, but it doesn't support NAPI config.
bnxt supports NAPI config but doesn't recreate instances
during reset.
We need to save the ID in the config as soon as it is assigned
because otherwise the new NAPI will not know what ID it will
get at enable time, at the time it is being added.
Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pablo Neira Ayuso [Wed, 8 Jan 2025 21:56:33 +0000 (22:56 +0100)]
netfilter: conntrack: clamp maximum hashtable size to INT_MAX
Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it
is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when
resizing hashtable because __GFP_NOWARN is unset. See:
0708a0afe291 ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls")
Note: hashtable resize is only possible from init_netns.
Fixes: 9cc1c73ad666 ("netfilter: conntrack: avoid integer overflow when resizing") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Dumazet [Tue, 7 Jan 2025 14:43:42 +0000 (14:43 +0000)]
net: no longer reset transport_header in __netif_receive_skb_core()
In commit 66e4c8d95008 ("net: warn if transport header was not set")
I added a debug check in skb_transport_header() to detect
if a caller expects the transport_header to be set to a meaningful
value by a prior code path.
Unfortunately, __netif_receive_skb_core() resets the transport header
to the same value than the network header, defeating this check
in receive paths.
Pretending the transport and network headers are the same
is usually wrong.
This patch removes this reset for CONFIG_DEBUG_NET=y builds
to let fuzzers and CI find bugs.
Krzysztof Kozlowski [Tue, 7 Jan 2025 12:56:13 +0000 (13:56 +0100)]
dt-bindings: net: Correct indentation and style in DTS example
DTS example in the bindings should be indented with 2- or 4-spaces and
aligned with opening '- |', so correct any differences like 3-spaces or
mixtures 2- and 4-spaces in one binding.
No functional changes here, but saves some comments during reviews of
new patches built on existing code.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can Reviewed-by: Roger Quadros <rogerq@kernel.org> # for ti,k3-am654-* Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> # net/brcm,* Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://patch.msgid.link/20250107125613.211478-1-krzysztof.kozlowski@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This change introduces a mechanism for notifying userspace
applications about changes to IPv6 anycast addresses via netlink. It
includes:
* Addition and deletion of IPv6 anycast addresses are reported using
RTM_NEWANYCAST and RTM_DELANYCAST.
* A new netlink group (RTNLGRP_IPV6_ACADDR) for subscribing to these
notifications.
This enables user space applications(e.g. ip monitor) to efficiently
track anycast addresses through netlink messages, improving metrics
collection and system monitoring. It also unlocks the potential for
advanced anycast management in user space, such as hardware offload
control and fine grained network control.
Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Yuyang Huang <yuyanghuang@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250107114355.1766086-1-yuyanghuang@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vadim Fedorenko [Tue, 7 Jan 2025 10:48:12 +0000 (02:48 -0800)]
net/mlx5: use do_aux_work for PHC overflow checks
The overflow_work is using system wq to do overflow checks and updates
for PHC device timecounter, which might be overhelmed by other tasks.
But there is dedicated kthread in PTP subsystem designed for such
things. This patch changes the work queue to proper align with PTP
subsystem and to avoid overloading system work queue.
The adjfine() function acts the same way as overflow check worker,
we can postpone ptp aux worker till the next overflow period after
adjfine() was called.
Leo Yang [Tue, 7 Jan 2025 03:15:30 +0000 (11:15 +0800)]
mctp i3c: fix MCTP I3C driver multi-thread issue
We found a timeout problem with the pldm command on our system. The
reason is that the MCTP-I3C driver has a race condition when receiving
multiple-packet messages in multi-thread, resulting in a wrong packet
order problem.
We identified this problem by adding a debug message to the
mctp_i3c_read function.
According to the MCTP spec, a multiple-packet message must be composed
in sequence, and if there is a wrong sequence, the whole message will be
discarded and wait for the next SOM.
For example, SOM → Pkt Seq #2 → Pkt Seq #1 → Pkt Seq #3 → EOM.
Therefore, we try to solve this problem by adding a mutex to the
mctp_i3c_read function. Before the modification, when a command
requesting a multiple-packet message response is sent consecutively, an
error usually occurs within 100 loops. After the mutex, it can go
through 40000 loops without any error, and it seems to run well.
Fixes: c8755b29b58e ("mctp i3c: MCTP I3C driver") Signed-off-by: Leo Yang <Leo-Yang@quantatw.com> Link: https://patch.msgid.link/20250107031529.3296094-1-Leo-Yang@quantatw.com
[pabeni@redhat.com: dropped already answered question from changelog] Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 7 Jan 2025 02:29:32 +0000 (18:29 -0800)]
selftests: drv-net: test drivers sleeping in ndo_get_stats64
Most of our tests use rtnetlink to read device stats, so they
don't expose the drivers much to paths in which device stats
are read under RCU. Add tests which hammer profcs reads to
make sure drivers:
- don't sleep while reporting stats,
- can handle parallel reads,
- can handle device going down while reading.
Set ifname on the env class in NetDrvEnv, we already do that
in NetDrvEpEnv.
KTAP version 1
1..7
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
ok 5 stats.check_down
ok 6 stats.procfs_hammer
# completed up/down cycles: 6
ok 7 stats.procfs_downup_hammer
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
Kory Maincent [Tue, 7 Jan 2025 14:26:59 +0000 (15:26 +0100)]
dt-bindings: net: pse-pd: Fix unusual character in documentation
The documentation contained an unusual character due to an issue in my
personal b4 setup. Fix the problem by providing the correct PSE Pinout
Alternatives table number description.
Jakub Kicinski [Thu, 9 Jan 2025 03:33:26 +0000 (19:33 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-01-07 (ice, igc)
For ice:
Arkadiusz corrects mask value being used to determine DPLL phase range.
Przemyslaw corrects frequency value for E823 devices.
For igc:
En-Wei Wu adds a check and, early, return for failed register read.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: return early when failing to read EECD register
ice: fix incorrect PHY settings for 100 GB/s
ice: fix max values for dpll pin phase adjust
====================
Jakub Kicinski [Thu, 9 Jan 2025 03:08:18 +0000 (19:08 -0800)]
Merge tag 'for-net-2025-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btmtk: Fix failed to send func ctrl for MediaTek devices.
- hci_sync: Fix not setting Random Address when required
- MGMT: Fix Add Device to responding before completing
- btnxpuart: Fix driver sending truncated data
* tag 'for-net-2025-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btmtk: Fix failed to send func ctrl for MediaTek devices.
Bluetooth: btnxpuart: Fix driver sending truncated data
Bluetooth: MGMT: Fix Add Device to responding before completing
Bluetooth: hci_sync: Fix not setting Random Address when required
====================
Linus Torvalds [Wed, 8 Jan 2025 19:55:20 +0000 (11:55 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four driver fixes in UFS, mostly to do with power management"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: qcom: Power down the controller/device during system suspend for SM8550/SM8650 SoCs
scsi: ufs: qcom: Allow passing platform specific OF data
scsi: ufs: core: Honor runtime/system PM levels if set by host controller drivers
scsi: ufs: qcom: Power off the PHY if it was already powered on in ufs_qcom_power_up_sequence()
====================
There are some bugfix for the HNS3 ethernet driver
There's a series of bugfix that's been accepted:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=d80a3091308491455b6501b1c4b68698c4a7cd24
However, The series is making the driver poke into IOMMU internals instead of
implementing appropriate IOMMU workarounds. After discussion, the series was reverted:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=249cfa318fb1b77eb726c2ff4f74c9685f04e568
But only two patches are related to the IOMMU.
Other patches involve only the modification of the driver.
This series resends other patches.
Jie Wang [Mon, 6 Jan 2025 14:36:42 +0000 (22:36 +0800)]
net: hns3: fix kernel crash when 1588 is sent on HIP08 devices
Currently, HIP08 devices does not register the ptp devices, so the
hdev->ptp is NULL. But the tx process would still try to set hardware time
stamp info with SKBTX_HW_TSTAMP flag and cause a kernel crash.
Fixes: 0bf5eb788512 ("net: hns3: add support for PTP") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106143642.539698-8-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hao Lan [Mon, 6 Jan 2025 14:36:41 +0000 (22:36 +0800)]
net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs
1024-1279 are in different BAR space addresses. However,
hclge_fetch_pf_reg does not distinguish the tqp space information when
reading the tqp space information. When the number of TQPs is greater
than 1024, access bar space overwriting occurs.
The problem of different segments has been considered during the
initialization of tqp.io_base. Therefore, tqp.io_base is directly used
when the queue is read in hclge_fetch_pf_reg.
The error message:
Unable to handle kernel paging request at virtual address ffff800037200000
pc : hclge_fetch_pf_reg+0x138/0x250 [hclge]
lr : hclge_get_regs+0x84/0x1d0 [hclge]
Call trace:
hclge_fetch_pf_reg+0x138/0x250 [hclge]
hclge_get_regs+0x84/0x1d0 [hclge]
hns3_get_regs+0x2c/0x50 [hns3]
ethtool_get_regs+0xf4/0x270
dev_ethtool+0x674/0x8a0
dev_ioctl+0x270/0x36c
sock_do_ioctl+0x110/0x2a0
sock_ioctl+0x2ac/0x530
__arm64_sys_ioctl+0xa8/0x100
invoke_syscall+0x4c/0x124
el0_svc_common.constprop.0+0x140/0x15c
do_el0_svc+0x30/0xd0
el0_svc+0x1c/0x2c
el0_sync_handler+0xb0/0xb4
el0_sync+0x168/0x180
Fixes: 939ccd107ffc ("net: hns3: move dump regs function to a separate file") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-7-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Mon, 6 Jan 2025 14:36:40 +0000 (22:36 +0800)]
net: hns3: initialize reset_timer before hclgevf_misc_irq_init()
Currently the misc irq is initialized before reset_timer setup. But
it will access the reset_timer in the irq handler. So initialize
the reset_timer earlier.
Fixes: ff200099d271 ("net: hns3: remove unnecessary work in hclgevf_main") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-6-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Mon, 6 Jan 2025 14:36:39 +0000 (22:36 +0800)]
net: hns3: don't auto enable misc vector
Currently, there is a time window between misc irq enabled
and service task inited. If an interrupte is reported at
this time, it will cause warning like below:
Hao Lan [Mon, 6 Jan 2025 14:36:38 +0000 (22:36 +0800)]
net: hns3: Resolved the issue that the debugfs query result is inconsistent.
This patch modifies the implementation of debugfs:
When the user process stops unexpectedly, not all data of the file system
is read. In this case, the save_buf pointer is not released. When the
user process is called next time, save_buf is used to copy the cached
data to the user space. As a result, the queried data is stale.
To solve this problem, this patch implements .open() and .release() handler
for debugfs file_operations. moving allocation buffer and execution
of the cmd to the .open() handler and freeing in to the .release() handler.
Allocate separate buffer for each reader and associate the buffer
with the file pointer.
When different user read processes no longer share the buffer,
the stale data problem is fixed.
Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Guangwei Zhang <zhangwangwei6@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106143642.539698-4-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hao Lan [Mon, 6 Jan 2025 14:36:37 +0000 (22:36 +0800)]
net: hns3: fix missing features due to dev->features configuration too early
Currently, the netdev->features is configured in hns3_nic_set_features.
As a result, __netdev_update_features considers that there is no feature
difference, and the procedures of the real features are missing.
Fixes: 2a7556bb2b73 ("net: hns3: implement ndo_features_check ops for hns3 driver") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106143642.539698-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hao Lan [Mon, 6 Jan 2025 14:36:36 +0000 (22:36 +0800)]
net: hns3: fixed reset failure issues caused by the incorrect reset type
When a reset type that is not supported by the driver is input, a reset
pending flag bit of the HNAE3_NONE_RESET type is generated in
reset_pending. The driver does not have a mechanism to clear this type
of error. As a result, the driver considers that the reset is not
complete. This patch provides a mechanism to clear the
HNAE3_NONE_RESET flag and the parameter of
hnae3_ae_ops.set_default_reset_request is verified.
Daniel Borkmann [Tue, 7 Jan 2025 10:14:39 +0000 (11:14 +0100)]
tcp: Annotate data-race around sk->sk_mark in tcp_v4_send_reset
This is a follow-up to 3c5b4d69c358 ("net: annotate data-races around
sk->sk_mark"). sk->sk_mark can be read and written without holding
the socket lock. IPv6 equivalent is already covered with READ_ONCE()
annotation in tcp_v6_send_response().
Jakub Kicinski [Mon, 6 Jan 2025 18:01:36 +0000 (10:01 -0800)]
netdev: prevent accessing NAPI instances from another namespace
The NAPI IDs were not fully exposed to user space prior to the netlink
API, so they were never namespaced. The netlink API must ensure that
at the very least NAPI instance belongs to the same netns as the owner
of the genl sock.
napi_by_id() can become static now, but it needs to move because of
dev_get_by_napi_id().
Cc: stable@vger.kernel.org Fixes: 1287c1ae0fc2 ("netdev-genl: Support setting per-NAPI config values") Fixes: 27f91aaf49b3 ("netdev-genl: Add netlink framework functions for napi") Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250106180137.1861472-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 8 Jan 2025 18:12:01 +0000 (10:12 -0800)]
Merge tag 'for-6.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:
- dm-array fixes
- dm-verity forward error correction fixes
- remove the flag DM_TARGET_PASSES_INTEGRITY from dm-ebs
- dm-thin RCU list fix
* tag 'for-6.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: make get_first_thin use rcu-safe list first function
dm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITY
dm-verity FEC: Avoid copying RS parity bytes twice.
dm-verity FEC: Fix RS FEC repair for roots unaligned to block size (take 2)
dm array: fix cursor index when skipping across block boundaries
dm array: fix unreleased btree blocks on closing a faulty array cursor
dm array: fix releasing a faulty array block twice in dm_array_cursor_end
Chris Lu [Wed, 8 Jan 2025 09:50:28 +0000 (17:50 +0800)]
Bluetooth: btmtk: Fix failed to send func ctrl for MediaTek devices.
Use usb_autopm_get_interface() and usb_autopm_put_interface()
in btmtk_usb_shutdown(), it could send func ctrl after enabling
autosuspend.
Bluetooth: btmtk_usb_hci_wmt_sync() hci0: Execution of wmt command
timed out
Bluetooth: btmtk_usb_shutdown() hci0: Failed to send wmt func ctrl
(-110)
Fixes: 5c5e8c52e3ca ("Bluetooth: btmtk: move btusb_mtk_[setup, shutdown] to btmtk.c") Signed-off-by: Chris Lu <chris.lu@mediatek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Neeraj Sanjay Kale [Fri, 20 Dec 2024 13:02:52 +0000 (18:32 +0530)]
Bluetooth: btnxpuart: Fix driver sending truncated data
This fixes the apparent controller hang issue seen during stress test
where the host sends a truncated payload, followed by HCI commands. The
controller treats these HCI commands as a part of previously truncated
payload, leading to command timeouts.
Adding a serdev_device_wait_until_sent() call after
serdev_device_write_buf() fixed the issue.
Fixes: 689ca16e5232 ("Bluetooth: NXP: Add protocol support for NXP Bluetooth chipsets") Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Mon, 25 Nov 2024 20:42:10 +0000 (15:42 -0500)]
Bluetooth: MGMT: Fix Add Device to responding before completing
Add Device with LE type requires updating resolving/accept list which
requires quite a number of commands to complete and each of them may
fail, so instead of pretending it would always work this checks the
return of hci_update_passive_scan_sync which indicates if everything
worked as intended.
Fixes: e8907f76544f ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Mon, 25 Nov 2024 20:42:09 +0000 (15:42 -0500)]
Bluetooth: hci_sync: Fix not setting Random Address when required
This fixes errors such as the following when Own address type is set to
Random Address but it has not been programmed yet due to either be
advertising or connecting:
< HCI Command: LE Set Exte.. (0x08|0x0041) plen 13
Own address type: Random (0x03)
Filter policy: Ignore not in accept list (0x01)
PHYs: 0x05
Entry 0: LE 1M
Type: Passive (0x00)
Interval: 60.000 msec (0x0060)
Window: 30.000 msec (0x0030)
Entry 1: LE Coded
Type: Passive (0x00)
Interval: 180.000 msec (0x0120)
Window: 90.000 msec (0x0090)
> HCI Event: Command Complete (0x0e) plen 4
LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Exten.. (0x08|0x0042) plen 6
Extended scan: Enabled (0x01)
Filter duplicates: Enabled (0x01)
Duration: 0 msec (0x0000)
Period: 0.00 sec (0x0000)
> HCI Event: Command Complete (0x0e) plen 4
LE Set Extended Scan Enable (0x08|0x0042) ncmd 1
Status: Invalid HCI Command Parameters (0x12)
Fixes: c45074d68a9b ("Bluetooth: Fix not generating RPA when required") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Krister Johansen [Tue, 7 Jan 2025 23:24:58 +0000 (15:24 -0800)]
dm thin: make get_first_thin use rcu-safe list first function
The documentation in rculist.h explains the absence of list_empty_rcu()
and cautions programmers against relying on a list_empty() ->
list_first() sequence in RCU safe code. This is because each of these
functions performs its own READ_ONCE() of the list head. This can lead
to a situation where the list_empty() sees a valid list entry, but the
subsequent list_first() sees a different view of list head state after a
modification.
In the case of dm-thin, this author had a production box crash from a GP
fault in the process_deferred_bios path. This function saw a valid list
head in get_first_thin() but when it subsequently dereferenced that and
turned it into a thin_c, it got the inside of the struct pool, since the
list was now empty and referring to itself. The kernel on which this
occurred printed both a warning about a refcount_t being saturated, and
a UBSAN error for an out-of-bounds cpuid access in the queued spinlock,
prior to the fault itself. When the resulting kdump was examined, it
was possible to see another thread patiently waiting in thin_dtr's
synchronize_rcu.
The thin_dtr call managed to pull the thin_c out of the active thins
list (and have it be the last entry in the active_thins list) at just
the wrong moment which lead to this crash.
Fortunately, the fix here is straight forward. Switch get_first_thin()
function to use list_first_or_null_rcu() which performs just a single
READ_ONCE() and returns NULL if the list is already empty.
This was run against the devicemapper test suite's thin-provisioning
suites for delete and suspend and no regressions were observed.
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Fixes: b10ebd34ccca ("dm thin: fix rcu_read_lock being held in code that can sleep") Cc: stable@vger.kernel.org Acked-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Mikulas Patocka [Tue, 7 Jan 2025 16:47:01 +0000 (17:47 +0100)]
dm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITY
dm-ebs uses dm-bufio to process requests that are not aligned on logical
sector size. dm-bufio doesn't support passing integrity data (and it is
unclear how should it do it), so we shouldn't set the
DM_TARGET_PASSES_INTEGRITY flag.
Sriram Yagnaraman and Kurt Kanzenbach add support for AF_XDP
zero-copy.
Original cover letter:
The first couple of patches adds helper functions to prepare for AF_XDP
zero-copy support which comes in the last couple of patches, one each
for Rx and TX paths.
As mentioned in v1 patchset [0], I don't have access to an actual IGB
device to provide correct performance numbers. I have used Intel 82576EB
emulator in QEMU [1] to test the changes to IGB driver.
The tests use one isolated vCPU for RX/TX and one isolated vCPU for the
xdp-sock application [2]. Hope these measurements provide at the least
some indication on the increase in performance when using ZC, especially
in the TX path. It would be awesome if someone with a real IGB NIC can
test the patch.
Subsequent changes and information can be found here:
https://lore.kernel.org/intel-wired-lan/20241018-b4-igb_zero_copy-v9-0-da139d78d796@linutronix.de/
Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.
For igc:
Song Yoong Siang allows for the XDP program to be hot-swapped.
Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.
Joe Damato adds sets IRQ and queues to NAPI instances to allow for
reporting via netdev-genl API.
For ixgbe:
Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.
For ixgbevf:
Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.
For i40e:
Alex implements "mdd-auto-reset-vf" private flag to automatically reset
VFs when encountering an MDD event.
For fm10k:
Dr. David Alan Gilbert removes an unused function.
====================
Joe Damato [Mon, 6 Jan 2025 22:19:22 +0000 (14:19 -0800)]
igc: Link queues to NAPI instances
Link queues to NAPI instances via netdev-genl API so that users can
query this information with netlink. Handle a few cases in the driver:
1. Link/unlink the NAPIs when XDP is enabled/disabled
2. Handle IGC_FLAG_QUEUE_PAIRS enabled and disabled
Example output when IGC_FLAG_QUEUE_PAIRS is enabled:
To test IGC_FLAG_QUEUE_PAIRS disabled, a test system was booted using
the grub command line option "maxcpus=2" to force
igc_set_interrupt_capability to disable IGC_FLAG_QUEUE_PAIRS.
Example output when IGC_FLAG_QUEUE_PAIRS is disabled:
Now we examine which queues these NAPIs are associated with, expecting
that since IGC_FLAG_QUEUE_PAIRS is disabled each RX and TX queue will
have its own NAPI instance: