www.infradead.org Git - users/jedix/linux-maple.git/log

]> www.infradead.org Git - users/jedix/linux-maple.git/log

projects / users / jedix / linux-maple.git / log

Ethan Carter Edwards [Sun, 9 Feb 2025 04:06:21 +0000 (23:06 -0500)]

hamradio: baycom: replace strcpy() with strscpy()

The strcpy() function has been deprecated and replaced with strscpy().
There is an effort to make this change treewide:
https://github.com/KSPP/linux/issues/88.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/3qo3fbrak7undfgocsi2s74v4uyjbylpdqhie4dohfoh4welfn@joq7up65ug6v
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Tamir Duberstein [Sat, 8 Feb 2025 19:26:43 +0000 (14:26 -0500)]

blackhole_dev: convert self-test to KUnit

Convert this very simple smoke test to a KUnit test.

Add a missing `htons` call that was spotted[0] by kernel test robot
<lkp@intel.com> after initial conversion to KUnit.

Link: https://lore.kernel.org/oe-kbuild-all/202502090223.qCYMBjWT-lkp@intel.com/
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20250208-blackholedev-kunit-convert-v2-1-182db9bd56ec@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 11 Feb 2025 23:19:13 +0000 (15:19 -0800)]

Merge branch 'net-phy-rename-eee_broken_mode'

Heiner Kallweit says:

====================
net: phy: rename eee_broken_mode

eee_broken_mode is used also if an EEE mode isn't actually broken
but e.g. not supported by MAC. So rename it.

This is split out from a bigger series that needs more rework.
====================

Link: https://patch.msgid.link/d7924d4e-49b0-4182-831f-73c558d4425e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Mon, 10 Feb 2025 20:50:10 +0000 (21:50 +0100)]

net: phy: rename phy_set_eee_broken to phy_disable_eee_mode

Consider that an EEE mode may not be broken but simply not supported
by the MAC, and rename function phy_set_eee_broken().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/30deb630-3f6b-4ffb-a1e6-a9736021f43a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Mon, 10 Feb 2025 20:49:22 +0000 (21:49 +0100)]

net: phy: rename eee_broken_modes to eee_disabled_modes

This bitmap is used also if the MAC doesn't support an EEE mode.
So the mode isn't necessarily broken in the PHY. Therefore rename
the bitmap.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/6cd11422-dd67-4c87-a642-308de694af92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Paolo Abeni [Tue, 11 Feb 2025 12:08:02 +0000 (13:08 +0100)]

Merge branch 'tcp-allow-to-reduce-max-rto'

Eric Dumazet says:

====================
tcp: allow to reduce max RTO

This is a followup of a discussion started 6 months ago
by Jason Xing.

Some applications want to lower the time between each
retransmit attempts.

TCP_KEEPINTVL and TCP_KEEPCNT socket options don't
work around the issue.

This series adds:

- a new TCP level socket option (TCP_RTO_MAX_MS)
- a new sysctl (/proc/sys/net/ipv4/tcp_rto_max_ms)

Admins and/or applications can now change the max rto value
at their own risk.

Link: https://lore.kernel.org/netdev/20240715033118.32322-1-kerneljasonxing@gmail.com/T/
====================

Link: https://patch.msgid.link/20250207152830.2527578-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Eric Dumazet [Fri, 7 Feb 2025 15:28:30 +0000 (15:28 +0000)]

tcp: add tcp_rto_max_ms sysctl

Previous patch added a TCP_RTO_MAX_MS socket option
to tune a TCP socket max RTO value.

Many setups prefer to change a per netns sysctl.

This patch adds /proc/sys/net/ipv4/tcp_rto_max_ms

Its initial value is 120000 (120 seconds).

Keep in mind that a decrease of tcp_rto_max_ms
means shorter overall timeouts, unless tcp_retries2
sysctl is increased.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Eric Dumazet [Fri, 7 Feb 2025 15:28:29 +0000 (15:28 +0000)]

tcp: add the ability to control max RTO

Currently, TCP stack uses a constant (120 seconds)
to limit the RTO value exponential growth.

Some applications want to set a lower value.

Add TCP_RTO_MAX_MS socket option to set a value (in ms)
between 1 and 120 seconds.

It is discouraged to change the socket rto max on a live
socket, as it might lead to unexpected disconnects.

Following patch is adding a netns sysctl to control the
default value at socket creation time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Eric Dumazet [Fri, 7 Feb 2025 15:28:28 +0000 (15:28 +0000)]

tcp: use tcp_reset_xmit_timer()

In order to reduce TCP_RTO_MAX occurrences, replace:

inet_csk_reset_xmit_timer(sk, what, when, TCP_RTO_MAX)

With:

tcp_reset_xmit_timer(sk, what, when, false);

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Eric Dumazet [Fri, 7 Feb 2025 15:28:27 +0000 (15:28 +0000)]

tcp: add a @pace_delay parameter to tcp_reset_xmit_timer()

We want to factorize calls to inet_csk_reset_xmit_timer(),
to ease TCP_RTO_MAX change.

Current users want to add tcp_pacing_delay(sk)
to the timeout.

Remaining calls to inet_csk_reset_xmit_timer()
do not add the pacing delay. Following patch
will convert them, passing false for @pace_delay.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Eric Dumazet [Fri, 7 Feb 2025 15:28:26 +0000 (15:28 +0000)]

tcp: remove tcp_reset_xmit_timer() @max_when argument

All callers use TCP_RTO_MAX, we can factorize this constant,
becoming a variable soon.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Paolo Abeni [Tue, 11 Feb 2025 11:46:39 +0000 (12:46 +0100)]

Merge branch 'mptcp-pm-misc-cleanups-part-2'

Matthieu Baerts says:

====================
mptcp: pm: misc cleanups, part 2

These cleanups lead the way to the unification of the path-manager
interfaces, and allow future extensions. The following patches are not
all linked to each others, but are all related to the path-managers.

- Patch 1: drop unneeded parameter in a function helper.

- Patch 2: clearer NL error message when an NL attribute is missing.

- Patch 3: more precise NL messages by avoiding 'this or that is NOK'.

- Patch 4: improve too vague or missing NL err messages.

- Patch 5: use GENL_REQ_ATTR_CHECK to look for mandatory NL attributes.

- Patch 6: avoid overriding the error message.

- Patch 7: check all mandatory NL attributes with GENL_REQ_ATTR_CHECK.

- Patch 8: use NL_SET_ERR_MSG_ATTR instead of GENL_SET_ERR_MSG

- Patch 9: move doit callbacks used for both PM to pm.c.

- Patch 10: drop another unneeded parameter in a function helper.

- Patch 11: share the ID parsing code for the 'get_addr' callback.

- Patch 12: share sending NL code for the 'get_addr' callback.

- Patch 13: drop yet another unneeded parameter in a function helper.

- Patch 14: pick the usual structure type for the remote address.

- Patch 15: share the local addr parsing code for the 'set_flags' cb.

The behaviour when there are no errors should then not be modified.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================

v2: https://lore.kernel.org/r/20250117-net-next-mptcp-pm-misc-cleanup-2-v2-0-61d4fe0586e8@kernel.org
v1: https://lore.kernel.org/r/20250116-net-next-mptcp-pm-misc-cleanup-2-v1-0-c0b43f18fe06@kernel.org

Link: https://patch.msgid.link/20250207-net-next-mptcp-pm-misc-cleanup-2-v3-0-71753ed957de@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Geliang Tang [Fri, 7 Feb 2025 13:59:33 +0000 (14:59 +0100)]

mptcp: pm: add local parameter for set_flags

This patch updates the interfaces set_flags to reduce repetitive
code, adds a new parameter 'local' for them.

The local address is parsed in public helper mptcp_pm_nl_set_flags_doit(),
then pass it to mptcp_pm_nl_set_flags() and mptcp_userspace_pm_set_flags().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Geliang Tang [Fri, 7 Feb 2025 13:59:32 +0000 (14:59 +0100)]

mptcp: pm: change rem type of set_flags

Generally, in the path manager interfaces, the local address is defined
as an mptcp_pm_addr_entry type address, while the remote address is
defined as an mptcp_addr_info type one:

(struct mptcp_pm_addr_entry *local, struct mptcp_addr_info *remote)

But the set_flags() interface uses two mptcp_pm_addr_entry type
parameters.

This patch changes the second one to mptcp_addr_info type and use helper
mptcp_pm_parse_addr() to parse it instead of using mptcp_pm_parse_entry().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Geliang Tang [Fri, 7 Feb 2025 13:59:31 +0000 (14:59 +0100)]

mptcp: pm: drop skb parameter of set_flags

The first parameter 'skb' in mptcp_pm_nl_set_flags() is only used to
obtained the network namespace, which can also be obtained through the
second parameters 'info' by using genl_info_net() helper.

This patch drops these useless parameters 'skb' in all three set_flags()
interfaces.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Geliang Tang [Fri, 7 Feb 2025 13:59:30 +0000 (14:59 +0100)]

mptcp: pm: reuse sending nlmsg code in get_addr

The netlink messages are sent both in mptcp_pm_nl_get_addr() and
mptcp_userspace_pm_get_addr(), this makes the code somewhat repetitive.
This is because the netlink PM and userspace PM use different locks to
protect the address entry that needs to be sent via the netlink message.
The former uses rcu read lock, and the latter uses msk->pm.lock.

The current get_addr() flow looks like this:

lock();
entry = get_entry();
send_nlmsg(entry);
unlock();

After holding the lock, get the entry from the list, send the entry, and
finally release the lock.

This patch changes the process by getting the entry while holding the lock,
then making a copy of the entry so that the lock can be released. Finally,
the copy of the entry is sent without locking:

lock();
entry = get_entry();
*copy = *entry;
unlock();

send_nlmsg(copy);

This way we can reuse the send_nlmsg() code in get_addr() interfaces
between the netlink PM and userspace PM. They only need to implement their
own get_addr() interfaces to hold the different locks, get the entry from
the different lists, then release the locks.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Geliang Tang [Fri, 7 Feb 2025 13:59:29 +0000 (14:59 +0100)]

mptcp: pm: add id parameter for get_addr

The address id is parsed both in mptcp_pm_nl_get_addr() and
mptcp_userspace_pm_get_addr(), this makes the code somewhat repetitive.

So this patch adds a new parameter 'id' for all get_addr() interfaces.
The address id is only parsed in mptcp_pm_nl_get_addr_doit(), then pass
it to both mptcp_pm_nl_get_addr() and mptcp_userspace_pm_get_addr().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Geliang Tang [Fri, 7 Feb 2025 13:59:28 +0000 (14:59 +0100)]

mptcp: pm: drop skb parameter of get_addr

The first parameters 'skb' of get_addr() interfaces are now useless
since mptcp_userspace_pm_get_sock() helper is used. This patch drops
these useless parameters of them.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Geliang Tang [Fri, 7 Feb 2025 13:59:27 +0000 (14:59 +0100)]

mptcp: pm: make three pm wrappers static

Three netlink functions:

mptcp_pm_nl_get_addr_doit()
mptcp_pm_nl_get_addr_dumpit()
mptcp_pm_nl_set_flags_doit()

are generic, implemented for each PM, in-kernel PM and userspace PM. It's
clearer to move them from pm_netlink.c to pm.c.

And the linked three path manager wrappers

mptcp_pm_get_addr()
mptcp_pm_dump_addr()
mptcp_pm_set_flags()

can be changed as static functions, no need to export them in protocol.h.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 7 Feb 2025 13:59:26 +0000 (14:59 +0100)]

mptcp: pm: use NL_SET_ERR_MSG_ATTR when possible

Instead of only returning a text message with GENL_SET_ERR_MSG(),
NL_SET_ERR_MSG_ATTR() can help the userspace developers by also
reporting which attribute is faulty.

When the error is specific to an attribute, NL_SET_ERR_MSG_ATTR() is now
used. The error messages have not been modified in this commit.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 7 Feb 2025 13:59:25 +0000 (14:59 +0100)]

mptcp: pm: mark missing address attributes

mptcp_pm_parse_entry() will check if the given attribute is defined. If
not, it will return a generic error: "missing address info".

It might then not be clear for the userspace developer which attribute
is missing, especially when the command takes multiple addresses.

By using GENL_REQ_ATTR_CHECK(), the userspace will get a hint about
which attribute is missing, making thing clearer. Note that this is what
was already done for most of the other MPTCP NL commands, this patch
simply adds the missing ones.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 7 Feb 2025 13:59:24 +0000 (14:59 +0100)]

mptcp: pm: remove duplicated error messages

mptcp_pm_parse_entry() and mptcp_pm_parse_addr() will already set a
error message in case of parsing issue.

Then, no need to override this error message with another less precise
one: "error parsing address".

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Geliang Tang [Fri, 7 Feb 2025 13:59:23 +0000 (14:59 +0100)]

mptcp: pm: userspace: use GENL_REQ_ATTR_CHECK

A more general way to check if MPTCP_PM_ATTR_* exists in 'info'
is to use GENL_REQ_ATTR_CHECK(info, MPTCP_PM_ATTR_*) instead of
directly reading info->attrs[MPTCP_PM_ATTR_*] and then checking
if it's NULL.

So this patch uses GENL_REQ_ATTR_CHECK() for userspace PM in
mptcp_pm_nl_announce_doit(), mptcp_pm_nl_remove_doit(),
mptcp_pm_nl_subflow_create_doit(), mptcp_pm_nl_subflow_destroy_doit()
and mptcp_userspace_pm_get_sock().

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 7 Feb 2025 13:59:22 +0000 (14:59 +0100)]

mptcp: pm: improve error messages

Some error messages were:

- too generic: "missing input", "invalid request"

- not precise enough: "limit greater than maximum" but what's the max?

- missing: subflow not found, or connect error.

This can be easily improved by being more precise, or adding new error
messages.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 7 Feb 2025 13:59:21 +0000 (14:59 +0100)]

mptcp: pm: more precise error messages

Some errors reported by the userspace PM were vague: "this or that is
invalid".

It is easier for the userspace to know which part is wrong, instead of
having to guess that.

While at it, in mptcp_userspace_pm_set_flags() move the parsing after
the check linked to the local attribute.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 7 Feb 2025 13:59:20 +0000 (14:59 +0100)]

mptcp: pm: userspace: flags: clearer msg if no remote addr

Since its introduction in commit 892f396c8e68 ("mptcp: netlink: issue
MP_PRIO signals from userspace PMs"), it was mandatory to specify the
remote address, because of the 'if (rem->addr.family == AF_UNSPEC)'
check done later one.

In theory, this attribute can be optional, but it sounds better to be
precise to avoid sending the MP_PRIO on the wrong subflow, e.g. if there
are multiple subflows attached to the same local ID. This can be relaxed
later on if there is a need to act on multiple subflows with one
command.

For the moment, the check to see if attr_rem is NULL can be removed,
because mptcp_pm_parse_entry() will do this check as well, no need to do
that differently here.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Geliang Tang [Fri, 7 Feb 2025 13:59:19 +0000 (14:59 +0100)]

mptcp: pm: drop info of userspace_pm_remove_id_zero_address

The only use of 'info' parameter of userspace_pm_remove_id_zero_address()
is to set an error message into it.

Plus, this helper will only fail when it cannot find any subflows with a
local address ID 0.

This patch drops this parameter and sets the error message where this
function is called in mptcp_pm_nl_remove_doit().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Yuyang Huang [Fri, 7 Feb 2025 11:08:36 +0000 (20:08 +0900)]

selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support

This change introduces a new selftest case to verify the functionality
of dumping IPv4 multicast addresses using the RTM_GETMULTICAST netlink
message. The test utilizes the ynl library to interact with the
netlink interface and validate that the kernel correctly reports the
joined IPv4 multicast addresses.

To run the test, execute the following command:

$ vng -v --user root --cpus 16 -- \
make -C tools/testing/selftests TARGETS=net \
TEST_PROGS=rtnetlink.py TEST_GEN_PROGS="" run_tests

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://patch.msgid.link/20250207110836.2407224-2-yuyanghuang@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Yuyang Huang [Fri, 7 Feb 2025 11:08:35 +0000 (20:08 +0900)]

netlink: support dumping IPv4 multicast addresses

Extended RTM_GETMULTICAST to support dumping joined IPv4 multicast
addresses, in addition to the existing IPv6 functionality. This allows
userspace applications to retrieve both IPv4 and IPv6 multicast
addresses through similar netlink command and then monitor future
changes by registering to RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://patch.msgid.link/20250207110836.2407224-1-yuyanghuang@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Csókás, Bence [Fri, 7 Feb 2025 12:12:55 +0000 (13:12 +0100)]

net: fec: Refactor MAC reset to function

The core is reset both in `fec_restart()` (called on link-up) and
`fec_stop()` (going to sleep, driver remove etc.). These two functions
had their separate implementations, which was at first only a register
write and a `udelay()` (and the accompanying block comment). However,
since then we got soft-reset (MAC disable) and Wake-on-LAN support, which
meant that these implementations diverged, often causing bugs.

For instance, as of now, `fec_stop()` does not check for
`FEC_QUIRK_NO_HARD_RESET`, meaning the MII/RMII mode is cleared on eg.
a PM power-down event; and `fec_restart()` missed the refactor renaming
the "magic" constant `1` to `FEC_ECR_RESET`.

To harmonize current implementations, and eliminate this source of
potential future bugs, refactor implementation to a common function.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
Link: https://patch.msgid.link/20250207121255.161146-2-csokas.bence@prolan.hu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Ido Schimmel [Fri, 7 Feb 2025 18:00:44 +0000 (19:00 +0100)]

mlxsw: Enable Tx checksum offload

The device is able to checksum plain TCP / UDP packets over IPv4 / IPv6
when the 'ipcs' bit in the send descriptor is set. Advertise support for
the 'NETIF_F_IP{,6}_CSUM' features in net devices registered by the
driver and VLAN uppers and set the 'ipcs' bit when the stack requests Tx
checksum offload.

Note that the device also calculates the IPv4 checksum, but it first
zeroes the current checksum so there should not be any difference
compared to the checksum calculated by the kernel.

On SN5600 (Spectrum-4) there is about 10% improvement in Tx packet rate
with 1400 byte packets when using pktgen.

Tested on Spectrum-{1,2,3,4} with all the combinations of IPv4 / IPv6,
TCP / UDP, with and without VLAN.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/8dc86c95474ce10572a0fa83b8adb0259558e982.1738950446.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Fri, 7 Feb 2025 18:41:40 +0000 (10:41 -0800)]

selftests: drv-net: add helper for path resolution

Refering to C binaries from Python code is going to be a common
need. Add a helper to convert from path in relation to the test.
Meaning, if the test is in the same directory as the binary, the
call would be simply: cfg.rpath("binary").

The helper name "rpath" is not great. I can't think of a better
name that would be accurate yet concise.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250207184140.1730466-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Fri, 7 Feb 2025 18:41:39 +0000 (10:41 -0800)]

selftests: drv-net: factor out a DrvEnv base class

We have separate Env classes for local tests and tests with a remote
endpoint. Make it easier to share the code by creating a base class.
Make env loading a method of this class.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250207184140.1730466-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Fri, 7 Feb 2025 18:31:19 +0000 (10:31 -0800)]

selftests: drv-net: remove an unnecessary libmnl include

ncdevmem doesn't need libmnl, remove the unnecessary include.

Since YNL doesn't depend on libmnl either, any more, it's actually
possible to build selftests without having libmnl installed.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250207183119.1721424-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 11 Feb 2025 03:08:55 +0000 (19:08 -0800)]

Merge branch 'fib-rules-convert-rtm_newrule-and-rtm_delrule-to-per-netns-rtnl'

Kuniyuki Iwashima says:

====================
fib: rules: Convert RTM_NEWRULE and RTM_DELRULE to per-netns RTNL.

Patch 1 ~ 2 are small cleanup, and patch 3 ~ 8 make fib_nl_newrule()
and fib_nl_delrule() hold per-netns RTNL.

v1: https://lore.kernel.org/20250206084629.16602-1-kuniyu@amazon.com
====================

Link: https://patch.msgid.link/20250207072502.87775-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 7 Feb 2025 07:25:02 +0000 (16:25 +0900)]

net: fib_rules: Convert RTM_DELRULE to per-netns RTNL.

fib_nl_delrule() is the doit() handler for RTM_DELRULE but also called
from vrf_newlink() in case something fails in vrf_add_fib_rules().

In the latter case, RTNL is already held and the 4th arg is true.

Let's hold per-netns RTNL in fib_delrule() if rtnl_held is false.

Now we can place ASSERT_RTNL_NET() in call_fib_rule_notifiers().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 7 Feb 2025 07:25:01 +0000 (16:25 +0900)]

net: fib_rules: Add error_free label in fib_delrule().

We will hold RTNL just before calling fib_nl2rule_rtnl() in
fib_delrule() and release it before kfree(nlrule).

Let's add a new rule to make the following change cleaner.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 7 Feb 2025 07:25:00 +0000 (16:25 +0900)]

net: fib_rules: Convert RTM_NEWRULE to per-netns RTNL.

fib_nl_newrule() is the doit() handler for RTM_NEWRULE but also called
from vrf_newlink().

In the latter case, RTNL is already held and the 4th arg is true.

Let's hold per-netns RTNL in fib_newrule() if rtnl_held is false.

Note that we call fib_rule_get() before releasing per-netns RTNL to call
notify_rule_change() without RTNL and prevent freeing the new rule.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 7 Feb 2025 07:24:59 +0000 (16:24 +0900)]

net: fib_rules: Factorise fib_newrule() and fib_delrule().

fib_nl_newrule() / fib_nl_delrule() is the doit() handler for
RTM_NEWRULE / RTM_DELRULE but also called from vrf_newlink().

Currently, we hold RTNL on both paths but will not on the former.

Also, we set dev_net(dev)->rtnl to skb->sk in vrf_fib_rule() because
fib_nl_newrule() / fib_nl_delrule() fetch net as sock_net(skb->sk).

Let's Factorise the two functions and pass net and rtnl_held flag.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 7 Feb 2025 07:24:58 +0000 (16:24 +0900)]

ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure().

The following patch will not set skb->sk from VRF path.

Let's fetch net from fib_rule->fr_net instead of sock_net(skb->sk)
in fib[46]_rule_configure().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 7 Feb 2025 07:24:57 +0000 (16:24 +0900)]

net: fib_rules: Split fib_nl2rule().

We will move RTNL down to fib_nl_newrule() and fib_nl_delrule().

Some operations in fib_nl2rule() require RTNL: fib_default_rule_pref()
and __dev_get_by_name().

Let's split the RTNL parts as fib_nl2rule_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 7 Feb 2025 07:24:56 +0000 (16:24 +0900)]

net: fib_rules: Pass net to fib_nl2rule() instead of skb.

skb is not used in fib_nl2rule() other than sock_net(skb->sk),
which is already available in callers, fib_nl_newrule() and
fib_nl_delrule().

Let's pass net directly to fib_nl2rule().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 7 Feb 2025 07:24:55 +0000 (16:24 +0900)]

net: fib_rules: Don't check net in rule_exists() and rule_find().

fib_nl_newrule() / fib_nl_delrule() looks up struct fib_rules_ops
in sock_net(skb->sk) and calls rule_exists() / rule_find() respectively.

fib_nl_newrule() creates a new rule and links it to the found ops, so
struct fib_rule never belongs to a different netns's ops->rules_list.

Let's remove redundant netns check in rule_exists() and rule_find().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 11 Feb 2025 03:07:13 +0000 (19:07 -0800)]

Merge branch 'tun-unify-vnet-implementation'

Akihiko Odaki says:

====================
tun: Unify vnet implementation

When I implemented virtio's hash-related features to tun/tap [1],
I found tun/tap does not fill the entire region reserved for the virtio
header, leaving some uninitialized hole in the middle of the buffer
after read()/recvmesg().

This series fills the uninitialized hole. More concretely, the
num_buffers field will be initialized with 1, and the other fields will
be inialized with 0. Setting the num_buffers field to 1 is mandated by
virtio 1.0 [2].

The change to virtio header is preceded by another change that refactors
tun and tap to unify their virtio-related code.

[1]: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com
[2]: https://lore.kernel.org/r/20241227084256-mutt-send-email-mst@kernel.org/
====================

Link: https://patch.msgid.link/20250207-tun-v6-0-fb49cf8b103e@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Akihiko Odaki [Fri, 7 Feb 2025 06:10:57 +0000 (15:10 +0900)]

tap: Use tun's vnet-related code

tun and tap implements the same vnet-related features so reuse the code.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250207-tun-v6-7-fb49cf8b103e@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Akihiko Odaki [Fri, 7 Feb 2025 06:10:56 +0000 (15:10 +0900)]

tap: Keep hdr_len in tap_get_user()

hdr_len is repeatedly used so keep it in a local variable.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250207-tun-v6-6-fb49cf8b103e@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Akihiko Odaki [Fri, 7 Feb 2025 06:10:55 +0000 (15:10 +0900)]

tun: Extract the vnet handling code

The vnet handling code will be reused by tap.

Functions are renamed to ensure that their names contain "vnet" to
clarify that they are part of the decoupled vnet handling code.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250207-tun-v6-5-fb49cf8b103e@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Akihiko Odaki [Fri, 7 Feb 2025 06:10:54 +0000 (15:10 +0900)]

tun: Decouple vnet handling

Decouple the vnet handling code so that we can reuse it for tap.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250207-tun-v6-4-fb49cf8b103e@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Akihiko Odaki [Fri, 7 Feb 2025 06:10:53 +0000 (15:10 +0900)]

tun: Decouple vnet from tun_struct

Decouple vnet-related functions from tun_struct so that we can reuse
them for tap in the future.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250207-tun-v6-3-fb49cf8b103e@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Akihiko Odaki [Fri, 7 Feb 2025 06:10:52 +0000 (15:10 +0900)]

tun: Keep hdr_len in tun_get_user()

hdr_len is repeatedly used so keep it in a local variable.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250207-tun-v6-2-fb49cf8b103e@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Akihiko Odaki [Fri, 7 Feb 2025 06:10:51 +0000 (15:10 +0900)]

tun: Refactor CONFIG_TUN_VNET_CROSS_LE

Check IS_ENABLED(CONFIG_TUN_VNET_CROSS_LE) to save some lines and make
future changes easier.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250207-tun-v6-1-fb49cf8b103e@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 11 Feb 2025 02:53:43 +0000 (18:53 -0800)]

Merge branch 'net-xilinx-axienet-enable-adaptive-irq-coalescing-with-dim'

Sean Anderson says:

====================
net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM

To improve performance without sacrificing latency under low load,
enable DIM. While I appreciate not having to write the library myself, I
do think there are many unusual aspects to DIM, as detailed in the last
patch.
====================

Link: https://patch.msgid.link/20250206201036.1516800-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Sean Anderson [Thu, 6 Feb 2025 20:10:36 +0000 (15:10 -0500)]

net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM

The default RX IRQ coalescing settings of one IRQ per packet can represent
a significant CPU load. However, increasing the coalescing unilaterally
can result in undesirable latency under low load. Adaptive IRQ
coalescing with DIM offers a way to adjust the coalescing settings based
on load.

This device only supports "CQE" mode [1], where each packet resets the
timer. Therefore, an interrupt is fired either when we receive
coalesce_count_rx packets or when the interface is idle for
coalesce_usec_rx. With this in mind, consider the following scenarios:

Link saturated
    Here we want to set coalesce_count_rx to a large value, in order to
    coalesce more packets and reduce CPU load. coalesce_usec_rx should
    be set to at least the time for one packet. Otherwise the link will
    be "idle" and we will get an interrupt for each packet anyway.

Bursts of packets
    Each burst should be coalesced into a single interrupt, although it
    may be prudent to reduce coalesce_count_rx for better latency.
    coalesce_usec_rx should be set to at least the time for one packet
    so bursts are coalesced. However, additional time beyond the packet
    time will just increase latency at the end of a burst.

Sporadic packets
    Due to low load, we can set coalesce_count_rx to 1 in order to
    reduce latency to the minimum. coalesce_usec_rx does not matter in
    this case.

Based on this analysis, I expected the CQE profiles to look something
like

usec =  0, pkts = 1   // Low load
usec = 16, pkts = 4
usec = 16, pkts = 16
usec = 16, pkts = 64
usec = 16, pkts = 256 // High load

Where usec is set to 16 to be a few us greater than the 12.3 us packet
time of a 1500 MTU packet at 1 GBit/s. However, the CQE profile is
instead

usec =  2, pkts = 256 // Low load
usec =  8, pkts = 128
usec = 16, pkts =  64
usec = 32, pkts =  64
usec = 64, pkts =  64 // High load

I found this very surprising. The number of coalesced packets
*decreases* as load increases. But as load increases we have more
opportunities to coalesce packets without affecting latency as much.
Additionally, the profile *increases* the usec as the load increases.
But as load increases, the gaps between packets will tend to become
smaller, making it possible to *decrease* usec for better latency at the
end of a "burst".

I consider the default CQE profile unsuitable for this NIC. Therefore,
we use the first profile outlined in this commit instead.
coalesce_usec_rx is set to 16 by default, but the user can customize it.
This may be necessary if they are using jumbo frames. I think adjusting
the profile times based on the link speed/mtu would be good improvement
for generic DIM.

In addition to the above profile problems, I noticed the following
additional issues with DIM while testing:

- DIM tends to "wander" when at low load, since the performance gradient
  is pretty flat. If you only have 10p/ms anyway then adjusting the
  coalescing settings will not affect throughput very much.
- DIM takes a long time to adjust back to low indices when load is
  decreased following a period of high load. This is because it only
  re-evaluates its settings once every 64 interrupts. However, at low
  load 64 interrupts can be several seconds.

Finally: performance. This patch increases receive throughput with
iperf3 from 840 Mbits/sec to 938 Mbits/sec, decreases interrupts from
69920/sec to 316/sec, and decreases CPU utilization (4x Cortex-A53) from
43% to 9%.

[1] Who names this stuff?

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250206201036.1516800-5-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Sean Anderson [Thu, 6 Feb 2025 20:10:35 +0000 (15:10 -0500)]

net: xilinx: axienet: Get coalesce parameters from driver state

The cr variables now contain the same values as the control registers
themselves. Extract/calculate the values from the variables instead of
saving the user-specified values. This allows us to remove some
bookeeping, and also lets the user know what the actual coalesce
settings are.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250206201036.1516800-4-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Sean Anderson [Thu, 6 Feb 2025 20:10:34 +0000 (15:10 -0500)]

net: xilinx: axienet: Support adjusting coalesce settings while running

In preparation for adaptive IRQ coalescing, we first need to support
adjusting the settings at runtime. The existing code doesn't require any
locking because

- dma_start is the only function that modifies rx/tx_dma_cr. It is
  always called with IRQs and NAPI disabled, so nothing else is touching
  the hardware.
- The IRQs don't race with poll, since the latter is a softirq.
- The IRQs don't race with dma_stop since they both just clear the
  control registers.
- dma_stop doesn't race with poll since the former is called with NAPI
  disabled.

However, once we introduce another function that modifies rx/tx_dma_cr,
we need to have some locking to prevent races. Introduce two locks to
protect these variables and their registers.

The control register values are now generated where the coalescing
settings are set. Converting coalescing settings to control register
values may require sleeping because of clk_get_rate. However, the
read/modify/write of the control registers themselves can't sleep
because it needs to happen in IRQ context. By pre-calculating the
control register values, we avoid introducing an additional mutex.

Since axienet_dma_start writes the control settings when it runs, we
don't bother updating the CR registers when rx/tx_dma_started is false.
This prevents any issues from writing to the control registers in the
middle of a reset sequence.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250206201036.1516800-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Sean Anderson [Thu, 6 Feb 2025 20:10:33 +0000 (15:10 -0500)]

net: xilinx: axienet: Combine CR calculation

Combine the common parts of the CR calculations for better code reuse.
While we're at it, simplify the code a bit.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250206201036.1516800-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Aleksander Jan Bajkowski [Thu, 6 Feb 2025 22:40:33 +0000 (23:40 +0100)]

r8152: add vendor/device ID pair for Dell Alienware AW1022z

The Dell AW1022z is an RTL8156B based 2.5G Ethernet controller.

Add the vendor and product ID values to the driver. This makes Ethernet
work with the adapter.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20250206224033.980115-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 11 Feb 2025 01:54:45 +0000 (17:54 -0800)]

Merge branch 'xsk-the-lost-bits-from-chapter-iii'

Alexander Lobakin says:

====================
xsk: the lost bits from Chapter III

Before introducing libeth_xdp, we need to add a couple more generic
helpers. Notably:

* 01: add generic loop unrolling hint helpers;
* 04: add helper to get both xdp_desc's DMA address and metadata
  pointer in one go, saving several cycles and hotpath object
  code size in drivers (especially when unrolling).

Bonus:

* 02, 03: convert two drivers which were using custom macros to
  generic unrolled_count() (trivial, no object code changes).
====================

Link: https://patch.msgid.link/20250206182630.3914318-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Alexander Lobakin [Thu, 6 Feb 2025 18:26:29 +0000 (19:26 +0100)]

xsk: add helper to get &xdp_desc's DMA and meta pointer in one go

Currently, when your driver supports XSk Tx metadata and you want to
send an XSk frame, you need to do the following:

* call external xsk_buff_raw_get_dma();
* call inline xsk_buff_get_metadata(), which calls external
xsk_buff_raw_get_data() and then do some inline checks.

This effectively means that the following piece:

addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;

is done twice per frame, plus you have 2 external calls per frame, plus
this:

meta = pool->addrs + addr - pool->tx_metadata_len;
if (unlikely(!xsk_buff_valid_tx_metadata(meta)))

is always inlined, even if there's no meta or it's invalid.

Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that
in one go. It returns a small structure with 2 fields: DMA address,
filled unconditionally, and metadata pointer, non-NULL only if it's
present and valid. The address correction is performed only once and
you also have only 1 external call per XSk frame, which does all the
calculations and checks outside of your hotpath. You only need to
check `if (ctx.meta)` for the metadata presence.
To not copy any existing code, derive address correction and getting
virtual and DMA address into small helpers. bloat-o-meter reports no
object code changes for the existing functionality.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250206182630.3914318-5-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Alexander Lobakin [Thu, 6 Feb 2025 18:26:28 +0000 (19:26 +0100)]

ice: use generic unrolled_count() macro

ice, same as i40e, has a custom loop unrolling macros for unrolling
Tx descriptors filling on XSk xmit.
Replace ice defs with generic unrolled_count(), which is also more
convenient as it allows passing defines as its argument, not hardcoded
values, while the loop declaration will still be usual for-loop.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20250206182630.3914318-4-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Alexander Lobakin [Thu, 6 Feb 2025 18:26:27 +0000 (19:26 +0100)]

i40e: use generic unrolled_count() macro

i40e, as well as ice, has a custom loop unrolling macro for unrolling
Tx descriptors filling on XSk xmit.
Replace i40e defs with generic unrolled_count(), which is also more
convenient as it allows passing defines as its argument, not hardcoded
values, while the loop declaration will still be a usual for-loop.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20250206182630.3914318-3-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Alexander Lobakin [Thu, 6 Feb 2025 18:26:26 +0000 (19:26 +0100)]

unroll: add generic loop unroll helpers

There are cases when we need to explicitly unroll loops. For example,
cache operations, filling DMA descriptors on very high speeds etc.
Add compiler-specific attribute macros to give the compiler a hint
that we'd like to unroll a loop.
Example usage:

#define UNROLL_BATCH 8

unrolled_count(UNROLL_BATCH)
for (u32 i = 0; i < UNROLL_BATCH; i++)
op(priv, i);

Note that sometimes the compilers won't unroll loops if they think this
would have worse optimization and perf than without unrolling, and that
unroll attributes are available only starting GCC 8. For older compiler
versions, no hints/attributes will be applied.
For better unrolling/parallelization, don't have any variables that
interfere between iterations except for the iterator itself.

Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250206182630.3914318-2-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Oleksij Rempel [Wed, 5 Feb 2025 10:38:46 +0000 (11:38 +0100)]

net: phy: dp83td510: introduce LED framework support

Add LED brightness, mode, HW control and polarity functions to enable
external LED control in the TI DP83TD510 PHY.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250205103846.2273833-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Mon, 10 Feb 2025 16:26:54 +0000 (08:26 -0800)]

Merge branch 'eth-fbnic-support-rss-contexts-and-ntuple-filters'

Jakub Kicinski says:

====================
eth: fbnic: support RSS contexts and ntuple filters

Add support for RSS contexts and ntuple filters in fbnic.
The device has only one context, intended for use by TCP zero-copy Rx.

First two patches add a check we seem to be missing in the core,
to avoid having to copy it to all drivers.

  $ ./drivers/net/hw/rss_ctx.py
  KTAP version 1
  1..16
  ok 1 rss_ctx.test_rss_key_indir
  ok 2 rss_ctx.test_rss_queue_reconfigure
  ok 3 rss_ctx.test_rss_resize
  ok 4 rss_ctx.test_hitless_key_update
  ok 5 rss_ctx.test_rss_context
  # Failed to create context 2, trying to test what we got
  ok 6 rss_ctx.test_rss_context4 # SKIP Tested only 1 contexts, wanted 4
  # Increasing queue count 44 -> 66
  # Failed to create context 2, trying to test what we got
  ok 7 rss_ctx.test_rss_context32 # SKIP Tested only 1 contexts, wanted 32
  # Added only 1 out of 3 contexts
  ok 8 rss_ctx.test_rss_context_dump
  # Driver does not support rss + queue offset
  ok 9 rss_ctx.test_rss_context_queue_reconfigure
  ok 10 rss_ctx.test_rss_context_overlap
  ok 11 rss_ctx.test_rss_context_overlap2 # SKIP Test requires at least 2 contexts, but device only has 1
  ok 12 rss_ctx.test_rss_context_out_of_order # SKIP Test requires at least 4 contexts, but device only has 1
  # Failed to create context 2, trying to test what we got
  ok 13 rss_ctx.test_rss_context4_create_with_cfg # SKIP Tested only 1 contexts, wanted 4
  ok 14 rss_ctx.test_flow_add_context_missing
  ok 15 rss_ctx.test_delete_rss_context_busy
  ok 16 rss_ctx.test_rss_ntuple_addition # SKIP Ntuple filter with RSS and nonzero action not supported
  # Totals: pass:10 fail:0 xfail:0 xpass:0 skip:6 error:0
====================

Link: https://patch.msgid.link/20250206235334.1425329-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Alexander Duyck [Thu, 6 Feb 2025 23:53:34 +0000 (15:53 -0800)]

eth: fbnic: support listing tcam content via debugfs

The device has a handful of relatively small TCAM tables,
support dumping the driver state via debugfs.

  # ethtool -N eth0 flow-type tcp6 \
      dst-ip 1111::2222 dst-port $((0x1122)) \
      src-ip 3333::4444 src-port $((0x3344)) \
      action 2
  Added rule with ID 47

  # cd $dbgfs
  # cat ip_src
  Idx S TCAM Bitmap       V Addr/Mask
  ------------------------------------
  00  1 00020000,00000000 6 33330000000000000000000000004444
                            00000000000000000000000000000000
  ...
  # cat ip_dst
  Idx S TCAM Bitmap       V Addr/Mask
  ------------------------------------
  00  1 00020000,00000000 6 11110000000000000000000000002222
                            00000000000000000000000000000000
  ...

  # cat act_tcam
  Idx S Value/Mask                                              RSS  Dest
  ------------------------------------------------------------------------
  ...
  49  1 0000 0000 0000 0000 0000 0000 1122 3344 0000 9c00 0088  000f 00000212
        ffff ffff ffff ffff ffff ffff 0000 0000 ffff 23ff ff00
  ...

The ipo_* tables are for outer IP addresses.
The tce_* table is for directing/stealing traffic to NC-SI.

Signed-off-by: Alexander Duyck <alexanderduyck@meta.com>
Link: https://patch.msgid.link/20250206235334.1425329-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 6 Feb 2025 23:53:33 +0000 (15:53 -0800)]

selftests: drv-net: rss_ctx: skip tests which need multiple contexts cleanly

There's no good API to check how many contexts device supports.
But initial tests sense the context count already, so just store
that number and skip tests which we know need more.

Link: https://patch.msgid.link/20250206235334.1425329-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Alexander Duyck [Thu, 6 Feb 2025 23:53:32 +0000 (15:53 -0800)]

eth: fbnic: support n-tuple filters

Add ethtool -n / -N support. Support only "un-ordered" rule sets
(RX_CLS_LOC_ANY), just for simplicity of the code. It's unclear
anyone actually cares about the rule ordering.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20250206235334.1425329-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Alexander Duyck [Thu, 6 Feb 2025 23:53:31 +0000 (15:53 -0800)]

eth: fbnic: add IP TCAM programming

IPv6 addresses are huge so the device has 4 TCAMs used for narrowing
them down to a smaller key before the main match / action engine.

Add the tables in which we'll keep the IP addresses used by
ethtool n-tuple rules. Add the code for programming them
into the device, and code for allocating and freeing entries.

A bit of copy / paste here as we need to support IPv4 and
IPv6 in the same tables, and there is four of them.
But it makes the code easier to match up with the device.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20250206235334.1425329-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Daniel Zahka [Thu, 6 Feb 2025 23:53:30 +0000 (15:53 -0800)]

eth: fbnic: support an additional RSS context

Add support for an extra RSS context. The device has a primary
and a secondary context.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250206235334.1425329-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 6 Feb 2025 23:53:29 +0000 (15:53 -0800)]

selftests: net-drv: test adding flow rule to invalid RSS context

Check that adding Rx flow steering rules pointing to an RSS
context which does not exist is prevented.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250206235334.1425329-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 6 Feb 2025 23:53:28 +0000 (15:53 -0800)]

net: ethtool: prevent flow steering to RSS contexts which don't exist

Since commit 42dc431f5d0e ("ethtool: rss: prevent rss ctx deletion
when in use") we prevent removal of RSS contexts pointed to by
existing flow rules. Core should also prevent creation of rules
which point to RSS context which don't exist in the first place.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250206235334.1425329-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

David S. Miller [Mon, 10 Feb 2025 15:04:18 +0000 (15:04 +0000)]

Merge branch 'netconsole-cpu-population'

Breno Leitao says:

====================
netconsole: Add support for CPU population

The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.

As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.

Example of a message got from produciton hosts:

------------[ cut here ]------------
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)

This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:

* Improved demultiplexing of parallel log output: When multiple CPUs are
  sending messages concurrently, the added CPU number in the userdata
  makes it easier to differentiate and attribute the messages to their
  originating CPUs.

* Better visibility into message sources: The CPU number information
  gives users and developers more insight into which specific CPU a
  particular log message came from, which can be valuable for debugging
  and analysis.

The changes in this series are as follows Patches::

Patch "consolidate send buffers into netconsole_target struct"
=================================================

Move the static buffers to netconsole target, from static declaration
in send_msg_no_fragmentation() and send_msg_fragmented().

Patch "netconsole: Rename userdata to extradata"
=================================================
Create the a concept of extradata, which encompasses the concept of
userdata and the upcoming sysdatao

Sysdata is a new concept being added, which is basically fields that are
populated by the kernel. At this time only the CPU#, but, there is a
desire to add current task name, kernel release version, etc.

Patch "netconsole: Helper to count number of used entries"
===========================================================
Create a simple helper to count number of entries in extradata. I am
separating this in a function since it will need to count userdata and
sysdata. For instance, when the user adds an extra userdata, we need to
check if there is space, counting the previous data entries (from
userdata and cpu data)

Patch "Introduce configfs helpers for sysdata features"
======================================================
Create the concept of sysdata feature in the netconsole target, and
create the configfs helpers to enable the bit in nt->sysdata

Patch "Include sysdata in extradata entry count"
================================================
Add the concept of sysdata when counting for available space in the
buffer. This will protect users from creating new userdata/sysdata if
there is no more space

Patch "netconsole: add support for sysdata and CPU population"
===============================================================
This is the core patch. Basically add a new option to enable automatic
CPU number population in the netconsole userdata Provides a new "cpu_nr"
sysfs attribute to control this feature

Patch "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Test different
permutation with userdata

Patch "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature Provides instructions on how to enable and use
the feature

I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.

PS: This patchset is on top of the patch that created
netcons_fragmented_msg selftest:

https://lore.kernel.org/all/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@debian.org/

---
Changes in v5:
- Fixed a kernel doc syntax syntax (Simon)
- Link to v4: https://lore.kernel.org/r/20250204-netcon_cpu-v4-0-9480266ef556@debian.org

Changes in v4:
- Fixed Kernel doc for netconsole_target (Simon)
- Fixed a typo in disable_sysdata_feature (Simon)
- Improved sysdata_cpu_nr_show() to return !! in a bit-wise operation
- Link to v3: https://lore.kernel.org/r/20250124-netcon_cpu-v3-0-12a0d286ba1d@debian.org

Changes in v3:
- Moved the buffer into netconsole_target, avoiding static functions in
  the send path (Jakub).
- Fix a documentation error (Randy Dunlap)
- Created a function that handle all the extradata, consolidating it in
  a single place (Jakub)
- Split the patch even more, trying to simplify the review.
- Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org

Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
  easier to understand, and the code easier to read.
  * Basically extradata encompasses userdata and the new sysdata.
    Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
  checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
====================

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 6 Feb 2025 11:05:59 +0000 (03:05 -0800)]

netconsole: docs: Add documentation for CPU number auto-population

Update the netconsole documentation to explain the new feature that
allows automatic population of the CPU number.

The key changes include introducing a new section titled "CPU number
auto population in userdata", explaining how to enable the CPU number
auto-population feature by writing to the "populate_cpu_nr" file in the
netconsole configfs hierarchy.

This documentation update ensures users are aware of the new CPU number
auto-population functionality and how to leverage it for better
demultiplexing and visibility of parallel netconsole output.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 6 Feb 2025 11:05:58 +0000 (03:05 -0800)]

netconsole: selftest: test for sysdata CPU

Add a new selftest to verify that the netconsole module correctly
handles CPU runtime data in sysdata. The test validates three scenarios:

1. Basic CPU sysdata functionality - verifies that cpu=X is appended to
messages
2. CPU sysdata with userdata - ensures CPU data works alongside userdata
3. Disabled CPU sysdata - confirms no CPU data is included when disabled

The test uses taskset to control which CPU sends messages and verifies
the reported CPU matches the one used. This helps ensure that netconsole
accurately tracks and reports the originating CPU of messages.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 6 Feb 2025 11:05:57 +0000 (03:05 -0800)]

netconsole: add support for sysdata and CPU population

Add infrastructure to automatically append kernel-generated data (sysdata)
to netconsole messages. As the first use case, implement CPU number
population, which adds the CPU that sent the message.

This change introduces three distinct data types:
- extradata: The complete set of appended data (sysdata + userdata)
- userdata: User-provided key-value pairs from userspace
- sysdata: Kernel-populated data (e.g. cpu=XX)

The implementation adds a new configfs attribute 'cpu_nr' to control CPU
number population per target. When enabled, each message is tagged with
its originating CPU. The sysdata is dynamically updated at message time
and appended after any existing userdata.

The CPU number is formatted as "cpu=XX" and is added to the extradata
buffer, respecting the existing size limits.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 6 Feb 2025 11:05:56 +0000 (03:05 -0800)]

netconsole: Include sysdata in extradata entry count

Modify count_extradata_entries() to include sysdata fields when
calculating the total number of extradata entries. This change ensures
that the sysdata feature, specifically the CPU number field, is
correctly counted against the MAX_EXTRADATA_ITEMS limit.

The modification adds a simple check for the CPU_NR flag in the
sysdata_fields, incrementing the entry count accordingly.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 6 Feb 2025 11:05:55 +0000 (03:05 -0800)]

netconsole: Introduce configfs helpers for sysdata features

This patch introduces a bitfield to store sysdata features in the
netconsole_target struct. It also adds configfs helpers to enable
or disable the CPU_NR feature, which populates the CPU number in
sysdata.

The patch provides the necessary infrastructure to set or unset the
CPU_NR feature, but does not modify the message itself.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 6 Feb 2025 11:05:54 +0000 (03:05 -0800)]

netconsole: Helper to count number of used entries

Add a helper function nr_extradata_entries() to count the number of used
extradata entries in a netconsole target. This refactors the duplicate
code for counting entries into a single function, which will be reused
by upcoming CPU sysdata changes.

The helper uses list_count_nodes() to count the number of children in
the userdata group configfs hierarchy.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 6 Feb 2025 11:05:53 +0000 (03:05 -0800)]

netconsole: Rename userdata to extradata

Rename "userdata" to "extradata" since this structure will hold both
user and system data in future patches. Keep "userdata" term only for
data that comes from userspace (configfs), while "extradata" encompasses
both userdata and future kerneldata.

These are the rules of the design

1. extradata_complete will hold userdata and sysdata (coming)
2. sysdata will come after userdata_length
3. extradata_complete[userdata_length] string will be replaced at every
   message
5. userdata is replaced when configfs changes (update_userdata())
6. sysdata is replaced at every message

Example:
  extradata_complete = "userkey=uservalue cpu=42"
  userdata_length = 17
  sysdata_length = 7 (space (" ") is part of sysdata)

Since sysdata is still not available, you will see the following in the
send functions:

extradata_len = nt->userdata_length;

The upcoming patches will, which will add support for sysdata, will
change it to:

extradata_len = nt->userdata_length + sysdata_len;

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 6 Feb 2025 11:05:52 +0000 (03:05 -0800)]

netconsole: consolidate send buffers into netconsole_target struct

Move the static buffers from send_msg_no_fragmentation() and
send_msg_fragmented() into the netconsole_target structure. This
simplifies the code by:
- Eliminating redundant static buffers
- Centralizing buffer management in the target structure
- Reducing memory usage by 1KB (one buffer instead of two)

The buffer in netconsole_target is protected by target_list_lock,
maintaining the same synchronization semantics as the original code.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jakub Kicinski [Sat, 8 Feb 2025 01:21:05 +0000 (17:21 -0800)]

Merge branch 'net-improve-core-queue-api-handling-while-device-is-down'

Jakub Kicinski says:

====================
net: improve core queue API handling while device is down

The core netdev_rx_queue_restart() doesn't currently take into account
that the device may be down. The current and proposed queue API
implementations deal with this by rejecting queue API calls while
the device is down. We can do better, in theory we can still allow
devmem binding when the device is down - we shouldn't stop and start
the queues just try to allocate the memory. The reason we allocate
the memory is that memory provider binding checks if any compatible
page pool has been created (page_pool_check_memory_provider()).

Alternatively we could reject installing MP while the device is down
but the MP assignment survives ifdown (so presumably MP doesn't cease
to exist while down), and in general we allow configuration while down.

Previously I thought we need this as a fix, but gve rejects page pool
calls while down, and so did Saeed in the patches he posted. So this
series just makes the core act more sensibly but practically should
be a noop for now.

v1: https://lore.kernel.org/20250205190131.564456-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250206225638.1387810-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 6 Feb 2025 22:56:38 +0000 (14:56 -0800)]

netdevsim: allow normal queue reset while down

Resetting queues while the device is down should be legal.
Allow it, test it. Ideally we'd test this with a real device
supporting devmem but I don't have access to such devices.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250206225638.1387810-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 6 Feb 2025 22:56:37 +0000 (14:56 -0800)]

net: page_pool: avoid false positive warning if NAPI was never added

We expect NAPI to be in disabled state when page pool is torn down.
But it is also legal if the NAPI is completely uninitialized.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250206225638.1387810-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 6 Feb 2025 22:56:36 +0000 (14:56 -0800)]

net: devmem: don't call queue stop / start when the interface is down

We seem to be missing a netif_running() check from the devmem
installation path. Starting a queue on a stopped device makes
no sense. We still want to be able to allocate the memory, just
to test that the device is indeed setting up the page pools
in a memory provider compatible way.

This is not a bug fix, because existing drivers check if
the interface is down as part of the ops. But new drivers
shouldn't have to do this, as long as they can correctly
alloc/free while down.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250206225638.1387810-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 6 Feb 2025 22:56:35 +0000 (14:56 -0800)]

net: refactor netdev_rx_queue_restart() to use local qops

Shorten the lines by storing dev->queue_mgmt_ops in a temp variable.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250206225638.1387810-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Thu, 6 Feb 2025 22:06:07 +0000 (23:06 +0100)]

net: gianfar: simplify init_phy()

Use phy_set_max_speed() to simplify init_phy().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/b863dcf7-31e8-45a1-a284-7075da958ff0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Sat, 8 Feb 2025 00:12:06 +0000 (16:12 -0800)]

Merge branch 'add-usb-support-for-telit-cinterion-fn990b'

Fabio Porcedda says:

====================
Add usb support for Telit Cinterion FN990B

Add usb support for Telit Cinterion FN990B.
Also fix Telit Cinterion FN990A name.

Connection with ModemManager was tested also AT ports.
====================

Link: https://patch.msgid.link/20250205171649.618162-1-fabio.porcedda@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Fabio Porcedda [Wed, 5 Feb 2025 17:16:49 +0000 (18:16 +0100)]

net: usb: cdc_mbim: fix Telit Cinterion FN990A name

The correct name for FN990 is FN990A so use it in order to avoid
confusion with FN990B.

Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Link: https://patch.msgid.link/20250205171649.618162-6-fabio.porcedda@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Fabio Porcedda [Wed, 5 Feb 2025 17:16:48 +0000 (18:16 +0100)]

net: usb: qmi_wwan: fix Telit Cinterion FN990A name

The correct name for FN990 is FN990A so use it in order to avoid
confusion with FN990B.

Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Link: https://patch.msgid.link/20250205171649.618162-5-fabio.porcedda@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Fabio Porcedda [Wed, 5 Feb 2025 17:16:46 +0000 (18:16 +0100)]

net: usb: qmi_wwan: add Telit Cinterion FN990B composition

Add the following Telit Cinterion FN990B composition:

0x10d0: rmnet + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (AT) +
        tty (diag) + DPL + QDSS (Qualcomm Debug SubSystem) + adb
T:  Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 17 Spd=480  MxCh= 0
D:  Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=10d0 Rev=05.15
S:  Manufacturer=Telit Cinterion
S:  Product=FN990
S:  SerialNumber=43b38f19
C:  #Ifs= 9 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=88(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8a(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 6 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none)
E:  Ad=8c(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 7 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=70 Driver=(none)
E:  Ad=8d(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 8 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
E:  Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Link: https://patch.msgid.link/20250205171649.618162-3-fabio.porcedda@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Geert Uytterhoeven [Wed, 5 Feb 2025 16:12:09 +0000 (17:12 +0100)]

net: renesas: rswitch: Convert to for_each_available_child_of_node()

Simplify rswitch_get_port_node() by using the
for_each_available_child_of_node() helper instead of manually ignoring
unavailable child nodes, and leaking a reference.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/54f544d573a64b96e01fd00d3481b10806f4d110.1738771798.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Fri, 7 Feb 2025 19:56:12 +0000 (11:56 -0800)]

Merge branch 'net-stmmac-yet-more-eee-updates'

Russell King says:

====================
net: stmmac: yet more EEE updates

Continuing on with the STMMAC EEE cleanups from last cycle, this series
further cleans up the EEE code, and fixes a problem with the existing
implementation - disabling EEE doesn't immediately disable LPI
signalling until the next packet is transmitted. It likely also fixes
a potential race condition when trying to disable LPI vs the software
timer.
====================

Link: https://patch.msgid.link/Z6NqGnM2yL7Ayo-T@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 5 Feb 2025 13:40:51 +0000 (13:40 +0000)]

net: stmmac: remove old EEE methods

As we no longer call the set_eee_mode(), reset_eee_mode() and
set_eee_lpi_entry_timer() methods, remove these and their glue in
common.h

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffe7-003ZIm-Qv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 5 Feb 2025 13:40:46 +0000 (13:40 +0000)]

net: stmmac: use stmmac_set_lpi_mode()

Use the new stmmac_set_lpi_mode() API to configure the parameters of
the desired LPI mode rather than the older methods.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffe2-003ZIg-Mx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 5 Feb 2025 13:40:41 +0000 (13:40 +0000)]

net: stmmac: dwmac4: clear LPI_CTRL_STATUS_LPITCSE too

Ensure that LPI_CTRL_STATUS_LPITCSE is also appropriately cleared when
disabling LPI or enabling LPI without TX clock gating.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdx-003ZIZ-JQ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 5 Feb 2025 13:40:36 +0000 (13:40 +0000)]

net: stmmac: add new MAC method set_lpi_mode()

Add a new method to control LPI mode configuration. This is architected
to have three configuration states: LPI disabled, LPI forced (active),
or LPI under hardware timer control. This reflects the three modes
which the main body of the driver wishes to deal with.

We pass in whether transmit clock gating should be used, and the
hardware timer value in microseconds to be set when using hardware
timer control.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffds-003ZIT-E8@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 5 Feb 2025 13:40:31 +0000 (13:40 +0000)]

net: stmmac: use common LPI_CTRL_STATUS bit definitions

The bit definitions for the LPI control/status register are
identical across all MAC versions, with the exception that some
bits may not be implemented. Provide definitions for bits in this
register in common.h, convert to use them, and remove the core-
specific definitions.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdn-003ZIN-9p@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 5 Feb 2025 13:40:26 +0000 (13:40 +0000)]

net: stmmac: remove unnecessary LPI disable when enabling LPI

Remove the unnecessary LPI disable when enabling LPI - as noted in
previous commits, there will never be two consecutive calls to
stmmac_mac_enable_tx_lpi() without an intervening
stmmac_mac_disable_tx_lpi.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdi-003ZIH-5h@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 5 Feb 2025 13:40:21 +0000 (13:40 +0000)]

net: stmmac: clear priv->tx_path_in_lpi_mode when disabling LPI

As other code paths do, clear priv->tx_path_in_lpi_mode when disabling
LPI. This is done after the software timer has been deleted and
hardware LPI has been disabled.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdd-003ZIB-22@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 5 Feb 2025 13:40:15 +0000 (13:40 +0000)]

net: stmmac: remove unnecessary priv->eee_enabled tests

Phylink will not call the mac_disable_tx_lpi() and mac_enable_tx_lpi()
methods randomly - the first method to be called will be the enable
method, and then after, the disable method will be called once between
subsequent enable calls. Thus there is a guaranteed ordering.

Therefore, we know the previous state of priv->eee_enabled, and can
remove it from both methods.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tffdX-003ZI5-UV@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom