www.infradead.org Git - users/jedix/linux-maple.git/log

Ido Schimmel [Mon, 16 Dec 2024 17:11:59 +0000 (19:11 +0200)]

netlink: specs: Add route flow label attribute

Add the new flow label attribute to the spec. Example:

# ip link add name dummy1 up type dummy
# ip -6 route add default table 254 dev dummy1
# ip -6 route add default table 10 dev dummy1
# ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
--do newrule \
--json '{"family": 10, "priority": 1, "flowlabel": 10, "flowlabel-mask": 255, "action": 1, "table": 10}'
None
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_route.yaml \
--do getroute \
--json '{"rtm-family": 10, "rta-flowlabel": 1}' --output-json \
| jq '.["rta-table"]'
254
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_route.yaml \
--do getroute \
--json '{"rtm-family": 10, "rta-flowlabel": 10}' --output-json \
| jq '.["rta-table"]'
10
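
The two lookups above can be modelled with a small sketch. This is not the kernel's lookup code: the `Rule` class and `lookup_table()` are hypothetical names, the default "main" rule at priority 32766 is an assumption about the test system, and the model assumes every table contains a matching route (as in the example), so the first matching rule decides the result.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    """Simplified FIB rule: only priority, table and the flow label selector."""
    priority: int
    table: int
    flowlabel: Optional[int] = None       # None means "match any flow label"
    flowlabel_mask: Optional[int] = None


def lookup_table(rules: Iterable[Rule], flowlabel: int) -> Optional[int]:
    """Return the table of the first matching rule, lowest priority value first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.flowlabel is None:
            return rule.table
        if (flowlabel & rule.flowlabel_mask) == rule.flowlabel:
            return rule.table
    return None


rules = [
    Rule(priority=1, table=10, flowlabel=10, flowlabel_mask=255),
    Rule(priority=32766, table=254),  # assumed default "main" rule
]

print(lookup_table(rules, 1))   # flow label 1 does not match the rule for table 10
print(lookup_table(rules, 10))  # flow label 10 matches the priority 1 rule
```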

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Ido Schimmel [Mon, 16 Dec 2024 17:11:58 +0000 (19:11 +0200)]

ipv6: Add flow label to route get requests

The default IPv6 multipath hash policy takes the flow label into account
when calculating a multipath hash and previous patches added a flow
label selector to IPv6 FIB rules.

Allow user space to specify a flow label in route get requests by adding
a new netlink attribute and using its value to populate the "flowlabel"
field in the IPv6 flow info structure prior to a route lookup.

Deny the attribute in RTM_{NEW,DEL}ROUTE requests by checking for it in
rtm_to_fib6_config() and returning an error if present.

A subsequent patch will use this capability to test the new flow label
selector in IPv6 FIB rules.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Ido Schimmel [Mon, 16 Dec 2024 17:11:57 +0000 (19:11 +0200)]

netlink: specs: Add FIB rule flow label attributes

Add the new flow label attributes to the spec. Example:

# ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
--do newrule \
--json '{"family": 10, "flowlabel": 1, "flowlabel-mask": 1, "action": 1, "table": 1}'
None
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
--dump getrule --json '{"family": 10}' --output-json \
| jq '.[] | select(.flowlabel == "0x1")'
{
   "table": 1,
   "suppress-prefixlen": "0xffffffff",
   "protocol": 0,
   "priority": 32765,
   "flowlabel": "0x1",
   "flowlabel-mask": "0x1",
   "family": 10,
   "dst-len": 0,
   "src-len": 0,
   "tos": 0,
   "action": "to-tbl",
   "flags": 0
}

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Ido Schimmel [Mon, 16 Dec 2024 17:11:56 +0000 (19:11 +0200)]

net: fib_rules: Enable flow label selector usage

Now that both IPv4 and IPv6 correctly handle the new flow label
attributes, enable user space to configure FIB rules that make use of
the flow label by changing the policy to stop rejecting them and to
accept 32-bit values in big-endian byte order.
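
As a rough illustration of what a 32-bit big-endian attribute payload looks like on the wire, the sketch below packs and unpacks a flow label with Python's `struct` module. The helper names are hypothetical and this is not how the kernel or YNL encodes the attribute internally; it only shows the byte order.

```python
import struct


def encode_flowlabel_be32(flowlabel: int) -> bytes:
    """Pack a flow label into 4 bytes, most significant byte first."""
    if not 0 <= flowlabel <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return struct.pack(">I", flowlabel)


def decode_flowlabel_be32(payload: bytes) -> int:
    """Unpack a 4-byte big-endian payload back into an integer."""
    (value,) = struct.unpack(">I", payload)
    return value


payload = encode_flowlabel_be32(10)
print(payload.hex())                   # hex dump of the 4-byte payload
print(decode_flowlabel_be32(payload))  # round-trips to the original value
```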

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Ido Schimmel [Mon, 16 Dec 2024 17:11:55 +0000 (19:11 +0200)]

ipv6: fib_rules: Add flow label support

Implement support for the new flow label selector which allows IPv6 FIB
rules to match on the flow label with a mask. Ensure that both flow
label attributes are specified (or none) and that the mask is valid.
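
A minimal sketch of the checks described above, under stated assumptions: "valid" is taken to mean that the mask has no bits outside the 20-bit IPv6 flow label field, and the stored label is assumed to be masked before use. The function name and error messages are hypothetical and are not taken from the kernel source.

```python
from typing import Optional, Tuple

FLOWLABEL_FIELD = 0xFFFFF  # the IPv6 flow label is a 20-bit field


def validate_flowlabel_attrs(
    flowlabel: Optional[int], mask: Optional[int]
) -> Optional[Tuple[int, int]]:
    """Return (label, mask) for a rule, or None when no selector is given."""
    if flowlabel is None and mask is None:
        return None  # neither attribute given: rule does not match on flow label
    if flowlabel is None or mask is None:
        raise ValueError("flow label and mask must be specified together")
    if mask & ~FLOWLABEL_FIELD:
        raise ValueError("mask has bits outside the 20-bit flow label")
    # Assumption: bits of the label outside the mask are ignored.
    return (flowlabel & mask, mask)


print(validate_flowlabel_attrs(None, None))
print(validate_flowlabel_attrs(10, 255))
```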

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Ido Schimmel [Mon, 16 Dec 2024 17:11:54 +0000 (19:11 +0200)]

ipv4: fib_rules: Reject flow label attributes

IPv4 FIB rules cannot match on flow label so reject requests that try to
add such rules. Do that in the IPv4 configure callback as the netlink
policy resides in the core and is used by both IPv4 and IPv6.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Ido Schimmel [Mon, 16 Dec 2024 17:11:53 +0000 (19:11 +0200)]

net: fib_rules: Add flow label selector attributes

Add new FIB rule attributes which will allow user space to match on the
IPv6 flow label with a mask. Temporarily set the type of the attributes
to 'NLA_REJECT' while support is being added in the IPv6 code.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Jakub Kicinski [Thu, 19 Dec 2024 03:32:07 +0000 (19:32 -0800)]

Merge branch 'mdio-support-updates'

Nikita Yushchenko says:

====================
rswitch: mdio support updates

This series cleans up rswitch mdio support, and adds C22 operations.
====================

Link: https://patch.msgid.link/20241216071957.2587354-1-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Nikita Yushchenko [Mon, 16 Dec 2024 07:19:57 +0000 (12:19 +0500)]

net: renesas: rswitch: add mdio C22 support

The generic MPSM operation added by the previous patch can be used both
for C45 and C22.

Add handlers for C22 operations.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://patch.msgid.link/20241216071957.2587354-6-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Nikita Yushchenko [Mon, 16 Dec 2024 07:19:56 +0000 (12:19 +0500)]

net: renesas: rswitch: use generic MPSM operation for mdio C45

Introduce rswitch_etha_mpsm_op() that accepts values for MPSM register
fields and executes the transaction.

This avoids some code duplication, and can be used both for C45 and C22.

Convert C45 read and write operations to use that.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://patch.msgid.link/20241216071957.2587354-5-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Nikita Yushchenko [Mon, 16 Dec 2024 07:19:55 +0000 (12:19 +0500)]

net: renesas: rswitch: align mdio C45 operations with datasheet

Per the rswitch datasheet, software can detect that an mdio operation has
completed either by polling the MPSM.PSME bit, or via an interrupt.

Instead, the driver currently polls for the interrupt status bit. Although
this still provides the correct result, it requires additional register
operations to clear the interrupt status bits, and generally looks wrong.

Fix it to poll MPSM.PSME bit, as the datasheet suggests.
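
The polling approach can be sketched generically as below. This is only an illustration: `read_reg` is a hypothetical callable standing in for a register read, the bit is assumed to read as set while the operation is in progress, and the timeout values are arbitrary rather than taken from the driver.

```python
import time
from typing import Callable


def poll_bit_clear(
    read_reg: Callable[[], int],
    bit_mask: int,
    timeout_s: float = 0.1,
    interval_s: float = 0.001,
) -> bool:
    """Poll until (read_reg() & bit_mask) is zero; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not (read_reg() & bit_mask):
            return True   # operation finished
        if time.monotonic() >= deadline:
            return False  # caller would report a timeout error
        time.sleep(interval_s)


# Simulated register: busy for two reads, then done.
values = iter([0x1, 0x1, 0x0])
print(poll_bit_clear(lambda: next(values), 0x1))
```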

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://patch.msgid.link/20241216071957.2587354-4-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Nikita Yushchenko [Mon, 16 Dec 2024 07:19:54 +0000 (12:19 +0500)]

net: renesas: rswitch: use FIELD_PREP for remaining MPIC register fields

Commit fb9e6039c325 ("net: renesas: rswitch: fix initial MPIC register
setting") converted setting some MPIC fields to FIELD_PREP.

To keep common style, do the same with mii bus related fields of the
same register.
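
For readers unfamiliar with FIELD_PREP, the sketch below mimics its effect in Python: the value is shifted to the position of the lowest set bit of the mask and then masked. The masks used in the example are made up for illustration and are not the real MPIC field definitions; the range check is a runtime stand-in for the checks the kernel macro performs.

```python
def field_prep(mask: int, value: int) -> int:
    """Place value into the bit field described by a contiguous mask."""
    if mask == 0:
        raise ValueError("mask must not be zero")
    shift = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    if value & ~(mask >> shift):
        raise ValueError("value does not fit in the field")
    return (value << shift) & mask


# Hypothetical 8-bit field at bits 23:16 and 3-bit field at bits 30:28.
reg = field_prep(0x00FF0000, 0x12) | field_prep(0x7 << 28, 5)
print(hex(reg))
```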

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://patch.msgid.link/20241216071957.2587354-3-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Nikita Yushchenko [Mon, 16 Dec 2024 07:19:53 +0000 (12:19 +0500)]

net: renesas: rswitch: do not write to MPSM register at init time

MPSM register is used to execute mdio bus transactions.
There is no need to initialize it early.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://patch.msgid.link/20241216071957.2587354-2-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Thu, 19 Dec 2024 03:17:08 +0000 (19:17 -0800)]

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
ice: add support for devlink health events

Przemek Kitszel says:

Reports for two kinds of events are implemented, Malicious Driver
Detection (MDD) and Tx hang.

Patches 1, 2, 3: core improvements (checkpatch.pl, devlink extension)
Patch 4: rename current ice devlink/ files
Patches 5, 6, 7: ice devlink health infra + reporters

Mateusz did a good job caring for this series, and hardening the code.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Add MDD logging via devlink health
  ice: add Tx hang devlink health reporter
  ice: rename devlink_port.[ch] to port.[ch]
  devlink: add devlink_fmsg_dump_skb() function
  devlink: add devlink_fmsg_put() macro
  checkpatch: don't complain on _Generic() use
====================

Link: https://patch.msgid.link/20241217210835.3702003-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Eric Dumazet [Tue, 17 Dec 2024 13:51:21 +0000 (13:51 +0000)]

ptr_ring: do not block hard interrupts in ptr_ring_resize_multiple()

Jakub added a lockdep_assert_no_hardirq() check in __page_pool_put_page()
to increase test coverage.

syzbot found a splat caused by hard irq blocking in
ptr_ring_resize_multiple() [1]

As current users of ptr_ring_resize_multiple() do not require
hard irqs to be masked, change it to block only BH.

Rename helpers to better reflect they are safe against BH only.

- ptr_ring_resize_multiple() to ptr_ring_resize_multiple_bh()
- skb_array_resize_multiple() to skb_array_resize_multiple_bh()

[1]

WARNING: CPU: 1 PID: 9150 at net/core/page_pool.c:709 __page_pool_put_page net/core/page_pool.c:709 [inline]
WARNING: CPU: 1 PID: 9150 at net/core/page_pool.c:709 page_pool_put_unrefed_netmem+0x157/0xa40 net/core/page_pool.c:780
Modules linked in:
CPU: 1 UID: 0 PID: 9150 Comm: syz.1.1052 Not tainted 6.11.0-rc3-syzkaller-00202-gf8669d7b5f5d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
RIP: 0010:__page_pool_put_page net/core/page_pool.c:709 [inline]
RIP: 0010:page_pool_put_unrefed_netmem+0x157/0xa40 net/core/page_pool.c:780
Code: 74 0e e8 7c aa fb f7 eb 43 e8 75 aa fb f7 eb 3c 65 8b 1d 38 a8 6a 76 31 ff 89 de e8 a3 ae fb f7 85 db 74 0b e8 5a aa fb f7 90 <0f> 0b 90 eb 1d 65 8b 1d 15 a8 6a 76 31 ff 89 de e8 84 ae fb f7 85
RSP: 0018:ffffc9000bda6b58 EFLAGS: 00010083
RAX: ffffffff8997e523 RBX: 0000000000000000 RCX: 0000000000040000
RDX: ffffc9000fbd0000 RSI: 0000000000001842 RDI: 0000000000001843
RBP: 0000000000000000 R08: ffffffff8997df2c R09: 1ffffd40003a000d
R10: dffffc0000000000 R11: fffff940003a000e R12: ffffea0001d00040
R13: ffff88802e8a4000 R14: dffffc0000000000 R15: 00000000ffffffff
FS: 00007fb7aaf716c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa15a0d4b72 CR3: 00000000561b0000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tun_ptr_free drivers/net/tun.c:617 [inline]
__ptr_ring_swap_queue include/linux/ptr_ring.h:571 [inline]
ptr_ring_resize_multiple_noprof include/linux/ptr_ring.h:643 [inline]
tun_queue_resize drivers/net/tun.c:3694 [inline]
tun_device_event+0xaaf/0x1080 drivers/net/tun.c:3714
notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93
call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
call_netdevice_notifiers net/core/dev.c:2046 [inline]
dev_change_tx_queue_len+0x158/0x2a0 net/core/dev.c:9024
do_setlink+0xff6/0x41f0 net/core/rtnetlink.c:2923
rtnl_setlink+0x40d/0x5a0 net/core/rtnetlink.c:3201
rtnetlink_rcv_msg+0x73f/0xcf0 net/core/rtnetlink.c:6647
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2550

Fixes: ff4e538c8c3e ("page_pool: add a lockdep check for recycling in hardirq")
Reported-by: syzbot+f56a5c5eac2b28439810@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/671e10df.050a0220.2b8c0f.01cf.GAE@google.com/T/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241217135121.326370-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

shunlizhou [Mon, 16 Dec 2024 13:54:46 +0000 (13:54 +0000)]

docs: net: bonding: fix typos

The bonding documentation used "insure" in several places where
"ensure" is meant. Change these to "ensure" to improve
readability.

Signed-off-by: shunlizhou <shunlizhou@aliyun.com>
Link: https://patch.msgid.link/20241216135447.57681-1-shunlizhou@aliyun.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Yafang Shao [Tue, 10 Dec 2024 02:27:06 +0000 (10:27 +0800)]

net/mlx5e: Report rx_discards_phy via rx_dropped

We noticed a high number of rx_discards_phy events on certain servers while
running `ethtool -S`. However, this critical counter is not currently
included in the standard /proc/net/dev statistics file, making it difficult
to monitor effectively—especially given the diversity of vendors across a
large fleet of servers.

Let's report it via the standard rx_dropped metric.
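
To show where the standard metric surfaces for monitoring, here is a small sketch that extracts the receive "drop" column from text in the /proc/net/dev format. The function name and the sample text are illustrative only; a monitoring agent would read the real file instead.

```python
from typing import Dict


def rx_dropped_by_interface(proc_net_dev_text: str) -> Dict[str, int]:
    """Map interface name to its receive 'drop' counter."""
    result = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # skip the two header lines
        name, counters = line.split(":", 1)
        fields = counters.split()
        # Receive columns: bytes packets errs drop fifo frame compressed multicast
        result[name.strip()] = int(fields[3])
    return result


SAMPLE = "\n".join([
    "Inter-|   Receive                                                |  Transmit",
    " face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed",
    "    lo: 100 2 0 0 0 0 0 0 100 2 0 0 0 0 0 0",
    "  eth0: 5000 40 0 7 0 0 0 0 3000 30 0 0 0 0 0 0",
])

print(rx_dropped_by_interface(SAMPLE))
```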

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241210022706.6665-1-laoar.shao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Wed, 18 Dec 2024 18:01:33 +0000 (10:01 -0800)]

Merge branch 'selftests-net-packetdrill-import-multiple-tests'

Soham Chakradeo says:

====================
selftests/net: packetdrill: import multiple tests

Import tests for the following features (folder names in brackets):
ECN (ecn) : RFC 3168
Close (close) : RFC 9293
TCP_INFO (tcp_info) : RFC 9293
Fast recovery (fast_recovery) : RFC 5681
Timestamping (timestamping) : RFC 1323
Nagle (nagle) : RFC 896
Selective Acknowledgments (sack) : RFC 2018
Recent Timestamp (ts_recent) : RFC 1323
Send file (sendfile)
Syscall bad arg (syscall_bad_arg)
Validate (validate)
Blocking (blocking)
Splice (splice)
End of record (eor)
Limited transmit (limited_transmit)

Procedure to import and test the packetdrill tests into upstream linux
is explained in the first patch of this series

These tests have many authors. We only import them here from
github.com/google/packetdrill. Thanks to the following authors for their
contributions over the years to these tests: Neal Cardwell, Shuo Chen,
Yuchung Cheng, Jerry Chu, Eric Dumazet, Luke Hsiao, Priyaranjan Jha,
Chonggang Li, Tanner Love, John Sperbeck, Wei Wang and Maciej
Żenczykowski. For more info see the original github commits, such as
https://github.com/google/packetdrill/commit/8229c94928ac.
====================

Link: https://patch.msgid.link/20241217185203.297935-1-sohamch.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Soham Chakradeo [Tue, 17 Dec 2024 18:52:01 +0000 (18:52 +0000)]

selftests/net: packetdrill: import tcp/user_timeout, tcp/validate, tcp/sendfile, tcp/limited-transmit, tcp/syscall_bad_arg

Use the standard import and testing method, as described in the
import of tcp/ecn and tcp/close , tcp/sack , tcp/tcp_info.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Soham Chakradeo <sohamch@google.com>
Link: https://patch.msgid.link/20241217185203.297935-5-sohamch.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Soham Chakradeo [Tue, 17 Dec 2024 18:52:00 +0000 (18:52 +0000)]

selftests/net: packetdrill: import tcp/eor, tcp/splice, tcp/ts_recent, tcp/blocking

Use the standard import and testing method, as described in the
import of tcp/ecn and tcp/close , tcp/sack , tcp/tcp_info.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Soham Chakradeo <sohamch@google.com>
Link: https://patch.msgid.link/20241217185203.297935-4-sohamch.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Soham Chakradeo [Tue, 17 Dec 2024 18:51:59 +0000 (18:51 +0000)]

selftests/net: packetdrill: import tcp/fast_recovery, tcp/nagle, tcp/timestamping

Use the standard import and testing method, as described in the
import of tcp/ecn , tcp/close , tcp/sack , tcp/tcp_info.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Soham Chakradeo <sohamch@google.com>
Link: https://patch.msgid.link/20241217185203.297935-3-sohamch.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Soham Chakradeo [Tue, 17 Dec 2024 18:51:58 +0000 (18:51 +0000)]

selftests/net: packetdrill: import tcp/ecn, tcp/close, tcp/sack, tcp/tcp_info

Same as initial tests, import verbatim from
github.com/google/packetdrill, aside from:

- update `source ./defaults.sh` path to adjust for flat dir
- add SPDX headers
- remove author statements if any
- drop blank lines at EOF

Same test process as previous tests. Both with and without debug mode.
Recording the steps once:

make mrproper
vng --build \
--config tools/testing/selftests/net/packetdrill/config \
--config kernel/configs/debug.config
vng -v --run . --user root --cpus 4 -- \
make -C tools/testing/selftests TARGETS=net/packetdrill run_tests

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Soham Chakradeo <sohamch@google.com>
Link: https://patch.msgid.link/20241217185203.297935-2-sohamch.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Dr. David Alan Gilbert [Mon, 16 Dec 2024 16:56:05 +0000 (16:56 +0000)]

net: Remove bouncing hippi list

linux-hippi is bouncing with:

<linux-hippi@sunsite.dk>:
Sorry, no mailbox here by that name. (#5.1.1)

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Andrew Lunn [Sun, 15 Dec 2024 17:43:55 +0000 (17:43 +0000)]

net: dsa: qca8k: Fix inconsistent use of jiffies vs milliseconds

wait_for_completion_timeout() expects a timeout in jiffies. Within the
driver, some call sites converted QCA8K_ETHERNET_TIMEOUT to jiffies,
others did not. Make the code consistent by changing the #define to
include a call to msecs_to_jiffies(), and remove all other calls to
msecs_to_jiffies().
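
The effect of mixing the two units can be illustrated with a small model. The conversion below rounds up, in the spirit of the kernel helper; the HZ values and the 5 ms timeout are example numbers, not values confirmed from the driver.

```python
def msecs_to_jiffies(msecs: int, hz: int) -> int:
    """Convert milliseconds to timer ticks, rounding up."""
    return (msecs * hz + 999) // 1000


def jiffies_to_msecs(jiffies: int, hz: int) -> int:
    """Convert timer ticks back to milliseconds."""
    return jiffies * 1000 // hz


TIMEOUT_MS = 5  # example value only

for hz in (100, 250, 1000):
    converted = msecs_to_jiffies(TIMEOUT_MS, hz)
    # Passing the raw millisecond count as if it were jiffies waits this long:
    unconverted_wait_ms = jiffies_to_msecs(TIMEOUT_MS, hz)
    print(f"HZ={hz}: {converted} jiffies when converted, "
          f"{unconverted_wait_ms} ms actual wait when not converted")
```

With HZ=1000 the two happen to agree, which is one way such an inconsistency can go unnoticed.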

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: from Christian would be very welcome.
Signed-off-by: David S. Miller <davem@davemloft.net>

Jakub Kicinski [Wed, 18 Dec 2024 04:01:41 +0000 (20:01 -0800)]

Merge branch 'support-some-features-for-the-hibmcge-driver'

Jijie Shao says:

====================
Support some features for the HIBMCGE driver

In this patch series, the HIBMCGE driver implements some functions
such as register dump, unicast MAC address filtering, debugfs and reset.
====================

Link: https://patch.msgid.link/20241216040532.1566229-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jijie Shao [Mon, 16 Dec 2024 04:05:32 +0000 (12:05 +0800)]

net: hibmcge: Add nway_reset supported in this module

Add nway_reset supported in this module

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241216040532.1566229-8-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jijie Shao [Mon, 16 Dec 2024 04:05:31 +0000 (12:05 +0800)]

net: hibmcge: Add reset supported in this module

Sometimes, if the port doesn't work, we can try to fix it by resetting it.

This patch supports reset triggered by ethtool or by PCIe FLR. For example:
ethtool --reset eth0 dedicated
echo 1 > /sys/bus/pci/devices/0000\:83\:00.1/reset

We hope that the reset can be performed only when the port is down,
and the port cannot be up during the reset.
Therefore, the entire reset process is protected by the rtnl lock.

After the reset is complete, the hardware registers are restored
to their default values. Therefore, some rebuild operations are
required to rewrite the user configuration to the registers.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241216040532.1566229-7-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jijie Shao [Mon, 16 Dec 2024 04:05:30 +0000 (12:05 +0800)]

net: hibmcge: Add pauseparam supported in this module

The MAC can automatically send or respond to pause frames.
This patch supports the function of enabling pause frames
by using ethtool.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241216040532.1566229-6-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jijie Shao [Mon, 16 Dec 2024 04:05:29 +0000 (12:05 +0800)]

net: hibmcge: Add register dump supported in this module

The dump register is an effective way to analyze problems.

To ensure code flexibility, each register contains the type,
offset, and value information. ethtool does the pretty printing
based on this information.

Because information such as type and offset is included in the dump,
the driver can dynamically add or delete dumped registers in the future,
and ethtool can always pretty print them.

With an ethtool version that supports this format,
the output looks like this:
[root@localhost sjj]# ./ethtool -d enp131s0f1
[SPEC] VALID                    [0x0000]: 0x00000001
[SPEC] EVENT_REQ                [0x0004]: 0x00000000
[SPEC] MAC_ID                   [0x0008]: 0x00000002
[SPEC] PHY_ADDR                 [0x000c]: 0x00000002
[SPEC] MAC_ADDR_L               [0x0010]: 0x00000808
[SPEC] MAC_ADDR_H               [0x0014]: 0x08080802
[SPEC] UC_MAX_NUM               [0x0018]: 0x00000004
[SPEC] MAX_MTU                  [0x0028]: 0x00000fc2
[SPEC] MIN_MTU                  [0x002c]: 0x00000100
[SPEC] TX_FIFO_NUM              [0x0030]: 0x00000040
[SPEC] RX_FIFO_NUM              [0x0034]: 0x0000007f
[SPEC] VLAN_LAYERS              [0x0038]: 0x00000002
[MDIO] COMMAND_REG              [0x0000]: 0x0000185f
[MDIO] ADDR_REG                 [0x0004]: 0x00000000
[MDIO] WDATA_REG                [0x0008]: 0x0000a000
[MDIO] RDATA_REG                [0x000c]: 0x00000000
[MDIO] STA_REG                  [0x0010]: 0x00000000
[GMAC] DUPLEX_TYPE              [0x0008]: 0x00000001
[GMAC] FD_FC_TYPE               [0x000c]: 0x00008808
[GMAC] FC_TX_TIMER              [0x001c]: 0x000000ff
[GMAC] FD_FC_ADDR_LOW           [0x0020]: 0xc2000001
[GMAC] FD_FC_ADDR_HIGH          [0x0024]: 0x00000180
[GMAC] MAX_FRM_SIZE             [0x003c]: 0x000005f6
[GMAC] PORT_MODE                [0x0040]: 0x00000002
[GMAC] PORT_EN                  [0x0044]: 0x00000006
...
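
A rough sketch of the self-describing dump idea: each entry carries its type, name, offset and value, so a generic printer needs no per-register knowledge. The `RegEntry` type, the field widths and the formatting function are assumptions made for illustration; they are not the driver's data layout or ethtool's actual printer.

```python
from typing import NamedTuple


class RegEntry(NamedTuple):
    type_name: str  # register group, e.g. "SPEC", "MDIO", "GMAC"
    name: str
    offset: int
    value: int


def format_reg(entry: RegEntry) -> str:
    """Render one entry in a '[TYPE] NAME [offset]: value' style."""
    return (f"[{entry.type_name}] {entry.name:<24} "
            f"[0x{entry.offset:04x}]: 0x{entry.value:08x}")


dump = [
    RegEntry("SPEC", "VALID", 0x0000, 0x00000001),
    RegEntry("MDIO", "COMMAND_REG", 0x0000, 0x0000185F),
    RegEntry("GMAC", "PORT_EN", 0x0044, 0x00000006),
]

for entry in dump:
    print(format_reg(entry))
```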

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241216040532.1566229-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jijie Shao [Mon, 16 Dec 2024 04:05:28 +0000 (12:05 +0800)]

net: hibmcge: Add unicast frame filter supported in this module

MAC supports filtering unmatched unicast packets according to
the MAC address table. This patch adds the support for
unicast frame filtering.

To support automatic restoration of MAC entries
after reset, the driver saves a copy of MAC entries in the driver.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20241216040532.1566229-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jijie Shao [Mon, 16 Dec 2024 04:05:27 +0000 (12:05 +0800)]

net: hibmcge: Add irq_info file to debugfs

The driver requests three interrupts: "tx", "rx", "err".
The err interrupt is a summary interrupt. We distinguish
different errors based on the status register and mask.

With "cat /proc/interrupts | grep hibmcge",
we can't distinguish the detailed cause of the error,
so we added this file to debugfs.

The output looks like this:
[root@localhost sjj]# cat /sys/kernel/debug/hibmcge/0000\:83\:00.1/irq_info
RX                  : enabled: true , logged: false, count: 0
TX                  : enabled: true , logged: false, count: 0
MAC_MII_FIFO_ERR    : enabled: false, logged: true , count: 0
MAC_PCS_RX_FIFO_ERR : enabled: false, logged: true , count: 0
MAC_PCS_TX_FIFO_ERR : enabled: false, logged: true , count: 0
MAC_APP_RX_FIFO_ERR : enabled: false, logged: true , count: 0
MAC_APP_TX_FIFO_ERR : enabled: false, logged: true , count: 0
SRAM_PARITY_ERR     : enabled: true , logged: true , count: 0
TX_AHB_ERR          : enabled: true , logged: true , count: 0
RX_BUF_AVL          : enabled: true , logged: false, count: 0
REL_BUF_ERR         : enabled: true , logged: true , count: 0
TXCFG_AVL           : enabled: true , logged: false, count: 0
TX_DROP             : enabled: true , logged: false, count: 0
RX_DROP             : enabled: true , logged: false, count: 0
RX_AHB_ERR          : enabled: true , logged: true , count: 0
MAC_FIFO_ERR        : enabled: true , logged: false, count: 0
RBREQ_ERR           : enabled: true , logged: false, count: 0
WE_ERR              : enabled: true , logged: false, count: 0

The irq framework of the hibmcge driver also includes tx/rx interrupts.
Therefore, TX and RX are not split out from this file.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241216040532.1566229-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jijie Shao [Mon, 16 Dec 2024 04:05:26 +0000 (12:05 +0800)]

net: hibmcge: Add debugfs supported in this module

This patch initializes debugfs and creates root directory
for each device. The tx_ring and rx_ring debugfs files
are implemented together.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241216040532.1566229-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Wed, 18 Dec 2024 03:51:58 +0000 (19:51 -0800)]

Merge branch 'lan78xx-preparations-for-phylink'

Oleksij Rempel says:

====================
lan78xx: Preparations for PHYlink

This patch set is the third part of the preparatory work for migrating
the lan78xx USB Ethernet driver to the PHYlink framework. During
extensive testing, I observed that resetting the USB adapter can lead to
various read/write errors. While the errors themselves are acceptable,
they generate excessive log messages, resulting in significant log spam.
This set improves error handling to reduce logging noise by addressing
errors directly and returning early when necessary.
====================

Link: https://patch.msgid.link/20241216120941.1690908-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Oleksij Rempel [Mon, 16 Dec 2024 12:09:41 +0000 (13:09 +0100)]

net: usb: lan78xx: Improve error handling in WoL operations

Enhance error handling in Wake-on-LAN (WoL) operations:
- Log a warning in `lan78xx_get_wol` if `lan78xx_read_reg` fails.
- Check and handle errors from `device_set_wakeup_enable` and
`phy_ethtool_set_wol` in `lan78xx_set_wol`.
- Ensure proper cleanup with a unified error handling path.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241216120941.1690908-7-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Oleksij Rempel [Mon, 16 Dec 2024 12:09:40 +0000 (13:09 +0100)]

net: usb: lan78xx: remove PHY register access from ethtool get_regs

Remove PHY register handling from `lan78xx_get_regs` and
`lan78xx_get_regs_len`. Since the controller can have different PHYs
attached, the first 32 registers are not universally relevant or the
most interesting. Simplify the implementation to focus on MAC and device
registers.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241216120941.1690908-6-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Oleksij Rempel [Mon, 16 Dec 2024 12:09:39 +0000 (13:09 +0100)]

net: usb: lan78xx: rename phy_mutex to mdiobus_mutex

Rename `phy_mutex` to `mdiobus_mutex` for clarity, as the mutex protects
MDIO bus access rather than PHY-specific operations. Update all
references to ensure consistency.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241216120941.1690908-5-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Oleksij Rempel [Mon, 16 Dec 2024 12:09:38 +0000 (13:09 +0100)]

net: usb: lan78xx: Use action-specific label in lan78xx_mac_reset

Rename the generic `done` label to the action-specific `exit_unlock`
label in `lan78xx_mac_reset`. This improves clarity by indicating the
specific cleanup action (mutex unlock) and aligns with best practices
for error handling and cleanup labels.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20241216120941.1690908-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Oleksij Rempel [Mon, 16 Dec 2024 12:09:37 +0000 (13:09 +0100)]

net: usb: lan78xx: Use ETIMEDOUT instead of ETIME in lan78xx_stop_hw

Update lan78xx_stop_hw to return -ETIMEDOUT instead of -ETIME when
a timeout occurs. While -ETIME indicates a general timer expiration,
-ETIMEDOUT is more commonly used for signaling operation timeouts and
provides better consistency with standard error handling in the driver.

The -ETIME checks in tx_complete() and rx_complete() are unrelated to
this error handling change. In these functions, the error values are derived
from urb->status, which reflects USB transfer errors. The error value from
lan78xx_stop_hw will be exposed in the following cases:
- usb_driver::suspend
- net_device_ops::ndo_stop (potentially, though currently the return value
is not used).
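
As an illustration of the error-code choice, here is a minimal user-space sketch of a bounded polling helper that reports an operation timeout with -ETIMEDOUT. The names `demo_wait_ready()` and `demo_countdown_ready()` are hypothetical and the loop is only a stand-in for the driver's hardware polling:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef bool (*demo_ready_fn)(void *ctx);

/*
 * Poll "ready" up to max_polls times. Returns 0 once it reports true,
 * or -ETIMEDOUT if the operation never completed in time.
 */
int demo_wait_ready(demo_ready_fn ready, void *ctx, unsigned int max_polls)
{
	assert(ready);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (ready(ctx))
			return 0;
	}

	return -ETIMEDOUT;
}

/* Example predicate: ready once the counter has reached zero. */
bool demo_countdown_ready(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;

	(*remaining)--;
	return false;
}
```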

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20241216120941.1690908-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Oleksij Rempel [Mon, 16 Dec 2024 12:09:36 +0000 (13:09 +0100)]

net: usb: lan78xx: Add error handling to lan78xx_get_regs

Update `lan78xx_get_regs` to handle errors during register and PHY
reads. Log warnings for failed reads and exit the function early if an
error occurs. Drop all previously logged registers to signal
inconsistent readings to the user space. This ensures that invalid data
is not returned to users.
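
The "drop everything on failure" behaviour can be sketched in user space as follows. This is an assumption-laden illustration, not the driver code: `demo_get_regs()` and the reader callbacks are hypothetical, and the sketch returns an error code for easy testing, which the real dump function may not do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*demo_read_fn)(unsigned int index, uint32_t *val);

/*
 * Fill buf with count register values. On the first failed read,
 * clear the whole buffer so that no partial dump reaches the caller.
 */
int demo_get_regs(demo_read_fn read_reg, uint32_t *buf, size_t count)
{
	assert(read_reg && buf);

	for (size_t i = 0; i < count; i++) {
		int ret = read_reg((unsigned int)i, &buf[i]);

		if (ret < 0) {
			memset(buf, 0, count * sizeof(*buf));
			return ret;
		}
	}

	return 0;
}

/* Example readers: one always succeeds, one fails at index 2. */
int demo_read_ok(unsigned int index, uint32_t *val)
{
	*val = 0x100u + index;
	return 0;
}

int demo_read_fail_at_2(unsigned int index, uint32_t *val)
{
	if (index == 2)
		return -EIO;

	*val = 0x100u + index;
	return 0;
}
```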

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241216120941.1690908-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Dec 2024 15:51:22 +0000 (15:51 +0000)]

niu: Use page->private instead of page->index

We are close to removing page->index. Use page->private instead, which
is least likely to be removed.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20241216155124.3114-1-willy@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ido Schimmel [Mon, 16 Dec 2024 13:18:44 +0000 (14:18 +0100)]

mlxsw: Switch to napi_gro_receive()

Benefit from the recent conversion of the driver to NAPI and enable GRO
support through the use of napi_gro_receive(). Pass the NAPI pointer
from the bus driver (mlxsw_pci) to the switch driver (mlxsw_spectrum)
through the skb control block where various packet metadata is already
encoded.

The main motivation is to improve forwarding performance through the use
of GRO fraglist [1]. In my testing, when the forwarding data path is
simple (routing between two ports) there is not much difference in
forwarding performance between GRO disabled and GRO enabled with
fraglist.

The improvement becomes more noticeable as the data path becomes more
complex since it is traversed fewer times with GRO enabled. For example,
with 10 ingress and 10 egress flower filters with different priorities
on the two ports between which routing is performed, there is an
improvement of about 140% in forwarded bandwidth.

[1] https://lore.kernel.org/netdev/20200125102645.4782-1-steffen.klassert@secunet.com/

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/21258fe55f608ccf1ee2783a5a4534220af28903.1734354812.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Wed, 18 Dec 2024 03:37:02 +0000 (19:37 -0800)]

Merge branch 'inetpeer-reduce-false-sharing-and-atomic-operations'

Eric Dumazet says:

====================
inetpeer: reduce false sharing and atomic operations

After commit 8c2bd38b95f7 ("icmp: change the order of rate limits"),
there is a risk that a host receiving packets from a unique
source targeting closed ports is using a common inet_peer structure
from many cpus.

All these cpus have to acquire/release a refcount and update
the inet_peer timestamp (p->dtime).

Switch to pure RCU to avoid changing the refcount, and update
p->dtime only once per jiffy.
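
The "only once per jiffy" idea can be sketched in user space as a check-before-write: skip the store when the timestamp already holds the current tick, so a shared cache line is not dirtied on every lookup. This is a hypothetical illustration using C11 atomics as a stand-in for the kernel's own primitives; `struct demo_peer` and `demo_touch_peer()` are not kernel names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical peer entry; dtime holds the last-use tick. */
struct demo_peer {
	_Atomic uint32_t dtime;
};

/*
 * Record that the peer was used at tick "now".
 * Returns 1 if a store was performed, 0 if it was skipped because
 * the timestamp was already current.
 */
int demo_touch_peer(struct demo_peer *p, uint32_t now)
{
	assert(p);

	if (atomic_load_explicit(&p->dtime, memory_order_relaxed) == now)
		return 0;

	atomic_store_explicit(&p->dtime, now, memory_order_relaxed);
	return 1;
}
```

With many CPUs hitting the same entry within one tick, at most a handful of them write; the rest only read.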

Tested:
  DUT : 128 cores, 32 hw rx queues.
  receiving 8,400,000 UDP packets per second, targeting closed ports.

Before the series:
- napi poll can not keep up, NIC drops 1,200,000 packets
  per second.
- We use 20 % of cpu cycles

After this series:
- All packets are received (no more hw drops)
- We use 12 % of cpu cycles.

v1: https://lore.kernel.org/20241213130212.1783302-1-edumazet@google.com
====================

Link: https://patch.msgid.link/20241215175629.1248773-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 15 Dec 2024 17:56:29 +0000 (17:56 +0000)]

inetpeer: do not get a refcount in inet_getpeer()

All inet_getpeer() callers except ip4_frag_init() don't need
to acquire a permanent refcount on the inetpeer.

They can switch to full RCU protection.

Move the refcount_inc_not_zero() into ip4_frag_init(),
so that all the other callers no longer have to
perform a pair of expensive atomic operations on
a possibly contended cache line.

inet_putpeer() no longer needs to be exported.

After this patch, my DUT can receive 8,400,000 UDP packets
per second targeting closed ports, using 50% fewer cpu cycles
than before.

Also replace two calls to l3mdev_master_ifindex() with
l3mdev_master_ifindex_rcu() (Ido's idea).

Fixes: 8c2bd38b95f7 ("icmp: change the order of rate limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 15 Dec 2024 17:56:28 +0000 (17:56 +0000)]

inetpeer: update inetpeer timestamp in inet_getpeer()

inet_putpeer() will be removed in the following patch,
because we will no longer use refcounts.

Update inetpeer timestamp (p->dtime) at lookup time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 15 Dec 2024 17:56:27 +0000 (17:56 +0000)]

inetpeer: remove create argument of inet_getpeer()

All callers of inet_getpeer() want to create an inetpeer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 15 Dec 2024 17:56:26 +0000 (17:56 +0000)]

inetpeer: remove create argument of inet_getpeer_v[46]()

All callers of inet_getpeer_v4() and inet_getpeer_v6()
want to create an inetpeer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Wed, 18 Dec 2024 03:00:51 +0000 (19:00 -0800)]

Merge branch 'net-constify-struct-bin_attribute'

Thomas Weißschuh says:

====================
net: constify 'struct bin_attribute'

The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.
====================

Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-0-ec460b91f274@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Thomas Weißschuh [Mon, 16 Dec 2024 11:30:11 +0000 (12:30 +0100)]

netxen_nic: constify 'struct bin_attribute'

The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-4-ec460b91f274@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Thomas Weißschuh [Mon, 16 Dec 2024 11:30:09 +0000 (12:30 +0100)]

net: phy: ks8995: constify 'struct bin_attribute'

The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-2-ec460b91f274@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Thomas Weißschuh [Mon, 16 Dec 2024 11:30:08 +0000 (12:30 +0100)]

net: bridge: constify 'struct bin_attribute'

The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-1-ec460b91f274@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Sun, 15 Dec 2024 21:29:38 +0000 (13:29 -0800)]

net: page_pool: rename page_pool_is_last_ref()

page_pool_is_last_ref() releases a reference while the name,
to me at least, suggests it just checks if the refcount is 1.
The semantics of the function are the same as those of
atomic_dec_and_test() and refcount_dec_and_test(), so just
use the _and_test() suffix.
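
The naming argument can be made concrete with a small user-space sketch using C11 atomics. The names `demo_ref_is_last()` and `demo_ref_dec_and_test()` are hypothetical and this is not the page_pool code; it only contrasts a pure check with a decrement-and-test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_ref {
	atomic_long count;
};

/* Pure check: reads the count and leaves it untouched. */
bool demo_ref_is_last(struct demo_ref *r)
{
	return atomic_load(&r->count) == 1;
}

/*
 * Drops one reference and reports whether it was the last one.
 * This has a side effect, which the _and_test() suffix advertises.
 */
bool demo_ref_dec_and_test(struct demo_ref *r)
{
	long old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);
	return old == 1;
}
```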

Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20241215212938.99210-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ben Shelton [Mon, 16 Dec 2024 14:15:35 +0000 (15:15 +0100)]

ice: Add MDD logging via devlink health

Add a devlink health reporter for MDD events. The 'dump' handler will
return the information captured in each call to ice_handle_mdd_event().
A device reset (CORER/PFR) will put the reporter back in healthy state.

Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Co-developed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

commit | commitdiff | tree

Przemek Kitszel [Mon, 16 Dec 2024 14:15:34 +0000 (15:15 +0100)]

ice: add Tx hang devlink health reporter

Add a Tx hang devlink health reporter; see struct ice_tx_hang_event for
what exactly is reported. For now, dump descriptors with little metadata
and skb diagnostic information.

Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

commit | commitdiff | tree

Przemek Kitszel [Mon, 16 Dec 2024 14:15:33 +0000 (15:15 +0100)]

ice: rename devlink_port.[ch] to port.[ch]

Drop "devlink_" prefix from files that sit in devlink/.
I'm going to add more files there, and repeating "devlink" does not feel
good. This is also the scheme used in most other places, most notably the
devlink core files are named like that.

devlink.[ch] stays as is.

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

commit | commitdiff | tree

Mateusz Polchlopek [Mon, 16 Dec 2024 14:15:32 +0000 (15:15 +0100)]

devlink: add devlink_fmsg_dump_skb() function

Add the devlink_fmsg_dump_skb() function, which adds some diagnostic
information about an skb (length, pkt type, MAC, etc.) to the devlink
fmsg mechanism using a series of devlink_fmsg_put() calls.

Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

commit | commitdiff | tree

Przemek Kitszel [Mon, 16 Dec 2024 14:15:31 +0000 (15:15 +0100)]

devlink: add devlink_fmsg_put() macro

Add devlink_fmsg_put(), which dispatches based on the type of the
value to put. For example: bool -> devlink_fmsg_bool_pair_put().
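
Type-based dispatch of this kind is typically built on C11 `_Generic`. The following user-space sketch shows the general shape under that assumption; `demo_put()` and its helpers are hypothetical names, and only the bool mapping is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message buffer collecting "name=value;" pairs. */
struct demo_msg {
	char buf[128];
	size_t len;
};

static void demo_append(struct demo_msg *m, const char *name,
			const char *text)
{
	int n;

	assert(m->len < sizeof(m->buf));
	n = snprintf(m->buf + m->len, sizeof(m->buf) - m->len,
		     "%s=%s;", name, text);
	assert(n >= 0 && (size_t)n < sizeof(m->buf) - m->len);
	m->len += (size_t)n;
}

void demo_put_bool(struct demo_msg *m, const char *name, bool v)
{
	demo_append(m, name, v ? "true" : "false");
}

void demo_put_u32(struct demo_msg *m, const char *name, uint32_t v)
{
	char tmp[16];

	snprintf(tmp, sizeof(tmp), "%u", (unsigned int)v);
	demo_append(m, name, tmp);
}

void demo_put_string(struct demo_msg *m, const char *name, const char *v)
{
	demo_append(m, name, v);
}

/* Select the helper from the static type of "value". */
#define demo_put(msg, name, value)				\
	_Generic((value),					\
		bool: demo_put_bool,				\
		uint32_t: demo_put_u32,				\
		char *: demo_put_string,			\
		const char *: demo_put_string)(msg, name, value)
```

Note that the selection uses the static type of the argument: before C23, the `true` and `false` macros from `<stdbool.h>` have type `int`, so in this sketch a `bool` variable is needed to reach the bool branch.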

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

commit | commitdiff | tree

Przemek Kitszel [Mon, 16 Dec 2024 14:15:30 +0000 (15:15 +0100)]

checkpatch: don't complain on _Generic() use

Improve CamelCase recognition logic to avoid reporting on
_Generic() use.

Other C keywords, such as _Bool, are intentionally omitted, as those
should rather be avoided in new source code.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

commit | commitdiff | tree

Rahul Rameshbabu [Sat, 14 Dec 2024 19:43:06 +0000 (19:43 +0000)]

rust: net::phy scope ThisModule usage in the module_phy_driver macro

Similar to the use of $crate::Module, ThisModule should be referred to as
$crate::ThisModule in the macro evaluation. The macro previously did not
cause any errors because all of its users import kernel::prelude::*,
which brings ThisModule into scope.

Signed-off-by: Rahul Rameshbabu <sergeantsagara@protonmail.com>
Reviewed-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20241214194242.19505-1-sergeantsagara@protonmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Toke Høiland-Jørgensen [Sat, 14 Dec 2024 16:50:59 +0000 (17:50 +0100)]

net/sched: Add drop reasons for AQM-based qdiscs

Now that we have generic QDISC_CONGESTED and QDISC_OVERLIMIT drop
reasons, let's have all the qdiscs that contain an AQM apply them
consistently when dropping packets.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241214-fq-codel-drop-reasons-v1-1-2a814e884c37@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Paolo Abeni [Tue, 17 Dec 2024 11:08:30 +0000 (12:08 +0100)]

Merge branch 'af_unix-prepare-for-skb-drop-reason'

Kuniyuki Iwashima says:

====================
af_unix: Prepare for skb drop reason.

This is a prep series and cleans up error paths in the following
functions

  * unix_stream_connect()
  * unix_stream_sendmsg()
  * unix_dgram_sendmsg()

to make it easy to add skb drop reason for AF_UNIX, which seems to
have a potential user.

https://lore.kernel.org/netdev/CAAf2ycmZHti95WaBR3s+L5Epm1q7sXmvZ-EqCK=-oZj=45tOwQ@mail.gmail.com/

v1: https://lore.kernel.org/netdev/20241206052607.1197-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20241213110850.25453-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:50 +0000 (20:08 +0900)]

af_unix: Remove unix_our_peer().

unix_our_peer() is used only in unix_may_send().

Let's inline it in unix_may_send().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:49 +0000 (20:08 +0900)]

af_unix: Clean up error paths in unix_dgram_sendmsg().

The error path is complicated in unix_dgram_sendmsg() because there
are two points at which other can become non-NULL: when it's fetched
from unix_peer_get() and when it's looked up by unix_find_other().

Let's move unix_peer_get() to the else branch for unix_find_other()
and clean up the error paths.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:48 +0000 (20:08 +0900)]

af_unix: Clean up SOCK_DEAD error paths in unix_dgram_sendmsg().

When other has SOCK_DEAD in unix_dgram_sendmsg(), we hold
unix_state_lock() for the sender socket first.

However, we do not need it for sk->sk_type.

Let's move the lock down a bit.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:47 +0000 (20:08 +0900)]

af_unix: Defer sock_put() to clean up path in unix_dgram_sendmsg().

When other has SOCK_DEAD in unix_dgram_sendmsg(), we call sock_put() for
it first and then set other to NULL before jumping to the error path.

This is to skip sock_put() in the error path.

Let's not set other to NULL and defer the sock_put() to the error path
to clean up the labels later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:46 +0000 (20:08 +0900)]

af_unix: Split restart label in unix_dgram_sendmsg().

There are two paths jumping to the restart label in unix_dgram_sendmsg().

One requires another lookup and sk_filter(), but the other doesn't.

Let's split the label to make each flow more straightforward.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:45 +0000 (20:08 +0900)]

af_unix: Use msg->{msg_name,msg_namelen} in unix_dgram_sendmsg().

In unix_dgram_sendmsg(), we use a local variable sunaddr pointing
to NULL or msg->msg_name based on msg->msg_namelen.

Let's remove sunaddr and simplify the usage.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:44 +0000 (20:08 +0900)]

af_unix: Move !sunaddr case in unix_dgram_sendmsg().

When other is NULL in unix_dgram_sendmsg(), we check if sunaddr
is NULL before looking up a receiver socket.

There are three paths going through the check, but it's always
false for 2 out of the 3 paths: the first socket lookup and the
second 'goto restart'.

The condition can be true for the first 'goto restart' only when
SOCK_DEAD is flagged for the socket found with msg->msg_name.

Let's move the check to the single appropriate path.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:43 +0000 (20:08 +0900)]

af_unix: Set error only when needed in unix_dgram_sendmsg().

We will introduce skb drop reason for AF_UNIX, then we need to
set an errno and a drop reason for each path.

Let's set an error only when it's needed in unix_dgram_sendmsg().

Then, we need not (re)set err to 0.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:42 +0000 (20:08 +0900)]

af_unix: Clean up error paths in unix_stream_sendmsg().

If we move send_sig() to the SEND_SHUTDOWN check before
the while loop, then we can reuse the same kfree_skb()
after the pipe_err_free label.

Let's gather the scattered kfree_skb()s in error paths.

While at it, some style issues are fixed, and the pipe_err_free
label is renamed to out_pipe to match other label names.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:41 +0000 (20:08 +0900)]

af_unix: Set error only when needed in unix_stream_sendmsg().

We will introduce skb drop reason for AF_UNIX, then we need to
set an errno and a drop reason for each path.

Let's set an error only when it's needed in unix_stream_sendmsg().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:40 +0000 (20:08 +0900)]

af_unix: Clean up error paths in unix_stream_connect().

The label order is weird in unix_stream_connect(), and all NULL checks
are unnecessary if reordered.

Let's clean up the error paths to make it easy to set a drop reason
for each path.

While at it, a comment with the old style is updated.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 13 Dec 2024 11:08:39 +0000 (20:08 +0900)]

af_unix: Set error only when needed in unix_stream_connect().

We will introduce skb drop reason for AF_UNIX, then we need to
set an errno and a drop reason for each path.

Let's set an error only when it's needed in unix_stream_connect().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Jakub Kicinski [Tue, 17 Dec 2024 02:27:36 +0000 (18:27 -0800)]

Merge branch 'r8169-add-support-for-rtl8125d-rev-b'

Heiner Kallweit says:

====================
r8169: add support for RTL8125D rev.b

Add support for RTL8125D rev.b. Its XID is 0x689. It is basically
based on the one with XID 0x688, but with a different firmware file.
To avoid a mess with the version numbering, adjust it first.
====================

Link: https://patch.msgid.link/15c4a9fd-a653-4b09-825d-751964832a7a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

ChunHao Lin [Fri, 13 Dec 2024 19:02:58 +0000 (20:02 +0100)]

r8169: add support for RTL8125D rev.b

Add support for RTL8125D rev.b. Its XID is 0x689. It is basically
based on the one with XID 0x688, but with a different firmware file.

Signed-off-by: ChunHao Lin <hau@realtek.com>
[hkallweit1@gmail.com: rebased after adjusted version numbering]
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/75e5e9ec-d01f-43ac-b0f4-e7456baf18d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Fri, 13 Dec 2024 19:01:41 +0000 (20:01 +0100)]

r8169: adjust version numbering for RTL8126

Adjust version numbering for RTL8126, so that it doesn't overlap with
new RTL8125 versions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/6a354364-20e9-48ad-a198-468264288757@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 17 Dec 2024 02:11:21 +0000 (18:11 -0800)]

Merge branch 'add-support-for-so_priority-cmsg'

Anna Emese Nyiri says:

====================
Add support for SO_PRIORITY cmsg

Introduce a new helper function, `sk_set_prio_allowed`,
to centralize the logic for validating priority settings.
Add support for the `SO_PRIORITY` control message,
enabling user-space applications to set socket priority
via control messages (cmsg).
====================

Link: https://patch.msgid.link/20241213084457.45120-1-annaemesenyiri@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Anna Emese Nyiri [Fri, 13 Dec 2024 08:44:57 +0000 (09:44 +0100)]

sock: Introduce SO_RCVPRIORITY socket option

Add a new socket option, SO_RCVPRIORITY, to include SO_PRIORITY in the
ancillary data returned by recvmsg().
This is analogous to the existing support for SO_RCVMARK,
as implemented in commit 6fd1d51cfa253 ("net: SO_RCVMARK socket option
for SO_MARK with recvmsg()").

Reviewed-by: Willem de Bruijn <willemb@google.com>
Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Link: https://patch.msgid.link/20241213084457.45120-5-annaemesenyiri@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Anna Emese Nyiri [Fri, 13 Dec 2024 08:44:56 +0000 (09:44 +0100)]

selftests: net: test SO_PRIORITY ancillary data with cmsg_sender

Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
ancillary data.

Add the cmsg_so_priority.sh script to validate SO_PRIORITY behavior
by creating a VLAN device with egress QoS mapping and testing packet
priorities using flower filters. Verify that packets with different
priorities are correctly matched and counted by filters for multiple
protocols and IP versions.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Link: https://patch.msgid.link/20241213084457.45120-4-annaemesenyiri@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Anna Emese Nyiri [Fri, 13 Dec 2024 08:44:55 +0000 (09:44 +0100)]

sock: support SO_PRIORITY cmsg

The Linux socket API currently allows setting SO_PRIORITY at the
socket level, applying a uniform priority to all packets sent through
that socket. The exception to this is IP_TOS, where the priority value
is calculated during the handling of ancillary data, as implemented in
commit f02db315b8d8 ("ipv4: IP_TOS and IP_TTL can be specified as
ancillary data"). However, this is a computed value, and prior to this
patch there is no mechanism to set a custom priority via control
messages.

With this patch, if SO_PRIORITY is specified as ancillary data, the
packet is sent with the priority value set through sockc->priority,
overriding the socket-level value set via the traditional setsockopt()
method. This is analogous to the existing support for SO_MARK, as
implemented in commit c6af0c227a22 ("ip: support SO_MARK cmsg").

If both cmsg SO_PRIORITY and IP_TOS are passed, then the one that
takes precedence is the last one in the cmsg list.

This patch has the side effect that raw_send_hdrinc now interprets cmsg
IP_TOS.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Link: https://patch.msgid.link/20241213084457.45120-3-annaemesenyiri@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Anna Emese Nyiri [Fri, 13 Dec 2024 08:44:54 +0000 (09:44 +0100)]

sock: Introduce sk_set_prio_allowed helper function

Simplify priority setting permissions with the 'sk_set_prio_allowed'
function, centralizing the validation logic. This change is made in
anticipation of a second caller in a following patch.
No functional changes.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Link: https://patch.msgid.link/20241213084457.45120-2-annaemesenyiri@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Donald Hunter [Fri, 13 Dec 2024 11:25:50 +0000 (11:25 +0000)]

netlink: specs: add phys-binding attr to rt_link spec

Add the missing phys-binding attr to the mctp-attrs in the rt_link spec.
This fixes commit 580db513b4a9 ("net: mctp: Expose transport binding
identifier via IFLA attribute").

Note that enum mctp_phys_binding is not currently uapi, but perhaps it
should be?

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20241213112551.33557-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

David Howells [Thu, 12 Dec 2024 21:04:22 +0000 (21:04 +0000)]

rxrpc: Fix ability to add more data to a call once MSG_MORE deasserted

When userspace is adding data to an RPC call for transmission, it must pass
MSG_MORE to sendmsg() if it intends to add more data in future calls to
sendmsg().  Calling sendmsg() without MSG_MORE being asserted closes the
transmission phase of the call (assuming sendmsg() adds all the data
presented) and further attempts to add more data should be rejected.

However, this is no longer the case.  The change of call state that was
previously the guard got bumped over to the I/O thread, which leaves a
window for a repeat sendmsg() to insert more data.  This previously went
unnoticed, but the more recent patch that changed the structures behind the
Tx queue added a warning:

        WARNING: CPU: 3 PID: 6639 at net/rxrpc/sendmsg.c:296 rxrpc_send_data+0x3f2/0x860

and rejected the additional data, returning error EPROTO.

Fix this by adding a guard flag to the call, setting the flag when we queue
the final packet and then rejecting further attempts to add data with
EPROTO.
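
The guard-flag approach can be sketched as a small user-space state machine. The structure and function names below are hypothetical and the sketch ignores locking and partial sends; it only shows the flag being set when the final chunk is queued and later sends being refused with EPROTO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical call state. */
struct demo_call {
	bool tx_last_queued;	/* set once the final data is queued */
	size_t queued;		/* bytes accepted so far */
};

/*
 * Queue len bytes. "more" mirrors MSG_MORE: when false, this is the
 * final chunk and the transmission phase is closed afterwards.
 */
int demo_send_data(struct demo_call *call, size_t len, bool more)
{
	assert(call);

	if (call->tx_last_queued)
		return -EPROTO;	/* transmission phase already closed */

	call->queued += len;
	if (!more)
		call->tx_last_queued = true;

	return 0;
}
```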

Fixes: 2d689424b618 ("rxrpc: Move call state changes from sendmsg to I/O thread")
Reported-by: syzbot+ff11be94dfcd7a5af8da@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/6757fb68.050a0220.2477f.005f.GAE@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+ff11be94dfcd7a5af8da@syzkaller.appspotmail.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/2870480.1734037462@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

David Howells [Thu, 12 Dec 2024 20:58:15 +0000 (20:58 +0000)]

rxrpc: Disable IRQ, not BH, to take the lock for ->attend_link

Use spin_lock_irq(), not spin_lock_bh() to take the lock when accessing the
->attend_link() to stop a delay in the I/O thread due to an interrupt being
taken in the app thread whilst that holds the lock and vice versa.

Fixes: a2ea9a907260 ("rxrpc: Use irq-disabling spinlocks between app and I/O thread")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/2870146.1734037095@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 17 Dec 2024 00:47:42 +0000 (16:47 -0800)]

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next 2024-12-16

The following pull-request contains mlx5 IFC updates.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add device cap abs_native_port_num
  net/mlx5: qos: Add ifc support for cross-esw scheduling
  net/mlx5: Add support for new scheduling elements
  net/mlx5: Add ConnectX-8 device to ifc
  net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits
====================

Link: https://patch.msgid.link/20241216124028.973763-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

David S. Miller [Mon, 16 Dec 2024 12:51:41 +0000 (12:51 +0000)]

Merge branch 'net-timestamp-selectable'

Kory Maincent says:

====================
net: Make timestamping selectable

Up until now, there was no way to let the user select the hardware
PTP provider at which time stamping occurs. The stack assumed that PHY time
stamping is always preferred, but some MAC/PHY combinations were buggy.

This series updates the MAC/PHY default timestamping and aims to
allow the user to select the desired hwtstamp provider administratively.

Here is few netlink spec usage examples:
./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema \
             --dump tsinfo-get \
             --json '{"header":{"dev-name":"eth0"}}'
[{'header': {'dev-index': 3, 'dev-name': 'eth0'},
  'hwtst-provider': {'index': 0, 'qualifier': 0},
  'phc-index': 0,
  'rx-filters': {'bits': {'bit': [{'index': 0, 'name': 'none'},
                                  {'index': 2, 'name': 'some'}]},
                 'nomask': True,
                 'size': 16},
  'timestamping': {'bits': {'bit': [{'index': 0, 'name': 'hardware-transmit'},
                                    {'index': 2, 'name': 'hardware-receive'},
                                    {'index': 6,
                                     'name': 'hardware-raw-clock'}]},
                   'nomask': True,
                   'size': 17},
  'tx-types': {'bits': {'bit': [{'index': 0, 'name': 'off'},
                                {'index': 1, 'name': 'on'}]},
               'nomask': True,
               'size': 4}},
{'header': {'dev-index': 3, 'dev-name': 'eth0'},
  'hwtst-provider': {'index': 2, 'qualifier': 0},
  'phc-index': 2,
  'rx-filters': {'bits': {'bit': [{'index': 0, 'name': 'none'},
                                  {'index': 1, 'name': 'all'}]},
                 'nomask': True,
                 'size': 16},
  'timestamping': {'bits': {'bit': [{'index': 0, 'name': 'hardware-transmit'},
                                    {'index': 1, 'name': 'software-transmit'},
                                    {'index': 2, 'name': 'hardware-receive'},
                                    {'index': 3, 'name': 'software-receive'},
                                    {'index': 4,
                                     'name': 'software-system-clock'},
                                    {'index': 6,
                                     'name': 'hardware-raw-clock'}]},
                   'nomask': True,
                   'size': 17},
  'tx-types': {'bits': {'bit': [{'index': 0, 'name': 'off'},
                                {'index': 1, 'name': 'on'},
                                {'index': 2, 'name': 'onestep-sync'}]},
               'nomask': True,
               'size': 4}}]

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do tsinfo-get \
             --json '{"header":{"dev-name":"eth0"},
                      "hwtst-provider":{"index":0, "qualifier":0 }
}'
{'header': {'dev-index': 3, 'dev-name': 'eth0'},
'hwtst-provider': {'index': 0, 'qualifier': 0},
'phc-index': 0,
'rx-filters': {'bits': {'bit': [{'index': 0, 'name': 'none'},
                                 {'index': 2, 'name': 'some'}]},
                'nomask': True,
                'size': 16},
'timestamping': {'bits': {'bit': [{'index': 0, 'name': 'hardware-transmit'},
                                   {'index': 2, 'name': 'hardware-receive'},
                                   {'index': 6, 'name': 'hardware-raw-clock'}]},
                  'nomask': True,
                  'size': 17},
'tx-types': {'bits': {'bit': [{'index': 0, 'name': 'off'},
                               {'index': 1, 'name': 'on'}]},
              'nomask': True,
              'size': 4}}

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do tsinfo-set \
             --json '{"header":{"dev-name":"eth0"},
                      "hwtst-provider":{"index":2, "qualifier":0}}'
None
./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do tsconfig-get \
     --json '{"header":{"dev-name":"eth0"}}'
{'header': {'dev-index': 3, 'dev-name': 'eth0'},
'hwtstamp-flags': 1,
'hwtstamp-provider': {'index': 1, 'qualifier': 0},
'rx-filters': {'bits': {'bit': [{'index': 12, 'name': 'ptpv2-event'}]},
                'nomask': True,
                'size': 16},
'tx-types': {'bits': {'bit': [{'index': 1, 'name': 'on'}]},
              'nomask': True,
              'size': 4}}

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do tsconfig-set \
      --json '{"header":{"dev-name":"eth0"},
       "hwtstamp-provider":{"index":1, "qualifier":0 },
       "rx-filters":{"bits": {"bit": {"name":"ptpv2-l4-event"}},
     "nomask": 1},
       "tx-types":{"bits": {"bit": {"name":"on"}},
   "nomask": 1}}'
{'header': {'dev-index': 3, 'dev-name': 'eth0'},
'hwtstamp-flags': 1,
'hwtstamp-provider': {'index': 1, 'qualifier': 0},
'rx-filters': {'bits': {'bit': [{'index': 12, 'name': 'ptpv2-event'}]},
                'nomask': True,
                'size': 16},
'tx-types': {'bits': {'bit': [{'index': 1, 'name': 'on'}]},
              'nomask': True,
              'size': 4}}
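
The replies printed above are ordinary Python structures, so they can be post-processed directly. The following is a minimal sketch, assuming the reply keeps the shape shown in the tsinfo-get dump; `bit_names` and `summarize_tsinfo` are hypothetical helpers written for illustration and are not part of ynl.

```python
def bit_names(bitset):
    """Return the bit names held in an ethtool bitset as printed by ynl."""
    return [bit["name"] for bit in bitset.get("bits", {}).get("bit", [])]


def summarize_tsinfo(replies):
    """Map (provider index, qualifier) to the capabilities it advertises."""
    summary = {}
    for reply in replies:
        provider = reply["hwtst-provider"]
        key = (provider["index"], provider["qualifier"])
        summary[key] = {
            "dev": reply["header"]["dev-name"],
            "timestamping": bit_names(reply.get("timestamping", {})),
            "tx-types": bit_names(reply.get("tx-types", {})),
            "rx-filters": bit_names(reply.get("rx-filters", {})),
        }
    return summary


# Trimmed copy of the tsinfo-get dump reply shown above (tx-types only).
SAMPLE = [
    {"header": {"dev-index": 3, "dev-name": "eth0"},
     "hwtst-provider": {"index": 0, "qualifier": 0},
     "tx-types": {"bits": {"bit": [{"index": 0, "name": "off"},
                                   {"index": 1, "name": "on"}]},
                  "nomask": True, "size": 4}},
    {"header": {"dev-index": 3, "dev-name": "eth0"},
     "hwtst-provider": {"index": 2, "qualifier": 0},
     "tx-types": {"bits": {"bit": [{"index": 0, "name": "off"},
                                   {"index": 1, "name": "on"},
                                   {"index": 2, "name": "onestep-sync"}]},
                  "nomask": True, "size": 4}},
]

if __name__ == "__main__":
    for provider, caps in summarize_tsinfo(SAMPLE).items():
        print(provider, caps["tx-types"])
```

Keying the summary on (index, qualifier) mirrors how the examples address a provider in the `hwtst-provider` attribute.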

Changes in v21:
- NIT fixes.
- Link to v20: https://lore.kernel.org/r/20241204-feature_ptp_netnext-v20-0-9bd99dc8a867@bootlin.com

Changes in v20:
- Change hwtstamp provider design to avoid saving "user" (phy or net) in
  the ptp clock structure.
- Link to v19: https://lore.kernel.org/r/20241030-feature_ptp_netnext-v19-0-94f8aadc9d5c@bootlin.com

Changes in v19:
- Rebase on net-next
- Link to v18: https://lore.kernel.org/r/20241023-feature_ptp_netnext-v18-0-ed948f3b6887@bootlin.com

Changes in v18:
- A few changes to the tsconfig-set ethtool command.
- Add tsconfig-set-reply ethtool netlink socket.
- Add missing netlink tsconfig documentation
- Link to v17: https://lore.kernel.org/r/20240709-feature_ptp_netnext-v17-0-b5317f50df2a@bootlin.com

Changes in v17:
- Fix a documentation nit.
- Add a missing kernel_ethtool_tsinfo update from a new MAC driver.
- Link to v16: https://lore.kernel.org/r/20240705-feature_ptp_netnext-v16-0-5d7153914052@bootlin.com

Changes in v16:
- Add a new patch to separate tsinfo into a new tsconfig command to get
  and set the hwtstamp config.
- Used call_rcu() instead of synchronize_rcu() to free the hwtstamp_provider
- Moved net core changes of patch 12 directly to patch 8.
- Link to v15: https://lore.kernel.org/r/20240612-feature_ptp_netnext-v15-0-b2a086257b63@bootlin.com

Changes in v15:
- Fix uninitialized ethtool_ts_info structure.
- Link to v14: https://lore.kernel.org/r/20240604-feature_ptp_netnext-v14-0-77b6f6efea40@bootlin.com

Changes in v14:
- Add back a missing EXPORT_SYMBOL().
- Link to v13: https://lore.kernel.org/r/20240529-feature_ptp_netnext-v13-0-6eda4d40fa4f@bootlin.com

Changes in v13:
- Add PTP builtin code to fix build errors when building PTP as a module.
- Fix error spotted by smatch and sparse.
- Link to v12: https://lore.kernel.org/r/20240430-feature_ptp_netnext-v12-0-2c5f24b6a914@bootlin.com

Changes in v12:
- Add missing return description in the kdoc.
- Fix a few nits.
- Link to v11: https://lore.kernel.org/r/20240422-feature_ptp_netnext-v11-0-f14441f2a1d8@bootlin.com

Changes in v11:
- Add netlink examples.
- Remove a change of my out of tree marvell_ptp patch in the patch series.
- Remove useless extern.
- Link to v10: https://lore.kernel.org/r/20240409-feature_ptp_netnext-v10-0-0fa2ea5c89a9@bootlin.com

Changes in v10:
- Move declarations to net/core/dev.h instead of netdevice.h
- Add netlink documentation.
- Add ETHTOOL_A_TSINFO_GHWTSTAMP netlink attributes instead of a bit in
  ETHTOOL_A_TSINFO_TIMESTAMPING bitset.
- Send "Move from simple ida to xarray" patch standalone.
- Add tsinfo ntf command.
- Add rcu_lock protection mechanism to avoid memory leak.
- Fixed doc and kdoc issue.
- Link to v9: https://lore.kernel.org/r/20240226-feature_ptp_netnext-v9-0-455611549f21@bootlin.com

Changes in v9:
- Remove the RFC prefix.
- Fix a few nits.
- Link to v8: https://lore.kernel.org/r/20240216-feature_ptp_netnext-v8-0-510f42f444fb@bootlin.com

Changes in v8:
- Drop the first 6 patches as they are now merged.
- Change the full implementation to not be based on the hwtstamp layer
  (MAC/PHY) but on the hwtstamp provider, which means a ptp clock and a
  phc qualifier.
- Add some patches to prepare the new implementation.
- Expand netlink tsinfo instead of a new ts command for new hwtstamp
  configuration uAPI and for dumping tsinfo of specific hwtstamp provider.
- Link to v7: https://lore.kernel.org/r/20231114-feature_ptp_netnext-v7-0-472e77951e40@bootlin.com

Changes in v7:
- Fix a temporary build error.
- Link to v6: https://lore.kernel.org/r/20231019-feature_ptp_netnext-v6-0-71affc27b0e5@bootlin.com

Changes in v6:
- A few fixes from the reviews.
- Replace the allowlist to default_timestamp flag to know which phy is
  using old API behavior.
- Rename the timestamping layer enum values.
- Move to a simple enum instead of the mix between enum and bitfield.
- Update ts_info and ts-set in software timestamping case.

Changes in v5:
- Update to ndo_hwstamp_get/set. This brings several new patches.
- Add a few patches to make the glue.
- Convert macb to ndo_hwstamp_get/set.
- Add netlink specs description of new ethtool commands.
- Removed netdev notifier.
- Split the patches that expose the timestamping to userspace to separate
  the core and ethtool development.
- Add description of software timestamping.
- Convert PHYs hwtstamp callback to use kernel_hwtstamp_config.

Changes in v4:
- Move on to ethtool netlink instead of ioctl.
- Add a netdev notifier to allow packet trapping by the MAC in case of PHY
  time stamping.
- Add a PHY whitelist to not break the old PHY default time-stamping
  preference API.

Changes in v3:
- Expose the PTP choice to ethtool instead of sysfs.
  You can test it with the ethtool source on branch feature_ptp of:
  https://github.com/kmaincent/ethtool
- Added a devicetree binding to select the preferred timestamp.

Changes in v2:
- Move selected_timestamping_layer variable of the concerned patch.
- Use sysfs_streq instead of strmcmp.
- Use the PHY timestamp only if available.
====================

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Kory Maincent [Thu, 12 Dec 2024 17:06:45 +0000 (18:06 +0100)]

net: ethtool: Add support for tsconfig command to get/set hwtstamp config

Introduce support for the ETHTOOL_MSG_TSCONFIG_GET/SET ethtool netlink
messages to read and configure the hwtstamp configuration of a PHC provider. Note that
simultaneous hwtstamp isn't supported; configuring a new one disables the
previous setting.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Kory Maincent [Thu, 12 Dec 2024 17:06:44 +0000 (18:06 +0100)]

net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology

Either the MAC or the PHY can provide hwtstamp, so we should be able to
read the tsinfo for any hwtstamp provider.

Enhance 'get' command to retrieve tsinfo of hwtstamp providers within a
network topology.

Add support for a specific dump command to retrieve all hwtstamp
providers within the network topology, with added functionality for
filtered dump to target a single interface.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Kory Maincent [Thu, 12 Dec 2024 17:06:43 +0000 (18:06 +0100)]

net: Add the possibility to support a selected hwtstamp in netdevice

Introduce the description of a hwtstamp provider, mainly defined by
the hwtstamp source and the phydev pointer.

Add a hwtstamp provider description within the netdev structure to
allow saving the hwtstamp we want to use. This prepares for future
support of an ethtool netlink command to select the desired hwtstamp
provider. By default, the old API that does not support hwtstamp
selectability is used, meaning the hwtstamp provider pointer is unset.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Kory Maincent [Thu, 12 Dec 2024 17:06:42 +0000 (18:06 +0100)]

net: Make net_hwtstamp_validate accessible

Make the net_hwtstamp_validate function accessible in preparation for using
it from ethtool to validate the hwtstamp configuration before setting it.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Kory Maincent [Thu, 12 Dec 2024 17:06:41 +0000 (18:06 +0100)]

net: Make dev_get_hwtstamp_phylib accessible

Make the dev_get_hwtstamp_phylib function accessible in preparation for using
it from ethtool to read the current hwtstamp configuration.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 16 Dec 2024 12:47:30 +0000 (12:47 +0000)]

Merge branch 'tls1.3-key-updates'

Sabrina Dubroca says:

====================
tls: implement key updates for TLS1.3

This adds support for receiving KeyUpdate messages (RFC 8446, 4.6.3
[1]). A sender transmits a KeyUpdate message and then changes its TX
key. The receiver should react by updating its RX key before
processing the next message.

This patchset implements key updates by:
1. pausing decryption when a KeyUpdate message is received, to avoid
    attempting to use the old key to decrypt a record encrypted with
    the new key
2. returning -EKEYEXPIRED to syscalls that cannot receive the
    KeyUpdate message, until the rekey has been performed by userspace
3. passing the KeyUpdate message to userspace as a control message
4. allowing updates of the crypto_info via the TLS_TX/TLS_RX
    setsockopts
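
The receive-side behaviour in steps 1 to 4 can be pictured with a toy state machine. This is only a sketch under simplifying assumptions: it is not kernel code, `RxRekeyModel` and its methods are hypothetical names, keys are plain strings, and every receive call fails with EKEYEXPIRED while a rekey is pending.

```python
import errno
from collections import deque

# EKEYEXPIRED is Linux-specific; fall back to its Linux value elsewhere.
EKEYEXPIRED = getattr(errno, "EKEYEXPIRED", 127)


class RxRekeyModel:
    """Toy model of a TLS 1.3 receive path that handles KeyUpdate."""

    def __init__(self, key):
        self.key = key
        self.key_update_pending = False
        self.records = deque()  # (kind, key_used, payload)

    def deliver(self, kind, key_used, payload=b""):
        """Queue an incoming record; it stays encrypted until read."""
        self.records.append((kind, key_used, payload))

    def recv(self):
        """Decrypt and return the next record as (kind, payload)."""
        if self.key_update_pending:
            # Step 2: refuse to decrypt until userspace installs a new key.
            raise OSError(EKEYEXPIRED, "rekey pending")
        kind, key_used, payload = self.records.popleft()
        if key_used != self.key:
            raise ValueError("decryption failed: wrong key")
        if kind == "key_update":
            # Steps 1 and 3: hand the message to userspace, then pause.
            self.key_update_pending = True
        return kind, payload

    def set_rx_key(self, new_key):
        """Step 4: stand-in for setsockopt(TLS_RX) with the new key."""
        self.key = new_key
        self.key_update_pending = False
```

In this model a reader that receives the `key_update` record is expected to derive the next key and call `set_rx_key()` before reading again.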

This API has been tested with gnutls to make sure that it allows
userspace libraries to implement key updates [2]. Thanks to Frantisek
Krenzelok <fkrenzel@redhat.com> for providing the implementation in
gnutls and testing the kernel patches.

=======================================================================
Discussions around v2 of this patchset focused on how HW offload would
interact with rekey.

RX
- The existing SW path will handle all records between the KeyUpdate
   message signaling the change of key and the new key becoming known
   to the kernel -- those will be queued encrypted, and decrypted in
   SW as they are read by userspace (once the key is provided, ie same
   as this patchset)
- Call ->tls_dev_del + ->tls_dev_add immediately during
   setsockopt(TLS_RX)

TX
- After setsockopt(TLS_TX), switch to the existing SW path (not the
   current device_fallback) until we're able to re-enable HW offload
   - tls_device_sendmsg will call into tls_sw_sendmsg under lock_sock
     to avoid changing socket ops during the rekey while another
     thread might be waiting on the lock
- We only re-enable HW offload (call ->tls_dev_add to install the new
   key in HW) once all records sent with the old key have been
   ACKed. At this point, all unacked records are SW-encrypted with the
   new key, and the old key is unused by both HW and retransmissions.
   - If there are no unacked records when userspace does
     setsockopt(TLS_TX), we can (try to) install the new key in HW
     immediately.
   - If yet another key has been provided via setsockopt(TLS_TX), we
     don't install intermediate keys, only the latest.
   - TCP notifies ktls of ACKs via the icsk_clean_acked callback. In
     case of a rekey, tls_icsk_clean_acked will record when all data
     sent with the most recent past key has been sent. The next call
     to sendmsg will install the new key in HW.
   - We close and push the current SW record before reenabling
     offload.

If ->tls_dev_add fails to install the new key in HW, we stay in SW
mode. We can add a counter to keep track of this.
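
The TX rules discussed above can be modelled in a few lines. This is a sketch of the design under discussion only, not of merged code: `TxOffloadModel` is a hypothetical name, records are counted rather than stored, ACKs are assumed to arrive oldest first, and `dev_add` stands in for ->tls_dev_add.

```python
class TxOffloadModel:
    """Toy model of re-enabling HW offload after a TX rekey."""

    def __init__(self, key, dev_add=lambda key: True):
        self.key = key            # key used for newly sent records
        self.hw_key = key         # key installed in the device, if any
        self.mode = "hw"
        self.unacked_old = 0      # unacked records sent with a past key
        self.unacked_cur = 0      # unacked records sent with the current key
        self.want_hw = False      # a re-enable attempt is still outstanding
        self.dev_add = dev_add    # returns True when the install succeeds

    def setsockopt_tx(self, new_key):
        """Install a new TX key: fall back to SW until old data is ACKed."""
        self.unacked_old += self.unacked_cur
        self.unacked_cur = 0
        self.key = new_key        # only the latest key is ever installed
        self.hw_key = None
        self.mode = "sw"
        self.want_hw = True
        if self.unacked_old == 0:
            self._try_reenable()

    def ack(self, count):
        """Account for ACKed records, oldest first."""
        old = min(count, self.unacked_old)
        self.unacked_old -= old
        self.unacked_cur -= min(count - old, self.unacked_cur)

    def sendmsg(self):
        """Send one record and return the path ("hw" or "sw") used."""
        if self.want_hw and self.unacked_old == 0:
            self._try_reenable()
        self.unacked_cur += 1
        return self.mode

    def _try_reenable(self):
        self.want_hw = False      # on failure, stay in SW mode
        if self.dev_add(self.key):
            self.hw_key = self.key
            self.mode = "hw"
```

Deferring the install to the next `sendmsg()` call follows the description above, where the ACK callback only records that the old key is no longer needed.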

In addition:

Because we can't change socket ops during a rekey, we'll also have to
modify do_tls_setsockopt_conf to check ctx->tx_conf and only call
either tls_set_device_offload or tls_set_sw_offload. RX already uses
the same ops for both TLS_HW and TLS_SW, so we could switch between HW
and SW mode on rekey.

An alternative would be to have a common sendmsg which locks
the socket and then calls the correct implementation. We'll need that
anyway for the offload under rekey case, so that would only add a test
to the SW path's ops (compared to the current code). That should allow
us to simplify build_protos a bit, but might have a performance
impact - we'll need to check it if we want to go that route.
=======================================================================

Changes since v4:
- add counter for received KeyUpdate messages
- improve wording in the documentation
- improve handling of bogus messages when looking for KeyUpdate's
- some coding style clean ups

Changes since v3:
- rebase on top of net-next
- rework tls_check_pending_rekey according to Jakub's feedback
- add statistics for rekey: {RX,TX}REKEY{OK,ERROR}
- some coding style clean ups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sabrina Dubroca [Thu, 12 Dec 2024 15:36:09 +0000 (16:36 +0100)]

selftests: tls: add rekey tests

Test the kernel's ability to:
- update the key (but not the version or cipher), only for TLS1.3
- pause decryption after receiving a KeyUpdate message, until a new
RX key has been provided
- reflect the pause/non-readable socket in poll()

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sabrina Dubroca [Thu, 12 Dec 2024 15:36:08 +0000 (16:36 +0100)]

selftests: tls: add key_generation argument to tls_crypto_info_init

This allows us to generate different keys, so that we can test that
rekey is using the correct one.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sabrina Dubroca [Thu, 12 Dec 2024 15:36:07 +0000 (16:36 +0100)]

docs: tls: document TLS1.3 key updates

Document the kernel's behavior and userspace expectations.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sabrina Dubroca [Thu, 12 Dec 2024 15:36:06 +0000 (16:36 +0100)]

tls: add counters for rekey

This introduces 5 counters to keep track of key updates:
Tls{Rx,Tx}Rekey{Ok,Error} and TlsRxRekeyReceived.
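
The brace notation above expands to five counter names. The sketch below expands them and extracts them from statistics text; it assumes the counters appear next to the existing TLS statistics in /proc/net/tls_stat as one "Name value" pair per line. `parse_tls_stat` and `rekey_stats` are hypothetical helpers.

```python
# Expand Tls{Rx,Tx}Rekey{Ok,Error} and add TlsRxRekeyReceived.
REKEY_COUNTERS = [
    f"Tls{direction}Rekey{result}"
    for direction in ("Rx", "Tx")
    for result in ("Ok", "Error")
] + ["TlsRxRekeyReceived"]


def parse_tls_stat(text):
    """Parse 'Name value' lines into a dict, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            stats[fields[0]] = int(fields[1])
    return stats


def rekey_stats(text):
    """Return only the rekey counters, defaulting missing ones to 0."""
    stats = parse_tls_stat(text)
    return {name: stats.get(name, 0) for name in REKEY_COUNTERS}


if __name__ == "__main__":
    # Assumed location of the TLS statistics file.
    with open("/proc/net/tls_stat") as f:
        print(rekey_stats(f.read()))
```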

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sabrina Dubroca [Thu, 12 Dec 2024 15:36:05 +0000 (16:36 +0100)]

tls: implement rekey for TLS1.3

This adds the possibility to change the key and IV when using
TLS1.3. Changing the cipher or TLS version is not supported.

Once we have updated the RX key, we can unblock the receive side. If
the rekey fails, the context is unmodified and userspace is free to
retry the update or close the socket.

This change only affects tls_sw, since 1.3 offload isn't supported.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sabrina Dubroca [Thu, 12 Dec 2024 15:36:04 +0000 (16:36 +0100)]

tls: block decryption when a rekey is pending

When a TLS handshake record carrying a KeyUpdate message is received,
all subsequent records will be encrypted with a new key. We need to
stop decrypting incoming records with the old key, and wait until
userspace provides a new key.

Make a note of this in the RX context just after decrypting that
record, and stop recvmsg/splice calls with EKEYEXPIRED until the new
key is available.

key_update_pending can't be combined with the existing bitfield,
because we will read it locklessly in ->poll.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Rongwei Liu [Thu, 12 Dec 2024 22:13:20 +0000 (00:13 +0200)]

net/mlx5: Add device cap abs_native_port_num

When abs_native_port_num is set, the native_port_num reported by the
device may be non-contiguous and larger than num_lag_ports.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241212221329.961628-2-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Sun, 15 Dec 2024 22:28:36 +0000 (14:28 -0800)]

Merge branch 'mptcp-pm-userspace-misc-cleanups'

Matthieu Baerts says:

====================
mptcp: pm: userspace: misc cleanups

These cleanups lead the way to the unification of the path-manager
interfaces, and allow future extensions. The following patches are not
linked to each other, but are all related to the userspace
path-manager.

- Patch 1: add a new helper to reduce duplicated code.

- Patch 2: add a macro to iterate over the address list, clearer.

- Patch 3: reduce duplicated code to get the corresponding MPTCP socket.

- Patch 4: move userspace PM specific code out of the in-kernel one.

- Patch 5: pass an entry instead of a list with always one entry.

- Patch 6: unify the struct type used for the local addresses.

- Patch 7: simplify error handling.
====================

Link: https://patch.msgid.link/20241213-net-next-mptcp-pm-misc-cleanup-v1-0-ddb6d00109a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Geliang Tang [Fri, 13 Dec 2024 19:52:58 +0000 (20:52 +0100)]

mptcp: drop useless "err = 0" in subflow_destroy

Upon successful return, mptcp_pm_parse_addr() returns 0, so there is no
need to set "err = 0" after it. When mptcp_nl_find_ssk() returns NULL, it
is enough to set "err = -ESRCH" and then release and free the msk socket.

Also, there is no need to define the variable "subflow" in
subflow_destroy(); use mptcp_subflow_ctx(ssk) directly.

This patch doesn't change the behaviour of the code, just refactoring.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241213-net-next-mptcp-pm-misc-cleanup-v1-7-ddb6d00109a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
