]> www.infradead.org Git - users/hch/configfs.git/log
users/hch/configfs.git
8 months agonet/mlx5e: Fix CT entry update leaks of modify header context
Chris Mi [Tue, 30 Jul 2024 06:16:36 +0000 (09:16 +0300)]
net/mlx5e: Fix CT entry update leaks of modify header context

The cited commit allocates a new modify header to replace the old
one when updating CT entry. But if failed to allocate a new one, eg.
exceed the max number firmware can support, modify header will be
an error pointer that will trigger a panic when deallocating it. And
the old modify header point is copied to old attr. When the old
attr is freed, the old modify header is lost.

Fix it by restoring the old attr to attr when failed to allocate a
new modify header context. So when the CT entry is freed, the right
modify header context will be freed. And the panic of accessing
error pointer is also fixed.

Fixes: 94ceffb48eac ("net/mlx5e: Implement CT entry update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5e: Require mlx5 tc classifier action support for IPsec prio capability
Rahul Rameshbabu [Tue, 30 Jul 2024 06:16:35 +0000 (09:16 +0300)]
net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capability

Require mlx5 classifier action support when creating IPSec chains in
offload path. MLX5_IPSEC_CAP_PRIO should only be set if CONFIG_MLX5_CLS_ACT
is enabled. If CONFIG_MLX5_CLS_ACT=n and MLX5_IPSEC_CAP_PRIO is set,
configuring IPsec offload will fail due to the mlxx5 ipsec chain rules
failing to be created due to lack of classifier action support.

Fixes: fa5aa2f89073 ("net/mlx5e: Use chains for IPsec policy priority offload")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5: Fix missing lock on sync reset reload
Moshe Shemesh [Tue, 30 Jul 2024 06:16:34 +0000 (09:16 +0300)]
net/mlx5: Fix missing lock on sync reset reload

On sync reset reload work, when remote host updates devlink on reload
actions performed on that host, it misses taking devlink lock before
calling devlink_remote_reload_actions_performed() which results in
triggering lock assert like the following:

WARNING: CPU: 4 PID: 1164 at net/devlink/core.c:261 devl_assert_locked+0x3e/0x50

 CPU: 4 PID: 1164 Comm: kworker/u96:6 Tainted: G S      W          6.10.0-rc2+ #116
 Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0 12/18/2015
 Workqueue: mlx5_fw_reset_events mlx5_sync_reset_reload_work [mlx5_core]
 RIP: 0010:devl_assert_locked+0x3e/0x50

 Call Trace:
  <TASK>
  ? __warn+0xa4/0x210
  ? devl_assert_locked+0x3e/0x50
  ? report_bug+0x160/0x280
  ? handle_bug+0x3f/0x80
  ? exc_invalid_op+0x17/0x40
  ? asm_exc_invalid_op+0x1a/0x20
  ? devl_assert_locked+0x3e/0x50
  devlink_notify+0x88/0x2b0
  ? mlx5_attach_device+0x20c/0x230 [mlx5_core]
  ? __pfx_devlink_notify+0x10/0x10
  ? process_one_work+0x4b6/0xbb0
  process_one_work+0x4b6/0xbb0
[…]

Fixes: 84a433a40d0e ("net/mlx5: Lock mlx5 devlink reload callbacks")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5: Lag, don't use the hardcoded value of the first port
Mark Bloch [Tue, 30 Jul 2024 06:16:33 +0000 (09:16 +0300)]
net/mlx5: Lag, don't use the hardcoded value of the first port

The cited commit didn't change the body of the loop as it should.
It shouldn't be using MLX5_LAG_P1.

Fixes: 7e978e7714d6 ("net/mlx5: Lag, use actual number of lag ports")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5: DR, Fix 'stack guard page was hit' error in dr_rule
Yevgeny Kliteynik [Tue, 30 Jul 2024 06:16:32 +0000 (09:16 +0300)]
net/mlx5: DR, Fix 'stack guard page was hit' error in dr_rule

This patch reduces the size of hw_ste_arr_optimized array that is
allocated on stack from 640 bytes (5 match STEs + 5 action STES)
to 448 bytes (2 match STEs + 5 action STES).
This fixes the 'stack guard page was hit' issue, while still fitting
majority of the usecases (up to 2 match STEs).

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5: Fix error handling in irq_pool_request_irq
Shay Drory [Tue, 30 Jul 2024 06:16:31 +0000 (09:16 +0300)]
net/mlx5: Fix error handling in irq_pool_request_irq

In case mlx5_irq_alloc fails, the previously allocated index remains
in the XArray, which could lead to inconsistencies.

Fix it by adding error handling that erases the allocated index
from the XArray if mlx5_irq_alloc returns an error.

Fixes: c36326d38d93 ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5: Always drain health in shutdown callback
Shay Drory [Tue, 30 Jul 2024 06:16:30 +0000 (09:16 +0300)]
net/mlx5: Always drain health in shutdown callback

There is no point in recovery during device shutdown. if health
work started need to wait for it to avoid races and NULL pointer
access.

Hence, drain health WQ on shutdown callback.

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Fixes: d2aa060d40fa ("net/mlx5: Cancel health poll before sending panic teardown command")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: Add skbuff.h to MAINTAINERS
Breno Leitao [Tue, 30 Jul 2024 16:14:03 +0000 (09:14 -0700)]
net: Add skbuff.h to MAINTAINERS

The network maintainers need to be copied if the skbuff.h is touched.

This also helps git-send-email to figure out the proper maintainers when
touching the file.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240730161404.2028175-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agor8169: don't increment tx_dropped in case of NETDEV_TX_BUSY
Heiner Kallweit [Tue, 30 Jul 2024 19:51:52 +0000 (21:51 +0200)]
r8169: don't increment tx_dropped in case of NETDEV_TX_BUSY

The skb isn't consumed in case of NETDEV_TX_BUSY, therefore don't
increment the tx_dropped counter.

Fixes: 188f4af04618 ("r8169: use NETDEV_TX_{BUSY/OK}")
Cc: stable@vger.kernel.org
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/bbba9c48-8bac-4932-9aa1-d2ed63bc9433@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 1 Aug 2024 00:49:00 +0000 (17:49 -0700)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-07-31

We've added 2 non-merge commits during the last 2 day(s) which contain
a total of 2 files changed, 2 insertions(+), 2 deletions(-).

The main changes are:

1) Fix BPF selftest build after tree sync with regards to a _GNU_SOURCE
   macro redefined compilation error, from Stanislav Fomichev.

2) Fix a wrong test in the ASSERT_OK() check in uprobe_syscall BPF selftest,
   from Jiri Olsa.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf/selftests: Fix ASSERT_OK condition check in uprobe_syscall test
  selftests/bpf: Filter out _GNU_SOURCE when compiling test_cpp
====================

Link: https://patch.msgid.link/20240731115706.19677-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Wed, 31 Jul 2024 01:41:10 +0000 (18:41 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
ice: fix AF_XDP ZC timeout and concurrency issues

Maciej Fijalkowski says:

Changes included in this patchset address an issue that customer has
been facing when AF_XDP ZC Tx sockets were used in combination with flow
control and regular Tx traffic.

After executing:
ethtool --set-priv-flags $dev link-down-on-close on
ethtool -A $dev rx on tx on

launching multiple ZC Tx sockets on $dev + pinging remote interface (so
that regular Tx traffic is present) and then going through down/up of
$dev, Tx timeout occurred and then most of the time ice driver was unable
to recover from that state.

These patches combined together solve the described above issue on
customer side. Main focus here is to forbid producing Tx descriptors when
either carrier is not yet initialized or process of bringing interface
down has already started.

v1: https://lore.kernel.org/netdev/20240708221416.625850-1-anthony.l.nguyen@intel.com/

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: xsk: fix txq interrupt mapping
  ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog
  ice: improve updating ice_{t,r}x_ring::xsk_pool
  ice: toggle netif_carrier when setting up XSK pool
  ice: modify error handling when setting XSK pool in ndo_bpf
  ice: replace synchronize_rcu with synchronize_net
  ice: don't busy wait for Rx queue disable in ice_qp_dis()
  ice: respect netif readiness in AF_XDP ZC related ndo's
====================

Link: https://patch.msgid.link/20240729200716.681496-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: drop bad gso csum_start and offset in virtio_net_hdr
Willem de Bruijn [Mon, 29 Jul 2024 20:10:12 +0000 (16:10 -0400)]
net: drop bad gso csum_start and offset in virtio_net_hdr

Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb
for GSO packets.

The function already checks that a checksum requested with
VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets
this might not hold for segs after segmentation.

Syzkaller demonstrated to reach this warning in skb_checksum_help

offset = skb_checksum_start_offset(skb);
ret = -EINVAL;
if (WARN_ON_ONCE(offset >= skb_headlen(skb)))

By injecting a TSO packet:

WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0
 ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774
 ip_finish_output_gso net/ipv4/ip_output.c:279 [inline]
 __ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301
 iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82
 ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813
 __gre_xmit net/ipv4/ip_gre.c:469 [inline]
 ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661
 __netdev_start_xmit include/linux/netdevice.h:4850 [inline]
 netdev_start_xmit include/linux/netdevice.h:4864 [inline]
 xmit_one net/core/dev.c:3595 [inline]
 dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611
 __dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261
 packet_snd net/packet/af_packet.c:3073 [inline]

The geometry of the bad input packet at tcp_gso_segment:

[   52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
[   52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
[   52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
[   52.003050][ T8403] csum(0x60000c7 start=199 offset=1536
ip_summed=3 complete_sw=0 valid=0 level=0)

Mitigate with stricter input validation.

csum_offset: for GSO packets, deduce the correct value from gso_type.
This is already done for USO. Extend it to TSO. Let UFO be:
udp[46]_ufo_fragment ignores these fields and always computes the
checksum in software.

csum_start: finding the real offset requires parsing to the transport
header. Do not add a parser, use existing segmentation parsing. Thanks
to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded.
Again test both TSO and USO. Do not test UFO for the above reason, and
do not test UDP tunnel offload.

GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be
CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit
from devices with no checksum offload"), but then still these fields
are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no
need to test for ip_summed == CHECKSUM_PARTIAL first.

This revises an existing fix mentioned in the Fixes tag, which broke
small packets with GSO offload, as detected by kselftests.

Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org
Fixes: e269d79c7d35 ("net: missing check virtio")
Cc: stable@vger.kernel.org
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240729201108.1615114-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and aqr115c
Bartosz Golaszewski [Mon, 29 Jul 2024 15:03:14 +0000 (17:03 +0200)]
net: phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and aqr115c

Commit 708405f3e56e ("net: phy: aquantia: wait for the GLOBAL_CFG to
start returning real values") introduced a workaround for an issue
observed on aqr115c. However there were never any reports of it
happening on other models and the workaround has been reported to cause
and issue on aqr113c (and it may cause the same on any other model not
supporting 10M mode).

Let's limit the impact of the workaround to aqr113, aqr113c and aqr115c
and poll the 100M GLOBAL_CFG register instead as both models are known
to support it correctly.

Reported-by: Jon Hunter <jonathanh@nvidia.com>
Closes: https://lore.kernel.org/lkml/7c0140be-4325-4005-9068-7e0fc5ff344d@nvidia.com/
Fixes: 708405f3e56e ("net: phy: aquantia: wait for the GLOBAL_CFG to start returning real values")
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20240729150315.65798-1-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: phy: micrel: Fix the KSZ9131 MDI-X status issue
Raju Lakkaraju [Thu, 25 Jul 2024 07:11:25 +0000 (12:41 +0530)]
net: phy: micrel: Fix the KSZ9131 MDI-X status issue

The MDIX status is not accurately reflecting the current state after the link
partner has manually altered its MDIX configuration while operating in forced
mode.

Access information about Auto mdix completion and pair selection from the
KSZ9131's Auto/MDI/MDI-X status register

Fixes: b64e6a8794d9 ("net: phy: micrel: Add PHY Auto/MDI/MDI-X set driver for KSZ9131")
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240725071125.13960-1-Raju.Lakkaraju@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agobpf/selftests: Fix ASSERT_OK condition check in uprobe_syscall test
Jiri Olsa [Fri, 26 Jul 2024 18:08:47 +0000 (20:08 +0200)]
bpf/selftests: Fix ASSERT_OK condition check in uprobe_syscall test

Fixing ASSERT_OK condition check in uprobe_syscall test,
otherwise we return from test on pipe success.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240726180847.684584-1-jolsa@kernel.org
8 months agonet: mvpp2: Don't re-use loop iterator
Dan Carpenter [Wed, 24 Jul 2024 16:06:56 +0000 (11:06 -0500)]
net: mvpp2: Don't re-use loop iterator

This function has a nested loop.  The problem is that both the inside
and outside loop use the same variable as an iterator.  I found this
via static analysis so I'm not sure the impact.  It could be that it
loops forever or, more likely, the loop exits early.

Fixes: 3a616b92a9d1 ("net: mvpp2: Add TX flow control support for jumbo frames")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/eaa8f403-7779-4d81-973d-a9ecddc0bf6f@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/iucv: fix use after free in iucv_sock_close()
Alexandra Winter [Mon, 29 Jul 2024 12:28:16 +0000 (14:28 +0200)]
net/iucv: fix use after free in iucv_sock_close()

iucv_sever_path() is called from process context and from bh context.
iucv->path is used as indicator whether somebody else is taking care of
severing the path (or it is already removed / never existed).
This needs to be done with atomic compare and swap, otherwise there is a
small window where iucv_sock_close() will try to work with a path that has
already been severed and freed by iucv_callback_connrej() called by
iucv_tasklet_fn().

Example:
[452744.123844] Call Trace:
[452744.123845] ([<0000001e87f03880>] 0x1e87f03880)
[452744.123966]  [<00000000d593001e>] iucv_path_sever+0x96/0x138
[452744.124330]  [<000003ff801ddbca>] iucv_sever_path+0xc2/0xd0 [af_iucv]
[452744.124336]  [<000003ff801e01b6>] iucv_sock_close+0xa6/0x310 [af_iucv]
[452744.124341]  [<000003ff801e08cc>] iucv_sock_release+0x3c/0xd0 [af_iucv]
[452744.124345]  [<00000000d574794e>] __sock_release+0x5e/0xe8
[452744.124815]  [<00000000d5747a0c>] sock_close+0x34/0x48
[452744.124820]  [<00000000d5421642>] __fput+0xba/0x268
[452744.124826]  [<00000000d51b382c>] task_work_run+0xbc/0xf0
[452744.124832]  [<00000000d5145710>] do_notify_resume+0x88/0x90
[452744.124841]  [<00000000d5978096>] system_call+0xe2/0x2c8
[452744.125319] Last Breaking-Event-Address:
[452744.125321]  [<00000000d5930018>] iucv_path_sever+0x90/0x138
[452744.125324]
[452744.125325] Kernel panic - not syncing: Fatal exception in interrupt

Note that bh_lock_sock() is not serializing the tasklet context against
process context, because the check for sock_owned_by_user() and
corresponding handling is missing.

Ideas for a future clean-up patch:
A) Correct usage of bh_lock_sock() in tasklet context, as described in
Link: https://lore.kernel.org/netdev/1280155406.2899.407.camel@edumazet-laptop/
Re-enqueue, if needed. This may require adding return values to the
tasklet functions and thus changes to all users of iucv.

B) Change iucv tasklet into worker and use only lock_sock() in af_iucv.

Fixes: 7d316b945352 ("af_iucv: remove IUCV-pathes completely")
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/20240729122818.947756-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agonet/smc: prevent UAF in inet_create()
D. Wythe [Mon, 29 Jul 2024 03:40:15 +0000 (11:40 +0800)]
net/smc: prevent UAF in inet_create()

Following syzbot repro crashes the kernel:

socketpair(0x2, 0x1, 0x100, &(0x7f0000000140)) (fail_nth: 13)

Fix this by not calling sk_common_release() from smc_create_clcsk().

Stack trace:
socket: no more sockets
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
 WARNING: CPU: 1 PID: 5092 at lib/refcount.c:28
refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28
Modules linked in:
CPU: 1 PID: 5092 Comm: syz-executor424 Not tainted
6.10.0-syzkaller-04483-g0be9ae5486cd #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 06/27/2024
 RIP: 0010:refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28
Code: 80 f3 1f 8c e8 e7 69 a8 fc 90 0f 0b 90 90 eb 99 e8 cb 4f e6 fc c6
05 8a 8d e8 0a 01 90 48 c7 c7 e0 f3 1f 8c e8 c7 69 a8 fc 90 <0f> 0b 90
90 e9 76 ff ff ff e8 a8 4f e6 fc c6 05 64 8d e8 0a 01 90
RSP: 0018:ffffc900034cfcf0 EFLAGS: 00010246
RAX: 3b9fcde1c862f700 RBX: ffff888022918b80 RCX: ffff88807b39bc00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000003 R08: ffffffff815878a2 R09: fffffbfff1c39d94
R10: dffffc0000000000 R11: fffffbfff1c39d94 R12: 00000000ffffffe9
R13: 1ffff11004523165 R14: ffff888022918b28 R15: ffff888022918b00
FS:  00005555870e7380(0000) GS:ffff8880b9500000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000140 CR3: 000000007582e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 inet_create+0xbaf/0xe70
  __sock_create+0x490/0x920 net/socket.c:1571
  sock_create net/socket.c:1622 [inline]
  __sys_socketpair+0x2ca/0x720 net/socket.c:1769
  __do_sys_socketpair net/socket.c:1822 [inline]
  __se_sys_socketpair net/socket.c:1819 [inline]
  __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fbcb9259669
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 1a 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffe931c6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000035
RAX: ffffffffffffffda RBX: 00007fffe931c6f0 RCX: 00007fbcb9259669
RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000002
RBP: 0000000000000002 R08: 00007fffe931c476 R09: 00000000000000a0
R10: 0000000020000140 R11: 0000000000000246 R12: 00007fffe931c6ec
R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
 </TASK>

Link: https://lore.kernel.org/r/20240723175809.537291-1-edumazet@google.com/
Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://patch.msgid.link/1722224415-30999-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agoMerge branch 'mptcp-fix-inconsistent-backup-usage'
Paolo Abeni [Tue, 30 Jul 2024 08:27:32 +0000 (10:27 +0200)]
Merge branch 'mptcp-fix-inconsistent-backup-usage'

Matthieu Baerts says:

====================
mptcp: fix inconsistent backup usage

In all the MPTCP backup related tests, the backup flag was set on one
side, and the expected behaviour is to have both sides respecting this
decision. That's also the "natural" way, and what the users seem to
expect.

On the scheduler side, only the 'backup' field was checked, which is
supposed to be set only if the other peer flagged a subflow as backup.
But in various places, this flag was also set when the local host
flagged the subflow as backup, certainly to have the expected behaviour
mentioned above.

Patch 1 modifies the packet scheduler to check if the backup flag has
been set on both directions, not to change its behaviour after having
applied the following patches. That's what the default packet scheduler
should have done since the beginning in v5.7.

Patch 2 fixes the backup flag being mirrored on the MPJ+SYN+ACK by
accident since its introduction in v5.7. Instead, the received and sent
backup flags are properly distinguished in requests.

Patch 3 stops setting the received backup flag as well when sending an
MP_PRIO, something that was done since the MP_PRIO support in v5.12.

Patch 4 adds related and missing MIB counters to be able to easily check
if MP_JOIN are sent with a backup flag. Certainly because these counters
were not there, the behaviour that is fixed by patches here was not
properly verified.

Patch 5 validates the previous patch by extending the MPTCP Join
selftest.

Patch 6 fixes the backup support in signal endpoints: if a signal
endpoint had the backup flag, it was not set in the MPJ+SYN+ACK as
expected. It was only set for ongoing connections, but not future ones
as expected, since the introduction of the backup flag in endpoints in
v5.10.

Patch 7 validates the previous patch by extending the MPTCP Join
selftest as well.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (7):
      mptcp: sched: check both directions for backup
      mptcp: distinguish rcv vs sent backup flag in requests
      mptcp: pm: only set request_bkup flag when sending MP_PRIO
      mptcp: mib: count MPJ with backup flag
      selftests: mptcp: join: validate backup in MPJ
      mptcp: pm: fix backup support in signal endpoints
      selftests: mptcp: join: check backup support in signal endp

 include/trace/events/mptcp.h                    |  2 +-
 net/mptcp/mib.c                                 |  2 +
 net/mptcp/mib.h                                 |  2 +
 net/mptcp/options.c                             |  2 +-
 net/mptcp/pm.c                                  | 12 +++++
 net/mptcp/pm_netlink.c                          | 19 ++++++-
 net/mptcp/pm_userspace.c                        | 18 +++++++
 net/mptcp/protocol.c                            | 10 ++--
 net/mptcp/protocol.h                            |  4 ++
 net/mptcp/subflow.c                             | 10 ++++
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 72 ++++++++++++++++++++-----
 11 files changed, 132 insertions(+), 21 deletions(-)
====================

Link: https://patch.msgid.link/20240727-upstream-net-20240727-mptcp-backup-signal-v1-0-f50b31604cf1@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agoselftests: mptcp: join: check backup support in signal endp
Matthieu Baerts (NGI0) [Sat, 27 Jul 2024 10:01:29 +0000 (12:01 +0200)]
selftests: mptcp: join: check backup support in signal endp

Before the previous commit, 'signal' endpoints with the 'backup' flag
were ignored when sending the MP_JOIN.

The MPTCP Join selftest has then been modified to validate this case:
the "single address, backup" test, is now validating the MP_JOIN with a
backup flag as it is what we expect it to do with such name. The
previous version has been kept, but renamed to "single address, switch
to backup" to avoid confusions.

The "single address with port, backup" test is also now validating the
MPJ with a backup flag, which makes more sense than checking the switch
to backup with an MP_PRIO.

The "mpc backup both sides" test is now validating that the backup flag
is also set in MP_JOIN from and to the addresses used in the initial
subflow, using the special ID 0.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agomptcp: pm: fix backup support in signal endpoints
Matthieu Baerts (NGI0) [Sat, 27 Jul 2024 10:01:28 +0000 (12:01 +0200)]
mptcp: pm: fix backup support in signal endpoints

There was a support for signal endpoints, but only when the endpoint's
flag was changed during a connection. If an endpoint with the signal and
backup was already present, the MP_JOIN reply was not containing the
backup flag as expected.

That's confusing to have this inconsistent behaviour. On the other hand,
the infrastructure to set the backup flag in the SYN + ACK + MP_JOIN was
already there, it was just never set before. Now when requesting the
local ID from the path-manager, the backup status is also requested.

Note that when the userspace PM is used, the backup flag can be set if
the local address was already used before with a backup flag, e.g. if
the address was announced with the 'backup' flag, or a subflow was
created with the 'backup' flag.

Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/507
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agoselftests: mptcp: join: validate backup in MPJ
Matthieu Baerts (NGI0) [Sat, 27 Jul 2024 10:01:27 +0000 (12:01 +0200)]
selftests: mptcp: join: validate backup in MPJ

A peer can notify the other one that a subflow has to be treated as
"backup" by two different ways: either by sending a dedicated MP_PRIO
notification, or by setting the backup flag in the MP_JOIN handshake.

The selftests were previously monitoring the former, but not the latter.
This is what is now done here by looking at these new MIB counters when
validating the 'backup' cases:

  MPTcpExtMPJoinSynBackupRx
  MPTcpExtMPJoinSynAckBackupRx

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it will help to validate a new fix for an issue introduced by this
commit ID.

Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agomptcp: mib: count MPJ with backup flag
Matthieu Baerts (NGI0) [Sat, 27 Jul 2024 10:01:26 +0000 (12:01 +0200)]
mptcp: mib: count MPJ with backup flag

Without such counters, it is difficult to easily debug issues with MPJ
not having the backup flags on production servers.

This is not strictly a fix, but it eases to validate the following
patches without requiring to take packet traces, to query ongoing
connections with Netlink with admin permissions, or to guess by looking
at the behaviour of the packet scheduler. Also, the modification is self
contained, isolated, well controlled, and the increments are done just
after others, there from the beginning. It looks then safe, and helpful
to backport this.

Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agomptcp: pm: only set request_bkup flag when sending MP_PRIO
Matthieu Baerts (NGI0) [Sat, 27 Jul 2024 10:01:25 +0000 (12:01 +0200)]
mptcp: pm: only set request_bkup flag when sending MP_PRIO

The 'backup' flag from mptcp_subflow_context structure is supposed to be
set only when the other peer flagged a subflow as backup, not the
opposite.

Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agomptcp: distinguish rcv vs sent backup flag in requests
Matthieu Baerts (NGI0) [Sat, 27 Jul 2024 10:01:24 +0000 (12:01 +0200)]
mptcp: distinguish rcv vs sent backup flag in requests

When sending an MP_JOIN + SYN + ACK, it is possible to mark the subflow
as 'backup' by setting the flag with the same name. Before this patch,
the backup was set if the other peer set it in its MP_JOIN + SYN
request.

It is not correct: the backup flag should be set in the MPJ+SYN+ACK only
if the host asks for it, and not mirroring what was done by the other
peer. It is then required to have a dedicated bit for each direction,
similar to what is done in the subflow context.

Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agomptcp: sched: check both directions for backup
Matthieu Baerts (NGI0) [Sat, 27 Jul 2024 10:01:23 +0000 (12:01 +0200)]
mptcp: sched: check both directions for backup

The 'mptcp_subflow_context' structure has two items related to the
backup flags:

 - 'backup': the subflow has been marked as backup by the other peer

 - 'request_bkup': the backup flag has been set by the host

Before this patch, the scheduler was only looking at the 'backup' flag.
That can make sense in some cases, but it looks like that's not what we
wanted for the general use, because either the path-manager was setting
both of them when sending an MP_PRIO, or the receiver was duplicating
the 'backup' flag in the subflow request.

Note that the use of these two flags in the path-manager are going to be
fixed in the next commits, but this change here is needed not to modify
the behaviour.

Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoselftests/bpf: Filter out _GNU_SOURCE when compiling test_cpp
Stanislav Fomichev [Thu, 25 Jul 2024 21:40:29 +0000 (14:40 -0700)]
selftests/bpf: Filter out _GNU_SOURCE when compiling test_cpp

Jakub reports build failures when merging linux/master with net tree:

CXX      test_cpp
In file included from <built-in>:454:
<command line>:2:9: error: '_GNU_SOURCE' macro redefined [-Werror,-Wmacro-redefined]
    2 | #define _GNU_SOURCE
      |         ^
<built-in>:445:9: note: previous definition is here
  445 | #define _GNU_SOURCE 1

The culprit is commit cc937dad85ae ("selftests: centralize -D_GNU_SOURCE= to
CFLAGS in lib.mk") which unconditionally added -D_GNU_SOUCE to CLFAGS.
Apparently clang++ also unconditionally adds it for the C++ targets [0]
which causes a conflict. Add small change in the selftests makefile
to filter it out for test_cpp.

Not sure which tree it should go via, targeting bpf for now, but net
might be better?

0: https://stackoverflow.com/questions/11670581/why-is-gnu-source-defined-by-default-and-how-to-turn-it-off

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240725214029.1760809-1-sdf@fomichev.me
9 months agoice: xsk: fix txq interrupt mapping
Maciej Fijalkowski [Fri, 26 Jul 2024 18:17:16 +0000 (20:17 +0200)]
ice: xsk: fix txq interrupt mapping

ice_cfg_txq_interrupt() internally handles XDP Tx ring. Do not use
ice_for_each_tx_ring() in ice_qvec_cfg_msix() as this causing us to
treat XDP ring that belongs to queue vector as Tx ring and therefore
misconfiguring the interrupts.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog
Maciej Fijalkowski [Fri, 26 Jul 2024 18:17:15 +0000 (20:17 +0200)]
ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog

It is read by data path and modified from process context on remote cpu
so it is needed to use WRITE_ONCE to clear the pointer.

Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoice: improve updating ice_{t,r}x_ring::xsk_pool
Maciej Fijalkowski [Fri, 26 Jul 2024 18:17:14 +0000 (20:17 +0200)]
ice: improve updating ice_{t,r}x_ring::xsk_pool

xsk_buff_pool pointers that ice ring structs hold are updated via
ndo_bpf that is executed in process context while it can be read by
remote CPU at the same time within NAPI poll. Use synchronize_net()
after pointer update and {READ,WRITE}_ONCE() when working with mentioned
pointer.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoice: toggle netif_carrier when setting up XSK pool
Maciej Fijalkowski [Fri, 26 Jul 2024 18:17:13 +0000 (20:17 +0200)]
ice: toggle netif_carrier when setting up XSK pool

This so we prevent Tx timeout issues. One of conditions checked on
running in the background dev_watchdog() is netif_carrier_ok(), so let
us turn it off when we disable the queues that belong to a q_vector
where XSK pool is being configured. Turn carrier on in ice_qp_ena()
only when ice_get_link_status() tells us that physical link is up.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoice: modify error handling when setting XSK pool in ndo_bpf
Maciej Fijalkowski [Fri, 26 Jul 2024 18:17:12 +0000 (20:17 +0200)]
ice: modify error handling when setting XSK pool in ndo_bpf

Don't bail out right when spotting an error within ice_qp_{dis,ena}()
but rather track error and go through whole flow of disabling and
enabling queue pair.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoice: replace synchronize_rcu with synchronize_net
Maciej Fijalkowski [Fri, 26 Jul 2024 18:17:11 +0000 (20:17 +0200)]
ice: replace synchronize_rcu with synchronize_net

Given that ice_qp_dis() is called under rtnl_lock, synchronize_net() can
be called instead of synchronize_rcu() so that XDP rings can finish its
job in a faster way. Also let us do this as earlier in XSK queue disable
flow.

Additionally, turn off regular Tx queue before disabling irqs and NAPI.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoice: don't busy wait for Rx queue disable in ice_qp_dis()
Maciej Fijalkowski [Fri, 26 Jul 2024 18:17:10 +0000 (20:17 +0200)]
ice: don't busy wait for Rx queue disable in ice_qp_dis()

When ice driver is spammed with multiple xdpsock instances and flow
control is enabled, there are cases when Rx queue gets stuck and unable
to reflect the disable state in QRX_CTRL register. Similar issue has
previously been addressed in commit 13a6233b033f ("ice: Add support to
enable/disable all Rx queues before waiting").

To workaround this, let us simply not wait for a disabled state as later
patch will make sure that regardless of the encountered error in the
process of disabling a queue pair, the Rx queue will be enabled.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoice: respect netif readiness in AF_XDP ZC related ndo's
Michal Kubiak [Fri, 26 Jul 2024 18:17:09 +0000 (20:17 +0200)]
ice: respect netif readiness in AF_XDP ZC related ndo's

Address a scenario in which XSK ZC Tx produces descriptors to XDP Tx
ring when link is either not yet fully initialized or process of
stopping the netdev has already started. To avoid this, add checks
against carrier readiness in ice_xsk_wakeup() and in ice_xmit_zc().
One could argue that bailing out early in ice_xsk_wakeup() would be
sufficient but given the fact that we produce Tx descriptors on behalf
of NAPI that is triggered for Rx traffic, the latter is also needed.

Bringing link up is an asynchronous event executed within
ice_service_task so even though interface has been brought up there is
still a time frame where link is not yet ok.

Without this patch, when AF_XDP ZC Tx is used simultaneously with stack
Tx, Tx timeouts occur after going through link flap (admin brings
interface down then up again). HW seem to be unable to transmit
descriptor to the wire after HW tail register bump which in turn causes
bit __QUEUE_STATE_STACK_XOFF to be set forever as
netdev_tx_completed_queue() sees no cleaned bytes on the input.

Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoMerge branch 'mptcp-endpoint-readd-fixes' into main
David S. Miller [Mon, 29 Jul 2024 12:31:28 +0000 (13:31 +0100)]
Merge branch 'mptcp-endpoint-readd-fixes' into main

Matthieu Baerts says:

====================
mptcp: fix signal endpoint readd

Issue #501 [1] showed that the Netlink PM currently doesn't correctly
support removal and re-add of signal endpoints.

Patches 1 and 2 address the issue: the first one in the userspace path-
manager, introduced in v5.19 ; and the second one in the in-kernel path-
manager, introduced in v5.7.

Patch 3 introduces a related selftest. There is no 'Fixes' tag, because
it might be hard to backport it automatically, as missing helpers in
Bash will not be caught when compiling the kernel or the selftests.

The last two patches address two small issues in the MPTCP selftests,
one introduced in v6.6., and the other one in v5.17.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/501
====================

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoselftests: mptcp: always close input's FD if opened
Liu Jing [Sat, 27 Jul 2024 09:04:03 +0000 (11:04 +0200)]
selftests: mptcp: always close input's FD if opened

In main_loop_s function, when the open(cfg_input, O_RDONLY) function is
run, the last fd is not closed if the "--cfg_repeat > 0" branch is not
taken.

Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Signed-off-by: Liu Jing <liujing@cmss.chinamobile.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoselftests: mptcp: fix error path
Paolo Abeni [Sat, 27 Jul 2024 09:04:02 +0000 (11:04 +0200)]
selftests: mptcp: fix error path

pm_nl_check_endpoint() currently calls an not existing helper
to mark the test as failed. Fix the wrong call.

Fixes: 03668c65d153 ("selftests: mptcp: join: rework detailed report")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoselftests: mptcp: add explicit test case for remove/readd
Paolo Abeni [Sat, 27 Jul 2024 09:04:01 +0000 (11:04 +0200)]
selftests: mptcp: add explicit test case for remove/readd

Delete and re-create a signal endpoint and ensure that the PM
actually deletes and re-create the subflow.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agomptcp: fix NL PM announced address accounting
Paolo Abeni [Sat, 27 Jul 2024 09:04:00 +0000 (11:04 +0200)]
mptcp: fix NL PM announced address accounting

Currently the per connection announced address counter is never
decreased. As a consequence, after connection establishment, if
the NL PM deletes an endpoint and adds a new/different one, no
additional subflow is created for the new endpoint even if the
current limits allow that.

Address the issue properly updating the signaled address counter
every time the NL PM removes such addresses.

Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agomptcp: fix user-space PM announced address accounting
Paolo Abeni [Sat, 27 Jul 2024 09:03:59 +0000 (11:03 +0200)]
mptcp: fix user-space PM announced address accounting

Currently the per-connection announced address counter is never
decreased. When the user-space PM is in use, this just affect
the information exposed via diag/sockopt, but it could still foul
the PM to wrong decision.

Add the missing accounting for the user-space PM's sake.

Fixes: 8b1c94da1e48 ("mptcp: only send RM_ADDR in nl_cmd_remove")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agortnetlink: Don't ignore IFLA_TARGET_NETNSID when ifname is specified in rtnl_dellink().
Kuniyuki Iwashima [Sat, 27 Jul 2024 00:19:53 +0000 (17:19 -0700)]
rtnetlink: Don't ignore IFLA_TARGET_NETNSID when ifname is specified in rtnl_dellink().

The cited commit accidentally replaced tgt_net with net in rtnl_dellink().

As a result, IFLA_TARGET_NETNSID is ignored if the interface is specified
with IFLA_IFNAME or IFLA_ALT_IFNAME.

Let's pass tgt_net to rtnl_dev_get().

Fixes: cc6090e985d7 ("net: rtnetlink: introduce helper to get net_device instance by ifname")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agonet: axienet: start napi before enabling Rx/Tx
Andy Chiu [Fri, 26 Jul 2024 07:06:50 +0000 (15:06 +0800)]
net: axienet: start napi before enabling Rx/Tx

softirq may get lost if an Rx interrupt comes before we call
napi_enable. Move napi_enable in front of axienet_setoptions(), which
turns on the device, to address the issue.

Link: https://lists.gnu.org/archive/html/qemu-devel/2024-07/msg06160.html
Fixes: cc37610caaf8 ("net: axienet: implement NAPI and GRO receive")
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Adjust clamping window for applications specifying SO_RCVBUF
Subash Abhinov Kasiviswanathan [Fri, 26 Jul 2024 20:41:05 +0000 (13:41 -0700)]
tcp: Adjust clamping window for applications specifying SO_RCVBUF

tp->scaling_ratio is not updated based on skb->len/skb->truesize once
SO_RCVBUF is set leading to the maximum window scaling to be 25% of
rcvbuf after
commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
and 50% of rcvbuf after
commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
50% tries to emulate the behavior of older kernels using
sysctl_tcp_adv_win_scale with default value.

Systems which were using a different values of sysctl_tcp_adv_win_scale
in older kernels ended up seeing reduced download speeds in certain
cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
While the sysctl scheme is no longer acceptable, the value of 50% is
a bit conservative when the skb->len/skb->truesize ratio is later
determined to be ~0.66.

Applications not specifying SO_RCVBUF update the window scaling and
the receiver buffer every time data is copied to userspace. This
computation is now used for applications setting SO_RCVBUF to update
the maximum window scaling while ensuring that the receive buffer
is within the application specified limit.

Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoMerge branch 'ethtool-rss-fixes' into main
David S. Miller [Mon, 29 Jul 2024 09:59:08 +0000 (10:59 +0100)]
Merge branch 'ethtool-rss-fixes' into main

Jakub Kicinski says;

====================
ethtool: more RSS fixes

More fixes for RSS setting. First two patches fix my own bugs
in bnxt conversion to the new API. The third patch fixes
what seems to be a 10 year old issue (present since the Linux
RSS API was created). Fourth patch fixes an issue with
the XArray state being out of sync. And then a small test.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoselftests: drv-net: rss_ctx: check for all-zero keys
Jakub Kicinski [Thu, 25 Jul 2024 22:23:53 +0000 (15:23 -0700)]
selftests: drv-net: rss_ctx: check for all-zero keys

We had a handful of bugs relating to key being either all 0
or just reported incorrectly as all 0. Check for this in
the tests.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoethtool: fix the state of additional contexts with old API
Jakub Kicinski [Thu, 25 Jul 2024 22:23:52 +0000 (15:23 -0700)]
ethtool: fix the state of additional contexts with old API

We expect drivers implementing the new create/modify/destroy
API to populate the defaults in struct ethtool_rxfh_context.
In legacy API ctx isn't even passed, and rxfh.indir / rxfh.key
are NULL so drivers can't give us defaults even if they want to.
Call get_rxfh() to fetch the values. We can reuse rxfh_dev
for the get_rxfh(), rxfh stores the input from the user.

This fixes IOCTL reporting 0s instead of the default key /
indir table for drivers using legacy API.

Add a check to try to catch drivers using the new API
but not populating the key.

Fixes: 7964e7884643 ("net: ethtool: use the tracking array for get_rxfh on custom RSS contexts")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoethtool: fix setting key and resetting indir at once
Jakub Kicinski [Thu, 25 Jul 2024 22:23:51 +0000 (15:23 -0700)]
ethtool: fix setting key and resetting indir at once

The indirection table and the key follow struct ethtool_rxfh
in user memory.

To reset the indirection table user space calls SET_RXFH with
table of size 0 (OTOH to say "no change" it should use -1 / ~0).
The logic for calculating the offset where they key sits is
incorrect in this case, as kernel would still offset by the full
table length, while for the reset there is no indir table and
key is immediately after the struct.

  $ ethtool -X eth0 default hkey 01:02:03...
  $ ethtool -x eth0
  [...]
  RSS hash key:
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
  [...]

Fixes: 3de0b592394d ("ethtool: Support for configurable RSS hash key")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoeth: bnxt: populate defaults in the RSS context struct
Jakub Kicinski [Thu, 25 Jul 2024 22:23:50 +0000 (15:23 -0700)]
eth: bnxt: populate defaults in the RSS context struct

As described in the kdoc for .create_rxfh_context we are responsible
for populating the defaults. The core will not call .get_rxfh
for non-0 context.

The problem can be easily observed since Netlink doesn't currently
use the cache. Using netlink ethtool:

  $ ethtool -x eth0 context 1
  [...]
  RSS hash key:
  13:60:cd:60:14:d3:55:36:86:df:90:f2:96:14:e2:21:05:57:a8:8f:a5:12:5e:54:62:7f:fd:3c:15:7e:76:05:71:42:a2:9a:73:80:09:9c
  RSS hash function:
      toeplitz: on
      xor: off
      crc32: off

But using IOCTL ethtool shows:

  $ ./ethtool-old -x eth0 context 1
  [...]
  RSS hash key:
  00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
  RSS hash function:
      Operation not supported

Fixes: 7964e7884643 ("net: ethtool: use the tracking array for get_rxfh on custom RSS contexts")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoeth: bnxt: reject unsupported hash functions
Jakub Kicinski [Thu, 25 Jul 2024 22:23:49 +0000 (15:23 -0700)]
eth: bnxt: reject unsupported hash functions

In commit under Fixes I split the bnxt_set_rxfh_context() function,
and attached the appropriate chunks to new ops. I missed that
bnxt_set_rxfh_context() gets called after some initial checks
in bnxt_set_rxfh(), namely that the hash function is Toeplitz.

Fixes: 5c466b4d4e75 ("eth: bnxt: move from .set_rxfh to .create_rxfh_context and friends")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotun: Add missing bpf_net_ctx_clear() in do_xdp_generic()
Jeongjun Park [Thu, 25 Jul 2024 21:40:49 +0000 (06:40 +0900)]
tun: Add missing bpf_net_ctx_clear() in do_xdp_generic()

There are cases where do_xdp_generic returns bpf_net_context without
clearing it. This causes various memory corruptions, so the missing
bpf_net_ctx_clear must be added.

Reported-by: syzbot+44623300f057a28baf1e@syzkaller.appspotmail.com
Fixes: fecef4cd42c6 ("tun: Assign missing bpf_net_context.")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot+3c2b6d5d4bec3b904933@syzkaller.appspotmail.com
Reported-by: syzbot+707d98c8649695eaf329@syzkaller.appspotmail.com
Reported-by: syzbot+c226757eb784a9da3e8b@syzkaller.appspotmail.com
Reported-by: syzbot+61a1cfc2b6632363d319@syzkaller.appspotmail.com
Reported-by: syzbot+709e4c85c904bcd62735@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoMerge tag 'for-net-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Sat, 27 Jul 2024 01:27:51 +0000 (18:27 -0700)]
Merge tag 'for-net-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - btmtk: Fix kernel crash when entering btmtk_usb_suspend
 - btmtk: Fix btmtk.c undefined reference build error
 - btintel: Fail setup on error
 - hci_sync: Fix suspending with wrong filter policy
 - hci_event: Fix setting DISCOVERY_FINDING for passive scanning

* tag 'for-net-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci_event: Fix setting DISCOVERY_FINDING for passive scanning
  Bluetooth: btmtk: remove #ifdef around declarations
  Bluetooth: btmtk: Fix btmtk.c undefined reference build error harder
  Bluetooth: btmtk: Fix btmtk.c undefined reference build error
  Bluetooth: hci_sync: Fix suspending with wrong filter policy
  Bluetooth: btmtk: Fix kernel crash when entering btmtk_usb_suspend
  Bluetooth: btintel: Fail setup on error
====================

Link: https://patch.msgid.link/20240726150502.3300832-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agofbnic: Change kconfig prompt from S390=n to !S390
Alexander Duyck [Thu, 25 Jul 2024 17:03:54 +0000 (10:03 -0700)]
fbnic: Change kconfig prompt from S390=n to !S390

In testing the recent kernel I found that the fbnic driver couldn't be
enabled on x86_64 builds. A bit of digging showed that the fbnic driver was
the only one to check for S390 to be n, all others had checked for !S390.
Since it is a boolean and not a tristate I am not sure it will be N. So
just update it to use the !S390 flag.

A quick check via "make menuconfig" verified that after making this change
there was an option to select the fbnic driver.

Fixes 0e03c643dc93 ("eth: fbnic: fix s390 build.")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/172192698293.1903337.4255690118685300353.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge tag 'wireless-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Sat, 27 Jul 2024 01:22:53 +0000 (18:22 -0700)]
Merge tag 'wireless-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Couple of more urgent fixes:
 * ath12k: wowlan loop iteration issue
 * ath12k: fix soft lockup on suspend in certain scenarios
 * mt76: fix crash when removing an interface
 * mac80211: fix injection crash with some drivers that
   don't want monitor vif
 * cfg80211: fix S1G beacon parsing in scan
 * cfg80211: fix MLO link status reporting on connect

* tag 'wireless-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: ath12k: fix soft lockup on suspend
  wifi: mt76: mt7921: fix null pointer access in mt792x_mac_link_bss_remove
  wifi: ath12k: fix reusing outside iterator in ath12k_wow_vif_set_wakeups()
  wifi: cfg80211: correct S1G beacon length calculation
  wifi: cfg80211: fix reporting failed MLO links status with cfg80211_connect_done
  wifi: mac80211: use monitor sdata with driver only if desired
====================

Link: https://patch.msgid.link/20240726122638.942420-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoBluetooth: hci_event: Fix setting DISCOVERY_FINDING for passive scanning
Luiz Augusto von Dentz [Thu, 25 Jul 2024 22:28:08 +0000 (18:28 -0400)]
Bluetooth: hci_event: Fix setting DISCOVERY_FINDING for passive scanning

DISCOVERY_FINDING shall only be set for active scanning as passive
scanning is not meant to generate MGMT Device Found events causing
discovering state to go out of sync since userspace would believe it
is discovering when in fact it is just passive scanning.

Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219088
Fixes: 2e2515c1ba38 ("Bluetooth: hci_event: Set DISCOVERY_FINDING on SCAN_ENABLED")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agoBluetooth: btmtk: remove #ifdef around declarations
Arnd Bergmann [Mon, 22 Jul 2024 09:27:06 +0000 (11:27 +0200)]
Bluetooth: btmtk: remove #ifdef around declarations

The caller of these functions in btusb.c is guarded with an
if(IS_ENABLED()) style check, so dead code is left out, but the
declarations are still needed at compile time:

drivers/bluetooth/btusb.c: In function 'btusb_mtk_reset':
drivers/bluetooth/btusb.c:2705:15: error: implicit declaration of function 'btmtk_usb_subsys_reset' [-Wimplicit-function-declaration]
 2705 |         err = btmtk_usb_subsys_reset(hdev, btmtk_data->dev_id);
      |               ^~~~~~~~~~~~~~~~~~~~~~
drivers/bluetooth/btusb.c: In function 'btusb_send_frame_mtk':
drivers/bluetooth/btusb.c:2720:23: error: implicit declaration of function 'alloc_mtk_intr_urb' [-Wimplicit-function-declaration]
 2720 |                 urb = alloc_mtk_intr_urb(hdev, skb, btusb_tx_complete);
      |                       ^~~~~~~~~~~~~~~~~~
drivers/bluetooth/btusb.c:2720:21: error: assignment to 'struct urb *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
 2720 |                 urb = alloc_mtk_intr_urb(hdev, skb, btusb_tx_complete);
      |                     ^

Fixes: f0c83a23fcbb ("Bluetooth: btmtk: Fix btmtk.c undefined reference build error")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agoBluetooth: btmtk: Fix btmtk.c undefined reference build error harder
Arnd Bergmann [Mon, 22 Jul 2024 09:27:05 +0000 (11:27 +0200)]
Bluetooth: btmtk: Fix btmtk.c undefined reference build error harder

The previous fix was incomplete as the link failure still persists
with CONFIG_USB=m when the sdio or serial wrappers for btmtk.c
are build-in:

btmtk.c:(.text+0x468): undefined reference to `usb_alloc_urb'
btmtk.c:(.text+0x488): undefined reference to `usb_free_urb'
btmtk.c:(.text+0x500): undefined reference to `usb_anchor_urb'
btmtk.c:(.text+0x50a): undefined reference to `usb_submit_urb'
btmtk.c:(.text+0x92c): undefined reference to `usb_control_msg'
btmtk.c:(.text+0xa92): undefined reference to `usb_unanchor_urb'
btmtk.c:(.text+0x11e4): undefined reference to `usb_set_interface'
btmtk.c:(.text+0x120a): undefined reference to `usb_kill_anchored_urbs'

Disallow this configuration.

Fixes: f0c83a23fcbb ("Bluetooth: btmtk: Fix btmtk.c undefined reference build error")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agoBluetooth: btmtk: Fix btmtk.c undefined reference build error
Chris Lu [Fri, 19 Jul 2024 03:30:19 +0000 (11:30 +0800)]
Bluetooth: btmtk: Fix btmtk.c undefined reference build error

MediaTek moved some usb interface related function to btmtk.c which
may cause build failed if BT USB Kconfig wasn't enabled.
Fix undefined reference by adding config check.

btmtk.c:(.text+0x89c): undefined reference to `usb_alloc_urb'
btmtk.c:(.text+0x8e3): undefined reference to `usb_free_urb'
btmtk.c:(.text+0x956): undefined reference to `usb_free_urb'
btmtk.c:(.text+0xa0e): undefined reference to `usb_anchor_urb'
btmtk.c:(.text+0xb43): undefined reference to `usb_autopm_get_interface'
btmtk.c:(.text+0xb7e): undefined reference to `usb_autopm_put_interface'
btmtk.c:(.text+0xf70): undefined reference to `usb_disable_autosuspend'
btmtk.c:(.text+0x133a): undefined reference to `usb_control_msg'

Fixes: d019930b0049 ("Bluetooth: btmtk: move btusb_mtk_hci_wmt_sync to btmtk.c")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407091928.AH0aGZnx-lkp@intel.com/
Signed-off-by: Chris Lu <chris.lu@mediatek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agoBluetooth: hci_sync: Fix suspending with wrong filter policy
Luiz Augusto von Dentz [Mon, 15 Jul 2024 14:40:03 +0000 (10:40 -0400)]
Bluetooth: hci_sync: Fix suspending with wrong filter policy

When suspending the scan filter policy cannot be 0x00 (no acceptlist)
since that means the host has to process every advertisement report
waking up the system, so this attempts to check if hdev is marked as
suspended and if the resulting filter policy would be 0x00 (no
acceptlist) then skip passive scanning if thre no devices in the
acceptlist otherwise reset the filter policy to 0x01 so the acceptlist
is used since the devices programmed there can still wakeup be system.

Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agoBluetooth: btmtk: Fix kernel crash when entering btmtk_usb_suspend
Chris Lu [Tue, 16 Jul 2024 07:49:47 +0000 (15:49 +0800)]
Bluetooth: btmtk: Fix kernel crash when entering btmtk_usb_suspend

If MediaTek's Bluetooth setup is unsuccessful, a NULL pointer issue
occur when the system is suspended and the anchored kill function
is called. To avoid this, add protection to prevent executing the
anchored kill function if the setup is unsuccessful.

[    6.922106] Hardware name: Acer Tomato (rev2) board (DT)
[    6.922114] Workqueue: pm pm_runtime_work
[    6.922132] pstate: 804000c9
(Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    6.922147] pc : usb_kill_anchored_urbs+0x6c/0x1e0
[    6.922164] lr : usb_kill_anchored_urbs+0x48/0x1e0
[    6.922181] sp : ffff800080903b60
[    6.922187] x29: ffff800080903b60
x28: ffff2c7b85c32b80 x27: ffff2c7bbb370930
[    6.922211] x26: 00000000000f4240
x25: 00000000ffffffff x24: ffffd49ece2dcb48
[    6.922255] x20: ffffffffffffffd8
x19: 0000000000000000 x18: 0000000000000006
[    6.922276] x17: 6531656337386238
x16: 3632373862333863 x15: ffff800080903480
[    6.922297] x14: 0000000000000000
x13: 303278302f303178 x12: ffffd49ecf090e30
[    6.922318] x11: 0000000000000001
x10: 0000000000000001 x9 : ffffd49ecd2c5bb4
[    6.922339] x8 : c0000000ffffdfff
x7 : ffffd49ecefe0db8 x6 : 00000000000affa8
[    6.922360] x5 : ffff2c7bbb35dd48
x4 : 0000000000000000 x3 : 0000000000000000
[    6.922379] x2 : 0000000000000000
x1 : 0000000000000003 x0 : ffffffffffffffd8
[    6.922400] Call trace:
[    6.922405]  usb_kill_anchored_urbs+0x6c/0x1e0
[    6.922422]  btmtk_usb_suspend+0x20/0x38
[btmtk 5f200a97badbdfda4266773fee49acfc8e0224d5]
[    6.922444]  btusb_suspend+0xd0/0x210
[btusb 0bfbf19a87ff406c83b87268b87ce1e80e9a829b]
[    6.922469]  usb_suspend_both+0x90/0x288
[    6.922487]  usb_runtime_suspend+0x3c/0xa8
[    6.922507]  __rpm_callback+0x50/0x1f0
[    6.922523]  rpm_callback+0x70/0x88
[    6.922538]  rpm_suspend+0xe4/0x5a0
[    6.922553]  pm_runtime_work+0xd4/0xe0
[    6.922569]  process_one_work+0x18c/0x440
[    6.922588]  worker_thread+0x314/0x428
[    6.922606]  kthread+0x128/0x138
[    6.922621]  ret_from_fork+0x10/0x20
[    6.922644] Code: f100a274 54000520 d503201f d100a260 (b8370000)
[    6.922654] ---[ end trace 0000000000000000 ]---

Fixes: ceac1cb0259d ("Bluetooth: btusb: mediatek: add ISO data transmission functions")
Signed-off-by: Chris Lu <chris.lu@mediatek.com>
Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> #KernelCI
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agoBluetooth: btintel: Fail setup on error
Kiran K [Wed, 3 Jul 2024 08:52:42 +0000 (14:22 +0530)]
Bluetooth: btintel: Fail setup on error

Do not attempt to send any hci command to controller if *setup* function
fails.

Fixes: af395330abed ("Bluetooth: btintel: Add Intel devcoredump support")
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agonet: phy: realtek: add support for RTL8366S Gigabit PHY
Mark Mentovai [Thu, 25 Jul 2024 20:41:44 +0000 (16:41 -0400)]
net: phy: realtek: add support for RTL8366S Gigabit PHY

The PHY built in to the Realtek RTL8366S switch controller was
previously supported by genphy_driver. This PHY does not implement MMD
operations. Since commit 9b01c885be36 ("net: phy: c22: migrate to
genphy_c45_write_eee_adv()"), MMD register reads have been made during
phy_probe to determine EEE support. For genphy_driver, these reads are
transformed into 802.3 annex 22D clause 45-over-clause 22
mmd_phy_indirect operations that perform MII register writes to
MII_MMD_CTRL and MII_MMD_DATA. This overwrites those two MII registers,
which on this PHY are reserved and have another function, rendering the
PHY unusable while so configured.

Proper support for this PHY is restored by providing a phy_driver that
declares MMD operations as unsupported by using the helper functions
provided for that purpose, while remaining otherwise identical to
genphy_driver.

Fixes: 9b01c885be36 ("net: phy: c22: migrate to genphy_c45_write_eee_adv()")
Reported-by: Russell Senior <russell@personaltelco.net>
Closes: https://github.com/openwrt/openwrt/issues/15981
Link: https://github.com/openwrt/openwrt/issues/15739
Signed-off-by: Mark Mentovai <mark@mentovai.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agowifi: ath12k: fix soft lockup on suspend
Johan Hovold [Tue, 9 Jul 2024 07:31:32 +0000 (09:31 +0200)]
wifi: ath12k: fix soft lockup on suspend

The ext interrupts are enabled when the firmware has been started, but
this may never happen, for example, if the board configuration file is
missing.

When the system is later suspended, the driver unconditionally tries to
disable interrupts, which results in an irq disable imbalance and causes
the driver to spin indefinitely in napi_synchronize().

Make sure that the interrupts have been enabled before attempting to
disable them.

Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Cc: stable@vger.kernel.org # 6.3
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://patch.msgid.link/20240709073132.9168-1-johan+linaro@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agowifi: mt76: mt7921: fix null pointer access in mt792x_mac_link_bss_remove
Sean Wang [Thu, 18 Jul 2024 23:46:33 +0000 (16:46 -0700)]
wifi: mt76: mt7921: fix null pointer access in mt792x_mac_link_bss_remove

Fix null pointer access in mt792x_mac_link_bss_remove.

To prevent null pointer access, we should assign the vif to bss_conf in
mt7921_add_interface. This ensures that subsequent operations on the BSS
can properly reference the correct vif.

[  T843] Call Trace:
[  T843]  <TASK>
[  T843]  ? __die+0x1e/0x60
[  T843]  ? page_fault_oops+0x157/0x450
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? search_bpf_extables+0x5a/0x80
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? exc_page_fault+0x2bb/0x670
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? lock_timer_base+0x71/0x90
[  T843]  ? asm_exc_page_fault+0x26/0x30
[  T843]  ? mt792x_mac_link_bss_remove+0x24/0x110 [mt792x_lib]
[  T843]  ? mt792x_remove_interface+0x6e/0x90 [mt792x_lib]
[  T843]  ? ieee80211_do_stop+0x507/0x7e0 [mac80211]
[  T843]  ? ieee80211_stop+0x53/0x190 [mac80211]
[  T843]  ? __dev_close_many+0xa5/0x120
[  T843]  ? __dev_change_flags+0x18c/0x220
[  T843]  ? dev_change_flags+0x21/0x60
[  T843]  ? do_setlink+0xdf9/0x11d0
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? security_sock_rcv_skb+0x33/0x50
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? __nla_validate_parse+0x61/0xd10
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? genl_done+0x53/0x80
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? netlink_dump+0x357/0x410
[  T843]  ? __rtnl_newlink+0x5d6/0x980
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? genl_family_rcv_msg_dumpit+0xdf/0xf0
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? __kmalloc_cache_noprof+0x44/0x210
[  T843]  ? rtnl_newlink+0x42/0x60
[  T843]  ? rtnetlink_rcv_msg+0x152/0x3f0
[  T843]  ? mptcp_pm_nl_dump_addr+0x180/0x180
[  T843]  ? rtnl_calcit.isra.0+0x130/0x130
[  T843]  ? netlink_rcv_skb+0x56/0x100
[  T843]  ? netlink_unicast+0x199/0x290
[  T843]  ? netlink_sendmsg+0x21d/0x490
[  T843]  ? __sock_sendmsg+0x78/0x80
[  T843]  ? ____sys_sendmsg+0x23f/0x2e0
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? copy_msghdr_from_user+0x68/0xa0
[  T843]  ? ___sys_sendmsg+0x81/0xd0
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? crng_fast_key_erasure+0xbc/0xf0
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? get_random_bytes_user+0x126/0x140
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? __fdget+0xb1/0xe0
[  T843]  ? __sys_sendmsg+0x56/0xa0
[  T843]  ? srso_alias_return_thunk+0x5/0xfbef5
[  T843]  ? do_syscall_64+0x5f/0x170
[  T843]  ? entry_SYSCALL_64_after_hwframe+0x55/0x5d
[  T843]  </TASK>

Fixes: 1541d63c5fe2 ("wifi: mt76: mt7925: add mt7925_mac_link_bss_remove to remove per-link BSS")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/linux-wireless/2fee61f8c903d02a900ca3188c3742c7effd102e.camel@web.de/#b
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Tested-by: Bert Karwatzki <spasswolf@web.de>
Link: https://patch.msgid.link/20240718234633.12737-1-sean.wang@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agowifi: ath12k: fix reusing outside iterator in ath12k_wow_vif_set_wakeups()
Baochen Qiang [Mon, 22 Jul 2024 03:33:32 +0000 (11:33 +0800)]
wifi: ath12k: fix reusing outside iterator in ath12k_wow_vif_set_wakeups()

Smatch throws below warning:

drivers/net/wireless/ath/ath12k/wow.c:434 ath12k_wow_vif_set_wakeups()
warn: reusing outside iterator: 'i'

drivers/net/wireless/ath/ath12k/wow.c
    411         default:
    412                 break;
    413         }
    414
    415         for (i = 0; i < wowlan->n_patterns; i++) {
                            ^^^^^^^^^^^^^^^^^^^^^^
Here we loop until ->n_patterns

    416                 const struct cfg80211_pkt_pattern *eth_pattern = &patterns[i];
    417                 struct ath12k_pkt_pattern new_pattern = {};
    418
    419                 if (WARN_ON(eth_pattern->pattern_len > WOW_MAX_PATTERN_SIZE))
    420                         return -EINVAL;
    421
    422                 if (ar->ab->wow.wmi_conf_rx_decap_mode ==
    423                     ATH12K_HW_TXRX_NATIVE_WIFI) {
    424                         ath12k_wow_convert_8023_to_80211(ar, eth_pattern,
    425                                                          &new_pattern);
    426
    427                         if (WARN_ON(new_pattern.pattern_len > WOW_MAX_PATTERN_SIZE))
    428                                 return -EINVAL;
    429                 } else {
    430                         memcpy(new_pattern.pattern, eth_pattern->pattern,
    431                                eth_pattern->pattern_len);
    432
    433                         /* convert bitmask to bytemask */
--> 434                         for (i = 0; i < eth_pattern->pattern_len; i++)
    435                                 if (eth_pattern->mask[i / 8] & BIT(i % 8))
    436                                         new_pattern.bytemask[i] = 0xff;

This loop re-uses i and the loop ends with i == eth_pattern->pattern_len.
This looks like a bug.

Change to use a new iterator 'j' for the inner loop to fix it.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4

Fixes: 4a3c212eee0e ("wifi: ath12k: add basic WoW functionalities")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/d4975b95-9c43-45af-a0ab-80253f18c7f2@stanley.mountain/
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://patch.msgid.link/20240722033332.6273-1-quic_bqiang@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agowifi: cfg80211: correct S1G beacon length calculation
Johannes Berg [Wed, 24 Jul 2024 11:29:12 +0000 (13:29 +0200)]
wifi: cfg80211: correct S1G beacon length calculation

The minimum header length calculation (equivalent to the start
of the elements) for the S1G long beacon erroneously required
only up to the start of u.s1g_beacon rather than the start of
u.s1g_beacon.variable. Fix that, and also shuffle the branches
around a bit to not assign useless values that are overwritten
later.

Reported-by: syzbot+0f3afa93b91202f21939@syzkaller.appspotmail.com
Fixes: 9eaffe5078ca ("cfg80211: convert S1G beacon to scan results")
Link: https://patch.msgid.link/20240724132912.9662972db7c1.I8779675b5bbda4994cc66f876b6b87a2361c3c0b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agowifi: cfg80211: fix reporting failed MLO links status with cfg80211_connect_done
Veerendranath Jakkam [Wed, 24 Jul 2024 12:53:27 +0000 (18:23 +0530)]
wifi: cfg80211: fix reporting failed MLO links status with cfg80211_connect_done

Individual MLO links connection status is not copied to
EVENT_CONNECT_RESULT data while processing the connect response
information in cfg80211_connect_done(). Due to this failed links
are wrongly indicated with success status in EVENT_CONNECT_RESULT.

To fix this, copy the individual MLO links status to the
EVENT_CONNECT_RESULT data.

Fixes: 53ad07e9823b ("wifi: cfg80211: support reporting failed links")
Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Link: https://patch.msgid.link/20240724125327.3495874-1-quic_vjakkam@quicinc.com
[commit message editorial changes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agowifi: mac80211: use monitor sdata with driver only if desired
Johannes Berg [Thu, 25 Jul 2024 16:48:36 +0000 (18:48 +0200)]
wifi: mac80211: use monitor sdata with driver only if desired

In commit 0d9c2beed116 ("wifi: mac80211: fix monitor channel
with chanctx emulation") I changed mac80211 to always have an
internal monitor_sdata to have something to have the chanctx
bound to.

However, if the driver didn't also have the WANT_MONITOR flag
this would cause mac80211 to allocate it without telling the
driver (which was intentional) but also use it for later APIs
to the driver without it ever having known about it which was
_not_ intentional.

Check through the code and only use the monitor_sdata in the
relevant places (TX, MU-MIMO follow settings, TX power, and
interface iteration) when the WANT_MONITOR flag is set.

Cc: stable@vger.kernel.org
Fixes: 0d9c2beed116 ("wifi: mac80211: fix monitor channel with chanctx emulation")
Reported-by: ZeroBeat <ZeroBeat@gmx.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219086
Tested-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20240725184836.25d334157a8e.I02574086da2c5cf0e18264ce5807db6f14ffd9c0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agosched: act_ct: take care of padding in struct zones_ht_key
Eric Dumazet [Thu, 25 Jul 2024 09:27:45 +0000 (09:27 +0000)]
sched: act_ct: take care of padding in struct zones_ht_key

Blamed commit increased lookup key size from 2 bytes to 16 bytes,
because zones_ht_key got a struct net pointer.

Make sure rhashtable_lookup() is not using the padding bytes
which are not initialized.

 BUG: KMSAN: uninit-value in rht_ptr_rcu include/linux/rhashtable.h:376 [inline]
 BUG: KMSAN: uninit-value in __rhashtable_lookup include/linux/rhashtable.h:607 [inline]
 BUG: KMSAN: uninit-value in rhashtable_lookup include/linux/rhashtable.h:646 [inline]
 BUG: KMSAN: uninit-value in rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline]
 BUG: KMSAN: uninit-value in tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329
  rht_ptr_rcu include/linux/rhashtable.h:376 [inline]
  __rhashtable_lookup include/linux/rhashtable.h:607 [inline]
  rhashtable_lookup include/linux/rhashtable.h:646 [inline]
  rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline]
  tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329
  tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408
  tcf_action_init_1+0x6cc/0xb30 net/sched/act_api.c:1425
  tcf_action_init+0x458/0xf00 net/sched/act_api.c:1488
  tcf_action_add net/sched/act_api.c:2061 [inline]
  tc_ctl_action+0x4be/0x19d0 net/sched/act_api.c:2118
  rtnetlink_rcv_msg+0x12fc/0x1410 net/core/rtnetlink.c:6647
  netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2550
  rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6665
  netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
  netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1357
  netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1901
  sock_sendmsg_nosec net/socket.c:730 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:745
  ____sys_sendmsg+0x877/0xb60 net/socket.c:2597
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2651
  __sys_sendmsg net/socket.c:2680 [inline]
  __do_sys_sendmsg net/socket.c:2689 [inline]
  __se_sys_sendmsg net/socket.c:2687 [inline]
  __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2687
  x64_sys_call+0x2dd6/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:47
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Local variable key created at:
  tcf_ct_flow_table_get+0x4a/0x2260 net/sched/act_ct.c:324
  tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408

Fixes: 88c67aeb1407 ("sched: act_ct: add netns into the key of tcf_ct_flow_table")
Reported-by: syzbot+1b5e4e187cc586d05ea0@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agonet: usb: sr9700: fix uninitialized variable use in sr_mdio_read
Ma Ke [Thu, 25 Jul 2024 02:29:42 +0000 (10:29 +0800)]
net: usb: sr9700: fix uninitialized variable use in sr_mdio_read

It could lead to error happen because the variable res is not updated if
the call to sr_share_read_word returns an error. In this particular case
error code was returned and res stayed uninitialized. Same issue also
applies to sr_read_reg.

This can be avoided by checking the return value of sr_share_read_word
and sr_read_reg, and propagating the error if the read operation failed.

Found by code review.

Cc: stable@vger.kernel.org
Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Reviewed-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoMerge branch 'ethtool-rss-small-fixes-to-spec-and-get'
Jakub Kicinski [Thu, 25 Jul 2024 23:23:49 +0000 (16:23 -0700)]
Merge branch 'ethtool-rss-small-fixes-to-spec-and-get'

Jakub Kicinski says:

====================
ethtool: rss: small fixes to spec and GET

Two small fixes to the ethtool RSS_GET over Netlink.
Spec is a bit inaccurate and responses miss an identifier.
====================

Link: https://patch.msgid.link/20240724234249.2621109-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoethtool: rss: echo the context number back
Jakub Kicinski [Wed, 24 Jul 2024 23:42:49 +0000 (16:42 -0700)]
ethtool: rss: echo the context number back

The response to a GET request in Netlink should fully identify
the queried object. RSS_GET accepts context id as an input,
so it must echo that attribute back to the response.

After (assuming context 1 has been created):

  $ ./cli.py --spec netlink/specs/ethtool.yaml \
             --do rss-get \
     --json '{"header": {"dev-index": 2}, "context": 1}'
  {'context': 1,
   'header': {'dev-index': 2, 'dev-name': 'eth0'},
  [...]

Fixes: 7112a04664bf ("ethtool: add netlink based get rss support")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240724234249.2621109-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonetlink: specs: correct the spec of ethtool
Jakub Kicinski [Wed, 24 Jul 2024 23:42:48 +0000 (16:42 -0700)]
netlink: specs: correct the spec of ethtool

The spec for Ethtool is a bit inaccurate. We don't currently
support dump. Context is only accepted as input and not echoed
to output (which is a separate bug).

Fixes: a353318ebf24 ("tools: ynl: populate most of the ethtool spec")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240724234249.2621109-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agobnxt_en: Fix RSS logic in __bnxt_reserve_rings()
Pavan Chebbi [Wed, 24 Jul 2024 22:21:06 +0000 (15:21 -0700)]
bnxt_en: Fix RSS logic in __bnxt_reserve_rings()

In __bnxt_reserve_rings(), the existing code unconditionally sets the
default RSS indirection table to default if netif_is_rxfh_configured()
returns false.  This used to be correct before we added RSS contexts
support.  For example, if the user is changing the number of ethtool
channels, we will enter this path to reserve the new number of rings.
We will then set the RSS indirection table to default to cover the new
number of rings if netif_is_rxfh_configured() is false.

Now, with RSS contexts support, if the user has added or deleted RSS
contexts, we may now enter this path to reserve the new number of VNICs.
However, netif_is_rxfh_configured() will not return the correct state if
we are still in the middle of set_rxfh().  So the existing code may
set the indirection table of the default RSS context to default by
mistake.

Fix it to check if the reservation of the RX rings is changing.  Only
check netif_is_rxfh_configured() if it is changing.  RX rings will not
change in the middle of set_rxfh() and this will fix the issue.

Fixes: b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
Reported-and-tested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/20240625010210.2002310-1-kuba@kernel.org
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240724222106.147744-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 25 Jul 2024 20:32:25 +0000 (13:32 -0700)]
Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  A lot of networking people were at a conference last week, busy
  catching COVID, so relatively short PR.

  Current release - regressions:

   - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP

  Current release - new code bugs:

   - l2tp: protect session IDR and tunnel session list with one lock,
     make sure the state is coherent to avoid a warning

   - eth: bnxt_en: update xdp_rxq_info in queue restart logic

   - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field

  Previous releases - regressions:

   - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
     the field reuses previously un-validated pad

  Previous releases - always broken:

   - tap/tun: drop short frames to prevent crashes later in the stack

   - eth: ice: add a per-VF limit on number of FDIR filters

   - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"

* tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
  tun: add missing verification for short frame
  tap: add missing verification for short frame
  mISDN: Fix a use after free in hfcmulti_tx()
  gve: Fix an edge case for TSO skb validity check
  bnxt_en: update xdp_rxq_info in queue restart logic
  tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
  selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
  xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
  bpf: Fix a segment issue when downgrading gso_size
  net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
  MAINTAINERS: make Breno the netconsole maintainer
  MAINTAINERS: Update bonding entry
  net: nexthop: Initialize all fields in dumped nexthops
  net: stmmac: Correct byte order of perfect_match
  selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
  tipc: Return non-zero value from tipc_udp_addr2str() on error
  netfilter: nft_set_pipapo_avx2: disable softinterrupts
  ice: Fix recipe read procedure
  ice: Add a per-VF limit on number of FDIR filters
  net: bonding: correctly annotate RCU in bond_should_notify_peers()
  ...

9 months agoMerge tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 25 Jul 2024 20:18:41 +0000 (13:18 -0700)]
Merge tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - trivial printk changes

The bigger "real" printk work is still being discussed.

* tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  vsprintf: add missing MODULE_DESCRIPTION() macro
  printk: Rename console_replay_all() and update context

9 months agoMerge tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 25 Jul 2024 19:58:36 +0000 (12:58 -0700)]
Merge tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl constification from Joel Granados:
 "Treewide constification of the ctl_table argument of proc_handlers
  using a coccinelle script and some manual code formatting fixups.

  This is a prerequisite to moving the static ctl_table structs into
  read-only data section which will ensure that proc_handler function
  pointers cannot be modified"

* tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: treewide: constify the ctl_table argument of proc_handlers

9 months agoMerge tag 'efi-fixes-for-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 25 Jul 2024 19:55:21 +0000 (12:55 -0700)]
Merge tag 'efi-fixes-for-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - Wipe screen_info after allocating it from the heap - used by arm32
   and EFI zboot, other EFI architectures allocate it statically

 - Revert to allocating boot_params from the heap on x86 when entering
   via the native PE entrypoint, to work around a regression on older
   Dell hardware

* tag 'efi-fixes-for-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efistub: Revert to heap allocated boot_params for PE entrypoint
  efi/libstub: Zero initialize heap allocated struct screen_info

9 months agoMerge tag 'kgdb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt...
Linus Torvalds [Thu, 25 Jul 2024 19:48:42 +0000 (12:48 -0700)]
Merge tag 'kgdb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb updates from Daniel Thompson:
 "Three small changes this cycle:

   - Clean up an architecture abstraction that is no longer needed
     because all the architectures have converged.

   - Actually use the prompt argument to kdb_position_cursor() instead
     of ignoring it (functionally this fix is a nop but that was due to
     luck rather than good judgement)

   - Fix a -Wformat-security warning"

* tag 'kgdb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: Get rid of redundant kdb_curr_task()
  kdb: Use the passed prompt in kdb_position_cursor()
  kdb: address -Wformat-security warnings

9 months agoMerge tag 'mips_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Linus Torvalds [Thu, 25 Jul 2024 19:41:53 +0000 (12:41 -0700)]
Merge tag 'mips_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS updates from Thomas Bogendoerfer:

 - Use improved timer sync for Loongson64

 - Fix address of GCR_ACCESS register

 - Add missing MODULE_DESCRIPTION

* tag 'mips_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  mips: sibyte: add missing MODULE_DESCRIPTION() macro
  MIPS: SMP-CPS: Fix address for GCR_ACCESS register for CM3 and later
  MIPS: Loongson64: Switch to SYNC_R4K

9 months agoMerge tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 25 Jul 2024 19:37:42 +0000 (12:37 -0700)]
Merge tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:
 "The gettimeofday() and clock_gettime() syscalls are now available as
  vDSO functions, and Dave added a patch which allows to use NVMe cards
  in the PCI slots as fast and easy alternative to SCSI discs.

  Summary:

   - add gettimeofday() and clock_gettime() vDSO functions

   - enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor
     with PCIe NVME card to function in parisc machines

   - allow users to reduce kernel unaligned runtime warnings

   - minor code cleanups"

* tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
  parisc: Use max() to calculate parisc_tlb_flush_threshold
  parisc: Fix warning at drivers/pci/msi/msi.h:121
  parisc: Add 64-bit gettimeofday() and clock_gettime() vDSO functions
  parisc: Add 32-bit gettimeofday() and clock_gettime() vDSO functions
  parisc: Clean up unistd.h file

9 months agoMerge tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 25 Jul 2024 19:33:08 +0000 (12:33 -0700)]
Merge tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - Support for preemption

 - i386 Rust support

 - Huge cleanup by Benjamin Berg

 - UBSAN support

 - Removal of dead code

* tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits)
  um: vector: always reset vp->opened
  um: vector: remove vp->lock
  um: register power-off handler
  um: line: always fill *error_out in setup_one_line()
  um: remove pcap driver from documentation
  um: Enable preemption in UML
  um: refactor TLB update handling
  um: simplify and consolidate TLB updates
  um: remove force_flush_all from fork_handler
  um: Do not flush MM in flush_thread
  um: Delay flushing syscalls until the thread is restarted
  um: remove copy_context_skas0
  um: remove LDT support
  um: compress memory related stub syscalls while adding them
  um: Rework syscall handling
  um: Add generic stub_syscall6 function
  um: Create signal stack memory assignment in stub_data
  um: Remove stub-data.h include from common-offsets.h
  um: time-travel: fix signal blocking race/hang
  um: time-travel: remove time_exit()
  ...

9 months agoMerge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 25 Jul 2024 17:42:22 +0000 (10:42 -0700)]
Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big set of driver core changes for 6.11-rc1.

  Lots of stuff in here, with not a huge diffstat, but apis are evolving
  which required lots of files to be touched. Highlights of the changes
  in here are:

   - platform remove callback api final fixups (Uwe took many releases
     to get here, finally!)

   - Rust bindings for basic firmware apis and initial driver-core
     interactions.

     It's not all that useful for a "write a whole driver in rust" type
     of thing, but the firmware bindings do help out the phy rust
     drivers, and the driver core bindings give a solid base on which
     others can start their work.

     There is still a long way to go here before we have a multitude of
     rust drivers being added, but it's a great first step.

   - driver core const api changes.

     This reached across all bus types, and there are some fix-ups for
     some not-common bus types that linux-next and 0-day testing shook
     out.

     This work is being done to help make the rust bindings more safe,
     as well as the C code, moving toward the end-goal of allowing us to
     put driver structures into read-only memory. We aren't there yet,
     but are getting closer.

   - minor devres cleanups and fixes found by code inspection

   - arch_topology minor changes

   - other minor driver core cleanups

  All of these have been in linux-next for a very long time with no
  reported problems"

* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
  ARM: sa1100: make match function take a const pointer
  sysfs/cpu: Make crash_hotplug attribute world-readable
  dio: Have dio_bus_match() callback take a const *
  zorro: make match function take a const pointer
  driver core: module: make module_[add|remove]_driver take a const *
  driver core: make driver_find_device() take a const *
  driver core: make driver_[create|remove]_file take a const *
  firmware_loader: fix soundness issue in `request_internal`
  firmware_loader: annotate doctests as `no_run`
  devres: Correct code style for functions that return a pointer type
  devres: Initialize an uninitialized struct member
  devres: Fix memory leakage caused by driver API devm_free_percpu()
  devres: Fix devm_krealloc() wasting memory
  driver core: platform: Switch to use kmemdup_array()
  driver core: have match() callback in struct bus_type take a const *
  MAINTAINERS: add Rust device abstractions to DRIVER CORE
  device: rust: improve safety comments
  MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
  MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
  firmware: rust: improve safety comments
  ...

9 months agoMerge tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/linux-watchdog
Linus Torvalds [Thu, 25 Jul 2024 17:18:35 +0000 (10:18 -0700)]
Merge tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

 - make watchdog_class const

 - rework of the rzg2l_wdt driver

 - other small fixes and improvements

* tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/linux-watchdog:
  dt-bindings: watchdog: dlg,da9062-watchdog: Drop blank space
  watchdog: rzn1: Convert comma to semicolon
  watchdog: lenovo_se10_wdt: Convert comma to semicolon
  dt-bindings: watchdog: renesas,wdt: Document RZ/G3S support
  watchdog: rzg2l_wdt: Add suspend/resume support
  watchdog: rzg2l_wdt: Rely on the reset driver for doing proper reset
  watchdog: rzg2l_wdt: Remove comparison with zero
  watchdog: rzg2l_wdt: Remove reset de-assert from probe
  watchdog: rzg2l_wdt: Check return status of pm_runtime_put()
  watchdog: rzg2l_wdt: Use pm_runtime_resume_and_get()
  watchdog: rzg2l_wdt: Make the driver depend on PM
  watchdog: rzg2l_wdt: Restrict the driver to ARCH_RZG2L and ARCH_R9A09G011
  watchdog: imx7ulp_wdt: keep already running watchdog enabled
  watchdog: starfive: Add missing clk_disable_unprepare()
  watchdog: Make watchdog_class const

9 months agoMerge tag 'dma-mapping-6.11-2024-07-24' of git://git.infradead.org/users/hch/dma...
Linus Torvalds [Thu, 25 Jul 2024 17:10:34 +0000 (10:10 -0700)]
Merge tag 'dma-mapping-6.11-2024-07-24' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:

 - fix the order of actions in dmam_free_coherent (Lance Richardson)

* tag 'dma-mapping-6.11-2024-07-24' of git://git.infradead.org/users/hch/dma-mapping:
  dma: fix call order in dmam_free_coherent

9 months agoMerge branch 'tap-tun-harden-by-dropping-short-frame'
Jakub Kicinski [Thu, 25 Jul 2024 15:07:06 +0000 (08:07 -0700)]
Merge branch 'tap-tun-harden-by-dropping-short-frame'

Dongli Zhang says:

====================
tap/tun: harden by dropping short frame

This is to harden all of tap/tun to avoid any short frame smaller than the
Ethernet header (ETH_HLEN).

While the xen-netback already rejects short frame smaller than ETH_HLEN ...

 914 static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 915                                      int budget,
 916                                      unsigned *copy_ops,
 917                                      unsigned *map_ops)
 918 {
... ...
1007                 if (unlikely(txreq.size < ETH_HLEN)) {
1008                         netdev_dbg(queue->vif->dev,
1009                                    "Bad packet size: %d\n", txreq.size);
1010                         xenvif_tx_err(queue, &txreq, extra_count, idx);
1011                         break;
1012                 }

... the short frame may not be dropped by vhost-net/tap/tun.

This fixes CVE-2024-41090 and CVE-2024-41091.
====================

Link: https://patch.msgid.link/20240724170452.16837-1-dongli.zhang@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agotun: add missing verification for short frame
Dongli Zhang [Wed, 24 Jul 2024 17:04:52 +0000 (10:04 -0700)]
tun: add missing verification for short frame

The cited commit missed to check against the validity of the frame length
in the tun_xdp_one() path, which could cause a corrupted skb to be sent
downstack. Even before the skb is transmitted, the
tun_xdp_one-->eth_type_trans() may access the Ethernet header although it
can be less than ETH_HLEN. Once transmitted, this could either cause
out-of-bound access beyond the actual length, or confuse the underlayer
with incorrect or inconsistent header length in the skb metadata.

In the alternative path, tun_get_user() already prohibits short frame which
has the length less than Ethernet header size from being transmitted for
IFF_TAP.

This is to drop any frame shorter than the Ethernet header size just like
how tun_get_user() does.

CVE: CVE-2024-41091
Inspired-by: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/
Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()")
Cc: stable@vger.kernel.org
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20240724170452.16837-3-dongli.zhang@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agotap: add missing verification for short frame
Si-Wei Liu [Wed, 24 Jul 2024 17:04:51 +0000 (10:04 -0700)]
tap: add missing verification for short frame

The cited commit missed to check against the validity of the frame length
in the tap_get_user_xdp() path, which could cause a corrupted skb to be
sent downstack. Even before the skb is transmitted, the
tap_get_user_xdp()-->skb_set_network_header() may assume the size is more
than ETH_HLEN. Once transmitted, this could either cause out-of-bound
access beyond the actual length, or confuse the underlayer with incorrect
or inconsistent header length in the skb metadata.

In the alternative path, tap_get_user() already prohibits short frame which
has the length less than Ethernet header size from being transmitted.

This is to drop any frame shorter than the Ethernet header size just like
how tap_get_user() does.

CVE: CVE-2024-41090
Link: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/
Fixes: 0efac27791ee ("tap: accept an array of XDP buffs through sendmsg()")
Cc: stable@vger.kernel.org
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20240724170452.16837-2-dongli.zhang@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agomISDN: Fix a use after free in hfcmulti_tx()
Dan Carpenter [Wed, 24 Jul 2024 16:08:18 +0000 (11:08 -0500)]
mISDN: Fix a use after free in hfcmulti_tx()

Don't dereference *sp after calling dev_kfree_skb(*sp).

Fixes: af69fb3a8ffa ("Add mISDN HFC multiport driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/8be65f5a-c2dd-4ba0-8a10-bfe5980b8cfb@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agogve: Fix an edge case for TSO skb validity check
Bailey Forrest [Wed, 24 Jul 2024 14:34:31 +0000 (07:34 -0700)]
gve: Fix an edge case for TSO skb validity check

The NIC requires each TSO segment to not span more than 10
descriptors. NIC further requires each descriptor to not exceed
16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO).

The descriptors for an skb are generated by
gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format.
gve_tx_add_skb_no_copy_dqo() loops through each skb frag and
generates a descriptor for the entire frag if the frag size is
not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is
greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s)
of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for
the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO).

gve_can_send_tso() checks if the descriptors thus generated for an
skb would meet the requirement that each TSO-segment not span more
than 10 descriptors. However, the current code misses an edge case
when a TSO segment spans multiple descriptors within a large frag.
This change fixes the edge case.

gve_can_send_tso() relies on the assumption that max gso size (9728)
is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb
fragment a TSO segment can never span more than 2 descriptors.

Fixes: a57e5de476be ("gve: DQO: Add TX path")
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240724143431.3343722-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agobnxt_en: update xdp_rxq_info in queue restart logic
Taehee Yoo [Sun, 21 Jul 2024 05:35:54 +0000 (05:35 +0000)]
bnxt_en: update xdp_rxq_info in queue restart logic

When the netdev_rx_queue_restart() restarts queues, the bnxt_en driver
updates(creates and deletes) a page_pool.
But it doesn't update xdp_rxq_info, so the xdp_rxq_info is still
connected to an old page_pool.
So, bnxt_rx_ring_info->page_pool indicates a new page_pool, but
bnxt_rx_ring_info->xdp_rxq is still connected to an old page_pool.

An old page_pool is no longer used so it is supposed to be
deleted by page_pool_destroy() but it isn't.
Because the xdp_rxq_info is holding the reference count for it and the
xdp_rxq_info is not updated, an old page_pool will not be deleted in
the queue restart logic.

Before restarting 1 queue:
./tools/net/ynl/samples/page-pool
enp10s0f1np1[6] page pools: 4 (zombies: 0)
refs: 8192 bytes: 33554432 (refs: 0 bytes: 0)
recycling: 0.0% (alloc: 128:8048 recycle: 0:0)

After restarting 1 queue:
./tools/net/ynl/samples/page-pool
enp10s0f1np1[6] page pools: 5 (zombies: 0)
refs: 10240 bytes: 41943040 (refs: 0 bytes: 0)
recycling: 20.0% (alloc: 160:10080 recycle: 1920:128)

Before restarting queues, an interface has 4 page_pools.
After restarting one queue, an interface has 5 page_pools, but it
should be 4, not 5.
The reason is that queue restarting logic creates a new page_pool and
an old page_pool is not deleted due to the absence of an update of
xdp_rxq_info logic.

Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20240721053554.1233549-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 25 Jul 2024 14:40:24 +0000 (07:40 -0700)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-07-25

We've added 14 non-merge commits during the last 8 day(s) which contain
a total of 19 files changed, 177 insertions(+), 70 deletions(-).

The main changes are:

1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and
   BPF sockhash. Also add test coverage for this case, from Michal Luczaj.

2) Fix a segmentation issue when downgrading gso_size in the BPF helper
   bpf_skb_adjust_room(), from Fred Li.

3) Fix a compiler warning in resolve_btfids due to a missing type cast,
   from Liwei Song.

4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte
   boundary in the fexit_sleep BPF selftest, from Puranjay Mohan.

5) Fix a xsk regression to require a flag when actuating tx_metadata_len,
   from Stanislav Fomichev.

6) Fix function prototype BTF dumping in libbpf for prototypes that have
   no input arguments, from Andrii Nakryiko.

7) Fix stacktrace symbol resolution in perf script for BPF programs
   containing subprograms, from Hou Tao.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
  xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
  bpf: Fix a segment issue when downgrading gso_size
  tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
  bpf, events: Use prog to emit ksymbol event for main program
  selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB
  selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags
  selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected()
  af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash
  bpftool: Fix typo in usage help
  libbpf: Fix no-args func prototype BTF dumping syntax
  MAINTAINERS: Update powerpc BPF JIT maintainers
  MAINTAINERS: Update email address of Naveen
  selftests/bpf: fexit_sleep: Fix stack allocation for arm64
====================

Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agotcp: process the 3rd ACK with sk_socket for TFO/MPTCP
Matthieu Baerts (NGI0) [Wed, 24 Jul 2024 10:25:16 +0000 (12:25 +0200)]
tcp: process the 3rd ACK with sk_socket for TFO/MPTCP

The 'Fixes' commit recently changed the behaviour of TCP by skipping the
processing of the 3rd ACK when a sk->sk_socket is set. The goal was to
skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an
unnecessary ACK in case of simultaneous connect(). Unfortunately, that
had an impact on TFO and MPTCP.

I started to look at the impact on MPTCP, because the MPTCP CI found
some issues with the MPTCP Packetdrill tests [1]. Then Paolo Abeni
suggested me to look at the impact on TFO with "plain" TCP.

For MPTCP, when receiving the 3rd ACK of a request adding a new path
(MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that
has been created when the MPTCP connection got established before with
the first path. The newly added 'goto' will then skip the processing of
the segment text (step 7) and not go through tcp_data_queue() where the
MPTCP options are validated, and some actions are triggered, e.g.
sending the MPJ 4th ACK [2] as demonstrated by the new errors when
running a packetdrill test [3] establishing a second subflow.

This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be
delayed. Still, we don't want to have this behaviour as it delays the
switch to the fully established mode, and invalid MPTCP options in this
3rd ACK will not be caught any more. This modification also affects the
MPTCP + TFO feature as well, and being the reason why the selftests
started to be unstable the last few days [4].

For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer
passing: if the 3rd ACK contains data, and the connection is accept()ed
before receiving them, these data would no longer be processed, and thus
not ACKed.

One last thing about MPTCP, in case of simultaneous connect(), a
fallback to TCP will be done, which seems fine:

  `../common/defaults.sh`

   0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > S  0:0(0)                 <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 < S  0:0(0) win 1000        <mss 1460, sackOK, TS val 407 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 > S. 0:0(0) ack 1           <mss 1460, sackOK, TS val 330 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>
  +0 >  . 1:1(0) ack 1           <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1>

Simultaneous SYN-data crossing is also not supported by TFO, see [6].

Kuniyuki Iwashima suggested to restrict the processing to SYN+ACK only:
that's a more generic solution than the one initially proposed, and
also enough to fix the issues described above.

Later on, Eric Dumazet mentioned that an ACK should still be sent in
reaction to the second SYN+ACK that is received: not sending a DUPACK
here seems wrong and could hurt:

   0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > S  0:0(0)                <mss 1460, sackOK, TS val 1000 ecr 0,nop,wscale 8>
  +0 < S  0:0(0)       win 1000 <mss 1000, sackOK, nop, nop>
  +0 > S. 0:0(0) ack 1          <mss 1460, sackOK, TS val 3308134035 ecr 0,nop,wscale 8>
  +0 < S. 0:0(0) ack 1 win 1000 <mss 1000, sackOK, nop, nop>
  +0 >  . 1:1(0) ack 1          <nop, nop, sack 0:1>  // <== Here

So in this version, the 'goto consume' is dropped, to always send an ACK
when switching from TCP_SYN_RECV to TCP_ESTABLISHED. This ACK will be
seen as a DUPACK -- with DSACK if SACK has been negotiated -- in case of
simultaneous SYN crossing: that's what is expected here.

Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696
Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens
Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28
Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh
Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21
Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/client/simultaneous-fast-open.pkt
Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240724-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v3-1-d48339764ce9@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoselftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
Stanislav Fomichev [Sat, 13 Jul 2024 01:52:52 +0000 (18:52 -0700)]
selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test

This flag is now required to use tx_metadata_len.

Fixes: 40808a237d9c ("selftests/bpf: Add TX side to xdp_metadata")
Reported-by: Julian Schindel <mail@arctic-alpaca.de>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240713015253.121248-3-sdf@fomichev.me
9 months agoxsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
Stanislav Fomichev [Sat, 13 Jul 2024 01:52:51 +0000 (18:52 -0700)]
xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len

Julian reports that commit 341ac980eab9 ("xsk: Support tx_metadata_len")
can break existing use cases which don't zero-initialize xdp_umem_reg
padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we
interpret the padding as tx_metadata_len only when being explicitly
asked.

Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
Reported-by: Julian Schindel <mail@arctic-alpaca.de>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240713015253.121248-2-sdf@fomichev.me
9 months agobpf: Fix a segment issue when downgrading gso_size
Fred Li [Fri, 19 Jul 2024 02:46:53 +0000 (10:46 +0800)]
bpf: Fix a segment issue when downgrading gso_size

Linearize the skb when downgrading gso_size because it may trigger a
BUG_ON() later when the skb is segmented as described in [1,2].

Fixes: 2be7e212d5419 ("bpf: add bpf_skb_adjust_room helper")
Signed-off-by: Fred Li <dracodingfly@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/all/20240626065555.35460-2-dracodingfly@gmail.com
Link: https://lore.kernel.org/all/668d5cf1ec330_1c18c32947@willemb.c.googlers.com.notmuch
Link: https://lore.kernel.org/bpf/20240719024653.77006-1-dracodingfly@gmail.com
9 months agonet: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
Breno Leitao [Wed, 24 Jul 2024 08:05:23 +0000 (01:05 -0700)]
net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling

Move the freeing of the dummy net_device from mtk_free_dev() to
mtk_remove().

Previously, if alloc_netdev_dummy() failed in mtk_probe(),
eth->dummy_dev would be NULL. The error path would then call
mtk_free_dev(), which in turn called free_netdev() assuming dummy_dev
was allocated (but it was not), potentially causing a NULL pointer
dereference.

By moving free_netdev() to mtk_remove(), we ensure it's only called when
mtk_probe() has succeeded and dummy_dev is fully allocated. This
addresses a potential NULL pointer dereference detected by Smatch[1].

Fixes: b209bd6d0bff ("net: mediatek: mtk_eth_sock: allocate dummy net_device dynamically")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/4160f4e0-cbef-4a22-8b5d-42c4d399e1f7@stanley.mountain/ [1]
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240724080524.2734499-1-leitao@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoMerge tag 'nf-24-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 25 Jul 2024 09:17:21 +0000 (11:17 +0200)]
Merge tag 'nf-24-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a Netfilter fix for net:

Patch #1 if FPU is busy, then pipapo set backend falls back to standard
         set element lookup. Moreover, disable bh while at this.
 From Florian Westphal.

netfilter pull request 24-07-24

* tag 'nf-24-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_set_pipapo_avx2: disable softinterrupts
====================

Link: https://patch.msgid.link/20240724081305.3152-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Paolo Abeni [Thu, 25 Jul 2024 08:10:05 +0000 (10:10 +0200)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
This series contains updates to ice driver only.

Ahmed enforces the iavf per VF filter limit on ice (PF) driver to prevent
possible resource exhaustion.

Wojciech corrects assignment of l2 flags read from firmware.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Fix recipe read procedure
  ice: Add a per-VF limit on number of FDIR filters
====================

Link: https://patch.msgid.link/20240723233242.3146628-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoMerge tag 'phy-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Linus Torvalds [Wed, 24 Jul 2024 20:11:28 +0000 (13:11 -0700)]
Merge tag 'phy-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy updates from Vinod Koul:
 "New Support
   - Samsung Exynos gs101 drd combo phy
   - Qualcomm SC8180x USB uniphy, IPQ9574 QMP PCIe phy
   - Airoha EN7581 PCIe phy
   - Freescale i.MX8Q HSIO SerDes phy
   - Starfive jh7110 dphy tx

  Updates:
   - Resume support for j721e-wiz driver
   - Updates to Exynos usbdrd driver
   - Support for optional power domains in g12a usb2-phy driver
   - Debugfs support and updates to zynqmp driver"

* tag 'phy-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (56 commits)
  phy: airoha: Add dtime and Rx AEQ IO registers
  dt-bindings: phy: airoha: Add dtime and Rx AEQ IO registers
  dt-bindings: phy: rockchip-emmc-phy: Convert to dtschema
  dt-bindings: phy: qcom,qmp-usb: fix spelling error
  phy: exynos5-usbdrd: support Exynos USBDRD 3.1 combo phy (HS & SS)
  phy: exynos5-usbdrd: convert Vbus supplies to regulator_bulk
  phy: exynos5-usbdrd: convert (phy) register access clock to clk_bulk
  phy: exynos5-usbdrd: convert core clocks to clk_bulk
  phy: exynos5-usbdrd: support isolating HS and SS ports independently
  dt-bindings: phy: samsung,usb3-drd-phy: add gs101 compatible
  phy: core: Fix documentation of of_phy_get
  phy: starfive: Correct the dphy configure process
  phy: zynqmp: Add debugfs support
  phy: zynqmp: Take the phy mutex in xlate
  phy: zynqmp: Only wait for PLL lock "primary" instances
  phy: zynqmp: Store instance instead of type
  phy: zynqmp: Enable reference clock correctly
  phy: cadence-torrent: Check return value on register read
  phy: Fix the cacography in phy-exynos5250-usb2.c
  phy: phy-rockchip-samsung-hdptx: Select CONFIG_MFD_SYSCON
  ...