]> www.infradead.org Git - users/hch/misc.git/log
users/hch/misc.git
9 months agovsock/bpf: return early if transport is not assigned
Stefano Garzarella [Fri, 10 Jan 2025 08:35:08 +0000 (09:35 +0100)]
vsock/bpf: return early if transport is not assigned

Some of the core functions can only be called if the transport
has been assigned.

As Michal reported, a socket might have the transport at NULL,
for example after a failed connect(), causing the following trace:

    BUG: kernel NULL pointer dereference, address: 00000000000000a0
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
    Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
    RIP: 0010:vsock_connectible_has_data+0x1f/0x40
    Call Trace:
     vsock_bpf_recvmsg+0xca/0x5e0
     sock_recvmsg+0xb9/0xc0
     __sys_recvfrom+0xb3/0x130
     __x64_sys_recvfrom+0x20/0x30
     do_syscall_64+0x93/0x180
     entry_SYSCALL_64_after_hwframe+0x76/0x7e

So we need to check the `vsk->transport` in vsock_bpf_recvmsg(),
especially for connected sockets (stream/seqpacket) as we already
do in __vsock_connectible_recvmsg().

Fixes: 634f1a7110b4 ("vsock: support sockmap")
Cc: stable@vger.kernel.org
Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Tested-by: Michal Luczaj <mhal@rbox.co>
Reported-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Tested-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agovsock/virtio: discard packets if the transport changes
Stefano Garzarella [Fri, 10 Jan 2025 08:35:07 +0000 (09:35 +0100)]
vsock/virtio: discard packets if the transport changes

If the socket has been de-assigned or assigned to another transport,
we must discard any packets received because they are not expected
and would cause issues when we access vsk->transport.

A possible scenario is described by Hyunwoo Kim in the attached link,
where after a first connect() interrupted by a signal, and a second
connect() failed, we can find `vsk->transport` at NULL, leading to a
NULL pointer dereference.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Reported-by: Wongi Lee <qwerty@theori.io>
Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoMerge branch 'gtp-pfcp-fix-use-after-free-of-udp-tunnel-socket'
Paolo Abeni [Tue, 14 Jan 2025 10:20:06 +0000 (11:20 +0100)]
Merge branch 'gtp-pfcp-fix-use-after-free-of-udp-tunnel-socket'

Kuniyuki Iwashima says:

====================
gtp/pfcp: Fix use-after-free of UDP tunnel socket.

Xiao Liang pointed out weird netns usages in ->newlink() of
gtp and pfcp.

This series fixes the issues.

Link: https://lore.kernel.org/netdev/20250104125732.17335-1-shaw.leon@gmail.com/
Changes:
  v2:
    * Patch 1
      * Fix uninit/unused local var

  v1: https://lore.kernel.org/netdev/20250108062834.11117-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20250110014754.33847-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agopfcp: Destroy device along with udp socket's netns dismantle.
Kuniyuki Iwashima [Fri, 10 Jan 2025 01:47:54 +0000 (10:47 +0900)]
pfcp: Destroy device along with udp socket's netns dismantle.

pfcp_newlink() links the device to a list in dev_net(dev) instead
of net, where a udp tunnel socket is created.

Even when net is removed, the device stays alive on dev_net(dev).
Then, removing net triggers the splat below. [0]

In this example, pfcp0 is created in ns2, but the udp socket is
created in ns1.

  ip netns add ns1
  ip netns add ns2
  ip -n ns1 link add netns ns2 name pfcp0 type pfcp
  ip netns del ns1

Let's link the device to the socket's netns instead.

Now, pfcp_net_exit() needs another netdev iteration to remove
all pfcp devices in the netns.

pfcp_dev_list is not used under RCU, so the list API is converted
to the non-RCU variant.

pfcp_net_exit() can be converted to .exit_batch_rtnl() in net-next.

[0]:
ref_tracker: net notrefcnt@00000000128b34dc has 1/1 users at
     sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236)
     inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
     __sock_create (net/socket.c:1558)
     udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18)
     pfcp_create_sock (drivers/net/pfcp.c:168)
     pfcp_newlink (drivers/net/pfcp.c:182 drivers/net/pfcp.c:197)
     rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012)
     rtnetlink_rcv_msg (net/core/rtnetlink.c:6922)
     netlink_rcv_skb (net/netlink/af_netlink.c:2542)
     netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347)
     netlink_sendmsg (net/netlink/af_netlink.c:1891)
     ____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583)
     ___sys_sendmsg (net/socket.c:2639)
     __sys_sendmsg (net/socket.c:2669)
     do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
     entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

WARNING: CPU: 1 PID: 11 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
Modules linked in:
CPU: 1 UID: 0 PID: 11 Comm: kworker/u16:0 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179)
Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89
RSP: 0018:ff11000007f3fb60 EFLAGS: 00010286
RAX: 00000000000020ef RBX: ff1100000d6481e0 RCX: 1ffffffff0e40d82
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c
RBP: ff1100000d648230 R08: 0000000000000001 R09: fffffbfff0e395af
R10: 0000000000000001 R11: 0000000000000000 R12: ff1100000d648230
R13: dead000000000100 R14: ff1100000d648230 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005620e1363990 CR3: 000000000eeb2002 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn (kernel/panic.c:748)
 ? ref_tracker_dir_exit (lib/ref_tracker.c:179)
 ? report_bug (lib/bug.c:201 lib/bug.c:219)
 ? handle_bug (arch/x86/kernel/traps.c:285)
 ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
 ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
 ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
 ? ref_tracker_dir_exit (lib/ref_tracker.c:179)
 ? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158)
 ? kfree (mm/slub.c:4613 mm/slub.c:4761)
 net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467)
 cleanup_net (net/core/net_namespace.c:664 (discriminator 3))
 process_one_work (kernel/workqueue.c:3229)
 worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
 kthread (kernel/kthread.c:389)
 ret_from_fork (arch/x86/kernel/process.c:147)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
  </TASK>

Fixes: 76c8764ef36a ("pfcp: add PFCP module")
Reported-by: Xiao Liang <shaw.leon@gmail.com>
Closes: https://lore.kernel.org/netdev/20250104125732.17335-1-shaw.leon@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agogtp: Destroy device along with udp socket's netns dismantle.
Kuniyuki Iwashima [Fri, 10 Jan 2025 01:47:53 +0000 (10:47 +0900)]
gtp: Destroy device along with udp socket's netns dismantle.

gtp_newlink() links the device to a list in dev_net(dev) instead of
src_net, where a udp tunnel socket is created.

Even when src_net is removed, the device stays alive on dev_net(dev).
Then, removing src_net triggers the splat below. [0]

In this example, gtp0 is created in ns2, and the udp socket is created
in ns1.

  ip netns add ns1
  ip netns add ns2
  ip -n ns1 link add netns ns2 name gtp0 type gtp role sgsn
  ip netns del ns1

Let's link the device to the socket's netns instead.

Now, gtp_net_exit_batch_rtnl() needs another netdev iteration to remove
all gtp devices in the netns.

[0]:
ref_tracker: net notrefcnt@000000003d6e7d05 has 1/2 users at
     sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236)
     inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
     __sock_create (net/socket.c:1558)
     udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18)
     gtp_create_sock (./include/net/udp_tunnel.h:59 drivers/net/gtp.c:1423)
     gtp_create_sockets (drivers/net/gtp.c:1447)
     gtp_newlink (drivers/net/gtp.c:1507)
     rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012)
     rtnetlink_rcv_msg (net/core/rtnetlink.c:6922)
     netlink_rcv_skb (net/netlink/af_netlink.c:2542)
     netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347)
     netlink_sendmsg (net/netlink/af_netlink.c:1891)
     ____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583)
     ___sys_sendmsg (net/socket.c:2639)
     __sys_sendmsg (net/socket.c:2669)
     do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)

WARNING: CPU: 1 PID: 60 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
Modules linked in:
CPU: 1 UID: 0 PID: 60 Comm: kworker/u16:2 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179)
Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89
RSP: 0018:ff11000009a07b60 EFLAGS: 00010286
RAX: 0000000000002bd3 RBX: ff1100000f4e1aa0 RCX: 1ffffffff0e40ac6
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c
RBP: ff1100000f4e1af0 R08: 0000000000000001 R09: fffffbfff0e395ae
R10: 0000000000000001 R11: 0000000000036001 R12: ff1100000f4e1af0
R13: dead000000000100 R14: ff1100000f4e1af0 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9b2464bd98 CR3: 0000000005286005 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn (kernel/panic.c:748)
 ? ref_tracker_dir_exit (lib/ref_tracker.c:179)
 ? report_bug (lib/bug.c:201 lib/bug.c:219)
 ? handle_bug (arch/x86/kernel/traps.c:285)
 ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
 ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
 ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
 ? ref_tracker_dir_exit (lib/ref_tracker.c:179)
 ? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158)
 ? kfree (mm/slub.c:4613 mm/slub.c:4761)
 net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467)
 cleanup_net (net/core/net_namespace.c:664 (discriminator 3))
 process_one_work (kernel/workqueue.c:3229)
 worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
 kthread (kernel/kthread.c:389)
 ret_from_fork (arch/x86/kernel/process.c:147)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
 </TASK>

Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Reported-by: Xiao Liang <shaw.leon@gmail.com>
Closes: https://lore.kernel.org/netdev/20250104125732.17335-1-shaw.leon@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agogtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().
Kuniyuki Iwashima [Fri, 10 Jan 2025 01:47:52 +0000 (10:47 +0900)]
gtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().

gtp_newlink() links the gtp device to a list in dev_net(dev).

However, even after the gtp device is moved to another netns,
it stays on the list but should be invisible.

Let's use for_each_netdev_rcu() for netdev traversal in
gtp_genl_dump_pdp().

Note that gtp_dev_list is no longer used under RCU, so list
helpers are converted to the non-RCU variant.

Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Reported-by: Xiao Liang <shaw.leon@gmail.com>
Closes: https://lore.kernel.org/netdev/CABAhCOQdBL6h9M2C+kd+bGivRJ9Q72JUxW+-gur0nub_=PmFPA@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoudp: Make rehash4 independent in udp_lib_rehash()
Philo Lu [Fri, 10 Jan 2025 01:08:10 +0000 (09:08 +0800)]
udp: Make rehash4 independent in udp_lib_rehash()

As discussed in [0], rehash4 could be missed in udp_lib_rehash() when
udp hash4 changes while hash2 doesn't change. This patch fixes this by
moving rehash4 codes out of rehash2 checking, and then rehash2 and
rehash4 are done separately.

By doing this, we no longer need to call rehash4 explicitly in
udp_lib_hash4(), as the rehash callback in __ip4_datagram_connect takes
it. Thus, now udp_lib_hash4() returns directly if the sk is already
hashed.

Note that uhash4 may fail to work under consecutive connect(<dst
address>) calls because rehash() is not called with every connect(). To
overcome this, connect(<AF_UNSPEC>) needs to be called after the next
connect to a new destination.

[0]
https://lore.kernel.org/all/4761e466ab9f7542c68cdc95f248987d127044d2.1733499715.git.pabeni@redhat.com/

Fixes: 78c91ae2c6de ("ipv4/udp: Add 4-tuple hash for connected socket")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://patch.msgid.link/20250110010810.107145-1-lulie@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agor8169: remove redundant hwmon support
Heiner Kallweit [Thu, 9 Jan 2025 22:43:12 +0000 (23:43 +0100)]
r8169: remove redundant hwmon support

The temperature sensor is actually part of the integrated PHY and available
also on the standalone versions of the PHY. Therefore hwmon support will
be added to the Realtek PHY driver and can be removed here.

Fixes: 1ffcc8d41306 ("r8169: add support for the temperature sensor being available from RTL8125B")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/afba85f5-987b-4449-83cc-350438af7fe7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet/ncsi: fix locking in Get MAC Address handling
Paul Fertser [Thu, 9 Jan 2025 14:50:54 +0000 (17:50 +0300)]
net/ncsi: fix locking in Get MAC Address handling

Obtaining RTNL lock in a response handler is not allowed since it runs
in an atomic softirq context. Postpone setting the MAC address by adding
a dedicated step to the configuration FSM.

Fixes: 790071347a0a ("net/ncsi: change from ndo_set_mac_address to dev_set_mac_address")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20241129-potin-revert-ncsi-set-mac-addr-v1-1-94ea2cb596af@gmail.com
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Tested-by: Potin Lai <potin.lai.pt@gmail.com>
Link: https://patch.msgid.link/20250109145054.30925-1-fercerpav@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agopktgen: Avoid out-of-bounds access in get_imix_entries
Artem Chernyshev [Thu, 9 Jan 2025 08:30:39 +0000 (11:30 +0300)]
pktgen: Avoid out-of-bounds access in get_imix_entries

Passing a sufficient amount of imix entries leads to invalid access to the
pkt_dev->imix_entries array because of the incorrect boundary check.

UBSAN: array-index-out-of-bounds in net/core/pktgen.c:874:24
index 20 is out of range for type 'imix_pkt [20]'
CPU: 2 PID: 1210 Comm: bash Not tainted 6.10.0-rc1 #121
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl lib/dump_stack.c:117
__ubsan_handle_out_of_bounds lib/ubsan.c:429
get_imix_entries net/core/pktgen.c:874
pktgen_if_write net/core/pktgen.c:1063
pde_write fs/proc/inode.c:334
proc_reg_write fs/proc/inode.c:346
vfs_write fs/read_write.c:593
ksys_write fs/read_write.c:644
do_syscall_64 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:130

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 52a62f8603f9 ("pktgen: Parse internet mix (imix) input")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
[ fp: allow to fill the array completely; minor changelog cleanup ]
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoopenvswitch: fix lockup on tx to unregistering netdev with carrier
Ilya Maximets [Thu, 9 Jan 2025 12:21:24 +0000 (13:21 +0100)]
openvswitch: fix lockup on tx to unregistering netdev with carrier

Commit in a fixes tag attempted to fix the issue in the following
sequence of calls:

    do_output
    -> ovs_vport_send
       -> dev_queue_xmit
          -> __dev_queue_xmit
             -> netdev_core_pick_tx
                -> skb_tx_hash

When device is unregistering, the 'dev->real_num_tx_queues' goes to
zero and the 'while (unlikely(hash >= qcount))' loop inside the
'skb_tx_hash' becomes infinite, locking up the core forever.

But unfortunately, checking just the carrier status is not enough to
fix the issue, because some devices may still be in unregistering
state while reporting carrier status OK.

One example of such device is a net/dummy.  It sets carrier ON
on start, but it doesn't implement .ndo_stop to set the carrier off.
And it makes sense, because dummy doesn't really have a carrier.
Therefore, while this device is unregistering, it's still easy to hit
the infinite loop in the skb_tx_hash() from the OVS datapath.  There
might be other drivers that do the same, but dummy by itself is
important for the OVS ecosystem, because it is frequently used as a
packet sink for tcpdump while debugging OVS deployments.  And when the
issue is hit, the only way to recover is to reboot.

Fix that by also checking if the device is running.  The running
state is handled by the net core during unregistering, so it covers
unregistering case better, and we don't really need to send packets
to devices that are not running anyway.

While only checking the running state might be enough, the carrier
check is preserved.  The running and the carrier states seem disjoined
throughout the code and different drivers.  And other core functions
like __dev_direct_xmit() check both before attempting to transmit
a packet.  So, it seems safer to check both flags in OVS as well.

Fixes: 066b86787fa3 ("net: openvswitch: fix race on port output")
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Closes: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-January/053423.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20250109122225.4034688-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: ravb: Fix max TX frame size for RZ/V2M
Paul Barker [Thu, 9 Jan 2025 11:37:06 +0000 (11:37 +0000)]
net: ravb: Fix max TX frame size for RZ/V2M

When tx_max_frame_size was added to struct ravb_hw_info, no value was
set in ravb_rzv2m_hw_info so the default value of zero was used.

The maximum MTU is set by subtracting from tx_max_frame_size to allow
space for headers and frame checksums. As ndev->max_mtu is unsigned,
this subtraction wraps around leading to a ridiculously large positive
value that is obviously incorrect.

Before tx_max_frame_size was introduced, the maximum MTU was based on
rx_max_frame_size. So, we can restore the correct maximum MTU by copying
the rx_max_frame_size value into tx_max_frame_size for RZ/V2M.

Fixes: 1d63864299ca ("net: ravb: Fix maximum TX frame size for GbEth devices")
Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://patch.msgid.link/20250109113706.1409149-1-paul.barker.ct@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: mana: Cleanup "mana" debugfs dir after cleanup of all children
Shradha Gupta [Thu, 9 Jan 2025 05:03:11 +0000 (21:03 -0800)]
net: mana: Cleanup "mana" debugfs dir after cleanup of all children

In mana_driver_exit(), mana_debugfs_root gets cleanup before any of it's
children (which happens later in the pci_unregister_driver()).
Due to this, when mana driver is configured as a module and rmmod is
invoked, following stack gets printed along with failure in rmmod command.

[ 2399.317651] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ 2399.318657] #PF: supervisor write access in kernel mode
[ 2399.319057] #PF: error_code(0x0002) - not-present page
[ 2399.319528] PGD 10eb68067 P4D 0
[ 2399.319914] Oops: Oops: 0002 [#1] SMP NOPTI
[ 2399.320308] CPU: 72 UID: 0 PID: 5815 Comm: rmmod Not tainted 6.13.0-rc5+ #89
[ 2399.320986] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
[ 2399.321892] RIP: 0010:down_write+0x1a/0x50
[ 2399.322303] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc e8 9d cd ff ff 31 c0 ba 01 00 00 00 <f0> 49 0f b1 14 24 75 17 65 48 8b 05 f6 84 dd 5f 49 89 44 24 08 4c
[ 2399.323669] RSP: 0018:ff53859d6c663a70 EFLAGS: 00010246
[ 2399.324061] RAX: 0000000000000000 RBX: ff1d4eb505060180 RCX: ffffff8100000000
[ 2399.324620] RDX: 0000000000000001 RSI: 0000000000000064 RDI: 0000000000000098
[ 2399.325167] RBP: ff53859d6c663a78 R08: 00000000000009c4 R09: ff1d4eb4fac90000
[ 2399.325681] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000098
[ 2399.326185] R13: ff1d4e42e1a4a0c8 R14: ff1d4eb538ce0000 R15: 0000000000000098
[ 2399.326755] FS:  00007fe729570000(0000) GS:ff1d4eb2b7200000(0000) knlGS:0000000000000000
[ 2399.327269] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2399.327690] CR2: 0000000000000098 CR3: 00000001c0584005 CR4: 0000000000373ef0
[ 2399.328166] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2399.328623] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 2399.329055] Call Trace:
[ 2399.329243]  <TASK>
[ 2399.329379]  ? show_regs+0x69/0x80
[ 2399.329602]  ? __die+0x25/0x70
[ 2399.329856]  ? page_fault_oops+0x271/0x550
[ 2399.330088]  ? psi_group_change+0x217/0x470
[ 2399.330341]  ? do_user_addr_fault+0x455/0x7b0
[ 2399.330667]  ? finish_task_switch.isra.0+0x91/0x2f0
[ 2399.331004]  ? exc_page_fault+0x73/0x160
[ 2399.331275]  ? asm_exc_page_fault+0x27/0x30
[ 2399.343324]  ? down_write+0x1a/0x50
[ 2399.343631]  simple_recursive_removal+0x4d/0x2c0
[ 2399.343977]  ? __pfx_remove_one+0x10/0x10
[ 2399.344251]  debugfs_remove+0x45/0x70
[ 2399.344511]  mana_destroy_rxq+0x44/0x400 [mana]
[ 2399.344845]  mana_destroy_vport+0x54/0x1c0 [mana]
[ 2399.345229]  mana_detach+0x2f1/0x4e0 [mana]
[ 2399.345466]  ? ida_free+0x150/0x160
[ 2399.345718]  ? __cond_resched+0x1a/0x50
[ 2399.345987]  mana_remove+0xf4/0x1a0 [mana]
[ 2399.346243]  mana_gd_remove+0x25/0x80 [mana]
[ 2399.346605]  pci_device_remove+0x41/0xb0
[ 2399.346878]  device_remove+0x46/0x70
[ 2399.347150]  device_release_driver_internal+0x1e3/0x250
[ 2399.347831]  ? klist_remove+0x81/0xe0
[ 2399.348377]  driver_detach+0x4b/0xa0
[ 2399.348906]  bus_remove_driver+0x83/0x100
[ 2399.349435]  driver_unregister+0x31/0x60
[ 2399.349919]  pci_unregister_driver+0x40/0x90
[ 2399.350492]  mana_driver_exit+0x1c/0xb50 [mana]
[ 2399.351102]  __do_sys_delete_module.constprop.0+0x184/0x320
[ 2399.351664]  ? __fput+0x1a9/0x2d0
[ 2399.352200]  __x64_sys_delete_module+0x12/0x20
[ 2399.352760]  x64_sys_call+0x1e66/0x2140
[ 2399.353316]  do_syscall_64+0x79/0x150
[ 2399.353813]  ? syscall_exit_to_user_mode+0x49/0x230
[ 2399.354346]  ? do_syscall_64+0x85/0x150
[ 2399.354816]  ? irqentry_exit+0x1d/0x30
[ 2399.355287]  ? exc_page_fault+0x7f/0x160
[ 2399.355756]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 2399.356302] RIP: 0033:0x7fe728d26aeb
[ 2399.356776] Code: 73 01 c3 48 8b 0d 45 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 33 0f 00 f7 d8 64 89 01 48
[ 2399.358372] RSP: 002b:00007ffff954d6f8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 2399.359066] RAX: ffffffffffffffda RBX: 00005609156cc760 RCX: 00007fe728d26aeb
[ 2399.359779] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005609156cc7c8
[ 2399.360535] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2399.361261] R10: 00007fe728dbeac0 R11: 0000000000000206 R12: 00007ffff954d950
[ 2399.361952] R13: 00005609156cc2a0 R14: 00007ffff954ee5f R15: 00005609156cc760
[ 2399.362688]  </TASK>

Fixes: 6607c17c6c5e ("net: mana: Enable debugfs files for MANA device")
Cc: stable@vger.kernel.org
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1736398991-764-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoxsk: Bring back busy polling support
Stanislav Fomichev [Thu, 9 Jan 2025 00:34:36 +0000 (16:34 -0800)]
xsk: Bring back busy polling support

Commit 86e25f40aa1e ("net: napi: Add napi_config") moved napi->napi_id
assignment to a later point in time (napi_hash_add_with_id). This breaks
__xdp_rxq_info_reg which copies napi_id at an earlier time and now
stores 0 napi_id. It also makes sk_mark_napi_id_once_xdp and
__sk_mark_napi_id_once useless because they now work against 0 napi_id.
Since sk_busy_loop requires valid napi_id to busy-poll on, there is no way
to busy-poll AF_XDP sockets anymore.

Bring back the ability to busy-poll on XSK by resolving socket's napi_id
at bind time. This relies on relatively recent netif_queue_set_napi,
but (assume) at this point most popular drivers should have been converted.
This also removes per-tx/rx cycles which used to check and/or set
the napi_id value.

Confirmed by running a busy-polling AF_XDP socket
(github.com/fomichev/xskrtt) on mlx5 and looking at BusyPollRxPackets
from /proc/net/netstat.

Fixes: 86e25f40aa1e ("net: napi: Add napi_config")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250109003436.2829560-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoeth: bnxt: always recalculate features after XDP clearing, fix null-deref
Jakub Kicinski [Thu, 9 Jan 2025 04:30:57 +0000 (20:30 -0800)]
eth: bnxt: always recalculate features after XDP clearing, fix null-deref

Recalculate features when XDP is detached.

Before:
  # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp
  # ip li set dev eth0 xdp off
  # ethtool -k eth0 | grep gro
  rx-gro-hw: off [requested on]

After:
  # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp
  # ip li set dev eth0 xdp off
  # ethtool -k eth0 | grep gro
  rx-gro-hw: on

The fact that HW-GRO doesn't get re-enabled automatically is just
a minor annoyance. The real issue is that the features will randomly
come back during another reconfiguration which just happens to invoke
netdev_update_features(). The driver doesn't handle reconfiguring
two things at a time very robustly.

Starting with commit 98ba1d931f61 ("bnxt_en: Fix RSS logic in
__bnxt_reserve_rings()") we only reconfigure the RSS hash table
if the "effective" number of Rx rings has changed. If HW-GRO is
enabled "effective" number of rings is 2x what user sees.
So if we are in the bad state, with HW-GRO re-enablement "pending"
after XDP off, and we lower the rings by / 2 - the HW-GRO rings
doing 2x and the ethtool -L doing / 2 may cancel each other out,
and the:

  if (old_rx_rings != bp->hw_resc.resv_rx_rings &&

condition in __bnxt_reserve_rings() will be false.
The RSS map won't get updated, and we'll crash with:

  BUG: kernel NULL pointer dereference, address: 0000000000000168
  RIP: 0010:__bnxt_hwrm_vnic_set_rss+0x13a/0x1a0
    bnxt_hwrm_vnic_rss_cfg_p5+0x47/0x180
    __bnxt_setup_vnic_p5+0x58/0x110
    bnxt_init_nic+0xb72/0xf50
    __bnxt_open_nic+0x40d/0xab0
    bnxt_open_nic+0x2b/0x60
    ethtool_set_channels+0x18c/0x1d0

As we try to access a freed ring.

The issue is present since XDP support was added, really, but
prior to commit 98ba1d931f61 ("bnxt_en: Fix RSS logic in
__bnxt_reserve_rings()") it wasn't causing major issues.

Fixes: 1054aee82321 ("bnxt_en: Use NETIF_F_GRO_HW.")
Fixes: 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250109043057.2888953-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agobpf: Fix bpf_sk_select_reuseport() memory leak
Michal Luczaj [Fri, 10 Jan 2025 13:21:55 +0000 (14:21 +0100)]
bpf: Fix bpf_sk_select_reuseport() memory leak

As pointed out in the original comment, lookup in sockmap can return a TCP
ESTABLISHED socket. Such TCP socket may have had SO_ATTACH_REUSEPORT_EBPF
set before it was ESTABLISHED. In other words, a non-NULL sk_reuseport_cb
does not imply a non-refcounted socket.

Drop sk's reference in both error paths.

unreferenced object 0xffff888101911800 (size 2048):
  comm "test_progs", pid 44109, jiffies 4297131437
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    80 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 9336483b):
    __kmalloc_noprof+0x3bf/0x560
    __reuseport_alloc+0x1d/0x40
    reuseport_alloc+0xca/0x150
    reuseport_attach_prog+0x87/0x140
    sk_reuseport_attach_bpf+0xc8/0x100
    sk_setsockopt+0x1181/0x1990
    do_sock_setsockopt+0x12b/0x160
    __sys_setsockopt+0x7b/0xc0
    __x64_sys_setsockopt+0x1b/0x30
    do_syscall_64+0x93/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 64d85290d79c ("bpf: Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250110-reuseport-memleak-v1-1-fa1ddab0adfe@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoipv4: route: fix drop reason being overridden in ip_route_input_slow
Antoine Tenart [Wed, 8 Jan 2025 16:57:15 +0000 (17:57 +0100)]
ipv4: route: fix drop reason being overridden in ip_route_input_slow

When jumping to 'martian_destination' a drop reason is always set but
that label falls-through the 'e_nobufs' one, overriding the value.

The behavior was introduced by the mentioned commit. The logic went
from,

goto martian_destination;
...
  martian_destination:
...
  e_inval:
err = -EINVAL;
goto out;
  e_nobufs:
err = -ENOBUFS;
goto out;

to,

reason = ...;
goto martian_destination;
...
  martian_destination:
...
  e_nobufs:
reason = SKB_DROP_REASON_NOMEM;
goto out;

A 'goto out' is clearly missing now after 'martian_destination' to avoid
overriding the drop reason.

Fixes: 5b92112acd8e ("net: ip: make ip_route_input_slow() return drop reasons")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250108165725.404564-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()
Sudheer Kumar Doredla [Wed, 8 Jan 2025 17:24:33 +0000 (22:54 +0530)]
net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()

CPSW ALE has 75-bit ALE entries stored across three 32-bit words.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions support
ALE field entries spanning up to two words at the most.

The cpsw_ale_get_field() and cpsw_ale_set_field() functions work as
expected when ALE field spanned across word1 and word2, but fails when
ALE field spanned across word2 and word3.

For example, while reading the ALE field spanned across word2 and word3
(i.e. bits 62 to 64), the word3 data shifted to an incorrect position
due to the index becoming zero while flipping.
The same issue occurred when setting an ALE entry.

This issue has not been seen in practice but will be an issue in the future
if the driver supports accessing ALE fields spanning word2 and word3

Fix the methods to handle getting/setting fields spanning up to two words.

Fixes: b685f1a58956 ("net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()")
Signed-off-by: Sudheer Kumar Doredla <s-doredla@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20250108172433.311694-1-s-doredla@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge tag 'net-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 9 Jan 2025 20:40:58 +0000 (12:40 -0800)]
Merge tag 'net-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, Bluetooth and WPAN.

  No outstanding fixes / investigations at this time.

  Current release - new code bugs:

   - eth: fbnic: revert HWMON support, it doesn't work at all and revert
     is similar size as the fixes

  Previous releases - regressions:

   - tcp: allow a connection when sk_max_ack_backlog is zero

   - tls: fix tls_sw_sendmsg error handling

  Previous releases - always broken:

   - netdev netlink family:
       - prevent accessing NAPI instances from another namespace
       - don't dump Tx and uninitialized NAPIs

   - net: sysctl: avoid using current->nsproxy, fix null-deref if task
     is exiting and stick to opener's netns

   - sched: sch_cake: add bounds checks to host bulk flow fairness
     counts

  Misc:

   - annual cleanup of inactive maintainers"

* tag 'net-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
  sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
  sctp: sysctl: udp_port: avoid using current->nsproxy
  sctp: sysctl: auth_enable: avoid using current->nsproxy
  sctp: sysctl: rto_min/max: avoid using current->nsproxy
  sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
  mptcp: sysctl: blackhole timeout: avoid using current->nsproxy
  mptcp: sysctl: sched: avoid using current->nsproxy
  mptcp: sysctl: avail sched: remove write access
  MAINTAINERS: remove Lars Povlsen from Microchip Sparx5 SoC
  MAINTAINERS: remove Noam Dagan from AMAZON ETHERNET
  MAINTAINERS: remove Ying Xue from TIPC
  MAINTAINERS: remove Mark Lee from MediaTek Ethernet
  MAINTAINERS: mark stmmac ethernet as an Orphan
  MAINTAINERS: remove Andy Gospodarek from bonding
  MAINTAINERS: update maintainers for Microchip LAN78xx
  MAINTAINERS: mark Synopsys DW XPCS as Orphan
  net/mlx5: Fix variable not being completed when function returns
  rtase: Fix a check for error in rtase_alloc_msix()
  net: stmmac: dwmac-tegra: Read iommu stream id from device tree
  ...

9 months agoMerge tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 9 Jan 2025 18:16:45 +0000 (10:16 -0800)]
Merge tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes.

  Besides the one-liners in Btrfs there's fix to the io_uring and
  encoded read integration (added in this development cycle). The update
  to io_uring provides more space for the ongoing command that is then
  used in Btrfs to handle some cases.

   - io_uring and encoded read:
       - provide stable storage for io_uring command data
       - make a copy of encoded read ioctl call, reuse that in case the
         call would block and will be called again

   - properly initialize zlib context for hardware compression on s390

   - fix max extent size calculation on filesystems with non-zoned
     devices

   - fix crash in scrub on crafted image due to invalid extent tree"

* tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path
  btrfs: zoned: calculate max_extent_size properly on non-zoned setup
  btrfs: avoid NULL pointer dereference if no valid extent tree
  btrfs: don't read from userspace twice in btrfs_uring_encoded_read()
  io_uring: add io_uring_cmd_get_async_data helper
  io_uring/cmd: add per-op data to struct io_uring_cmd_data
  io_uring/cmd: rename struct uring_cache to io_uring_cmd_data

9 months agoMerge tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 9 Jan 2025 16:54:49 +0000 (08:54 -0800)]
Merge tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix imbalance between flowtable BIND and UNBIND calls to configure
   hardware offload, this fixes a possible kmemleak.

2) Clamp maximum conntrack hashtable size to INT_MAX to fix a possible
   WARN_ON_ONCE splat coming from kvmalloc_array(), only possible from
   init_netns.

* tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: conntrack: clamp maximum hashtable size to INT_MAX
  netfilter: nf_tables: imbalance in flowtable binding
====================

Link: https://patch.msgid.link/20250109123532.41768-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge branch 'net-sysctl-avoid-using-current-nsproxy'
Jakub Kicinski [Thu, 9 Jan 2025 16:53:37 +0000 (08:53 -0800)]
Merge branch 'net-sysctl-avoid-using-current-nsproxy'

Matthieu Baerts says:

====================
net: sysctl: avoid using current->nsproxy

As pointed out by Al Viro and Eric Dumazet in [1], using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns as it is usually done. This could cause
  unexpected issues when other operations are done on the wrong netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' or 'pernet' structure can be obtained from the table->data
using container_of().

Note that table->data could also be used directly in more places, but
that would increase the size of this fix to replace all accesses via
'net'. Probably best to avoid that for fixes.

Patches 2-9 remove access of net via current->nsproxy in sysfs handlers
in MPTCP, SCTP and RDS. There are multiple patches doing almost the same
thing, but the reason is to ease the backports.

Patch 1 is not directly linked to this, but it is a small fix for MPTCP
available_schedulers sysctl knob to explicitly mark it as read-only.

Please note that this series does not address Al's comment [2]. In SCTP,
some sysctl knobs set other sysfs-exposed variables for the min/max: two
processes could then write two linked values at the same time, resulting
in new values being outside the new boundaries. It would be great if
SCTP developers can look at this problem.

Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Link: https://lore.kernel.org/20250105211158.GL1977892@ZenIV
====================

Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-0-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agords: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
Matthieu Baerts (NGI0) [Wed, 8 Jan 2025 15:34:37 +0000 (16:34 +0100)]
rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy

As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).

Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agosctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
Matthieu Baerts (NGI0) [Wed, 8 Jan 2025 15:34:36 +0000 (16:34 +0100)]
sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy

As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.probe_interval' is
used.

Fixes: d1e462a7a5f3 ("sctp: add probe_interval in sysctl and sock/asoc/transport")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-8-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agosctp: sysctl: udp_port: avoid using current->nsproxy
Matthieu Baerts (NGI0) [Wed, 8 Jan 2025 15:34:35 +0000 (16:34 +0100)]
sctp: sysctl: udp_port: avoid using current->nsproxy

As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.

Fixes: 046c052b475e ("sctp: enable udp tunneling socks")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-7-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agosctp: sysctl: auth_enable: avoid using current->nsproxy
Matthieu Baerts (NGI0) [Wed, 8 Jan 2025 15:34:34 +0000 (16:34 +0100)]
sctp: sysctl: auth_enable: avoid using current->nsproxy

As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.

Fixes: b14878ccb7fa ("net: sctp: cache auth_enable per endpoint")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-6-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agosctp: sysctl: rto_min/max: avoid using current->nsproxy
Matthieu Baerts (NGI0) [Wed, 8 Jan 2025 15:34:33 +0000 (16:34 +0100)]
sctp: sysctl: rto_min/max: avoid using current->nsproxy

As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.

Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-5-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agosctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
Matthieu Baerts (NGI0) [Wed, 8 Jan 2025 15:34:32 +0000 (16:34 +0100)]
sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy

As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.sctp_hmac_alg' is
used.

Fixes: 3c68198e7511 ("sctp: Make hmac algorithm selection for cookie generation dynamic")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-4-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agomptcp: sysctl: blackhole timeout: avoid using current->nsproxy
Matthieu Baerts (NGI0) [Wed, 8 Jan 2025 15:34:31 +0000 (16:34 +0100)]
mptcp: sysctl: blackhole timeout: avoid using current->nsproxy

As mentioned in the previous commit, using the 'net' structure via
'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'pernet' structure can be obtained from the table->data using
container_of().

Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-3-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agomptcp: sysctl: sched: avoid using current->nsproxy
Matthieu Baerts (NGI0) [Wed, 8 Jan 2025 15:34:30 +0000 (16:34 +0100)]
mptcp: sysctl: sched: avoid using current->nsproxy

Using the 'net' structure via 'current' is not recommended for different
reasons.

First, if the goal is to use it to read or write per-netns data, this is
inconsistent with how the "generic" sysctl entries are doing: directly
by only using pointers set to the table entry, e.g. table->data. Linked
to that, the per-netns data should always be obtained from the table
linked to the netns it had been created for, which may not coincide with
the reader's or writer's netns.

Another reason is that access to current->nsproxy->netns can oops if
attempted when current->nsproxy had been dropped when the current task
is exiting. This is what syzbot found, when using acct(2):

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
  CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206

  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
   __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
   __kernel_write+0xf6/0x140 fs/read_write.c:632
   do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
   acct_pin_kill+0x2d/0x100 kernel/acct.c:192
   pin_kill+0x194/0x7c0 fs/fs_pin.c:44
   mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
   cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
   task_work_run+0x14e/0x250 kernel/task_work.c:239
   exit_task_work include/linux/task_work.h:43 [inline]
   do_exit+0xad8/0x2d70 kernel/exit.c:938
   do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
   get_signal+0x2576/0x2610 kernel/signal.c:3017
   arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
   exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
   do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7fee3cb87a6a
  Code: Unable to access opcode bytes at 0x7fee3cb87a40.
  RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
  RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
  RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
  RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
  R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
  R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
   </TASK>
  Modules linked in:
  ---[ end trace 0000000000000000 ]---
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  ----------------
  Code disassembly (best guess), 1 bytes skipped:
     0: 42 80 3c 38 00        cmpb   $0x0,(%rax,%r15,1)
     5: 0f 85 fe 02 00 00     jne    0x309
     b: 4d 8b a4 24 08 09 00  mov    0x908(%r12),%r12
    12: 00
    13: 48 b8 00 00 00 00 00  movabs $0xdffffc0000000000,%rax
    1a: fc ff df
    1d: 49 8d 7c 24 28        lea    0x28(%r12),%rdi
    22: 48 89 fa              mov    %rdi,%rdx
    25: 48 c1 ea 03           shr    $0x3,%rdx
  * 29: 80 3c 02 00           cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
    2d: 0f 85 cc 02 00 00     jne    0x2ff
    33: 4d 8b 7c 24 28        mov    0x28(%r12),%r15
    38: 48                    rex.W
    39: 8d                    .byte 0x8d
    3a: 84 24 c8              test   %ah,(%rax,%rcx,8)

Here with 'net.mptcp.scheduler', the 'net' structure is not really
needed, because the table->data already has a pointer to the current
scheduler, the only thing needed from the per-netns data.
Simply use 'data', instead of getting (most of the time) the same thing,
but from a longer and indirect way.

Fixes: 6963c508fd7a ("mptcp: only allow set existing scheduler for net.mptcp.scheduler")
Cc: stable@vger.kernel.org
Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-2-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agomptcp: sysctl: avail sched: remove write access
Matthieu Baerts (NGI0) [Wed, 8 Jan 2025 15:34:29 +0000 (16:34 +0100)]
mptcp: sysctl: avail sched: remove write access

'net.mptcp.available_schedulers' sysctl knob is there to list available
schedulers, not to modify this list.

There are then no reasons to give write access to it.

Nothing would have been written anyway, but no errors would have been
returned, which is unexpected.

Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-1-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge branch 'maintainers-spring-2025-cleanup-of-networking-maintainers'
Jakub Kicinski [Thu, 9 Jan 2025 16:30:04 +0000 (08:30 -0800)]
Merge branch 'maintainers-spring-2025-cleanup-of-networking-maintainers'

Jakub Kicinski says:

====================
MAINTAINERS: spring 2025 cleanup of networking maintainers

Annual cleanup of inactive maintainers. To identify inactive maintainers
we use Jon Corbet's maintainer analysis script from gitdm, and some manual
scanning of lore.

v1: https://lore.kernel.org/20250106165404.1832481-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250108155242.2575530-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMAINTAINERS: remove Lars Povlsen from Microchip Sparx5 SoC
Jakub Kicinski [Wed, 8 Jan 2025 15:52:42 +0000 (07:52 -0800)]
MAINTAINERS: remove Lars Povlsen from Microchip Sparx5 SoC

We have not seen emails or tags from Lars in almost 4 years.
Steen and Daniel are pretty active, but the review coverage
isn't stellar (35% of changes go in without a review tag).

Subsystem ARM/Microchip Sparx5 SoC support
  Changes 28 / 79 (35%)
  Last activity: 2024-11-24
  Lars Povlsen <lars.povlsen@microchip.com>:
  Steen Hegelund <Steen.Hegelund@microchip.com>:
    Tags 6c7c4b91aa43 2024-04-08 00:00:00 15
  Daniel Machon <daniel.machon@microchip.com>:
    Author 48ba00da2eb4 2024-04-09 00:00:00 2
    Tags f164b296638d 2024-11-24 00:00:00 6
  Top reviewers:
    [7]: horms@kernel.org
    [1]: jacob.e.keller@intel.com
    [1]: jensemil.schulzostergaard@microchip.com
    [1]: horatiu.vultur@microchip.com
  INACTIVE MAINTAINER Lars Povlsen <lars.povlsen@microchip.com>

Acked-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20250108155242.2575530-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMAINTAINERS: remove Noam Dagan from AMAZON ETHERNET
Jakub Kicinski [Wed, 8 Jan 2025 15:52:41 +0000 (07:52 -0800)]
MAINTAINERS: remove Noam Dagan from AMAZON ETHERNET

Noam Dagan was added to ENA reviewers in 2021, we have not seen
a single email from this person to any list, ever (according to lore).
Git history mentions the name in 2 SoB tags from 2020.

Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Link: https://patch.msgid.link/20250108155242.2575530-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMAINTAINERS: remove Ying Xue from TIPC
Jakub Kicinski [Wed, 8 Jan 2025 15:52:40 +0000 (07:52 -0800)]
MAINTAINERS: remove Ying Xue from TIPC

There is a steady stream of fixes for TIPC, even tho the development
has slowed down a lot. Over last 2 years we have merged almost 70
TIPC patches, but we haven't heard from Ying Xue once:

Subsystem TIPC NETWORK LAYER
  Changes 42 / 69 (60%)
  Last activity: 2023-10-04
  Jon Maloy <jmaloy@redhat.com>:
    Tags 08e50cf07184 2023-10-04 00:00:00 6
  Ying Xue <ying.xue@windriver.com>:
  Top reviewers:
    [9]: horms@kernel.org
    [8]: tung.q.nguyen@dektech.com.au
    [4]: jiri@nvidia.com
    [3]: tung.q.nguyen@endava.com
    [2]: kuniyu@amazon.com
  INACTIVE MAINTAINER Ying Xue <ying.xue@windriver.com>

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250108155242.2575530-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMAINTAINERS: remove Mark Lee from MediaTek Ethernet
Jakub Kicinski [Wed, 8 Jan 2025 15:52:39 +0000 (07:52 -0800)]
MAINTAINERS: remove Mark Lee from MediaTek Ethernet

The mailing lists have seen no email from Mark Lee in the last 4 years.

gitdm missingmaints says:

Subsystem MEDIATEK ETHERNET DRIVER
  Changes 103 / 400 (25%)
  Last activity: 2024-12-19
  Felix Fietkau <nbd@nbd.name>:
    Author 88806efc034a 2024-10-17 00:00:00 44
    Tags 88806efc034a 2024-10-17 00:00:00 51
  Sean Wang <sean.wang@mediatek.com>:
    Tags a5d75538295b 2020-04-07 00:00:00 1
  Mark Lee <Mark-MC.Lee@mediatek.com>:
  Lorenzo Bianconi <lorenzo@kernel.org>:
    Author 0c7469ee718e 2024-12-19 00:00:00 123
    Tags 0c7469ee718e 2024-12-19 00:00:00 139
  Top reviewers:
    [32]: horms@kernel.org
    [15]: leonro@nvidia.com
    [9]: andrew@lunn.ch
  INACTIVE MAINTAINER Mark Lee <Mark-MC.Lee@mediatek.com>

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250108155242.2575530-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMAINTAINERS: mark stmmac ethernet as an Orphan
Jakub Kicinski [Wed, 8 Jan 2025 15:52:38 +0000 (07:52 -0800)]
MAINTAINERS: mark stmmac ethernet as an Orphan

I tried a couple of things to reinvigorate the stmmac maintainers
over the last few years but with little effect. The maintainers
are not active, let the MAINTAINERS file reflect reality.
The Synopsys IP this driver supports is very popular we need
a solid maintainer to deal with the complexity of the driver.

gitdm missingmaints says:

Subsystem STMMAC ETHERNET DRIVER
  Changes 344 / 978 (35%)
  Last activity: 2020-05-01
  Alexandre Torgue <alexandre.torgue@foss.st.com>:
    Tags 1bb694e20839 2020-05-01 00:00:00 1
  Jose Abreu <joabreu@synopsys.com>:
  Top reviewers:
    [75]: horms@kernel.org
    [49]: andrew@lunn.ch
    [46]: fancer.lancer@gmail.com
  INACTIVE MAINTAINER Jose Abreu <joabreu@synopsys.com>

Acked-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Link: https://patch.msgid.link/20250108155242.2575530-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMAINTAINERS: remove Andy Gospodarek from bonding
Jakub Kicinski [Wed, 8 Jan 2025 15:52:37 +0000 (07:52 -0800)]
MAINTAINERS: remove Andy Gospodarek from bonding

Andy does not participate much in bonding reviews, unfortunately.
Move him to CREDITS.

gitdm missingmaint says:

Subsystem BONDING DRIVER
  Changes 149 / 336 (44%)
  Last activity: 2024-09-05
  Jay Vosburgh <jv@jvosburgh.net>:
    Tags 68db604e16d5 2024-09-05 00:00:00 8
  Andy Gospodarek <andy@greyhouse.net>:
  Top reviewers:
    [65]: jay.vosburgh@canonical.com
    [23]: liuhangbin@gmail.com
    [16]: razor@blackwall.org
  INACTIVE MAINTAINER Andy Gospodarek <andy@greyhouse.net>

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250108155242.2575530-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMAINTAINERS: update maintainers for Microchip LAN78xx
Jakub Kicinski [Wed, 8 Jan 2025 15:52:36 +0000 (07:52 -0800)]
MAINTAINERS: update maintainers for Microchip LAN78xx

Woojung Huh seems to have only replied to the list 35 times
in the last 5 years, and didn't provide any reviews in 3 years.
The LAN78XX driver has seen quite a bit of activity lately.

gitdm missingmaints says:

Subsystem USB LAN78XX ETHERNET DRIVER
  Changes 35 / 91 (38%)
  (No activity)
  Top reviewers:
    [23]: andrew@lunn.ch
    [3]: horms@kernel.org
    [2]: mateusz.polchlopek@intel.com
  INACTIVE MAINTAINER Woojung Huh <woojung.huh@microchip.com>

Move Woojung to CREDITS and add new maintainers who are more
likely to review LAN78xx patches.

Acked-by: Woojung Huh <woojung.huh@microchip.com>
Acked-by: Rengarajan Sundararajan <rengarajan.s@microchip.com>
Link: https://patch.msgid.link/20250108155242.2575530-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMAINTAINERS: mark Synopsys DW XPCS as Orphan
Jakub Kicinski [Wed, 8 Jan 2025 15:52:35 +0000 (07:52 -0800)]
MAINTAINERS: mark Synopsys DW XPCS as Orphan

There's not much review support from Jose, there is a sharp
drop in his participation around 4 years ago.
The DW XPCS IP is very popular and the driver requires active
maintenance.

gitdm missingmaints says:

Subsystem SYNOPSYS DESIGNWARE ETHERNET XPCS DRIVER
  Changes 33 / 94 (35%)
  (No activity)
  Top reviewers:
    [16]: andrew@lunn.ch
    [12]: vladimir.oltean@nxp.com
    [2]: f.fainelli@gmail.com
  INACTIVE MAINTAINER Jose Abreu <Jose.Abreu@synopsys.com>

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250108155242.2575530-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet/mlx5: Fix variable not being completed when function returns
Chenguang Zhao [Wed, 8 Jan 2025 03:00:09 +0000 (11:00 +0800)]
net/mlx5: Fix variable not being completed when function returns

When cmd_alloc_index(), fails cmd_work_handler() needs
to complete ent->slotted before returning early.
Otherwise the task which issued the command may hang:

   mlx5_core 0000:01:00.0: cmd_work_handler:877:(pid 3880418): failed to allocate command entry
   INFO: task kworker/13:2:4055883 blocked for more than 120 seconds.
         Not tainted 4.19.90-25.44.v2101.ky10.aarch64 #1
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   kworker/13:2    D    0 4055883      2 0x00000228
   Workqueue: events mlx5e_tx_dim_work [mlx5_core]
   Call trace:
      __switch_to+0xe8/0x150
      __schedule+0x2a8/0x9b8
      schedule+0x2c/0x88
      schedule_timeout+0x204/0x478
      wait_for_common+0x154/0x250
      wait_for_completion+0x28/0x38
      cmd_exec+0x7a0/0xa00 [mlx5_core]
      mlx5_cmd_exec+0x54/0x80 [mlx5_core]
      mlx5_core_modify_cq+0x6c/0x80 [mlx5_core]
      mlx5_core_modify_cq_moderation+0xa0/0xb8 [mlx5_core]
      mlx5e_tx_dim_work+0x54/0x68 [mlx5_core]
      process_one_work+0x1b0/0x448
      worker_thread+0x54/0x468
      kthread+0x134/0x138
      ret_from_fork+0x10/0x18

Fixes: 485d65e13571 ("net/mlx5: Add a timeout to acquire the command queue semaphore")
Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250108030009.68520-1-zhaochenguang@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agortase: Fix a check for error in rtase_alloc_msix()
Dan Carpenter [Wed, 8 Jan 2025 09:15:53 +0000 (12:15 +0300)]
rtase: Fix a check for error in rtase_alloc_msix()

The pci_irq_vector() function never returns zero.  It returns negative
error codes or a positive non-zero IRQ number.  Fix the error checking to
test for negatives.

Fixes: a36e9f5cfe9e ("rtase: Add support for a pci table in this module")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/f2ecc88d-af13-4651-9820-7cc665230019@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: stmmac: dwmac-tegra: Read iommu stream id from device tree
Parker Newman [Tue, 7 Jan 2025 21:24:59 +0000 (16:24 -0500)]
net: stmmac: dwmac-tegra: Read iommu stream id from device tree

Nvidia's Tegra MGBE controllers require the IOMMU "Stream ID" (SID) to be
written to the MGBE_WRAP_AXI_ASID0_CTRL register.

The current driver is hard coded to use MGBE0's SID for all controllers.
This causes softirq time outs and kernel panics when using controllers
other than MGBE0.

Example dmesg errors when an ethernet cable is connected to MGBE1:

[  116.133290] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[  121.851283] tegra-mgbe 6910000.ethernet eth1: NETDEV WATCHDOG: CPU: 5: transmit queue 0 timed out 5690 ms
[  121.851782] tegra-mgbe 6910000.ethernet eth1: Reset adapter.
[  121.892464] tegra-mgbe 6910000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0
[  121.905920] tegra-mgbe 6910000.ethernet eth1: PHY [stmmac-1:00] driver [Aquantia AQR113] (irq=171)
[  121.907356] tegra-mgbe 6910000.ethernet eth1: Enabling Safety Features
[  121.907578] tegra-mgbe 6910000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
[  121.908399] tegra-mgbe 6910000.ethernet eth1: registered PTP clock
[  121.908582] tegra-mgbe 6910000.ethernet eth1: configuring for phy/10gbase-r link mode
[  125.961292] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[  181.921198] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  181.921404] rcu:  7-....: (1 GPs behind) idle=540c/1/0x4000000000000002 softirq=1748/1749 fqs=2337
[  181.921684] rcu:  (detected by 4, t=6002 jiffies, g=1357, q=1254 ncpus=8)
[  181.921878] Sending NMI from CPU 4 to CPUs 7:
[  181.921886] NMI backtrace for cpu 7
[  181.922131] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6
[  181.922390] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024
[  181.922658] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  181.922847] pc : handle_softirqs+0x98/0x368
[  181.922978] lr : __do_softirq+0x18/0x20
[  181.923095] sp : ffff80008003bf50
[  181.923189] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000
[  181.923379] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0
[  181.924486] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70
[  181.925568] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000
[  181.926655] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000
[  181.931455] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d
[  181.938628] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160
[  181.945804] x8 : ffff8000827b3160 x7 : f9157b241586f343 x6 : eeb6502a01c81c74
[  181.953068] x5 : a4acfcdd2e8096bb x4 : ffffce78ea277340 x3 : 00000000ffffd1e1
[  181.960329] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000
[  181.967591] Call trace:
[  181.970043]  handle_softirqs+0x98/0x368 (P)
[  181.974240]  __do_softirq+0x18/0x20
[  181.977743]  ____do_softirq+0x14/0x28
[  181.981415]  call_on_irq_stack+0x24/0x30
[  181.985180]  do_softirq_own_stack+0x20/0x30
[  181.989379]  __irq_exit_rcu+0x114/0x140
[  181.993142]  irq_exit_rcu+0x14/0x28
[  181.996816]  el1_interrupt+0x44/0xb8
[  182.000316]  el1h_64_irq_handler+0x14/0x20
[  182.004343]  el1h_64_irq+0x80/0x88
[  182.007755]  cpuidle_enter_state+0xc4/0x4a8 (P)
[  182.012305]  cpuidle_enter+0x3c/0x58
[  182.015980]  cpuidle_idle_call+0x128/0x1c0
[  182.020005]  do_idle+0xe0/0xf0
[  182.023155]  cpu_startup_entry+0x3c/0x48
[  182.026917]  secondary_start_kernel+0xdc/0x120
[  182.031379]  __secondary_switched+0x74/0x78
[  212.971162] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6103 jiffies s: 417 root: 0x80/.
[  212.985935] rcu: blocking rcu_node structures (internal RCU debug):
[  212.992758] Sending NMI from CPU 0 to CPUs 7:
[  212.998539] NMI backtrace for cpu 7
[  213.004304] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6
[  213.016116] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024
[  213.030817] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  213.040528] pc : handle_softirqs+0x98/0x368
[  213.046563] lr : __do_softirq+0x18/0x20
[  213.051293] sp : ffff80008003bf50
[  213.055839] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000
[  213.067304] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0
[  213.077014] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70
[  213.087339] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000
[  213.097313] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000
[  213.107201] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d
[  213.116651] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160
[  213.127500] x8 : ffff8000827b3160 x7 : 0a37b344852820af x6 : 3f049caedd1ff608
[  213.138002] x5 : cff7cfdbfaf31291 x4 : ffffce78ea277340 x3 : 00000000ffffde04
[  213.150428] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000
[  213.162063] Call trace:
[  213.165494]  handle_softirqs+0x98/0x368 (P)
[  213.171256]  __do_softirq+0x18/0x20
[  213.177291]  ____do_softirq+0x14/0x28
[  213.182017]  call_on_irq_stack+0x24/0x30
[  213.186565]  do_softirq_own_stack+0x20/0x30
[  213.191815]  __irq_exit_rcu+0x114/0x140
[  213.196891]  irq_exit_rcu+0x14/0x28
[  213.202401]  el1_interrupt+0x44/0xb8
[  213.207741]  el1h_64_irq_handler+0x14/0x20
[  213.213519]  el1h_64_irq+0x80/0x88
[  213.217541]  cpuidle_enter_state+0xc4/0x4a8 (P)
[  213.224364]  cpuidle_enter+0x3c/0x58
[  213.228653]  cpuidle_idle_call+0x128/0x1c0
[  213.233993]  do_idle+0xe0/0xf0
[  213.237928]  cpu_startup_entry+0x3c/0x48
[  213.243791]  secondary_start_kernel+0xdc/0x120
[  213.249830]  __secondary_switched+0x74/0x78

This bug has existed since the dwmac-tegra driver was added in Dec 2022
(See Fixes tag below for commit hash).

The Tegra234 SOC has 4 MGBE controllers, however Nvidia's Developer Kit
only uses MGBE0 which is why the bug was not found previously. Connect Tech
has many products that use 2 (or more) MGBE controllers.

The solution is to read the controller's SID from the existing "iommus"
device tree property. The 2nd field of the "iommus" device tree property
is the controller's SID.

Device tree snippet from tegra234.dtsi showing MGBE1's "iommus" property:

smmu_niso0: iommu@12000000 {
        compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
...
}

/* MGBE1 */
ethernet@6900000 {
compatible = "nvidia,tegra234-mgbe";
...
iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
...
}

Nvidia's arm-smmu driver reads the "iommus" property and stores the SID in
the MGBE device's "fwspec" struct. The dwmac-tegra driver can access the
SID using the tegra_dev_iommu_get_stream_id() helper function found in
linux/iommu.h.

Calling tegra_dev_iommu_get_stream_id() should not fail unless the "iommus"
property is removed from the device tree or the IOMMU is disabled.

While the Tegra234 SOC technically supports bypassing the IOMMU, it is not
supported by the current firmware, has not been tested and not recommended.
More detailed discussion with Thierry Reding from Nvidia linked below.

Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support")
Link: https://lore.kernel.org/netdev/cover.1731685185.git.pnewman@connecttech.com
Signed-off-by: Parker Newman <pnewman@connecttech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/6fb97f32cf4accb4f7cf92846f6b60064ba0a3bd.1736284360.git.pnewman@connecttech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agosched: sch_cake: add bounds checks to host bulk flow fairness counts
Toke Høiland-Jørgensen [Tue, 7 Jan 2025 12:01:05 +0000 (13:01 +0100)]
sched: sch_cake: add bounds checks to host bulk flow fairness counts

Even though we fixed a logic error in the commit cited below, syzbot
still managed to trigger an underflow of the per-host bulk flow
counters, leading to an out of bounds memory access.

To avoid any such logic errors causing out of bounds memory accesses,
this commit factors out all accesses to the per-host bulk flow counters
to a series of helpers that perform bounds-checking before any
increments and decrements. This also has the benefit of improving
readability by moving the conditional checks for the flow mode into
these helpers, instead of having them spread out throughout the
code (which was the cause of the original logic error).

As part of this change, the flow quantum calculation is consolidated
into a helper function, which means that the dithering applied to the
ost load scaling is now applied both in the DRR rotation and when a
sparse flow's quantum is first initiated. The only user-visible effect
of this is that the maximum packet size that can be sent while a flow
stays sparse will now vary with +/- one byte in some cases. This should
not make a noticeable difference in practice, and thus it's not worth
complicating the code to preserve the old behaviour.

Fixes: 546ea84d07e3 ("sched: sch_cake: fix bulk flow accounting logic for host fairness")
Reported-by: syzbot+f63600d288bfb7057424@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Link: https://patch.msgid.link/20250107120105.70685-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonetfilter: conntrack: clamp maximum hashtable size to INT_MAX
Pablo Neira Ayuso [Wed, 8 Jan 2025 21:56:33 +0000 (22:56 +0100)]
netfilter: conntrack: clamp maximum hashtable size to INT_MAX

Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it
is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when
resizing hashtable because __GFP_NOWARN is unset. See:

  0708a0afe291 ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls")

Note: hashtable resize is only possible from init_netns.

Fixes: 9cc1c73ad666 ("netfilter: conntrack: avoid integer overflow when resizing")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 months agonetfilter: nf_tables: imbalance in flowtable binding
Pablo Neira Ayuso [Thu, 2 Jan 2025 12:01:13 +0000 (13:01 +0100)]
netfilter: nf_tables: imbalance in flowtable binding

All these cases cause imbalance between BIND and UNBIND calls:

- Delete an interface from a flowtable with multiple interfaces

- Add a (device to a) flowtable with --check flag

- Delete a netns containing a flowtable

- In an interactive nft session, create a table with owner flag and
  flowtable inside, then quit.

Fix it by calling FLOW_BLOCK_UNBIND when unregistering hooks, then
remove late FLOW_BLOCK_UNBIND call when destroying flowtable.

Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
Reported-by: Phil Sutter <phil@nwl.cc>
Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 months agomctp i3c: fix MCTP I3C driver multi-thread issue
Leo Yang [Tue, 7 Jan 2025 03:15:30 +0000 (11:15 +0800)]
mctp i3c: fix MCTP I3C driver multi-thread issue

We found a timeout problem with the pldm command on our system.  The
reason is that the MCTP-I3C driver has a race condition when receiving
multiple-packet messages in multi-thread, resulting in a wrong packet
order problem.

We identified this problem by adding a debug message to the
mctp_i3c_read function.

According to the MCTP spec, a multiple-packet message must be composed
in sequence, and if there is a wrong sequence, the whole message will be
discarded and wait for the next SOM.
For example, SOM → Pkt Seq #2 → Pkt Seq #1 → Pkt Seq #3 → EOM.

Therefore, we try to solve this problem by adding a mutex to the
mctp_i3c_read function.  Before the modification, when a command
requesting a multiple-packet message response is sent consecutively, an
error usually occurs within 100 loops.  After the mutex, it can go
through 40000 loops without any error, and it seems to run well.

Fixes: c8755b29b58e ("mctp i3c: MCTP I3C driver")
Signed-off-by: Leo Yang <Leo-Yang@quantatw.com>
Link: https://patch.msgid.link/20250107031529.3296094-1-Leo-Yang@quantatw.com
[pabeni@redhat.com: dropped already answered question from changelog]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agodt-bindings: net: pse-pd: Fix unusual character in documentation
Kory Maincent [Tue, 7 Jan 2025 14:26:59 +0000 (15:26 +0100)]
dt-bindings: net: pse-pd: Fix unusual character in documentation

The documentation contained an unusual character due to an issue in my
personal b4 setup. Fix the problem by providing the correct PSE Pinout
Alternatives table number description.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250107142659.425877-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 9 Jan 2025 03:33:26 +0000 (19:33 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-01-07 (ice, igc)

For ice:

Arkadiusz corrects mask value being used to determine DPLL phase range.

Przemyslaw corrects frequency value for E823 devices.

For igc:

En-Wei Wu adds a check and, early, return for failed register read.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: return early when failing to read EECD register
  ice: fix incorrect PHY settings for 100 GB/s
  ice: fix max values for dpll pin phase adjust
====================

Link: https://patch.msgid.link/20250107190150.1758577-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge tag 'for-net-2025-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Thu, 9 Jan 2025 03:08:18 +0000 (19:08 -0800)]
Merge tag 'for-net-2025-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - btmtk: Fix failed to send func ctrl for MediaTek devices.
 - hci_sync: Fix not setting Random Address when required
 - MGMT: Fix Add Device to responding before completing
 - btnxpuart: Fix driver sending truncated data

* tag 'for-net-2025-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btmtk: Fix failed to send func ctrl for MediaTek devices.
  Bluetooth: btnxpuart: Fix driver sending truncated data
  Bluetooth: MGMT: Fix Add Device to responding before completing
  Bluetooth: hci_sync: Fix not setting Random Address when required
====================

Link: https://patch.msgid.link/20250108162627.1623760-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Wed, 8 Jan 2025 19:55:20 +0000 (11:55 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four driver fixes in UFS, mostly to do with power management"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: qcom: Power down the controller/device during system suspend for SM8550/SM8650 SoCs
  scsi: ufs: qcom: Allow passing platform specific OF data
  scsi: ufs: core: Honor runtime/system PM levels if set by host controller drivers
  scsi: ufs: qcom: Power off the PHY if it was already powered on in ufs_qcom_power_up_sequence()

9 months agoMerge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Jakub Kicinski [Wed, 8 Jan 2025 18:33:16 +0000 (10:33 -0800)]
Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'

Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver

There's a series of bugfix that's been accepted:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=d80a3091308491455b6501b1c4b68698c4a7cd24

However, The series is making the driver poke into IOMMU internals instead of
implementing appropriate IOMMU workarounds. After discussion, the series was reverted:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=249cfa318fb1b77eb726c2ff4f74c9685f04e568

But only two patches are related to the IOMMU.
Other patches involve only the modification of the driver.
This series resends other patches.

v2*: https://lore.kernel.org/20241217010839.1742227-1-shaojijie@huawei.com
v2: https://lore.kernel.org/20241216132346.1197079-1-shaojijie@huawei.com
v1: https://lore.kernel.org/20241107133023.3813095-1-shaojijie@huawei.com
====================

Link: https://patch.msgid.link/20250106143642.539698-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: hns3: fix kernel crash when 1588 is sent on HIP08 devices
Jie Wang [Mon, 6 Jan 2025 14:36:42 +0000 (22:36 +0800)]
net: hns3: fix kernel crash when 1588 is sent on HIP08 devices

Currently, HIP08 devices does not register the ptp devices, so the
hdev->ptp is NULL. But the tx process would still try to set hardware time
stamp info with SKBTX_HW_TSTAMP flag and cause a kernel crash.

[  128.087798] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
...
[  128.280251] pc : hclge_ptp_set_tx_info+0x2c/0x140 [hclge]
[  128.286600] lr : hclge_ptp_set_tx_info+0x20/0x140 [hclge]
[  128.292938] sp : ffff800059b93140
[  128.297200] x29: ffff800059b93140 x28: 0000000000003280
[  128.303455] x27: ffff800020d48280 x26: ffff0cb9dc814080
[  128.309715] x25: ffff0cb9cde93fa0 x24: 0000000000000001
[  128.315969] x23: 0000000000000000 x22: 0000000000000194
[  128.322219] x21: ffff0cd94f986000 x20: 0000000000000000
[  128.328462] x19: ffff0cb9d2a166c0 x18: 0000000000000000
[  128.334698] x17: 0000000000000000 x16: ffffcf1fc523ed24
[  128.340934] x15: 0000ffffd530a518 x14: 0000000000000000
[  128.347162] x13: ffff0cd6bdb31310 x12: 0000000000000368
[  128.353388] x11: ffff0cb9cfbc7070 x10: ffff2cf55dd11e02
[  128.359606] x9 : ffffcf1f85a212b4 x8 : ffff0cd7cf27dab0
[  128.365831] x7 : 0000000000000a20 x6 : ffff0cd7cf27d000
[  128.372040] x5 : 0000000000000000 x4 : 000000000000ffff
[  128.378243] x3 : 0000000000000400 x2 : ffffcf1f85a21294
[  128.384437] x1 : ffff0cb9db520080 x0 : ffff0cb9db500080
[  128.390626] Call trace:
[  128.393964]  hclge_ptp_set_tx_info+0x2c/0x140 [hclge]
[  128.399893]  hns3_nic_net_xmit+0x39c/0x4c4 [hns3]
[  128.405468]  xmit_one.constprop.0+0xc4/0x200
[  128.410600]  dev_hard_start_xmit+0x54/0xf0
[  128.415556]  sch_direct_xmit+0xe8/0x634
[  128.420246]  __dev_queue_xmit+0x224/0xc70
[  128.425101]  dev_queue_xmit+0x1c/0x40
[  128.429608]  ovs_vport_send+0xac/0x1a0 [openvswitch]
[  128.435409]  do_output+0x60/0x17c [openvswitch]
[  128.440770]  do_execute_actions+0x898/0x8c4 [openvswitch]
[  128.446993]  ovs_execute_actions+0x64/0xf0 [openvswitch]
[  128.453129]  ovs_dp_process_packet+0xa0/0x224 [openvswitch]
[  128.459530]  ovs_vport_receive+0x7c/0xfc [openvswitch]
[  128.465497]  internal_dev_xmit+0x34/0xb0 [openvswitch]
[  128.471460]  xmit_one.constprop.0+0xc4/0x200
[  128.476561]  dev_hard_start_xmit+0x54/0xf0
[  128.481489]  __dev_queue_xmit+0x968/0xc70
[  128.486330]  dev_queue_xmit+0x1c/0x40
[  128.490856]  ip_finish_output2+0x250/0x570
[  128.495810]  __ip_finish_output+0x170/0x1e0
[  128.500832]  ip_finish_output+0x3c/0xf0
[  128.505504]  ip_output+0xbc/0x160
[  128.509654]  ip_send_skb+0x58/0xd4
[  128.513892]  udp_send_skb+0x12c/0x354
[  128.518387]  udp_sendmsg+0x7a8/0x9c0
[  128.522793]  inet_sendmsg+0x4c/0x8c
[  128.527116]  __sock_sendmsg+0x48/0x80
[  128.531609]  __sys_sendto+0x124/0x164
[  128.536099]  __arm64_sys_sendto+0x30/0x5c
[  128.540935]  invoke_syscall+0x50/0x130
[  128.545508]  el0_svc_common.constprop.0+0x10c/0x124
[  128.551205]  do_el0_svc+0x34/0xdc
[  128.555347]  el0_svc+0x20/0x30
[  128.559227]  el0_sync_handler+0xb8/0xc0
[  128.563883]  el0_sync+0x160/0x180

Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250106143642.539698-8-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
Hao Lan [Mon, 6 Jan 2025 14:36:41 +0000 (22:36 +0800)]
net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue

The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs
1024-1279 are in different BAR space addresses. However,
hclge_fetch_pf_reg does not distinguish the tqp space information when
reading the tqp space information. When the number of TQPs is greater
than 1024, access bar space overwriting occurs.
The problem of different segments has been considered during the
initialization of tqp.io_base. Therefore, tqp.io_base is directly used
when the queue is read in hclge_fetch_pf_reg.

The error message:

Unable to handle kernel paging request at virtual address ffff800037200000
pc : hclge_fetch_pf_reg+0x138/0x250 [hclge]
lr : hclge_get_regs+0x84/0x1d0 [hclge]
Call trace:
 hclge_fetch_pf_reg+0x138/0x250 [hclge]
 hclge_get_regs+0x84/0x1d0 [hclge]
 hns3_get_regs+0x2c/0x50 [hns3]
 ethtool_get_regs+0xf4/0x270
 dev_ethtool+0x674/0x8a0
 dev_ioctl+0x270/0x36c
 sock_do_ioctl+0x110/0x2a0
 sock_ioctl+0x2ac/0x530
 __arm64_sys_ioctl+0xa8/0x100
 invoke_syscall+0x4c/0x124
 el0_svc_common.constprop.0+0x140/0x15c
 do_el0_svc+0x30/0xd0
 el0_svc+0x1c/0x2c
 el0_sync_handler+0xb0/0xb4
 el0_sync+0x168/0x180

Fixes: 939ccd107ffc ("net: hns3: move dump regs function to a separate file")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250106143642.539698-7-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: hns3: initialize reset_timer before hclgevf_misc_irq_init()
Jian Shen [Mon, 6 Jan 2025 14:36:40 +0000 (22:36 +0800)]
net: hns3: initialize reset_timer before hclgevf_misc_irq_init()

Currently the misc irq is initialized before reset_timer setup. But
it will access the reset_timer in the irq handler. So initialize
the reset_timer earlier.

Fixes: ff200099d271 ("net: hns3: remove unnecessary work in hclgevf_main")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250106143642.539698-6-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: hns3: don't auto enable misc vector
Jian Shen [Mon, 6 Jan 2025 14:36:39 +0000 (22:36 +0800)]
net: hns3: don't auto enable misc vector

Currently, there is a time window between misc irq enabled
and service task inited. If an interrupte is reported at
this time, it will cause warning like below:

[   16.324639] Call trace:
[   16.324641]  __queue_delayed_work+0xb8/0xe0
[   16.324643]  mod_delayed_work_on+0x78/0xd0
[   16.324655]  hclge_errhand_task_schedule+0x58/0x90 [hclge]
[   16.324662]  hclge_misc_irq_handle+0x168/0x240 [hclge]
[   16.324666]  __handle_irq_event_percpu+0x64/0x1e0
[   16.324667]  handle_irq_event+0x80/0x170
[   16.324670]  handle_fasteoi_edge_irq+0x110/0x2bc
[   16.324671]  __handle_domain_irq+0x84/0xfc
[   16.324673]  gic_handle_irq+0x88/0x2c0
[   16.324674]  el1_irq+0xb8/0x140
[   16.324677]  arch_cpu_idle+0x18/0x40
[   16.324679]  default_idle_call+0x5c/0x1bc
[   16.324682]  cpuidle_idle_call+0x18c/0x1c4
[   16.324684]  do_idle+0x174/0x17c
[   16.324685]  cpu_startup_entry+0x30/0x6c
[   16.324687]  secondary_start_kernel+0x1a4/0x280
[   16.324688] ---[ end trace 6aa0bff672a964aa ]---

So don't auto enable misc vector when request irq..

Fixes: 7be1b9f3e99f ("net: hns3: make hclge_service use delayed workqueue")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250106143642.539698-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: hns3: Resolved the issue that the debugfs query result is inconsistent.
Hao Lan [Mon, 6 Jan 2025 14:36:38 +0000 (22:36 +0800)]
net: hns3: Resolved the issue that the debugfs query result is inconsistent.

This patch modifies the implementation of debugfs:

When the user process stops unexpectedly, not all data of the file system
is read. In this case, the save_buf pointer is not released. When the
user process is called next time, save_buf is used to copy the cached
data to the user space. As a result, the queried data is stale.

To solve this problem, this patch implements .open() and .release() handler
for debugfs file_operations. moving allocation buffer and execution
of the cmd to the .open() handler and freeing in to the .release() handler.
Allocate separate buffer for each reader and associate the buffer
with the file pointer.
When different user read processes no longer share the buffer,
the stale data problem is fixed.

Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangwei Zhang <zhangwangwei6@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250106143642.539698-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: hns3: fix missing features due to dev->features configuration too early
Hao Lan [Mon, 6 Jan 2025 14:36:37 +0000 (22:36 +0800)]
net: hns3: fix missing features due to dev->features configuration too early

Currently, the netdev->features is configured in hns3_nic_set_features.
As a result, __netdev_update_features considers that there is no feature
difference, and the procedures of the real features are missing.

Fixes: 2a7556bb2b73 ("net: hns3: implement ndo_features_check ops for hns3 driver")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250106143642.539698-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonet: hns3: fixed reset failure issues caused by the incorrect reset type
Hao Lan [Mon, 6 Jan 2025 14:36:36 +0000 (22:36 +0800)]
net: hns3: fixed reset failure issues caused by the incorrect reset type

When a reset type that is not supported by the driver is input, a reset
pending flag bit of the HNAE3_NONE_RESET type is generated in
reset_pending. The driver does not have a mechanism to clear this type
of error. As a result, the driver considers that the reset is not
complete. This patch provides a mechanism to clear the
HNAE3_NONE_RESET flag and the parameter of
hnae3_ae_ops.set_default_reset_request is verified.

The error message:
hns3 0000:39:01.0: cmd failed -16
hns3 0000:39:01.0: hclge device re-init failed, VF is disabled!
hns3 0000:39:01.0: failed to reset VF stack
hns3 0000:39:01.0: failed to reset VF(4)
hns3 0000:39:01.0: prepare reset(2) wait done
hns3 0000:39:01.0 eth4: already uninitialized

Use the crash tool to view struct hclgevf_dev:
struct hclgevf_dev {
...
default_reset_request = 0x20,
reset_level = HNAE3_NONE_RESET,
reset_pending = 0x100,
reset_type = HNAE3_NONE_RESET,
...
};

Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250106143642.539698-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agotcp: Annotate data-race around sk->sk_mark in tcp_v4_send_reset
Daniel Borkmann [Tue, 7 Jan 2025 10:14:39 +0000 (11:14 +0100)]
tcp: Annotate data-race around sk->sk_mark in tcp_v4_send_reset

This is a follow-up to 3c5b4d69c358 ("net: annotate data-races around
sk->sk_mark"). sk->sk_mark can be read and written without holding
the socket lock. IPv6 equivalent is already covered with READ_ONCE()
annotation in tcp_v6_send_response().

Fixes: 3c5b4d69c358 ("net: annotate data-races around sk->sk_mark")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/f459d1fc44f205e13f6d8bdca2c8bfb9902ffac9.1736244569.git.daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agonetdev: prevent accessing NAPI instances from another namespace
Jakub Kicinski [Mon, 6 Jan 2025 18:01:36 +0000 (10:01 -0800)]
netdev: prevent accessing NAPI instances from another namespace

The NAPI IDs were not fully exposed to user space prior to the netlink
API, so they were never namespaced. The netlink API must ensure that
at the very least NAPI instance belongs to the same netns as the owner
of the genl sock.

napi_by_id() can become static now, but it needs to move because of
dev_get_by_napi_id().

Cc: stable@vger.kernel.org
Fixes: 1287c1ae0fc2 ("netdev-genl: Support setting per-NAPI config values")
Fixes: 27f91aaf49b3 ("netdev-genl: Add netlink framework functions for napi")
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250106180137.1861472-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge tag 'for-6.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Wed, 8 Jan 2025 18:12:01 +0000 (10:12 -0800)]
Merge tag 'for-6.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mikulas Patocka:

 - dm-array fixes

 - dm-verity forward error correction fixes

 - remove the flag DM_TARGET_PASSES_INTEGRITY from dm-ebs

 - dm-thin RCU list fix

* tag 'for-6.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm thin: make get_first_thin use rcu-safe list first function
  dm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITY
  dm-verity FEC: Avoid copying RS parity bytes twice.
  dm-verity FEC: Fix RS FEC repair for roots unaligned to block size (take 2)
  dm array: fix cursor index when skipping across block boundaries
  dm array: fix unreleased btree blocks on closing a faulty array cursor
  dm array: fix releasing a faulty array block twice in dm_array_cursor_end

9 months agoBluetooth: btmtk: Fix failed to send func ctrl for MediaTek devices.
Chris Lu [Wed, 8 Jan 2025 09:50:28 +0000 (17:50 +0800)]
Bluetooth: btmtk: Fix failed to send func ctrl for MediaTek devices.

Use usb_autopm_get_interface() and usb_autopm_put_interface()
in btmtk_usb_shutdown(), it could send func ctrl after enabling
autosuspend.

Bluetooth: btmtk_usb_hci_wmt_sync() hci0: Execution of wmt command
           timed out
Bluetooth: btmtk_usb_shutdown() hci0: Failed to send wmt func ctrl
           (-110)

Fixes: 5c5e8c52e3ca ("Bluetooth: btmtk: move btusb_mtk_[setup, shutdown] to btmtk.c")
Signed-off-by: Chris Lu <chris.lu@mediatek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agoBluetooth: btnxpuart: Fix driver sending truncated data
Neeraj Sanjay Kale [Fri, 20 Dec 2024 13:02:52 +0000 (18:32 +0530)]
Bluetooth: btnxpuart: Fix driver sending truncated data

This fixes the apparent controller hang issue seen during stress test
where the host sends a truncated payload, followed by HCI commands. The
controller treats these HCI commands as a part of previously truncated
payload, leading to command timeouts.

Adding a serdev_device_wait_until_sent() call after
serdev_device_write_buf() fixed the issue.

Fixes: 689ca16e5232 ("Bluetooth: NXP: Add protocol support for NXP Bluetooth chipsets")
Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agoBluetooth: MGMT: Fix Add Device to responding before completing
Luiz Augusto von Dentz [Mon, 25 Nov 2024 20:42:10 +0000 (15:42 -0500)]
Bluetooth: MGMT: Fix Add Device to responding before completing

Add Device with LE type requires updating resolving/accept list which
requires quite a number of commands to complete and each of them may
fail, so instead of pretending it would always work this checks the
return of hci_update_passive_scan_sync which indicates if everything
worked as intended.

Fixes: e8907f76544f ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agoBluetooth: hci_sync: Fix not setting Random Address when required
Luiz Augusto von Dentz [Mon, 25 Nov 2024 20:42:09 +0000 (15:42 -0500)]
Bluetooth: hci_sync: Fix not setting Random Address when required

This fixes errors such as the following when Own address type is set to
Random Address but it has not been programmed yet due to either be
advertising or connecting:

< HCI Command: LE Set Exte.. (0x08|0x0041) plen 13
        Own address type: Random (0x03)
        Filter policy: Ignore not in accept list (0x01)
        PHYs: 0x05
        Entry 0: LE 1M
          Type: Passive (0x00)
          Interval: 60.000 msec (0x0060)
          Window: 30.000 msec (0x0030)
        Entry 1: LE Coded
          Type: Passive (0x00)
          Interval: 180.000 msec (0x0120)
          Window: 90.000 msec (0x0090)
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Exten.. (0x08|0x0042) plen 6
        Extended scan: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
        Duration: 0 msec (0x0000)
        Period: 0.00 sec (0x0000)
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Enable (0x08|0x0042) ncmd 1
        Status: Invalid HCI Command Parameters (0x12)

Fixes: c45074d68a9b ("Bluetooth: Fix not generating RPA when required")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
9 months agodm thin: make get_first_thin use rcu-safe list first function
Krister Johansen [Tue, 7 Jan 2025 23:24:58 +0000 (15:24 -0800)]
dm thin: make get_first_thin use rcu-safe list first function

The documentation in rculist.h explains the absence of list_empty_rcu()
and cautions programmers against relying on a list_empty() ->
list_first() sequence in RCU safe code.  This is because each of these
functions performs its own READ_ONCE() of the list head.  This can lead
to a situation where the list_empty() sees a valid list entry, but the
subsequent list_first() sees a different view of list head state after a
modification.

In the case of dm-thin, this author had a production box crash from a GP
fault in the process_deferred_bios path.  This function saw a valid list
head in get_first_thin() but when it subsequently dereferenced that and
turned it into a thin_c, it got the inside of the struct pool, since the
list was now empty and referring to itself.  The kernel on which this
occurred printed both a warning about a refcount_t being saturated, and
a UBSAN error for an out-of-bounds cpuid access in the queued spinlock,
prior to the fault itself.  When the resulting kdump was examined, it
was possible to see another thread patiently waiting in thin_dtr's
synchronize_rcu.

The thin_dtr call managed to pull the thin_c out of the active thins
list (and have it be the last entry in the active_thins list) at just
the wrong moment which lead to this crash.

Fortunately, the fix here is straight forward.  Switch get_first_thin()
function to use list_first_or_null_rcu() which performs just a single
READ_ONCE() and returns NULL if the list is already empty.

This was run against the devicemapper test suite's thin-provisioning
suites for delete and suspend and no regressions were observed.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Fixes: b10ebd34ccca ("dm thin: fix rcu_read_lock being held in code that can sleep")
Cc: stable@vger.kernel.org
Acked-by: Ming-Hung Tsai <mtsai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
9 months agodm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITY
Mikulas Patocka [Tue, 7 Jan 2025 16:47:01 +0000 (17:47 +0100)]
dm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITY

dm-ebs uses dm-bufio to process requests that are not aligned on logical
sector size. dm-bufio doesn't support passing integrity data (and it is
unclear how should it do it), so we shouldn't set the
DM_TARGET_PASSES_INTEGRITY flag.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: d3c7b35c20d6 ("dm: add emulated block size target")
9 months agoeth: gve: use appropriate helper to set xdp_features
Jakub Kicinski [Mon, 6 Jan 2025 18:02:10 +0000 (10:02 -0800)]
eth: gve: use appropriate helper to set xdp_features

Commit f85949f98206 ("xdp: add xdp_set_features_flag utility routine")
added routines to inform the core about XDP flag changes.
GVE support was added around the same time and missed using them.

GVE only changes the flags on error recover or resume.
Presumably the flags may change during resume if VM migrated.
User would not get the notification and upper devices would
not get a chance to recalculate their flags.

Fixes: 75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format")
Reviewed-By: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250106180210.1861784-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoipvlan: Fix use-after-free in ipvlan_get_iflink().
Kuniyuki Iwashima [Mon, 6 Jan 2025 07:19:11 +0000 (16:19 +0900)]
ipvlan: Fix use-after-free in ipvlan_get_iflink().

syzbot presented an use-after-free report [0] regarding ipvlan and
linkwatch.

ipvlan does not hold a refcnt of the lower device unlike vlan and
macvlan.

If the linkwatch work is triggered for the ipvlan dev, the lower dev
might have already been freed, resulting in UAF of ipvlan->phy_dev in
ipvlan_get_iflink().

We can delay the lower dev unregistration like vlan and macvlan by
holding the lower dev's refcnt in dev->netdev_ops->ndo_init() and
releasing it in dev->priv_destructor().

Jakub pointed out calling .ndo_XXX after unregister_netdevice() has
returned is error prone and suggested [1] addressing this UAF in the
core by taking commit 750e51603395 ("net: avoid potential UAF in
default_operstate()") further.

Let's assume unregistering devices DOWN and use RCU protection in
default_operstate() not to race with the device unregistration.

[0]:
BUG: KASAN: slab-use-after-free in ipvlan_get_iflink+0x84/0x88 drivers/net/ipvlan/ipvlan_main.c:353
Read of size 4 at addr ffff0000d768c0e0 by task kworker/u8:35/6944

CPU: 0 UID: 0 PID: 6944 Comm: kworker/u8:35 Not tainted 6.13.0-rc2-g9bc5c9515b48 #12 4c3cb9e8b4565456f6a355f312ff91f4f29b3c47
Hardware name: linux,dummy-virt (DT)
Workqueue: events_unbound linkwatch_event
Call trace:
 show_stack+0x38/0x50 arch/arm64/kernel/stacktrace.c:484 (C)
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xbc/0x108 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0x16c/0x6f0 mm/kasan/report.c:489
 kasan_report+0xc0/0x120 mm/kasan/report.c:602
 __asan_report_load4_noabort+0x20/0x30 mm/kasan/report_generic.c:380
 ipvlan_get_iflink+0x84/0x88 drivers/net/ipvlan/ipvlan_main.c:353
 dev_get_iflink+0x7c/0xd8 net/core/dev.c:674
 default_operstate net/core/link_watch.c:45 [inline]
 rfc2863_policy+0x144/0x360 net/core/link_watch.c:72
 linkwatch_do_dev+0x60/0x228 net/core/link_watch.c:175
 __linkwatch_run_queue+0x2f4/0x5b8 net/core/link_watch.c:239
 linkwatch_event+0x64/0xa8 net/core/link_watch.c:282
 process_one_work+0x700/0x1398 kernel/workqueue.c:3229
 process_scheduled_works kernel/workqueue.c:3310 [inline]
 worker_thread+0x8c4/0xe10 kernel/workqueue.c:3391
 kthread+0x2b0/0x360 kernel/kthread.c:389
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:862

Allocated by task 9303:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x30/0x68 mm/kasan/common.c:68
 kasan_save_alloc_info+0x44/0x58 mm/kasan/generic.c:568
 poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
 __kasan_kmalloc+0x84/0xa0 mm/kasan/common.c:394
 kasan_kmalloc include/linux/kasan.h:260 [inline]
 __do_kmalloc_node mm/slub.c:4283 [inline]
 __kmalloc_node_noprof+0x2a0/0x560 mm/slub.c:4289
 __kvmalloc_node_noprof+0x9c/0x230 mm/util.c:650
 alloc_netdev_mqs+0xb4/0x1118 net/core/dev.c:11209
 rtnl_create_link+0x2b8/0xb60 net/core/rtnetlink.c:3595
 rtnl_newlink_create+0x19c/0x868 net/core/rtnetlink.c:3771
 __rtnl_newlink net/core/rtnetlink.c:3896 [inline]
 rtnl_newlink+0x122c/0x15c0 net/core/rtnetlink.c:4011
 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6901
 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2542
 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6928
 netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
 netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1347
 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1891
 sock_sendmsg_nosec net/socket.c:711 [inline]
 __sock_sendmsg net/socket.c:726 [inline]
 __sys_sendto+0x2ec/0x438 net/socket.c:2197
 __do_sys_sendto net/socket.c:2204 [inline]
 __se_sys_sendto net/socket.c:2200 [inline]
 __arm64_sys_sendto+0xe4/0x110 net/socket.c:2200
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151
 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762
 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600

Freed by task 10200:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x30/0x68 mm/kasan/common.c:68
 kasan_save_free_info+0x58/0x70 mm/kasan/generic.c:582
 poison_slab_object mm/kasan/common.c:247 [inline]
 __kasan_slab_free+0x48/0x68 mm/kasan/common.c:264
 kasan_slab_free include/linux/kasan.h:233 [inline]
 slab_free_hook mm/slub.c:2338 [inline]
 slab_free mm/slub.c:4598 [inline]
 kfree+0x140/0x420 mm/slub.c:4746
 kvfree+0x4c/0x68 mm/util.c:693
 netdev_release+0x94/0xc8 net/core/net-sysfs.c:2034
 device_release+0x98/0x1c0
 kobject_cleanup lib/kobject.c:689 [inline]
 kobject_release lib/kobject.c:720 [inline]
 kref_put include/linux/kref.h:65 [inline]
 kobject_put+0x2b0/0x438 lib/kobject.c:737
 netdev_run_todo+0xdd8/0xf48 net/core/dev.c:10924
 rtnl_unlock net/core/rtnetlink.c:152 [inline]
 rtnl_net_unlock net/core/rtnetlink.c:209 [inline]
 rtnl_dellink+0x484/0x680 net/core/rtnetlink.c:3526
 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6901
 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2542
 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6928
 netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
 netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1347
 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1891
 sock_sendmsg_nosec net/socket.c:711 [inline]
 __sock_sendmsg net/socket.c:726 [inline]
 ____sys_sendmsg+0x410/0x708 net/socket.c:2583
 ___sys_sendmsg+0x178/0x1d8 net/socket.c:2637
 __sys_sendmsg net/socket.c:2669 [inline]
 __do_sys_sendmsg net/socket.c:2674 [inline]
 __se_sys_sendmsg net/socket.c:2672 [inline]
 __arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2672
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151
 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762
 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600

The buggy address belongs to the object at ffff0000d768c000
 which belongs to the cache kmalloc-cg-4k of size 4096
The buggy address is located 224 bytes inside of
 freed 4096-byte region [ffff0000d768c000ffff0000d768d000)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x117688
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff0000c77ef981
flags: 0xbfffe0000000040(head|node=0|zone=2|lastcpupid=0x1ffff)
page_type: f5(slab)
raw: 0bfffe0000000040 ffff0000c000f500 dead000000000100 dead000000000122
raw: 0000000000000000 0000000000040004 00000001f5000000 ffff0000c77ef981
head: 0bfffe0000000040 ffff0000c000f500 dead000000000100 dead000000000122
head: 0000000000000000 0000000000040004 00000001f5000000 ffff0000c77ef981
head: 0bfffe0000000003 fffffdffc35da201 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff0000d768bf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff0000d768c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff0000d768c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
 ffff0000d768c100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff0000d768c180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 8c55facecd7a ("net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250102174400.085fd8ac@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250106071911.64355-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agotls: Fix tls_sw_sendmsg error handling
Benjamin Coddington [Sat, 4 Jan 2025 15:29:45 +0000 (10:29 -0500)]
tls: Fix tls_sw_sendmsg error handling

We've noticed that NFS can hang when using RPC over TLS on an unstable
connection, and investigation shows that the RPC layer is stuck in a tight
loop attempting to transmit, but forever getting -EBADMSG back from the
underlying network.  The loop begins when tcp_sendmsg_locked() returns
-EPIPE to tls_tx_records(), but that error is converted to -EBADMSG when
calling the socket's error reporting handler.

Instead of converting errors from tcp_sendmsg_locked(), let's pass them
along in this path.  The RPC layer handles -EPIPE by reconnecting the
transport, which prevents the endless attempts to transmit on a broken
connection.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Link: https://patch.msgid.link/9594185559881679d81f071b181a10eb07cd079f.1736004079.git.bcodding@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge tag 'selinux-pr-20250107' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 7 Jan 2025 22:49:48 +0000 (14:49 -0800)]
Merge tag 'selinux-pr-20250107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
 "A single SELinux patch to address a problem with a single domain using
  multiple xperm classes"

* tag 'selinux-pr-20250107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: match extended permissions to their base permissions

9 months agoeth: fbnic: Revert "eth: fbnic: Add hardware monitoring support via HWMON interface"
Su Hui [Mon, 6 Jan 2025 02:36:48 +0000 (10:36 +0800)]
eth: fbnic: Revert "eth: fbnic: Add hardware monitoring support via HWMON interface"

There is a garbage value problem in fbnic_mac_get_sensor_asic(). 'fw_cmpl'
is uninitialized which makes 'sensor' and '*val' to be stored garbage
value. Revert commit d85ebade02e8 ("eth: fbnic: Add hardware monitoring
support via HWMON interface") to avoid this problem.

Fixes: d85ebade02e8 ("eth: fbnic: Add hardware monitoring support via HWMON interface")
Signed-off-by: Su Hui <suhui@nfschina.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250106023647.47756-1-suhui@nfschina.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoigc: return early when failing to read EECD register
En-Wei Wu [Wed, 18 Dec 2024 02:37:42 +0000 (10:37 +0800)]
igc: return early when failing to read EECD register

When booting with a dock connected, the igc driver may get stuck for ~40
seconds if PCIe link is lost during initialization.

This happens because the driver access device after EECD register reads
return all F's, indicating failed reads. Consequently, hw->hw_addr is set
to NULL, which impacts subsequent rd32() reads. This leads to the driver
hanging in igc_get_hw_semaphore_i225(), as the invalid hw->hw_addr
prevents retrieving the expected value.

To address this, a validation check and a corresponding return value
catch is added for the EECD register read result. If all F's are
returned, indicating PCIe link loss, the driver will return -ENXIO
immediately. This avoids the 40-second hang and significantly improves
boot time when using a dock with an igc NIC.

Log before the patch:
[    0.911913] igc 0000:70:00.0: enabling device (0000 -> 0002)
[    0.912386] igc 0000:70:00.0: PTM enabled, 4ns granularity
[    1.571098] igc 0000:70:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
[   43.449095] igc_get_hw_semaphore_i225: igc 0000:70:00.0 (unnamed net_device) (uninitialized): Driver can't access device - SMBI bit is set.
[   43.449186] igc 0000:70:00.0: probe with driver igc failed with error -13
[   46.345701] igc 0000:70:00.0: enabling device (0000 -> 0002)
[   46.345777] igc 0000:70:00.0: PTM enabled, 4ns granularity

Log after the patch:
[    1.031000] igc 0000:70:00.0: enabling device (0000 -> 0002)
[    1.032097] igc 0000:70:00.0: PTM enabled, 4ns granularity
[    1.642291] igc 0000:70:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
[    5.480490] igc 0000:70:00.0: enabling device (0000 -> 0002)
[    5.480516] igc 0000:70:00.0: PTM enabled, 4ns granularity

Fixes: ab4056126813 ("igc: Add NVM support")
Cc: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoice: fix incorrect PHY settings for 100 GB/s
Przemyslaw Korba [Wed, 4 Dec 2024 13:22:18 +0000 (14:22 +0100)]
ice: fix incorrect PHY settings for 100 GB/s

ptp4l application reports too high offset when ran on E823 device
with a 100GB/s link. Those values cannot go under 100ns, like in a
working case when using 100 GB/s cable.

This is due to incorrect frequency settings on the PHY clocks for
100 GB/s speed. Changes are introduced to align with the internal
hardware documentation, and correctly initialize frequency in PHY
clocks with the frequency values that are in our HW spec.

To reproduce the issue run ptp4l as a Time Receiver on E823 device,
and observe the offset, which will never approach values seen
in the PTP working case.

Reproduction output:
ptp4l -i enp137s0f3 -m -2 -s -f /etc/ptp4l_8275.conf
ptp4l[5278.775]: master offset      12470 s2 freq  +41288 path delay -3002
ptp4l[5278.837]: master offset      10525 s2 freq  +39202 path delay -3002
ptp4l[5278.900]: master offset     -24840 s2 freq  -20130 path delay -3002
ptp4l[5278.963]: master offset      10597 s2 freq  +37908 path delay -3002
ptp4l[5279.025]: master offset       8883 s2 freq  +36031 path delay -3002
ptp4l[5279.088]: master offset       7267 s2 freq  +34151 path delay -3002
ptp4l[5279.150]: master offset       5771 s2 freq  +32316 path delay -3002
ptp4l[5279.213]: master offset       4388 s2 freq  +30526 path delay -3002
ptp4l[5279.275]: master offset     -30434 s2 freq  -28485 path delay -3002
ptp4l[5279.338]: master offset     -28041 s2 freq  -27412 path delay -3002
ptp4l[5279.400]: master offset       7870 s2 freq  +31118 path delay -3002

Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support")
Reviewed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agoice: fix max values for dpll pin phase adjust
Arkadiusz Kubalewski [Wed, 20 Nov 2024 07:51:12 +0000 (08:51 +0100)]
ice: fix max values for dpll pin phase adjust

Mask admin command returned max phase adjust value for both input and
output pins. Only 31 bits are relevant, last released data sheet wrongly
points that 32 bits are valid - see [1] 3.2.6.4.1 Get CCU Capabilities
Command for reference. Fix of the datasheet itself is in progress.

Fix the min/max assignment logic, previously the value was wrongly
considered as negative value due to most significant bit being set.

Example of previous broken behavior:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-get --json '{"id":1}'| grep phase-adjust
 'phase-adjust': 0,
 'phase-adjust-max': 16723,
 'phase-adjust-min': -16723,

Correct behavior with the fix:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-get --json '{"id":1}'| grep phase-adjust
 'phase-adjust': 0,
 'phase-adjust-max': 2147466925,
 'phase-adjust-min': -2147466925,

[1] https://cdrdv2.intel.com/v1/dl/getContent/613875?explicitVersion=true

Fixes: 90e1c90750d7 ("ice: dpll: implement phase related callbacks")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 months agonet: don't dump Tx and uninitialized NAPIs
Jakub Kicinski [Fri, 3 Jan 2025 18:32:07 +0000 (10:32 -0800)]
net: don't dump Tx and uninitialized NAPIs

We use NAPI ID as the key for continuing dumps. We also depend
on the NAPIs being sorted by ID within the driver list. Tx NAPIs
(which don't have an ID assigned) break this expectation, it's
not currently possible to dump them reliably. Since Tx NAPIs
are relatively rare, and can't be used in doit (GET or SET)
hide them from the dump API as well.

Fixes: 27f91aaf49b3 ("netdev-genl: Add netlink framework functions for napi")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250103183207.1216004-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agocxgb4: Avoid removal of uninserted tid
Anumula Murali Mohan Reddy [Fri, 3 Jan 2025 09:23:27 +0000 (14:53 +0530)]
cxgb4: Avoid removal of uninserted tid

During ARP failure, tid is not inserted but _c4iw_free_ep()
attempts to remove tid which results in error.
This patch fixes the issue by avoiding removal of uninserted tid.

Fixes: 59437d78f088 ("cxgb4/chtls: fix ULD connection failures due to wrong TID base")
Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://patch.msgid.link/20250103092327.1011925-1-anumula@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge branch 'bnxt_en-2-bug-fixes'
Jakub Kicinski [Tue, 7 Jan 2025 00:40:28 +0000 (16:40 -0800)]
Merge branch 'bnxt_en-2-bug-fixes'

Michael Chan says:

====================
bnxt_en: 2 Bug fixes

The first patch fixes a potential memory leak when sending a FW
message for the RoCE driver.  The second patch fixes the potential
issue of DIM modifying the coalescing parameters of a ring that has
been freed.
====================

Link: https://patch.msgid.link/20250104043849.3482067-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agobnxt_en: Fix DIM shutdown
Michael Chan [Sat, 4 Jan 2025 04:38:48 +0000 (20:38 -0800)]
bnxt_en: Fix DIM shutdown

DIM work will call the firmware to adjust the coalescing parameters on
the RX rings.  We should cancel DIM work before we call the firmware
to free the RX rings.  Otherwise, FW will reject the call from DIM
work if the RX ring has been freed.  This will generate an error
message like this:

bnxt_en 0000:21:00.1 ens2f1np1: hwrm req_type 0x53 seq id 0x6fca error 0x2

and cause unnecessary concern for the user.  It is also possible to
modify the coalescing parameters of the wrong ring if the ring has
been re-allocated.

To prevent this, cancel DIM work right before freeing the RX rings.
We also have to add a check in NAPI poll to not schedule DIM if the
RX rings are shutting down.  Check that the VNIC is active before we
schedule DIM.  The VNIC is always disabled before we free the RX rings.

Fixes: 0bc0b97fca73 ("bnxt_en: cleanup DIM work on device shutdown")
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250104043849.3482067-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agobnxt_en: Fix possible memory leak when hwrm_req_replace fails
Kalesh AP [Sat, 4 Jan 2025 04:38:47 +0000 (20:38 -0800)]
bnxt_en: Fix possible memory leak when hwrm_req_replace fails

When hwrm_req_replace() fails, the driver is not invoking bnxt_req_drop()
which could cause a memory leak.

Fixes: bbf33d1d9805 ("bnxt_en: update all firmware calls to use the new APIs")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250104043849.3482067-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agopds_core: limit loop over fw name list
Shannon Nelson [Fri, 3 Jan 2025 19:51:47 +0000 (11:51 -0800)]
pds_core: limit loop over fw name list

Add an array size limit to the for-loop to be sure we don't try
to reference a fw_version string off the end of the fw info names
array.  We know that our firmware only has a limited number
of firmware slot names, but we shouldn't leave this unchecked.

Fixes: 45d76f492938 ("pds_core: set up device and adminq")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250103195147.7408-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 months agoMerge tag 'vfs-6.13-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Mon, 6 Jan 2025 18:26:39 +0000 (10:26 -0800)]
Merge tag 'vfs-6.13-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Relax assertions on failure to encode file handles

   The ->encode_fh() method can fail for various reasons. None of them
   warrant a WARN_ON().

 - Fix overlayfs file handle encoding by allowing encoding an fid from
   an inode without an alias

 - Make sure fuse_dir_open() handles FOPEN_KEEP_CACHE. If it's not
   specified fuse needs to invaludate the directory inode page cache

 - Fix qnx6 so it builds with gcc-15

 - Various fixes for netfslib and ceph and nfs filesystems:
     - Ignore silly rename files from afs and nfs when building header
       archives
     - Fix read result collection in netfslib with multiple subrequests
     - Handle ENOMEM for netfslib buffered reads
     - Fix oops in nfs_netfs_init_request()
     - Parse the secctx command immediately in cachefiles
     - Remove a redundant smp_rmb() in netfslib
     - Handle recursion in read retry in netfslib
     - Fix clearing of folio_queue
     - Fix missing cancellation of copy-to_cache when the cache for a
       file is temporarly disabled in netfslib

 - Sanity check the hfs root record

 - Fix zero padding data issues in concurrent write scenarios

 - Fix is_mnt_ns_file() after converting nsfs to path_from_stashed()

 - Fix missing declaration of init_files

 - Increase I/O priority when writing revoke records in jbd2

 - Flush filesystem device before updating tail sequence in jbd2

* tag 'vfs-6.13-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
  ovl: support encoding fid from inode with no alias
  ovl: pass realinode to ovl_encode_real_fh() instead of realdentry
  fuse: respect FOPEN_KEEP_CACHE on opendir
  netfs: Fix is-caching check in read-retry
  netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled
  netfs: Fix ceph copy to cache on write-begin
  netfs: Work around recursion by abandoning retry if nothing read
  netfs: Fix missing barriers by using clear_and_wake_up_bit()
  netfs: Remove redundant use of smp_rmb()
  cachefiles: Parse the "secctx" immediately
  nfs: Fix oops in nfs_netfs_init_request() when copying to cache
  netfs: Fix enomem handling in buffered reads
  netfs: Fix non-contiguous donation between completed reads
  kheaders: Ignore silly-rename files
  fs: relax assertions on failure to encode file handles
  fs: fix missing declaration of init_files
  fs: fix is_mnt_ns_file()
  iomap: fix zero padding data issue in concurrent append writes
  iomap: pass byte granular end position to iomap_add_to_ioend
  jbd2: flush filesystem device before updating tail sequence
  ...

9 months agobtrfs: zlib: fix avail_in bytes for s390 zlib HW compression path
Mikhail Zaslonko [Wed, 18 Dec 2024 10:32:51 +0000 (11:32 +0100)]
btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path

Since the input data length passed to zlib_compress_folios() can be
arbitrary, always setting strm.avail_in to a multiple of PAGE_SIZE may
cause read-in bytes to exceed the input range. Currently this triggers
an assert in btrfs_compress_folios() on the debug kernel (see below).
Fix strm.avail_in calculation for S390 hardware acceleration path.

  assertion failed: *total_in <= orig_len, in fs/btrfs/compression.c:1041
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/compression.c:1041!
  monitor event: 0040 ilc:2 [#1] PREEMPT SMP
  CPU: 16 UID: 0 PID: 325 Comm: kworker/u273:3 Not tainted 6.13.0-20241204.rc1.git6.fae3b21430ca.300.fc41.s390x+debug #1
  Hardware name: IBM 3931 A01 703 (z/VM 7.4.0)
  Workqueue: btrfs-delalloc btrfs_work_helper
  Krnl PSW : 0704d00180000000 0000021761df6538 (btrfs_compress_folios+0x198/0x1a0)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
  Krnl GPRS: 0000000080000000 0000000000000001 0000000000000047 0000000000000000
             0000000000000006 ffffff01757bb000 000001976232fcc0 000000000000130c
             000001976232fcd0 000001976232fcc8 00000118ff4a0e30 0000000000000001
             00000111821ab400 0000011100000000 0000021761df6534 000001976232fb58
  Krnl Code: 0000021761df6528c020006f5ef4        larl    %r2,0000021762be2310
             0000021761df652ec0e5ffbd09d5        brasl   %r14,00000217615978d8
            #0000021761df6534af000000            mc      0,0
            >0000021761df6538: 0707                bcr     0,%r7
             0000021761df653a: 0707                bcr     0,%r7
             0000021761df653c: 0707                bcr     0,%r7
             0000021761df653e: 0707                bcr     0,%r7
             0000021761df6540c004004bb7ec        brcl    0,000002176276d518
  Call Trace:
   [<0000021761df6538>] btrfs_compress_folios+0x198/0x1a0
  ([<0000021761df6534>] btrfs_compress_folios+0x194/0x1a0)
   [<0000021761d97788>] compress_file_range+0x3b8/0x6d0
   [<0000021761dcee7c>] btrfs_work_helper+0x10c/0x160
   [<0000021761645760>] process_one_work+0x2b0/0x5d0
   [<000002176164637e>] worker_thread+0x20e/0x3e0
   [<000002176165221a>] kthread+0x15a/0x170
   [<00000217615b859c>] __ret_from_fork+0x3c/0x60
   [<00000217626e72d2>] ret_from_fork+0xa/0x38
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   [<0000021761597924>] _printk+0x4c/0x58
  Kernel panic - not syncing: Fatal exception: panic_on_oops

Fixes: fd1e75d0105d ("btrfs: make compression path to be subpage compatible")
CC: stable@vger.kernel.org # 6.12+
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
9 months agobtrfs: zoned: calculate max_extent_size properly on non-zoned setup
Christoph Hellwig [Fri, 13 Dec 2024 06:43:43 +0000 (07:43 +0100)]
btrfs: zoned: calculate max_extent_size properly on non-zoned setup

Since commit 559218d43ec9 ("block: pre-calculate max_zone_append_sectors"),
queue_limits's max_zone_append_sectors is default to be 0 and it is only
updated when there is a zoned device. So, we have
lim->max_zone_append_sectors = 0 when there is no zoned device in the
filesystem.

That leads to fs_info->max_zone_append_size and thus
fs_info->max_extent_size to be 0, which is wrong and can for example
lead to a divide by zero in count_max_extents().

Fix this by only capping fs_info->max_extent_size to
fs_info->max_zone_append_size when it is non-zero.

Based on a patch from Naohiro Aota <naohiro.aota@wdc.com>, from which
much of this commit message is stolen as well.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 559218d43ec9 ("block: pre-calculate max_zone_append_sectors")
Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
9 months agobtrfs: avoid NULL pointer dereference if no valid extent tree
Qu Wenruo [Thu, 2 Jan 2025 04:14:16 +0000 (14:44 +1030)]
btrfs: avoid NULL pointer dereference if no valid extent tree

[BUG]
Syzbot reported a crash with the following call trace:

  BTRFS info (device loop0): scrub: started on devid 1
  BUG: kernel NULL pointer dereference, address: 0000000000000208
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 106e70067 P4D 106e70067 PUD 107143067 PMD 0
  Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 1 UID: 0 PID: 689 Comm: repro Kdump: loaded Tainted: G           O       6.13.0-rc4-custom+ #206
  Tainted: [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
  RIP: 0010:find_first_extent_item+0x26/0x1f0 [btrfs]
  Call Trace:
   <TASK>
   scrub_find_fill_first_stripe+0x13d/0x3b0 [btrfs]
   scrub_simple_mirror+0x175/0x260 [btrfs]
   scrub_stripe+0x5d4/0x6c0 [btrfs]
   scrub_chunk+0xbb/0x170 [btrfs]
   scrub_enumerate_chunks+0x2f4/0x5f0 [btrfs]
   btrfs_scrub_dev+0x240/0x600 [btrfs]
   btrfs_ioctl+0x1dc8/0x2fa0 [btrfs]
   ? do_sys_openat2+0xa5/0xf0
   __x64_sys_ioctl+0x97/0xc0
   do_syscall_64+0x4f/0x120
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   </TASK>

[CAUSE]
The reproducer is using a corrupted image where extent tree root is
corrupted, thus forcing to use "rescue=all,ro" mount option to mount the
image.

Then it triggered a scrub, but since scrub relies on extent tree to find
where the data/metadata extents are, scrub_find_fill_first_stripe()
relies on an non-empty extent root.

But unfortunately scrub_find_fill_first_stripe() doesn't really expect
an NULL pointer for extent root, it use extent_root to grab fs_info and
triggered a NULL pointer dereference.

[FIX]
Add an extra check for a valid extent root at the beginning of
scrub_find_fill_first_stripe().

The new error path is introduced by 42437a6386ff ("btrfs: introduce
mount option rescue=ignorebadroots"), but that's pretty old, and later
commit b979547513ff ("btrfs: scrub: introduce helper to find and fill
sector info for a scrub_stripe") changed how we do scrub.

So for kernels older than 6.6, the fix will need manual backport.

Reported-by: syzbot+339e9dbe3a2ca419b85d@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/67756935.050a0220.25abdd.0a12.GAE@google.com/
Fixes: 42437a6386ff ("btrfs: introduce mount option rescue=ignorebadroots")
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
9 months agoMerge tag 'vfio-v6.13-rc7' of https://github.com/awilliam/linux-vfio
Linus Torvalds [Mon, 6 Jan 2025 14:56:23 +0000 (06:56 -0800)]
Merge tag 'vfio-v6.13-rc7' of https://github.com/awilliam/linux-vfio

Pull vfio fix from Alex Williamson:

 - Fix a missed order alignment requirement of the pfn when inserting
   mappings through the new huge fault handler introduced in v6.12 (Alex
   Williamson)

* tag 'vfio-v6.13-rc7' of https://github.com/awilliam/linux-vfio:
  vfio/pci: Fallback huge faults for unaligned pfn

9 months agoMerge patch series "Fix encoding overlayfs fid for fanotify delete events"
Christian Brauner [Mon, 6 Jan 2025 14:43:58 +0000 (15:43 +0100)]
Merge patch series "Fix encoding overlayfs fid for fanotify delete events"

Amir Goldstein <amir73il@gmail.com> says:

This is a followup fix to the reported regression [1] that was
introduced by overlayfs non-decodable file handles support in v6.6.

The first fix posted two weeks ago [2] was a quick band aid which is
justified on its own and is still queued on your vfs.fixes branch.

This followup fix fixes the root cause of overlayfs file handle encoding
failure and it also solves a bug with fanotify FAN_DELETE_SELF events on
overlayfs, that was discovered from analysis of the first report.

The fix to fanotify delete events was verified with a new LTP test [3].

[1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxiie81voLZZi2zXS1BziXZCM24nXqPAxbu8kxXCUWdwOg@mail.gmail.com/
[2] https://lore.kernel.org/linux-fsdevel/20241219115301.465396-1-amir73il@gmail.com/
[3] https://github.com/amir73il/ltp/commits/ovl_encode_fid/

* patches from https://lore.kernel.org/r/20250105162404.357058-1-amir73il@gmail.com:
  ovl: support encoding fid from inode with no alias
  ovl: pass realinode to ovl_encode_real_fh() instead of realdentry

Link: https://lore.kernel.org/r/20250105162404.357058-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
9 months agoovl: support encoding fid from inode with no alias
Amir Goldstein [Sun, 5 Jan 2025 16:24:04 +0000 (17:24 +0100)]
ovl: support encoding fid from inode with no alias

Dmitry Safonov reported that a WARN_ON() assertion can be trigered by
userspace when calling inotify_show_fdinfo() for an overlayfs watched
inode, whose dentry aliases were discarded with drop_caches.

The WARN_ON() assertion in inotify_show_fdinfo() was removed, because
it is possible for encoding file handle to fail for other reason, but
the impact of failing to encode an overlayfs file handle goes beyond
this assertion.

As shown in the LTP test case mentioned in the link below, failure to
encode an overlayfs file handle from a non-aliased inode also leads to
failure to report an fid with FAN_DELETE_SELF fanotify events.

As Dmitry notes in his analyzis of the problem, ovl_encode_fh() fails
if it cannot find an alias for the inode, but this failure can be fixed.
ovl_encode_fh() seldom uses the alias and in the case of non-decodable
file handles, as is often the case with fanotify fid info,
ovl_encode_fh() never needs to use the alias to encode a file handle.

Defer finding an alias until it is actually needed so ovl_encode_fh()
will not fail in the common case of FAN_DELETE_SELF fanotify events.

Fixes: 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles")
Reported-by: Dmitry Safonov <dima@arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiie81voLZZi2zXS1BziXZCM24nXqPAxbu8kxXCUWdwOg@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250105162404.357058-3-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
9 months agoovl: pass realinode to ovl_encode_real_fh() instead of realdentry
Amir Goldstein [Sun, 5 Jan 2025 16:24:03 +0000 (17:24 +0100)]
ovl: pass realinode to ovl_encode_real_fh() instead of realdentry

We want to be able to encode an fid from an inode with no alias.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250105162404.357058-2-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
9 months agoMerge tag 'exfat-for-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linki...
Linus Torvalds [Mon, 6 Jan 2025 14:19:36 +0000 (06:19 -0800)]
Merge tag 'exfat-for-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat fixes from Namjae Jeon:
 "All fixes are for issues reported by syzbot:

   - Fix wrong error return in exfat_find_empty_entry()

   - Fix a endless loop by self-linked chain

   - fix a KMSAN uninit-value issue in exfat_extend_valid_size()"

* tag 'exfat-for-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix the infinite loop in __exfat_free_cluster()
  exfat: fix the new buffer was not zeroed before writing
  exfat: fix the infinite loop in exfat_readdir()
  exfat: fix exfat_find_empty_entry() not returning error on failure

9 months agoRevert "vmstat: disable vmstat_work on vmstat_cpu_down_prep()"
Linus Torvalds [Mon, 6 Jan 2025 14:10:24 +0000 (06:10 -0800)]
Revert "vmstat: disable vmstat_work on vmstat_cpu_down_prep()"

This reverts commit adcfb264c3ed51fbbf5068ddf10d309a63683868.

It turns out this just causes a different warning splat instead that
seems to be much easier to trigger, so let's revert ASAP.

Reported-and-bisected-by: Borislav Petkov <bp@alien8.de>
Tested-by: Breno Leitao <leitao@debian.org>
Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/all/20250106131817.GAZ3vYGVr3-hWFFPLj@fat_crate.local/
Cc: Koichiro Den <koichiro.den@canonical.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 months agobtrfs: don't read from userspace twice in btrfs_uring_encoded_read()
Mark Harmstone [Fri, 3 Jan 2025 15:02:26 +0000 (15:02 +0000)]
btrfs: don't read from userspace twice in btrfs_uring_encoded_read()

If we return -EAGAIN the first time because we need to block,
btrfs_uring_encoded_read() will get called twice. Take a copy of args,
the iovs, and the iter the first time, as by the time we are called the
second time these may have gone out of scope.

Reported-by: Jens Axboe <axboe@kernel.dk>
Fixes: 34310c442e17 ("btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl)")
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
9 months agoio_uring: add io_uring_cmd_get_async_data helper
Mark Harmstone [Fri, 3 Jan 2025 15:02:25 +0000 (15:02 +0000)]
io_uring: add io_uring_cmd_get_async_data helper

Add a helper function in include/linux/io_uring/cmd.h to read the
async_data pointer from a struct io_uring_cmd.

Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
9 months agoio_uring/cmd: add per-op data to struct io_uring_cmd_data
Jens Axboe [Fri, 3 Jan 2025 15:02:24 +0000 (15:02 +0000)]
io_uring/cmd: add per-op data to struct io_uring_cmd_data

In case an op handler for ->uring_cmd() needs stable storage for user
data, it can allocate io_uring_cmd_data->op_data and use it for the
duration of the request. When the request gets cleaned up, uring_cmd
will free it automatically.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: David Sterba <dsterba@suse.com>
9 months agoio_uring/cmd: rename struct uring_cache to io_uring_cmd_data
Jens Axboe [Fri, 3 Jan 2025 15:02:23 +0000 (15:02 +0000)]
io_uring/cmd: rename struct uring_cache to io_uring_cmd_data

In preparation for making this more generically available for
->uring_cmd() usage that needs stable command data, rename it and move
it to io_uring/cmd.h instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: David Sterba <dsterba@suse.com>
9 months agoLinux 6.13-rc6
Linus Torvalds [Sun, 5 Jan 2025 22:13:40 +0000 (14:13 -0800)]
Linux 6.13-rc6

9 months agoMerge tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jan 2025 18:52:47 +0000 (10:52 -0800)]
Merge tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix escaping of '$' in scripts/mksysmap

 - Fix a modpost crash observed with the latest binutils

 - Fix 'provides' in the linux-api-headers pacman package

* tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: pacman-pkg: provide versioned linux-api-headers package
  modpost: work around unaligned data access error
  modpost: refactor do_vmbus_entry()
  modpost: fix the missed iteration for the max bit in do_input()
  scripts/mksysmap: Fix escape chars '$'

9 months agoMerge tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 5 Jan 2025 18:37:45 +0000 (10:37 -0800)]
Merge tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "25 hotfixes.  16 are cc:stable.  18 are MM and 7 are non-MM.

  The usual bunch of singletons and two doubletons - please see the
  relevant changelogs for details"

* tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
  MAINTAINERS: change Arınç _NAL's name and email address
  scripts/sorttable: fix orc_sort_cmp() to maintain symmetry and transitivity
  mm/util: make memdup_user_nul() similar to memdup_user()
  mm, madvise: fix potential workingset node list_lru leaks
  mm/damon/core: fix ignored quota goals and filters of newly committed schemes
  mm/damon/core: fix new damon_target objects leaks on damon_commit_targets()
  mm/list_lru: fix false warning of negative counter
  vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  mm: shmem: fix the update of 'shmem_falloc->nr_unswapped'
  mm: shmem: fix incorrect index alignment for within_size policy
  percpu: remove intermediate variable in PERCPU_PTR()
  mm: zswap: fix race between [de]compression and CPU hotunplug
  ocfs2: fix slab-use-after-free due to dangling pointer dqi_priv
  fs/proc/task_mmu: fix pagemap flags with PMD THP entries on 32bit
  kcov: mark in_softirq_really() as __always_inline
  docs: mm: fix the incorrect 'FileHugeMapped' field
  mailmap: modify the entry for Mathieu Othacehe
  mm/kmemleak: fix sleeping function called from invalid context at print message
  mm: hugetlb: independent PMD page table shared count
  maple_tree: reload mas before the second call for mas_empty_area
  ...

9 months agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Jan 2025 18:28:34 +0000 (10:28 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "A randconfig build fix and a performance fix:

   - Fix the CONFIG_RESET_CONTROLLER=n path signature of
     clk_imx8mp_audiomix_reset_controller_register() to appease
     randconfig

   - Speed up the sdhci clk on TH1520 by a factor of 4 by adding
     a fixed factor clk"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: clk-imx8mp-audiomix: fix function signature
  clk: thead: Fix TH1520 emmc and shdci clock rate