]> www.infradead.org Git - users/hch/block.git/log
users/hch/block.git
20 months agoaf_unix: Drop oob_skb ref before purging queue in GC.
Kuniyuki Iwashima [Mon, 19 Feb 2024 17:46:57 +0000 (09:46 -0800)]
af_unix: Drop oob_skb ref before purging queue in GC.

syzbot reported another task hung in __unix_gc().  [0]

The current while loop assumes that all of the left candidates
have oob_skb and calling kfree_skb(oob_skb) releases the remaining
candidates.

However, I missed a case that oob_skb has self-referencing fd and
another fd and the latter sk is placed before the former in the
candidate list.  Then, the while loop never proceeds, resulting
the task hung.

__unix_gc() has the same loop just before purging the collected skb,
so we can call kfree_skb(oob_skb) there and let __skb_queue_purge()
release all inflight sockets.

[0]:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 2784 Comm: kworker/u4:8 Not tainted 6.8.0-rc4-syzkaller-01028-g71b605d32017 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Workqueue: events_unbound __unix_gc
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:200
Code: 89 fb e8 23 00 00 00 48 8b 3d 84 f5 1a 0c 48 89 de 5b e9 43 26 57 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0d 90 52 70 7e 65 8b 15 91 52 70
RSP: 0018:ffffc9000a17fa78 EFLAGS: 00000287
RAX: ffffffff8a0a6108 RBX: ffff88802b6c2640 RCX: ffff88802c0b3b80
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffffc9000a17fbf0 R08: ffffffff89383f1d R09: 1ffff1100ee5ff84
R10: dffffc0000000000 R11: ffffed100ee5ff85 R12: 1ffff110056d84ee
R13: ffffc9000a17fae0 R14: 0000000000000000 R15: ffffffff8f47b840
FS:  0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffef5687ff8 CR3: 0000000029b34000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <TASK>
 __unix_gc+0xe69/0xf40 net/unix/garbage.c:343
 process_one_work kernel/workqueue.c:2633 [inline]
 process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706
 worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787
 kthread+0x2ef/0x390 kernel/kthread.c:388
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
 </TASK>

Reported-and-tested-by: syzbot+ecab4d36f920c3574bf9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ecab4d36f920c3574bf9
Fixes: 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: ipa: don't overrun IPA suspend interrupt registers
Alex Elder [Mon, 19 Feb 2024 14:40:15 +0000 (08:40 -0600)]
net: ipa: don't overrun IPA suspend interrupt registers

In newer hardware, IPA supports more than 32 endpoints.  Some
registers--such as IPA interrupt registers--represent endpoints
as bits in a 4-byte register, and such registers are repeated as
needed to represent endpoints beyond the first 32.

In ipa_interrupt_suspend_clear_all(), we clear all pending IPA
suspend interrupts by reading all status register(s) and writing
corresponding registers to clear interrupt conditions.

Unfortunately the number of registers to read/write is calculated
incorrectly, and as a result we access *many* more registers than
intended.  This bug occurs only when the IPA hardware signals a
SUSPEND interrupt, which happens when a packet is received for an
endpoint (or its underlying GSI channel) that is suspended.  This
situation is difficult to reproduce, but possible.

Fix this by correctly computing the number of interrupt registers to
read and write.  This is the only place in the code where registers
that map endpoints or channels this way perform this calculation.

Fixes: f298ba785e2d ("net: ipa: add a parameter to suspend registers")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: implement lockless setsockopt(SO_PEEK_OFF)
Eric Dumazet [Mon, 19 Feb 2024 14:12:20 +0000 (14:12 +0000)]
net: implement lockless setsockopt(SO_PEEK_OFF)

syzbot reported a lockdep violation [1] involving af_unix
support of SO_PEEK_OFF.

Since SO_PEEK_OFF is inherently not thread safe (it uses a per-socket
sk_peek_off field), there is really no point to enforce a pointless
thread safety in the kernel.

After this patch :

- setsockopt(SO_PEEK_OFF) no longer acquires the socket lock.

- skb_consume_udp() no longer has to acquire the socket lock.

- af_unix no longer needs a special version of sk_set_peek_off(),
  because it does not lock u->iolock anymore.

As a followup, we could replace prot->set_peek_off to be a boolean
and avoid an indirect call, since we always use sk_set_peek_off().

[1]

WARNING: possible circular locking dependency detected
6.8.0-rc4-syzkaller-00267-g0f1dd5e91e2b #0 Not tainted

syz-executor.2/30025 is trying to acquire lock:
 ffff8880765e7d80 (&u->iolock){+.+.}-{3:3}, at: unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789

but task is already holding lock:
 ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
 ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sockopt_lock_sock net/core/sock.c:1060 [inline]
 ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sk_setsockopt+0xe52/0x3360 net/core/sock.c:1193

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_UNIX){+.+.}-{0:0}:
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        lock_sock_nested+0x48/0x100 net/core/sock.c:3524
        lock_sock include/net/sock.h:1691 [inline]
        __unix_dgram_recvmsg+0x1275/0x12c0 net/unix/af_unix.c:2415
        sock_recvmsg_nosec+0x18e/0x1d0 net/socket.c:1046
        ____sys_recvmsg+0x3c0/0x470 net/socket.c:2801
        ___sys_recvmsg net/socket.c:2845 [inline]
        do_recvmmsg+0x474/0xae0 net/socket.c:2939
        __sys_recvmmsg net/socket.c:3018 [inline]
        __do_sys_recvmmsg net/socket.c:3041 [inline]
        __se_sys_recvmmsg net/socket.c:3034 [inline]
        __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3034
       do_syscall_64+0xf9/0x240
       entry_SYSCALL_64_after_hwframe+0x6f/0x77

-> #0 (&u->iolock){+.+.}-{3:3}:
        check_prev_add kernel/locking/lockdep.c:3134 [inline]
        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
        validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
        __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        __mutex_lock_common kernel/locking/mutex.c:608 [inline]
        __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
        unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
       sk_setsockopt+0x207e/0x3360
        do_sock_setsockopt+0x2fb/0x720 net/socket.c:2307
        __sys_setsockopt+0x1ad/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xf9/0x240
       entry_SYSCALL_64_after_hwframe+0x6f/0x77

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_UNIX);
                               lock(&u->iolock);
                               lock(sk_lock-AF_UNIX);
  lock(&u->iolock);

 *** DEADLOCK ***

1 lock held by syz-executor.2/30025:
  #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
  #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sockopt_lock_sock net/core/sock.c:1060 [inline]
  #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sk_setsockopt+0xe52/0x3360 net/core/sock.c:1193

stack backtrace:
CPU: 0 PID: 30025 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00267-g0f1dd5e91e2b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
  check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
  check_prev_add kernel/locking/lockdep.c:3134 [inline]
  check_prevs_add kernel/locking/lockdep.c:3253 [inline]
  validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
  __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
  lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
  __mutex_lock_common kernel/locking/mutex.c:608 [inline]
  __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
  unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
 sk_setsockopt+0x207e/0x3360
  do_sock_setsockopt+0x2fb/0x720 net/socket.c:2307
  __sys_setsockopt+0x1ad/0x250 net/socket.c:2334
  __do_sys_setsockopt net/socket.c:2343 [inline]
  __se_sys_setsockopt net/socket.c:2340 [inline]
  __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
 do_syscall_64+0xf9/0x240
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f78a1c7dda9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f78a0fde0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007f78a1dac050 RCX: 00007f78a1c7dda9
RDX: 000000000000002a RSI: 0000000000000001 RDI: 0000000000000006
RBP: 00007f78a1cca47a R08: 0000000000000004 R09: 0000000000000000
R10: 0000000020000180 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f78a1dac050 R15: 00007ffe5cd81ae8

Fixes: 859051dd165e ("bpf: Implement cgroup sockaddr hooks for unix sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Daan De Meyer <daan.j.demeyer@gmail.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoocteontx2-af: Consider the action set by PF
Subbaraya Sundeep [Mon, 19 Feb 2024 12:55:14 +0000 (18:25 +0530)]
octeontx2-af: Consider the action set by PF

AF reserves MCAM entries for each PF, VF present in the
system and populates the entry with DMAC and action with
default RSS so that basic packet I/O works. Since PF/VF is
not aware of the RSS action installed by AF, AF only fixup
the actions of the rules installed by PF/VF with corresponding
default RSS action. This worked well for rules installed by
PF/VF for features like RX VLAN offload and DMAC filters but
rules involving action like drop/forward to queue are also
getting modified by AF. Hence fix it by setting the default
RSS action only if requested by PF/VF.

Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agodocs: netdev: update the link to the CI repo
Jakub Kicinski [Fri, 16 Feb 2024 16:19:45 +0000 (08:19 -0800)]
docs: netdev: update the link to the CI repo

Netronome graciously transferred the original NIPA repo
to our new netdev umbrella org. Link to that instead of
my private fork.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240216161945.2208842-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoarp: Prevent overflow in arp_req_get().
Kuniyuki Iwashima [Thu, 15 Feb 2024 23:05:16 +0000 (15:05 -0800)]
arp: Prevent overflow in arp_req_get().

syzkaller reported an overflown write in arp_req_get(). [0]

When ioctl(SIOCGARP) is issued, arp_req_get() looks up an neighbour
entry and copies neigh->ha to struct arpreq.arp_ha.sa_data.

The arp_ha here is struct sockaddr, not struct sockaddr_storage, so
the sa_data buffer is just 14 bytes.

In the splat below, 2 bytes are overflown to the next int field,
arp_flags.  We initialise the field just after the memcpy(), so it's
not a problem.

However, when dev->addr_len is greater than 22 (e.g. MAX_ADDR_LEN),
arp_netmask is overwritten, which could be set as htonl(0xFFFFFFFFUL)
in arp_ioctl() before calling arp_req_get().

To avoid the overflow, let's limit the max length of memcpy().

Note that commit b5f0de6df6dc ("net: dev: Convert sa_data to flexible
array in struct sockaddr") just silenced syzkaller.

[0]:
memcpy: detected field-spanning write (size 16) of single field "r->arp_ha.sa_data" at net/ipv4/arp.c:1128 (size 14)
WARNING: CPU: 0 PID: 144638 at net/ipv4/arp.c:1128 arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
Modules linked in:
CPU: 0 PID: 144638 Comm: syz-executor.4 Not tainted 6.1.74 #31
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
RIP: 0010:arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
Code: fd ff ff e8 41 42 de fb b9 0e 00 00 00 4c 89 fe 48 c7 c2 20 6d ab 87 48 c7 c7 80 6d ab 87 c6 05 25 af 72 04 01 e8 5f 8d ad fb <0f> 0b e9 6c fd ff ff e8 13 42 de fb be 03 00 00 00 4c 89 e7 e8 a6
RSP: 0018:ffffc900050b7998 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88803a815000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8641a44a RDI: 0000000000000001
RBP: ffffc900050b7a98 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 203a7970636d656d R12: ffff888039c54000
R13: 1ffff92000a16f37 R14: ffff88803a815084 R15: 0000000000000010
FS:  00007f172bf306c0(0000) GS:ffff88805aa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f172b3569f0 CR3: 0000000057f12005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 arp_ioctl+0x33f/0x4b0 net/ipv4/arp.c:1261
 inet_ioctl+0x314/0x3a0 net/ipv4/af_inet.c:981
 sock_do_ioctl+0xdf/0x260 net/socket.c:1204
 sock_ioctl+0x3ef/0x650 net/socket.c:1321
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:870 [inline]
 __se_sys_ioctl fs/ioctl.c:856 [inline]
 __x64_sys_ioctl+0x18e/0x220 fs/ioctl.c:856
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x37/0x90 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x64/0xce
RIP: 0033:0x7f172b262b8d
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f172bf300b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f172b3abf80 RCX: 00007f172b262b8d
RDX: 0000000020000000 RSI: 0000000000008954 RDI: 0000000000000003
RBP: 00007f172b2d3493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f172b3abf80 R15: 00007f172bf10000
 </TASK>

Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: Bjoern Doebel <doebel@amazon.de>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240215230516.31330-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agodevlink: fix possible use-after-free and memory leaks in devlink_init()
Vasiliy Kovalev [Thu, 15 Feb 2024 20:34:00 +0000 (23:34 +0300)]
devlink: fix possible use-after-free and memory leaks in devlink_init()

The pernet operations structure for the subsystem must be registered
before registering the generic netlink family.

Make an unregister in case of unsuccessful registration.

Fixes: 687125b5799c ("devlink: split out core code")
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Link: https://lore.kernel.org/r/20240215203400.29976-1-kovalev@altlinux.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoipv6: sr: fix possible use-after-free and null-ptr-deref
Vasiliy Kovalev [Thu, 15 Feb 2024 20:27:17 +0000 (23:27 +0300)]
ipv6: sr: fix possible use-after-free and null-ptr-deref

The pernet operations structure for the subsystem must be registered
before registering the generic netlink family.

Fixes: 915d7e5e5930 ("ipv6: sr: add code base for control plane support of SR-IPv6")
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Link: https://lore.kernel.org/r/20240215202717.29815-1-kovalev@altlinux.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoenic: Avoid false positive under FORTIFY_SOURCE
Kees Cook [Fri, 16 Feb 2024 23:30:05 +0000 (15:30 -0800)]
enic: Avoid false positive under FORTIFY_SOURCE

FORTIFY_SOURCE has been ignoring 0-sized destinations while the kernel
code base has been converted to flexible arrays. In order to enforce
the 0-sized destinations (e.g. with __counted_by), the remaining 0-sized
destinations need to be handled. Unfortunately, struct vic_provinfo
resists full conversion, as it contains a flexible array of flexible
arrays, which is only possible with the 0-sized fake flexible array.

Use unsafe_memcpy() to avoid future false positives under
CONFIG_FORTIFY_SOURCE.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoionic: use pci_is_enabled not open code
Shannon Nelson [Fri, 16 Feb 2024 22:52:59 +0000 (14:52 -0800)]
ionic: use pci_is_enabled not open code

Since there is a utility available for this, use
the API rather than open code.

Fixes: 13943d6c8273 ("ionic: prevent pci disable of already disabled device")
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoselftests: bonding: set active slave to primary eth1 specifically
Hangbin Liu [Thu, 15 Feb 2024 02:33:25 +0000 (10:33 +0800)]
selftests: bonding: set active slave to primary eth1 specifically

In bond priority testing, we set the primary interface to eth1 and add
eth0,1,2 to bond in serial. This is OK in normal times. But when in
debug kernel, the bridge port that eth0,1,2 connected would start
slowly (enter blocking, forwarding state), which caused the primary
interface down for a while after enslaving and active slave changed.
Here is a test log from Jakub's debug test[1].

 [  400.399070][   T50] br0: port 1(s0) entered disabled state
 [  400.400168][   T50] br0: port 4(s2) entered disabled state
 [  400.941504][ T2791] bond0: (slave eth0): making interface the new active one
 [  400.942603][ T2791] bond0: (slave eth0): Enslaving as an active interface with an up link
 [  400.943633][ T2766] br0: port 1(s0) entered blocking state
 [  400.944119][ T2766] br0: port 1(s0) entered forwarding state
 [  401.128792][ T2792] bond0: (slave eth1): making interface the new active one
 [  401.130771][ T2792] bond0: (slave eth1): Enslaving as an active interface with an up link
 [  401.131643][   T69] br0: port 2(s1) entered blocking state
 [  401.132067][   T69] br0: port 2(s1) entered forwarding state
 [  401.346201][ T2793] bond0: (slave eth2): Enslaving as a backup interface with an up link
 [  401.348414][   T50] br0: port 4(s2) entered blocking state
 [  401.348857][   T50] br0: port 4(s2) entered forwarding state
 [  401.519669][  T250] bond0: (slave eth0): link status definitely down, disabling slave
 [  401.526522][  T250] bond0: (slave eth1): link status definitely down, disabling slave
 [  401.526986][  T250] bond0: (slave eth2): making interface the new active one
 [  401.629470][  T250] bond0: (slave eth0): link status definitely up
 [  401.630089][  T250] bond0: (slave eth1): link status definitely up
 [...]
 # TEST: prio (active-backup ns_ip6_target primary_reselect 1)         [FAIL]
 # Current active slave is eth2 but not eth1

Fix it by setting active slave to primary slave specifically before
testing.

[1] https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/464301/1-bond-options-sh/stdout

Fixes: 481b56e0391e ("selftests: bonding: re-format bond option tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge branch 'bcmasp-fixes'
David S. Miller [Sun, 18 Feb 2024 11:32:10 +0000 (11:32 +0000)]
Merge branch 'bcmasp-fixes'

Justin Chen says:

====================
net: bcmasp: bug fixes for bcmasp

Fix two bugs.

- Indicate that PM is managed by mac to prevent double pm calls. This
  doesn't lead to a crash, but waste a noticable amount of time
  suspending/resuming.

- Sanity check for OOB write was off by one. Leading to a false error
  when using the full array.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: bcmasp: Sanity check is off by one
Justin Chen [Thu, 15 Feb 2024 18:27:32 +0000 (10:27 -0800)]
net: bcmasp: Sanity check is off by one

A sanity check for OOB write is off by one leading to a false positive
when the array is full.

Fixes: 9b90aca97f6d ("net: ethernet: bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active()")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: bcmasp: Indicate MAC is in charge of PHY PM
Florian Fainelli [Thu, 15 Feb 2024 18:27:31 +0000 (10:27 -0800)]
net: bcmasp: Indicate MAC is in charge of PHY PM

Avoid the PHY library call unnecessarily into the suspend/resume
functions by setting phydev->mac_managed_pm to true. The ASP driver
essentially does exactly what mdio_bus_phy_resume() does.

Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge branch 'mptcp-fixes'
David S. Miller [Sun, 18 Feb 2024 10:25:01 +0000 (10:25 +0000)]
Merge branch 'mptcp-fixes'

Matthieu Baerts says:

====================
mptcp: misc. fixes for v6.8

This series includes 4 types of fixes:

Patches 1 and 2 force the path-managers not to allocate a new address
entry when dealing with the "special" ID 0, reserved to the address of
the initial subflow. These patches can be backported up to v5.19 and
v5.12 respectively.

Patch 3 to 6 fix the in-kernel path-manager not to create duplicated
subflows. Patch 6 is the main fix, but patches 3 to 5 are some kind of
pre-requisities: they fix some data races that could also lead to the
creation of unexpected subflows. These patches can be backported up to
v5.7, v5.10, v6.0, and v5.15 respectively.

Note that patch 3 modifies the existing ULP API. No better solutions
have been found for -net, and there is some similar prior art, see
commit 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info"). Please
also note that TLS ULP Diag has likely the same issue.

Patches 7 to 9 fix issues in the selftests, when executing them on older
kernels, e.g. when testing the last version of these kselftests on the
v5.15.148 kernel as it is done by LKFT when validating stable kernels.
These patches only avoid printing expected errors the console and
marking some tests as "OK" while they have been skipped. Patches 7 and 8
can be backported up to v6.6.

Patches 10 to 13 make sure all MPTCP selftests subtests have a unique
name. It is important to have a unique (sub)test name in TAP, because
that's the test identifier. Some CI environments might drop tests with
duplicated names. Patches 10 to 12 can be backported up to v6.6.
====================

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoselftests: mptcp: diag: unique 'cestab' subtest names
Matthieu Baerts (NGI0) [Thu, 15 Feb 2024 18:25:40 +0000 (19:25 +0100)]
selftests: mptcp: diag: unique 'cestab' subtest names

It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.

Some 'cestab' subtests from the diag selftest had the same names, e.g.:

    ....chk 0 cestab

Now the previous value is taken, to have different names, e.g.:

    ....chk 2->0 cestab after flush

While at it, the 'after flush' info is added, similar to what is done
with the 'in use' subtests. Also inspired by these 'in use' subtests,
'many' is displayed instead of a large number:

    many msk socket present                           [  ok  ]
    ....chk many msk in use                           [  ok  ]
    ....chk many cestab                               [  ok  ]
    ....chk many->0 msk in use after flush            [  ok  ]
    ....chk many->0 cestab after flush                [  ok  ]

Fixes: 81ab772819da ("selftests: mptcp: diag: check CURRESTAB counters")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoselftests: mptcp: diag: unique 'in use' subtest names
Matthieu Baerts (NGI0) [Thu, 15 Feb 2024 18:25:39 +0000 (19:25 +0100)]
selftests: mptcp: diag: unique 'in use' subtest names

It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.

Some 'in use' subtests from the diag selftest had the same names, e.g.:

    chk 0 msk in use after flush

Now the previous value is taken, to have different names, e.g.:

    chk 2->0 msk in use after flush

While at it, avoid repeating the full message, declare it once in the
helper.

Fixes: ce9902573652 ("selftests: mptcp: diag: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoselftests: mptcp: userspace_pm: unique subtest names
Matthieu Baerts (NGI0) [Thu, 15 Feb 2024 18:25:38 +0000 (19:25 +0100)]
selftests: mptcp: userspace_pm: unique subtest names

It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated names.

Some subtests from the userspace_pm selftest had the same names. That's
because different subflows are created (and deleted) between the same
pair of IP addresses.

Simply adding the destination port in the name is then enough to have
different names, because the destination port is always different.

Note that adding such info takes a bit more space, so we need to
increase a bit the width to print the name, simply to keep all the
'[ OK ]' aligned as before.

Fixes: f589234e1af0 ("selftests: mptcp: userspace_pm: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoselftests: mptcp: simult flows: fix some subtest names
Matthieu Baerts (NGI0) [Thu, 15 Feb 2024 18:25:37 +0000 (19:25 +0100)]
selftests: mptcp: simult flows: fix some subtest names

The selftest was correctly recording all the results, but the 'reverse
direction' part was missing in the name when needed.

It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.

Fixes: 675d99338e7a ("selftests: mptcp: simult flows: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoselftests: mptcp: diag: fix bash warnings on older kernels
Matthieu Baerts (NGI0) [Thu, 15 Feb 2024 18:25:36 +0000 (19:25 +0100)]
selftests: mptcp: diag: fix bash warnings on older kernels

Since the 'Fixes' commit mentioned below, the command that is executed
in __chk_nr() helper can return nothing if the feature is not supported.
This is the case when the MPTCP CURRESTAB counter is not supported.

To avoid this warning ...

  ./diag.sh: line 65: [: !=: unary operator expected

... we just need to surround '$nr' with double quotes, to support an
empty string when the feature is not supported.

Fixes: 81ab772819da ("selftests: mptcp: diag: check CURRESTAB counters")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoselftests: mptcp: pm nl: avoid error msg on older kernels
Matthieu Baerts (NGI0) [Thu, 15 Feb 2024 18:25:35 +0000 (19:25 +0100)]
selftests: mptcp: pm nl: avoid error msg on older kernels

Since the 'Fixes' commit mentioned below, and if the kernel being tested
doesn't support the 'fullmesh' flag, this error will be printed:

  netlink error -22 (Invalid argument)
  ./pm_nl_ctl: bailing out due to netlink error[s]

But that can be normal if the kernel doesn't support the feature, no
need to print this worrying error message while everything else looks
OK. So we can mute stderr. Failures will still be detected if any.

Fixes: 1dc88d241f92 ("selftests: mptcp: pm_nl_ctl: always look for errors")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoselftests: mptcp: pm nl: also list skipped tests
Matthieu Baerts (NGI0) [Thu, 15 Feb 2024 18:25:34 +0000 (19:25 +0100)]
selftests: mptcp: pm nl: also list skipped tests

If the feature is not supported by older kernels, and instead of just
ignoring some tests, we should mark them as skipped, so we can still
track them.

Fixes: d85555ac11f9 ("selftests: mptcp: pm_netlink: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agomptcp: fix duplicate subflow creation
Paolo Abeni [Thu, 15 Feb 2024 18:25:33 +0000 (19:25 +0100)]
mptcp: fix duplicate subflow creation

Fullmesh endpoints could end-up unexpectedly generating duplicate
subflows - same local and remote addresses - when multiple incoming
ADD_ADDR are processed before the PM creates the subflow for the local
endpoints.

Address the issue explicitly checking for duplicates at subflow
creation time.

To avoid a quadratic computational complexity, track the unavailable
remote address ids in a temporary bitmap and initialize such bitmap
with the remote ids of all the existing subflows matching the local
address currently processed.

The above allows additionally replacing the existing code checking
for duplicate entry in the current set with a simple bit test
operation.

Fixes: 2843ff6f36db ("mptcp: remote addresses fullmesh")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/435
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agomptcp: fix data races on remote_id
Paolo Abeni [Thu, 15 Feb 2024 18:25:32 +0000 (19:25 +0100)]
mptcp: fix data races on remote_id

Similar to the previous patch, address the data race on
remote_id, adding the suitable ONCE annotations.

Fixes: bedee0b56113 ("mptcp: address lookup improvements")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agomptcp: fix data races on local_id
Paolo Abeni [Thu, 15 Feb 2024 18:25:31 +0000 (19:25 +0100)]
mptcp: fix data races on local_id

The local address id is accessed lockless by the NL PM, add
all the required ONCE annotation. There is a caveat: the local
id can be initialized late in the subflow life-cycle, and its
validity is controlled by the local_id_valid flag.

Remove such flag and encode the validity in the local_id field
itself with negative value before initialization. That allows
accessing the field consistently with a single read operation.

Fixes: 0ee4261a3681 ("mptcp: implement mptcp_pm_remove_subflow")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agomptcp: fix lockless access in subflow ULP diag
Paolo Abeni [Thu, 15 Feb 2024 18:25:30 +0000 (19:25 +0100)]
mptcp: fix lockless access in subflow ULP diag

Since the introduction of the subflow ULP diag interface, the
dump callback accessed all the subflow data with lockless.

We need either to annotate all the read and write operation accordingly,
or acquire the subflow socket lock. Let's do latter, even if slower, to
avoid a diffstat havoc.

Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agomptcp: add needs_id for netlink appending addr
Geliang Tang [Thu, 15 Feb 2024 18:25:29 +0000 (19:25 +0100)]
mptcp: add needs_id for netlink appending addr

Just the same as userspace PM, a new parameter needs_id is added for
in-kernel PM mptcp_pm_nl_append_new_local_addr() too.

Add a new helper mptcp_pm_has_addr_attr_id() to check whether an address
ID is set from PM or not.

In mptcp_pm_nl_get_local_id(), needs_id is always true, but in
mptcp_pm_nl_add_addr_doit(), pass mptcp_pm_has_addr_attr_id() to
needs_it.

Fixes: efd5a4c04e18 ("mptcp: add the address ID assignment bitmap")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agomptcp: add needs_id for userspace appending addr
Geliang Tang [Thu, 15 Feb 2024 18:25:28 +0000 (19:25 +0100)]
mptcp: add needs_id for userspace appending addr

When userspace PM requires to create an ID 0 subflow in "userspace pm
create id 0 subflow" test like this:

        userspace_pm_add_sf $ns2 10.0.3.2 0

An ID 1 subflow, in fact, is created.

Since in mptcp_pm_nl_append_new_local_addr(), 'id 0' will be treated as
no ID is set by userspace, and will allocate a new ID immediately:

     if (!e->addr.id)
             e->addr.id = find_next_zero_bit(pernet->id_bitmap,
                                             MPTCP_PM_MAX_ADDR_ID + 1,
                                             1);

To solve this issue, a new parameter needs_id is added for
mptcp_userspace_pm_append_new_local_addr() to distinguish between
whether userspace PM has set an ID 0 or whether userspace PM has
not set any address.

needs_id is true in mptcp_userspace_pm_get_local_id(), but false in
mptcp_pm_nl_announce_doit() and mptcp_pm_nl_subflow_create_doit().

Fixes: e5ed101a6028 ("mptcp: userspace pm allow creating id 0 subflow")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge branch 'inet-fix-NLM_F_DUMP_INTR-logic'
David S. Miller [Sun, 18 Feb 2024 10:22:27 +0000 (10:22 +0000)]
Merge branch 'inet-fix-NLM_F_DUMP_INTR-logic'

Eric Dumazet says:

====================
inet: fix NLM_F_DUMP_INTR logic

Make sure NLM_F_DUMP_INTR is generated if dev_base_seq and
dev_addr_genid are changed by the same amount.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoipv6: properly combine dev_base_seq and ipv6.dev_addr_genid
Eric Dumazet [Thu, 15 Feb 2024 17:21:07 +0000 (17:21 +0000)]
ipv6: properly combine dev_base_seq and ipv6.dev_addr_genid

net->dev_base_seq and ipv6.dev_addr_genid are monotonically increasing.

If we XOR their values, we could miss to detect if both values
were changed with the same amount.

Fixes: 63998ac24f83 ("ipv6: provide addr and netconf dump consistency info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoipv4: properly combine dev_base_seq and ipv4.dev_addr_genid
Eric Dumazet [Thu, 15 Feb 2024 17:21:06 +0000 (17:21 +0000)]
ipv4: properly combine dev_base_seq and ipv4.dev_addr_genid

net->dev_base_seq and ipv4.dev_addr_genid are monotonically increasing.

If we XOR their values, we could miss to detect if both values
were changed with the same amount.

Fixes: 0465277f6b3f ("ipv4: provide addr and netconf dump consistency info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: stmmac: Fix incorrect dereference in interrupt handlers
Pavel Sakharov [Wed, 14 Feb 2024 09:27:17 +0000 (12:27 +0300)]
net: stmmac: Fix incorrect dereference in interrupt handlers

If 'dev' or 'data' is NULL, the 'priv' variable has an incorrect address
when dereferencing calling netdev_err().

Since we get as 'dev_id' or 'data' what was passed as the 'dev' argument
to request_irq() during interrupt initialization (that is, the net_device
and rx/tx queue pointers initialized at the time of the call) and since
there are usually no checks for the 'dev_id' argument in such handlers
in other drivers, remove these checks from the handlers in stmmac driver.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 8532f613bc78 ("net: stmmac: introduce MSI Interrupt routines for mac, safety, RX & TX")
Signed-off-by: Pavel Sakharov <p.sakharov@ispras.ru>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet/sched: act_mirred: don't override retval if we already lost the skb
Jakub Kicinski [Thu, 15 Feb 2024 14:33:46 +0000 (06:33 -0800)]
net/sched: act_mirred: don't override retval if we already lost the skb

If we're redirecting the skb, and haven't called tcf_mirred_forward(),
yet, we need to tell the core to drop the skb by setting the retcode
to SHOT. If we have called tcf_mirred_forward(), however, the skb
is out of our hands and returning SHOT will lead to UaF.

Move the retval override to the error path which actually need it.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Fixes: e5cf1baf92cb ("act_mirred: use TC_ACT_REINSERT when possible")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet/sched: act_mirred: use the backlog for mirred ingress
Jakub Kicinski [Thu, 15 Feb 2024 14:33:45 +0000 (06:33 -0800)]
net/sched: act_mirred: use the backlog for mirred ingress

The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.

The problem as previously described by Davide (see Link) is that
if we reverse flow of traffic with the redirect (egress -> ingress)
we may reach the same socket which generated the packet. And we may
still be holding its socket lock. The common solution to such deadlocks
is to put the packet in the Rx backlog, rather than run the Rx path
inline. Do that for all egress -> ingress reversals, not just once
we started to nest mirred calls.

In the past there was a concern that the backlog indirection will
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address the issue.

Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: ethernet: adi: requires PHYLIB support
Randy Dunlap [Thu, 15 Feb 2024 07:00:50 +0000 (23:00 -0800)]
net: ethernet: adi: requires PHYLIB support

This driver uses functions that are supplied by the Kconfig symbol
PHYLIB, so select it to ensure that they are built as needed.

When CONFIG_ADIN1110=y and CONFIG_PHYLIB=m, there are multiple build
(linker) errors that are resolved by this Kconfig change:

   ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_open':
   drivers/net/ethernet/adi/adin1110.c:933: undefined reference to `phy_start'
   ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_probe_netdevs':
   drivers/net/ethernet/adi/adin1110.c:1603: undefined reference to `get_phy_device'
   ld: drivers/net/ethernet/adi/adin1110.c:1609: undefined reference to `phy_connect'
   ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
   drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
   ld: drivers/net/ethernet/adi/adin1110.o: in function `devm_mdiobus_alloc':
   include/linux/phy.h:455: undefined reference to `devm_mdiobus_alloc_size'
   ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_register_mdiobus':
   drivers/net/ethernet/adi/adin1110.c:529: undefined reference to `__devm_mdiobus_register'
   ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_stop':
   drivers/net/ethernet/adi/adin1110.c:958: undefined reference to `phy_stop'
   ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
   drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
   ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_adjust_link':
   drivers/net/ethernet/adi/adin1110.c:1077: undefined reference to `phy_print_status'
   ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_ioctl':
   drivers/net/ethernet/adi/adin1110.c:790: undefined reference to `phy_do_ioctl'
   ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf60): undefined reference to `phy_ethtool_get_link_ksettings'
   ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf68): undefined reference to `phy_ethtool_set_link_ksettings'

Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402070626.eZsfVHG5-lkp@intel.com/
Cc: Lennart Franzen <lennart@lfdomain.com>
Cc: Alexandru Tachici <alexandru.tachici@analog.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agodccp/tcp: Unhash sk from ehash for tb2 alloc failure after check_estalblished().
Kuniyuki Iwashima [Wed, 14 Feb 2024 19:13:08 +0000 (11:13 -0800)]
dccp/tcp: Unhash sk from ehash for tb2 alloc failure after check_estalblished().

syzkaller reported a warning [0] in inet_csk_destroy_sock() with no
repro.

  WARN_ON(inet_sk(sk)->inet_num && !inet_csk(sk)->icsk_bind_hash);

However, the syzkaller's log hinted that connect() failed just before
the warning due to FAULT_INJECTION.  [1]

When connect() is called for an unbound socket, we search for an
available ephemeral port.  If a bhash bucket exists for the port, we
call __inet_check_established() or __inet6_check_established() to check
if the bucket is reusable.

If reusable, we add the socket into ehash and set inet_sk(sk)->inet_num.

Later, we look up the corresponding bhash2 bucket and try to allocate
it if it does not exist.

Although it rarely occurs in real use, if the allocation fails, we must
revert the changes by check_established().  Otherwise, an unconnected
socket could illegally occupy an ehash entry.

Note that we do not put tw back into ehash because sk might have
already responded to a packet for tw and it would be better to free
tw earlier under such memory presure.

[0]:
WARNING: CPU: 0 PID: 350830 at net/ipv4/inet_connection_sock.c:1193 inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
Modules linked in:
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
Code: 41 5c 41 5d 41 5e e9 2d 4a 3d fd e8 28 4a 3d fd 48 89 ef e8 f0 cd 7d ff 5b 5d 41 5c 41 5d 41 5e e9 13 4a 3d fd e8 0e 4a 3d fd <0f> 0b e9 61 fe ff ff e8 02 4a 3d fd 4c 89 e7 be 03 00 00 00 e8 05
RSP: 0018:ffffc9000b21fd38 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000009e78 RCX: ffffffff840bae40
RDX: ffff88806e46c600 RSI: ffffffff840bb012 RDI: ffff88811755cca8
RBP: ffff88811755c880 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000009e78 R11: 0000000000000000 R12: ffff88811755c8e0
R13: ffff88811755c892 R14: ffff88811755c918 R15: 0000000000000000
FS:  00007f03e5243800(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32f21000 CR3: 0000000112ffe001 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
 dccp_close (net/dccp/proto.c:1078)
 inet_release (net/ipv4/af_inet.c:434)
 __sock_release (net/socket.c:660)
 sock_close (net/socket.c:1423)
 __fput (fs/file_table.c:377)
 __fput_sync (fs/file_table.c:462)
 __x64_sys_close (fs/open.c:1557 fs/open.c:1539 fs/open.c:1539)
 do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
RIP: 0033:0x7f03e53852bb
Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 43 c9 f5 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 c9 f5 ff 8b 44
RSP: 002b:00000000005dfba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f03e53852bb
RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000167c
R10: 0000000008a79680 R11: 0000000000000293 R12: 00007f03e4e43000
R13: 00007f03e4e43170 R14: 00007f03e4e43178 R15: 00007f03e4e43170
 </TASK>

[1]:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 0 PID: 350833 Comm: syz-executor.1 Not tainted 6.7.0-12272-g2121c43f88f5 #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
 should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
 should_failslab (mm/slub.c:3748)
 kmem_cache_alloc (mm/slub.c:3763 mm/slub.c:3842 mm/slub.c:3867)
 inet_bind2_bucket_create (net/ipv4/inet_hashtables.c:135)
 __inet_hash_connect (net/ipv4/inet_hashtables.c:1100)
 dccp_v4_connect (net/dccp/ipv4.c:116)
 __inet_stream_connect (net/ipv4/af_inet.c:676)
 inet_stream_connect (net/ipv4/af_inet.c:747)
 __sys_connect_file (net/socket.c:2048 (discriminator 2))
 __sys_connect (net/socket.c:2065)
 __x64_sys_connect (net/socket.c:2072)
 do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
RIP: 0033:0x7f03e5284e5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007f03e4641cc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f03e5284e5d
RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 000000000000000b R14: 00007f03e52e5530 R15: 0000000000000000
 </TASK>

Reported-by: syzkaller <syzkaller@googlegroups.com>
Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge branch 'bridge-mdb-events'
David S. Miller [Fri, 16 Feb 2024 09:36:38 +0000 (09:36 +0000)]
Merge branch 'bridge-mdb-events'

Tobias Waldekranz says:

====================
net: bridge: switchdev: Ensure MDB events are delivered exactly once

When a device is attached to a bridge, drivers will request a replay
of objects that were created before the device joined the bridge, that
are still of interest to the joining port. Typical examples include
FDB entries and MDB memberships on other ports ("foreign interfaces")
or on the bridge itself.

Conversely when a device is detached, the bridge will synthesize
deletion events for all those objects that are still live, but no
longer applicable to the device in question.

This series eliminates two races related to the synching and
unsynching phases of a bridge's MDB with a joining or leaving device,
that would cause notifications of such objects to be either delivered
twice (1/2), or not at all (2/2).

A similar race to the one solved by 1/2 still remains for the
FDB. This is much harder to solve, due to the lockless operation of
the FDB's rhashtable, and is therefore knowingly left out of this
series.

v1 -> v2:
- Squash the previously separate addition of
  switchdev_port_obj_act_is_deferred into first consumer.
- Use ether_addr_equal to compare MAC addresses.
- Document switchdev_port_obj_act_is_deferred (renamed from
  switchdev_port_obj_is_deferred in v1, to indicate that we also match
  on the action).
- Delay allocations of MDB objects until we know they're needed.
- Use non-RCU version of the hash list iterator, now that the MDB is
  not scanned while holding the RCU read lock.
- Add Fixes tag to commit message

v2 -> v3:
- Fix unlocking in error paths
- Access RCU protected port list via mlock_dereference, since MDB is
  guaranteed to remain constant for the duration of the scan.

v3 -> v4:
- Limit the search for exiting deferred events in 1/2 to only apply to
  additions, since the problem does not exist in the deletion case.
- Add 2/2, to plug a related race when unoffloading an indirectly
  associated device.

v4 -> v5:
- Fix grammatical errors in kerneldoc of
  switchdev_port_obj_act_is_deferred
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: bridge: switchdev: Ensure deferred event delivery on unoffload
Tobias Waldekranz [Wed, 14 Feb 2024 21:40:04 +0000 (22:40 +0100)]
net: bridge: switchdev: Ensure deferred event delivery on unoffload

When unoffloading a device, it is important to ensure that all
relevant deferred events are delivered to it before it disassociates
itself from the bridge.

Before this change, this was true for the normal case when a device
maps 1:1 to a net_bridge_port, i.e.

   br0
   /
swp0

When swp0 leaves br0, the call to switchdev_deferred_process() in
del_nbp() makes sure to process any outstanding events while the
device is still associated with the bridge.

In the case when the association is indirect though, i.e. when the
device is attached to the bridge via an intermediate device, like a
LAG...

    br0
    /
  lag0
  /
swp0

...then detaching swp0 from lag0 does not cause any net_bridge_port to
be deleted, so there was no guarantee that all events had been
processed before the device disassociated itself from the bridge.

Fix this by always synchronously processing all deferred events before
signaling completion of unoffloading back to the driver.

Fixes: 4e51bf44a03a ("net: bridge: move the switchdev object replay helpers to "push" mode")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: bridge: switchdev: Skip MDB replays of deferred events on offload
Tobias Waldekranz [Wed, 14 Feb 2024 21:40:03 +0000 (22:40 +0100)]
net: bridge: switchdev: Skip MDB replays of deferred events on offload

Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.

While new memberships are immediately visible to walkers of
br->mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.

The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.

This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br->mdb_list, so no duplicates can be generated in
that scenario.

To a user this meant that old group memberships, from a bridge in
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.

For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:

    root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
    > ip link set dev x3 up master br0

And then destroy the bridge:

    root@infix-06-0b-00:~$ ip link del dev br0
    root@infix-06-0b-00:~$ mvls atu
    ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
    DEV:0 Marvell 88E6393X
    33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
    root@infix-06-0b-00:~$

The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.

Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:

    root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
    > ip link set dev x3 up master br1

All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. But instead the old
memberships are still active in the hardware database, causing the
switch to only forward traffic to those groups towards the CPU (port
0).

Eliminate the race in two steps:

1. Grab the write-side lock of the MDB while generating the replay
   list.

This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:

2. Make sure that no deferred version of a replay event is already
   enqueued to the switchdev deferred queue, before adding it to the
   replay list, when replaying additions.

Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet/iucv: fix the allocation size of iucv_path_table array
Alexander Gordeev [Wed, 14 Feb 2024 16:32:40 +0000 (17:32 +0100)]
net/iucv: fix the allocation size of iucv_path_table array

iucv_path_table is a dynamically allocated array of pointers to
struct iucv_path items. Yet, its size is calculated as if it was
an array of struct iucv_path items.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 15 Feb 2024 19:39:27 +0000 (11:39 -0800)]
Merge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from can, wireless and netfilter.

  Current release - regressions:

   - af_unix: fix task hung while purging oob_skb in GC

   - pds_core: do not try to run health-thread in VF path

  Current release - new code bugs:

   - sched: act_mirred: don't zero blockid when net device is being
     deleted

  Previous releases - regressions:

   - netfilter:
      - nat: restore default DNAT behavior
      - nf_tables: fix bidirectional offload, broken when unidirectional
        offload support was added

   - openvswitch: limit the number of recursions from action sets

   - eth: i40e: do not allow untrusted VF to remove administratively set
     MAC address

  Previous releases - always broken:

   - tls: fix races and bugs in use of async crypto

   - mptcp: prevent data races on some of the main socket fields, fix
     races in fastopen handling

   - dpll: fix possible deadlock during netlink dump operation

   - dsa: lan966x: fix crash when adding interface under a lag when some
     of the ports are disabled

   - can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock

  Misc:

   - a handful of fixes and reliability improvements for selftests

   - fix sysfs documentation missing net/ in paths

   - finish the work of squashing the missing MODULE_DESCRIPTION()
     warnings in networking"

* tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
  net: fill in MODULE_DESCRIPTION()s for missing arcnet
  net: fill in MODULE_DESCRIPTION()s for mdio_devres
  net: fill in MODULE_DESCRIPTION()s for ppp
  net: fill in MODULE_DESCRIPTION()s for fddik/skfp
  net: fill in MODULE_DESCRIPTION()s for plip
  net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb
  net: fill in MODULE_DESCRIPTION()s for xen-netback
  net: ravb: Count packets instead of descriptors in GbEth RX path
  pppoe: Fix memory leak in pppoe_sendmsg()
  net: sctp: fix skb leak in sctp_inq_free()
  net: bcmasp: Handle RX buffer allocation failure
  net-timestamp: make sk_tskey more predictable in error path
  selftests: tls: increase the wait in poll_partial_rec_async
  ice: Add check for lport extraction to LAG init
  netfilter: nf_tables: fix bidirectional offload regression
  netfilter: nat: restore default DNAT behavior
  netfilter: nft_set_pipapo: fix missing : in kdoc
  igc: Remove temporary workaround
  igb: Fix string truncation warnings in igb_set_fw_version
  can: netlink: Fix TDCO calculation using the old data bittiming
  ...

20 months agoMerge tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 15 Feb 2024 19:33:35 +0000 (11:33 -0800)]
Merge tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Fixes and simple cleanups:

   - use a proper flexible array instead of a one-element array in order
     to avoid array-bounds sanitizer errors

   - add NULL pointer checks after allocating memory

   - use memdup_array_user() instead of open-coding it

   - fix a rare race condition in Xen event channel allocation code

   - make struct bus_type instances const

   - make kerneldoc inline comments match reality"

* tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: close evtchn after mapping cleanup
  xen/gntalloc: Replace UAPI 1-element array
  xen: balloon: make balloon_subsys const
  xen: pcpu: make xen_pcpu_subsys const
  xen/privcmd: Use memdup_array_user() in alloc_ioreq()
  x86/xen: Add some null pointer checking to smp.c
  xen/xenbus: document will_handle argument for xenbus_watch_path()

20 months agoupdate workarounds for gcc "asm goto" issue
Linus Torvalds [Thu, 15 Feb 2024 19:14:33 +0000 (11:14 -0800)]
update workarounds for gcc "asm goto" issue

In commit 4356e9f841f7 ("work around gcc bugs with 'asm goto' with
outputs") I did the gcc workaround unconditionally, because the cause of
the bad code generation wasn't entirely clear.

In the meantime, Jakub Jelinek debugged the issue, and has come up with
a fix in gcc [2], which also got backported to the still maintained
branches of gcc-11, gcc-12 and gcc-13.

Note that while the fix technically wasn't in the original gcc-14
branch, Jakub says:

 "while it is true that no GCC 14 snapshots until today (or whenever the
  fix will be committed) have the fix, for GCC trunk it is up to the
  distros to use the latest snapshot if they use it at all and would
  allow better testing of the kernel code without the workaround, so
  that if there are other issues they won't be discovered years later.
  Most userland code doesn't actually use asm goto with outputs..."

so we will consider gcc-14 to be fixed - if somebody is using gcc
snapshots of the gcc-14 before the fix, they should upgrade.

Note that while the bug goes back to gcc-11, in practice other gcc
changes seem to have effectively hidden it since gcc-12.1 as per a
bisect by Jakub.  So even a gcc-14 snapshot without the fix likely
doesn't show actual problems.

Also, make the default 'asm_goto_output()' macro mark the asm as
volatile by hand, because of an unrelated gcc issue [1] where it doesn't
match the documented behavior ("asm goto is always volatile").

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Requested-by: Jakub Jelinek <jakub@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
20 months agoMerge tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 15 Feb 2024 18:19:55 +0000 (10:19 -0800)]
Merge tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Improve devlink dependency parsing for DT graphs

 - Fix devlink handling of io-channels dependencies

 - Fix PCI addressing in marvell,prestera example

 - A few schema fixes for property constraints

 - Improve performance of DT unprobed devices kselftest

 - Fix regression in DT_SCHEMA_FILES handling

 - Fix compile error in unittest for !OF_DYNAMIC

* tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: ufs: samsung,exynos-ufs: Add size constraints on "samsung,sysreg"
  of: property: Add in-ports/out-ports support to of_graph_get_port_parent()
  of: property: Improve finding the supplier of a remote-endpoint property
  of: property: Improve finding the consumer of a remote-endpoint property
  net: marvell,prestera: Fix example PCI bus addressing
  of: unittest: Fix compile in the non-dynamic case
  of: property: fix typo in io-channels
  dt-bindings: tpm: Drop type from "resets"
  dt-bindings: display: nxp,tda998x: Fix 'audio-ports' constraints
  dt-bindings: xilinx: replace Piyush Mehta maintainership
  kselftest: dt: Stop relying on dirname to improve performance
  dt-bindings: don't anchor DT_SCHEMA_FILES to bindings directory

20 months agoMerge tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Thu, 15 Feb 2024 17:13:12 +0000 (09:13 -0800)]
Merge tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A smallish collection of fixes for SPI, all driver specific, plus one
  device ID addition for a new Intel part.

  The ppc4xx isn't routinely covered by most of the automated testing so
  there were some errors that were missed in some of the recent API
  conversions, otherwise there's nothing super remarkable here"

* tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi-mxs: Fix chipselect glitch
  spi: intel-pci: Add support for Lunar Lake-M SPI serial flash
  spi: omap2-mcspi: Revert FIFO support without DMA
  spi: ppc4xx: Drop write-only variable
  spi: ppc4xx: Fix fallout from rename in struct spi_bitbang
  spi: ppc4xx: Fix fallout from include cleanup
  spi: spi-ppc4xx: include missing platform_device.h
  spi: imx: fix the burst length at DMA mode and CPU mode

20 months agoMerge tag 'regmap-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 15 Feb 2024 17:11:06 +0000 (09:11 -0800)]
Merge tag 'regmap-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap test fixes from Mark Brown:
 "Guenter runs a lot of KUnit tests so noticed that there were a couple
  of the regmap tests, including the newly added noinc test, which could
  show spurious failures due to the use of randomly generated test
  values. These changes handle the randomly generated data properly"

* tag 'regmap-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: kunit: Ensure that changed bytes are actually different
  regmap: kunit: fix raw noinc write test wrapping

20 months agoMerge tag 'hid-for-linus-2024021501' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 15 Feb 2024 17:08:19 +0000 (09:08 -0800)]
Merge tag 'hid-for-linus-2024021501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - fix for 'MSC_SERIAL = 0' corner case handling in wacom driver (Jason
   Gerecke)

 - ACPI S3 suspend/resume fix for intel-ish-hid (Even Xu)

 - race condition fix preventing Wacom driver from losing events shortly
   after initialization (Jason Gerecke)

 - fix preventing certain Logitech HID++ devices from spamming kernel
   log (Oleksandr Natalenko)

* tag 'hid-for-linus-2024021501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: wacom: generic: Avoid reporting a serial of '0' to userspace
  HID: Intel-ish-hid: Ishtp: Fix sensor reads after ACPI S3 suspend
  HID: multitouch: Add required quirk for Synaptics 0xcddc device
  HID: wacom: Do not register input devices until after hid_hw_start
  HID: logitech-hidpp: Do not flood kernel log

20 months agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Thu, 15 Feb 2024 16:06:50 +0000 (08:06 -0800)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-02-06 (igb, igc)

This series contains updates to igb and igc drivers.

Kunwu Chan adjusts firmware version string implementation to resolve
possible NULL pointer issue for igb.

Sasha removes workaround on igc.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: Remove temporary workaround
  igb: Fix string truncation warnings in igb_set_fw_version
====================

Link: https://lore.kernel.org/r/20240214180347.3219650-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge branch 'fix-module_description-for-net-p6'
Jakub Kicinski [Thu, 15 Feb 2024 16:03:49 +0000 (08:03 -0800)]
Merge branch 'fix-module_description-for-net-p6'

Breno Leitao says:

====================
Fix MODULE_DESCRIPTION() for net (p6)

There are a few network modules left that misses MODULE_DESCRIPTION(),
causing a warnning when compiling with W=1. Example:

        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/....

This last patchset solves the problem for all the missing driver. It is
not expect to see any warning for the driver/net and net/ directory once
all these patches have landed.

v1: https://lore.kernel.org/all/20240213112122.404045-1-leitao@debian.org/
====================

Link: https://lore.kernel.org/r/20240214152741.670178-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: fill in MODULE_DESCRIPTION()s for missing arcnet
Breno Leitao [Wed, 14 Feb 2024 15:27:41 +0000 (07:27 -0800)]
net: fill in MODULE_DESCRIPTION()s for missing arcnet

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the ARC modules.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240214152741.670178-8-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: fill in MODULE_DESCRIPTION()s for mdio_devres
Breno Leitao [Wed, 14 Feb 2024 15:27:40 +0000 (07:27 -0800)]
net: fill in MODULE_DESCRIPTION()s for mdio_devres

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the PHY MDIO helpers.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240214152741.670178-7-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: fill in MODULE_DESCRIPTION()s for ppp
Breno Leitao [Wed, 14 Feb 2024 15:27:39 +0000 (07:27 -0800)]
net: fill in MODULE_DESCRIPTION()s for ppp

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the PPP modules.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240214152741.670178-6-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: fill in MODULE_DESCRIPTION()s for fddik/skfp
Breno Leitao [Wed, 14 Feb 2024 15:27:38 +0000 (07:27 -0800)]
net: fill in MODULE_DESCRIPTION()s for fddik/skfp

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the SysKonnect FDDI PCI module.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240214152741.670178-5-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: fill in MODULE_DESCRIPTION()s for plip
Breno Leitao [Wed, 14 Feb 2024 15:27:37 +0000 (07:27 -0800)]
net: fill in MODULE_DESCRIPTION()s for plip

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the PLIP (parallel port) network module

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240214152741.670178-4-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb
Breno Leitao [Wed, 14 Feb 2024 15:27:36 +0000 (07:27 -0800)]
net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the IEEE 802.15.4 loopback driver.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240214152741.670178-3-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: fill in MODULE_DESCRIPTION()s for xen-netback
Breno Leitao [Wed, 14 Feb 2024 15:27:35 +0000 (07:27 -0800)]
net: fill in MODULE_DESCRIPTION()s for xen-netback

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Xen backend network module.

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240214152741.670178-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: ravb: Count packets instead of descriptors in GbEth RX path
Paul Barker [Wed, 14 Feb 2024 15:12:04 +0000 (15:12 +0000)]
net: ravb: Count packets instead of descriptors in GbEth RX path

The units of "work done" in the RX path should be packets instead of
descriptors, as large packets can be spread over multiple descriptors.

Fixes: 1c59eb678cbd ("ravb: Fillup ravb_rx_gbeth() stub")
Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20240214151204.2976-1-paul.barker.ct@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agopppoe: Fix memory leak in pppoe_sendmsg()
Gavrilov Ilia [Wed, 14 Feb 2024 09:01:50 +0000 (09:01 +0000)]
pppoe: Fix memory leak in pppoe_sendmsg()

syzbot reports a memory leak in pppoe_sendmsg [1].

The problem is in the pppoe_recvmsg() function that handles errors
in the wrong order. For the skb_recv_datagram() function, check
the pointer to skb for NULL first, and then check the 'error' variable,
because the skb_recv_datagram() function can set 'error'
to -EAGAIN in a loop but return a correct pointer to socket buffer
after a number of attempts, though 'error' remains set to -EAGAIN.

skb_recv_datagram
      __skb_recv_datagram          // Loop. if (err == -EAGAIN) then
                                   // go to the next loop iteration
          __skb_try_recv_datagram  // if (skb != NULL) then return 'skb'
                                   // else if a signal is received then
                                   // return -EAGAIN

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.

Link: https://syzkaller.appspot.com/bug?extid=6bdfd184eac7709e5cc9
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+6bdfd184eac7709e5cc9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6bdfd184eac7709e5cc9
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/20240214085814.3894917-1-Ilia.Gavrilov@infotecs.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: sctp: fix skb leak in sctp_inq_free()
Dmitry Antipov [Wed, 14 Feb 2024 08:22:24 +0000 (11:22 +0300)]
net: sctp: fix skb leak in sctp_inq_free()

In case of GSO, 'chunk->skb' pointer may point to an entry from
fraglist created in 'sctp_packet_gso_append()'. To avoid freeing
random fraglist entry (and so undefined behavior and/or memory
leak), introduce 'sctp_inq_chunk_free()' helper to ensure that
'chunk->skb' is set to 'chunk->head_skb' (i.e. fraglist head)
before calling 'sctp_chunk_free()', and use the aforementioned
helper in 'sctp_inq_pop()' as well.

Reported-by: syzbot+8bb053b5d63595ab47db@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=0d8351bbe54fd04a492c2daab0164138db008042
Fixes: 90017accff61 ("sctp: Add GSO support")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20240214082224.10168-1-dmantipov@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: bcmasp: Handle RX buffer allocation failure
Florian Fainelli [Tue, 13 Feb 2024 17:33:39 +0000 (09:33 -0800)]
net: bcmasp: Handle RX buffer allocation failure

The buffer_pg variable needs to hold an order-5 allocation (32 x
PAGE_SIZE) which, under memory pressure may fail to be allocated. Deal
with that error condition properly to avoid doing a NULL pointer
de-reference in the subsequent call to dma_map_page().

In addition, the err_reclaim_tx error label in bcmasp_netif_init() needs
to ensure that the TX NAPI object is properly deleted, otherwise
unregister_netdev() will spin forever attempting to test and clear
the NAPI_STATE_HASHED bit.

Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Justin Chen <justin.chen@broadcom.com>
Link: https://lore.kernel.org/r/20240213173339.3438713-1-florian.fainelli@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoMerge tag 'nf-24-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 15 Feb 2024 11:48:56 +0000 (12:48 +0100)]
Merge tag 'nf-24-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following batch contains Netfilter fixes for net:

1) Missing : in kdoc field in nft_set_pipapo.

2) Restore default DNAT behavior When a DNAT rule is configured via
   iptables with different port ranges, from Kyle Swenson.

3) Restore flowtable hardware offload for bidirectional flows
   by setting NF_FLOW_HW_BIDIRECTIONAL flag, from Felix Fietkau.

netfilter pull request 24-02-15

* tag 'nf-24-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix bidirectional offload regression
  netfilter: nat: restore default DNAT behavior
  netfilter: nft_set_pipapo: fix missing : in kdoc
====================

Link: https://lore.kernel.org/r/20240214233818.7946-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoMerge tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux...
Paolo Abeni [Thu, 15 Feb 2024 11:31:22 +0000 (12:31 +0100)]
Merge tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2024-02-14

this is a pull request of 3 patches for net/master.

the first patch is by Ziqi Zhao and targets the CAN J1939 protocol, it
fixes a potential deadlock by replacing the spinlock by an rwlock.

Oleksij Rempel's patch adds a missing spin_lock_bh() to prevent a
potential Use-After-Free in the CAN J1939's
setsockopt(SO_J1939_FILTER).

Maxime Jayat contributes a patch to fix the transceiver delay
compensation (TDCO) calculation, which is needed for higher CAN-FD bit
rates (usually 2Mbit/s).

* tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: netlink: Fix TDCO calculation using the old data bittiming
  can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
  can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
====================

Link: https://lore.kernel.org/r/20240214140348.2412776-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet-timestamp: make sk_tskey more predictable in error path
Vadim Fedorenko [Tue, 13 Feb 2024 11:04:28 +0000 (03:04 -0800)]
net-timestamp: make sk_tskey more predictable in error path

When SOF_TIMESTAMPING_OPT_ID is used to ambiguate timestamped datagrams,
the sk_tskey can become unpredictable in case of any error happened
during sendmsg(). Move increment later in the code and make decrement of
sk_tskey in error path. This solution is still racy in case of multiple
threads doing snedmsg() over the very same socket in parallel, but still
makes error path much more predictable.

Fixes: 09c2d251b707 ("net-timestamp: add key to disambiguate concurrent datagrams")
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240213110428.1681540-1-vadfed@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoselftests: tls: increase the wait in poll_partial_rec_async
Jakub Kicinski [Tue, 13 Feb 2024 14:20:55 +0000 (06:20 -0800)]
selftests: tls: increase the wait in poll_partial_rec_async

Test runners on debug kernels occasionally fail with:

 # #  RUN           tls_err.13_aes_gcm.poll_partial_rec_async ...
 # # tls.c:1883:poll_partial_rec_async:Expected poll(&pfd, 1, 5) (0) == 1 (1)
 # # tls.c:1870:poll_partial_rec_async:Expected status (256) == 0 (0)
 # # poll_partial_rec_async: Test failed at step #17
 # #          FAIL  tls_err.13_aes_gcm.poll_partial_rec_async
 # not ok 699 tls_err.13_aes_gcm.poll_partial_rec_async
 # # FAILED: 698 / 699 tests passed.

This points to the second poll() in the test which is expected
to wait for the sender to send the rest of the data.
Apparently under some conditions that doesn't happen within 5ms,
bump the timeout to 20ms.

Fixes: 23fcb62bc19c ("selftests: tls: add tests for poll behavior")
Link: https://lore.kernel.org/r/20240213142055.395564-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoice: Add check for lport extraction to LAG init
Dave Ertman [Tue, 13 Feb 2024 18:39:55 +0000 (10:39 -0800)]
ice: Add check for lport extraction to LAG init

To fully support initializing the LAG support code, a DDP package that
extracts the logical port from the metadata is required.  If such a
package is not present, there could be difficulties in supporting some
bond types.

Add a check into the initialization flow that will bypass the new paths
if any of the support pieces are missing.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240213183957.1483857-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 15 Feb 2024 01:32:37 +0000 (17:32 -0800)]
Merge tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Valentine's day edition, with just few fixes because
that's how we love it ;-)

iwlwifi:
 - correct A3 in A-MSDUs
 - fix crash when operating as AP and running out of station
   slots to use
 - clear link ID to correct some later checks against it
 - fix error codes in SAR table loading
 - fix error path in PPAG table read

mac80211:
 - reload a pointer after SKB may have changed
   (only in certain monitor inject mode scenarios)

* tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: mvm: fix a crash when we run out of stations
  wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()
  wifi: iwlwifi: Fix some error codes
  wifi: iwlwifi: clear link_id in time_event
  wifi: iwlwifi: mvm: use correct address 3 in A-MSDU
  wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
====================

Link: https://lore.kernel.org/r/20240214184326.132813-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Thu, 15 Feb 2024 00:06:31 +0000 (16:06 -0800)]
Merge tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - Fix for broken ipv6 checksums

 - Fix handling of exceptions in delay slots

* tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  mm/memory: Use exception ip to search exception tables
  MIPS: Clear Cause.BD in instruction_pointer_set
  ptrace: Introduce exception_ip arch hook
  MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler

20 months agoMerge tag 'landlock-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic...
Linus Torvalds [Thu, 15 Feb 2024 00:02:36 +0000 (16:02 -0800)]
Merge tag 'landlock-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock test fixes from Mickaël Salaün:
 "Fix build issues for tests, and improve test compatibility"

* tag 'landlock-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/landlock: Fix capability for net_test
  selftests/landlock: Fix fs_test build with old libc
  selftests/landlock: Fix net_test build with old libc

20 months agoMerge tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Wed, 14 Feb 2024 23:47:02 +0000 (15:47 -0800)]
Merge tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few regular fixes and one fix for space reservation regression since
  6.7 that users have been reporting:

   - fix over-reservation of metadata chunks due to not keeping proper
     balance between global block reserve and delayed refs reserve; in
     practice this leaves behind empty metadata block groups, the
     workaround is to reclaim them by using the '-musage=1' balance
     filter

   - other space reservation fixes:
      - do not delete unused block group if it may be used soon
      - do not reserve space for checksums for NOCOW files

   - fix extent map assertion failure when writing out free space inode

   - reject encoded write if inode has nodatasum flag set

   - fix chunk map leak when loading block group zone info"

* tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't refill whole delayed refs block reserve when starting transaction
  btrfs: zoned: fix chunk map leak when loading block group zone info
  btrfs: reject encoded write if inode has nodatasum flag set
  btrfs: don't reserve space for checksums when writing to nocow files
  btrfs: add new unused block groups to the list of unused block groups
  btrfs: do not delete unused block group if it may be used soon
  btrfs: add and use helper to check if block group is used
  btrfs: don't drop extent_map for free space inode on write error

20 months agoMerge tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Wed, 14 Feb 2024 23:34:03 +0000 (15:34 -0800)]
Merge tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull KUnit fix from Shuah Khan:
 "One important fix to unregister kunit_bus when KUnit module is
  unloaded.

  Not doing so causes an error when KUnit module tries to re-register
  the bus when it gets reloaded"

* tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: device: Unregister the kunit_bus on shutdown

20 months agonetfilter: nf_tables: fix bidirectional offload regression
Felix Fietkau [Wed, 14 Feb 2024 14:42:35 +0000 (15:42 +0100)]
netfilter: nf_tables: fix bidirectional offload regression

Commit 8f84780b84d6 ("netfilter: flowtable: allow unidirectional rules")
made unidirectional flow offload possible, while completely ignoring (and
breaking) bidirectional flow offload for nftables.
Add the missing flag that was left out as an exercise for the reader :)

Cc: Vlad Buslov <vladbu@nvidia.com>
Fixes: 8f84780b84d6 ("netfilter: flowtable: allow unidirectional rules")
Reported-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months agonetfilter: nat: restore default DNAT behavior
Kyle Swenson [Thu, 8 Feb 2024 23:56:31 +0000 (23:56 +0000)]
netfilter: nat: restore default DNAT behavior

When a DNAT rule is configured via iptables with different port ranges,

iptables -t nat -A PREROUTING -p tcp -d 10.0.0.2 -m tcp --dport 32000:32010
-j DNAT --to-destination 192.168.0.10:21000-21010

we seem to be DNATing to some random port on the LAN side. While this is
expected if --random is passed to the iptables command, it is not
expected without passing --random.  The expected behavior (and the
observed behavior prior to the commit in the "Fixes" tag) is the traffic
will be DNAT'd to 192.168.0.10:21000 unless there is a tuple collision
with that destination.  In that case, we expect the traffic to be
instead DNAT'd to 192.168.0.10:21001, so on so forth until the end of
the range.

This patch intends to restore the behavior observed prior to the "Fixes"
tag.

Fixes: 6ed5943f8735 ("netfilter: nat: remove l4 protocol port rovers")
Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months agonetfilter: nft_set_pipapo: fix missing : in kdoc
Pablo Neira Ayuso [Thu, 8 Feb 2024 14:46:03 +0000 (15:46 +0100)]
netfilter: nft_set_pipapo: fix missing : in kdoc

Add missing : in kdoc field names.

Fixes: 8683f4b9950d ("nft_set_pipapo: Prepare for vectorised implementation: helpers")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months agoigc: Remove temporary workaround
Sasha Neftin [Wed, 24 Jan 2024 05:57:00 +0000 (07:57 +0200)]
igc: Remove temporary workaround

PHY_CONTROL register works as defined in the IEEE 802.3 specification
(IEEE 802.3-2008 22.2.4.1). Tidy up the temporary workaround.

User impact: PHY can now be powered down when the ethernet link is down.

Testing hints: ip link set down <device> (or just disconnect the
ethernet cable).

Oldest tested NVM version is: 1045:740.

Fixes: 5586838fe9ce ("igc: Add code for PHY support")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
20 months agoigb: Fix string truncation warnings in igb_set_fw_version
Kunwu Chan [Mon, 15 Jan 2024 08:28:25 +0000 (16:28 +0800)]
igb: Fix string truncation warnings in igb_set_fw_version

Commit 1978d3ead82c ("intel: fix string truncation warnings")
fixes '-Wformat-truncation=' warnings in igb_main.c by using kasprintf.

drivers/net/ethernet/intel/igb/igb_main.c:3092:53: warning:‘%d’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 1 and 13 [-Wformat-truncation=]
 3092 |                                  "%d.%d, 0x%08x, %d.%d.%d",
      |                                                     ^~
drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note:directive argument in the range [0, 65535]
 3092 |                                  "%d.%d, 0x%08x, %d.%d.%d",
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note:directive argument in the range [0, 65535]
drivers/net/ethernet/intel/igb/igb_main.c:3090:25: note:‘snprintf’ output between 23 and 43 bytes into a destination of size 32

kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.

Fix this warning by using a larger space for adapter->fw_version,
and then fall back and continue to use snprintf.

Fixes: 1978d3ead82c ("intel: fix string truncation warnings")
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Cc: Kunwu Chan <kunwu.chan@hotmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
20 months agocan: netlink: Fix TDCO calculation using the old data bittiming
Maxime Jayat [Mon, 6 Nov 2023 18:01:58 +0000 (19:01 +0100)]
can: netlink: Fix TDCO calculation using the old data bittiming

The TDCO calculation was done using the currently applied data bittiming,
instead of the newly computed data bittiming, which means that the TDCO
had an invalid value unless setting the same data bittiming twice.

Fixes: d99755f71a80 ("can: netlink: add interface for CAN-FD Transmitter Delay Compensation (TDC)")
Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/40579c18-63c0-43a4-8d4c-f3a6c1c0b417@munic.io
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
20 months agocan: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
Oleksij Rempel [Fri, 20 Oct 2023 13:38:14 +0000 (15:38 +0200)]
can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)

Lock jsk->sk to prevent UAF when setsockopt(..., SO_J1939_FILTER, ...)
modifies jsk->filters while receiving packets.

Following trace was seen on affected system:
 ==================================================================
 BUG: KASAN: slab-use-after-free in j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
 Read of size 4 at addr ffff888012144014 by task j1939/350

 CPU: 0 PID: 350 Comm: j1939 Tainted: G        W  OE      6.5.0-rc5 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 Call Trace:
  print_report+0xd3/0x620
  ? kasan_complete_mode_report_info+0x7d/0x200
  ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
  kasan_report+0xc2/0x100
  ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
  __asan_load4+0x84/0xb0
  j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
  j1939_sk_recv+0x20b/0x320 [can_j1939]
  ? __kasan_check_write+0x18/0x20
  ? __pfx_j1939_sk_recv+0x10/0x10 [can_j1939]
  ? j1939_simple_recv+0x69/0x280 [can_j1939]
  ? j1939_ac_recv+0x5e/0x310 [can_j1939]
  j1939_can_recv+0x43f/0x580 [can_j1939]
  ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
  ? raw_rcv+0x42/0x3c0 [can_raw]
  ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
  can_rcv_filter+0x11f/0x350 [can]
  can_receive+0x12f/0x190 [can]
  ? __pfx_can_rcv+0x10/0x10 [can]
  can_rcv+0xdd/0x130 [can]
  ? __pfx_can_rcv+0x10/0x10 [can]
  __netif_receive_skb_one_core+0x13d/0x150
  ? __pfx___netif_receive_skb_one_core+0x10/0x10
  ? __kasan_check_write+0x18/0x20
  ? _raw_spin_lock_irq+0x8c/0xe0
  __netif_receive_skb+0x23/0xb0
  process_backlog+0x107/0x260
  __napi_poll+0x69/0x310
  net_rx_action+0x2a1/0x580
  ? __pfx_net_rx_action+0x10/0x10
  ? __pfx__raw_spin_lock+0x10/0x10
  ? handle_irq_event+0x7d/0xa0
  __do_softirq+0xf3/0x3f8
  do_softirq+0x53/0x80
  </IRQ>
  <TASK>
  __local_bh_enable_ip+0x6e/0x70
  netif_rx+0x16b/0x180
  can_send+0x32b/0x520 [can]
  ? __pfx_can_send+0x10/0x10 [can]
  ? __check_object_size+0x299/0x410
  raw_sendmsg+0x572/0x6d0 [can_raw]
  ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
  ? apparmor_socket_sendmsg+0x2f/0x40
  ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
  sock_sendmsg+0xef/0x100
  sock_write_iter+0x162/0x220
  ? __pfx_sock_write_iter+0x10/0x10
  ? __rtnl_unlock+0x47/0x80
  ? security_file_permission+0x54/0x320
  vfs_write+0x6ba/0x750
  ? __pfx_vfs_write+0x10/0x10
  ? __fget_light+0x1ca/0x1f0
  ? __rcu_read_unlock+0x5b/0x280
  ksys_write+0x143/0x170
  ? __pfx_ksys_write+0x10/0x10
  ? __kasan_check_read+0x15/0x20
  ? fpregs_assert_state_consistent+0x62/0x70
  __x64_sys_write+0x47/0x60
  do_syscall_64+0x60/0x90
  ? do_syscall_64+0x6d/0x90
  ? irqentry_exit+0x3f/0x50
  ? exc_page_fault+0x79/0xf0
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

 Allocated by task 348:
  kasan_save_stack+0x2a/0x50
  kasan_set_track+0x29/0x40
  kasan_save_alloc_info+0x1f/0x30
  __kasan_kmalloc+0xb5/0xc0
  __kmalloc_node_track_caller+0x67/0x160
  j1939_sk_setsockopt+0x284/0x450 [can_j1939]
  __sys_setsockopt+0x15c/0x2f0
  __x64_sys_setsockopt+0x6b/0x80
  do_syscall_64+0x60/0x90
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

 Freed by task 349:
  kasan_save_stack+0x2a/0x50
  kasan_set_track+0x29/0x40
  kasan_save_free_info+0x2f/0x50
  __kasan_slab_free+0x12e/0x1c0
  __kmem_cache_free+0x1b9/0x380
  kfree+0x7a/0x120
  j1939_sk_setsockopt+0x3b2/0x450 [can_j1939]
  __sys_setsockopt+0x15c/0x2f0
  __x64_sys_setsockopt+0x6b/0x80
  do_syscall_64+0x60/0x90
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: 9d71dd0c70099 ("can: add support of SAE J1939 protocol")
Reported-by: Sili Luo <rootlab@huawei.com>
Suggested-by: Sili Luo <rootlab@huawei.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20231020133814.383996-1-o.rempel@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
20 months agocan: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
Ziqi Zhao [Fri, 21 Jul 2023 16:22:26 +0000 (09:22 -0700)]
can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock

The following 3 locks would race against each other, causing the
deadlock situation in the Syzbot bug report:

- j1939_socks_lock
- active_session_list_lock
- sk_session_queue_lock

A reasonable fix is to change j1939_socks_lock to an rwlock, since in
the rare situations where a write lock is required for the linked list
that j1939_socks_lock is protecting, the code does not attempt to
acquire any more locks. This would break the circular lock dependency,
where, for example, the current thread already locks j1939_socks_lock
and attempts to acquire sk_session_queue_lock, and at the same time,
another thread attempts to acquire j1939_socks_lock while holding
sk_session_queue_lock.

NOTE: This patch along does not fix the unregister_netdevice bug
reported by Syzbot; instead, it solves a deadlock situation to prepare
for one or more further patches to actually fix the Syzbot bug, which
appears to be a reference counting problem within the j1939 codebase.

Reported-by: <syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com>
Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20230721162226.8639-1-astrajoan@yahoo.com
[mkl: remove unrelated newline change]
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
20 months agoethernet: cpts: fix function pointer cast warnings
Arnd Bergmann [Tue, 13 Feb 2024 10:16:34 +0000 (11:16 +0100)]
ethernet: cpts: fix function pointer cast warnings

clang-16 warns about the mismatched prototypes for the devm_* callbacks:

drivers/net/ethernet/ti/cpts.c:691:12: error: cast from 'void (*)(struct clk_hw *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
  691 |                                        (void(*)(void *))clk_hw_unregister_mux,
      |                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/device.h:406:34: note: expanded from macro 'devm_add_action_or_reset'
  406 |         __devm_add_action_or_reset(dev, action, data, #action)
      |                                         ^~~~~~
drivers/net/ethernet/ti/cpts.c:703:12: error: cast from 'void (*)(struct device_node *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
  703 |                                        (void(*)(void *))of_clk_del_provider,
      |                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/device.h:406:34: note: expanded from macro 'devm_add_action_or_reset'
  406 |         __devm_add_action_or_reset(dev, action, data, #action)

Use separate helper functions for this instead, using the expected prototypes
with a void* argument.

Fixes: a3047a81ba13 ("net: ethernet: ti: cpts: add support for ext rftclk selection")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agobnad: fix work_queue type mismatch
Arnd Bergmann [Tue, 13 Feb 2024 10:07:50 +0000 (11:07 +0100)]
bnad: fix work_queue type mismatch

clang-16 warns about a function pointer cast:

drivers/net/ethernet/brocade/bna/bnad.c:1995:4: error: cast from 'void (*)(struct delayed_work *)' to 'work_func_t' (aka 'void (*)(struct work_struct *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
 1995 |                         (work_func_t)bnad_tx_cleanup);
drivers/net/ethernet/brocade/bna/bnad.c:2252:4: error: cast from 'void (*)(void *)' to 'work_func_t' (aka 'void (*)(struct work_struct *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
 2252 |                         (work_func_t)(bnad_rx_cleanup));

The problem here is mixing up work_struct and delayed_work, which relies
the former being the first member of the latter.

Change the code to use consistent types here to address the warning and
make it more robust against workqueue interface changes.

Side note: the use of a delayed workqueue for cleaning up TX descriptors
is probably a bad idea since this introduces a noticeable delay. The
driver currently does not appear to use BQL, but if one wanted to add
that, this would have to be changed as well.

Fixes: 01b54b145185 ("bna: tx rx cleanup fix")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: smc: fix spurious error message from __sock_release()
Dmitry Antipov [Mon, 12 Feb 2024 14:34:02 +0000 (17:34 +0300)]
net: smc: fix spurious error message from __sock_release()

Commit 67f562e3e147 ("net/smc: transfer fasync_list in case of fallback")
leaves the socket's fasync list pointer within a container socket as well.
When the latter is destroyed, '__sock_release()' warns about its non-empty
fasync list, which is a dangling pointer to previously freed fasync list
of an underlying TCP socket. Fix this spurious warning by nullifying
fasync list of a container socket.

Fixes: 67f562e3e147 ("net/smc: transfer fasync_list in case of fallback")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
David S. Miller [Wed, 14 Feb 2024 10:37:33 +0000 (10:37 +0000)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-02-12 (i40e)

This series contains updates to i40e driver only.

Ivan Vecera corrects the looping value used while waiting for queues to
be disabled as well as an incorrect mask being used for DCB
configuration.

Maciej resolves an issue related to XDP traffic; removing a double call to
i40e_pf_rxq_wait() and accounting for XDP rings when stopping rings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoocteontx2-af: Remove the PF_FUNC validation for NPC transmit rules
Subbaraya Sundeep [Sun, 11 Feb 2024 19:00:38 +0000 (00:30 +0530)]
octeontx2-af: Remove the PF_FUNC validation for NPC transmit rules

NPC transmit side mcam rules can use the pcifunc (in packet metadata
added by hardware) of transmitting device for mcam lookup similar to
the channel of receiving device at receive side.
The commit 18603683d766 ("octeontx2-af: Remove channel verification
while installing MCAM rules") removed the receive side channel
verification to save hardware MCAM filters while switching packets
across interfaces but missed removing transmit side checks.
This patch removes transmit side rules validation.

Fixes: 18603683d766 ("octeontx2-af: Remove channel verification while installing MCAM rules")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge branch 'selftests-net-more-pmtu-sh-fixes'
Jakub Kicinski [Tue, 13 Feb 2024 18:19:07 +0000 (10:19 -0800)]
Merge branch 'selftests-net-more-pmtu-sh-fixes'

Paolo Abeni says:

====================
selftests: net: more pmtu.sh fixes

The mentioned test is still flaky, unusally enough in 'fast'
environments.

Patch 2/2 [try to] address the existing issues, while patch 1/2
introduces more strict tests for the existing net helpers, to hopefully
prevent future pain.
====================

Link: https://lore.kernel.org/r/cover.1707731086.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests: net: more pmtu.sh fixes
Paolo Abeni [Mon, 12 Feb 2024 10:19:24 +0000 (11:19 +0100)]
selftests: net: more pmtu.sh fixes

The netdev CI is reporting failures for the pmtu test:

  [  115.929264] br0: port 2(vxlan_a) entered forwarding state
  # 2024/02/08 17:33:22 socat[7871] E bind(7, {AF=10 [0000:0000:0000:0000:0000:0000:0000:0000]:50000}, 28): Address already in use
  # 2024/02/08 17:33:22 socat[7877] E write(7, 0x5598fb6ff000, 8192): Connection refused
  # TEST: IPv6, bridged vxlan4: PMTU exceptions                         [FAIL]
  # File size 0 mismatches exepcted value in locally bridged vxlan test

The root cause is apparently a socket created by a previous iteration
of the relevant loop still lasting in LAST_ACK state.

Note that even the file size check is racy, the receiver process dumping
the file could still be running in background

Allow the listener to bound on the same local port via SO_REUSEADDR and
collect file output file size only after the listener completion.

Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4f51c11a1ce7ca7a4dabd926cffff63dadac9ba1.1707731086.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests: net: more strict check in net_helper
Paolo Abeni [Mon, 12 Feb 2024 10:19:23 +0000 (11:19 +0100)]
selftests: net: more strict check in net_helper

The helper waiting for a listener port can match any socket whose
hexadecimal representation of source or destination addresses
matches that of the given port.

Additionally, any socket state is accepted.

All the above can let the helper return successfully before the
relevant listener is actually ready, with unexpected results.

So far I could not find any related failure in the netdev CI, but
the next patch is going to make the critical event more easily
reproducible.

Address the issue matching the port hex only vs the relevant socket
field and additionally checking the socket state for TCP sockets.

Fixes: 3bdd9fd29cb0 ("selftests/net: synchronize udpgro tests' tx and rx connection")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/192b3dbc443d953be32991d1b0ca432bd4c65008.1707731086.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests: net: cope with slow env in so_txtime.sh test
Paolo Abeni [Mon, 12 Feb 2024 09:43:31 +0000 (10:43 +0100)]
selftests: net: cope with slow env in so_txtime.sh test

The mentioned test is failing in slow environments:

  # SO_TXTIME ipv4 clock monotonic
  # ./so_txtime: recv: timeout: Resource temporarily unavailable
  not ok 1 selftests: net: so_txtime.sh # exit=1

Tuning the tolerance in the test binary is error-prone and doomed
to failures is slow-enough environment.

Just resort to suppress any error in such cases. Note to suppress
them we need first to refactor a bit the code moving it to explicit
error handling.

Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/2142d9ed4b5c5aa07dd1b455779625d91b175373.1707730902.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests: net: cope with slow env in gro.sh test
Paolo Abeni [Mon, 12 Feb 2024 09:39:41 +0000 (10:39 +0100)]
selftests: net: cope with slow env in gro.sh test

The gro self-tests sends the packets to be aggregated with
multiple write operations.

When running is slow environment, it's hard to guarantee that
the GRO engine will wait for the last packet in an intended
train.

The above causes almost deterministic failures in our CI for
the 'large' test-case.

Address the issue explicitly ignoring failures for such case
in slow environments (KSFT_MACHINE_SLOW==true).

Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/97d3ba83f5a2bfeb36f6bc0fb76724eb3dafb608.1707729403.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agobtrfs: don't refill whole delayed refs block reserve when starting transaction
Filipe Manana [Fri, 2 Feb 2024 14:32:17 +0000 (14:32 +0000)]
btrfs: don't refill whole delayed refs block reserve when starting transaction

Since commit 28270e25c69a ("btrfs: always reserve space for delayed refs
when starting transaction") we started not only to reserve metadata space
for the delayed refs a caller of btrfs_start_transaction() might generate
but also to try to fully refill the delayed refs block reserve, because
there are several case where we generate delayed refs and haven't reserved
space for them, relying on the global block reserve. Relying too much on
the global block reserve is not always safe, and can result in hitting
-ENOSPC during transaction commits or worst, in rare cases, being unable
to mount a filesystem that needs to do orphan cleanup or anything that
requires modifying the filesystem during mount, and has no more
unallocated space and the metadata space is nearly full. This was
explained in detail in that commit's change log.

However the gap between the reserved amount and the size of the delayed
refs block reserve can be huge, so attempting to reserve space for such
a gap can result in allocating many metadata block groups that end up
not being used. After a recent patch, with the subject:

  "btrfs: add new unused block groups to the list of unused block groups"

We started to add new block groups that are unused to the list of unused
block groups, to avoid having them around for a very long time in case
they are never used, because a block group is only added to the list of
unused block groups when we deallocate the last extent or when mounting
the filesystem and the block group has 0 bytes used. This is not a problem
introduced by the commit mentioned earlier, it always existed as our
metadata space reservations are, most of the time, pessimistic and end up
not using all the space they reserved, so we can occasionally end up with
one or two unused metadata block groups for a long period. However after
that commit mentioned earlier, we are just more pessimistic in the
metadata space reservations when starting a transaction and therefore the
issue is more likely to happen.

This however is not always enough because we might create unused metadata
block groups when reserving metadata space at a high rate if there's
always a gap in the delayed refs block reserve and the cleaner kthread
isn't triggered often enough or is busy with other work (running delayed
iputs, cleaning deleted roots, etc), not to mention the block group's
allocated space is only usable for a new block group after the transaction
used to remove it is committed.

A user reported that he's getting a lot of allocated metadata block groups
but the usage percentage of metadata space was very low compared to the
total allocated space, specially after running a series of block group
relocations.

So for now stop trying to refill the gap in the delayed refs block reserve
and reserve space only for the delayed refs we are expected to generate
when starting a transaction.

CC: stable@vger.kernel.org # 6.7+
Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H6802ayLHUJFztzZAVzBLJAGdFx=6FHNNy87+obZXXZpQ@mail.gmail.com/
Tested-by: Ivan Shapovalov <intelfx@intelfx.name>
Reported-by: Heddxh <g311571057@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAE93xANEby6RezOD=zcofENYZOT-wpYygJyauyUAZkLv6XVFOA@mail.gmail.com/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
20 months agobtrfs: zoned: fix chunk map leak when loading block group zone info
Filipe Manana [Mon, 12 Feb 2024 21:50:53 +0000 (21:50 +0000)]
btrfs: zoned: fix chunk map leak when loading block group zone info

At btrfs_load_block_group_zone_info() we never drop a reference on the
chunk map we have looked up, therefore leaking a reference on it. So
add the missing btrfs_free_chunk_map() at the end of the function.

Fixes: 7dc66abb5a47 ("btrfs: use a dedicated data structure for chunk maps")
Reported-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
20 months agobtrfs: reject encoded write if inode has nodatasum flag set
Filipe Manana [Fri, 2 Feb 2024 12:09:22 +0000 (12:09 +0000)]
btrfs: reject encoded write if inode has nodatasum flag set

Currently we allow an encoded write against inodes that have the NODATASUM
flag set, either because they are NOCOW files or they were created while
the filesystem was mounted with "-o nodatasum". This results in having
compressed extents without corresponding checksums, which is a filesystem
inconsistency reported by 'btrfs check'.

For example, running btrfs/281 with MOUNT_OPTIONS="-o nodatacow" triggers
this and 'btrfs check' errors out with:

   [1/7] checking root items
   [2/7] checking extents
   [3/7] checking free space tree
   [4/7] checking fs roots
   root 256 inode 257 errors 1040, bad file extent, some csum missing
   root 256 inode 258 errors 1040, bad file extent, some csum missing
   ERROR: errors found in fs roots
   (...)

So reject encoded writes if the target inode has NODATASUM set.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
20 months agobtrfs: don't reserve space for checksums when writing to nocow files
Filipe Manana [Wed, 31 Jan 2024 17:18:04 +0000 (17:18 +0000)]
btrfs: don't reserve space for checksums when writing to nocow files

Currently when doing a write to a file we always reserve metadata space
for inserting data checksums. However we don't need to do it if we have
a nodatacow file (-o nodatacow mount option or chattr +C) or if checksums
are disabled (-o nodatasum mount option), as in that case we are only
adding unnecessary pressure to metadata reservations.

For example on x86_64, with the default node size of 16K, a 4K buffered
write into a nodatacow file is reserving 655360 bytes of metadata space,
as it's accounting for checksums. After this change, which stops reserving
space for checksums if we have a nodatacow file or checksums are disabled,
we only need to reserve 393216 bytes of metadata.

CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
20 months agoMerge tag 'trace-tools-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 13 Feb 2024 16:38:57 +0000 (08:38 -0800)]
Merge tag 'trace-tools-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing tooling fixes from Steven Rostedt:
 "RTLA:

   - rtla tools are exiting with a positive value when usage() is
     called. Make them return 0 if the usage was called via -h/--help

   - the -P priority sets the sched priority for rtla workload. When the
     SCHED_OTHER scheduler is selected, it sets the rt_priority instead
     of the nice parameter. Setting the nice value is the correct thing,
     so fix it

   - rtla is failing to compile with clang due to unsupported options
     from gcc. Adjusting the compiler/linker options makes clang work
     properly

   - Remove the sched_getattr() unused function on utils.c

   - Fixes for variable initialization and size, reported by clang

  Verification:

   - rv is failing to compile with clang due to unsupported options from
     gcc. Adjusting the compiler/linker options makes clang work
     properly

   - Fix an uninitialized variable on in_kernel.c reported by clang"

* tag 'trace-tools-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tools/rtla: Exit with EXIT_SUCCESS when help is invoked
  tools/rtla: Replace setting prio with nice for SCHED_OTHER
  tools/rv: Fix curr_reactor uninitialized variable
  tools/rv: Fix Makefile compiler options for clang
  tools/rtla: Remove unused sched_getattr() function
  tools/rtla: Fix clang warning about mount_point var size
  tools/rtla: Fix uninitialized bucket/data->bucket_size warning
  tools/rtla: Fix Makefile compiler options for clang

20 months agodt-bindings: ufs: samsung,exynos-ufs: Add size constraints on "samsung,sysreg"
Rob Herring [Wed, 24 Jan 2024 19:07:33 +0000 (13:07 -0600)]
dt-bindings: ufs: samsung,exynos-ufs: Add size constraints on "samsung,sysreg"

The 'phandle-array' type is a bit ambiguous. It can be either just an
array of phandles or an array of phandles plus args. "samsung,sysreg" is
the latter and needs to be constrained to a single entry with a phandle and
offset.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240124190733.1554314-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
20 months agospi-mxs: Fix chipselect glitch
Ralf Schlatterbeck [Fri, 2 Feb 2024 11:53:30 +0000 (12:53 +0100)]
spi-mxs: Fix chipselect glitch

There was a change in the mxs-dma engine that uses a new custom flag.
The change was not applied to the mxs spi driver.
This results in chipselect being deasserted too early.
This fixes the chipselect problem by using the new flag in the mxs-spi
driver.

Fixes: ceeeb99cd821 ("dmaengine: mxs: rename custom flag")
Signed-off-by: Ralf Schlatterbeck <rsc@runtux.com>
Link: https://msgid.link/r/20240202115330.wxkbfmvd76sy3a6a@runtux.com
Signed-off-by: Mark Brown <broonie@kernel.org>
20 months agonet: ti: icssg-prueth: add dependency for PTP
Randy Dunlap [Sun, 11 Feb 2024 06:11:52 +0000 (22:11 -0800)]
net: ti: icssg-prueth: add dependency for PTP

When CONFIG_PTP_1588_CLOCK=m and CONFIG_TI_ICSSG_PRUETH=y, there are
kconfig dependency warnings and build errors referencing PTP functions.

Fix these by making TI_ICSSG_PRUETH depend on PTP_1588_CLOCK_OPTIONAL.

Fixes these build errors and warnings:

WARNING: unmet direct dependencies detected for TI_ICSS_IEP
  Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PTP_1588_CLOCK_OPTIONAL [=m] && TI_PRUSS [=y]
  Selected by [y]:
  - TI_ICSSG_PRUETH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PRU_REMOTEPROC [=y] && ARCH_K3 [=y] && OF [=y] && TI_K3_UDMA_GLUE_LAYER [=y]

aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_get_ptp_clock_idx':
icss_iep.c:(.text+0x1d4): undefined reference to `ptp_clock_index'
aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_exit':
icss_iep.c:(.text+0xde8): undefined reference to `ptp_clock_unregister'
aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_init':
icss_iep.c:(.text+0x176c): undefined reference to `ptp_clock_register'

Fixes: 186734c15886 ("net: ti: icssg-prueth: add packet timestamping and ptp support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Roger Quadros <rogerq@ti.com>
Cc: Md Danish Anwar <danishanwar@ti.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://lore.kernel.org/r/20240211061152.14696-1-rdunlap@infradead.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoHID: wacom: generic: Avoid reporting a serial of '0' to userspace
Tatsunosuke Tobita [Thu, 1 Feb 2024 04:40:55 +0000 (13:40 +0900)]
HID: wacom: generic: Avoid reporting a serial of '0' to userspace

The xf86-input-wacom driver does not treat '0' as a valid serial
number and will drop any input report which contains an
MSC_SERIAL = 0 event. The kernel driver already takes care to
avoid sending any MSC_SERIAL event if the value of serial[0] == 0
(which is the case for devices that don't actually report a
serial number), but this is not quite sufficient.
Only the lower 32 bits of the serial get reported to userspace,
so if this portion of the serial is zero then there can still
be problems.

This commit allows the driver to report either the lower 32 bits
if they are non-zero or the upper 32 bits otherwise.

Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Tatsunosuke Tobita <tatsunosuke.tobita@wacom.com>
Fixes: f85c9dc678a5 ("HID: wacom: generic: Support tool ID and additional tool types")
CC: stable@vger.kernel.org # v4.10
Signed-off-by: Jiri Kosina <jkosina@suse.com>
20 months agoaf_unix: Fix task hung while purging oob_skb in GC.
Kuniyuki Iwashima [Fri, 9 Feb 2024 22:04:53 +0000 (14:04 -0800)]
af_unix: Fix task hung while purging oob_skb in GC.

syzbot reported a task hung; at the same time, GC was looping infinitely
in list_for_each_entry_safe() for OOB skb.  [0]

syzbot demonstrated that the list_for_each_entry_safe() was not actually
safe in this case.

A single skb could have references for multiple sockets.  If we free such
a skb in the list_for_each_entry_safe(), the current and next sockets could
be unlinked in a single iteration.

unix_notinflight() uses list_del_init() to unlink the socket, so the
prefetched next socket forms a loop itself and list_for_each_entry_safe()
never stops.

Here, we must use while() and make sure we always fetch the first socket.

[0]:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 5065 Comm: syz-executor236 Not tainted 6.8.0-rc3-syzkaller-00136-g1f719a2f3fa6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0xd/0x60 kernel/kcov.c:207
Code: cc cc cc cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 65 48 8b 14 25 40 c2 03 00 <65> 8b 05 b4 7c 78 7e a9 00 01 ff 00 48 8b 34 24 74 0f f6 c4 01 74
RSP: 0018:ffffc900033efa58 EFLAGS: 00000283
RAX: ffff88807b077800 RBX: ffff88807b077800 RCX: 1ffffffff27b1189
RDX: ffff88802a5a3b80 RSI: ffffffff8968488d RDI: ffff88807b077f70
RBP: ffffc900033efbb0 R08: 0000000000000001 R09: fffffbfff27a900c
R10: ffffffff93d48067 R11: ffffffff8ae000eb R12: ffff88807b077800
R13: dffffc0000000000 R14: ffff88807b077e40 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564f4fc1e3a8 CR3: 000000000d57a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <TASK>
 unix_gc+0x563/0x13b0 net/unix/garbage.c:319
 unix_release_sock+0xa93/0xf80 net/unix/af_unix.c:683
 unix_release+0x91/0xf0 net/unix/af_unix.c:1064
 __sock_release+0xb0/0x270 net/socket.c:659
 sock_close+0x1c/0x30 net/socket.c:1421
 __fput+0x270/0xb80 fs/file_table.c:376
 task_work_run+0x14f/0x250 kernel/task_work.c:180
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0xa8a/0x2ad0 kernel/exit.c:871
 do_group_exit+0xd4/0x2a0 kernel/exit.c:1020
 __do_sys_exit_group kernel/exit.c:1031 [inline]
 __se_sys_exit_group kernel/exit.c:1029 [inline]
 __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1029
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd5/0x270 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f9d6cbdac09
Code: Unable to access opcode bytes at 0x7f9d6cbdabdf.
RSP: 002b:00007fff5952feb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9d6cbdac09
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 00007f9d6cc552b0 R08: ffffffffffffffb8 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 00007f9d6cc552b0
R13: 0000000000000000 R14: 00007f9d6cc55d00 R15: 00007f9d6cbabe70
 </TASK>

Reported-by: syzbot+4fa4a2d1f5a5ee06f006@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4fa4a2d1f5a5ee06f006
Fixes: 1279f9d9dec2 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240209220453.96053-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoHID: Intel-ish-hid: Ishtp: Fix sensor reads after ACPI S3 suspend
Even Xu [Fri, 9 Feb 2024 06:52:32 +0000 (14:52 +0800)]
HID: Intel-ish-hid: Ishtp: Fix sensor reads after ACPI S3 suspend

After legacy suspend/resume via ACPI S3, sensor read operation fails
with timeout. Also, it will cause delay in resume operation as there
will be retries on failure.

This is caused by commit f645a90e8ff7 ("HID: intel-ish-hid:
ishtp-hid-client: use helper functions for connection"), which used
helper functions to simplify connect, reset and disconnect process.
Also avoid freeing and allocating client buffers again during reconnect
process.

But there is a case, when ISH firmware resets after ACPI S3 suspend,
ishtp bus driver frees client buffers. Since there is no realloc again
during reconnect, there are no client buffers available to send connection
requests to the firmware. Without successful connection to the firmware,
subsequent sensor reads will timeout.

To address this issue, ishtp bus driver does not free client buffers on
warm reset after S3 resume. Simply add the buffers from the read list
to free list of buffers.

Fixes: f645a90e8ff7 ("HID: intel-ish-hid: ishtp-hid-client: use helper functions for connection")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218442
Signed-off-by: Even Xu <even.xu@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
20 months agoHID: multitouch: Add required quirk for Synaptics 0xcddc device
Manuel Fombuena [Sun, 11 Feb 2024 19:04:29 +0000 (19:04 +0000)]
HID: multitouch: Add required quirk for Synaptics 0xcddc device

Add support for the pointing stick (Accupoint) and 2 mouse buttons.

Present on some Toshiba/dynabook Portege X30 and X40 laptops.

It should close https://bugzilla.kernel.org/show_bug.cgi?id=205817

Signed-off-by: Manuel Fombuena <fombuena@outlook.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>