Simon Horman [Tue, 20 May 2025 14:33:33 +0000 (15:33 +0100)]
net: ethernet: mtk_eth_soc: Correct spelling
Correct spelling of platforms, various, and initial.
As flagged by codespell.
Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Tue, 20 May 2025 14:25:41 +0000 (15:25 +0100)]
net: dlink: Correct endian treatment of t_SROM data
As it's name suggests, parse_eeprom() parses EEPROM data.
This is done by reading data, 16 bits at a time as follows:
for (i = 0; i < 128; i++)
((__le16 *) sromdata)[i] = cpu_to_le16(read_eeprom(np, i));
sromdata is at the same memory location as psrom.
And the type of psrom is a pointer to struct t_SROM.
As can be seen in the loop above, data is stored in sromdata, and thus
psrom, as 16-bit little-endian values. However, the integer fields of
t_SROM are host byte order.
In the case of the led_mode field this results in a but which has been
addressed by commit e7e5ae71831c ("net: dlink: Correct endianness
handling of led_mode").
In the case of the remaining fields, which are updated by this patch,
I do not believe this does not result in any bugs. But it does seem
best to correctly annotate the endianness of integers.
Flagged by Sparse as:
.../dl2k.c:344:35: warning: restricted __le32 degrades to integer
Compile tested only.
No run-time change intended.
Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Tue, 20 May 2025 06:09:52 +0000 (11:39 +0530)]
octeontx2-af: NPC: Clear Unicast rule on nixlf detach
The AF driver assigns reserved MCAM entries (for unicast, broadcast,
etc.) based on the NIXLF number. When a NIXLF is detached, these entries
are disabled.
For example,
PF NIXLF
--------------------
PF0 0
SDP-VF0 1
If the user unbinds both PF0 and SDP-VF0 interfaces and then binds them in
reverse order
PF NIXLF
---------------------
SDP-VF0 0
PF0 1
In this scenario, the PF0 unicast entry is getting corrupted because
the MCAM entry contains stale data (SDP-VF0 ucast data)
This patch resolves the issue by clearing the unicast MCAM entry during
NIXLF detach
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Mon, 19 May 2025 20:58:00 +0000 (13:58 -0700)]
selftest: af_unix: Test SO_PASSRIGHTS.
scm_rights.c has various patterns of tests to exercise GC.
Let's add cases where SO_PASSRIGHTS is disabled.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Mon, 19 May 2025 20:57:59 +0000 (13:57 -0700)]
af_unix: Introduce SO_PASSRIGHTS.
As long as recvmsg() or recvmmsg() is used with cmsg, it is not
possible to avoid receiving file descriptors via SCM_RIGHTS.
This behaviour has occasionally been flagged as problematic, as
it can be (ab)used to trigger DoS during close(), for example, by
passing a FUSE-controlled fd or a hung NFS fd.
For instance, as noted on the uAPI Group page [0], an untrusted peer
could send a file descriptor pointing to a hung NFS mount and then
close it. Once the receiver calls recvmsg() with msg_control, the
descriptor is automatically installed, and then the responsibility
for the final close() now falls on the receiver, which may result
in blocking the process for a long time.
Regarding this, systemd calls cmsg_close_all() [1] after each
recvmsg() to close() unwanted file descriptors sent via SCM_RIGHTS.
However, this cannot work around the issue at all, because the final
fput() may still occur on the receiver's side once sendmsg() with
SCM_RIGHTS succeeds. Also, even filtering by LSM at recvmsg() does
not work for the same reason.
Thus, we need a better way to refuse SCM_RIGHTS at sendmsg().
Let's introduce SO_PASSRIGHTS to disable SCM_RIGHTS.
Note that this option is enabled by default for backward
compatibility.
Kuniyuki Iwashima [Mon, 19 May 2025 20:57:58 +0000 (13:57 -0700)]
af_unix: Inherit sk_flags at connect().
For SOCK_STREAM embryo sockets, the SO_PASS{CRED,PIDFD,SEC} options
are inherited from the parent listen()ing socket.
Currently, this inheritance happens at accept(), because these
attributes were stored in sk->sk_socket->flags and the struct socket
is not allocated until accept().
This leads to unintentional behaviour.
When a peer sends data to an embryo socket in the accept() queue,
unix_maybe_add_creds() embeds credentials into the skb, even if
neither the peer nor the listener has enabled these options.
If the option is enabled, the embryo socket receives the ancillary
data after accept(). If not, the data is silently discarded.
This conservative approach works for SO_PASS{CRED,PIDFD,SEC}, but
would not for SO_PASSRIGHTS; once an SCM_RIGHTS with a hung file
descriptor was sent, it'd be game over.
To avoid this, we will need to preserve SOCK_PASSRIGHTS even on embryo
sockets.
Commit aed6ecef55d7 ("af_unix: Save listener for embryo socket.")
made it possible to access the parent's flags in sendmsg() via
unix_sk(other)->listener->sk->sk_socket->flags, but this introduces
an unnecessary condition that is irrelevant for most sockets,
accept()ed sockets and clients.
Therefore, we moved SOCK_PASSXXX into struct sock.
Let’s inherit sk->sk_scm_recv_flags at connect() to avoid receiving
SCM_RIGHTS on embryo sockets created from a parent with SO_PASSRIGHTS=0.
Note that the parent socket is locked in connect() so we don't need
READ_ONCE() for sk_scm_recv_flags.
Now, we can remove !other->sk_socket check in unix_maybe_add_creds()
to avoid slow SOCK_PASS{CRED,PIDFD} handling for embryo sockets
created from a parent with SO_PASS{CRED,PIDFD}=0.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Mon, 19 May 2025 20:57:57 +0000 (13:57 -0700)]
af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
As explained in the next patch, SO_PASSRIGHTS would have a problem
if we assigned a corresponding bit to socket->flags, so it must be
managed in struct sock.
Mixing socket->flags and sk->sk_flags for similar options will look
confusing, and sk->sk_flags does not have enough space on 32bit system.
Also, as mentioned in commit 16e572626961 ("af_unix: dont send
SCM_CREDENTIALS by default"), SOCK_PASSCRED and SOCK_PASSPID handling
is known to be slow, and managing the flags in struct socket cannot
avoid that for embryo sockets.
Let's move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
While at it, other SOCK_XXX flags in net.h are grouped as enum.
Note that assign_bit() was atomic, so the writer side is moved down
after lock_sock() in setsockopt(), but the bit is only read once
in sendmsg() and recvmsg(), so lock_sock() is not needed there.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Mon, 19 May 2025 20:57:56 +0000 (13:57 -0700)]
net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH}.
SCM_CREDENTIALS and SCM_SECURITY can be recv()ed by calling
scm_recv() or scm_recv_unix(), and SCM_PIDFD is only used by
scm_recv_unix().
scm_recv() is called from AF_NETLINK and AF_BLUETOOTH.
scm_recv_unix() is literally called from AF_UNIX.
Let's restrict SO_PASSCRED and SO_PASSSEC to such sockets and
SO_PASSPIDFD to AF_UNIX only.
Later, SOCK_PASS{CRED,PIDFD,SEC} will be moved to struct sock
and united with another field.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Mon, 19 May 2025 20:57:55 +0000 (13:57 -0700)]
tcp: Restrict SO_TXREHASH to TCP socket.
sk->sk_txrehash is only used for TCP.
Let's restrict SO_TXREHASH to TCP to reflect this.
Later, we will make sk_txrehash a part of the union for other
protocol families.
Note that we need to modify BPF selftest not to get/set
SO_TEREHASH for non-TCP sockets.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Mon, 19 May 2025 20:57:54 +0000 (13:57 -0700)]
scm: Move scm_recv() from scm.h to scm.c.
scm_recv() has been placed in scm.h since the pre-git era for no
particular reason (I think), which makes the file really fragile.
For example, when you move SOCK_PASSCRED from include/linux/net.h to
enum sock_flags in include/net/sock.h, you will see weird build failure
due to terrible dependency.
To avoid the build failure in the future, let's move scm_recv(_unix())?
and its callees to scm.c.
Note that only scm_recv() needs to be exported for Bluetooth.
scm_send() should be moved to scm.c too, but I'll revisit later.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Mon, 19 May 2025 20:57:53 +0000 (13:57 -0700)]
af_unix: Don't pass struct socket to maybe_add_creds().
We will move SOCK_PASS{CRED,PIDFD,SEC} from struct socket.flags
to struct sock for better handling with SOCK_PASSRIGHTS.
Then, we don't need to access struct socket in maybe_add_creds().
Let's pass struct sock to maybe_add_creds() and its caller
queue_oob().
While at it, we append the unix_ prefix and fix double spaces
around the pid assignment.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Mon, 19 May 2025 20:57:52 +0000 (13:57 -0700)]
af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD.
Currently, the same checks for SOCK_PASSCRED and SOCK_PASSPIDFD
are scattered across many places.
Let's centralise the bit tests to make the following changes cleaner.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 20 May 2025 16:31:35 +0000 (09:31 -0700)]
Bluetooth: btintel: Check dsbr size from EFI variable
Since the size of struct btintel_dsbr is already known, we can just
start there instead of querying the EFI variable size. If the final
result doesn't match what we expect also fail. This fixes a stack buffer
overflow when the EFI variable is larger than struct btintel_dsbr.
Reported-by: zepta <z3ptaa@gmail.com> Closes: https://lore.kernel.org/all/CAPBS6KoaWV9=dtjTESZiU6KK__OZX0KpDk-=JEH8jCHFLUYv3Q@mail.gmail.com Fixes: eb9e749c0182 ("Bluetooth: btintel: Allow configuring drive strength of BRI") Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Dmitry Antipov [Tue, 20 May 2025 08:42:30 +0000 (11:42 +0300)]
Bluetooth: MGMT: iterate over mesh commands in mgmt_mesh_foreach()
In 'mgmt_mesh_foreach()', iterate over mesh commands
rather than generic mgmt ones. Compile tested only.
Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Wed, 7 May 2025 19:00:30 +0000 (15:00 -0400)]
Bluetooth: L2CAP: Fix not checking l2cap_chan security level
l2cap_check_enc_key_size shall check the security level of the
l2cap_chan rather than the hci_conn since for incoming connection
request that may be different as hci_conn may already been
encrypted using a different security level.
Fixes: 522e9ed157e3 ("Bluetooth: l2cap: Check encryption key size on incoming connection") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cross-merge networking fixes after downstream PR (net-6.15-rc8).
Conflicts: 80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X") 4bcc063939a5 ("ice, irdma: fix an off by one in error handling code") c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers")
https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au
Linus Torvalds [Thu, 22 May 2025 16:15:19 +0000 (09:15 -0700)]
Merge tag 'net-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"This is somewhat larger than what I hoped for, with a few PRs from
subsystems and follow-ups for the recent netdev locking changes,
anyhow there are no known pending regressions.
Including fixes from bluetooth, ipsec and CAN.
Current release - regressions:
- eth: team: grab team lock during team_change_rx_flags
- eth: bnxt_en: fix netdev locking in ULP IRQ functions
Current release - new code bugs:
- xfrm: ipcomp: fix truesize computation on receive
- eth: airoha: fix page recycling in airoha_qdma_rx_process()
Previous releases - regressions:
- sched: hfsc: fix qlen accounting bug when using peek in
hfsc_enqueue()
- mr: consolidate the ipmr_can_free_table() checks.
- bridge: netfilter: fix forwarding of fragmented packets
- xsk: bring back busy polling support in XDP_COPY
- can:
- add missing rcu read protection for procfs content
- kvaser_pciefd: force IRQ edge in case of nested IRQ
- eth: hibmcge: fix wrong ndo.open() after reset fail issue"
* tag 'net-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG
octeontx2-af: Set LMT_ENA bit for APR table entries
net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done
octeontx2-pf: Avoid adding dcbnl_ops for LBK and SDP vf
selftests/tc-testing: Add an HFSC qlen accounting test
sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
idpf: fix idpf_vport_splitq_napi_poll()
net: hibmcge: fix wrong ndo.open() after reset fail issue.
net: hibmcge: fix incorrect statistics update issue
xsk: Bring back busy polling support in XDP_COPY
can: slcan: allow reception of short error messages
net: lan743x: Restore SGMII CTRL register on resume
bnxt_en: Fix netdev locking in ULP IRQ functions
MAINTAINERS: Drop myself to reviewer for ravb driver
net: dwmac-sun8i: Use parsed internal PHY address instead of 1
net: ethernet: ti: am65-cpsw: Lower random mac address error print to info
can: kvaser_pciefd: Continue parsing DMA buf after dropped RX
can: kvaser_pciefd: Fix echo_skb race
can: kvaser_pciefd: Force IRQ edge in case of nested IRQ
idpf: fix null-ptr-deref in idpf_features_check
...
====================
net/mlx5: Convert mlx5 to netdev instance locking
Cosmin Ratiu says:
mlx5 manages multiple netdevices, from basic Ethernet to Infiniband
netdevs. This patch series converts the driver to use netdev instance
locking for everything in preparation for TCP devmem Zero Copy.
Because mlx5 is tightly coupled with the ipoib driver, a series of
changes first happen in ipoib to allow it to work with mlx5 netdevs that
use instance locking:
IB/IPoIB: Enqueue separate work_structs for each flushed interface
IB/IPoIB: Replace vlan_rwsem with the netdev instance lock
IB/IPoIB: Allow using netdevs that require the instance lock
A small patch then avoids dropping RTNL during firmware update:
net/mlx5e: Don't drop RTNL during firmware flash
The main patch then converts all mlx5 netdevs to use instance locking:
net/mlx5e: Convert mlx5 netdevs to instance locking
====================
Cosmin Ratiu [Wed, 21 May 2025 12:09:02 +0000 (15:09 +0300)]
net/mlx5e: Convert mlx5 netdevs to instance locking
This patch convert mlx5 to use the new netdev instance lock in addition
to the pre-existing state_lock (and the RTNL).
mlx5e_priv.state_lock was already used throughout mlx5 to protect
against concurrent state modifications on the same netdev, usually in
addition to the RTNL. The new netdev instance lock will eventually
replace it, but for now, it is acquired in addition to the existing
locks in the order RTNL -> instance lock -> state_lock.
All three netdev types handled by mlx5 are converted to the new style of
locking, because they share a lot of code related to initializing
channels and dealing with NAPI, so it's better to convert all three
rather than introduce different assumptions deep in the call stack
depending on the type of device.
Because of the nature of the call graphs in mlx5, it wasn't possible to
incrementally convert parts of the driver to use the new lock, since
either all call paths into NAPI have to possess the new lock if the
*_locked variants are used, or none of them can have the lock.
One area which required extra care is the interaction between closing
channels and devlink health reporter tasks.
Previously, the recovery tasks were unconditionally acquiring the
RTNL, which could lead to deadlocks in these scenarios:
Another similar instance of this is:
T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked
-> mlx5e_close_channels -> mlx5e_ptp_close
-> mlx5e_ptp_close_queues -> mlx5e_ptp_close_txqsqs
-> mlx5e_ptp_close_txqsq
-> cancel_work_sync(&sq->recover_work) waits for
T2: mlx5e_tx_err_cqe_work -> mlx5e_reporter_tx_err_cqe
-> mlx5e_health_report -> devlink_health_report
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_err_cqe_recover which does:
rtnl_lock(); => Another deadlock.
Fix that by using the same pattern previously done in
mlx5e_tx_timeout_work, where the RTNL was repeatedly tried to be
acquired until either:
a) it is successfully acquired or
b) there's no need for the work to be done any more (channel is being
closed).
Now, for all three recovery tasks, the instance lock is repeatedly tried
to be acquired until successful or the channel/SQ is closed.
As a side-effect, drop the !test_bit(MLX5E_STATE_OPENED, &priv->state)
check from mlx5e_tx_timeout_work, it's weaker than
!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state) and unnecessary.
Future patches will introduce new call paths (from netdev queue
management ops) which can close channels (and call cancel_work_sync on
the recovery tasks) without the RTNL lock and only with the netdev
instance lock.
Cosmin Ratiu [Wed, 21 May 2025 12:09:01 +0000 (15:09 +0300)]
net/mlx5e: Don't drop RTNL during firmware flash
There's no explanation in the original commit of why that was done, but
presumably flashing takes a long time and holding RTNL for so long
blocks other interactions with the netdev layer.
However, the stack is moving towards netdev instance locking and
dropping and reacquiring RTNL in the context of flashing introduces
locking ordering issues: RTNL must be acquired before the netdev
instance lock and released after it.
This patch therefore takes the simpler approach by no longer dropping
and reacquiring the RTNL, as soon RTNL for ethtool will be removed,
leaving only the instance lock to protect against races.
Cosmin Ratiu [Wed, 21 May 2025 12:09:00 +0000 (15:09 +0300)]
IB/IPoIB: Allow using netdevs that require the instance lock
After the last patch removing vlan_rwsem, it is an incremental step to
allow ipoib to work with netdevs that require the instance lock.
In several places, netdev_lock() is changed to netdev_lock_ops_to_full()
which takes care of not acquiring the lock again when the netdev is
already locked.
In ipoib_ib_tx_timeout_work() and __ipoib_ib_dev_flush() for HEAVY
flushes, the netdev lock is acquired/released. This is needed because
these functions end up calling .ndo_stop()/.ndo_open() on subinterfaces,
and the device may expect the netdev instance lock to be held.
ipoib_set_mode() now explicitly acquires ops lock while manipulating the
features, mtu and tx queues.
Finally, ipoib_napi_enable()/ipoib_napi_disable() now use the *_locked
variants of the napi_enable()/napi_disable() calls and optionally
acquire the netdev lock themselves depending on the dev they operate on.
Cosmin Ratiu [Wed, 21 May 2025 12:08:59 +0000 (15:08 +0300)]
IB/IPoIB: Replace vlan_rwsem with the netdev instance lock
vlan_rwsem was added more than a decade ago to work around a deadlock
involving the original mutex being acquired twice, once from the wq.
Subsequent changes then tweaked it to partially protect access to
ipoib_dev_priv->child_intfs together with the RTNL. Flushing the wq
synchronously was also since then refactored to happen separately.
This semaphore unfortunately prevents updating ipoib to work with
devices that require the netdev lock, because of lock ordering issues
between RTNL, vlan_rwsem and the netdev instance locks of parent and
child devices.
To uncomplicate things, this commit replaces vlan_rwsem with the netdev
instance lock of the parent device. Both parent child_intfs list and the
children's list membership in it require holding the parent netdev
instance lock.
All call paths were carefully reviewed and no-longer-needed ASSERT_RTNL
calls were dropped. Some non-trivial changes:
- ipoib_match_gid_pkey_addr() now only acquires the instance lock and
iterates through child_intfs for the first level of recursion (the
parent), as it's not possible to have multiple levels of nested
subinterfaces.
- ipoib_open() and ipoib_stop() schedule tasks on the global workqueue
to open/stop child interfaces to avoid potentially acquiring nested
netdev instance locks. To avoid the device going away between the task
scheduling and execution, netdev_hold/netdev_put are used.
Cosmin Ratiu [Wed, 21 May 2025 12:08:58 +0000 (15:08 +0300)]
IB/IPoIB: Enqueue separate work_structs for each flushed interface
Previously, flushing a netdevice involved first flushing all child
devices from the flush task itself. That requires holding the lock that
protects the list for the entire duration of the flush.
This poses a problem when converting from vlan_rwsem to the netdev
instance lock (next patch), because holding the parent lock while
trying to acquire a child lock makes lockdep unhappy, rightfully.
Fix this by splitting a big flush task into individual flush tasks
(all are already created in their respective ipoib_dev_priv structs)
and defining a helper function to enqueue all of them while holding the
list lock.
In ipoib_set_mac, the function is not used and the task is enqueued
directly, because in the subsequent patches locking is changed and this
function may be called with the netdev instance lock held.
This is effectively a noop, the wq is single-threaded and ordered and
will execute the same flush operations in the same order as before.
Furthermore, there should be no new races because
ipoib_parent_unregister_pre() calls flush_workqueue() after stopping new
work generation to wait for pending work to complete. flush_workqueue()
waits for all currently enqueued work to finish before returning.
Linus Torvalds [Thu, 22 May 2025 16:08:54 +0000 (09:08 -0700)]
Merge tag 'pinctrl-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"This deals with a crash in the Qualcomm pin controller GPIO
parts when using hogs.
The first patch to gpiolib makes gpiochip_line_is_valid()
NULL-tolerant.
The second patch fixes the actual problem"
* tag 'pinctrl-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: switch to devm_register_sys_off_handler()
gpiolib: don't crash on enabling GPIO HOG pins
Linus Torvalds [Thu, 22 May 2025 16:05:29 +0000 (09:05 -0700)]
Merge tag 'sound-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes for 6.15 final. It became slightly a
higher amount than expected, but all look easy and safe to apply:
- A fix for PCM core race spotted by fuzzing
- ASoC topology fix for single DAI link
- UAF fix for ASoC SOF Intel HD-audio at reloading
- ASoC SOF Intel and Mediatek fixes
- Trivial HD-audio quirks as usual"
* tag 'sound-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Add new HP ZBook laptop with micmute led fixup
ALSA: hda/realtek: Add support for HP Agusta using CS35L41 HDA
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14ASP10
ALSA: hda/realtek - restore auto-mute mode for Dell Chrome platform
ALSA: pcm: Fix race of buffer access at PCM OSS layer
ASoC: SOF: Intel: hda: Fix UAF when reloading module
ASoc: SOF: topology: connect DAI to a single DAI link
ASoC: SOF: Intel: hda-bus: Use PIO mode on ACE2+ platforms
ASoC: SOF: ipc4-pcm: Delay reporting is only supported for playback direction
ASoC: SOF: ipc4-control: Use SOF_CTRL_CMD_BINARY as numid for bytes_ext
ASoC: mediatek: mt8188-mt6359: Depend on MT6359_ACCDET set or disabled
ASoC: mediatek: mt8188-mt6359: select CONFIG_SND_SOC_MT6359_ACCDET
Taehee Yoo [Tue, 20 May 2025 07:11:55 +0000 (07:11 +0000)]
eth: bnxt: fix deadlock when xdp is attached or detached
When xdp is attached or detached, dev->ndo_bpf() is called by
do_setlink(), and it acquires netdev_lock() if needed.
Unlike other drivers, the bnxt driver is protected by netdev_lock while
xdp is attached/detached because it sets dev->request_ops_lock to true.
So, the bnxt_xdp(), that is callback of ->ndo_bpf should not acquire
netdev_lock().
But the xdp_features_{set | clear}_redirect_target() was changed to
acquire netdev_lock() internally.
It causes a deadlock.
To fix this problem, bnxt driver should use
xdp_features_{set | clear}_redirect_target_locked() instead.
Splat looks like:
============================================
WARNING: possible recursive locking detected
6.15.0-rc6+ #1 Not tainted
--------------------------------------------
bpftool/1745 is trying to acquire lock: ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: xdp_features_set_redirect_target+0x1f/0x80
but task is already holding lock: ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x24e/0x35d0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dev->lock);
lock(&dev->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by bpftool/1745:
#0: ffffffffa56131c8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_setlink+0x1fe/0x570
#1: ffffffffaafa75a0 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_setlink+0x236/0x570
#2: ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x24e/0x35d0
Kory Maincent [Mon, 19 May 2025 08:45:05 +0000 (10:45 +0200)]
net: Add support for providing the PTP hardware source in tsinfo
Multi-PTP source support within a network topology has been merged,
but the hardware timestamp source is not yet exposed to users.
Currently, users only see the PTP index, which does not indicate
whether the timestamp comes from a PHY or a MAC.
Add support for reporting the hwtstamp source using a
hwtstamp-source field, alongside hwtstamp-phyindex, to describe
the origin of the hardware timestamp.
Remove HWTSTAMP_SOURCE_UNSPEC enum value as it is not used at all.
Shayne Chen [Thu, 15 May 2025 03:29:52 +0000 (11:29 +0800)]
wifi: mt76: support power delta calculation for 5 TX paths
One variant of MT7992 has 5 TX paths, so extend the power delta function
to support it. Also, rename nss_delta to path_delta since the value is
based on the number of TX paths rather tha the number of spatial streams.
(path delta [0.5 dBm] = 10 * log(path number) [dBm] * 2)
Co-developed-by: StanleyYP Wang <StanleyYP.Wang@mediatek.com> Signed-off-by: StanleyYP Wang <StanleyYP.Wang@mediatek.com> Signed-off-by: Shayne Chen <shayne.chen@mediatek.com> Link: https://patch.msgid.link/20250515032952.1653494-9-shayne.chen@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
Shayne Chen [Thu, 15 May 2025 03:29:51 +0000 (11:29 +0800)]
wifi: mt76: fix available_antennas setting
Check if available_antennas_tx and available_antennas_rx are already set
during the per-chip initialization phase; otherwise, they could be
overwritten with incorrect values.
Shayne Chen [Thu, 15 May 2025 03:29:50 +0000 (11:29 +0800)]
wifi: mt76: mt7996: fix RX buffer size of MCU event
Some management frames are first processed by the firmware and then
passed to the driver through the MCU event rings. In CONNAC3, event rings
do not support scatter-gather and have a size limitation of 2048 bytes.
If a packet sized between 1728 and 2048 bytes arrives from an event ring,
the ring will hang because the driver attempts to use scatter-gather to
process it.
To fix this, include the size of struct skb_shared_info in the MCU RX
buffer size to prevent scatter-gather from being used for event skb in
mt76_dma_rx_fill_buf().
Fixes: 98686cd21624 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices") Co-developed-by: Peter Chiu <chui-hao.chiu@mediatek.com> Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com> Signed-off-by: Shayne Chen <shayne.chen@mediatek.com> Link: https://patch.msgid.link/20250515032952.1653494-7-shayne.chen@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
Peter Chiu [Thu, 15 May 2025 03:29:48 +0000 (11:29 +0800)]
wifi: mt76: mt7996: fix invalid NSS setting when TX path differs from NSS
The maximum TX path and NSS may differ on a band. For example, one variant
of the MT7992 has 5 TX paths and 4 NSS on the 5 GHz band. To address this,
add orig_antenna_mask to record the maximum NSS and prevent setting an
invalid NSS in mt7996_set_antenna().
Fixes: 69d54ce7491d ("wifi: mt76: mt7996: switch to single multi-radio wiphy") Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com> Signed-off-by: Shayne Chen <shayne.chen@mediatek.com> Link: https://patch.msgid.link/20250515032952.1653494-5-shayne.chen@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
Benjamin Lin [Thu, 15 May 2025 03:29:47 +0000 (11:29 +0800)]
wifi: mt76: mt7996: drop fragments with multicast or broadcast RA
IEEE 802.11 fragmentation can only be applied to unicast frames.
Therefore, drop fragments with multicast or broadcast RA. This patch
addresses vulnerabilities such as CVE-2020-26145.
Peter Chiu [Thu, 15 May 2025 03:29:46 +0000 (11:29 +0800)]
wifi: mt76: mt7996: set EHT max ampdu length capability
Set the max AMPDU length in the EHT MAC CAP. Without this patch, the
peer station cannot obtain the correct capability, which prevents
achieving peak throughput on the 2 GHz band.
Fixes: 1816ad9381e0 ("wifi: mt76: mt7996: add max mpdu len capability") Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com> Signed-off-by: Shayne Chen <shayne.chen@mediatek.com> Link: https://patch.msgid.link/20250515032952.1653494-3-shayne.chen@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
Howard Hsu [Thu, 15 May 2025 03:29:45 +0000 (11:29 +0800)]
wifi: mt76: mt7996: fix beamformee SS field
Fix the beamformee SS field for the mt7996, mt7992 and mt7990 chipsets.
For the mt7992, this value shall be set to 0x4, while the others shall
be set to 0x3.
Fixes: 5b20557593d4 ("wifi: mt76: connac: adjust phy capabilities based on band constraints") Signed-off-by: Howard Hsu <howard-yh.hsu@mediatek.com> Signed-off-by: Shayne Chen <shayne.chen@mediatek.com> Link: https://patch.msgid.link/20250515032952.1653494-2-shayne.chen@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
Michael Lo [Mon, 5 May 2025 23:36:18 +0000 (16:36 -0700)]
wifi: mt76: mt7925: add test mode support
The test mode interface allows controlled execution of chip-level
operations such as continuous transmission, reception tests, and
register access, which are essential during bring-up, diagnostics,
and factory testing.
Michael Lo [Mon, 14 Apr 2025 01:39:54 +0000 (09:39 +0800)]
wifi: mt76: mt7925: ensure all MCU commands wait for response
Modify MCU command sending functions to wait for a response,
ensuring consistent behavior across all commands and improves
reliability by confirming that each command is processed
successfully.
Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips") Signed-off-by: Michael Lo <michael.lo@mediatek.com> Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://patch.msgid.link/20250414013954.1151774-3-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
Introduce mt76_connac_mcu_build_rnr_scan_param routine for handling
RNR scan. This is a preliminary patch to enable RNR scan in mt7921 and
mt7925 driver.
Paolo Abeni [Thu, 22 May 2025 10:32:38 +0000 (12:32 +0200)]
Merge tag 'linux-can-fixes-for-6.15-20250521' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-05-22
this is a pull request of 4 patches for net/main.
The first 3 patches are by Axel Forsman and fix a ISR race condition
in the kvaser_pciefd driver.
The last patch is by Carlos Sanchez and fixes the reception of short
error messages in the slcan driver.
* tag 'linux-can-fixes-for-6.15-20250521' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: slcan: allow reception of short error messages
can: kvaser_pciefd: Continue parsing DMA buf after dropped RX
can: kvaser_pciefd: Fix echo_skb race
can: kvaser_pciefd: Force IRQ edge in case of nested IRQ
====================
This patch series includes fixes related to APR (LMT)
mapping and debugfs support.
Changes include:
Patch 1:Set LMT_ENA bit for APR table entries.
Enables the LMT line for each PF/VF by setting
the LMT_ENA bit in the APR_LMT_MAP_ENTRY_S
structure.
Patch-2:Fix APR entry in debugfs
The APR table was previously mapped using a fixed size,
which could lead to incorrect mappings when the number
of PFs and VFs differed from the assumed value.
This patch updates the logic to calculate the APR table
size dynamically, based on values from the APR_LMT_CFG
register, ensuring correct representation in debugfs.
====================
Geetha sowjanya [Wed, 21 May 2025 06:08:34 +0000 (11:38 +0530)]
octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG
The current implementation maps the APR table using a fixed size,
which can lead to incorrect mapping when the number of PFs and VFs
varies.
This patch corrects the mapping by calculating the APR table
size dynamically based on the values configured in the
APR_LMT_CFG register, ensuring accurate representation
of APR entries in debugfs.
1) Fix some missing kfree_skb in the error paths of espintcp.
From Sabrina Dubroca.
2) Fix a reference leak in espintcp.
From Sabrina Dubroca.
3) Fix UDP GRO handling for ESPINUDP.
From Tobias Brunner.
4) Fix ipcomp truesize computation on the receive path.
From Sabrina Dubroca.
5) Sanitize marks before policy/state insertation.
From Paul Chaignon.
* tag 'ipsec-2025-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: Sanitize marks before insert
xfrm: ipcomp: fix truesize computation on receive
xfrm: Fix UDP GRO handling for some corner cases
espintcp: remove encap socket caching to avoid reference leak
espintcp: fix skb leaks
====================
Wang Liang [Tue, 20 May 2025 10:14:04 +0000 (18:14 +0800)]
net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done
Syzbot reported a slab-use-after-free with the following call trace:
==================================================================
BUG: KASAN: slab-use-after-free in tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840
Read of size 8 at addr ffff88807a733000 by task kworker/1:0/25
After freed the tipc_crypto tx by delete namespace, tipc_aead_encrypt_done
may still visit it in cryptd_queue_worker workqueue.
I reproduce this issue by:
ip netns add ns1
ip link add veth1 type veth peer name veth2
ip link set veth1 netns ns1
ip netns exec ns1 tipc bearer enable media eth dev veth1
ip netns exec ns1 tipc node set key this_is_a_master_key master
ip netns exec ns1 tipc bearer disable media eth dev veth1
ip netns del ns1
The key of reproduction is that, simd_aead_encrypt is interrupted, leading
to crypto_simd_usable() return false. Thus, the cryptd_queue_worker is
triggered, and the tipc_crypto tx will be visited.
====================
net_sched: Fix HFSC qlen/backlog accounting bug and add selftest
This series addresses a long-standing bug in the HFSC qdisc where queue length
and backlog accounting could become inconsistent if a packet is dropped during
a peek-induced dequeue operation, and adds a corresponding selftest to tc-testing.
====================
Cong Wang [Sun, 18 May 2025 22:20:38 +0000 (15:20 -0700)]
selftests/tc-testing: Add an HFSC qlen accounting test
This test reproduces a scenario where HFSC queue length and backlog accounting
can become inconsistent when a peek operation triggers a dequeue and possible
drop before the parent qdisc updates its counters. The test sets up a DRR root
qdisc with an HFSC class, netem, and blackhole children, and uses Scapy to
inject a packet. It helps to verify that HFSC correctly tracks qlen and backlog
even when packets are dropped during peek-induced dequeue.
Cc: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250518222038.58538-3-xiyou.wangcong@gmail.com Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cong Wang [Sun, 18 May 2025 22:20:37 +0000 (15:20 -0700)]
sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the
child qdisc's peek() operation before incrementing sch->q.qlen and
sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may
trigger an immediate dequeue and potential packet drop. In such cases,
qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog
have not yet been updated, leading to inconsistent queue accounting. This
can leave an empty HFSC class in the active list, causing further
consequences like use-after-free.
This patch fixes the bug by moving the increment of sch->q.qlen and
sch->qstats.backlog before the call to the child qdisc's peek() operation.
This ensures that queue length and backlog are always accurate when packet
drops or dequeues are triggered during the peek.
Fixes: 12d0ad3be9c3 ("net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline") Reported-by: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250518222038.58538-2-xiyou.wangcong@gmail.com Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Having adjacent accelerated modify header actions (so-called
pattern-argument actions) may result in inconsistent outcome.
These inconsistencies can take the form of writes to the same
field or a read coupled with a write to the same field. The
solution is to detect such dependencies and insert nops between
the offending actions.
The existing implementation had a few issues, which pretty much
required a complete rewrite of the code that handles these
dependencies.
In the new implementation we're doing the following:
* Checking any two adjacent actions for conflicts (not just
odd-even pairs).
* Marking 'set' and 'add' action fields as destination, rather
than source, for the purposes of checking for conflicts.
* Checking all types of actions ('add', 'set', 'copy') for
dependencies.
* Managing offsets of the args in the buffer - copy the action
args to the right place in the buffer.
* Checking that after inserting nops we're still within the number
of supported actions - return an error otherwise.
Yevgeny Kliteynik [Tue, 20 May 2025 18:46:41 +0000 (21:46 +0300)]
net/mlx5: HWS, fix typo - 'nope' to 'nop'
Fix typo - rename 'nope_locations' to 'nop_locations', which describes
the locations of 'nop' actions. To shorten the lines, this renaming
also required some refactoring.
Vlad Dogaru [Tue, 20 May 2025 18:46:40 +0000 (21:46 +0300)]
net/mlx5: HWS, register reformat actions with fw
Hardware steering handles actions differently from firmware, but for
termination rules that use encapsulation the firmware needs to be aware
of the action.
Fix this by registering reformat actions with the firmware the first
time this is needed. To do this, add a third possible owner for an
action, and also a lock to protect against registration of the same
action from different threads.
Vlad Dogaru [Tue, 20 May 2025 18:46:39 +0000 (21:46 +0300)]
net/mlx5: SWS, fix reformat id error handling
The firmware reformat id is a u32 and can't safely be returned as an
int. Because the functions also need a way to signal error, prefer to
return the id as an output parameter and keep the return code only for
success/error.
While we're at it, also extract some duplicate code to fetch the
reformat id from a more generic struct pkt_reformat.
Nelson Escobar [Wed, 21 May 2025 01:19:29 +0000 (01:19 +0000)]
net/enic: Allow at least 8 RQs to always be used
Enic started using netif_get_num_default_rss_queues() to set the number
of RQs used in commit cc94d6c4d40c ("enic: Adjust used MSI-X
wq/rq/cq/interrupt resources in a more robust way")
This resulted in machines with less than 16 cpus using less than 8 RQs.
Allow enic to use at least 8 RQs no matter how many cpus are in the
machine to not impact existing enic workloads after a kernel upgrade.
Reviewed-by: John Daley <johndale@cisco.com> Reviewed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250521-enic_min_8rq-v1-1-691bd2353273@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fan Gong [Tue, 20 May 2025 10:26:59 +0000 (13:26 +0300)]
hinic3: module initialization and tx/rx logic
This is [1/3] part of hinic3 Ethernet driver initial submission.
With this patch hinic3 is a valid kernel module but non-functional
driver.
The driver parts contained in this patch:
Module initialization.
PCI driver registration but with empty id_table.
Auxiliary driver registration.
Net device_ops registration but open/stop are empty stubs.
tx/rx logic.
All major data structures of the driver are fully introduced with the
code that uses them but without their initialization code that requires
management interface with the hw.
Alok Tiwari [Mon, 19 May 2025 14:17:19 +0000 (07:17 -0700)]
emulex/benet: correct command version selection in be_cmd_get_stats()
Logic here always sets hdr->version to 2 if it is not a BE3 or Lancer chip,
even if it is BE2. Use 'else if' to prevent multiple assignments, setting
version 0 for BE2, version 1 for BE3 and Lancer, and version 2 for others.
Fixes potential incorrect version setting when BE2_chip and
BE3_chip/lancer_chip checks could both be true.
Linus Torvalds [Thu, 22 May 2025 00:24:18 +0000 (17:24 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Fixes for some SoC clk drivers:
- Define the gate clk for the OTG PHY on Rockchip RK3576 so the nvmem
driver actually works
- Initialize clk_hw_onecell_data::num before accessing the 'hws'
array to keep UBSAN happy
- Fix a perf degradation on the Allwinner D1 MMC clk that was making
things half bad
- Fix the Allwinner SNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT macro to have
proper order of arguments"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: d1: Add missing divider for MMC mod clocks
clk: s2mps11: initialise clk_hw_onecell_data::num before accessing ::hws[] in probe()
clk: sunxi-ng: fix order of arguments in clock macro
clk: rockchip: rk3576: define clk_otp_phy_g
====================
net: phy: Add support for new Aeonsemi PHYs
Add support for new Aeonsemi 10G C45 PHYs. These PHYs intergate an IPC
to setup some configuration and require special handling to sync with
the parity bit. The parity bit is a way the IPC use to follow correct
order of command sent.
Supported PHYs AS21011JB1, AS21011PB1, AS21010JB1, AS21010PB1,
AS21511JB1, AS21511PB1, AS21510JB1, AS21510PB1, AS21210JB1,
AS21210PB1 that all register with the PHY ID 0x7500 0x7500
before the firmware is loaded.
The big special thing about this PHY is that it does provide
a generic PHY ID in C45 register that change to the correct one
one the firmware is loaded.
In practice:
- MMD 0x7 ID 0x7500 0x9410 -> FW LOAD -> ID 0x7500 0x9422
To handle this, we operate on .match_phy_device where
we check the PHY ID, if the ID match the generic one,
we load the firmware and we return 0 (PHY driver doesn't
match). Then PHY core will try the next PHY driver in the list
and this time the PHY is correctly filled in and we register
for it.
To help in the matching and not modify part of the PHY device
struct, .match_phy_device is extended to provide also the
current phy_driver is trying to match for. This add the
extra benefits that some other PHY can simplify their
.match_phy_device OP.
====================
Christian Marangi [Sat, 17 May 2025 20:13:50 +0000 (22:13 +0200)]
dt-bindings: net: Document support for Aeonsemi PHYs
Add Aeonsemi PHYs and the requirement of a firmware to correctly work.
Also document the max number of LEDs supported and what PHY ID expose
when no firmware is loaded.
Supported PHYs AS21011JB1, AS21011PB1, AS21010JB1, AS21010PB1,
AS21511JB1, AS21511PB1, AS21510JB1, AS21510PB1, AS21210JB1,
AS21210PB1 that all register with the PHY ID 0x7500 0x9410 on C45
registers before the firmware is loaded.
Christian Marangi [Sat, 17 May 2025 20:13:49 +0000 (22:13 +0200)]
net: phy: Add support for Aeonsemi AS21xxx PHYs
Add support for Aeonsemi AS21xxx 10G C45 PHYs. These PHYs integrate
an IPC to setup some configuration and require special handling to
sync with the parity bit. The parity bit is a way the IPC use to
follow correct order of command sent.
Supported PHYs AS21011JB1, AS21011PB1, AS21010JB1, AS21010PB1,
AS21511JB1, AS21511PB1, AS21510JB1, AS21510PB1, AS21210JB1,
AS21210PB1 that all register with the PHY ID 0x7500 0x7510
before the firmware is loaded.
They all support up to 5 LEDs with various HW mode supported.
While implementing it was found some strange coincidence with using the
same logic for implementing C22 in MMD regs in Broadcom PHYs.
For reference here the AS21xxx PHY name logic:
AS21x1xxB1
^ ^^
| |J: Supports SyncE/PTP
| |P: No SyncE/PTP support
| 1: Supports 2nd Serdes
| 2: Not 2nd Serdes support
0: 10G, 5G, 2.5G
5: 5G, 2.5G
2: 2.5G
Christian Marangi [Sat, 17 May 2025 20:13:48 +0000 (22:13 +0200)]
net: phy: introduce genphy_match_phy_device()
Introduce new API, genphy_match_phy_device(), to provide a way to check
to match a PHY driver for a PHY device based on the info stored in the
PHY device struct.
The function generalize the logic used in phy_bus_match() to check the
PHY ID whether if C45 or C22 ID should be used for matching.
This is useful for custom .match_phy_device function that wants to use
the generic logic under some condition. (example a PHY is already setup
and provide the correct PHY ID)
Christian Marangi [Sat, 17 May 2025 20:13:47 +0000 (22:13 +0200)]
net: phy: nxp-c45-tja11xx: simplify .match_phy_device OP
Simplify .match_phy_device OP by using a generic function and using the
new phy_id PHY driver info instead of hardcoding the matching PHY ID
with new variant for macsec and no_macsec PHYs.
Also make use of PHY_ID_MATCH_MODEL macro and drop PHY_ID_MASK define to
introduce phy_id and phy_id_mask again in phy_driver struct.
Christian Marangi [Sat, 17 May 2025 20:13:45 +0000 (22:13 +0200)]
net: phy: pass PHY driver to .match_phy_device OP
Pass PHY driver pointer to .match_phy_device OP in addition to phydev.
Having access to the PHY driver struct might be useful to check the
PHY ID of the driver is being matched for in case the PHY ID scanned in
the phydev is not consistent.
A scenario for this is a PHY that change PHY ID after a firmware is
loaded, in such case, the PHY ID stored in PHY device struct is not
valid anymore and PHY will manually scan the ID in the match_phy_device
function.
Having the PHY driver info is also useful for those PHY driver that
implement multiple simple .match_phy_device OP to match specific MMD PHY
ID. With this extra info if the parsing logic is the same, the matching
function can be generalized by using the phy_id in the PHY driver
instead of hardcoding.
Rust wrapper callback is updated to align to the new match_phy_device
arguments.
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Benno Lossin <lossin@kernel.org> # for Rust Reviewed-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20250517201353.5137-2-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jijie Shao [Sat, 17 May 2025 09:58:28 +0000 (17:58 +0800)]
net: hibmcge: fix wrong ndo.open() after reset fail issue.
If the driver reset fails, it may not work properly.
Therefore, the ndo.open() operation should be rejected.
In this patch, the driver calls netif_device_detach()
before the reset and calls netif_device_attach()
after the reset succeeds. If the reset fails,
netif_device_attach() is not called. Therefore,
netdev does not present and cannot be opened.
If reset fails, only the PCI reset (via sysfs)
can be used to attempt recovery.
Fixes: 3f5a61f6d504 ("net: hibmcge: Add reset supported in this module") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250517095828.1763126-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the user dumps statistics, the hibmcge driver automatically
updates all statistics. If the driver is performing the reset operation,
the error data of 0xFFFFFFFF is updated.
Therefore, if the driver is resetting, the hbg_update_stats_by_info()
needs to return directly.
Fixes: c0bf9bf31e79 ("net: hibmcge: Add support for dump statistics") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250517095828.1763126-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: faster and simpler CRC32C computation
Update networking code that computes the CRC32C of packets to just call
crc32c() without unnecessary abstraction layers. The result is faster
and simpler code.
Patches 1-7 add skb_crc32c() and remove the overly-abstracted and
inefficient __skb_checksum().
Patches 8-10 replace skb_copy_and_hash_datagram_iter() with
skb_copy_and_crc32c_datagram_iter(), eliminating the unnecessary use of
the crypto layer. This unblocks the conversion of nvme-tcp to call
crc32c() directly instead of using the crypto layer, which patch 9 does.
Eric Biggers [Mon, 19 May 2025 17:50:11 +0000 (10:50 -0700)]
nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
Now that the crc32c() library function directly takes advantage of
architecture-specific optimizations and there also now exists a function
skb_copy_and_crc32c_datagram_iter(), it is unnecessary to go through the
crypto_ahash API. Just use those functions. This is much simpler, and
it also improves performance due to eliminating the crypto API overhead.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-10-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Biggers [Mon, 19 May 2025 17:50:10 +0000 (10:50 -0700)]
net: add skb_copy_and_crc32c_datagram_iter()
Since skb_copy_and_hash_datagram_iter() is used only with CRC32C, the
crypto_ahash abstraction provides no value. Add
skb_copy_and_crc32c_datagram_iter() which just calls crc32c() directly.
This is faster and simpler. It also doesn't have the weird dependency
issue where skb_copy_and_hash_datagram_iter() depends on
CONFIG_CRYPTO_HASH=y without that being expressed explicitly in the
kconfig (presumably because it was too heavyweight for NET to select).
The new function is conditional on the hidden boolean symbol NET_CRC32C,
which selects CRC32. So it gets compiled only when something that
actually needs CRC32C packet checksums is enabled, it has no implicit
dependency, and it doesn't depend on the heavyweight crypto layer.
Eric Biggers [Mon, 19 May 2025 17:50:09 +0000 (10:50 -0700)]
lib/crc32: remove unused support for CRC32C combination
crc32c_combine() and crc32c_shift() are no longer used (except by the
KUnit test that tests them), and their current implementation is very
slow. Remove them.
Eric Biggers [Mon, 19 May 2025 17:50:08 +0000 (10:50 -0700)]
net: fold __skb_checksum() into skb_checksum()
Now that the only remaining caller of __skb_checksum() is
skb_checksum(), fold __skb_checksum() into skb_checksum(). This makes
struct skb_checksum_ops unnecessary, so remove that too and simply do
the "regular" net checksum. It also makes the wrapper functions
csum_partial_ext() and csum_block_add_ext() unnecessary, so remove those
too and just use the underlying functions.
Eric Biggers [Mon, 19 May 2025 17:50:07 +0000 (10:50 -0700)]
sctp: use skb_crc32c() instead of __skb_checksum()
Make sctp_compute_cksum() just use the new function skb_crc32c(),
instead of calling __skb_checksum() with a skb_checksum_ops struct that
does CRC32C. This is faster and simpler.
Eric Biggers [Mon, 19 May 2025 17:50:06 +0000 (10:50 -0700)]
RDMA/siw: use skb_crc32c() instead of __skb_checksum()
Instead of calling __skb_checksum() with a skb_checksum_ops struct that
does CRC32C, just call the new function skb_crc32c(). This is faster
and simpler.
Acked-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-5-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Biggers [Mon, 19 May 2025 17:50:05 +0000 (10:50 -0700)]
net: use skb_crc32c() in skb_crc32c_csum_help()
Instead of calling __skb_checksum() with a skb_checksum_ops struct that
does CRC32C, just call the new function skb_crc32c(). This is faster
and simpler.
Eric Biggers [Mon, 19 May 2025 17:50:04 +0000 (10:50 -0700)]
net: add skb_crc32c()
Add skb_crc32c(), which calculates the CRC32C of a sk_buff. It will
replace __skb_checksum(), which unnecessarily supports arbitrary
checksums. Compared to __skb_checksum(), skb_crc32c():
- Uses the correct type for CRC32C values (u32, not __wsum).
- Does not require the caller to provide a skb_checksum_ops struct.
- Is faster because it does not use indirect calls and does not use
the very slow crc32c_combine().
According to commit 2817a336d4d5 ("net: skb_checksum: allow custom
update/combine for walking skb") which added __skb_checksum(), the
original motivation for the abstraction layer was to avoid code
duplication for CRC32C and other checksums in the future. However:
- No additional checksums showed up after CRC32C. __skb_checksum()
is only used with the "regular" net checksum and CRC32C.
- Indirect calls are expensive. Commit 2544af0344ba ("net: avoid
indirect calls in L4 checksum calculation") worked around this
using the INDIRECT_CALL_1 macro. But that only avoided the indirect
call for the net checksum, and at the cost of an extra branch.
- The checksums use different types (__wsum and u32), causing casts
to be needed.
- It made the checksums of fragments be combined (rather than
chained) for both checksums, despite this being highly
counterproductive for CRC32C due to how slow crc32c_combine() is.
This can clearly be seen in commit 4c2f24549644 ("sctp: linearize
early if it's not GSO") which tried to work around this performance
bug. With a dedicated function for each checksum, we can instead
just use the proper strategy for each checksum.
As shown by the following tables, the new function skb_crc32c() is
faster than __skb_checksum(), with the improvement varying greatly from
5% to 2500% depending on the case. The largest improvements come from
fragmented packets, mainly due to eliminating the inefficient
crc32c_combine(). But linear packets are improved too, especially
shorter ones, mainly due to eliminating indirect calls. These
benchmarks were done on AMD Zen 5. On that CPU, Linux uses IBRS instead
of retpoline; an even greater improvement might be seen with retpoline:
Eric Biggers [Mon, 19 May 2025 17:50:03 +0000 (10:50 -0700)]
net: introduce CONFIG_NET_CRC32C
Add a hidden kconfig symbol NET_CRC32C that will group together the
functions that calculate CRC32C checksums of packets, so that these
don't have to be built into NET-enabled kernels that don't need them.
Make skb_crc32c_csum_help() (which is called only when IP_SCTP is
enabled) conditional on this symbol, and make IP_SCTP select it.
====================
tools: ynl-gen: add support for "inherited" selector and therefore TC
Add C codegen support for constructs needed by TC, namely passing
sub-message selector from a lower nest, and sub-messages with
fixed headers.
====================