The buggy address belongs to the object at ffff888043eba000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 432 bytes inside of
freed 2048-byte region [ffff888043eba000, ffff888043eba800)
Fixes: 8c55facecd7a ("net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down") Reported-by: syzbot+1939f24bdb783e9e43d9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/674f3a18.050a0220.48a03.0041.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241203170933.2449307-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Dec 2024 10:49:14 +0000 (11:49 +0100)]
Merge tag 'nf-24-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix esoteric undefined behaviour due to uninitialized stack access
in ip_vs_protocol_init(), from Jinghao Jia.
2) Fix iptables xt_LED slab-out-of-bounds due to incorrect sanitization
of the led string identifier, reported by syzbot. Patch from
Dmitry Antipov.
3) Remove WARN_ON_ONCE reachable from userspace to check for the maximum
cgroup level, nft_socket cgroup matching is restricted to 255 levels,
but cgroups allow for INT_MAX levels by default. Reported by syzbot.
4) Fix nft_inner incorrect use of percpu area to store tunnel parser
context with softirqs, resulting in inconsistent inner header
offsets that could lead to bogus rule mismatches, reported by syzbot.
5) Grab module reference on ipset core while requesting set type modules,
otherwise kernel crash is possible by removing ipset core module,
patch from Phil Sutter.
6) Fix possible double-free in nft_hash garbage collector due to unstable
walk interator that can provide twice the same element. Use a sequence
number to skip expired/dead elements that have been already scheduled
for removal. Based on patch from Laurent Fasnach
netfilter pull request 24-12-05
* tag 'nf-24-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_set_hash: skip duplicated elements pending gc run
netfilter: ipset: Hold module reference while requesting a module
netfilter: nft_inner: incorrect percpu area handling under softirq
netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level
netfilter: x_tables: fix LED ID check in led_tg_check()
ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()
====================
Parameters were created using wrong C types, which caused them to be of
wrong size on some architectures, causing problems.
The problem with SO_RCVLOWAT was found on s390 (big endian), while x86-64
didn't show it. After the fix, all tests pass on s390.
Then Stefano Garzarella pointed out that SO_VM_SOCKETS_* calls might have
a similar problem, which turned out to be true, hence, the second patch.
Changes for v8:
- Fix whitespace warnings from "checkpatch.pl --strict"
- Add maintainers to Cc:
Changes for v7:
- Rebase on top of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
- Add the "net" tags to the subjects
Changes for v6:
- rework the patch #3 to avoid creating a new file for new functions,
and exclude vsock_perf from calling the new functions.
- add "Reviewed-by:" to the patch #2.
Changes for v5:
- in the patch #2 replace the introduced uint64_t with unsigned long long
to match documentation
- add a patch #3 that verifies every setsockopt() call.
Changes for v4:
- add "Reviewed-by:" to the first patch, and add a second patch fixing
SO_VM_SOCKETS_* calls, which depends on the first one (hence, it's now
a patch series.)
Changes for v3:
- fix the same problem in vsock_perf and update commit message
Changes for v2:
- add "Fixes:" lines to the commit message
====================
Konstantin Shkolnyy [Tue, 3 Dec 2024 15:06:56 +0000 (09:06 -0600)]
vsock/test: verify socket options after setting them
Replace setsockopt() calls with calls to functions that follow
setsockopt() with getsockopt() and check that the returned value and its
size are the same as have been set. (Except in vsock_perf.)
Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Konstantin Shkolnyy [Tue, 3 Dec 2024 15:06:55 +0000 (09:06 -0600)]
vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
Change parameters of SO_VM_SOCKETS_* to unsigned long long as documented
in the vm_sockets.h, because the corresponding kernel code requires them
to be at least 64-bit, no matter what architecture. Otherwise they are
too small on 32-bit machines.
Fixes: 5c338112e48a ("test/vsock: rework message bounds test") Fixes: 685a21c314a8 ("test/vsock: add big message test") Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic") Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility") Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Konstantin Shkolnyy [Tue, 3 Dec 2024 15:06:54 +0000 (09:06 -0600)]
vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
This happens on 64-bit big-endian machines.
SO_RCVLOWAT requires an int parameter. However, instead of int, the test
uses unsigned long in one place and size_t in another. Both are 8 bytes
long on 64-bit machines. The kernel, having received the 8 bytes, doesn't
test for the exact size of the parameter, it only cares that it's >=
sizeof(int), and casts the 4 lower-addressed bytes to an int, which, on
a big-endian machine, contains 0. 0 doesn't trigger an error, SO_RCVLOWAT
returns with success and the socket stays with the default SO_RCVLOWAT = 1,
which results in vsock_test failures, while vsock_perf doesn't even notice
that it's failed to change it.
Fixes: b1346338fbae ("vsock_test: POLLIN + SO_RCVLOWAT test") Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic") Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility") Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This series contains updates to ice, idpf, ixgbe, ixgbevf, and igb
drivers.
For ice:
Arkadiusz corrects search for determining whether PHY clock recovery is
supported on the device.
Przemyslaw corrects mask used for PHY timestamps on ETH56G devices.
Wojciech adds missing virtchnl ops which caused NULL pointer
dereference.
Marcin fixes VLAN filter settings for uplink VSI in switchdev mode.
For idpf:
Josh restores setting of completion tag for empty buffers.
For ixgbevf:
Jake removes incorrect initialization/support of IPSEC for mailbox
version 1.5.
For ixgbe:
Jake rewords and downgrades misleading message when negotiation
of VF mailbox version is not supported.
Tore Amundsen corrects value for BASE-BX10 capability.
For igb:
Yuan Can adds proper teardown on failed pci_register_driver() call.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igb: Fix potential invalid memory access in igb_init_module()
ixgbe: Correct BASE-BX10 compliance code
ixgbe: downgrade logging of unsupported VF API version to debug
ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5
idpf: set completion tag for "empty" bufs associated with a packet
ice: Fix VLAN pruning in switchdev mode
ice: Fix NULL pointer dereference in switchdev
ice: fix PHY timestamp extraction for ETH56G
ice: fix PHY Clock Recovery availability check
====================
Jianbo Liu [Tue, 3 Dec 2024 20:49:20 +0000 (22:49 +0200)]
net/mlx5e: Remove workaround to avoid syndrome for internal port
Previously a workaround was added to avoid syndrome 0xcdb051. It is
triggered when offload a rule with tunnel encapsulation, and
forwarding to another table, but not matching on the internal port in
firmware steering mode. The original workaround skips internal tunnel
port logic, which is not correct as not all cases are considered. As
an example, if vlan is configured on the uplink port, traffic can't
pass because vlan header is not added with this workaround. Besides,
there is no such issue for software steering. So, this patch removes
that, and returns error directly if trying to offload such rule for
firmware steering.
Fixes: 06b4eac9c4be ("net/mlx5e: Don't offload internal port if filter device is out device") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Tested-by: Frode Nordahl <frode.nordahl@canonical.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241203204920.232744-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan [Tue, 3 Dec 2024 20:49:19 +0000 (22:49 +0200)]
net/mlx5e: SD, Use correct mdev to build channel param
In a multi-PF netdev, each traffic channel creates its own resources
against a specific PF.
In the cited commit, where this support was added, the channel_param
logic was mistakenly kept unchanged, so it always used the primary PF
which is found at priv->mdev.
In this patch we fix this by moving the logic to be per-channel, and
passing the correct mdev instance.
This bug happened to be usually harmless, as the resulting cparam
structures would be the same for all channels, due to identical FW logic
and decisions.
However, in some use cases, like fwreset, this gets broken.
This could lead to different symptoms. Example:
Error cqe on cqn 0x428, ci 0x0, qn 0x10a9, opcode 0xe, syndrome 0x4,
vendor syndrome 0x32
Fixes: e4f9686bdee7 ("net/mlx5e: Let channels be SD-aware") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20241203204920.232744-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Patrisious Haddad [Tue, 3 Dec 2024 20:49:18 +0000 (22:49 +0200)]
net/mlx5: E-Switch, Fix switching to switchdev mode in MPV
Fix the mentioned commit change for MPV mode, since in MPV mode the IB
device is shared between different core devices, so under this change
when moving both devices simultaneously to switchdev mode the IB device
removal and re-addition can race with itself causing unexpected behavior.
In such case do rescan_drivers() only once in order to add the ethernet
representor auxiliary device, and skip adding and removing IB devices.
Patrisious Haddad [Tue, 3 Dec 2024 20:49:17 +0000 (22:49 +0200)]
net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
In case that IB device is already disabled when moving to switchdev mode,
which can happen when working with LAG, need to do rescan_drivers()
before leaving in order to add ethernet representor auxiliary device.
====================
bnxt_en: support header page pool in queue API
Commit 7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
separate page pool for header frags. Now, frags are allocated from this
header page pool e.g. rxr->tpa_info.data.
The queue API did not properly handle rxr->tpa_info and so using the
queue API to i.e. reset any queues will result in pages being returned
to the incorrect page pool, causing inflight != 0 warnings.
Fix this bug by properly allocating/freeing tpa_info and copying/freeing
head_pool in the queue API implementation.
The 1st patch is a prep patch that refactors helpers out to be used by
the implementation patch later.
The 2nd patch is a drive-by refactor. Happy to take it out and re-send
to net-next if there are any objections.
The 3rd patch is the implementation patch that will properly alloc/free
rxr->tpa_info.
====================
David Wei [Wed, 4 Dec 2024 04:10:22 +0000 (20:10 -0800)]
bnxt_en: handle tpa_info in queue API implementation
Commit 7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
page pool for header frags, which may be distinct from the existing pool
for the aggregation ring. Prior to this change, frags used in the TPA
ring rx_tpa were allocated from system memory e.g. napi_alloc_frag()
meaning their lifetimes were not associated with a page pool. They can
be returned at any time and so the queue API did not alloc or free
rx_tpa.
But now frags come from a separate head_pool which may be different to
page_pool. Without allocating and freeing rx_tpa, frags allocated from
the old head_pool may be returned to a different new head_pool which
causes a mismatch between the pp hold/release count.
Fix this problem by properly freeing and allocating rx_tpa in the queue
API implementation.
Ido Schimmel [Tue, 3 Dec 2024 15:16:05 +0000 (16:16 +0100)]
mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
The driver is currently using an ACL key block that is not supported by
Spectrum-4. This works because the driver is only using a single field
from this key block which is located in the same offset in the
equivalent Spectrum-4 key block.
The issue was discovered when the firmware started rejecting the use of
the unsupported key block. The change has been reverted to avoid
breaking users that only update their firmware.
Nonetheless, fix the issue by using the correct key block.
Kory Maincent [Mon, 2 Dec 2024 15:33:57 +0000 (16:33 +0100)]
ethtool: Fix wrong mod state in case of verbose and no_mask bitset
A bitset without mask in a _SET request means we want exactly the bits in
the bitset to be set. This works correctly for compact format but when
verbose format is parsed, ethnl_update_bitset32_verbose() only sets the
bits present in the request bitset but does not clear the rest. The commit 6699170376ab ("ethtool: fix application of verbose no_mask bitset") fixes
this issue by clearing the whole target bitmap before we start iterating.
The solution proposed brought an issue with the behavior of the mod
variable. As the bitset is always cleared the old value will always
differ to the new value.
Fix it by adding a new function to compare bitmaps and a temporary variable
which save the state of the old bitmap.
The root cause is a network namespace creation failing after successful
initialization of the ipmr subsystem. Such a case is not currently
matched by the ipmr_can_free_table() helper.
New namespaces are zeroed on allocation and inserted into net ns list
only after successful creation; when deleting an ipmr table, the list
next pointer can be NULL only on netns initialization failure.
Update the ipmr_can_free_table() checks leveraging such condition.
Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+6e8cb445d4b43d006e0c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6e8cb445d4b43d006e0c Fixes: 11b6e701bce9 ("ipmr: add debug check for mr table cleanup") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/8bde975e21bbca9d9c27e36209b2dd4f1d7a3f00.1733212078.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso [Sun, 1 Dec 2024 23:04:49 +0000 (00:04 +0100)]
netfilter: nft_set_hash: skip duplicated elements pending gc run
rhashtable does not provide stable walk, duplicated elements are
possible in case of resizing. I considered that checking for errors when
calling rhashtable_walk_next() was sufficient to detect the resizing.
However, rhashtable_walk_next() returns -EAGAIN only at the end of the
iteration, which is too late, because a gc work containing duplicated
elements could have been already scheduled for removal to the worker.
Add a u32 gc worker sequence number per set, bump it on every workqueue
run. Annotate gc worker sequence number on the expired element. Use it
to skip those already seen in this gc workqueue run.
Note that this new field is never reset in case gc transaction fails, so
next gc worker run on the expired element overrides it. Wraparound of gc
worker sequence number should not be an issue with stale gc worker
sequence number in the element, that would just postpone the element
removal in one gc run.
Note that it is not possible to use flags to annotate that element is
pending gc run to detect duplicates, given that gc transaction can be
invalidated in case of update from the control plane, therefore, not
allowing to clear such flag.
On x86_64, pahole reports no changes in the size of nft_rhash_elem.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Laurent Fasnacht <laurent.fasnacht@proton.ch> Tested-by: Laurent Fasnacht <laurent.fasnacht@proton.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Fri, 29 Nov 2024 15:30:38 +0000 (16:30 +0100)]
netfilter: ipset: Hold module reference while requesting a module
User space may unload ip_set.ko while it is itself requesting a set type
backend module, leading to a kernel crash. The race condition may be
provoked by inserting an mdelay() right after the nfnl_unlock() call.
Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Lion Ackermann [Mon, 2 Dec 2024 16:22:57 +0000 (17:22 +0100)]
net: sched: fix ordering of qlen adjustment
Changes to sch->q.qlen around qdisc_tree_reduce_backlog() need to happen
_before_ a call to said function because otherwise it may fail to notify
parent qdiscs when the child is about to become empty.
Signed-off-by: Lion Ackermann <nnamrec@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 2 Dec 2024 15:21:38 +0000 (10:21 -0500)]
net: sched: fix erspan_opt settings in cls_flower
When matching erspan_opt in cls_flower, only the (version, dir, hwid)
fields are relevant. However, in fl_set_erspan_opt() it initializes
all bits of erspan_opt and its mask to 1. This inadvertently requires
packets to match not only the (version, dir, hwid) fields but also the
other fields that are unexpectedly set to 1.
This patch resolves the issue by ensuring that only the (version, dir,
hwid) fields are configured in fl_set_erspan_opt(), leaving the other
fields to 0 in erspan_opt.
Fixes: 79b1011cb33d ("net: sched: allow flower to match erspan options") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Mon, 2 Dec 2024 16:48:05 +0000 (18:48 +0200)]
ethtool: Fix access to uninitialized fields in set RXNFC command
The check for non-zero ring with RSS is only relevant for
ETHTOOL_SRXCLSRLINS command, in other cases the check tries to access
memory which was not initialized by the userspace tool. Only perform the
check in case of ETHTOOL_SRXCLSRLINS.
Without this patch, filter deletion (for example) could statistically
result in a false error:
# ethtool --config-ntuple eth3 delete 484
rmgr: Cannot delete RX class rule: Invalid argument
Cannot delete classification rule
Fernando Fernandez Mancera [Mon, 2 Dec 2024 15:56:08 +0000 (15:56 +0000)]
Revert "udp: avoid calling sock_def_readable() if possible"
This reverts commit 612b1c0dec5bc7367f90fc508448b8d0d7c05414. On a
scenario with multiple threads blocking on a recvfrom(), we need to call
sock_def_readable() on every __udp_enqueue_schedule_skb() otherwise the
threads won't be woken up as __skb_wait_for_more_packets() is using
prepare_to_wait_exclusive().
Joe Damato [Mon, 2 Dec 2024 18:21:02 +0000 (18:21 +0000)]
net: Make napi_hash_lock irq safe
Make napi_hash_lock IRQ safe. It is used during the control path, and is
taken and released in napi_hash_add and napi_hash_del, which will
typically be called by calls to napi_enable and napi_disable.
This change avoids a deadlock in pcnet32 (and other any other drivers
which follow the same pattern):
CPU 0:
pcnet32_open
spin_lock_irqsave(&lp->lock, ...)
napi_enable
napi_hash_add <- before this executes, CPU 1 proceeds
spin_lock(napi_hash_lock)
[...]
spin_unlock_irqrestore(&lp->lock, flags);
Pablo Neira Ayuso [Wed, 27 Nov 2024 11:46:54 +0000 (12:46 +0100)]
netfilter: nft_inner: incorrect percpu area handling under softirq
Softirq can interrupt ongoing packet from process context that is
walking over the percpu area that contains inner header offsets.
Disable bh and perform three checks before restoring the percpu inner
header offsets to validate that the percpu area is valid for this
skbuff:
1) If the NFT_PKTINFO_INNER_FULL flag is set on, then this skbuff
has already been parsed before for inner header fetching to
register.
2) Validate that the percpu area refers to this skbuff using the
skbuff pointer as a cookie. If there is a cookie mismatch, then
this skbuff needs to be parsed again.
3) Finally, validate if the percpu area refers to this tunnel type.
Only after these three checks the percpu area is restored to a on-stack
copy and bh is enabled again.
After inner header fetching, the on-stack copy is stored back to the
percpu area.
Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Reported-by: syzbot+84d0441b9860f0d63285@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Yuan Can [Wed, 23 Oct 2024 12:10:48 +0000 (20:10 +0800)]
igb: Fix potential invalid memory access in igb_init_module()
The pci_register_driver() can fail and when this happened, the dca_notifier
needs to be unregistered, otherwise the dca_notifier can be called when
igb fails to install, resulting to invalid memory access.
Fixes: bbd98fe48a43 ("igb: Fix DCA errors and do not use context index for 82576") Signed-off-by: Yuan Can <yuancan@huawei.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tore Amundsen [Fri, 15 Nov 2024 14:17:36 +0000 (14:17 +0000)]
ixgbe: Correct BASE-BX10 compliance code
SFF-8472 (section 5.4 Transceiver Compliance Codes) defines bit 6 as
BASE-BX10. Bit 6 means a value of 0x40 (decimal 64).
The current value in the source code is 0x64, which appears to be a
mix-up of hex and decimal values. A value of 0x64 (binary 01100100)
incorrectly sets bit 2 (1000BASE-CX) and bit 5 (100BASE-FX) as well.
Fixes: 1b43e0d20f2d ("ixgbe: Add 1000BASE-BX support") Signed-off-by: Tore Amundsen <tore@amundsen.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Ernesto Castellotti <ernesto@castellotti.net> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Fri, 1 Nov 2024 23:05:43 +0000 (16:05 -0700)]
ixgbe: downgrade logging of unsupported VF API version to debug
The ixgbe PF driver logs an info message when a VF attempts to negotiate an
API version which it does not support:
VF 0 requested invalid api version 6
The ixgbevf driver attempts to load with mailbox API v1.5, which is
required for best compatibility with other hosts such as the ESX VMWare PF.
The Linux PF only supports API v1.4, and does not currently have support
for the v1.5 API.
The logged message can confuse users, as the v1.5 API is valid, but just
happens to not currently be supported by the Linux PF.
Downgrade the info message to a debug message, and fix the language to
use 'unsupported' instead of 'invalid' to improve message clarity.
Long term, we should investigate whether the improvements in the v1.5 API
make sense for the Linux PF, and if so implement them properly. This may
require yet another API version to resolve issues with negotiating IPSEC
offload support.
Fixes: 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF") Reported-by: Yifei Liu <yifei.l.liu@oracle.com> Link: https://lore.kernel.org/intel-wired-lan/20240301235837.3741422-1-yifei.l.liu@oracle.com/ Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Fri, 1 Nov 2024 23:05:42 +0000 (16:05 -0700)]
ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5
Commit 339f28964147 ("ixgbevf: Add support for new mailbox communication
between PF and VF") added support for v1.5 of the PF to VF mailbox
communication API. This commit mistakenly enabled IPSEC offload for API
v1.5.
No implementation of the v1.5 API has support for IPSEC offload. This
offload is only supported by the Linux PF as mailbox API v1.4. In fact, the
v1.5 API is not implemented in any Linux PF.
Attempting to enable IPSEC offload on a PF which supports v1.5 API will not
work. Only the Linux upstream ixgbe and ixgbevf support IPSEC offload, and
only as part of the v1.4 API.
Fix the ixgbevf Linux driver to stop attempting IPSEC offload when
the mailbox API does not support it.
The existing API design choice makes it difficult to support future API
versions, as other non-Linux hosts do not implement IPSEC offload. If we
add support for v1.5 to the Linux PF, then we lose support for IPSEC
offload.
A full solution likely requires a new mailbox API with a proper negotiation
to check that IPSEC is actually supported by the host.
Fixes: 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Joshua Hay [Mon, 7 Oct 2024 20:24:35 +0000 (13:24 -0700)]
idpf: set completion tag for "empty" bufs associated with a packet
Commit d9028db618a6 ("idpf: convert to libeth Tx buffer completion")
inadvertently removed code that was necessary for the tx buffer cleaning
routine to iterate over all buffers associated with a packet.
When a frag is too large for a single data descriptor, it will be split
across multiple data descriptors. This means the frag will span multiple
buffers in the buffer ring in order to keep the descriptor and buffer
ring indexes aligned. The buffer entries in the ring are technically
empty and no cleaning actions need to be performed. These empty buffers
can precede other frags associated with the same packet. I.e. a single
packet on the buffer ring can look like:
The cleaning routine iterates through these buffers based on a matching
completion tag. If the completion tag is not set for buf2, the loop will
end prematurely. Frag2 will be left uncleaned and next_to_clean will be
left pointing to the end of packet, which will break the cleaning logic
for subsequent cleans. This consequently leads to tx timeouts.
Assign the empty bufs the same completion tag for the packet to ensure
the cleaning routine iterates over all of the buffers associated with
the packet.
Fixes: d9028db618a6 ("idpf: convert to libeth Tx buffer completion") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Madhu chittim <madhu.chittim@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Marcin Szycik [Mon, 4 Nov 2024 18:49:09 +0000 (19:49 +0100)]
ice: Fix VLAN pruning in switchdev mode
In switchdev mode the uplink VSI should receive all unmatched packets,
including VLANs. Therefore, VLAN pruning should be disabled if uplink is
in switchdev mode. It is already being done in ice_eswitch_setup_env(),
however the addition of ice_up() in commit 44ba608db509 ("ice: do
switchdev slow-path Rx using PF VSI") caused VLAN pruning to be
re-enabled after disabling it.
Add a check to ice_set_vlan_filtering_features() to ensure VLAN
filtering will not be enabled if uplink is in switchdev mode. Note that
ice_is_eswitch_mode_switchdev() is being used instead of
ice_is_switchdev_running(), as the latter would only return true after
the whole switchdev setup completes.
Fixes: 44ba608db509 ("ice: do switchdev slow-path Rx using PF VSI") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Priya Singh <priyax.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Wojciech Drewek [Tue, 29 Oct 2024 09:42:59 +0000 (10:42 +0100)]
ice: Fix NULL pointer dereference in switchdev
Commit 608a5c05c39b ("virtchnl: support queue rate limit and quanta
size configuration") introduced new virtchnl ops:
- get_qos_caps
- cfg_q_bw
- cfg_q_quanta
New ops were added to ice_virtchnl_dflt_ops, in
commit 015307754a19 ("ice: Support VF queue rate limit and quanta
size configuration"), but not to the ice_virtchnl_repr_ops. Because
of that, if we get one of those messages in switchdev mode we end up
with NULL pointer dereference:
To check if PHY Clock Recovery mechanic is available for a device, there
is a need to verify if given PHY is available within the netlist, but the
netlist node type used for the search is wrong, also the search context
shall be specified.
Modify the search function to allow specifying the context in the
search.
Use the PHY node type instead of CLOCK CONTROLLER type, also use proper
search context which for PHY search is PORT, as defined in E810
Datasheet [1] ('3.3.8.2.4 Node Part Number and Node Options (0x0003)' and
'Table 3-105. Program Topology Device NVM Admin Command').
Cong Wang [Fri, 29 Nov 2024 21:25:19 +0000 (13:25 -0800)]
rtnetlink: fix double call of rtnl_link_get_net_ifla()
Currently rtnl_link_get_net_ifla() gets called twice when we create
peer devices, once in rtnl_add_peer_net() and once in each ->newlink()
implementation.
This looks safer, however, it leads to a classic Time-of-Check to
Time-of-Use (TOCTOU) bug since IFLA_NET_NS_PID is very dynamic. And
because of the lack of checking error pointer of the second call, it
also leads to a kernel crash as reported by syzbot.
Fix this by getting rid of the second call, which already becomes
redudant after Kuniyuki's work. We have to propagate the result of the
first rtnl_link_get_net_ifla() down to each ->newlink().
Reported-by: syzbot+21ba4d5adff0b6a7cfc6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=21ba4d5adff0b6a7cfc6 Fixes: 0eb87b02a705 ("veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.") Fixes: 6b84e558e95d ("vxcan: Set VXCAN_INFO_PEER to vxcan_link_ops.peer_type.") Fixes: fefd5d082172 ("netkit: Set IFLA_NETKIT_PEER_INFO to netkit_link_ops.peer_type.") Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241129212519.825567-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Louis Leseur [Thu, 28 Nov 2024 08:33:58 +0000 (09:33 +0100)]
net/qed: allow old cards not supporting "num_images" to work
Commit 43645ce03e00 ("qed: Populate nvm image attribute shadow.")
added support for populating flash image attributes, notably
"num_images". However, some cards were not able to return this
information. In such cases, the driver would return EINVAL, causing the
driver to exit.
Add check to return EOPNOTSUPP instead of EINVAL when the card is not
able to return these information. The caller function already handles
EOPNOTSUPP without error.
Wen Gu [Wed, 27 Nov 2024 13:30:14 +0000 (21:30 +0800)]
net/smc: fix LGR and link use-after-free issue
We encountered a LGR/link use-after-free issue, which manifested as
the LGR/link refcnt reaching 0 early and entering the clear process,
making resource access unsafe.
It is caused by repeated release of LGR/link refcnt. One suspect is that
smc_conn_free() is called repeatedly because some smc_conn_free() from
server listening path are not protected by sock lock.
So here add sock lock protection in smc_listen_work() path, making it
exclusive with other connection operations.
Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc") Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Co-developed-by: Kai <KaiShen@linux.alibaba.com> Signed-off-by: Kai <KaiShen@linux.alibaba.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This is because when smc_close_cancel_work is triggered, e.g. the RDMA
driver is rmmod and the LGR is terminated, the conn->close_work is
flushed before initialization, resulting in WARN_ON(!work->func).
So fix this by initializing close_work before establishing the
connection.
Fixes: 46c28dbd4c23 ("net/smc: no socket state changes in tasklet context") Fixes: 413498440e30 ("net/smc: add SMC-D support in af_smc") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima [Wed, 27 Nov 2024 05:05:12 +0000 (14:05 +0900)]
tipc: Fix use-after-free of kernel socket in cleanup_bearer().
syzkaller reported a use-after-free of UDP kernel socket
in cleanup_bearer() without repro. [0][1]
When bearer_disable() calls tipc_udp_disable(), cleanup
of the UDP kernel socket is deferred by work calling
cleanup_bearer().
tipc_net_stop() waits for such works to finish by checking
tipc_net(net)->wq_count. However, the work decrements the
count too early before releasing the kernel socket,
unblocking cleanup_net() and resulting in use-after-free.
Let's move the decrement after releasing the socket in
cleanup_bearer().
[0]:
ref_tracker: net notrefcnt@000000009b3d1faf has 1/1 users at
sk_alloc+0x438/0x608
inet_create+0x4c8/0xcb0
__sock_create+0x350/0x6b8
sock_create_kern+0x58/0x78
udp_sock_create4+0x68/0x398
udp_sock_create+0x88/0xc8
tipc_udp_enable+0x5e8/0x848
__tipc_nl_bearer_enable+0x84c/0xed8
tipc_nl_bearer_enable+0x38/0x60
genl_family_rcv_msg_doit+0x170/0x248
genl_rcv_msg+0x400/0x5b0
netlink_rcv_skb+0x1dc/0x398
genl_rcv+0x44/0x68
netlink_unicast+0x678/0x8b0
netlink_sendmsg+0x5e4/0x898
____sys_sendmsg+0x500/0x830
Ivan Solodovnikov [Tue, 26 Nov 2024 14:39:02 +0000 (17:39 +0300)]
dccp: Fix memory leak in dccp_feat_change_recv
If dccp_feat_push_confirm() fails after new value for SP feature was accepted
without reconciliation ('entry == NULL' branch), memory allocated for that value
with dccp_feat_clone_sp_val() is never freed.
Jiri Wiesner [Thu, 28 Nov 2024 08:59:50 +0000 (09:59 +0100)]
net/ipv6: release expired exception dst cached in socket
Dst objects get leaked in ip6_negative_advice() when this function is
executed for an expired IPv6 route located in the exception table. There
are several conditions that must be fulfilled for the leak to occur:
* an ICMPv6 packet indicating a change of the MTU for the path is received,
resulting in an exception dst being created
* a TCP connection that uses the exception dst for routing packets must
start timing out so that TCP begins retransmissions
* after the exception dst expires, the FIB6 garbage collector must not run
before TCP executes ip6_negative_advice() for the expired exception dst
When TCP executes ip6_negative_advice() for an exception dst that has
expired and if no other socket holds a reference to the exception dst, the
refcount of the exception dst is 2, which corresponds to the increment
made by dst_init() and the increment made by the TCP socket for which the
connection is timing out. The refcount made by the socket is never
released. The refcount of the dst is decremented in sk_dst_reset() but
that decrement is counteracted by a dst_hold() intentionally placed just
before the sk_dst_reset() in ip6_negative_advice(). After
ip6_negative_advice() has finished, there is no other object tied to the
dst. The socket lost its reference stored in sk_dst_cache and the dst is
no longer in the exception table. The exception dst becomes a leaked
object.
As a result of this dst leak, an unbalanced refcount is reported for the
loopback device of a net namespace being destroyed under kernels that do
not contain e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev"):
unregister_netdevice: waiting for lo to become free. Usage count = 2
Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The
patch that introduced the dst_hold() in ip6_negative_advice() was 92f1655aa2b22 ("net: fix __dst_negative_advice() race"). But 92f1655aa2b22
merely refactored the code with regards to the dst refcount so the issue
was present even before 92f1655aa2b22. The bug was introduced in 54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually
expired.") where the expired cached route is deleted and the sk_dst_cache
member of the socket is set to NULL by calling dst_negative_advice() but
the refcount belonging to the socket is left unbalanced.
The IPv4 version - ipv4_negative_advice() - is not affected by this bug.
When the TCP connection times out ipv4_negative_advice() merely resets the
sk_dst_cache of the socket while decrementing the refcount of the
exception dst.
Oleksij Rempel [Mon, 25 Nov 2024 08:40:50 +0000 (09:40 +0100)]
net: phy: microchip: Reset LAN88xx PHY to ensure clean link state on LAN7800/7850
Fix outdated MII_LPA data in the LAN88xx PHY, which is used in LAN7800
and LAN7850 USB Ethernet controllers. Due to a hardware limitation, the
PHY cannot reliably update link status after parallel detection when the
link partner does not support auto-negotiation. To mitigate this, add a
PHY reset in `lan88xx_link_change_notify()` when `phydev->state` is
`PHY_NOLINK`, ensuring the PHY starts in a clean state and reports
accurate fixed link parallel detection results.
Jakub Kicinski [Tue, 3 Dec 2024 02:04:10 +0000 (18:04 -0800)]
Merge tag 'linux-can-fixes-for-6.13-20241202' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2024-12-02
The first patch is by me and allows the use of sleeping GPIOs to set
termination GPIOs.
Alexander Kozhinov fixes the gs_usb driver to use the endpoints
provided by the usb endpoint descriptions instead of hard coded ones.
Dario Binacchi contributes 11 statistics related patches for various
CAN driver. A potential use after free in the hi311x is fixed. The
statistics for the c_can, sun4i_can, hi311x, m_can, ifi_canfd,
sja1000, sun4i_can, ems_usb, f81604 are fixed: update statistics even
if the allocation of the error skb fails and fix the incrementing of
the rx,tx error counters.
A patch by me fixes the workaround for DS80000789E 6 erratum in the
mcp251xfd driver.
The last patch is by Dmitry Antipov, targets the j1939 CAN protocol
and fixes a skb reference counting issue.
* tag 'linux-can-fixes-for-6.13-20241202' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: j1939: j1939_session_new(): fix skb reference counting
can: mcp251xfd: mcp251xfd_get_tef_len(): work around erratum DS80000789E 6.
can: f81604: f81604_handle_can_bus_errors(): fix {rx,tx}_errors statistics
can: ems_usb: ems_usb_rx_err(): fix {rx,tx}_errors statistics
can: sun4i_can: sun4i_can_err(): fix {rx,tx}_errors statistics
can: sja1000: sja1000_err(): fix {rx,tx}_errors statistics
can: hi311x: hi3110_can_ist(): fix {rx,tx}_errors statistics
can: ifi_canfd: ifi_canfd_handle_lec_err(): fix {rx,tx}_errors statistics
can: m_can: m_can_handle_lec_err(): fix {rx,tx}_errors statistics
can: hi311x: hi3110_can_ist(): update state error statistics if skb allocation fails
can: hi311x: hi3110_can_ist(): fix potential use-after-free
can: sun4i_can: sun4i_can_err(): call can_change_state() even if cf is NULL
can: c_can: c_can_handle_bus_err(): update statistics if skb allocation fails
can: gs_usb: add usb endpoint address detection at driver probe step
can: dev: can_set_termination(): allow sleeping GPIOs
====================
Geetha sowjanya [Tue, 26 Nov 2024 11:44:31 +0000 (17:14 +0530)]
octeontx2-af: Fix SDP MAC link credits configuration
Current driver allows only packet size < 512B as SDP_LINK_CREDIT
register is set to default value.
This patch fixes this issue by configure the register with
maximum HW supported value to allow packet size > 512B.
Fixes: 2f7f33a09516 ("octeontx2-pf: Add representors for sdp MAC") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since j1939_session_skb_queue() does an extra skb_get() for each new
skb, do the same for the initial one in j1939_session_new() to avoid
refcount underflow.
Reported-by: syzbot+d4e8dc385d9258220c31@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d4e8dc385d9258220c31 Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241105094823.2403806-1-dmantipov@yandex.ru
[mkl: clean up commit message] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Fixes: 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.") Reported-by: syzbot+1de74b0794c40c8eb300@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67461f7f.050a0220.1286eb.0021.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> CC: Kui-Feng Lee <thinker.li@gmail.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The script below reproduces this scenario:
ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \
dir out priority 0 ptype main flag localok icmp
ip l a veth1 type veth
ip a a 192.168.141.111/24 dev veth0
ip l s veth0 up
ping 192.168.141.155 -c 1
icmp_route_lookup() create input routes for locally generated packets
while xfrm relookup ICMP traffic.Then it will set input route
(dst->out = ip_rt_bug) to skb for DESTUNREACH.
For ICMP err triggered by locally generated packets, dst->dev of output
route is loopback. Generally, xfrm relookup verification is not required
on loopback interfaces (net.ipv4.conf.lo.disable_xfrm = 1).
Skip icmp relookup for locally generated packets to fix it.
Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support") Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241127040850.1513135-1-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
bnxt: Fix failure to report RSS context in ntuple rule
This patchset fixes a bug where bnxt driver was failing to report that
an ntuple rule is redirecting to an RSS context. First commit is the
fix, then second commit extends selftests to detect if other/new drivers
are compliant with ntuple/rss_ctx API.
====================
Daniel Xu [Wed, 27 Nov 2024 22:58:30 +0000 (15:58 -0700)]
selftests: drv-net: rss_ctx: Add test for ntuple rule
Extend the rss_ctx test suite to test that an ntuple action that
redirects to an RSS context contains that information in `ethtool -n`.
Otherwise the output from ethtool is highly deceiving. This test helps
ensure drivers are compliant with the API.
Commit 2f4f9fe5bf5f ("bnxt_en: Support adding ntuple rules on RSS
contexts") added support for redirecting to an RSS context as an ntuple
rule action. However, it forgot to update the ETHTOOL_GRXCLSRULE
codepath. This caused `ethtool -n` to always report the action as
"Action: Direct to queue 0" which is wrong.
Fix by teaching bnxt driver to report the RSS context when applicable.
Vyshnav Ajith [Thu, 21 Nov 2024 22:48:27 +0000 (04:18 +0530)]
docs: net: bareudp: fix spelling and grammar mistakes
The BareUDP documentation had several grammar and spelling mistakes,
making it harder to read. This patch fixes those errors to improve
clarity and readability for developers.
Eric Dumazet [Tue, 26 Nov 2024 14:59:11 +0000 (14:59 +0000)]
selinux: use sk_to_full_sk() in selinux_ip_output()
In blamed commit, TCP started to attach timewait sockets to
some skbs.
syzbot reported that selinux_ip_output() was not expecting them yet.
Note that using sk_to_full_sk() is still allowing the
following sk_listener() check to work as before.
BUG: KASAN: slab-out-of-bounds in selinux_sock security/selinux/include/objsec.h:207 [inline]
BUG: KASAN: slab-out-of-bounds in selinux_ip_output+0x1e0/0x1f0 security/selinux/hooks.c:5761
Read of size 8 at addr ffff88804e86e758 by task syz-executor347/5894
Fixes: 79636038d37e ("ipv4: tcp: give socket pointer to control skbs") Reported-by: syzbot+2d9f5f948c31dcb7745e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/6745e1a2.050a0220.1286eb.001c.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241126145911.4187198-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Ottens [Mon, 25 Nov 2024 17:46:07 +0000 (18:46 +0100)]
net/sched: tbf: correct backlog statistic for GSO packets
When the length of a GSO packet in the tbf qdisc is larger than the burst
size configured the packet will be segmented by the tbf_segment function.
Whenever this function is used to enqueue SKBs, the backlog statistic of
the tbf is not increased correctly. This can lead to underflows of the
'backlog' byte-statistic value when these packets are dequeued from tbf.
Reproduce the bug:
Ensure that the sender machine has GSO enabled. Configured the tbf on
the outgoing interface of the machine as follows (burstsize = 1 MTU):
$ tc qdisc add dev <oif> root handle 1: tbf rate 50Mbit burst 1514 latency 50ms
Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
client on this machine. Check the qdisc statistics:
$ tc -s qdisc show dev <oif>
The 'backlog' byte-statistic has incorrect values while traffic is
transferred, e.g., high values due to u32 underflows. When the transfer
is stopped, the value is != 0, which should never happen.
This patch fixes this bug by updating the statistics correctly, even if
single SKBs of a GSO SKB cannot be enqueued.
Fixes: e43ac79a4bc6 ("sch_tbf: segment too big GSO packets") Signed-off-by: Martin Ottens <martin.ottens@fau.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241125174608.1484356-1-martin.ottens@fau.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Fri, 29 Nov 2024 12:52:04 +0000 (12:52 +0000)]
Merge branch 'enetc-mqprio-fixes'
Wei Fang sayus:
====================
fix crash issue when setting MQPRIO for VFs
There is a crash issue when setting MQPRIO for ENETC VFs, the root casue
is that ENETC VFs don't like ENETC PFs, they don't have port registers,
so hw->port of VFs is NULL. However, this NULL pointer will be accessed
without any checks in enetc_mm_commit_preemptible_tcs() when configuring
MQPRIO for VFs. Therefore, two patches are added to fix this issue. The
first patch sets ENETC_SI_F_QBU flag only for SIs that support 802.1Qbu.
The second patch adds a check in enetc_change_preemptible_tcs() to ensure
that SIs that do not support 802.1Qbu do not configure preemptible TCs.
Wei Fang [Mon, 25 Nov 2024 09:07:19 +0000 (17:07 +0800)]
net: enetc: Do not configure preemptible TCs if SIs do not support
Both ENETC PF and VF drivers share enetc_setup_tc_mqprio() to configure
MQPRIO. And enetc_setup_tc_mqprio() calls enetc_change_preemptible_tcs()
to configure preemptible TCs. However, only PF is able to configure
preemptible TCs. Because only PF has related registers, while VF does not
have these registers. So for VF, its hw->port pointer is NULL. Therefore,
VF will access an invalid pointer when accessing a non-existent register,
which will cause a crash issue. The simplified log is as follows.
In addition, some PFs also do not support configuring preemptible TCs,
such as eno1 and eno3 on LS1028A. It won't crash like it does for VFs,
but we should prevent these PFs from accessing these unimplemented
registers.
Fixes: 827145392a4a ("net: enetc: only commit preemptible TCs to hardware when MM TX is active") Signed-off-by: Wei Fang <wei.fang@nxp.com> Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 25 Nov 2024 09:07:18 +0000 (17:07 +0800)]
net: enetc: read TSN capabilities from port register, not SI
Configuring TSN (Qbv, Qbu, PSFP) capabilities requires access to port
registers, which are available to the PSI but not the VSI.
Yet, the SI port capability register 0 (PSICAPR0), exposed to both PSIs
and VSIs, presents the same capabilities to the VF as to the PF, thus
leading the VF driver into thinking it can configure these features.
In the case of ENETC_SI_F_QBU, having it set in the VF leads to a crash:
while the other TSN features in the VF are harmless, because the
net_device_ops used for the VF driver do not expose entry points for
these other features.
These capability bits are in the process of being defeatured from the SI
registers. We should read them from the port capability register, where
they are also present, and which is naturally only exposed to the PF.
The change to blame (relevant for stable backports) is the one where
this started being a problem, aka when the kernel started to crash due
to the wrong capability seen by the VF driver.
Fixes: 827145392a4a ("net: enetc: only commit preemptible TCs to hardware when MM TX is active") Reported-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 28 Nov 2024 18:15:20 +0000 (10:15 -0800)]
Merge tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Current release - regressions:
- rtnetlink: fix rtnl_dump_ifinfo() error path
- bluetooth: remove the redundant sco_conn_put
Previous releases - regressions:
- netlink: fix false positive warning in extack during dumps
- sched: sch_fq: don't follow the fast path if Tx is behind now
- ipv6: delete temporary address if mngtmpaddr is removed or
unmanaged
- tcp: fix use-after-free of nreq in reqsk_timer_handler().
- bluetooth: fix slab-use-after-free Read in set_powered_sync
- l2tp: fix warning in l2tp_exit_net found
- eth:
- bnxt_en: fix receive ring space parameters when XDP is active
- lan78xx: fix double free issue with interrupt buffer allocation
- tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets
Previous releases - always broken:
- ipmr: fix tables suspicious RCU usage
- iucv: MSG_PEEK causes memory leak in iucv_sock_destruct()
- eth:
- octeontx2-af: fix low network performance
- stmmac: dwmac-socfpga: set RX watchdog interrupt as broken
- rtase: correct the speed for RTL907XD-V1
Misc:
- some documentation fixup"
* tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
ipmr: fix build with clang and DEBUG_NET disabled.
Documentation: tls_offload: fix typos and grammar
Fix spelling mistake
ipmr: fix tables suspicious RCU usage
ip6mr: fix tables suspicious RCU usage
ipmr: add debug check for mr table cleanup
selftests: rds: move test.py to TEST_FILES
net_sched: sch_fq: don't follow the fast path if Tx is behind now
tcp: Fix use-after-free of nreq in reqsk_timer_handler().
net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI
net: Comment copy_from_sockptr() explaining its behaviour
rxrpc: Improve setsockopt() handling of malformed user input
llc: Improve setsockopt() handling of malformed user input
Bluetooth: SCO: remove the redundant sco_conn_put
Bluetooth: MGMT: Fix possible deadlocks
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
bnxt_en: Unregister PTP during PCI shutdown and suspend
bnxt_en: Refactor bnxt_ptp_init()
bnxt_en: Fix receive ring space parameters when XDP is active
bnxt_en: Fix queue start to update vnic RSS table
...
Linus Torvalds [Thu, 28 Nov 2024 18:06:00 +0000 (10:06 -0800)]
Merge tag 'spi-fix-v6.13-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few fairly minor driver specific fixes, plus one core fix for the
handling of deferred probe on ACPI systems - ignoring probe deferral
and incorrectly treating it like a fatal error while parsing the
generic ACPI bindings for SPI devices"
* tag 'spi-fix-v6.13-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Fix acpi deferred irq probe
spi: atmel-quadspi: Fix register name in verbose logging function
spi-imx: prevent overflow when estimating transfer time
spi: rockchip-sfc: Embedded DMA only support 4B aligned address
Linus Torvalds [Thu, 28 Nov 2024 17:40:53 +0000 (09:40 -0800)]
Merge tag 'regulator-fix-v6.13-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes that came in during the merge window, plus
documetation of a new device ID for the Qualcomm LABIBB driver.
There's a core fix for the rarely used current constraints and a fix
for the Qualcomm RPMH driver which had described only one of the two
voltage ranges that the hardware could control, creating a potential
incompatibility with the configuration left by firmware"
* tag 'regulator-fix-v6.13-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Ignore unset max_uA constraints in current limit check
dt-bindings: regulator: qcom-labibb-regulator: document the pmi8950 labibb regulator
regulator: qcom-rpmh: Update ranges for FTSMPS525
Linus Torvalds [Thu, 28 Nov 2024 17:22:00 +0000 (09:22 -0800)]
Merge tag 'ntfs3_for_6.13' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
- additional checks to address issues identified by syzbot
- continuation of the transition from 'page' to 'folio'
* tag 'ntfs3_for_6.13' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Accumulated refactoring changes
fs/ntfs3: Switch to folio to release resources
fs/ntfs3: Add check in ntfs_extend_initialized_size
fs/ntfs3: Add more checks in mi_enum_attr (part 2)
fs/ntfs3: Equivalent transition from page to folio
fs/ntfs3: Fix case when unmarked clusters intersect with zone
fs/ntfs3: Fix warning in ni_fiemap
Linus Torvalds [Thu, 28 Nov 2024 17:18:11 +0000 (09:18 -0800)]
Merge tag 'exfat-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- If the start cluster of stream entry is invalid, treat it as the
empty directory
- Valid size of steam entry cannot be greater than data size. If
valid_size is invalid, use data_size
- Move Direct-IO alignment check to before extending the valid size
- Fix uninit-value issue reported by syzbot
- Optimize finding directory entry-set in write_inode, rename, unlink
* tag 'exfat-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: reduce FAT chain traversal
exfat: code cleanup for exfat_readdir()
exfat: remove argument 'p_dir' from exfat_add_entry()
exfat: move exfat_chain_set() out of __exfat_resolve_path()
exfat: add exfat_get_dentry_set_by_ei() helper
exfat: rename argument name for exfat_move_file and exfat_rename_file
exfat: remove unnecessary read entry in __exfat_rename()
exfat: fix file being changed by unaligned direct write
exfat: fix uninit-value in __exfat_get_dentry_set
exfat: fix out-of-bounds access of directory entries
Paolo Abeni [Thu, 28 Nov 2024 16:18:04 +0000 (17:18 +0100)]
ipmr: fix build with clang and DEBUG_NET disabled.
Sasha reported a build issue in ipmr::
net/ipv4/ipmr.c:320:13: error: function 'ipmr_can_free_table' is not \
needed and will not be emitted \
[-Werror,-Wunneeded-internal-declaration]
320 | static bool ipmr_can_free_table(struct net *net)
Apparently clang is too smart with BUILD_BUG_ON_INVALID(), let's
fallback to a plain WARN_ON_ONCE().
Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.11-25635-g6813e2326f1e/testrun/26111580/suite/build/test/clang-nightly-lkftconfig/details/ Fixes: 11b6e701bce9 ("ipmr: add debug check for mr table cleanup") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/ee75faa926b2446b8302ee5fc30e129d2df73b90.1732810228.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pablo Neira Ayuso [Tue, 26 Nov 2024 10:59:06 +0000 (11:59 +0100)]
netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level
cgroup maximum depth is INT_MAX by default, there is a cgroup toggle to
restrict this maximum depth to a more reasonable value not to harm
performance. Remove unnecessary WARN_ON_ONCE which is reachable from
userspace.
Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces") Reported-by: syzbot+57bac0866ddd99fe47c0@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since an invalid (without '\0' byte at all) byte sequence may be passed
from userspace, add an extra check to ensure that such a sequence is
rejected as possible ID and so never passed to 'kstrdup()' and further.
Jinghao Jia [Sat, 23 Nov 2024 09:42:56 +0000 (03:42 -0600)]
ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()
Under certain kernel configurations when building with Clang/LLVM, the
compiler does not generate a return or jump as the terminator
instruction for ip_vs_protocol_init(), triggering the following objtool
warning during build time:
vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()
At runtime, this either causes an oops when trying to load the ipvs
module or a boot-time panic if ipvs is built-in. This same issue has
been reported by the Intel kernel test robot previously.
Digging deeper into both LLVM and the kernel code reveals this to be a
undefined behavior problem. ip_vs_protocol_init() uses a on-stack buffer
of 64 chars to store the registered protocol names and leaves it
uninitialized after definition. The function calls strnlen() when
concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
strnlen() performs an extra step to check whether the last byte of the
input char buffer is a null character (commit 3009f891bb9f ("fortify:
Allow strlen() and strnlen() to pass compile-time known lengths")).
This, together with possibly other configurations, cause the following
IR to be generated:
The above code calculates the address of the last char in the buffer
(value %15) and then loads from it (value %16). Because the buffer is
never initialized, the LLVM GVN pass marks value %16 as undefined:
This gives later passes (SCCP, in particular) more DCE opportunities by
propagating the undef value further, and eventually removes everything
after the load on the uninitialized stack location:
In this way, the generated native code will just fall through to the
next function, as LLVM does not generate any code for the unreachable IR
instruction and leaves the function without a terminator.
Zero the on-stack buffer to avoid this possible UB.
Paolo Abeni [Thu, 28 Nov 2024 09:23:26 +0000 (10:23 +0100)]
Merge branch 'net-fix-mcast-rcu-splats'
Paolo Abeni says:
====================
net: fix mcast RCU splats
This series addresses the RCU splat triggered by the forwarding
mroute tests.
The first patch does not address any specific issue, but makes the
following ones more clear. Patch 2 and 3 address the issue for ipv6 and
ipv4 respectively.
====================
Paolo Abeni [Sun, 24 Nov 2024 15:40:58 +0000 (16:40 +0100)]
ipmr: fix tables suspicious RCU usage
Similar to the previous patch, plumb the RCU lock inside
the ipmr_get_table(), provided a lockless variant and apply
the latter in the few spots were the lock is already held.
Fixes: 709b46e8d90b ("net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT") Fixes: f0ad0860d01e ("ipv4: ipmr: support multiple tables") Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Sun, 24 Nov 2024 15:40:57 +0000 (16:40 +0100)]
ip6mr: fix tables suspicious RCU usage
Several places call ip6mr_get_table() with no RCU nor RTNL lock.
Add RCU protection inside such helper and provide a lockless variant
for the few callers that already acquired the relevant lock.
Note that some users additionally reference the table outside the RCU
lock. That is actually safe as the table deletion can happen only
after all table accesses are completed.
Fixes: e2d57766e674 ("net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.") Fixes: d7c31cbde4bc ("net: ip6mr: add RTM_GETROUTE netlink op") Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Sun, 24 Nov 2024 15:40:56 +0000 (16:40 +0100)]
ipmr: add debug check for mr table cleanup
The multicast route tables lifecycle, for both ipv4 and ipv6, is
protected by RCU using the RTNL lock for write access. In many
places a table pointer escapes the RCU (or RTNL) protected critical
section, but such scenarios are actually safe because tables are
deleted only at namespace cleanup time or just after allocation, in
case of default rule creation failure.
Tables freed at namespace cleanup time are assured to be alive for the
whole netns lifetime; tables freed just after creation time are never
exposed to other possible users.
Ensure that the free conditions are respected in ip{,6}mr_free_table, to
document the locking schema and to prevent future possible introduction
of 'table del' operation from breaking it.
Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Sun, 24 Nov 2024 07:32:43 +0000 (07:32 +0000)]
selftests: rds: move test.py to TEST_FILES
The test.py should not be run separately. It should be run via run.sh,
which will do some sanity checks first. Move the test.py from TEST_PROGS
to TEST_FILES.
Reported-by: Maximilian Heyne <mheyne@amazon.de> Closes: https://lore.kernel.org/netdev/20241122150129.GB18887@dev-dsk-mheyne-1b-55676e6a.eu-west-1.amazon.com Fixes: 3ade6ce1255e ("selftests: rds: add testing infrastructure") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20241124073243.847932-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Replacing the qdisc with pfifo makes retransmissions go away.
It appears that a flow may have a delayed packet with a very near
Tx time. Later, we may get busy processing Rx and the target Tx time
will pass, but we won't service Tx since the CPU is busy with Rx.
If Rx sees an ACK and we try to push more data for the delayed flow
we may fastpath the skb, not realizing that there are already "ready
to send" packets for this flow sitting in the qdisc.
Don't trust the fastpath if we are "behind" according to the projected
Tx time for next flow waiting in the Qdisc. Because we consider anything
within the offload window to be okay for fastpath we must consider
the entire offload window as "now".
Kuniyuki Iwashima [Sat, 23 Nov 2024 17:42:36 +0000 (09:42 -0800)]
tcp: Fix use-after-free of nreq in reqsk_timer_handler().
The cited commit replaced inet_csk_reqsk_queue_drop_and_put() with
__inet_csk_reqsk_queue_drop() and reqsk_put() in reqsk_timer_handler().
Then, oreq should be passed to reqsk_put() instead of req; otherwise
use-after-free of nreq could happen when reqsk is migrated but the
retry attempt failed (e.g. due to timeout).
Let's pass oreq to reqsk_put().
Fixes: e8c526f2bdf1 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().") Reported-by: Liu Jian <liujian56@huawei.com> Closes: https://lore.kernel.org/netdev/1284490f-9525-42ee-b7b8-ccadf6606f6d@huawei.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20241123174236.62438-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When phy_ethtool_set_eee_noneg() detects a change in the LPI
parameters, it attempts to update phylib state and trigger the link
to cycle so the MAC sees the updated parameters.
However, in doing so, it sets phydev->enable_tx_lpi depending on
whether the EEE configuration allows the MAC to generate LPI without
taking into account the result of negotiation.
This can be demonstrated with a 1000base-T FD interface by:
# ethtool --set-eee eno0 advertise 8 # cause EEE to be not negotiated
# ethtool --set-eee eno0 tx-lpi off
# ethtool --set-eee eno0 tx-lpi on
This results in being true, despite EEE not having been negotiated and:
# ethtool --show-eee eno0
EEE status: enabled - inactive
Tx LPI: 250 (us)
Supported EEE link modes: 100baseT/Full
1000baseT/Full
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
Fix this by keeping track of whether EEE was negotiated via a new
eee_active member in struct phy_device, and include this state in
the decision whether phydev->enable_tx_lpi should be set.
Fixes: 3e43b903da04 ("net: phy: Immediately call adjust_link if only tx_lpi_enabled changes") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tErSe-005RhB-2R@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 28 Nov 2024 08:23:02 +0000 (09:23 +0100)]
Merge tag 'for-net-2024-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- SCO: remove the redundant sco_conn_put
- MGMT: Fix slab-use-after-free Read in set_powered_sync
- MGMT: Fix possible deadlocks
* tag 'for-net-2024-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: SCO: remove the redundant sco_conn_put
Bluetooth: MGMT: Fix possible deadlocks
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
====================
====================
net: Fix some callers of copy_from_sockptr()
Some callers misinterpret copy_from_sockptr()'s return value. The function
follows copy_from_user(), i.e. returns 0 for success, or the number of
bytes not copied on error. Simply returning the result in a non-zero case
isn't usually what was intended.
Compile tested with CONFIG_LLC, CONFIG_AF_RXRPC, CONFIG_BT enabled.
Last patch probably belongs more to net-next, if any. Here as an RFC.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co>
====================
Michal Luczaj [Tue, 19 Nov 2024 13:31:43 +0000 (14:31 +0100)]
net: Comment copy_from_sockptr() explaining its behaviour
copy_from_sockptr() has a history of misuse. Add a comment explaining that
the function follows API of copy_from_user(), i.e. returns 0 for success,
or number of bytes not copied on error.
Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal Luczaj [Tue, 19 Nov 2024 13:31:42 +0000 (14:31 +0100)]
rxrpc: Improve setsockopt() handling of malformed user input
copy_from_sockptr() does not return negative value on error; instead, it
reports the number of bytes that failed to copy. Since it's deprecated,
switch to copy_safe_from_sockptr().
Note: Keeping the `optlen != sizeof(unsigned int)` check as
copy_safe_from_sockptr() by itself would also accept
optlen > sizeof(unsigned int). Which would allow a more lenient handling
of inputs.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal Luczaj [Tue, 19 Nov 2024 13:31:41 +0000 (14:31 +0100)]
llc: Improve setsockopt() handling of malformed user input
copy_from_sockptr() is used incorrectly: return value is the number of
bytes that could not be copied. Since it's deprecated, switch to
copy_safe_from_sockptr().
Note: Keeping the `optlen != sizeof(int)` check as copy_safe_from_sockptr()
by itself would also accept optlen > sizeof(int). Which would allow a more
lenient handling of inputs.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: David Wei <dw@davidwei.uk> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Wed, 27 Nov 2024 22:50:31 +0000 (14:50 -0800)]
Merge tag 'acpi-6.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These add a common init function for arch-specific ACPI
initialization, clean up idle states initialization in the ACPI
processor_idle driver and update quirks:
- Introduce acpi_arch_init() for architecture-specific ACPI subsystem
initialization (Miao Wang)
- Clean up Asus quirks in acpi_quirk_skip_dmi_ids[] and add a quirk
to skip I2C clients on Acer Iconia One 8 A1-840 (Hans de Goede)
- Make the ACPI processor_idle driver use acpi_idle_play_dead() for
all idle states regardless of their types (Rafael Wysocki)"
* tag 'acpi-6.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: introduce acpi_arch_init()
ACPI: x86: Clean up Asus entries in acpi_quirk_skip_dmi_ids[]
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 8 A1-840
ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states
Linus Torvalds [Wed, 27 Nov 2024 22:40:33 +0000 (14:40 -0800)]
Merge tag 'pm-6.13-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull morepower management updates from Rafael Wysocki:
"These update the OPP (Operating Performance Points) DT bindings for
ti-cpu (Dhruva Gole) and remove unused declarations from the OPP
header file (Zhang Zekun)"
* tag 'pm-6.13-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
dt-bindings: opp: operating-points-v2-ti-cpu: Describe opp-supported-hw
OPP: Remove unused declarations in header file
Linus Torvalds [Wed, 27 Nov 2024 22:36:00 +0000 (14:36 -0800)]
Merge tag 'thermal-6.13-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"These fix a Power Allocator thermal governor issue reported recently,
update the Intel int3400 thermal driver and simplify DT data parsing
in the thermal control subsystem:
- Add a NULL pointer check that was missed by recent modifications of
the Power Allocator thermal governor (Rafael Wysocki)
- Remove the data_vault attribute_group from int3400 because it is
only used for exposing one binary file that can be exposed directly
(Thomas Weißschuh)
- Prevent the current_uuid sysfs attribute in int3400 from mistakenly
treating valid UUID values as invalid on some older systems
(Srinivas Pandruvada)
- Use the cleanup.h mechanics to simplify DT data parsing in the
thermal core and some drivers (Krzysztof Kozlowski)"
* tag 'thermal-6.13-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: sun8i: Use scoped device node handling to simplify error paths
thermal: tegra: Simplify with scoped for each OF child loop
thermal: qcom-spmi-adc-tm5: Simplify with scoped for each OF child loop
thermal: of: Use scoped device node handling to simplify of_thermal_zone_find()
thermal: of: Use scoped memory and OF handling to simplify thermal_of_trips_init()
thermal: of: Simplify thermal_of_should_bind with scoped for each OF child
thermal: gov_power_allocator: Add missing NULL pointer check
thermal: int3400: Remove unneeded data_vault attribute_group
thermal: int3400: Fix reading of current_uuid for active policy
Linus Torvalds [Wed, 27 Nov 2024 22:24:34 +0000 (14:24 -0800)]
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull more iommufd updates from Jason Gunthorpe:
"Change the driver callback op domain_alloc_user() into two ops:
domain_alloc_paging_flags() and domain_alloc_nesting() that better
describe what the ops are expected to do.
There will be per-driver cleanup based on this going into the next
cycle via the driver trees"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommu: Rename ops->domain_alloc_user() to domain_alloc_paging_flags()
iommu: Add ops->domain_alloc_nested()
Linus Torvalds [Wed, 27 Nov 2024 21:33:43 +0000 (13:33 -0800)]
Merge tag 'phy-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"New hardware support:
- ST STM32MP25 combophy support
- Sparx5 support for lan969x serdes and updates to driver to support
this
- NXP PTN3222 eUSB2 to USB2 redriver
- Qualcomm SAR2130P eusb2 support, QCS8300 USB DW3 and QMP USB2
support, X1E80100 QMP PCIe PHY Gen4 support, QCS615 and QCS8300 QMP
UFS PHY support and SA8775P eDP PHY support
- Rockchip rk3576 usbdp and rk3576 usb2 phy support
- Binding for Microchip ATA6561 can phy
Updates:
- Freescale driver updates from hdmi support
- Conversion of rockchip rk3228 hdmi phy binding to yaml
- Broadcom usb2-phy deprecated support dropped and USB init array
update for BCM4908
- TI USXGMII mode support in J7200
- Switch back to platform_driver::remove() subsystem update"
* tag 'phy-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (59 commits)
phy: qcom: qmp: Fix lecacy-legacy typo
phy: lan969x-serdes: add support for lan969x serdes driver
dt-bindings: phy: sparx5: document lan969x
phy: sparx5-serdes: add support for branching on chip type
phy: sparx5-serdes: add indirection layer to register macros
phy: sparx5-serdes: add function for getting the CMU index
phy: sparx5-serdes: add ops to match data
phy: sparx5-serdes: add constant for the number of CMU's
phy: sparx5-serdes: add constants to match data
phy: sparx5-serdes: add support for private match data
phy: bcm-ns-usb2: drop support for old binding variant
dt-bindings: phy: bcm-ns-usb2-phy: drop deprecated variant
dt-bindings: phy: Add QMP UFS PHY compatible for QCS8300
dt-bindings: phy: qcom: snps-eusb2: Add SAR2130P compatible
dt-bindings: phy: ti,tcan104x-can: Document Microchip ATA6561
phy: airoha: Fix REG_CSR_2L_RX{0,1}_REV0 definitions
phy: airoha: Fix REG_CSR_2L_JCPLL_SDM_HREN config in airoha_pcie_phy_init_ssc_jcpll()
phy: airoha: Fix REG_PCIE_PMA_TX_RESET config in airoha_pcie_phy_init_csr_2l()
phy: airoha: Fix REG_CSR_2L_PLL_CMN_RESERVE0 config in airoha_pcie_phy_init_clk_out()
phy: phy-rockchip-samsung-hdptx: Don't request RST_PHY/RST_ROPLL/RST_LCPLL
...
Linus Torvalds [Wed, 27 Nov 2024 21:23:13 +0000 (13:23 -0800)]
Merge tag 'gpio-fixes-for-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"Apart from the gpio-exar fix which addresses an older issue, they all
fix regressions from this release cycle:
- fix missing GPIO chip labels in gpio-zevio and gpio-altera
- for the latter: also set GPIO base to -1 to use dynamic range
allocation
- fix value setting with external pull-up/down resistor in gpio-exar
- use the recommended IDA interfaces in gpio-mpsse"
* tag 'gpio-fixes-for-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: mpsse: Remove usage of the deprecated ida_simple_xx() API
gpio: exar: set value when external pull-up or pull-down is present
gpio: altera: Add missed base and label initialisations
gpio: zevio: Add missed label initialisation