Pablo Neira Ayuso [Sun, 1 Dec 2024 23:04:49 +0000 (00:04 +0100)]
netfilter: nft_set_hash: skip duplicated elements pending gc run
rhashtable does not provide stable walk, duplicated elements are
possible in case of resizing. I considered that checking for errors when
calling rhashtable_walk_next() was sufficient to detect the resizing.
However, rhashtable_walk_next() returns -EAGAIN only at the end of the
iteration, which is too late, because a gc work containing duplicated
elements could have been already scheduled for removal to the worker.
Add a u32 gc worker sequence number per set, bump it on every workqueue
run. Annotate gc worker sequence number on the expired element. Use it
to skip those already seen in this gc workqueue run.
Note that this new field is never reset in case gc transaction fails, so
next gc worker run on the expired element overrides it. Wraparound of gc
worker sequence number should not be an issue with stale gc worker
sequence number in the element, that would just postpone the element
removal in one gc run.
Note that it is not possible to use flags to annotate that element is
pending gc run to detect duplicates, given that gc transaction can be
invalidated in case of update from the control plane, therefore, not
allowing to clear such flag.
On x86_64, pahole reports no changes in the size of nft_rhash_elem.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Laurent Fasnacht <laurent.fasnacht@proton.ch> Tested-by: Laurent Fasnacht <laurent.fasnacht@proton.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Fri, 29 Nov 2024 15:30:38 +0000 (16:30 +0100)]
netfilter: ipset: Hold module reference while requesting a module
User space may unload ip_set.ko while it is itself requesting a set type
backend module, leading to a kernel crash. The race condition may be
provoked by inserting an mdelay() right after the nfnl_unlock() call.
Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Wed, 27 Nov 2024 11:46:54 +0000 (12:46 +0100)]
netfilter: nft_inner: incorrect percpu area handling under softirq
Softirq can interrupt ongoing packet from process context that is
walking over the percpu area that contains inner header offsets.
Disable bh and perform three checks before restoring the percpu inner
header offsets to validate that the percpu area is valid for this
skbuff:
1) If the NFT_PKTINFO_INNER_FULL flag is set on, then this skbuff
has already been parsed before for inner header fetching to
register.
2) Validate that the percpu area refers to this skbuff using the
skbuff pointer as a cookie. If there is a cookie mismatch, then
this skbuff needs to be parsed again.
3) Finally, validate if the percpu area refers to this tunnel type.
Only after these three checks the percpu area is restored to a on-stack
copy and bh is enabled again.
After inner header fetching, the on-stack copy is stored back to the
percpu area.
Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Reported-by: syzbot+84d0441b9860f0d63285@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Tue, 26 Nov 2024 10:59:06 +0000 (11:59 +0100)]
netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level
cgroup maximum depth is INT_MAX by default, there is a cgroup toggle to
restrict this maximum depth to a more reasonable value not to harm
performance. Remove unnecessary WARN_ON_ONCE which is reachable from
userspace.
Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces") Reported-by: syzbot+57bac0866ddd99fe47c0@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since an invalid (without '\0' byte at all) byte sequence may be passed
from userspace, add an extra check to ensure that such a sequence is
rejected as possible ID and so never passed to 'kstrdup()' and further.
Jinghao Jia [Sat, 23 Nov 2024 09:42:56 +0000 (03:42 -0600)]
ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()
Under certain kernel configurations when building with Clang/LLVM, the
compiler does not generate a return or jump as the terminator
instruction for ip_vs_protocol_init(), triggering the following objtool
warning during build time:
vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()
At runtime, this either causes an oops when trying to load the ipvs
module or a boot-time panic if ipvs is built-in. This same issue has
been reported by the Intel kernel test robot previously.
Digging deeper into both LLVM and the kernel code reveals this to be a
undefined behavior problem. ip_vs_protocol_init() uses a on-stack buffer
of 64 chars to store the registered protocol names and leaves it
uninitialized after definition. The function calls strnlen() when
concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
strnlen() performs an extra step to check whether the last byte of the
input char buffer is a null character (commit 3009f891bb9f ("fortify:
Allow strlen() and strnlen() to pass compile-time known lengths")).
This, together with possibly other configurations, cause the following
IR to be generated:
The above code calculates the address of the last char in the buffer
(value %15) and then loads from it (value %16). Because the buffer is
never initialized, the LLVM GVN pass marks value %16 as undefined:
This gives later passes (SCCP, in particular) more DCE opportunities by
propagating the undef value further, and eventually removes everything
after the load on the uninitialized stack location:
In this way, the generated native code will just fall through to the
next function, as LLVM does not generate any code for the unreachable IR
instruction and leaves the function without a terminator.
Zero the on-stack buffer to avoid this possible UB.
Paolo Abeni [Thu, 28 Nov 2024 09:23:26 +0000 (10:23 +0100)]
Merge branch 'net-fix-mcast-rcu-splats'
Paolo Abeni says:
====================
net: fix mcast RCU splats
This series addresses the RCU splat triggered by the forwarding
mroute tests.
The first patch does not address any specific issue, but makes the
following ones more clear. Patch 2 and 3 address the issue for ipv6 and
ipv4 respectively.
====================
Paolo Abeni [Sun, 24 Nov 2024 15:40:58 +0000 (16:40 +0100)]
ipmr: fix tables suspicious RCU usage
Similar to the previous patch, plumb the RCU lock inside
the ipmr_get_table(), provided a lockless variant and apply
the latter in the few spots were the lock is already held.
Fixes: 709b46e8d90b ("net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT") Fixes: f0ad0860d01e ("ipv4: ipmr: support multiple tables") Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Sun, 24 Nov 2024 15:40:57 +0000 (16:40 +0100)]
ip6mr: fix tables suspicious RCU usage
Several places call ip6mr_get_table() with no RCU nor RTNL lock.
Add RCU protection inside such helper and provide a lockless variant
for the few callers that already acquired the relevant lock.
Note that some users additionally reference the table outside the RCU
lock. That is actually safe as the table deletion can happen only
after all table accesses are completed.
Fixes: e2d57766e674 ("net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.") Fixes: d7c31cbde4bc ("net: ip6mr: add RTM_GETROUTE netlink op") Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Sun, 24 Nov 2024 15:40:56 +0000 (16:40 +0100)]
ipmr: add debug check for mr table cleanup
The multicast route tables lifecycle, for both ipv4 and ipv6, is
protected by RCU using the RTNL lock for write access. In many
places a table pointer escapes the RCU (or RTNL) protected critical
section, but such scenarios are actually safe because tables are
deleted only at namespace cleanup time or just after allocation, in
case of default rule creation failure.
Tables freed at namespace cleanup time are assured to be alive for the
whole netns lifetime; tables freed just after creation time are never
exposed to other possible users.
Ensure that the free conditions are respected in ip{,6}mr_free_table, to
document the locking schema and to prevent future possible introduction
of 'table del' operation from breaking it.
Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Sun, 24 Nov 2024 07:32:43 +0000 (07:32 +0000)]
selftests: rds: move test.py to TEST_FILES
The test.py should not be run separately. It should be run via run.sh,
which will do some sanity checks first. Move the test.py from TEST_PROGS
to TEST_FILES.
Reported-by: Maximilian Heyne <mheyne@amazon.de> Closes: https://lore.kernel.org/netdev/20241122150129.GB18887@dev-dsk-mheyne-1b-55676e6a.eu-west-1.amazon.com Fixes: 3ade6ce1255e ("selftests: rds: add testing infrastructure") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20241124073243.847932-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Replacing the qdisc with pfifo makes retransmissions go away.
It appears that a flow may have a delayed packet with a very near
Tx time. Later, we may get busy processing Rx and the target Tx time
will pass, but we won't service Tx since the CPU is busy with Rx.
If Rx sees an ACK and we try to push more data for the delayed flow
we may fastpath the skb, not realizing that there are already "ready
to send" packets for this flow sitting in the qdisc.
Don't trust the fastpath if we are "behind" according to the projected
Tx time for next flow waiting in the Qdisc. Because we consider anything
within the offload window to be okay for fastpath we must consider
the entire offload window as "now".
Kuniyuki Iwashima [Sat, 23 Nov 2024 17:42:36 +0000 (09:42 -0800)]
tcp: Fix use-after-free of nreq in reqsk_timer_handler().
The cited commit replaced inet_csk_reqsk_queue_drop_and_put() with
__inet_csk_reqsk_queue_drop() and reqsk_put() in reqsk_timer_handler().
Then, oreq should be passed to reqsk_put() instead of req; otherwise
use-after-free of nreq could happen when reqsk is migrated but the
retry attempt failed (e.g. due to timeout).
Let's pass oreq to reqsk_put().
Fixes: e8c526f2bdf1 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().") Reported-by: Liu Jian <liujian56@huawei.com> Closes: https://lore.kernel.org/netdev/1284490f-9525-42ee-b7b8-ccadf6606f6d@huawei.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20241123174236.62438-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When phy_ethtool_set_eee_noneg() detects a change in the LPI
parameters, it attempts to update phylib state and trigger the link
to cycle so the MAC sees the updated parameters.
However, in doing so, it sets phydev->enable_tx_lpi depending on
whether the EEE configuration allows the MAC to generate LPI without
taking into account the result of negotiation.
This can be demonstrated with a 1000base-T FD interface by:
# ethtool --set-eee eno0 advertise 8 # cause EEE to be not negotiated
# ethtool --set-eee eno0 tx-lpi off
# ethtool --set-eee eno0 tx-lpi on
This results in being true, despite EEE not having been negotiated and:
# ethtool --show-eee eno0
EEE status: enabled - inactive
Tx LPI: 250 (us)
Supported EEE link modes: 100baseT/Full
1000baseT/Full
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
Fix this by keeping track of whether EEE was negotiated via a new
eee_active member in struct phy_device, and include this state in
the decision whether phydev->enable_tx_lpi should be set.
Fixes: 3e43b903da04 ("net: phy: Immediately call adjust_link if only tx_lpi_enabled changes") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tErSe-005RhB-2R@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 28 Nov 2024 08:23:02 +0000 (09:23 +0100)]
Merge tag 'for-net-2024-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- SCO: remove the redundant sco_conn_put
- MGMT: Fix slab-use-after-free Read in set_powered_sync
- MGMT: Fix possible deadlocks
* tag 'for-net-2024-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: SCO: remove the redundant sco_conn_put
Bluetooth: MGMT: Fix possible deadlocks
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
====================
====================
net: Fix some callers of copy_from_sockptr()
Some callers misinterpret copy_from_sockptr()'s return value. The function
follows copy_from_user(), i.e. returns 0 for success, or the number of
bytes not copied on error. Simply returning the result in a non-zero case
isn't usually what was intended.
Compile tested with CONFIG_LLC, CONFIG_AF_RXRPC, CONFIG_BT enabled.
Last patch probably belongs more to net-next, if any. Here as an RFC.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co>
====================
Michal Luczaj [Tue, 19 Nov 2024 13:31:43 +0000 (14:31 +0100)]
net: Comment copy_from_sockptr() explaining its behaviour
copy_from_sockptr() has a history of misuse. Add a comment explaining that
the function follows API of copy_from_user(), i.e. returns 0 for success,
or number of bytes not copied on error.
Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal Luczaj [Tue, 19 Nov 2024 13:31:42 +0000 (14:31 +0100)]
rxrpc: Improve setsockopt() handling of malformed user input
copy_from_sockptr() does not return negative value on error; instead, it
reports the number of bytes that failed to copy. Since it's deprecated,
switch to copy_safe_from_sockptr().
Note: Keeping the `optlen != sizeof(unsigned int)` check as
copy_safe_from_sockptr() by itself would also accept
optlen > sizeof(unsigned int). Which would allow a more lenient handling
of inputs.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal Luczaj [Tue, 19 Nov 2024 13:31:41 +0000 (14:31 +0100)]
llc: Improve setsockopt() handling of malformed user input
copy_from_sockptr() is used incorrectly: return value is the number of
bytes that could not be copied. Since it's deprecated, switch to
copy_safe_from_sockptr().
Note: Keeping the `optlen != sizeof(int)` check as copy_safe_from_sockptr()
by itself would also accept optlen > sizeof(int). Which would allow a more
lenient handling of inputs.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: David Wei <dw@davidwei.uk> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Edward Adam Davis [Mon, 25 Nov 2024 23:58:43 +0000 (07:58 +0800)]
Bluetooth: SCO: remove the redundant sco_conn_put
When adding conn, it is necessary to increase and retain the conn reference
count at the same time.
Another problem was fixed along the way, conn_put is missing when hcon is NULL
in the timeout routine.
Fixes: e6720779ae61 ("Bluetooth: SCO: Use kref to track lifetime of sco_conn") Reported-and-tested-by: syzbot+489f78df4709ac2bfdd3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=489f78df4709ac2bfdd3 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Kiran K <kiran.k@intel.com> Fixes: f53e1c9c726d ("Bluetooth: MGMT: Fix possible crash on mgmt_index_removed") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Fri, 15 Nov 2024 15:45:31 +0000 (10:45 -0500)]
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
This fixes the following crash:
==================================================================
BUG: KASAN: slab-use-after-free in set_powered_sync+0x3a/0xc0 net/bluetooth/mgmt.c:1353
Read of size 8 at addr ffff888029b4dd18 by task kworker/u9:0/54
Paolo Abeni [Tue, 26 Nov 2024 14:29:33 +0000 (15:29 +0100)]
Merge branch 'bnxt_en-bug-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes
This patchset fixes several things:
1. AER recovery for RoCE when NIC interface is down.
2. Set ethtool backplane link modes correctly.
3. Update RSS ring ID during RX queue restart.
4. Crash with XDP and MTU change.
5. PCIe completion timeout when reading PHC after shutdown.
====================
Michael Chan [Fri, 22 Nov 2024 22:45:46 +0000 (14:45 -0800)]
bnxt_en: Unregister PTP during PCI shutdown and suspend
If we go through the PCI shutdown or suspend path, we shutdown the
NIC but PTP remains registered. If the kernel continues to run for
a little bit, the periodic PTP .do_aux_work() function may be called
and it will read the PHC from the BAR register. Since the device
has already been disabled, it will cause a PCIe completion timeout.
Fix it by calling bnxt_ptp_clear() in the PCI shutdown/suspend
handlers. bnxt_ptp_clear() will unregister from PTP and
.do_aux_work() will be canceled.
In bnxt_resume(), we need to re-initialize PTP.
Fixes: a521c8a01d26 ("bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one()") Cc: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michael Chan [Fri, 22 Nov 2024 22:45:45 +0000 (14:45 -0800)]
bnxt_en: Refactor bnxt_ptp_init()
Instead of passing the 2nd parameter phc_cfg to bnxt_ptp_init().
Store it in bp->ptp_cfg so that the caller doesn't need to know what
the value should be.
In the next patch, we'll need to call bnxt_ptp_init() in bnxt_resume()
and this will make it easier.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shravya KN [Fri, 22 Nov 2024 22:45:44 +0000 (14:45 -0800)]
bnxt_en: Fix receive ring space parameters when XDP is active
The MTU setting at the time an XDP multi-buffer is attached
determines whether the aggregation ring will be used and the
rx_skb_func handler. This is done in bnxt_set_rx_skb_mode().
If the MTU is later changed, the aggregation ring setting may need
to be changed and it may become out-of-sync with the settings
initially done in bnxt_set_rx_skb_mode(). This may result in
random memory corruption and crashes as the HW may DMA data larger
than the allocated buffer size, such as:
To address the issue, we now call bnxt_set_rx_skb_mode() within
bnxt_change_mtu() to properly set the AGG rings configuration and
update rx_skb_func based on the new MTU value.
Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of
bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on
the current MTU.
Fixes: 08450ea98ae9 ("bnxt_en: Fix max_mtu setting for multi-buf XDP") Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Somnath Kotur [Fri, 22 Nov 2024 22:45:43 +0000 (14:45 -0800)]
bnxt_en: Fix queue start to update vnic RSS table
HWRM_RING_FREE followed by a HWRM_RING_ALLOC is not guaranteed to
have the same FW ring ID as before. So we must reinitialize the
RSS table with the correct ring IDs. Otherwise, traffic may not
resume properly if the restarted ring ID is stale. Since this
feature is only supported on P5_PLUS chips, we call
bnxt_vnic_set_rss_p5() to update the HW RSS table.
Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops") Cc: David Wei <dw@davidwei.uk> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shravya KN [Fri, 22 Nov 2024 22:45:42 +0000 (14:45 -0800)]
bnxt_en: Set backplane link modes correctly for ethtool
Use the return value from bnxt_get_media() to determine the port and
link modes. bnxt_get_media() returns the proper BNXT_MEDIA_KR when
the PHY is backplane. This will correct the ethtool settings for
backplane devices.
Fixes: 5d4e1bf60664 ("bnxt_en: extend media types to supported and autoneg modes") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Saravanan Vajravel [Fri, 22 Nov 2024 22:45:41 +0000 (14:45 -0800)]
bnxt_en: Reserve rings after PCIe AER recovery if NIC interface is down
After successful PCIe AER recovery, FW will reset all resource
reservations. If it is IF_UP, the driver will call bnxt_open() and
all resources will be reserved again. It it is IF_DOWN, we should
call bnxt_reserve_rings() so that we can reserve resources including
RoCE resources to allow RoCE to resume after AER. Without this
patch, RoCE fails to resume in this IF_DOWN scenario.
Later, if it becomes IF_UP, bnxt_open() will see that resources have
been reserved and will not reserve again.
Paolo Abeni [Tue, 26 Nov 2024 11:15:07 +0000 (12:15 +0100)]
Merge branch 'octeontx2-af-misc-rpm-fixes'
Hariprasad Kelam says:
====================
octeontx2-af: misc RPM fixes
There are few issues with the RPM driver, such as FIFO overflow
and network performance problems due to wrong FIFO values. This
patchset adds fixes for the same.
Patch1: Fixes the mismatch between the lmac type reported by the driver
and the actual hardware configuration.
Patch2: Addresses low network performance observed even on RPMs with
larger FIFO lengths.
Patch 3 & 4: Fix the stale FEC counters reported by the driver by
accessing the correct CSRs
Patch 5: Resolves the issue related to RPM FIFO overflow during system
reboots
====================
Hariprasad Kelam [Fri, 22 Nov 2024 16:20:35 +0000 (21:50 +0530)]
octeontx2-af: Quiesce traffic before NIX block reset
During initialization, the AF driver resets all blocks. The RPM (MAC)
block and NIX block operate on a credit-based model. When the NIX block
resets during active traffic flow, it doesn't release credits to the RPM
block. This causes the RPM FIFO to overflow, leading to receive traffic
struck.
To address this issue, the patch introduces the following changes:
1. Stop receiving traffic at the MAC level during AF driver
initialization.
2. Perform an X2P reset (prevents RXFIFO of all LMACS from pushing data)
3. Reset the NIX block.
4. Clear the X2P reset and re-enable receiving traffic.
Fixes: 54d557815e15 ("octeontx2-af: Reset all RVU blocks") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hariprasad Kelam [Fri, 22 Nov 2024 16:20:34 +0000 (21:50 +0530)]
octeontx2-af: RPM: fix stale FCFEC counters
The corrected words register(FCFECX_VL0_CCW_LO)/Uncorrected words
register (FCFECX_VL0_NCCW_LO) of FCFEC counter has different LMAC
offset which needs to be accessed differently.
Fixes: 84ad3642115d ("octeontx2-af: Add FEC stats for RPM/RPM_USX block") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hariprasad Kelam [Fri, 22 Nov 2024 16:20:33 +0000 (21:50 +0530)]
octeontx2-af: RPM: fix stale RSFEC counters
The earlier patch sets the 'Stats control register' for RPM
receive/transmit statistics instead of RSFEC statistics,
causing the driver to return stale FEC counters.
Fixes: 84ad3642115d ("octeontx2-af: Add FEC stats for RPM/RPM_USX block") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The current design stores the FIFO length in a common structure for all
RPMs (mac_ops). As a result, the FIFO length of the last RPM is applied
to all RPMs, leading to reduced network performance.
This patch resolved the problem by storing the fifo length in per MAC
structure (cgx).
Fixes: b9d0fedc6234 ("octeontx2-af: cn10kb: Add RPM_USX MAC support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Maxime Chevallier [Fri, 22 Nov 2024 14:12:55 +0000 (15:12 +0100)]
net: stmmac: dwmac-socfpga: Set RX watchdog interrupt as broken
On DWMAC3 and later, there's a RX Watchdog interrupt that's used for
interrupt coalescing. It's known to be buggy on some platforms, and
dwmac-socfpga appears to be one of them. Changing the interrupt
coalescing from ethtool doesn't appear to have any effect here.
Without disabling RIWT (Received Interrupt Watchdog Timer, I
believe...), we observe latencies while receiving traffic that amount to
around ~0.4ms. This was discovered with NTP but can be easily reproduced
with a simple ping. Without this patch :
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.657 ms
With this patch :
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.254 ms
Rosen Penev [Thu, 21 Nov 2024 19:31:52 +0000 (11:31 -0800)]
net: mdio-ipq4019: add missing error check
If an optional resource is found but fails to remap, return on failure.
Avoids any potential problems when using the iomapped resource as the
assumption is that it's available.
Fixes: 23a890d493e3 ("net: mdio: Add the reset function for IPQ MDIO driver") Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241121193152.8966-1-rosenp@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
ipv6: fix temporary address not removed correctly
Currently the temporary address is not removed when mngtmpaddr is deleted
or becomes unmanaged. The patch set fixed this issue and add a related
test.
v2:
1) delete the tempaddrs directly instead of remove them in addrconf_verify_rtnl(Sam Edwards)
2) Update the test case by checking the address including, add Sam in SOB (Sam Edwards)
====================
Hangbin Liu [Wed, 20 Nov 2024 09:51:08 +0000 (09:51 +0000)]
selftests/rtnetlink.sh: add mngtempaddr test
Add a test to check the temporary address could be added/removed
correctly when mngtempaddr is set or removed/unmanaged.
Signed-off-by: Sam Edwards <cfsworks@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Wed, 20 Nov 2024 09:51:07 +0000 (09:51 +0000)]
net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged
RFC8981 section 3.4 says that existing temporary addresses must have their
lifetimes adjusted so that no temporary addresses should ever remain "valid"
or "preferred" longer than the incoming SLAAC Prefix Information. This would
strongly imply in Linux's case that if the "mngtmpaddr" address is deleted or
un-flagged as such, its corresponding temporary addresses must be cleared out
right away.
But now the temporary address is renewed even after ‘mngtmpaddr’ is removed
or becomes unmanaged as manage_tempaddrs() set temporary addresses
prefered/valid time to 0, and later in addrconf_verify_rtnl() all checkings
failed to remove the addresses. Fix this by deleting the temporary address
directly for these situations.
Fixes: 778964f2fdf0 ("ipv6/addrconf: fix timing bug in tempaddr regen") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
Correcting switch hardware versions and reported speeds
This patch set mainly involves correcting switch hardware versions and
reported speeds.
Details are as follows:
1. Refactor the rtase_check_mac_version_valid() function.
2. Correct the speed for RTL907XD-V1
3. Corrects error handling of the rtase_check_mac_version_valid()
v1 -> v2:
- Add Fixes: tag.
- Add defines for hardware version id.
- Modify the error message for an invalid hardware version ID.
v2 -> v3:
- Remove the patch "Add support for RTL907XD-VA PCIe port".
v3 -> v4:
- Modify commit message to describe the main reason for the fix.
v4 -> v5
- Integrate the addition of defines for hardware version ID into the patch
"rtase: Refactor the rtase_check_mac_version_valid() function."
====================
Justin Lai [Wed, 20 Nov 2024 07:56:24 +0000 (15:56 +0800)]
rtase: Corrects error handling of the rtase_check_mac_version_valid()
Previously, when the hardware version ID was determined to be invalid,
only an error message was printed without any further handling. Therefore,
this patch makes the necessary corrections to address this.
Fixes: a36e9f5cfe9e ("rtase: Add support for a pci table in this module") Signed-off-by: Justin Lai <justinlai0215@realtek.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Justin Lai [Wed, 20 Nov 2024 07:56:23 +0000 (15:56 +0800)]
rtase: Correct the speed for RTL907XD-V1
Previously, the reported speed was uniformly set to SPEED_5000, but the
RTL907XD-V1 actually operates at a speed of SPEED_10000. Therefore, this
patch makes the necessary correction.
Fixes: dd7f17c40fd1 ("rtase: Implement ethtool function") Signed-off-by: Justin Lai <justinlai0215@realtek.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Justin Lai [Wed, 20 Nov 2024 07:56:22 +0000 (15:56 +0800)]
rtase: Refactor the rtase_check_mac_version_valid() function
Different hardware requires different configurations, but this distinction
was not made previously. Additionally, the error message was not clear
enough. Therefore, this patch will address the issues mentioned above.
Fixes: a36e9f5cfe9e ("rtase: Add support for a pci table in this module") Signed-off-by: Justin Lai <justinlai0215@realtek.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Sidraya Jayagond [Tue, 19 Nov 2024 15:22:19 +0000 (16:22 +0100)]
s390/iucv: MSG_PEEK causes memory leak in iucv_sock_destruct()
Passing MSG_PEEK flag to skb_recv_datagram() increments skb refcount
(skb->users) and iucv_sock_recvmsg() does not decrement skb refcount
at exit.
This results in skb memory leak in skb_queue_purge() and WARN_ON in
iucv_sock_destruct() during socket close. To fix this decrease
skb refcount by one if MSG_PEEK is set in order to prevent memory
leak and WARN_ON.
By forcing memory allocation failures in idr_alloc_32, syzbot is able
to provoke a condition where idr_is_empty returns false despite there
being no items in the IDR. This turns out to be because the radix tree
of the IDR contains only internal radix-tree nodes and it is this that
causes idr_is_empty to return false. The internal nodes are cleaned by
idr_destroy.
Use idr_for_each to check that the IDR is empty instead of
idr_is_empty to avoid the problem.
Reported-by: syzbot+332fe1e67018625f63c9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=332fe1e67018625f63c9 Fixes: 73d33bd063c4 ("l2tp: avoid using drain_workqueue in l2tp_pre_exit_net") Signed-off-by: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/20241118140411.1582555-1-jchapman@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 19 Nov 2024 22:44:32 +0000 (14:44 -0800)]
selftests: net: test extacks in netlink dumps
Test that extacks in dumps work. The test fills up the receive buffer
to test both the inline dump (as part of sendmsg()) and delayed one
(run during recvmsg()).
Use YNL helpers to parse the messages. We need to add the test to YNL
file to make sure the right include path are used.
Jakub Kicinski [Tue, 19 Nov 2024 22:44:31 +0000 (14:44 -0800)]
netlink: fix false positive warning in extack during dumps
Commit under fixes extended extack reporting to dumps.
It works under normal conditions, because extack errors are
usually reported during ->start() or the first ->dump(),
it's quite rare that the dump starts okay but fails later.
If the dump does fail later, however, the input skb will
already have the initiating message pulled, so checking
if bad attr falls within skb->data will fail.
Switch the check to using nlh, which is always valid.
syzbot found a way to hit that scenario by filling up
the receive queue. In this case we initiate a dump
but don't call ->dump() until there is read space for
an skb.
Reported-by: syzbot+d4373fa8042c06cefa84@syzkaller.appspotmail.com Fixes: 8af4f60472fc ("netlink: support all extack types in dumps") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241119224432.1713040-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guenter Roeck [Tue, 19 Nov 2024 21:32:02 +0000 (13:32 -0800)]
net: microchip: vcap: Add typegroup table terminators in kunit tests
VCAP API unit tests fail randomly with errors such as
# vcap_api_iterator_init_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:387
Expected 134 + 7 == iter.offset, but
134 + 7 == 141 (0x8d)
iter.offset == 17214 (0x433e)
# vcap_api_iterator_init_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:388
Expected 5 == iter.reg_idx, but
iter.reg_idx == 702 (0x2be)
# vcap_api_iterator_init_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:389
Expected 11 == iter.reg_bitpos, but
iter.reg_bitpos == 15 (0xf)
# vcap_api_iterator_init_test: pass:0 fail:1 skip:0 total:1
Comments in the code state that "A typegroup table ends with an all-zero
terminator". Add the missing terminators.
Some of the typegroups did have a terminator of ".offset = 0, .width = 0,
.value = 0,". Replace those terminators with "{ }" (no trailing ',') for
consistency and to excplicitly state "this is a terminator".
Fixes: 67d637516fa9 ("net: microchip: sparx5: Adding KUNIT test for the VCAP API") Cc: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241119213202.2884639-1-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Mon, 18 Nov 2024 14:03:51 +0000 (15:03 +0100)]
net: usb: lan78xx: Fix refcounting and autosuspend on invalid WoL configuration
Validate Wake-on-LAN (WoL) options in `lan78xx_set_wol` before calling
`usb_autopm_get_interface`. This prevents USB autopm refcounting issues
and ensures the adapter can properly enter autosuspend when invalid WoL
options are provided.
Pavan Chebbi [Tue, 19 Nov 2024 05:57:41 +0000 (21:57 -0800)]
tg3: Set coherent DMA mask bits to 31 for BCM57766 chipsets
The hardware on Broadcom 1G chipsets have a known limitation
where they cannot handle DMA addresses that cross over 4GB.
When such an address is encountered, the hardware sets the
address overflow error bit in the DMA status register and
triggers a reset.
However, BCM57766 hardware is setting the overflow bit and
triggering a reset in some cases when there is no actual
underlying address overflow. The hardware team analyzed the
issue and concluded that it is happening when the status
block update has an address with higher (b16 to b31) bits
as 0xffff following a previous update that had lowest bits
as 0xffff.
To work around this bug in the BCM57766 hardware, set the
coherent dma mask from the current 64b to 31b. This will
ensure that upper bits of the status block DMA address are
always at most 0x7fff, thus avoiding the improper overflow
check described above. This work around is intended for only
status block and ring memories and has no effect on TX and
RX buffers as they do not require coherent memory.
Eric Dumazet [Thu, 21 Nov 2024 19:41:05 +0000 (19:41 +0000)]
rtnetlink: fix rtnl_dump_ifinfo() error path
syzbot found that rtnl_dump_ifinfo() could return with a lock held [1]
Move code around so that rtnl_link_ops_put() and put_net()
can be called at the end of this function.
[1]
WARNING: lock held when returning to user space! 6.12.0-rc7-syzkaller-01681-g38f83a57aa8e #0 Not tainted
syz-executor399/5841 is leaving the kernel with locks still held!
1 lock held by syz-executor399/5841:
#0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
#0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
#0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rtnl_link_ops_get+0x22/0x250 net/core/rtnetlink.c:555
Fixes: 43c7ce69d28e ("rtnetlink: Protect struct rtnl_link_ops with SRCU.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241121194105.3632507-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Sat, 16 Nov 2024 13:05:58 +0000 (14:05 +0100)]
net: usb: lan78xx: Fix memory leak on device unplug by freeing PHY device
Add calls to `phy_device_free` after `fixed_phy_unregister` to fix a
memory leak that occurs when the device is unplugged. This ensures
proper cleanup of pseudo fixed-link PHYs.
Fixes: 89b36fb5e532 ("lan78xx: Lan7801 Support for Fixed PHY") Cc: Raghuram Chary J <raghuramchary.jallipalli@microchip.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241116130558.1352230-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In lan78xx_probe(), the buffer `buf` was being freed twice: once
implicitly through `usb_free_urb(dev->urb_intr)` with the
`URB_FREE_BUFFER` flag and again explicitly by `kfree(buf)`. This caused
a double free issue.
To resolve this, reordered `kmalloc()` and `usb_alloc_urb()` calls to
simplify the initialization sequence and removed the redundant
`kfree(buf)`. Now, `buf` is allocated after `usb_alloc_urb()`, ensuring
it is correctly managed by `usb_fill_int_urb()` and freed by
`usb_free_urb()` as intended.
Heiner Kallweit [Sat, 16 Nov 2024 20:52:15 +0000 (21:52 +0100)]
net: phy: ensure that genphy_c45_an_config_eee_aneg() sees new value of phydev->eee_cfg.eee_enabled
This is a follow-up to 41ffcd95015f ("net: phy: fix phylib's dual
eee_enabled") and resolves an issue with genphy_c45_an_config_eee_aneg()
(called from genphy_c45_ethtool_set_eee) not seeing the new value of
phydev->eee_cfg.eee_enabled.
Fixes: 49168d1980e2 ("net: phy: Add phy_support_eee() indicating MAC support EEE") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reported-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 21 Nov 2024 16:28:08 +0000 (08:28 -0800)]
Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core:
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of
rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infrastructure is guarded by the
CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code:
- Add small file operations for debugfs, to reduce the struct ops
size.
- Refactoring and optimization for the implementation of page_frag
API, This is a preparatory work to consolidate the page_frag
implementation.
Netfilter:
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users the
option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent CI
improvements.
BPF:
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
- Add a fastpath for some TCP timers that usually expires after
close, the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state
lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per
device neigh lists.
Driver API:
- Introduce a unified interface to configure transmission H/W
shaping, and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling:
- forwarding: introduce deferred commands, to simplify the cleanup
phase
Drivers:
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- add support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Add representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- add support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implement page pool support
- macsec:
- inherit lower device's features and TSO limits when
offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- add dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- add clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature"
* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
mm: page_frag: fix a compile error when kernel is not compiled
Documentation: tipc: fix formatting issue in tipc.rst
selftests: nic_performance: Add selftest for performance of NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex states
selftests: nic_link_layer: Add link layer selftest for NIC driver
bnxt_en: Add FW trace coredump segments to the coredump
bnxt_en: Add a new ethtool -W dump flag
bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
bnxt_en: Add functions to copy host context memory
bnxt_en: Do not free FW log context memory
bnxt_en: Manage the FW trace context memory
bnxt_en: Allocate backing store memory for FW trace logs
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
bnxt_en: Refactor bnxt_free_ctx_mem()
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
bnxt_en: Update firmware interface spec to 1.10.3.85
selftests/bpf: Add some tests with sockmap SK_PASS
bpf: fix recursive lock when verdict program return SK_PASS
wireguard: device: support big tcp GSO
wireguard: selftests: load nf_conntrack if not present
...
- Support BPF objects of either endianness in libbpf (Tony Ambardar)
- Add ksym to struct_ops trampoline to fix stack trace (Xu Kuohai)
- Introduce private stack for eligible BPF programs (Yonghong Song)
- Migrate samples/bpf tests to selftests/bpf test_progs (Daniel T. Lee)
- Migrate test_sock to selftests/bpf test_progs (Jordan Rife)
* tag 'bpf-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (152 commits)
libbpf: Change hash_combine parameters from long to unsigned long
selftests/bpf: Fix build error with llvm 19
libbpf: Fix memory leak in bpf_program__attach_uprobe_multi
bpf: use common instruction history across all states
bpf: Add necessary migrate_disable to range_tree.
bpf: Do not alloc arena on unsupported arches
selftests/bpf: Set test path for token/obj_priv_implicit_token_envvar
selftests/bpf: Add a test for arena range tree algorithm
bpf: Introduce range_tree data structure and use it in bpf arena
samples/bpf: Remove unused variable in xdp2skb_meta_kern.c
samples/bpf: Remove unused variables in tc_l2_redirect_kern.c
bpftool: Cast variable `var` to long long
bpf, x86: Propagate tailcall info only for subprogs
bpf: Add kernel symbol for struct_ops trampoline
bpf: Use function pointers count as struct_ops links count
bpf: Remove unused member rcu from bpf_struct_ops_map
selftests/bpf: Add struct_ops prog private stack tests
bpf: Support private stack for struct_ops progs
selftests/bpf: Add tracing prog private stack tests
bpf, x86: Support private stack in jit
...
Linus Torvalds [Wed, 20 Nov 2024 23:44:56 +0000 (15:44 -0800)]
Merge tag 'soc-defconfig-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC defconfig updates from Arnd Bergmann:
"As usual, a few newly added device drivers get enabled in the arm32
multi_v7_defconfig and arm64 defconfig as well as a few of the SoC
specific config files.
The main visible change is the inclusion of (reduced) debug info by
default in the 32-bit defconfig"
* tag 'soc-defconfig-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: imx_v6_v7_defconfig: Enable drivers for Kobo Clara 2E
arm64: defconfig: Enable VBATTB clock and Renesas RTCA-3
arm64: defconfig: Enable PCF857X GPIO expander
arm64: defconfig: Enable sc7280 clock controllers
ARM: configs: at91: enable PAC1934 driver as module
ARM: multi_v7_defconfig: Enable debugging symbols by default
Linus Torvalds [Wed, 20 Nov 2024 23:40:54 +0000 (15:40 -0800)]
Merge tag 'soc-drivers-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
"Nothing particular important in the SoC driver updates, just the usual
improvements to for drivers/soc and a couple of subsystems that don't
fit anywhere else:
- The largest set of updates is for Qualcomm SoC drivers, extending
the set of supported features for additional SoCs in the QSEECOM,
LLCC and socinfo drivers.a
- The ti_sci firmware driver gains support for power managment
- The drivers/reset subsystem sees a rework of the microchip sparx5
and amlogic reset drivers to support additional chips, plus a few
minor updates on other platforms
- The SCMI firmware interface driver gains support for two protocol
extensions, allowing more flexible use of the shared memory area
and new DT binding properties for configurability.
- Mediatek SoC drivers gain support for power managment on the MT8188
SoC and a new driver for DVFS.
- The AMD/Xilinx ZynqMP SoC drivers gain support for system reboot
and a few bugfixes
- The Hisilicon Kunpeng HCCS driver gains support for configuring
lanes through sysfs
Finally, there are cleanups and minor fixes for drivers/{soc, bus,
memory}, including changing back the .remove_new callback to .remove,
as well as a few other updates for freescale (powerpc) soc drivers,
NXP i.MX soc drivers, cznic turris platform driver, memory controller
drviers, TI OMAP SoC drivers, and Tegra firmware drivers"
* tag 'soc-drivers-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (116 commits)
soc: fsl: cpm1: qmc: Set the ret error code on platform_get_irq() failure
soc: fsl: rcpm: fix missing of_node_put() in copy_ippdexpcr1_setting()
soc: fsl: cpm1: tsa: switch to for_each_available_child_of_node_scoped()
platform: cznic: turris-omnia-mcu: Rename variable holding GPIO line names
platform: cznic: turris-omnia-mcu: Document the driver private data structure
firmware: turris-mox-rwtm: Document the driver private data structure
bus: Switch back to struct platform_driver::remove()
soc: qcom: ice: Remove the device_link field in qcom_ice
drm/msm/adreno: Setup SMMU aparture for per-process page table
firmware: qcom: scm: Introduce CP_SMMU_APERTURE_ID
firmware: arm_scpi: Check the DVFS OPP count returned by the firmware
soc: qcom: socinfo: add IPQ5424/IPQ5404 SoC ID
dt-bindings: arm: qcom,ids: add SoC ID for IPQ5424/IPQ5404
soc: qcom: llcc: Flip the manual slice configuration condition
dt-bindings: firmware: qcom,scm: Document sm8750 SCM
firmware: qcom: uefisecapp: Allow X1E Devkit devices
misc: lan966x_pci: Fix dtc warn 'Missing interrupt-parent'
misc: lan966x_pci: Fix dtc warns 'missing or empty reg/ranges property'
soc: qcom: llcc: Add LLCC configuration for the QCS8300 platform
dt-bindings: cache: qcom,llcc: Document the QCS8300 LLCC
...
Linus Torvalds [Wed, 20 Nov 2024 23:26:46 +0000 (15:26 -0800)]
Merge tag 'soc-dt-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC devicetree updates from Arnd Bergmann:
"This release adds the devicetree files for an impressive number of new
SoC variants, though as expected these are all related to others we
already support:
- The microchip sam9x7 devicetree is now added, after the device
driver and platform code has already made it in. This is likely the
last ARMv5 (!) platform to ever get added, updating the 20+ year
old at91/sam9 platform with DDR3 memory and gigabit ethernet.
- On the Apple platform, there are now devicetree files for a number
of A-series SoCs in addition to the M-series ones, these are used
primarily in phones and tablets, but are closely related to the
already supported chips.
- Samsung Exynos 8895 and Exynos 990 are more phone SoCs used in
older Samsung Galaxy phones.
- Qualcomm Snapdragon 778G (SM7325) is another phone SoC, closely
related to the Snapdragon 7c+ Gen 3 (SC7280) used in low-end
laptops.
- Rockchip RK3528 and RK3576 are new variants of their TV box and
Tablet chips, still using the older ARMv8.0 cores from
RK3328/RK3399 but with a newer process and other improvements from
the RK35xx (otherwise ARMv8.2) chips. RK3566T and RK3399-S are also
added, these are just lower-cost versions of their normal
counterparts.
- TI J742S2 is a feature-reduced version of the J784s4
industrial/automotive SoC, with fewer CPU cores.
- Sophgo SG2002 is an embedded SoC with one RISC-V (C906) and one ARM
(Cortex-A53) core, at this point support is only added for running
on the RISC-V side on the LicheeRV Nano board.
A total of 92 new .dts files describing individual machines is added,
which must be a new record. The majority of these is for the newly
added chips above, notably all the Apple phones and tablets. The other
new machines include nine industrial/embedded boards with NXP i.MX6 or
i.MX8 SoCs, eight for Rockchips RK35XX and one or two each for
Rockchips RV1109, RK3308, Allwinner A33, Tegra 234, Qualcomm
qcs9100/sc8280xp/x1e80100, TI AM625 and Starfive JH7110.
As usual there are also many newly added features in existing boards
as well as cleanups and minor bugfixes"
* tag 'soc-dt-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (718 commits)
arm64: dts: apm: Remove unused and undocumented "bus_num" property
arm: dts: spear13xx: Remove unused and undocumented "pl022,slave-tx-disable" property
arm64: dts: amd: Remove unused and undocumented "amd,zlib-support" property
arm64: dts: lg131x: Update spi clock properties
arm64: dts: seattle: Update spi clock properties
arm64: dts: rockchip: use less broad pinctrl for pcie3x1 on Radxa E25
arm64: dts: rockchip: add Radxa ROCK 5C
dt-bindings: arm: rockchip: add Radxa ROCK 5C
arm64: dts: rockchip: orangepi-5-plus: Enable GPU
arm64: dts: rockchip: enable USB3 on NanoPC-T6
arm64: dts: rockchip: adapt regulator nodenames to preferred form
arm64: dts: rockchip: Enable HDMI display for rk3588 Cool Pi GenBook
arm64: dts: rockchip: Enable HDMI display for rk3588 Cool Pi 4B
arm64: dts: rockchip: Enable HDMI0 for rk3588 Cool Pi CM5 EVB
arm64: dts: rockchip: Enable HDMI on NanoPi R6C/R6S
arm64: dts: rockchip: Enable GPU on NanoPi R6C/R6S
arm64: dts: rockchip: Enable HDMI on Hardkernel ODROID-M2
arm64: dts: rockchip: Remove non-removable flag from sdmmc on rk3576-sige5
arm64: dts: allwinner: a100: perf1: Add eMMC and MMC node
arm64: dts: allwinner: pinephone: Add mount matrix to accelerometer
...
Linus Torvalds [Wed, 20 Nov 2024 23:13:02 +0000 (15:13 -0800)]
Merge tag 'asm-generic-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are a number of unrelated cleanups, generally simplifying the
architecture specific header files:
- A series from Al Viro simplifies asm/vga.h, after it turns out that
most of it can be generalized.
- A series from Julian Vetter adds a common version of
memcpy_{to,from}io() and memset_io() and changes most architectures
to use that instead of their own implementation
- A series from Niklas Schnelle concludes his work to make PC style
inb()/outb() optional
- Nicolas Pitre contributes improvements for the generic do_div()
helper
- Christoph Hellwig adds a generic version of page_to_phys() and
phys_to_page(), replacing the slightly different architecture
specific definitions.
- Uwe Kleine-Koenig has a minor cleanup for ioctl definitions"
* tag 'asm-generic-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (24 commits)
empty include/asm-generic/vga.h
sparc: get rid of asm/vga.h
asm/vga.h: don't bother with scr_mem{cpy,move}v() unless we need to
vt_buffer.h: get rid of dead code in default scr_...() instances
tty: serial: export serial_8250_warn_need_ioport
lib/iomem_copy: fix kerneldoc format style
hexagon: simplify asm/io.h for !HAS_IOPORT
loongarch: Use new fallback IO memcpy/memset
csky: Use new fallback IO memcpy/memset
arm64: Use new fallback IO memcpy/memset
New implementation for IO memcpy and IO memset
watchdog: Add HAS_IOPORT dependency for SBC8360 and SBC7240
__arch_xprod64(): make __always_inline when optimizing for performance
ARM: div64: improve __arch_xprod_64()
asm-generic/div64: optimize/simplify __div64_const32()
lib/math/test_div64: add some edge cases relevant to __div64_const32()
asm-generic: add an optional pfn_valid check to page_to_phys
asm-generic: provide generic page_to_phys and phys_to_page implementations
asm-generic/io.h: Remove I/O port accessors for HAS_IOPORT=n
tty: serial: handle HAS_IOPORT dependencies
...
Linus Torvalds [Wed, 20 Nov 2024 22:51:08 +0000 (14:51 -0800)]
Merge tag 'ipe-pr-20241119' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull IPE update from Fan Wu:
"One commit from Colin Ian King, which removes unnecessary error
handling code in the IPE boot policy generation helper program"
* tag 'ipe-pr-20241119' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
scripts: ipe: polgen: remove redundant close and error exit path
Linus Torvalds [Wed, 20 Nov 2024 22:13:28 +0000 (14:13 -0800)]
Merge tag 'efi-next-for-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Just some cleanups and bug fixes this time around:
- Align handling of the compiled-in command line with the core kernel
- Measure the initrd into the TPM also when it was loaded via the EFI
file I/O protocols
- Clean up TPM event log handling
- Sanity check the EFI memory attributes table, and apply it after
kexec too
- Assorted other fixes"
* tag 'efi-next-for-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: Fix memory leak in efivar_ssdt_load
efi/libstub: Take command line overrides into account for loaded files
efi/libstub: Fix command line fallback handling when loading files
efi/libstub: Parse builtin command line after bootloader provided one
x86/efi: Apply EFI Memory Attributes after kexec
x86/efi: Drop support for the EFI_PROPERTIES_TABLE
efi/memattr: Ignore table if the size is clearly bogus
efi/zboot: Fix outdated comment about using LoadImage/StartImage
efi/libstub: Free correct pointer on failure
libstub,tpm: do not ignore failure case when reading final event log
tpm: fix unsigned/signed mismatch errors related to __calc_tpm2_event_size
tpm: do not ignore memblock_reserve return value
tpm: fix signed/unsigned bug when checking event logs
efi/libstub: measure initrd to PCR9 independent of source
efi/libstub: remove unnecessary cmd_line_len from efi_convert_cmdline()
efi/libstub: fix efi_parse_options() ignoring the default command line
Linus Torvalds [Wed, 20 Nov 2024 22:07:55 +0000 (14:07 -0800)]
Merge tag 'platform-drivers-x86-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Ilpo Järvinen:
- alienware WMAX thermal interface support
- Split ACPI and platform device based amd/hsmp drivers
- AMD X3D frequency/cache mode switching support
- asus thermal policy fixes
- Disable C1 auto-demotion in suspend to allow entering the deepest
C-states
- Fix volume buttons on Thinkpad X12 Detachable Tablet Gen 1
- Replace intel_scu_ipc "workaround" with 32-bit IO
- Correct *_show() function error handling in panasonic-laptop
- Gemini Lake P2SB devfn correction
- think-lmi Admin/System certificate authentication support
- Disable WMI devices for shutdown, refactoring continues
- Vexia EDU ATLA 10 tablet support
- Surface Pro 9 5G (Arm/QCOM) support
- Misc cleanups / refactoring / improvements
* tag 'platform-drivers-x86-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (69 commits)
platform/x86: p2sb: Cache correct PCI bar for P2SB on Gemini Lake
platform/x86: panasonic-laptop: Return errno correctly in show callback
Documentation: alienware-wmi: Describe THERMAL_INFORMATION operation 0x02
alienware-wmi: create_thermal_profile() no longer brute-forces IDs
alienware-wmi: Adds support to Alienware x17 R2
alienware-wmi: extends the list of supported models
alienware-wmi: order alienware_quirks[] alphabetically
platform/x86/intel/pmt: allow user offset for PMT callbacks
platform/x86/amd/hsmp: Change the error type
platform/x86/amd/hsmp: Add new error code and error logs
platform/x86/amd: amd_3d_vcache: Add sysfs ABI documentation
platform/x86/amd: amd_3d_vcache: Add AMD 3D V-Cache optimizer driver
intel-hid: fix volume buttons on Thinkpad X12 Detachable Tablet Gen 1
platform/x86/amd/hsmp: mark hsmp_msg_desc_table[] as maybe_unused
platform/x86: asus-wmi: Use platform_profile_cycle()
platform/x86: asus-wmi: Fix inconsistent use of thermal policies
platform/x86: hp: hp-bioscfg: remove redundant if statement
MAINTAINERS: Update ISHTP ECLITE maintainer entry
platform/x86: x86-android-tablets: Add support for Vexia EDU ATLA 10 tablet
platform/x86: x86-android-tablets: Add support for getting i2c_adapter by PCI parent devname()
...
Linus Torvalds [Wed, 20 Nov 2024 22:05:34 +0000 (14:05 -0800)]
Merge tag 'media/v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
- uvcvideo: Skip parsing frames of type UVC_VS_UNDEFINED in
uvc_parse_format
* tag 'media/v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: uvcvideo: Skip parsing frames of type UVC_VS_UNDEFINED in uvc_parse_format
Linus Torvalds [Wed, 20 Nov 2024 21:57:40 +0000 (13:57 -0800)]
Merge tag 'hid-for-linus-2024111801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- improvement in the way hid-bpf coexists with specific drivers (others
than hid-generic) that are already bound to devices (Benjamin
Tissoires)
- removal of three way-too-aggressive BUG_ON()s from HID drivers (He
Lugang)
- assorted cleanups and small code fixes to HID core (Dmitry Torokhov,
Yan Zhen, Nathan Chancellor, Andy Shevchenko)
- support for Corsair Void headset family (Stuart Hayhurst)
- Support for Goodix GT7986U SPI (Charles Wang)
- initial vendor-specific driver for Kysona, currently adding support
for Kysona M600 (Lode Willems)
- other assorted code cleanups and small bugfixes all over the place
* tag 'hid-for-linus-2024111801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (40 commits)
HID: multitouch: make mt_set_mode() less cryptic
HID: hid-goodix-spi: Add OF supports
dt-bindings: input: Goodix GT7986U SPI HID Touchscreen
HID: hyperv: streamline driver probe to avoid devres issues
HID: magicmouse: Apple Magic Trackpad 2 USB-C driver support
HID: rmi: Add select RMI4_F3A in Kconfig
HID: wacom: Interpret tilt data from Intuos Pro BT as signed values
HID: steelseries: Add capacity_level mapping
HID: steelseries: Fix battery requests stopping after some time
HID: hid-goodix: Fix HID get/set feature operation overwritten problem
HID: hid-goodix: Return 0 when receiving an empty HID feature package
HID: bpf: drop use of Logical|Physical|UsageRange
HID: bpf: Fix Rapoo M50 Plus Silent side buttons
HID: bpf: Fix NKRO on Mistel MD770
HID: replace BUG_ON() with WARN_ON()
HID: wacom: Set eraser status when either 'Eraser' or 'Invert' usage is set
HID: Kysona: add basic online status
HID: Kysona: check battery status every 5s using a workqueue
HID: Kysona: Add basic battery reporting for Kysona M600
HID: Add IDs for Kysona
...
Linus Torvalds [Wed, 20 Nov 2024 21:19:25 +0000 (13:19 -0800)]
Merge tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"Bindings:
- Enable dtc "interrupt_provider" warnings for binding examples. Fix
the warnings in fsl,mu-msi and ti,sci-inta due to this.
- Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton, and
altr,fpga-passive-serial to DT schema format
- Add some documentation on the different forms of YAML text blocks
which are a constant source of review comments
- Fix some schema errors in constraints for arrays
- Add compatibles for qcom,sar2130p-pdc and onnn,adt7462
DT core:
- Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
- Add some warnings on deprecated address handling
- Rework early_init_dt_scan() so the arch can pass in the phys
address of the DTB as __pa() is not always valid to use. This fixes
a warning for arm64 with kexec.
- Add and use some new DT graph iterators for iterating over ports
and endpoints
- Rework reserved-memory handling to be sized dynamically for fixed
regions
- Optimize of_modalias() to avoid a strlen() call
- Constify struct device_node and property pointers where ever
possible"
* tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (36 commits)
of: Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
dt-bindings: interrupt-controller: qcom,pdc: Add SAR2130P compatible
of/address: Rework bus matching to avoid warnings
of: WARN on deprecated #address-cells/#size-cells handling
of/fdt: Don't use default address cell sizes for address translation
dt-bindings: Enable dtc "interrupt_provider" warnings
of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
dt-bindings: cache: qcom,llcc: Fix X1E80100 reg entries
dt-bindings: watchdog: convert zii,rave-sp-wdt.txt to yaml format
dt-bindings: input: convert zii,rave-sp-pwrbutton.txt to yaml
media: xilinx-tpg: use new of_graph functions
fbdev: omapfb: use new of_graph functions
gpu: drm: omapdrm: use new of_graph functions
ASoC: audio-graph-card2: use new of_graph functions
ASoC: audio-graph-card: use new of_graph functions
ASoC: test-component: use new of_graph functions
of: property: use new of_graph functions
of: property: add of_graph_get_next_port_endpoint()
of: property: add of_graph_get_next_port()
of: module: remove strlen() call in of_modalias()
...
Linus Torvalds [Wed, 20 Nov 2024 20:55:41 +0000 (12:55 -0800)]
Merge tag 'auxdisplay-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay
Pull auxdisplay update from Andy Shevchenko:
- Move Holtek 16k33 driver to use agnostic i2c_get_match_data()
- Miscellaneuous cleanups
* tag 'auxdisplay-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay:
auxdisplay: Remove unused functions
auxdisplay: ht16k33: Make use of i2c_get_match_data()
auxdisplay: Drop explicit initialization of struct i2c_device_id::driver_data to 0
Linus Torvalds [Wed, 20 Nov 2024 20:51:32 +0000 (12:51 -0800)]
Merge tag 'mmc-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Add support for Ultra Capacity SD cards (SDUC, 2TB to 128TB)
- Add support for Ultra High-Speed II SD cards (UHS-II)
- Use a reset control for pwrseq_simple
- Add SD card quirk for broken poweroff notification
- Use GFP_NOIO for SD ACMD22
MMC host:
- bcm2835: Introduce proper clock handling
- mtk-sd: Add support for the Host-Software-Queue interface
- mtk-sd: Add support for the mt7988/mt8196 variants
- mtk-sd: Fix a couple of error paths in ->probe()
- sdhci: Add interface to support UHS-II SD cards
- sdhci_am654: Fixup support for changing the signal voltage level
- sdhci-cadence: Add support for the Microchip PIC64GX variant
- sdhci-esdhc-imx: Add support for eMMC HW-reset
- sdhci-msm: Add support for the X1E80100/IPQ5424/SAR2130P/QCS615 variants
- sdhci-of-arasan: Add support for eMMC HW-reset
- sdhci-pci-gli: Add UHS-II support for the GL9767/GL9755 variants
MEMSTICK:
- A couple of minor updates"
* tag 'mmc-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (78 commits)
mmc: pwrseq_simple: Handle !RESET_CONTROLLER properly
mmc: mtk-sd: Fix MMC_CAP2_CRYPTO flag setting
mmc: mtk-sd: Fix error handle of probe function
mmc: core: Correction a warning caused by incorrect type in assignment for UHS-II
mmc: sdhci-esdhc-imx: Update esdhc sysctl dtocv bitmask
mmc: sdhci-esdhc-imx: Implement emmc hardware reset
mmc: core: Correct type in variable assignment for UHS-II
mmc: sdhci-uhs2: correction a warning caused by incorrect type in argument
mmc: sdhci-uhs2: Remove unnecessary variables
mmc: sdhci-uhs2: Correct incorrect type in argument
mmc: sdhci: Make MMC_SDHCI_UHS2 config symbol invisible
mmc: sdhci-uhs2: Remove unnecessary NULL check
mmc: core: Fix error paths for UHS-II card init and re-init
mmc: core: Add error handling of sd_uhs2_power_up()
mmc: core: Simplify sd_uhs2_power_up()
mmc: bcm2835: Introduce proper clock handling
mmc: bcm2835: Fix type of current clock speed
dt-bindings: mmc: Add sdhci compatible for QCS615
mmc: core: Use GFP_NOIO in ACMD22
dt-bindings: mmc: sdhci-msm: Add SAR2130P compatible
...
Linus Torvalds [Wed, 20 Nov 2024 20:44:59 +0000 (12:44 -0800)]
Merge tag 'pmdomain-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain updates from Ulf Hansson:
"pmdomain core:
- Set the required dev for a required OPP during genpd attach
- Add support for required OPPs to dev_pm_domain_attach_list()
pmdomain providers:
- ti: Enable GENPD_FLAG_ACTIVE_WAKEUP flag for ti_sci PM domains
- mediatek: Add support for MT6735 PM domains
- mediatek: Use OF-specific regulator API to get power domain supply
- qcom: Add support for the SM8750/SAR2130P/qcs615/qcs8300 rpmhpds
pmdomain consumers:
- Convert a couple of consumer drivers to
*_pm_domain_attach|detach_list()
opp core:
- Rework and cleanup some code that manages required OPPs
- Remove *_opp_attach|detach_genpd()"
* tag 'pmdomain-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: (25 commits)
pmdomain: qcom: rpmhpd: Add rpmhpd support for SM8750
dt-bindings: power: qcom,rpmpd: document the SM8750 RPMh Power Domains
pmdomain: imx: Use of_property_present() for non-boolean properties
pmdomain: imx: gpcv2: replace dev_err() with dev_err_probe()
pmdomain: ti-sci: Use scope based of_node_put() to simplify code.
pmdomain: ti-sci: Add missing of_node_put() for args.np
pmdomain: ti-sci: set the GENPD_FLAG_ACTIVE_WAKEUP flag for all PM domains
pmdomain: mediatek: Add support for MT6735
pmdomain: qcom: rpmhpd: add support for SAR2130P
dt-bindings: power: Add binding for MediaTek MT6735 power controller
dt-bindings: power: rpmpd: Add SAR2130P compatible
OPP: Drop redundant *_opp_attach|detach_genpd()
cpufreq: qcom-nvmem: Convert to dev_pm_domain_attach|detach_list()
media: venus: Convert into devm_pm_domain_attach_list() for OPP PM domain
drm/tegra: gr3d: Convert into devm_pm_domain_attach_list()
OPP: Drop redundant code in _link_required_opps()
pmdomain: core: Set the required dev for a required OPP during genpd attach
pmdomain: core: Manage the default required OPP from a separate function
PM: domains: Support required OPPs in dev_pm_domain_attach_list()
OPP: Rework _set_required_devs() to manage a single device per call
...
Linus Torvalds [Wed, 20 Nov 2024 20:41:03 +0000 (12:41 -0800)]
Merge tag 'pwrseq-updates-for-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull power sequencing updates from Bartosz Golaszewski:
- extend support for the wcn6855 model in the pwrseq-qcom-wcn driver
- make this driver depend on CONFIG_OF=y as it uses some very
OF-specific interfaces and depends on phandle parsing
* tag 'pwrseq-updates-for-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
power: sequencing: qcom-wcn: improve support for wcn6855
power: sequencing: make the QCom PMU pwrseq driver depend on CONFIG_OF
Linus Torvalds [Wed, 20 Nov 2024 20:37:06 +0000 (12:37 -0800)]
Merge tag 'gpio-updates-for-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"Three new drivers, support for some new models in existing ones and
lots of various tweaks and improvements across the board (switching to
using recommended APIs, code shrink and simplification, etc.).
Also a new feature in the character device uAPI where we now notify
the user-space about changes triggered by in-kernel users as well, not
only when they were done by other user-space agents.
Summary:
GPIOLIB core:
- use the new mem_is_zero() instead of memchr_inv(s, 0, n)
- don't store debounce period twice needlessly
- clean-up debugfs handling
- remove leftover comments referring to no longer used spinlocks
- unduplicate some operations like SRCU locks and initializing GPIO
descriptors
- constify the sysfs class struct
- use lock guards in GPIO sysfs code
- update GPIO uAPI internal flags all at once atomically for
consistency with other places
- modify the behavior of the sysfs interface by no longer exporting
lines that are named inside the driver code or board files with the
sysfs links bearing the line names as this has for many years been
largely unused due to the prevalence of DT, ACPI and firmware nodes
over board files and made the API inconsistent
- for GPIO interrupt providers: free irqs that are still requested by
users when removing the chip
GPIO uAPI:
- notify user-space about changes to GPIO lines' state (requested,
released, reconfigured) triggered from the kernel as well (until
now we'd only do this for changes triggered from user-space)
- to that end: modify the internal workings of the notification
mechanism by switching to an atomic notifier which allows us to
send events from atomic context
- also to that end store the debounce period in the GPIO descriptor
struct and not in the character device context struct
- while at it, also cover the corner-case of users introducing
changes over sysfs while others watch them via the character device
- don't report GPIO lines requested as interrupts as "used" to
user-space as it can still request them as GPIOs
New drivers:
- GPIO part of the MFD Congatec Board Controller
- PolarFire GPIO controller
- GPIOs on FTDI FT2232H
Driver improvements:
- use generic device property accessors instead of OF-specific ones
across many GPIO drivers (mpc8xxx, vf610, eic-sprd, davinci,
ts4900, xilinx, mvebu)
- use devres helpers to simplify error paths and either shrink or
entirely remove the driver's remove() callback (grgpio, amdpt,
menz127, max730x, ftgpio010, 74x164, ljca)
- use helper variables to store the address of pdev->dev and avoid
some line-breaks
- use device_for_each_child_node_scoped() to avoid having to put the
fwnode on breaks or errors (gpio-sim, gpio-dwapb, gpiolib-acpi)
- use a scoped bitmap to simplify the code and drop goto labels in
gpio-aggregator
- drop unneeded Kconfig dependencies on OF_GPIO (grgpio, mveby,
xilinx)
- add support for new models to gpio-aspeed, gpio-rockchip and
gpio-dwapb
- clean-up ACPI handling and some other bits in gpio-xgene-sb
- replace deprecated PCI functions in pcie-idio-24 and pci-idio-16
- allow to build davinci and mvebu drivers with COMPILE_TEST=y
- remove dead code in gpio-mb86s7x
- switch back to using platform_driver::remove() (after the
conversion to remove_new()) across the GPIO drivers
- remove remaining uses of GPIOF_ACTIVE_LOW across the tree and drop
this deprecated symbol
- convert the gpio-altera driver to no longer pull in the deprecated
legacy-of-mm-gpiochip.h header
- use of_property_present() instead of of_property_read_bool() in
gpiolib-of and gpio-rockchip
- allow to build the tegra186 driver on Tegra234 platforms in Kconfig
Late fixes:
- add a missing return value check after devm_kasprintf() to
gpio-grgpio
DT bindings:
- document the ngpios property of gpio-mmio
- add support for a new aspeed model
- fix the example for st,nomadik-gpio
Other:
- kernel doc and comments tweaks
- fix typos in TODO
- reorder headers alphabetically in some drivers
- fix incorrect format specifiers in gpio tools"
* tag 'gpio-updates-for-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (98 commits)
gpio: tegra186: Allow to enable driver on Tegra234
gpio: grgpio: Add NULL check in grgpio_probe
tools: gpio: Fix several incorrect format specifiers
gpio: mpfs: add CoreGPIO support
gpio: rockchip: support new version GPIO
gpio: rockchip: change the GPIO version judgment logic
gpio: rockchip: explan the format of the GPIO version ID
gpiolib: cdev: use !mem_is_zero() instead of memchr_inv(s, 0, n)
MAINTAINERS: add gpio driver to PolarFire entry
gpio: Get rid of GPIOF_ACTIVE_LOW
USB: gadget: pxa27x_udc: Avoid using GPIOF_ACTIVE_LOW
pcmcia: soc_common: Avoid using GPIOF_ACTIVE_LOW
leds: gpio: Avoid using GPIOF_ACTIVE_LOW
Input: gpio_keys_polled - avoid using GPIOF_ACTIVE_LOW
Input: gpio_keys - avoid using GPIOF_ACTIVE_LOW
gpio: Use of_property_present() for non-boolean properties
gpio: mpfs: add polarfire soc gpio support
gpio: altera: Drop legacy-of-mm-gpiochip.h header
gpio: pcie-idio-24: Replace deprecated PCI functions
gpio: pci-idio-16: Replace deprecated PCI functions
...
Linus Torvalds [Wed, 20 Nov 2024 20:34:22 +0000 (12:34 -0800)]
Merge tag 'pwm/for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm updates from Uwe Kleine-König:
"This contains a new abstraction for PWM waveforms that is more
expressive that the legacy one. Compared to the old abstraction it
contains a duty_offset member instead of polarity. This new
abstraction is already used in an ADC driver merged into the iio tree.
The new API requires changes to the lowlevel drivers. For now there
are two drivers that are converted to the new API (axi-pwmgen and
stm32). Converted drivers continue to work with the old API. Drivers
not yet converted only work with the older API.
Otherwise it's the usual collection of fixes, cleanups and dt doc
updates.
This time around thanks go to Andy Shevchenko, Clark Wang, Conor
Dooley, David Lechner, Dimitri Fedrau, Frank Li, Jun Li, Kelvin Zhang,
Krzysztof Kozlowski, Nuno Sa, Shen Lichuan and Trevor Gamblin for code
contributions, testing and review"
* tag 'pwm/for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: Assume a disabled PWM to emit a constant inactive output
pwm: core: export pwm_get_state_hw()
pwm: core: use device_match_name() instead of strcmp(dev_name(...
dt-bindings: pwm: adi,axi-pwmgen: Increase #pwm-cells to 3
pwm: imx27: Use clk_bulk_*() API to simplify clock handling
pwm: imx27: Workaround of the pwm output bug when decrease the duty cycle
pwm: axi-pwmgen: Enable FORCE_ALIGN by default
pwm: axi-pwmgen: Rename 0x10 register
dt-bindings: pwm: amlogic: Document C3 PWM
pwm: axi-pwmgen: Create a dedicated function for getting driver data from a chip
pwm: atmel-tcb: Use min() macro
pwm: stm32: Fix error checking for a regmap_read() call
pwm: Add kernel doc for members added to pwm_ops recently
pwm: Reorder symbols in core.c
pwm: stm32: Implementation of the waveform callbacks
pwm: axi-pwmgen: Implementation of the waveform callbacks
pwm: Add tracing for waveform callbacks
pwm: Provide new consumer API functions for waveforms
pwm: New abstraction for PWM waveforms
pwm: Add more locking
Linus Torvalds [Wed, 20 Nov 2024 20:23:06 +0000 (12:23 -0800)]
Merge tag 'spi-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"The only real core work we've got this time around is the completion
of the transition to the new host/target naming for the core APIs,
Kconfig still needs doing but that's a lot less invasive.
Otherwise the big changes are the new drivers that have been added:
- Completion of the conversion to spi_alloc_host()/_target() and
removal of the old naming.
- Cleanups for Rockchip drivers, these brought in a new logging
helper in the driver core for warnings during probe.
- Support for configuration of the word delay via spidev_test.
- Support for AMD HID2 controllers, Apple SPI controller and Realtek
SPI-NAND controllers"
* tag 'spi-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (58 commits)
spi: imx: support word delay
spi: imx: pass struct spi_transfer to prepare_transfer()
spi: cs42l43: Add GPIO speaker id support to the bridge configuration
spi: Delete useless checks
spi: apple: Remove unnecessary .owner for apple_spi_driver
spi: spidev_test: add support for word delay
spi: apple: Add driver for Apple SPI controller
spi: dt-bindings: apple,spi: Add binding for Apple SPI controllers
spi: Use of_property_present() for non-boolean properties
spi: zynqmp-gqspi: Undo runtime PM changes at driver exit time​
spi: spi-mem: rtl-snand: Correctly handle DMA transfers
spi: tegra210-quad: Avoid shift-out-of-bounds
spi: axi-spi-engine: Emit trace events for spi transfers
dt-bindings: spi: sprd,sc9860-spi: convert to YAML
spi: Replace deprecated PCI functions
spi: dt-bindings: samsung: Add a compatible for samsung,exynos8895-spi
spi: spi-mem: Add Realtek SPI-NAND controller
dt-bindings: spi: Add realtek,rtl9301-snand
spi: make class structs const
spi: dt-bindings: brcm,bcm2835-aux-spi: Convert to dtschema
...
Linus Torvalds [Wed, 20 Nov 2024 20:18:50 +0000 (12:18 -0800)]
Merge tag 'regulator-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"This was a quite quiet release for regulators on the drivers front,
but we do have two small but useful core improvements:
- Improve handling of cases where DT platforms specify both DT and
init_data based configuration for a single regulator.
- A helper of_regulator_get_optional() to simplify writing generic
bindings for regulator consumers in subsystems.
There's also some YAML conversions for the DT bindings which are a big
part of the diffstat"
* tag 'regulator-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: qcom,rpmh: Correct PM8550VE supplies
regulator: Switch back to struct platform_driver::remove()
regulator: doc: remove documentation comment for regulator_init
regulator: doc: add missing documentation for init_cb
regulator: rk808: Restrict DVS GPIOs to the RK808 variant only
regulator: rk808: Use dev_err_probe() in the probe path
regulator: rk808: Perform trivial code cleanups
regulator: dt-bindings: qcom,qca6390-pmu: add more properties for wcn6855
regulator: dt-bindings: lltc,ltc3676: convert to YAML
regulator: core: Use fsleep() to get best sleep mechanism
regulator: core: remove machine init callback from config
regulator: core: add callback to perform runtime init
regulator: core: do not silently ignore provided init_data
regulator: max5970: Drop unused structs
regulator: dt-bindings: vctrl-regulator: convert to YAML
regulator: qcom-smd: make smd_vreg_rpm static
regulator: Call of_node_put() only once in rzg2l_usb_vbus_regulator_probe()
regulator: isl6271a: Drop explicit initialization of struct i2c_device_id::driver_data to 0
regulator: Add devres version of of_regulator_get_optional()
regulator: Add of_regulator_get_optional() for pure DT regulator lookup
Linus Torvalds [Wed, 20 Nov 2024 20:09:47 +0000 (12:09 -0800)]
Merge tag 'regmap-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"The main thing for regmap this time around is some improvements of the
lockdep annotations which stop some false positives. We also have one
new helper for setting a bitmask to the same value, and several test
improvements"
* tag 'regmap-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: provide regmap_assign_bits()
regmap: irq: Set lockdep class for hierarchical IRQ domains
regmap: maple: Provide lockdep (sub)class for maple tree's internal lock
regmap: kunit: Fix repeated test param
regcache: Improve documentation of available cache types
regmap: Specifically test writing 0 as a value to sparse caches
regmap-irq: Consistently use memset32() in regmap_irq_thread()
Linus Torvalds [Wed, 20 Nov 2024 19:54:39 +0000 (11:54 -0800)]
Merge tag 'linux_kselftest-next-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
"timer test:
- remove duplicate defines
- fixes to improve error reporting
rtc test:
- check rtc alarm status in alarm test
resctrl test:
- add array overrun checks during iMC config parsing code and when
reading strings
- fixes and reorganizing code"
* tag 'linux_kselftest-next-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (23 commits)
selftests/resctrl: Replace magic constants used as array size
selftests/resctrl: Keep results from first test run
selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
selftests/resctrl: Use cache size to determine "fill_buf" buffer size
selftests/resctrl: Ensure measurements skip initialization of default benchmark
selftests/resctrl: Make benchmark parameter passing robust
selftests/resctrl: Remove unused measurement code
selftests/resctrl: Only support measured read operation
selftests/resctrl: Remove "once" parameter required to be false
selftests/resctrl: Make wraparound handling obvious
selftests/resctrl: Protect against array overflow when reading strings
selftests/resctrl: Protect against array overrun during iMC config parsing
selftests/resctrl: Fix memory overflow due to unhandled wraparound
selftests/resctrl: Print accurate buffer size as part of MBM results
selftests/resctrl: Make functions only used in same file static
selftests: Add a test mangling with uc_sigmask
selftests: Rename sigaltstack to generic signal
selftest: rtc: Add to check rtc alarm status for alarm related test
selftests:timers: remove local CLOCKID defines
selftests: timers: Remove unneeded semicolon
...
Linus Torvalds [Wed, 20 Nov 2024 19:47:43 +0000 (11:47 -0800)]
Merge tag 'kgdb-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"A relatively modest collection of changes:
- Adopt kstrtoint() and kstrtol() instead of the simple_strtoXX
family for better error checking of user input.
- Align the print behavour when breakpoints are enabled and disabled
by adopting the current behaviour of breakpoint disable for both.
- Remove some of the (rather odd and user hostile) hex fallbacks and
require kdb users to prefix with 0x instead.
- Tidy up (and fix) control code handling in kdb's keyboard code.
This makes the control code handling at the keyboard behave the
same way as it does via the UART.
- Switch my own entry in MAINTAINERS to my @kernel.org address"
* tag 'kgdb-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: fix ctrl+e/a/f/b/d/p/n broken in keyboard mode
MAINTAINERS: Use Daniel Thompson's korg address for kgdb work
kdb: Fix breakpoint enable to be silent if already enabled
kdb: Remove fallback interpretation of arbitrary numbers as hex
trace: kdb: Replace simple_strtoul with kstrtoul in kdb_ftdump
kdb: Replace the use of simple_strto with safer kstrto in kdb_main
Linus Torvalds [Wed, 20 Nov 2024 19:34:10 +0000 (11:34 -0800)]
Merge tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Restructure the function graph shadow stack to prepare it for use
with kretprobes
With the goal of merging the shadow stack logic of function graph and
kretprobes, some more restructuring of the function shadow stack is
required.
Move out function graph specific fields from the fgraph
infrastructure and store it on the new stack variables that can pass
data from the entry callback to the exit callback.
Hopefully, with this change, the merge of kretprobes to use fgraph
shadow stacks will be ready by the next merge window.
- Make shadow stack 4k instead of using PAGE_SIZE.
Some architectures have very large PAGE_SIZE values which make its
use for shadow stacks waste a lot of memory.
- Give shadow stacks its own kmem cache.
When function graph is started, every task on the system gets a
shadow stack. In the future, shadow stacks may not be 4K in size.
Have it have its own kmem cache so that whatever size it becomes will
still be efficient in allocations.
- Initialize profiler graph ops as it will be needed for new updates to
fgraph
- Convert to use guard(mutex) for several ftrace and fgraph functions
- Add more comments and documentation
- Show function return address in function graph tracer
Add an option to show the caller of a function at each entry of the
function graph tracer, similar to what the function tracer does.
- Abstract out ftrace_regs from being used directly like pt_regs
ftrace_regs was created to store a partial pt_regs. It holds only the
registers and stack information to get to the function arguments and
return values. On several archs, it is simply a wrapper around
pt_regs. But some users would access ftrace_regs directly to get the
pt_regs which will not work on all archs. Make ftrace_regs an
abstract structure that requires all access to its fields be through
accessor functions.
- Show how long it takes to do function code modifications
When code modification for function hooks happen, it always had the
time recorded in how long it took to do the conversion. But this
value was never exported. Recently the code was touched due to new
ROX modification handling that caused a large slow down in doing the
modifications and had a significant impact on boot times.
Expose the timings in the dyn_ftrace_total_info file. This file was
created a while ago to show information about memory usage and such
to implement dynamic function tracing. It's also an appropriate file
to store the timings of this modification as well. This will make it
easier to see the impact of changes to code modification on boot up
timings.
- Other clean ups and small fixes
* tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits)
ftrace: Show timings of how long nop patching took
ftrace: Use guard to take ftrace_lock in ftrace_graph_set_hash()
ftrace: Use guard to take the ftrace_lock in release_probe()
ftrace: Use guard to lock ftrace_lock in cache_mod()
ftrace: Use guard for match_records()
fgraph: Use guard(mutex)(&ftrace_lock) for unregister_ftrace_graph()
fgraph: Give ret_stack its own kmem cache
fgraph: Separate size of ret_stack from PAGE_SIZE
ftrace: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
selftests/ftrace: Fix check of return value in fgraph-retval.tc test
ftrace: Use arch_ftrace_regs() for ftrace_regs_*() macros
ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regs
ftrace: Make ftrace_regs abstract from direct use
fgragh: No need to invoke the function call_filter_check_discard()
fgraph: Simplify return address printing in function graph tracer
function_graph: Remove unnecessary initialization in ftrace_graph_ret_addr()
function_graph: Support recording and printing the function return address
ftrace: Have calltime be saved in the fgraph storage
ftrace: Use a running sleeptime instead of saving on shadow stack
fgraph: Use fgraph data to store subtime for profiler
...
Linus Torvalds [Wed, 20 Nov 2024 18:08:00 +0000 (10:08 -0800)]
Merge tag 'sched_ext-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Improve the default select_cpu() implementation making it topology
aware and handle WAKE_SYNC better.
- set_arg_maybe_null() was used to inform the verifier which ops args
could be NULL in a rather hackish way. Use the new __nullable CFI
stub tags instead.
- On Sapphire Rapids multi-socket systems, a BPF scheduler, by
hammering on the same queue across sockets, could live-lock the
system to the point where the system couldn't make reasonable forward
progress.
This could lead to soft-lockup triggered resets or stalling out
bypass mode switch and thus BPF scheduler ejection for tens of
minutes if not hours. After trying a number of mitigations, the
following set worked reliably:
- Injecting artificial cpu_relax() loops in two places while
sched_ext is trying to turn on the bypass mode.
- Triggering scheduler ejection when soft-lockup detection is
imminent (a quarter of threshold left).
While not the prettiest, the impact both in terms of code complexity
and overhead is minimal.
- A common complaint on the API is the overuse of the word "dispatch"
and the confusion around "consume". This is due to how the dispatch
queues became more generic over time. Rename the affected kfuncs for
clarity. Thanks to BPF's compatibility features, this change can be
made in a way that's both forward and backward compatible. The
compatibility code will be dropped in a few releases.
- Other misc changes
* tag 'sched_ext-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (21 commits)
sched_ext: Replace scx_next_task_picked() with switch_class() in comment
sched_ext: Rename scx_bpf_dispatch[_vtime]_from_dsq*() -> scx_bpf_dsq_move[_vtime]*()
sched_ext: Rename scx_bpf_consume() to scx_bpf_dsq_move_to_local()
sched_ext: Rename scx_bpf_dispatch[_vtime]() to scx_bpf_dsq_insert[_vtime]()
sched_ext: scx_bpf_dispatch_from_dsq_set_*() are allowed from unlocked context
sched_ext: add a missing rcu_read_lock/unlock pair at scx_select_cpu_dfl()
sched_ext: Clarify sched_ext_ops table for userland scheduler
sched_ext: Enable the ops breather and eject BPF scheduler on softlockup
sched_ext: Avoid live-locking bypass mode switching
sched_ext: Fix incorrect use of bitwise AND
sched_ext: Do not enable LLC/NUMA optimizations when domains overlap
sched_ext: Introduce NUMA awareness to the default idle selection policy
sched_ext: Replace set_arg_maybe_null() with __nullable CFI stub tags
sched_ext: Rename CFI stubs to names that are recognized by BPF
sched_ext: Introduce LLC awareness to the default idle selection policy
sched_ext: Clarify ops.select_cpu() for single-CPU tasks
sched_ext: improve WAKE_SYNC behavior for default idle CPU selection
sched_ext: Use btf_ids to resolve task_struct
sched/ext: Use tg_cgroup() to elieminate duplicate code
sched/ext: Fix unmatch trailing comment of CONFIG_EXT_GROUP_SCHED
...
Linus Torvalds [Wed, 20 Nov 2024 17:54:49 +0000 (09:54 -0800)]
Merge tag 'cgroup-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpu.stat now also shows niced CPU time
- Freezer and cpuset optimizations
- Other misc changes
* tag 'cgroup-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
cgroup/cpuset: Further optimize code if CONFIG_CPUSETS_V1 not set
cgroup/cpuset: Enforce at most one rebuild_sched_domains_locked() call per operation
cgroup/cpuset: Revert "Allow suppression of sched domain rebuild in update_cpumasks_hier()"
MAINTAINERS: remove Zefan Li
cgroup/freezer: Add cgroup CGRP_FROZEN flag update helper
cgroup/freezer: Reduce redundant traversal for cgroup_freeze
cgroup/bpf: only cgroup v2 can be attached by bpf programs
Revert "cgroup: Fix memory leak caused by missing cgroup_bpf_offline"
selftests/cgroup: Fix compile error in test_cpu.c
cgroup/rstat: Selftests for niced CPU statistics
cgroup/rstat: Tracking cgroup-level niced CPU time
cgroup/cpuset: Fix spelling errors in file kernel/cgroup/cpuset.c
Linus Torvalds [Wed, 20 Nov 2024 17:41:11 +0000 (09:41 -0800)]
Merge tag 'wq-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- The maximum concurrency limit of 512 which was set a long time ago is
too low now.
A legitimate use (BPF cgroup release) of system_wq could saturate it
under stress test conditions leading to false dependencies and
deadlocks.
While the offending use was switched to a dedicated workqueue, use
the opportunity to bump WQ_MAX_ACTIVE four fold and document that
system workqueue shouldn't be saturated. Workqueue should add at
least a warning mechanism for cases where system workqueues are
saturated.
- Recent workqueue updates to support more flexible execution topology
made unbound workqueues use per-cpu worker pool frontends which
pushed up workqueue flush overhead.
As consecutive CPUs are likely to be pointing to the same worker
pool, reduce overhead by switching locks only when necessary.
* tag 'wq-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Reduce expensive locks for unbound workqueue
workqueue: Adjust WQ_MAX_ACTIVE from 512 to 2048
workqueue: doc: Add a note saturating the system_wq is not permitted
Linus Torvalds [Wed, 20 Nov 2024 17:36:05 +0000 (09:36 -0800)]
Merge tag 'probes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
"Kprobes cleanups. Functionality does not change.
- kprobes: Cleanup the config comment
Adjust #endif comments.
- kprobes: Cleanup collect_one_slot() and __disable_kprobe()
Make fail fast to reduce code nested level.
- kprobes: Use struct_size() in __get_insn_slot()
Use struct_size() to avoid special macro.
- x86/kprobes: Cleanup kprobes on ftrace code
Use macro instead of direct field access/magic number, and avoid
redundant instruction pointer setting"
* tag 'probes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
x86/kprobes: Cleanup kprobes on ftrace code
kprobes: Use struct_size() in __get_insn_slot()
kprobes: Cleanup collect_one_slot() and __disable_kprobe()
kprobes: Cleanup the config comment
Linus Torvalds [Wed, 20 Nov 2024 17:33:23 +0000 (09:33 -0800)]
Merge tag 'livepatching-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching update from Petr Mladek:
- A new selftest for livepatching of a kprobed function
* tag 'livepatching-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: test livepatching a kprobed function
selftests: livepatch: save and restore kprobe state
selftests: livepatch: rename KLP_SYSFS_DIR to SYSFS_KLP_DIR
Linus Torvalds [Wed, 20 Nov 2024 17:21:11 +0000 (09:21 -0800)]
Merge tag 'printk-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Print more precise information about the printk log buffer memory
usage.
- Make sure that the sysrq title is shown on the console even when
deferred.
- Do not enable earlycon by `console=` which is meant to disable the
default console.
* tag 'printk-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: add dummy printk_force_console_enter/exit helpers
tty: sysrq: Use printk_force_console context on __handle_sysrq
printk: Introduce FORCE_CON flag
printk: Improve memory usage logging during boot
init: Don't proxy `console=` to earlycon
Linus Torvalds [Wed, 20 Nov 2024 17:16:45 +0000 (09:16 -0800)]
Merge tag 'docs-6.13' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Another moderately busy cycle in docsland:
- Work on Chinese translations has picked up again. Happily, they are
maintaining the existing translations and not just adding new ones.
- Some maintenance of the Japanese and Italian translations as well.
- The removal of the venerable "dontdiff" file. It has long outlived
its usefulness and contained entries ("parse.*") that would
actively mask actual source change.
- The addition of enforcement information to the code-of-conduct
documentation.
Along with some build-system fixes and a lot of typo and language
fixes"
* tag 'docs-6.13' of git://git.lwn.net/linux: (52 commits)
Documentation/CoC: spell out enforcement for unacceptable behaviors
docs: fix typos and whitespace in Documentation/process/backporting.rst
docs/zh_CN: fix one sentence in llvm.rst
docs: bug-bisect: add a note about bisecting -next
docs/zh_CN: add the translation of kbuild/llvm.rst
Documentation: Fix incorrect paths/magic in magic numbers rst
Documentation/maintainer-tip: Fix typos
Documentation: Improve crash_kexec_post_notifiers description
Docs/zh_CN: Translate physical_memory.rst to Simplified Chinese
Documentation: admin: reorganize kernel-parameters intro
docs/zh_CN: update the translation of process/programming-language.rst
docs/zh_CN: update the translation of mm/page_owner.rst
docs/zh_CN: update the translation of mm/page_table_check.rst
docs/zh_CN: update the translation of mm/overcommit-accounting.rst
docs/zh_CN: update the translation of mm/admon/faq.rst
docs/zh_CN: update the translation of mm/active_mm.rst
docs/zh_CN: update the translation of mm/hmm.rst
docs: remove Documentation/dontdiff
docs/zh_CN: Add a entry in Chinese glossary
Docs/zh_CN: Fix the pfn calculation error in page_tables.rst
...
Linus Torvalds [Wed, 20 Nov 2024 00:35:06 +0000 (16:35 -0800)]
Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
posix-timers are currently auto-rearmed by the kernel when the
signal of the timer is ignored so that the timer signal can be
delivered once the corresponding signal is unignored.
This requires to throttle the timer to prevent a DoS by small
intervals and keeps the system pointlessly out of low power states
for no value. This is a long standing non-trivial problem due to
the lock order of posix-timer lock and the sighand lock along with
life time issues as the timer and the sigqueue have different life
time rules.
Cure this by:
- Embedding the sigqueue into the timer struct to have the same
life time rules. Aside of that this also avoids the lookup of
the timer in the signal delivery and rearm path as it's just a
always valid container_of() now.
- Queuing ignored timer signals onto a seperate ignored list.
- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.
- Walking the ignored list when SIG_IGN is lifted and requeue the
signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.
This also required to consolidate the signal delivery rules so they
are consistent across all situations. With that all self test
scenarios finally succeed.
- Core infrastructure for VFS multigrain timestamping
This is required to allow the kernel to use coarse grained time
stamps by default and switch to fine grained time stamps when inode
attributes are actively observed via getattr().
These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
- Cleanup and consolidation of the sleep() infrastructure
- Move all sleep and timeout functions into one file
- Rework udelay() and ndelay() into proper documented inline
functions and replace the hardcoded magic numbers by proper
defines.
- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard coded time ranges which fail
to provide the requested accuracy on different HZ settings.
- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place
- Fixup a few usage sites
- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks
A system can have multiple PTP clocks which are participating in
seperate and independent PTP clock domains. So far the kernel only
considers the PTP clock which is based on CLOCK TAI relevant as
that's the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).
The non TAI based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They can't be accessed fast as they always go all the way out to
the hardware and they cannot be utilized in the kernel itself.
As Time Sensitive Networking (TSN) gains traction it is required to
provide fast user and kernel space access to these clocks.
The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access in a similar way how the
kernel provides access to clock MONOTONIC, REALTIME etc.
Instead of creating a duplicated infrastructure this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.
This allows to provide time accessors and adjtimex(2) functionality
for the independent PTP clocks in a subsequent step.
- Consolidate hrtimer initialization
hrtimers are set up by initializing the data structure and then
seperately setting the callback function for historical reasons.
That's an extra unnecessary step and makes Rust support less
straight forward than it should be.
Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.
The bulk of the htimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
- Drivers:
- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.
Otherwise CPUs on different clusters use their cluster specific
clocksource which is not guaranteed to be synchronized with
other clusters.
- Mostly boring cleanups, fixes, improvements and code movement"
* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...
Linus Torvalds [Wed, 20 Nov 2024 00:09:13 +0000 (16:09 -0800)]
Merge tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull vdso data page handling updates from Thomas Gleixner:
"First steps of consolidating the VDSO data page handling.
The VDSO data page handling is architecture specific for historical
reasons, but there is no real technical reason to do so.
Aside of that VDSO data has become a dump ground for various
mechanisms and fail to provide a clear separation of the
functionalities.
Clean this up by:
- consolidating the VDSO page data by getting rid of architecture
specific warts especially in x86 and PowerPC.
- removing the last includes of header files which are pulling in
other headers outside of the VDSO namespace.
- seperating timekeeping and other VDSO data accordingly.
Further consolidation of the VDSO page handling is done in subsequent
changes scheduled for the next merge window.
This also lays the ground for expanding the VDSO time getters for
independent PTP clocks in a generic way without making every
architecture add support seperately"
* tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/vdso: Add missing brackets in switch case
vdso: Rename struct arch_vdso_data to arch_vdso_time_data
powerpc: Split systemcfg struct definitions out from vdso
powerpc: Split systemcfg data out of vdso data page
powerpc: Add kconfig option for the systemcfg page
powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processors
powerpc/pseries/lparcfg: Fix printing of system_active_processors
powerpc/procfs: Propagate error of remap_pfn_range()
powerpc/vdso: Remove offset comment from 32bit vdso_arch_data
x86/vdso: Split virtual clock pages into dedicated mapping
x86/vdso: Delete vvar.h
x86/vdso: Access vdso data without vvar.h
x86/vdso: Move the rng offset to vsyscall.h
x86/vdso: Access rng vdso data without vvar.h
x86/vdso: Access timens vdso data without vvar.h
x86/vdso: Allocate vvar page from C code
x86/vdso: Access rng data from kernel without vvar
x86/vdso: Place vdso_data at beginning of vvar page
x86/vdso: Use __arch_get_vdso_data() to access vdso data
x86/mm/mmap: Remove arch_vma_name()
...
Linus Torvalds [Tue, 19 Nov 2024 23:54:19 +0000 (15:54 -0800)]
Merge tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Tree wide:
- Make nr_irqs static to the core code and provide accessor functions
to remove existing and prevent future aliasing problems with local
variables or function arguments of the same name.
Core code:
- Prevent freeing an interrupt in the devres code which is not
managed by devres in the first place.
- Use seq_put_decimal_ull_width() for decimal values output in
/proc/interrupts which increases performance significantly as it
avoids parsing the format strings over and over.
- Optimize raising the timer and hrtimer soft interrupts by using the
'set bit only' variants instead of the combined version which
checks whether ksoftirqd should be woken up. The latter is a
pointless exercise as both soft interrupts are raised in the
context of the timer interrupt and therefore never wake up
ksoftirqd.
- Delegate timer/hrtimer soft interrupt processing to a dedicated
thread on RT.
Timer and hrtimer soft interrupts are always processed in ksoftirqd
on RT enabled kernels. This can lead to high latencies when other
soft interrupts are delegated to ksoftirqd as well.
The separate thread allows to run them seperately under a RT
scheduling policy to reduce the latency overhead.
Drivers:
- New drivers or extensions of existing drivers to support Renesas
RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
chips
- Support for multi-cluster GICs on MIPS.
MIPS CPUs can come with multiple CPU clusters, where each CPU
cluster has its own GIC (Generic Interrupt Controller). This
requires to access the GIC of a remote cluster through a redirect
register block.
This is encapsulated into a set of helper functions to keep the
complexity out of the actual code paths which handle the GIC
details.
- Support for encrypted guests in the ARM GICV3 ITS driver
The ITS page needs to be shared with the hypervisor and therefore
must be decrypted.
- Small cleanups and fixes all over the place"
* tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
irqchip/riscv-aplic: Prevent crash when MSI domain is missing
genirq/proc: Use seq_put_decimal_ull_width() for decimal values
softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
timers: Use __raise_softirq_irqoff() to raise the softirq.
hrtimer: Use __raise_softirq_irqoff() to raise the softirq
riscv: defconfig: Enable T-HEAD C900 ACLINT SSWI drivers
irqchip: Add T-HEAD C900 ACLINT SSWI driver
dt-bindings: interrupt-controller: Add T-HEAD C900 ACLINT SSWI device
irqchip/stm32mp-exti: Use of_property_present() for non-boolean properties
irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASK
irqchip/mips-gic: Prevent indirect access to clusters without CPU cores
irqchip/mips-gic: Multi-cluster support
irqchip/mips-gic: Setup defaults in each cluster
irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic()
irqchip/mips-gic: Replace open coded online CPU iterations
genirq/irqdesc: Use str_enabled_disabled() helper in wakeup_show()
genirq/devres: Don't free interrupt which is not managed by devres
irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool()
irqchip/aspeed-intc: Add AST27XX INTC support
dt-bindings: interrupt-controller: Add support for ASPEED AST27XX INTC
...