Foster Snowhill [Sat, 25 Jan 2025 23:54:07 +0000 (00:54 +0100)]
usbnet: ipheth: break up NCM header size computation
Originally, the total NCM header size was computed as the sum of two
vaguely labelled constants. While accurate, it wasn't particularly clear
where they were coming from.
Use sizes of existing NCM structs where available. Define the total
NDP16 size based on the maximum amount of DPEs that can fit into the
iOS-specific fixed-size header.
This change does not fix any particular issue. Rather, it introduces
intermediate constants that will simplify subsequent commits.
It should also make it clearer for the reader where the constant values
come from.
Cc: stable@vger.kernel.org # 6.5.x Signed-off-by: Foster Snowhill <forst@pen.gy> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Foster Snowhill [Sat, 25 Jan 2025 23:54:06 +0000 (00:54 +0100)]
usbnet: ipheth: refactor NCM datagram loop
Introduce an rx_error label to reduce repetitions in the header
signature checks.
Store wDatagramIndex and wDatagramLength after endianness conversion to
avoid repeated le16_to_cpu() calls.
Rewrite the loop to return on a null trailing DPE, which is required
by the CDC NCM spec. In case it is missing, fall through to rx_error.
This change does not fix any particular issue. Its purpose is to
simplify a subsequent commit that fixes a potential OoB read by limiting
the maximum amount of processed DPEs.
Cc: stable@vger.kernel.org # 6.5.x Signed-off-by: Foster Snowhill <forst@pen.gy> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Foster Snowhill [Sat, 25 Jan 2025 23:54:05 +0000 (00:54 +0100)]
usbnet: ipheth: use static NDP16 location in URB
Original code allowed for the start of NDP16 to be anywhere within the
URB based on the `wNdpIndex` value in NTH16. Only the start position of
NDP16 was checked, so it was possible for even the fixed-length part
of NDP16 to extend past the end of URB, leading to an out-of-bounds
read.
On iOS devices, the NDP16 header always directly follows NTH16. Rely on
and check for this specific format.
This, along with NCM-specific minimal URB length check that already
exists, will ensure that the fixed-length part of NDP16 plus a set
amount of DPEs fit within the URB.
Note that this commit alone does not fully address the OoB read.
The limit on the amount of DPEs needs to be enforced separately.
Foster Snowhill [Sat, 25 Jan 2025 23:54:04 +0000 (00:54 +0100)]
usbnet: ipheth: check that DPE points past NCM header
By definition, a DPE points at the start of a network frame/datagram.
Thus it makes no sense for it to point at anything that's part of the
NCM header. It is not a security issue, but merely an indication of
a malformed DPE.
Enforce that all DPEs point at the data portion of the URB, past the
NCM header.
Thomas Weißschuh [Sat, 25 Jan 2025 09:28:38 +0000 (10:28 +0100)]
ptp: Properly handle compat ioctls
Pointer arguments passed to ioctls need to pass through compat_ptr() to
work correctly on s390; as explained in Documentation/driver-api/ioctl.rst.
Detect compat mode at runtime and call compat_ptr() for those commands
which do take pointer arguments.
Nikita Zhandarovich [Fri, 24 Jan 2025 09:30:20 +0000 (01:30 -0800)]
net: usb: rtl8150: enable basic endpoint checking
Syzkaller reports [1] encountering a common issue of utilizing a wrong
usb endpoint type during URB submitting stage. This, in turn, triggers
a warning shown below.
For now, enable simple endpoint checking (specifically, bulk and
interrupt eps, testing control one is not essential) to mitigate
the issue with a view to do other related cosmetic changes later,
if they are necessary.
Jakub Kicinski [Tue, 28 Jan 2025 00:16:31 +0000 (16:16 -0800)]
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-01-24 (idpf, ice, iavf)
For idpf:
Emil adds memory barrier when accessing control queue descriptors and
restores call to idpf_vc_xn_shutdown() when resetting.
Manoj Vishwanathan expands transaction lock to properly protect xn->salt
value and adds additional debugging information.
Marco Leogrande converts workqueues to be unbound.
For ice:
Przemek fixes incorrect size use for array.
Mateusz removes reporting of invalid parameter and value.
For iavf:
Michal adjusts some VLAN changes to occur without a PF call to avoid
timing issues with the calls.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: allow changing VLAN state without calling PF
ice: remove invalid parameter of equalizer
ice: fix ice_parser_rt::bst_key array size
idpf: add more info during virtchnl transaction timeout/salt mismatch
idpf: convert workqueues to unbound
idpf: Acquire the lock before accessing the xn->salt
idpf: fix transaction timeouts on reset
idpf: add read memory barrier when checking descriptor done bit
====================
1) Fix incrementing the upper 32 bit sequence numbers for GSO skbs.
From Jianbo Liu.
2) Fix an out-of-bounds read on xfrm state lookup.
From Florian Westphal.
3) Fix secpath handling on packet offload mode.
From Alexandre Cassen.
4) Fix the usage of skb->sk in the xfrm layer.
5) Don't disable preemption while looking up cache state
to fix PREEMPT_RT.
From Sebastian Sewior.
* tag 'ipsec-2025-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: Don't disable preemption while looking up cache state.
xfrm: Fix the usage of skb->sk
xfrm: delete intermediate secpath entry in packet offload mode
xfrm: state: fix out-of-bounds read during lookup
xfrm: replay: Fix the update of replay_esn->oseq_hi for GSO
====================
The root cause is the bad handling of disconnect() generated internally
by the MPTCP protocol in case of connect FASTOPEN errors.
Address the issue increasing the socket disconnect counter even on such
a case, to allow other threads waiting on the same socket lock to
properly error out.
With the in-kernel path-manager, it is possible to change the 'fullmesh'
flag. The code in mptcp_pm_nl_fullmesh() expects to change it only on
'subflow' endpoints, to recreate more or less subflows using the linked
address.
Unfortunately, the set_flags() hook was a bit more permissive, and
allowed 'implicit' endpoints to get the 'fullmesh' flag while it is not
allowed before.
That's what syzbot found, triggering the following warning:
Here, syzbot managed to set the 'fullmesh' flag on an 'implicit' and
used -- according to 'id_avail_bitmap' -- endpoint, causing the PM to
try decrement the local_addr_used counter which is only incremented for
the 'subflow' endpoint.
Note that 'no type' endpoints -- not 'subflow', 'signal', 'implicit' --
are fine, because their ID will not be marked as used in the 'id_avail'
bitmap, and setting 'fullmesh' can help forcing the creation of subflow
when receiving an ADD_ADDR.
Fixes: 73c762c1f07d ("mptcp: set fullmesh flag in pm_netlink") Cc: stable@vger.kernel.org Reported-by: syzbot+cd16e79c1e45f3fe0377@syzkaller.appspotmail.com Closes: https://lore.kernel.org/6786ac51.050a0220.216c54.00a6.GAE@google.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/540 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250123-net-mptcp-syzbot-issues-v1-2-af73258a726f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Thu, 23 Jan 2025 18:05:54 +0000 (19:05 +0100)]
mptcp: consolidate suboption status
MPTCP maintains the received sub-options status is the bitmask carrying
the received suboptions and in several bitfields carrying per suboption
additional info.
Zeroing the bitmask before parsing is not enough to ensure a consistent
status, and the MPTCP code has to additionally clear some bitfiled
depending on the actually parsed suboption.
The above schema is fragile, and syzbot managed to trigger a path where
a relevant bitfield is not cleared/initialized:
Local variable mp_opt created at:
mptcp_incoming_options+0x119/0x3d30 net/mptcp/options.c:1127
tcp_data_queue+0xb4/0x7be0 net/ipv4/tcp_input.c:5233
The current schema is too fragile; address the issue grouping all the
state-related data together and clearing the whole group instead of
just the bitmask. This also cleans-up the code a bit, as there is no
need to individually clear "random" bitfield in a couple of places
any more.
Chenyuan Yang [Thu, 23 Jan 2025 21:42:13 +0000 (15:42 -0600)]
net: davicom: fix UAF in dm9000_drv_remove
dm is netdev private data and it cannot be
used after free_netdev() call. Using dm after free_netdev()
can cause UAF bug. Fix it by moving free_netdev() at the end of the
function.
This is similar to the issue fixed in commit ad297cd2db89 ("net: qcom/emac: fix UAF in emac_remove").
This bug is detected by our static analysis tool.
Fixes: cf9e60aa69ae ("net: davicom: Fix regulator not turned off on driver removal") Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> CC: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/20250123214213.623518-1-chenyuan0y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Milos Reljin [Fri, 24 Jan 2025 10:41:02 +0000 (10:41 +0000)]
net: phy: c45-tjaxx: add delay between MDIO write and read in soft_reset
In application note (AN13663) for TJA1120, on page 30, there's a figure
with average PHY startup timing values following software reset.
The time it takes for SMI to become operational after software reset
ranges roughly from 500 us to 1500 us.
This commit adds 2000 us delay after MDIO write which triggers software
reset. Without this delay, soft_reset function returns an error and
prevents successful PHY init.
Shigeru Yoshida [Thu, 23 Jan 2025 14:57:46 +0000 (23:57 +0900)]
vxlan: Fix uninit-value in vxlan_vnifilter_dump()
KMSAN reported an uninit-value access in vxlan_vnifilter_dump() [1].
If the length of the netlink message payload is less than
sizeof(struct tunnel_msg), vxlan_vnifilter_dump() accesses bytes
beyond the message. This can lead to uninit-value access. Fix this by
returning an error in such situations.
David Howells [Thu, 23 Jan 2025 08:59:12 +0000 (08:59 +0000)]
rxrpc, afs: Fix peer hash locking vs RCU callback
In its address list, afs now retains pointers to and refs on one or more
rxrpc_peer objects. The address list is freed under RCU and at this time,
it puts the refs on those peers.
Now, when an rxrpc_peer object runs out of refs, it gets removed from the
peer hash table and, for that, rxrpc has to take a spinlock. However, it
is now being called from afs's RCU cleanup, which takes place in BH
context - but it is just taking an ordinary spinlock.
The put may also be called from non-BH context, and so there exists the
possibility of deadlock if the BH-based RCU cleanup happens whilst the hash
spinlock is held. This led to the attached lockdep complaint.
Fix this by changing spinlocks of rxnet->peer_hash_lock back to
BH-disabling locks.
================================
WARNING: inconsistent lock state
6.13.0-rc5-build2+ #1223 Tainted: G E
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff88810babe228 (&rxnet->peer_hash_lock){+.?.}-{3:3}, at: rxrpc_put_peer+0xcb/0x180
{SOFTIRQ-ON-W} state was registered at:
mark_usage+0x164/0x180
__lock_acquire+0x544/0x990
lock_acquire.part.0+0x103/0x280
_raw_spin_lock+0x2f/0x40
rxrpc_peer_keepalive_worker+0x144/0x440
process_one_work+0x486/0x7c0
process_scheduled_works+0x73/0x90
worker_thread+0x1c8/0x2a0
kthread+0x19b/0x1b0
ret_from_fork+0x24/0x40
ret_from_fork_asm+0x1a/0x30
irq event stamp: 972402
hardirqs last enabled at (972402): [<ffffffff8244360e>] _raw_spin_unlock_irqrestore+0x2e/0x50
hardirqs last disabled at (972401): [<ffffffff82443328>] _raw_spin_lock_irqsave+0x18/0x60
softirqs last enabled at (972300): [<ffffffff810ffbbe>] handle_softirqs+0x3ee/0x430
softirqs last disabled at (972313): [<ffffffff810ffc54>] __irq_exit_rcu+0x44/0x110
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&rxnet->peer_hash_lock);
<Interrupt>
lock(&rxnet->peer_hash_lock);
*** DEADLOCK ***
1 lock held by swapper/1/0:
#0: ffffffff83576be0 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire+0x7/0x30
Jan Stancek [Thu, 23 Jan 2025 12:38:51 +0000 (13:38 +0100)]
selftests: net/{lib,openvswitch}: extend CFLAGS to keep options from environment
Package build environments like Fedora rpmbuild introduced hardening
options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS
and LDFLAGS.
Some Makefiles currently override CFLAGS but not LDFLAGS, which leads
to a mismatch and build failure, for example:
/usr/bin/ld: /tmp/ccd2apay.o: relocation R_X86_64_32 against
`.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
make[1]: *** [../../lib.mk:222: tools/testing/selftests/net/lib/csum] Error 1
openvswitch/Makefile CFLAGS currently do not appear to be used, but
fix it anyway for the case when new tests are introduced in future.
Jan Stancek [Thu, 23 Jan 2025 08:35:42 +0000 (09:35 +0100)]
selftests: mptcp: extend CFLAGS to keep options from environment
Package build environments like Fedora rpmbuild introduced hardening
options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS
and LDFLAGS.
mptcp Makefile currently overrides CFLAGS but not LDFLAGS, which leads
to a mismatch and build failure, for example:
make[1]: *** [../../lib.mk:222: tools/testing/selftests/net/mptcp/mptcp_sockopt] Error 1
/usr/bin/ld: /tmp/ccqyMVdb.o: relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
Jakub Kicinski [Thu, 23 Jan 2025 23:16:20 +0000 (15:16 -0800)]
net: page_pool: don't try to stash the napi id
Page ppol tried to cache the NAPI ID in page pool info to avoid
having a dependency on the life cycle of the NAPI instance.
Since commit under Fixes the NAPI ID is not populated until
napi_enable() and there's a good chance that page pool is
created before NAPI gets enabled.
Protect the NAPI pointer with the existing page pool mutex,
the reading path already holds it. napi_id itself we need
to READ_ONCE(), it's protected by netdev_lock() which are
not holding in page pool.
This is the SET path, where we call GET to either check user request
against max values, or check if any of the settings will change.
The logic in netdevsim is trying to report the default (ENABLED)
if user has not requested any specific setting. The user setting
is recorded in dev->cfg, don't depend on kernel_ringparam being
pre-populated with it.
Fixes: 928459bbda19 ("net: ethtool: populate the default HDS params in the core") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+b3bcd80232d00091e061@syzkaller.appspotmail.com Tested-by: syzbot+b3bcd80232d00091e061@syzkaller.appspotmail.com Link: https://patch.msgid.link/20250123221410.1067678-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 23 Jan 2025 15:55:40 +0000 (07:55 -0800)]
MAINTAINERS: add Paul Fertser as a NC-SI reviewer
Paul has been providing very solid reviews for NC-SI changes
lately, so much so I started CCing him on all NC-SI patches.
Make the designation official.
====================
eth: fix calling napi_enable() in atomic context
Dan has reported that I missed a lot of drivers which call napi_enable()
in atomic with the naive coccinelle search for spin locks:
https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain
Fix them. Most of the fixes involve taking the netdev_lock()
before the spin lock. mt76 is special because we can just
move napi_enable() from the BH section.
local_bh_disable() is not a real lock, its most likely taken
because napi_schedule() requires that we invoke softirqs at
some point. napi_enable() needs to take a mutex, so move it
from under the BH protection.
Jakub Kicinski [Fri, 24 Jan 2025 03:18:36 +0000 (19:18 -0800)]
eth: forcedeth: remove local wrappers for napi enable/disable
The local helpers for calling napi_enable() and napi_disable()
don't serve much purpose and they will complicate the fix in
the subsequent patch. Remove them, call the core functions
directly.
Jakub Kicinski [Fri, 24 Jan 2025 03:18:35 +0000 (19:18 -0800)]
eth: tg3: fix calling napi_enable() in atomic context
tg3 has a spin lock protecting most of the config,
switch to taking netdev_lock() explicitly on enable/start
paths. Disable/stop paths seem to not be under the spin
lock (since napi_disable() already needs to sleep),
so leave that side as is.
tg3_restart_hw() releases and re-takes the spin lock,
we need to do the same because dev_close() needs to
take netdev_lock().
Jakub Kicinski [Fri, 24 Jan 2025 01:21:30 +0000 (17:21 -0800)]
tools: ynl: c: correct reverse decode of empty attrs
netlink reports which attribute was incorrect by sending back
an attribute offset. Offset points to the address of struct nlattr,
but to interpret the type we also need the nesting path.
Attribute IDs have different meaning in different nests
of the same message.
Correct the condition for "is the offset within current attribute".
ynl_attr_data_len() does not include the attribute header,
so the end offset was off by 4 bytes.
This means that we'd always skip over flags and empty nests.
The devmem tests, for example, issues an invalid request with
empty queue nests, resulting in the following error:
Thomas Weißschuh [Thu, 23 Jan 2025 07:22:40 +0000 (08:22 +0100)]
ptp: Ensure info->enable callback is always set
The ioctl and sysfs handlers unconditionally call the ->enable callback.
Not all drivers implement that callback, leading to NULL dereferences.
Example of affected drivers: ptp_s390.c, ptp_vclock.c and ptp_mock.c.
Instead use a dummy callback if no better was specified by the driver.
Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250123-ptp-enable-v1-1-b015834d3a47@weissschuh.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stanislav Fomichev [Thu, 23 Jan 2025 00:04:07 +0000 (16:04 -0800)]
net/mlx5e: add missing cpu_to_node to kvzalloc_node in mlx5e_open_xdpredirect_sq
kvzalloc_node is not doing a runtime check on the node argument
(__alloc_pages_node_noprof does have a VM_BUG_ON, but it expands to
nothing on !CONFIG_DEBUG_VM builds), so doing any ethtool/netlink
operation that calls mlx5e_open on a CPU that's larger that MAX_NUMNODES
triggers OOB access and panic (see the trace below).
Add missing cpu_to_node call to convert cpu id to node id.
Change ndo_set_mac_address to dev_set_mac_address because
dev_set_mac_address provides a way to notify network layer about MAC
change. In other case, services may not aware about MAC change and keep
using old one which set from network adapter driver.
As example, DHCP client from systemd do not update MAC address without
notification from net subsystem which leads to the problem with acquiring
the right address from DHCP server.
The way of selecting the first suitable MAC address from the list is
changed, instead of having the driver check it this patch just assumes
any valid MAC should be good.
Fixes: b8291cf3d118 ("net/ncsi: Add NC-SI 1.2 Get MC MAC Address command") Signed-off-by: Paul Fertser <fercerpav@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Swiatkowski [Thu, 5 Sep 2024 09:14:10 +0000 (11:14 +0200)]
iavf: allow changing VLAN state without calling PF
First case:
> ip l a l $VF name vlanx type vlan id 100
> ip l d vlanx
> ip l a l $VF name vlanx type vlan id 100
As workqueue can be execute after sometime, there is a window to have
call trace like that:
- iavf_del_vlan
- iavf_add_vlan
- iavf_del_vlans (wq)
It means that our VLAN 100 will change the state from IAVF_VLAN_ACTIVE
to IAVF_VLAN_REMOVE (iavf_del_vlan). After that in iavf_add_vlan state
won't be changed because VLAN 100 is on the filter list. The final
result is that the VLAN 100 filter isn't added in hardware (no
iavf_add_vlans call).
To fix that change the state if the filter wasn't removed yet directly
to active. It is save as IAVF_VLAN_REMOVE means that virtchnl message
wasn't sent yet.
Second case:
> ip l a l $VF name vlanx type vlan id 100
Any type of VF reset ex. change trust
> ip l s $PF vf $VF_NUM trust on
> ip l d vlanx
> ip l a l $VF name vlanx type vlan id 100
In case of reset iavf driver is responsible for readding all filters
that are being used. To do that all VLAN filters state are changed to
IAVF_VLAN_ADD. Here is even longer window for changing VLAN state from
kernel side, as workqueue isn't called immediately. We can have call
trace like that:
- changing to IAVF_VLAN_ADD (after reset)
- iavf_del_vlan (called from kernel ops)
- iavf_del_vlans (wq)
Not exsisitng VLAN filters will be removed from hardware. It isn't a
bug, ice driver will handle it fine. However, we can have call trace
like that:
- changing to IAVF_VLAN_ADD (after reset)
- iavf_del_vlan (called from kernel ops)
- iavf_add_vlan (called from kernel ops)
- iavf_del_vlans (wq)
With fix for previous case we end up with no VLAN filters in hardware.
We have to remove VLAN filters if the state is IAVF_VLAN_ADD and delete
VLAN was called. It is save as IAVF_VLAN_ADD means that virtchnl message
wasn't sent yet.
Mateusz Polchlopek [Tue, 31 Dec 2024 09:50:44 +0000 (10:50 +0100)]
ice: remove invalid parameter of equalizer
It occurred that in the commit 70838938e89c ("ice: Implement driver
functionality to dump serdes equalizer values") the invalid DRATE parameter
for reading has been added. The output of the command:
$ ethtool -d <ethX>
returns the garbage value in the place where DRATE value should be
stored.
Remove mentioned parameter to prevent return of corrupted data to
userspace.
Fixes: 70838938e89c ("ice: Implement driver functionality to dump serdes equalizer values") Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Przemek Kitszel [Thu, 19 Dec 2024 11:55:16 +0000 (12:55 +0100)]
ice: fix ice_parser_rt::bst_key array size
Fix &ice_parser_rt::bst_key size. It was wrongly set to 10 instead of 20
in the initial impl commit (see Fixes tag). All usage code assumed it was
of size 20. That was also the initial size present up to v2 of the intro
series [2], but halved by v3 [3] refactor described as "Replace magic
hardcoded values with macros." The introducing series was so big that
some ugliness was unnoticed, same for bugs :/
ICE_BST_KEY_TCAM_SIZE and ICE_BST_TCAM_KEY_SIZE were differing by one.
There was tmp variable @j in the scope of edited function, but was not
used in all places. This ugliness is now gone.
I'm moving ice_parser_rt::pg_prio a few positions up, to fill up one of
the holes in order to compensate for the added 10 bytes to the ::bst_key,
resulting in the same size of the whole as prior to the fix, and minimal
changes in the offsets of the fields.
Extend also the debug dump print of the key to cover all bytes. To not
have string with 20 "%02x" and 20 params, switch to
ice_debug_array_w_prefix().
Manoj Vishwanathan [Mon, 16 Dec 2024 16:27:35 +0000 (16:27 +0000)]
idpf: add more info during virtchnl transaction timeout/salt mismatch
Add more information related to the transaction like cookie, vc_op,
salt when transaction times out and include similar information
when transaction salt does not match.
Info output for transaction timeout:
-------------------
(op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
-------------------
Marco Leogrande [Mon, 16 Dec 2024 16:27:34 +0000 (16:27 +0000)]
idpf: convert workqueues to unbound
When a workqueue is created with `WQ_UNBOUND`, its work items are
served by special worker-pools, whose host workers are not bound to
any specific CPU. In the default configuration (i.e. when
`queue_delayed_work` and friends do not specify which CPU to run the
work item on), `WQ_UNBOUND` allows the work item to be executed on any
CPU in the same node of the CPU it was enqueued on. While this
solution potentially sacrifices locality, it avoids contention with
other processes that might dominate the CPU time of the processor the
work item was scheduled on.
This is not just a theoretical problem: in a particular scenario
misconfigured process was hogging most of the time from CPU0, leaving
less than 0.5% of its CPU time to the kworker. The IDPF workqueues
that were using the kworker on CPU0 suffered large completion delays
as a result, causing performance degradation, timeouts and eventual
system crash.
Tested:
* I have also run a manual test to gauge the performance
improvement. The test consists of an antagonist process
(`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
process is run under `taskset 01` to bind it to CPU0, and its
priority is changed with `chrt -pQ 9900 10000 ${pid}` and
`renice -n -20 ${pid}` after start.
Then, the IDPF driver is forced to prefer CPU0 by editing all calls
to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0.
Finally, `ktraces` for the workqueue events are collected.
Without the current patch, the antagonist process can force
arbitrary delays between `workqueue_queue_work` and
`workqueue_execute_start`, that in my tests were as high as
`30ms`. With the current patch applied, the workqueue can be
migrated to another unloaded CPU in the same node, and, keeping
everything else equal, the maximum delay I could see was `6us`.
Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Signed-off-by: Marco Leogrande <leogrande@google.com> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Emil Tantilov [Fri, 20 Dec 2024 02:09:32 +0000 (18:09 -0800)]
idpf: fix transaction timeouts on reset
Restore the call to idpf_vc_xn_shutdown() at the beginning of
idpf_vc_core_deinit() provided the function is not called on remove.
In the reset path the mailbox is destroyed, leading to all transactions
timing out.
Fixes: 09d0fb5cb30e ("idpf: deinit virtchnl transaction manager after vport and vectors") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Emil Tantilov [Fri, 22 Nov 2024 04:40:59 +0000 (20:40 -0800)]
idpf: add read memory barrier when checking descriptor done bit
Add read memory barrier to ensure the order of operations when accessing
control queue descriptors. Specifically, we want to avoid cases where loads
can be reordered:
1. Load #1 is dispatched to read descriptor flags.
2. Load #2 is dispatched to read some other field from the descriptor.
3. Load #2 completes, accessing memory/cache at a point in time when the DD
flag is zero.
4. NIC DMA overwrites the descriptor, now the DD flag is one.
5. Any fields loaded before step 4 are now inconsistent with the actual
descriptor state.
Add read memory barrier between steps 1 and 2, so that load #2 is not
executed until load #1 has completed.
Sebastian Sewior [Thu, 23 Jan 2025 16:20:45 +0000 (17:20 +0100)]
xfrm: Don't disable preemption while looking up cache state.
For the state cache lookup xfrm_input_state_lookup() first disables
preemption, to remain on the CPU and then retrieves a per-CPU pointer.
Within the preempt-disable section it also acquires
netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be
acquired with explicit disabled preemption (such as by get_cpu())
because this lock becomes a sleeping lock on PREEMPT_RT.
To remain on the same CPU is just an optimisation for the CPU local
lookup. The actual modification of the per-CPU variable happens with
netns_xfrm::xfrm_state_lock acquired.
Remove get_cpu() and use the state_cache_input on the current CPU.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/ Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Eric Dumazet [Tue, 21 Jan 2025 18:12:41 +0000 (18:12 +0000)]
ipmr: do not call mr_mfc_uses_dev() for unres entries
syzbot found that calling mr_mfc_uses_dev() for unres entries
would crash [1], because c->mfc_un.res.minvif / c->mfc_un.res.maxvif
alias to "struct sk_buff_head unresolved", which contain two pointers.
This code never worked, lets remove it.
[1]
Unable to handle kernel paging request at virtual address ffff5fff2d536613
KASAN: maybe wild-memory-access in range [0xfffefff96a9b3098-0xfffefff96a9b309f]
Modules linked in:
CPU: 1 UID: 0 PID: 7321 Comm: syz.0.16 Not tainted 6.13.0-rc7-syzkaller-g1950a0af2d55 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mr_mfc_uses_dev net/ipv4/ipmr_base.c:290 [inline]
pc : mr_table_dump+0x5a4/0x8b0 net/ipv4/ipmr_base.c:334
lr : mr_mfc_uses_dev net/ipv4/ipmr_base.c:289 [inline]
lr : mr_table_dump+0x694/0x8b0 net/ipv4/ipmr_base.c:334
Call trace:
mr_mfc_uses_dev net/ipv4/ipmr_base.c:290 [inline] (P)
mr_table_dump+0x5a4/0x8b0 net/ipv4/ipmr_base.c:334 (P)
mr_rtm_dumproute+0x254/0x454 net/ipv4/ipmr_base.c:382
ipmr_rtm_dumproute+0x248/0x4b4 net/ipv4/ipmr.c:2648
rtnl_dump_all+0x2e4/0x4e8 net/core/rtnetlink.c:4327
rtnl_dumpit+0x98/0x1d0 net/core/rtnetlink.c:6791
netlink_dump+0x4f0/0xbc0 net/netlink/af_netlink.c:2317
netlink_recvmsg+0x56c/0xe64 net/netlink/af_netlink.c:1973
sock_recvmsg_nosec net/socket.c:1033 [inline]
sock_recvmsg net/socket.c:1055 [inline]
sock_read_iter+0x2d8/0x40c net/socket.c:1125
new_sync_read fs/read_write.c:484 [inline]
vfs_read+0x740/0x970 fs/read_write.c:565
ksys_read+0x15c/0x26c fs/read_write.c:708
Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") Reported-by: syzbot+5cfae50c0e5f2c500013@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/678fe2d1.050a0220.15cac.00b3.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250121181241.841212-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement cleanup of descriptors in the TSO error path of
fec_enet_txq_submit_tso(). The cleanup
- Unmaps DMA buffers for data descriptors skipping TSO header
- Clears all buffer descriptors
- Handles extended descriptors by clearing cbd_esc when enabled
Dimitri Fedrau [Sat, 18 Jan 2025 18:43:43 +0000 (19:43 +0100)]
net: phy: marvell-88q2xxx: Fix temperature measurement with reset-gpios
When using temperature measurement on Marvell 88Q2XXX devices and the
reset-gpios property is set in DT, the device does a hardware reset when
interface is brought down and up again. That means that the content of
the register MDIO_MMD_PCS_MV_TEMP_SENSOR2 is reset to default and that
leads to permanent deactivation of the temperature measurement, because
activation is done in mv88q2xxx_probe. To fix this move activation of
temperature measurement to mv88q222x_config_init.
Fixes: a557a92e6881 ("net: phy: marvell-88q2xxx: add support for temperature sensor") Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250118-marvell-88q2xxx-fix-hwmon-v2-1-402e62ba2dcb@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jian Shen [Sat, 18 Jan 2025 09:47:41 +0000 (17:47 +0800)]
net: hns3: fix oops when unload drivers paralleling
When unload hclge driver, it tries to disable sriov first for each
ae_dev node from hnae3_ae_dev_list. If user unloads hns3 driver at
the time, because it removes all the ae_dev nodes, and it may cause
oops.
But we can't simply use hnae3_common_lock for this. Because in the
process flow of pci_disable_sriov(), it will trigger the remove flow
of VF, which will also take hnae3_common_lock.
To fixes it, introduce a new mutex to protect the unload process.
Yijie Yang [Mon, 20 Jan 2025 07:08:28 +0000 (15:08 +0800)]
dt-bindings: net: qcom,ethqos: Correct fallback compatible for qcom,qcs615-ethqos
The qcs615-ride utilizes the same EMAC as the qcs404, rather than the
sm8150. The current incorrect fallback could result in packet loss.
The Ethernet on qcs615-ride is currently not utilized by anyone. Therefore,
there is no need to worry about any ABI impact.
Fixes: 32535b9410b8 ("dt-bindings: net: qcom,ethqos: add description for qcs615") Signed-off-by: Yijie Yang <quic_yijiyang@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250120-schema_qcs615-v4-1-d9d122f89e64@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Fertser [Thu, 16 Jan 2025 15:29:00 +0000 (18:29 +0300)]
net/ncsi: wait for the last response to Deselect Package before configuring channel
The NCSI state machine as it's currently implemented assumes that
transition to the next logical state is performed either explicitly by
calling `schedule_work(&ndp->work)` to re-queue itself or implicitly
after processing the predefined (ndp->pending_req_num) number of
replies. Thus to avoid the configuration FSM from advancing prematurely
and getting out of sync with the process it's essential to not skip
waiting for a reply.
This patch makes the code wait for reception of the Deselect Package
response for the last package probed before proceeding to channel
configuration.
Thanks go to Potin Lai and Cosmo Chou for the initial investigation and
testing.
Fixes: 8e13f70be05e ("net/ncsi: Probe single packages to avoid conflict") Cc: stable@vger.kernel.org Signed-off-by: Paul Fertser <fercerpav@gmail.com> Link: https://patch.msgid.link/20250116152900.8656-1-fercerpav@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christian Marangi [Mon, 20 Jan 2025 15:41:40 +0000 (16:41 +0100)]
net: airoha: Fix wrong GDM4 register definition
Fix wrong GDM4 register definition, in Airoha SDK GDM4 is defined at
offset 0x2400 but this doesn't make sense as it does conflict with the
CDM4 that is in the same location.
Following the pattern where each GDM base is at the FWD_CFG, currently
GDM4 base offset is set to 0x2500. This is correct but REG_GDM4_FWD_CFG
and REG_GDM4_SRC_PORT_SET are still using the SDK reference with the
0x2400 offset. Fix these 2 define by subtracting 0x100 to each register
to reflect the real address location.
Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250120154148.13424-1-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Fri, 17 Jan 2025 09:38:41 +0000 (12:38 +0300)]
NFC: nci: Add bounds checking in nci_hci_create_pipe()
The "pipe" variable is a u8 which comes from the network. If it's more
than 127, then it results in memory corruption in the caller,
nci_hci_connect_gate().
Cc: stable@vger.kernel.org Fixes: a1b0b9415817 ("NFC: nci: Create pipe on specific gate in nci_hci_connect_gate") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/bcf5453b-7204-4297-9c20-4d8c7dacf586@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jamal Hadi Salim [Sat, 11 Jan 2025 14:57:39 +0000 (09:57 -0500)]
net: sched: fix ets qdisc OOB Indexing
Haowei Yan <g1042620637@gmail.com> found that ets_class_from_arg() can
index an Out-Of-Bound class in ets_class_from_arg() when passed clid of
0. The overflow may cause local privilege escalation.
Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc") Reported-by: Haowei Yan <g1042620637@gmail.com> Suggested-by: Haowei Yan <g1042620637@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250111145740.74755-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 22 Jan 2025 16:28:57 +0000 (08:28 -0800)]
Merge tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"This is slightly smaller than usual, with the most interesting work
being still around RTNL scope reduction.
Core:
- More core refactoring to reduce the RTNL lock contention, including
preparatory work for the per-network namespace RTNL lock, replacing
RTNL lock with a per device-one to protect NAPI-related net device
data and moving synchronize_net() calls outside such lock.
- Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge
and more specific TCP coverage.
- Reduce network namespace tear-down time by removing per-subsystems
synchronize_net() in tipc and sched.
- Add flow label selector support for fib rules, allowing traffic
redirection based on such header field.
Netfilter:
- Do not remove netdev basechain when last device is gone, allowing
netdev basechains without devices.
- Revisit the flowtable teardown strategy, dealing better with fin,
reset and re-open events.
- Scale-up IP-vs connection dumping by avoiding linear search on each
restart.
Protocols:
- A significant XDP socket refactor, consolidating and optimizing
several helpers into the core
- Better scaling of ICMP rate-limiting, by removing false-sharing in
inet peers handling.
- Introduces netlink notifications for multicast IPv4 and IPv6
address changes.
- Add ipsec support for IP-TFS/AggFrag encapsulation, allowing
aggregation and fragmentation of the inner IP.
- Add sysctl to configure TIME-WAIT reuse delay for TCP sockets, to
avoid local port exhaustion issues when the average connection
lifetime is very short.
- Support updating keys (re-keying) for connections using kernel TLS
(for TLS 1.3 only).
- Support ipv4-mapped ipv6 address clients in smc-r v2.
- Add support for jumbo data packet transmission in RxRPC sockets,
gluing multiple data packets in a single UDP packet.
- Support RxRPC RACK-TLP to manage packet loss and retransmission in
conjunction with the congestion control algorithm.
Driver API:
- Introduce a unified and structured interface for reporting PHY
statistics, exposing consistent data across different H/W via
ethtool.
- Make timestamping selectable, allow the user to select the desired
hwtstamp provider (PHY or MAC) administratively.
- Add support for configuring a header-data-split threshold (HDS)
value via ethtool, to deal with partial or buggy H/W
implementation.
- Consolidate DSA drivers Energy Efficiency Ethernet support.
- Add EEE management to phylink, making use of the phylib
implementation.
- Add phylib support for in-band capabilities negotiation.
- Simplify how phylib-enabled mac drivers expose the supported
interfaces.
Tests and tooling:
- Make the YNL tool package-friendly to make it easier to deploy it
separately from the kernel.
- Increase TCP selftest coverage importing several packetdrill
test-cases.
- Regenerate the ethtool uapi header from the YNL spec, to ease
maintenance and future development.
- Add YNL support for decoding the link types used in net self-tests,
allowing a single build to run both net and drivers/net.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- add cross E-Switch QoS support
- add SW Steering support for ConnectX-8
- implement support for HW-Managed Flow Steering, improving the
rule deletion/insertion rate
- support for multi-host LAG
- Intel (ixgbe, ice, igb):
- ice: add support for devlink health events
- ixgbe: add initial support for E610 chipset variant
- igb: add support for AF_XDP zero-copy
- Meta:
- add support for basic RSS config
- allow changing the number of channels
- add hardware monitoring support
- Broadcom (bnxt):
- implement TCP data split and HDS threshold ethtool support,
enabling Device Memory TCP.
- Marvell Octeon:
- implement egress ipsec offload support for the cn10k family
- Hisilicon (HIBMC):
- implement unicast MAC filtering
- Ethernet NICs embedded and virtual:
- Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding
contented atomic operations for drop counters
- Freescale:
- quicc: phylink conversion
- enetc: support Tx and Rx checksum offload and improve TSO
performances
- MediaTek:
- airoha: introduce support for ETS and HTB Qdisc offload
- Microchip:
- lan78XX USB: preparation work for phylink conversion
- Synopsys (stmmac):
- support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45
- refactor EEE support to leverage the new driver API
- optimize DMA and cache access to increase raw RX performances
by 40%
- TI:
- icssg-prueth: add multicast filtering support for VLAN
interface
- netkit:
- add ability to configure head/tailroom
- VXLAN:
- accepts packets with user-defined reserved bit
- Ethernet switches:
- Microchip:
- lan969x: add RGMII support
- lan969x: improve TX and RX performance using the FDMA engine
- nVidia/Mellanox:
- move Tx header handling to PCI driver, to ease XDP support
- Ethernet PHYs:
- Texas Instruments DP83822:
- add support for GPIO2 clock output
- Realtek:
- 8169: add support for RTL8125D rev.b
- rtl822x: add hwmon support for the temperature sensor
- Microchip:
- add support for RDS PTP hardware
- consolidate periodic output signal generation
- CAN:
- several DT-bindings to DT schema conversions
- tcan4x5x:
- add HW standby support
- support nWKRQ voltage selection
- kvaser:
- allowing Bus Error Reporting runtime configuration
- WiFi:
- the on-going Multi-Link Operation (MLO) effort continues,
affecting both the stack and in drivers
- mac80211/cfg80211:
- Emergency Preparedness Communication Services (EPCS) station
mode support
- support for adding and removing station links for MLO
- add support for WiFi 7/EHT mesh over 320 MHz channels
- report Tx power info for each link
- RealTek (rtw88):
- enable USB Rx aggregation and USB 3 to improve performance
- LED support
- RealTek (rtw89):
- refactor power save to support Multi-Link Operations
- add support for RTL8922AE-VS variant
- MediaTek (mt76):
- single wiphy multiband support (preparation for MLO)
- p2p device support
- add TP-Link TXE50UH USB adapter support
- Qualcomm (ath10k):
- support for the QCA6698AQ IP core
- Qualcomm (ath12k):
- enable MLO for QCN9274
- Bluetooth:
- Allow sysfs to trigger hdev reset, to allow recovering devices
not responsive from user-space
- MediaTek: add support for MT7922, MT7925, MT7921e devices
- Realtek: add support for RTL8851BE devices
- Qualcomm: add support for WCN785x devices
- ISO: allow BIG re-sync"
* tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1386 commits)
net/rose: prevent integer overflows in rose_setsockopt()
net: phylink: fix regression when binding a PHY
net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup
net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup
net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path
ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL.
ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL.
ipv6: Move lifetime validation to inet6_rtm_newaddr().
ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr().
ipv6: Pass dev to inet6_addr_add().
ipv6: Convert inet6_ioctl() to per-netns RTNL.
ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup().
ipv6: Hold rtnl_net_lock() in addrconf_dad_work().
ipv6: Hold rtnl_net_lock() in addrconf_verify_work().
ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL.
ipv6: Add __in6_dev_get_rtnl_net().
net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags
net: mii: Fix the Speed display when the network cable is not connected
sysctl net: Remove macro checks for CONFIG_SYSCTL
eth: bnxt: update header sizing defaults
...
When the 'cachestat()' system call was added in commit cf264e1329fb
("cachestat: implement cachestat syscall"), it was meant to be a much
more convenient (and performant) version of mincore() that didn't need
mapping things into the user virtual address space in order to work.
But it ended up missing the "check for writability or ownership" fix for
mincore(), done in commit 134fca9063ad ("mm/mincore.c: make mincore()
more conservative").
This just adds equivalent logic to 'cachestat()', modified for the file
context (rather than vma).
Linus Torvalds [Wed, 22 Jan 2025 04:12:24 +0000 (20:12 -0800)]
Merge tag 'audit-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
"A single audit patch that fixes a problem when collecting pathnames
for audit PATH records that was caused by some faulty pathname
matching logic"
* tag 'audit-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: fix suffixed '/' filename matching
Linus Torvalds [Wed, 22 Jan 2025 04:09:14 +0000 (20:09 -0800)]
Merge tag 'selinux-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Extended permissions supported in conditional policy
The SELinux extended permissions, aka "xperms", allow security admins
to target individuals ioctls, and recently netlink messages, with
their SELinux policy. Adding support for conditional policies allows
admins to toggle the granular xperms using SELinux booleans, helping
pave the way for greater use of xperms in general purpose SELinux
policies. This change bumps the maximum SELinux policy version to 34.
- Fix a SCTP/SELinux error return code inconsistency
Depending on the loaded SELinux policy, specifically it's
EXTSOCKCLASS support, the bind(2) LSM/SELinux hook could return
different error codes due to the SELinux code checking the socket's
SELinux object class (which can vary depending on EXTSOCKCLASS) and
not the socket's sk_protocol field. We fix this by doing the obvious,
and looking at the sock->sk_protocol field instead of the object
class.
- Makefile fixes to properly cleanup av_permissions.h
Add av_permissions.h to "targets" so that it is properly cleaned up
using the kbuild infrastructure.
- A number of smaller improvements by Christian Göttsche
A variety of straightforward changes to reduce code duplication,
reduce pointer lookups, migrate void pointers to defined types,
simplify code, constify function parameters, and correct iterator
types.
* tag 'selinux-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: make more use of str_read() when loading the policy
selinux: avoid unnecessary indirection in struct level_datum
selinux: use known type instead of void pointer
selinux: rename comparison functions for clarity
selinux: rework match_ipv6_addrmask()
selinux: constify and reconcile function parameter names
selinux: avoid using types indicating user space interaction
selinux: supply missing field initializers
selinux: add netlink nlmsg_type audit message
selinux: add support for xperms in conditional policies
selinux: Fix SCTP error inconsistency in selinux_socket_bind()
selinux: use native iterator types
selinux: add generated av_permissions.h to targets
Linus Torvalds [Wed, 22 Jan 2025 04:03:04 +0000 (20:03 -0800)]
Merge tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Improved handling of LSM "secctx" strings through lsm_context struct
The LSM secctx string interface is from an older time when only one
LSM was supported, migrate over to the lsm_context struct to better
support the different LSMs we now have and make it easier to support
new LSMs in the future.
These changes explain the Rust, VFS, and networking changes in the
diffstat.
- Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are
enabled
Small tweak to be a bit smarter about when we build the LSM's common
audit helpers.
- Check for absurdly large policies from userspace in SafeSetID
SafeSetID policies rules are fairly small, basically just "UID:UID",
it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which
helps quiet a number of syzbot related issues. While work is being
done to address the syzbot issues through other mechanisms, this is a
trivial and relatively safe fix that we can do now.
- Various minor improvements and cleanups
A collection of improvements to the kernel selftests, constification
of some function parameters, removing redundant assignments, and
local variable renames to improve readability.
* tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lockdown: initialize local array before use to quiet static analysis
safesetid: check size of policy writes
net: corrections for security_secid_to_secctx returns
lsm: rename variable to avoid shadowing
lsm: constify function parameters
security: remove redundant assignment to return variable
lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set
selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test
binder: initialize lsm_context structure
rust: replace lsm context+len with lsm_context
lsm: secctx provider check on release
lsm: lsm_context in security_dentry_init_security
lsm: use lsm_context in security_inode_getsecctx
lsm: replace context+len with lsm_context
lsm: ensure the correct LSM context releaser
Linus Torvalds [Wed, 22 Jan 2025 03:54:32 +0000 (19:54 -0800)]
Merge tag 'integrity-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"There's just a couple of changes: two kernel messages addressed, a
measurement policy collision addressed, and one policy cleanup.
Please note that the contents of the IMA measurement list is
potentially affected. The builtin tmpfs IMA policy rule change might
introduce additional measurements, while detecting a reboot might
eliminate some measurements"
* tag 'integrity-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: ignore suffixed policy rule comments
ima: limit the builtin 'tcb' dont_measure tmpfs policy rule
ima: kexec: silence RCU list traversal warning
ima: Suspend PCR extends and log appends when rebooting
Linus Torvalds [Wed, 22 Jan 2025 03:48:29 +0000 (19:48 -0800)]
Merge tag 'chrome-platform-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Tzung-Bi Shih:
"New:
- Support new EC if the memory region information comes from the CRS
ACPI resource descriptor in cros_ec_lpc
Improvements:
- Make sure EC is in RW before probing
- Only check events on MKBP notifies to reduce the number of query
commands in cros_ec_lpc
Cleanups:
- Remove unused code and DT bindings for cros-kbd-led-backlight
- Constify 'struct bin_attribute' in cros_ec_vbc
- Use str_enabled_disabled() in cros_usbpd_logger"
* tag 'chrome-platform-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_lpc: Handle EC without CRS section
platform/chrome: cros_usbpd_logger: Use str_enabled_disabled() helper
platform/chrome: cros_ec_lpc: Support direct EC register memory access
platform/chrome: cros_ec_lpc: Merge lpc_driver_ops into ec private structure
platform/chrome: Update ChromeOS EC command tracing
platform/chrome: cros_ec_lpc: Only check for events on MKBP notifies
platform/chrome: cros_ec_vbc: Constify 'struct bin_attribute'
dt-bindings: cros-ec: Remove google,cros-kbd-led-backlight
platform/chrome: cros_kbd_led_backlight: Remove OF match
platform/chrome: cros_ec_proto: remove unnecessary retries
platform/chrome: cros_ec: jump to RW before probing
platform/chrome: cros_kbd_led_backlight: remove unneeded if-statement
Linus Torvalds [Wed, 22 Jan 2025 02:00:00 +0000 (18:00 -0800)]
Merge tag 'docs-6.14' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
- Quite a bit of Chinese and Spanish translation work
- Clarifying that Git commit IDs >12chars are OK
- A new nvme-multipath document
- A reorganization of the admin-guide top-level page to make it
readable
- Clarification of the role of Acked-by and maintainer discretion on
their acceptance
- Some reorganization of debugging-oriented docs
... and typo fixes, documentation updates, etc as usual
* tag 'docs-6.14' of git://git.lwn.net/linux: (50 commits)
Documentation: Fix x86_64 UEFI outdated references to elilo
Documentation/sysctl: Add timer_migration to kernel.rst
docs/mm: Physical memory: Remove zone_t
docs: submitting-patches: clarify that signers may use their discretion on tags
docs: submitting-patches: clarify difference between Acked-by and Reviewed-by
docs: submitting-patches: clarify Acked-by and introduce "# Suffix"
Documentation: bug-hunting.rst: remove odd contact information
docs/zh_CN: Add sak index Chinese translation
doc: module: DEFAULT_SYMBOL_NAMESPACE must be defined before #includes
doc: module: Fix documented type of namespace
Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst
docs/zh_CN: Add landlock index Chinese translation
Documentation: Fix typo localmodonfig -> localmodconfig
overlayfs.rst: Fix and improve grammar
docs/zh_CN: Add siphash index Chinese translation
docs/zh_CN: Add security IMA-templates Chinese translation
docs/zh_CN: Add security digsig Chinese translation
Align git commit ID abbreviation guidelines and checks
docs: process: submitting-patches: split canonical patch format section
docs/zh_CN: Add security lsm Chinese translation
...
Linus Torvalds [Wed, 22 Jan 2025 01:48:03 +0000 (17:48 -0800)]
Merge tag 'rust-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Finish the move to custom FFI integer types started in the previous
cycle and finally map 'long' to 'isize' and 'char' to 'u8'. Do a
few cleanups on top thanks to that.
- Start to use 'derive(CoercePointee)' on Rust >= 1.84.0.
This is a major milestone on the path to build the kernel using
only stable Rust features. In particular, previously we were using
the unstable features 'coerce_unsized', 'dispatch_from_dyn' and
'unsize', and now we will use the new 'derive_coerce_pointee' one,
which is on track to stabilization. This new feature is a macro
that essentially expands into code that internally uses the
unstable features that we were using before, without having to
expose those.
With it, stable Rust users, including the kernel, will be able to
build custom smart pointers that work with trait objects, e.g.:
fn f(p: &Arc<dyn Display>) {
pr_info!("{p}\n");
}
let a: Arc<dyn Display> = Arc::new(42i32, GFP_KERNEL)?;
let b: Arc<dyn Display> = Arc::new("hello there", GFP_KERNEL)?;
f(&a); // Prints "42".
f(&b); // Prints "hello there".
Together with the 'arbitrary_self_types' feature that we started
using in the previous cycle, using our custom smart pointers like
'Arc' will eventually only rely in stable Rust.
- Introduce 'PROCMACROLDFLAGS' environment variable to allow to link
Rust proc macros using different flags than those used for linking
Rust host programs (e.g. when 'rustc' uses a different C library
than the host programs' one), which Android needs.
- Help kernel builds under macOS with Rust enabled by accomodating
other naming conventions for dynamic libraries (i.e. '.so' vs.
'.dylib') which are used for Rust procedural macros. The actual
support for macOS (i.e. the rest of the pieces needed) is provided
out-of-tree by others, following the policy used for other parts of
the kernel by Kbuild.
- Run Clippy for 'rusttest' code too and clean the bits it spotted.
- Provide Clippy with the minimum supported Rust version to improve
the suggestions it gives.
- Document 'bindgen' 0.71.0 regression.
'kernel' crate:
- 'build_error!': move users of the hidden function to the documented
macro, prevent such uses in the future by moving the function
elsewhere and add the macro to the prelude.
- 'types' module: add improved version of 'ForeignOwnable::borrow_mut'
(which was removed in the past since it was problematic); change
'ForeignOwnable' pointer type to '*mut'.
- 'alloc' module: implement 'Display' for 'Box' and align the 'Debug'
implementation to it; add example (doctest) for 'ArrayLayout::new()'
- 'sync' module: document 'PhantomData' in 'Arc'; use
'NonNull::new_unchecked' in 'ForeignOwnable for Arc' impl.
- 'uaccess' module: accept 'Vec's with different allocators in
'UserSliceReader::read_all'.
- 'workqueue' module: enable run-testing a couple more doctests.
- 'error' module: simplify 'from_errno()'.
- 'block' module: fix formatting in code documentation (a lint to catch
these is being implemented).
- Avoid 'unwrap()'s in doctests, which also improves the examples by
showing how kernel code is supposed to be written.
- Avoid 'as' casts with 'cast{,_mut}' calls which are a bit safer.
And a few other cleanups"
* tag 'rust-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (32 commits)
kbuild: rust: add PROCMACROLDFLAGS
rust: uaccess: generalize userSliceReader to support any Vec
rust: kernel: add improved version of `ForeignOwnable::borrow_mut`
rust: kernel: reorder `ForeignOwnable` items
rust: kernel: change `ForeignOwnable` pointer to mut
rust: arc: split unsafe block, add missing comment
rust: types: avoid `as` casts
rust: arc: use `NonNull::new_unchecked`
rust: use derive(CoercePointee) on rustc >= 1.84.0
rust: alloc: add doctest for `ArrayLayout::new()`
rust: init: update `stack_try_pin_init` examples
rust: error: import `kernel`'s `LayoutError` instead of `core`'s
rust: str: replace unwraps with question mark operators
rust: page: remove unnecessary helper function from doctest
rust: rbtree: remove unwrap in asserts
rust: init: replace unwraps with question mark operators
rust: use host dylib naming convention to support macOS
rust: add `build_error!` to the prelude
rust: kernel: move `build_error` hidden function to prevent mistakes
rust: use the `build_error!` macro, not the hidden function
...
Linus Torvalds [Wed, 22 Jan 2025 01:10:05 +0000 (17:10 -0800)]
Merge tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks
Pull kthread updates from Frederic Weisbecker:
"Kthreads affinity follow either of 4 existing different patterns:
1) Per-CPU kthreads must stay affine to a single CPU and never
execute relevant code on any other CPU. This is currently handled
by smpboot code which takes care of CPU-hotplug operations.
Affinity here is a correctness constraint.
2) Some kthreads _have_ to be affine to a specific set of CPUs and
can't run anywhere else. The affinity is set through
kthread_bind_mask() and the subsystem takes care by itself to
handle CPU-hotplug operations. Affinity here is assumed to be a
correctness constraint.
3) Per-node kthreads _prefer_ to be affine to a specific NUMA node.
This is not a correctness constraint but merely a preference in
terms of memory locality. kswapd and kcompactd both fall into this
category. The affinity is set manually like for any other task and
CPU-hotplug is supposed to be handled by the relevant subsystem so
that the task is properly reaffined whenever a given CPU from the
node comes up. Also care should be taken so that the node affinity
doesn't cross isolated (nohz_full) cpumask boundaries.
4) Similar to the previous point except kthreads have a _preferred_
affinity different than a node. Both RCU boost kthreads and RCU
exp kworkers fall into this category as they refer to "RCU nodes"
from a distinctly distributed tree.
Currently the preferred affinity patterns (3 and 4) have at least 4
identified users, with more or less success when it comes to handle
CPU-hotplug operations and CPU isolation. Each of which do it in its
own ad-hoc way.
This is an infrastructure proposal to handle this with the following
API changes:
- kthread_create_on_node() automatically affines the created kthread
to its target node unless it has been set as per-cpu or bound with
kthread_bind[_mask]() before the first wake-up.
- kthread_affine_preferred() is a new function that can be called
right after kthread_create_on_node() to specify a preferred
affinity different than the specified node.
When the preferred affinity can't be applied because the possible
targets are offline or isolated (nohz_full), the kthread is affine to
the housekeeping CPUs (which means to all online CPUs most of the time
or only the non-nohz_full CPUs when nohz_full= is set).
kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
converted, along with a few old drivers.
Summary of the changes:
- Consolidate a bunch of ad-hoc implementations of
kthread_run_on_cpu()
- Introduce task_cpu_fallback_mask() that defines the default last
resort affinity of a task to become nohz_full aware
- Add some correctness check to ensure kthread_bind() is always
called before the first kthread wake up.
- Default affine kthread to its preferred node.
- Convert kswapd / kcompactd and remove their halfway working ad-hoc
affinity implementation
- Implement kthreads preferred affinity
- Unify kthread worker and kthread API's style
- Convert RCU kthreads to the new API and remove the ad-hoc affinity
implementation"
* tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
kthread: modify kernel-doc function name to match code
rcu: Use kthread preferred affinity for RCU exp kworkers
treewide: Introduce kthread_run_worker[_on_cpu]()
kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
rcu: Use kthread preferred affinity for RCU boost
kthread: Implement preferred affinity
mm: Create/affine kswapd to its preferred node
mm: Create/affine kcompactd to its preferred node
kthread: Default affine kthread to its preferred NUMA node
kthread: Make sure kthread hasn't started while binding it
sched,arm64: Handle CPU isolation on last resort fallback rq selection
arm64: Exclude nohz_full CPUs from 32bits el0 support
lib: test_objpool: Use kthread_run_on_cpu()
kallsyms: Use kthread_run_on_cpu()
soc/qman: test: Use kthread_run_on_cpu()
arm/bL_switcher: Use kthread_run_on_cpu()
Linus Torvalds [Wed, 22 Jan 2025 00:09:47 +0000 (16:09 -0800)]
Merge tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"There are two external interactions of note, the msm tree pull in some
opp tree, hopefully the opp tree arrives from the same git tree
however it normally does.
There is also a new cgroup controller for device memory, that is used
by drm, so is merging through my tree. This will hopefully help open
up gpu cgroup usage a bit more and move us forward.
There is a new accelerator driver for the AMD XDNA Ryzen AI NPUs.
Then the usual xe/amdgpu/i915/msm leaders and lots of changes and
refactors across the board:
core:
- device memory cgroup controller added
- Remove driver date from drm_driver
- Add drm_printer based hex dumper
- drm memory stats docs update
- scheduler documentation improvements
new driver:
- amdxdna - Ryzen AI NPU support
connector:
- add a mutex to protect ELD
- make connector setup two-step
bridge:
- ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
- Provide default implementation of atomic_check for HDMI bridges
- it605: HDCP improvements, MCCS Support
xe:
- make OA buffer size configurable
- GuC capture fixes
- add ufence and g2h flushes
- restore system memory GGTT mappings
- ioctl fixes
- SRIOV PF scheduling priority
- allow fault injection
- lots of improvements/refactors
- Enable GuC's WA_DUAL_QUEUE for newer platforms
- IRQ related fixes and improvements
i915:
- More accurate engine busyness metrics with GuC submission
- Ensure partial BO segment offset never exceeds allowed max
- Flush GuC CT receive tasklet during reset preparation
- Some DG2 refactor to fix DG2 bugs when operating with certain CPUs
- Fix DG1 power gate sequence
- Enabling uncompressed 128b/132b UHBR SST
- Handle hdmi connector init failures, and no HDMI/DP cases
- More robust engine resets on Haswell and older
i915/xe display:
- HDCP fixes for Xe3Lpd
- New GSC FW ARL-H/ARL-U
- support 3 VDSC engines 12 slices
- MBUS joining sanitisation
- reconcile i915/xe display power mgmt
- Xe3Lpd fixes
- UHBR rates for Thunderbolt
amdgpu:
- DRM panic support
- track BO memory stats at runtime
- Fix max surface handling in DC
- Cleaner shader support for gfx10.3 dGPUs
- fix drm buddy trim handling
- SDMA engine reset updates
- Fix doorbell ttm cleanup
- RAS updates
- ISP updates
- SDMA queue reset support
- Rework DPM powergating interfaces
- Documentation updates and cleanups
- DCN 3.5 updates
- Use a pm notifier to more gracefully handle VRAM eviction on
suspend or hibernate
- Add debugfs interfaces for forcing scheduling to specific engine
instances
- GG 9.5 updates
- IH 4.4 updates
- Make missing optional firmware less noisy
- PSP 13.x updates
- SMU 13.x updates
- VCN 5.x updates
- JPEG 5.x updates
- GC 12.x updates
- DC FAMS updates
msm:
- MDSS:
- properly described UBWC registers
- added SM6150 (aka QCS615) support
- DPU:
- added SM6150 (aka QCS615) support
- enabled wide planes if virtual planes are enabled (by using two
SSPPs for a single plane)
- added CWB hardware blocks support
- DSI:
- added SM6150 (aka QCS615) support
- GPU:
- Print GMU core fw version
- GMU bandwidth voting for a740 and a750
- Expose uche trap base via uapi
- UAPI error reporting
rcar-du:
- Add r8a779h0 Support
ivpu:
- Fix qemu crash when using passthrough
nouveau:
- expose GSP-RM logging buffers via debugfs
panfrost:
- Add MT8188 Mali-G57 MC3 support
rockchip:
- Gamma LUT support
hisilicon:
- new HIBMC support
virtio-gpu:
- convert to helpers
- add prime support for scanout buffers
v3d:
- Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
vc4:
- Add support for BCM2712
vkms:
- line-per-line compositing algorithm to improve performance
zynqmp:
- Add DP audio support
mediatek:
- dp: Add sdp path reset
- dp: Support flexible length of DP calibration data
* tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (1070 commits)
drm/bridge: fix documentation for the hdmi_audio_prepare() callback
doc/cgroup: Fix title underline length
drm/doc: Include new drm-compute documentation
cgroup/dmem: Fix parameters documentation
cgroup/dmem: Select PAGE_COUNTER
kernel/cgroup: Remove the unused variable climit
drm/display: hdmi: Do not read EDID on disconnected connectors
drm/tests: hdmi: Add connector disablement test
drm/connector: hdmi: Do atomic check when necessary
drm/amd/display: 3.2.316
drm/amd/display: avoid reset DTBCLK at clock init
drm/amd/display: improve dpia pre-train
drm/amd/display: Apply DML21 Patches
drm/amd/display: Use HW lock mgr for PSR1
drm/amd/display: Revised for Replay Pseudo vblank control
drm/amd/display: Add a new flag for replay low hz
drm/amd/display: Remove unused read_ono_state function from Hwss module
drm/amd/display: Do not elevate mem_type change to full update
drm/amd/display: Do not wait for PSR disable on vbl enable
drm/amd/display: Remove unnecessary eDP power down
...
Linus Torvalds [Tue, 21 Jan 2025 23:19:20 +0000 (15:19 -0800)]
Merge tag 'trace-sorttable-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull scipts/sorttable updates from Steven Rostedt:
"The sorttable.c was a copy from recordmcount.c which is very hard to
maintain. That's because it uses macro helpers and places the code in
a header file sorttable.h to handle both the 64 bit and 32 bit version
of the Elf structures. It also uses _r()/r()/r2() wrappers around
accessing the data which will read the 64 bit or 32 bit version of the
data as well as handle endianess. If the wrong wrapper is used, an
invalid value will result, and this has been a cause for bugs in the
past. In fact the new ORC code doesn't even use it. That's fine
because ORC is only for 64 bit x86 which is the default parsing.
Instead of having a bunch of macros defined and then include the code
twice from a header, the Elf structures are each wrapped in a union.
The union holds the 64 bit and 32 bit version of the needed structure.
Then a structure of function pointers is used, along with helper
macros to access the ELF types appropriately for their byte size and
endianess. How to reference the data fields is moved from the code
that implements the sorting to the helper functions where all accesses
to a field will use he same helper function. As long as the helper
functions access the fields correctly, the code will also access the
fields. This is an improvement over having to code implementing the
sorting having to make sure it always uses the right accessor function
when reading an ELF field.
This is a clean up only, the functionality of the scripts/sorttable.c
does not change"
* tag 'trace-sorttable-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
scripts/sorttable: Use a structure of function pointers for elf helpers
scripts/sorttable: Get start/stop_mcount_loc from ELF file directly
scripts/sorttable: Move code from sorttable.h into sorttable.c
scripts/sorttable: Use uint64_t for mcount sorting
scripts/sorttable: Add helper functions for Elf_Sym
scripts/sorttable: Add helper functions for Elf_Shdr
scripts/sorttable: Add helper functions for Elf_Ehdr
scripts/sorttable: Convert Elf_Sym MACRO over to a union
scripts/sorttable: Replace Elf_Shdr Macro with a union
scripts/sorttable: Convert Elf_Ehdr to union
scripts/sorttable: Make compare_extable() into two functions
scripts/sorttable: Have the ORC code use the _r() functions to read
scripts/sorttable: Remove unneeded Elf_Rel
scripts/sorttable: Remove unused write functions
scripts/sorttable: Remove unused macro defines
Linus Torvalds [Tue, 21 Jan 2025 23:15:28 +0000 (15:15 -0800)]
Merge tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Have fprobes built on top of function graph infrastructure
The fprobe logic is an optimized kprobe that uses ftrace to attach to
functions when a probe is needed at the start or end of the function.
The fprobe and kretprobe logic implements a similar method as the
function graph tracer to trace the end of the function. That is to
hijack the return address and jump to a trampoline to do the trace
when the function exits. To do this, a shadow stack needs to be
created to store the original return address. Fprobes and function
graph do this slightly differently. Fprobes (and kretprobes) has
slots per callsite that are reserved to save the return address. This
is fine when just a few points are traced. But users of fprobes, such
as BPF programs, are starting to add many more locations, and this
method does not scale.
The function graph tracer was created to trace all functions in the
kernel. In order to do this, when function graph tracing is started,
every task gets its own shadow stack to hold the return address that
is going to be traced. The function graph tracer has been updated to
allow multiple users to use its infrastructure. Now have fprobes be
one of those users. This will also allow for the fprobe and kretprobe
methods to trace the return address to become obsolete. With new
technologies like CFI that need to know about these methods of
hijacking the return address, going toward a solution that has only
one method of doing this will make the kernel less complex.
- Cleanup with guard() and free() helpers
There were several places in the code that had a lot of "goto out" in
the error paths to either unlock a lock or free some memory that was
allocated. But this is error prone. Convert the code over to use the
guard() and free() helpers that let the compiler unlock locks or free
memory when the function exits.
- Remove disabling of interrupts in the function graph tracer
When function graph tracer was first introduced, it could race with
interrupts and NMIs. To prevent that race, it would disable
interrupts and not trace NMIs. But the code has changed to allow NMIs
and also interrupts. This change was done a long time ago, but the
disabling of interrupts was never removed. Remove the disabling of
interrupts in the function graph tracer is it is not needed. This
greatly improves its performance.
- Allow the :mod: command to enable tracing module functions on the
kernel command line.
The function tracer already has a way to enable functions to be
traced in modules by writing ":mod:<module>" into set_ftrace_filter.
That will enable either all the functions for the module if it is
loaded, or if it is not, it will cache that command, and when the
module is loaded that matches <module>, its functions will be
enabled. This also allows init functions to be traced. But currently
events do not have that feature.
Because enabling function tracing can be done very early at boot up
(before scheduling is enabled), the commands that can be done when
function tracing is started is limited. Having the ":mod:" command to
trace module functions as they are loaded is very useful. Update the
kernel command line function filtering to allow it.
* tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
ftrace: Implement :mod: cache filtering on kernel command line
tracing: Adopt __free() and guard() for trace_fprobe.c
bpf: Use ftrace_get_symaddr() for kprobe_multi probes
ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr
Documentation: probes: Update fprobe on function-graph tracer
selftests/ftrace: Add a test case for repeating register/unregister fprobe
selftests: ftrace: Remove obsolate maxactive syntax check
tracing/fprobe: Remove nr_maxactive from fprobe
fprobe: Add fprobe_header encoding feature
fprobe: Rewrite fprobe on function-graph tracer
s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC
ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
tracing: Add ftrace_fill_perf_regs() for perf event
tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
fprobe: Use ftrace_regs in fprobe exit handler
fprobe: Use ftrace_regs in fprobe entry handler
fgraph: Pass ftrace_regs to retfunc
fgraph: Replace fgraph_ret_regs with ftrace_regs
...
Linus Torvalds [Tue, 21 Jan 2025 23:11:54 +0000 (15:11 -0800)]
Merge tag 'trace-ringbuffer-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace ring-buffer updates from Steven Rostedt:
- Clean up the __rb_map_vma() logic
The logic of __rb_map_vma() has a error check with WARN_ON() that
makes sure that the index does not go past the end of the array of
buffers. The test in the loop pretty much guarantees that it will
never happen, but since the relation of the variables used is a
little complex, the WARN_ON() check was added. It was noticed that
the array was dereferenced before this check and if the logic does
break and for some reason the logic goes past the array, there will
be an out of bounds access here. Move the access to after the
WARN_ON().
- Consolidate how the ring buffer is determined to be empty
Currently there's two ways that are used to determine if the ring
buffer is empty. One relies on the status of the commit and reader
pages and what was read, and the other is on what was written vs what
was read. By using the number of entries (written) method, it can be
used for reading events that are out of the kernel's control (what
pKVM will use). Move to this method to make it easier to implement a
pKVM ring buffer that the kernel can read.
* tag 'trace-ringbuffer-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Make reading page consistent with the code logic
ring-buffer: Check for empty ring-buffer with rb_num_of_entries()
Linus Torvalds [Tue, 21 Jan 2025 22:39:21 +0000 (14:39 -0800)]
Merge tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Uladzislau Rezki:
"Misc fixes:
- check if IRQs are disabled in rcu_exp_need_qs()
- instrument KCSAN exclusive-writer assertions
- add extra WARN_ON_ONCE() check
- set the cpu_no_qs.b.exp under lock
- warn if callback enqueued on offline CPU
Torture-test updates:
- add rcutorture.preempt_duration kernel module parameter
- make the TREE03 scenario do preemption
- improve pooling timeouts for rcu_torture_writer()
- improve output of "Failure/close-call rcutorture reader segments"
- add some reader-state debugging checks
- update doc of polled APIs
- add extra diagnostics for per-reader-segment preemption
- add an extra test for sched_clock()
- improve testing on unresponsive systems
SRCU updates:
- improve doc for srcu_read_lock() in terms of return value
- fix typo in comments
- remove redundant GP sequence checks in the srcu_funnel_gp_start"
* tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
srcu: Remove redundant GP sequence checks in srcu_funnel_gp_start
srcu: Fix typo s/srcu_check_read_flavor()/__srcu_check_read_flavor()/
srcu: Guarantee non-negative return value from srcu_read_lock()
MAINTAINERS: Update RCU git tree
rcu: Add lockdep_assert_irqs_disabled() to rcu_exp_need_qs()
rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp
rcu: Make preemptible rcu_exp_handler() check idempotency
rcu: Replace open-coded rcu_exp_need_qs() from rcu_exp_handler() with call
rcu: Move rcu_report_exp_rdp() setting of ->cpu_no_qs.b.exp under lock
rcu: Make rcu_report_exp_cpu_mult() caller acquire lock
rcu: Report callbacks enqueued on offline CPU blind spot
rcutorture: Use symbols for SRCU reader flavors
rcutorture: Add per-reader-segment preemption diagnostics
rcutorture: Read CPU ID for decoration protected by both reader types
rcutorture: Add preempt_count() to rcutorture_one_extend_check() diagnostics
rcutorture: Add parameters to control polled/conditional wait interval
rcutorture: Add documentation for recent conditional and polled APIs
rcutorture: Ignore attempts to test preemption and forward progress
rcutorture: Make rcutorture_one_extend() check reader state
rcutorture: Pretty-print rcutorture reader segments
...
Linus Torvalds [Tue, 21 Jan 2025 21:57:20 +0000 (13:57 -0800)]
Merge tag 'slab-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- Move the kfree_rcu() implementation from RCU to SLAB subsystem
(Uladzislau Rezki)
The kfree_rcu() implementation has been historically maintained in
the RCU subsystem. At LSF/MM we agreed to move it to SLAB, where it
more logically belongs. The batching is planned be more integrated
with SLUB internals in the future, while using the RCU APIs like any
other subsystem.
- Fix for kernel-doc warning (Randy Dunlap)
* tag 'slab-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: fix kernel-doc func param names
mm/slab: Move kvfree_rcu() into SLAB
rcu/kvfree: Adjust a shrinker name
rcu/kvfree: Adjust names passed into trace functions
rcu/kvfree: Move some functions under CONFIG_TINY_RCU
rcu/kvfree: Initialize kvfree_rcu() separately
Linus Torvalds [Tue, 21 Jan 2025 21:51:07 +0000 (13:51 -0800)]
Merge tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
- Consolidate the machine_kexec_mask_interrupts() by providing a
generic implementation and replacing the copy & pasta orgy in the
relevant architectures.
- Prevent unconditional operations on interrupt chips during kexec
shutdown, which can trigger warnings in certain cases when the
underlying interrupt has been shut down before.
- Make the enforcement of interrupt handling in interrupt context
unconditionally available, so that it actually works for non x86
related interrupt chips. The earlier enablement for ARM GIC chips set
the required chip flag, but did not notice that the check was hidden
behind a config switch which is not selected by ARM[64].
- Decrapify the handling of deferred interrupt affinity setting.
Some interrupt chips require that affinity changes are made from the
context of handling an interrupt to avoid certain race conditions.
For x86 this was the default, but with interrupt remapping this
requirement was lifted and a flag was introduced which tells the core
code that affinity changes can be done in any context. Unrestricted
affinity changes are the default for the majority of interrupt chips.
RISCV has the requirement to add the deferred mode to one of it's
interrupt controllers, but with the original implementation this
would require to add the any context flag to all other RISC-V
interrupt chips. That's backwards, so reverse the logic and require
that chips, which need the deferred mode have to be marked
accordingly. That avoids chasing the 'sane' chips and marking them.
- Add multi-node support to the Loongarch AVEC interrupt controller
driver.
- The usual tiny cleanups, fixes and improvements all over the place.
* tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set()
genirq/timings: Add kernel-doc for a function parameter
genirq: Remove IRQ_MOVE_PCNTXT and related code
x86/apic: Convert to IRQCHIP_MOVE_DEFERRED
genirq: Provide IRQCHIP_MOVE_DEFERRED
hexagon: Remove GENERIC_PENDING_IRQ leftover
ARC: Remove GENERIC_PENDING_IRQ
genirq: Remove handle_enforce_irqctx() wrapper
genirq: Make handle_enforce_irqctx() unconditionally available
irqchip/loongarch-avec: Add multi-nodes topology support
irqchip/ts4800: Replace seq_printf() by seq_puts()
irqchip/ti-sci-inta : Add module build support
irqchip/ti-sci-intr: Add module build support
irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function
irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args
genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown
kexec: Consolidate machine_kexec_mask_interrupts() implementation
genirq: Reuse irq_thread_fn() for forced thread case
genirq: Move irq_thread_fn() further up in the code
Linus Torvalds [Tue, 21 Jan 2025 21:16:00 +0000 (13:16 -0800)]
Merge tag 'timers-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and timekeeping updates from Thomas Gleixner:
- Just boring cleanups, typo and comment fixes and trivial optimizations
* tag 'timers-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Simplify top level detection on group setup
timers: Optimize get_timer_[this_]cpu_base()
timekeeping: Remove unused ktime_get_fast_timestamps()
timer/migration: Fix kernel-doc warnings for union tmigr_state
tick/broadcast: Add kernel-doc for function parameters
hrtimers: Update the return type of enqueue_hrtimer()
clocksource/wdtest: Print time values for short udelay(1)
posix-timers: Fix typo in __lock_timer()
vdso: Correct typo in PAGE_SHIFT comment
Linus Torvalds [Tue, 21 Jan 2025 21:11:26 +0000 (13:11 -0800)]
Merge tag 'livepatching-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Add a sysfs attribute showing the livepatch ordering
- Some code clean up
* tag 'livepatching-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: add test cases of stack_order sysfs interface
livepatch: Add stack_order sysfs attribute
selftests/livepatch: Replace hardcoded module name with variable in test-callbacks.sh
Linus Torvalds [Tue, 21 Jan 2025 21:09:29 +0000 (13:09 -0800)]
Merge tag 'printk-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Prevent possible deadlocks, caused by the lock serializing per-CPU
backtraces, by entering the deferred printk context
- Enforce the right casting in LOG_BUF_LEN_MAX definition
* tag 'printk-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Defer legacy printing when holding printk_cpu_sync
printk: Remove redundant deferred check in vprintk()
printk: Fix signed integer overflow when defining LOG_BUF_LEN_MAX
Linus Torvalds [Tue, 21 Jan 2025 19:32:36 +0000 (11:32 -0800)]
Merge tag 'sched-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Fair scheduler (SCHED_FAIR) enhancements:
- Behavioral improvements:
- Untangle NEXT_BUDDY and pick_next_task() (Peter Zijlstra)
- Delayed-dequeue enhancements & fixes: (Vincent Guittot)
- Rename h_nr_running into h_nr_queued
- Add new cfs_rq.h_nr_runnable
- Use the new cfs_rq.h_nr_runnable
- Removed unsued cfs_rq.h_nr_delayed
- Rename cfs_rq.idle_h_nr_running into h_nr_idle
- Remove unused cfs_rq.idle_nr_running
- Rename cfs_rq.nr_running into nr_queued
- Do not try to migrate delayed dequeue task
- Fix variable declaration position
- Encapsulate set custom slice in a __setparam_fair() function
- Fixes:
- Fix race between yield_to() and try_to_wake_up() (Tianchen Ding)
- Fix CPU bandwidth limit bypass during CPU hotplug (Vishal
Chourasia)
- Cleanups:
- Clean up in migrate_degrades_locality() to improve readability
(Peter Zijlstra)
- Mark m*_vruntime() with __maybe_unused (Andy Shevchenko)
- Update comments after sched_tick() rename (Sebastian Andrzej
Siewior)
- Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()
(Valentin Schneider)
- Improve performance by prioritizing migrating eligible tasks in
sched_balance_rq() (Hao Jia)
- Do not compute NUMA Balancing stats unnecessarily during
load-balancing (K Prateek Nayak)
- Do not compute overloaded status unnecessarily during
load-balancing (K Prateek Nayak)
Generic scheduling code enhancements:
- Use READ_ONCE() in task_on_rq_queued(), to consistently use the
WRITE_ONCE() updated ->on_rq field (Harshit Agarwal)
Isolated CPUs support enhancements: (Waiman Long)
- Make "isolcpus=nohz" equivalent to "nohz_full"
- Consolidate housekeeping cpumasks that are always identical
- Remove HK_TYPE_SCHED
- Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE
RSEQ enhancements:
- Validate read-only fields under DEBUG_RSEQ config (Mathieu
Desnoyers)
PSI enhancements:
- Fix race when task wakes up before psi_sched_switch() adjusts flags
(Chengming Zhou)
IRQ time accounting performance enhancements: (Yafang Shao)
- Define sched_clock_irqtime as static key
- Don't account irq time if sched_clock_irqtime is disabled
Virtual machine scheduling enhancements:
- Don't try to catch up excess steal time (Suleiman Souhlal)
Heterogenous x86 CPU scheduling enhancements: (K Prateek Nayak)
- Convert "sysctl_sched_itmt_enabled" to boolean
- Use guard() for itmt_update_mutex
- Move the "sched_itmt_enabled" sysctl to debugfs
- Remove x86_smt_flags and use cpu_smt_flags directly
- Use x86_sched_itmt_flags for PKG domain unconditionally
Debugging code & instrumentation enhancements:
- Change need_resched warnings to pr_err() (David Rientjes)
- Print domain name in /proc/schedstat (K Prateek Nayak)
- Fix value reported by hot tasks pulled in /proc/schedstat (Peter
Zijlstra)
- Report the different kinds of imbalances in /proc/schedstat
(Swapnil Sapkal)
- Move sched domain name out of CONFIG_SCHED_DEBUG (Swapnil Sapkal)
- Update Schedstat version to 17 (Swapnil Sapkal)"
* tag 'sched-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
rseq: Fix rseq unregistration regression
psi: Fix race when task wakes up before psi_sched_switch() adjusts flags
sched, psi: Don't account irq time if sched_clock_irqtime is disabled
sched: Don't account irq time if sched_clock_irqtime is disabled
sched: Define sched_clock_irqtime as static key
sched/fair: Do not compute overloaded status unnecessarily during lb
sched/fair: Do not compute NUMA Balancing stats unnecessarily during lb
x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
x86/itmt: Use guard() for itmt_update_mutex
x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
sched/core: Prioritize migrating eligible tasks in sched_balance_rq()
sched/debug: Change need_resched warnings to pr_err
sched/fair: Encapsulate set custom slice in a __setparam_fair() function
sched: Fix race between yield_to() and try_to_wake_up()
docs: Update Schedstat version to 17
sched/stats: Print domain name in /proc/schedstat
sched: Move sched domain name out of CONFIG_SCHED_DEBUG
sched: Report the different kinds of imbalances in /proc/schedstat
...
Linus Torvalds [Tue, 21 Jan 2025 19:15:29 +0000 (11:15 -0800)]
Merge tag 'x86-cleanups-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Miscellaneous x86 cleanups and typo fixes, and also the removal of
the 'disablelapic' boot parameter"
* tag 'x86-cleanups-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Remove a stray tab in the IO-APIC type string
x86/cpufeatures: Remove "AMD" from the comments to the AMD-specific leaf
Documentation/kernel-parameters: Fix a typo in kvm.enable_virt_at_load text
x86/cpu: Fix typo in x86_match_cpu()'s doc
x86/apic: Remove "disablelapic" cmdline option
Documentation: Merge x86-specific boot options doc into kernel-parameters.txt
x86/ioremap: Remove unused size parameter in remapping functions
x86/ioremap: Simplify setup_data mapping variants
x86/boot/compressed: Remove unused header includes from kaslr.c
Linus Torvalds [Tue, 21 Jan 2025 18:52:03 +0000 (10:52 -0800)]
Merge tag 'perf-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Seqlock optimizations that arose in a perf context and were merged
into the perf tree:
- seqlock: Add raw_seqcount_try_begin (Suren Baghdasaryan)
- mm: Convert mm_lock_seq to a proper seqcount (Suren Baghdasaryan)
- mm: Introduce mmap_lock_speculate_{try_begin|retry} (Suren
Baghdasaryan)
- mm/gup: Use raw_seqcount_try_begin() (Peter Zijlstra)
Core perf enhancements:
- Reduce 'struct page' footprint of perf by mapping pages in advance
(Lorenzo Stoakes)
- Save raw sample data conditionally based on sample type (Yabin Cui)
- Reduce sampling overhead by checking sample_type in
perf_sample_save_callchain() and perf_sample_save_brstack() (Yabin
Cui)
- Export perf_exclude_event() (Namhyung Kim)
- Simplify find_active_uprobe_rcu() VMA checks
- Add speculative lockless VMA-to-inode-to-uprobe resolution
- Simplify session consumer tracking
- Decouple return_instance list traversal and freeing
- Ensure return_instance is detached from the list before freeing
- Reuse return_instances between multiple uretprobes within task
- Guard against kmemdup() failing in dup_return_instance()
AMD core PMU driver enhancements:
- Relax privilege filter restriction on AMD IBS (Namhyung Kim)
AMD RAPL energy counters support: (Dhananjay Ugwekar)
- Introduce topology_logical_core_id() (K Prateek Nayak)
- Remove the unused get_rapl_pmu_cpumask() function
- Remove the cpu_to_rapl_pmu() function
- Rename rapl_pmu variables
- Make rapl_model struct global
- Add arguments to the init and cleanup functions
- Modify the generic variable names to *_pkg*
- Remove the global variable rapl_msrs
- Move the cntr_mask to rapl_pmus struct
- Add core energy counter support for AMD CPUs
Intel core PMU driver enhancements:
- Support RDPMC 'metrics clear mode' feature (Kan Liang)
- Clarify adaptive PEBS processing (Kan Liang)
- Factor out functions for PEBS records processing (Kan Liang)
- Simplify the PEBS records processing for adaptive PEBS (Kan Liang)
Intel uncore driver enhancements: (Kan Liang)
- Convert buggy pmu->func_id use to pmu->registered
- Support more units on Granite Rapids"
* tag 'perf-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
perf: map pages in advance
perf/x86/intel/uncore: Support more units on Granite Rapids
perf/x86/intel/uncore: Clean up func_id
perf/x86/intel: Support RDPMC metrics clear mode
uprobes: Guard against kmemdup() failing in dup_return_instance()
perf/x86: Relax privilege filter restriction on AMD IBS
perf/core: Export perf_exclude_event()
uprobes: Reuse return_instances between multiple uretprobes within task
uprobes: Ensure return_instance is detached from the list before freeing
uprobes: Decouple return_instance list traversal and freeing
uprobes: Simplify session consumer tracking
uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
uprobes: simplify find_active_uprobe_rcu() VMA checks
mm: introduce mmap_lock_speculate_{try_begin|retry}
mm: convert mm_lock_seq to a proper seqcount
mm/gup: Use raw_seqcount_try_begin()
seqlock: add raw_seqcount_try_begin
perf/x86/rapl: Add core energy counter support for AMD CPUs
perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
perf/x86/rapl: Remove the global variable rapl_msrs
...
- Optimize the annotation-sections parsing code (Peter Zijlstra)
- Centralize annotation definitions in <linux/objtool.h>
- Unify & simplify the barrier_before_unreachable()/unreachable()
definitions (Peter Zijlstra)
- Convert unreachable() calls to BUG() in x86 code, as unreachable()
has unreliable code generation (Peter Zijlstra)
- Remove annotate_reachable() and annotate_unreachable(), as it's
unreliable against compiler optimizations (Peter Zijlstra)
- Fix non-standard ANNOTATE_REACHABLE annotation order (Peter Zijlstra)
- Robustify the annotation code by warning about unknown annotation
types (Peter Zijlstra)
- Allow arch code to discover jump table size, in preparation of
annotated jump table support (Ard Biesheuvel)
* tag 'objtool-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Convert unreachable() to BUG()
objtool: Allow arch code to discover jump table size
objtool: Warn about unknown annotation types
objtool: Fix ANNOTATE_REACHABLE to be a normal annotation
objtool: Convert {.UN}REACHABLE to ANNOTATE
objtool: Remove annotate_{,un}reachable()
loongarch: Use ASM_REACHABLE
x86: Convert unreachable() to BUG()
unreachable: Unify
objtool: Collect more annotations in objtool.h
objtool: Collapse annotate sequences
objtool: Convert ANNOTATE_INTRA_FUNCTION_CALL to ANNOTATE
objtool: Convert ANNOTATE_IGNORE_ALTERNATIVE to ANNOTATE
objtool: Convert VALIDATE_UNRET_BEGIN to ANNOTATE
objtool: Convert instrumentation_{begin,end}() to ANNOTATE
objtool: Convert ANNOTATE_RETPOLINE_SAFE to ANNOTATE
objtool: Convert ANNOTATE_NOENDBR to ANNOTATE
objtool: Generic annotation infrastructure
Linus Torvalds [Tue, 21 Jan 2025 18:10:24 +0000 (10:10 -0800)]
Merge tag 'locking-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Lockdep:
- Improve and fix lockdep bitsize limits, clarify the Kconfig
documentation (Carlos Llamas)
- Fix lockdep build warning on Clang related to
chain_hlock_class_idx() inlining (Andy Shevchenko)
- Relax the requirements of PROVE_RAW_LOCK_NESTING arch support by
not tying it to ARCH_SUPPORTS_RT unnecessarily (Waiman Long)
Rust integration:
- Support lock pointers managed by the C side (Lyude Paul)
- Support guard types (Lyude Paul)
- Update MAINTAINERS file filters to include the Rust locking code
(Boqun Feng)
Wake-queues:
- Add raw_spin_*wake() helpers to simplify locking code (John Stultz)
SMP cross-calls:
- Fix potential data update race by evaluating the local cond_func()
before IPI side-effects (Mathieu Desnoyers)
Guard primitives:
- Ease [c]tags based searches by including the cleanup/guard type
primitives (Peter Zijlstra)
ww_mutexes:
- Simplify the ww_mutex self-test code via swap() (Thorsten Blum)
Static calls:
- Update the static calls MAINTAINERS file-pattern (Jiri Slaby)"
* tag 'locking-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add static_call_inline.c to STATIC BRANCH/CALL
cleanup, tags: Create tags for the cleanup primitives
sched/wake_q: Add helper to call wake_up_q after unlock with preemption disabled
rust: sync: Add lock::Backend::assert_is_held()
rust: sync: Add SpinLockGuard type alias
rust: sync: Add MutexGuard type alias
rust: sync: Make Guard::new() public
rust: sync: Add Lock::from_raw() for Lock<(), B>
locking: MAINTAINERS: Start watching Rust locking primitives
lockdep: Move lockdep_assert_locked() under #ifdef CONFIG_PROVE_LOCKING
lockdep: Mark chain_hlock_class_idx() with __maybe_unused
lockdep: Document MAX_LOCKDEP_CHAIN_HLOCKS calculation
lockdep: Clarify size for LOCKDEP_*_BITS configs
lockdep: Fix upper limit for LOCKDEP_*_BITS configs
locking/ww_mutex/test: Use swap() macro
smp/scf: Evaluate local cond_func() before IPI side-effects
locking/lockdep: Enforce PROVE_RAW_LOCK_NESTING only if ARCH_SUPPORTS_RT
Linus Torvalds [Tue, 21 Jan 2025 17:38:52 +0000 (09:38 -0800)]
Merge tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- The first part of a restructuring of AMD's representation of a
northbridge which is legacy now, and the creation of the new AMD node
concept which represents the Zen architecture of having a collection
of I/O devices within an SoC. Those nodes comprise the so-called data
fabric on Zen.
This has at least one practical advantage of not having to add a PCI
ID each time a new data fabric PCI device releases. Eventually, the
lot more uniform provider of data fabric functionality amd_node.c
will be used by all the drivers which need it
- Smaller cleanups
* tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd_node: Use defines for SMN register offsets
x86/amd_node: Remove dependency on AMD_NB
x86/amd_node: Update __amd_smn_rw() error paths
x86/amd_nb: Move SMN access code to a new amd_node driver
x86/amd_nb, hwmon: (k10temp): Simplify amd_pci_dev_to_node_id()
x86/amd_nb: Simplify function 3 search
x86/amd_nb: Use topology info to get AMD node count
x86/amd_nb: Simplify root device search
x86/amd_nb: Simplify function 4 search
x86: Start moving AMD node functionality out of AMD_NB
x86/amd_nb: Clean up early_is_amd_nb()
x86/amd_nb: Restrict init function to AMD-based systems
x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
Linus Torvalds [Tue, 21 Jan 2025 17:00:31 +0000 (09:00 -0800)]
Merge tag 'x86_sev_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- A segmented Reverse Map table (RMP) is a across-nodes distributed
table of sorts which contains per-node descriptors of each node-local
4K page, denoting its ownership (hypervisor, guest, etc) in the realm
of confidential computing. Add support for such a table in order to
improve referential locality when accessing or modifying RMP table
entries
- Add support for reading the TSC in SNP guests by removing any
interference or influence the hypervisor might have, with the goal of
making a confidential guest even more independent from the hypervisor
* tag 'x86_sev_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Add the Secure TSC feature for SNP guests
x86/tsc: Init the TSC for Secure TSC guests
x86/sev: Mark the TSC in a secure TSC guest as reliable
x86/sev: Prevent RDTSC/RDTSCP interception for Secure TSC enabled guests
x86/sev: Prevent GUEST_TSC_FREQ MSR interception for Secure TSC enabled guests
x86/sev: Change TSC MSR behavior for Secure TSC enabled guests
x86/sev: Add Secure TSC support for SNP guests
x86/sev: Relocate SNP guest messaging routines to common code
x86/sev: Carve out and export SNP guest messaging init routines
virt: sev-guest: Replace GFP_KERNEL_ACCOUNT with GFP_KERNEL
virt: sev-guest: Remove is_vmpck_empty() helper
x86/sev/docs: Document the SNP Reverse Map Table (RMP)
x86/sev: Add full support for a segmented RMP table
x86/sev: Treat the contiguous RMP table as a single RMP segment
x86/sev: Map only the RMP table entries instead of the full RMP range
x86/sev: Move the SNP probe routine out of the way
x86/sev: Require the RMPREAD instruction after Zen4
x86/sev: Add support for the RMPREAD instruction
x86/sev: Prepare for using the RMPREAD instruction to access the RMP
Linus Torvalds [Tue, 21 Jan 2025 16:33:10 +0000 (08:33 -0800)]
Merge tag 'x86_microcode_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loader updates from Borislav Petkov:
- A bunch of minor cleanups
* tag 'x86_microcode_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Remove ret local var in early_apply_microcode()
x86/microcode/AMD: Have __apply_microcode_amd() return bool
x86/microcode/AMD: Make __verify_patch_size() return bool
x86/microcode/AMD: Remove bogus comment from parse_container()
x86/microcode/AMD: Return bool from find_blobs_in_containers()
Linus Torvalds [Tue, 21 Jan 2025 16:31:04 +0000 (08:31 -0800)]
Merge tag 'x86_cache_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Extend resctrl with the capability of total memory bandwidth
monitoring, thus accomodating systems which support only total but
not local memory bandwidth monitoring. Add the respective new mount
options
- The usual cleanups
* tag 'x86_cache_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Document the new "mba_MBps_event" file
x86/resctrl: Add write option to "mba_MBps_event" file
x86/resctrl: Add "mba_MBps_event" file to CTRL_MON directories
x86/resctrl: Make mba_sc use total bandwidth if local is not supported
x86/resctrl: Compute memory bandwidth for all supported events
x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event
x86/resctrl: Prepare for per-CTRL_MON group mba_MBps control
x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
x86/resctrl: Use kthread_run_on_cpu()
Linus Torvalds [Tue, 21 Jan 2025 16:22:40 +0000 (08:22 -0800)]
Merge tag 'x86_bugs_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU speculation update from Borislav Petkov:
- Add support for AMD hardware which is not affected by SRSO on the
user/kernel attack vector and advertise it to guest userspace
* tag 'x86_bugs_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM: x86: Advertise SRSO_USER_KERNEL_NO to userspace
x86/bugs: Add SRSO_USER_KERNEL_NO support
Linus Torvalds [Tue, 21 Jan 2025 16:21:12 +0000 (08:21 -0800)]
Merge tag 'edac_updates_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- Remove the EDAC PowerPC Cell driver due to the removal of the IBM
Cell blades support
- Add a new EDAC driver for Loongson SoCs which reports single-bit
correctable errors
- Extend the SKX and i10NM EDAC drivers to support UV systems which can
have more than 8 nodes
- Add Intel Clearwater Forest server support to i10nm_edac
- Minor fix
* tag 'edac_updates_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/cell: Remove powerpc Cell driver
EDAC: Add an EDAC driver for the Loongson memory controller
EDAC: Fix typos in comments
EDAC/{i10nm,skx,skx_common}: Support UV systems
EDAC/i10nm: Add Intel Clearwater Forest server support
Linus Torvalds [Tue, 21 Jan 2025 16:16:24 +0000 (08:16 -0800)]
Merge tag 'ras_core_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Remove the shared threshold bank hack on AMD and streamline and
simplify it
- Cleanup and sanitize MCA code
* tag 'ras_core_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/amd: Remove shared threshold bank plumbing
x86/mce: Remove the redundant mce_hygon_feature_init()
x86/mce: Convert family/model mixed checks to VFM-based checks
x86/mce: Break up __mcheck_cpu_apply_quirks()
x86/mce: Make four functions return bool
x86/mce/threshold: Remove the redundant this_cpu_dec_return()
x86/mce: Make several functions return bool
Mathieu Desnoyers [Thu, 16 Jan 2025 20:59:56 +0000 (15:59 -0500)]
rseq: Fix rseq unregistration regression
A logic inversion in rseq_reset_rseq_cpu_node_id() causes the rseq
unregistration to fail when rseq_validate_ro_fields() succeeds rather
than the opposite.
This affects both CONFIG_DEBUG_RSEQ=y and CONFIG_DEBUG_RSEQ=n.
Linus Torvalds [Tue, 21 Jan 2025 05:40:19 +0000 (21:40 -0800)]
Merge tag 'powerpc-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan:
- Add preempt lazy support
- Deprecate cxl and cxl flash driver
- Fix a possible IOMMU related OOPS at boot on pSeries
- Optimize sched_clock() in ppc32 by replacing mulhdu() by
mul_u64_u64_shr()
Thanks to Andrew Donnellan, Andy Shevchenko, Ankur Arora, Christophe
Leroy, Frederic Barrat, Gaurav Batra, Luis Felipe Hernandez, Michael
Ellerman, Nilay Shroff, Ricardo B. Marliere, Ritesh Harjani (IBM),
Sebastian Andrzej Siewior, Shrikanth Hegde, Sourabh Jain, Thorsten Blum,
and Zhu Jun.
* tag 'powerpc-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix argument order to timer_sub()
powerpc/prom_init: Use IS_ENABLED()
powerpc/pseries/iommu: IOMMU incorrectly marks MMIO range in DDW
powerpc: Use str_on_off() helper in check_cache_coherency()
powerpc: Large user copy aware of full:rt:lazy preemption
powerpc: Add preempt lazy support
powerpc/book3s64/hugetlb: Fix disabling hugetlb when fadump is active
powerpc/vdso: Mark the vDSO code read-only after init
powerpc/64: Use get_user() in start_thread()
macintosh: declare ctl_table as const
selftest/powerpc/ptrace: Cleanup duplicate macro definitions
selftest/powerpc/ptrace/ptrace-pkey: Remove duplicate macros
selftest/powerpc/ptrace/core-pkey: Remove duplicate macros
powerpc/8xx: Drop legacy-of-mm-gpiochip.h header
scsi/cxlflash: Deprecate driver
cxl: Deprecate driver
selftests/powerpc: Fix typo in test-vphn.c
powerpc/xmon: Use str_yes_no() helper in dump_one_paca()
powerpc/32: Replace mulhdu() by mul_u64_u64_shr()
Linus Torvalds [Tue, 21 Jan 2025 05:21:49 +0000 (21:21 -0800)]
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"We've got a little less than normal thanks to the holidays in
December, but there's the usual summary below. The highlight is
probably the 52-bit physical addressing (LPA2) clean-up from Ard.
Confidential Computing:
- Register a platform device when running in CCA realm mode to enable
automatic loading of dependent modules
CPU Features:
- Update a bunch of system register definitions to pick up new field
encodings from the architectural documentation
- Add hwcaps and selftests for the new (2024) dpISA extensions
Documentation:
- Update EL3 (firmware) requirements for booting Linux on modern
arm64 designs
- Remove stale information about the kernel virtual memory map
Miscellaneous:
- Minor cleanups and typo fixes
Memory management:
- Fix vmemmap_check_pmd() to look at the PMD type bits
- LPA2 (52-bit physical addressing) cleanups and minor fixes
- Adjust physical address space depending upon whether or not LPA2 is
enabled
Perf and PMUs:
- Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU
- Extend AXI filtering support for the DDR PMU on NXP IMX SoCs
- Fix Designware PCIe PMU event numbering
- Add generic branch events for the Apple M1 CPU PMU
- Add support for Marvell Odyssey DDR and LLC-TAD PMUs
- Cleanups to the Hisilicon DDRC and Uncore PMU code
- Advertise discard mode for the SPE PMU
- Add the perf users mailing list to our MAINTAINERS entry"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
Documentation: arm64: Remove stale and redundant virtual memory diagrams
perf docs: arm_spe: Document new discard mode
perf: arm_spe: Add format option for discard mode
MAINTAINERS: Add perf list for drivers/perf/
arm64: Remove duplicate included header
drivers/perf: apple_m1: Map generic branch events
arm64: rsi: Add automatic arm-cca-guest module loading
kselftest/arm64: Add 2024 dpISA extensions to hwcap test
KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
arm64/hwcap: Describe 2024 dpISA extensions to userspace
arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association
arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()
arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()
arm64/mm: Replace open encodings with PXD_TABLE_BIT
arm64/mm: Rename pte_mkpresent() as pte_mkvalid()
arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09
...
Linus Torvalds [Tue, 21 Jan 2025 05:18:36 +0000 (21:18 -0800)]
Merge tag 'm68k-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- Use the generic muldi3 libgcc function
- Miscellaneous fixes and improvements
* tag 'm68k-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: libgcc: Fix lvalue abuse in umul_ppmm()
m68k: vga: Fix I/O defines
zorro: Constify 'struct bin_attribute'
m68k: atari: Use str_on_off() helper in atari_nvram_proc_read()
m68k: Use kernel's generic muldi3 libgcc function
Linus Torvalds [Tue, 21 Jan 2025 05:14:49 +0000 (21:14 -0800)]
Merge tag 's390-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Select config option KASAN_VMALLOC if KASAN is enabled
- Select config option VMAP_STACK unconditionally
- Implement arch_atomic_inc() / arch_atomic_dec() functions which
result in a single instruction if compiled for z196 or newer
architectures
- Make layering between atomic.h and atomic_ops.h consistent
- Comment s390 preempt_count implementation
- Remove pre MARCH_HAS_Z196_FEATURES preempt count implementation
- GCC uses the number of lines of an inline assembly to calculate
number of instructions and decide on inlining. Therefore remove
superfluous new lines from a couple of inline assemblies.
- Provide arch_atomic_*_and_test() implementations that allow the
compiler to generate slightly better code.
- Optimize __preempt_count_dec_and_test()
- Remove __bootdata annotations from declarations in header files
- Add missing include of <linux/smp.h> in abs_lowcore.h to provide
declarations for get_cpu() and put_cpu() used in the code
- Fix suboptimal kernel image base when running make kasan.config
- Remove huge_pte_none() and huge_pte_none_mostly() as are identical to
the generic variants
- Remove unused PAGE_KERNEL_EXEC, SEGMENT_KERNEL_EXEC, and
REGION3_KERNEL_EXEC defines
- Simplify noexec page protection handling and change the page, segment
and region3 protection definitions automatically if the instruction
execution-protection facility is not available
- Save one instruction and prefer EXRL instruction over EX in string,
xor_*(), amode31 and other functions
- Create /dev/diag misc device to fetch diagnose specific information
from the kernel and provide it to userspace
- Retrieve electrical power readings using DIAGNOSE 0x324 ioctl
- Make ccw_device_get_ciw() consistent and use array indices instead of
pointer arithmetic
- s390/qdio: Move memory alloc/pointer arithmetic for slib and sl into
one place
- The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that in s390 code
- Add missing TLB range adjustment in pud_free_tlb()
- Improve topology setup by adding early polarization detection
- Fix length checks in codepage_convert() function
- The generic bitops implementation is nearly identical to the s390
one. Switch to the generic variant and decrease a bit the kernel
image size
- Provide an optimized arch_test_bit() implementation which makes use
of flag output constraint. This generates slightly better code
- Provide memory topology information obtanied with DIAGNOSE 0x310
using ioctl.
- Various other small improvements, fixes, and cleanups
Also, some changes came in through a merge of 'pci-device-recovery'
branch:
- Add PCI error recovery status mechanism
- Simplify and document debug_next_entry() logic
- Split private data allocation and freeing out of debug file open()
and close() operations
- Add debug_dump() function that gets a textual representation of a
debug info (e.g. PCI recovery hardware error logs)
- Add formatted content of pci_debug_msg_id to the PCI report
* tag 's390-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
s390/futex: Fix FUTEX_OP_ANDN implementation
s390/diag: Add memory topology information via diag310
s390/bitops: Provide optimized arch_test_bit()
s390/bitops: Switch to generic bitops
s390/ebcdic: Fix length decrement in codepage_convert()
s390/ebcdic: Fix length check in codepage_convert()
s390/ebcdic: Use exrl instead of ex
s390/amode31: Use exrl instead of ex
s390/stackleak: Use exrl instead of ex in __stackleak_poison()
s390/lib: Use exrl instead of ex in xor functions
s390/topology: Improve topology detection
s390/tlb: Add missing TLB range adjustment
s390/pkey: Constify 'struct bin_attribute'
s390/sclp: Constify 'struct bin_attribute'
s390/pci: Constify 'struct bin_attribute'
s390/ipl: Constify 'struct bin_attribute'
s390/crypto/cpacf: Constify 'struct bin_attribute'
s390/qdio: Move memory alloc/pointer arithmetic for slib and sl into one place
s390/cio: Use array indices instead of pointer arithmetic
s390/qdio: Rename feature flag aif_osa to aif_qdio
...