Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Note that the 'tos' variable holds the full DS field.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: nft_flow_offload: Unmask upper DSCP bits in nft_flow_route()
Unmask the upper DSCP bits when calling nf_route() which eventually
calls ip_route_output_key() so that in the future it could perform the
FIB lookup according to the full DSCP value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_xmit()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.
Note that the 'tos' variable includes the full DS field. Either the one
specified as part of the tunnel parameters or the one inherited from the
inner packet.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_md_tunnel_xmit()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.
Note that the 'tos' variable includes the full DS field. Either the one
specified via the tunnel key or the one inherited from the inner packet.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_bind_dev()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The usage looks weird since it checks @ns_type but calls namespace()
it is found for all existing class definitions that the other filed is
also assigned once one is assigned in current kernel tree, so fix this
weird usage by checking @namespace to call namespace().
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Sep 2024 09:29:05 +0000 (10:29 +0100)]
Merge branch 'fs_enet-cleanup'
Maxime Chevallier says:
====================
net: ethernet: fs_enet: Cleanup and phylink conversion
This is V3 of a series that cleans-up fs_enet, with the ultimate goal of
converting it to phylink (patch 8).
The main changes compared to V2 are :
- Reviewed-by tags from Andrew were gathered
- Patch 5 now includes the removal of now unused includes, thanks
Andrew for spotting this
- Patch 4 is new, it reworks the adjust_link to move the spinlock
acquisition to a more suitable location. Although this dissapears in
the actual phylink port, it makes the phylink conversion clearer on
that point
- Patch 8 includes fixes in the tx_timeout cancellation, to prevent
taking rtnl twice when canceling a pending tx_timeout. Thanks Jakub
for spotting this.
Link to V2: https://lore.kernel.org/netdev/20240829161531.610874-1-maxime.chevallier@bootlin.com/
Link to V1: https://lore.kernel.org/netdev/20240828095103.132625-1-maxime.chevallier@bootlin.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Wed, 4 Sep 2024 17:18:21 +0000 (19:18 +0200)]
net: ethernet: fs_enet: phylink conversion
fs_enet is a quite old but still used Ethernet driver found on some NXP
devices. It has support for 10/100 Mbps ethernet, with half and full
duplex. Some variants of it can use RMII, while other integrations are
MII-only.
This also allows removing some internal flags such as the use_rmii flag.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Wed, 4 Sep 2024 17:18:20 +0000 (19:18 +0200)]
net: ethernet: fs_enet: simplify clock handling with devm accessors
devm_clock_get_enabled() can be used to simplify clock handling for the
PER register clock.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Wed, 4 Sep 2024 17:18:19 +0000 (19:18 +0200)]
net: ethernet: fs_enet: use macros for speed and duplex values
The PHY speed and duplex should be manipulated using the SPEED_XXX and
DUPLEX_XXX macros available. Use it in the fcc, fec and scc MAC for
fs_enet.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Wed, 4 Sep 2024 17:18:18 +0000 (19:18 +0200)]
net: ethernet: fs_enet: drop unused phy_info and mii_if_info
There's no user of the struct phy_info, the 'phy' field and the
mii_if_info in the fs_enet driver, probably dating back when phylib
wasn't as widely used. Drop these from the driver code.
As the definition for struct mii_if_info is no longer required, drop the
include for linux/mii.h altogether in the driver.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Wed, 4 Sep 2024 17:18:17 +0000 (19:18 +0200)]
net: ethernet: fs_enet: only protect the .restart() call in .adjust_link
When .adjust_link() gets called, it runs in thread context, with the
phydev->lock held. We only need to protect the fep->fecp/fccp/sccp
register that are accessed within the .restart() function from
concurrent access from the interrupts.
These registers are being protected by the fep->lock spinlock, so we can
move the spinlock protection around the .restart() call instead of the
entire adjust_link() call. By doing so, we can simplify further the
.adjust_link() callback and avoid the intermediate helper.
Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Wed, 4 Sep 2024 17:18:16 +0000 (19:18 +0200)]
net: ethernet: fs_enet: drop the .adjust_link custom fs_ops
There's no in-tree user for the fs_ops .adjust_link() function, so we
can always use the generic one in fe_enet-main.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Wed, 4 Sep 2024 17:18:15 +0000 (19:18 +0200)]
net: ethernet: fs_enet: cosmetic cleanups
Due to the age of the driver and the slow recent activity on it, the code
has taken some layers of dust. Clean the main driver file up so that it
passes checkpatch and also conforms with the net coding style.
Changes include :
- Re-ordering of the variable declarations for RCT
- Fixing the comment styles to either one-line comments, or net-style
comments
- Adding braces around single-statement 'else' clauses
- Aligning function/macro parameters on the opening parenthesis
- Simplifying checks for NULL pointers
- Splitting cascaded assignments into individual assignments
- Fixing some typos
- Fixing whitespace issues
This is a cosmetic change and doesn't introduce any change in behaviour.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Wed, 4 Sep 2024 17:18:14 +0000 (19:18 +0200)]
net: ethernet: fs_enet: convert to SPDX
The ENET driver has SPDX tags in the header files, but they were missing
in the C files. Change the licence information to SPDX format.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ptp/ioctl: support MONOTONIC{,_RAW} timestamps for PTP_SYS_OFFSET_EXTENDED
The ability to read the PHC (Physical Hardware Clock) alongside
multiple system clocks is currently dependent on the specific
hardware architecture. This limitation restricts the use of
PTP_SYS_OFFSET_PRECISE to certain hardware configurations.
The generic soultion which would work across all architectures
is to read the PHC along with the latency to perform PHC-read as
offered by PTP_SYS_OFFSET_EXTENDED which provides pre and post
timestamps. However, these timestamps are currently limited
to the CLOCK_REALTIME timebase. Since CLOCK_REALTIME is affected
by NTP (or similar time synchronization services), it can
experience significant jumps forward or backward. This hinders
the precise latency measurements that PTP_SYS_OFFSET_EXTENDED
is designed to provide.
This problem could be addressed by supporting MONOTONIC_RAW
timestamps within PTP_SYS_OFFSET_EXTENDED. Unlike CLOCK_REALTIME
or CLOCK_MONOTONIC, the MONOTONIC_RAW timebase is unaffected
by NTP adjustments.
This enhancement can be implemented by utilizing one of the three
reserved words within the PTP_SYS_OFFSET_EXTENDED struct to pass
the clock-id for timestamps. The current behavior aligns with
clock-id for CLOCK_REALTIME timebase (value of 0), ensuring
backward compatibility of the UAPI.
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: sched: consistently use rcu_replace_pointer() in taprio_change()
According to Vinicius (and carefully looking through the whole
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa
once again), txtime branch of 'taprio_change()' is not going to
race against 'advance_sched()'. But using 'rcu_replace_pointer()'
in the former may be a good idea as well.
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 7 Sep 2024 01:39:31 +0000 (18:39 -0700)]
Merge tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
Patch #1 adds ctnetlink support for kernel side filtering for
deletions, from Changliang Wu.
Patch #2 updates nft_counter support to Use u64_stats_t,
from Sebastian Andrzej Siewior.
Patch #3 uses kmemdup_array() in all xtables frontends,
from Yan Zhen.
Patch #4 is a oneliner to use ERR_CAST() in nf_conntrack instead
opencoded casting, from Shen Lichuan.
Patch #5 removes unused argument in nftables .validate interface,
from Florian Westphal.
Patch #6 is a oneliner to correct a typo in nftables kdoc,
from Simon Horman.
Patch #7 fixes missing kdoc in nftables, also from Simon.
Patch #8 updates nftables to handle timeout less than CONFIG_HZ.
Patch #9 rejects element expiration if timeout is zero,
otherwise it is silently ignored.
Patch #10 disallows element expiration larger than timeout.
Patch #11 removes unnecessary READ_ONCE annotation while mutex is held.
Patch #12 adds missing READ_ONCE/WRITE_ONCE annotation in dynset.
Patch #13 annotates data-races around element expiration.
Patch #14 allocates timeout and expiration in one single set element
extension, they are tighly couple, no reason to keep them
separated anymore.
Patch #15 updates nftables to interpret zero timeout element as never
times out. Note that it is already possible to declare sets
with elements that never time out but this generalizes to all
kind of set with timeouts.
Patch #16 supports for element timeout and expiration updates.
* tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: set element timeout update support
netfilter: nf_tables: zero timeout means element never times out
netfilter: nf_tables: consolidate timeout extension for elements
netfilter: nf_tables: annotate data-races around element expiration
netfilter: nft_dynset: annotate data-races around set timeout
netfilter: nf_tables: remove annotation to access set timeout while holding lock
netfilter: nf_tables: reject expiration higher than timeout
netfilter: nf_tables: reject element expiration with no timeout
netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire
netfilter: nf_tables: Add missing Kernel doc
netfilter: nf_tables: Correct spelling in nf_tables.h
netfilter: nf_tables: drop unused 3rd argument from validate callback ops
netfilter: conntrack: Convert to use ERR_CAST()
netfilter: Use kmemdup_array instead of kmemdup for multiple allocation
netfilter: nft_counter: Use u64_stats_t for statistic.
netfilter: ctnetlink: support CTA_FILTER for flush
====================
The PCIe bus can be pretty busy during boot and probe function can
see excessive delays. Let's find the minimal value out of several
tests and use it as estimated value.
Jakub Kicinski [Sat, 7 Sep 2024 01:23:51 +0000 (18:23 -0700)]
Merge branch 'octeontx2-address-some-warnings'
Simon Horman says:
====================
octeontx2: Address some warnings
This patchset addresses some warnings flagged by Sparse, gcc-14, and
clang-18 in files touched by recent patch submissions.
Although these changes do not alter the functionality of the code, by
addressing them real problems introduced in future which are flagged by
Sparse will stand out more readily.
Simon Horman [Wed, 4 Sep 2024 18:29:36 +0000 (19:29 +0100)]
octeontx2-af: Pass string literal as format argument of alloc_workqueue()
Recently I noticed that both gcc-14 and clang-18 report that passing
a non-string literal as the format argument of alloc_workqueue()
is potentially insecure.
E.g. clang-18 says:
.../rvu.c:2493:32: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
2493 | mw->mbox_wq = alloc_workqueue(name,
| ^~~~
.../rvu.c:2493:32: note: treat the string as an argument to avoid this
2493 | mw->mbox_wq = alloc_workqueue(name,
| ^
| "%s",
It is always the case where the contents of name is safe to pass as the
format argument. That is, in my understanding, it never contains any
format escape sequences.
But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.
Edward Cree [Wed, 4 Sep 2024 18:11:56 +0000 (19:11 +0100)]
sfc: siena: rip out rss-context dead code
Siena hardware does not support custom RSS contexts, but when the
driver was forked from sfc.ko, some of the plumbing for them was
copied across from the common code. Actually trying to use them
would lead to EOPNOTSUPP as the relevant efx_nic_type methods were
not populated.
Remove this dead code from the Siena driver.
net: tls: wait for async completion on last message
When asynchronous encryption is used KTLS sends out the final data at
proto->close time. This becomes problematic when the task calling
close() receives a signal. In this case it can happen that
tcp_sendmsg_locked() called at close time returns -ERESTARTSYS and the
final data is not sent.
The described situation happens when KTLS is used in conjunction with
io_uring, as io_uring uses task_work_add() to add work to the current
userspace task. A discussion of the problem along with a reproducer can
be found in [1] and [2]
Fix this by waiting for the asynchronous encryption to be completed on
the final message. With this there is no data left to be sent at close
time.
====================
make use of the helper macro LIST_HEAD()
The macro LIST_HEAD() declares a list variable and
initializes it, which can be used to simplify the steps
of list initialization, thereby simplifying the code.
These serials just do some equivalatent substitutions,
and with no functional modifications.
====================
Chen Ni [Wed, 4 Sep 2024 08:49:51 +0000 (16:49 +0800)]
sfc: convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Chen Ni [Wed, 4 Sep 2024 08:40:34 +0000 (16:40 +0800)]
sfc/siena: Convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Chen Ni [Wed, 4 Sep 2024 08:17:28 +0000 (16:17 +0800)]
ionic: Convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Chen Ni [Wed, 4 Sep 2024 08:08:45 +0000 (16:08 +0800)]
net: atlantic: convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Gal Pressman [Wed, 4 Sep 2024 07:49:22 +0000 (10:49 +0300)]
bnx2x: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:21 +0000 (10:49 +0300)]
cxgb4: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:20 +0000 (10:49 +0300)]
ixgbe: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:19 +0000 (10:49 +0300)]
igc: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:18 +0000 (10:49 +0300)]
igb: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:17 +0000 (10:49 +0300)]
ice: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:16 +0000 (10:49 +0300)]
i40e: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:15 +0000 (10:49 +0300)]
net: netcp: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:14 +0000 (10:49 +0300)]
net: ti: icssg-prueth: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:13 +0000 (10:49 +0300)]
net: ethernet: ti: cpsw_ethtool: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:12 +0000 (10:49 +0300)]
net: ethernet: ti: am65-cpsw-ethtool: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:11 +0000 (10:49 +0300)]
mlxsw: spectrum: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:10 +0000 (10:49 +0300)]
net: sparx5: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:09 +0000 (10:49 +0300)]
net: lan966x: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:08 +0000 (10:49 +0300)]
lan743x: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 6 Sep 2024 07:41:36 +0000 (08:41 +0100)]
Merge branch 'microchip=ksz8-cleanup'
Pieter Van Trappen says:
====================
net: dsa: microchip: rename and clean ksz8 series files
The first KSZ8 series implementation was done for a KSZ8795 device but
since several other KSZ8 devices have been added. Rename these files
to adhere to the ksz8 naming convention as already used in most
functions and the existing ksz8.h; add an explanatory note.
In addition, clean the files by removing macros that are defined at
more than one place and remove confusion by renaming the KSZ8830
string which in fact is not an existing KSZ series switch.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch>
---
v4:
- correct once more Kconfig list of supported switches
v3: https://lore.kernel.org/netdev/20240903072946.344507-1-vtpieter@gmail.com/
- rename all KSZ8830 to KSZ88X3 only (not KSZ8863)
- update Kconfig as per Arun's suggestion
v2: https://lore.kernel.org/netdev/20240830141250.30425-1-vtpieter@gmail.com/
- more finegrained description in Kconfig and ksz8.c header
- add KSZ8830/ksz8830 to KSZ8863/ksz88x3 renaming
Replace ksz8830 with ksz88x3 for CHIP_ID definition and other
strings. This due to KSZ8830 not being an actual switch but the Chip
ID shared among KSZ8863/8873 switches, impossible to differentiate
from their Chip ID or Revision ID registers.
Now all KSZ*_CHIP_ID macros refer to actual, existing switches which
removes confusion.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: microchip: clean up ksz8_reg definition macros
Remove macros that are already defined at more appropriate places.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The first KSZ8 series implementation was done for a KSZ8795 device but
since several other KSZ8 devices have been added. Rename these files
to adhere to the ksz8 naming convention as already used in most
functions and the existing ksz8.h; add an explanatory note.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series includes adding realtek automotive ethernet driver
and adding rtase ethernet driver entry in MAINTAINERS file.
This ethernet device driver for the PCIe interface of
Realtek Automotive Ethernet Switch,applicable to
RTL9054, RTL9068, RTL9072, RTL9075, RTL9068, RTL9071.
====================
Justin Lai [Wed, 4 Sep 2024 03:21:11 +0000 (11:21 +0800)]
rtase: Implement ethtool function
Implement the ethtool function to support users to obtain network card
information, including obtaining various device settings, Report whether
physical link is up, Report pause parameters, Set pause parameters,
Return extended statistics about the device.
Justin Lai [Wed, 4 Sep 2024 03:21:09 +0000 (11:21 +0800)]
rtase: Implement net_device_ops
1. Implement .ndo_set_rx_mode so that the device can change address
list filtering.
2. Implement .ndo_set_mac_address so that mac address can be changed.
3. Implement .ndo_change_mtu so that mtu can be changed.
4. Implement .ndo_tx_timeout to perform related processing when the
transmitter does not make any progress.
5. Implement .ndo_get_stats64 to provide statistics that are called
when the user wants to get network device usage.
6. Implement .ndo_vlan_rx_add_vid to register VLAN ID when the device
supports VLAN filtering.
7. Implement .ndo_vlan_rx_kill_vid to unregister VLAN ID when the device
supports VLAN filtering.
8. Implement the .ndo_setup_tc to enable setting any "tc" scheduler,
classifier or action on dev.
9. Implement .ndo_fix_features enables adjusting requested feature flags
based on device-specific constraints.
10. Implement .ndo_set_features enables updating device configuration to
new features.
Justin Lai [Wed, 4 Sep 2024 03:21:08 +0000 (11:21 +0800)]
rtase: Implement a function to receive packets
Implement rx_handler to read the information of the rx descriptor,
thereby checking the packet accordingly and storing the packet
in the socket buffer to complete the reception of the packet.
Justin Lai [Wed, 4 Sep 2024 03:21:07 +0000 (11:21 +0800)]
rtase: Implement .ndo_start_xmit function
Implement .ndo_start_xmit function to fill the information of the packet
to be transmitted into the tx descriptor, and then the hardware will
transmit the packet using the information in the tx descriptor.
In addition, we also implemented the tx_handler function to enable the
tx descriptor to be reused.
Justin Lai [Wed, 4 Sep 2024 03:21:06 +0000 (11:21 +0800)]
rtase: Implement hardware configuration function
Implement rtase_hw_config to set default hardware settings, including
setting interrupt mitigation, tx/rx DMA burst, interframe gap time,
rx packet filter, near fifo threshold and fill descriptor ring and
tally counter address, and enable flow control. When filling the
rx descriptor ring, the first group of queues needs to be processed
separately because the positions of the first group of queues are not
regular with other subsequent groups. The other queues are all newly
added features, but we want to retain the original design. So they were
not put together.
Justin Lai [Wed, 4 Sep 2024 03:21:05 +0000 (11:21 +0800)]
rtase: Implement the interrupt routine and rtase_poll
1. Implement rtase_interrupt to handle txQ0/rxQ0, txQ4~txQ7 interrupts,
and implement rtase_q_interrupt to handle txQ1/rxQ1, txQ2/rxQ2 and
txQ3/rxQ3 interrupts.
2. Implement rtase_poll to call ring_handler to process the tx or
rx packet of each ring. If the returned value is budget,it means that
there is still work of a certain ring that has not yet been completed.
Justin Lai [Wed, 4 Sep 2024 03:21:03 +0000 (11:21 +0800)]
rtase: Implement the .ndo_open function
Implement the .ndo_open function to set default hardware settings
and initialize the descriptor ring and interrupts. Among them,
when requesting interrupt, because the first group of interrupts
needs to process more events, the overall structure and interrupt
handler will be different from other groups of interrupts, so it
needs to be handled separately. The first set of interrupt handlers
need to handle the interrupt status of RXQ0 and TXQ0, TXQ4~7,
while other groups of interrupt handlers will handle the interrupt
status of RXQ1&TXQ1 or RXQ2&TXQ2 or RXQ3&TXQ3 according to the
interrupt vector.
Joe Damato [Wed, 4 Sep 2024 15:34:30 +0000 (15:34 +0000)]
net: napi: Prevent overflow of napi_defer_hard_irqs
In commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature")
napi_defer_irqs was added to net_device and napi_defer_irqs_count was
added to napi_struct, both as type int.
This value never goes below zero, so there is not reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.
The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.
Merge tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, bluetooth and wireless.
No known regressions at this point. Another calm week, but chances are
that has more to do with vacation season than the quality of our work.
Current release - new code bugs:
- smc: prevent NULL pointer dereference in txopt_get
- eth: ti: am65-cpsw: number of XDP-related fixes
Previous releases - regressions:
- Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
BREDR/LE", it breaks existing user space
- Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
later problems with suspend
- can: mcp251x: fix deadlock if an interrupt occurs during
mcp251x_open
- eth: r8152: fix the firmware communication error due to use of bulk
write
- ptp: ocp: fix serial port information export
- eth: igb: fix not clearing TimeSync interrupts for 82580
- Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo
Previous releases - always broken:
- eth: intel: fix crashes and bugs when reconfiguration and resets
happening in parallel
- wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()
Misc:
- docs: netdev: document guidance on cleanup.h"
* tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
ila: call nf_unregister_net_hooks() sooner
tools/net/ynl: fix cli.py --subscribe feature
MAINTAINERS: fix ptp ocp driver maintainers address
selftests: net: enable bind tests
net: dsa: vsc73xx: fix possible subblocks range of CAPT block
sched: sch_cake: fix bulk flow accounting logic for host fairness
docs: netdev: document guidance on cleanup.h
net: xilinx: axienet: Fix race in axienet_stop
net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
r8152: fix the firmware doesn't work
fou: Fix null-ptr-deref in GRO.
bareudp: Fix device stats updates.
net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
bpf, net: Fix a potential race in do_sock_getsockopt()
net: dqs: Do not use extern for unused dql_group
sch/netem: fix use after free in netem_dequeue
usbnet: modern method to get random MAC
MAINTAINERS: wifi: cw1200: add net-cw1200.h
ice: do not bring the VSI up, if it was down before the XDP setup
ice: remove ICE_CFG_BUSY locking from AF_XDP code
...
Merge tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small driver specific fixes (including some of the widespread
work on fixing missing ID tables for module autoloading and the revert
of some problematic PM work in spi-rockchip), some improvements to the
MAINTAINERS information for the NXP drivers and the addition of a new
device ID to spidev"
* tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
MAINTAINERS: SPI: Add mailing list imx@lists.linux.dev for nxp spi drivers
MAINTAINERS: SPI: Add freescale lpspi maintainer information
spi: spi-fsl-lpspi: Fix off-by-one in prescale max
spi: spidev: Add missing spi_device_id for jg10309-01
spi: bcm63xx: Enable module autoloading
spi: intel: Add check devm_kasprintf() returned value
spi: spidev: Add an entry for elgin,jg10309-01
spi: rockchip: Resolve unbalanced runtime PM / system PM handling
Merge tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A fix from Doug Anderson for a missing stub, required to fix the build
for some newly added users of devm_regulator_bulk_get_const() in
!REGULATOR configurations"
* tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATOR
Merge tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux
Pull Rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix builds for nightly compiler users now that 'new_uninit' was
split into new features by using an alternative approach for the
code that used what is now called the 'box_uninit_write' feature
- Allow the 'stable_features' lint to preempt upcoming warnings about
them, since soon there will be unstable features that will become
stable in nightly compilers
- Export bss symbols too
'kernel' crate:
- 'block' module: fix wrong usage of lockdep API
'macros' crate:
- Provide correct provenance when constructing 'THIS_MODULE'
Documentation:
- Remove unintended indentation (blockquotes) in generated output
- Fix a couple typos
MAINTAINERS:
- Remove Wedson as Rust maintainer
- Update Andreas' email"
* tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux:
MAINTAINERS: update Andreas Hindborg's email address
MAINTAINERS: Remove Wedson as Rust maintainer
rust: macros: provide correct provenance when constructing THIS_MODULE
rust: allow `stable_features` lint
docs: rust: remove unintended blockquote in Quick Start
rust: alloc: eschew `Box<MaybeUninit<T>>::write`
rust: kernel: fix typos in code comments
docs: rust: remove unintended blockquote in Coding Guidelines
rust: block: fix wrong usage of lockdep API
rust: kbuild: fix export of bss symbols
Merge tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix adding a new fgraph callback after function graph tracing has
already started.
If the new caller does not initialize its hash before registering the
fgraph_ops, it can cause a NULL pointer dereference. Fix this by
adding a new parameter to ftrace_graph_enable_direct() passing in the
newly added gops directly and not rely on using the fgraph_array[],
as entries in the fgraph_array[] must be initialized.
Assign the new gops to the fgraph_array[] after it goes through
ftrace_startup_subops() as that will properly initialize the
gops->ops and initialize its hashes.
- Fix a memory leak in fgraph storage memory test.
If the "multiple fgraph storage on a function" boot up selftest fails
in the registering of the function graph tracer, it will not free the
memory it allocated for the filter. Break the loop up into two where
it allocates the filters first and then registers the functions where
any errors will do the appropriate clean ups.
- Only clear the timerlat timers if it has an associated kthread.
In the rtla tool that uses timerlat, if it was killed just as it was
shutting down, the signals can free the kthread and the timer. But
the closing of the timerlat files could cause the hrtimer_cancel() to
be called on the already freed timer. As the kthread variable is is
set to NULL when the kthreads are stopped and the timers are freed it
can be used to know not to call hrtimer_cancel() on the timer if the
kthread variable is NULL.
- Use a cpumask to keep track of osnoise/timerlat kthreads
The timerlat tracer can use user space threads for its analysis. With
the killing of the rtla tool, the kernel can get confused between if
it is using a user space thread to analyze or one of its own kernel
threads. When this confusion happens, kthread_stop() can be called on
a user space thread and bad things happen. As the kernel threads are
per-cpu, a bitmask can be used to know when a kernel thread is used
or when a user space thread is used.
- Add missing interface_lock to osnoise/timerlat stop_kthread()
The stop_kthread() function in osnoise/timerlat clears the osnoise
kthread variable, and if it was a user space thread does a put_task
on it. But this can race with the closing of the timerlat files that
also does a put_task on the kthread, and if the race happens the task
will have put_task called on it twice and oops.
- Add cond_resched() to the tracing_iter_reset() loop.
The latency tracers keep writing to the ring buffer without resetting
when it issues a new "start" event (like interrupts being disabled).
When reading the buffer with an iterator, the tracing_iter_reset()
sets its pointer to that start event by walking through all the
events in the buffer until it gets to the time stamp of the start
event. In the case of a very large buffer, the loop that looks for
the start event has been reported taking a very long time with a non
preempt kernel that it can trigger a soft lock up warning. Add a
cond_resched() into that loop to make sure that doesn't happen.
- Use list_del_rcu() for eventfs ei->list variable
It was reported that running loops of creating and deleting kprobe
events could cause a crash due to the eventfs list iteration hitting
a LIST_POISON variable. This is because the list is protected by SRCU
but when an item is deleted from the list, it was using list_del()
which poisons the "next" pointer. This is what list_del_rcu() was to
prevent.
* tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
tracing/timerlat: Only clear timer if a kthread exists
tracing/osnoise: Use a cpumask to know what threads are kthreads
eventfs: Use list_del_rcu() for SRCU protected list variable
tracing: Avoid possible softlockup in tracing_iter_reset()
tracing: Fix memory leak in fgraph storage selftest
tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()
Memory state around the buggy address: ffff88806461ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88806461ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888064620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^ ffff888064620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888064620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20240904144418.1162839-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Execution of command:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml /
--subscribe "monitor" --sleep 10
fails with:
File "/repo/./tools/net/ynl/cli.py", line 109, in main
ynl.check_ntf()
File "/repo/tools/net/ynl/lib/ynl.py", line 924, in check_ntf
op = self.rsp_by_value[nl_msg.cmd()]
KeyError: 19
Parsing Generic Netlink notification messages performs lookup for op in
the message. The message was not yet decoded, and is not yet considered
GenlMsg, thus msg.cmd() returns Generic Netlink family id (19) instead of
proper notification command id (i.e.: DPLL_CMD_PIN_CHANGE_NTF=13).
Allow the op to be obtained within NetlinkProtocol.decode(..) itself if the
op was not passed to the decode function, thus allow parsing of Generic
Netlink notifications without causing the failure.
Merge tag 'platform-drivers-x86-v6.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- amd/pmf: ASUS GA403 quirk matching tweak
- dell-smbios: Fix to the init function rollback path
* tag 'platform-drivers-x86-v6.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/amd: pmf: Make ASUS GA403 quirk generic
platform/x86: dell-smbios: Fix error path in dell_smbios_init()
Merge tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fix fromShuah Khan:
"One single fix to a use-after-free bug resulting from
kunit_driver_create() failing to copy the driver name leaving it on
the stack or freeing it"
* tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Device wrappers should also manage driver name
Steven Rostedt [Thu, 5 Sep 2024 15:33:59 +0000 (11:33 -0400)]
tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
The timerlat interface will get and put the task that is part of the
"kthread" field of the osn_var to keep it around until all references are
released. But here's a race in the "stop_kthread()" code that will call
put_task_struct() on the kthread if it is not a kernel thread. This can
race with the releasing of the references to that task struct and the
put_task_struct() can be called twice when it should have been called just
once.
Take the interface_lock() in stop_kthread() to synchronize this change.
But to do so, the function stop_per_cpu_kthreads() needs to change the
loop from for_each_online_cpu() to for_each_possible_cpu() and remove the
cpu_read_lock(), as the interface_lock can not be taken while the cpu
locks are held. The only side effect of this change is that it may do some
extra work, as the per_cpu variables of the offline CPUs would not be set
anyway, and would simply be skipped in the loop.
Remove unneeded "return;" in stop_kthread().
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tomas Glozar <tglozar@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Link: https://lore.kernel.org/20240905113359.2b934242@gandalf.local.home Fixes: e88ed227f639e ("tracing/timerlat: Add user-space interface") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Thu, 5 Sep 2024 12:53:30 +0000 (08:53 -0400)]
tracing/timerlat: Only clear timer if a kthread exists
The timerlat tracer can use user space threads to check for osnoise and
timer latency. If the program using this is killed via a SIGTERM, the
threads are shutdown one at a time and another tracing instance can start
up resetting the threads before they are fully closed. That causes the
hrtimer assigned to the kthread to be shutdown and freed twice when the
dying thread finally closes the file descriptors, causing a use-after-free
bug.
Only cancel the hrtimer if the associated thread is still around. Also add
the interface_lock around the resetting of the tlat_var->kthread.
Note, this is just a quick fix that can be backported to stable. A real
fix is to have a better synchronization between the shutdown of old
threads and the starting of new ones.
Steven Rostedt [Wed, 4 Sep 2024 14:34:28 +0000 (10:34 -0400)]
tracing/osnoise: Use a cpumask to know what threads are kthreads
The start_kthread() and stop_thread() code was not always called with the
interface_lock held. This means that the kthread variable could be
unexpectedly changed causing the kthread_stop() to be called on it when it
should not have been, leading to:
while true; do
rtla timerlat top -u -q & PID=$!;
sleep 5;
kill -INT $PID;
sleep 0.001;
kill -TERM $PID;
wait $PID;
done
This is because it would mistakenly call kthread_stop() on a user space
thread making it "exit" before it actually exits.
Since kthreads are created based on global behavior, use a cpumask to know
when kthreads are running and that they need to be shutdown before
proceeding to do new work.