Russell King (Oracle) [Tue, 4 Mar 2025 11:21:27 +0000 (11:21 +0000)]
net: stmmac: simplify phylink_suspend() and phylink_resume() calls
Currently, the calls to phylink's suspend and resume functions are
inside overly complex tests, and boil down to:
if (device_may_wakeup(priv->device) && priv->plat->pmt) {
call phylink
} else {
call phylink and
if (device_may_wakeup(priv->device))
do something else
}
This results in phylink always being called, possibly with differing
arguments for phylink_suspend().
Simplify this code, noting that each site is slightly different due to
the order in which phylink is called and the "something else".
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/E1tpQL1-005St4-Hn@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 5 Mar 2025 17:54:16 +0000 (17:54 +0000)]
net: stmmac: avoid shadowing global buf_sz
stmmac_rx() declares a local variable named "buf_sz" but there is also
a global variable for a module parameter which is called the same. To
avoid confusion, rename the local variable.
Jakub Kicinski [Tue, 4 Mar 2025 23:32:03 +0000 (15:32 -0800)]
selftests: net: fix error message in bpf_offload
We hit a following exception on timeout, nmaps is never set:
Test bpftool bound info reporting (own ns)...
Traceback (most recent call last):
File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 1128, in <module>
check_dev_info(False, "")
File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 583, in check_dev_info
maps = bpftool_map_list_wait(expected=2, ns=ns)
File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 215, in bpftool_map_list_wait
raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps))
NameError: name 'nmaps' is not defined
A recent cleanup changed the behaviour of tcp_set_window_clamp(). This
looks unintentional, and affects MPTCP selftests, e.g. some tests
re-establishing a connection after a disconnect are now unstable.
The cleanup used the 'clamp' macro which takes 3 arguments -- value,
lowest, and highest -- and returns a value between the lowest and the
highest allowable values. This then assumes ...
lowest (rcv_ssthresh) <= highest (rcv_wnd)
... which doesn't seem to be always the case here according to the MPTCP
selftests, even when running them without MPTCP, but only TCP.
For example, when we have ...
rcv_wnd < rcv_ssthresh < new_rcv_ssthresh
... before the cleanup, the rcv_ssthresh was not changed, while after
the cleanup, it is lowered down to rcv_wnd (highest).
During a simple test with TCP, here are the values I observed:
113664 (out) 65483 < 65536
117760 (out) 110968 < 110976
129024 (out) 116527 > 109696 => lo > hi
Here, we can see that it is not that rare to have rcv_ssthresh (lo)
higher than rcv_wnd (hi), so having a different behaviour when the
clamp() macro is used, even without MPTCP.
Note: new_window_clamp is always out of range (rcv_ssthresh < rcv_wnd)
here, which seems to be generally the case in my tests with small
connections.
I then suggests reverting this part, not to change the behaviour.
Fixes: 863a952eb79a ("tcp: tcp_set_window_clamp() cleanup") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/551 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250305-net-next-fix-tcp-win-clamp-v1-1-12afb705d34e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 5 Mar 2025 17:54:21 +0000 (17:54 +0000)]
net: stmmac: mostly remove "buf_sz"
The "buf_sz" parameter is not used in the stmmac driver - there is one
place where the value of buf_sz is validated, and two places where it
is written. It is otherwise unused.
Remove these accesses. However, leave the module parameter in place as
removing it could cause module load to fail, breaking user setups.
====================
net: stmmac: dwc-qos: Add FSD EQoS support
FSD platform has two instances of EQoS IP, one is in FSYS0 block and
another one is in PERIC block. This patch series add required DT binding
and platform driver specific changes for the same.
====================
Swathi K S [Wed, 5 Mar 2025 09:12:46 +0000 (14:42 +0530)]
net: stmmac: dwc-qos: Add FSD EQoS support
The FSD SoC contains two instance of the Synopsys DWC ethernet QOS IP core.
The binding that it uses is slightly different from existing ones because
of the integration (clocks, resets).
Swathi K S [Wed, 5 Mar 2025 09:12:45 +0000 (14:42 +0530)]
dt-bindings: net: Add FSD EQoS device tree bindings
Add FSD Ethernet compatible in Synopsys dt-bindings document. Add FSD
Ethernet YAML schema to enable the DT validation.
Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com> Signed-off-by: Ravi Patel <ravi.patel@samsung.com> Signed-off-by: Swathi K S <swathi.ks@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250305091246.106626-2-swathi.ks@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
tcp: even faster connect() under stress
This is a followup on the prior series, "tcp: scale connect() under pressure"
Now spinlocks are no longer in the picture, we see a very high cost
of the inet6_ehashfn() function.
In this series (of 2), I change how lport contributes to inet6_ehashfn()
to ensure better cache locality and call inet6_ehashfn()
only once per connect() system call.
This brings an additional 229 % increase of performance
for "neper/tcp_crr -6 -T 200 -F 30000" stress test,
while greatly improving latency metrics.
Eric Dumazet [Wed, 5 Mar 2025 03:45:49 +0000 (03:45 +0000)]
inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
In order to speedup __inet_hash_connect(), we want to ensure hash values
for <source address, port X, destination address, destination port>
are not randomly spread, but monotonically increasing.
Goal is to allow __inet_hash_connect() to derive the hash value
of a candidate 4-tuple with a single addition in the following
patch in the series.
Then (hash_sport == hash_0 + sport) for all sport values.
As far as I know, there is no security implication with this change.
After this patch, when __inet_hash_connect() has to try XXXX candidates,
the hash table buckets are contiguous and packed, allowing
a better use of cpu caches and hardware prefetchers.
Cross-merge networking fixes after downstream PR (net-6.14-rc6).
Conflicts:
net/ethtool/cabletest.c 2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock") 637399bf7e77 ("net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device")
====================
net: Hold netdev instance lock during ndo operations
As the gradual purging of rtnl continues, start grabbing netdev
instance lock in more places so we can get to the state where
most paths are working without rtnl. Start with requiring the
drivers that use shaper api (and later queue mgmt api) to work
with both rtnl and netdev instance lock. Eventually we might
attempt to drop rtnl. This mostly affects iavf, gve, bnxt and
netdev sim (as the drivers that implement shaper/queue mgmt)
so those drivers are converted in the process.
call_netdevice_notifiers locking is very inconsistent and might need
a separate follow up. Some notified events are covered by the
instance lock, some are not, which might complicate the driver
expectations.
Reviewed-by: Eric Dumazet <edumazet@google.com>
====================
Stanislav Fomichev [Wed, 5 Mar 2025 16:37:32 +0000 (08:37 -0800)]
eth: bnxt: remove most dependencies on RTNL
Only devlink and sriov paths are grabbing rtnl explicitly. The rest is
covered by netdev instance lock which the core now grabs, so there is
no need to manage rtnl in most places anymore.
On the core side we can now try to drop rtnl in some places
(do_setlink for example) for the drivers that signal non-rtnl
mode (TBD).
Boot-tested and with `ethtool -L eth1 combined 24` to trigger reset.
Cc: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-15-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stanislav Fomichev [Wed, 5 Mar 2025 16:37:30 +0000 (08:37 -0800)]
net: add option to request netdev instance lock
Currently only the drivers that implement shaper or queue APIs
are grabbing instance lock. Add an explicit opt-in for the
drivers that want to grab the lock without implementing the above
APIs.
Stanislav Fomichev [Wed, 5 Mar 2025 16:37:29 +0000 (08:37 -0800)]
net: replace dev_addr_sem with netdev instance lock
Lockdep reports possible circular dependency in [0]. Instead of
fixing the ordering, replace global dev_addr_sem with netdev
instance lock. Most of the paths that set/get mac are RTNL
protected. Two places where it's not, convert to explicit
locking:
- sysfs address_show
- dev_get_mac_address via dev_ioctl
Jakub Kicinski [Wed, 5 Mar 2025 16:37:28 +0000 (08:37 -0800)]
net: ethtool: try to protect all callback with netdev instance lock
Protect all ethtool callbacks and PHY related state with the netdev
instance lock, for drivers which want / need to have their ops
instance-locked. Basically take the lock everywhere we take rtnl_lock.
It was tempting to take the lock in ethnl_ops_begin(), but turns
out we actually nest those calls (when generating notifications).
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-11-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stanislav Fomichev [Wed, 5 Mar 2025 16:37:24 +0000 (08:37 -0800)]
net: hold netdev instance lock during rtnetlink operations
To preserve the atomicity, hold the lock while applying multiple
attributes. The major issue with a full conversion to the instance
lock are software nesting devices (bonding/team/vrf/etc). Those
devices call into the core stack for their lower (potentially
real hw) devices. To avoid explicitly wrapping all those places
into instance lock/unlock, introduce new API boundaries:
- (some) existing dev_xxx calls are now considered "external"
(to drivers) APIs and they transparently grab the instance
lock if needed (dev_api.c)
- new netif_xxx calls are internal core stack API (naming is
sketchy, I've tried netdev_xxx_locked per Jakub's suggestion,
but it feels a bit verbose; but happy to get back to this
naming scheme if this is the preference)
This avoids touching most of the existing ioctl/sysfs/drivers paths.
Note the special handling of ndo_xxx_slave operations: I exploit
the fact that none of the drivers that call these functions
need/use instance lock. At the same time, they use dev_xxx
APIs, so the lower device has to be unlocked.
Changes in unregister_netdevice_many_notify (to protect dev->state
with instance lock) trigger lockdep - the loop over close_list
(mostly from cleanup_net) introduces spurious ordering issues.
netdev_lock_cmp_fn has a justification on why it's ok to suppress
for now.
Stanislav Fomichev [Wed, 5 Mar 2025 16:37:23 +0000 (08:37 -0800)]
net: hold netdev instance lock during queue operations
For the drivers that use queue management API, switch to the mode where
core stack holds the netdev instance lock. This affects the following
drivers:
- bnxt
- gve
- netdevsim
Originally I locked only start/stop, but switched to holding the
lock over all iterations to make them look atomic to the device
(feels like it should be easier to reason about).
Stanislav Fomichev [Wed, 5 Mar 2025 16:37:22 +0000 (08:37 -0800)]
net: hold netdev instance lock during qdisc ndo_setup_tc
Qdisc operations that can lead to ndo_setup_tc might need
to have an instance lock. Add netdev_lock_ops/netdev_unlock_ops
invocations for all psched_rtnl_msg_handlers operations.
Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-5-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stanislav Fomichev [Wed, 5 Mar 2025 16:37:21 +0000 (08:37 -0800)]
net: sched: wrap doit/dumpit methods
In preparation for grabbing netdev instance lock around qdisc
operations, introduce tc_xxx wrappers that lookup netdev
and call respective __tc_xxx helper to do the actual work.
No functional changes.
Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-4-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stanislav Fomichev [Wed, 5 Mar 2025 16:37:19 +0000 (08:37 -0800)]
net: hold netdev instance lock during ndo_open/ndo_stop
For the drivers that use shaper API, switch to the mode where
core stack holds the netdev lock. This affects two drivers:
* iavf - already grabs netdev lock in ndo_open/ndo_stop, so mostly
remove these
* netdevsim - switch to _locked APIs to avoid deadlock
iavf_close diff is a bit confusing, the existing call looks like this:
iavf_close() {
netdev_lock()
..
netdev_unlock()
wait_event_timeout(down_waitqueue)
}
I change it to the following:
netdev_lock()
iavf_close() {
..
netdev_unlock()
wait_event_timeout(down_waitqueue)
netdev_lock() // reusing this lock call
}
netdev_unlock()
Since I'm reusing existing netdev_lock call, so it looks like I only
add netdev_unlock.
- wifi: iwlwifi:
- fix A-MSDU TSO preparation
- free pages allocated when failing to build A-MSDU
- ipv6: fix dst ref loop in ila lwtunnel
- mptcp: fix 'scheduling while atomic' in
mptcp_pm_nl_append_new_local_addr
- bluetooth: add check for mgmt_alloc_skb() in
mgmt_device_connected()
- ethtool: allow NULL nlattrs when getting a phy_device
- eth: be2net: fix sleeping while atomic bugs in
be_ndo_bridge_getlink
Previous releases - always broken:
- core: support TCP GSO case for a few missing flags
- wifi: mac80211:
- fix vendor-specific inheritance
- cleanup sta TXQs on flush
- llc: do not use skb_get() before dev_queue_xmit()
- eth: ipa: nable checksum for IPA_ENDPOINT_AP_MODEM_{RX,TX}
for v4.7"
* tag 'net-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (41 commits)
net: ipv6: fix missing dst ref drop in ila lwtunnel
net: ipv6: fix dst ref loop in ila lwtunnel
mctp i3c: handle NULL header address
net: dsa: mt7530: Fix traffic flooding for MMIO devices
net-timestamp: support TCP GSO case for a few missing flags
vlan: enforce underlying device type
mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr
net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device
ppp: Fix KMSAN uninit-value warning with bpf
net: ipa: Enable checksum for IPA_ENDPOINT_AP_MODEM_{RX,TX} for v4.7
net: ipa: Fix QSB data for v4.7
net: ipa: Fix v4.7 resource group names
net: hns3: make sure ptp clock is unregister and freed if hclge_ptp_get_cycle returns an error
wifi: nl80211: disable multi-link reconfiguration
net: dsa: rtl8366rb: don't prompt users for LED control
be2net: fix sleeping while atomic bugs in be_ndo_bridge_getlink
llc: do not use skb_get() before dev_queue_xmit()
wifi: cfg80211: regulatory: improve invalid hints checking
caif_virtio: fix wrong pointer check in cfv_probe()
net: gso: fix ownership in __udp_gso_segment
...
Linus Torvalds [Thu, 6 Mar 2025 19:19:15 +0000 (09:19 -1000)]
Merge tag 'v6.14-rc5-smb3-fixes' of git://git.samba.org/ksmbd
Pull smb fixes from Steve French:
"Five SMB server fixes, two related client fixes, and minor MAINTAINERS
update:
- Two SMB3 lock fixes fixes (including use after free and bug on fix)
- Fix to race condition that can happen in processing IPC responses
- Four ACL related fixes: one related to endianness of num_aces, and
two related fixes to the checks for num_aces (for both client and
server), and one fixing missing check for num_subauths which can
cause memory corruption
- And minor update to email addresses in MAINTAINERS file"
* tag 'v6.14-rc5-smb3-fixes' of git://git.samba.org/ksmbd:
cifs: fix incorrect validation for num_aces field of smb_acl
ksmbd: fix incorrect validation for num_aces field of smb_acl
smb: common: change the data type of num_aces to le16
ksmbd: fix bug on trap in smb2_lock
ksmbd: fix use-after-free in smb2_lock
ksmbd: fix type confusion via race condition when using ipc_msg_send_request
ksmbd: fix out-of-bounds in parse_sec_desc()
MAINTAINERS: update email address in cifs and ksmbd entry
Linus Torvalds [Thu, 6 Mar 2025 18:18:48 +0000 (08:18 -1000)]
Merge tag 'exfat-for-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat fixes from Namjae Jeon:
- Optimize new cluster allocation by correctly find empty entry slot
- Add a check to prevent excessive bitmap clearing due to invalid
data size of file/dir entry
- Fix incorrect error return for zero-byte writes
* tag 'exfat-for-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: add a check for invalid data size
exfat: short-circuit zero-byte writes in exfat_file_write_iter
exfat: fix soft lockup in exfat_clear_bitmap
exfat: fix just enough dentries but allocate a new cluster to dir
Linus Torvalds [Thu, 6 Mar 2025 18:04:49 +0000 (08:04 -1000)]
Merge tag 'vfs-6.14-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix spelling mistakes in idmappings.rst
- Fix RCU warnings in override_creds()/revert_creds()
- Create new pid namespaces with default limit now that pid_max is
namespaced
* tag 'vfs-6.14-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
pid: Do not set pid_max in new pid namespaces
doc: correcting two prefix errors in idmappings.rst
cred: Fix RCU warnings in override/revert_creds
Linus Torvalds [Thu, 6 Mar 2025 17:53:25 +0000 (07:53 -1000)]
fs/pipe: fix pipe buffer index use in FUSE
This was another case that Rasmus pointed out where the direct access to
the pipe head and tail pointers broke on 32-bit configurations due to
the type changes.
As with the pipe FIONREAD case, fix it by using the appropriate helper
functions that deal with the right pipe index sizing.
Linus Torvalds [Thu, 6 Mar 2025 17:33:58 +0000 (07:33 -1000)]
fs/pipe: do not open-code pipe head/tail logic in FIONREAD
Rasmus points out that we do indeed have other cases of breakage from
the type changes that were introduced on 32-bit targets in order to read
the pipe head and tail values atomically (commit 3d252160b818: "fs/pipe:
Read pipe->{head,tail} atomically outside pipe->mutex").
Fix it up by using the proper helper functions that now deal with the
pipe buffer index types properly. This makes the code simpler and more
obvious.
The compiler does the CSE and loop hoisting of the pipe ring size
masking that we used to do manually, so open-coding this was never a
good idea.
Linus Torvalds [Thu, 6 Mar 2025 17:30:42 +0000 (07:30 -1000)]
fs/pipe: express 'pipe_empty()' in terms of 'pipe_occupancy()'
That's what 'pipe_full()' does, so it's more consistent. But more
importantly it gets the type limits right when the pipe head and tail
are no longer necessarily 'unsigned int'.
Justin Iurman [Tue, 4 Mar 2025 18:10:39 +0000 (19:10 +0100)]
net: ipv6: fix dst ref loop in ila lwtunnel
This patch follows commit 92191dd10730 ("net: ipv6: fix dst ref loops in
rpl, seg6 and ioam6 lwtunnels") and, on a second thought, the same patch
is also needed for ila (even though the config that triggered the issue
was pathological, but still, we don't want that to happen).
Michal Koutný [Wed, 5 Mar 2025 14:58:49 +0000 (15:58 +0100)]
pid: Do not set pid_max in new pid namespaces
It is already difficult for users to troubleshoot which of multiple pid
limits restricts their workload. The per-(hierarchical-)NS pid_max would
contribute to the confusion.
Also, the implementation copies the limit upon creation from
parent, this pattern showed cumbersome with some attributes in legacy
cgroup controllers -- it's subject to race condition between parent's
limit modification and children creation and once copied it must be
changed in the descendant.
Let's do what other places do (ucounts or cgroup limits) -- create new
pid namespaces without any limit at all. The global limit (actually any
ancestor's limit) is still effectively in place, we avoid the
set/unshare race and bumps of global (ancestral) limit have the desired
effect on pid namespace that do not care.
After blamed commit rtm_to_fib_config() now calls
lwtunnel_valid_encap_type{_attr}() without RTNL held,
triggering an unlock balance in __rtnl_unlock,
as reported by syzbot [1]
IPv6 and rtm_to_nh_config() are not yet converted.
Add a temporary @rtnl_is_held parameter to lwtunnel_valid_encap_type()
and lwtunnel_valid_encap_type_attr().
While we are at it replace the two rcu_dereference()
in lwtunnel_valid_encap_type() with more appropriate
rcu_access_pointer().
[1]
syz-executor245/5836 is trying to release lock (rtnl_mutex) at:
[<ffffffff89d0e38c>] __rtnl_unlock+0x6c/0xf0 net/core/rtnetlink.c:142
but there are no more locks to release!
other info that might help us debug this:
no locks held by syz-executor245/5836.
Lorenzo Bianconi [Tue, 4 Mar 2025 08:50:23 +0000 (09:50 +0100)]
net: dsa: mt7530: Fix traffic flooding for MMIO devices
On MMIO devices (e.g. MT7988 or EN7581) unicast traffic received on lanX
port is flooded on all other user ports if the DSA switch is configured
without VLAN support since PORT_MATRIX in PCR regs contains all user
ports. Similar to MDIO devices (e.g. MT7530 and MT7531) fix the issue
defining default VLAN-ID 0 for MT7530 MMIO devices.
Fixes: 110c18bfed414 ("net: dsa: mt7530: introduce driver for MT7988 built-in switch") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Chester A. Unal <chester.a.unal@arinc9.com> Link: https://patch.msgid.link/20250304-mt7988-flooding-fix-v1-1-905523ae83e9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Mon, 3 Mar 2025 20:19:25 +0000 (21:19 +0100)]
net: phy: remove remaining PHY package related definitions from phy.h
Move definition of struct phy_package_shared to phy_package.c, and
move remaining PHY package related declarations from phy.h to
phylib.h, thus making them accessible for PHY drivers only.
Heiner Kallweit [Mon, 3 Mar 2025 20:18:46 +0000 (21:18 +0100)]
net: phy: move PHY package related code from phy.h to phy_package.c
Move PHY package related inline functions from phy.h to phy_package.c.
While doing so remove locked versions phy_package_read() and
phy_package_write() which have no user.
Heiner Kallweit [Mon, 3 Mar 2025 20:15:09 +0000 (21:15 +0100)]
net: phy: add getters for public members in struct phy_package_shared
Add getters for public members, this prepares for making struct
phy_package_shared private to phylib. Declare the getters in a new header
file phylib.h, which will be used by PHY drivers only.
====================
Enable SGMII and 2500BASEX interface mode switching for Intel platforms
During the interface mode change, the 'phylink_major_config' function will
be triggered in phylink. The modification of the following functions will
support the switching between SGMII and 2500BASE-X interface modes for
the Intel platform:
- xpcs_switch_interface_mode: Re-initiates clause 37 auto-negotiation for
the SGMII interface mode to perform auto-negotiation.
- mac_finish: Configures the SerDes according to the interface mode.
With the above changes, the code will work as follows during the interface
mode change. The PCS will reconfigure according to the pcs_neg_mode and the
selected interface mode. Then, the MAC driver will perform SerDes
configuration in 'mac_finish' based on the selected interface mode. During
the SerDes configuration, the selected interface mode will identify TSN
lane registers from FIA and then send IPC commands to the Power Management
Controller (PMC) through the PMC driver/API. The PMC will act as a proxy to
program the PLL registers.
====================
net: stmmac: configure SerDes according to the interface mode
Intel platform will configure the SerDes through PMC API based on the
provided interface mode.
This patch adds several new functions below:-
- intel_tsn_lane_is_available(): This new function reads FIA lane
ownership registers and common lane registers through IPC commands
to know which lane the mGbE port is assigned to.
- intel_mac_finish(): To configure the SerDes based on the assigned
lane and latest interface mode, it sends IPC command to the PMC through
PMC driver/API. The PMC acts as a proxy for R/W on behalf of the driver.
- intel_set_reg_access(): Set the register access to the available TSN
interface.
David E. Box [Thu, 27 Feb 2025 12:15:19 +0000 (20:15 +0800)]
arch: x86: add IPC mailbox accessor function and add SoC register access
- Exports intel_pmc_ipc() for host access to the PMC IPC mailbox
- Enables the host to access specific SoC registers through the PMC
firmware using IPC commands. This access method is necessary for
registers that are not available through direct Memory-Mapped I/O (MMIO),
which is used for other accessible parts of the PMC.
Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Chao Qin <chao.qin@intel.com> Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250227121522.1802832-4-yong.liang.choong@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The xpcs_switch_interface_mode function was introduced to handle
interface switching.
According to the XPCS datasheet, a soft reset is required to initiate
Clause 37 auto-negotiation when the XPCS switches interface modes.
When the interface mode switches from 2500BASE-X to SGMII,
re-initiating Clause 37 auto-negotiation is required for the SGMII
interface mode to function properly.
net: phylink: use pl->link_interface in phylink_expects_phy()
The phylink_expects_phy() function allows MAC drivers to check if they are
expecting a PHY to attach. The checking condition in phylink_expects_phy()
aims to achieve the same result as the checking condition in
phylink_attach_phy().
However, the checking condition in phylink_expects_phy() uses
pl->link_config.interface, while phylink_attach_phy() uses
pl->link_interface.
Initially, both pl->link_interface and pl->link_config.interface are set
to SGMII, and pl->cfg_link_an_mode is set to MLO_AN_INBAND.
When the interface switches from SGMII to 2500BASE-X,
pl->link_config.interface is updated by phylink_major_config().
At this point, pl->cfg_link_an_mode remains MLO_AN_INBAND, and
pl->link_config.interface is set to 2500BASE-X.
Subsequently, when the STMMAC interface is taken down
administratively and brought back up, it is blocked by
phylink_expects_phy().
Since phylink_expects_phy() and phylink_attach_phy() aim to achieve the
same result, phylink_expects_phy() should check pl->link_interface,
which never changes, instead of pl->link_config.interface, which is
updated by phylink_major_config().
Linus Torvalds [Wed, 5 Mar 2025 17:35:40 +0000 (07:35 -1000)]
fs/pipe: remove buggy and unused 'helper' function
While looking for incorrect users of the pipe head/tail fields (see
commit c27c66afc449: "fs/pipe: Fix pipe_occupancy() with 16-bit
indexes"), I found a bug in pipe_discard_from() that looked entirely
broken.
However, the fix is trivial: this buggy function isn't actually called
by anything, so let's just remove it ASAP.
Linus Torvalds [Wed, 5 Mar 2025 17:08:09 +0000 (07:08 -1000)]
fs/pipe: Fix pipe_occupancy() with 16-bit indexes
The pipe_occupancy() logic implicitly relied on the natural unsigned
modulo arithmetic in C, but that doesn't work for the new 'pipe_index_t'
case, since any arithmetic will be done in 'int' (and here we had also
made it 'unsigned int' due to the function call boundary).
So make the modulo arithmetic explicit by casting the result to the
proper type.
Jason Xing [Tue, 4 Mar 2025 00:44:29 +0000 (08:44 +0800)]
net-timestamp: support TCP GSO case for a few missing flags
When I read through the TSO codes, I found out that we probably
miss initializing the tx_flags of last seg when TSO is turned
off, which means at the following points no more timestamp
(for this last one) will be generated. There are three flags
to be handled in this patch:
1. SKBTX_HW_TSTAMP
2. SKBTX_BPF
3. SKBTX_SCHED_TSTAMP
Note that SKBTX_BPF[1] was added in 6.14.0-rc2 by commit 6b98ec7e882af ("bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback")
and only belongs to net-next branch material for now. The common
issue of the above three flags can be fixed by this single patch.
This patch initializes the tx_flags to SKBTX_ANY_TSTAMP like what
the UDP GSO does to make the newly segmented last skb inherit the
tx_flags so that requested timestamp will be generated in each
certain layer, or else that last one has zero value of tx_flags
which leads to no timestamp at all.
Fixes: 4ed2d765dfacc ("net-timestamp: TCP timestamping") Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Sandeen [Tue, 11 Feb 2025 20:14:21 +0000 (14:14 -0600)]
exfat: short-circuit zero-byte writes in exfat_file_write_iter
When generic_write_checks() returns zero, it means that
iov_iter_count() is zero, and there is no work to do.
Simply return success like all other filesystems do, rather than
proceeding down the write path, which today yields an -EFAULT in
generic_perform_write() via the
(fault_in_iov_iter_readable(i, bytes) == bytes) check when bytes
== 0.
Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Reported-by: Noah <kernel-org-10@maxgrass.eu> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Namjae Jeon [Fri, 31 Jan 2025 03:55:55 +0000 (12:55 +0900)]
exfat: fix soft lockup in exfat_clear_bitmap
bitmap clear loop will take long time in __exfat_free_cluster()
if data size of file/dir enty is invalid.
If cluster bit in bitmap is already clear, stop clearing bitmap go to
out of loop.
Fixes: 31023864e67a ("exfat: add fat entry operations") Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Yuezhang Mo [Fri, 22 Nov 2024 02:50:55 +0000 (10:50 +0800)]
exfat: fix just enough dentries but allocate a new cluster to dir
This commit fixes the condition for allocating cluster to parent
directory to avoid allocating new cluster to parent directory when
there are just enough empty directory entries at the end of the
parent directory.
Fixes: af02c72d0b62 ("exfat: convert exfat_find_empty_entry() to use dentry cache") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
====================
Permission checks for dynamic POSIX clocks
Dynamic clocks - such as PTP clocks - extend beyond the standard POSIX
clock API by using ioctl calls. While file permissions are enforced for
standard POSIX operations, they are not implemented for ioctl calls,
since the POSIX layer cannot differentiate between calls which modify
the clock's state (like enabling PPS output generation) and those that
don't (such as retrieving the clock's PPS capabilities).
On the other hand, drivers implementing the dynamic clocks lack the
necessary information context to enforce permission checks themselves.
Additionally, POSIX clock layer requires the WRITE permission even for
readonly adjtime() operations before invoking the callback.
Add a struct file pointer to the POSIX clock context and use it to
implement the appropriate permission checks on PTP chardevs. Permit
readonly adjtime() for dynamic clocks. Add a readonly option to testptp.
Changes in v4:
- Allow readonly adjtime() for dynamic clocks, as suggested by Thomas
Changes in v3:
- Reword the log message for commit against posix-clock and fix
documentation of struct posix_clock_context, as suggested by Thomas
Changes in v2:
- Store file pointer in POSIX clock context rather than fmode in the PTP
clock's private data, as suggested by Richard.
- Move testptp.c changes into separate patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wojtek Wasko [Mon, 3 Mar 2025 16:13:45 +0000 (18:13 +0200)]
testptp: Add option to open PHC in readonly mode
PTP Hardware Clocks no longer require WRITE permission to perform
readonly operations, such as listing device capabilities or listening to
EXTTS events once they have been enabled by a process with WRITE
permissions.
Add '-r' option to testptp to open the PHC in readonly mode instead of
the default read-write mode. Skip enabling EXTTS if readonly mode is
requested.
Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Wojtek Wasko <wwasko@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Many devices implement highly accurate clocks, which the kernel manages
as PTP Hardware Clocks (PHCs). Userspace applications rely on these
clocks to timestamp events, trace workload execution, correlate
timescales across devices, and keep various clocks in sync.
The kernel’s current implementation of PTP clocks does not enforce file
permissions checks for most device operations except for POSIX clock
operations, where file mode is verified in the POSIX layer before
forwarding the call to the PTP subsystem. Consequently, it is common
practice to not give unprivileged userspace applications any access to
PTP clocks whatsoever by giving the PTP chardevs 600 permissions. An
example of users running into this limitation is documented in [1].
Additionally, POSIX layer requires WRITE permission even for readonly
adjtime() calls which are used in PTP layer to return current frequency
offset applied to the PHC.
Add permission checks for functions that modify the state of a PTP
device. Continue enforcing permission checks for POSIX clock operations
(settime, adjtime) in the POSIX layer. Only require WRITE access for
dynamic clocks adjtime() if any flags are set in the modes field.
Changes in v4:
- Require FMODE_WRITE in ajtime() only for calls modifying the clock in
any way.
Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Wojtek Wasko <wwasko@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Wojtek Wasko [Mon, 3 Mar 2025 16:13:43 +0000 (18:13 +0200)]
posix-clock: Store file pointer in struct posix_clock_context
File descriptor based pc_clock_*() operations of dynamic posix clocks
have access to the file pointer and implement permission checks in the
generic code before invoking the relevant dynamic clock callback.
Character device operations (open, read, poll, ioctl) do not implement a
generic permission control and the dynamic clock callbacks have no
access to the file pointer to implement them.
Extend struct posix_clock_context with a struct file pointer and
initialize it in posix_clock_open(), so that all dynamic clock callbacks
can access it.
Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wojtek Wasko <wwasko@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 5 Mar 2025 05:05:53 +0000 (19:05 -1000)]
Merge tag 'x86_microcode_for_v6.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull AMD microcode loading fixes from Borislav Petkov:
- Load only sha256-signed microcode patch blobs
- Other good cleanups
* tag 'x86_microcode_for_v6.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Load only SHA256-checksummed patches
x86/microcode/AMD: Add get_patch_level()
x86/microcode/AMD: Get rid of the _load_microcode_amd() forward declaration
x86/microcode/AMD: Merge early_apply_microcode() into its single callsite
x86/microcode/AMD: Remove unused save_microcode_in_initrd_amd() declarations
x86/microcode/AMD: Remove ugly linebreak in __verify_patch_section() signature
Dan Carpenter [Mon, 3 Mar 2025 12:02:12 +0000 (15:02 +0300)]
net: Prevent use after free in netif_napi_set_irq_locked()
The cpu_rmap_put() will call kfree() when the last reference is dropped
so it could result in a use after free when we dereference the same
pointer the next line. Move the cpu_rmap_put() after the dereference.
Sean Anderson [Mon, 3 Mar 2025 23:18:32 +0000 (18:18 -0500)]
net: cadence: macb: Synchronize standard stats
The new stats calculations add several additional calls to
macb/gem_update_stats() and accesses to bp->hw_stats. These are
protected by a spinlock since commit fa52f15c745c ("net: cadence: macb:
Synchronize stats calculations"), which was applied in parallel. Add
some locking now that the net has been merged into net-next.
Eric Dumazet [Sun, 2 Mar 2025 12:42:37 +0000 (12:42 +0000)]
tcp: use RCU lookup in __inet_hash_connect()
When __inet_hash_connect() has to try many 4-tuples before
finding an available one, we see a high spinlock cost from
the many spin_lock_bh(&head->lock) performed in its loop.
This patch adds an RCU lookup to avoid the spinlock cost.
check_established() gets a new @rcu_lookup argument.
First reason is to not make any changes while head->lock
is not held.
Second reason is to not make this RCU lookup a second time
after the spinlock has been acquired.
Eric Dumazet [Sun, 2 Mar 2025 12:42:34 +0000 (12:42 +0000)]
tcp: use RCU in __inet{6}_check_established()
When __inet_hash_connect() has to try many 4-tuples before
finding an available one, we see a high spinlock cost from
__inet_check_established() and/or __inet6_check_established().
This patch adds an RCU lookup to avoid the spinlock
acquisition when the 4-tuple is found in the hash table.
Note that there are still spin_lock_bh() calls in
__inet_hash_connect() to protect inet_bind_hashbucket,
this will be fixed later in this series.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250302124237.3913746-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Krister Johansen [Mon, 3 Mar 2025 17:10:13 +0000 (18:10 +0100)]
mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr
If multiple connection requests attempt to create an implicit mptcp
endpoint in parallel, more than one caller may end up in
mptcp_pm_nl_append_new_local_addr because none found the address in
local_addr_list during their call to mptcp_pm_nl_get_local_id. In this
case, the concurrent new_local_addr calls may delete the address entry
created by the previous caller. These deletes use synchronize_rcu, but
this is not permitted in some of the contexts where this function may be
called. During packet recv, the caller may be in a rcu read critical
section and have preemption disabled.
An example stack:
BUG: scheduling while atomic: swapper/2/0/0x00000302
This problem seems particularly prevalent if the user advertises an
endpoint that has a different external vs internal address. In the case
where the external address is advertised and multiple connections
already exist, multiple subflow SYNs arrive in parallel which tends to
trigger the race during creation of the first local_addr_list entries
which have the internal address instead.
Fix by skipping the replacement of an existing implicit local address if
called via mptcp_pm_nl_get_local_id.
Markus Elfring [Thu, 13 Apr 2023 15:00:11 +0000 (17:00 +0200)]
tipc: Reduce scope for the variable “fdefq” in tipc_link_tnl_prepare()
The address of a data structure member was determined before
a corresponding null pointer check in the implementation of
the function “tipc_link_tnl_prepare”.
Thus avoid the risk for undefined behaviour by moving the definition
for the local variable “fdefq” into an if branch at the end.
This issue was detected by using the Coccinelle software.
Jakub Kicinski [Fri, 28 Feb 2025 21:29:55 +0000 (13:29 -0800)]
selftests: drv-net: use env.rpath in the HDS test
Commit 29b036be1b0b ("selftests: drv-net: test XDP, HDS auto and
the ioctl path") added a new test case in the net tree, now that
this code has made its way to net-next convert it to use the env.rpath()
helper instead of manually computing the relative path.
Daniel Golle [Sat, 1 Mar 2025 01:41:18 +0000 (01:41 +0000)]
dsa: mt7530: Utilize REGMAP_IRQ for interrupt handling
Replace the custom IRQ chip handler and mask/unmask functions with
REGMAP_IRQ. This significantly simplifies the code and allows for the
removal of almost all interrupt-related functions from mt7530.c.
Tested on MT7988A built-in switch (MMIO) as well as MT7531AE IC (MDIO).
So, when tb is NULL (such as in the ethnl notify path), we have a nice
crash.
It turns out that there's only the PLCA command that is in that case, as
the other phydev-specific commands don't have a notification.
This commit fixes the crash by passing the cmd index and the nlattr
array separately, allowing NULL-checking it directly inside the helper.
Fixes: c15e065b46dc ("net: ethtool: Allow passing a phy index for some commands") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Reported-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> Link: https://patch.msgid.link/20250301141114.97204-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qingfang Deng [Sat, 1 Mar 2025 13:55:16 +0000 (21:55 +0800)]
ppp: use IFF_NO_QUEUE in virtual interfaces
For PPPoE, PPTP, and PPPoL2TP, the start_xmit() function directly
forwards packets to the underlying network stack and never returns
anything other than 1. So these interfaces do not require a qdisc,
and the IFF_NO_QUEUE flag should be set.
Introduces a direct_xmit flag in struct ppp_channel to indicate when
IFF_NO_QUEUE should be applied. The flag is set in ppp_connect_channel()
for relevant protocols.
While at it, remove the usused latency member from struct ppp_channel.
====================
eth: fbnic: Cleanup macros and string function
We have received some feedback that the macros we use for reading FW mailbox
attributes are too large in scope and confusing to understanding. Additionally
the string function did not provide errors allowing it to silently succeed.
This patch set fixes theses issues.
====================
Lee Trager [Fri, 28 Feb 2025 19:15:28 +0000 (11:15 -0800)]
eth: fbnic: Replace firmware field macros
Replace the firmware field macros with new macros which follow typical
kernel standards. No variables are required to be predefined for use and
results are now returned. These macros are prefixed with fta or fbnic
TLV attribute.
Lee Trager [Fri, 28 Feb 2025 19:15:27 +0000 (11:15 -0800)]
eth: fbnic: Update fbnic_tlv_attr_get_string() to work like nla_strscpy()
Allow fbnic_tlv_attr_get_string() to return an error code. In the event the
source mailbox attribute is missing return -EINVAL. Like nla_strscpy() return
-E2BIG when the source string is larger than the destination string. In this
case the amount of data copied is equal to dstsize.
The aim of this series is to modernize the device tree bindings for the
Freescale "Gianfar" ethernet controller (a.k.a. TSEC, Triple Speed
Ethernet Controller) by converting them to YAML.
J. Neuschäfer [Fri, 28 Feb 2025 17:32:51 +0000 (18:32 +0100)]
dt-bindings: net: fsl,gianfar-mdio: Update information about TBI
When this binding was originally written, all known TSEC Ethernet
controllers had a Ten-Bit Interface (TBI). However, some datasheets such
as for the MPC8315E suggest that this is not universally true:
The eTSECs do not support TBI, GMII, and FIFO operating modes, so all
references to these interfaces and features should be ignored for this
device.
J. Neuschäfer [Fri, 28 Feb 2025 17:32:50 +0000 (18:32 +0100)]
dt-bindings: net: Convert fsl,gianfar-{mdio,tbi} to YAML
Move the information related to the Freescale Gianfar (TSEC) MDIO bus
and the Ten-Bit Interface (TBI) from fsl-tsec-phy.txt to a new binding
file in YAML format, fsl,gianfar-mdio.yaml.
====================
net: phy: nxp-c45-tja11xx: add support for TJA1121
This patch series adds .match_phy_device for the existing TJAs
to differentiate between TJA1103/TJA1104 and TJA1120/TJA1121.
TJA1103 and TJA1104 share the same PHY_ID but TJA1104 has MACsec
capabilities while TJA1103 doesn't.
Also add support for TJA1121 which is based on TJA1120 hardware
with additional MACsec IP.
====================