Claudiu Beznea [Tue, 28 Nov 2023 08:04:35 +0000 (10:04 +0200)]
net: ravb: Use pm_runtime_resume_and_get()
pm_runtime_get_sync() may return an error. In case it returns with an error
dev->power.usage_count needs to be decremented. pm_runtime_resume_and_get()
takes care of this. Thus use it.
Claudiu Beznea [Tue, 28 Nov 2023 08:04:34 +0000 (10:04 +0200)]
net: ravb: Check return value of reset_control_deassert()
reset_control_deassert() could return an error. Some devices cannot work
if reset signal de-assert operation fails. To avoid this check the return
code of reset_control_deassert() in ravb_probe() and take proper action.
Along with it, the free_netdev() call from the error path was moved after
reset_control_assert() on its own label (out_free_netdev) to free
netdev in case reset_control_deassert() fails.
Jiawen Wu [Tue, 28 Nov 2023 09:59:28 +0000 (17:59 +0800)]
net: libwx: fix memory leak on msix entry
Since pci_free_irq_vectors() set pdev->msix_enabled as 0 in the
calling of pci_msix_shutdown(), wx->msix_entries is never freed.
Reordering the lines to fix the memory leak.
Dave Ertman [Mon, 27 Nov 2023 21:23:38 +0000 (13:23 -0800)]
ice: Fix VF Reset paths when interface in a failed over aggregate
There is an error when an interface has the following conditions:
- PF is in an aggregate (bond)
- PF has VFs created on it
- bond is in a state where it is failed-over to the secondary interface
- A VF reset is issued on one or more of those VFs
The issue is generated by the originating PF trying to rebuild or
reconfigure the VF resources. Since the bond is failed over to the
secondary interface the queue contexts are in a modified state.
To fix this issue, have the originating interface reclaim its resources
prior to the tear-down and rebuild or reconfigure. Then after the process
is complete, move the resources back to the currently active interface.
There are multiple paths that can be used depending on what triggered the
event, so create a helper function to move the queues and use paired calls
to the helper (back to origin, process, then move back to active interface)
under the same lag_mutex lock.
Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20231127212340.1137657-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 30 Nov 2023 03:43:34 +0000 (19:43 -0800)]
Merge tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
wireless fixes:
- debugfs had a deadlock (removal vs. use of files),
fixes going through wireless ACKed by Greg
- support for HT STAs on 320 MHz channels, even if it's
not clear that should ever happen (that's 6 GHz), best
not to WARN()
- fix for the previous CQM fix that broke most cases
- various wiphy locking fixes
- various small driver fixes
* tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: use wiphy locked debugfs for sdata/link
wifi: mac80211: use wiphy locked debugfs helpers for agg_status
wifi: cfg80211: add locked debugfs wrappers
debugfs: add API to allow debugfs operations cancellation
debugfs: annotate debugfs handlers vs. removal with lockdep
debugfs: fix automount d_fsdata usage
wifi: mac80211: handle 320 MHz in ieee80211_ht_cap_ie_to_sta_ht_cap
wifi: avoid offset calculation on NULL pointer
wifi: cfg80211: hold wiphy mutex for send_interface
wifi: cfg80211: lock wiphy mutex for rfkill poll
wifi: cfg80211: fix CQM for non-range use
wifi: mac80211: do not pass AP_VLAN vif pointer to drivers during flush
wifi: iwlwifi: mvm: fix an error code in iwl_mvm_mld_add_sta()
wifi: mt76: mt7925: fix typo in mt7925_init_he_caps
wifi: mt76: mt7921: fix 6GHz disabled by the missing default CLC config
====================
Jakub Kicinski [Thu, 30 Nov 2023 03:40:04 +0000 (19:40 -0800)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-11-30
We've added 5 non-merge commits during the last 7 day(s) which contain
a total of 10 files changed, 66 insertions(+), 15 deletions(-).
The main changes are:
1) Fix AF_UNIX splat from use after free in BPF sockmap,
from John Fastabend.
2) Fix a syzkaller splat in netdevsim by properly handling offloaded
programs (and not device-bound ones), from Stanislav Fomichev.
3) Fix bpf_mem_cache_alloc_flags() to initialize the allocation hint,
from Hou Tao.
4) Fix netkit by rejecting IFLA_NETKIT_PEER_INFO in changelink,
from Daniel Borkmann.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Add af_unix test with both sockets in map
bpf, sockmap: af_unix stream sockets need to hold ref for pair sock
netkit: Reject IFLA_NETKIT_PEER_INFO in netkit_change_link
bpf: Add missed allocation hint for bpf_mem_cache_alloc_flags()
netdevsim: Don't accept device bound programs
====================
John Fastabend [Wed, 29 Nov 2023 01:25:57 +0000 (17:25 -0800)]
bpf, sockmap: Add af_unix test with both sockets in map
This adds a test where both pairs of a af_unix paired socket are put into a
BPF map. This ensures that when we tear down the af_unix pair we don't have
any issues on sockmap side with ordering and reference counting.
John Fastabend [Wed, 29 Nov 2023 01:25:56 +0000 (17:25 -0800)]
bpf, sockmap: af_unix stream sockets need to hold ref for pair sock
AF_UNIX stream sockets are a paired socket. So sending on one of the pairs
will lookup the paired socket as part of the send operation. It is possible
however to put just one of the pairs in a BPF map. This currently increments
the refcnt on the sock in the sockmap to ensure it is not free'd by the
stack before sockmap cleans up its state and stops any skbs being sent/recv'd
to that socket.
But we missed a case. If the peer socket is closed it will be free'd by the
stack. However, the paired socket can still be referenced from BPF sockmap
side because we hold a reference there. Then if we are sending traffic through
BPF sockmap to that socket it will try to dereference the free'd pair in its
send logic creating a use after free. And following splat:
To fix let BPF sockmap hold a refcnt on both the socket in the sockmap and its
paired socket. It wasn't obvious how to contain the fix to bpf_unix logic. The
primarily problem with keeping this logic in bpf_unix was: In the sock close()
we could handle the deref by having a close handler. But, when we are destroying
the psock through a map delete operation we wouldn't have gotten any signal
thorugh the proto struct other than it being replaced. If we do the deref from
the proto replace its too early because we need to deref the sk_pair after the
backlog worker has been stopped.
Given all this it seems best to just cache it at the end of the psock and eat 8B
for the af_unix and vsock users. Notice dgram sockets are OK because they handle
locking already.
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20231129012557.95371-2-john.fastabend@gmail.com
struct ynl_req_state carries reply-related info from generated code
into generic YNL code. While we don't need reply info to execute
a request without a reply, we still need to pass in the struct, because
it's also where we get the pointer to struct ynl_sock from. Passing NULL
results in crashes if kernel returns an error or an unexpected reply.
Jakub Kicinski [Sun, 26 Nov 2023 22:58:06 +0000 (14:58 -0800)]
ethtool: don't propagate EOPNOTSUPP from dumps
The default dump handler needs to clear ret before returning.
Otherwise if the last interface returns an inconsequential
error this error will propagate to user space.
This may confuse user space (ethtool CLI seems to ignore it,
but YNL doesn't). It will also terminate the dump early
for mutli-skb dump, because netlink core treats EOPNOTSUPP
as a real error.
Yoshihiro Shimoda [Mon, 27 Nov 2023 12:24:20 +0000 (21:24 +0900)]
ravb: Fix races between ravb_tx_timeout_work() and net related ops
Fix races between ravb_tx_timeout_work() and functions of net_device_ops
and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
may happen in ravb_tx_timeout_work() like below:
CPU0 CPU1
ravb_tx_timeout()
schedule_work()
...
__dev_close_many()
// Under rtnl lock
ravb_close()
cancel_work_sync()
// Waiting
ravb_tx_timeout_work()
rtnl_lock()
// This is possible to cause a deadlock
If rtnl_trylock() fails, rescheduling the work with sleep for 1 msec.
Heiner Kallweit [Sun, 26 Nov 2023 22:01:02 +0000 (23:01 +0100)]
r8169: prevent potential deadlock in rtl8169_close
ndo_stop() is RTNL-protected by net core, and the worker function takes
RTNL as well. Therefore we will deadlock when trying to execute a
pending work synchronously. To fix this execute any pending work
asynchronously. This will do no harm because netif_running() is false
in ndo_stop(), and therefore the work function is effectively a no-op.
However we have to ensure that no task is running or pending after
rtl_remove_one(), therefore add a call to cancel_work_sync().
Heiner Kallweit [Sun, 26 Nov 2023 18:36:46 +0000 (19:36 +0100)]
r8169: fix deadlock on RTL8125 in jumbo mtu mode
The original change results in a deadlock if jumbo mtu mode is used.
Reason is that the phydev lock is held when rtl_reset_work() is called
here, and rtl_jumbo_config() calls phy_start_aneg() which also tries
to acquire the phydev lock. Fix this by calling rtl_reset_work()
asynchronously.
Fixes: 621735f59064 ("r8169: fix rare issue with broken rx after link-down on RTL8125") Reported-by: Ian Chen <free122448@hotmail.com> Tested-by: Ian Chen <free122448@hotmail.com> Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/caf6a487-ef8c-4570-88f9-f47a659faf33@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Gustavo A. R. Silva [Sat, 25 Nov 2023 21:33:58 +0000 (15:33 -0600)]
neighbour: Fix __randomize_layout crash in struct neighbour
Previously, one-element and zero-length arrays were treated as true
flexible arrays, even though they are actually "fake" flex arrays.
The __randomize_layout would leave them untouched at the end of the
struct, similarly to proper C99 flex-array members.
However, this approach changed with commit 1ee60356c2dc ("gcc-plugins:
randstruct: Only warn about true flexible arrays"). Now, only C99
flexible-array members will remain untouched at the end of the struct,
while one-element and zero-length arrays will be subject to randomization.
Fix a `__randomize_layout` crash in `struct neighbour` by transforming
zero-length array `primary_key` into a proper C99 flexible-array member.
Fixes: 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays") Closes: https://lore.kernel.org/linux-hardening/20231124102458.GB1503258@e124191.cambridge.arm.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/ZWJoRsJGnCPdJ3+2@work Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Subbaraya Sundeep [Sat, 25 Nov 2023 16:36:57 +0000 (22:06 +0530)]
octeontx2-pf: Restore TC ingress police rules when interface is up
TC ingress policer rules depends on interface receive queue
contexts since the bandwidth profiles are attached to RQ
contexts. When an interface is brought down all the queue
contexts are freed. This in turn frees bandwidth profiles in
hardware causing ingress police rules non-functional after
the interface is brought up. Fix this by applying all the ingress
police rules config to hardware in otx2_open. Also allow
adding ingress rules only when interface is running
since no contexts exist for the interface when it is down.
Geetha sowjanya [Sat, 25 Nov 2023 16:34:02 +0000 (22:04 +0530)]
octeontx2-pf: Fix adding mbox work queue entry when num_vfs > 64
When more than 64 VFs are enabled for a PF then mbox communication
between VF and PF is not working as mbox work queueing for few VFs
are skipped due to wrong calculation of VF numbers.
Furong Xu [Sat, 25 Nov 2023 06:01:26 +0000 (14:01 +0800)]
net: stmmac: xgmac: Disable FPE MMC interrupts
Commit aeb18dd07692 ("net: stmmac: xgmac: Disable MMC interrupts
by default") tries to disable MMC interrupts to avoid a storm of
unhandled interrupts, but leaves the FPE(Frame Preemption) MMC
interrupts enabled, FPE MMC interrupts can cause the same problem.
Now we mask FPE TX and RX interrupts to disable all MMC interrupts.
Elena Salomatkina [Fri, 24 Nov 2023 21:08:02 +0000 (00:08 +0300)]
octeontx2-af: Fix possible buffer overflow
A loop in rvu_mbox_handler_nix_bandprof_free() contains
a break if (idx == MAX_BANDPROF_PER_PFFUNC),
but if idx may reach MAX_BANDPROF_PER_PFFUNC
buffer '(*req->prof_idx)[layer]' overflow happens before that check.
The patch moves the break to the
beginning of the loop.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e8e095b3b370 ("octeontx2-af: cn10k: Bandwidth profiles config support"). Signed-off-by: Elena Salomatkina <elena.salomatkina.cmc@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://lore.kernel.org/r/20231124210802.109763-1-elena.salomatkina.cmc@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Same init_rng() in both tests. The function reads /dev/urandom to
initialize srand(). In case of failure, it falls back onto the
entropy in the uninitialized variable. Not sure if this is on purpose.
But failure reading urandom should be rare, so just fail hard. While
at it, convert to getrandom(). Which man 4 random suggests is simpler
and more robust.
mptcp_inq.c:525:6:
mptcp_connect.c:1131:6:
error: variable 'foo' is used uninitialized
whenever 'if' condition is false
[-Werror,-Wsometimes-uninitialized]
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp") Fixes: b51880568f20 ("selftests: mptcp: add inq test case") Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Willem de Bruijn <willemb@google.com>
----
When input is randomized because this is expected to meaningfully
explore edge cases, should we also add
1. logging the random seed to stdout and
2. adding a command line argument to replay from a specific seed
I can do this in net-next, if authors find it useful in this case. Reviewed-by: Matthieu Baerts <matttbe@kernel.org> Link: https://lore.kernel.org/r/20231124171645.1011043-5-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Fri, 24 Nov 2023 17:15:20 +0000 (12:15 -0500)]
selftests/net: fix a char signedness issue
Signedness of char is signed on x86_64, but unsigned on arm64.
Fix the warning building cmsg_sender.c on signed platforms or
forced with -fsigned-char:
msg_sender.c:455:12:
error: implicit conversion from 'int' to 'char'
changes value from 128 to -128
[-Werror,-Wconstant-conversion]
buf[0] = ICMPV6_ECHO_REQUEST;
Willem de Bruijn [Fri, 24 Nov 2023 17:15:19 +0000 (12:15 -0500)]
selftests/net: ipsec: fix constant out of range
Fix a small compiler warning.
nr_process must be a signed long: it is assigned a signed long by
strtol() and is compared against LONG_MIN and LONG_MAX.
ipsec.c:2280:65:
error: result of comparison of constant -9223372036854775808
with expression of type 'unsigned int' is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
Daniel Borkmann [Mon, 27 Nov 2023 20:05:33 +0000 (21:05 +0100)]
netkit: Reject IFLA_NETKIT_PEER_INFO in netkit_change_link
The IFLA_NETKIT_PEER_INFO attribute can only be used during device
creation, but not via changelink callback. Hence reject it there.
Fixes: 35dfaad7188c ("netkit, bpf: Add bpf programmable net device") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Cc: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/e86a277a1e8d3b19890312779e42f790b0605ea4.1701115314.git.daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Johannes Berg [Fri, 24 Nov 2023 16:25:29 +0000 (17:25 +0100)]
wifi: mac80211: use wiphy locked debugfs for sdata/link
The debugfs files for netdevs (sdata) and links are removed
with the wiphy mutex held, which may deadlock. Use the new
wiphy locked debugfs to avoid that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 24 Nov 2023 16:25:28 +0000 (17:25 +0100)]
wifi: mac80211: use wiphy locked debugfs helpers for agg_status
The read is currently with RCU and the write can deadlock,
convert both for the sake of illustration.
Make mac80211 depend on cfg80211 debugfs to get the helpers,
but mac80211 debugfs without it does nothing anyway. This
also required some adjustments in ath9k.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 24 Nov 2023 16:25:27 +0000 (17:25 +0100)]
wifi: cfg80211: add locked debugfs wrappers
Add wrappers for debugfs files that should be called with
the wiphy mutex held, while the file is also to be removed
under the wiphy mutex. This could otherwise deadlock when
a file is trying to acquire the wiphy mutex while the code
removing it holds the mutex but waits for the removal.
This actually works by pushing the execution of the read
or write handler to a wiphy work that can be cancelled
using the debugfs cancellation API.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 24 Nov 2023 16:25:26 +0000 (17:25 +0100)]
debugfs: add API to allow debugfs operations cancellation
In some cases there might be longer-running hardware accesses
in debugfs files, or attempts to acquire locks, and we want
to still be able to quickly remove the files.
Introduce a cancellations API to use inside the debugfs handler
functions to be able to cancel such operations on a per-file
basis.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 24 Nov 2023 16:25:25 +0000 (17:25 +0100)]
debugfs: annotate debugfs handlers vs. removal with lockdep
When you take a lock in a debugfs handler but also try
to remove the debugfs file under that lock, things can
deadlock since the removal has to wait for all users
to finish.
Add lockdep annotations in debugfs_file_get()/_put()
to catch such issues.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 24 Nov 2023 16:25:24 +0000 (17:25 +0100)]
debugfs: fix automount d_fsdata usage
debugfs_create_automount() stores a function pointer in d_fsdata,
but since commit 7c8d469877b1 ("debugfs: add support for more
elaborate ->d_fsdata") debugfs_release_dentry() will free it, now
conditionally on DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, but that's not
set for the function pointer in automount. As a result, removing
an automount dentry would attempt to free the function pointer.
Luckily, the only user of this (tracing) never removes it.
Nevertheless, it's safer if we just handle the fsdata in one way,
namely either DEBUGFS_FSDATA_IS_REAL_FOPS_BIT or allocated. Thus,
change the automount to allocate it, and use the real_fops in the
data to indicate whether or not automount is filled, rather than
adding a type tag. At least for now this isn't actually needed,
but the next changes will require it.
Also check in debugfs_file_get() that it gets only called
on regular files, just to make things clearer.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Hou Tao [Sat, 11 Nov 2023 04:38:21 +0000 (12:38 +0800)]
bpf: Add missed allocation hint for bpf_mem_cache_alloc_flags()
bpf_mem_cache_alloc_flags() may call __alloc() directly when there is no
free object in free list, but it doesn't initialize the allocation hint
for the returned pointer. It may lead to bad memory dereference when
freeing the pointer, so fix it by initializing the allocation hint.
David S. Miller [Sun, 26 Nov 2023 15:18:57 +0000 (15:18 +0000)]
Merge branch 'dpaa2-eth-fixes'
Ioana Ciornei says:
====================
dpaa2-eth: various fixes
The first patch fixes a memory corruption issue happening between the Tx
and Tx confirmation of a packet by making the Tx alignment at 64bytes
mandatory instead of optional as it was previously.
The second patch fixes the Rx copybreak code path which recycled the
initial data buffer before all processing was done on the packet.
Changes in v2:
- squashed patches #1 and #2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Fri, 24 Nov 2023 10:28:05 +0000 (12:28 +0200)]
dpaa2-eth: recycle the RX buffer only after all processing done
The blamed commit added support for Rx copybreak. This meant that for
certain frame sizes, a new skb was allocated and the initial data buffer
was recycled. Instead of waiting to recycle the Rx buffer only after all
processing was done on it (like accessing the parse results or timestamp
information), the code path just went ahead and re-used the buffer right
away.
This sometimes lead to corrupted HW and SW annotation areas.
Fix this by delaying the moment when the buffer is recycled.
Fixes: 50f826999a80 ("dpaa2-eth: add rx copybreak support") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Fri, 24 Nov 2023 10:28:04 +0000 (12:28 +0200)]
dpaa2-eth: increase the needed headroom to account for alignment
Increase the needed headroom to account for a 64 byte alignment
restriction which, with this patch, we make mandatory on the Tx path.
The case in which the amount of headroom needed is not available is
already handled by the driver which instead sends a S/G frame with the
first buffer only holding the SW and HW annotation areas.
Without this patch, we can empirically see data corruption happening
between Tx and Tx confirmation which sometimes leads to the SW
annotation area being overwritten.
Since this is an old IP where the hardware team cannot help to
understand the underlying behavior, we make the Tx alignment mandatory
for all frames to avoid the crash on Tx conf. Also, remove the comment
that suggested that this is just an optimization.
This patch also sets the needed_headroom net device field to the usual
value that the driver would need on the Tx path:
- 64 bytes for the software annotation area
- 64 bytes to account for a 64 byte aligned buffer address
Fixes: 6e2387e8f19e ("staging: fsl-dpaa2/eth: Add Freescale DPAA2 Ethernet driver") Closes: https://lore.kernel.org/netdev/aa784d0c-85eb-4e5d-968b-c8f74fa86be6@gin.de/ Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Ungerer [Fri, 24 Nov 2023 04:15:29 +0000 (14:15 +1000)]
net: dsa: mv88e6xxx: fix marvell 6350 probe crash
As of commit b92143d4420f ("net: dsa: mv88e6xxx: add infrastructure for
phylink_pcs") probing of a Marvell 88e6350 switch causes a NULL pointer
de-reference like this example:
...
mv88e6085 d0072004.mdio-mii:11: switch 0x3710 detected: Marvell 88E6350, revision 2
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000000 when read
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] ARM
Modules linked in:
CPU: 0 PID: 8 Comm: kworker/u2:0 Not tainted 6.7.0-rc2-dirty #26
Hardware name: Marvell Armada 370/XP (Device Tree)
Workqueue: events_unbound deferred_probe_work_func
PC is at mv88e6xxx_port_setup+0x1c/0x44
LR is at dsa_port_devlink_setup+0x74/0x154
pc : [<c057ea24>] lr : [<c0819598>] psr: a0000013
sp : c184fce0 ip : c542b8f4 fp : 00000000
r10: 00000001 r9 : c542a540 r8 : c542bc00
r7 : c542b838 r6 : c5244580 r5 : 00000005 r4 : c5244580
r3 : 00000000 r2 : c542b840 r1 : 00000005 r0 : c1a02040
...
The Marvell 6350 switch has no SERDES interface and so has no
corresponding pcs_ops defined for it. But during probing a call is made
to mv88e6xxx_port_setup() which unconditionally expects pcs_ops to exist -
though the presence of the pcs_ops->pcs_init function is optional.
Modify code to check for pcs_ops first, before checking for and calling
pcs_ops->pcs_init. Modify checking and use of pcs_ops->pcs_teardown
which may potentially suffer the same problem.
Fixes: b92143d4420f ("net: dsa: mv88e6xxx: add infrastructure for phylink_pcs") Signed-off-by: Greg Ungerer <gerg@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
The problem stems from the use of mv88e6185_phylink_get_caps() to get
the device capabilities. Create a new dedicated phylink_get_caps for the
6351 family (which the 6350 is one of) to properly support their set of
capabilities.
According to chip.h the 6351 switch family includes the 6171, 6175, 6350
and 6351 switches, so update each of these to use the correct
phylink_get_caps.
Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled") Signed-off-by: Greg Ungerer <gerg@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Greear [Thu, 9 Nov 2023 18:22:01 +0000 (10:22 -0800)]
wifi: mac80211: handle 320 MHz in ieee80211_ht_cap_ie_to_sta_ht_cap
The new 320 MHz channel width wasn't handled, so connecting
a station to a 320 MHz AP would limit the station to 20 MHz
(on HT) after a warning, handle 320 MHz to fix that.
Michael-CY Lee [Wed, 22 Nov 2023 03:02:37 +0000 (11:02 +0800)]
wifi: avoid offset calculation on NULL pointer
ieee80211_he_6ghz_oper() can be passed a NULL pointer
and checks for that, but already did the calculation
to inside of it before. Move it after the check.
Johannes Berg [Wed, 15 Nov 2023 12:06:16 +0000 (13:06 +0100)]
wifi: cfg80211: hold wiphy mutex for send_interface
Given all the locking rework in mac80211, we pretty much
need to get into the driver with the wiphy mutex held in
all callbacks. This is already mostly the case, but as
Johan reported, in the get_txpower it may not be true.
Lock the wiphy mutex around nl80211_send_iface(), then
is also around callers of nl80211_notify_iface(). This
is easy to do, fixes the problem, and aligns the locking
between various calls to it in different parts of the
code of cfg80211.
Fixes: 0e8185ce1dde ("wifi: mac80211: check wiphy mutex in ops") Reported-by: Johan Hovold <johan@kernel.org> Closes: https://lore.kernel.org/r/ZVOXX6qg4vXEx8dX@hovoldconsulting.com Tested-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 6 Nov 2023 22:17:16 +0000 (23:17 +0100)]
wifi: cfg80211: fix CQM for non-range use
My prior race fix here broke CQM when ranges aren't used, as
the reporting worker now requires the cqm_config to be set in
the wdev, but isn't set when there's no range configured.
Rather than continuing to special-case the range version, set
the cqm_config always and configure accordingly, also tracking
if range was used or not to be able to clear the configuration
appropriately with the same API, which was actually not right
if both were implemented by a driver for some reason, as is
the case with mac80211 (though there the implementations are
equivalent so it doesn't matter.)
Also, the original multiple-RSSI commit lost checking for the
callback, so might have potentially crashed if a driver had
neither implementation, and userspace tried to use it despite
not being advertised as supported.
Oldřich Jedlička [Sat, 4 Nov 2023 14:13:33 +0000 (15:13 +0100)]
wifi: mac80211: do not pass AP_VLAN vif pointer to drivers during flush
This fixes WARN_ONs when using AP_VLANs after station removal. The flush
call passed AP_VLAN vif to driver, but because these vifs are virtual and
not registered with drivers, we need to translate to the correct AP vif
first.
Closes: https://github.com/openwrt/openwrt/issues/12420 Fixes: 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal") Fixes: d00800a289c9 ("wifi: mac80211: add flush_sta method") Tested-by: Konstantin Demin <rockdrilla@gmail.com> Tested-by: Koen Vandeputte <koen.vandeputte@citymesh.com> Signed-off-by: Oldřich Jedlička <oldium.pro@gmail.com> Link: https://lore.kernel.org/r/20231104141333.3710-1-oldium.pro@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The root causes are as follows:
Thread A Thread B
... netif_receive_skb
br_dev_stop ...
br_multicast_leave_snoopers ...
__ip_mc_dec_group ...
__igmp_group_dropped igmp_rcv
igmp_stop_timer igmp_heard_query //ref = 1
ip_ma_put igmp_mod_timer
refcount_dec_and_test igmp_start_timer //ref = 0
... refcount_inc //ref increases from 0
When the device receives an IGMPv2 Query message, it starts the timer
immediately, regardless of whether the device is running. If the device is
down and has left the multicast group, it will cause the mc list refcount
uaf issue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Adam Davis [Thu, 23 Nov 2023 01:23:39 +0000 (09:23 +0800)]
mptcp: fix uninit-value in mptcp_incoming_options
Added initialization use_ack to mptcp_parse_option().
Reported-by: syzbot+b834a6b2decad004cfa1@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Nov 2023 01:52:36 +0000 (01:52 +0000)]
Merge branch 'rswitch-fixes'
Yoshihiro Shimoda says:
====================
net: rswitch: Fix issues in rswitch_start_xmit()
This patch series is based on the latest net.git / main branch.
Changes from v2:
https://lore.kernel.org/all/20231122012556.3645840-1-yoshihiro.shimoda.uh@renesas.com/
- Keep reverse christmas tree of local variable declarations in patch 1/3.
Changes from v1:
https://lore.kernel.org/all/20231121055255.3627949-1-yoshihiro.shimoda.uh@renesas.com/
- Separate a patch because fixing 2 issues.
- Add fixing wrong type of return value.
- Use goto for improving code readability.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshihiro Shimoda [Wed, 22 Nov 2023 05:11:41 +0000 (14:11 +0900)]
net: rswitch: Fix type of ret in rswitch_start_xmit()
The type of ret in rswitch_start_xmit() should be netdev_tx_t. So,
fix it.
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislav Fomichev [Tue, 14 Nov 2023 04:54:52 +0000 (20:54 -0800)]
netdevsim: Don't accept device bound programs
Commit 2b3486bc2d23 ("bpf: Introduce device-bound XDP programs") introduced
device-bound programs by largely reusing existing offloading infrastructure.
This changed the semantics of 'prog->aux->offload' a bit. Now, it's non-NULL
for both offloaded and device-bound programs.
Instead of looking at 'prog->aux->offload' let's call bpf_prog_is_offloaded
which should be true iff the program is offloaded and not merely device-bound.
Linus Torvalds [Thu, 23 Nov 2023 18:40:13 +0000 (10:40 -0800)]
Merge tag 'net-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf.
Current release - regressions:
- Revert "net: r8169: Disable multicast filter for RTL8168H and
RTL8107E"
- kselftest: rtnetlink: fix ip route command typo
Current release - new code bugs:
- s390/ism: make sure ism driver implies smc protocol in kconfig
- two build fixes for tools/net
Previous releases - regressions:
- rxrpc: couple of ACK/PING/RTT handling fixes
Previous releases - always broken:
- bpf: verify bpf_loop() callbacks as if they are called unknown
number of times
- improve stability of auto-bonding with Hyper-V
- account BPF-neigh-redirected traffic in interface statistics
Misc:
- net: fill in some more MODULE_DESCRIPTION()s"
* tag 'net-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
tools: ynl: fix duplicate op name in devlink
tools: ynl: fix header path for nfsd
net: ipa: fix one GSI register field width
tls: fix NULL deref on tls_sw_splice_eof() with empty record
net: axienet: Fix check for partial TX checksum
vsock/test: fix SEQPACKET message bounds test
i40e: Fix adding unsupported cloud filters
ice: restore timestamp configuration after device reset
ice: unify logic for programming PFINT_TSYN_MSK
ice: remove ptp_tx ring parameter flag
amd-xgbe: propagate the correct speed and duplex status
amd-xgbe: handle the corner-case during tx completion
amd-xgbe: handle corner-case during sfp hotplug
net: veth: fix ethtool stats reporting
octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF
net: usb: qmi_wwan: claim interface 4 for ZTE MF290
Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E"
net/smc: avoid data corruption caused by decline
nfc: virtual_ncidev: Add variable to check if ndev is running
dpll: Fix potential msg memleak when genlmsg_put_reply failed
...
Jakub Kicinski [Thu, 23 Nov 2023 03:05:58 +0000 (19:05 -0800)]
tools: ynl: fix duplicate op name in devlink
We don't support CRUD-inspired message types in YNL too well.
One aspect that currently trips us up is the fact that single
message ID can be used in multiple commands (as the response).
This leads to duplicate entries in the id-to-string tables:
devlink-user.c:19:34: warning: initialized field overwritten [-Woverride-init]
19 | [DEVLINK_CMD_PORT_NEW] = "port-new",
| ^~~~~~~~~~
devlink-user.c:19:34: note: (near initialization for ‘devlink_op_strmap[7]’)
Fixes tag points at where the code was generated, the "real" problem
is that the code generator does not support CRUD.
Jann Horn [Wed, 22 Nov 2023 21:44:47 +0000 (22:44 +0100)]
tls: fix NULL deref on tls_sw_splice_eof() with empty record
syzkaller discovered that if tls_sw_splice_eof() is executed as part of
sendfile() when the plaintext/ciphertext sk_msg are empty, the send path
gets confused because the empty ciphertext buffer does not have enough
space for the encryption overhead. This causes tls_push_record() to go on
the `split = true` path (which is only supposed to be used when interacting
with an attached BPF program), and then get further confused and hit the
tls_merge_open_record() path, which then assumes that there must be at
least one populated buffer element, leading to a NULL deref.
It is possible to have empty plaintext/ciphertext buffers if we previously
bailed from tls_sw_sendmsg_locked() via the tls_trim_both_msgs() path.
tls_sw_push_pending_record() already handles this case correctly; let's do
the same check in tls_sw_splice_eof().
Fixes: df720d288dbb ("tls/sw: Use splice_eof() to flush") Cc: stable@vger.kernel.org Reported-by: syzbot+40d43509a099ea756317@syzkaller.appspotmail.com Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231122214447.675768-1-jannh@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arseniy Krasnov [Tue, 21 Nov 2023 21:16:42 +0000 (00:16 +0300)]
vsock/test: fix SEQPACKET message bounds test
Tune message length calculation to make this test work on machines
where 'getpagesize()' returns >32KB. Now maximum message length is not
hardcoded (on machines above it was smaller than 'getpagesize()' return
value, thus we get negative value and test fails), but calculated at
runtime and always bigger than 'getpagesize()' result. Reproduced on
aarch64 with 64KB page size.
Ivan Vecera [Tue, 21 Nov 2023 21:13:36 +0000 (13:13 -0800)]
i40e: Fix adding unsupported cloud filters
If a VF tries to add unsupported cloud filter through virtchnl
then i40e_add_del_cloud_filter(_big_buf) returns -ENOTSUPP but
this error code is stored in 'ret' instead of 'aq_ret' that
is used as error code sent back to VF. In this scenario where
one of the mentioned functions fails the value of 'aq_ret'
is zero so the VF will incorrectly receive a 'success'.
Use 'aq_ret' to store return value and remove 'ret' local
variable. Additionally fix the issue when filter allocation
fails, in this case no notification is sent back to the VF.
Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20231121211338.3348677-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
ice: restore timestamp config after reset
Jake Keller says:
We recently discovered during internal validation that the ice driver has
not been properly restoring Tx timestamp configuration after a device reset,
which resulted in application failures after a device reset.
After some digging, it turned out this problem is two-fold. Since the
introduction of the PTP support the driver has been clobbering the storage
of the current timestamp configuration during reset. Thus after a reset, the
driver will no longer perform Tx or Rx timestamps, and will report
timestamp configuration as disabled if SIOCGHWTSTAMP ioctl is issued.
In addition, the recently merged auxiliary bus support code missed that
PFINT_TSYN_MSK must be reprogrammed on the clock owner for E822 devices.
Failure to restore this register configuration results in the driver no
longer responding to interrupts from other ports. Depending on the traffic
pattern, this can either result in increased latency responding to
timestamps on the non-owner ports, or it can result in the driver never
reporting any timestamps. The configuration of PFINT_TSYN_MSK was only done
during initialization. Due to this, the Tx timestamp issue persists even if
userspace reconfigures timestamping.
This series fixes both issues, as well as removes a redundant Tx ring field
since we can rely on the skb flag as the primary detector for a Tx timestamp
request.
Note that I don't think this series will directly apply to older stable
releases (even v6.6) as we recently refactored a lot of the PTP code to
support auxiliary bus. Patch 2/3 only matters for the post-auxiliary bus
implementation. The principle of patch 1/3 and 3/3 could apply as far back
as the initial PTP support, but I don't think it will apply cleanly as-is.
====================
Jacob Keller [Tue, 21 Nov 2023 21:12:57 +0000 (13:12 -0800)]
ice: restore timestamp configuration after device reset
The driver calls ice_ptp_cfg_timestamp() during ice_ptp_prepare_for_reset()
to disable timestamping while the device is resetting. This operation
destroys the user requested configuration. While the driver does call
ice_ptp_cfg_timestamp in ice_rebuild() to restore some hardware settings
after a reset, it unconditionally passes true or false, resulting in
failure to restore previous user space configuration.
This results in a device reset forcibly disabling timestamp configuration
regardless of current user settings.
This was not detected previously due to a quirk of the LinuxPTP ptp4l
application. If ptp4l detects a missing timestamp, it enters a fault state
and performs recovery logic which includes executing SIOCSHWTSTAMP again,
restoring the now accidentally cleared configuration.
Not every application does this, and for these applications, timestamps
will mysteriously stop after a PF reset, without being restored until an
application restart.
Fix this by replacing ice_ptp_cfg_timestamp() with two new functions:
1) ice_ptp_disable_timestamp_mode() which unconditionally disables the
timestamping logic in ice_ptp_prepare_for_reset() and ice_ptp_release()
2) ice_ptp_restore_timestamp_mode() which calls
ice_ptp_restore_tx_interrupt() to restore Tx timestamping configuration,
calls ice_set_rx_tstamp() to restore Rx timestamping configuration, and
issues an immediate TSYN_TX interrupt to ensure that timestamps which
may have occurred during the device reset get processed.
Modify the ice_ptp_set_timestamp_mode to directly save the user
configuration and then call ice_ptp_restore_timestamp_mode. This way, reset
no longer destroys the saved user configuration.
This obsoletes the ice_set_tx_tstamp() function which can now be safely
removed.
With this change, all devices should now restore Tx and Rx timestamping
functionality correctly after a PF reset without application intervention.
Fixes: 77a781155a65 ("ice: enable receive hardware timestamping") Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jacob Keller [Tue, 21 Nov 2023 21:12:56 +0000 (13:12 -0800)]
ice: unify logic for programming PFINT_TSYN_MSK
Commit d938a8cca88a ("ice: Auxbus devices & driver for E822 TS") modified
how Tx timestamps are handled for E822 devices. On these devices, only the
clock owner handles reading the Tx timestamp data from firmware. To do
this, the PFINT_TSYN_MSK register is modified from the default value to one
which enables reacting to a Tx timestamp on all PHY ports.
The driver currently programs PFINT_TSYN_MSK in different places depending
on whether the port is the clock owner or not. For the clock owner, the
PFINT_TSYN_MSK value is programmed during ice_ptp_init_owner just before
calling ice_ptp_tx_ena_intr to program the PHY ports.
For the non-clock owner ports, the PFINT_TSYN_MSK is programmed during
ice_ptp_init_port.
If a large enough device reset occurs, the PFINT_TSYN_MSK register will be
reset to the default value in which only the PHY associated directly with
the PF will cause the Tx timestamp interrupt to trigger.
The driver lacks logic to reprogram the PFINT_TSYN_MSK register after a
device reset. For the E822 device, this results in the PF no longer
responding to interrupts for other ports. This results in failure to
deliver Tx timestamps to user space applications.
Rename ice_ptp_configure_tx_tstamp to ice_ptp_cfg_tx_interrupt, and unify
the logic for programming PFINT_TSYN_MSK and PFINT_OICR_ENA into one place.
This function will program both registers according to the combination of
user configuration and device requirements.
This ensures that PFINT_TSYN_MSK is always restored when we configure the
Tx timestamp interrupt.
Fixes: d938a8cca88a ("ice: Auxbus devices & driver for E822 TS") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jacob Keller [Tue, 21 Nov 2023 21:12:55 +0000 (13:12 -0800)]
ice: remove ptp_tx ring parameter flag
Before performing a Tx timestamp in ice_stamp(), the driver checks a ptp_tx
ring variable to see if timestamping is enabled on that ring. This value is
set for all rings whenever userspace configures Tx timestamping.
Ostensibly this was done to avoid wasting cycles checking other fields when
timestamping has not been enabled. However, for Tx timestamps we already
get an individual per-SKB flag indicating whether userspace wants to
request a timestamp on that packet. We do not gain much by also having
a separate flag to check for whether timestamping was enabled.
In fact, the driver currently fails to restore the field after a PF reset.
Because of this, if a PF reset occurs, timestamps will be disabled.
Since this flag doesn't add value in the hotpath, remove it and always
provide a timestamp if the SKB flag has been set.
A following change will fix the reset path to properly restore user
timestamping configuration completely.
This went unnoticed for some time because one of the most common
applications using Tx timestamps, ptp4l, will reconfigure the socket as
part of its fault recovery logic.
Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Rangoju [Tue, 21 Nov 2023 19:14:35 +0000 (00:44 +0530)]
amd-xgbe: propagate the correct speed and duplex status
xgbe_get_link_ksettings() does not propagate correct speed and duplex
information to ethtool during cable unplug. Due to which ethtool reports
incorrect values for speed and duplex.
Address this by propagating correct information.
Fixes: 7c12aa08779c ("amd-xgbe: Move the PHY support into amd-xgbe") Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Rangoju [Tue, 21 Nov 2023 19:14:34 +0000 (00:44 +0530)]
amd-xgbe: handle the corner-case during tx completion
The existing implementation uses software logic to accumulate tx
completions until the specified time (1ms) is met and then poll them.
However, there exists a tiny gap which leads to a race between
resetting and checking the tx_activate flag. Due to this the tx
completions are not reported to upper layer and tx queue timeout
kicks-in restarting the device.
To address this, introduce a tx cleanup mechanism as part of the
periodic maintenance process.
Fixes: c5aa9e3b8156 ("amd-xgbe: Initial AMD 10GbE platform driver") Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Rangoju [Tue, 21 Nov 2023 19:14:33 +0000 (00:44 +0530)]
amd-xgbe: handle corner-case during sfp hotplug
Force the mode change for SFI in Fixed PHY configurations. Fixed PHY
configurations needs PLL to be enabled while doing mode set. When the
SFP module isn't connected during boot, driver assumes AN is ON and
attempts auto-negotiation. However, if the connected SFP comes up in
Fixed PHY configuration the link will not come up as PLL isn't enabled
while the initial mode set command is issued. So, force the mode change
for SFI in Fixed PHY configuration to fix link issues.
Fixes: e57f7a3feaef ("amd-xgbe: Prepare for working with more than one type of phy") Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Tue, 21 Nov 2023 19:08:44 +0000 (20:08 +0100)]
net: veth: fix ethtool stats reporting
Fix a possible misalignment between page_pool stats and tx xdp_stats
reported in veth_get_ethtool_stats routine.
The issue can be reproduced configuring the veth pair with the
following tx/rx queues:
$ip link add v0 numtxqueues 2 numrxqueues 4 type veth peer name v1 \
numtxqueues 1 numrxqueues 1
and loading a simple XDP program on v0 that just returns XDP_PASS.
In this case on v0 the page_pool stats overwrites tx xdp_stats for queue 1.
Fix the issue incrementing pp_idx of dev->real_num_tx_queues * VETH_TQ_STATS_LEN
since we always report xdp_stats for all tx queues in ethtool.
Suman Ghosh [Tue, 21 Nov 2023 16:56:24 +0000 (22:26 +0530)]
octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF
It is possible to add a ntuple rule which would like to direct packet to
a VF whose number of queues are greater/less than its PF's queue numbers.
For example a PF can have 2 Rx queues but a VF created on that PF can have
8 Rx queues. As of today, ntuple rule will reject rule because it is
checking the requested queue number against PF's number of Rx queues.
As a part of this fix if the action of a ntuple rule is to move a packet
to a VF's queue then the check is removed. Also, a debug information is
printed to aware user that it is user's responsibility to cross check if
the requested queue number on that VF is a valid one.
Fixes: f0a1913f8a6f ("octeontx2-pf: Add support for ethtool ntuple filters") Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231121165624.3664182-1-sumang@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lech Perczak [Fri, 17 Nov 2023 23:19:18 +0000 (00:19 +0100)]
net: usb: qmi_wwan: claim interface 4 for ZTE MF290
Interface 4 is used by for QMI interface in stock firmware of MF28D, the
router which uses MF290 modem. Rebind it to qmi_wwan after freeing it up
from option driver.
The proper configuration is:
Interface mapping is:
0: QCDM, 1: (unknown), 2: AT (PCUI), 2: AT (Modem), 4: QMI
Linus Torvalds [Wed, 22 Nov 2023 18:20:17 +0000 (10:20 -0800)]
Merge tag 'loongarch-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Fix several build errors, a potential kernel panic, a cpu hotplug
issue and update links in documentations"
* tag 'loongarch-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
Docs/zh_CN/LoongArch: Update links in LoongArch introduction.rst
Docs/LoongArch: Update links in LoongArch introduction.rst
LoongArch: Implement constant timer shutdown interface
LoongArch: Mark {dmw,tlb}_virt_to_page() exports as non-GPL
LoongArch: Silence the boot warning about 'nokaslr'
LoongArch: Add __percpu annotation for __percpu_read()/__percpu_write()
LoongArch: Record pc instead of offset in la_abs relocation
LoongArch: Explicitly set -fdirect-access-external-data for vmlinux
LoongArch: Add dependency between vmlinuz.efi and vmlinux.efi
Linus Torvalds [Wed, 22 Nov 2023 17:56:26 +0000 (09:56 -0800)]
Merge tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- One fix for the KVP daemon (Ani Sinha)
- Fix for the detection of E820_TYPE_PRAM in a Gen2 VM (Saurabh Sengar)
- Micro-optimization for hv_nmi_unknown() (Uros Bizjak)
* tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles
We really don't want to do atomic_read() or anything like that, since we
already have the value, not the lock. The whole point of this is that
we've loaded the lock from memory, and we want to check whether the
value we loaded was a locked one or not.
The main use of this is the lockref code, which loads both the lock and
the reference count in one atomic operation, and then works on that
combined value. With the atomic_read(), the compiler would pointlessly
spill the value to the stack, in order to then be able to read it back
"atomically".
This is the qspinlock version of commit c6f4a9002252 ("asm-generic:
ticket-lock: Optimize arch_spin_value_unlocked()") which fixed this same
bug for ticket locks.
I couldn't reproduce the reported issue. What I did, based on a pcap
packet log provided by the reporter:
- Used same chip version (RTL8168h)
- Set MAC address to the one used on the reporters system
- Replayed the EAPOL unicast packet that, according to the reporter,
was filtered out by the mc filter.
The packet was properly received.
Therefore the root cause of the reported issue seems to be somewhere
else. Disabling mc filtering completely for the most common chip
version is a quite big hammer. Therefore revert the change and wait
for further analysis results from the reporter.
Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It is quite obvious that this is a SMC DECLINE message, which means that
the applications received SMC protocol message.
We found that this was caused by the following situations:
As a result, a decline message was sent in the implementation, and this
message was read from TCP by the already-fallback connection.
This patch double the client timeout as 2x of the server value,
With this simple change, the Decline messages should never cross or
collide (during Confirm link timeout).
This issue requires an immediate solution, since the protocol updates
involve a more long-term solution.
Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
nfc: virtual_ncidev: Add variable to check if ndev is running
syzbot reported an memory leak that happens when an skb is add to
send_buff after virtual nci closed.
This patch adds a variable to track if the ndev is running before
handling new skb in send function.
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com Reviewed-by: Bongsu Jeon Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Uros Bizjak [Tue, 14 Nov 2023 16:59:28 +0000 (17:59 +0100)]
x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
in hv_nmi_unknown(). On x86 the CMPXCHG instruction returns success in
the ZF flag, so this change saves a compare after CMPXCHG. The generated
asm code improves from:
Jakub Kicinski [Tue, 21 Nov 2023 23:49:30 +0000 (15:49 -0800)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-11-21
We've added 19 non-merge commits during the last 4 day(s) which contain
a total of 18 files changed, 1043 insertions(+), 416 deletions(-).
The main changes are:
1) Fix BPF verifier to validate callbacks as if they are called an unknown
number of times in order to fix not detecting some unsafe programs,
from Eduard Zingerman.
2) Fix bpf_redirect_peer() handling which missed proper stats accounting
for veth and netkit and also generally fix missing stats for the latter,
from Peilin Ye, Daniel Borkmann et al.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: check if max number of bpf_loop iterations is tracked
bpf: keep track of max number of bpf_loop callback iterations
selftests/bpf: test widening for iterating callbacks
bpf: widening for callback iterators
selftests/bpf: tests for iterating callbacks
bpf: verify callbacks as if they are called unknown number of times
bpf: extract setup_func_entry() utility function
bpf: extract __check_reg_arg() utility function
selftests/bpf: fix bpf_loop_bench for new callback verification scheme
selftests/bpf: track string payload offset as scalar in strobemeta
selftests/bpf: track tcp payload offset as scalar in xdp_synproxy
selftests/bpf: Add netkit to tc_redirect selftest
selftests/bpf: De-veth-ize the tc_redirect test case
bpf, netkit: Add indirect call wrapper for fetching peer dev
bpf: Fix dev's rx stats for bpf_redirect_peer traffic
veth: Use tstats per-CPU traffic counters
netkit: Add tstats per-CPU traffic counters
net: Move {l,t,d}stats allocation to core and convert veth & vrf
net, vrf: Move dstats structure to core
====================
Jakub Kicinski [Mon, 20 Nov 2023 20:01:09 +0000 (12:01 -0800)]
docs: netdev: try to guide people on dealing with silence
There has been more than a few threads which went idle before
the merge window and now people came back to them and started
asking about next steps.
We currently tell people to be patient and not to repost too
often. Our "not too often", however, is still a few orders of
magnitude faster than other subsystems. Or so I feel after
hearing people talk about review rates at LPC.
Clarify in the doc that if the discussion went idle for a week
on netdev, 95% of the time there's no point waiting longer.
Jose Ignacio Tornos Martinez [Mon, 20 Nov 2023 12:06:29 +0000 (13:06 +0100)]
net: usb: ax88179_178a: fix failed operations during ax88179_reset
Using generic ASIX Electronics Corp. AX88179 Gigabit Ethernet device,
the following test cycle has been implemented:
- power on
- check logs
- shutdown
- after detecting the system shutdown, disconnect power
- after approximately 60 seconds of sleep, power is restored
Running some cycles, sometimes error logs like this appear:
kernel: ax88179_178a 2-9:1.0 (unnamed net_device) (uninitialized): Failed to write reg index 0x0001: -19
kernel: ax88179_178a 2-9:1.0 (unnamed net_device) (uninitialized): Failed to read reg index 0x0001: -19
...
These failed operation are happening during ax88179_reset execution, so
the initialization could not be correct.
In order to avoid this, we need to increase the delay after reset and
clock initial operations. By using these larger values, many cycles
have been run and no failed operations appear.
It would be better to check some status register to verify when the
operation has finished, but I do not have found any available information
(neither in the public datasheets nor in the manufacturer's driver). The
only available information for the necessary delays is the maufacturer's
driver (original values) but the proposed values are not enough for the
tested devices.
Fixes: e2ca90c276e1f ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver") Reported-by: Herb Wei <weihao.bj@ieisystem.com> Tested-by: Herb Wei <weihao.bj@ieisystem.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Link: https://lore.kernel.org/r/20231120120642.54334-1-jtornosm@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 21 Nov 2023 19:56:57 +0000 (11:56 -0800)]
Merge tag 'platform-drivers-x86-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers fixes from Ilpo Järvinen:
"Just a few fixes (one with two non-fix deps) plus tidying up
MAINTAINERS"
* tag 'platform-drivers-x86-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: intel_telemetry: Fix kernel doc descriptions
MAINTAINERS: Drop Mark Gross as maintainer for x86 platform drivers
platform/x86/amd/pmc: adjust getting DRAM size behavior
platform/x86: hp-bioscfg: Remove unused obj in hp_add_other_attributes()
platform/x86: hp-bioscfg: Fix error handling in hp_add_other_attributes()
platform/x86: hp-bioscfg: move mutex_lock() down in hp_add_other_attributes()
platform/x86: hp-bioscfg: Simplify return check in hp_add_other_attributes()
platform/x86: ideapad-laptop: Set max_brightness before using it
MAINTAINERS: Remove stale entry for SBL platform driver
Simon Horman [Mon, 20 Nov 2023 08:28:40 +0000 (08:28 +0000)]
MAINTAINERS: Add indirect_call_wrapper.h to NETWORKING [GENERAL]
indirect_call_wrapper.h is not, strictly speaking, networking specific.
However, it's git history indicates that in practice changes go through
netdev and thus the netdev maintainers have effectively been taking
responsibility for it.
Formalise this by adding it to the NETWORKING [GENERAL] section in the
MAINTAINERS file.
It is not clear how many other files under include/linux fall into this
category and it would be interesting, as a follow-up, to audit that and
propose further updates to the MAINTAINERS file as appropriate.
Long Li [Sun, 19 Nov 2023 16:23:43 +0000 (08:23 -0800)]
hv_netvsc: Mark VF as slave before exposing it to user-mode
When a VF is being exposed form the kernel, it should be marked as "slave"
before exposing to the user-mode. The VF is not usable without netvsc
running as master. The user-mode should never see a VF without the "slave"
flag.
This commit moves the code of setting the slave flag to the time before
VF is exposed to user-mode.
Cc: stable@vger.kernel.org Fixes: 0c195567a8f6 ("netvsc: transparent VF management") Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Haiyang Zhang [Sun, 19 Nov 2023 16:23:42 +0000 (08:23 -0800)]
hv_netvsc: Fix race of register_netdevice_notifier and VF register
If VF NIC is registered earlier, NETDEV_REGISTER event is replayed,
but NETDEV_POST_INIT is not.
Move register_netdevice_notifier() earlier, so the call back
function is set before probing.
Cc: stable@vger.kernel.org Fixes: e04e7a7bbd4b ("hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()") Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Haiyang Zhang [Sun, 19 Nov 2023 16:23:41 +0000 (08:23 -0800)]
hv_netvsc: fix race of netvsc and VF register_netdevice
The rtnl lock also needs to be held before rndis_filter_device_add()
which advertises nvsp_2_vsc_capability / sriov bit, and triggers
VF NIC offering and registering. If VF NIC finished register_netdev()
earlier it may cause name based config failure.
To fix this issue, move the call to rtnl_lock() before
rndis_filter_device_add(), so VF will be registered later than netvsc
/ synthetic NIC, and gets a name numbered (ethX) after netvsc.
Cc: stable@vger.kernel.org Fixes: e04e7a7bbd4b ("hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()") Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
core.c:116: warning: Function parameter or member 'ioss_evtconfig' not described in 'telemetry_update_events'
core.c:188: warning: Function parameter or member 'ioss_evtconfig' not described in 'telemetry_get_eventconfig'
It looks like it were copy'n'paste typos when these descriptions
had been introduced. Fix the typos.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310070743.WALmRGSY-lkp@intel.com/ Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20231120150756.1661425-1-andriy.shevchenko@linux.intel.com Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Hans de Goede [Mon, 20 Nov 2023 15:45:45 +0000 (16:45 +0100)]
MAINTAINERS: Drop Mark Gross as maintainer for x86 platform drivers
Mark has not really been active as maintainer for x86 platform drivers
lately, drop Mark from the MAINTAINERS entries for drivers/platform/x86,
drivers/platform/mellanox and drivers/platform/surface.
Cc: Mark Gross <markgross@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20231120154548.611041-1-hdegoede@redhat.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
When a cpu is hot-unplugged, it is put in idle state and the function
arch_cpu_idle_dead() is called. The timer interrupt for this processor
should be disabled, otherwise there will be pending timer interrupt for
the unplugged cpu, so that vcpu is prevented from giving up scheduling
when system is running in vm mode.
This patch implements the timer shutdown interface so that the constant
timer will be properly disabled when a CPU is hot-unplugged.
Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Tue, 21 Nov 2023 07:03:25 +0000 (15:03 +0800)]
LoongArch: Silence the boot warning about 'nokaslr'
The kernel parameter 'nokaslr' is handled before start_kernel(), so we
don't need early_param() to mark it technically. But it can cause a boot
warning as follows:
Unknown kernel command line parameters "nokaslr", will be passed to user space.
When we use 'init=/bin/bash', 'nokaslr' which passed to user space will
even cause a kernel panic. So we use early_param() to mark 'nokaslr',
simply print a notice and silence the boot warning (also fix a potential
panic). This logic is similar to RISC-V.
Huacai Chen [Tue, 21 Nov 2023 07:03:25 +0000 (15:03 +0800)]
LoongArch: Add __percpu annotation for __percpu_read()/__percpu_write()
When build kernel with C=1, we get:
arch/loongarch/kernel/process.c:234:46: warning: incorrect type in argument 1 (different address spaces)
arch/loongarch/kernel/process.c:234:46: expected void *ptr
arch/loongarch/kernel/process.c:234:46: got unsigned long [noderef] __percpu *
arch/loongarch/kernel/process.c:234:46: warning: incorrect type in argument 1 (different address spaces)
arch/loongarch/kernel/process.c:234:46: expected void *ptr
arch/loongarch/kernel/process.c:234:46: got unsigned long [noderef] __percpu *
arch/loongarch/kernel/process.c:234:46: warning: incorrect type in argument 1 (different address spaces)
arch/loongarch/kernel/process.c:234:46: expected void *ptr
arch/loongarch/kernel/process.c:234:46: got unsigned long [noderef] __percpu *
arch/loongarch/kernel/process.c:234:46: warning: incorrect type in argument 1 (different address spaces)
arch/loongarch/kernel/process.c:234:46: expected void *ptr
arch/loongarch/kernel/process.c:234:46: got unsigned long [noderef] __percpu *
Add __percpu annotation for __percpu_read()/__percpu_write() can avoid
such warnings. __percpu_xchg() and other functions don't need annotation
because their wrapper, i.e. _pcp_protect(), already suppresses warnings.
WANG Rui [Tue, 21 Nov 2023 07:03:25 +0000 (15:03 +0800)]
LoongArch: Record pc instead of offset in la_abs relocation
To clarify, the previous version functioned flawlessly. However, it's
worth noting that the LLVM's LoongArch backend currently lacks support
for cross-section label calculations. With this patch, we enable the use
of clang to compile relocatable kernels.
WANG Rui [Tue, 21 Nov 2023 07:03:25 +0000 (15:03 +0800)]
LoongArch: Explicitly set -fdirect-access-external-data for vmlinux
After this llvm commit [1], The -fno-pic does not imply direct access
external data. Explicitly set -fdirect-access-external-data for vmlinux
that can avoids GOT entries.