Saeed Mahameed [Tue, 14 Nov 2023 21:58:44 +0000 (13:58 -0800)]
net/mlx5e: Reduce the size of icosq_str
icosq_str is unnecessarily long, and it causes a -Wformat-truncation build
warning with W=1. On closer inspection, it doesn't need to be 255B,
hence this patch reduces the size to 32B, which should be more than enough
to host the string: "ICOSQ: 0x%x, ".
While here, add a missing space in the formatted string.
This fixes the following build warning:
$ KCFLAGS='-Wall -Werror'
$ make O=/tmp/kbuild/linux W=1 -s -j12 drivers/net/ethernet/mellanox/mlx5/core/
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c: In function 'mlx5e_reporter_rx_timeout':
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:718:56:
error: ', CQ: 0x' directive output may be truncated writing 8 bytes into a region of size between 0 and 255 [-Werror=format-truncation=]
718 | "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
| ^~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:717:9: note: 'snprintf' output between 43 and 322 bytes into a destination of size 288
717 | snprintf(err_str, sizeof(err_str),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
718 | "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
719 | rq->ix, icosq_str, rq->rqn, rq->cq.mcq.cqn);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
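As a rough userspace illustration of the sizing argument above, the sketch below formats the same "ICOSQ: 0x%x, " string into a 32-byte buffer and uses the snprintf() return value to detect truncation. The macro ICOSQ_STR_SIZE and the helper format_icosq() are hypothetical names, not taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical size, mirroring the 32B mentioned in the commit message. */
#define ICOSQ_STR_SIZE 32

/*
 * Format "ICOSQ: 0x%x, " into buf and report whether the output fit.
 * snprintf() returns the length the full string would have had, so a
 * return value >= size means the output was truncated.
 */
static bool format_icosq(char *buf, size_t size, unsigned int sqn)
{
    int len = snprintf(buf, size, "ICOSQ: 0x%x, ", sqn);

    return len >= 0 && (size_t)len < size;
}
```

Even with the widest 32-bit value the string needs 19 characters plus the terminator, so a 32-byte buffer leaves headroom.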
Rahul Rameshbabu [Tue, 14 Nov 2023 21:58:43 +0000 (13:58 -0800)]
net/mlx5: Increase size of irq name buffer
Without the increased buffer size, the snprintf operation writing to the
buffer triggers -Wformat-truncation with W=1.
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c: In function 'mlx5_irq_alloc':
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:296:7: error: '@pci:' directive output may be truncated writing 5 bytes into a region of size between 1 and 32 [-Werror=format-truncation=]
296 | "%s@pci:%s", name, pci_name(dev->pdev));
| ^~~~~
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:295:2: note: 'snprintf' output 6 or more bytes (assuming 37) into a destination of size 32
295 | snprintf(irq->name, MLX5_MAX_IRQ_NAME,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
296 | "%s@pci:%s", name, pci_name(dev->pdev));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rahul Rameshbabu [Tue, 14 Nov 2023 21:58:42 +0000 (13:58 -0800)]
net/mlx5e: Update doorbell for port timestamping CQ before the software counter
Previously, mlx5e_ptp_poll_ts_cq would update the device doorbell with the
incremented consumer index after the relevant software counters in the
kernel were updated. In the mlx5e_sq_xmit_wqe context, this would lead to
either overrunning the device CQ or exceeding the expected software buffer
size in the device CQ if the device CQ size was greater than the software
buffer size. Update the relevant software counter only after updating the
device CQ consumer index in the port timestamping napi_poll context.
Rahul Rameshbabu [Tue, 14 Nov 2023 21:58:41 +0000 (13:58 -0800)]
net/mlx5e: Track xmit submission to PTP WQ after populating metadata map
Ensure the skb is available in metadata mapping to skbs before tracking the
metadata index for detecting undelivered CQEs. If the metadata index is put
in the tracking list before putting the skb in the map, the metadata index
might be used for detecting undelivered CQEs before the relevant skb is
available in the map, which can lead to a null-ptr-deref.
Rahul Rameshbabu [Tue, 14 Nov 2023 21:58:40 +0000 (13:58 -0800)]
net/mlx5e: Avoid referencing skb after free-ing in drop path of mlx5e_sq_xmit_wqe
When the SQ is a port timestamping SQ for PTP, do not access the tx flags of
the skb after freeing the skb. Free the skb only after all references that
depend on it have been handled in the dropped WQE path.
Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
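The ordering rule in the commit above (read what you need from the skb, then free it) can be sketched in plain C as follows. struct fake_skb, FAKE_TX_IN_PROGRESS and drop_skb() are hypothetical stand-ins, not the driver's types or the actual patch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; only the field we need. */
struct fake_skb {
    unsigned int tx_flags;
};

#define FAKE_TX_IN_PROGRESS 0x1u /* hypothetical flag bit */

/*
 * Drop path: copy out everything that depends on the skb first,
 * free it last, and only use the saved copy afterwards.
 */
static bool drop_skb(struct fake_skb *skb)
{
    bool was_timestamped = (skb->tx_flags & FAKE_TX_IN_PROGRESS) != 0;

    free(skb); /* skb must not be dereferenced past this point */
    return was_timestamped;
}
```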
Jianbo Liu [Tue, 14 Nov 2023 21:58:39 +0000 (13:58 -0800)]
net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload
As IPSec packet offload in switchdev mode is not supported with LAG,
it's unnecessary to modify those sent-to-vport rules to the peer eswitch.
Fixes: c6c2bf5db4ea ("net/mlx5e: Support IPsec packet offload for TX in switchdev mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vlad Buslov [Tue, 14 Nov 2023 21:58:38 +0000 (13:58 -0800)]
net/mlx5e: Fix pedit endianness
The referenced commit addressed the endianness issue in the mlx5 pedit
implementation in an ad hoc manner instead of systematically treating integer
values according to their types. This left pedit fields of sizes not equal to
4, and where the bytes being modified are not the least significant ones,
broken on big endian machines: the wrong bits are consumed during parsing,
which leads to errors when applying pedit to source and destination MAC
addresses.
Treat masks and values of pedit and filter match as network byte order,
refactor pointers to them to void pointers instead of confusing u32
pointers and only cast to pointer-to-integer when reading a value from
them. Treat pedit mlx5_fields->field_mask as host byte order according to
its type u32, change the constants in fields array accordingly.
Fixes: 82198d8bcdef ("net/mlx5e: Fix endianness when calculating pedit mask first bit")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
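To illustrate the "keep it as an opaque pointer, convert only when reading" idea from the commit above, here is a small userspace sketch. read_be16() is a hypothetical helper and not part of the mlx5 code; it reads a 16-bit network-byte-order field at a byte offset and gives the same result on little and big endian hosts.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 16-bit field stored in network byte order at an arbitrary
 * byte offset inside an opaque buffer. The buffer stays a void pointer
 * and is only turned into an integer at the point of reading.
 */
static uint16_t read_be16(const void *base, size_t offset)
{
    uint16_t raw;

    /* memcpy avoids unaligned access and aliasing problems. */
    memcpy(&raw, (const uint8_t *)base + offset, sizeof(raw));
    return ntohs(raw);
}
```

Casting the same buffer to a u32 pointer and masking would pick different bytes depending on host endianness, which is the class of bug described above.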
Gavin Li [Tue, 14 Nov 2023 21:58:37 +0000 (13:58 -0800)]
net/mlx5e: fix double free of encap_header in update funcs
Follow up to the previous patch to fix the same issue for
mlx5e_tc_tun_update_header_ipv4{6} when mlx5_packet_reformat_alloc()
fails.
When mlx5_packet_reformat_alloc() fails, the encap_header allocated in
mlx5e_tc_tun_update_header_ipv4{6} will be released within it. However,
e->encap_header is already set to the previously freed encap_header
before mlx5_packet_reformat_alloc(). As a result, the later
mlx5e_encap_put() will free e->encap_header again, causing a double free
issue.
Dust Li [Tue, 14 Nov 2023 21:58:36 +0000 (13:58 -0800)]
net/mlx5e: fix double free of encap_header
When mlx5_packet_reformat_alloc() fails, the encap_header allocated in
mlx5e_tc_tun_create_header_ipv4{6} will be released within it. However,
e->encap_header is already set to the previously freed encap_header
before mlx5_packet_reformat_alloc(). As a result, the later
mlx5e_encap_put() will free e->encap_header again, causing a double free
issue.
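One way to avoid this kind of double free is to let the entry take ownership of the header only after the allocation step has succeeded. The sketch below shows that pattern in userspace C; reformat_alloc(), create_header() and encap_put() are hypothetical stand-ins and this is not the actual patch.

```c
#include <stdlib.h>

struct encap_entry {
    char *encap_header; /* owned by the entry once set */
};

/* Hypothetical stand-in for mlx5_packet_reformat_alloc(); fails on demand. */
static int reformat_alloc(const char *hdr, int should_fail)
{
    (void)hdr;
    return should_fail ? -1 : 0;
}

static int create_header(struct encap_entry *e, int should_fail)
{
    char *hdr = calloc(1, 64);

    if (!hdr)
        return -1;
    if (reformat_alloc(hdr, should_fail)) {
        free(hdr);  /* freed exactly once, here */
        return -1;  /* e->encap_header never pointed at it */
    }
    e->encap_header = hdr; /* publish only after success */
    return 0;
}

static void encap_put(struct encap_entry *e)
{
    free(e->encap_header); /* free(NULL) is a no-op */
    e->encap_header = NULL;
}
```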
Rahul Rameshbabu [Tue, 14 Nov 2023 21:58:35 +0000 (13:58 -0800)]
net/mlx5: Decouple PHC .adjtime and .adjphase implementations
When running a phase adjustment operation, the free running clock should
not be modified at all. The phase control keyword is intended to trigger an
internal servo on the device that will converge to the provided delta. A
free running counter cannot implement phase adjustment.
Fixes: 8e11a68e2e8a ("net/mlx5: Add adjphase function to support hardware-only offset control")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Erez Shitrit [Tue, 14 Nov 2023 21:58:34 +0000 (13:58 -0800)]
net/mlx5: DR, Allow old devices to use multi destination FTE
The current check isn't aware of old devices that don't have the
relevant FW capability. This patch allows multi destination FTE
in old cards, as it was before this check.
Fixes: f6f46e7173cb ("net/mlx5: DR, Add check for multi destination FTE")
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maher Sanalla [Tue, 14 Nov 2023 21:58:33 +0000 (13:58 -0800)]
net/mlx5: Free used cpus mask when an IRQ is released
Each EQ table maintains a cpumask of the already used CPUs that are mapped
to IRQs to ensure that each IRQ gets mapped to a unique CPU.
However, on IRQ release, the said cpumask is not updated by clearing the
CPU from the mask to allow future IRQ requests, causing the following
error when a SF is reloaded after it has utilized all CPUs for its IRQs:
mlx5_irq_affinity_request:135:(pid 306010): Didn't find a matching IRQ.
err = -28
Thus, when releasing an IRQ, clear its mapped CPU from the used CPUs
mask, to prevent the case described above.
While at it, move the used cpumask update to the EQ layer as it is more
fitting and preserves the symmetry of the IRQ request/release API.
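The request/release symmetry described above can be sketched with a plain bitmask standing in for the cpumask. NR_CPUS_SKETCH, irq_request() and irq_release() are hypothetical names; the -28 return mirrors the err value in the log line quoted in the commit message.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical, small for illustration */

static uint32_t used_cpus; /* bit n set => CPU n already backs an IRQ */

/* Pick the first unused CPU and mark it used; -28 (ENOSPC) if none left. */
static int irq_request(void)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        if (!(used_cpus & (1u << cpu))) {
            used_cpus |= 1u << cpu;
            return cpu;
        }
    }
    return -28;
}

/* Release must clear the bit, otherwise the CPU is lost for later requests. */
static void irq_release(int cpu)
{
    used_cpus &= ~(1u << cpu);
}
```

If irq_release() did not clear the bit, a second round of requests after a reload would fail even though no IRQ is mapped any more.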
Itamar Gozlan [Tue, 14 Nov 2023 21:58:32 +0000 (13:58 -0800)]
Revert "net/mlx5: DR, Supporting inline WQE when possible"
This reverts commit 95c337cce0e11d06a715da73e6796ade9216637f.
The revert is required because the commit is suspected of causing some tests
to fail; it will be investigated further.
We've added 7 non-merge commits during the last 6 day(s) which contain
a total of 9 files changed, 200 insertions(+), 49 deletions(-).
The main changes are:
1) Do not allocate bpf specific percpu memory unconditionally, from Yonghong.
2) Fix precision backtracking instruction iteration, from Andrii.
3) Fix control flow graph checking, from Andrii.
4) Fix xskxceiver selftest build, from Anders.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Do not allocate percpu memory at init stage
selftests/bpf: add more test cases for check_cfg()
bpf: fix control-flow graph checking in privileged mode
selftests/bpf: add edge case backtracking logic test
bpf: fix precision backtracking instruction iteration
bpf: handle ldimm64 properly in check_cfg()
selftests: bpf: xskxceiver: ksft_print_msg: fix format type error
====================
Yonghong Song [Sat, 11 Nov 2023 01:39:28 +0000 (17:39 -0800)]
bpf: Do not allocate percpu memory at init stage
Kirill Shutemov reported significant percpu memory consumption increase after
booting in 288-cpu VM ([1]) due to commit 41a5db8d8161 ("bpf: Add support for
non-fix-size percpu mem allocation"). The percpu memory consumption is
increased from 111MB to 969MB. The number is from /proc/meminfo.
I tried to reproduce the issue with my local VM, which supports at most
255 cpus. With 252 cpus, without the above commit, the percpu memory
consumption immediately after boot is 57MB, while with the above commit the
percpu memory consumption is 231MB.
This is not good since so far percpu memory from the bpf memory allocator is
not widely used yet. Let us change pre-allocation in the init stage to
on-demand allocation when the verifier detects that a bpf program needs percpu
memory. With this change, percpu memory consumption after boot can be reduced
significantly.
Fixes: 41a5db8d8161 ("bpf: Add support for non-fix-size percpu mem allocation")
Reported-and-tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231111013928.948838-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
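The switch from pre-allocation to on-demand allocation described above follows a common lazy-initialization pattern. A minimal single-threaded sketch, with hypothetical names (percpu_pool, ensure_percpu_pool()) and malloc() standing in for the real percpu allocator, could look like this; the kernel code would additionally need locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static void *percpu_pool; /* hypothetical stand-in for the allocator state */
static int pool_allocs;   /* counts how many times the pool was created */

/* Called only when a program is found to need percpu memory. */
static bool ensure_percpu_pool(void)
{
    if (percpu_pool)
        return true; /* already set up by an earlier program */

    percpu_pool = malloc(4096);
    if (!percpu_pool)
        return false;
    pool_allocs++;
    return true;
}
```

Nothing is allocated until the first caller asks for it, so systems that never load such a program pay no memory cost.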
Gal Pressman [Tue, 14 Nov 2023 07:56:18 +0000 (09:56 +0200)]
net: Fix undefined behavior in netdev name allocation
Cited commit removed the strscpy() call and kept the snprintf() only.
It is common to use 'dev->name' as the format string before a netdev is
registered; this results in the 'res' and 'name' pointers being equal.
According to POSIX, if copying takes place between objects that overlap
as a result of a call to sprintf() or snprintf(), the results are
undefined.
Add back the strscpy() and use 'buf' as an intermediate buffer.
Fixes: 7ad17b04dc7b ("net: trust the bitmap in __dev_alloc_name()")
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
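The intermediate-buffer approach from the commit above can be sketched in userspace C. expand_name() and IFNAMSIZ_SKETCH are hypothetical names, and memcpy() stands in for the kernel's strscpy(); the point is that snprintf() never writes to the object it is reading its format from.

```c
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ_SKETCH 16 /* hypothetical name buffer size */

/*
 * Expand a "%d" name pattern. fmt may alias res, so format into a
 * temporary buffer first and copy the result afterwards.
 */
static int expand_name(char *res, const char *fmt, int unit)
{
    char buf[IFNAMSIZ_SKETCH];
    int len = snprintf(buf, sizeof(buf), fmt, unit);

    if (len < 0 || (size_t)len >= sizeof(buf))
        return -1; /* formatting error or truncated name */

    memcpy(res, buf, (size_t)len + 1); /* include the terminator */
    return 0;
}
```

Calling expand_name(name, name, 3) with name holding "eth%d" is well defined here, whereas snprintf(name, size, name, 3) would not be.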
When moving the *-internal-delay-ps properties to only apply for RGMII
interface modes there was a typo in the text formatting.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The intent for the commit was to be able to detect carrier loss/gain
for just the NIC connected to the BMC. The unwanted effect is that a
carrier loss for auxiliary paths also causes the BMC to lose
carrier. The BMC never regains carrier despite the secondary NIC
regaining a link.
This change, when merged, needs to be backported to stable kernels.
5.4-stable, 5.10-stable, 5.15-stable, 6.1-stable, 6.5-stable
Fixes: 3780bb29311e ("ncsi: Propagate carrier gain/loss events to the NCSI controller")
CC: stable@vger.kernel.org
Signed-off-by: Johnathan Mantey <johnathanx.mantey@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 15 Nov 2023 04:10:45 +0000 (20:10 -0800)]
Merge branch 'mptcp-misc-fixes-for-v6-7'
Matthieu Baerts says:
====================
mptcp: misc. fixes for v6.7
Here are a few fixes related to MPTCP:
- Patch 1 limits GSO max size to ~64K when MPTCP is being used due to a
spec limit. 'gso_max_size' can exceed the max value supported by MPTCP
since v5.19.
- Patch 2 fixes a possible NULL pointer dereference on close that can
happen since v6.7-rc1.
- Patch 3 avoids sending a RM_ADDR when the corresponding address is no
longer tracked locally. A regression for a fix backported to v5.19.
- Patch 4 adds a missing lock when changing the IP TOS with setsockopt().
A fix for v5.17.
- Patch 5 fixes an expectation when running MPTCP Join selftest with the
checksum option (-C). An issue present since v6.1.
====================
The problem is really in the wrong expectations for the RST checks
implied by the csum validation. Note that the same check is repeated
explicitly in the same test-case, with the correct expectation, and
passes successfully.
Address the issue by explicitly setting the correct expectation for
the failing checks.
Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20231114-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-v1-5-7b9cd6a7b7f4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 13 Nov 2023 23:16:16 +0000 (00:16 +0100)]
mptcp: fix setsockopt(IP_TOS) subflow locking
The MPTCP implementation of the IP_TOS socket option uses the lockless
variant of the TOS manipulation helper but does not hold the required lock
at the helper invocation time.
Add the required locking.
Fixes: ffcacff87cd6 ("mptcp: Support for IP_TOS for MPTCP setsockopt()")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/457
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20231114-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-v1-4-7b9cd6a7b7f4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Mon, 13 Nov 2023 23:16:15 +0000 (00:16 +0100)]
mptcp: add validity check for sending RM_ADDR
This patch adds a validity check for sending RM_ADDRs for the userspace PM
in mptcp_pm_remove_addrs(): only send a RM_ADDR when the address is in the
anno_list or conn_list.
Paolo Abeni [Mon, 13 Nov 2023 23:16:14 +0000 (00:16 +0100)]
mptcp: fix possible NULL pointer dereference on close
After the blamed commit below, the MPTCP release callback can
dereference the first subflow pointer via __mptcp_set_connected()
and send buffer auto-tuning. Such pointer is always expected to be
valid, except at socket destruction time, when the first subflow is
deleted and the pointer zeroed.
If the connect event is handled by the release callback while the
msk socket is finally released, MPTCP hits the following splat:
Paolo Abeni [Mon, 13 Nov 2023 23:16:13 +0000 (00:16 +0100)]
mptcp: deal with large GSO size
After the blamed commit below, the TCP sockets (and the MPTCP subflows)
can build egress packets larger than 64K. That exceeds the maximum DSS
data size, so the length is misrepresented on the wire and the stream is
corrupted, as later observed on the receiver:
Ziwei Xiao [Tue, 14 Nov 2023 00:41:44 +0000 (16:41 -0800)]
gve: Fixes for napi_poll when budget is 0
Netpoll will explicitly pass the polling call with a budget of 0 to
indicate it's clearing the Tx path only. gve_rx_poll and gve_xdp_poll
mistakenly took the 0 budget as an indication to do all the work. Add a
check to avoid the rx path and xdp path being called when budget is 0.
Also avoid napi_complete_done being called when budget is 0 for netpoll.
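A rough sketch of the budget handling described above, using a hypothetical poll callback rather than the gve code: Tx cleanup always runs, while Rx work and completion are skipped when the budget is 0. struct poll_stats and sketch_poll() are illustrative names, and the completed flag stands in for a call to napi_complete_done().

```c
#include <stdbool.h>

/* Hypothetical counters standing in for real Tx/Rx work. */
struct poll_stats {
    int tx_cleaned;
    int rx_polled;
    bool completed;
};

/*
 * Sketch of a napi-style poll callback. A budget of 0 means
 * "only clean Tx": skip Rx and do not signal completion.
 */
static int sketch_poll(struct poll_stats *st, int budget)
{
    int work_done;

    st->tx_cleaned++; /* Tx cleanup runs for any budget */
    if (budget == 0)
        return 0;     /* netpoll: no Rx, no completion */

    st->rx_polled++;  /* pretend one packet was received */
    work_done = 1;
    if (work_done < budget)
        st->completed = true; /* stands in for napi_complete_done() */
    return work_done;
}
```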
Jakub Kicinski [Wed, 15 Nov 2023 03:56:30 +0000 (19:56 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-11-13 (ice)
This series contains updates to ice driver only.
Arkadiusz ensures the device is initialized with valid lock status
value. He also removes range checking of dpll priority to allow firmware
to process the request; supported values are firmware dependent.
Finally, he removes setting of can change capability for pins that
cannot be changed.
Dan restores ability to load a package which doesn't contain a signature
segment.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix DDP package download for packages without signature segment
ice: dpll: fix output pin capabilities
ice: dpll: fix check for dpll input priority range
ice: dpll: fix initial lock status of dpll
====================
====================
pds_core: fix irq index bug and compiler warnings
The first patch fixes a bug in our interrupt masking where we used the
wrong index. The second patch addresses a couple of kernel test robot
string truncation warnings.
====================
Shannon Nelson [Mon, 13 Nov 2023 18:32:57 +0000 (10:32 -0800)]
pds_core: fix up some format-truncation complaints
Our friendly kernel test robot pointed out a couple of potential
string truncation issues. None of them worried us, but they can be
fixed relatively easily to quiet the complaints.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310211736.66syyDpp-lkp@intel.com/
Fixes: 45d76f492938 ("pds_core: set up device and adminq")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20231113183257.71110-3-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Mon, 13 Nov 2023 18:32:56 +0000 (10:32 -0800)]
pds_core: use correct index to mask irq
Use the qcq's interrupt index, not the irq number, to mask
the interrupt. Since the irq number can be out of range from
the number of possible interrupts, we can end up accessing
and potentially scribbling on out-of-range and/or unmapped
memory, making the kernel angry.
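The difference between an interrupt index and an irq number can be made concrete with a small sketch. The table size, struct layout and the mask_intr() helper below are hypothetical and only illustrate why the index is the right key.

```c
#include <stddef.h>

#define NUM_INTRS 4 /* hypothetical size of the device interrupt table */

struct intr_ctrl {
    int mask;
};

static struct intr_ctrl intr_table[NUM_INTRS];

struct qcq {
    unsigned int intx; /* index into intr_table, expected < NUM_INTRS */
    unsigned int irq;  /* OS irq number, unrelated to the table size */
};

/* Mask using the table index; the irq number could be far out of range. */
static int mask_intr(const struct qcq *q)
{
    if (q->intx >= NUM_INTRS)
        return -1;
    intr_table[q->intx].mask = 1;
    return 0;
}
```

With irq = 137 and a four-entry table, indexing by the irq number would write well past the end of the array.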
Baruch Siach [Mon, 13 Nov 2023 17:42:49 +0000 (19:42 +0200)]
net: stmmac: fix rx budget limit check
The while loop condition verifies 'count < limit'. Neither value changes
before the 'count >= limit' check. As is, this check is dead code. But
code inspection reveals a code path that modifies 'count' and then jumps to
'drain_data' and back to 'read_again'. So there is a need to verify
the count value sanity after 'read_again'.
Move 'read_again' up to fix the count limit check.
Eric Dumazet [Mon, 13 Nov 2023 13:49:38 +0000 (13:49 +0000)]
af_unix: fix use-after-free in unix_stream_read_actor()
syzbot reported the following crash [1]
After releasing unix socket lock, u->oob_skb can be changed
by another thread. We must temporarily increase skb refcount
to make sure this other thread will not free the skb under us.
[1]
BUG: KASAN: slab-use-after-free in unix_stream_read_actor+0xa7/0xc0 net/unix/af_unix.c:2866
Read of size 4 at addr ffff88801f3b9cc4 by task syz-executor107/5297
The buggy address belongs to the object at ffff88801f3b9c80
which belongs to the cache skbuff_head_cache of size 240
The buggy address is located 68 bytes inside of
freed 240-byte region [ffff88801f3b9c80, ffff88801f3b9d70)
Memory state around the buggy address:
 ffff88801f3b9b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88801f3b9c00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff88801f3b9c80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff88801f3b9d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
 ffff88801f3b9d80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: 876c14ad014d ("af_unix: fix holding spinlock in oob handling")
Reported-and-tested-by: syzbot+7a2d546fa43e49315ed3@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rao Shoaib <rao.shoaib@oracle.com>
Reviewed-by: Rao shoaib <rao.shoaib@oracle.com>
Link: https://lore.kernel.org/r/20231113134938.168151-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
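The "temporarily increase the refcount" idea in the af_unix fix can be sketched with a toy reference-counted buffer. struct buf, buf_get() and buf_put() are hypothetical and deliberately non-atomic; real kernel code uses atomic refcounts and runs the two sides on different threads.

```c
#include <stdlib.h>

struct buf {
    int refcnt;
    int len;
};

/* Take an extra reference before dropping the lock that protects b. */
static void buf_get(struct buf *b)
{
    b->refcnt++;
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
static int buf_put(struct buf *b)
{
    if (--b->refcnt > 0)
        return 0;
    free(b);
    return 1;
}
```

A reader that calls buf_get() first can keep using the buffer even if the other owner calls buf_put() in the meantime; the memory goes away only when the reader drops its own reference.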
ChunHao Lin [Thu, 9 Nov 2023 17:34:00 +0000 (01:34 +0800)]
r8169: fix network lost after resume on DASH systems
Devices that support DASH may be reset or powered off during suspend.
So the driver needs to handle DASH during system suspend and resume.
Otherwise, DASH firmware will influence device behavior and cause network loss.
Fixes: b646d90053f8 ("r8169: magic.")
Cc: stable@vger.kernel.org
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: ChunHao Lin <hau@realtek.com>
Link: https://lore.kernel.org/r/20231109173400.4573-3-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ChunHao Lin [Thu, 9 Nov 2023 17:33:59 +0000 (01:33 +0800)]
r8169: add handling DASH when DASH is disabled
For devices that support DASH, even if DASH is disabled, there may still
exist a default firmware that will influence device behavior.
So the driver needs to handle DASH for devices that support DASH, no
matter what the DASH status is.
This patch also prepares for "fix network lost after resume on DASH
systems".
Fixes: ee7a1beb9759 ("r8169:call "rtl8168_driver_start" "rtl8168_driver_stop" only when hardware dash function is enabled")
Cc: stable@vger.kernel.org
Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20231109173400.4573-2-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Fix large frames in the Gemini ethernet driver
This is the result of a bug hunt for a problem with the
RTL8366RB DSA switch leading me wrong all over the place.
I am indebted to Vladimir Oltean who as usual pointed
out where the real problem was, many thanks!
Trying to actually use big ("jumbo") frames on this
hardware uncovered the real bugs. Then I tested it on
the DSA switch and it indeed fixes the issue.
To make sure it also works fine with big frames on
non-DSA devices I also copied a large video file over
scp to a device with maximum frame size, the data
was transported in large TCP packets ending up in
0x7ff sized frames using software checksumming at
~2.0 MB/s.
If I lower the MTU to the standard 1500 bytes so
that hardware checksumming is used, the scp transfer
of the same file is slightly slower, ~1.8-1.9 MB/s.
Despite this not being the best test it shows that
we can now stress the hardware with large frames
and that software checksum works fine.
Linus Walleij [Thu, 9 Nov 2023 09:03:14 +0000 (10:03 +0100)]
net: ethernet: cortina: Fix MTU max setting
The RX max frame size is over 10000 for the Gemini ethernet,
but the TX max frame size is actually just 2047 (0x7ff after
checking the datasheet). Reflect this in what we offer to Linux,
cap the MTU at the TX max frame minus ethernet headers.
We delete the code disabling the hardware checksum for large
MTUs as netdev->mtu can no longer be larger than
netdev->max_mtu meaning the if()-clause in gmac_fix_features()
is never true.
Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20231109-gemini-largeframe-fix-v4-3-6e611528db08@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
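The MTU cap in the commit above is simple arithmetic: the TX frame limit minus the ethernet header. The sketch below works it out; whether the driver also subtracts a VLAN tag is an assumption here, so both variants are shown, and all macro names are hypothetical.

```c
#define TX_MAX_FRAME 0x7ff /* 2047, from the commit message */
#define ETH_HDR_LEN  14    /* dst MAC + src MAC + EtherType */
#define VLAN_TAG_LEN 4     /* optional 802.1Q tag; assumed, not stated above */

/* Largest MTU whose frame still fits into the TX limit. */
static int max_mtu(int with_vlan)
{
    return TX_MAX_FRAME - ETH_HDR_LEN - (with_vlan ? VLAN_TAG_LEN : 0);
}
```

That gives 2033 bytes without a VLAN tag and 2029 bytes with one.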
Linus Walleij [Thu, 9 Nov 2023 09:03:13 +0000 (10:03 +0100)]
net: ethernet: cortina: Handle large frames
The Gemini ethernet controller provides hardware checksumming
for frames up to 1514 bytes including ethernet headers but not
FCS.
If we start sending bigger frames (after first bumping up the MTU
on both interfaces sending and receiving the frames), truncated
packets start to appear on the target such as in this tcpdump
resulting from ping -s 1474:
If we bypass the hardware checksumming and provide a software
fallback, everything starts working fine up to the max TX MTU
of 2047 bytes, for example ping -s2000 192.168.1.2:
The bit that enables bypassing the hardware checksum, like all of the
"TSS" bits, is undocumented in the hardware reference manual.
The entire hardware checksum unit appears undocumented. The
conclusion that we need to use the "bypass" bit was found by
trial-and-error.
Since no hardware checksum will happen, we slot in a software
checksum fallback.
Check for the condition where we need to compute checksum on the
skb with either hardware or software using == CHECKSUM_PARTIAL instead
of != CHECKSUM_NONE which is an incomplete check according to
<linux/skbuff.h>.
On the D-Link DIR-685 router this fixes a bug on the conduit
interface to the RTL8366RB DSA switch: as the switch needs to add
space for its tag it increases the MTU on the conduit interface
to 1504 and that means that when the router sends packets
of 1500 bytes these get an extra 4 bytes of DSA tag and the
transfer fails because of the erroneous hardware checksumming,
affecting such basic functionality as the LuCI web interface.
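The checksum decision described in this commit can be summarized as a small pure function. The enum values mirror the order of the kernel's CHECKSUM_* constants but are local to this sketch, and pick_csum() is a hypothetical helper, not driver code.

```c
/* Local mirror of the skb checksum states; names are hypothetical. */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE, CSUM_PARTIAL };
enum csum_action { CSUM_SKIP, CSUM_HW, CSUM_SW };

#define HW_CSUM_MAX_FRAME 1514 /* incl. ethernet header, excl. FCS */

/*
 * Only CSUM_PARTIAL means the stack left a checksum for the driver
 * to finish. Frames beyond the hardware limit fall back to software.
 */
static enum csum_action pick_csum(enum csum_state state, unsigned int frame_len)
{
    if (state != CSUM_PARTIAL)
        return CSUM_SKIP;
    if (frame_len > HW_CSUM_MAX_FRAME)
        return CSUM_SW; /* bypass hardware, checksum in software */
    return CSUM_HW;
}
```

A 1500-byte payload plus a 4-byte DSA tag and a 14-byte header is 1518 bytes, which lands on the software path in this sketch.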
Eric Dumazet [Thu, 9 Nov 2023 18:01:02 +0000 (18:01 +0000)]
bonding: stop the device in bond_setup_by_slave()
Commit 9eed321cde22 ("net: lapbether: only support ethernet devices")
has been able to keep syzbot away from net/lapb, until today.
In the following splat [1], the issue is that a lapbether device has
been created on a bonding device without members. Then adding a non
ARPHRD_ETHER member forced the bonding master to change its type.
The fix is to make sure we call dev_close() in bond_setup_by_slave()
so that the potential linked lapbether devices (or any other devices
having assumptions on the physical device) are removed.
A similar bug has been addressed in commit 40baec225765
("bonding: fix panic on non-ARPHRD_ETHER enslave failure")
Fixes: 872254dd6b1f ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20231109180102.4085183-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 9 Nov 2023 17:48:59 +0000 (17:48 +0000)]
ptp: annotate data-race around q->head and q->tail
As I was working on a syzbot report, I found that KCSAN would
probably complain that reading q->head or q->tail without
barriers could lead to invalid results.
Add corresponding READ_ONCE() and WRITE_ONCE() to avoid
load-store tearing.
Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20231109174859.3995880-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
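As a userspace approximation of the annotation described above, the sketch below defines simplified READ_ONCE()/WRITE_ONCE() macros with volatile casts (relying on the GNU C __typeof__ extension) and applies them to a ring with head and tail indices. QUEUE_LEN, struct ts_queue and the helpers are hypothetical; the kernel macros are more elaborate.

```c
/* Simplified userspace approximations of the kernel macros. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define QUEUE_LEN 8 /* hypothetical ring size */

struct ts_queue {
    int head;
    int tail;
};

/* Number of queued entries, sampling head and tail exactly once each. */
static int queue_cnt(struct ts_queue *q)
{
    int cnt = READ_ONCE(q->tail) - READ_ONCE(q->head);

    return cnt < 0 ? QUEUE_LEN + cnt : cnt;
}

/* Producer side: publish the new tail with a single store. */
static void queue_push(struct ts_queue *q)
{
    WRITE_ONCE(q->tail, (q->tail + 1) % QUEUE_LEN);
}
```

The volatile access keeps the compiler from re-reading or splitting the load, which is the tearing concern mentioned in the commit message.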
Dan Nowlin [Tue, 7 Nov 2023 17:32:27 +0000 (12:32 -0500)]
ice: fix DDP package download for packages without signature segment
Commit 3cbdb0343022 ("ice: Add support for E830 DDP package segment")
incorrectly removed support for package download for packages without a
signature segment. These packages include the signature buffer inline
in the configurations buffers, and not in a signature segment.
Fix package download by providing download support for both packages
with (ice_download_pkg_with_sig_seg()) and without signature segment
(ice_download_pkg_without_sig_seg()).
Fixes: 3cbdb0343022 ("ice: Add support for E830 DDP package segment")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Closes: https://lore.kernel.org/netdev/ZUT50a94kk2pMGKb@boxer/
Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Arkadiusz Kubalewski [Tue, 31 Oct 2023 17:08:00 +0000 (18:08 +0100)]
ice: dpll: fix output pin capabilities
The dpll output pins which are used to feed the clock signal of the PHY and
MAC circuits cannot be disconnected, as those integrated circuits require a
clock signal for operation.
By stopping assignment of the DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE pin
capability, prevent the user from invoking the state set callback on
those pins. Setting the state on those pins already returns an error, as
firmware doesn't allow the change of their state.
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Fixes: 8a3a565ff210 ("ice: add admin commands to access cgu configuration")
Reviewed-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Arkadiusz Kubalewski [Tue, 31 Oct 2023 17:06:54 +0000 (18:06 +0100)]
ice: dpll: fix check for dpll input priority range
The supported priority value for input pins may differ depending on the NIC
firmware version. E810T NICs with 3.20/4.00 FW versions would accept the
priority range 0-31, whereas firmware 4.10+ would support the range 0-9
and an extra value of 255.
Remove the in-range check as the driver has no information on supported
values from the running firmware. Let firmware decide if the given value is
correct and return an extack error if the value is not supported.
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Arkadiusz Kubalewski [Fri, 13 Oct 2023 10:25:10 +0000 (12:25 +0200)]
ice: dpll: fix initial lock status of dpll
When a dpll device is registered and the dpll subsystem sends a notification
about the new device, the lock state value provided to the dpll subsystem
equals 0, which is an invalid value for the `enum dpll_lock_status`.
Provide correct value by obtaining it from firmware before registering
the dpll device.
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Willem de Bruijn [Mon, 13 Nov 2023 03:16:32 +0000 (22:16 -0500)]
ppp: limit MRU to 64K
ppp_sync_ioctl allows setting device MRU, but does not sanity check
this input.
Limit to a sane upper bound of 64KB.
No implementation I could find generates larger than 64KB frames.
RFC 2823 mentions an upper bound of PPP over SDL of 64KB based on the
16-bit length field. Other protocols will be smaller, such as PPPoE
(9KB jumbo frame) and PPPoA (18190 maximum CPCS-SDU size, RFC 2364).
PPTP and L2TP encapsulate in IP.
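A minimal sketch of such a bound check, written as a userspace stand-in
rather than the actual ppp_sync_ioctl() code. The names sketch_channel,
sketch_set_mru and SKETCH_MRU_MAX are hypothetical, and the limit is
assumed to be the largest value a 16-bit length field can hold:

```c
#include <errno.h>
#include <stdint.h>

/* Assumed upper bound: the largest length a 16-bit field can carry. */
#define SKETCH_MRU_MAX UINT16_MAX

/* Hypothetical stand-in for per-channel state; not the kernel's struct. */
struct sketch_channel {
	unsigned int mru;
};

/*
 * Validate a user-supplied MRU before storing it.
 * Returns 0 on success or -EINVAL when the value is out of range,
 * in which case the stored MRU is left untouched.
 */
int sketch_set_mru(struct sketch_channel *ch, long val)
{
	if (val < 0 || val > SKETCH_MRU_MAX)
		return -EINVAL;
	ch->mru = (unsigned int)val;
	return 0;
}
```

Rejecting the value up front keeps any later allocation sized from the
MRU within a range the page allocator can serve.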
Syzbot managed to trigger alloc warning in __alloc_pages:
if (WARN_ON_ONCE_GFP(order > MAX_ORDER, gfp))
WARNING: CPU: 1 PID: 37 at mm/page_alloc.c:4544 __alloc_pages+0x3ab/0x4a0 mm/page_alloc.c:4544
Similar code exists in other drivers that implement ppp_channel_ops
ioctl PPPIOCSMRU. Those might also be in scope. Notably excluded from
this are pppol2tp_ioctl and pppoe_ioctl.
This code goes back to the start of git history.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+6177e1f90d92583bcc58@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Auhagen [Sat, 11 Nov 2023 04:41:12 +0000 (05:41 +0100)]
net: mvneta: fix calls to page_pool_get_stats
Calling page_pool_get_stats in the mvneta driver without checks
leads to kernel crashes.
First, the page pool is only available if the bm is not used.
The page pool is also not allocated when the port is stopped.
It may also be left unallocated after an error.
The current implementation leads to the following crash calling
ethstats on a port that is down or when calling it at the wrong moment:
This commit adds the proper checks before calling page_pool_get_stats.
Fixes: b3fc79225f05 ("net: mvneta: add support for page_pool_get_stats")
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Reported-by: Paulo Da Silva <Paulo.DaSilva@kyberna.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bytes 34-35 of 36 are uninitialized
Memory access of size 36 starts at ffff88802d464a00
Data copied to user address 00007ff55033c0a0
CPU: 0 PID: 30322 Comm: syz-executor.0 Not tainted 6.6.0-14500-g1c41041124bd #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
=====================================================
tipc_add_tlv() puts TLV descriptor and value onto `skb`. This size is
calculated with TLV_SPACE() macro. It adds the size of struct tlv_desc and
the length of TLV value passed as an argument, and aligns the result to a
multiple of TLV_ALIGNTO, i.e., a multiple of 4 bytes.
If the size of struct tlv_desc plus the length of TLV value is not aligned,
the current implementation leaves the remaining bytes uninitialized. This
is the cause of the above kernel-infoleak issue.
This patch resolves this issue by clearing data up to an aligned size.
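The padding problem described above can be illustrated with a small
userspace sketch. Everything prefixed with sketch_/SKETCH_ is a
hypothetical stand-in for the tipc helpers, byte order of the descriptor
fields is ignored, and clearing the whole aligned region with memset()
is only one possible way to zero the padding:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical equivalents of TLV_ALIGNTO / TLV_SPACE(). */
#define SKETCH_ALIGNTO ((size_t)4)
#define SKETCH_ALIGN(len) \
	(((len) + SKETCH_ALIGNTO - 1) & ~(SKETCH_ALIGNTO - 1))

struct sketch_tlv_desc {
	uint16_t len;	/* descriptor plus value, without padding */
	uint16_t type;
};

#define SKETCH_TLV_LENGTH(datalen) \
	(sizeof(struct sketch_tlv_desc) + (size_t)(datalen))
#define SKETCH_TLV_SPACE(datalen) SKETCH_ALIGN(SKETCH_TLV_LENGTH(datalen))

/*
 * Write one TLV into buf and return the aligned number of bytes used,
 * or 0 if it does not fit. The aligned region is zeroed first, so the
 * padding bytes after the value never carry stale data.
 */
size_t sketch_add_tlv(unsigned char *buf, size_t buflen, uint16_t type,
		      const void *data, uint16_t datalen)
{
	size_t space = SKETCH_TLV_SPACE(datalen);
	struct sketch_tlv_desc desc;

	if (space > buflen)
		return 0;

	memset(buf, 0, space);
	desc.len = (uint16_t)SKETCH_TLV_LENGTH(datalen);
	desc.type = type;
	memcpy(buf, &desc, sizeof(desc));
	if (datalen)
		memcpy(buf + sizeof(desc), data, datalen);
	return space;
}
```

With a 30-byte value, the descriptor and value occupy 34 bytes and the
aligned size is 36, which matches the two trailing bytes named in the
report above.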
Fixes: d0796d1ef63d ("tipc: convert legacy nl bearer dump to nl compat")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 10 Nov 2023 15:36:00 +0000 (10:36 -0500)]
net: gso_test: support CONFIG_MAX_SKB_FRAGS up to 45
The test allocs a single page to hold all the frag_list skbs. This
is insufficient on kernels with CONFIG_MAX_SKB_FRAGS=45, due to the
increased skb_shared_info frags[] array length.
Simplify the logic. Just allocate a page for each frag_list skb.
Fixes: 4688ecb1385f ("net: expand skb_segment unit test with frag_list coverage")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jijie Shao [Fri, 10 Nov 2023 09:37:13 +0000 (17:37 +0800)]
net: hns3: fix VF wrong speed and duplex issue
If the PF is down, firmware returns a 10 Mbit/s rate and half-duplex mode
when the PF queries the port information from firmware.
After an imp reset command is executed, the PF status changes to down,
and the PF queries link status and updates port information
from firmware in a periodically scheduled task.
However, there is a low probability that port information is updated
when PF is down, and then PF link status changes to up.
In this case, PF synchronizes incorrect rate and duplex mode to VF.
This patch fixes it by updating port information before
PF synchronizes the rate and duplex to the VF
when PF changes to up.
Fixes: 18b6e31f8bf4 ("net: hns3: PF add support for pushing link status to VFs")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jijie Shao [Fri, 10 Nov 2023 09:37:12 +0000 (17:37 +0800)]
net: hns3: fix VF reset fail issue
Currently the reset process in hns3 and the firmware watchdog init
process are asynchronous. The driver assumes that firmware watchdog
initialization completes before the VF clears the interrupt source.
However, firmware initialization may not complete that early, so the VF
receives multiple reset interrupts and fails to reset.
So add a delay before the VF clears the interrupt source. A 5 ms delay
is enough to avoid a second reset interrupt.
Fixes: 427900d27d86 ("net: hns3: fix the timing issue of VF clearing interrupt sources")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Fri, 10 Nov 2023 09:37:11 +0000 (17:37 +0800)]
net: hns3: fix variable may not initialized problem in hns3_init_mac_addr()
When a VF calls hns3_init_mac_addr(), get_mac_addr() may fail,
in which case mac_addr_temp is left uninitialized.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Fri, 10 Nov 2023 09:37:10 +0000 (17:37 +0800)]
net: hns3: fix out-of-bounds access may occur when coalesce info is read via debugfs
The hns3 driver defines an array of strings to show the coalesce
info, but if the kernel adds a new mode or a new state, an
out-of-bounds access may occur when the coalesce info is read via
debugfs. This patch fixes the problem.
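One common way to guard such a lookup is to compare the index against
the table size before indexing. The sketch below assumes that approach.
The table contents and the names sketch_mode_names and sketch_mode_name
are illustrative only and are not taken from the driver:

```c
#include <stddef.h>

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative name table; a newer kernel could define more modes. */
static const char * const sketch_mode_names[] = { "EQE", "CQE" };

/*
 * Map a mode value to a printable name. Values beyond the end of the
 * table fall back to a fixed string instead of reading past the array.
 */
const char *sketch_mode_name(unsigned int mode)
{
	if (mode >= SKETCH_ARRAY_SIZE(sketch_mode_names))
		return "Unknown";
	return sketch_mode_names[mode];
}
```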
Fixes: c99fead7cb07 ("net: hns3: add debugfs support for interrupt coalesce")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 10 Nov 2023 09:37:09 +0000 (17:37 +0800)]
net: hns3: fix incorrect capability bit display for copper port
Currently, the FEC capability bit is set by default for device version V2.
This is incorrect for a copper port. Even though it doesn't make the NIC
work abnormally, the capability information displayed in debugfs may
confuse the user. So clear it when the driver gets the port type information.
Fixes: 433ccce83504 ("net: hns3: use FEC capability queried from firmware")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Fri, 10 Nov 2023 09:37:08 +0000 (17:37 +0800)]
net: hns3: add barrier in vf mailbox reply process
In hclgevf_mbx_handler() and hclgevf_get_mbx_resp() functions,
there is a typical store-store and load-load scenario between
received_resp and additional_info. This patch adds barrier
to fix the problem.
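The ordering requirement can be sketched in userspace with C11 atomics,
which stand in here for the kernel's barrier primitives. This is an
analogue under assumptions, not the driver code: it assumes that
received_resp acts as the flag that publishes additional_info, and the
struct and function names are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical response slot shared by a writer and a reader. */
struct sketch_mbx_resp {
	uint32_t additional_info;	/* payload, written before the flag */
	atomic_bool received_resp;	/* flag that publishes the payload */
};

/* Writer side: the store-store order is payload first, then flag. */
void sketch_publish(struct sketch_mbx_resp *r, uint32_t info)
{
	r->additional_info = info;
	atomic_store_explicit(&r->received_resp, true, memory_order_release);
}

/* Reader side: the load-load order is flag first, then payload. */
bool sketch_consume(struct sketch_mbx_resp *r, uint32_t *info)
{
	if (!atomic_load_explicit(&r->received_resp, memory_order_acquire))
		return false;
	*info = r->additional_info;
	return true;
}
```

The release store pairs with the acquire load, so a reader that
observes the flag also observes the payload written before it.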
Fixes: 4671042f1ef0 ("net: hns3: add match_id to check mailbox response from PF to VF")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 10 Nov 2023 09:37:07 +0000 (17:37 +0800)]
net: hns3: fix add VLAN fail issue
hclge_sync_vlan_filter() is called in a periodic task and tries
to remove the VLANs recorded in vlan_del_fail_bmap. It can run
concurrently with a VLAN add operation from the user.
So if the user fails to delete a VLAN id and adds it
again soon afterwards, the periodic task may remove it,
leaving the software configuration inconsistent
with hardware. Add mutex handling to avoid this.
Jan Kiszka [Fri, 10 Nov 2023 16:13:08 +0000 (17:13 +0100)]
net: ti: icssg-prueth: Fix error cleanup on failing pruss_request_mem_region
We were just continuing in this case, surely not desired.
Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CPU: 0 PID: 12950 Comm: syz-executor.1 Not tainted 6.6.0-14500-g1c41041124bd #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
=====================================================
ppp_sync_input() checks the first 2 bytes of the data are PPP_ALLSTATIONS
and PPP_UI. However, if the data length is 1 and the first byte is
PPP_ALLSTATIONS, an access to an uninitialized value occurs when checking
PPP_UI. This patch resolves this issue by checking the data length.
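The length check can be sketched as follows. This is a standalone
illustration and not the ppp_sync_input() code. The helper name
sketch_has_addr_ctrl is hypothetical, and the two constants carry the
values 0xff and 0x03 used for the PPP address and control bytes:

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PPP_ALLSTATIONS 0xff	/* all-stations address byte */
#define SKETCH_PPP_UI          0x03	/* unnumbered information control byte */

/*
 * Report whether the frame starts with the address/control pair.
 * The length is tested first, so buf[1] is never read when only one
 * byte is present.
 */
bool sketch_has_addr_ctrl(const unsigned char *buf, size_t count)
{
	return count >= 2 &&
	       buf[0] == SKETCH_PPP_ALLSTATIONS &&
	       buf[1] == SKETCH_PPP_UI;
}
```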
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 9 Nov 2023 15:22:41 +0000 (15:22 +0000)]
ipvlan: add ipvlan_route_v6_outbound() helper
Inspired by syzbot reports using a stack of multiple ipvlan devices.
Reduce the stack size needed in ipvlan_process_v6_outbound() by moving
the flowi6 struct used for the route lookup into a non-inlined
helper. ipvlan_route_v6_outbound() needs 120 bytes on the stack,
immediately reclaimed.
Also make sure ipvlan_process_v4_outbound() is not inlined.
We might also have to lower MAX_NEST_DEV, because only syzbot uses
setups with more than four stacked devices.
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let's move the SOCK_RCU_FREE part up a bit, before we are inserting
the socket into hashtables. Note, that the race is really harmless;
the bpf callers are handling this situation (where listener socket
doesn't have SOCK_RCU_FREE set) correctly, so the only
annoyance is a WARN_ONCE.
More details from Eric regarding SOCK_RCU_FREE timeline:
Commit 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under
synflood") added SOCK_RCU_FREE. At that time, the precise location of
sock_set_flag(sk, SOCK_RCU_FREE) did not matter, because the thread calling
__inet_hash() owns a reference on sk. SOCK_RCU_FREE was only tested
at dismantle time.
Commit 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
started checking SOCK_RCU_FREE _after_ the lookup to infer whether
the refcount has been taken care of.
Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrii Nakryiko [Fri, 10 Nov 2023 06:14:11 +0000 (22:14 -0800)]
selftests/bpf: add more test cases for check_cfg()
Add a few more simple cases to validate proper privileged vs unprivileged
loop detection behavior. conditional_loop2 is the one reported by Hao
Sun that triggered this set of fixes.
Andrii Nakryiko [Fri, 10 Nov 2023 06:14:10 +0000 (22:14 -0800)]
bpf: fix control-flow graph checking in privileged mode
When BPF program is verified in privileged mode, BPF verifier allows
bounded loops. This means that from CFG point of view there are
definitely some back-edges. Original commit adjusted check_cfg() logic
to not detect back-edges in control flow graph if they are resulting
from conditional jumps, with the idea that subsequent full BPF
verification process will determine whether such loops are bounded or
not, and either accept or reject the BPF program. At least that's my
reading of the intent.
Unfortunately, the implementation of this idea doesn't work correctly in
all possible situations. A conditional jump might not result in an
immediate back-edge, but just a few unconditional instructions later we
can arrive at a back-edge. In such situations check_cfg() would reject
the BPF program even in privileged mode, even though it might be a
bounded loop. The next patch adds one simple program demonstrating such
a scenario.
To keep things simple, instead of trying to detect back edges in
privileged mode, just assume every back edge is valid and let subsequent
BPF verification prove or reject bounded loops.
Note a few test changes. For an unknown reason, we have a few tests that
are specified to detect a back-edge in privileged mode, but looking at
their code it seems like the right outcome is passing check_cfg() and
letting subsequent verification decide whether the looping is bounded
or not.
Bounded recursion case is also interesting. The example should pass, as
recursion is limited to just a few levels and so we never reach maximum
number of nested frames and never exhaust maximum stack depth. But the
way that max stack depth logic works today it falsely detects this as
exceeding max nested frame count. This patch series doesn't attempt to
fix this orthogonal problem, so we just adjust expected verifier failure.
====================
BPF control flow graph and precision backtrack fixes
A small fix to BPF verifier's CFG logic around handling and reporting ldimm64
instructions. Patch #1 was previously submitted separately ([0]), and so this
patch set supersedes that patch.
Second patch is fixing obscure corner case in mark_chain_precise() logic. See
patch for details. Patch #3 adds a dedicated test, however fragile it might be.
Andrii Nakryiko [Fri, 10 Nov 2023 00:26:38 +0000 (16:26 -0800)]
selftests/bpf: add edge case backtracking logic test
Add a dedicated selftest that tries to set up conditions to have a state
with the same first and last instruction index, but which is actually a
loop 3->4->1->2->3. This confuses mark_chain_precision() if the verifier
doesn't take jump history into account.
Fix an edge case in __mark_chain_precision() which prematurely stops
backtracking instructions in a state if it happens that the state's first
and last instruction indexes are the same. This situation doesn't
necessarily mean that there were no instructions simulated in a state,
but rather that we started from the instruction, jumped around a bit,
and then ended up at the same instruction before checkpointing or
marking precision.
To distinguish between these two possible situations, we need to consult
jump history. If it's empty or contains a single record "bridging" the
parent state and the first instruction of the processed state, then we
indeed backtracked all instructions in this state. But if history is not
empty, we are definitely not done yet.
Move this logic inside get_prev_insn_idx() to contain it more nicely.
Use -ENOENT return code to denote "we are out of instructions"
situation.
This bug was exposed by verifier_loop1.c's bounded_recursion subtest, once
the next fix in this patch set is applied.
Andrii Nakryiko [Fri, 10 Nov 2023 00:26:36 +0000 (16:26 -0800)]
bpf: handle ldimm64 properly in check_cfg()
ldimm64 instructions are 16 bytes long, and so have to be handled
appropriately in check_cfg(), just like the rest of the BPF verifier does.
This has implications in three places:
- when determining next instruction for non-jump instructions;
- when determining next instruction for callback address ldimm64
instructions (in visit_func_call_insn());
- when checking for unreachable instructions, where second half of
ldimm64 is expected to be unreachable;
We also take this as an opportunity to report jumps into the middle of
ldimm64, and adjust a few test_verifier tests accordingly.
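The index arithmetic implied by the points above can be sketched as
follows. This is a simplified illustration and not the check_cfg()
implementation. The struct and function names are hypothetical, and the
only real-world constant is 0x18, the opcode formed by
BPF_LD | BPF_IMM | BPF_DW:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode of the 64-bit immediate load: BPF_LD | BPF_IMM | BPF_DW. */
#define SKETCH_LDIMM64 0x18

/* Simplified, hypothetical instruction slot; a real BPF slot is 8 bytes. */
struct sketch_insn {
	uint8_t code;
	int32_t imm;
};

/* Index of the instruction that follows idx for a non-jump instruction. */
size_t sketch_next_insn(const struct sketch_insn *insns, size_t idx)
{
	/* ldimm64 occupies two slots, so skip its second half. */
	return idx + (insns[idx].code == SKETCH_LDIMM64 ? 2 : 1);
}

/*
 * A jump target is invalid when it lands on the second slot of an
 * ldimm64, i.e. when the preceding slot holds the ldimm64 opcode.
 * This assumes a well-formed program in which the second slot of an
 * ldimm64 has a zero opcode byte.
 */
bool sketch_jumps_into_ldimm64(const struct sketch_insn *insns, size_t target)
{
	return target > 0 && insns[target - 1].code == SKETCH_LDIMM64;
}
```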
Anders Roxell [Thu, 9 Nov 2023 17:43:28 +0000 (18:43 +0100)]
selftests: bpf: xskxceiver: ksft_print_msg: fix format type error
When cross-building selftests/bpf for arm64, 'format specifies type'
errors show up, such as:
xskxceiver.c:912:34: error: format specifies type 'int' but the argument
has type '__u64' (aka 'unsigned long long') [-Werror,-Wformat]
ksft_print_msg("[%s] expected meta_count [%d], got meta_count [%d]\n",
~~
%llu
__func__, pkt->pkt_nb, meta->count);
^~~~~~~~~~~
xskxceiver.c:929:55: error: format specifies type 'unsigned long long' but
the argument has type 'u64' (aka 'unsigned long') [-Werror,-Wformat]
ksft_print_msg("Frag invalid addr: %llx len: %u\n", addr, len);
~~~~ ^~~~
Fix the issues by casting to (unsigned long long) and changing the
specifiers from %d and %u to %llu, since for u64s the specifier might be
%llx or %lx, depending on the architecture.
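A small standalone sketch of the cast-and-%llu pattern, using snprintf()
instead of ksft_print_msg() so that it builds outside the selftests
tree. The function name sketch_format_count is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format two 64-bit counters portably. uint64_t is unsigned long on
 * some targets and unsigned long long on others, so each argument is
 * cast to unsigned long long to match the %llu specifier everywhere.
 */
int sketch_format_count(char *buf, size_t len, uint64_t expected, uint64_t got)
{
	return snprintf(buf, len,
			"expected meta_count [%llu], got meta_count [%llu]",
			(unsigned long long)expected,
			(unsigned long long)got);
}
```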
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come"
* tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
net: ti: icss-iep: fix setting counter value
ptp: fix corrupted list in ptp_open
ptp: ptp_read should not release queue
net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
net: kcm: fill in MODULE_DESCRIPTION()
net/sched: act_ct: Always fill offloading tuple iifidx
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
ipvs: add missing module descriptions
netfilter: nf_tables: remove catchall element in GC sync path
netfilter: add missing module descriptions
drivers/net/ppp: use standard array-copy-function
net: enetc: shorten enetc_setup_xdp_prog() error message to fit NETLINK_MAX_FMTMSG_LEN
virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()
r8169: respect userspace disabling IFF_MULTICAST
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
net: phylink: initialize carrier state at creation
test/vsock: add dobule bind connect test
test/vsock: refactor vsock_accept
...
Linus Torvalds [Fri, 10 Nov 2023 01:04:58 +0000 (17:04 -0800)]
Merge tag 'v6.7-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a regression in ahash and hides the Kconfig sub-options for
the jitter RNG"
* tag 'v6.7-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ahash - Set using_shash for cloned ahash wrapper over shash
crypto: jitterentropy - Hide esoteric Kconfig options under FIPS and EXPERT
Linus Torvalds [Thu, 9 Nov 2023 22:18:42 +0000 (14:18 -0800)]
Merge tag 'input-for-v6.7-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a number of input drivers have been converted to use facilities
provided by the device core to instantiate driver-specific attributes
instead of using devm_device_add_group() and similar APIs
- platform input devices have been converted to use remove() callback
returning void
- a fix for use-after-free when tearing down a Synaptics RMI device
- a few flexible arrays in input structures have been annotated with
__counted_by to help hardening efforts
- handling of vddio supply in cyttsp5 driver
- other miscellaneous fixups
* tag 'input-for-v6.7-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (86 commits)
Input: walkera0701 - use module_parport_driver macro to simplify the code
Input: synaptics-rmi4 - fix use after free in rmi_unregister_function()
dt-bindings: input: fsl,scu-key: Document wakeup-source
Input: cyttsp5 - add handling for vddio regulator
dt-bindings: input: cyttsp5: document vddio-supply
Input: tegra-kbc - use device_get_match_data()
Input: Annotate struct ff_device with __counted_by
Input: axp20x-pek - avoid needless newline removal
Input: mt - annotate struct input_mt with __counted_by
Input: leds - annotate struct input_leds with __counted_by
Input: evdev - annotate struct evdev_client with __counted_by
Input: synaptics-rmi4 - replace deprecated strncpy
Input: wm97xx-core - convert to platform remove callback returning void
Input: wm831x-ts - convert to platform remove callback returning void
Input: ti_am335x_tsc - convert to platform remove callback returning void
Input: sun4i-ts - convert to platform remove callback returning void
Input: stmpe-ts - convert to platform remove callback returning void
Input: pcap_ts - convert to platform remove callback returning void
Input: mc13783_ts - convert to platform remove callback returning void
Input: mainstone-wm97xx - convert to platform remove callback returning void
...
Linus Torvalds [Thu, 9 Nov 2023 22:10:38 +0000 (14:10 -0800)]
Merge tag 'for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
"This contains one patch which slipped through the cracks (iproc), a
core sanitizing improvement as the new memdup_array_user() helper went
upstream (i2c-dev), and two driver bugfixes (designware, cp2615)"
* tag 'for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cp2615: Fix 'assignment to __be16' warning
i2c: dev: copy userspace array safely
i2c: designware: Disable TX_EMPTY irq while waiting for block length byte
i2c: iproc: handle invalid slave state
Linus Torvalds [Thu, 9 Nov 2023 21:54:25 +0000 (13:54 -0800)]
Merge tag 'linux-watchdog-6.7-rc1' of git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add support for Amlogic C3 and S4 SoCs
- add IT8613 ID
- add MSM8226 and MSM8974 compatibles
- other small fixes and improvements
* tag 'linux-watchdog-6.7-rc1' of git://www.linux-watchdog.org/linux-watchdog: (24 commits)
dt-bindings: watchdog: Add support for Amlogic C3 and S4 SoCs
watchdog: mlx-wdt: Parameter desctiption warning fix
watchdog: aspeed: Add support for aspeed,reset-mask DT property
dt-bindings: watchdog: aspeed-wdt: Add aspeed,reset-mask property
watchdog: apple: Deactivate on suspend
dt-bindings: watchdog: qcom-wdt: Add MSM8226 and MSM8974 compatibles
dt-bindings: watchdog: fsl-imx7ulp-wdt: Add 'fsl,ext-reset-output'
wdog: imx7ulp: Enable wdog int_en bit for watchdog any reset
drivers: watchdog: marvell_gti: Program the max_hw_heartbeat_ms
drivers: watchdog: marvell_gti: fix zero pretimeout handling
watchdog: marvell_gti: Replace of_platform.h with explicit includes
watchdog: imx_sc_wdt: continue if the wdog already enabled
watchdog: st_lpc: Use device_get_match_data()
watchdog: wdat_wdt: Add timeout value as a param in ping method
watchdog: gpio_wdt: Make use of device properties
sbsa_gwdt: Calculate timeout with 64-bit math
watchdog: ixp4xx: Make sure restart always works
watchdog: it87_wdt: add IT8613 ID
watchdog: marvell_gti_wdt: Fix error code in probe()
Watchdog: marvell_gti_wdt: Remove redundant dev_err_probe() for platform_get_irq()
...
Linus Torvalds [Thu, 9 Nov 2023 21:47:52 +0000 (13:47 -0800)]
Merge tag 'pwm/for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"This contains a few fixes and a bunch of cleanups, a lot of which is
in preparation for Uwe's character device support that may be ready in
time for the next merge window"
* tag 'pwm/for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (37 commits)
pwm: samsung: Document new member .channel in struct samsung_pwm_chip
pwm: bcm2835: Add support for suspend/resume
pwm: brcmstb: Checked clk_prepare_enable() return value
pwm: brcmstb: Utilize appropriate clock APIs in suspend/resume
pwm: pxa: Explicitly include correct DT includes
pwm: cros-ec: Simplify using devm_pwmchip_add() and dev_err_probe()
pwm: samsung: Consistently use the same name for driver data
pwm: vt8500: Simplify using devm functions
pwm: sprd: Simplify using devm_pwmchip_add() and dev_err_probe()
pwm: sprd: Provide a helper to cast a chip to driver data
pwm: spear: Simplify using devm functions
pwm: mtk-disp: Simplify using devm_pwmchip_add()
pwm: imx-tpm: Simplify using devm functions
pwm: brcmstb: Simplify using devm functions
pwm: bcm2835: Simplify using devm functions
pwm: bcm-iproc: Simplify using devm functions
pwm: Adapt sysfs API documentation to reality
pwm: dwc: add PWM bit unset in get_state call
pwm: dwc: make timer clock configurable
pwm: dwc: split pci out of core driver
...
Linus Torvalds [Thu, 9 Nov 2023 21:37:28 +0000 (13:37 -0800)]
Merge tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core changes:
- Make default-domains mandatory for all IOMMU drivers
- Remove group refcounting
- Add generic_single_device_group() helper and consolidate drivers
- Cleanup map/unmap ops
- Scaling improvements for the IOVA rcache depot
- Convert dart & iommufd to the new domain_alloc_paging()
ARM-SMMU:
- Device-tree binding update:
- Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC
- SMMUv2:
- Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs
- SMMUv3:
- Large refactoring of the context descriptor code to move the CD
table into the master, paving the way for '->set_dev_pasid()'
support on non-SVA domains
- Minor cleanups to the SVA code
Intel VT-d:
- Enable debugfs to dump domain attached to a pasid
- Remove an unnecessary inline function
AMD IOMMU:
- Initial patches for SVA support (not complete yet)
S390 IOMMU:
- DMA-API conversion and optimized IOTLB flushing
And some smaller fixes and improvements"
* tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits)
iommu/dart: Remove the force_bypass variable
iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging()
iommu/dart: Convert to domain_alloc_paging()
iommu/dart: Move the blocked domain support to a global static
iommu/dart: Use static global identity domains
iommufd: Convert to alloc_domain_paging()
iommu/vt-d: Use ops->blocked_domain
iommu/vt-d: Update the definition of the blocking domain
iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain
Revert "iommu/vt-d: Remove unused function"
iommu/amd: Remove DMA_FQ type from domain allocation path
iommu: change iommu_map_sgtable to return signed values
iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()
iommu/vt-d: debugfs: Support dumping a specified page table
iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}
iommu/vt-d: debugfs: Dump entry pointing to huge page
iommu/vt-d: Remove unused function
iommu/arm-smmu-v3-sva: Remove bond refcount
iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle
iommu/arm-smmu-v3: Rename cdcfg to cd_table
...
Diogo Ivo [Tue, 7 Nov 2023 12:00:36 +0000 (12:00 +0000)]
net: ti: icss-iep: fix setting counter value
Currently icss_iep_set_counter() writes the upper 32-bits of the
counter value to both the lower and upper counter registers, so
fix this by writing the appropriate value to the lower register.
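The intended split of a 64-bit counter value across two 32-bit registers
can be sketched as below. The register struct and function name are
hypothetical stand-ins and do not reflect the actual register layout or
accessors used by the driver:

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit counter registers. */
struct sketch_counter_regs {
	uint32_t lo;
	uint32_t hi;
};

/* Write the low half to the lower register and the high half to the upper. */
void sketch_set_counter(struct sketch_counter_regs *regs, uint64_t value)
{
	regs->lo = (uint32_t)(value & 0xffffffffu);	/* lower 32 bits */
	regs->hi = (uint32_t)(value >> 32);		/* upper 32 bits */
}
```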
Edward Adam Davis [Tue, 7 Nov 2023 08:00:41 +0000 (16:00 +0800)]
ptp: fix corrupted list in ptp_open
There is no lock protection when writing ptp->tsevqs in ptp_open() and
ptp_release(), which can cause data corruption. Use a spin lock to avoid
this issue.
Moreover, ptp_release() should not be used to release the queue in ptp_read(),
and it should be deleted altogether.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reported-and-tested-by: syzbot+df3f3ef31f60781fa911@syzkaller.appspotmail.com
Fixes: 8f5de6fb2453 ("ptp: support multiple timestamp event readers")
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Link: https://lore.kernel.org/r/tencent_CD19564FFE8DA8A5918DFE92325D92DD8107@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Edward Adam Davis [Tue, 7 Nov 2023 08:00:40 +0000 (16:00 +0800)]
ptp: ptp_read should not release queue
First, the queue is not memory allocated in ptp_read().
Second, other processes may block in ptp_read() and wait for conditions to be
met to perform read operations.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reported-and-tested-by: syzbot+df3f3ef31f60781fa911@syzkaller.appspotmail.com
Fixes: 8f5de6fb2453 ("ptp: support multiple timestamp event readers")
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Link: https://lore.kernel.org/r/tencent_18747D76F1675A3C633772960237544AAA09@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 9 Nov 2023 02:43:52 +0000 (18:43 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-11-06 (ice)
This series contains updates to ice driver only.
Dave removes SR-IOV LAG attribute for only the interface being disabled
to allow for proper unwinding of all interfaces.
Michal Schmidt changes some LAG allocations from GFP_KERNEL to GFP_ATOMIC
due to non-allowed sleeping.
Aniruddha and Marcin fix redirection and drop rules for switchdev by
properly setting and marking egress/ingress type.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix VF-VF direction matching in drop rule in switchdev
ice: Fix VF-VF filter rules in switchdev mode
ice: lag: in RCU, use atomic allocation
ice: Fix SRIOV LAG disable on non-compliant aggregate
====================
Jakub Kicinski [Thu, 9 Nov 2023 02:20:13 +0000 (18:20 -0800)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-11-06 (i40e)
This series contains updates to i40e driver only.
Ivan Vecera resolves a couple issues with devlink; removing a call to
devlink_port_type_clear() and ensuring devlink port is unregistered
after the net device.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Fix devlink port unregistering
i40e: Do not call devlink_port_type_clear()
====================
Jakub Kicinski [Thu, 9 Nov 2023 02:14:59 +0000 (18:14 -0800)]
Merge tag 'nf-23-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Add missing netfilter modules description to fix W=1, from Florian Westphal.
2) Fix catch-all element GC with timeout when used with the pipapo set
backend. This has remained broken since I tried to fix it this summer,
followed by another recent attempt to fix it.
3) Add missing IPVS modules descriptions to fix W=1, also from Florian.
4) xt_recent allocated a too small buffer to store an IPv4-mapped IPv6
address which can be parsed by in6_pton(), from Maciej Zenczykowski.
Broken for many releases.
5) Skip IPv4-mapped IPv6, IPv4-compat IPv6, site/link local scoped IPv6
addresses to set up IPv6 NAT redirect, also from Florian. This has been
broken since 2012.
* tag 'nf-23-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
ipvs: add missing module descriptions
netfilter: nf_tables: remove catchall element in GC sync path
netfilter: add missing module descriptions
====================
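For item 4 of the Netfilter list above, the following is a minimal user-space sketch of why a buffer sized for a plain hexadecimal IPv6 literal is too small for one with a dotted-quad tail. It uses the POSIX inet_pton() rather than the kernel's in6_pton(), and the helper names and the 40-byte comparison size are hypothetical, chosen only for illustration. It is not the xt_recent code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Longest pure-hex literal: 8 groups of 4 digits plus 7 colons = 39 chars. */
static const char longest_hex[] = "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff";

/* Longest literal with an IPv4 tail: 6 groups, 6 colons, 15-char quad = 45 chars. */
static const char longest_mixed[] = "ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255";

/* Hypothetical helper: 1 if the literal plus its NUL fits in bufsize bytes. */
static int literal_fits(const char *literal, size_t bufsize)
{
    return strlen(literal) + 1 <= bufsize;
}

/* Hypothetical helper: 1 if the literal parses as an IPv6 address. */
static int parses_as_ipv6(const char *literal)
{
    struct in6_addr addr;

    return inet_pton(AF_INET6, literal, &addr) == 1;
}
```

A buffer of INET6_ADDRSTRLEN bytes holds both forms. A buffer sized only for the pure-hex form cannot hold the mixed form, even though the parser accepts it.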
Jakub Kicinski [Thu, 9 Nov 2023 01:56:13 +0000 (17:56 -0800)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-11-08
We've added 16 non-merge commits during the last 6 day(s) which contain
a total of 30 files changed, 341 insertions(+), 130 deletions(-).
The main changes are:
1) Fix a BPF verifier issue in precision tracking for BPF_ALU | BPF_TO_BE |
BPF_END where the source register was incorrectly marked as precise,
from Shung-Hsi Yu.
2) Fix a concurrency issue in bpf_timer where the former could still have
been alive after an application releases or unpins the map, from Hou Tao.
3) Fix a BPF verifier issue where immediates are incorrectly cast to u32
before being spilled and therefore losing sign information, from Hao Sun.
4) Fix a misplaced BPF_TRACE_ITER in check_css_task_iter_allowlist which
incorrectly compared bpf_prog_type with bpf_attach_type, from Chuyi Zhou.
5) Add __bpf_hook_{start,end} as well as __bpf_kfunc_{start,end}_defs macros,
migrate all BPF-related __diag callsites over to it, and add a new
__diag_ignore_all for -Wmissing-declarations to the macros to address
recent build warnings, from Dave Marchevsky.
6) Fix broken BPF selftest build of xdp_hw_metadata test on architectures
where char is not signed, from Björn Töpel.
7) Fix test_maps selftest to properly use LIBBPF_OPTS() macro to initialize
the bpf_map_create_opts, from Andrii Nakryiko.
8) Fix bpffs selftest to avoid unmounting /sys/kernel/debug as it may have
been mounted and used by other applications already, from Manu Bretelle.
9) Fix a build issue without CONFIG_CGROUPS wrt css_task open-coded
iterators, from Matthieu Baerts.
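Item 3 above describes sign information being lost when an immediate is cast to u32 before it is widened. The effect can be sketched in plain user-space C. The function names are hypothetical, and this is only a model of the arithmetic, not the verifier's check_stack_write_fixed_off().

```c
#include <stdint.h>

/* Widening through uint32_t zero-extends, so a negative immediate
 * turns into a large positive 64-bit value. */
static int64_t spill_imm_via_u32(int32_t imm)
{
    return (int64_t)(uint32_t)imm;
}

/* Widening the signed 32-bit value directly sign-extends it. */
static int64_t spill_imm_sign_extended(int32_t imm)
{
    return (int64_t)imm;
}
```

For non-negative immediates the two functions agree. They differ only when the top bit of the 32-bit value is set.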
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
selftests/bpf: Fix broken build where char is unsigned
selftests/bpf: precision tracking test for BPF_NEG and BPF_END
bpf: Fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
selftests/bpf: Add test for using css_task iter in sleepable progs
selftests/bpf: Add tests for css_task iter combining with cgroup iter
bpf: Relax allowlist for css_task iter
selftests/bpf: fix test_maps' use of bpf_map_create_opts
bpf: Check map->usercnt after timer->timer is assigned
bpf: Add __bpf_hook_{start,end} macros
bpf: Add __bpf_kfunc_{start,end}_defs macros
selftests/bpf: fix test_bpffs
selftests/bpf: Add test for immediate spilled to stack
bpf: Fix check_stack_write_fixed_off() to correctly spill imm
bpf: fix compilation error without CGROUPS
====================
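Item 6 of the BPF list above refers to architectures where plain char is not signed. As a generic illustration (not the xdp_hw_metadata code, and with hypothetical function names), the sketch below shows how the same stored value widens differently depending on signedness. This is why code that stores a negative sentinel in a char is not portable.

```c
#include <limits.h>

/* Widen each flavour of char to int; the result depends on signedness. */
static int widen_signed(signed char c)     { return c; }
static int widen_unsigned(unsigned char c) { return c; }
static int widen_plain(char c)             { return c; }

/* 1 if plain char is a signed type on this platform, else 0. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Storing -1 in a plain char and comparing the widened value against -1 only works where plain_char_is_signed() returns 1. Holding such values in an int avoids the dependency.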
Vlad Buslov [Fri, 3 Nov 2023 15:14:10 +0000 (16:14 +0100)]
net/sched: act_ct: Always fill offloading tuple iifidx
Referenced commit doesn't always set iifidx when offloading the flow to
hardware. Fix the following cases:
- nf_conn_act_ct_ext_fill() is called before extension is created with
nf_conn_act_ct_ext_add() in tcf_ct_act(). This can cause rule offload with
unspecified iifidx when connection is offloaded after only a single
original-direction packet has been processed by tc data path. Always fill
the new nf_conn_act_ct_ext instance after creating it in
nf_conn_act_ct_ext_add().
- Offloading of unidirectional UDP NEW connections is now supported, but ct
flow iifidx field is not updated when connection is promoted to
bidirectional, which can result in reply-direction iifidx being zero when
refreshing the connection. Fill in the extension and update flow iifidx
before calling flow_offload_refresh().
Fixes: 9795ded7f924 ("net/sched: act_ct: Fill offloading tuple iifidx")
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections")
Link: https://lore.kernel.org/r/20231103151410.764271-1-vladbu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 8 Nov 2023 21:39:16 +0000 (13:39 -0800)]
Merge tag 'nfs-for-6.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Bugfixes:
- SUNRPC:
- re-probe the target RPC port after an ECONNRESET error
- handle allocation errors from rpcb_call_async()
- fix a use-after-free condition in rpc_pipefs
- fix up various checks for timeouts
- NFSv4.1:
- Handle NFS4ERR_DELAY errors during session trunking
- fix SP4_MACH_CRED protection for pnfs IO
- NFSv4:
- Ensure that we test all delegations when the server notifies
us that it may have revoked some of them
Features:
- Allow knfsd processes to break out of NFS4ERR_DELAY loops when
re-exporting NFSv4.x by setting appropriate values for the
'delay_retrans' module parameter
- nfs: Convert nfs_symlink() to use a folio"
* tag 'nfs-for-6.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: Convert nfs_symlink() to use a folio
SUNRPC: Fix RPC client cleaned up the freed pipefs dentries
NFSv4.1: fix SP4_MACH_CRED protection for pnfs IO
SUNRPC: Add an IS_ERR() check back to where it was
NFSv4.1: fix handling NFS4ERR_DELAY when testing for session trunking
nfs41: drop dependency between flexfiles layout driver and NFSv3 modules
NFSv4: fairly test all delegations on a SEQ4_ revocation
SUNRPC: SOFTCONN tasks should time out when on the sending list
SUNRPC: Force close the socket when a hard error is reported
SUNRPC: Don't skip timeout checks in call_connect_status()
SUNRPC: ECONNRESET might require a rebind
NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts
NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY
Linus Torvalds [Wed, 8 Nov 2023 21:33:47 +0000 (13:33 -0800)]
Merge tag 'exfat-for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Fix an issue where exfat timestamps are not updated, caused by the
new timestamp accessor function patch
* tag 'exfat-for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: fix ctime is not updated
exfat: fix setting uninitialized time to ctime/atime
Linus Torvalds [Wed, 8 Nov 2023 21:22:16 +0000 (13:22 -0800)]
Merge tag 'xfs-6.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Chandan Babu:
- Realtime device subsystem:
- Cleanup usage of xfs_rtblock_t and xfs_fsblock_t data types
- Replace open coded conversions between rt blocks and rt extents
with calls to static inline helpers
- Replace open coded realtime geometry computation and macros with
helper functions
- CPU usage optimizations for realtime allocator
- Misc bug fixes associated with Realtime device
- Allow read operations to execute while an FICLONE ioctl is being
serviced
- Misc bug fixes:
- Alert user when xfs_droplink() encounters an inode with a link
count of zero
- Handle the case where the allocator could return zero extents when
servicing an fallocate request
* tag 'xfs-6.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (40 commits)
xfs: allow read IO and FICLONE to run concurrently
xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space
xfs: introduce protection for drop nlink
xfs: don't look for end of extent further than necessary in xfs_rtallocate_extent_near()
xfs: don't try redundant allocations in xfs_rtallocate_extent_near()
xfs: limit maxlen based on available space in xfs_rtallocate_extent_near()
xfs: return maximum free size from xfs_rtany_summary()
xfs: invert the realtime summary cache
xfs: simplify rt bitmap/summary block accessor functions
xfs: simplify xfs_rtbuf_get calling conventions
xfs: cache last bitmap block in realtime allocator
xfs: use accessor functions for summary info words
xfs: consolidate realtime allocation arguments
xfs: create helpers for rtsummary block/wordcount computations
xfs: use accessor functions for bitmap words
xfs: create helpers for rtbitmap block/wordcount computations
xfs: create a helper to handle logging parts of rt bitmap/summary blocks
xfs: convert rt summary macros to helpers
xfs: convert open-coded xfs_rtword_t pointer accesses to helper
xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros
...
The mailman-2 system behind lists.linux[-]foundation.org is being
retired, so the lists are being migrated to lists.linux.dev.
Since both domains belong to LF and setting up proper forwards is
possible, the old addresses will continue to work for a while, but all
new patches should be sent to the new canonical addresses for each list.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 8 Nov 2023 20:39:54 +0000 (12:39 -0800)]
Merge tag 's390-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Get rid of s390 specific use of two PTEs per 4KB page with complex
half-used pages tracking. Using full 4KB pages for 2KB PTEs increases
the memory footprint of page tables but drastically simplifies mm code,
removing a common blocker for common code changes and adaptations
- Simplify and rework "cmma no-dat" handling. This is a follow up for
recent fixes which prevent potential incorrect guest TLB flushes
- Add perf user stack unwinding as well as USER_STACKTRACE support for
user space built with -mbackchain compile option
- Add a few missing conversions from tlb_remove_table to tlb_remove_ptdesc
- Fix crypto cards vanishing in a secure execution environment due to
asynchronous errors
- Avoid reporting crypto cards or queues in check-stop state as online
- Fix null-ptr dereference in AP bus code triggered by early config
change via SCLP
- Couple of stability improvements in AP queue interrupt handling
* tag 's390-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: make pte_free_tlb() similar to pXd_free_tlb()
s390/mm: use compound page order to distinguish page tables
s390/mm: use full 4KB page for 2KB PTE
s390/cmma: rework no-dat handling
s390/cmma: move arch_set_page_dat() to header file
s390/cmma: move set_page_stable() and friends to header file
s390/cmma: move parsing of cmma kernel parameter to early boot code
s390/cmma: cleanup inline assemblies
s390/ap: fix vanishing crypto cards in SE environment
s390/zcrypt: don't report online if card or queue is in check-stop state
s390: add USER_STACKTRACE support
s390/perf: implement perf_callchain_user()
s390/ap: fix AP bus crash on early config change callback invocation
s390/ap: re-enable interrupt for AP queues
s390/ap: rework to use irq info from ap queue status
s390/mm: add missing conversion to use ptdescs
Linus Torvalds [Wed, 8 Nov 2023 17:47:52 +0000 (09:47 -0800)]
Merge tag 'rcu-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks
Pull RCU fixes from Frederic Weisbecker:
- Fix a lock inversion between scheduler and RCU introduced in
v6.2-rc4. The scenario could trigger on any user of RCU_NOCB
(mostly Android but also nohz_full)
- Fix PF_IDLE semantic changes introduced in v6.6-rc3 breaking
some RCU-Tasks and RCU-Tasks-Trace expectations as to what
exactly is an idle task. This resulted in potential spurious
stalls and warnings.
* tag 'rcu-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
rcu/tasks-trace: Handle new PF_IDLE semantics
rcu/tasks: Handle new PF_IDLE semantics
rcu: Introduce rcu_cpu_online()
rcu: Break rcu_node_0 --> &rq->__lock order
Linus Torvalds [Wed, 8 Nov 2023 17:40:13 +0000 (09:40 -0800)]
Merge tag 'memblock-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock update from Mike Rapoport:
"Report failures when memblock_can_resize is not set.
Numerous memblock reservations at early boot may exhaust the static
memblock.reserved array, and this goes unnoticed because most of the
callers don't check the memblock_reserve() return value.
In this case the system will crash later, but the reason is hard to
identify.
Replace return of an error with panic() when memblock.reserved is
exhausted before it can be resized"
* tag 'memblock-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: report failures when memblock_can_resize is not set
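The trade-off described in the memblock pull message can be sketched with a toy user-space model. Every name here (toy_reserved, TOY_MAX_REGIONS, the two functions) is hypothetical, resizing is not modelled, and abort() merely stands in for the kernel's panic(). It is an illustration of the idea, not the memblock implementation.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_MAX_REGIONS 4 /* hypothetical capacity of the static array */

struct toy_region {
    unsigned long base;
    unsigned long size;
};

struct toy_reserved {
    struct toy_region regions[TOY_MAX_REGIONS];
    size_t cnt;
};

/* Error-returning variant: exhaustion is reported through a return
 * value that callers are free to ignore. */
static int toy_reserve(struct toy_reserved *r, unsigned long base,
                       unsigned long size)
{
    if (r->cnt == TOY_MAX_REGIONS)
        return -1;
    r->regions[r->cnt].base = base;
    r->regions[r->cnt].size = size;
    r->cnt++;
    return 0;
}

/* Fail-loudly variant: stop at the point of exhaustion so the cause
 * is visible, instead of crashing later for an unclear reason. */
static void toy_reserve_or_die(struct toy_reserved *r, unsigned long base,
                               unsigned long size)
{
    if (toy_reserve(r, base, size) != 0) {
        fprintf(stderr, "toy_reserve: static array exhausted (%d entries)\n",
                TOY_MAX_REGIONS);
        abort();
    }
}
```

With the first variant, a fifth reservation silently fails unless the caller checks the result. With the second, the failure is reported where it happens.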