Daniel Jurgens [Fri, 12 Jul 2024 00:33:08 +0000 (17:33 -0700)]
net/mlx5: Set sf_eq_usage for SF max EQs
When setting max_io_eqs for an SF function also set the sf_eq_usage_cap.
This is to indicate to the SF driver from the PF that the user has set
the max io eqs via devlink. So the SF driver can later query the proper
max eq value from the new cap.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20240712003310.355106-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Jurgens [Fri, 12 Jul 2024 00:33:07 +0000 (17:33 -0700)]
net/mlx5: IFC updates for SF max IO EQs
Expose a new cap sf_eq_usage. The vhca_resource_manager can write this
cap, indicating the SF driver should use max_num_eqs_24b to determine
how many EQs to use.
Will be used in the next patch, to indicate to the SF driver from the PF
that the user has set the max io eqs via devlink. So the SF driver can
later query the proper max eq value from the new cap.
devlink port function set pci/0000:08:00.0/32768 max_io_eqs 32
Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20240712003310.355106-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change the data type of the variable freq in mvpp2_rx_time_coal_set()
and mvpp2_tx_time_coal_set() to u32 because port->priv->tclk also has
the data type u32.
Change the data type of the function parameter clk_hz in
mvpp2_usec_to_cycles() and mvpp2_cycles_to_usec() to u32 accordingly
and remove the following Coccinelle/coccicheck warning reported by
do_div.cocci:
WARNING: do_div() does a 64-by-32 division, please consider using div64_ul instead
Use min() to simplify the code and improve its readability.
net: ethtool: Monotonically increase the message sequence number
Currently, during the module firmware flashing process, unicast
notifications are sent from the kernel using the same sequence number,
making it impossible for user space to track missed notifications.
Monotonically increase the message sequence number, so the order of
notifications could be tracked effectively.
However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a
SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge
ACK.
For example, the write() syscall in the following packetdrill script fails
with -EAGAIN, and wrong SNMP stats get incremented.
0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
+0 < S 0:0(0) win 1000 <mss 1000>
+0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
+0 < S. 0:0(0) ack 1 win 1000
# packetdrill cross-synack.pkt
cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)
# nstat
...
TcpExtTCPChallengeACK 1 0.0
TcpExtTCPSYNChallenge 1 0.0
The problem is that bpf_skops_established() is triggered by the Challenge
ACK instead of SYN+ACK. This causes the bpf prog to miss the chance to
check if the peer supports a TCP option that is expected to be exchanged
in SYN and SYN+ACK.
Let's accept a bare SYN+ACK for active-open TCP_SYN_RECV sockets to avoid
such a situation.
Note that tcp_ack_snd_check() in tcp_rcv_state_process() is skipped not to
send an unnecessary ACK, but this could be a bit risky for net.git, so this
targets for net-next.
Jakub Kicinski [Sat, 13 Jul 2024 05:27:25 +0000 (22:27 -0700)]
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
idpf: XDP chapter I: convert Rx to libeth
Alexander Lobakin says:
XDP for idpf is currently 5 chapters:
* convert Rx to libeth (this);
* convert Tx and stats to libeth;
* generic XDP and XSk code changes, libeth_xdp;
* actual XDP for idpf via libeth_xdp;
* XSk for idpf (^).
Part I does the following:
* splits &idpf_queue into 4 (RQ, SQ, FQ, CQ) and puts them on a diet;
* ensures optimal cacheline placement, strictly asserts CL sizes;
* moves currently unused/dead singleq mode out of line;
* reuses libeth's Rx ptype definitions and helpers;
* uses libeth's Rx buffer management for both header and payload;
* eliminates memcpy()s and coherent DMA uses on hotpath, uses
napi_build_skb() instead of in-place short skb allocation.
Most idpf patches, except for the queue split, removes more lines
than adds.
Expect far better memory utilization and +5-8% on Rx depending on
the case (+17% on skb XDP_DROP :>).
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: use libeth Rx buffer management for payload buffer
idpf: convert header split mode to libeth + napi_build_skb()
libeth: support different types of buffers for Rx
idpf: remove legacy Page Pool Ethtool stats
idpf: reuse libeth's definitions of parsed ptype structures
idpf: compile singleq code only under default-n CONFIG_IDPF_SINGLEQ
idpf: merge singleq and splitq &net_device_ops
idpf: strictly assert cachelines of queue and queue vector structures
idpf: avoid bloating &idpf_q_vector with big %NR_CPUS
idpf: split &idpf_queue into 4 strictly-typed queue structures
idpf: stop using macros for accessing queue descriptors
libeth: add cacheline / struct layout assertion helpers
page_pool: use __cacheline_group_{begin, end}_aligned()
cache: add __cacheline_group_{begin, end}_aligned() (+ couple more)
====================
We've added 23 non-merge commits during the last 3 day(s) which contain
a total of 18 files changed, 234 insertions(+), 243 deletions(-).
The main changes are:
1) Improve BPF verifier by utilizing overflow.h helpers to check
for overflows, from Shung-Hsi Yu.
2) Fix NULL pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
when attr->attach_prog_fd was not specified, from Tengda Wu.
3) Fix arm64 BPF JIT when generating code for BPF trampolines with
BPF_TRAMP_F_CALL_ORIG which corrupted upper address bits,
from Puranjay Mohan.
4) Remove test_run callback from lwt_seg6local_prog_ops which never worked
in the first place and caused syzbot reports,
from Sebastian Andrzej Siewior.
5) Relax BPF verifier to accept non-zero offset on KF_TRUSTED_ARGS/
/KF_RCU-typed BPF kfuncs, from Matt Bobrowski.
6) Fix a long standing bug in libbpf with regards to handling of BPF
skeleton's forward and backward compatibility, from Andrii Nakryiko.
7) Annotate btf_{seq,snprintf}_show functions with __printf,
from Alan Maguire.
8) BPF selftest improvements to reuse common network helpers in sk_lookup
test and dropping the open-coded inetaddr_len() and make_socket() ones,
from Geliang Tang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
selftests/bpf: Test for null-pointer-deref bugfix in resolve_prog_type()
bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again
bpf: use check_sub_overflow() to check for subtraction overflows
bpf: use check_add_overflow() to check for addition overflows
bpf: fix overflow check in adjust_jmp_off()
bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o
bpf: annotate BTF show functions with __printf
bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG
selftests/bpf: Close obj in error path in xdp_adjust_tail
selftests/bpf: Null checks for links in bpf_tcp_ca
selftests/bpf: Use connect_fd_to_fd in sk_lookup
selftests/bpf: Use start_server_addr in sk_lookup
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Add ASSERT_OK_FD macro
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: fix compilation failure when CONFIG_NF_FLOW_TABLE=m
bpf: Remove tst_run from lwt_seg6local_prog_ops.
bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCU
...
====================
drivers/net/ethernet/broadcom/bnxt/bnxt.c f7ce5eb2cb79 ("bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring()") 20c8ad72eb7f ("eth: bnxt: use the RSS context XArray instead of the local list")
Adjacent changes:
net/ethtool/ioctl.c 503757c80928 ("net: ethtool: Fix RSS setting") eac9122f0c41 ("net: ethtool: record custom RSS contexts in the XArray")
Jakub Kicinski [Sat, 13 Jul 2024 05:16:28 +0000 (22:16 -0700)]
Merge branch 'eth-bnxt-use-the-new-rss-api'
Jakub Kicinski says:
====================
eth: bnxt: use the new RSS API
Convert bnxt from using the set_rxfh API to separate create/modify/remove
callbacks.
Two small extensions to the core APIs are necessary:
- the ability to discard contexts if for some catastrophic reasons
device can no longer provide them;
- the ability to reserve space in the context for RSS table growth.
The driver is adjusted to store indirection tables on u32 to make
it easier to use core structs directly.
With that out of the way the conversion is fairly straightforward.
Since the opposition to discarding contexts was relatively mild
and its what bnxt does already, I'm sticking to that. We may very
well need to revisit that at a later time.
Jakub Kicinski [Thu, 11 Jul 2024 22:07:13 +0000 (15:07 -0700)]
eth: bnxt: use the indir table from ethtool context
Instead of allocating a separate indir table in the vnic use
the one already present in the RSS context allocated by the core.
This saves some LoC and also we won't have to worry about syncing
the local version back to the core, once core learns how to dump
contexts.
Jakub Kicinski [Thu, 11 Jul 2024 22:07:12 +0000 (15:07 -0700)]
eth: bnxt: bump the entry size in indir tables to u32
Ethtool core stores indirection table with u32 entries, "just to be safe".
Switch the type in the driver, so that it's easier to swap local tables
for the core ones. Memory allocations already use sizeof(*entry), switch
the memset()s as well.
Jakub Kicinski [Thu, 11 Jul 2024 22:07:11 +0000 (15:07 -0700)]
eth: bnxt: pad out the correct indirection table
bnxt allocates tables of max size, and changes the used size
based on number of active rings. The unused entries get padded
out with zeros. bnxt_modify_rss() seems to always pad out
the table of the main / default RSS context, instead of
the table of the modified context.
I haven't observed any behavior change due to this patch,
so I don't think it's a fix. Not entirely sure what role
the padding plays, 0 is a valid queue ID.
Jakub Kicinski [Thu, 11 Jul 2024 22:07:09 +0000 (15:07 -0700)]
eth: bnxt: use context priv for struct bnxt_rss_ctx
Core can allocate space for per-context driver-private data,
use it for struct bnxt_rss_ctx. Inline bnxt_alloc_rss_ctx()
at this point, most of the init (as in the actions bnxt_del_one_rss_ctx()
will undo) is open coded in bnxt_create_rxfh_context(), anyway.
Jakub Kicinski [Thu, 11 Jul 2024 22:07:06 +0000 (15:07 -0700)]
eth: bnxt: move from .set_rxfh to .create_rxfh_context and friends
Use the new ethtool ops for RSS context management. The conversion
is pretty straightforward cut / paste of the right chunks of the
combined handler. Main change is that we let the core pick the IDs
(bitmap will be removed separately for ease of review), so we need
to tell the core when we lose a context.
Since the new API passes rxfh as const, change bnxt_modify_rss()
to also take const.
Jakub Kicinski [Thu, 11 Jul 2024 22:07:05 +0000 (15:07 -0700)]
eth: bnxt: allow deleting RSS contexts when the device is down
Contexts get deleted from FW when the device is down, but they
are kept in SW and re-added back on open. bnxt_set_rxfh_context()
apparently does not want to deal with complexity of dealing with
both the device down and device up cases. This is perhaps acceptable
for creating new contexts, but not being able to delete contexts
makes core-driven cleanups messy. Specifically with the new RSS
API core will try to delete contexts automatically after bringing
the device down.
Support the delete-while-down case. Skip the FW logic and delete
just the driver state.
Jakub Kicinski [Thu, 11 Jul 2024 22:07:04 +0000 (15:07 -0700)]
net: ethtool: let drivers declare max size of RSS indir table and key
Some drivers (bnxt but I think also mlx5 from ML discussions) change
the size of the indirection table depending on the number of Rx rings.
Decouple the max table size from the size of the currently used table,
so that we can reserve space in the context for table growth.
Static members in ethtool_ops are good enough for now, we can add
callbacks to read the max size more dynamically if someone needs
that.
Jakub Kicinski [Thu, 11 Jul 2024 22:07:03 +0000 (15:07 -0700)]
net: ethtool: let drivers remove lost RSS contexts
RSS contexts may get lost from a device, in various extreme circumstances.
Specifically if the firmware leaks resources and resets, or crashes and
either recovers in partially working state or the crash causes a
different FW version to run - creating the context again may fail.
Drivers should do their absolute best to prevent this from happening.
When it does, however, telling user that a context exists, when it can't
possibly be used any more is counter productive. Add a helper for
drivers to discard contexts. Print an error, in the future netlink
notification will also be sent.
More robust approaches were proposed, like keeping the contexts
but marking them as "dead" (but possibly resurrected by next reset).
That may be better but it's unclear at this stage whether the
effort is worth the benefits.
Merge tag 'net-6.10-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking fixes from Jakub Kicinski:
"A quick follow up to yesterday's pull. We got a regressions report for
the bnxt patch as soon as it got to your tree. The ethtool fix is also
good to have, although it's an older regression.
Current release - regressions:
- eth: bnxt_en: fix crash in bnxt_get_max_rss_ctx_ring() on older HW
when user tries to decrease the ring count
Previous releases - regressions:
- ethtool: fix RSS setting, accept "no change" setting if the driver
doesn't support the new features
- eth: i40e: remove needless retries of NVM update, don't wait 20min
when we know the firmware update won't succeed"
* tag 'net-6.10-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net:
bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring()
octeontx2-af: fix issue with IPv4 match for RSS
octeontx2-af: fix issue with IPv6 ext match for RSS
octeontx2-af: fix detection of IP layer
octeontx2-af: fix a issue with cpt_lf_alloc mailbox
octeontx2-af: replace cpt slot with lf id on reg write
i40e: fix: remove needless retries of NVM update
net: ethtool: Fix RSS setting
bp->rss_ctx_list is not initialized if the chip or firmware does not
support multiple RSS contexts. Fix it by adding a check in
bnxt_get_max_rss_ctx_ring() before proceeding to reference
bp->rss_ctx_list.
bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
When loading a EXT program without specifying `attr->attach_prog_fd`,
the `prog->aux->dst_prog` will be null. At this time, calling
resolve_prog_type() anywhere will result in a null pointer dereference.
One way to fix it is by forcing `attach_prog_fd` non-empty when
bpf_prog_load(). But this will lead to `libbpf_probe_bpf_prog_type`
API broken which use verifier log to probe prog type and will log
nothing if we reject invalid EXT prog before bpf_check().
Another way is by adding null check in resolve_prog_type().
The issue was introduced by commit 4a9c7bbe2ed4 ("bpf: Resolve to
prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT") which wanted
to correct type resolution for BPF_PROG_TYPE_TRACING programs. Before
that, the type resolution of BPF_PROG_TYPE_EXT prog actually follows
the logic below:
It implies that when EXT program is not yet attached to `dst_prog`,
the prog type should be EXT itself. This code worked fine in the past.
So just keep using it.
Fix this by returning `prog->type` for BPF_PROG_TYPE_EXT if `dst_prog`
is not present in resolve_prog_type().
Fixes: 4a9c7bbe2ed4 ("bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT") Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20240711145819.254178-2-wutengda@huaweicloud.com
Merge tag 'for-6.10-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Fix a regression in extent map shrinker behaviour.
In the past weeks we got reports from users that there are huge
latency spikes or freezes. This was bisected to newly added shrinker
of extent maps (it was added to fix a build up of the structures in
memory).
I'm assuming that the freezes would happen to many users after release
so I'd like to get it merged now so it's in 6.10. Although the diff
size is not small the changes are relatively straightforward, the
reporters verified the fixes and we did testing on our side.
The fixes:
- adjust behaviour under memory pressure and check lock or scheduling
conditions, bail out if needed
- synchronize tracking of the scanning progress so inode ranges are
not skipped or work duplicated
- do a delayed iput when scanning a root so evicting an inode does
not slow things down in case of lots of dirty data, also fix
lockdep warning, a deadlock could happen when writing the dirty
data would need to start a transaction"
* tag 'for-6.10-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: avoid races when tracking progress for extent map shrinking
btrfs: stop extent map shrinker if reschedule is needed
btrfs: use delayed iput during extent map shrinking
Merge tag 'pmdomain-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fix from Ulf Hansson:
- qcom: Skip retention level for rpmhpd's
* tag 'pmdomain-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: qcom: rpmhpd: Skip retention level for Power Domains
Merge tag 'mmc-v6.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- davinci_mmc: Prevent transmitted data size from exceeding sgm's
length
- sdhci: Fix max_seg_size for 64KiB PAGE_SIZE
* tag 'mmc-v6.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: davinci_mmc: Prevent transmitted data size from exceeding sgm's length
mmc: sdhci: Fix max_seg_size for 64KiB PAGE_SIZE
Daniel Borkmann [Fri, 12 Jul 2024 16:12:30 +0000 (18:12 +0200)]
selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again
Revert commit 90dc946059b7 ("selftests/bpf: DENYLIST.aarch64: Remove
fexit_sleep") again. The fix in 19d3c179a377 ("bpf, arm64: Fix trampoline
for BPF_TRAMP_F_CALL_ORIG") does not address all of the issues and BPF
CI is still hanging and timing out:
Merge tag 'arm-fixes-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"Most of these changes are Qualcomm SoC specific and came in just after
I sent out the last set of fixes. This includes two regression fixes
for SoC drivers, a defconfig change to ensure the Lenovo X13s is
usable and 11 changes to DT files to fix regressions and minor
platform specific issues.
Tony and Chunyan step back from their respective maintainership roles
on the omap and unisoc platforms, and Christophe in turn takes over
maintaining some of the Freescale SoC drivers that he has been taking
care of in practice already.
Lastly, there are two trivial fixes for the davinci and sunxi
platforms"
* tag 'arm-fixes-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: Update FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARY
MAINTAINERS: Add more maintainers for omaps
ARM: davinci: Convert comma to semicolon
MAINTAINERS: Move myself from SPRD Maintainer to Reviewer
Revert "dt-bindings: cache: qcom,llcc: correct QDU1000 reg entries"
arm64: dts: qcom: qdu1000: Fix LLCC reg property
arm64: dts: qcom: sm6115: add iommu for sdhc_1
arm64: dts: qcom: x1e80100-crd: fix DAI used for headset recording
arm64: dts: qcom: x1e80100-crd: fix WCD audio codec TX port mapping
soc: qcom: pmic_glink: disable UCSI on sc8280xp
arm64: defconfig: enable Elan i2c-hid driver
arm64: dts: qcom: sc8280xp-crd: use external pull up for touch reset
arm64: dts: qcom: sc8280xp-x13s: fix touchscreen power on
arm64: dts: qcom: x1e80100: Fix PCIe 6a reg offsets and add MHI
arm64: dts: qcom: sa8775p: Correct IRQ number of EL2 non-secure physical timer
arm64: dts: allwinner: Fix PMIC interrupt number
arm64: dts: qcom: sc8280xp: Set status = "reserved" on PSHOLD
arm64: dts: qcom: x1e80100-*: Allocate some CMA buffers
arm64: dts: qcom: sc8180x: Fix LLCC reg property again
====================
Use overflow.h helpers to check for overflows
This patch set refactors kernel/bpf/verifier.c to use type-agnostic, generic
overflow-check helpers defined in include/linux/overflow.h to check for addition
and subtraction overflow, and drop the signed_*_overflows() helpers we currently
have in kernel/bpf/verifier.c; with a fix for overflow check in adjust_jmp_off()
in patch 1.
There should be no functional change in how the verifier works and the main
motivation is to make future refactoring[1] easier.
While check_mul_overflow() also exists and could potentially replace what
we have in scalar*_min_max_mul(), it does not help with refactoring and
would either change how the verifier works (e.g. lifting restriction on
umax<=U32_MAX and u32_max<=U16_MAX) or make the code slightly harder to
read, so it is left for future endeavour.
Changes from v2 <https://lore.kernel.org/r/20240701055907.82481-1-shung-hsi.yu@suse.com>
- add fix for 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and jump to
the 1st insn.") to correct the overflow check for general jump instructions
- adapt to changes in commit 5337ac4c9b80 ("bpf: Fix the corner case with
may_goto and jump to the 1st insn.")
- refactor in adjust_jmp_off() as well and remove signed_add16_overflow()
Changes from v1 <https://lore.kernel.org/r/20240623070324.12634-1-shung-hsi.yu@suse.com>:
- use pointers to values in dst_reg directly as the sum/diff pointer and
remove the else branch (Jiri)
- change local variables to be dst_reg pointers instead of src_reg values
- include comparison of generated assembly before & after the change
(Alexei)
bpf: use check_sub_overflow() to check for subtraction overflows
Similar to previous patch that drops signed_add*_overflows() and uses
(compiler) builtin-based check_add_overflow(), do the same for
signed_sub*_overflows() and replace them with the generic
check_sub_overflow() to make future refactoring easier and have the
checks implemented more efficiently.
Unsigned overflow check for subtraction does not use helpers and are
simple enough already, so they're left untouched.
After the change GCC 13.3.0 generates cleaner assembly on x86_64:
if (check_sub_overflow(*dst_smin, src_reg->smax_value, dst_smin) ||
139bf: mov 0x28(%r12),%rax
139c4: mov %edx,0x54(%r12)
139c9: sub %r11,%rax
139cc: mov %rax,0x28(%r12)
139d1: jo 14627 <adjust_reg_min_max_vals+0x1237>
check_sub_overflow(*dst_smax, src_reg->smin_value, dst_smax)) {
139d7: mov 0x30(%r12),%rax
139dc: sub %r9,%rax
139df: mov %rax,0x30(%r12)
if (check_sub_overflow(*dst_smin, src_reg->smax_value, dst_smin) ||
139e4: jo 14627 <adjust_reg_min_max_vals+0x1237>
...
*dst_smin = S64_MIN;
14627: movabs $0x8000000000000000,%rax
14631: mov %rax,0x28(%r12)
*dst_smax = S64_MAX;
14636: sub $0x1,%rax
1463a: mov %rax,0x30(%r12)
Before the change it gives:
if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
13a50: mov 0x28(%r12),%rdi
13a55: mov %edx,0x54(%r12)
dst_reg->smax_value = S64_MAX;
13a5a: movabs $0x7fffffffffffffff,%rdx
13a64: mov %eax,0x50(%r12)
dst_reg->smin_value = S64_MIN;
13a69: movabs $0x8000000000000000,%rax
s64 res = (s64)((u64)a - (u64)b);
13a73: mov %rdi,%rsi
13a76: sub %rcx,%rsi
if (b < 0)
13a79: test %rcx,%rcx
13a7c: js 145ea <adjust_reg_min_max_vals+0x119a>
if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
13a82: cmp %rsi,%rdi
13a85: jl 13ac7 <adjust_reg_min_max_vals+0x677>
signed_sub_overflows(dst_reg->smax_value, smin_val)) {
13a87: mov 0x30(%r12),%r8
s64 res = (s64)((u64)a - (u64)b);
13a8c: mov %r8,%rax
13a8f: sub %r9,%rax
return res > a;
13a92: cmp %rax,%r8
13a95: setl %sil
if (b < 0)
13a99: test %r9,%r9
13a9c: js 147d1 <adjust_reg_min_max_vals+0x1381>
dst_reg->smax_value = S64_MAX;
13aa2: movabs $0x7fffffffffffffff,%rdx
dst_reg->smin_value = S64_MIN;
13aac: movabs $0x8000000000000000,%rax
if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
13ab6: test %sil,%sil
13ab9: jne 13ac7 <adjust_reg_min_max_vals+0x677>
dst_reg->smin_value -= smax_val;
13abb: mov %rdi,%rax
dst_reg->smax_value -= smin_val;
13abe: mov %r8,%rdx
dst_reg->smin_value -= smax_val;
13ac1: sub %rcx,%rax
dst_reg->smax_value -= smin_val;
13ac4: sub %r9,%rdx
13ac7: mov %rax,0x28(%r12)
...
13ad1: mov %rdx,0x30(%r12)
...
if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
145ea: cmp %rsi,%rdi
145ed: jg 13ac7 <adjust_reg_min_max_vals+0x677>
145f3: jmp 13a87 <adjust_reg_min_max_vals+0x637>
bpf: use check_add_overflow() to check for addition overflows
signed_add*_overflows() was added back when there was no overflow-check
helper. With the introduction of such helpers in commit f0907827a8a91
("compiler.h: enable builtin overflow checkers and add fallback code"), we
can drop signed_add*_overflows() in kernel/bpf/verifier.c and use the
generic check_add_overflow() instead.
This will make future refactoring easier, and takes advantage of
compiler-emitted hardware instructions that efficiently implement these
checks.
After the change GCC 13.3.0 generates cleaner assembly on x86_64:
Note: unlike adjust_ptr_min_max_vals() and scalar*_min_max_add(), it is
necessary to introduce intermediate variable in adjust_jmp_off() to keep
the functional behavior unchanged. Without an intermediate variable
imm/off will be altered even on overflow.
adjust_jmp_off() incorrectly used the insn->imm field for all overflow check,
which is incorrect as that should only be done or the BPF_JMP32 | BPF_JA case,
not the general jump instruction case. Fix it by using insn->off for overflow
check in the general case.
Merge tag 'char-misc-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
"Here are some small remaining driver fixes for 6.10-final that have
all been in linux-next for a while and resolve reported issues.
Included in here are:
- mei driver fixes (and a spelling fix at the end just to be clean)
- iio driver fixes for reported problems
- fastrpc bugfixes
- nvmem small fixes"
* tag 'char-misc-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mei: vsc: Fix spelling error
mei: vsc: Enhance SPI transfer of IVSC ROM
mei: vsc: Utilize the appropriate byte order swap function
mei: vsc: Prevent timeout error with added delay post-firmware download
mei: vsc: Enhance IVSC chipset stability during warm reboot
nvmem: core: limit cell sysfs permissions to main attribute ones
nvmem: core: only change name to fram for current attribute
nvmem: meson-efuse: Fix return value of nvmem callbacks
nvmem: rmem: Fix return value of rmem_read()
misc: microchip: pci1xxxx: Fix return value of nvmem callbacks
hpet: Support 32-bit userspace
misc: fastrpc: Restrict untrusted app to attach to privileged PD
misc: fastrpc: Fix ownership reassignment of remote heap
misc: fastrpc: Fix memory leak in audio daemon attach operation
misc: fastrpc: Avoid updating PD type for capability request
misc: fastrpc: Copy the complete capability structure to user
misc: fastrpc: Fix DSP capabilities request
iio: light: apds9306: Fix error handing
iio: trigger: Fix condition for own trigger
Merge tag 'tty-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial fixes from Greg KH:
"Here are some small serial driver fixes for 6.10-final. Included in
here are:
- qcom-geni fixes for a much much much discussed issue and everyone
now seems to be agreed that this is the proper way forward to
resolve the reported lockups
- imx serial driver bugfixes
- 8250_omap errata fix
- ma35d1 serial driver bugfix
All of these have been in linux-next for over a week with no reported
issues"
* tag 'tty-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: qcom-geni: do not kill the machine on fifo underrun
serial: qcom-geni: fix hard lockup on buffer flush
serial: qcom-geni: fix soft lockup on sw flow control and suspend
serial: imx: ensure RTS signal is not left active after shutdown
tty: serial: ma35d1: Add a NULL check for of_node
serial: 8250_omap: Fix Errata i2310 with RX FIFO level check
serial: imx: only set receiver level if it is zero
Merge tag 'sound-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The majority of changes here are small device-specific fixes for ASoC
SOF / Intel and usual HD-audio quirks.
The only significant high LOC is found in the Cirrus firmware driver,
but all those are for hardening against malicious firmware blobs, and
they look fine for taking as a last minute fix, too"
* tag 'sound-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Enable Mute LED on HP 250 G7
firmware: cs_dsp: Use strnlen() on name fields in V1 wmfw files
ALSA: hda/realtek: Limit mic boost on VAIO PRO PX
ALSA: hda: cs35l41: Fix swapped l/r audio channels for Lenovo ThinBook 13x Gen4
ASoC: SOF: Intel: hda-pcm: Limit the maximum number of periods by MAX_BDL_ENTRIES
ASoC: rt711-sdw: add missing readable registers
ASoC: SOF: Intel: hda: fix null deref on system suspend entry
ALSA: hda/realtek: add quirk for Clevo V5[46]0TU
firmware: cs_dsp: Prevent buffer overrun when processing V2 alg headers
firmware: cs_dsp: Validate payload length before processing block
firmware: cs_dsp: Return error if block header overflows file
firmware: cs_dsp: Fix overflow checking of wmfw header
Merge tag 'bcachefs-2024-07-12' of https://evilpiepirate.org/git/bcachefs
Pull more bcachefs fixes from Kent Overstreet:
- revert the SLAB_ACCOUNT patch, something crazy is going on in memcg
and someone forgot to test
- minor fixes: missing rcu_read_lock(), scheduling while atomic (in an
emergency shutdown path)
- two lockdep fixes; these could have gone earlier, but were left to
bake awhile
* tag 'bcachefs-2024-07-12' of https://evilpiepirate.org/git/bcachefs:
bcachefs: bch2_gc_btree() should not use btree_root_lock
bcachefs: Set PF_MEMALLOC_NOFS when trans->locked
bcachefs; Use trans_unlock_long() when waiting on allocator
Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT"
bcachefs: fix scheduling while atomic in break_cycle()
bcachefs: Fix RCU splat
Alan Maguire [Fri, 12 Jul 2024 09:28:59 +0000 (10:28 +0100)]
bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o
As reported by Mirsad [1] we still see format warnings in kernel/bpf/btf.o
at W=1 warning level:
CC kernel/bpf/btf.o
./kernel/bpf/btf.c: In function ‘btf_type_seq_show_flags’:
./kernel/bpf/btf.c:7553:21: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format]
7553 | sseq.showfn = btf_seq_show;
| ^
./kernel/bpf/btf.c: In function ‘btf_type_snprintf_show’:
./kernel/bpf/btf.c:7604:31: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format]
7604 | ssnprintf.show.showfn = btf_snprintf_show;
| ^
Combined with CONFIG_WERROR=y these can halt the build.
The fix (annotating the structure field with __printf())
suggested by Mirsad resolves these. Apologies I missed this last time.
No other W=1 warnings were observed in kernel/bpf after this fix.
Satheesh Paul [Wed, 10 Jul 2024 07:51:27 +0000 (13:21 +0530)]
octeontx2-af: fix issue with IPv4 match for RSS
While performing RSS based on IPv4, packets with
IPv4 options are not being considered. Adding changes
to match both plain IPv4 and IPv4 with option header.
Fixes: 41a7aa7b800d ("octeontx2-af: NIX Rx flowkey configuration for RSS") Signed-off-by: Satheesh Paul <psatheesh@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
octeontx2-af: fix issue with IPv6 ext match for RSS
While performing RSS based on IPv6, extension ltype
is not being considered. This will be problem for
fragmented packets or packets with extension header.
Adding changes to match IPv6 ext header along with IPv6
ltype.
Fixes: 41a7aa7b800d ("octeontx2-af: NIX Rx flowkey configuration for RSS") Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Mazur [Wed, 10 Jul 2024 07:51:25 +0000 (13:21 +0530)]
octeontx2-af: fix detection of IP layer
Checksum and length checks are not enabled for IPv4 header with
options and IPv6 with extension headers.
To fix this a change in enum npc_kpu_lc_ltype is required which will
allow adjustment of LTYPE_MASK to detect all types of IP headers.
Fixes: 21e6699e5cd6 ("octeontx2-af: Add NPC KPU profile") Signed-off-by: Michal Mazur <mmazur2@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
octeontx2-af: fix a issue with cpt_lf_alloc mailbox
This patch fixes CPT_LF_ALLOC mailbox error due to
incompatible mailbox message format. Specifically, it
corrects the `blkaddr` field type from `int` to `u8`.
Fixes: de2854c87c64 ("octeontx2-af: Mailbox changes for 98xx CPT block") Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
octeontx2-af: replace cpt slot with lf id on reg write
Replace slot id with global CPT lf id on reg read/write as
CPTPF/VF driver would send slot number instead of global
lf id in the reg offset. And also update the mailbox response
with the global lf's register offset.
Fixes: ae454086e3c2 ("octeontx2-af: add mailbox interface for CPT") Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Kerr [Wed, 10 Jul 2024 02:17:22 +0000 (10:17 +0800)]
net: mctp-i2c: invalidate flows immediately on TX errors
If we encounter an error on i2c packet transmit, we won't have a valid
flow anymore; since we didn't transmit a valid packet sequence, we'll
have to wait for the key to timeout instead of dropping it on the reply.
This causes the i2c lock to be held for longer than necessary.
Instead, invalidate the flow on TX error, and release the i2c lock
immediately.
Cc: Bonnie Lo <Bonnie_Lo@wiwynn.com> Tested-by: Jerry C Chen <Jerry_C_Chen@wiwynn.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
MAINTAINERS: Update FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARY
FREESCALE SOC DRIVERS has been orphaned since
commit eaac25d026a1 ("MAINTAINERS: Drop Li Yang as their email address
stopped working")
QUICC ENGINE LIBRARY has Qiang Zhao as maintainer but he hasn't
responded for years and when Li Yang was still maintaining FREESCALE
SOC DRIVERS he was also handling QUICC ENGINE LIBRARY directly.
As a maintainer of LINUX FOR POWERPC EMBEDDED PPC8XX AND PPC83XX, I
also need FREESCALE SOC DRIVERS to be actively maintained, so add
myself as maintainer of FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARY.
Tony Lindgren [Tue, 9 Jul 2024 13:59:29 +0000 (16:59 +0300)]
MAINTAINERS: Add more maintainers for omaps
There are many generations of omaps to maintain, and I will be only active
as a hobbyist with time permitting. Let's add more maintainers to ensure
continued Linux support.
TI is interested in maintaining the active SoCs such as am3, am4 and
dra7. And the hobbyists are interested in maintaining some of the older
devices, mainly based on omap3 and 4 SoCs.
Kevin and Roger have agreed to maintain the active TI parts. Both Kevin
and Roger have been working on the omap variants for a long time, and
have a good understanding of the hardware.
Aaro and Andreas have agreed to maintain the community devices. Both Aaro
and Andreas have long experience on working with the earlier TI SoCs.
While at it, let's also change me to be a reviewer for the omap1, and
drop the link to my old omap web page.
Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
James Chapman [Tue, 9 Jul 2024 16:28:39 +0000 (17:28 +0100)]
l2tp: fix l2tp_session_register with colliding l2tpv3 IDs
When handling colliding L2TPv3 session IDs, we use the existing
session IDR entry and link the new session on that using
session->coll_list. However, when using an existing IDR entry, we must
not do the idr_replace step.
Fixes: aa5e17e1f5ec ("l2tp: store l2tpv3 sessions in per-net IDR") Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shigeru Yoshida [Tue, 9 Jul 2024 14:36:32 +0000 (23:36 +0900)]
tipc: Consolidate redundant functions
link_is_up() and tipc_link_is_up() have the same functionality.
Consolidate these functions.
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@endava.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shigeru Yoshida [Tue, 9 Jul 2024 14:34:10 +0000 (23:34 +0900)]
tipc: Remove unused struct declaration
struct tipc_name_table in core.h is not used. Remove this declaration.
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@endava.com> Signed-off-by: David S. Miller <davem@davemloft.net>
These changes aim to enhance the reliability of netconsole by
eliminating the potential race condition and improve maintainability
by making the code more straightforward to understand and modify.
====================
Adrian Moreno [Wed, 10 Jul 2024 09:04:59 +0000 (11:04 +0200)]
selftests: openvswitch: retry instead of sleep
There are a couple of places where the test script "sleep"s to wait for
some external condition to be met.
This is error prone, specially in slow systems (identified in CI by
"KSFT_MACHINE_SLOW=yes").
To fix this, add a "ovs_wait" function that tries to execute a command
a few times until it succeeds. The timeout used is set to 5s for
"normal" systems and doubled if a slow CI machine is detected.
Alexander Lobakin [Wed, 10 Jul 2024 11:30:28 +0000 (04:30 -0700)]
netdevice: define and allocate &net_device _properly_
In fact, this structure contains a flexible array at the end, but
historically its size, alignment etc., is calculated manually.
There are several instances of the structure embedded into other
structures, but also there's ongoing effort to remove them and we
could in the meantime declare &net_device properly.
Declare the array explicitly, use struct_size() and store the array
size inside the structure, so that __counted_by() can be applied.
Don't use PTR_ALIGN(), as SLUB itself tries its best to ensure the
allocated buffer is aligned to what the user expects.
Also, change its alignment from %NETDEV_ALIGN to the cacheline size
as per several suggestions on the netdev ML.
bna: adjust 'name' buf size of bna_tcb and bna_ccb structures
To have enough space to write all possible sprintf() args. Currently
'name' size is 16, but the first '%s' specifier may already need at
least 16 characters, since 'bnad->netdev->name' is used there.
For '%d' specifiers, assume that they require:
* 1 char for 'tx_id + tx_info->tcb[i]->id' sum, BNAD_MAX_TXQ_PER_TX is 8
* 2 chars for 'rx_id + rx_info->rx_ctrl[i].ccb->id', BNAD_MAX_RXP_PER_RX
is 16
And replace sprintf with snprintf.
Detected using the static analysis tool - Svace.
Fixes: 8b230ed8ec96 ("bna: Brocade 10Gb Ethernet device driver") Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Aleksandr Loktionov [Wed, 10 Jul 2024 22:44:54 +0000 (15:44 -0700)]
i40e: fix: remove needless retries of NVM update
Remove wrong EIO to EGAIN conversion and pass all errors as is.
After commit 230f3d53a547 ("i40e: remove i40e_status"), which should only
replace F/W specific error codes with Linux kernel generic, all EIO errors
suddenly started to be converted into EAGAIN which leads nvmupdate to retry
until it timeouts and sometimes fails after more than 20 minutes in the
middle of NVM update, so NVM becomes corrupted.
The bug affects users only at the time when they try to update NVM, and
only F/W versions that generate errors while nvmupdate. For example, X710DA2
with 0x8000ECB7 F/W is affected, but there are probably more...
Command for reproduction is just NVM update:
./nvmupdate64
The problematic code did silently convert EIO into EAGAIN which forced
nvmupdate to ignore EAGAIN error and retry the same operation until timeout.
That's why NVM update takes 20+ minutes to finish with the fail in the end.
Fixes: 230f3d53a547 ("i40e: remove i40e_status") Co-developed-by: Kelvin Kang <kelvin.kang@intel.com> Signed-off-by: Kelvin Kang <kelvin.kang@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240710224455.188502-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 12 Jul 2024 00:22:04 +0000 (17:22 -0700)]
Merge tag 'wireless-next-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.11
Most likely the last "new features" pull request for v6.11 with
changes both in stack and in drivers. The big thing is the multiple
radios for wiphy feature which makes it possible to better advertise
radio capabilities to user space. mt76 enabled MLO and iwlwifi
re-enabled MLO, ath12k and rtw89 Wi-Fi 6 devices got WoWLAN support.
Major changes:
cfg80211/mac80211
* remove DEAUTH_NEED_MGD_TX_PREP flag
* multiple radios per wiphy support
mac80211_hwsim
* multi-radio wiphy support
ath12k
* DebugFS support for datapath statistics
* WCN7850: support for WoW (Wake on WLAN)
* WCN7850: device-tree bindings
rtw89
* preparation for RTL8852BE-VT support
* WoWLAN support for WiFi 6 chips
* 36-bit PCI DMA support
mt76
* mt7925 Multi-Link Operation (MLO) support
* tag 'wireless-next-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (204 commits)
wifi: mac80211: fix AP chandef capturing in CSA
wifi: iwlwifi: correctly reference TSO page information
wifi: mt76: mt792x: fix scheduler interference in drv own process
wifi: mt76: mt7925: enabling MLO when the firmware supports it
wifi: mt76: mt7925: remove the unused mt7925_mcu_set_chan_info
wifi: mt76: mt7925: update mt7925_mac_link_bss_add for MLO
wifi: mt76: mt7925: update mt7925_mcu_bss_basic_tlv for MLO
wifi: mt76: mt7925: update mt7925_mcu_set_timing for MLO
wifi: mt76: mt7925: update mt7925_mcu_sta_phy_tlv for MLO
wifi: mt76: mt7925: update mt7925_mcu_sta_rate_ctrl_tlv for MLO
wifi: mt76: mt7925: add mt7925_mcu_sta_eht_mld_tlv for MLO
wifi: mt76: mt7925: update mt7925_mcu_sta_update for MLO
wifi: mt76: mt7925: update mt7925_mcu_add_bss_info for MLO
wifi: mt76: mt7925: update mt7925_mcu_bss_mld_tlv for MLO
wifi: mt76: mt7925: update mt7925_mcu_sta_mld_tlv for MLO
wifi: mt76: mt7925: add mt7925_[assign,unassign]_vif_chanctx
wifi: mt76: add def_wcid to struct mt76_wcid
wifi: mt76: mt7925: report link information in rx status
wifi: mt76: mt7925: update rate index according to link id
wifi: mt76: mt7925: add link handling in the mt7925_ipv6_addr_change
...
====================
When user submits a rxfh set command without touching XFRM_SYM_XOR,
rxfh.input_xfrm is set to RXH_XFRM_NO_CHANGE, which is equal to 0xff.
Testing if (rxfh.input_xfrm & RXH_XFRM_SYM_XOR &&
!ops->cap_rss_sym_xor_supported)
return -EOPNOTSUPP;
Will always be true on devices that don't set cap_rss_sym_xor_supported,
since rxfh.input_xfrm & RXH_XFRM_SYM_XOR is always true, if input_xfrm
was not set, i.e RXH_XFRM_NO_CHANGE=0xff, which will result in failure
of any command that doesn't require any change of XFRM, e.g RSS context
or hash function changes.
To avoid this breakage, test if rxfh.input_xfrm != RXH_XFRM_NO_CHANGE
before testing other conditions. Note that the problem will only trigger
with XFRM-aware userspace, old ethtool CLI would continue to work.
Fixes: 0dd415d15505 ("net: ethtool: add a NO_CHANGE uAPI for new RXFH's input_xfrm") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://patch.msgid.link/20240710225538.43368-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 10 Jul 2024 15:16:53 +0000 (15:16 +0000)]
net: reduce rtnetlink_rcv_msg() stack usage
IFLA_MAX is increasing slowly but surely.
Some compilers use more than 512 bytes of stack in rtnetlink_rcv_msg()
because it calls rtnl_calcit() for RTM_GETLINK message.
Use noinline_for_stack attribute to not inline rtnl_calcit(),
and directly use nla_for_each_attr_type() (Jakub suggestion)
because we only care about IFLA_EXT_MASK at this stage.
Kent Overstreet [Fri, 5 Jul 2024 01:02:16 +0000 (21:02 -0400)]
bcachefs: bch2_gc_btree() should not use btree_root_lock
btree_root_lock is for the root keys in btree_root, not the pointers to
the nodes themselves; this fixes a lock ordering issue between
btree_root_lock and btree node locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Merge tag 'for-6.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mikulas Patocka:
- Fix broken discard for device mapper VDO target
* tag 'for-6.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm vdo: replace max_discard_sectors with max_hw_discard_sectors
====================
ethtool: use the rss context XArray in ring deactivation safety-check
Now that we have an XArray storing information about all extra
RSS contexts - use it to extend checks already performed using
ethtool_get_max_rxfh_channel().
====================
Jakub Kicinski [Wed, 10 Jul 2024 17:40:43 +0000 (10:40 -0700)]
ethtool: use the rss context XArray in ring deactivation safety-check
ethtool_get_max_rxfh_channel() gets called when user requests
deactivating Rx channels. Check the additional RSS contexts, too.
While we do track whether RSS context has an indirection
table explicitly set by the user, no driver looks at that bit.
Assume drivers won't auto-regenerate the additional tables,
to be safe.
Jakub Kicinski [Wed, 10 Jul 2024 17:40:42 +0000 (10:40 -0700)]
ethtool: fail closed if we can't get max channel used in indirection tables
Commit 0d1b7d6c9274 ("bnxt: fix crashes when reducing ring count with
active RSS contexts") proves that allowing indirection table to contain
channels with out of bounds IDs may lead to crashes. Currently the
max channel check in the core gets skipped if driver can't fetch
the indirection table or when we can't allocate memory.
Both of those conditions should be extremely rare but if they do
happen we should try to be safe and fail the channel change.
Alan Maguire [Thu, 11 Jul 2024 18:23:21 +0000 (19:23 +0100)]
bpf: annotate BTF show functions with __printf
-Werror=suggest-attribute=format warns about two functions
in kernel/bpf/btf.c [1]; add __printf() annotations to silence
these warnings since for CONFIG_WERROR=y they will trigger
build failures.
net/sched/act_ct.c 26488172b029 ("net/sched: Fix UAF when resolving a clash") 3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine")
Bruce Johnston [Tue, 9 Jul 2024 17:00:36 +0000 (13:00 -0400)]
dm vdo: replace max_discard_sectors with max_hw_discard_sectors
Commit 4f563a64732d ("block: add a max_user_discard_sectors queue
limit") changed block core to set max_discard_sectors to:
min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors)
Commit 825d8bbd2f32 ("dm: always manage discard support in terms
of max_hw_discard_sectors") fixed most dm targetss to deal with
this, by replacing max_discard_sectors with max_hw_discard_sectors.
Unfortunately, dm-vdo did not get fixed at that time.
Fixes: 825d8bbd2f32 ("dm: always manage discard support in terms of max_hw_discard_sectors") Signed-off-by: Bruce Johnston <bjohnsto@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Merge tag 'spi-fix-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"This fixes two regressions that have been bubbling along for a large
part of this release.
One is a revert of the multi mode support for the OMAP SPI controller,
this introduced regressions on a number of systems and while there has
been progress on fixing those we've not got something that works for
everyone yet so let's just drop the change for now.
The other is a series of fixes from David Lechner for his recent
message optimisation work, this interacted badly with spi-mux which
is altogether too clever with recursive use of the bus and creates
situations that hadn't been considered.
There are also a couple of small driver specific fixes, including one
more patch from David for sleep duration calculations in the AXI
driver"
* tag 'spi-fix-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: mux: set ctlr->bits_per_word_mask
spi: add defer_optimize_message controller flag
spi: don't unoptimize message in spi_async()
spi: omap2-mcspi: Revert multi mode support
spi: davinci: Unset POWERDOWN bit when releasing resources
spi: axi-spi-engine: fix sleep calculation
spi: imx: Don't expect DMA for i.MX{25,35,50,51,53} cspi devices
- eth: bnxt: fix crashes when reducing ring count with active RSS
contexts
Previous releases - regressions:
- sched: fix UAF when resolving a clash
- skmsg: skip zero length skb in sk_msg_recvmsg2
- sunrpc: fix kernel free on connection failure in
xs_tcp_setup_socket
- tcp: avoid too many retransmit packets
- tcp: fix incorrect undo caused by DSACK of TLP retransmit
- udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
- eth: ks8851: fix deadlock with the SPI chip variant
- eth: i40e: fix XDP program unloading while removing the driver
Previous releases - always broken:
- bpf:
- fix too early release of tcx_entry
- fail bpf_timer_cancel when callback is being cancelled
- bpf: fix order of args in call to bpf_map_kvcalloc
- netfilter: nf_tables: prefer nft_chain_validate
- ppp: reject claimed-as-LCP but actually malformed packets
* tag 'net-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket
net/sched: Fix UAF when resolving a clash
net: ks8851: Fix potential TX stall after interface reopen
udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
netfilter: nf_tables: prefer nft_chain_validate
netfilter: nfnetlink_queue: drop bogus WARN_ON
ethtool: netlink: do not return SQI value if link is down
ppp: reject claimed-as-LCP but actually malformed packets
selftests/bpf: Add timer lockup selftest
net: ethernet: mtk-star-emac: set mac_managed_pm when probing
e1000e: fix force smbus during suspend flow
tcp: avoid too many retransmit packets
bpf: Defer work in bpf_timer_cancel_and_free
bpf: Fail bpf_timer_cancel when callback is being cancelled
bpf: fix order of args in call to bpf_map_kvcalloc
net: ethernet: lantiq_etop: fix double free in detach
i40e: Fix XDP program unloading while removing the driver
net: fix rc7's __skb_datagram_iter()
net: ks8851: Fix deadlock with the SPI chip variant
octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability()
...
Merge tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"cachefiles:
- Export an existing and add a new cachefile helper to be used in
filesystems to fix reference count bugs
- Use the newly added fscache_ty_get_volume() helper to get a
reference count on an fscache_volume to handle volumes that are
about to be removed cleanly
- After withdrawing a fscache_cache via FSCACHE_CACHE_IS_WITHDRAWN
wait for all ongoing cookie lookups to complete and for the object
count to reach zero
- Propagate errors from vfs_getxattr() to avoid an infinite loop in
cachefiles_check_volume_xattr() because it keeps seeing ESTALE
- Don't send new requests when an object is dropped by raising
CACHEFILES_ONDEMAND_OJBSTATE_DROPPING
- Cancel all requests for an object that is about to be dropped
- Wait for the ondemand_boject_worker to finish before dropping a
cachefiles object to prevent use-after-free
- Use cyclic allocation for message ids to better handle id recycling
- Add missing lock protection when iterating through the xarray when
polling
netfs:
- Use standard logging helpers for debug logging
VFS:
- Fix potential use-after-free in file locks during
trace_posix_lock_inode(). The tracepoint could fire while another
task raced it and freed the lock that was requested to be traced
- Only increment the nr_dentry_negative counter for dentries that are
present on the superblock LRU. Currently, DCACHE_LRU_LIST list is
used to detect this case. However, the flag is also raised in
combination with DCACHE_SHRINK_LIST to indicate that dentry->d_lru
is used. So checking only DCACHE_LRU_LIST will lead to wrong
nr_dentry_negative count. Fix the check to not count dentries that
are on a shrink related list
Misc:
- hfsplus: fix an uninitialized value issue in copy_name
- minix: fix minixfs_rename with HIGHMEM. It still uses kunmap() even
though we switched it to kmap_local_page() a while ago"
* tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
minixfs: Fix minixfs_rename with HIGHMEM
hfsplus: fix uninit-value in copy_name
vfs: don't mod negative dentry count when on shrinker list
filelock: fix potential use-after-free in posix_lock_inode
cachefiles: add missing lock protection when polling
cachefiles: cyclic allocation of msg_id to avoid reuse
cachefiles: wait for ondemand_object_worker to finish when dropping object
cachefiles: cancel all requests for the object that is being dropped
cachefiles: stop sending new request when dropping object
cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop
cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie()
cachefiles: fix slab-use-after-free in fscache_withdraw_volume()
netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume()
netfs: Switch debug logging to pr_debug()
bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG
When BPF_TRAMP_F_CALL_ORIG is set, the trampoline calls
__bpf_tramp_enter() and __bpf_tramp_exit() functions, passing them
the struct bpf_tramp_image *im pointer as an argument in R0.
The trampoline generation code uses emit_addr_mov_i64() to emit
instructions for moving the bpf_tramp_image address into R0, but
emit_addr_mov_i64() assumes the address to be in the vmalloc() space
and uses only 48 bits. Because bpf_tramp_image is allocated using
kzalloc(), its address can use more than 48-bits, in this case the
trampoline will pass an invalid address to __bpf_tramp_enter/exit()
causing a kernel crash.
Fix this by using emit_a64_mov_i64() in place of emit_addr_mov_i64()
as it can work with addresses that are greater than 48-bits.
Merge tag 'asoc-fix-v6.10-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.10
A few fairly small fixes for ASoC, there's a relatively large set of
hardening changes for the cs_dsp firmware file parsing and a couple of
other small device specific fixes.
Filipe Manana [Mon, 8 Jul 2024 14:42:45 +0000 (15:42 +0100)]
btrfs: avoid races when tracking progress for extent map shrinking
We store the progress (root and inode numbers) of the extent map shrinker
in fs_info without any synchronization but we can have multiple tasks
calling into the shrinker during memory allocations when there's enough
memory pressure for example.
This can result in a task A reading fs_info->extent_map_shrinker_last_ino
after another task B updates it, and task A reading
fs_info->extent_map_shrinker_last_root before task B updates it, making
task A see an odd state that isn't necessarily harmful but may make it
skip certain inode ranges or do more work than necessary by going over
the same inodes again. These unprotected accesses would also trigger
warnings from tools like KCSAN.
So add a lock to protect access to these progress fields.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 8 Jul 2024 14:42:44 +0000 (15:42 +0100)]
btrfs: stop extent map shrinker if reschedule is needed
The extent map shrinker can be called in a variety of contexts where we
are under memory pressure, and of them is when a task is trying to
allocate memory. For this reason the shrinker is typically called with a
value of struct shrink_control::nr_to_scan that is much smaller than what
we return in the nr_cached_objects callback of struct super_operations
(fs/btrfs/super.c:btrfs_nr_cached_objects()), so that the shrinker does
not take a long time and cause high latencies. However we can still take
a lot of time in the shrinker even for a limited amount of nr_to_scan:
1) When traversing the red black tree that tracks open inodes in a root,
as for example with millions of open inodes we get a deep tree which
takes time searching for an inode;
2) Iterating over the extent map tree, which is a red black tree, of an
inode when doing the rb_next() calls and when removing an extent map
from the tree, since often that requires rebalancing the red black
tree;
3) When trying to write lock an inode's extent map tree we may wait for a
significant amount of time, because there's either another task about
to do IO and searching for an extent map in the tree or inserting an
extent map in the tree, and we can have thousands or even millions of
extent maps for an inode. Furthermore, there can be concurrent calls
to the shrinker so the lock might be busy simply because there is
already another task shrinking extent maps for the same inode;
4) We often reschedule if we need to, which further increases latency.
So improve on this by stopping the extent map shrinking code whenever we
need to reschedule and make it skip an inode if we can't immediately lock
its extent map tree.
Filipe Manana [Wed, 3 Jul 2024 10:07:26 +0000 (11:07 +0100)]
btrfs: use delayed iput during extent map shrinking
When putting an inode during extent map shrinking we're doing a standard
iput() but that may take a long time in case the inode is dirty and we are
doing the final iput that triggers eviction - the VFS will have to wait
for writeback before calling the btrfs evict callback (see
fs/inode.c:evict()).
This slows down the task running the shrinker which may have been
triggered while updating some tree for example, meaning locks are held
as well as an open transaction handle.
Also if the iput() ends up triggering eviction and the inode has no links
anymore, then we trigger item truncation which requires flushing delayed
items, space reservation to start a transaction and that may trigger the
space reclaim task and wait for it, resulting in deadlocks in case the
reclaim task needs for example to commit a transaction and the shrinker
is being triggered from a path holding a transaction handle.
Syzbot reported such a case with the following stack traces:
======================================================
WARNING: possible circular locking dependency detected 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Not tainted
------------------------------------------------------
kswapd0/111 is trying to acquire lock: ffff88801eae4610 (sb_internal#3){.+.+}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275
but task is already holding lock: ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
So fix this by using btrfs_add_delayed_iput() so that the final iput is
delegated to the cleaner kthread.
Link: https://lore.kernel.org/linux-btrfs/000000000000892280061a344581@google.com/ Reported-by: syzbot+3dad89b3993a4b275e72@syzkaller.appspotmail.com Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Currently, when built with "make W=1", the following warnings are
generated:
net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'work' not described in 'crush_choose_firstn'
net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'weight' not described in 'crush_choose_firstn'
net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'weight_max' not described in 'crush_choose_firstn'
net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'choose_args' not described in 'crush_choose_firstn'
Update the crush_choose_firstn() kernel-doc to document these
parameters.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Currently, when built with "make W=1", the following warnings are
generated:
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'map' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'work' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'bucket' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'weight' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'weight_max' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'x' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'left' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'numrep' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'type' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'out' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'outpos' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'tries' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'recurse_tries' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'recurse_to_leaf' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'out2' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'parent_r' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'choose_args' not described in 'crush_choose_indep'
These warnings are generated because the prologue comment for
crush_choose_indep() uses the kernel-doc prefix, but the actual
comment is a very brief description that is not in kernel-doc
format. Since this is a static function there is no need to fully
document the function, so replace the kernel-doc comment prefix with a
standard comment prefix to remove these warnings.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Paolo Abeni [Thu, 11 Jul 2024 10:57:10 +0000 (12:57 +0200)]
Merge tag 'nf-24-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
Patch #1 fixes a bogus WARN_ON splat in nfnetlink_queue.
Patch #2 fixes a crash due to stack overflow in chain loop detection
by using the existing chain validation routines
Both patches from Florian Westphal.
netfilter pull request 24-07-11
* tag 'nf-24-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: prefer nft_chain_validate
netfilter: nfnetlink_queue: drop bogus WARN_ON
====================
Paolo Abeni [Thu, 11 Jul 2024 10:38:33 +0000 (12:38 +0200)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-07-11
The following pull-request contains BPF updates for your *net* tree.
We've added 4 non-merge commits during the last 2 day(s) which contain
a total of 4 files changed, 262 insertions(+), 19 deletions(-).
The main changes are:
1) Fixes for a BPF timer lockup and a use-after-free scenario when timers
are used concurrently, from Kumar Kartikeya Dwivedi.
2) Fix the argument order in the call to bpf_map_kvcalloc() which could
otherwise lead to a compilation error, from Mohammad Shehar Yaar Tausif.
bpf-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add timer lockup selftest
bpf: Defer work in bpf_timer_cancel_and_free
bpf: Fail bpf_timer_cancel when callback is being cancelled
bpf: fix order of args in call to bpf_map_kvcalloc
====================
Daniel Borkmann [Thu, 4 Jul 2024 06:41:57 +0000 (08:41 +0200)]
net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket
When using a BPF program on kernel_connect(), the call can return -EPERM. This
causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing
the kernel to potentially freeze up.
Neil suggested:
This will propagate -EPERM up into other layers which might not be ready
to handle it. It might be safer to map EPERM to an error we would be more
likely to expect from the network system - such as ECONNREFUSED or ENETDOWN.
ECONNREFUSED as error seems reasonable. For programs setting a different error
can be out of reach (see handling in 4fbac77d2d09) in particular on kernels
which do not have f10d05966196 ("bpf: Make BPF_PROG_RUN_ARRAY return -err
instead of allow boolean"), thus given that it is better to simply remap for
consistent behavior. UDP does handle EPERM in xs_udp_send_request().
The ct may be dropped if a clash has been resolved but is still passed to
the tcf_ct_flow_table_process_conn function for further usage. This issue
can be fixed by retrieving ct from skb again after confirming conntrack.
Fixes: 0cc254e5aa37 ("net/sched: act_ct: Offload connections with commit action") Co-developed-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Chengen Du <chengen.du@canonical.com> Link: https://patch.msgid.link/20240710053747.13223-1-chengen.du@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ronald Wahl [Tue, 9 Jul 2024 19:58:45 +0000 (21:58 +0200)]
net: ks8851: Fix potential TX stall after interface reopen
The amount of TX space in the hardware buffer is tracked in the tx_space
variable. The initial value is currently only set during driver probing.
After closing the interface and reopening it the tx_space variable has
the last value it had before close. If it is smaller than the size of
the first send packet after reopeing the interface the queue will be
stopped. The queue is woken up after receiving a TX interrupt but this
will never happen since we did not send anything.
This commit moves the initialization of the tx_space variable to the
ks8851_net_open function right before starting the TX queue. Also query
the value from the hardware instead of using a hard coded value.
Only the SPI chip variant is affected by this issue because only this
driver variant actually depends on the tx_space variable in the xmit
function.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun") Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240709195845.9089-1-rwahl@gmx.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
syzkaller triggered the warning [0] in udp_v4_early_demux().
In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount
of the looked-up sk and use sock_pfree() as skb->destructor, so we check
SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace
period.
Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
into the hash table. Moreover, the SOCK_RCU_FREE check is done too early
in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race
window:
nft_chain_validate already performs loop detection because a cycle will
result in a call stack overflow (ctx->level >= NFT_JUMP_STACK_SIZE).
It also follows maps via ->validate callback in nft_lookup, so there
appears no reason to iterate the maps again.
nf_tables_check_loops() and all its helper functions can be removed.
This improves ruleset load time significantly, from 23s down to 12s.
This also fixes a crash bug. Old loop detection code can result in
unbounded recursion:
BUG: TASK stack guard page was hit at ....
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN
CPU: 4 PID: 1539 Comm: nft Not tainted 6.10.0-rc5+ #1
[..]
with a suitable ruleset during validation of register stores.
I can't see any actual reason to attempt to check for this from
nft_validate_register_store(), at this point the transaction is still in
progress, so we don't have a full picture of the rule graph.
For nf-next it might make sense to either remove it or make this depend
on table->validate_state in case we could catch an error earlier
(for improved error reporting to userspace).
Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>