]> www.infradead.org Git - nvme.git/log
nvme.git
11 months agoMerge branch 'simplify-tx-napi-logic-in-airoha_eth-driver'
Jakub Kicinski [Sun, 3 Nov 2024 19:36:18 +0000 (11:36 -0800)]
Merge branch 'simplify-tx-napi-logic-in-airoha_eth-driver'

Lorenzo Bianconi says:

====================
Simplify Tx napi logic in airoha_eth driver

Simplify Tx napi logic relying on the packet index provided by
completion queue indicating the completed packet that can be removed
from the Tx DMA ring.
Read completion queue head and pending entry in airoha_qdma_tx_napi_poll().
====================

Link: https://patch.msgid.link/20241029-airoha-en7581-tx-napi-work-v1-0-96ad1686b946@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: airoha: Simplify Tx napi logic
Lorenzo Bianconi [Tue, 29 Oct 2024 12:17:10 +0000 (13:17 +0100)]
net: airoha: Simplify Tx napi logic

Simplify Tx napi logic relying just on the packet index provided by
completion queue indicating the completed packet that can be removed
from the Tx DMA ring.
This is a preliminary patch to add Qdisc offload for airoha_eth driver.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20241029-airoha-en7581-tx-napi-work-v1-2-96ad1686b946@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: airoha: Read completion queue data in airoha_qdma_tx_napi_poll()
Lorenzo Bianconi [Tue, 29 Oct 2024 12:17:09 +0000 (13:17 +0100)]
net: airoha: Read completion queue data in airoha_qdma_tx_napi_poll()

In order to avoid any possible race, read completion queue head and
pending entry in airoha_qdma_tx_napi_poll routine instead of doing it in
airoha_irq_handler. Remove unused airoha_tx_irq_queue unused fields.
This is a preliminary patch to add Qdisc offload for airoha_eth driver.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20241029-airoha-en7581-tx-napi-work-v1-1-96ad1686b946@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: bnxt: use ethtool string helpers
Rosen Penev [Tue, 29 Oct 2024 23:32:29 +0000 (16:32 -0700)]
net: bnxt: use ethtool string helpers

Avoids having to use manual pointer manipulation.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241029233229.9385-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phy: use ethtool string helpers
Rosen Penev [Tue, 29 Oct 2024 23:46:41 +0000 (16:46 -0700)]
net: phy: use ethtool string helpers

These are the preferred way to copy ethtool strings.

Avoids incrementing pointers all over the place.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241029234641.11448-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'uapi-net-ethtool-avoid-thousands-of-wflex-array-member-not-at-end-warnings'
Jakub Kicinski [Sun, 3 Nov 2024 19:07:01 +0000 (11:07 -0800)]
Merge branch 'uapi-net-ethtool-avoid-thousands-of-wflex-array-member-not-at-end-warnings'

Gustavo A. R. Silva says:

====================
UAPI: net/ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings

Small patch series aimed at fixing thousands of -Wflex-array-member-not-at-end
warnings by creating a new tagged struct within a flexible structure. We then
use this new struct type to fix problematic middle-flex-array declarations in
multiple composite structs, as well as to update the type of some variables in
various functions.

v1: https://lore.kernel.org/cover.1729536776.git.gustavoars@kernel.org
====================

Link: https://patch.msgid.link/cover.1730238285.git.gustavoars@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings
Gustavo A. R. Silva [Tue, 29 Oct 2024 21:58:47 +0000 (15:58 -0600)]
net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings

-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Change the type of the middle struct member currently causing trouble from
`struct ethtool_link_settings` to `struct ethtool_link_settings_hdr`.

Additionally, update the type of some variables in various functions that
don't access the flexible-array member, changing them to the newly created
`struct ethtool_link_settings_hdr`. These changes are needed because the
type of the conflicting middle members changed. So, those instances that
expect the type to be `struct ethtool_link_settings` should be adjusted to
the newly created type `struct ethtool_link_settings_hdr`.

Also, adjust variable declarations to follow the reverse xmas tree
convention.

Fix 3338 of the following -Wflex-array-member-not-at-end warnings:

include/linux/ethtool.h:214:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoUAPI: ethtool: Use __struct_group() in struct ethtool_link_settings
Gustavo A. R. Silva [Tue, 29 Oct 2024 21:55:35 +0000 (15:55 -0600)]
UAPI: ethtool: Use __struct_group() in struct ethtool_link_settings

Use the `__struct_group()` helper to create a new tagged
`struct ethtool_link_settings_hdr`. This structure groups together
all the members of the flexible `struct ethtool_link_settings`
except the flexible array. As a result, the array is effectively
separated from the rest of the members without modifying the memory
layout of the flexible structure.

This new tagged struct will be used to fix problematic declarations
of middle-flex-arrays in composite structs[1].

[1] https://git.kernel.org/linus/d88cabfd9abc

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/9e9fb0bd72e5ba1e916acbb4995b1e358b86a689.1730238285.git.gustavoars@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: dsa: use ethtool string helpers
Rosen Penev [Mon, 28 Oct 2024 04:48:28 +0000 (21:48 -0700)]
net: dsa: use ethtool string helpers

These are the preferred way to copy ethtool strings.

Avoids incrementing pointers all over the place.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
(for hellcreek driver)
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://patch.msgid.link/20241028044828.1639668-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'add-noinline_for_tracing-and-apply-it-to-tcp_drop_reason'
Jakub Kicinski [Sun, 3 Nov 2024 17:02:34 +0000 (09:02 -0800)]
Merge branch 'add-noinline_for_tracing-and-apply-it-to-tcp_drop_reason'

Yafang Shao says:

====================
Add noinline_for_tracing and apply it to tcp_drop_reason

This patchset introduces a new compiler annotation, noinline_for_tracing,
designed to prevent specific functions from being inlined to facilitate
tracing. In Patch #2, this annotation is applied to the tcp_drop_reason().
====================

Link: https://patch.msgid.link/20241024093742.87681-1-laoar.shao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: tcp: Add noinline_for_tracing annotation for tcp_drop_reason()
Yafang Shao [Thu, 24 Oct 2024 09:37:42 +0000 (17:37 +0800)]
net: tcp: Add noinline_for_tracing annotation for tcp_drop_reason()

We previously hooked the tcp_drop_reason() function using BPF to monitor
TCP drop reasons. However, after upgrading our compiler from GCC 9 to GCC
11, tcp_drop_reason() is now inlined, preventing us from hooking into it.
To address this, it would be beneficial to make noinline explicitly for
tracing.

Link: https://lore.kernel.org/netdev/CANn89iJuShCmidCi_ZkYABtmscwbVjhuDta1MS5LxV_4H9tKOA@mail.gmail.com/
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Link: https://patch.msgid.link/20241024093742.87681-3-laoar.shao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agocompiler_types: Add noinline_for_tracing annotation
Yafang Shao [Thu, 24 Oct 2024 09:37:41 +0000 (17:37 +0800)]
compiler_types: Add noinline_for_tracing annotation

Kernel functions that are not inlined can be easily hooked with BPF for
tracing. However, functions intended for tracing may still be inlined
unexpectedly. For example, in our case, after upgrading the compiler from
GCC 9 to GCC 11, the tcp_drop_reason() function was inlined, which broke
our monitoring tools. To prevent this, we need to ensure that the function
remains non-inlined.

The noinline_for_tracing annotation is introduced as a general solution for
preventing inlining of kernel functions that need to be traced. This
approach avoids the need for adding individual noinline comments to each
function and provides a more consistent way to maintain traceability.

Link: https://lore.kernel.org/netdev/CANn89iKvr44ipuRYFaPTpzwz=B_+pgA94jsggQ946mjwreV6Aw@mail.gmail.com/
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://patch.msgid.link/20241024093742.87681-2-laoar.shao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'dpll-expose-clock-quality-level'
Jakub Kicinski [Sun, 3 Nov 2024 16:39:12 +0000 (08:39 -0800)]
Merge branch 'dpll-expose-clock-quality-level'

Jiri Pirko says:

====================
dpll: expose clock quality level

Some device driver might know the quality of the clock it is running.
In order to expose the information to the user, introduce new netlink
attribute and dpll device op. Implement the op in mlx5 driver.

Example:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml --dump device-get
[{'clock-id': 13316852727532664826,
  'clock-quality-level': ['itu-opt1-eeec'],    <<<<<<<<<<<<<<<<<
  'id': 0,
  'lock-status': 'unlocked',
  'lock-status-error': 'none',
  'mode': 'manual',
  'mode-supported': ['manual'],
  'module-name': 'mlx5_dpll',
  'type': 'eec'}]
====================

Link: https://patch.msgid.link/20241030081157.966604-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5: DPLL, Add clock quality level op implementation
Jiri Pirko [Wed, 30 Oct 2024 08:11:57 +0000 (09:11 +0100)]
net/mlx5: DPLL, Add clock quality level op implementation

Use MSECQ register to query clock quality from firmware. Implement the
dpll op and fill-up the quality level value properly.

Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20241030081157.966604-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodpll: add clock quality level attribute and op
Jiri Pirko [Wed, 30 Oct 2024 08:11:56 +0000 (09:11 +0100)]
dpll: add clock quality level attribute and op

In order to allow driver expose quality level of the clock it is
running, introduce a new netlink attr with enum to carry it to the
userspace. Also, introduce an op the dpll netlink code calls into the
driver to obtain the value.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20241030081157.966604-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: fjes: use ethtool string helpers
Rosen Penev [Tue, 29 Oct 2024 23:27:21 +0000 (16:27 -0700)]
net: fjes: use ethtool string helpers

The latter is the preferred way to copy ethtool strings.

Avoids manually incrementing the pointer.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241029232721.8442-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonetlink: Remove the dead code in netlink_proto_init()
Jinjie Ruan [Wed, 30 Oct 2024 01:21:47 +0000 (09:21 +0800)]
netlink: Remove the dead code in netlink_proto_init()

In the error path of netlink_proto_init(), frees the already allocated
bucket table for new hash tables in a loop, but it is going to panic,
so it is not necessary to clean up the resources, just remove the
dead code.

Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20241030012147.357400-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoselftests/net: Fix ./ns-XXXXXX not cleanup
Li Zhijian [Wed, 30 Oct 2024 00:59:43 +0000 (08:59 +0800)]
selftests/net: Fix ./ns-XXXXXX not cleanup

```
readonly STATS="$(mktemp -p /tmp ns-XXXXXX)"
readonly BASE=`basename $STATS`
```
It could be a mistake to write to $BASE rather than $STATS, where $STATS
is used to save the NSTAT_HISTORY and it will be cleaned up before exit.

Although since we've been creating the wrong file this whole time and
everything worked, it's fine to remove these 2 lines completely

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://patch.msgid.link/20241030005943.400225-1-lizhijian@fujitsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoselftests: netdevsim: add fib_notifications to Makefile
Jakub Kicinski [Tue, 29 Oct 2024 19:26:03 +0000 (12:26 -0700)]
selftests: netdevsim: add fib_notifications to Makefile

Commit 19d36d2971e6 ("selftests: netdevsim: Add fib_notifications test")
added the test but didn't include it in the Makefile.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241029192603.509295-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodql: annotate data-races around dql->last_obj_cnt
Eric Dumazet [Tue, 29 Oct 2024 19:14:25 +0000 (19:14 +0000)]
dql: annotate data-races around dql->last_obj_cnt

dql->last_obj_cnt is read/written from different contexts,
without any lock synchronization.

Use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241029191425.2519085-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonetlink: add NLA_POLICY_MAX_LEN macro
Antonio Quartulli [Tue, 29 Oct 2024 10:47:14 +0000 (11:47 +0100)]
netlink: add NLA_POLICY_MAX_LEN macro

Similarly to NLA_POLICY_MIN_LEN, NLA_POLICY_MAX_LEN defines a policy
with a maximum length value.

The netlink generator for YAML specs has been extended accordingly.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20241029-b4-ovpn-v11-1-de4698c73a25@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agofsl/fman: Validate cell-index value obtained from Device Tree
Aleksandr Mishin [Mon, 28 Oct 2024 06:58:24 +0000 (09:58 +0300)]
fsl/fman: Validate cell-index value obtained from Device Tree

Cell-index value is obtained from Device Tree and then used to calculate
the index for accessing arrays port_mfl[], mac_mfl[] and intr_mng[].
In case of broken DT due to any error cell-index can contain any value
and it is possible to go beyond the array boundaries which can lead
at least to memory corruption.

Validate cell-index value obtained from Device Tree.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Link: https://patch.msgid.link/20241028065824.15452-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonetlabel: document doi_remove field of struct netlbl_calipso_ops
George Guo [Mon, 28 Oct 2024 12:34:35 +0000 (20:34 +0800)]
netlabel: document doi_remove field of struct netlbl_calipso_ops

Add documentation of doi_remove field to Kernel doc for struct netlbl_calipso_ops.

Flagged by ./scripts/kernel-doc -none.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20241028123435.3495916-1-dongtai.guo@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoptp_pch: Replace deprecated PCI functions
Philipp Stanner [Mon, 28 Oct 2024 09:59:44 +0000 (10:59 +0100)]
ptp_pch: Replace deprecated PCI functions

pcim_iomap_regions() and pcim_iomap_table() have been deprecated in
commit e354bb84a4c1 ("PCI: Deprecate pcim_iomap_table(),
pcim_iomap_regions_request_all()").

Replace these functions with pcim_iomap_region().

Additionally, pass KBUILD_MODNAME to that function, since the 'name'
parameter should indicate who (i.e., which driver) has requested the
resource.

Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Link: https://patch.msgid.link/20241028095943.20498-2-pstanner@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: freescale: use ethtool string helpers
Rosen Penev [Fri, 25 Oct 2024 20:37:57 +0000 (13:37 -0700)]
net: freescale: use ethtool string helpers

The latter is the preferred way to copy ethtool strings.

Avoids manually incrementing the pointer. Cleans up the code quite well.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Lee Trager <lee@trager.us>
Link: https://patch.msgid.link/20241025203757.288367-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotcp: only release congestion control if it has been initialized
Pengcheng Yang [Fri, 25 Oct 2024 08:45:44 +0000 (16:45 +0800)]
tcp: only release congestion control if it has been initialized

Currently, when cleaning up congestion control, we always call the
release regardless of whether it has been initialized. There is no
need to release when closing TCP_LISTEN and TCP_CLOSE (close
immediately after socket()).

In this case, tcp_cdg calls kfree(NULL) in release without causing
an exception, but for some customized ca, this could lead to
unexpected exceptions. We need to ensure that init and release are
called in pairs.

Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Link: https://patch.msgid.link/1729845944-6003-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 1 Nov 2024 00:30:16 +0000 (17:30 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.12-rc6).

Conflicts:

drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
  cbe84e9ad5e2 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd")
  188a1bf89432 ("wifi: mac80211: re-order assigning channel in activate links")
https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/

net/mac80211/cfg.c
  c4382d5ca1af ("wifi: mac80211: update the right link for tx power")
  8dd0498983ee ("wifi: mac80211: Fix setting txpower with emulate_chanctx")

drivers/net/ethernet/intel/ice/ice_ptp_hw.h
  6e58c3310622 ("ice: fix crash on probe for DPLL enabled E810 LOM")
  e4291b64e118 ("ice: Align E810T GPIO to other products")
  ebb2693f8fbd ("ice: Read SDP section from NVM for pin definitions")
  ac532f4f4251 ("ice: Cleanup unused declarations")
https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Linus Torvalds [Fri, 1 Nov 2024 00:56:19 +0000 (14:56 -1000)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann:

 - Fix BPF verifier to force a checkpoint when the program's jump
   history becomes too long (Eduard Zingerman)

 - Add several fixes to the BPF bits iterator addressing issues like
   memory leaks and overflow problems (Hou Tao)

 - Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong)

 - Fix BPF test infra's LIVE_FRAME frame update after a page has been
   recycled (Toke Høiland-Jørgensen)

 - Fix BPF verifier and undo the 40-bytes extra stack space for
   bpf_fastcall patterns due to various bugs (Eduard Zingerman)

 - Fix a BPF sockmap race condition which could trigger a NULL pointer
   dereference in sock_map_link_update_prog (Cong Wang)

 - Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under
   the socket lock (Jiayuan Chen)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
  selftests/bpf: Add three test cases for bits_iter
  bpf: Use __u64 to save the bits in bits iterator
  bpf: Check the validity of nr_words in bpf_iter_bits_new()
  bpf: Add bpf_mem_alloc_check_size() helper
  bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
  bpf: disallow 40-bytes extra stack for bpf_fastcall patterns
  selftests/bpf: Add test for trie_get_next_key()
  bpf: Fix out-of-bounds write in trie_get_next_key()
  selftests/bpf: Test with a very short loop
  bpf: Force checkpoint when jmp history is too long
  bpf: fix filed access without lock
  sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()

11 months agoMerge tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 31 Oct 2024 22:39:58 +0000 (12:39 -1000)]
Merge tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from WiFi, bluetooth and netfilter.

  No known new regressions outstanding.

  Current release - regressions:

   - wifi: mt76: do not increase mcu skb refcount if retry is not
     supported

  Current release - new code bugs:

   - wifi:
      - rtw88: fix the RX aggregation in USB 3 mode
      - mac80211: fix memory corruption bug in struct ieee80211_chanctx

  Previous releases - regressions:

   - sched:
      - stop qdisc_tree_reduce_backlog on TC_H_ROOT
      - sch_api: fix xa_insert() error path in tcf_block_get_ext()

   - wifi:
      - revert "wifi: iwlwifi: remove retry loops in start"
      - cfg80211: clear wdev->cqm_config pointer on free

   - netfilter: fix potential crash in nf_send_reset6()

   - ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find()

   - bluetooth: fix null-ptr-deref in hci_read_supported_codecs

   - eth: mlxsw: add missing verification before pushing Tx header

   - eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of
     bounds issue

  Previous releases - always broken:

   - wifi: mac80211: do not pass a stopped vif to the driver in
     .get_txpower

   - netfilter: sanitize offset and length before calling skb_checksum()

   - core:
      - fix crash when config small gso_max_size/gso_ipv4_max_size
      - skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension

   - mptcp: protect sched with rcu_read_lock

   - eth: ice: fix crash on probe for DPLL enabled E810 LOM

   - eth: macsec: fix use-after-free while sending the offloading packet

   - eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data

   - eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices

   - eth: mtk_wed: fix path of MT7988 WO firmware"

* tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
  net: hns3: fix kernel crash when 1588 is sent on HIP08 devices
  net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
  net: hns3: initialize reset_timer before hclgevf_misc_irq_init()
  net: hns3: don't auto enable misc vector
  net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  net: hns3: fix missing features due to dev->features configuration too early
  net: hns3: fixed reset failure issues caused by the incorrect reset type
  net: hns3: add sync command to sync io-pgtable
  net: hns3: default enable tx bounce buffer when smmu enabled
  netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
  net: ethernet: mtk_wed: fix path of MT7988 WO firmware
  selftests: forwarding: Add IPv6 GRE remote change tests
  mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address
  mlxsw: pci: Sync Rx buffers for device
  mlxsw: pci: Sync Rx buffers for CPU
  mlxsw: spectrum_ptp: Add missing verification before pushing Tx header
  net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
  Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
  netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
  netfilter: Fix use-after-free in get_info()
  ...

11 months agoMerge tag 'sound-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 31 Oct 2024 18:15:40 +0000 (08:15 -1000)]
Merge tag 'sound-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Here we see slightly more commits than wished, but basically all are
  small and mostly trivial fixes.

  The only core change is the workaround for __counted_by() usage in
  ASoC DAPM code, while the rest are device-specific fixes for Intel
  Baytrail devices, Cirrus and wcd937x codecs, and HD-audio / USB-audio
  devices"

* tag 'sound-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1
  ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3
  ALSA: usb-audio: Add quirks for Dell WD19 dock
  ASoC: codecs: wcd937x: relax the AUX PDM watchdog
  ASoC: codecs: wcd937x: add missing LO Switch control
  ASoC: dt-bindings: rockchip,rk3308-codec: add port property
  ALSA: hda/realtek: Add subwoofer quirk for Infinix ZERO BOOK 13
  ASoC: dapm: fix bounds checker error in dapm_widget_list_create
  ASoC: Intel: sst: Fix used of uninitialized ctx to log an error
  ASoC: cs42l51: Fix some error handling paths in cs42l51_probe()
  ASoC: Intel: sst: Support LPE0F28 ACPI HID
  ALSA: hda/realtek: Limit internal Mic boost on Dell platform
  ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet
  ASoC: Intel: bytcr_rt5640: Add support for non ACPI instantiated codec
  ASoC: codecs: rt5640: Always disable IRQs from rt5640_cancel_work()

11 months agobpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
Toke Høiland-Jørgensen [Wed, 30 Oct 2024 10:48:26 +0000 (11:48 +0100)]
bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled

The test_run code detects whether a page has been modified and
re-initialises the xdp_frame structure if it has, using
xdp_update_frame_from_buff(). However, xdp_update_frame_from_buff()
doesn't touch frame->mem, so that wasn't correctly re-initialised, which
led to the pages from page_pool not being returned correctly. Syzbot
noticed this as a memory leak.

Fix this by also copying the frame->mem structure when re-initialising
the frame, like we do on initialisation of a new page from page_pool.

Fixes: e5995bc7e2ba ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Reported-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/bpf/20241030-test-run-mem-fix-v1-1-41e88e8cae43@redhat.com
11 months agonet: phy: dp83822: Configure RMII mode on DP83825 devices
Erik Schumacher [Thu, 24 Oct 2024 13:24:23 +0000 (13:24 +0000)]
net: phy: dp83822: Configure RMII mode on DP83825 devices

Like the DP83826, the DP83825 can also be configured as an RMII master or
slave via a control register. The existing function responsible for this
configuration is renamed to a general dp8382x function. The DP83825 only
supports RMII so nothing more needs to be configured.

With this change, the dp83822_driver list is reorganized according to the
device name.

Signed-off-by: Erik Schumacher <erik.schumacher@iris-sensing.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/aa62d081804f44b5af0e8de2372ae6bfe1affd34.camel@iris-sensing.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge tag 'nf-24-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 31 Oct 2024 11:13:08 +0000 (12:13 +0100)]
Merge tag 'nf-24-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for net:

1) Remove unused parameters in conntrack_dump_flush.c used by
   selftests, from Liu Jing.

2) Fix possible UaF when removing xtables module via getsockopt()
   interface, from Dong Chenchen.

3) Fix potential crash in nf_send_reset6() reported by syzkaller.
   From Eric Dumazet

4) Validate offset and length before calling skb_checksum()
   in nft_payload, otherwise hitting BUG() is possible.

netfilter pull request 24-10-31

* tag 'nf-24-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
  netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
  netfilter: Fix use-after-free in get_info()
  selftests: netfilter: remove unused parameter
====================

Link: https://patch.msgid.link/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge tag 'for-net-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Paolo Abeni [Thu, 31 Oct 2024 10:32:57 +0000 (11:32 +0100)]
Merge tag 'for-net-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci: fix null-ptr-deref in hci_read_supported_codecs

* tag 'for-net-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
====================

Link: https://patch.msgid.link/20241030192205.38298-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Paolo Abeni [Thu, 31 Oct 2024 10:15:47 +0000 (11:15 +0100)]
Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'

Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver

ChangeLog:
v2 -> v3:
  - Rewrite the commit logs of net: hns3: add sync command to sync io-pgtable' to
    add more verbose explanation, suggested Paolo.
  - Add fixes tag for hardware issue, suggested Paolo and Simon Horman.
v2: https://lore.kernel.org/all/20241018101059.1718375-1-shaojijie@huawei.com/
v1 -> v2:
  - Pass IRQF_NO_AUTOEN to request_irq(), suggested by Jakub.
  - Rewrite the commit logs of 'net: hns3: default enable tx bounce buffer when smmu enabled'
    and 'net: hns3: add sync command to sync io-pgtable'.
v1: https://lore.kernel.org/all/20241011094521.3008298-1-shaojijie@huawei.com/
====================

Link: https://patch.msgid.link/20241025092938.2912958-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: hns3: fix kernel crash when 1588 is sent on HIP08 devices
Jie Wang [Fri, 25 Oct 2024 09:29:38 +0000 (17:29 +0800)]
net: hns3: fix kernel crash when 1588 is sent on HIP08 devices

Currently, HIP08 devices does not register the ptp devices, so the
hdev->ptp is NULL. But the tx process would still try to set hardware time
stamp info with SKBTX_HW_TSTAMP flag and cause a kernel crash.

[  128.087798] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
...
[  128.280251] pc : hclge_ptp_set_tx_info+0x2c/0x140 [hclge]
[  128.286600] lr : hclge_ptp_set_tx_info+0x20/0x140 [hclge]
[  128.292938] sp : ffff800059b93140
[  128.297200] x29: ffff800059b93140 x28: 0000000000003280
[  128.303455] x27: ffff800020d48280 x26: ffff0cb9dc814080
[  128.309715] x25: ffff0cb9cde93fa0 x24: 0000000000000001
[  128.315969] x23: 0000000000000000 x22: 0000000000000194
[  128.322219] x21: ffff0cd94f986000 x20: 0000000000000000
[  128.328462] x19: ffff0cb9d2a166c0 x18: 0000000000000000
[  128.334698] x17: 0000000000000000 x16: ffffcf1fc523ed24
[  128.340934] x15: 0000ffffd530a518 x14: 0000000000000000
[  128.347162] x13: ffff0cd6bdb31310 x12: 0000000000000368
[  128.353388] x11: ffff0cb9cfbc7070 x10: ffff2cf55dd11e02
[  128.359606] x9 : ffffcf1f85a212b4 x8 : ffff0cd7cf27dab0
[  128.365831] x7 : 0000000000000a20 x6 : ffff0cd7cf27d000
[  128.372040] x5 : 0000000000000000 x4 : 000000000000ffff
[  128.378243] x3 : 0000000000000400 x2 : ffffcf1f85a21294
[  128.384437] x1 : ffff0cb9db520080 x0 : ffff0cb9db500080
[  128.390626] Call trace:
[  128.393964]  hclge_ptp_set_tx_info+0x2c/0x140 [hclge]
[  128.399893]  hns3_nic_net_xmit+0x39c/0x4c4 [hns3]
[  128.405468]  xmit_one.constprop.0+0xc4/0x200
[  128.410600]  dev_hard_start_xmit+0x54/0xf0
[  128.415556]  sch_direct_xmit+0xe8/0x634
[  128.420246]  __dev_queue_xmit+0x224/0xc70
[  128.425101]  dev_queue_xmit+0x1c/0x40
[  128.429608]  ovs_vport_send+0xac/0x1a0 [openvswitch]
[  128.435409]  do_output+0x60/0x17c [openvswitch]
[  128.440770]  do_execute_actions+0x898/0x8c4 [openvswitch]
[  128.446993]  ovs_execute_actions+0x64/0xf0 [openvswitch]
[  128.453129]  ovs_dp_process_packet+0xa0/0x224 [openvswitch]
[  128.459530]  ovs_vport_receive+0x7c/0xfc [openvswitch]
[  128.465497]  internal_dev_xmit+0x34/0xb0 [openvswitch]
[  128.471460]  xmit_one.constprop.0+0xc4/0x200
[  128.476561]  dev_hard_start_xmit+0x54/0xf0
[  128.481489]  __dev_queue_xmit+0x968/0xc70
[  128.486330]  dev_queue_xmit+0x1c/0x40
[  128.490856]  ip_finish_output2+0x250/0x570
[  128.495810]  __ip_finish_output+0x170/0x1e0
[  128.500832]  ip_finish_output+0x3c/0xf0
[  128.505504]  ip_output+0xbc/0x160
[  128.509654]  ip_send_skb+0x58/0xd4
[  128.513892]  udp_send_skb+0x12c/0x354
[  128.518387]  udp_sendmsg+0x7a8/0x9c0
[  128.522793]  inet_sendmsg+0x4c/0x8c
[  128.527116]  __sock_sendmsg+0x48/0x80
[  128.531609]  __sys_sendto+0x124/0x164
[  128.536099]  __arm64_sys_sendto+0x30/0x5c
[  128.540935]  invoke_syscall+0x50/0x130
[  128.545508]  el0_svc_common.constprop.0+0x10c/0x124
[  128.551205]  do_el0_svc+0x34/0xdc
[  128.555347]  el0_svc+0x20/0x30
[  128.559227]  el0_sync_handler+0xb8/0xc0
[  128.563883]  el0_sync+0x160/0x180

Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
Hao Lan [Fri, 25 Oct 2024 09:29:37 +0000 (17:29 +0800)]
net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue

The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs
1024-1279 are in different BAR space addresses. However,
hclge_fetch_pf_reg does not distinguish the tqp space information when
reading the tqp space information. When the number of TQPs is greater
than 1024, access bar space overwriting occurs.
The problem of different segments has been considered during the
initialization of tqp.io_base. Therefore, tqp.io_base is directly used
when the queue is read in hclge_fetch_pf_reg.

The error message:

Unable to handle kernel paging request at virtual address ffff800037200000
pc : hclge_fetch_pf_reg+0x138/0x250 [hclge]
lr : hclge_get_regs+0x84/0x1d0 [hclge]
Call trace:
 hclge_fetch_pf_reg+0x138/0x250 [hclge]
 hclge_get_regs+0x84/0x1d0 [hclge]
 hns3_get_regs+0x2c/0x50 [hns3]
 ethtool_get_regs+0xf4/0x270
 dev_ethtool+0x674/0x8a0
 dev_ioctl+0x270/0x36c
 sock_do_ioctl+0x110/0x2a0
 sock_ioctl+0x2ac/0x530
 __arm64_sys_ioctl+0xa8/0x100
 invoke_syscall+0x4c/0x124
 el0_svc_common.constprop.0+0x140/0x15c
 do_el0_svc+0x30/0xd0
 el0_svc+0x1c/0x2c
 el0_sync_handler+0xb0/0xb4
 el0_sync+0x168/0x180

Fixes: 939ccd107ffc ("net: hns3: move dump regs function to a separate file")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: hns3: initialize reset_timer before hclgevf_misc_irq_init()
Jian Shen [Fri, 25 Oct 2024 09:29:36 +0000 (17:29 +0800)]
net: hns3: initialize reset_timer before hclgevf_misc_irq_init()

Currently the misc irq is initialized before reset_timer setup. But
it will access the reset_timer in the irq handler. So initialize
the reset_timer earlier.

Fixes: ff200099d271 ("net: hns3: remove unnecessary work in hclgevf_main")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: hns3: don't auto enable misc vector
Jian Shen [Fri, 25 Oct 2024 09:29:35 +0000 (17:29 +0800)]
net: hns3: don't auto enable misc vector

Currently, there is a time window between misc irq enabled
and service task inited. If an interrupte is reported at
this time, it will cause warning like below:

[   16.324639] Call trace:
[   16.324641]  __queue_delayed_work+0xb8/0xe0
[   16.324643]  mod_delayed_work_on+0x78/0xd0
[   16.324655]  hclge_errhand_task_schedule+0x58/0x90 [hclge]
[   16.324662]  hclge_misc_irq_handle+0x168/0x240 [hclge]
[   16.324666]  __handle_irq_event_percpu+0x64/0x1e0
[   16.324667]  handle_irq_event+0x80/0x170
[   16.324670]  handle_fasteoi_edge_irq+0x110/0x2bc
[   16.324671]  __handle_domain_irq+0x84/0xfc
[   16.324673]  gic_handle_irq+0x88/0x2c0
[   16.324674]  el1_irq+0xb8/0x140
[   16.324677]  arch_cpu_idle+0x18/0x40
[   16.324679]  default_idle_call+0x5c/0x1bc
[   16.324682]  cpuidle_idle_call+0x18c/0x1c4
[   16.324684]  do_idle+0x174/0x17c
[   16.324685]  cpu_startup_entry+0x30/0x6c
[   16.324687]  secondary_start_kernel+0x1a4/0x280
[   16.324688] ---[ end trace 6aa0bff672a964aa ]---

So don't auto enable misc vector when request irq..

Fixes: 7be1b9f3e99f ("net: hns3: make hclge_service use delayed workqueue")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: hns3: Resolved the issue that the debugfs query result is inconsistent.
Hao Lan [Fri, 25 Oct 2024 09:29:34 +0000 (17:29 +0800)]
net: hns3: Resolved the issue that the debugfs query result is inconsistent.

This patch modifies the implementation of debugfs:
When the user process stops unexpectedly, not all data of the file system
is read. In this case, the save_buf pointer is not released. When the user
process is called next time, save_buf is used to copy the cached data
to the user space. As a result, the queried data is inconsistent. To solve
this problem, determine whether the function is invoked for the first time
based on the value of *ppos. If *ppos is 0, obtain the actual data.

Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangwei Zhang <zhangwangwei6@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: hns3: fix missing features due to dev->features configuration too early
Hao Lan [Fri, 25 Oct 2024 09:29:33 +0000 (17:29 +0800)]
net: hns3: fix missing features due to dev->features configuration too early

Currently, the netdev->features is configured in hns3_nic_set_features.
As a result, __netdev_update_features considers that there is no feature
difference, and the procedures of the real features are missing.

Fixes: 2a7556bb2b73 ("net: hns3: implement ndo_features_check ops for hns3 driver")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: hns3: fixed reset failure issues caused by the incorrect reset type
Hao Lan [Fri, 25 Oct 2024 09:29:32 +0000 (17:29 +0800)]
net: hns3: fixed reset failure issues caused by the incorrect reset type

When a reset type that is not supported by the driver is input, a reset
pending flag bit of the HNAE3_NONE_RESET type is generated in
reset_pending. The driver does not have a mechanism to clear this type
of error. As a result, the driver considers that the reset is not
complete. This patch provides a mechanism to clear the
HNAE3_NONE_RESET flag and the parameter of
hnae3_ae_ops.set_default_reset_request is verified.

The error message:
hns3 0000:39:01.0: cmd failed -16
hns3 0000:39:01.0: hclge device re-init failed, VF is disabled!
hns3 0000:39:01.0: failed to reset VF stack
hns3 0000:39:01.0: failed to reset VF(4)
hns3 0000:39:01.0: prepare reset(2) wait done
hns3 0000:39:01.0 eth4: already uninitialized

Use the crash tool to view struct hclgevf_dev:
struct hclgevf_dev {
...
default_reset_request = 0x20,
reset_level = HNAE3_NONE_RESET,
reset_pending = 0x100,
reset_type = HNAE3_NONE_RESET,
...
};

Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: hns3: add sync command to sync io-pgtable
Jian Shen [Fri, 25 Oct 2024 09:29:31 +0000 (17:29 +0800)]
net: hns3: add sync command to sync io-pgtable

To avoid errors in pgtable prefectch, add a sync command to sync
io-pagtable.

This is a supplement for the previous patch.
We want all the tx packet can be handled with tx bounce buffer path.
But it depends on the remain space of the spare buffer, checked by the
hns3_can_use_tx_bounce(). In most cases, maybe 99.99%, it returns true.
But once it return false by no available space, the packet will be handled
with the former path, which will map/unmap the skb buffer.
Then the driver will face the smmu prefetch risk again.

So add a sync command in this case to avoid smmu prefectch,
just protects corner scenes.

Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: hns3: default enable tx bounce buffer when smmu enabled
Peiyang Wang [Fri, 25 Oct 2024 09:29:30 +0000 (17:29 +0800)]
net: hns3: default enable tx bounce buffer when smmu enabled

The SMMU engine on HIP09 chip has a hardware issue.
SMMU pagetable prefetch features may prefetch and use a invalid PTE
even the PTE is valid at that time. This will cause the device trigger
fake pagefaults. The solution is to avoid prefetching by adding a
SYNC command when smmu mapping a iova. But the performance of nic has a
sharp drop. Then we do this workaround, always enable tx bounce buffer,
avoid mapping/unmapping on TX path.

This issue only affects HNS3, so we always enable
tx bounce buffer when smmu enabled to improve performance.

Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision")
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonetfilter: nft_payload: sanitize offset and length before calling skb_checksum()
Pablo Neira Ayuso [Wed, 30 Oct 2024 22:13:48 +0000 (23:13 +0100)]
netfilter: nft_payload: sanitize offset and length before calling skb_checksum()

If access to offset + length is larger than the skbuff length, then
skb_checksum() triggers BUG_ON().

skb_checksum() internally subtracts the length parameter while iterating
over skbuff, BUG_ON(len) at the end of it checks that the expected
length to be included in the checksum calculation is fully consumed.

Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support")
Reported-by: Slavin Liu <slavin-ayu@qq.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 months agoMerge branch 'add-ethernet-dts-schema-for-qcs615-qcs8300'
Jakub Kicinski [Thu, 31 Oct 2024 01:57:55 +0000 (18:57 -0700)]
Merge branch 'add-ethernet-dts-schema-for-qcs615-qcs8300'

Yijie Yang says:

====================
Add ethernet dts schema for qcs615/qcs8300

Document the ethernet and SerDes compatible for qcs8300. This platform
shares the same EMAC and SerDes as sa8775p, so the compatible fallback to
it.
Document the ethernet compatible for qcs615. This platform shares the
same EMAC as sm8150, so the compatible fallback to it.
Document the compatible for revision 2 of the qcs8300-ride board.

v2: https://lore.kernel.org/20241017-schema-v2-0-2320f68dc126@quicinc.com
v1: https://lore.kernel.org/132a8e29-3be7-422a-bc83-d6be00fac3e8@kernel.org
====================

Link: https://patch.msgid.link/20241029-schema-v3-0-fbde519eaf00@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodt-bindings: net: qcom,ethqos: add description for qcs8300
Yijie Yang [Tue, 29 Oct 2024 03:11:56 +0000 (11:11 +0800)]
dt-bindings: net: qcom,ethqos: add description for qcs8300

Add compatible for the MAC controller on qcs8300 platforms.
Since qcs8300 shares the same EMAC as sa8775p, so it fallback to the
compatible.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Yijie Yang <quic_yijiyang@quicinc.com>
Link: https://patch.msgid.link/20241029-schema-v3-2-fbde519eaf00@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodt-bindings: net: qcom,ethqos: add description for qcs615
Yijie Yang [Tue, 29 Oct 2024 03:11:55 +0000 (11:11 +0800)]
dt-bindings: net: qcom,ethqos: add description for qcs615

Add compatible for the MAC controller on qcs615 platform.
Since qcs615 shares the same EMAC as sm8150, so it fallback to that
compatible.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Yijie Yang <quic_yijiyang@quicinc.com>
Link: https://patch.msgid.link/20241029-schema-v3-1-fbde519eaf00@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'bna-remove-error-checking-for-debugfs-create-apis'
Jakub Kicinski [Thu, 31 Oct 2024 01:51:54 +0000 (18:51 -0700)]
Merge branch 'bna-remove-error-checking-for-debugfs-create-apis'

Zhen Lei says:

====================
bna: Remove error checking for debugfs create APIs

1. Fix the incorrect return value check for debugfs_create_dir() and
   debugfs_create_file(), which returns ERR_PTR(-ERROR) instead of NULL
   when it fails.
2. Remove field bnad_dentry_files[] in struct bnad. When a directory is
   deleted, files in the directory are automatically deleted. Therefore,
   there is need to record these files.
====================

Link: https://patch.msgid.link/20241028020943.507-1-thunder.leizhen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agobna: Remove field bnad_dentry_files[] in struct bnad
Zhen Lei [Mon, 28 Oct 2024 02:09:43 +0000 (10:09 +0800)]
bna: Remove field bnad_dentry_files[] in struct bnad

Function debugfs_remove() recursively removes a directory, include all
files created by debugfs_create_file(). Therefore, there is no need to
explicitly record each file with member ->bnad_dentry_files[] and
explicitly delete them at the end. Remove field bnad_dentry_files[] and
its related processing codes for simplification.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241028020943.507-3-thunder.leizhen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agobna: Remove error checking for debugfs create APIs
Zhen Lei [Mon, 28 Oct 2024 02:09:42 +0000 (10:09 +0800)]
bna: Remove error checking for debugfs create APIs

Driver bna can work fine even if any previous call to debugfs create
APIs failed. All return value checks of them should be dropped, as
debugfs APIs say.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241028020943.507-2-thunder.leizhen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agortnetlink: Fix an error handling path in rtnl_newlink()
Christophe JAILLET [Sat, 26 Oct 2024 14:17:44 +0000 (16:17 +0200)]
rtnetlink: Fix an error handling path in rtnl_newlink()

When some code has been moved in the commit in Fixes, some "return err;"
have correctly been changed in goto <some_where_in_the_error_handling_path>
but this one was missed.

Should "ops->maxtype > RTNL_MAX_TYPE" happen, then some resources would
leak.

Go through the error handling path to fix these leaks.

Fixes: 0d3008d1a9ae ("rtnetlink: Move ops->validate to rtnl_newlink().")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/eca90eeb4d9e9a0545772b68aeaab883d9fe2279.1729952228.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: ethernet: mtk_wed: fix path of MT7988 WO firmware
Daniel Golle [Sat, 26 Oct 2024 13:52:25 +0000 (14:52 +0100)]
net: ethernet: mtk_wed: fix path of MT7988 WO firmware

linux-firmware commit 808cba84 ("mtk_wed: add firmware for mt7988
Wireless Ethernet Dispatcher") added mt7988_wo_{0,1}.bin in the
'mediatek/mt7988' directory while driver current expects the files in
the 'mediatek' directory.

Change path in the driver header now that the firmware has been added.

Fixes: e2f64db13aa1 ("net: ethernet: mtk_wed: introduce WED support for MT7988")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/Zxz0GWTR5X5LdWPe@pidgin.makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'mlxsw-fixes'
Jakub Kicinski [Thu, 31 Oct 2024 01:24:41 +0000 (18:24 -0700)]
Merge branch 'mlxsw-fixes'

Petr Machata says:

====================
mlxsw: Fixes

In this patchset:

- Tx header should be pushed for each packet which is transmitted via
  Spectrum ASICs. Patch #1 adds a missing call to skb_cow_head() to make
  sure that there is both enough room to push the Tx header and that the
  SKB header is not cloned and can be modified.

- Commit b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers
  allocation") converted mlxsw to use page pool for Rx buffers allocation.
  Sync for CPU and for device should be done for Rx pages. In patches #2
  and #3, add the missing calls to sync pages for, respectively, CPU and
  the device.

- Patch #4 then fixes a bug to IPv6 GRE forwarding offload. Patch #5 adds
  a generic forwarding test that fails with mlxsw ports prior to the fix.
====================

Link: https://patch.msgid.link/cover.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoselftests: forwarding: Add IPv6 GRE remote change tests
Ido Schimmel [Fri, 25 Oct 2024 14:26:29 +0000 (16:26 +0200)]
selftests: forwarding: Add IPv6 GRE remote change tests

Test that after changing the remote address of an ip6gre net device
traffic is forwarded as expected. Test with both flat and hierarchical
topologies and with and without an input / output keys.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/02b05246d2cdada0cf2fccffc0faa8a424d0f51b.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address
Ido Schimmel [Fri, 25 Oct 2024 14:26:28 +0000 (16:26 +0200)]
mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address

The device stores IPv6 addresses that are used for encapsulation in
linear memory that is managed by the driver.

Changing the remote address of an ip6gre net device never worked
properly, but since cited commit the following reproducer [1] would
result in a warning [2] and a memory leak [3]. The problem is that the
new remote address is never added by the driver to its hash table (and
therefore the device) and the old address is never removed from it.

Fix by programming the new address when the configuration of the ip6gre
net device changes and removing the old one. If the address did not
change, then the above would result in increasing the reference count of
the address and then decreasing it.

[1]
 # ip link add name bla up type ip6gre local 2001:db8:1::1 remote 2001:db8:2::1 tos inherit ttl inherit
 # ip link set dev bla type ip6gre remote 2001:db8:3::1
 # ip link del dev bla
 # devlink dev reload pci/0000:01:00.0

[2]
WARNING: CPU: 0 PID: 1682 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3002 mlxsw_sp_ipv6_addr_put+0x140/0x1d0
Modules linked in:
CPU: 0 UID: 0 PID: 1682 Comm: ip Not tainted 6.12.0-rc3-custom-g86b5b55bc835 #151
Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x140/0x1d0
[...]
Call Trace:
 <TASK>
 mlxsw_sp_router_netdevice_event+0x55f/0x1240
 notifier_call_chain+0x5a/0xd0
 call_netdevice_notifiers_info+0x39/0x90
 unregister_netdevice_many_notify+0x63e/0x9d0
 rtnl_dellink+0x16b/0x3a0
 rtnetlink_rcv_msg+0x142/0x3f0
 netlink_rcv_skb+0x50/0x100
 netlink_unicast+0x242/0x390
 netlink_sendmsg+0x1de/0x420
 ____sys_sendmsg+0x2bd/0x320
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xd0
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[3]
unreferenced object 0xffff898081f597a0 (size 32):
  comm "ip", pid 1626, jiffies 4294719324
  hex dump (first 32 bytes):
    20 01 0d b8 00 02 00 00 00 00 00 00 00 00 00 01   ...............
    21 49 61 83 80 89 ff ff 00 00 00 00 01 00 00 00  !Ia.............
  backtrace (crc fd9be911):
    [<00000000df89c55d>] __kmalloc_cache_noprof+0x1da/0x260
    [<00000000ff2a1ddb>] mlxsw_sp_ipv6_addr_kvdl_index_get+0x281/0x340
    [<000000009ddd445d>] mlxsw_sp_router_netdevice_event+0x47b/0x1240
    [<00000000743e7757>] notifier_call_chain+0x5a/0xd0
    [<000000007c7b9e13>] call_netdevice_notifiers_info+0x39/0x90
    [<000000002509645d>] register_netdevice+0x5f7/0x7a0
    [<00000000c2e7d2a9>] ip6gre_newlink_common.isra.0+0x65/0x130
    [<0000000087cd6d8d>] ip6gre_newlink+0x72/0x120
    [<000000004df7c7cc>] rtnl_newlink+0x471/0xa20
    [<0000000057ed632a>] rtnetlink_rcv_msg+0x142/0x3f0
    [<0000000032e0d5b5>] netlink_rcv_skb+0x50/0x100
    [<00000000908bca63>] netlink_unicast+0x242/0x390
    [<00000000cdbe1c87>] netlink_sendmsg+0x1de/0x420
    [<0000000011db153e>] ____sys_sendmsg+0x2bd/0x320
    [<000000003b6d53eb>] ___sys_sendmsg+0x9a/0xe0
    [<00000000cae27c62>] __sys_sendmsg+0x7a/0xd0

Fixes: cf42911523e0 ("mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/e91012edc5a6cb9df37b78fd377f669381facfcb.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomlxsw: pci: Sync Rx buffers for device
Amit Cohen [Fri, 25 Oct 2024 14:26:27 +0000 (16:26 +0200)]
mlxsw: pci: Sync Rx buffers for device

Non-coherent architectures, like ARM, may require invalidating caches
before the device can use the DMA mapped memory, which means that before
posting pages to device, drivers should sync the memory for device.

Sync for device can be configured as page pool responsibility. Set the
relevant flag and define max_len for sync.

Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/92e01f05c4f506a4f0a9b39c10175dcc01994910.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomlxsw: pci: Sync Rx buffers for CPU
Amit Cohen [Fri, 25 Oct 2024 14:26:26 +0000 (16:26 +0200)]
mlxsw: pci: Sync Rx buffers for CPU

When Rx packet is received, drivers should sync the pages for CPU, to
ensure the CPU reads the data written by the device and not stale
data from its cache.

Add the missing sync call in Rx path, sync the actual length of data for
each fragment.

Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/461486fac91755ca4e04c2068c102250026dcd0b.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agomlxsw: spectrum_ptp: Add missing verification before pushing Tx header
Amit Cohen [Fri, 25 Oct 2024 14:26:25 +0000 (16:26 +0200)]
mlxsw: spectrum_ptp: Add missing verification before pushing Tx header

Tx header should be pushed for each packet which is transmitted via
Spectrum ASICs. The cited commit moved the call to skb_cow_head() from
mlxsw_sp_port_xmit() to functions which handle Tx header.

In case that mlxsw_sp->ptp_ops->txhdr_construct() is used to handle Tx
header, and txhdr_construct() is mlxsw_sp_ptp_txhdr_construct(), there is
no call for skb_cow_head() before pushing Tx header size to SKB. This flow
is relevant for Spectrum-1 and Spectrum-4, for PTP packets.

Add the missing call to skb_cow_head() to make sure that there is both
enough room to push the Tx header and that the SKB header is not cloned and
can be modified.

An additional set will be sent to net-next to centralize the handling of
the Tx header by pushing it to every packet just before transmission.

Cc: Richard Cochran <richardcochran@gmail.com>
Fixes: 24157bc69f45 ("mlxsw: Send PTP packets as data packets to overcome a limitation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/5145780b07ebbb5d3b3570f311254a3a2d554a44.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodt-bindings: net: renesas,ether: Add iommus property
Geert Uytterhoeven [Fri, 25 Oct 2024 15:12:24 +0000 (17:12 +0200)]
dt-bindings: net: renesas,ether: Add iommus property

make dtbs_check:

    arch/arm64/boot/dts/renesas/r8a77980-condor.dtb: ethernet@e7400000: 'iommus' does not match any of the regexes: '@[0-9a-f]$', 'pinctrl-[0-9]+'
    from schema $id: http://devicetree.org/schemas/net/renesas,ether.yaml#

Ethernet Controllers on R-Car Gen2/Gen3 SoCs can make use of the IOMMU,
so add the missing iommus property.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/2ca890323477a21c22e13f6a1328288f4ee816f9.1729868894.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
Benoît Monin [Thu, 24 Oct 2024 14:01:54 +0000 (16:01 +0200)]
net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension

As documented in skbuff.h, devices with NETIF_F_IPV6_CSUM capability
can only checksum TCP and UDP over IPv6 if the IP header does not
contains extension.

This is enforced for UDP packets emitted from user-space to an IPv6
address as they go through ip6_make_skb(), which calls
__ip6_append_data() where a check is done on the header size before
setting CHECKSUM_PARTIAL.

But the introduction of UDP encapsulation with fou6 added a code-path
where it is possible to get an skb with a partial UDP checksum and an
IPv6 header with extension:
* fou6 adds a UDP header with a partial checksum if the inner packet
does not contains a valid checksum.
* ip6_tunnel adds an IPv6 header with a destination option extension
header if encap_limit is non-zero (the default value is 4).

The thread linked below describes in more details how to reproduce the
problem with GRE-in-UDP tunnel.

Add a check on the network header size in skb_csum_hwoffload_help() to
make sure no IPv6 packet with extension header is handed to a network
device with NETIF_F_IPV6_CSUM capability.

Link: https://lore.kernel.org/netdev/26548921.1r3eYUQgxm@benoit.monin/T/#u
Fixes: aa3463d65e7b ("fou: Add encap ops for IPv6 tunnels")
Signed-off-by: Benoît Monin <benoit.monin@gmx.fr>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/5fbeecfc311ea182aa1d1c771725ab8b4cac515e.1729778144.git.benoit.monin@gmx.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'net-sparx5-add-support-for-lan969x-switch-device'
Jakub Kicinski [Thu, 31 Oct 2024 01:08:08 +0000 (18:08 -0700)]
Merge branch 'net-sparx5-add-support-for-lan969x-switch-device'

Daniel Machon says:

====================
net: sparx5: add support for lan969x switch device

== Description:

This series is the second of a multi-part series, that prepares and adds
support for the new lan969x switch driver.

The upstreaming efforts is split into multiple series (might change a
bit as we go along):

        1) Prepare the Sparx5 driver for lan969x (merged)

    --> 2) add support lan969x (same basic features as Sparx5
           provides excl. FDMA and VCAP).

        3) Add support for lan969x VCAP, FDMA and RGMII

== Lan969x in short:

The lan969x Ethernet switch family [1] provides a rich set of
switching features and port configurations (up to 30 ports) from 10Mbps
to 10Gbps, with support for RGMII, SGMII, QSGMII, USGMII, and USXGMII,
ideal for industrial & process automation infrastructure applications,
transport, grid automation, power substation automation, and ring &
intra-ring topologies. The LAN969x family is hardware and software
compatible and scalable supporting 46Gbps to 102Gbps switch bandwidths.

== Preparing Sparx5 for lan969x:

The main preparation work for lan969x has already been merged [1].

After this series is applied, lan969x will have the same functionality
as Sparx5, except for VCAP and FDMA support. QoS features that requires
the VCAP (e.g. PSFP, port mirroring) will obviously not work until VCAP
support is added later.

== Patch breakdown:

Patch #1-#4  do some preparation work for lan969x

Patch #5     adds new registers required by lan969x

Patch #6     adds initial match data for all lan969x targets

Patch #7     defines the lan969x register differences

Patch #8     adds lan969x constants to match data

Patch #9     adds some lan969x ops in bulk

Patch #10    adds PTP function to ops

Patch #11    adds lan969x_calendar.c for calculating the calendar

Patch #12    makes additional use of the is_sparx5() macro to branch out
             in certain places.

Patch #13    documents lan969x in the dt-bindings

Patch #14    adds lan969x compatible string to sparx5 driver

Patch #15    introduces new concept of per-target features

[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-0-d3290f581663@microchip.com/

v1: https://lore.kernel.org/20241021-sparx5-lan969x-switch-driver-2-v1-0-c8c49ef21e0f@microchip.com
====================

Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-0-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sparx5: add feature support
Daniel Machon [Wed, 23 Oct 2024 22:01:34 +0000 (00:01 +0200)]
net: sparx5: add feature support

Lan969x supports a number of different features, depending on the
target. Add new field sparx5->features and initialize the features based
on the target. Also add the function sparx5_has_feature() and use it
throughout. For now, we only need to handle features: PSFP and PTP -
more will come in the future.

[1] https://www.microchip.com/en-us/product/lan9698

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-15-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sparx5: add compatible string for lan969x
Daniel Machon [Wed, 23 Oct 2024 22:01:33 +0000 (00:01 +0200)]
net: sparx5: add compatible string for lan969x

Add lan9691-switch compatible string to mchp_sparx5_match. Guard it with
IS_ENABLED(CONFIG_LAN969X_SWITCH) to make sure Sparx5 can be compiled on
its own.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-14-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodt-bindings: net: add compatible strings for lan969x targets
Daniel Machon [Wed, 23 Oct 2024 22:01:32 +0000 (00:01 +0200)]
dt-bindings: net: add compatible strings for lan969x targets

Add compatible strings for the twelve different lan969x targets that we
support. Either a sparx5-switch or lan9691-switch compatible string
provided on their own, or any lan969x-switch compatible string with a
fallback to lan9691-switch.

Also, add myself as a maintainer.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-13-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sparx5: use is_sparx5() macro throughout
Daniel Machon [Wed, 23 Oct 2024 22:01:31 +0000 (00:01 +0200)]
net: sparx5: use is_sparx5() macro throughout

Use the is_sparx5() macro (introduced in earlier series [1]), in places
where we need to handle things a bit differently on lan969x.

These places are:

    - in sparx5_dsm_calendar_update() we need to switch the calendar
      from a to b on lan969x.

    - in sparx5_start() we need to make sure the HSCH_SYS_CLK_PER
      register is only touched on Sparx5.

    - in sparx5_start() we need to disable VCAP and FDMA for lan969x
      (will come in later series).

    - in sparx5_mirror_port_get() we must make sure the
      ANA_AC_PROBE_PORT_CFG1 register is only read on Sparx5.

    - sparx5_netdev.c and sparx5_packet.c we need to use different IFH
      (Internal Frame Header) offsets for lan969x.

    - in sparx5_port_fifo_sz() we must bail out on lan969x.

    - in sparx5_port_config_low_set() we must configure the phase
      detection registers.

    - in sparx5_port_config() and sparx5_port_init() we must do some
      additional configuration of the port devices.

    - in sparx5_dwrr_conf_set() we must derive the scheduling layer

[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-8-d3290f581663@microchip.com/

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-12-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: lan969x: add function for calculating the DSM calendar
Daniel Machon [Wed, 23 Oct 2024 22:01:30 +0000 (00:01 +0200)]
net: lan969x: add function for calculating the DSM calendar

Lan969x has support for RedBox / HSR / PRP (not implemented yet). In
order to accommodate for this in the future, we need to give lan969x it's
own function for calculating the DSM calendar.

The function calculates the calendar for each taxi bus. The calendar is
used for bandwidth allocation towards the ports attached to the taxi
bus. A calendar configuration consists of up-to 64 slots, which may be
allocated to ports or left unused. Each slot accounts for 1 clock cycle.

Also expose sparx5_cal_speed_to_value(), sparx5_get_port_cal_speed,
sparx5_cal_bw and SPX5_DSM_CAL_EMPTY for use by lan969x.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-11-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: lan969x: add PTP handler function
Daniel Machon [Wed, 23 Oct 2024 22:01:29 +0000 (00:01 +0200)]
net: lan969x: add PTP handler function

Add PTP IRQ handler for lan969x. This is required, as the PTP registers
are placed in two different targets on Sparx5 and lan969x. The
implementation is otherwise the same as on Sparx5.

Also, expose sparx5_get_hwtimestamp() for use by lan969x.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-10-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: lan969x: add lan969x ops to match data
Daniel Machon [Wed, 23 Oct 2024 22:01:28 +0000 (00:01 +0200)]
net: lan969x: add lan969x ops to match data

Add a bunch of small lan969x ops in bulk. These ops are explained in
detail in a previous series [1].

[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-8-d3290f581663@microchip.com/

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-9-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: lan969x: add constants to match data
Daniel Machon [Wed, 23 Oct 2024 22:01:27 +0000 (00:01 +0200)]
net: lan969x: add constants to match data

Add the lan969x constants to match data. These are already used
throughout the Sparx5 code (introduced in earlier series [1]), so no
need to update any code use.

[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-0-d3290f581663@microchip.com/

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-8-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: lan969x: add register diffs to match data
Daniel Machon [Wed, 23 Oct 2024 22:01:26 +0000 (00:01 +0200)]
net: lan969x: add register diffs to match data

Add new file lan969x_regs.c that defines all the register differences
for lan969x, and add it to the lan969x match data.

GW_DEV2G5_PHASE_DETECTOR_CTRL, FP_DEV2G5_PHAD_CTRL_PHAD_ENA and
FP_DEV2G5_PHAD_CTRL_PHAD_FAILED are required by the new register macros
which was introduced earlier. Add these for Sparx5 also.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-7-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: lan969x: add match data for lan969x
Daniel Machon [Wed, 23 Oct 2024 22:01:25 +0000 (00:01 +0200)]
net: lan969x: add match data for lan969x

Add match data for lan969x, with initial fields for iomap, iomap_size
and ioranges. Add new Kconfig symbol CONFIG_LAN969X_CONFIG for compiling
the lan969x driver.

It has been decided to give lan969x its own Kconfig symbol, as a
considerable amount of code is needed, beside the Sparx5 code, to add
full chip support (and more will be added in future series). Also this
makes it possible to compile Sparx5 without lan969x.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-6-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sparx5: add registers required by lan969x
Daniel Machon [Wed, 23 Oct 2024 22:01:24 +0000 (00:01 +0200)]
net: sparx5: add registers required by lan969x

Lan969x will require a few additional registers for certain operations.
Some are shared, some are not. Add these.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-5-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sparx5: add sparx5 context pointer to a few functions
Daniel Machon [Wed, 23 Oct 2024 22:01:23 +0000 (00:01 +0200)]
net: sparx5: add sparx5 context pointer to a few functions

In preparation for lan969x, add the sparx5 context pointer to certain
IFH (Internal Frame Header) functions. This is required, as the
is_sparx5() function will be used here in a subsequent patch.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-4-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sparx5: change frequency calculation for SDLB's
Daniel Machon [Wed, 23 Oct 2024 22:01:22 +0000 (00:01 +0200)]
net: sparx5: change frequency calculation for SDLB's

In preparation for lan969x, rework the function that calculates the SDLB
(Service Dual Leacky Bucket) clock. This is required, as the
HSCH_SYS_CLK_PER register is Sparx5-exclusive. Instead derive the clock
from the core clock, using the sparx5_clk_period() function. The clock
stays the same before and after this patch, only now,
sparx5_sdlb_clk_hz_get() can be used for lan969x too.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-3-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sparx5: change spx5_wr to spx5_rmw in cal update()
Daniel Machon [Wed, 23 Oct 2024 22:01:21 +0000 (00:01 +0200)]
net: sparx5: change spx5_wr to spx5_rmw in cal update()

In preparation for lan969x, use spx5_rmw() for enabling the update of
the calendar. This is required to not overwrite the DSM_TAXI_CAL_CFG
register, as an additional write will be added before this one, in a
subsequent patch.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-2-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sparx5: add support for lan969x targets and core clock
Daniel Machon [Wed, 23 Oct 2024 22:01:20 +0000 (00:01 +0200)]
net: sparx5: add support for lan969x targets and core clock

In preparation for lan969x, add lan969x targets to
sparx5_target_chiptype and set the core clock frequency for these
throughout. Lan969x only supports a core clock frequency of 328MHz.

Also, set the policer update internal (pol_upd_int) matching the 328 MHz
frequency of the lan969x targets.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-1-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agogve: change to use page_pool_put_full_page when recycling pages
Harshitha Ramamurthy [Wed, 23 Oct 2024 22:11:41 +0000 (15:11 -0700)]
gve: change to use page_pool_put_full_page when recycling pages

The driver currently uses page_pool_put_page() to recycle
page pool pages. Since gve uses split pages, if the fragment
being recycled is not the last fragment in the page, there
is no dma sync operation. When the last fragment is recycled,
dma sync is performed by page pool infra according to the
value passed as dma_sync_size which right now is set to the
size of fragment.

But the correct thing to do is to dma sync the entire page when
the last fragment is recycled. Hence change to using
page_pool_put_full_page().

Link: https://lore.kernel.org/netdev/89d7ce83-cc1d-4791-87b5-6f7af29a031d@huawei.com/
Suggested-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Fixes: ebdfae0d377b ("gve: adopt page pool for DQ RDA mode")
Link: https://patch.msgid.link/20241023221141.3008011-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'refactoring-rvu-nic-driver'
Jakub Kicinski [Thu, 31 Oct 2024 00:50:39 +0000 (17:50 -0700)]
Merge branch 'refactoring-rvu-nic-driver'

Geetha sowjanya says:

====================
Refactoring RVU NIC driver

This is a preparation pathset for follow-up "Introducing RVU representors
driver" patches. The RVU representor driver creates representor netdev
of each rvu device when switch dev mode is enabled.

RVU representor and NIC have a similar set of HW resources(NIX_LF,RQ/SQ/CQ)
and implements a subset of NIC functionality.
This patch set groups hw resources and queue configuration code into single
API and export the existing functions so, that code can be shared between
NIC and representor drivers.

These patches are part of "Introduce RVU representors" patchset.
https://lore.kernel.org/all/ZsdJ-w00yCI4NQ8T@nanopsycho.orion/T/
As suggested by "Jiri Pirko", submitting as separate patchset.
====================

Link: https://patch.msgid.link/20241023161843.15543-1-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoocteontx2-pf: Move shared APIs to header file
Geetha sowjanya [Wed, 23 Oct 2024 16:18:43 +0000 (21:48 +0530)]
octeontx2-pf: Move shared APIs to header file

Move mbox, hw resources and interrupt configuration functions to common
header file. So, that they can be used later by the RVU representor driver.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-5-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoocteontx2-pf: Reuse PF max mtu value
Geetha sowjanya [Wed, 23 Oct 2024 16:18:42 +0000 (21:48 +0530)]
octeontx2-pf: Reuse PF max mtu value

Reuse the maximum support HW MTU value that is fetch during probe.
Instead of fetching through mbox each time mtu is changed as the
value is fixed for interface.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-4-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoocteontx2-pf: Add new APIs for queue memory alloc/free.
Geetha sowjanya [Wed, 23 Oct 2024 16:18:41 +0000 (21:48 +0530)]
octeontx2-pf: Add new APIs for queue memory alloc/free.

Group the queue(RX/TX/CQ) memory allocation and free code to single APIs.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-3-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoocteontx2-pf: Define common API for HW resources configuration
Geetha sowjanya [Wed, 23 Oct 2024 16:18:40 +0000 (21:48 +0530)]
octeontx2-pf: Define common API for HW resources configuration

Define new API "otx2_init_rsrc" and move the HW blocks
NIX/NPA resources configuration code under this API. So, that
it can be used by the RVU representor driver that has similar
resources of RVU NIC.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-2-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'mirroring-to-dsa-cpu-port'
Jakub Kicinski [Thu, 31 Oct 2024 00:33:57 +0000 (17:33 -0700)]
Merge branch 'mirroring-to-dsa-cpu-port'

Vladimir Oltean says:

====================
Mirroring to DSA CPU port

Users of the NXP LS1028A SoC (drivers/net/dsa/ocelot L2 switch inside)
have requested to mirror packets from the ingress of a switch port to
software. Both port-based and flow-based mirroring is required.

The simplest way I could come up with was to set up tc mirred actions
towards a dummy net_device, and make the offloading of that be accepted
by the driver. Currently, the pattern in drivers is to reject mirred
towards ports they don't know about, but I'm now permitting that,
precisely by mirroring "to the CPU".

For testers, this series depends on commit 34d35b4edbbe ("net/sched:
act_api: deny mismatched skip_sw/skip_hw flags for actions created by
classifiers") from net/main, which is absent from net-next as of the
day of posting (Oct 23). Without the bug fix it is possible to create
invalid configurations which are not rejected by the kernel.

v2:  https://lore.kernel.org/20241017165215.3709000-1-vladimir.oltean@nxp.com
RFC: https://lore.kernel.org/20240913152915.2981126-1-vladimir.oltean@nxp.com

For historical purposes, link to a much older (and much different) attempt:
https://lore.kernel.org/20191002233750.13566-1-olteanv@gmail.com
====================

Link: https://patch.msgid.link/20241023135251.1752488-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: mscc: ocelot: allow tc-flower mirred action towards foreign interfaces
Vladimir Oltean [Wed, 23 Oct 2024 13:52:51 +0000 (16:52 +0300)]
net: mscc: ocelot: allow tc-flower mirred action towards foreign interfaces

Debugging certain flows in the offloaded switch data path can be done by
installing two tc-mirred filters for mirroring: one in the hardware data
path, which copies the frames to the CPU, and one which takes the frame
from there and mirrors it to a virtual interface like a dummy device,
where it can be seen with tcpdump.

The effect of having 2 filters run on the same packet can be obtained by
default using tc, by not specifying either the 'skip_sw' or 'skip_hw'
keywords.

Instead of refusing to offload mirroring/redirecting packets towards
interfaces that aren't switch ports, just treat every other destination
for what it is: something that is handled in software, behind the CPU
port.

Usage:

$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress protocol ip flower action mirred ingress mirror dev dummy0

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: dsa: allow matchall mirroring rules towards the CPU
Vladimir Oltean [Wed, 23 Oct 2024 13:52:50 +0000 (16:52 +0300)]
net: dsa: allow matchall mirroring rules towards the CPU

If the CPU bandwidth capacity permits, it may be useful to mirror the
entire ingress of a user port to software.

This is in fact possible to express even if there is no net_device
representation for the CPU port. In fact, that approach was already
exhausted and that representation wouldn't have even helped [1].

The idea behind implementing this is that currently, we refuse to
offload any mirroring towards a non-DSA target net_device. But if we
acknowledge the fact that to reach any foreign net_device, the switch
must send the packet to the CPU anyway, then we can simply offload just
that part, and let the software do the rest. There is only one condition
we need to uphold: the filter needs to be present in the software data
path as well (no skip_sw).

There are 2 actions to consider: FLOW_ACTION_MIRRED (redirect to egress
of target interface) and FLOW_ACTION_MIRRED_INGRESS (redirect to ingress
of target interface). We don't have the ability/API to offload
FLOW_ACTION_MIRRED_INGRESS when the target port is also a DSA user port,
but we could also permit that through mirred to the CPU + software.

Example:

$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress matchall action mirred ingress mirror dev dummy0

Any DSA driver with a ds->ops->port_mirror_add() implementation can now
make use of this with no additional change.

[1] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: dsa: add more extack messages in dsa_user_add_cls_matchall_mirred()
Vladimir Oltean [Wed, 23 Oct 2024 13:52:49 +0000 (16:52 +0300)]
net: dsa: add more extack messages in dsa_user_add_cls_matchall_mirred()

Do not leave -EOPNOTSUPP errors without an explanation. It is confusing
for the user to figure out what is wrong otherwise.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: dsa: use "extack" as argument to flow_action_basic_hw_stats_check()
Vladimir Oltean [Wed, 23 Oct 2024 13:52:48 +0000 (16:52 +0300)]
net: dsa: use "extack" as argument to flow_action_basic_hw_stats_check()

We already have an "extack" stack variable in
dsa_user_add_cls_matchall_police() and
dsa_user_add_cls_matchall_mirred(), there is no need to retrieve
it again from cls->common.extack.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: dsa: clean up dsa_user_add_cls_matchall()
Vladimir Oltean [Wed, 23 Oct 2024 13:52:47 +0000 (16:52 +0300)]
net: dsa: clean up dsa_user_add_cls_matchall()

The body is a bit hard to read, hard to extend, and has duplicated
conditions.

Clean up the "if (many conditions) else if (many conditions, some
of them repeated)" pattern by:

- Moving the repeated conditions out
- Replacing the repeated tests for the same variable with a switch/case
- Moving the protocol check inside the dsa_user_add_cls_matchall_mirred()
  function call.

This is pure refactoring, no logic has been changed, though some tests
were reordered. The order does not matter - they are independent things
to be tested for.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sched: propagate "skip_sw" flag to struct flow_cls_common_offload
Vladimir Oltean [Wed, 23 Oct 2024 13:52:46 +0000 (16:52 +0300)]
net: sched: propagate "skip_sw" flag to struct flow_cls_common_offload

Background: switchdev ports offload the Linux bridge, and most of the
packets they handle will never see the CPU. The ports between which
there exists no hardware data path are considered 'foreign' to switchdev.
These can either be normal physical NICs without switchdev offload, or
incompatible switchdev ports, or virtual interfaces like veth/dummy/etc.

In some cases, an offloaded filter can only do half the work, and the
rest must be handled by software. Redirecting/mirroring from the ingress
of a switchdev port towards a foreign interface is one example of
combined hardware/software data path. The most that the switchdev port
can do is to extract the matching packets from its offloaded data path
and send them to the CPU. From there on, the software filter runs
(a second time, after the first run in hardware) on the packet and
performs the mirred action.

It makes sense for switchdev drivers which allow this kind of "half
offloading" to sense the "skip_sw" flag of the filter/action pair, and
deny attempts from the user to install a filter that does not run in
software, because that simply won't work.

In fact, a mirred action on a switchdev port towards a dummy interface
appears to be a valid way of (selectively) monitoring offloaded traffic
that flows through it. IFF_PROMISC was also discussed years ago, but
(despite initial disagreement) there seems to be consensus that this
flag should not affect the destination taken by packets, but merely
whether or not the NIC discards packets with unknown MAC DA for local
processing.

[1] https://lore.kernel.org/netdev/20190830092637.7f83d162@ceranb/
[2] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/netdev/ZxUo0Dc0M5Y6l9qF@shredder.mtl.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241023135251.1752488-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'ptp-driver-for-s390-clocks'
Jakub Kicinski [Thu, 31 Oct 2024 00:02:43 +0000 (17:02 -0700)]
Merge branch 'ptp-driver-for-s390-clocks'

Sven Schnelle says:

====================
PtP driver for s390 clocks

these patches add support for using the s390 physical and TOD clock as ptp
clock. To do so, the first patch adds a clock id to the s390 TOD clock,
while the second patch adds the PtP driver itself.
====================

Link: https://patch.msgid.link/20241023065601.449586-1-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agos390/time: Add PtP driver
Sven Schnelle [Wed, 23 Oct 2024 06:56:01 +0000 (08:56 +0200)]
s390/time: Add PtP driver

Add a small PtP driver which allows user space to get
the values of the physical and tod clock. This allows
programs like chrony to use STP as clock source and
steer the kernel clock. The physical clock can be used
as a debugging aid to get the clock without any additional
offsets like STP steering or LPAR offset.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://patch.msgid.link/20241023065601.449586-3-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agos390/time: Add clocksource id to TOD clock
Sven Schnelle [Wed, 23 Oct 2024 06:56:00 +0000 (08:56 +0200)]
s390/time: Add clocksource id to TOD clock

To allow specifying the clock source in the upcoming PtP driver,
add a clocksource ID to the s390 TOD clock.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://patch.msgid.link/20241023065601.449586-2-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotests: hsr: Increase timeout to 50 seconds
Yunshui Jiang [Mon, 28 Oct 2024 08:27:56 +0000 (16:27 +0800)]
tests: hsr: Increase timeout to 50 seconds

The HSR test, hsr_ping.sh, actually needs 7 min to run. Around 375s to
be exact, and even more on a debug kernel or kernel with other network
security limits. The timeout setting for the kselftest is currently 45
seconds, which is way too short to integrate hsr tests to run_kselftest
infrastructure. However, timeout of hundreds of seconds is quite a long
time, especially in a CI/CD environment. It seems that we need
accelerate the test and balance with timeout setting.

The most time-consuming func is do_ping_long, where ping command sends
10 packages to the given address. The default interval between two ping
packages is 1s according to the ping Mannual. There isn't any operation
between pings thus we could pass -i 0.1 to ping to make it 10 times
faster.

While even with this short interval, the test still need about 46.4
seconds to finish because of the two HSR interfaces, each of which is
tested by calling do_ping func 12 times and do_ping_long func 19 times
and sleep for 3s.

So, an explicit setting is also needed to slightly increase the
timeout. And to leave us some slack, use 50 as default timeout.

Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn>
Link: https://patch.msgid.link/20241028082757.2945232-1-jiangyunshui@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agox86/uaccess: Avoid barrier_nospec() in 64-bit copy_from_user()
Linus Torvalds [Wed, 30 Oct 2024 02:03:31 +0000 (16:03 -1000)]
x86/uaccess: Avoid barrier_nospec() in 64-bit copy_from_user()

The barrier_nospec() in 64-bit copy_from_user() is slow. Instead use
pointer masking to force the user pointer to all 1's for an invalid
address.

The kernel test robot reports a 2.6% improvement in the per_thread_ops
benchmark [1].

This is a variation on a patch originally by Josh Poimboeuf [2].

Link: https://lore.kernel.org/202410281344.d02c72a2-oliver.sang@intel.com
Link: https://lore.kernel.org/5b887fe4c580214900e21f6c61095adf9a142735.1730166635.git.jpoimboe@kernel.org
Tested-and-reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 months agoMerge tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of git://git.kernel.org/pub/scm...
Linus Torvalds [Wed, 30 Oct 2024 21:17:47 +0000 (11:17 -1000)]
Merge tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Update more header copies with the kernel sources, including const.h,
   msr-index.h, arm64's cputype.h, kvm's, bits.h and unaligned.h

 - The return from 'write' isn't a pid, fix cut'n'paste error in 'perf
   trace'

 - Fix up the python binding build on architectures without
   HAVE_KVM_STAT_SUPPORT

 - Add some more bounds checks to augmented_raw_syscalls.bpf.c (used to
   collect syscall pointer arguments in 'perf trace') to make the
   resulting bytecode to pass the kernel BPF verifier, allowing us to go
   back accepting clang 12.0.1 as the minimum version required for
   compiling BPF sources

 - Add __NR_capget for x86 to fix a regression on running perf + intel
   PT (hw tracing) as non-root setting up the capabilities as described
   in https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html

 - Fix missing syscalltbl in non-explicitly listed architectures,
   noticed on ARM 32-bit, that still needs a .tbl generator for the
   syscall id<->name tables, should be added for v6.13

 - Handle 'perf test' failure when handling broken DWARF for ASM files

* tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf cap: Add __NR_capget to arch/x86 unistd
  tools headers: Update the linux/unaligned.h copy with the kernel sources
  tools headers arm64: Sync arm64's cputype.h with the kernel sources
  tools headers: Synchronize {uapi/}linux/bits.h with the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT
  perf test: Handle perftool-testsuite_probe failure due to broken DWARF
  tools headers UAPI: Sync kvm headers with the kernel sources
  perf trace: Fix non-listed archs in the syscalltbl routines
  perf build: Change the clang check back to 12.0.1
  perf trace augmented_raw_syscalls: Add more checks to pass the verifier
  perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers
  perf trace: The return from 'write' isn't a pid
  tools headers UAPI: Sync linux/const.h with the kernel headers

11 months agoMerge branch 'fixes-for-bits-iterator'
Alexei Starovoitov [Wed, 30 Oct 2024 19:13:46 +0000 (12:13 -0700)]
Merge branch 'fixes-for-bits-iterator'

Hou Tao says:

====================
The patch set fixes several issues in bits iterator. Patch #1 fixes the
kmemleak problem of bits iterator. Patch #2~#3 fix the overflow problem
of nr_bits. Patch #4 fixes the potential stack corruption when bits
iterator is used on 32-bit host. Patch #5 adds more test cases for bits
iterator.

Please see the individual patches for more details. And comments are
always welcome.
---
v4:
 * patch #1: add ack from Yafang
 * patch #3: revert code-churn like changes:
   (1) compute nr_bytes and nr_bits before the check of nr_words.
   (2) use nr_bits == 64 to check for single u64, preventing build
       warning on 32-bit hosts.
 * patch #4: use "BITS_PER_LONG == 32" instead of "!defined(CONFIG_64BIT)"

v3: https://lore.kernel.org/bpf/20241025013233.804027-1-houtao@huaweicloud.com/T/#t
  * split the bits-iterator related patches from "Misc fixes for bpf"
    patch set
  * patch #1: use "!nr_bits || bits >= nr_bits" to stop the iteration
  * patch #2: add a new helper for the overflow problem
  * patch #3: decrease the limitation from 512 to 511 and check whether
    nr_bytes is too large for bpf memory allocator explicitly
  * patch #5: add two more test cases for bit iterator

v2: http://lore.kernel.org/bpf/d49fa2f4-f743-c763-7579-c3cab4dd88cb@huaweicloud.com
====================

Link: https://lore.kernel.org/r/20241030100516.3633640-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agoselftests/bpf: Add three test cases for bits_iter
Hou Tao [Wed, 30 Oct 2024 10:05:16 +0000 (18:05 +0800)]
selftests/bpf: Add three test cases for bits_iter

Add more test cases for bits iterator:

(1) huge word test
Verify the multiplication overflow of nr_bits in bits_iter. Without
the overflow check, when nr_words is 67108865, nr_bits becomes 64,
causing bpf_probe_read_kernel_common() to corrupt the stack.
(2) max word test
Verify correct handling of maximum nr_words value (511).
(3) bad word test
Verify early termination of bits iteration when bits iterator
initialization fails.

Also rename bits_nomem to bits_too_big to better reflect its purpose.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agobpf: Use __u64 to save the bits in bits iterator
Hou Tao [Wed, 30 Oct 2024 10:05:15 +0000 (18:05 +0800)]
bpf: Use __u64 to save the bits in bits iterator

On 32-bit hosts (e.g., arm32), when a bpf program passes a u64 to
bpf_iter_bits_new(), bpf_iter_bits_new() will use bits_copy to store the
content of the u64. However, bits_copy is only 4 bytes, leading to stack
corruption.

The straightforward solution would be to replace u64 with unsigned long
in bpf_iter_bits_new(). However, this introduces confusion and problems
for 32-bit hosts because the size of ulong in bpf program is 8 bytes,
but it is treated as 4-bytes after passed to bpf_iter_bits_new().

Fix it by changing the type of both bits and bit_count from unsigned
long to u64. However, the change is not enough. The main reason is that
bpf_iter_bits_next() uses find_next_bit() to find the next bit and the
pointer passed to find_next_bit() is an unsigned long pointer instead
of a u64 pointer. For 32-bit little-endian host, it is fine but it is
not the case for 32-bit big-endian host. Because under 32-bit big-endian
host, the first iterated unsigned long will be the bits 32-63 of the u64
instead of the expected bits 0-31. Therefore, in addition to changing
the type, swap the two unsigned longs within the u64 for 32-bit
big-endian host.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agobpf: Check the validity of nr_words in bpf_iter_bits_new()
Hou Tao [Wed, 30 Oct 2024 10:05:14 +0000 (18:05 +0800)]
bpf: Check the validity of nr_words in bpf_iter_bits_new()

Check the validity of nr_words in bpf_iter_bits_new(). Without this
check, when multiplication overflow occurs for nr_bits (e.g., when
nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).

Fix it by limiting the maximum value of nr_words to 511. The value is
derived from the current implementation of BPF memory allocator. To
ensure compatibility if the BPF memory allocator's size limitation
changes in the future, use the helper bpf_mem_alloc_check_size() to
check whether nr_bytes is too larger. And return -E2BIG instead of
-ENOMEM for oversized nr_bytes.

Fixes: 4665415975b0 ("bpf: Add bits iterator")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>