]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
2 years agotcp: add support for PLB in DCTCP
Mubashir Adnan Qureshi [Wed, 26 Oct 2022 13:51:13 +0000 (13:51 +0000)]
tcp: add support for PLB in DCTCP

PLB support is added to TCP DCTCP code. As DCTCP uses ECN as the
congestion signal, PLB also uses ECN to make decisions whether to change
the path or not upon sustained congestion.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: add PLB functionality for TCP
Mubashir Adnan Qureshi [Wed, 26 Oct 2022 13:51:12 +0000 (13:51 +0000)]
tcp: add PLB functionality for TCP

Congestion control algorithms track PLB state and cause the connection
to trigger a path change when either of the 2 conditions is satisfied:

- No packets are in flight and (# consecutive congested rounds >=
  sysctl_tcp_plb_idle_rehash_rounds)
- (# consecutive congested rounds >= sysctl_tcp_plb_rehash_rounds)

A round (RTT) is marked as congested when congestion signal
(ECN ce_ratio) over an RTT is greater than sysctl_tcp_plb_cong_thresh.
In the event of RTO, PLB (via tcp_write_timeout()) triggers a path
change and disables congestion-triggered path changes for random time
between (sysctl_tcp_plb_suspend_rto_sec, 2*sysctl_tcp_plb_suspend_rto_sec)
to avoid hopping onto the "connectivity blackhole". RTO-triggered
path changes can still happen during this cool-off period.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: add sysctls for TCP PLB parameters
Mubashir Adnan Qureshi [Wed, 26 Oct 2022 13:51:11 +0000 (13:51 +0000)]
tcp: add sysctls for TCP PLB parameters

PLB (Protective Load Balancing) is a host based mechanism for load
balancing across switch links. It leverages congestion signals(e.g. ECN)
from transport layer to randomly change the path of the connection
experiencing congestion. PLB changes the path of the connection by
changing the outgoing IPv6 flow label for IPv6 connections (implemented
in Linux by calling sk_rethink_txhash()). Because of this implementation
mechanism, PLB can currently only work for IPv6 traffic. For more
information, see the SIGCOMM 2022 paper:
  https://doi.org/10.1145/3544216.3544226

This commit adds new sysctl knobs and sets their default values for
TCP PLB.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mxl-gpy-MDI-X'
David S. Miller [Fri, 28 Oct 2022 09:35:51 +0000 (10:35 +0100)]
Merge branch 'mxl-gpy-MDI-X'

Raju Lakkaraju says:

====================
net: phy: mxl-gpy: Add MDI-X

This patch series add the MDI-X feature to GPY211 PHYs and
Also Change return type to gpy_update_interface() function
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: mxl-gpy: Add PHY Auto/MDI/MDI-X set driver for GPY211 chips
Raju Lakkaraju [Wed, 26 Oct 2022 05:59:18 +0000 (11:29 +0530)]
net: phy: mxl-gpy: Add PHY Auto/MDI/MDI-X set driver for GPY211 chips

Add support for MDI-X status and configuration for GPY211 chips

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: mxl-gpy: Change gpy_update_interface() function return type
Raju Lakkaraju [Wed, 26 Oct 2022 05:59:17 +0000 (11:29 +0530)]
net: phy: mxl-gpy: Change gpy_update_interface() function return type

gpy_update_interface() is called from gpy_read_status() which does
return error codes. gpy_read_status() would benefit from returning
-EINVAL, etc.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Jakub Kicinski [Fri, 28 Oct 2022 03:41:04 +0000 (20:41 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) Move struct nft_payload_set definition to .c file where it is
   only used.

2) Shrink transport and inner header offset fields in the nft_pktinfo
   structure to 16-bits, from Florian Westphal.

3) Get rid of nft_objref Kbuild toggle, make it built-in into
   nf_tables. This expression is used to instantiate conntrack helpers
   in nftables. After removing the conntrack helper auto-assignment
   toggle it this feature became more important so move it to the nf_tables
   core module. Also from Florian.

4) Extend the existing function to calculate payload inner header offset
   to deal with the GRE and IPIP transport protocols.

6) Add inner expression support for nf_tables. This new expression
   provides a packet parser for tunneled packets which uses a userspace
   description of the expected inner headers. The inner expression
   invokes the payload expression (via direct call) to match on the
   inner header protocol fields using the inner link, network and
   transport header offsets.

   An example of the bytecode generated from userspace to match on
   IP source encapsulated in a VxLAN packet:

   # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4
     netdev x y
       [ meta load l4proto => reg 1 ]
       [ cmp eq reg 1 0x00000011 ]
       [ payload load 2b @ transport header + 2 => reg 1 ]
       [ cmp eq reg 1 0x0000b512 ]
       [ inner type vxlan hdrsize 8 flags f [ meta load protocol => reg 1 ] ]
       [ cmp eq reg 1 0x00000008 ]
       [ inner type vxlan hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ]
       [ cmp eq reg 1 0x04030201 ]

7) Store inner link, network and transport header offsets in percpu
   area to parse inner packet header once only. Matching on a different
   tunnel type invalidates existing offsets in the percpu area and it
   invokes the inner tunnel parser again.

8) Add support for inner meta matching. This support for
   NFTA_META_PROTOCOL, which specifies the inner ethertype, and
   NFT_META_L4PROTO, which specifies the inner transport protocol.

9) Extend nft_inner to parse GENEVE optional fields to calculate the
   link layer offset.

10) Update inner expression so tunnel offset points to GRE header
    to normalize tunnel header handling. This also allows to perform
    different interpretations of the GRE header from userspace.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nft_inner: set tunnel offset to GRE header offset
  netfilter: nft_inner: add geneve support
  netfilter: nft_meta: add inner match support
  netfilter: nft_inner: add percpu inner context
  netfilter: nft_inner: support for inner tunnel header matching
  netfilter: nft_payload: access ipip payload for inner offset
  netfilter: nft_payload: access GRE payload via inner offset
  netfilter: nft_objref: make it builtin
  netfilter: nf_tables: reduce nft_pktinfo by 8 bytes
  netfilter: nft_payload: move struct nft_payload_set definition where it belongs
====================

Link: https://lore.kernel.org/r/20221026132227.3287-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dpaa2-eth: Simplify bool conversion
Yang Li [Wed, 26 Oct 2022 05:18:24 +0000 (13:18 +0800)]
net: dpaa2-eth: Simplify bool conversion

./drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk.c:453:42-47: WARNING: conversion to bool not needed here

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2577
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221026051824.38730-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'ionic-vf-attr-replay-and-other-updates'
Jakub Kicinski [Fri, 28 Oct 2022 03:34:18 +0000 (20:34 -0700)]
Merge branch 'ionic-vf-attr-replay-and-other-updates'

Shannon Nelson says:

====================
ionic: VF attr replay and other updates

For better VF management when a FW update restart or a FW crash recover is
detected, the PF now will replay any user specified VF attributes to be
sure the FW hasn't lost them in the restart.

Newer FW offers more packet processing offloads, so we now support them in
the driver.

A small refactor of the Rx buffer fill cleans a bit of code and will help
future work on buffer caching.
====================

Link: https://lore.kernel.org/r/20221026143744.11598-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoionic: refactor use of ionic_rx_fill()
Neel Patel [Wed, 26 Oct 2022 14:37:44 +0000 (07:37 -0700)]
ionic: refactor use of ionic_rx_fill()

The same pre-work code is used before each call to
ionic_rx_fill(), so bring it in and make it a part of
the routine.

Signed-off-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoionic: enable tunnel offloads
Neel Patel [Wed, 26 Oct 2022 14:37:43 +0000 (07:37 -0700)]
ionic: enable tunnel offloads

Support stateless offloads for GRE, VXLAN, GENEVE, IPXIP4
and IPXIP6 when the FW supports them.

Signed-off-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoionic: new ionic device identity level and VF start control
Shannon Nelson [Wed, 26 Oct 2022 14:37:42 +0000 (07:37 -0700)]
ionic: new ionic device identity level and VF start control

A new ionic dev_cmd is added to the interface in ionic_if.h,
with a new capabilities field in the ionic device identity to
signal its availability in the FW.  The identity level code is
incremented to '2' to show support for this new capabilities
bitfield.

If the driver has indicated with the new identity level that
it has the VF_CTRL command, newer FW will wait for the start
command before starting the VFs after a FW update or crash
recovery.

This patch updates the driver to make use of the new VF start
control in fw_up path to be sure that the PF has set the user
attributes on the VF before the FW allows the VFs to restart.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoionic: only save the user set VF attributes
Shannon Nelson [Wed, 26 Oct 2022 14:37:41 +0000 (07:37 -0700)]
ionic: only save the user set VF attributes

Report the current FW values for the VF attributes, but don't
save the FW values locally, only save the vf attributes that
are given to us from the user.  This allows us to replay user
data, and doesn't end up confusing things like "who set the
mac address".

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoionic: replay VF attributes after fw crash recovery
Shannon Nelson [Wed, 26 Oct 2022 14:37:40 +0000 (07:37 -0700)]
ionic: replay VF attributes after fw crash recovery

The VF attributes that the user has set into the FW through
the PF can be lost over a FW crash recovery.  Much like we
already replay the PF mac/vlan filters, we now add a replay
in the recovery path to be sure the FW has the up-to-date
VF configurations.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 27 Oct 2022 22:37:56 +0000 (15:37 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
  2871edb32f46 ("can: kvaser_usb: Fix possible completions during init_completion")
  abb8670938b2 ("can: kvaser_usb_leaf: Ignore stale bus-off after start")
  8d21f5927ae6 ("can: kvaser_usb_leaf: Fix improved state not being reported")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 27 Oct 2022 20:36:59 +0000 (13:36 -0700)]
Merge tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from 802.15.4 (Zigbee et al).

  Current release - regressions:

   - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1

  Current release - new code bugs:

   - mptcp: fix abba deadlock on fastopen

   - eth: stmmac: rk3588: allow multiple gmac controllers in one system

  Previous releases - regressions:

   - ip: rework the fix for dflt addr selection for connected nexthop

   - net: couple more fixes for misinterpreting bits in struct page
     after the signature was added

  Previous releases - always broken:

   - ipv6: ensure sane device mtu in tunnels

   - openvswitch: switch from WARN to pr_warn on a user-triggerable path

   - ethtool: eeprom: fix null-deref on genl_info in dump

   - ieee802154: more return code fixes for corner cases in
     dgram_sendmsg

   - mac802154: fix link-quality-indicator recording

   - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack
     offload

   - eth: fec: limit register access on i.MX6UL

   - eth: bcm4908_enet: update TX stats after actual transmission

   - can: rcar_canfd: improve IRQ handling for RZ/G2L

  Misc:

   - genetlink: piggy back on the newly added resv_op_start to enforce
     more sanity checks on new commands"

* tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  net: enetc: survive memory pressure without crashing
  kcm: do not sense pfmemalloc status in kcm_sendpage()
  net: do not sense pfmemalloc status in skb_append_pagefrags()
  net/mlx5e: Fix macsec sci endianness at rx sa update
  net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function
  net/mlx5e: Fix macsec rx security association (SA) update/delete
  net/mlx5e: Fix macsec coverity issue at rx sa update
  net/mlx5: Fix crash during sync firmware reset
  net/mlx5: Update fw fatal reporter state on PCI handlers successful recover
  net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed
  net/mlx5e: TC, Reject forwarding from internal port to internal port
  net/mlx5: Fix possible use-after-free in async command interface
  net/mlx5: ASO, Create the ASO SQ with the correct timestamp format
  net/mlx5e: Update restore chain id for slow path packets
  net/mlx5e: Extend SKB room check to include PTP-SQ
  net/mlx5: DR, Fix matcher disconnect error flow
  net/mlx5: Wait for firmware to enable CRS before pci_restore_state
  net/mlx5e: Do not increment ESN when updating IPsec ESN state
  netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed
  netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed
  ...

2 years agoMerge tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Thu, 27 Oct 2022 20:16:36 +0000 (13:16 -0700)]
Merge tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fixes from Kees Cook:

 - Fix an ancient signal action copy race (Bernd Edlinger)

 - Fix a memory leak in ELF loader, when under memory pressure (Li
   Zetao)

* tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  fs/binfmt_elf: Fix memory leak in load_elf_binary()
  exec: Copy oldsighand->action under spin-lock

2 years agoMerge tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Thu, 27 Oct 2022 19:31:57 +0000 (12:31 -0700)]
Merge tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - Fix older Clang vs recent overflow KUnit test additions (Nick
   Desaulniers, Kees Cook)

 - Fix kern-doc visibility for overflow helpers (Kees Cook)

* tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  overflow: Refactor test skips for Clang-specific issues
  overflow: disable failing tests for older clang versions
  overflow: Fix kern-doc markup for functions

2 years agoMerge tag 'media/v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Thu, 27 Oct 2022 19:21:57 +0000 (12:21 -0700)]
Merge tag 'media/v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
 "A bunch of patches addressing issues in the vivid driver and adding
  new checks in V4L2 to validate the input parameters from some ioctls"

* tag 'media/v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: vivid.rst: loop_video is set on the capture devnode
  media: vivid: set num_in/outputs to 0 if not supported
  media: vivid: drop GFP_DMA32
  media: vivid: fix control handler mutex deadlock
  media: videodev2.h: V4L2_DV_BT_BLANKING_HEIGHT should check 'interlaced'
  media: v4l2-dv-timings: add sanity checks for blanking values
  media: vivid: dev->bitmap_cap wasn't freed in all cases
  media: vivid: s_fbuf: add more sanity checks

2 years agoMerge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Linus Torvalds [Thu, 27 Oct 2022 18:44:18 +0000 (11:44 -0700)]
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt fix from Eric Biggers:
 "Fix a memory leak that was introduced by a change that went into -rc1"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: fix keyring memory leak on mount failure

2 years agonet: enetc: survive memory pressure without crashing
Vladimir Oltean [Thu, 27 Oct 2022 18:29:25 +0000 (21:29 +0300)]
net: enetc: survive memory pressure without crashing

Under memory pressure, enetc_refill_rx_ring() may fail, and when called
during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not
checked for.

An extreme case of memory pressure will result in exactly zero buffers
being allocated for the RX ring, and in such a case it is expected that
hardware drops all RX packets due to lack of buffers.

This does not happen, because the reset-default value of the consumer
and produces index is 0, and this makes the ENETC think that all buffers
have been initialized and that it owns them (when in reality none were).

The hardware guide explains this best:

| Configure the receive ring producer index register RBaPIR with a value
| of 0. The producer index is initially configured by software but owned
| by hardware after the ring has been enabled. Hardware increments the
| index when a frame is received which may consume one or more BDs.
| Hardware is not allowed to increment the producer index to match the
| consumer index since it is used to indicate an empty condition. The ring
| can hold at most RBLENR[LENGTH]-1 received BDs.
|
| Configure the receive ring consumer index register RBaCIR. The
| consumer index is owned by software and updated during operation of the
| of the BD ring by software, to indicate that any receive data occupied
| in the BD has been processed and it has been prepared for new data.
| - If consumer index and producer index are initialized to the same
|   value, it indicates that all BDs in the ring have been prepared and
|   hardware owns all of the entries.
| - If consumer index is initialized to producer index plus N, it would
|   indicate N BDs have been prepared. Note that hardware cannot start if
|   only a single buffer is prepared due to the restrictions described in
|   (2).
| - Software may write consumer index to match producer index anytime
|   while the ring is operational to indicate all received BDs prior have
|   been processed and new BDs prepared for hardware.

Normally, the value of rx_ring->rcir (consumer index) is brought in sync
with the rx_ring->next_to_use software index, but this only happens if
page allocation ever succeeded.

When PI==CI==0, the hardware appears to receive frames and write them to
DMA address 0x0 (?!), then set the READY bit in the BD.

The enetc_clean_rx_ring() function (and its XDP derivative) is naturally
not prepared to handle such a condition. It will attempt to process
those frames using the rx_swbd structure associated with index i of the
RX ring, but that structure is not fully initialized (enetc_new_page()
does all of that). So what happens next is undefined behavior.

To operate using no buffer, we must initialize the CI to PI + 1, which
will block the hardware from advancing the CI any further, and drop
everything.

The issue was seen while adding support for zero-copy AF_XDP sockets,
where buffer memory comes from user space, which can even decide to
supply no buffers at all (example: "xdpsock --txonly"). However, the bug
is present also with the network stack code, even though it would take a
very determined person to trigger a page allocation failure at the
perfect time (a series of ifup/ifdown under memory pressure should
eventually reproduce it given enough retries).

Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20221027182925.3256653-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agokcm: do not sense pfmemalloc status in kcm_sendpage()
Eric Dumazet [Thu, 27 Oct 2022 04:06:37 +0000 (04:06 +0000)]
kcm: do not sense pfmemalloc status in kcm_sendpage()

Similar to changes done in TCP in blamed commit.
We should not sense pfmemalloc status in sendpage() methods.

Fixes: 326140063946 ("tcp: TX zerocopy should not sense pfmemalloc status")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221027040637.1107703-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: do not sense pfmemalloc status in skb_append_pagefrags()
Eric Dumazet [Thu, 27 Oct 2022 04:03:46 +0000 (04:03 +0000)]
net: do not sense pfmemalloc status in skb_append_pagefrags()

skb_append_pagefrags() is used by af_unix and udp sendpage()
implementation so far.

In commit 326140063946 ("tcp: TX zerocopy should not sense
pfmemalloc status") we explained why we should not sense
pfmemalloc status for pages owned by user space.

We should also use skb_fill_page_desc_noacc()
in skb_append_pagefrags() to avoid following KCSAN report:

BUG: KCSAN: data-race in lru_add_fn / skb_append_pagefrags

write to 0xffffea00058fc1c8 of 8 bytes by task 17319 on cpu 0:
__list_add include/linux/list.h:73 [inline]
list_add include/linux/list.h:88 [inline]
lruvec_add_folio include/linux/mm_inline.h:323 [inline]
lru_add_fn+0x327/0x410 mm/swap.c:228
folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246
lru_add_drain_cpu+0x73/0x250 mm/swap.c:669
lru_add_drain+0x21/0x60 mm/swap.c:773
free_pages_and_swap_cache+0x16/0x70 mm/swap_state.c:311
tlb_batch_pages_flush mm/mmu_gather.c:59 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:256 [inline]
tlb_flush_mmu+0x5b2/0x640 mm/mmu_gather.c:263
tlb_finish_mmu+0x86/0x100 mm/mmu_gather.c:363
exit_mmap+0x190/0x4d0 mm/mmap.c:3098
__mmput+0x27/0x1b0 kernel/fork.c:1185
mmput+0x3d/0x50 kernel/fork.c:1207
copy_process+0x19fc/0x2100 kernel/fork.c:2518
kernel_clone+0x166/0x550 kernel/fork.c:2671
__do_sys_clone kernel/fork.c:2812 [inline]
__se_sys_clone kernel/fork.c:2796 [inline]
__x64_sys_clone+0xc3/0xf0 kernel/fork.c:2796
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffffea00058fc1c8 of 8 bytes by task 17325 on cpu 1:
page_is_pfmemalloc include/linux/mm.h:1817 [inline]
__skb_fill_page_desc include/linux/skbuff.h:2432 [inline]
skb_fill_page_desc include/linux/skbuff.h:2453 [inline]
skb_append_pagefrags+0x210/0x600 net/core/skbuff.c:3974
unix_stream_sendpage+0x45e/0x990 net/unix/af_unix.c:2338
kernel_sendpage+0x184/0x300 net/socket.c:3561
sock_sendpage+0x5a/0x70 net/socket.c:1054
pipe_to_sendpage+0x128/0x160 fs/splice.c:361
splice_from_pipe_feed fs/splice.c:415 [inline]
__splice_from_pipe+0x222/0x4d0 fs/splice.c:559
splice_from_pipe fs/splice.c:594 [inline]
generic_splice_sendpage+0x89/0xc0 fs/splice.c:743
do_splice_from fs/splice.c:764 [inline]
direct_splice_actor+0x80/0xa0 fs/splice.c:931
splice_direct_to_actor+0x305/0x620 fs/splice.c:886
do_splice_direct+0xfb/0x180 fs/splice.c:974
do_sendfile+0x3bf/0x910 fs/read_write.c:1255
__do_sys_sendfile64 fs/read_write.c:1323 [inline]
__se_sys_sendfile64 fs/read_write.c:1309 [inline]
__x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1309
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x0000000000000000 -> 0xffffea00058fc188

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 17325 Comm: syz-executor.0 Not tainted 6.1.0-rc1-syzkaller-00158-g440b7895c990-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022

Fixes: 326140063946 ("tcp: TX zerocopy should not sense pfmemalloc status")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221027040346.1104204-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: Fix macsec sci endianness at rx sa update
Raed Salem [Wed, 26 Oct 2022 13:51:53 +0000 (14:51 +0100)]
net/mlx5e: Fix macsec sci endianness at rx sa update

The cited commit at rx sa update operation passes the sci object
attribute, in the wrong endianness and not as expected by the HW
effectively create malformed hw sa context in case of update rx sa
consequently, HW produces unexpected MACsec packets which uses this
sa.

Fix by passing sci to create macsec object with the correct endianness,
while at it add __force u64 to prevent sparse check error of type
"sparse: error: incorrect type in assignment".

Fixes: aae3454e4d4c ("net/mlx5e: Add MACsec offload Rx command support")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-16-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function
Raed Salem [Wed, 26 Oct 2022 13:51:52 +0000 (14:51 +0100)]
net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function

The cited commit produces a sparse check error of type
"sparse: error: restricted __be64 degrades to integer". The
offending line wrongly did a bitwise operation between two different
storage types one of 64 bit when the other smaller side is 16 bit
which caused the above sparse error, furthermore bitwise operation
usage here is wrong in the first place as the constant MACSEC_PORT_ES
is not a bitwise field.

Fix by using the right mask to get the lower 16 bit if the sci number,
and use comparison operator '==' instead of bitwise '&' operator.

Fixes: 3b20949cb21b ("net/mlx5e: Add MACsec RX steering rules")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-15-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: Fix macsec rx security association (SA) update/delete
Raed Salem [Wed, 26 Oct 2022 13:51:51 +0000 (14:51 +0100)]
net/mlx5e: Fix macsec rx security association (SA) update/delete

The cited commit adds the support for update/delete MACsec Rx SA,
naturally, these operations need to check if the SA in question exists
to update/delete the SA and return error code otherwise, however they
do just the opposite i.e. return with error if the SA exists

Fix by change the check to return error in case the SA in question does
not exist, adjust error message and code accordingly.

Fixes: aae3454e4d4c ("net/mlx5e: Add MACsec offload Rx command support")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-14-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: Fix macsec coverity issue at rx sa update
Raed Salem [Wed, 26 Oct 2022 13:51:50 +0000 (14:51 +0100)]
net/mlx5e: Fix macsec coverity issue at rx sa update

The cited commit at update rx sa operation passes object attributes
to MACsec object create function without initializing/setting all
attributes fields leaving some of them with garbage values, therefore
violating the implicit assumption at create object function, which
assumes that all input object attributes fields are set.

Fix by initializing the object attributes struct to zero, thus leaving
unset fields with the legal zero value.

Fixes: aae3454e4d4c ("net/mlx5e: Add MACsec offload Rx command support")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-13-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5: Fix crash during sync firmware reset
Suresh Devarakonda [Wed, 26 Oct 2022 13:51:49 +0000 (14:51 +0100)]
net/mlx5: Fix crash during sync firmware reset

When setting Bluefield to DPU NIC mode using mlxconfig tool +  sync
firmware reset flow, we run into scenario where the host was not
eswitch manager at the time of mlx5 driver load but becomes eswitch manager
after the sync firmware reset flow. This results in null pointer
access of mpfs structure during mac filter add. This change prevents null
pointer access but mpfs table entries will not be added.

Fixes: 5ec697446f46 ("net/mlx5: Add support for devlink reload action fw activate")
Signed-off-by: Suresh Devarakonda <ramad@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5: Update fw fatal reporter state on PCI handlers successful recover
Roy Novich [Wed, 26 Oct 2022 13:51:48 +0000 (14:51 +0100)]
net/mlx5: Update fw fatal reporter state on PCI handlers successful recover

Update devlink health fw fatal reporter state to "healthy" is needed by
strictly calling devlink_health_reporter_state_update() after recovery
was done by PCI error handler. This is needed when fw_fatal reporter was
triggered due to PCI error. Poll health is called and set reporter state
to error. Health recovery failed (since EEH didn't re-enable the PCI).
PCI handlers keep on recover flow and succeed later without devlink
acknowledgment. Fix this by adding devlink state update at the end of
the PCI handler recovery process.

Fixes: 6181e5cb752e ("devlink: add support for reporter recovery completion")
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed
Roi Dayan [Wed, 26 Oct 2022 13:51:47 +0000 (14:51 +0100)]
net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed

On multi table split the driver creates a new attr instance with
data being copied from prev attr instance zeroing action flags.
Also need to reset dests properties to avoid incorrect dests per attr.

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: TC, Reject forwarding from internal port to internal port
Ariel Levkovich [Wed, 26 Oct 2022 13:51:46 +0000 (14:51 +0100)]
net/mlx5e: TC, Reject forwarding from internal port to internal port

Reject TC rules that forward from internal port to internal port
as it is not supported.

This include rules that are explicitly have internal port as
the filter device as well as rules that apply on tunnel interfaces
as the route device for the tunnel interface can be an internal
port.

Fixes: 27484f7170ed ("net/mlx5e: Offload tc rules that redirect to ovs internal port")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5: Fix possible use-after-free in async command interface
Tariq Toukan [Wed, 26 Oct 2022 13:51:45 +0000 (14:51 +0100)]
net/mlx5: Fix possible use-after-free in async command interface

mlx5_cmd_cleanup_async_ctx should return only after all its callback
handlers were completed. Before this patch, the below race between
mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and
lead to a use-after-free:

1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e.
   elevated by 1, a single inflight callback).
2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1.
3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and
   is about to call wake_up().
4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns
   immediately as the condition (num_inflight == 0) holds.
5. mlx5_cmd_cleanup_async_ctx returns.
6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx
   object.
7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed
   object.

Fix it by syncing using a completion object. Mark it completed when
num_inflight reaches 0.

Trace:

BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270
Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0

CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl+0x57/0x7d
 print_report.cold+0x2d5/0x684
 ? do_raw_spin_lock+0x23d/0x270
 kasan_report+0xb1/0x1a0
 ? do_raw_spin_lock+0x23d/0x270
 do_raw_spin_lock+0x23d/0x270
 ? rwlock_bug.part.0+0x90/0x90
 ? __delete_object+0xb8/0x100
 ? lock_downgrade+0x6e0/0x6e0
 _raw_spin_lock_irqsave+0x43/0x60
 ? __wake_up_common_lock+0xb9/0x140
 __wake_up_common_lock+0xb9/0x140
 ? __wake_up_common+0x650/0x650
 ? destroy_tis_callback+0x53/0x70 [mlx5_core]
 ? kasan_set_track+0x21/0x30
 ? destroy_tis_callback+0x53/0x70 [mlx5_core]
 ? kfree+0x1ba/0x520
 ? do_raw_spin_unlock+0x54/0x220
 mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core]
 ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
 ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
 mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core]
 ? dump_command+0xcc0/0xcc0 [mlx5_core]
 ? lockdep_hardirqs_on_prepare+0x400/0x400
 ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
 cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
 atomic_notifier_call_chain+0xd7/0x1d0
 mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core]
 atomic_notifier_call_chain+0xd7/0x1d0
 ? irq_release+0x140/0x140 [mlx5_core]
 irq_int_handler+0x19/0x30 [mlx5_core]
 __handle_irq_event_percpu+0x1f2/0x620
 handle_irq_event+0xb2/0x1d0
 handle_edge_irq+0x21e/0xb00
 __common_interrupt+0x79/0x1a0
 common_interrupt+0x78/0xa0
 </IRQ>
 <TASK>
 asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0x42/0x60
Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00
RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242
RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110
RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc
RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3
R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005
R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000
 ? default_idle_call+0xcc/0x450
 default_idle_call+0xec/0x450
 do_idle+0x394/0x450
 ? arch_cpu_idle_exit+0x40/0x40
 ? do_idle+0x17/0x450
 cpu_startup_entry+0x19/0x20
 start_secondary+0x221/0x2b0
 ? set_cpu_sibling_map+0x2070/0x2070
 secondary_startup_64_no_verify+0xcd/0xdb
 </TASK>

Allocated by task 49502:
 kasan_save_stack+0x1e/0x40
 __kasan_kmalloc+0x81/0xa0
 kvmalloc_node+0x48/0xe0
 mlx5e_bulk_async_init+0x35/0x110 [mlx5_core]
 mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core]
 mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
 mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
 mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
 mlx5e_suspend+0xdb/0x140 [mlx5_core]
 mlx5e_remove+0x89/0x190 [mlx5_core]
 auxiliary_bus_remove+0x52/0x70
 device_release_driver_internal+0x40f/0x650
 driver_detach+0xc1/0x180
 bus_remove_driver+0x125/0x2f0
 auxiliary_driver_unregister+0x16/0x50
 mlx5e_cleanup+0x26/0x30 [mlx5_core]
 cleanup+0xc/0x4e [mlx5_core]
 __x64_sys_delete_module+0x2b5/0x450
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Freed by task 49502:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 kasan_set_free_info+0x20/0x30
 ____kasan_slab_free+0x11d/0x1b0
 kfree+0x1ba/0x520
 mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core]
 mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
 mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
 mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
 mlx5e_suspend+0xdb/0x140 [mlx5_core]
 mlx5e_remove+0x89/0x190 [mlx5_core]
 auxiliary_bus_remove+0x52/0x70
 device_release_driver_internal+0x40f/0x650
 driver_detach+0xc1/0x180
 bus_remove_driver+0x125/0x2f0
 auxiliary_driver_unregister+0x16/0x50
 mlx5e_cleanup+0x26/0x30 [mlx5_core]
 cleanup+0xc/0x4e [mlx5_core]
 __x64_sys_delete_module+0x2b5/0x450
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: e355477ed9e4 ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5: ASO, Create the ASO SQ with the correct timestamp format
Saeed Mahameed [Wed, 26 Oct 2022 13:51:44 +0000 (14:51 +0100)]
net/mlx5: ASO, Create the ASO SQ with the correct timestamp format

mlx5 SQs must select the timestamp format explicitly according to the
active clock mode, select the current active timestamp mode so ASO SQ create
will succeed.

This fixes the following error prints when trying to create ipsec ASO SQ
while the timestamp format is real time mode.

mlx5_cmd_out_err:778:(pid 34874): CREATE_SQ(0x904) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xd61c0b), err(-22)
mlx5_aso_create_sq:285:(pid 34874): Failed to open aso wq sq, err=-22
mlx5e_ipsec_init:436:(pid 34874): IPSec initialization failed, -22

Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-7-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: Update restore chain id for slow path packets
Paul Blakey [Wed, 26 Oct 2022 13:51:43 +0000 (14:51 +0100)]
net/mlx5e: Update restore chain id for slow path packets

Currently encap slow path rules just forward to software without
setting the chain id miss register, so driver doesn't restore
the chain, and packets hitting this rule will restart from tc chain
0 instead of continuing to the chain the encap rule was on.

Fix this by setting the chain id miss register to the chain id mapping.

Fixes: 8f1e0b97cc70 ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: Extend SKB room check to include PTP-SQ
Aya Levin [Wed, 26 Oct 2022 13:51:42 +0000 (14:51 +0100)]
net/mlx5e: Extend SKB room check to include PTP-SQ

When tx_port_ts is set, the driver diverts all UPD traffic over PTP port
to a dedicated PTP-SQ. The SKBs are cached until the wire-CQE arrives.
When the packet size is greater then MTU, the firmware might drop it and
the packet won't be transmitted to the wire, hence the wire-CQE won't
reach the driver. In this case the SKBs are accumulated in the SKB fifo.
Add room check to consider the PTP-SQ SKB fifo, when the SKB fifo is
full, driver stops the queue resulting in a TX timeout. Devlink
TX-reporter can recover from it.

Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5: DR, Fix matcher disconnect error flow
Rongwei Liu [Wed, 26 Oct 2022 13:51:41 +0000 (14:51 +0100)]
net/mlx5: DR, Fix matcher disconnect error flow

When 2nd flow rules arrives, it will merge together with the
1st one if matcher criteria is the same.

If merge fails, driver will rollback the merge contents, and
reject the 2nd rule. At rollback stage, matcher can't be
disconnected unconditionally, otherise the 1st rule can't be
hit anymore.

Add logic to check if the matcher should be disconnected or not.

Fixes: cc2295cd54e4 ("net/mlx5: DR, Improve steering for empty or RX/TX-only matchers")
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5: Wait for firmware to enable CRS before pci_restore_state
Moshe Shemesh [Wed, 26 Oct 2022 13:51:40 +0000 (14:51 +0100)]
net/mlx5: Wait for firmware to enable CRS before pci_restore_state

After firmware reset driver should verify firmware already enabled CRS
and became responsive to pci config cycles before restoring pci state.
Fix that by waiting till device_id is readable through PCI again.

Fixes: eabe8e5e88f5 ("net/mlx5: Handle sync reset now event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: Do not increment ESN when updating IPsec ESN state
Hyong Youb Kim [Wed, 26 Oct 2022 13:51:39 +0000 (14:51 +0100)]
net/mlx5e: Do not increment ESN when updating IPsec ESN state

An offloaded SA stops receiving after about 2^32 + replay_window
packets. For example, when SA reaches <seq-hi 0x1, seq 0x2c>, all
subsequent packets get dropped with SA-icv-failure (integrity_failed).

To reproduce the bug:
- ConnectX-6 Dx with crypto enabled (FW 22.30.1004)
- ipsec.conf:
  nic-offload = yes
  replay-window = 32
  esn = yes
  salifetime=24h
- Run netperf for a long time to send more than 2^32 packets
  netperf -H <device-under-test> -t TCP_STREAM -l 20000

When 2^32 + replay_window packets are received, the replay window
moves from the 2nd half of subspace (overlap=1) to the 1st half
(overlap=0). The driver then updates the 'esn' value in NIC
(i.e. seq_hi) as follows.

 seq_hi = xfrm_replay_seqhi(seq_bottom)
 new esn in NIC = seq_hi + 1

The +1 increment is wrong, as seq_hi already contains the correct
seq_hi. For example, when seq_hi=1, the driver actually tells NIC to
use seq_hi=2 (esn). This incorrect esn value causes all subsequent
packets to fail integrity checks (SA-icv-failure). So, do not
increment.

Fixes: cb01008390bb ("net/mlx5: IPSec, Add support for ESN")
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'fix-some-issues-in-netdevsim-driver'
Jakub Kicinski [Thu, 27 Oct 2022 17:47:31 +0000 (10:47 -0700)]
Merge branch 'fix-some-issues-in-netdevsim-driver'

Zhengchao Shao says:

====================
fix some issues in netdevsim driver

When strace tool is used to perform memory injection, memory leaks and
files not removed issues are found. Fix them.
====================

Link: https://lore.kernel.org/r/20221026014642.116261-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed
Zhengchao Shao [Wed, 26 Oct 2022 01:46:42 +0000 (09:46 +0800)]
netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed

Remove dir in nsim_dev_debugfs_init() when creating ports dir failed.
Otherwise, the netdevsim device will not be created next time. Kernel
reports an error: debugfs: Directory 'netdevsim1' with parent 'netdevsim'
already present!

Fixes: ab1d0cc004d7 ("netdevsim: change debugfs tree topology")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register(...
Zhengchao Shao [Wed, 26 Oct 2022 01:46:41 +0000 (09:46 +0800)]
netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed

If some items in nsim_dev_resources_register() fail, memory leak will
occur. The following is the memory leak information.

unreferenced object 0xffff888074c02600 (size 128):
  comm "echo", pid 8159, jiffies 4294945184 (age 493.530s)
  hex dump (first 32 bytes):
    40 47 ea 89 ff ff ff ff 01 00 00 00 00 00 00 00  @G..............
    ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
  backtrace:
    [<0000000011a31c98>] kmalloc_trace+0x22/0x60
    [<0000000027384c69>] devl_resource_register+0x144/0x4e0
    [<00000000a16db248>] nsim_drv_probe+0x37a/0x1260
    [<000000007d1f448c>] really_probe+0x20b/0xb10
    [<00000000c416848a>] __driver_probe_device+0x1b3/0x4a0
    [<00000000077e0351>] driver_probe_device+0x49/0x140
    [<0000000054f2465a>] __device_attach_driver+0x18c/0x2a0
    [<000000008538f359>] bus_for_each_drv+0x151/0x1d0
    [<0000000038e09747>] __device_attach+0x1c9/0x4e0
    [<00000000dd86e533>] bus_probe_device+0x1d5/0x280
    [<00000000839bea35>] device_add+0xae0/0x1cb0
    [<000000009c2abf46>] new_device_store+0x3b6/0x5f0
    [<00000000fb823d7f>] bus_attr_store+0x72/0xa0
    [<000000007acc4295>] sysfs_kf_write+0x106/0x160
    [<000000005f50cb4d>] kernfs_fop_write_iter+0x3a8/0x5a0
    [<0000000075eb41bf>] vfs_write+0x8f0/0xc80

Fixes: 37923ed6b8ce ("netdevsim: Add simple FIB resource controller via devlink")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetdevsim: fix memory leak in nsim_bus_dev_new()
Zhengchao Shao [Wed, 26 Oct 2022 01:54:05 +0000 (09:54 +0800)]
netdevsim: fix memory leak in nsim_bus_dev_new()

If device_register() failed in nsim_bus_dev_new(), the value of reference
in nsim_bus_dev->dev is 1. obj->name in nsim_bus_dev->dev will not be
released.

unreferenced object 0xffff88810352c480 (size 16):
  comm "echo", pid 5691, jiffies 4294945921 (age 133.270s)
  hex dump (first 16 bytes):
    6e 65 74 64 65 76 73 69 6d 31 00 00 00 00 00 00  netdevsim1......
  backtrace:
    [<000000005e2e5e26>] __kmalloc_node_track_caller+0x3a/0xb0
    [<0000000094ca4fc8>] kvasprintf+0xc3/0x160
    [<00000000aad09bcc>] kvasprintf_const+0x55/0x180
    [<000000009bac868d>] kobject_set_name_vargs+0x56/0x150
    [<000000007c1a5d70>] dev_set_name+0xbb/0xf0
    [<00000000ad0d126b>] device_add+0x1f8/0x1cb0
    [<00000000c222ae24>] new_device_store+0x3b6/0x5e0
    [<0000000043593421>] bus_attr_store+0x72/0xa0
    [<00000000cbb1833a>] sysfs_kf_write+0x106/0x160
    [<00000000d0dedb8a>] kernfs_fop_write_iter+0x3a8/0x5a0
    [<00000000770b66e2>] vfs_write+0x8f0/0xc80
    [<0000000078bb39be>] ksys_write+0x106/0x210
    [<00000000005e55a4>] do_syscall_64+0x35/0x80
    [<00000000eaa40bbc>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: 40e4fe4ce115 ("netdevsim: move device registration and related code to bus.c")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221026015405.128795-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'linux-can-fixes-for-6.1-20221027' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Thu, 27 Oct 2022 17:30:41 +0000 (10:30 -0700)]
Merge tag 'linux-can-fixes-for-6.1-20221027' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-10-27

Anssi Hannula fixes the use of the completions in the kvaser_usb
driver.

Biju Das contributes 2 patches for the rcar_canfd driver. A IRQ storm
that can be triggered by high CAN bus load and channel specific IRQ
handlers are fixed.

Yang Yingliang fixes the j1939 transport protocol by moving a
kfree_skb() out of a spin_lock_irqsave protected section.

* tag 'linux-can-fixes-for-6.1-20221027' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: j1939: transport: j1939_session_skb_drop_old(): spin_unlock_irqrestore() before kfree_skb()
  can: rcar_canfd: fix channel specific IRQ handling for RZ/G2L
  can: rcar_canfd: rcar_canfd_handle_global_receive(): fix IRQ storm on global FIFO receive
  can: kvaser_usb: Fix possible completions during init_completion
====================

Link: https://lore.kernel.org/r/20221027114356.1939821-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: broadcom: bcm4908_enet: update TX stats after actual transmission
Rafał Miłecki [Thu, 27 Oct 2022 11:24:30 +0000 (13:24 +0200)]
net: broadcom: bcm4908_enet: update TX stats after actual transmission

Queueing packets doesn't guarantee their transmission. Update TX stats
after hardware confirms consuming submitted data.

This also fixes a possible race and NULL dereference.
bcm4908_enet_start_xmit() could try to access skb after freeing it in
the bcm4908_enet_poll_tx().

Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: 4feffeadbcb2e ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221027112430.8696-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'ip-rework-the-fix-for-dflt-addr-selection-for-connected-nexthop'
Jakub Kicinski [Thu, 27 Oct 2022 17:16:56 +0000 (10:16 -0700)]
Merge branch 'ip-rework-the-fix-for-dflt-addr-selection-for-connected-nexthop'

Nicolas Dichtel says:

====================
ip: rework the fix for dflt addr selection for connected nexthop"

This series reworks the fix that is reverted in the second commit.
As Julian explained, nhc_scope is related to nhc_gw, it's not the scope of
the route.
====================

Link: https://lore.kernel.org/r/20221020100952.8748-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonh: fix scope used to find saddr when adding non gw nh
Nicolas Dichtel [Thu, 20 Oct 2022 10:09:52 +0000 (12:09 +0200)]
nh: fix scope used to find saddr when adding non gw nh

As explained by Julian, fib_nh_scope is related to fib_nh_gw4, but
fib_info_update_nhc_saddr() needs the scope of the route, which is
the scope "before" fib_nh_scope, ie fib_nh_scope - 1.

This patch fixes the problem described in commit 747c14307214 ("ip: fix
dflt addr selection for connected nexthop").

Fixes: 597cfe4fc339 ("nexthop: Add support for IPv4 nexthops")
Link: https://lore.kernel.org/netdev/6c8a44ba-c2d5-cdf-c5c7-5baf97cba38@ssi.bg/
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoRevert "ip: fix dflt addr selection for connected nexthop"
Nicolas Dichtel [Thu, 20 Oct 2022 10:09:51 +0000 (12:09 +0200)]
Revert "ip: fix dflt addr selection for connected nexthop"

This reverts commit 747c14307214b55dbd8250e1ab44cad8305756f1.

As explained by Julian, nhc_scope is related to nhc_gw, not to the route.
Revert the original patch. The initial problem is fixed differently in the
next commit.

Link: https://lore.kernel.org/netdev/6c8a44ba-c2d5-cdf-c5c7-5baf97cba38@ssi.bg/
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoRevert "ip: fix triggering of 'icmp redirect'"
Nicolas Dichtel [Thu, 20 Oct 2022 10:09:50 +0000 (12:09 +0200)]
Revert "ip: fix triggering of 'icmp redirect'"

This reverts commit eb55dc09b5dd040232d5de32812cc83001a23da6.

The patch that introduces this bug is reverted right after this one.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agogenetlink: limit the use of validation workarounds to old ops
Jakub Kicinski [Wed, 26 Oct 2022 00:15:24 +0000 (17:15 -0700)]
genetlink: limit the use of validation workarounds to old ops

During review of previous change another thing came up - we should
limit the use of validation workarounds to old commands.
Don't list the workarounds one by one, as we're rejecting all existing
ones. We can deal with the masking in the unlikely event that new flag
is added.

Link: https://lore.kernel.org/all/6ba9f727e555fd376623a298d5d305ad408c3d47.camel@sipsolutions.net/
Link: https://lore.kernel.org/r/20221026001524.1892202-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: bcmsysport: Indicate MAC is in charge of PHY PM
Florian Fainelli [Tue, 25 Oct 2022 23:42:01 +0000 (16:42 -0700)]
net: bcmsysport: Indicate MAC is in charge of PHY PM

Avoid the PHY library call unnecessarily into the suspend/resume
functions by setting phydev->mac_managed_pm to true. The SYSTEMPORT
driver essentially does exactly what mdio_bus_phy_resume() does by
calling phy_resume().

Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221025234201.2549360-1-f.fainelli@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoskbuff: Proactively round up to kmalloc bucket size
Kees Cook [Tue, 25 Oct 2022 22:39:35 +0000 (15:39 -0700)]
skbuff: Proactively round up to kmalloc bucket size

Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

This will allow for kernels built with CONFIG_UBSAN_BOUNDS or the
coming dynamic bounds checking under CONFIG_FORTIFY_SOURCE to gain
back the __alloc_size() hints that were temporarily reverted in commit
93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller")

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20221021234713.you.031-kees@kernel.org/
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221025223811.up.360-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'net-ipa-don-t-use-fixed-table-sizes'
Paolo Abeni [Thu, 27 Oct 2022 11:38:15 +0000 (13:38 +0200)]
Merge branch 'net-ipa-don-t-use-fixed-table-sizes'

Alex Elder says:

====================
net: ipa: don't use fixed table sizes

Currently, routing and filter tables are assumed to have a fixed
size for all platforms.  In fact, these tables can support many more
entries than what has been assumed; the only limitation is the size
of the IPA-resident memory regions that contain them.

This series rearranges things so that the size of the table is
determined from the memory region size defined in configuration
data, rather than assuming it is fixed.

This will required for IPA versions 5.0+, where the number of
entries in a routing table is larger.
====================

Link: https://lore.kernel.org/r/20221025195143.255934-1-elder@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: ipa: determine filter table size from memory region
Alex Elder [Tue, 25 Oct 2022 19:51:43 +0000 (14:51 -0500)]
net: ipa: determine filter table size from memory region

Currently we assume that any filter table contains a fixed number
of entries.  Like routing tables, the number of entries in a filter
table is limited only by the size of the IPA-local memory region
used to hold the table.

Stop assuming that a filter table has exactly 14 entries.  Instead,
determine the number of entries in a routing table by dividing its
memory region size by the size of an entry.  (Note that the first
"entry" in a filter table contains an endpoint bitmap.)

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: ipa: don't assume 8 modem routing table entries
Alex Elder [Tue, 25 Oct 2022 19:51:42 +0000 (14:51 -0500)]
net: ipa: don't assume 8 modem routing table entries

Currently all platforms are assumed allot 8 routing table entries
for use by the modem.  Instead, add a new configuration data entry
that defines the number of modem routing table entries, and record
that in the IPA structure.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: ipa: determine route table size from memory region
Alex Elder [Tue, 25 Oct 2022 19:51:41 +0000 (14:51 -0500)]
net: ipa: determine route table size from memory region

Currently we assume that any routing table contains a fixed number
of entries.  The number of entries in a routing table can actually
vary, depending only on the size of the IPA-local memory region used
to hold the table.

Stop assuming that a routing table has exactly 15 entries.  Instead,
determine the number of entries in a routing table by dividing its
memory region size by the size of an entry.

The number of entries is computed early, when ipa_table_mem_valid()
is called by ipa_table_init().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: ipa: record the route table size in the IPA structure
Alex Elder [Tue, 25 Oct 2022 19:51:40 +0000 (14:51 -0500)]
net: ipa: record the route table size in the IPA structure

The non-hashed routing tables for IPv4 and IPv6 will be the same
size.  And if supported, the hashed routing tables will be the same
size as the non-hashed tables.

Record the size (number of entries) of all routing tables in the IPA
structure.  For now, initialize this field using IPA_ROUTE_TABLE_MAX,
and just do so when the first route table is validated.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agocan: j1939: transport: j1939_session_skb_drop_old(): spin_unlock_irqrestore() before...
Yang Yingliang [Thu, 27 Oct 2022 09:12:37 +0000 (17:12 +0800)]
can: j1939: transport: j1939_session_skb_drop_old(): spin_unlock_irqrestore() before kfree_skb()

It is not allowed to call kfree_skb() from hardware interrupt context
or with interrupts being disabled. The skb is unlinked from the queue,
so it can be freed after spin_unlock_irqrestore().

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20221027091237.2290111-1-yangyingliang@huawei.com
Cc: stable@vger.kernel.org
[mkl: adjust subject]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agoeth: fealnx: delete the driver for Myson MTD-800
Jakub Kicinski [Tue, 25 Oct 2022 18:42:54 +0000 (11:42 -0700)]
eth: fealnx: delete the driver for Myson MTD-800

The git history for this driver seems to be completely
automated / tree wide changes. I can't find any boards
or systems which would use this chip. Google search
shows pictures of towel warmers and no networking products.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221025184254.1717982-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoice: Add support Flex RXD
Michal Jaron [Tue, 25 Oct 2022 16:12:52 +0000 (09:12 -0700)]
ice: Add support Flex RXD

Add new VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC flag, opcode
VIRTCHNL_OP_GET_SUPPORTED_RXDIDS and add member rxdid
in struct virtchnl_rxq_info to support AVF Flex RXD
extension.

Add support to allow VF to query flexible descriptor RXDIDs supported
by DDP package and configure Rx queues with selected RXDID for IAVF.

Add code to allow VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message to be
processed. Add necessary macros for registers.

Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Signed-off-by: Xu Ting <ting.xu@intel.com>
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20221025161252.1952939-1-jacob.e.keller@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: broadcom: bcm4908_enet: use build_skb()
Rafał Miłecki [Tue, 25 Oct 2022 13:22:45 +0000 (15:22 +0200)]
net: broadcom: bcm4908_enet: use build_skb()

RX code can be more efficient with the build_skb(). Allocating actual
SKB around eth packet buffer - right before passing it up - results in
a better cache usage.

Without RPS (echo 0 > rps_cpus) BCM4908 NAT masq performance "jumps"
between two speeds: ~900 Mbps and 940 Mbps (it's a 4 CPUs SoC). This
change bumps the lower speed from 905 Mb/s to 918 Mb/s (tested using
single stream iperf 2.0.5 traffic).

There are more optimizations to consider. One obvious to try is GRO
however as BCM4908 doesn't do hw csum is may actually lower performance.
Sometimes. Some early testing:

┌─────────────────────────────────┬─────────────────────┬────────────────────┐
│                                 │ netif_receive_skb() │ napi_gro_receive() │
├─────────────────────────────────┼─────────────────────┼────────────────────┤
│ netdev_alloc_skb()              │            905 Mb/s │           892 Mb/s │
│ napi_alloc_frag() + build_skb() │            918 Mb/s │           917 Mb/s │
└─────────────────────────────────┴─────────────────────┴────────────────────┘

Another ideas:
1. napi_build_skb()
2. skb_copy_from_linear_data() for small packets

Those need proper testing first though. That can be done later.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Link: https://lore.kernel.org/r/20221025132245.22871-1-zajec5@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: ehea: fix possible memory leak in ehea_register_port()
Yang Yingliang [Tue, 25 Oct 2022 13:00:11 +0000 (21:00 +0800)]
net: ehea: fix possible memory leak in ehea_register_port()

If of_device_register() returns error, the of node and the
name allocated in dev_set_name() is leaked, call put_device()
to give up the reference that was set in device_initialize(),
so that of node is put in logical_port_release() and the name
is freed in kobject_cleanup().

Fixes: 1acf2318dd13 ("ehea: dynamic add / remove port")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20221025130011.1071357-1-yangyingliang@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: dp83822: Print the SOR1 strap status
Fabio Estevam [Tue, 25 Oct 2022 12:01:09 +0000 (09:01 -0300)]
net: dp83822: Print the SOR1 strap status

During the bring-up of the Ethernet PHY, it is very useful to
see the bootstrap status information, as it can help identifying
hardware bootstrap mistakes.

Allow printing the SOR1 register, which contains the strap status
to ease the bring-up.

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221025120109.779337-1-festevam@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobond: Disable TLS features indication
Tariq Toukan [Tue, 25 Oct 2022 10:53:00 +0000 (13:53 +0300)]
bond: Disable TLS features indication

Bond agnostically interacts with TLS device-offload requests via the
.ndo_sk_get_lower_dev operation. Return value is true iff bond
guarantees fixed mapping between the TLS connection and a lower netdev.

Due to this nature, the bond TLS device offload features are not
explicitly controllable in the bond layer. As of today, these are
read-only values based on the evaluation of bond_sk_check().  However,
this indication might be incorrect and misleading, when the feature bits
are "fixed" by some dependency features.  For example,
NETIF_F_HW_TLS_TX/RX are forcefully cleared in case the corresponding
checksum offload is disabled. But in fact the bond ability to still
offload TLS connections to the lower device is not hurt.

This means that these bits can not be trusted, and hence better become
unused.

This patch revives some old discussion [1] and proposes a much simpler
solution: Clear the bond's TLS features bits. Everyone should stop
reading them.

[1] https://lore.kernel.org/netdev/20210526095747.22446-1-tariqt@nvidia.com/

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20221025105300.4718-1-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'openvswitch-syzbot-splat-fix-and-introduce-selftest'
Paolo Abeni [Thu, 27 Oct 2022 10:31:45 +0000 (12:31 +0200)]
Merge branch 'openvswitch-syzbot-splat-fix-and-introduce-selftest'

Aaron Conole says:

====================
openvswitch: syzbot splat fix and introduce selftest

Syzbot recently caught a splat when dropping features from
openvswitch datapaths that are in-use.  The WARN() call is
definitely too large a hammer for the situation, so change
to pr_warn.

Second patch in the series introduces a new selftest suite which
can help show that an issue is fixed.  This change might be
more suited to net-next tree, so it has been separated out
as an additional patch and can be either applied to either tree
based on preference.
====================

Link: https://lore.kernel.org/r/20221025105018.466157-1-aconole@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoselftests: add openvswitch selftest suite
Aaron Conole [Tue, 25 Oct 2022 10:50:18 +0000 (06:50 -0400)]
selftests: add openvswitch selftest suite

Previous commit resolves a WARN splat that can be difficult to reproduce,
but with the ovs-dpctl.py utility, it can be trivial.  Introduce a test
case which creates a DP, and then downgrades the feature set.  This will
include a utility 'ovs-dpctl.py' that can be extended to do additional
tests and diagnostics.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoopenvswitch: switch from WARN to pr_warn
Aaron Conole [Tue, 25 Oct 2022 10:50:17 +0000 (06:50 -0400)]
openvswitch: switch from WARN to pr_warn

As noted by Paolo Abeni, pr_warn doesn't generate any splat and can still
preserve the warning to the user that feature downgrade occurred.  We
likely cannot introduce other kinds of checks / enforcement here because
syzbot can generate different genl versions to the datapath.

Reported-by: syzbot+31cde0bef4bbf8ba2d86@syzkaller.appspotmail.com
Fixes: 44da5ae5fbea ("openvswitch: Drop user features if old user space attempted to create datapath")
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: stmmac: remove duplicate dma queue channel macros
Junxiao Chang [Tue, 25 Oct 2022 08:17:47 +0000 (16:17 +0800)]
net: stmmac: remove duplicate dma queue channel macros

It doesn't need extra macros for queue 0 & 4. Same macro could
be used for all 8 queues. Related queue/channel functions could
be combined together.

Original macro which has two same parameters is unsafe macro and
might have potential side effects. Each MTL RxQ DMA channel mask
is 4 bits, so using (0xf << chan) instead of GENMASK(x + 3, x) to
avoid unsafe macro.

Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
Link: https://lore.kernel.org/r/20221025081747.1884926-1-junxiao.chang@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge patch series "R-Car CAN-FD fixes"
Marc Kleine-Budde [Thu, 27 Oct 2022 07:34:24 +0000 (09:34 +0200)]
Merge patch series "R-Car CAN-FD fixes"

Biju Das <biju.das.jz@bp.renesas.com> says:

This patch series fixes the below issues in R-Car CAN-FD driver.

1) Race condition in CAN driver under heavy CAN load condition
   with both channels enabled results in IRQ storm on global FIFO
   receive IRQ line.
2) Add channel specific TX interrupts handling for RZ/G2L SoC as it has
   separate IRQ lines for each TX.

changes since v1: https://lore.kernel.org/all/20221022081503.1051257-1-biju.das.jz@bp.renesas.com
 * Added check for IRQ active and enabled before handling the IRQ on a
   particular channel.

Link: https://lore.kernel.org/all/20221025155657.1426948-1-biju.das.jz@bp.renesas.com
[mkl: adjust message, add link, take only patches 1 + 2, upstream 3 via can-next]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: rcar_canfd: fix channel specific IRQ handling for RZ/G2L
Biju Das [Tue, 25 Oct 2022 15:56:56 +0000 (16:56 +0100)]
can: rcar_canfd: fix channel specific IRQ handling for RZ/G2L

RZ/G2L has separate channel specific IRQs for transmit and error
interrupts. But the IRQ handler processes both channels, even if there
no interrupt occurred on one of the channels.

This patch fixes the issue by passing a channel specific context
parameter instead of global one for the IRQ register and the IRQ
handler, it just handles the channel which is triggered the interrupt.

Fixes: 76e9353a80e9 ("can: rcar_canfd: Add support for RZ/G2L family")
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://lore.kernel.org/all/20221025155657.1426948-3-biju.das.jz@bp.renesas.com
Cc: stable@vger.kernel.org
[mkl: adjust commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: rcar_canfd: rcar_canfd_handle_global_receive(): fix IRQ storm on global FIFO...
Biju Das [Tue, 25 Oct 2022 15:56:55 +0000 (16:56 +0100)]
can: rcar_canfd: rcar_canfd_handle_global_receive(): fix IRQ storm on global FIFO receive

We are seeing an IRQ storm on the global receive IRQ line under heavy
CAN bus load conditions with both CAN channels enabled.

Conditions:

The global receive IRQ line is shared between can0 and can1, either of
the channels can trigger interrupt while the other channel's IRQ line
is disabled (RFIE).

When global a receive IRQ interrupt occurs, we mask the interrupt in
the IRQ handler. Clearing and unmasking of the interrupt is happening
in rx_poll(). There is a race condition where rx_poll() unmasks the
interrupt, but the next IRQ handler does not mask the IRQ due to
NAPIF_STATE_MISSED flag (e.g.: can0 RX FIFO interrupt is disabled and
can1 is triggering RX interrupt, the delay in rx_poll() processing
results in setting NAPIF_STATE_MISSED flag) leading to an IRQ storm.

This patch fixes the issue by checking IRQ active and enabled before
handling the IRQ on a particular channel.

Fixes: dd3bd23eb438 ("can: rcar_canfd: Add Renesas R-Car CAN FD driver")
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://lore.kernel.org/all/20221025155657.1426948-2-biju.das.jz@bp.renesas.com
Cc: stable@vger.kernel.org
[mkl: adjust commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: kvaser_usb: Fix possible completions during init_completion
Anssi Hannula [Mon, 10 Oct 2022 18:52:27 +0000 (20:52 +0200)]
can: kvaser_usb: Fix possible completions during init_completion

kvaser_usb uses completions to signal when a response event is received
for outgoing commands.

However, it uses init_completion() to reinitialize the start_comp and
stop_comp completions before sending the start/stop commands.

In case the device sends the corresponding response just before the
actual command is sent, complete() may be called concurrently with
init_completion() which is not safe.

This might be triggerable even with a properly functioning device by
stopping the interface (CMD_STOP_CHIP) just after it goes bus-off (which
also causes the driver to send CMD_STOP_CHIP when restart-ms is off),
but that was not tested.

Fix the issue by using reinit_completion() instead.

Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Tested-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/all/20221010185237.319219-2-extja@kvaser.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agoselftests: tc-testing: Add matchJSON to tdc
Victor Nogueira [Mon, 24 Oct 2022 11:16:03 +0000 (11:16 +0000)]
selftests: tc-testing: Add matchJSON to tdc

This allows the use of a matchJSON field in tests to match
against JSON output from the command under test, if that
command outputs JSON.

You specify what you want to match against as a JSON array
or object in the test's matchJSON field. You can leave out
any fields you don't want to match against that are present
in the output and they will be skipped.

An example matchJSON value would look like this:

"matchJSON": [
  {
    "Value": {
      "neighIP": {
        "family": 4,
        "addr": "AQIDBA==",
        "width": 32
      },
      "nsflags": 142,
      "ncflags": 0,
      "LLADDR": "ESIzRFVm"
    }
  }
]

The real output from the command under test might have some
extra fields that we don't care about for matching, and
since we didn't include them in our matchJSON value, those
fields will not be attempted to be matched. If everything
we included above has the same values as the real command
output, the test will pass.

The matchJSON field's type must be the same as the command
output's type, otherwise the test will fail. So if the
command outputs an array, then the value of matchJSON must
also be an array.

If matchJSON is an array, it must not contain more elements
than the command output's array, otherwise the test will
fail.

Signed-off-by: Jeremy Carter <jeremy@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20221024111603.2185410-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: ave: Fix MAC to be in charge of PHY PM
Kunihiko Hayashi [Mon, 24 Oct 2022 07:22:27 +0000 (16:22 +0900)]
net: ethernet: ave: Fix MAC to be in charge of PHY PM

The phylib callback is called after MAC driver's own resume callback is
called. For AVE driver, after resuming immediately, PHY state machine is
in PHY_NOLINK because there is a time lag from link-down to link-up due to
autoneg. The result is WARN_ON() dump in mdio_bus_phy_resume().

Since ave_resume() itself calls phy_resume(), AVE driver should manage
PHY PM. To indicate that MAC driver manages PHY PM, set
phydev->mac_managed_pm to true to avoid the unnecessary phylib call and
add missing phy_init_hw() to ave_resume().

Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Link: https://lore.kernel.org/r/20221024072227.24769-1-hayashi.kunihiko@socionext.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: ave: Remove duplicate phy_resume() calls
Kunihiko Hayashi [Mon, 24 Oct 2022 07:23:14 +0000 (16:23 +0900)]
net: ethernet: ave: Remove duplicate phy_resume() calls

ave_open() in ave_resume() executes __phy_resume() via phy_start(), so no
need to call phy_resume() explicitly. Remove it.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Link: https://lore.kernel.org/r/20221024072314.24969-1-hayashi.kunihiko@socionext.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: fec: limit register access on i.MX6UL
Juergen Borleis [Mon, 24 Oct 2022 08:05:52 +0000 (10:05 +0200)]
net: fec: limit register access on i.MX6UL

Using 'ethtool -d […]' on an i.MX6UL leads to a kernel crash:

   Unhandled fault: external abort on non-linefetch (0x1008) at […]

due to this SoC has less registers in its FEC implementation compared to other
i.MX6 variants. Thus, a run-time decision is required to avoid access to
non-existing registers.

Fixes: a51d3ab50702 ("net: fec: use a more proper compatible string for i.MX6UL type device")
Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221024080552.21004-1-jbe@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'linux-can-fixes-for-6.1-20221025' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Thu, 27 Oct 2022 03:12:50 +0000 (20:12 -0700)]
Merge tag 'linux-can-fixes-for-6.1-20221025' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-10-25

The 1st patch adds a missing cleanup call in the error path of the
probe function in mpc5xxx glue code for the mscan driver.

The 2nd patch adds a missing cleanup call in the error path of the
probe function of the mcp251x driver.

* tag 'linux-can-fixes-for-6.1-20221025' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: mcp251x: mcp251x_can_probe(): add missing unregister_candev() in error path
  can: mscan: mpc5xxx: mpc5xxx_can_probe(): add missing put_clock() in error path
====================

Link: https://lore.kernel.org/r/20221026075520.1502520-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/rds: remove variable total_copied
Colin Ian King [Mon, 24 Oct 2022 13:50:46 +0000 (14:50 +0100)]
net/rds: remove variable total_copied

Variable total_copied is just being incremented and it's never used
anywhere else. The variable and the increment are redundant so
remove it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20221024135046.2159523-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agophylink: require valid state argument to phylink_validate_mask_caps()
Jakub Kicinski [Tue, 25 Oct 2022 18:51:26 +0000 (11:51 -0700)]
phylink: require valid state argument to phylink_validate_mask_caps()

state is deferenced earlier in the function, the NULL check
is pointless. Since we don't have any crash reports presumably
it's safe to assume state is not NULL.

Fixes: f392a1846489 ("net: phylink: provide phylink_validate_mask_caps() helper")
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Link: https://lore.kernel.org/r/20221025185126.1720553-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: fix tracking issue in mptcp_subflow_create_socket()
Eric Dumazet [Tue, 25 Oct 2022 18:05:46 +0000 (18:05 +0000)]
mptcp: fix tracking issue in mptcp_subflow_create_socket()

My recent patch missed that mptcp_subflow_create_socket()
was creating a 'kernel' socket, then converted it to 'user' socket.

Fixes: 0cafd77dcd03 ("net: add a refcount tracker for kernel sockets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20221025180546.652251-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'ieee802154-for-net-next-2022-10-26' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Thu, 27 Oct 2022 01:14:32 +0000 (18:14 -0700)]
Merge tag 'ieee802154-for-net-next-2022-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next

Re-pull from Stefan to fix the warnings.

Stefan Schmidt says:

====================
pull-request v2: ieee802154-next 2022-10-26

* tag 'ieee802154-for-net-next-2022-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
  net: mac802154: Fixup function parameter name in docs
====================

Link: https://lore.kernel.org/r/20221026075638.578840-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'perf-tools-fixes-for-v6.1-2022-10-26' of git://git.kernel.org/pub/scm...
Linus Torvalds [Thu, 27 Oct 2022 00:44:10 +0000 (17:44 -0700)]
Merge tag 'perf-tools-fixes-for-v6.1-2022-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool fixes from Arnaldo Carvalho de Melo:

 - Fix some aspects of building with an older (than the one in the
   kernel sources) libbpf present in a distro, when building with
   LIBBPF_DYNAMIC=1.

 - Fix errno setting races with event_fd and the signal handler in 'perf
   record'.

 - Fix Power10 hv-24x7 metric events when some events may have a zero
   count based on system configuration.

 - Do not fail Intel-PT misc test w/o libpython, just skip it.

 - Fix incorrect arm64 Hisi hip08 L3 metrics (IF_BP_MISP_BR_RET,
   IF_BP_MISP_BR_RET, IF_BP_MISP_BR_BL) due to mistakes in the
   documentation used to generate the JSON files for these metrics.

 - Fix auxtrace (Intel PT, ARM Coresight) address filter symbol name
   match for modules, we need to skip the module name.

 - Sync copies of files with the kernel sources, including ppc syscall
   tables and assorted headers, some resulting in tools being able to
   decode new network protocols (IPPROTO_L2TP) and statx masks
   (STATX_DIOALIGN).

 - Fix PMU name pai_crypto in the vendor events file (JSON) for s390.

 - Fix man page build wrt perf-arm-coresight.txt as the build process
   assumes files starting with 'perf-' are man pages, and this file
   isn't one.

* tag 'perf-tools-fixes-for-v6.1-2022-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf vendor events arm64: Fix incorrect Hisi hip08 L3 metrics
  perf auxtrace: Fix address filter symbol name match for modules
  tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers uapi: Sync linux/stat.h with the kernel sources
  tools include UAPI: Sync sound/asound.h copy with the kernel sources
  tools headers uapi: Update linux/in.h copy
  tools headers: Update the copy of x86's memcpy_64.S used in 'perf bench'
  tools headers arm64: Sync arm64's cputype.h with the kernel sources
  perf test: Do not fail Intel-PT misc test w/o libpython
  perf list: Fix PMU name pai_crypto in perf list on s390
  perf record: Fix event fd races
  perf bpf: Fix build with libbpf 0.7.0 by checking if bpf_program__set_insns() is available
  perf bpf: Fix build with libbpf 0.7.0 by adding prototype for bpf_load_program()
  perf vendor events power10: Fix hv-24x7 metric events
  perf docs: Fix man page build wrt perf-arm-coresight.txt
  tools headers UAPI: Sync powerpc syscall tables with the kernel sources

2 years agoMerge tag 'spi-fix-v6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Thu, 27 Oct 2022 00:38:46 +0000 (17:38 -0700)]
Merge tag 'spi-fix-v6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A collection of mostly unremarkable fixes for SPI that have built up
  since the merge window, all driver specific.

  The change to the qup adding support for GPIO chip selects is fixing a
  regression due to the removal of legacy GPIO handling, the driver had
  previously been silently relying on the legacy GPIO support in a
  slightly broken way which worked well enough on some systems. Fixing
  it is simply a case of setting a couple of bits of information in the
  driver description"

* tag 'spi-fix-v6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: aspeed: Fix window offset of CE1
  spi: qup: support using GPIO as chip select line
  spi: intel: Fix the offset to get the 64K erase opcode
  spi: aspeed: Fix typo in mode_bits field for AST2600 platform
  spi: mpc52xx: Replace NO_IRQ by 0
  spi: spi-mem: Fix typo (of -> or)
  spi: spi-gxp: fix typo in SPDX identifier line
  spi: tegra210-quad: Fix combined sequence

3 years agoMerge tag 'arc-6.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Wed, 26 Oct 2022 18:15:00 +0000 (11:15 -0700)]
Merge tag 'arc-6.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - fix for Page Table mem leak

 - defconfig updates

 - misc other fixes

* tag 'arc-6.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: mm: fix leakage of memory allocated for PTE
  arc: update config files
  arc: iounmap() arg is volatile
  arc: dts: Harmonize EHCI/OHCI DT nodes name
  ARC: bitops: Change __fls to return unsigned long
  ARC: Fix comment typo
  ARC: Fix comment typo

3 years agoMerge tag 'ieee802154-for-net-next-2022-10-25' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Wed, 26 Oct 2022 14:24:36 +0000 (15:24 +0100)]
Merge tag 'ieee802154-for-net-next-2022-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next

Stefan Schmidt says:

====================

==
One of the biggest cycles for ieee802154 in a long time. We are landing the
first pieces of a big enhancements in managing PAN's. We might have another pull
request ready for this cycle later on, but I want to get this one out first.

Miquel Raynal added support for sending frames synchronously as a dependency
to handle MLME commands. Also introducing more filtering levels to match with
the needs of a device when scanning or operating as a pan coordinator.
To support development and testing the hwsim driver for ieee802154 was also
enhanced for the new filtering levels and to update the PIB attributes.

Alexander Aring fixed quite a few bugs spotted during reviewing changes. He
also added support for TRAC in the atusb driver to have better failure
handling if the firmware provides the needed information.

Jilin Yuan fixed a comment with a repeated word in it.
==================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoperf vendor events arm64: Fix incorrect Hisi hip08 L3 metrics
Shang XiaoJing [Fri, 21 Oct 2022 10:50:33 +0000 (18:50 +0800)]
perf vendor events arm64: Fix incorrect Hisi hip08 L3 metrics

Commit 0cc177cfc95d565e ("perf vendor events arm64: Add Hisi hip08 L3
metrics") add L3 metrics of hip08, but some metrics (IF_BP_MISP_BR_RET,
IF_BP_MISP_BR_RET, IF_BP_MISP_BR_BL) have incorrect event number due to
the mistakes in document, which caused incorrect result. Fix the
incorrect metrics.

Before:

  65,811,214,308 armv8_pmuv3_0/event=0x1014/ # 18.87 push_branch
   # -40.19 other_branch
  3,564,316,780 BR_MIS_PRED # 0.51 indirect_branch
   # 21.81 pop_branch

After:

  6,537,146,245 BR_MIS_PRED # 0.48 indirect_branch
   # 0.47 pop_branch
   # 0.00 push_branch
   # 0.05 other_branch

Fixes: 0cc177cfc95d565e ("perf vendor events arm64: Add Hisi hip08 L3 metrics")
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Acked-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221021105035.10000-2-shangxiaojing@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf auxtrace: Fix address filter symbol name match for modules
Adrian Hunter [Wed, 26 Oct 2022 07:27:36 +0000 (10:27 +0300)]
perf auxtrace: Fix address filter symbol name match for modules

For modules, names from kallsyms__parse() contain the module name which
meant that module symbols did not match exactly by name.

Fix by matching the name string up to the separating tab character.

Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221026072736.2982-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools headers UAPI: Sync linux/perf_event.h with the kernel sources
Arnaldo Carvalho de Melo [Fri, 21 May 2021 19:00:31 +0000 (16:00 -0300)]
tools headers UAPI: Sync linux/perf_event.h with the kernel sources

To pick the changes in:

  cfef80bad4cf79cd ("perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file")
  ee3e88dfec23153d ("perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}")
  b4e12b2d70fd9ecc ("perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLY")

There is a kernel patch pending that renames PERF_MEM_LVLNUM_EXTN_MEM to
PERF_MEM_LVLNUM_CXL, tooling this time is ahead of the kernel :-)

This thus partially addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/lkml/Y1k53KMdzypmU0WS@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
David S. Miller [Wed, 26 Oct 2022 12:46:38 +0000 (13:46 +0100)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
this is a pull request of 29 patches for net-next/master.

The first patch is by Daniel S. Trevitz and adds documentation for
switchable termination resistors.

Zhang Changzhong's patch fixes a debug output in the j13939 stack.

Oliver Hartkopp finally removes the pch_can driver, which is
superseded by the generic c_can driver.

Gustavo A. R. Silva replaces a zero-length array with
DECLARE_FLEX_ARRAY() in the ucan driver.

Kees Cook's patch removes a no longer needed silencing of
"-Warray-bounds" warnings for the kvaser_usb driver.

The next 2 patches target the m_can driver. The first is by me cleans
up the LEC error handling, the second is by Vivek Yadav and extends
the LEC error handling to the data phase of CAN-FD frames.

The next 9 patches all target the gs_usb driver. The first 5 patches
are by me and improve the Kconfig prompt and help text, set
netdev->dev_id to distinguish multi CAN channel devices, allow
loopback and listen only at the same time, and clean up the
gs_can_open() function a bit. The remaining 4 patches are by Jeroen
Hofstee and add support for 2 new features: Bus Error Reporting and
Get State.

Jimmy Assarsson and Anssi Hannula contribute 10 patches for the
kvaser_usb driver. They first add Listen Only and Bus Error Reporting
support, handle CMD_ERROR_EVENT errors, improve CAN state handling,
restart events, and configuration of the bit timing parameters.

Another patch by me which fixes the indention in the m_can driver.

A patch by Dongliang Mu cleans up the ucan_disconnect() function in
the ucan driver.

The last patch by Biju Das is for the rcan_canfd driver and cleans up
the reset handling.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agorhashtable: make test actually random
Rolf Eike Beer [Fri, 21 Oct 2022 13:47:03 +0000 (15:47 +0200)]
rhashtable: make test actually random

The "random rhlist add/delete operations" actually wasn't very random, as all
cases tested the same bit. Since the later parts of this loop depend on the
first case execute this unconditionally, and then test on different bits for the
remaining tests. While at it only request as much random bits as are actually
used.

Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocan: rcar_canfd: Use devm_reset_control_get_optional_exclusive
Biju Das [Tue, 25 Oct 2022 15:56:57 +0000 (16:56 +0100)]
can: rcar_canfd: Use devm_reset_control_get_optional_exclusive

Replace devm_reset_control_get_exclusive->devm_reset_control_
get_optional_exclusive so that we can avoid unnecessary
SoC specific check in probe().

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20221025155657.1426948-4-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: ucan: ucan_disconnect(): change unregister_netdev() to unregister_candev()
Dongliang Mu [Mon, 24 Oct 2022 11:00:30 +0000 (19:00 +0800)]
can: ucan: ucan_disconnect(): change unregister_netdev() to unregister_candev()

From API pairing, change unregister_netdev() to unregister_candev()
since the registration function is register_candev(). Actually, they
are the same.

Signed-off-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://lore.kernel.org/all/20221024110033.727542-1-dzm91@hust.edu.cn
[mkl: adjust subject + commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: m_can: use consistent indention
Marc Kleine-Budde [Wed, 12 Oct 2022 08:46:10 +0000 (10:46 +0200)]
can: m_can: use consistent indention

The driver uses indent-with-tab for defines. Replace spaces after
IR_ERR_BUS_31X with tab for consistent indention.

Link: https://lore.kernel.org/all/20221024185544.68240-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agoMerge patch series "can: kvaser_usb: Fixes and improvements"
Marc Kleine-Budde [Wed, 26 Oct 2022 08:20:27 +0000 (10:20 +0200)]
Merge patch series "can: kvaser_usb: Fixes and improvements"

Jimmy Assarsson <extja@kvaser.com> says:

Split v4 series, since it got rejected [1]. This part only contains
non-critical fixes and improvements.

Note: This series depend on changes in [2].

Changes in v5:
 - Split v4 series, since it got rejected [1].
   This part only contains non-critical fixes and improvements.

Changes in v4: https://lore.kernel.org/all/20220903182344.139-1-extja@kvaser.com
 - Add Tested-by: Anssi Hannula to
   [PATCH v4 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
 - Update commit message in
   [PATCH v4 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device

Changes in v3: https://lore.kernel.org/all/20220901122729.271-1-extja@kvaser.com
 - Rebase on top of commit
   1d5eeda23f36 ("can: kvaser_usb: advertise timestamping capabilities and add ioctl support")
 - Add Tested-by: Anssi Hannula
 - Add stable@vger.kernel.org to CC.
 - Add my S-o-b to all patches
 - Fix regression introduced in
   [PATCH v2 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
   found by Anssi Hannula
   https://lore.kernel.org/all/b25bc059-d776-146d-0b3c-41aecf4bd9f8@bitwise.fi

v2: https://lore.kernel.org/all/20220708115709.232815-1-extja@kvaser.com

v1: https://lore.kernel.org/all/20220516134748.3724796-1-anssi.hannula@bitwise.fi

[1] https://lore.kernel.org/all/20220920122246.00dbe946@kernel.org
[2] https://lore.kernel.org/all/20221010150829.199676-1-extja@kvaser.com

Link: https://lore.kernel.org/all/20221010185237.319219-1-extja@kvaser.com
[mkl: move "1/1 can: kvaser_usb: Fix possible completions during init_completion" to linux-can tree]
[mkl: improve change log, update links]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: kvaser_usb: Compare requested bittiming parameters with actual parameters in...
Jimmy Assarsson [Mon, 10 Oct 2022 18:52:37 +0000 (20:52 +0200)]
can: kvaser_usb: Compare requested bittiming parameters with actual parameters in do_set_{,data}_bittiming

The device will respond with a CMD_ERROR_EVENT command, with error_code
KVASER_USB_{LEAF,HYDRA}_ERROR_EVENT_PARAM, if the CMD_SET_BUSPARAMS_REQ
contains invalid bittiming parameters.
However, this command does not contain any channel reference.

To check if the CMD_SET_BUSPARAMS_REQ was successful, redback and compare
the requested bittiming parameters with the device reported parameters.

Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Fixes: aec5fb2268b7 ("can: kvaser_usb: Add support for Kvaser USB hydra family")
Tested-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Co-developed-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/all/20221010185237.319219-12-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: kvaser_usb: Add struct kvaser_usb_busparams
Jimmy Assarsson [Mon, 10 Oct 2022 18:52:36 +0000 (20:52 +0200)]
can: kvaser_usb: Add struct kvaser_usb_busparams

Add struct kvaser_usb_busparams containing the busparameters used in
CMD_{SET,GET}_BUSPARAMS* commands.

Tested-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/all/20221010185237.319219-11-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: kvaser_usb_leaf: Fix bogus restart events
Anssi Hannula [Mon, 10 Oct 2022 18:52:35 +0000 (20:52 +0200)]
can: kvaser_usb_leaf: Fix bogus restart events

When auto-restart is enabled, the kvaser_usb_leaf driver considers
transition from any state >= CAN_STATE_BUS_OFF as a bus-off recovery
event (restart).

However, these events may occur at interface startup time before
kvaser_usb_open() has set the state to CAN_STATE_ERROR_ACTIVE, causing
restarts counter to increase and CAN_ERR_RESTARTED to be sent despite no
actual restart having occurred.

Fix that by making the auto-restart condition checks more strict so that
they only trigger when the interface was actually in the BUS_OFF state.

Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Tested-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/all/20221010185237.319219-10-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: kvaser_usb_leaf: Ignore stale bus-off after start
Anssi Hannula [Mon, 10 Oct 2022 18:52:34 +0000 (20:52 +0200)]
can: kvaser_usb_leaf: Ignore stale bus-off after start

With 0bfd:0124 Kvaser Mini PCI Express 2xHS FW 4.18.778 it was observed
that if the device was bus-off when stopped, at next start (either via
interface down/up or manual bus-off restart) the initial
CMD_CHIP_STATE_EVENT received just after CMD_START_CHIP_REPLY will have
the M16C_STATE_BUS_OFF bit still set, causing the interface to
immediately go bus-off again.

The bit seems to internally clear quickly afterwards but we do not get
another CMD_CHIP_STATE_EVENT.

Fix the issue by ignoring any initial bus-off state until we see at
least one bus-on state. Also, poll the state periodically until that
occurs.

It is possible we lose one actual immediately occurring bus-off event
here in which case the HW will auto-recover and we see the recovery
event. We will then catch the next bus-off event, if any.

This issue did not reproduce with 0bfd:0017 Kvaser Memorator
Professional HS/HS FW 2.0.50.

Fixes: 71873a9b38d1 ("can: kvaser_usb: Add support for more Kvaser Leaf v2 devices")
Tested-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/all/20221010185237.319219-9-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: kvaser_usb_leaf: Fix wrong CAN state after stopping
Anssi Hannula [Mon, 10 Oct 2022 18:52:33 +0000 (20:52 +0200)]
can: kvaser_usb_leaf: Fix wrong CAN state after stopping

0bfd:0124 Kvaser Mini PCI Express 2xHS FW 4.18.778 sends a
CMD_CHIP_STATE_EVENT indicating bus-off after stopping the device,
causing a stopped device to appear as CAN_STATE_BUS_OFF instead of
CAN_STATE_STOPPED.

Fix that by not handling error events on stopped devices.

Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Tested-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/all/20221010185237.319219-8-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: kvaser_usb_leaf: Fix improved state not being reported
Anssi Hannula [Mon, 10 Oct 2022 18:52:32 +0000 (20:52 +0200)]
can: kvaser_usb_leaf: Fix improved state not being reported

The tested 0bfd:0017 Kvaser Memorator Professional HS/HS FW 2.0.50 and
0bfd:0124 Kvaser Mini PCI Express 2xHS FW 4.18.778 do not seem to send
any unsolicited events when error counters decrease or when the device
transitions from ERROR_PASSIVE to ERROR_ACTIVE (or WARNING).

This causes the interface to e.g. indefinitely stay in the ERROR_PASSIVE
state.

Fix that by asking for chip state (inc. counters) event every 0.5 secs
when error counters are non-zero.

Since there are non-error-counter devices, also always poll in
ERROR_PASSIVE even if the counters show zero.

Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Tested-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/all/20221010185237.319219-7-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: kvaser_usb_leaf: Set Warning state even without bus errors
Anssi Hannula [Mon, 10 Oct 2022 18:52:31 +0000 (20:52 +0200)]
can: kvaser_usb_leaf: Set Warning state even without bus errors

kvaser_usb_leaf_rx_error_update_can_state() sets error state according
to error counters when the hardware does not indicate a specific state
directly.

However, this is currently gated behind a check for
M16C_STATE_BUS_ERROR which does not always seem to be set when error
counters are increasing, and may not be set when error counters are
decreasing.

This causes the CAN_STATE_ERROR_WARNING state to not be set in some
cases even when appropriate.

Change the code to set error state from counters even without
M16C_STATE_BUS_ERROR.

The Error-Passive case seems superfluous as it is already set via
M16C_STATE_BUS_PASSIVE flag above, but it is kept for now.

Tested with 0bfd:0124 Kvaser Mini PCI Express 2xHS FW 4.18.778.

Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Tested-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/all/20221010185237.319219-6-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>