]> www.infradead.org Git - nvme.git/log
nvme.git
3 years agoionic: Query FW when getting VF info via ndo_get_vf_config
Brett Creeley [Mon, 24 Jan 2022 18:53:06 +0000 (10:53 -0800)]
ionic: Query FW when getting VF info via ndo_get_vf_config

Currently when an administrator configures a VF via ndo_set_vf*,
the driver will send the set command to FW and then update the
cached value.  The cached value is then used when reporting
VF info via ndo_get_vf_config.

A problem is that the VF info may have been updated between
the last ndo_set_vf* and ndo_get_vf_info commands via some
other method, i.e. a VF changes its MAC address (assuming it's
allowed to do so) and since this is all managed by the FW,
this new value won't be reflected in the PF's cache of values.

To fix this, update the driver to always get the latest VF
information by making use of the IONIC_CMD_VF_GETATTR dev
command. The FW may not support getting all the attributes for
IONIC_CMD_VF_GETATTR, so the driver will only update the cached
VF config members if their associated IONIC_CMD_VF_GETATTR
was successful. Otherwise the cached VF config members will
remain the same as what was set in ndo_set_vf*.

Fixes: fbb39807e9ae ("ionic: support sr-iov operations")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: Allow flexibility for error reporting on dev commands
Brett Creeley [Mon, 24 Jan 2022 18:53:05 +0000 (10:53 -0800)]
ionic: Allow flexibility for error reporting on dev commands

When dev commands fail, an error message will always be printed,
which may be overly alarming the to system administrators,
especially if the driver shouldn't be printing the error due
to some unsupported capability.

Similar to recent adminq request changes, we can update the
dev command interface with the ability to selectively print
error messages to allow the driver to prevent printing errors
that are expected.

Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: Correctly print AQ errors if completions aren't received
Brett Creeley [Mon, 24 Jan 2022 18:53:04 +0000 (10:53 -0800)]
ionic: Correctly print AQ errors if completions aren't received

Recent changes went into the driver to allow flexibility when
printing error messages. Unfortunately this had the unexpected
consequence of printing confusing messages like the following:

IONIC_CMD_RX_FILTER_ADD (31) failed: IONIC_RC_SUCCESS (-6)

In cases like this the completion of the admin queue command never
completes, so the completion status is 0, hence IONIC_RC_SUCCESS
is printed even though the command clearly failed. For example,
this could happen when the driver tries to add a filter and at
the same time the FW goes through a reset, so the AQ command
never completes.

Fix this by forcing the FW completion status to IONIC_RC_ERROR
in cases where we never get the completion.

Fixes: 8c9d956ab6fb ("ionic: allow adminq requests to override default error message")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: fix up printing of timeout error
Shannon Nelson [Mon, 24 Jan 2022 18:53:03 +0000 (10:53 -0800)]
ionic: fix up printing of timeout error

Make sure we print the TIMEOUT string if we had a timeout
error, rather than printing the wrong status.

Fixes: 8c9d956ab6fb ("ionic: allow adminq requests to override default error message")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: better handling of RESET event
Shannon Nelson [Mon, 24 Jan 2022 18:53:02 +0000 (10:53 -0800)]
ionic: better handling of RESET event

When IONIC_EVENT_RESET is received, we only need to start the
fw_down process if we aren't already down, and we need to be
sure to set the FW_STOPPING state on the way.

If this is how we noticed that FW was stopped, it is most
likely from a FW update, and we'll see a new FW generation.
The update happens quickly enough that we might not see
fw_status==0, so we need to be sure things get restarted when
we see the fw_generation change.

Fixes: d2662072c094 ("ionic: monitor fw status generation")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: add FW_STOPPING state
Shannon Nelson [Mon, 24 Jan 2022 18:53:01 +0000 (10:53 -0800)]
ionic: add FW_STOPPING state

Between fw running and fw actually stopped into reset, we need
a fw_stopping concept to catch and block some actions while
we're transitioning to FW_RESET state.  This will help to be
sure the fw_up task is not scheduled until after the fw_down
task has completed.

On some rare occasion timing, it is possible for the fw_up task
to try to run before the fw_down task, then not get run after
the fw_down task has run, leaving the device in a down state.
This is possible if the watchdog goes off in between finding the
down transition and starting the fw_down task, where the later
watchdog sees the FW is back up and schedules a fw_up task.

Fixes: c672412f6172 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: Don't send reset commands if FW isn't running
Brett Creeley [Mon, 24 Jan 2022 18:53:00 +0000 (10:53 -0800)]
ionic: Don't send reset commands if FW isn't running

It's possible the FW is already shutting down while the driver is being
removed and/or when the driver is going through reset. This can cause
unexpected/unnecessary errors to be printed:

eth0: DEV_CMD IONIC_CMD_PORT_RESET (12) error, IONIC_RC_ERROR (29) failed
eth1: DEV_CMD IONIC_CMD_RESET (3) error, IONIC_RC_ERROR (29) failed

Fix this by checking the FW status register before issuing the reset
commands.

Also, since err may not be assigned in ionic_port_reset(), assign it a
default value of 0, and remove an unnecessary log message.

Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: separate function for watchdog init
Shannon Nelson [Mon, 24 Jan 2022 18:52:59 +0000 (10:52 -0800)]
ionic: separate function for watchdog init

Pull the watchdog init code out to a separate bite-sized
function.  Code cleaning for now, will be a useful change in
the near future.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: start watchdog after all is setup
Shannon Nelson [Mon, 24 Jan 2022 18:52:58 +0000 (10:52 -0800)]
ionic: start watchdog after all is setup

The watchdog expects the lif to fully exist when it goes off,
so lets not start the watchdog until all is ready in case there
is some quirky time dialation that makes probe take multiple
seconds.

Fixes: 089406bc5ad6 ("ionic: add a watchdog timer to monitor heartbeat")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: fix type complaint in ionic_dev_cmd_clean()
Shannon Nelson [Mon, 24 Jan 2022 18:52:57 +0000 (10:52 -0800)]
ionic: fix type complaint in ionic_dev_cmd_clean()

Sparse seems to have gotten a little more picky lately and
we need to revisit this bit of code to make sparse happy.

warning: incorrect type in initializer (different address spaces)
   expected union ionic_dev_cmd_regs *regs
   got union ionic_dev_cmd_regs [noderef] __iomem *dev_cmd_regs
warning: incorrect type in argument 2 (different address spaces)
   expected void [noderef] __iomem *
   got unsigned int *
warning: incorrect type in argument 1 (different address spaces)
   expected void volatile [noderef] __iomem *
   got union ionic_dev_cmd *

Fixes: d701ec326a31 ("ionic: clean up sparse complaints")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv4: get rid of fib_info_hash_{alloc|free}
Eric Dumazet [Mon, 24 Jan 2022 17:31:15 +0000 (09:31 -0800)]
ipv4: get rid of fib_info_hash_{alloc|free}

Use kvzalloc()/kvfree() instead of hand coded functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoip6_tunnel: allow routing IPv4 traffic in NBMA mode
Qing Deng [Sun, 23 Jan 2022 06:00:00 +0000 (14:00 +0800)]
ip6_tunnel: allow routing IPv4 traffic in NBMA mode

Since IPv4 routes support IPv6 gateways now, we can route IPv4 traffic in
NBMA tunnels.

Signed-off-by: Qing Deng <i@moy.cat>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: use bool values to pass bool param of phy_init_eee()
Jisheng Zhang [Sun, 23 Jan 2022 15:22:41 +0000 (23:22 +0800)]
net: use bool values to pass bool param of phy_init_eee()

The 2nd param of phy_init_eee(): clk_stop_enable is a bool param, use
true or false instead of 1/0.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220123152241.1480-1-jszhang@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: fec_ptp: remove redundant initialization of variable val
Colin Ian King [Sun, 23 Jan 2022 18:49:36 +0000 (18:49 +0000)]
net: fec_ptp: remove redundant initialization of variable val

Variable val is being initialized with a value that is never read,
it is being re-assigned later. The assignment is redundant and
can be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20220123184936.113486-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: usb: asix: remove redundant assignment to variable reg
Colin Ian King [Sun, 23 Jan 2022 18:40:35 +0000 (18:40 +0000)]
net: usb: asix: remove redundant assignment to variable reg

Variable reg is being masked however the variable is never read
after this. The assignment is redundant and can be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220123184035.112785-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Jakub Kicinski [Mon, 24 Jan 2022 23:42:28 +0000 (15:42 -0800)]
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2022-01-24

We've added 80 non-merge commits during the last 14 day(s) which contain
a total of 128 files changed, 4990 insertions(+), 895 deletions(-).

The main changes are:

1) Add XDP multi-buffer support and implement it for the mvneta driver,
   from Lorenzo Bianconi, Eelco Chaudron and Toke Høiland-Jørgensen.

2) Add unstable conntrack lookup helpers for BPF by using the BPF kfunc
   infra, from Kumar Kartikeya Dwivedi.

3) Extend BPF cgroup programs to export custom ret value to userspace via
   two helpers bpf_get_retval() and bpf_set_retval(), from YiFei Zhu.

4) Add support for AF_UNIX iterator batching, from Kuniyuki Iwashima.

5) Complete missing UAPI BPF helper description and change bpf_doc.py script
   to enforce consistent & complete helper documentation, from Usama Arif.

6) Deprecate libbpf's legacy BPF map definitions and streamline XDP APIs to
   follow tc-based APIs, from Andrii Nakryiko.

7) Support BPF_PROG_QUERY for BPF programs attached to sockmap, from Di Zhu.

8) Deprecate libbpf's bpf_map__def() API and replace users with proper getters
   and setters, from Christy Lee.

9) Extend libbpf's btf__add_btf() with an additional hashmap for strings to
   reduce overhead, from Kui-Feng Lee.

10) Fix bpftool and libbpf error handling related to libbpf's hashmap__new()
    utility function, from Mauricio Vásquez.

11) Add support to BTF program names in bpftool's program dump, from Raman Shukhau.

12) Fix resolve_btfids build to pick up host flags, from Connor O'Brien.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (80 commits)
  selftests, bpf: Do not yet switch to new libbpf XDP APIs
  selftests, xsk: Fix rx_full stats test
  bpf: Fix flexible_array.cocci warnings
  xdp: disable XDP_REDIRECT for xdp frags
  bpf: selftests: add CPUMAP/DEVMAP selftests for xdp frags
  bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest
  net: xdp: introduce bpf_xdp_pointer utility routine
  bpf: generalise tail call map compatibility check
  libbpf: Add SEC name for xdp frags programs
  bpf: selftests: update xdp_adjust_tail selftest to include xdp frags
  bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature
  bpf: introduce frags support to bpf_prog_test_run_xdp()
  bpf: move user_size out of bpf_test_init
  bpf: add frags support to xdp copy helpers
  bpf: add frags support to the bpf_xdp_adjust_tail() API
  bpf: introduce bpf_xdp_get_buff_len helper
  net: mvneta: enable jumbo frames if the loaded XDP program support frags
  bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program
  net: mvneta: add frags support to XDP_TX
  xdp: add frags support to xdp_return_{buff/frame}
  ...
====================

Link: https://lore.kernel.org/r/20220124221235.18993-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests, bpf: Do not yet switch to new libbpf XDP APIs
Daniel Borkmann [Mon, 24 Jan 2022 22:01:28 +0000 (23:01 +0100)]
selftests, bpf: Do not yet switch to new libbpf XDP APIs

Revert commit 544356524dd6 ("selftests/bpf: switch to new libbpf XDP APIs")
for now given this will heavily conflict with 4b27480dcaa7 ("bpf/selftests:
convert xdp_link test to ASSERT_* macros") upon merge. Andrii agreed to redo
the conversion cleanly after trees merged.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
3 years agoMerge tag 'linux-can-fixes-for-5.17-20220124' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Mon, 24 Jan 2022 20:17:58 +0000 (12:17 -0800)]
Merge tag 'linux-can-fixes-for-5.17-20220124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-01-24

The first patch updates the email address of Brian Silverman from his
former employer to his private address.

The next patch fixes DT bindings information for the tcan4x5x SPI CAN
driver.

The following patch targets the m_can driver and fixes the
introduction of FIFO bulk read support.

Another patch for the tcan4x5x driver, which fixes the max register
value for the regmap config.

The last patch for the flexcan driver marks the RX mailbox support for
the MCF5441X as support.

* tag 'linux-can-fixes-for-5.17-20220124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: flexcan: mark RX via mailboxes as supported on MCF5441X
  can: tcan4x5x: regmap: fix max register value
  can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0
  dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
  mailmap: update email address of Brian Silverman
====================

Link: https://lore.kernel.org/r/20220124175955.3464134-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agocan: flexcan: mark RX via mailboxes as supported on MCF5441X
Marc Kleine-Budde [Fri, 21 Jan 2022 08:22:34 +0000 (09:22 +0100)]
can: flexcan: mark RX via mailboxes as supported on MCF5441X

Most flexcan IP cores support 2 RX modes:
- FIFO
- mailbox

The flexcan IP core on the MCF5441X cannot receive CAN RTR messages
via mailboxes. However the mailbox mode is more performant. The commit

1c45f5778a3b ("can: flexcan: add ethtool support to change rx-rtr setting during runtime")

added support to switch from FIFO to mailbox mode on these cores.

After testing the mailbox mode on the MCF5441X by Angelo Dureghello,
this patch marks it (without RTR capability) as supported. Further the
IP core overview table is updated, that RTR reception via mailboxes is
not supported.

Link: https://lore.kernel.org/all/20220121084425.3141218-1-mkl@pengutronix.de
Tested-by: Angelo Dureghello <angelo@kernel-space.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: tcan4x5x: regmap: fix max register value
Marc Kleine-Budde [Fri, 14 Jan 2022 17:50:54 +0000 (18:50 +0100)]
can: tcan4x5x: regmap: fix max register value

The MRAM of the tcan4x5x has a size of 2K and starts at 0x8000. There
are no further registers in the tcan4x5x making 0x87fc the biggest
addressable register.

This patch fixes the max register value of the regmap config from
0x8ffc to 0x87fc.

Fixes: 6e1caaf8ed22 ("can: tcan4x5x: fix max register value")
Link: https://lore.kernel.org/all/20220119064011.2943292-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0
Marc Kleine-Budde [Fri, 14 Jan 2022 14:35:01 +0000 (15:35 +0100)]
can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0

In order to optimize FIFO access, especially on m_can cores attached
to slow busses like SPI, in patch

e39381770ec9 ("can: m_can: Disable IRQs on FIFO bus errors")

bulk read/write support has been added to the m_can_fifo_{read,write}
functions.

That change leads to the tcan driver to call
regmap_bulk_{read,write}() with a length of 0 (for CAN frames with 0
data length). regmap treats this as an error:

| tcan4x5x spi1.0 tcan4x5x0: FIFO write returned -22

This patch fixes the problem by not calling the
cdev->ops->{read,write)_fifo() in case of a 0 length read/write.

Fixes: e39381770ec9 ("can: m_can: Disable IRQs on FIFO bus errors")
Link: https://lore.kernel.org/all/20220114155751.2651888-1-mkl@pengutronix.de
Cc: stable@vger.kernel.org
Cc: Matt Kline <matt@bitbashing.io>
Cc: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
Reported-by: Michael Anochin <anochin@photo-meter.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agodt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
Marc Kleine-Budde [Fri, 14 Jan 2022 17:47:41 +0000 (18:47 +0100)]
dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config

This tcan4x5x only comes with 2K of MRAM, a RX FIFO with a dept of 32
doesn't fit into the MRAM. Use a depth of 16 instead.

Fixes: 4edd396a1911 ("dt-bindings: can: tcan4x5x: Add DT bindings for TCAN4x5X driver")
Link: https://lore.kernel.org/all/20220119062951.2939851-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agomailmap: update email address of Brian Silverman
Marc Kleine-Budde [Mon, 10 Jan 2022 08:20:19 +0000 (09:20 +0100)]
mailmap: update email address of Brian Silverman

Brian Silverman's address at bluerivertech.com is not valid anymore,
use Brian's private email address instead.

Link: https://lore.kernel.org/all/20220110082359.2019735-1-mkl@pengutronix.de
Cc: Brian Silverman <bsilver16384@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agoselftests, xsk: Fix rx_full stats test
Magnus Karlsson [Fri, 21 Jan 2022 12:35:08 +0000 (13:35 +0100)]
selftests, xsk: Fix rx_full stats test

Fix the rx_full stats test so that it correctly reports pass even when
the fill ring is not full of buffers.

Fixes: 872a1184dbf2 ("selftests: xsk: Put the same buffer only once in the fill ring")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20220121123508.12759-1-magnus.karlsson@gmail.com
3 years agobpf: Fix flexible_array.cocci warnings
kernel test robot [Sat, 22 Jan 2022 11:09:44 +0000 (12:09 +0100)]
bpf: Fix flexible_array.cocci warnings

Zero-length and one-element arrays are deprecated, see:
Documentation/process/deprecated.rst

Flexible-array members should be used instead.

Generated by: scripts/coccinelle/misc/flexible_array.cocci

Fixes: c1ff181ffabc ("selftests/bpf: Extend kfunc selftests")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/alpine.DEB.2.22.394.2201221206320.12220@hadrien
3 years agonet: stmmac: remove unused members in struct stmmac_priv
Jisheng Zhang [Sun, 23 Jan 2022 12:27:58 +0000 (20:27 +0800)]
net: stmmac: remove unused members in struct stmmac_priv

The tx_coalesce and mii_irq are not used at all now, so remove them.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: atlantic: Use the bitmap API instead of hand-writing it
Christophe JAILLET [Sun, 23 Jan 2022 06:53:46 +0000 (07:53 +0100)]
net: atlantic: Use the bitmap API instead of hand-writing it

Simplify code by using bitmap_weight() and bitmap_zero() instead of
hand-writing these functions.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoping: fix the sk_bound_dev_if match in ping_lookup
Xin Long [Sat, 22 Jan 2022 11:40:56 +0000 (06:40 -0500)]
ping: fix the sk_bound_dev_if match in ping_lookup

When 'ping' changes to use PING socket instead of RAW socket by:

   # sysctl -w net.ipv4.ping_group_range="0 100"

the selftests 'router_broadcast.sh' will fail, as such command

  # ip vrf exec vrf-h1 ping -I veth0 198.51.100.255 -b

can't receive the response skb by the PING socket. It's caused by mismatch
of sk_bound_dev_if and dif in ping_rcv() when looking up the PING socket,
as dif is vrf-h1 if dif's master was set to vrf-h1.

This patch is to fix this regression by also checking the sk_bound_dev_if
against sdif so that the packets can stil be received even if the socket
is not bound to the vrf device but to the real iif.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/smc: Transitional solution for clcsock race issue
Wen Gu [Sat, 22 Jan 2022 09:43:09 +0000 (17:43 +0800)]
net/smc: Transitional solution for clcsock race issue

We encountered a crash in smc_setsockopt() and it is caused by
accessing smc->clcsock after clcsock was released.

 BUG: kernel NULL pointer dereference, address: 0000000000000020
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E     5.16.0-rc4+ #53
 RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
 Call Trace:
  <TASK>
  __sys_setsockopt+0xfc/0x190
  __x64_sys_setsockopt+0x20/0x30
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f16ba83918e
  </TASK>

This patch tries to fix it by holding clcsock_release_lock and
checking whether clcsock has already been released before access.

In case that a crash of the same reason happens in smc_getsockopt()
or smc_switch_to_fallback(), this patch also checkes smc->clcsock
in them too. And the caller of smc_switch_to_fallback() will identify
whether fallback succeeds according to the return value.

Fixes: fd57770dd198 ("net/smc: wait for pending work before clcsock release_sock")
Link: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@linux.ibm.com/T/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: remove unused ->wait_capability
Sukadev Bhattiprolu [Sat, 22 Jan 2022 02:59:21 +0000 (18:59 -0800)]
ibmvnic: remove unused ->wait_capability

With previous bug fix, ->wait_capability flag is no longer needed and can
be removed.

Fixes: 249168ad07cd ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: don't spin in tasklet
Sukadev Bhattiprolu [Sat, 22 Jan 2022 02:59:20 +0000 (18:59 -0800)]
ibmvnic: don't spin in tasklet

ibmvnic_tasklet() continuously spins waiting for responses to all
capability requests. It does this to avoid encountering an error
during initialization of the vnic. However if there is a bug in the
VIOS and we do not receive a response to one or more queries the
tasklet ends up spinning continuously leading to hard lock ups.

If we fail to receive a message from the VIOS it is reasonable to
timeout the login attempt rather than spin indefinitely in the tasklet.

Fixes: 249168ad07cd ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: init ->running_cap_crqs early
Sukadev Bhattiprolu [Sat, 22 Jan 2022 02:59:19 +0000 (18:59 -0800)]
ibmvnic: init ->running_cap_crqs early

We use ->running_cap_crqs to determine when the ibmvnic_tasklet() should
send out the next protocol message type. i.e when we get back responses
to all our QUERY_CAPABILITY CRQs we send out REQUEST_CAPABILITY crqs.
Similiary, when we get responses to all the REQUEST_CAPABILITY crqs, we
send out the QUERY_IP_OFFLOAD CRQ.

We currently increment ->running_cap_crqs as we send out each CRQ and
have the ibmvnic_tasklet() send out the next message type, when this
running_cap_crqs count drops to 0.

This assumes that all the CRQs of the current type were sent out before
the count drops to 0. However it is possible that we send out say 6 CRQs,
get preempted and receive all the 6 responses before we send out the
remaining CRQs. This can result in ->running_cap_crqs count dropping to
zero before all messages of the current type were sent and we end up
sending the next protocol message too early.

Instead initialize the ->running_cap_crqs upfront so the tasklet will
only send the next protocol message after all responses are received.

Use the cap_reqs local variable to also detect any discrepancy (either
now or in future) in the number of capability requests we actually send.

Currently only send_query_cap() is affected by this behavior (of sending
next message early) since it is called from the worker thread (during
reset) and from application thread (during ->ndo_open()) and they can be
preempted. send_request_cap() is only called from the tasklet  which
processes CRQ responses sequentially, is not be affected.  But to
maintain the existing symmtery with send_query_capability() we update
send_request_capability() also.

Fixes: 249168ad07cd ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: Allow extra failures before disabling
Sukadev Bhattiprolu [Sat, 22 Jan 2022 02:59:18 +0000 (18:59 -0800)]
ibmvnic: Allow extra failures before disabling

If auto-priority-failover (APF) is enabled and there are at least two
backing devices of different priorities, some resets like fail-over,
change-param etc can cause at least two back to back failovers. (Failover
from high priority backing device to lower priority one and then back
to the higher priority one if that is still functional).

Depending on the timimg of the two failovers it is possible to trigger
a "hard" reset and for the hard reset to fail due to failovers. When this
occurs, the driver assumes that the network is unstable and disables the
VNIC for a 60-second "settling time". This in turn can cause the ethtool
command to fail with "No such device" while the vnic automatically recovers
a little while later.

Given that it's possible to have two back to back failures, allow for extra
failures before disabling the vnic for the settling time.

Fixes: f15fde9d47b8 ("ibmvnic: delay next reset if hard reset fails")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv4: fix ip option filtering for locally generated fragments
Jakub Kicinski [Sat, 22 Jan 2022 00:57:31 +0000 (16:57 -0800)]
ipv4: fix ip option filtering for locally generated fragments

During IP fragmentation we sanitize IP options. This means overwriting
options which should not be copied with NOPs. Only the first fragment
has the original, full options.

ip_fraglist_prepare() copies the IP header and options from previous
fragment to the next one. Commit 19c3401a917b ("net: ipv4: place control
buffer handling away from fragmentation iterators") moved sanitizing
options before ip_fraglist_prepare() which means options are sanitized
and then overwritten again with the old values.

Fixing this is not enough, however, nor did the sanitization work
prior to aforementioned commit.

ip_options_fragment() (which does the sanitization) uses ipcb->opt.optlen
for the length of the options. ipcb->opt of fragments is not populated
(it's 0), only the head skb has the state properly built. So even when
called at the right time ip_options_fragment() does nothing. This seems
to date back all the way to v2.5.44 when the fast path for pre-fragmented
skbs had been introduced. Prior to that ip_options_build() would have been
called for every fragment (in fact ever since v2.5.44 the fragmentation
handing in ip_options_build() has been dead code, I'll clean it up in
-next).

In the original patch (see Link) caixf mentions fixing the handling
for fragments other than the second one, but I'm not sure how _any_
fragment could have had their options sanitized with the code
as it stood.

Tested with python (MTU on lo lowered to 1000 to force fragmentation):

  import socket
  s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  s.setsockopt(socket.IPPROTO_IP, socket.IP_OPTIONS,
               bytearray([7,4,5,192, 20|0x80,4,1,0]))
  s.sendto(b'1'*2000, ('127.0.0.1', 1234))

Before:

IP (tos 0x0, ttl 64, id 1053, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
    localhost.36500 > localhost.search-agent: UDP, length 2000
IP (tos 0x0, ttl 64, id 1053, offset 968, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
    localhost > localhost: udp
IP (tos 0x0, ttl 64, id 1053, offset 1936, flags [none], proto UDP (17), length 100, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
    localhost > localhost: udp

After:

IP (tos 0x0, ttl 96, id 42549, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
    localhost.51607 > localhost.search-agent: UDP, bad length 2000 > 960
IP (tos 0x0, ttl 96, id 42549, offset 968, flags [+], proto UDP (17), length 996, options (NOP,NOP,NOP,NOP,RA value 256))
    localhost > localhost: udp
IP (tos 0x0, ttl 96, id 42549, offset 1936, flags [none], proto UDP (17), length 100, options (NOP,NOP,NOP,NOP,RA value 256))
    localhost > localhost: udp

RA (20 | 0x80) is now copied as expected, RR (7) is "NOPed out".

Link: https://lore.kernel.org/netdev/20220107080559.122713-1-ooppublic@163.com/
Fixes: 19c3401a917b ("net: ipv4: place control buffer handling away from fragmentation iterators")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: caixf <ooppublic@163.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet-procfs: show net devices bound packet types
Jianguo Wu [Fri, 21 Jan 2022 09:15:31 +0000 (17:15 +0800)]
net-procfs: show net devices bound packet types

After commit:7866a621043f ("dev: add per net_device packet type chains"),
we can not get packet types that are bound to a specified net device by
/proc/net/ptype, this patch fix the regression.

Run "tcpdump -i ens192 udp -nns0" Before and after apply this patch:

Before:
  [root@localhost ~]# cat /proc/net/ptype
  Type Device      Function
  0800          ip_rcv
  0806          arp_rcv
  86dd          ipv6_rcv

After:
  [root@localhost ~]# cat /proc/net/ptype
  Type Device      Function
  ALL  ens192   tpacket_rcv
  0800          ip_rcv
  0806          arp_rcv
  86dd          ipv6_rcv

v1 -> v2:
  - fix the regression rather than adding new /proc API as
    suggested by Stephen Hemminger.

Fixes: 7866a621043f ("dev: add per net_device packet type chains")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobonding: use rcu_dereference_rtnl when get bonding active slave
Hangbin Liu [Fri, 21 Jan 2022 08:25:18 +0000 (16:25 +0800)]
bonding: use rcu_dereference_rtnl when get bonding active slave

bond_option_active_slave_get_rcu() should not be used in rtnl_mutex as it
use rcu_dereference(). Replace to rcu_dereference_rtnl() so we also can use
this function in rtnl protected context.

With this update, we can rmeove the rcu_read_lock/unlock in
bonding .ndo_eth_ioctl and .get_ts_info.

Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Fixes: 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sfp: ignore disabled SFP node
Marek Behún [Wed, 19 Jan 2022 16:44:55 +0000 (17:44 +0100)]
net: sfp: ignore disabled SFP node

Commit ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network devices
and sfp cages") added code which finds SFP bus DT node even if the node
is disabled with status = "disabled". Because of this, when phylink is
created, it ends with non-null .sfp_bus member, even though the SFP
module is not probed (because the node is disabled).

We need to ignore disabled SFP bus node.

Fixes: ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network devices and sfp cages")
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: stable@vger.kernel.org # 2203cbf2c8b5 ("net: sfp: move fwnode parsing into sfp-bus layer")
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: ioam: expect support for Queue depth data
Justin Iurman [Fri, 21 Jan 2022 17:34:49 +0000 (18:34 +0100)]
selftests: net: ioam: expect support for Queue depth data

The IOAM queue-depth data field was added a few weeks ago,
but the test unit was not updated accordingly.

Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: b63c5478e9cb ("ipv6: ioam: Support for Queue depth data field")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://lore.kernel.org/r/20220121173449.26918-1-justin.iurman@uliege.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: Use struct_group() to avoid cross-field memset()
Kees Cook [Fri, 21 Jan 2022 07:39:35 +0000 (23:39 -0800)]
mptcp: Use struct_group() to avoid cross-field memset()

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.

Use struct_group() to capture the fields to be reset, so that memset()
can be appropriately bounds-checked by the compiler.

Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: mptcp@lists.linux.dev
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20220121073935.1154263-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agorxrpc: Adjust retransmission backoff
David Howells [Fri, 21 Jan 2022 23:12:58 +0000 (23:12 +0000)]
rxrpc: Adjust retransmission backoff

Improve retransmission backoff by only backing off when we retransmit data
packets rather than when we set the lost ack timer.

To this end:

 (1) In rxrpc_resend(), use rxrpc_get_rto_backoff() when setting the
     retransmission timer and only tell it that we are retransmitting if we
     actually have things to retransmit.

     Note that it's possible for the retransmission algorithm to race with
     the processing of a received ACK, so we may see no packets needing
     retransmission.

 (2) In rxrpc_send_data_packet(), don't bump the backoff when setting the
     ack_lost_at timer, as it may then get bumped twice.

With this, when looking at one particular packet, the retransmission
intervals were seen to be 1.5ms, 2ms, 3ms, 5ms, 9ms, 17ms, 33ms, 71ms,
136ms, 264ms, 544ms, 1.088s, 2.1s, 4.2s and 8.3s.

Fixes: c410bf01933e ("rxrpc: Fix the excessive initial retransmission timeout")
Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/164138117069.2023386.17446904856843997127.stgit@warthog.procyon.org.uk/
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mvneta: introduce XDP multi-buffer support'
Alexei Starovoitov [Fri, 21 Jan 2022 22:14:03 +0000 (14:14 -0800)]
Merge branch 'mvneta: introduce XDP multi-buffer support'

Lorenzo Bianconi says:

====================

This series introduces XDP frags support. The mvneta driver is
the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
please focus on how these new types of xdp_{buff,frame} packets
traverse the different layers and the layout design. It is on purpose
that BPF-helpers are kept simple, as we don't want to expose the
internal layout to allow later changes.

The main idea for the new XDP frags layout is to reuse the same
structure used for non-linear SKB. This rely on the "skb_shared_info"
struct at the end of the first buffer to link together subsequent
buffers. Keeping the layout compatible with SKBs is also done to ease
and speedup creating a SKB from an xdp_{buff,frame}.
Converting xdp_frame to SKB and deliver it to the network stack is shown
in patch 05/18 (e.g. cpumaps).

A frags bit (XDP_FLAGS_HAS_FRAGS) has been introduced in the flags
field of xdp_{buff,frame} structure to notify the bpf/network layer if
this is a non-linear xdp frame (XDP_FLAGS_HAS_FRAGS set) or not
(XDP_FLAGS_HAS_FRAGS not set).
The frags bit will be set by a xdp frags capable driver only
for non-linear frames maintaining the capability to receive linear frames
without any extra cost since the skb_shared_info structure at the end
of the first buffer will be initialized only if XDP_FLAGS_HAS_FRAGS bit
is set. Moreover the flags field in xdp_{buff,frame} will be reused even for
xdp rx csum offloading in future series.

Typical use cases for this series are:
- Jumbo-frames
- Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
- TSO/GRO for XDP_REDIRECT

The three following ebpf helpers (and related selftests) has been introduced:
- bpf_xdp_load_bytes:
  This helper is provided as an easy way to load data from a xdp buffer. It
  can be used to load len bytes from offset from the frame associated to
  xdp_md, into the buffer pointed by buf.
- bpf_xdp_store_bytes:
  Store len bytes from buffer buf into the frame associated to xdp_md, at
  offset.
- bpf_xdp_get_buff_len:
  Return the total frame size (linear + paged parts)

bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to take into
account non-linear xdp frames.
Moreover, similar to skb_header_pointer, we introduced bpf_xdp_pointer utility
routine to return a pointer to a given position in the xdp_buff if the
requested area (offset + len) is contained in a contiguous memory area
otherwise it must be copied in a bounce buffer provided by the caller running
bpf_xdp_copy_buf().

BPF_F_XDP_HAS_FRAGS flag has been introduced to notify the kernel the
eBPF program fully support xdp frags.
SEC("xdp.frags"), SEC_DEF("xdp.frags/devmap") and SEC_DEF("xdp.frags/cpumap")
have been introduced to declare xdp frags support.
The NIC driver is expected to reject an eBPF program if it is running in
XDP frags mode and the program does not support XDP frags.
In the same way it is not possible to mix XDP frags and XDP legacy
programs in a CPUMAP/DEVMAP or tailcall a XDP frags/legacy program from
a legacy/frags one.

More info about the main idea behind this approach can be found here [1][2].

Changes since v22:
- remove leftover CHECK macro usage
- reintroduce SEC_XDP_FRAGS flag in sec_def_flags
- rename xdp multi_frags in xdp frags
- do not report xdp_frags support in fdinfo

Changes since v21:
- rename *_mb in *_frags: e.g:
  s/xdp_buff_is_mb/xdp_buff_has_frags
- rely on ASSERT_* and not on CHECK in
  bpf_xdp_load_bytes/bpf_xdp_store_bytes self-tests
- change new multi.frags SEC definitions to use the following schema:
  prog_type.prog_flags/attach_place
- get rid of unnecessary properties in new multi.frags SEC definitions
- rebase on top of bpf-next

Changes since v20:
- rebase to current bpf-next

Changes since v19:
- do not run deprecated bpf_prog_load()
- rely on skb_frag_size_add/skb_frag_size_sub in
  bpf_xdp_mb_increase_tail/bpf_xdp_mb_shrink_tail
- rely on sinfo->nr_frags in bpf_xdp_mb_shrink_tail to check if the frame has
  been shrunk to a single-buffer one
- allow XDP_REDIRECT of a xdp-mb frame into a CPUMAP

Changes since v18:
- fix bpf_xdp_copy_buf utility routine when we want to load/store data
  contained in frag<n>
- add a selftest for bpf_xdp_load_bytes/bpf_xdp_store_bytes when the caller
  accesses data contained in frag<n> and frag<n+1>

Changes since v17:
- rework bpf_xdp_copy to squash base and frag management
- remove unused variable in bpf_xdp_mb_shrink_tail()
- move bpf_xdp_copy_buf() out of bpf_xdp_pointer()
- add sanity check for len in bpf_xdp_pointer()
- remove EXPORT_SYMBOL for __xdp_return()
- introduce frag_size field in xdp_rxq_info to let the driver specify max value
  for xdp fragments. frag_size set to 0 means the tail increase of last the
  fragment is not supported.

Changes since v16:
- do not allow tailcalling a xdp multi-buffer/legacy program from a
  legacy/multi-buff one.
- do not allow mixing xdp multi-buffer and xdp legacy programs in a
  CPUMAP/DEVMAP
- add selftests for CPUMAP/DEVMAP xdp mb compatibility
- disable XDP_REDIRECT for xdp multi-buff for the moment
- set max offset value to 0xffff in bpf_xdp_pointer
- use ARG_PTR_TO_UNINIT_MEM and ARG_CONST_SIZE for arg3_type and arg4_type
  of bpf_xdp_store_bytes/bpf_xdp_load_bytes

Changes since v15:
- let the verifier check buf is not NULL in
  bpf_xdp_load_bytes/bpf_xdp_store_bytes helpers
- return an error if offset + length is over frame boundaries in
  bpf_xdp_pointer routine
- introduce BPF_F_XDP_MB flag for bpf_attr to notify the kernel the eBPF
  program fully supports xdp multi-buffer.
- reject a non XDP multi-buffer program if the driver is running in
  XDP multi-buffer mode.

Changes since v14:
- intrudce bpf_xdp_pointer utility routine and
  bpf_xdp_load_bytes/bpf_xdp_store_bytes helpers
- drop bpf_xdp_adjust_data helper
- drop xdp_frags_truesize in skb_shared_info
- explode bpf_xdp_mb_adjust_tail in bpf_xdp_mb_increase_tail and
  bpf_xdp_mb_shrink_tail

Changes since v13:
- use u32 for xdp_buff/xdp_frame flags field
- rename xdp_frags_tsize in xdp_frags_truesize
- fixed comments

Changes since v12:
- fix bpf_xdp_adjust_data helper for single-buffer use case
- return -EFAULT in bpf_xdp_adjust_{head,tail} in case the data pointers are not
  properly reset
- collect ACKs from John

Changes since v11:
- add missing static to bpf_xdp_get_buff_len_proto structure
- fix bpf_xdp_adjust_data helper when offset is smaller than linear area length.

Changes since v10:
- move xdp->data to the requested payload offset instead of to the beginning of
  the fragment in bpf_xdp_adjust_data()

Changes since v9:
- introduce bpf_xdp_adjust_data helper and related selftest
- add xdp_frags_size and xdp_frags_tsize fields in skb_shared_info
- introduce xdp_update_skb_shared_info utility routine in ordere to not reset
  frags array in skb_shared_info converting from a xdp_buff/xdp_frame to a skb
- simplify bpf_xdp_copy routine

Changes since v8:
- add proper dma unmapping if XDP_TX fails on mvneta for a xdp multi-buff
- switch back to skb_shared_info implementation from previous xdp_shared_info
  one
- avoid using a bietfield in xdp_buff/xdp_frame since it introduces performance
  regressions. Tested now on 10G NIC (ixgbe) to verify there are no performance
  penalties for regular codebase
- add bpf_xdp_get_buff_len helper and remove frame_length field in xdp ctx
- add data_len field in skb_shared_info struct
- introduce XDP_FLAGS_FRAGS_PF_MEMALLOC flag

Changes since v7:
- rebase on top of bpf-next
- fix sparse warnings
- improve comments for frame_length in include/net/xdp.h

Changes since v6:
- the main difference respect to previous versions is the new approach proposed
  by Eelco to pass full length of the packet to eBPF layer in XDP context
- reintroduce multi-buff support to eBPF kself-tests
- reintroduce multi-buff support to bpf_xdp_adjust_tail helper
- introduce multi-buffer support to bpf_xdp_copy helper
- rebase on top of bpf-next

Changes since v5:
- rebase on top of bpf-next
- initialize mb bit in xdp_init_buff() and drop per-driver initialization
- drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
- postpone introduction of frame_length field in XDP ctx to another series
- minor changes

Changes since v4:
- rebase ontop of bpf-next
- introduce xdp_shared_info to build xdp multi-buff instead of using the
  skb_shared_info struct
- introduce frame_length in xdp ctx
- drop previous bpf helpers
- fix bpf_xdp_adjust_tail for xdp multi-buff
- introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
- fix xdp_return_frame_bulk for xdp multi-buff

Changes since v3:
- rebase ontop of bpf-next
- add patch 10/13 to copy back paged data from a xdp multi-buff frame to
  userspace buffer for xdp multi-buff selftests

Changes since v2:
- add throughput measurements
- drop bpf_xdp_adjust_mb_header bpf helper
- introduce selftest for xdp multibuffer
- addressed comments on bpf_xdp_get_frags_count
- introduce xdp multi-buff support to cpumaps

Changes since v1:
- Fix use-after-free in xdp_return_{buff/frame}
- Introduce bpf helpers
- Introduce xdp_mb sample program
- access skb_shared_info->nr_frags only on the last fragment

Changes since RFC:
- squash multi-buffer bit initialization in a single patch
- add mvneta non-linear XDP buff support for tx side

[0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
[2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)

Eelco Chaudron (3):
  bpf: add frags support to the bpf_xdp_adjust_tail() API
  bpf: add frags support to xdp copy helpers
  bpf: selftests: update xdp_adjust_tail selftest to include xdp frags

Lorenzo Bianconi (19):
  net: skbuff: add size metadata to skb_shared_info for xdp
  xdp: introduce flags field in xdp_buff/xdp_frame
  net: mvneta: update frags bit before passing the xdp buffer to eBPF
    layer
  net: mvneta: simplify mvneta_swbm_add_rx_fragment management
  net: xdp: add xdp_update_skb_shared_info utility routine
  net: marvell: rely on xdp_update_skb_shared_info utility routine
  xdp: add frags support to xdp_return_{buff/frame}
  net: mvneta: add frags support to XDP_TX
  bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf
    program
  net: mvneta: enable jumbo frames if the loaded XDP program support
    frags
  bpf: introduce bpf_xdp_get_buff_len helper
  bpf: move user_size out of bpf_test_init
  bpf: introduce frags support to bpf_prog_test_run_xdp()
  bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
    signature
  libbpf: Add SEC name for xdp frags programs
  net: xdp: introduce bpf_xdp_pointer utility routine
  bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest
  bpf: selftests: add CPUMAP/DEVMAP selftests for xdp frags
  xdp: disable XDP_REDIRECT for xdp frags
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoxdp: disable XDP_REDIRECT for xdp frags
Lorenzo Bianconi [Fri, 21 Jan 2022 10:10:06 +0000 (11:10 +0100)]
xdp: disable XDP_REDIRECT for xdp frags

XDP_REDIRECT is not fully supported yet for xdp frags since not
all XDP capable drivers can map non-linear xdp_frame in ndo_xdp_xmit
so disable it for the moment.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/0da25e117d0e2673f5d0ce6503393c55c6fb1be9.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: selftests: add CPUMAP/DEVMAP selftests for xdp frags
Lorenzo Bianconi [Fri, 21 Jan 2022 10:10:05 +0000 (11:10 +0100)]
bpf: selftests: add CPUMAP/DEVMAP selftests for xdp frags

Verify compatibility checks attaching a XDP frags program to a
CPUMAP/DEVMAP

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/d94b4d35adc1e42c9ca5004e6b2cdfd75992304d.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest
Lorenzo Bianconi [Fri, 21 Jan 2022 10:10:04 +0000 (11:10 +0100)]
bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest

Introduce kernel selftest for new bpf_xdp_{load,store}_bytes helpers.
and bpf_xdp_pointer/bpf_xdp_copy_buf utility routines.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/2c99ae663a5dcfbd9240b1d0489ad55dea4f4601.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agonet: xdp: introduce bpf_xdp_pointer utility routine
Lorenzo Bianconi [Fri, 21 Jan 2022 10:10:03 +0000 (11:10 +0100)]
net: xdp: introduce bpf_xdp_pointer utility routine

Similar to skb_header_pointer, introduce bpf_xdp_pointer utility routine
to return a pointer to a given position in the xdp_buff if the requested
area (offset + len) is contained in a contiguous memory area otherwise it
will be copied in a bounce buffer provided by the caller.
Similar to the tc counterpart, introduce the two following xdp helpers:
- bpf_xdp_load_bytes
- bpf_xdp_store_bytes

Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/ab285c1efdd5b7a9d361348b1e7d3ef49f6382b3.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: generalise tail call map compatibility check
Toke Hoiland-Jorgensen [Fri, 21 Jan 2022 10:10:02 +0000 (11:10 +0100)]
bpf: generalise tail call map compatibility check

The check for tail call map compatibility ensures that tail calls only
happen between maps of the same type. To ensure backwards compatibility for
XDP frags we need a similar type of check for cpumap and devmap
programs, so move the state from bpf_array_aux into bpf_map, add
xdp_has_frags to the check, and apply the same check to cpumap and devmap.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/f19fd97c0328a39927f3ad03e1ca6b43fd53cdfd.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agolibbpf: Add SEC name for xdp frags programs
Lorenzo Bianconi [Fri, 21 Jan 2022 10:10:01 +0000 (11:10 +0100)]
libbpf: Add SEC name for xdp frags programs

Introduce support for the following SEC entries for XDP frags
property:
- SEC("xdp.frags")
- SEC("xdp.frags/devmap")
- SEC("xdp.frags/cpumap")

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/af23b6e4841c171ad1af01917839b77847a4bc27.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: selftests: update xdp_adjust_tail selftest to include xdp frags
Eelco Chaudron [Fri, 21 Jan 2022 10:10:00 +0000 (11:10 +0100)]
bpf: selftests: update xdp_adjust_tail selftest to include xdp frags

This change adds test cases for the xdp frags scenarios when shrinking
and growing.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/d2e6a0ebc52db6f89e62b9befe045032e5e0a5fe.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:59 +0000 (11:09 +0100)]
bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature

introduce xdp_shared_info pointer in bpf_test_finish signature in order
to copy back paged data from a xdp frags frame to userspace buffer

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c803673798c786f915bcdd6c9338edaa9740d3d6.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: introduce frags support to bpf_prog_test_run_xdp()
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:58 +0000 (11:09 +0100)]
bpf: introduce frags support to bpf_prog_test_run_xdp()

Introduce the capability to allocate a xdp frags in
bpf_prog_test_run_xdp routine. This is a preliminary patch to
introduce the selftests for new xdp frags ebpf helpers

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/b7c0e425a9287f00f601c4fc0de54738ec6ceeea.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: move user_size out of bpf_test_init
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:57 +0000 (11:09 +0100)]
bpf: move user_size out of bpf_test_init

Rely on data_size_in in bpf_test_init routine signature. This is a
preliminary patch to introduce xdp frags selftest

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/6b48d38ed3d60240d7d6bb15e6fa7fabfac8dfb2.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: add frags support to xdp copy helpers
Eelco Chaudron [Fri, 21 Jan 2022 10:09:56 +0000 (11:09 +0100)]
bpf: add frags support to xdp copy helpers

This patch adds support for frags for the following helpers:
  - bpf_xdp_output()
  - bpf_perf_event_output()

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/340b4a99cdc24337b40eaf8bb597f9f9e7b0373e.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: add frags support to the bpf_xdp_adjust_tail() API
Eelco Chaudron [Fri, 21 Jan 2022 10:09:55 +0000 (11:09 +0100)]
bpf: add frags support to the bpf_xdp_adjust_tail() API

This change adds support for tail growing and shrinking for XDP frags.

When called on a non-linear packet with a grow request, it will work
on the last fragment of the packet. So the maximum grow size is the
last fragments tailroom, i.e. no new buffer will be allocated.
A XDP frags capable driver is expected to set frag_size in xdp_rxq_info
data structure to notify the XDP core the fragment size.
frag_size set to 0 is interpreted by the XDP core as tail growing is
not allowed.
Introduce __xdp_rxq_info_reg utility routine to initialize frag_size field.

When shrinking, it will work from the last fragment, all the way down to
the base buffer depending on the shrinking size. It's important to mention
that once you shrink down the fragment(s) are freed, so you can not grow
again to the original size.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/eabda3485dda4f2f158b477729337327e609461d.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: introduce bpf_xdp_get_buff_len helper
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:54 +0000 (11:09 +0100)]
bpf: introduce bpf_xdp_get_buff_len helper

Introduce bpf_xdp_get_buff_len helper in order to return the xdp buffer
total size (linear and paged area)

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/aac9ac3504c84026cf66a3c71b7c5ae89bc991be.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agonet: mvneta: enable jumbo frames if the loaded XDP program support frags
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:53 +0000 (11:09 +0100)]
net: mvneta: enable jumbo frames if the loaded XDP program support frags

Enable the capability to receive jumbo frames even if the interface is
running in XDP mode if the loaded program declare to properly support
xdp frags. At same time reject a xdp program not supporting xdp frags
if the driver is running in xdp frags mode.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/6909f81a3cbb8fb6b88e914752c26395771b882a.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:52 +0000 (11:09 +0100)]
bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program

Introduce BPF_F_XDP_HAS_FRAGS and the related field in bpf_prog_aux
in order to notify the driver the loaded program support xdp frags.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/db2e8075b7032a356003f407d1b0deb99adaa0ed.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agonet: mvneta: add frags support to XDP_TX
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:51 +0000 (11:09 +0100)]
net: mvneta: add frags support to XDP_TX

Introduce the capability to map non-linear xdp buffer running
mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/5d46ab63870ffe96fb95e6075a7ff0c81ef6424d.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoxdp: add frags support to xdp_return_{buff/frame}
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:50 +0000 (11:09 +0100)]
xdp: add frags support to xdp_return_{buff/frame}

Take into account if the received xdp_buff/xdp_frame is non-linear
recycling/returning the frame memory to the allocator or into
xdp_frame_bulk.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/a961069febc868508ce1bdf5e53a343eb4e57cb2.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agonet: marvell: rely on xdp_update_skb_shared_info utility routine
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:49 +0000 (11:09 +0100)]
net: marvell: rely on xdp_update_skb_shared_info utility routine

Rely on xdp_update_skb_shared_info routine in order to avoid
resetting frags array in skb_shared_info structure building
the skb in mvneta_swbm_build_skb(). Frags array is expected to
be initialized by the receiving driver building the xdp_buff
and here we just need to update memory metadata.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/e0dad97f5d02b13f189f99f1e5bc8e61bef73412.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agonet: xdp: add xdp_update_skb_shared_info utility routine
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:48 +0000 (11:09 +0100)]
net: xdp: add xdp_update_skb_shared_info utility routine

Introduce xdp_update_skb_shared_info routine to update frags array
metadata in skb_shared_info data structure converting to a skb from
a xdp_buff or xdp_frame.
According to the current skb_shared_info architecture in
xdp_frame/xdp_buff and to the xdp frags support, there is
no need to run skb_add_rx_frag() and reset frags array converting the buffer
to a skb since the frag array will be in the same position for xdp_buff/xdp_frame
and for the skb, we just need to update memory metadata.
Introduce XDP_FLAGS_PF_MEMALLOC flag in xdp_buff_flags in order to mark
the xdp_buff or xdp_frame as under memory-pressure if pages of the frags array
are under memory pressure. Doing so we can avoid looping over all fragments in
xdp_update_skb_shared_info routine. The driver is expected to set the
flag constructing the xdp_buffer using xdp_buff_set_frag_pfmemalloc
utility routine.
Rely on xdp_update_skb_shared_info in __xdp_build_skb_from_frame routine
converting the non-linear xdp_frame to a skb after performing a XDP_REDIRECT.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/bfd23fb8a8d7438724f7819c567cdf99ffd6226f.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agonet: mvneta: simplify mvneta_swbm_add_rx_fragment management
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:47 +0000 (11:09 +0100)]
net: mvneta: simplify mvneta_swbm_add_rx_fragment management

Relying on xdp frags bit, remove skb_shared_info structure
allocated on the stack in mvneta_rx_swbm routine and simplify
mvneta_swbm_add_rx_fragment accessing skb_shared_info in the
xdp_buff structure directly. There is no performance penalty in
this approach since mvneta_swbm_add_rx_fragment is run just
for xdp frags use-case.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/45f050c094ccffce49d6bc5112939ed35250ba90.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agonet: mvneta: update frags bit before passing the xdp buffer to eBPF layer
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:46 +0000 (11:09 +0100)]
net: mvneta: update frags bit before passing the xdp buffer to eBPF layer

Update frags bit (XDP_FLAGS_HAS_FRAGS) in xdp_buff to notify
XDP/eBPF layer and XDP remote drivers if this is a "non-linear"
XDP buffer. Access skb_shared_info only if XDP_FLAGS_HAS_FRAGS flag
is set in order to avoid possible cache-misses.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c00a73097f8a35860d50dae4a36e6cc9ef7e172f.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoxdp: introduce flags field in xdp_buff/xdp_frame
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:45 +0000 (11:09 +0100)]
xdp: introduce flags field in xdp_buff/xdp_frame

Introduce flags field in xdp_frame and xdp_buffer data structures
to define additional buffer features. At the moment the only
supported buffer feature is frags bit (XDP_FLAGS_HAS_FRAGS).
frags bit is used to specify if this is a linear buffer
(XDP_FLAGS_HAS_FRAGS not set) or a frags frame (XDP_FLAGS_HAS_FRAGS
set). In the latter case the driver is expected to initialize the
skb_shared_info structure at the end of the first buffer to link together
subsequent buffers belonging to the same frame.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/e389f14f3a162c0a5bc6a2e1aa8dd01a90be117d.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agonet: skbuff: add size metadata to skb_shared_info for xdp
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:44 +0000 (11:09 +0100)]
net: skbuff: add size metadata to skb_shared_info for xdp

Introduce xdp_frags_size field in skb_shared_info data structure
to store xdp_buff/xdp_frame frame paged size (xdp_frags_size will
be used in xdp frags support). In order to not increase
skb_shared_info size we will use a hole due to skb_shared_info
alignment.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/8a849819a3e0a143d540f78a3a5add76e17e980d.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoMerge branch 'octeontx2-af-fixes'
David S. Miller [Fri, 21 Jan 2022 14:32:21 +0000 (14:32 +0000)]
Merge branch 'octeontx2-af-fixes'

Subbaraya Sundeep says:

====================
octeontx-af2: Fixes for CN10K and CN9xxx platforms

This patchset has consolidated fixes in Octeontx2 driver
handling CN10K and CN9xxx platforms. When testing the
new CN10K hardware some issues resurfaced like accessing
wrong register for CN10K and enabling loopback on not supported
interfaces. Some fixes are needed for CN9xxx platforms as well.

Below is the description of patches

Patch 1: AF sets RX RSS action for all the VFs when a VF is
brought up. But when a PF sets RX action for its VF like Drop/Direct
to a queue in ntuple filter it is not retained because of AF fixup.
This patch skips modifying VF RX RSS action if PF has already
set its action.

Patch 2: When configuring backpressure wrong register is being read for
LBKs hence fixed it.

Patch 3: Some RVU blocks may take longer time to reset but are guaranteed
to complete the reset. Hence wait till reset is complete.

Patch 4: For enabling LMAC CN10K needs another register compared
to CN9xxx platforms. Hence changed it.

Patch 5: Adds missing barrier before submitting memory pointer
to the aura hardware.

Patch 6: Increase polling time while link credit restore and also
return proper error code when timeout occurs.

Patch 7: Internal loopback not supported on LPCS interfaces like
SGMII/QSGMII so do not enable it.

Patch 8: When there is a error in message processing, AF sets the error
response and replies back to requestor. PF forwards a invalid message to
VF back if AF reply has error in it. This way VF lacks the actual error set
by AF for its message. This is changed such that PF simply forwards the
actual reply and let VF handle the error.

Patch 9: ntuple filter with "flow-type ether proto 0x8842 vlan 0x92e"
was not working since ethertype 0x8842 is NGIO protocol. Hardware
parser explicitly parses such NGIO packets and sets the packet as
NGIO and do not set it as tagged packet. Fix this by changing parser
such that it sets the packet as both NGIO and tagged by using
separate layer types.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-af: Add KPU changes to parse NGIO as separate layer
Kiran Kumar K [Fri, 21 Jan 2022 06:34:47 +0000 (12:04 +0530)]
octeontx2-af: Add KPU changes to parse NGIO as separate layer

With current KPU profile NGIO is being parsed along with CTAG as
a single layer. Because of this MCAM/ntuple rules installed with
ethertype as 0x8842 are not being hit. Adding KPU profile changes
to parse NGIO in separate ltype and CTAG in separate ltype.

Fixes: f9c49be90c05 ("octeontx2-af: Update the default KPU profile and fixes")
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-pf: Forward error codes to VF
Subbaraya Sundeep [Fri, 21 Jan 2022 06:34:46 +0000 (12:04 +0530)]
octeontx2-pf: Forward error codes to VF

PF forwards its VF messages to AF and corresponding
replies from AF to VF. AF sets proper error code in the
replies after processing message requests. Currently PF
checks the error codes in replies and sends invalid
message to VF. This way VF lacks the information of
error code set by AF for its messages. This patch
changes that such that PF simply forwards AF replies
so that VF can handle error codes.

Fixes: d424b6c02415 ("octeontx2-pf: Enable SRIOV and added VF mbox handling")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-af: cn10k: Do not enable RPM loopback for LPC interfaces
Geetha sowjanya [Fri, 21 Jan 2022 06:34:45 +0000 (12:04 +0530)]
octeontx2-af: cn10k: Do not enable RPM loopback for LPC interfaces

Internal looback is not supported to low rate LPCS interface like
SGMII/QSGMII. Hence don't allow to enable for such interfaces.

Fixes: 3ad3f8f93c81 ("octeontx2-af: cn10k: MAC internal loopback support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-af: Increase link credit restore polling timeout
Geetha sowjanya [Fri, 21 Jan 2022 06:34:44 +0000 (12:04 +0530)]
octeontx2-af: Increase link credit restore polling timeout

It's been observed that sometimes link credit restore takes
a lot of time than the current timeout. This patch increases
the default timeout value and return the proper error value
on failure.

Fixes: 1c74b89171c3 ("octeontx2-af: Wait for TX link idle for credits change")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-pf: cn10k: Ensure valid pointers are freed to aura
Geetha sowjanya [Fri, 21 Jan 2022 06:34:43 +0000 (12:04 +0530)]
octeontx2-pf: cn10k: Ensure valid pointers are freed to aura

While freeing SQB pointers to aura, driver first memcpy to
target address and then triggers lmtst operation to free pointer
to the aura. We need to ensure(by adding dmb barrier)that memcpy
is finished before pointers are freed to the aura. This patch also
adds the missing sq context structure entry in debugfs.

Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-af: cn10k: Use appropriate register for LMAC enable
Geetha sowjanya [Fri, 21 Jan 2022 06:34:42 +0000 (12:04 +0530)]
octeontx2-af: cn10k: Use appropriate register for LMAC enable

CN10K platforms uses RPM(0..2)_MTI_MAC100(0..3)_COMMAND_CONFIG
register for lmac TX/RX enable whereas CN9xxx platforms use
CGX_CMRX_CONFIG register. This config change was missed when
adding support for CN10K RPM.

Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-af: Retry until RVU block reset complete
Geetha sowjanya [Fri, 21 Jan 2022 06:34:41 +0000 (12:04 +0530)]
octeontx2-af: Retry until RVU block reset complete

Few RVU blocks like SSO require more time for reset on some
silicons. Hence retrying the block reset until success.

Fixes: c0fa2cff8822c ("octeontx2-af: Handle return value in block reset")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-af: Fix LBK backpressure id count
Sunil Goutham [Fri, 21 Jan 2022 06:34:40 +0000 (12:04 +0530)]
octeontx2-af: Fix LBK backpressure id count

In rvu_nix_get_bpid() lbk_bpid_cnt is being read from
wrong register. Due to this backpressure enable is failing
for LBK VF32 onwards. This patch fixes that.

Fixes: fe1939bb2340 ("octeontx2-af: Add SDP interface support")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-af: Do not fixup all VF action entries
Subbaraya Sundeep [Fri, 21 Jan 2022 06:34:39 +0000 (12:04 +0530)]
octeontx2-af: Do not fixup all VF action entries

AF modifies all the rules destined for VF to use
the action same as default RSS action. This fixup
was needed because AF only installs default rules with
RSS action. But the action in rules installed by a PF
for its VFs should not be changed by this fixup.
This is because action can be drop or direct to
queue as specified by user(ntuple filters).
This patch fixes that problem.

Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'wireless-2022-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Fri, 21 Jan 2022 11:23:33 +0000 (11:23 +0000)]
Merge tag 'wireless-2022-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v5.17

First set of fixes for v5.17. This is the first pull request from the
new wireless tree and only changes to MAINTAINERS file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
David S. Miller [Fri, 21 Jan 2022 10:30:30 +0000 (10:30 +0000)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-01-20

This series contains updates to i40e driver only.

Jedrzej increases delay for EMP reset and adds checks to ensure a VF
request to change queues can be met.

Sylwester moves the placement of the Flow Director queue as to not
fragment the queue pile which would cause later re-allocation issues.

Karen prevents VF reset being invoked while another is still occurring
to avoid reading invalid data.

Joe Damato fixes some statistics fields to match the values of the
fields they are based on.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: bpf: test BPF_PROG_QUERY for progs attached to sockmap
Di Zhu [Wed, 19 Jan 2022 01:40:05 +0000 (09:40 +0800)]
selftests: bpf: test BPF_PROG_QUERY for progs attached to sockmap

Add test for querying progs attached to sockmap. we use an existing
libbpf query interface to query prog cnt before and after progs
attaching to sockmap and check whether the queried prog id is right.

Signed-off-by: Di Zhu <zhudi2@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220119014005.1209-2-zhudi2@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: support BPF_PROG_QUERY for progs attached to sockmap
Di Zhu [Wed, 19 Jan 2022 01:40:04 +0000 (09:40 +0800)]
bpf: support BPF_PROG_QUERY for progs attached to sockmap

Right now there is no way to query whether BPF programs are
attached to a sockmap or not.

we can use the standard interface in libbpf to query, such as:
bpf_prog_query(mapFd, BPF_SK_SKB_STREAM_PARSER, 0, NULL, ...);
the mapFd is the fd of sockmap.

Signed-off-by: Di Zhu <zhudi2@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220119014005.1209-1-zhudi2@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoMerge branch 'libbpf: streamline netlink-based XDP APIs'
Alexei Starovoitov [Fri, 21 Jan 2022 05:22:03 +0000 (21:22 -0800)]
Merge branch 'libbpf: streamline netlink-based XDP APIs'

Andrii Nakryiko says:

====================

Revamp existing low-level XDP APIs provided by libbpf to follow more
consistent naming (new APIs follow bpf_tc_xxx() approach where it makes
sense) and be extensible without ABI breakages (OPTS-based). See patch #1 for
details, remaining patches switch bpftool, selftests/bpf and samples/bpf to
new APIs.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agosamples/bpf: adapt samples/bpf to bpf_xdp_xxx() APIs
Andrii Nakryiko [Thu, 20 Jan 2022 06:14:22 +0000 (22:14 -0800)]
samples/bpf: adapt samples/bpf to bpf_xdp_xxx() APIs

Use new bpf_xdp_*() APIs across all XDP-related BPF samples.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120061422.2710637-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoselftests/bpf: switch to new libbpf XDP APIs
Andrii Nakryiko [Thu, 20 Jan 2022 06:14:21 +0000 (22:14 -0800)]
selftests/bpf: switch to new libbpf XDP APIs

Switch to using new bpf_xdp_*() APIs across all selftests. Take
advantage of a more straightforward and user-friendly semantics of
old_prog_fd (0 means "don't care") in few places.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120061422.2710637-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpftool: use new API for attaching XDP program
Andrii Nakryiko [Thu, 20 Jan 2022 06:14:20 +0000 (22:14 -0800)]
bpftool: use new API for attaching XDP program

Switch to new bpf_xdp_attach() API to avoid deprecation warnings.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120061422.2710637-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agolibbpf: streamline low-level XDP APIs
Andrii Nakryiko [Thu, 20 Jan 2022 06:14:19 +0000 (22:14 -0800)]
libbpf: streamline low-level XDP APIs

Introduce 4 new netlink-based XDP APIs for attaching, detaching, and
querying XDP programs:
  - bpf_xdp_attach;
  - bpf_xdp_detach;
  - bpf_xdp_query;
  - bpf_xdp_query_id.

These APIs replace bpf_set_link_xdp_fd, bpf_set_link_xdp_fd_opts,
bpf_get_link_xdp_id, and bpf_get_link_xdp_info APIs ([0]). The latter
don't follow a consistent naming pattern and some of them use
non-extensible approaches (e.g., struct xdp_link_info which can't be
modified without breaking libbpf ABI).

The approach I took with these low-level XDP APIs is similar to what we
did with low-level TC APIs. There is a nice duality of bpf_tc_attach vs
bpf_xdp_attach, and so on. I left bpf_xdp_attach() to support detaching
when -1 is specified for prog_fd for generality and convenience, but
bpf_xdp_detach() is preferred due to clearer naming and associated
semantics. Both bpf_xdp_attach() and bpf_xdp_detach() accept the same
opts struct allowing to specify expected old_prog_fd.

While doing the refactoring, I noticed that old APIs require users to
specify opts with old_fd == -1 to declare "don't care about already
attached XDP prog fd" condition. Otherwise, FD 0 is assumed, which is
essentially never an intended behavior. So I made this behavior
consistent with other kernel and libbpf APIs, in which zero FD means "no
FD". This seems to be more in line with the latest thinking in BPF land
and should cause less user confusion, hopefully.

For querying, I left two APIs, both more generic bpf_xdp_query()
allowing to query multiple IDs and attach mode, but also
a specialization of it, bpf_xdp_query_id(), which returns only requested
prog_id. Uses of prog_id returning bpf_get_link_xdp_id() were so
prevalent across selftests and samples, that it seemed a very common use
case and using bpf_xdp_query() for doing it felt very cumbersome with
a highly branches if/else chain based on flags and attach mode.

Old APIs are scheduled for deprecation in libbpf 0.8 release.

  [0] Closes: https://github.com/libbpf/libbpf/issues/309

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20220120061422.2710637-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoMerge branch 'libbpf: deprecate legacy BPF map definitions'
Alexei Starovoitov [Fri, 21 Jan 2022 05:19:05 +0000 (21:19 -0800)]
Merge branch 'libbpf: deprecate legacy BPF map definitions'

Andrii Nakryiko says:

====================

Officially deprecate legacy BPF map definitions in libbpf. They've been slated
for deprecation for a while in favor of more powerful BTF-defined map
definitions and this patch set adds warnings and a way to enforce this in
libbpf through LIBBPF_STRICT_MAP_DEFINITIONS strict mode flag.

Selftests are fixed up and updated, BPF documentation is updated, bpftool's
strict mode usage is adjusted to avoid breaking users unnecessarily.

v1->v2:
  - replace missed bpf_map_def case in Documentation/bpf/btf.rst (Alexei).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agodocs/bpf: update BPF map definition example
Andrii Nakryiko [Thu, 20 Jan 2022 06:05:29 +0000 (22:05 -0800)]
docs/bpf: update BPF map definition example

Use BTF-defined map definition in the documentation example.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agolibbpf: deprecate legacy BPF map definitions
Andrii Nakryiko [Thu, 20 Jan 2022 06:05:28 +0000 (22:05 -0800)]
libbpf: deprecate legacy BPF map definitions

Enact deprecation of legacy BPF map definition in SEC("maps") ([0]). For
the definitions themselves introduce LIBBPF_STRICT_MAP_DEFINITIONS flag
for libbpf strict mode. If it is set, error out on any struct
bpf_map_def-based map definition. If not set, libbpf will print out
a warning for each legacy BPF map to raise awareness that it goes away.

For any use of BPF_ANNOTATE_KV_PAIR() macro providing a legacy way to
associate BTF key/value type information with legacy BPF map definition,
warn through libbpf's pr_warn() error message (but don't fail BPF object
open).

BPF-side struct bpf_map_def is marked as deprecated. User-space struct
bpf_map_def has to be used internally in libbpf, so it is left
untouched. It should be enough for bpf_map__def() to be marked
deprecated to raise awareness that it goes away.

bpftool is an interesting case that utilizes libbpf to open BPF ELF
object to generate skeleton. As such, even though bpftool itself uses
full on strict libbpf mode (LIBBPF_STRICT_ALL), it has to relax it a bit
for BPF map definition handling to minimize unnecessary disruptions. So
opt-out of LIBBPF_STRICT_MAP_DEFINITIONS for bpftool. User's code that
will later use generated skeleton will make its own decision whether to
enforce LIBBPF_STRICT_MAP_DEFINITIONS or not.

There are few tests in selftests/bpf that are consciously using legacy
BPF map definitions to test libbpf functionality. For those, temporary
opt out of LIBBPF_STRICT_MAP_DEFINITIONS mode for the duration of those
tests.

  [0] Closes: https://github.com/libbpf/libbpf/issues/272

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoselftests/bpf: convert remaining legacy map definitions
Andrii Nakryiko [Thu, 20 Jan 2022 06:05:27 +0000 (22:05 -0800)]
selftests/bpf: convert remaining legacy map definitions

Converted few remaining legacy BPF map definition to BTF-defined ones.
For the remaining two bpf_map_def-based legacy definitions that we want
to keep for testing purposes until libbpf 1.0 release, guard them in
pragma to suppres deprecation warnings which will be added in libbpf in
the next commit.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoselftests/bpf: fail build on compilation warning
Andrii Nakryiko [Thu, 20 Jan 2022 06:05:26 +0000 (22:05 -0800)]
selftests/bpf: fail build on compilation warning

It's very easy to miss compilation warnings without -Werror, which is
not set for selftests. libbpf and bpftool are already strict about this,
so make selftests/bpf also treat compilation warnings as errors to catch
such regressions early.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoMerge branch 'mptcp-a-few-fixes'
Jakub Kicinski [Fri, 21 Jan 2022 04:24:03 +0000 (20:24 -0800)]
Merge branch 'mptcp-a-few-fixes'

Mat Martineau says:

====================
mptcp: A few fixes

Patch 1 fixes a RCU locking issue when processing a netlink command that
updates endpoint flags in the in-kernel MPTCP path manager.

Patch 2 fixes a typo affecting available endpoint id tracking.

Patch 3 fixes IPv6 routing in the MPTCP self tests.
====================

Link: https://lore.kernel.org/r/20220121003529.54930-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mptcp: fix ipv6 routing setup
Paolo Abeni [Fri, 21 Jan 2022 00:35:29 +0000 (16:35 -0800)]
selftests: mptcp: fix ipv6 routing setup

MPJ ipv6 selftests currently lack per link route to the server
net. Additionally, ipv6 subflows endpoints are created without any
interface specified. The end-result is that in ipv6 self-tests
subflows are created all on the same link, leading to expected delays
and sporadic self-tests failures.

Fix the issue by adding the missing setup bits.

Fixes: 523514ed0a99 ("selftests: mptcp: add ADD_ADDR IPv6 test cases")
Reported-and-tested-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: fix removing ids bitmap setting
Geliang Tang [Fri, 21 Jan 2022 00:35:28 +0000 (16:35 -0800)]
mptcp: fix removing ids bitmap setting

In mptcp_pm_nl_rm_addr_or_subflow(), the bit of rm_list->ids[i] in the
id_avail_bitmap should be set, not rm_list->ids[1]. This patch fixed it.

Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: fix msk traversal in mptcp_nl_cmd_set_flags()
Paolo Abeni [Fri, 21 Jan 2022 00:35:27 +0000 (16:35 -0800)]
mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()

The MPTCP endpoint list is under RCU protection, guarded by the
pernet spinlock. mptcp_nl_cmd_set_flags() traverses the list
without acquiring the spin-lock nor under the RCU critical section.

This change addresses the issue performing the lookup and the endpoint
update under the pernet spinlock.

Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Jakub Kicinski [Fri, 21 Jan 2022 04:22:30 +0000 (20:22 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Incorrect helper module alias in netbios_ns, from Florian Westphal.

2) Remove unused variable in nf_tables.

3) Uninitialized last expression in nf_tables register tracking.

4) Memleak in nft_connlimit after moving stateful data out of the
   expression data area.

5) Bogus invalid stats update when NF_REPEAT is returned, from Florian.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
  netfilter: conntrack: don't increment invalid counter on NF_REPEAT
  netfilter: nft_connlimit: memleak if nf_ct_netns_get() fails
  netfilter: nf_tables: set last expression in register tracking area
  netfilter: nf_tables: remove unused variable
  netfilter: nf_conntrack_netbios_ns: fix helper module alias
====================

Link: https://lore.kernel.org/r/20220120125212.991271-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoipv6: annotate accesses to fn->fn_sernum
Eric Dumazet [Thu, 20 Jan 2022 17:41:12 +0000 (09:41 -0800)]
ipv6: annotate accesses to fn->fn_sernum

struct fib6_node's fn_sernum field can be
read while other threads change it.

Add READ_ONCE()/WRITE_ONCE() annotations.

Do not change existing smp barriers in fib6_get_cookie_safe()
and __fib6_update_sernum_upto_root()

syzbot reported:

BUG: KCSAN: data-race in fib6_clean_node / inet6_csk_route_socket

write to 0xffff88813df62e2c of 4 bytes by task 1920 on cpu 1:
 fib6_clean_node+0xc2/0x260 net/ipv6/ip6_fib.c:2178
 fib6_walk_continue+0x38e/0x430 net/ipv6/ip6_fib.c:2112
 fib6_walk net/ipv6/ip6_fib.c:2160 [inline]
 fib6_clean_tree net/ipv6/ip6_fib.c:2240 [inline]
 __fib6_clean_all+0x1a9/0x2e0 net/ipv6/ip6_fib.c:2256
 fib6_flush_trees+0x6c/0x80 net/ipv6/ip6_fib.c:2281
 rt_genid_bump_ipv6 include/net/net_namespace.h:488 [inline]
 addrconf_dad_completed+0x57f/0x870 net/ipv6/addrconf.c:4230
 addrconf_dad_work+0x908/0x1170
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x1bf/0x1e0 kernel/kthread.c:359
 ret_from_fork+0x1f/0x30

read to 0xffff88813df62e2c of 4 bytes by task 15701 on cpu 0:
 fib6_get_cookie_safe include/net/ip6_fib.h:285 [inline]
 rt6_get_cookie include/net/ip6_fib.h:306 [inline]
 ip6_dst_store include/net/ip6_route.h:234 [inline]
 inet6_csk_route_socket+0x352/0x3c0 net/ipv6/inet6_connection_sock.c:109
 inet6_csk_xmit+0x91/0x1e0 net/ipv6/inet6_connection_sock.c:121
 __tcp_transmit_skb+0x1323/0x1840 net/ipv4/tcp_output.c:1402
 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
 tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
 tcp_push+0x2d9/0x2f0 net/ipv4/tcp.c:725
 mptcp_push_release net/mptcp/protocol.c:1491 [inline]
 __mptcp_push_pending+0x46c/0x490 net/mptcp/protocol.c:1578
 mptcp_sendmsg+0x9ec/0xa50 net/mptcp/protocol.c:1764
 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:643
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 kernel_sendmsg+0x97/0xd0 net/socket.c:745
 sock_no_sendpage+0x84/0xb0 net/core/sock.c:3086
 inet_sendpage+0x9d/0xc0 net/ipv4/af_inet.c:834
 kernel_sendpage+0x187/0x200 net/socket.c:3492
 sock_sendpage+0x5a/0x70 net/socket.c:1007
 pipe_to_sendpage+0x128/0x160 fs/splice.c:364
 splice_from_pipe_feed fs/splice.c:418 [inline]
 __splice_from_pipe+0x207/0x500 fs/splice.c:562
 splice_from_pipe fs/splice.c:597 [inline]
 generic_splice_sendpage+0x94/0xd0 fs/splice.c:746
 do_splice_from fs/splice.c:767 [inline]
 direct_splice_actor+0x80/0xa0 fs/splice.c:936
 splice_direct_to_actor+0x345/0x650 fs/splice.c:891
 do_splice_direct+0x106/0x190 fs/splice.c:979
 do_sendfile+0x675/0xc40 fs/read_write.c:1245
 __do_sys_sendfile64 fs/read_write.c:1310 [inline]
 __se_sys_sendfile64 fs/read_write.c:1296 [inline]
 __x64_sys_sendfile64+0x102/0x140 fs/read_write.c:1296
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000026f -> 0x00000271

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

The Fixes tag I chose is probably arbitrary, I do not think
we need to backport this patch to older kernels.

Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220120174112.1126644-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotcp: add a missing sk_defer_free_flush() in tcp_splice_read()
Eric Dumazet [Thu, 20 Jan 2022 12:45:30 +0000 (04:45 -0800)]
tcp: add a missing sk_defer_free_flush() in tcp_splice_read()

Without it, splice users can hit the warning
added in commit 79074a72d335 ("net: Flush deferred skb free on socket destroy")

Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
Fixes: 79074a72d335 ("net: Flush deferred skb free on socket destroy")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20220120124530.925607-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotcp: Add a stub for sk_defer_free_flush()
Gal Pressman [Thu, 20 Jan 2022 12:34:40 +0000 (14:34 +0200)]
tcp: Add a stub for sk_defer_free_flush()

When compiling the kernel with CONFIG_INET disabled, the
sk_defer_free_flush() should be defined as a nop.

This resolves the following compilation error:
  ld: net/core/sock.o: in function `sk_defer_free_flush':
  ./include/net/tcp.h:1378: undefined reference to `__sk_defer_free_flush'

Fixes: 79074a72d335 ("net: Flush deferred skb free on socket destroy")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220120123440.9088-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agophylib: fix potential use-after-free
Marek Behún [Wed, 19 Jan 2022 16:27:48 +0000 (17:27 +0100)]
phylib: fix potential use-after-free

Commit bafbdd527d56 ("phylib: Add device reset GPIO support") added call
to phy_device_reset(phydev) after the put_device() call in phy_detach().

The comment before the put_device() call says that the phydev might go
away with put_device().

Fix potential use-after-free by calling phy_device_reset() before
put_device().

Fixes: bafbdd527d56 ("phylib: Add device reset GPIO support")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220119162748.32418-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests/bpf: Do not fail build if CONFIG_NF_CONNTRACK=m/n
Kumar Kartikeya Dwivedi [Thu, 20 Jan 2022 16:49:32 +0000 (22:19 +0530)]
selftests/bpf: Do not fail build if CONFIG_NF_CONNTRACK=m/n

Some users have complained that selftests fail to build when
CONFIG_NF_CONNTRACK=m. It would be useful to allow building as long as
it is set to module or built-in, even though in case of building as
module, user would need to load it before running the selftest. Note
that this also allows building selftest when CONFIG_NF_CONNTRACK is
disabled.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220120164932.2798544-1-memxor@gmail.com
3 years agoselftests: bpf: Fix bind on used port
Felix Maurer [Tue, 18 Jan 2022 15:11:56 +0000 (16:11 +0100)]
selftests: bpf: Fix bind on used port

The bind_perm BPF selftest failed when port 111/tcp was already in use
during the test. To fix this, the test now runs in its own network name
space.

To use unshare, it is necessary to reorder the includes. The style of
the includes is adapted to be consistent with the other prog_tests.

v2: Replace deprecated CHECK macro with ASSERT_OK

Fixes: 8259fdeb30326 ("selftests/bpf: Verify that rebinding to port < 1024 from BPF works")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/551ee65533bb987a43f93d88eaf2368b416ccd32.1642518457.git.fmaurer@redhat.com
3 years agoMerge branch 'rely on ASSERT marcos in xdp_bpf2bpf.c/xdp_adjust_tail.c'
Andrii Nakryiko [Thu, 20 Jan 2022 21:54:57 +0000 (13:54 -0800)]
Merge branch 'rely on ASSERT marcos in xdp_bpf2bpf.c/xdp_adjust_tail.c'

Lorenzo Bianconi says:

====================

Rely on ASSERT* macros and get rid of deprecated CHECK ones in xdp_bpf2bpf and
xdp_adjust_tail bpf selftests.
This is a preliminary series for XDP multi-frags support.

Changes since v1:
- run each ASSERT test separately
- drop unnecessary return statements
- drop unnecessary if condition in test_xdp_bpf2bpf()
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>