www.infradead.org Git - nvme.git/log
David S. Miller [Fri, 18 Jun 2021 20:13:40 +0000 (13:13 -0700)]
Merge branch 'RPMSG-WWAN-CTRL-driver'

Stephan Gerhold says:

====================
net: wwan: Add RPMSG WWAN CTRL driver

This patch series adds a WWAN "control" driver for the remote processor
messaging (rpmsg) subsystem. This subsystem allows communicating with
an integrated modem DSP on many Qualcomm SoCs, e.g. MSM8916 or MSM8974.

The driver is a fairly simple glue layer between WWAN and RPMSG
and is mostly based on the existing mhi_wwan_ctrl.c and rpmsg_char.c.

For more information, see commit message in PATCH 2/3.

I already posted a RFC for this a while ago:
https://lore.kernel.org/linux-arm-msm/YLfL9Q+4860uqS8f@gerhold.net/
and now I'm looking for some feedback for the actual changes. :)

Changes in v3:
  - PATCH 2/3: Clarify commit message
  - PATCH 3/3: Fix build error for cdc-wdm.c, use extra tx_blocking() op instead
v2: https://lore.kernel.org/netdev/20210618075243.42046-1-stephan@gerhold.net/

Changes in v2: Only in PATCH 3/3
  - Fix EPOLLOUT being always set even if poll op is defined
  - Rename poll() op -> tx_poll() since it should be only used for TX
v1: https://lore.kernel.org/netdev/20210615133229.213064-1-stephan@gerhold.net/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Stephan Gerhold [Fri, 18 Jun 2021 17:36:11 +0000 (19:36 +0200)]
net: wwan: Allow WWAN drivers to provide blocking tx and poll function

At the moment, the WWAN core provides wwan_port_txon/off() to implement
blocking writes. The tx() port operation should not block, instead
wwan_port_txon/off() should be called when the TX queue is full or has
free space again.

However, in some cases it is not straightforward to make use of that
functionality. For example, the RPMSG API used by rpmsg_wwan_ctrl.c
does not provide any way to be notified when the TX queue has space
again. Instead, it only provides the following operations:

  - rpmsg_send(): blocking write (wait until there is space)
  - rpmsg_trysend(): non-blocking write (return error if no space)
  - rpmsg_poll(): set poll flags depending on TX queue state

Generally that's totally sufficient for implementing a char device,
but it does not fit well to the currently provided WWAN port ops.

Most of the time, using the non-blocking rpmsg_trysend() in the
WWAN tx() port operation works just fine. However, with high-frequency
writes to the char device it is possible to trigger a situation
where this causes issues. Consider the following (somewhat
unrealistic) example:

 # dd if=/dev/zero bs=1000 of=/dev/wwan0qmi0
 dd: error writing '/dev/wwan0qmi0': Resource temporarily unavailable
 1+0 records out

This fails immediately after writing the first record. It's likely
only a matter of time until this triggers issues for some real application
(e.g. ModemManager sending a lot of large QMI packets).

The rpmsg_char device does not have this problem, because it uses
rpmsg_trysend() and rpmsg_poll() to support non-blocking operations.
Make it possible to use the same in the RPMSG WWAN driver by adding
two new optional wwan_port_ops:

  - tx_blocking(): send data blocking if allowed
  - tx_poll(): set additional TX poll flags

This integrates nicely with the RPMSG API and does not require
any change in existing WWAN drivers.

With these changes, the dd example above blocks instead of exiting
with an error.
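
The choice between the two TX ops can be modelled in user space roughly
as follows. This is only a sketch under assumptions: struct port_ops,
port_write() and the fake_* functions are hypothetical names made up for
illustration, not the WWAN core code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the port operations. */
struct port_ops {
	int (*tx)(const void *buf, size_t len);          /* must not block */
	int (*tx_blocking)(const void *buf, size_t len); /* optional, may wait */
};

/* Use tx_blocking() only if the driver provides it and the caller may block. */
static int port_write(const struct port_ops *ops, const void *buf,
		      size_t len, bool nonblock)
{
	if (nonblock || !ops->tx_blocking)
		return ops->tx(buf, len);
	return ops->tx_blocking(buf, len);
}

/* Fake non-blocking op that models a full TX queue. */
static int fake_tx(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -EAGAIN;
}

/* Fake blocking op that pretends to wait for space and then succeeds. */
static int fake_tx_blocking(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}
```

In this model a blocking writer gets 0 from a driver that provides
tx_blocking(), while a non-blocking writer, or a driver with only tx(),
still sees -EAGAIN when the queue is full.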

Cc: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephan Gerhold [Fri, 18 Jun 2021 17:36:10 +0000 (19:36 +0200)]
net: wwan: Add RPMSG WWAN CTRL driver

The remote processor messaging (rpmsg) subsystem provides an interface
to communicate with other remote processors. On many Qualcomm SoCs this
is used to communicate with an integrated modem DSP that implements most
of the modem functionality and provides high-level protocols like
QMI or AT to allow controlling the modem.

For QMI, most older Qualcomm SoCs (e.g. MSM8916/MSM8974) have
a standalone "DATA5_CNTL" channel that allows exchanging QMI messages.
Note that newer SoCs (e.g. SDM845) only allow exchanging QMI messages
via a shared QRTR channel that is available via a socket API on Linux.

For AT, the "DATA4" channel accepts at least a limited set of AT
commands, on many older and newer Qualcomm SoCs, although QMI is
typically the preferred control protocol.

Often there are additional QMI/AT channels (usually named DATA*_CNTL
for QMI and DATA* for AT), but it is not clear if those are really
functional on all devices. Also, at the moment there is no use case
for having multiple QMI/AT ports. If needed more channels could be
added later after more testing.

Note that the data path (network interface) is entirely separate
from the control path and varies between Qualcomm SoCs, e.g. "IPA"
on newer Qualcomm SoCs or "BAM-DMUX" on some older ones.

The RPMSG WWAN CTRL driver exposes the QMI/AT control ports via the
WWAN subsystem, and therefore allows userspace like ModemManager to
set up the modem. Until now, ModemManager had to use the RPMSG-specific
rpmsg-char where the channels must be explicitly exposed as a char
device first and don't show up directly in sysfs.

The driver is a fairly simple glue layer between WWAN and RPMSG
and is mostly based on the existing mhi_wwan_ctrl.c and rpmsg_char.c.

Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephan Gerhold [Fri, 18 Jun 2021 17:36:09 +0000 (19:36 +0200)]
rpmsg: core: Add driver_data for rpmsg_device_id

Most device_id structs provide a driver_data field that can be used
by drivers to associate data more easily for a particular device ID.
Add the same for the rpmsg_device_id.
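
As a rough user-space illustration of what a driver_data field buys a
driver, consider the sketch below. The struct, the enum and the lookup
helper are hypothetical simplifications; only the channel names are
taken from the RPMSG WWAN CTRL description in this log.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical port types, for illustration only. */
enum port_type { PORT_UNKNOWN = 0, PORT_QMI, PORT_AT };

/* Simplified model of a device ID entry that carries driver_data. */
struct device_id {
	const char *name;
	unsigned long driver_data;
};

static const struct device_id id_table[] = {
	{ .name = "DATA5_CNTL", .driver_data = PORT_QMI },
	{ .name = "DATA4",      .driver_data = PORT_AT },
	{ .name = NULL },	/* sentinel */
};

/* Return the driver_data of the matching entry, or PORT_UNKNOWN. */
static unsigned long lookup_driver_data(const char *name)
{
	const struct device_id *id;

	for (id = id_table; id->name; id++)
		if (strcmp(id->name, name) == 0)
			return id->driver_data;
	return PORT_UNKNOWN;
}
```

The per-ID data lives next to the name, so the driver does not need a
second table or a chain of string comparisons in its probe path.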

Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 20:10:36 +0000 (13:10 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Jesse Brandeburg says:

====================
100GbE Intel Wired LAN Driver Updates 2021-06-18

Update three of the Intel Ethernet drivers with similar (but not the
same) improvements to simplify the packet type table init, while removing
an unused structure entry. For the ice driver, the table is extended
to 10 bits, which is the hardware limit, and for now is initialized
to zero.

The end result is slightly reduced memory usage, removal of a bunch
of code, and more specific initialization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 20:02:45 +0000 (13:02 -0700)]
Revert "net: add pf_family_names[] for protocol family"

This reverts commit 1f3c98eaddec857e16a7a1c6cd83317b3dc89438.

Does not build...

Signed-off-by: David S. Miller <davem@davemloft.net>
Yejune Deng [Fri, 18 Jun 2021 14:32:47 +0000 (22:32 +0800)]
net: add pf_family_names[] for protocol family

Change the pr_info output from an int to a char * string, which is more
readable.

Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 19:59:53 +0000 (12:59 -0700)]
Merge branch 'csock-seqpoacket-small-fixes'

Stefano Garzarella says:

====================
vsock: small fixes for seqpacket support

This series contains a few patches that clean up the seqpacket code
recently merged into the net-next tree.

No functional changes.
====================

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Stefano Garzarella [Fri, 18 Jun 2021 13:35:26 +0000 (15:35 +0200)]
vsock/virtio: remove redundant `copy_failed` variable

When memcpy_to_msg() fails in virtio_transport_seqpacket_do_dequeue(),
we already set `dequeued_len` with the negative error value returned
by memcpy_to_msg().

So we can directly check the `dequeued_len` value instead of using a
dedicated flag variable to skip the copy path for the remaining
fragments.
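
The pattern can be sketched in user space as below. copy_fragment() and
dequeue_all() are hypothetical stand-ins, not the virtio transport code;
the point is only that a negative length doubles as the failure flag.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical copy helper: fails with -EFAULT on fragment index fail_at. */
static int copy_fragment(size_t idx, size_t fail_at)
{
	return idx == fail_at ? -EFAULT : 0;
}

/*
 * Walk nfrags fragments of frag_len bytes each. Once a copy fails,
 * dequeued_len holds the negative error, so "dequeued_len >= 0" replaces
 * a separate copy_failed flag and later fragments skip the copy path.
 */
static int dequeue_all(size_t nfrags, int frag_len, size_t fail_at)
{
	int dequeued_len = 0;
	size_t i;

	for (i = 0; i < nfrags; i++) {
		if (dequeued_len >= 0) {
			int err = copy_fragment(i, fail_at);

			if (err)
				dequeued_len = err;
			else
				dequeued_len += frag_len;
		}
		/* A real implementation would still consume the fragment here. */
	}
	return dequeued_len;
}
```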

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Fri, 18 Jun 2021 13:35:25 +0000 (15:35 +0200)]
vsock: rename vsock_wait_data()

vsock_wait_data() is used only by STREAM and SEQPACKET sockets,
so let's rename it to vsock_connectible_wait_data(), using the same
nomenclature (connectible) used in other functions after the
introduction of SEQPACKET.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Fri, 18 Jun 2021 13:35:24 +0000 (15:35 +0200)]
vsock: rename vsock_has_data()

vsock_has_data() is used only by STREAM and SEQPACKET sockets,
so let's rename it to vsock_connectible_has_data(), using the same
nomenclature (connectible) used in other functions after the
introduction of SEQPACKET.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
wengjianfeng [Fri, 18 Jun 2021 08:52:26 +0000 (16:52 +0800)]
NFC: nxp-nci: remove unnecessary label

Remove unnecessary label chunk_exit and return directly.

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 18 Jun 2021 13:48:12 +0000 (16:48 +0300)]
net: dsa: sja1105: completely error out in sja1105_static_config_reload if something fails

If reloading the static config fails for whatever reason, for example if
sja1105_static_config_check_valid() fails, then we "goto out_unlock_ptp"
but we still print "Reset switch and programmed static config.", which is
confusing because we didn't. We also do a bunch of other stuff, like
reprogramming the XPCS and reloading the credit-based shapers, as if a
switch reset had taken place, which it hadn't.

So just unlock the PTP lock and goto out, skipping all of that.
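
A stripped-down user-space model of that control flow might look like
this. struct reload_state and reload_config() are hypothetical and only
record which steps ran; they are not the driver's code.

```c
#include <stdbool.h>

/* Hypothetical state, used only to observe which steps ran. */
struct reload_state {
	bool ptp_locked;
	bool post_reset_done;
};

/* upload_rc stands in for the result of validating and uploading the config. */
static int reload_config(struct reload_state *st, int upload_rc)
{
	int rc;

	st->ptp_locked = true;
	rc = upload_rc;
	st->ptp_locked = false;		/* the lock is dropped on every path */

	if (rc < 0)
		goto out;		/* skip all work that assumes a reset happened */

	st->post_reset_done = true;	/* e.g. success message, reprogramming */
out:
	return rc;
}
```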

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 18 Jun 2021 13:44:00 +0000 (16:44 +0300)]
net: dsa: sja1105: allow the TTEthernet configuration in the static config for SJA1110

Currently sja1105_static_config_check_valid() is coded up to detect
whether TTEthernet is supported based on device ID, and this check was
not updated to cover SJA1110.

However, it is desirable to have as few checks for the device ID as
possible, so the driver core is more generic. So what we can do is look
at the static config table operations implemented by that specific
switch family (populated by sja1105_static_config_init) and check whether
the schedule table has a non-zero maximum entry count (meaning that it is
supported) or not.
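
The idea of probing a capability through table metadata rather than a
device ID can be sketched as follows. All names here (struct table_ops,
TBL_SCHEDULE, schedule_supported()) are hypothetical and chosen for the
example only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-family table description. */
struct table_ops {
	size_t max_entry_count;	/* 0 means the family has no such table */
};

enum table_idx { TBL_SCHEDULE, TBL_COUNT };

struct switch_info {
	struct table_ops tables[TBL_COUNT];
};

/* No device ID check: the feature exists if its table can hold entries. */
static bool schedule_supported(const struct switch_info *info)
{
	return info->tables[TBL_SCHEDULE].max_entry_count != 0;
}
```

A new switch family is then covered as soon as its table description is
filled in, without touching the validity check.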

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Fri, 18 Jun 2021 12:09:45 +0000 (20:09 +0800)]
net: hns3: fix reuse conflict of the rx page

In the current rx page reuse handling process, the rx page buffer may
be subject to a conflict between the driver and the stack in
high-pressure scenarios.

To fix this problem, we need to check whether the page is owned only
by the driver at the beginning and at the end of a page, to make sure
there is no reuse conflict between the driver and the stack when
desc_cb->page_offset is rolled back to zero or increased.

Fixes: fa7711b888f2 ("net: hns3: optimize the rx page reuse handling process")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 18 Jun 2021 11:52:54 +0000 (14:52 +0300)]
net: dsa: sja1105: properly power down the microcontroller clock for SJA1110

It turns out that powering down the BASE_TIMER_CLK does not turn off the
microcontroller, just its timers, including the one for the watchdog.
So the embedded microcontroller is still running, and potentially still
doing things.

To prevent unwanted interference, we should power down the BASE_MCSS_CLK
as well (MCSS = microcontroller subsystem).

The trouble is that currently we turn off the BASE_TIMER_CLK for SJA1110
from the .clocking_setup() method, mostly because this is a Clock
Generation Unit (CGU) setting which was traditionally configured in that
method for SJA1105. But in SJA1105, the CGU was used for bringing up the
port clocks at the proper speeds, and in SJA1110 it's not (but rather
for initial configuration), so it's best that we rebrand the
sja1110_clocking_setup() method into what it really is - an implementation
of the .disable_microcontroller() method.

Since disabling the microcontroller only needs to be done once, at probe
time, we can choose the best place to do that as being in sja1105_setup(),
before we upload the static config to the device. This guarantees that
the static config being used by the switch afterwards is really ours.

Note that the procedure to upload a static config necessarily resets the
switch. This already did not reset the microcontroller, only the switch
core, so since the .disable_microcontroller() method is guaranteed to be
called by that point, if it's disabled, it remains disabled. Add a
comment to make that clear.

With the code movement for SJA1110 from .clocking_setup() to
.disable_microcontroller(), both methods are optional and are guarded by
"if" conditions.

Tested by enabling in the device tree the rev-mii switch port 0 that
goes towards the microcontroller, and flashing a firmware that would
have networking. Without this patch, the microcontroller can be pinged,
with this patch it cannot.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 18 Jun 2021 10:19:19 +0000 (11:19 +0100)]
qlcnic: remove redundant continue statement

The continue statement at the end of a for-loop has no effect,
it is redundant and can be removed.

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 18 Jun 2021 10:01:55 +0000 (11:01 +0100)]
net: bridge: remove redundant continue statement

The continue statement at the end of a for-loop has no effect,
invert the if expression and remove the continue.
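
For illustration, the shape of such a change on a made-up loop (not the
bridge code itself) is:

```c
#include <stddef.h>

/* Before: the "continue" is the last thing the loop body can do. */
static int sum_positive_before(const int *v, size_t n)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (v[i] <= 0)
			continue;	/* no effect: nothing follows the if/else */
		else
			sum += v[i];
	}
	return sum;
}

/* After: invert the condition and drop the continue. */
static int sum_positive_after(const int *v, size_t n)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (v[i] > 0)
			sum += v[i];
	}
	return sum;
}
```

Both functions return the same result for any input; only the second
one says so directly.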

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 18 Jun 2021 09:44:25 +0000 (10:44 +0100)]
net: stmmac: remove redundant continue statement

The continue statement in the for-loop has no effect, remove it.

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Machek [Fri, 18 Jun 2021 09:35:26 +0000 (11:35 +0200)]
net: pxa168_eth: Fix a potential data race in pxa168_eth_remove

Commit 0571a753cb07 cancelled the delayed work too late, leaving a small
race window. Cancel the work sooner to close it completely.

Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Fixes: 0571a753cb07 ("net: pxa168_eth: Fix a potential data race in pxa168_eth_remove")
Signed-off-by: David S. Miller <davem@davemloft.net>
wengjianfeng [Fri, 18 Jun 2021 09:10:16 +0000 (17:10 +0800)]
NFC: nxp-nci: remove unnecessary labels

Simplify the code by removing unnecessary labels and returning directly.

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingsenjie [Fri, 18 Jun 2021 07:34:31 +0000 (15:34 +0800)]
ethernet: marvell/octeontx2: Simplify the return expression of npc_is_same

Simplify the return expression in rvu_npc_fs.c.

Signed-off-by: dingsenjie <dingsenjie@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dongliang Mu [Fri, 18 Jun 2021 07:32:04 +0000 (15:32 +0800)]
net: caif: modify the label out_err to out

Change the label out_err to out to avoid the meaningless kfree.

Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Jun 2021 04:55:56 +0000 (21:55 -0700)]
net: vlan: pass thru all GSO_SOFTWARE in hw_enc_features

Currently UDP tunnel devices on top of VLANs lose the ability
to offload UDP GSO. Widen the pass thru features from TSO
to all GSO_SOFTWARE.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qing Zhang [Fri, 18 Jun 2021 02:53:37 +0000 (10:53 +0800)]
dt-bindings: dwmac: Add bindings for new Loongson SoC and bridge chip

Add the dwmac bindings for the Loongson-2K SoC and the LS7A
bridge chip.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qing Zhang [Fri, 18 Jun 2021 02:53:36 +0000 (10:53 +0800)]
MIPS: Loongson64: DTS: Add GMAC support for LS7A PCH

The GMAC module is now supported, enable it.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qing Zhang [Fri, 18 Jun 2021 02:53:35 +0000 (10:53 +0800)]
MIPS: Loongson64: Add GMAC support for Loongson-2K1000

The GMAC module is now supported, enable it.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qing Zhang [Fri, 18 Jun 2021 02:53:34 +0000 (10:53 +0800)]
stmmac: pci: Add dwmac support for Loongson

This GMAC module is integrated into the Loongson-2K SoC and the LS7A
bridge chip.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 18:42:40 +0000 (11:42 -0700)]
Merge branch 'hostess_sv11-cleanups'

Peng Li says:

====================
net: hostess_sv11: clean up some code style issues

This patchset cleans up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:24 +0000 (10:32 +0800)]
net: hostess_sv11: fix the alignment issue

Alignment should match open parenthesis.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:23 +0000 (10:32 +0800)]
net: hostess_sv11: fix the comments style issue

Networking block comments don't use an empty /* line,
use /* Comment...

Block comments use * on subsequent lines.
Block comments use a trailing */ on a separate line.

This patch fixes the comments style issues.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:22 +0000 (10:32 +0800)]
net: hostess_sv11: remove dead code

This patch removes the dead code.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:21 +0000 (10:32 +0800)]
net: hostess_sv11: fix the code style issue about switch and case

According to checkpatch.pl,
switch and case should be at the same indent.
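
As a reminder of the expected layout, here is a made-up function (not
taken from the driver) with the case labels at the same indent as the
switch:

```c
/* Illustrative only: "case" lines up with "switch", bodies are indented. */
static const char *state_name(int state)
{
	switch (state) {
	case 0:
		return "down";
	case 1:
		return "up";
	default:
		return "unknown";
	}
}
```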

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:20 +0000 (10:32 +0800)]
net: hostess_sv11: remove trailing whitespace

This patch removes trailing whitespace.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:19 +0000 (10:32 +0800)]
net: hostess_sv11: move out assignment in if condition

Should not use assignment in if condition.
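
A small made-up example of this kind of cleanup (the helper names are
hypothetical and unrelated to the driver):

```c
#include <stddef.h>
#include <string.h>

/* Flagged by checkpatch: the assignment is buried inside the condition. */
static size_t key_len_before(const char *s)
{
	const char *eq;

	if ((eq = strchr(s, '=')) == NULL)
		return strlen(s);
	return (size_t)(eq - s);
}

/* Preferred: assign first, then test the result. */
static size_t key_len_after(const char *s)
{
	const char *eq;

	eq = strchr(s, '=');
	if (!eq)
		return strlen(s);
	return (size_t)(eq - s);
}
```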

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:18 +0000 (10:32 +0800)]
net: hostess_sv11: fix the code style issue about "foo* bar"

Fix the checkpatch error as "foo* bar" should be "foo *bar".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 18:40:12 +0000 (11:40 -0700)]
Merge branch 'mptcp-dss-checksums'

Mat Martineau says:

====================
mptcp: DSS checksum support

RFC 8684 defines a DSS checksum feature that allows MPTCP to detect
middlebox interference with the MPTCP DSS header and the portion of the
data stream associated with that header. So far, the MPTCP
implementation in the Linux kernel has not supported this feature.

This patch series adds DSS checksum support. By default, the kernel will
not request checksums when sending SYN or SYN/ACK packets for MPTCP
connections. Outgoing checksum requests can be enabled with a
per-namespace net.mptcp.checksum_enabled sysctl. MPTCP connections will
now proceed with DSS checksums when the peer requests them, whether the
sysctl is enabled or not.

Patches 1-5 add checksum bits to the outgoing SYN, SYN/ACK, and data
packet headers. This includes calculating the checksum using a range of
data and the MPTCP DSS mapping for that data.
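
For readers unfamiliar with the checksum, RFC 8684 describes it as a
standard 16-bit one's-complement checksum over the data plus a
pseudo-header built from the DSS mapping. A user-space sketch, assuming
that pseudo-header layout (data sequence number, subflow sequence
number, data-level length, then a zeroed checksum field) and using
hypothetical helper names that are not the kernel's:

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes, read as big-endian 16-bit words, to a running sum. */
static uint64_t csum_add(uint64_t sum, const uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint64_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint64_t)p[len - 1] << 8;	/* pad the odd byte with zero */
	return sum;
}

/* Fold the carries back into the low 16 bits. */
static uint16_t csum_fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Checksum over the pseudo-header followed by the payload. The sender
 * passes csum_field = 0; a receiver passes the received checksum and
 * expects 0 back when the data is intact.
 */
static uint16_t dss_checksum(uint64_t data_seq, uint32_t subflow_seq,
			     uint16_t data_len, uint16_t csum_field,
			     const uint8_t *payload)
{
	uint8_t hdr[16];
	uint64_t sum;
	int i;

	for (i = 0; i < 8; i++)
		hdr[i] = (uint8_t)(data_seq >> (56 - 8 * i));
	for (i = 0; i < 4; i++)
		hdr[8 + i] = (uint8_t)(subflow_seq >> (24 - 8 * i));
	hdr[12] = (uint8_t)(data_len >> 8);
	hdr[13] = (uint8_t)(data_len & 0xff);
	hdr[14] = (uint8_t)(csum_field >> 8);
	hdr[15] = (uint8_t)(csum_field & 0xff);

	sum = csum_add(0, hdr, sizeof(hdr));
	sum = csum_add(sum, payload, data_len);
	return (uint16_t)~csum_fold(sum);
}
```

Because the mapping is part of the sum, a middlebox that rewrites either
the payload or the sequence numbers makes verification fail.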

Patches 6-10 handle the checksum request in the SYN or SYN/ACK, and
receiving and verifying the DSS checksum on data packets.

Patch 11 adjusts the MPTCP-level retransmission process for checksum
compatibility.

Patches 12-14 add checksum-related MIBs, the net.mptcp.checksum_enabled
sysctl, and a checksum field to debug trace output.

Patches 15 & 16 add selftests.

The series is slightly longer than the preferred 15-patch limit that
patchwork warns about. I do try to stay below that whenever possible -
this series does implement one feature and is, I think, cohesive enough
to justify keeping it together. If it's at all problematic please let me
know!

A trivial merge conflict with net/master is introduced in patch 15: a
commit in net/master removes a couple of nearby lines of code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:22 +0000 (16:46 -0700)]
selftests: mptcp: enable checksum in mptcp_join.sh

This patch added a new argument "-C" for the mptcp_join.sh script to set
the sysctl checksum_enabled to 1 in ns1 and ns2 to enable the data
checksum.

In chk_join_nr, check the counter of the mib for the data checksum.

Also added a new argument "-S" for the mptcp_join.sh script to start the
test cases that verify the checksum handshake:

  * Sender and listener both have checksums off
  * Sender and listener both have checksums on
  * Sender checksums off, listener checksums on
  * Sender checksums on, listener checksums off

The output looks like this:

 01 checksum test 0 0                  sum[ ok ] - csum  [ ok ]
 02 checksum test 1 1                  sum[ ok ] - csum  [ ok ]
 03 checksum test 0 1                  sum[ ok ] - csum  [ ok ]
 04 checksum test 1 0                  sum[ ok ] - csum  [ ok ]
 05 no JOIN                            syn[ ok ] - synack[ ok ] - ack[ ok ]
                                       sum[ ok ] - csum  [ ok ]
 06 single subflow, limited by client  syn[ ok ] - synack[ ok ] - ack[ ok ]
                                       sum[ ok ] - csum  [ ok ]

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:21 +0000 (16:46 -0700)]
selftests: mptcp: enable checksum in mptcp_connect.sh

This patch added a new argument "-C" for the mptcp_connect.sh script to
set the sysctl checksum_enabled to 1 in ns1, ns2, ns3 and ns4 to enable
the data checksum.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:20 +0000 (16:46 -0700)]
mptcp: dump csum fields in mptcp_dump_mpext

In mptcp_dump_mpext, also dump the csum fields (csum and csum_reqd) of
struct mptcp_ext.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:19 +0000 (16:46 -0700)]
mptcp: add a new sysctl checksum_enabled

This patch added a new sysctl, named checksum_enabled, to control
whether DSS checksum can be enabled.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:18 +0000 (16:46 -0700)]
mptcp: add the mib for data checksum

This patch added the mib for the data checksum, MPTCP_MIB_DATACSUMERR.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 17 Jun 2021 23:46:17 +0000 (16:46 -0700)]
mptcp: tune re-injections for csum enabled mode

If the MPTCP-level checksum is enabled, on re-injections we
must spool a complete DSS, or the receive side will not be
able to compute the csum and process any data.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 17 Jun 2021 23:46:16 +0000 (16:46 -0700)]
mptcp: validate the data checksum

This patch added three new members named data_csum, csum_len and
map_csum in struct mptcp_subflow_context, and implemented a new function
named mptcp_validate_data_checksum().

If the current mapping is valid and csum is enabled, traverse the
pending skbs and compute the csum incrementally until the whole mapping
has been covered. If not enough data is available in the rx queue, return
MAPPING_EMPTY - that is, no data.

The next subflow_data_ready invocation will trigger the csum computation
again.

When the full DSS is available, validate the csum and return to the
caller an appropriate error code, to trigger subflow reset or fallback
as required by the RFC.

Additionally:
- if the csum presence in the DSS doesn't match the negotiated value, e.g.
  csum present but not requested, return invalid mapping to trigger
  subflow reset.
- keep some csum state, to avoid re-computing the csum on the same data
  when multiple rx queue traversals are required.
- clean up the uncompleted mapping from the receive queue on close, to
  allow proper subflow disposal
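
The incremental part can be illustrated with a user-space sketch that
keeps a running sum between calls. struct csum_state, csum_feed() and
csum_result() are hypothetical names; the kernel code works on skbs and
its own checksum helpers instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mapping state kept between receive-queue traversals. */
struct csum_state {
	uint64_t sum;		/* running sum of big-endian 16-bit words */
	size_t csum_len;	/* bytes already covered */
	size_t map_len;		/* total bytes in the mapping */
};

/* Feed one fragment; returns true once the whole mapping is covered. */
static bool csum_feed(struct csum_state *st, const uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i < len && st->csum_len < st->map_len; i++, st->csum_len++) {
		/* Even offsets are the high byte of a word, odd ones the low byte. */
		if (st->csum_len & 1)
			st->sum += p[i];
		else
			st->sum += (uint64_t)p[i] << 8;
	}
	return st->csum_len == st->map_len;
}

/* Fold the running sum down to 16 bits. */
static uint16_t csum_result(const struct csum_state *st)
{
	uint64_t sum = st->sum;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

Tracking the byte offset lets fragments of odd length be fed in any
split without recomputing the bytes that were already covered.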

Co-developed-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:15 +0000 (16:46 -0700)]
mptcp: receive checksum for DSS

In mptcp_parse_option, adjust the expected_opsize, and always parse the
data checksum value from the receiving DSS regardless of csum presence.
Then save it in mp_opt->csum.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:14 +0000 (16:46 -0700)]
mptcp: receive checksum for MP_CAPABLE with data

This patch added a new member named csum in struct mptcp_options_received.

When parsing the MP_CAPABLE with data, if the checksum is enabled,
adjust the expected_opsize. If the receiving option length matches the
length with the data checksum, get the checksum value and save it in
mp_opt->csum. And in mptcp_incoming_options, pass it to mpext->csum.

We always parse any csum/nocsum combination and delay the presence check
to later code, to allow reset if missing.

Additionally, in the TX path, use the newly introduced ext field to avoid
MPTCP csum recomputation on TCP retransmission and unneeded csum updates
when setting the data fin_flag.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:13 +0000 (16:46 -0700)]
mptcp: add csum_reqd in mptcp_options_received

This patch added a new flag csum_reqd in struct mptcp_options_received.
If the flag MPTCP_CAP_CHECKSUM_REQD is set in the received MP_CAPABLE
suboption, set this flag.

In mptcp_sk_clone and subflow_finish_connect, if the csum_reqd flag is set,
enable the msk->csum_enabled flag.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: add sk parameter for mptcp_get_options
Geliang Tang [Thu, 17 Jun 2021 23:46:12 +0000 (16:46 -0700)]
mptcp: add sk parameter for mptcp_get_options

This patch added a new parameter named sk to mptcp_get_options().

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: send out checksum for DSS
Geliang Tang [Thu, 17 Jun 2021 23:46:11 +0000 (16:46 -0700)]
mptcp: send out checksum for DSS

In mptcp_write_options, if the checksum is enabled, adjust the option
length and send out the data checksum with DSS suboption.
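
To illustrate the length adjustment, here is a rough sketch of how the DSS
option size grows when a checksum is appended. The field sizes follow the DSS
layout in RFC 8684; the constant and function names are made up for this
example and are not taken from the patch.

```c
#include <stdbool.h>

/* Sizes in bytes, following the DSS option layout in RFC 8684. */
#define DSS_BASE_LEN   4u  /* kind, length, subtype and flags */
#define DSS_ACK32_LEN  4u
#define DSS_ACK64_LEN  8u
#define DSS_MAP32_LEN 10u  /* 4-byte DSN + 4-byte SSN + 2-byte length */
#define DSS_MAP64_LEN 14u  /* 8-byte DSN + 4-byte SSN + 2-byte length */
#define DSS_CSUM_LEN   2u

/* Compute the DSS option length for a given combination of fields. */
static unsigned int dss_option_len(bool ack, bool ack64,
				   bool map, bool dsn64, bool csum)
{
	unsigned int len = DSS_BASE_LEN;

	if (ack)
		len += ack64 ? DSS_ACK64_LEN : DSS_ACK32_LEN;
	if (map) {
		len += dsn64 ? DSS_MAP64_LEN : DSS_MAP32_LEN;
		/* The checksum only accompanies a mapping. */
		if (csum)
			len += DSS_CSUM_LEN;
	}
	return len;
}

/* TCP options are padded to a multiple of four bytes. */
static unsigned int pad_to_4(unsigned int len)
{
	return (len + 3u) & ~3u;
}
```

With a 64-bit ACK and a 64-bit mapping this gives 26 bytes without the
checksum and 28 bytes with it.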

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: send out checksum for MP_CAPABLE with data
Geliang Tang [Thu, 17 Jun 2021 23:46:10 +0000 (16:46 -0700)]
mptcp: send out checksum for MP_CAPABLE with data

If the checksum is enabled, send out the data checksum with the
MP_CAPABLE suboption with data.

In mptcp_established_options_mp, save the data checksum in
opts->ext_copy.csum. In mptcp_write_options, adjust the option length and
send it out with the MP_CAPABLE suboption.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: add csum_reqd in mptcp_out_options
Geliang Tang [Thu, 17 Jun 2021 23:46:09 +0000 (16:46 -0700)]
mptcp: add csum_reqd in mptcp_out_options

This patch added a new member csum_reqd in struct mptcp_out_options and
struct mptcp_subflow_request_sock. Initialized it with the helper
function mptcp_is_checksum_enabled().

In mptcp_write_options, if this field is enabled, send out the MP_CAPABLE
suboption with the MPTCP_CAP_CHECKSUM_REQD flag.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: generate the data checksum
Geliang Tang [Thu, 17 Jun 2021 23:46:08 +0000 (16:46 -0700)]
mptcp: generate the data checksum

This patch added a new member named csum in struct mptcp_ext and
implemented a new function named mptcp_generate_data_checksum().

Generate the data checksum in mptcp_sendmsg_frag, save it in mpext->csum.

Note that we must generate the csum for zero window probe, too.

Do the csum update incrementally, to avoid multiple csum computation
when the data is appended to existing skb.

Note that in a later patch we will skip unneeded csum related operation.
Changes not included here to keep the delta small.
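
The incremental update mentioned above can be illustrated with a generic
ones'-complement (Internet) checksum. This is only a sketch under stated
assumptions: the helper names are hypothetical, the MPTCP pseudo-header is
left out, and every chunk except the last is assumed to have an even length
(real code must also handle odd offsets).

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum, folding carries back into 16 bits. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *data, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)
		sum += (uint32_t)data[i] << 8;	/* pad a trailing odd byte */
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return sum;
}

/* Turn the running sum into the final checksum value. */
static uint16_t csum_finish(uint32_t sum)
{
	return (uint16_t)~sum;
}
```

When data is appended, only the new bytes are fed into the saved running sum;
the result matches a checksum computed over the whole buffer in one pass.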

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: add csum_enabled in mptcp_sock
Geliang Tang [Thu, 17 Jun 2021 23:46:07 +0000 (16:46 -0700)]
mptcp: add csum_enabled in mptcp_sock

This patch added a new member named csum_enabled in struct mptcp_sock and
used a dummy mptcp_is_checksum_enabled() helper to initialize it.

Also added a new member named mptcpi_csum_enabled in struct mptcp_info
to expose the csum_enabled flag.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'seg6.end.dt6'
David S. Miller [Fri, 18 Jun 2021 18:35:47 +0000 (11:35 -0700)]
Merge branch 'seg6.end.dt6'

Andrea Mayer says:

====================
seg6: add support for SRv6 End.DT46 Behavior

SRv6 End.DT46 Behavior is defined in the IETF RFC 8986 [1] along with SRv6
End.DT4 and End.DT6 Behaviors.

The proposed End.DT46 implementation is meant to support the decapsulation
of both IPv4 and IPv6 traffic coming from a *single* SRv6 tunnel.
The SRv6 End.DT46 Behavior greatly simplifies the setup and operations of
SRv6 VPNs in the Linux kernel.

 - patch 1/2 is the core patch that adds support for the SRv6 End.DT46
   Behavior;

 - patch 2/2 adds the selftest for SRv6 End.DT46 Behavior.

The patch introducing the new SRv6 End.DT46 Behavior in iproute2 will
follow shortly.

Comments, suggestions and improvements are very welcome as always!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: seg6: add selftest for SRv6 End.DT46 Behavior
Andrea Mayer [Thu, 17 Jun 2021 17:16:45 +0000 (19:16 +0200)]
selftests: seg6: add selftest for SRv6 End.DT46 Behavior

This selftest is designed for evaluating the new SRv6 End.DT46 Behavior,
used in this example for implementing IPv4/IPv6 L3 VPN use cases.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoseg6: add support for SRv6 End.DT46 Behavior
Andrea Mayer [Thu, 17 Jun 2021 17:16:44 +0000 (19:16 +0200)]
seg6: add support for SRv6 End.DT46 Behavior

IETF RFC 8986 [1] includes the definition of SRv6 End.DT4, End.DT6, and
End.DT46 Behaviors.

The current SRv6 code in the Linux kernel only implements End.DT4 and
End.DT6 which can be used respectively to support IPv4-in-IPv6 and
IPv6-in-IPv6 VPNs. With End.DT4 and End.DT6 it is not possible to create a
single SRv6 VPN tunnel to carry both IPv4 and IPv6 traffic.

The proposed End.DT46 implementation is meant to support the decapsulation
of IPv4 and IPv6 traffic coming from a single SRv6 tunnel.
The implementation of the SRv6 End.DT46 Behavior in the Linux kernel
greatly simplifies the setup and operations of SRv6 VPNs.
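
As a rough illustration of handling both families on one tunnel, the sketch
below picks the inner address family from the protocol number that follows
the outer IPv6 headers. It is an assumption that a dispatcher looks like
this; the commit message does not describe the mechanism, and all names are
hypothetical. The protocol numbers are the IANA values for IPv4-in-IP and
IPv6 encapsulation.

```c
#include <stdint.h>

#define PROTO_IPIP  4	/* IANA: IPv4 encapsulated in IP */
#define PROTO_IPV6 41	/* IANA: IPv6 encapsulation */

enum inner_family { INNER_NONE = 0, INNER_IPV4 = 4, INNER_IPV6 = 6 };

/* Decide which decapsulation path an End.DT46-style endpoint would take. */
static enum inner_family dt46_classify(uint8_t next_header)
{
	switch (next_header) {
	case PROTO_IPIP:
		return INNER_IPV4;	/* handle like End.DT4 */
	case PROTO_IPV6:
		return INNER_IPV6;	/* handle like End.DT6 */
	default:
		return INNER_NONE;	/* neither family: drop */
	}
}
```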

The SRv6 End.DT46 Behavior leverages the infrastructure of SRv6 End.DT{4,6}
Behaviors implemented so far, because it makes use of a VRF device in
order to force the routing lookup into the associated routing table.

To make the End.DT46 work properly, it must be guaranteed that the routing
table used for routing lookup operations is bound to one and only one VRF
during the tunnel creation. Such constraint has to be enforced by enabling
the VRF strict_mode sysctl parameter, i.e.:

 $ sysctl -wq net.vrf.strict_mode=1

Note that the same approach is used for the SRv6 End.DT4 Behavior and for
the End.DT6 Behavior in VRF mode.

The command used to instantiate an SRv6 End.DT46 Behavior is
straightforward, i.e.:

 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Performance and impact of SRv6 End.DT46 Behavior on the SRv6 Networking
=======================================================================

This patch aims to add the SRv6 End.DT46 Behavior with minimal impact on
the performance of SRv6 End.DT4 and End.DT6 Behaviors.
In order to verify this, we tested the performance of the newly introduced
SRv6 End.DT46 Behavior and compared it with the performance of SRv6
End.DT{4,6} Behaviors, considering both the patched kernel and the kernel
before applying the End.DT46 patch (referred to as vanilla kernel).

In detail, the following decapsulation scenarios were considered:

 1.a) IPv6 traffic in SRv6 End.DT46 Behavior on patched kernel;
 1.b) IPv4 traffic in SRv6 End.DT46 Behavior on patched kernel;
 2.a) SRv6 End.DT6 Behavior (VRF mode) on patched kernel;
 2.b) SRv6 End.DT4 Behavior on patched kernel;
 3.a) SRv6 End.DT6 Behavior (VRF mode) on vanilla kernel (without the
      End.DT46 patch);
 3.b) SRv6 End.DT4 Behavior on vanilla kernel (without the End.DT46 patch).

All tests were performed on a testbed deployed on the CloudLab [2]
facilities. We considered IPv{4,6} traffic handled by a single core (at 2.4
GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.13-rc1 using packets of size
~ 100 bytes.

Scenario (1.a): average 684.70 kpps; std. dev. 0.7 kpps;
Scenario (1.b): average 711.69 kpps; std. dev. 1.2 kpps;
Scenario (2.a): average 690.70 kpps; std. dev. 1.2 kpps;
Scenario (2.b): average 722.22 kpps; std. dev. 1.7 kpps;
Scenario (3.a): average 690.02 kpps; std. dev. 2.6 kpps;
Scenario (3.b): average 721.91 kpps; std. dev. 1.2 kpps;

Considering the results for the patched kernel (1.a, 1.b, 2.a, 2.b) we
observe that the performance degradation incurred in using End.DT46 rather
than End.DT6 and End.DT4 respectively for IPv6 and IPv4 traffic is minimal,
around 0.9% and 1.5%. Such very minimal performance degradation is the
price to be paid if one prefers to use a single tunnel capable of handling
both types of traffic (IPv4 and IPv6).
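
The quoted percentages can be reproduced from the averages listed above. The
helper below is a small sketch (the function name is ours) that computes the
relative drop of a measured rate against a baseline.

```c
/* Relative throughput drop, in percent, of measured versus baseline. */
static double percent_drop(double baseline_kpps, double measured_kpps)
{
	return (baseline_kpps - measured_kpps) / baseline_kpps * 100.0;
}
```

For IPv6, percent_drop(690.70, 684.70) is about 0.87, and for IPv4,
percent_drop(722.22, 711.69) is about 1.46, which round to the 0.9% and 1.5%
figures in the text.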

Comparing the results for End.DT4 and End.DT6 under the patched and the
vanilla kernel (2.a, 2.b, 3.a, 3.b) we observe that the introduction of the
End.DT46 patch has no impact on the performance of End.DT4 and End.DT6.

[2] https://www.cloudlab.us

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoDocumentation: ACPI: DSD: fix block code comments
Ioana Ciornei [Thu, 17 Jun 2021 15:55:52 +0000 (18:55 +0300)]
Documentation: ACPI: DSD: fix block code comments

Use the '.. code-block:: none' to properly highlight the documented DSDT
entries. This also fixes warnings in the documentation build process.

Fixes: e71305acd81c ("Documentation: ACPI: DSD: Document MDIO PHY")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoDocumentation: ACPI: DSD: include phy.rst in the toctree
Ioana Ciornei [Thu, 17 Jun 2021 15:55:51 +0000 (18:55 +0300)]
Documentation: ACPI: DSD: include phy.rst in the toctree

Include the new phy.rst into the index of the ACPI support
documentation.

Fixes: e71305acd81c ("Documentation: ACPI: DSD: Document MDIO PHY")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: neterion: vxge: remove redundant continue statement
Colin Ian King [Thu, 17 Jun 2021 12:14:49 +0000 (13:14 +0100)]
net: neterion: vxge: remove redundant continue statement

The continue statement at the end of a for-loop has no effect,
invert the if expression and remove the continue.
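
A sketch of this kind of cleanup, assuming the loop had roughly this shape
(the functions and data are hypothetical, not the vxge code):

```c
/* Before: the continue is the last thing the loop body can do. */
static int count_failures_before(const int *status, int n)
{
	int i, failures = 0;

	for (i = 0; i < n; i++) {
		if (status[i] == 0) {
			continue;
		} else {
			failures++;
		}
	}
	return failures;
}

/* After: invert the condition and drop the continue. */
static int count_failures_after(const int *status, int n)
{
	int i, failures = 0;

	for (i = 0; i < n; i++) {
		if (status[i] != 0)
			failures++;
	}
	return failures;
}
```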

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodrivers: net: netdevsim: fix devlink_trap selftests failing
Oleksandr Mazur [Thu, 17 Jun 2021 11:36:32 +0000 (14:36 +0300)]
drivers: net: netdevsim: fix devlink_trap selftests failing

devlink_trap tests for the netdevsim fail due to misspelled
debugfs file name. Change this name, as well as name of callback
function, to match the naming as in the devlink itself - 'trap_drop_counter'.

Test-results:
selftests: drivers/net/netdevsim: devlink_trap.sh
TEST: Initialization                                                [ OK ]
TEST: Trap action                                                   [ OK ]
TEST: Trap metadata                                                 [ OK ]
TEST: Non-existing trap                                             [ OK ]
TEST: Non-existing trap action                                      [ OK ]
TEST: Trap statistics                                               [ OK ]
TEST: Trap group action                                             [ OK ]
TEST: Non-existing trap group                                       [ OK ]
TEST: Trap group statistics                                         [ OK ]
TEST: Trap policer                                                  [ OK ]
TEST: Trap policer binding                                          [ OK ]
TEST: Port delete                                                   [ OK ]
TEST: Device delete                                                 [ OK ]

Fixes: a7b3527a43fe ("drivers: net: netdevsim: add devlink trap_drop_counter_get implementation")
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoiavf: clean up packet type lookup table
Jesse Brandeburg [Tue, 23 Feb 2021 23:47:07 +0000 (15:47 -0800)]
iavf: clean up packet type lookup table

Remove the unused ptype struct value, which makes table init easier for
the zero entries, and use ranged initializer to remove a bunch of code
(works with gcc and clang). There is no significant functional change.
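
For readers unfamiliar with ranged initializers, here is a minimal sketch.
The `[first ... last]` designator is a GNU C extension accepted by gcc and
clang; the struct layout and table contents below are invented for the
example and do not reflect the driver's table.

```c
/* Hypothetical, trimmed-down lookup entry. */
struct ptype_entry {
	unsigned int known:1;
	unsigned int l4_payload:1;
};

#define PTYPE_COUNT 256

static const struct ptype_entry ptype_lkup[PTYPE_COUNT] = {
	[1]        = { 1, 0 },
	[2 ... 21] = { 0, 0 },	/* one line instead of twenty zero entries */
	[22]       = { 1, 1 },
	/* entries not listed are implicitly zero */
};
```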

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoi40e: clean up packet type lookup table
Jesse Brandeburg [Tue, 23 Feb 2021 23:47:06 +0000 (15:47 -0800)]
i40e: clean up packet type lookup table

Remove the unused ptype struct value, which makes table init easier for
the zero entries, and use ranged initializer to remove a bunch of code
(works with gcc and clang). There is no significant functional change.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoice: report hash type such as L2/L3/L4
Jesse Brandeburg [Tue, 23 Feb 2021 23:47:05 +0000 (15:47 -0800)]
ice: report hash type such as L2/L3/L4

The hardware is reporting the type of the hash used for RSS
as a PTYPE field in the receive descriptor. Use this value to set
the skb packet hash type by extending the hash type table to
cover all 10-bits of possible values (requiring some variables
to be changed from u8 to u16), and then use that table to convert
to one of the possible values in enum pkt_hash_types.

While we're here, remove the unused ptype struct value, which
makes table init easier for the zero entries, and use ranged
initializer to remove a bunch of code (works with gcc and clang).

Without this change, the kernel will recalculate the hash in software,
which can consume extra CPU cycles.
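
The table-driven conversion can be sketched as below. The enum mirrors the
idea of enum pkt_hash_types, but the names, the table, and its two entries
are hypothetical; they only show why a 10-bit PTYPE needs a u16 index.

```c
#include <stdint.h>

/* Hypothetical mirror of the hash type categories. */
enum hash_type_sketch { HASH_NONE, HASH_L2, HASH_L3, HASH_L4 };

#define PTYPE_BITS 10
#define PTYPE_MAX  (1u << PTYPE_BITS)	/* 1024 possible values */

/* Assumed entries for illustration; everything else stays HASH_NONE. */
static const uint8_t hash_type_by_ptype[PTYPE_MAX] = {
	[1]  = HASH_L2,
	[90] = HASH_L4,
};

/* Look up the hash type for a PTYPE taken from the receive descriptor. */
static enum hash_type_sketch ptype_to_hash_type(uint16_t ptype)
{
	return (enum hash_type_sketch)hash_type_by_ptype[ptype & (PTYPE_MAX - 1)];
}
```

If the PTYPE were kept in a u8, a value such as 346 would be truncated to 90
and hit the wrong table entry, which is why the wider type matters.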

Co-developed-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Thu, 17 Jun 2021 19:11:28 +0000 (12:11 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-06-17

This series contains updates to ice driver only.

Jake corrects a couple of entries in the PTYPE table to properly
reflect the datasheet and removes unneeded NULL checks for some
PTP calls.

Paul reduces the scope of variables and removes the use of a local
variable.

Shaokun Zhang removes a duplicate function declaration.

Lorenzo Bianconi fixes a compilation warning if PTP_1588_CLOCK is
disabled.

Colin Ian King changes a for loop to remove an unneeded 'continue'.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'hdlc_ppp-cleanups'
David S. Miller [Thu, 17 Jun 2021 19:08:46 +0000 (12:08 -0700)]
Merge branch 'hdlc_ppp-cleanups'

Guangbin Huang says:

====================
net: hdlc_ppp: clean up some code style issues

This patchset cleans up some code style issues.

---
Change Log:
V1 -> V2:
1. remove patch "net: hdlc_ppp: fix the comments style issue" and
patch "net: hdlc_ppp: remove redundant spaces" from this patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hdlc_ppp: add required space
Peng Li [Thu, 17 Jun 2021 14:03:19 +0000 (22:03 +0800)]
net: hdlc_ppp: add required space

Add space required after that ','.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hdlc_ppp: remove unnecessary out of memory message
Peng Li [Thu, 17 Jun 2021 14:03:18 +0000 (22:03 +0800)]
net: hdlc_ppp: remove unnecessary out of memory message

This patch removes unnecessary out of memory message,
to fix the following checkpatch.pl warning:
"WARNING: Possible unnecessary 'out of memory' message"

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hdlc_ppp: move out assignment in if condition
Peng Li [Thu, 17 Jun 2021 14:03:17 +0000 (22:03 +0800)]
net: hdlc_ppp: move out assignment in if condition

Assignments should not be used in if conditions; move them out.
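
A generic example of this style fix, using a hypothetical helper rather than
the hdlc_ppp code:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Before (flagged by checkpatch):
 *	if ((copy = malloc(len + 1)) == NULL)
 *		return NULL;
 */
static char *dup_string(const char *src)
{
	size_t len = strlen(src);
	char *copy;

	/* After: assign first, then test the result. */
	copy = malloc(len + 1);
	if (!copy)
		return NULL;

	memcpy(copy, src, len + 1);
	return copy;
}
```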

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hdlc_ppp: fix the code style issue about "foo* bar"
Peng Li [Thu, 17 Jun 2021 14:03:16 +0000 (22:03 +0800)]
net: hdlc_ppp: fix the code style issue about "foo* bar"

Fix the checkpatch error as "foo* bar" or "foo*bar" should be "foo *bar".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hdlc_ppp: add blank line after declarations
Peng Li [Thu, 17 Jun 2021 14:03:15 +0000 (22:03 +0800)]
net: hdlc_ppp: add blank line after declarations

This patch fixes the checkpatch error about missing a blank line
after declarations.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hdlc_ppp: remove redundant blank lines
Peng Li [Thu, 17 Jun 2021 14:03:14 +0000 (22:03 +0800)]
net: hdlc_ppp: remove redundant blank lines

This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'mdio-nodes'
David S. Miller [Thu, 17 Jun 2021 19:06:53 +0000 (12:06 -0700)]
Merge branch 'mdio-nodes'

Ioana Ciornei says:

====================
net: mdio: setup both fwnode and of_node

The first patch in this series fixes a bug introduced by mistake in the
previous ACPI MDIO patch set.

The next two patches are adding a new helper which takes a device and a
fwnode_handle and populates both the of_node and fwnode so that we make
sure that a bug like this does not happen anymore.
Also, the new helper is used in the MDIO area.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mdio: use device_set_node() to setup both fwnode and of
Ioana Ciornei [Thu, 17 Jun 2021 12:29:05 +0000 (15:29 +0300)]
net: mdio: use device_set_node() to setup both fwnode and of

Use the newly introduced helper to setup both the of_node and the
fwnode for a given device.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodriver core: add a helper to setup both the of_node and fwnode of a device
Ioana Ciornei [Thu, 17 Jun 2021 12:29:04 +0000 (15:29 +0300)]
driver core: add a helper to setup both the of_node and fwnode of a device

There are many places where both the fwnode_handle and the of_node of a
device need to be populated. Add a function which does both so that we
have consistency.
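
The idea behind such a helper can be shown with mock types. This is a
userspace sketch only: the structs are simplified stand-ins, and the
function names are ours, not the driver core API (the series itself refers
to the helper as device_set_node()).

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct fwnode_handle { int is_of; };
struct device_node { const char *name; struct fwnode_handle fwnode; };
struct device { struct fwnode_handle *fwnode; struct device_node *of_node; };

/* Recover the enclosing device_node if the fwnode is an OF node. */
static struct device_node *to_of_node_sketch(struct fwnode_handle *fwnode)
{
	if (!fwnode || !fwnode->is_of)
		return NULL;
	return (struct device_node *)((char *)fwnode -
				      offsetof(struct device_node, fwnode));
}

/* Set both pointers in one place so they cannot get out of sync. */
static void set_node_sketch(struct device *dev, struct fwnode_handle *fwnode)
{
	dev->fwnode = fwnode;
	dev->of_node = to_of_node_sketch(fwnode);
}
```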

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mdio: setup of_node for the MDIO device
Ioana Ciornei [Thu, 17 Jun 2021 12:29:03 +0000 (15:29 +0300)]
net: mdio: setup of_node for the MDIO device

By mistake, the of_node of the MDIO device was not set up in the patch
linked below. As a consequence, any PHY driver that depends on the
of_node in its probe callback was not able to successfully finish its
probe on a PHY, so the Generic PHY driver was used instead.

Fix this by actually setting up the of_node.

Fixes: bc1bee3b87ee ("net: mdiobus: Introduce fwnode_mdiobus_register_phy()")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8152: store the information of the pipes
Hayes Wang [Thu, 17 Jun 2021 10:00:15 +0000 (18:00 +0800)]
r8152: store the information of the pipes

Store the information of the pipes to avoid calling usb_rcvctrlpipe(),
usb_sndctrlpipe(), usb_rcvbulkpipe(), usb_sndbulkpipe(), and
usb_rcvintpipe() frequently.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Thu, 17 Jun 2021 18:54:56 +0000 (11:54 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2021-06-17

The following pull-request contains BPF updates for your *net-next* tree.

We've added 50 non-merge commits during the last 25 day(s) which contain
a total of 148 files changed, 4779 insertions(+), 1248 deletions(-).

The main changes are:

1) BPF infrastructure to migrate TCP child sockets from a listener to another
   in the same reuseport group/map, from Kuniyuki Iwashima.

2) Add a provably sound, faster and more precise algorithm for tnum_mul() as
   noted in https://arxiv.org/abs/2105.05398, from Harishankar Vishwanathan.

3) Streamline error reporting changes in libbpf as planned out in the
   'libbpf: the road to v1.0' effort, from Andrii Nakryiko.

4) Add broadcast support to xdp_redirect_map(), from Hangbin Liu.

5) Extend bpf_map_lookup_and_delete_elem() functionality to 4 more map
   types, that is, {LRU_,PERCPU_,LRU_PERCPU_,}HASH, from Denis Salopek.

6) Support new LLVM relocations in libbpf to make them more linker friendly,
   also add a doc to describe the BPF backend relocations, from Yonghong Song.

7) Silence long standing KUBSAN complaints on register-based shifts in
   interpreter, from Daniel Borkmann and Eric Biggers.

8) Add dummy PT_REGS macros in libbpf to fail BPF program compilation when
   target arch cannot be determined, from Lorenz Bauer.

9) Extend AF_XDP to support large umems with 1M+ pages, from Magnus Karlsson.

10) Fix two minor libbpf tc BPF API issues, from Kumar Kartikeya Dwivedi.

11) Move libbpf BPF_SEQ_PRINTF/BPF_SNPRINTF macros that can be used by BPF
    programs to bpf_helpers.h header, from Florent Revest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'gianfar-64-bit-stats'
David S. Miller [Thu, 17 Jun 2021 18:39:48 +0000 (11:39 -0700)]
Merge branch 'gianfar-64-bit-stats'

Esben Haabendal says:

====================
net: gianfar: 64-bit statistics and rx_missed_errors counter

This series replaces the legacy 32-bit statistics with proper 64-bit ones,
and implements the rx_missed_errors counter on top of that.

The device supports a 16-bit RDRP counter, and a related carry bit and
interrupt, which allows implementation of a robust 64-bit counter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: gianfar: Implement rx_missed_errors counter
Esben Haabendal [Thu, 17 Jun 2021 09:49:28 +0000 (11:49 +0200)]
net: gianfar: Implement rx_missed_errors counter

Devices with RMON support have a 16-bit RDRP counter.  It provides: "Receive
dropped packets counter. Increments for frames received which are streamed
to system but are later dropped due to lack of system resources."

To handle more than 2^16 dropped packets, a carry bit in CAR1 register is
set on overflow, so we enable irq when this is set, extending the counter
to 2^64 for handling situations where lots of packets are missed (e.g.
during heavy network storms).
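
The counter extension can be sketched as follows. This is a simplified model
with hypothetical names: it ignores locking and the race between reading the
hardware counter and handling the carry interrupt.

```c
#include <stdint.h>

/* Software state: packets already accounted for by carry interrupts. */
struct rdrp_counter {
	uint64_t carried;
};

/* Called from the carry interrupt: the 16-bit counter wrapped once. */
static void rdrp_on_carry(struct rdrp_counter *c)
{
	c->carried += (uint64_t)1 << 16;
}

/* Full 64-bit value: accumulated wraps plus the current hardware count. */
static uint64_t rdrp_read(const struct rdrp_counter *c, uint16_t hw_count)
{
	return c->carried + hw_count;
}
```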

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: gianfar: Add definitions for CAR1 and CAM1 register bits
Esben Haabendal [Thu, 17 Jun 2021 09:49:26 +0000 (11:49 +0200)]
net: gianfar: Add definitions for CAR1 and CAM1 register bits

These are for carry status and interrupt mask bits of statistics registers.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: gianfar: Avoid 16 bytes of memset
Esben Haabendal [Thu, 17 Jun 2021 09:49:23 +0000 (11:49 +0200)]
net: gianfar: Avoid 16 bytes of memset

The memset on CAMx is wrong, as it actually unmasks all carry irq's,
which we clearly are not interested in.

The memset on CARx registers is just pointless, as they are W1C.

So let's just stop the memset before CAR1.
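
A sketch of stopping the clear before CAR1, using offsetof() on a made-up
register layout. The struct below is hypothetical and far smaller than the
real RMON block, and plain memset() stands in for whatever I/O accessor the
driver uses on device memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: counters first, then carry and mask registers. */
struct rmon_sketch {
	uint32_t counters[4];
	uint32_t car1, car2;	/* carry status, write-1-to-clear */
	uint32_t cam1, cam2;	/* carry interrupt masks */
};

/* Zero the counters only; leave the last 16 bytes (CARx/CAMx) untouched. */
static void clear_counters_only(struct rmon_sketch *r)
{
	memset(r, 0, offsetof(struct rmon_sketch, car1));
}
```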

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: gianfar: Clear CAR registers
Esben Haabendal [Thu, 17 Jun 2021 09:49:20 +0000 (11:49 +0200)]
net: gianfar: Clear CAR registers

The CAR1 and CAR2 registers are W1C style registers, so the memset does not
actually clear them.
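
Write-1-to-clear (W1C) behavior can be modeled in a few lines. The function
below is a software simulation with a name of our choosing, not driver code:
it returns the register contents after a write.

```c
#include <stdint.h>

/* Simulate a W1C register: each 1 bit written clears that bit. */
static uint32_t w1c_write(uint32_t reg, uint32_t value)
{
	return reg & ~value;
}
```

Writing zeros, as a memset does, leaves every bit as it was; the bits are
cleared only by writing ones to them.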

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: gianfar: Extend statistics counters to 64-bit
Esben Haabendal [Thu, 17 Jun 2021 09:49:17 +0000 (11:49 +0200)]
net: gianfar: Extend statistics counters to 64-bit

No reason to wrap counter values at 2^32.  Especially the bytes counters
can wrap pretty fast on Gbit networks.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: gianfar: Convert to ndo_get_stats64 interface
Esben Haabendal [Thu, 17 Jun 2021 09:49:15 +0000 (11:49 +0200)]
net: gianfar: Convert to ndo_get_stats64 interface

No reason to produce the legacy net_device_stats struct, only to have it
converted to rtnl_link_stats64.  And as a bonus, this allows for improving
counter size to 64 bit.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sched: fix error return code in tcf_del_walker()
Yang Yingliang [Thu, 17 Jun 2021 08:02:07 +0000 (16:02 +0800)]
net: sched: fix error return code in tcf_del_walker()

When nla_put_u32() fails, 'ret' could be 0. Return an error code from
tcf_del_walker() in that case.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ipa: Add missing of_node_put() in ipa_firmware_load()
Yang Yingliang [Thu, 17 Jun 2021 05:11:19 +0000 (13:11 +0800)]
net: ipa: Add missing of_node_put() in ipa_firmware_load()

The node pointer is returned by of_parse_phandle() with its refcount
incremented. Call of_node_put() on it before exiting this function.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: fix mistake path for netdev_features_strings
Jian Shen [Thu, 17 Jun 2021 03:37:11 +0000 (11:37 +0800)]
net: fix mistake path for netdev_features_strings

The strings arrays netdev_features_strings, tunable_strings, and
phy_tunable_strings have been moved to the file net/ethtool/common.c,
so fix the comment.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodocumentation: networking: devlink: fix prestera.rst formatting that causes build...
Oleksandr Mazur [Wed, 16 Jun 2021 17:46:07 +0000 (20:46 +0300)]
documentation: networking: devlink: fix prestera.rst formatting that causes build warnings

Fixes: 66826c43e63d ("documentation: networking: devlink: add prestera switched driver Documentation")
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: pcs: xpcs: Fix a less than zero u16 comparison error
Colin Ian King [Tue, 15 Jun 2021 13:52:53 +0000 (14:52 +0100)]
net: pcs: xpcs: Fix a less than zero u16 comparison error

Currently the check for the u16 variable val being less than zero is
always false because val is unsigned. Fix this by using the int
variable for the assignment and less than zero check.
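
The bug class can be reproduced in a small example. The read function and
error value here are stand-ins invented for the sketch; only the pattern
(keep the result in an int until the error check is done) reflects the fix
described above.

```c
#include <stdint.h>

#define ERR_IO (-5)	/* stand-in negative error code */

/* Stand-in for a register read that returns a value or a negative error. */
static int fake_read(int fail)
{
	return fail ? ERR_IO : 0x1234;
}

/*
 * Buggy pattern: "uint16_t val = fake_read(...); if (val < 0) ..."
 * The error is truncated to a large positive number, so the check
 * can never be true.
 */
static int read_fixed(int fail, uint16_t *out)
{
	int ret = fake_read(fail);

	if (ret < 0)
		return ret;	/* error is still visible in the int */

	*out = (uint16_t)ret;
	return 0;
}
```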

Addresses-Coverity: ("Unsigned compared against 0")
Fixes: f7380bba42fd ("net: pcs: xpcs: add support for NXP SJA1110")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoice: remove redundant continue statement in a for-loop
Colin Ian King [Tue, 15 Jun 2021 14:28:47 +0000 (15:28 +0100)]
ice: remove redundant continue statement in a for-loop

The continue statement in the for-loop is redundant. Re-work the hw_lock
check to remove it.

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agonet: ice: ptp: fix compilation warning if PTP_1588_CLOCK is disabled
Lorenzo Bianconi [Tue, 15 Jun 2021 14:14:12 +0000 (16:14 +0200)]
net: ice: ptp: fix compilation warning if PTP_1588_CLOCK is disabled

Fix the following compilation warning if PTP_1588_CLOCK is not enabled:

drivers/net/ethernet/intel/ice/ice_ptp.h:149:1:
   error: return type defaults to ‘int’ [-Werror=return-type]
   ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb)

Fixes: ea9b847cda647 ("ice: enable transmit timestamps for E810 devices")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoice: remove unnecessary NULL checks before ptp_read_system_*
Jacob Keller [Mon, 14 Jun 2021 16:59:16 +0000 (09:59 -0700)]
ice: remove unnecessary NULL checks before ptp_read_system_*

The ptp_read_system_prets and ptp_read_system_postts functions already
check for the NULL value of the ptp_system_timestamp structure pointer.
There is no need to check this manually in the ice driver code. Remove
the checks.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoice: Remove the repeated declaration
Shaokun Zhang [Mon, 24 May 2021 08:39:01 +0000 (16:39 +0800)]
ice: Remove the repeated declaration

Function 'ice_is_vsi_valid' is declared twice, remove the
repeated declaration.

Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoice: remove local variable
Paul M Stillwell Jr [Thu, 6 May 2021 15:40:08 +0000 (08:40 -0700)]
ice: remove local variable

Remove the local variable since it's only used once. Instead, use the value
directly.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoice: reduce scope of variables
Paul M Stillwell Jr [Thu, 6 May 2021 15:40:07 +0000 (08:40 -0700)]
ice: reduce scope of variables

The scope of some variables can be reduced to the block in which they
are used, so do that.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoice: mark PTYPE 2 as reserved
Jacob Keller [Thu, 6 May 2021 15:40:05 +0000 (08:40 -0700)]
ice: mark PTYPE 2 as reserved

The entry for PTYPE 2 in the ice_ptype_lkup table incorrectly states
that this is an L2 packet with no payload. According to the datasheet,
this PTYPE is actually unused and reserved.

Fix the lookup entry to indicate this is an unused entry that is
reserved.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoice: fix incorrect payload indicator on PTYPE
Jacob Keller [Thu, 6 May 2021 15:40:04 +0000 (08:40 -0700)]
ice: fix incorrect payload indicator on PTYPE

The entry for PTYPE 90 indicates that the payload is layer 3. This does
not match the specification in the datasheet which indicates the packet
is a MAC, IPv6, UDP packet, with a payload in layer 4.

Fix the lookup table to match the datasheet.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoselftests/bpf: Fix selftests build with old system-wide headers
Andrii Nakryiko [Thu, 17 Jun 2021 04:14:46 +0000 (21:14 -0700)]
selftests/bpf: Fix selftests build with old system-wide headers

migrate_reuseport.c selftest relies on having TCP_FASTOPEN_CONNECT defined in
system-wide netinet/tcp.h. Selftests can use up-to-date uapi/linux/tcp.h, but
that one doesn't have SOL_TCP. So instead of switching everything to uapi
header, add #define for TCP_FASTOPEN_CONNECT to fix the build.
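
The fallback described above can be sketched as follows. This is an
illustration rather than the exact hunk from the patch: the value 30 is
the one uapi/linux/tcp.h assigns to TCP_FASTOPEN_CONNECT, and
demo_enable_fastopen_connect() is a hypothetical helper that uses
IPPROTO_TCP where the selftest would use SOL_TCP.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Old system-wide netinet/tcp.h may lack this; match uapi/linux/tcp.h. */
#ifndef TCP_FASTOPEN_CONNECT
#define TCP_FASTOPEN_CONNECT 30
#endif

/* Hypothetical helper: request TCP Fast Open on connect() for socket fd. */
int demo_enable_fastopen_connect(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN_CONNECT,
			  &one, sizeof(one));
}
```

The #ifndef guard keeps the define harmless on systems whose headers
already provide the constant.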

Fixes: c9d0bdef89a6 ("bpf: Test BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/bpf/20210617041446.425283-1-andrii@kernel.org
4 years agobpf: Fix up register-based shifts in interpreter to silence KUBSAN
Daniel Borkmann [Wed, 16 Jun 2021 09:25:11 +0000 (11:25 +0200)]
bpf: Fix up register-based shifts in interpreter to silence KUBSAN

syzbot reported a shift-out-of-bounds that KUBSAN observed in the
interpreter:

  [...]
  UBSAN: shift-out-of-bounds in kernel/bpf/core.c:1420:2
  shift exponent 255 is too large for 64-bit type 'long long unsigned int'
  CPU: 1 PID: 11097 Comm: syz-executor.4 Not tainted 5.12.0-rc2-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:79 [inline]
   dump_stack+0x141/0x1d7 lib/dump_stack.c:120
   ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
   __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
   ___bpf_prog_run.cold+0x19/0x56c kernel/bpf/core.c:1420
   __bpf_prog_run32+0x8f/0xd0 kernel/bpf/core.c:1735
   bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
   bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
   bpf_prog_run_clear_cb include/linux/filter.h:755 [inline]
   run_filter+0x1a1/0x470 net/packet/af_packet.c:2031
   packet_rcv+0x313/0x13e0 net/packet/af_packet.c:2104
   dev_queue_xmit_nit+0x7c2/0xa90 net/core/dev.c:2387
   xmit_one net/core/dev.c:3588 [inline]
   dev_hard_start_xmit+0xad/0x920 net/core/dev.c:3609
   __dev_queue_xmit+0x2121/0x2e00 net/core/dev.c:4182
   __bpf_tx_skb net/core/filter.c:2116 [inline]
   __bpf_redirect_no_mac net/core/filter.c:2141 [inline]
   __bpf_redirect+0x548/0xc80 net/core/filter.c:2164
   ____bpf_clone_redirect net/core/filter.c:2448 [inline]
   bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2420
   ___bpf_prog_run+0x34e1/0x77d0 kernel/bpf/core.c:1523
   __bpf_prog_run512+0x99/0xe0 kernel/bpf/core.c:1737
   bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
   bpf_test_run+0x3ed/0xc50 net/bpf/test_run.c:50
   bpf_prog_test_run_skb+0xabc/0x1c50 net/bpf/test_run.c:582
   bpf_prog_test_run kernel/bpf/syscall.c:3127 [inline]
   __do_sys_bpf+0x1ea9/0x4f00 kernel/bpf/syscall.c:4406
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  [...]

Generally speaking, KUBSAN reports from the kernel should be fixed.
However, in case of BPF, this particular report caused concerns since
the large shift is not wrong from BPF's point of view, just undefined.
In the verifier, K-based shifts that are >= {64,32} (depending on the
bitwidth of the instruction) are already rejected. The register-based
cases were not, given that their content might not be known at
verification time. Ideas such as a verifier instruction rewrite with an
additional AND instruction for the source register were brought up, but
regularly rejected due to the additional runtime overhead they incur.

As Edward Cree rightly put it:

  Shifts by more than insn bitness are legal in the BPF ISA; they are
  implementation-defined behaviour [of the underlying architecture],
  rather than UB, and have been made legal for performance reasons.
  Each of the JIT backends compiles the BPF shift operations to machine
  instructions which produce implementation-defined results in such a
  case; the resulting contents of the register may be arbitrary but
  program behaviour as a whole remains defined.

  Guard checks in the fast path (i.e. affecting JITted code) will thus
  not be accepted.

  The case of division by zero is not truly analogous here, as division
  instructions on many of the JIT-targeted architectures will raise a
  machine exception / fault on division by zero, whereas (to the best
  of my knowledge) none will do so on an out-of-bounds shift.

Given the KUBSAN report only affects the BPF interpreter, but not JITs,
one solution is to add the ANDs with 63 or 31 into ___bpf_prog_run().
That makes the shifts defined, and thus shuts up KUBSAN, and the
compiler optimizes out the AND on any CPU that interprets the shift
amounts modulo the width anyway (e.g., confirmed from disassembly that
on x86-64 and arm64 the generated interpreter code is the same before
and after this fix).
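
A standalone sketch of the masking idea, outside the kernel. The demo_*
helpers are hypothetical and are not the interpreter's actual ALU
macros; they only show that ANDing the shift amount with 63 (64-bit) or
31 (32-bit) keeps the C shift well defined for any register value.

```c
#include <stdint.h>

/* 64-bit left shift: the amount is reduced modulo 64 before shifting. */
uint64_t demo_lsh64(uint64_t dst, uint64_t src)
{
	return dst << (src & 63);
}

/* 32-bit left shift: the amount is reduced modulo 32 before shifting. */
uint32_t demo_lsh32(uint32_t dst, uint32_t src)
{
	return dst << (src & 31);
}

/*
 * 64-bit arithmetic right shift. Right-shifting a negative signed value
 * is implementation-defined in C; gcc and clang shift arithmetically.
 */
int64_t demo_arsh64(int64_t dst, uint64_t src)
{
	return dst >> (src & 63);
}
```

With the mask in place, the shift exponent 255 from the report above
becomes 255 & 63 = 63, which is in range for a 64-bit operand.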

The BPF interpreter is slow path, and most likely compiled out anyway
as distros select BPF_JIT_ALWAYS_ON to avoid speculative execution of
BPF instructions by the interpreter. Given the main argument was to
avoid sacrificing performance, the fact that the compiler optimizes the
AND away on mainstream archs makes this a workable solution moving
forward. Also add a comment on LSH/RSH/ARSH translation for JIT authors
to provide guidance when they see the ___bpf_prog_run() interpreter
code and use it as a model for a new JIT backend.

Reported-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
Reported-by: Kurt Manucredo <fuzzybritches0@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Co-developed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
Cc: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/bpf/0000000000008f912605bd30d5d7@google.com
Link: https://lore.kernel.org/bpf/bac16d8d-c174-bdc4-91bd-bfa62b410190@gmail.com
4 years agolibbpf: Fail compilation if target arch is missing
Lorenz Bauer [Wed, 16 Jun 2021 08:36:35 +0000 (09:36 +0100)]
libbpf: Fail compilation if target arch is missing

bpf2go is the Go equivalent of libbpf skeleton. The convention is that
the compiled BPF is checked into the repository to facilitate distributing
BPF as part of Go packages. To make this portable, bpf2go by default
generates both bpfel and bpfeb variants of the C.

Using bpf_tracing.h is inherently non-portable since the fields of
struct pt_regs differ between platforms, so CO-RE can't help us here.
The only way of working around this is to compile for each target
platform independently. bpf2go can't do this by default since there
are too many platforms.

Define the various PT_... macros when no target can be determined and
turn them into compilation failures. This works because bpf2go always
compiles for bpf targets, so the compiler fallback doesn't kick in.
Conditionally define __BPF_MISSING_TARGET so that we can inject a
more appropriate error message at build time. The user can then
choose which platform to target explicitly.
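
A rough sketch of the technique, using hypothetical DEMO_* names and a
made-up register struct rather than the real bpf_tracing.h contents.
When no target is selected, the accessor macro still exists but expands
to a _Pragma that raises a compile error at the point of use. That
fallback branch relies on GNU statement expressions and on the compiler
honouring "GCC error" pragmas, which is assumed here rather than
verified across compilers.

```c
/* Hypothetical stand-in for struct pt_regs; real layouts differ per arch. */
struct demo_regs {
	long di;	/* first argument on the pretend x86 target */
	long x0;	/* first argument on the pretend arm64 target */
};

/* Normally passed as -DDEMO_TARGET_X86; set here so the sketch builds. */
#define DEMO_TARGET_X86 1

#if defined(DEMO_TARGET_X86)
#define DEMO_REGS_PARM1(r) ((r)->di)
#elif defined(DEMO_TARGET_ARM64)
#define DEMO_REGS_PARM1(r) ((r)->x0)
#else
/* Overridable so a wrapper tool can inject a more specific message. */
#ifndef DEMO_TARGET_MISSING
#define DEMO_TARGET_MISSING "GCC error \"Must specify a target arch\""
#endif
/* No target: any use of the accessor fails the build. */
#define DEMO_REGS_PARM1(r) ({ _Pragma(DEMO_TARGET_MISSING); 0l; })
#endif

long demo_first_arg(const struct demo_regs *regs)
{
	return DEMO_REGS_PARM1(regs);
}
```

Removing the DEMO_TARGET_X86 define makes demo_first_arg() fail to
compile instead of silently reading the wrong register.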

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210616083635.11434-1-lmb@cloudflare.com