]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
5 years agoMerge branch 'hinic-mailbox-channel-enhancement'
David S. Miller [Tue, 4 Aug 2020 19:17:06 +0000 (12:17 -0700)]
Merge branch 'hinic-mailbox-channel-enhancement'

Luo bin says:

====================
hinic: mailbox channel enhancement

add support to generate mailbox random id for VF to ensure that
the mailbox message from VF is valid and PF should check whether
the cmd from VF is supported before passing it to hw.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohinic: add check for mailbox msg from VF
Luo bin [Tue, 4 Aug 2020 02:19:12 +0000 (10:19 +0800)]
hinic: add check for mailbox msg from VF

PF should check whether the cmd from VF is supported and its content
is right before passing it to hw.

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohinic: add generating mailbox random index support
Luo bin [Tue, 4 Aug 2020 02:19:11 +0000 (10:19 +0800)]
hinic: add generating mailbox random index support

add support to generate mailbox random id of VF to ensure that
mailbox messages PF received are from the correct VF.

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc: Fix build with CONFIG_RFS_ACCEL disabled.
David S. Miller [Tue, 4 Aug 2020 01:29:39 +0000 (18:29 -0700)]
sfc: Fix build with CONFIG_RFS_ACCEL disabled.

   drivers/net/ethernet/sfc/ef100_nic.c:835:3: error: 'const struct efx_nic_type' has no member named 'filter_rfs_expire_one'
     835 |  .filter_rfs_expire_one = efx_mcdi_filter_rfs_expire_one,
         |   ^~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/sfc/ef100_nic.c:835:27: error: initialization of 'void (*)(struct efx_nic *, u32)' {aka 'void (*)(struct efx_nic *, unsigned int)'} from incompatible pointer type 'bool (*)(struct efx_nic *, u32,  unsigned int)' {aka '_Bool (*)(struct efx_nic *, unsigned int,  unsigned int)'} [-Werror=incompatible-pointer-types]
     835 |  .filter_rfs_expire_one = efx_mcdi_filter_rfs_expire_one,
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Tue, 4 Aug 2020 01:27:40 +0000 (18:27 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-08-04

The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).

The main changes are:

1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
   syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.

2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
   in-kernel inspection, from Yonghong Song and Alexei Starovoitov.

3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
   unwinder errors, from Song Liu.

4) Allow cgroup local storage map to be shared between programs on the same
   cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.

5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
   load instructions, from Jean-Philippe Brucker.

6) Follow-up fixes on BPF socket lookup in combination with reuseport group
   handling. Also add related BPF selftests, from Jakub Sitnicki.

7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
   socket create/release as well as bind functions, from Stanislav Fomichev.

8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
   xdp_statistics, from Peilin Ye.

9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.

10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
    fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.

11) Fix a bpftool segfault due to missing program type name and make it more robust
    to prevent them in future gaps, from Quentin Monnet.

12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
    resolver issue, from John Fastabend.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5-updates-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Tue, 4 Aug 2020 01:24:30 +0000 (18:24 -0700)]
Merge tag 'mlx5-updates-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-08-03

This patchset introduces some updates to mlx5 driver.

1) Jakub converts mlx5 to use the new udp tunnel infrastructure.
   Starting with a hack to allow drivers to request a static configuration
   of the default vxlan port, and then a patch that converts mlx5.

2) Parav implements change_carrier ndo for VF eswitch representors,
   to speedup link state control of representors netdevices.

3) Alex Vesker, makes a simple update to software steering to fix an issue
   with push vlan action sequence

4) Leon removes a redundant dump stack on error flow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'sfc-driver-for-EF100-family-NICs-part-2'
David S. Miller [Tue, 4 Aug 2020 01:22:55 +0000 (18:22 -0700)]
Merge branch 'sfc-driver-for-EF100-family-NICs-part-2'

Edward Cree says:

====================
sfc: driver for EF100 family NICs, part 2

This series implements the data path and various other functionality
 for Xilinx/Solarflare EF100 NICs.

Changed from v2:
 * Improved error handling of design params (patch #3)
 * Removed 'inline' from .c file in patch #4
 * Don't report common stats to ethtool -S (patch #8)

Changed from v1:
 * Fixed build errors on CONFIG_RFS_ACCEL=n (patch #5) and 32-bit
   (patch #8)
 * Dropped patch #10 (ethtool ops) as it's buggy and will need a
   bigger rework to fix.
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: add nic-type for VFs, and bind to them
Edward Cree [Mon, 3 Aug 2020 20:40:01 +0000 (21:40 +0100)]
sfc_ef100: add nic-type for VFs, and bind to them

We don't yet have a .sriov_configure() to create them, though.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: read pf_index at probe time
Edward Cree [Mon, 3 Aug 2020 20:38:49 +0000 (21:38 +0100)]
sfc_ef100: read pf_index at probe time

We'll need it later, for VF representors.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: functions for selftests
Edward Cree [Mon, 3 Aug 2020 20:37:50 +0000 (21:37 +0100)]
sfc_ef100: functions for selftests

Self-tests for event and interrupt reception and NVRAM.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: statistics gathering
Edward Cree [Mon, 3 Aug 2020 20:37:20 +0000 (21:37 +0100)]
sfc_ef100: statistics gathering

MAC stats work much the same as on EF10, with a periodic DMA to a region
 specified via an MCDI.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: plumb in fini_dmaq
Edward Cree [Mon, 3 Aug 2020 20:36:44 +0000 (21:36 +0100)]
sfc_ef100: plumb in fini_dmaq

Bring down the TX and RX queues at ifdown, so that we can then fini the
 EVQs (otherwise the MC would return EBUSY because they're still in use).

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: RX path for EF100
Edward Cree [Mon, 3 Aug 2020 20:36:28 +0000 (21:36 +0100)]
sfc_ef100: RX path for EF100

Includes RSS spreading.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: RX filter table management and related gubbins
Edward Cree [Mon, 3 Aug 2020 20:34:47 +0000 (21:34 +0100)]
sfc_ef100: RX filter table management and related gubbins

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: TX path for EF100 NICs
Edward Cree [Mon, 3 Aug 2020 20:34:00 +0000 (21:34 +0100)]
sfc_ef100: TX path for EF100 NICs

Includes checksum offload and TSO, so declare those in our netdev features.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: read Design Parameters at probe time
Edward Cree [Mon, 3 Aug 2020 20:33:20 +0000 (21:33 +0100)]
sfc_ef100: read Design Parameters at probe time

Several parts of the EF100 architecture are parameterised (to allow
 varying capabilities on FPGAs according to resource constraints), and
 these parameters are exposed to the driver through a TLV-encoded
 region of the BAR.
For the most part we either don't care about these values at all or
 just need to sanity-check them against the driver's assumptions, but
 there are a number of TSO limits which we record so that we will be
 able to check against them in the TX path when handling GSO skbs.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: fail the probe if NIC uses unsol_ev credits
Edward Cree [Mon, 3 Aug 2020 20:32:16 +0000 (21:32 +0100)]
sfc_ef100: fail the probe if NIC uses unsol_ev credits

In the future, EF100 is planned to have a credit-based scheme for
 handling unsolicited events, which drivers will need to use in order
 to function correctly.  However, current EF100 hardware does not yet
 generate unsolicited events and the credit scheme has not yet been
 implemented in firmware.  To prevent compatibility problems later if
 the current driver is used with future firmware which does implement
 it, we check for the corresponding capability flag (which that
 future firmware will set), and if found, we refuse to probe.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc_ef100: check firmware version at start-of-day
Edward Cree [Mon, 3 Aug 2020 20:32:05 +0000 (21:32 +0100)]
sfc_ef100: check firmware version at start-of-day

Early in EF100 development there was a different format of event
 descriptor; if the NIC is somehow running the very old firmware
 which will use that format, fail the probe.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenetc: use napi_schedule to be compatible with PREEMPT_RT
Jiafei Pan [Mon, 3 Aug 2020 20:10:09 +0000 (23:10 +0300)]
enetc: use napi_schedule to be compatible with PREEMPT_RT

The driver calls napi_schedule_irqoff() from a context where, in RT,
hardirqs are not disabled, since the IRQ handler is force-threaded.

In the call path of this function, __raise_softirq_irqoff() is modifying
its per-CPU mask of pending softirqs that must be processed, using
or_softirq_pending(). The or_softirq_pending() function is not atomic,
but since interrupts are supposed to be disabled, nobody should be
preempting it, and the operation should be safe.

Nonetheless, when running with hardirqs on, as in the PREEMPT_RT case,
it isn't safe, and the pending softirqs mask can get corrupted,
resulting in softirqs being lost and never processed.

To have common code that works with PREEMPT_RT and with mainline Linux,
we can use plain napi_schedule() instead. The difference is that
napi_schedule() (via __napi_schedule) also calls local_irq_save, which
disables hardirqs if they aren't already. But, since they already are
disabled in non-RT, this means that in practice we don't see any
measurable difference in throughput or latency with this patch.

Signed-off-by: Jiafei Pan <Jiafei.Pan@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodpaa2-eth: use napi_schedule to be compatible with PREEMPT_RT
Jiafei Pan [Mon, 3 Aug 2020 20:10:08 +0000 (23:10 +0300)]
dpaa2-eth: use napi_schedule to be compatible with PREEMPT_RT

The driver calls napi_schedule_irqoff() from a context where, in RT,
hardirqs are not disabled, since the IRQ handler is force-threaded.

In the call path of this function, __raise_softirq_irqoff() is modifying
its per-CPU mask of pending softirqs that must be processed, using
or_softirq_pending(). The or_softirq_pending() function is not atomic,
but since interrupts are supposed to be disabled, nobody should be
preempting it, and the operation should be safe.

Nonetheless, when running with hardirqs on, as in the PREEMPT_RT case,
it isn't safe, and the pending softirqs mask can get corrupted,
resulting in softirqs being lost and never processed.

To have common code that works with PREEMPT_RT and with mainline Linux,
we can use plain napi_schedule() instead. The difference is that
napi_schedule() (via __napi_schedule) also calls local_irq_save, which
disables hardirqs if they aren't already. But, since they already are
disabled in non-RT, this means that in practice we don't see any
measurable difference in throughput or latency with this patch.

Signed-off-by: Jiafei Pan <Jiafei.Pan@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-dsa-loop-Preparatory-changes-for-802-1Q-data-path'
David S. Miller [Tue, 4 Aug 2020 01:19:23 +0000 (18:19 -0700)]
Merge branch 'net-dsa-loop-Preparatory-changes-for-802-1Q-data-path'

net: dsa: loop: Preparatory changes for 802.1Q data path
Florian Fainelli says:

====================
These patches are all meant to help pave the way for a 802.1Q data path
added to the mockup driver, making it more useful than just testing for
configuration. Sending those out now since there is no real need to
wait.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: loop: Set correct number of ports
Florian Fainelli [Mon, 3 Aug 2020 20:03:54 +0000 (13:03 -0700)]
net: dsa: loop: Set correct number of ports

We only support DSA_LOOP_NUM_PORTS in the switch, do not tell the DSA
core to allocate up to DSA_MAX_PORTS which is nearly the double (6 vs.
11).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: loop: Wire-up MTU callbacks
Florian Fainelli [Mon, 3 Aug 2020 20:03:53 +0000 (13:03 -0700)]
net: dsa: loop: Wire-up MTU callbacks

For now we simply store the port MTU into a per-port member.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: loop: Move data structures to header
Florian Fainelli [Mon, 3 Aug 2020 20:03:52 +0000 (13:03 -0700)]
net: dsa: loop: Move data structures to header

In preparation for adding support for a mockup data path, move the
driver data structures to include/linux/dsa/loop.h such that we can
share them between net/dsa/ and drivers/net/dsa/ later on.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: loop: Support 4K VLANs
Florian Fainelli [Mon, 3 Aug 2020 20:03:51 +0000 (13:03 -0700)]
net: dsa: loop: Support 4K VLANs

Allocate a 4K array of VLANs instead of limiting ourselves to just 5
which is arbitrary.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: loop: PVID should be per-port
Florian Fainelli [Mon, 3 Aug 2020 20:03:50 +0000 (13:03 -0700)]
net: dsa: loop: PVID should be per-port

The PVID should be per-port, this is a preliminary change to support a
802.1Q data path in the driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: add TC-MATCHALL IPv6 support
Rahul Lakkireddy [Mon, 3 Aug 2020 18:30:08 +0000 (00:00 +0530)]
cxgb4: add TC-MATCHALL IPv6 support

Matching IPv6 traffic require allocating their own individual slots
in TCAM. So, fetch additional slots to insert IPv6 rules. Also, fetch
the cumulative stats of all the slots occupied by the Matchall rule.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: poll for extts events from a timer
Vladimir Oltean [Mon, 3 Aug 2020 17:51:58 +0000 (20:51 +0300)]
net: dsa: sja1105: poll for extts events from a timer

The current poll interval is enough to ensure that rising and falling
edge events are not lost for a 1 PPS signal with 50% duty cycle.

But when we deliver the events to user space, it will try to infer if
they were corresponding to a rising or to a falling edge (the kernel
driver doesn't know that either). User space will try to make that
inference based on the time at which the PPS master had emitted the
pulse (i.e. if it's a .0 time, it's rising edge, if it's .5 time, it's
falling edge).

But there is no in-kernel API for retrieving the precise timestamp
corresponding to a PPS master (aka perout) pulse. So user space has to
guess even that. It will read the PTP time on the PPS master right after
we've delivered the extts event, and declare that the PPS master time
was just the closest integer second, based on 2 thresholds (lower than
.25, or higher than .75, and ignore anything else).

Except that, if we poll for extts events (and our hardware doesn't
really help us, by not providing an interrupt), then there is a risk
that the poll period (and therefore the time at which the event is
delivered) might confuse user space.

Because we are always scheduling the next extts poll at
SJA1105_EXTTS_INTERVAL "from now" (that's the only thing that the
schedule_delayed_work() API gives us), it means that the start time of
the next delayed workqueue will always be shifted to the right a little
bit (shifted with the SPI access duration of this workqueue run).
In turn, because user space sees extts events that are non-periodic
compared to the PPS master's time, this means that it might start making
wrong guesses about rising/falling edge.

To understand the effect, here is the output of ts2phc currently. Notice
the 'src' timestamps of the 'SKIP extts' events, and how they have a
large wander. They keep increasing until the upper limit for the ignore
threshold (.75 seconds), after which the application starts ignoring the
_other_ edge.

ts2phc[26.624]: /dev/ptp3 SKIP extts index 0 at 21.449898912 src 21.657784518
ts2phc[27.133]: adding tstamp 21.949894240 to clock /dev/ptp3
ts2phc[27.133]: adding tstamp 22.000000000 to clock /dev/ptp1
ts2phc[27.133]: /dev/ptp3 offset        640 s2 freq   +5112
ts2phc[27.636]: /dev/ptp3 SKIP extts index 0 at 22.449889360 src 22.669398022
ts2phc[28.140]: adding tstamp 22.949884376 to clock /dev/ptp3
ts2phc[28.140]: adding tstamp 23.000000000 to clock /dev/ptp1
ts2phc[28.140]: /dev/ptp3 offset         96 s2 freq   +4760
ts2phc[28.644]: /dev/ptp3 SKIP extts index 0 at 23.449879504 src 23.677420422
ts2phc[29.153]: adding tstamp 23.949874704 to clock /dev/ptp3
ts2phc[29.153]: adding tstamp 24.000000000 to clock /dev/ptp1
ts2phc[29.153]: /dev/ptp3 offset       -264 s2 freq   +4429
ts2phc[29.656]: /dev/ptp3 SKIP extts index 0 at 24.449870008 src 24.689407238
ts2phc[30.160]: adding tstamp 24.949865376 to clock /dev/ptp3
ts2phc[30.160]: adding tstamp 25.000000000 to clock /dev/ptp1
ts2phc[30.160]: /dev/ptp3 offset       -280 s2 freq   +4334
ts2phc[30.664]: /dev/ptp3 SKIP extts index 0 at 25.449860760 src 25.697449926
ts2phc[31.168]: adding tstamp 25.949856176 to clock /dev/ptp3
ts2phc[31.168]: adding tstamp 26.000000000 to clock /dev/ptp1
ts2phc[31.168]: /dev/ptp3 offset       -176 s2 freq   +4354
ts2phc[31.672]: /dev/ptp3 SKIP extts index 0 at 26.449851584 src 26.705433606
ts2phc[32.180]: adding tstamp 26.949846992 to clock /dev/ptp3
ts2phc[32.180]: adding tstamp 27.000000000 to clock /dev/ptp1
ts2phc[32.180]: /dev/ptp3 offset        -80 s2 freq   +4397
ts2phc[32.684]: /dev/ptp3 SKIP extts index 0 at 27.449842384 src 27.717415110
ts2phc[33.192]: adding tstamp 27.949837768 to clock /dev/ptp3
ts2phc[33.192]: adding tstamp 28.000000000 to clock /dev/ptp1
ts2phc[33.192]: /dev/ptp3 offset          0 s2 freq   +4453
ts2phc[33.696]: /dev/ptp3 SKIP extts index 0 at 28.449833128 src 28.729412902
ts2phc[34.200]: adding tstamp 28.949828472 to clock /dev/ptp3
ts2phc[34.200]: adding tstamp 29.000000000 to clock /dev/ptp1
ts2phc[34.200]: /dev/ptp3 offset          8 s2 freq   +4461
ts2phc[34.704]: /dev/ptp3 SKIP extts index 0 at 29.449823816 src 29.737416038
ts2phc[35.208]: adding tstamp 29.949819152 to clock /dev/ptp3
ts2phc[35.208]: adding tstamp 30.000000000 to clock /dev/ptp1
ts2phc[35.208]: /dev/ptp3 offset         -8 s2 freq   +4447
ts2phc[35.712]: /dev/ptp3 SKIP extts index 0 at 30.449814496 src 30.745554982
ts2phc[36.216]: adding tstamp 30.949809840 to clock /dev/ptp3
ts2phc[36.216]: adding tstamp 31.000000000 to clock /dev/ptp1
ts2phc[36.216]: /dev/ptp3 offset         -8 s2 freq   +4445
ts2phc[36.468]: /dev/ptp3 SKIP extts index 0 at 31.449805184 src 31.501109446
ts2phc[36.972]: adding tstamp 31.949800536 to clock /dev/ptp3
ts2phc[36.972]: adding tstamp 32.000000000 to clock /dev/ptp1
ts2phc[36.972]: /dev/ptp3 offset         -8 s2 freq   +4442
ts2phc[37.480]: /dev/ptp3 SKIP extts index 0 at 32.449795896 src 32.513320070
ts2phc[37.984]: adding tstamp 32.949791248 to clock /dev/ptp3
ts2phc[37.984]: adding tstamp 33.000000000 to clock /dev/ptp1
ts2phc[37.984]: /dev/ptp3 offset          0 s2 freq   +4448

Fix that by taking the following measures:
- Schedule the poll from a timer. Because we are really scheduling the
  timer periodically, the extts events delivered to user space are
  periodic too, and don't suffer from the "shift-to-the-right" effect.
- Increase the poll period to 6 times a second. This imposes a smaller
  upper bound to the shift that can occur to the delivery time of extts
  events, and makes user space (ts2phc) to always interpret correctly
  which events should be skipped and which shouldn't.
- Move the SPI readout itself to the main PTP kernel thread, instead of
  the generic workqueue. This is because the timer runs in atomic
  context, but is also better than before, because if needed, we can
  chrt & taskset this kernel thread, to ensure it gets enough priority
  under load.

After this patch, one can notice that the wander is greatly reduced, and
that the latencies of one extts poll are not propagated to the next. The
'src' timestamp that is skipped is never larger than .65 seconds (which
means .15 seconds larger than the time at which the real event occurred
at, and .10 seconds smaller than the .75 upper threshold for ignoring
the falling edge):

ts2phc[40.076]: adding tstamp 34.949261296 to clock /dev/ptp3
ts2phc[40.076]: adding tstamp 35.000000000 to clock /dev/ptp1
ts2phc[40.076]: /dev/ptp3 offset         48 s2 freq   +4631
ts2phc[40.568]: /dev/ptp3 SKIP extts index 0 at 35.449256496 src 35.595791078
ts2phc[41.064]: adding tstamp 35.949251744 to clock /dev/ptp3
ts2phc[41.064]: adding tstamp 36.000000000 to clock /dev/ptp1
ts2phc[41.064]: /dev/ptp3 offset       -224 s2 freq   +4374
ts2phc[41.552]: /dev/ptp3 SKIP extts index 0 at 36.449247088 src 36.579825574
ts2phc[42.044]: adding tstamp 36.949242456 to clock /dev/ptp3
ts2phc[42.044]: adding tstamp 37.000000000 to clock /dev/ptp1
ts2phc[42.044]: /dev/ptp3 offset       -240 s2 freq   +4290
ts2phc[42.536]: /dev/ptp3 SKIP extts index 0 at 37.449237848 src 37.563828774
ts2phc[43.028]: adding tstamp 37.949233264 to clock /dev/ptp3
ts2phc[43.028]: adding tstamp 38.000000000 to clock /dev/ptp1
ts2phc[43.028]: /dev/ptp3 offset       -144 s2 freq   +4314
ts2phc[43.520]: /dev/ptp3 SKIP extts index 0 at 38.449228656 src 38.547823238
ts2phc[44.012]: adding tstamp 38.949224048 to clock /dev/ptp3
ts2phc[44.012]: adding tstamp 39.000000000 to clock /dev/ptp1
ts2phc[44.012]: /dev/ptp3 offset        -80 s2 freq   +4335
ts2phc[44.508]: /dev/ptp3 SKIP extts index 0 at 39.449219432 src 39.535846118
ts2phc[44.996]: adding tstamp 39.949214816 to clock /dev/ptp3
ts2phc[44.996]: adding tstamp 40.000000000 to clock /dev/ptp1
ts2phc[44.996]: /dev/ptp3 offset        -32 s2 freq   +4359
ts2phc[45.488]: /dev/ptp3 SKIP extts index 0 at 40.449210192 src 40.515824678
ts2phc[45.980]: adding tstamp 40.949205568 to clock /dev/ptp3
ts2phc[45.980]: adding tstamp 41.000000000 to clock /dev/ptp1
ts2phc[45.980]: /dev/ptp3 offset          8 s2 freq   +4390
ts2phc[46.636]: /dev/ptp3 SKIP extts index 0 at 41.449200928 src 41.664176902
ts2phc[47.132]: adding tstamp 41.949196288 to clock /dev/ptp3
ts2phc[47.132]: adding tstamp 42.000000000 to clock /dev/ptp1
ts2phc[47.132]: /dev/ptp3 offset          0 s2 freq   +4384
ts2phc[47.620]: /dev/ptp3 SKIP extts index 0 at 42.449191656 src 42.648117190
ts2phc[48.112]: adding tstamp 42.949187016 to clock /dev/ptp3
ts2phc[48.112]: adding tstamp 43.000000000 to clock /dev/ptp1
ts2phc[48.112]: /dev/ptp3 offset          0 s2 freq   +4384
ts2phc[48.604]: /dev/ptp3 SKIP extts index 0 at 43.449182384 src 43.632112582
ts2phc[49.100]: adding tstamp 43.949177736 to clock /dev/ptp3
ts2phc[49.100]: adding tstamp 44.000000000 to clock /dev/ptp1
ts2phc[49.100]: /dev/ptp3 offset         -8 s2 freq   +4376
ts2phc[49.588]: /dev/ptp3 SKIP extts index 0 at 44.449173096 src 44.616136774
ts2phc[50.080]: adding tstamp 44.949168464 to clock /dev/ptp3
ts2phc[50.080]: adding tstamp 45.000000000 to clock /dev/ptp1
ts2phc[50.080]: /dev/ptp3 offset          8 s2 freq   +4390
ts2phc[50.572]: /dev/ptp3 SKIP extts index 0 at 45.449163816 src 45.600134662
ts2phc[51.064]: adding tstamp 45.949159160 to clock /dev/ptp3
ts2phc[51.064]: adding tstamp 46.000000000 to clock /dev/ptp1
ts2phc[51.064]: /dev/ptp3 offset         -8 s2 freq   +4376
ts2phc[51.556]: /dev/ptp3 SKIP extts index 0 at 46.449154528 src 46.584588550
ts2phc[52.048]: adding tstamp 46.949149896 to clock /dev/ptp3
ts2phc[52.048]: adding tstamp 47.000000000 to clock /dev/ptp1
ts2phc[52.048]: /dev/ptp3 offset          0 s2 freq   +4382
ts2phc[52.540]: /dev/ptp3 SKIP extts index 0 at 47.449145256 src 47.568132198
ts2phc[53.032]: adding tstamp 47.949140616 to clock /dev/ptp3
ts2phc[53.032]: adding tstamp 48.000000000 to clock /dev/ptp1
ts2phc[53.032]: /dev/ptp3 offset          0 s2 freq   +4382
ts2phc[53.524]: /dev/ptp3 SKIP extts index 0 at 48.449135968 src 48.552121446
ts2phc[54.016]: adding tstamp 48.949131320 to clock /dev/ptp3
ts2phc[54.016]: adding tstamp 49.000000000 to clock /dev/ptp1
ts2phc[54.016]: /dev/ptp3 offset          0 s2 freq   +4382
ts2phc[54.512]: /dev/ptp3 SKIP extts index 0 at 49.449126680 src 49.540147014
ts2phc[55.000]: adding tstamp 49.949122040 to clock /dev/ptp3
ts2phc[55.000]: adding tstamp 50.000000000 to clock /dev/ptp1
ts2phc[55.000]: /dev/ptp3 offset          0 s2 freq   +4382
ts2phc[55.492]: /dev/ptp3 SKIP extts index 0 at 50.449117400 src 50.520119078
ts2phc[55.988]: adding tstamp 50.949112768 to clock /dev/ptp3
ts2phc[55.988]: adding tstamp 51.000000000 to clock /dev/ptp1
ts2phc[55.988]: /dev/ptp3 offset          8 s2 freq   +4390
ts2phc[56.476]: /dev/ptp3 SKIP extts index 0 at 51.449108120 src 51.504175910
ts2phc[57.132]: adding tstamp 51.949103480 to clock /dev/ptp3
ts2phc[57.132]: adding tstamp 52.000000000 to clock /dev/ptp1
ts2phc[57.132]: /dev/ptp3 offset          0 s2 freq   +4384
ts2phc[57.624]: /dev/ptp3 SKIP extts index 0 at 52.449098840 src 52.651833574
ts2phc[58.116]: adding tstamp 52.949094200 to clock /dev/ptp3
ts2phc[58.116]: adding tstamp 53.000000000 to clock /dev/ptp1
ts2phc[58.116]: /dev/ptp3 offset          8 s2 freq   +4392
ts2phc[58.612]: /dev/ptp3 SKIP extts index 0 at 53.449089560 src 53.639826918
ts2phc[59.100]: adding tstamp 53.949084920 to clock /dev/ptp3
ts2phc[59.100]: adding tstamp 54.000000000 to clock /dev/ptp1
ts2phc[59.100]: /dev/ptp3 offset          8 s2 freq   +4394
ts2phc[59.592]: /dev/ptp3 SKIP extts index 0 at 54.449080272 src 54.619842278
ts2phc[60.084]: adding tstamp 54.949075624 to clock /dev/ptp3
ts2phc[60.084]: adding tstamp 55.000000000 to clock /dev/ptp1
ts2phc[60.084]: /dev/ptp3 offset          8 s2 freq   +4397
ts2phc[60.576]: /dev/ptp3 SKIP extts index 0 at 55.449070968 src 55.603885542
ts2phc[61.068]: adding tstamp 55.949066312 to clock /dev/ptp3
ts2phc[61.068]: adding tstamp 56.000000000 to clock /dev/ptp1
ts2phc[61.068]: /dev/ptp3 offset          0 s2 freq   +4391
ts2phc[61.560]: /dev/ptp3 SKIP extts index 0 at 56.449061680 src 56.587885798
ts2phc[62.052]: adding tstamp 56.949057032 to clock /dev/ptp3
ts2phc[62.052]: adding tstamp 57.000000000 to clock /dev/ptp1
ts2phc[62.052]: /dev/ptp3 offset         -8 s2 freq   +4383

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomptcp: fix bogus sendmsg() return code under pressure
Paolo Abeni [Mon, 3 Aug 2020 16:40:39 +0000 (18:40 +0200)]
mptcp: fix bogus sendmsg() return code under pressure

In case of memory pressure, mptcp_sendmsg() may call
sk_stream_wait_memory() after succesfully xmitting some
bytes. If the latter fails we currently return to the
user-space the error code, ignoring the succeful xmit.

Address the issue always checking for the xmitted bytes
before mptcp_sendmsg() completes.

Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-Add-support-for-buffer-drop-traps'
David S. Miller [Tue, 4 Aug 2020 01:06:47 +0000 (18:06 -0700)]
Merge branch 'mlxsw-Add-support-for-buffer-drop-traps'

Ido Schimmel says:

====================
mlxsw: Add support for buffer drop traps

Petr says:

A recent patch set added the ability to mirror buffer related drops
(e.g., early drops) through a netdev. This patch set adds the ability to
trap such packets to the local CPU for analysis.

The trapping towards the CPU is configured by using tc-trap action
instead of tc-mirred as was done when the packets were mirrored through
a netdev. A future patch set will also add the ability to sample the
dropped packets using tc-sample action.

The buffer related drop traps are added to devlink, which means that the
dropped packets can be reported to user space via the kernel's
drop_monitor module.

Patch set overview:

Patch #1 adds the early_drop trap to devlink

Patch #2 adds extack to a few devlink operations to facilitate better
error reporting to user space. This is necessary - among other things -
because the action of buffer drop traps cannot be changed in mlxsw

Patch #3 performs a small refactoring in mlxsw, patch #4 fixes a bug that
this patchset would trigger.

Patches #5-#6 add the infrastructure required to support different traps
/ trap groups in mlxsw per-ASIC. This is required because buffer drop
traps are not supported by Spectrum-1

Patch #7 extends mlxsw to register the early_drop trap

Patch #8 adds the offload logic for the "trap" action at a qevent block.

Patch #9 adds a mlxsw-specific selftest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: RED: Test offload of trapping on RED qevents
Petr Machata [Mon, 3 Aug 2020 16:11:41 +0000 (19:11 +0300)]
selftests: mlxsw: RED: Test offload of trapping on RED qevents

Add a selftest for RED early_drop and mark qevents when a trap action is
attached at the associated block.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_qdisc: Offload action trap for qevents
Petr Machata [Mon, 3 Aug 2020 16:11:40 +0000 (19:11 +0300)]
mlxsw: spectrum_qdisc: Offload action trap for qevents

When offloading action trap on a qevent, pass to_dev of NULL to the SPAN
module to trigger the mirror to the CPU port. Query the buffer drops
policer and use it for policing of the trapped traffic.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_trap: Add early_drop trap
Ido Schimmel [Mon, 3 Aug 2020 16:11:39 +0000 (19:11 +0300)]
mlxsw: spectrum_trap: Add early_drop trap

As previously explained, packets that are dropped due to buffer related
reasons (e.g., tail drop, early drop) can be mirrored to the CPU port.
These packets are then trapped with one of the "mirror session" traps
and their CQE includes the reason for which the packet was mirrored.

Register with devlink a new trap, early_drop, and initialize the
corresponding Rx listener with the appropriate mirror reason. Return an
error in case user tries to change the traps' action, as this is not
supported.

Since Spectrum-1 does not support these traps, the above is only done
for Spectrum-2 onwards.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_trap: Allow for per-ASIC traps initialization
Ido Schimmel [Mon, 3 Aug 2020 16:11:38 +0000 (19:11 +0300)]
mlxsw: spectrum_trap: Allow for per-ASIC traps initialization

Subsequent patches will need to register different traps for Spectrum-1
and Spectrum-2 onwards.

Enable that by invoking a per-ASIC operation during traps
initialization.

Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_trap: Allow for per-ASIC trap groups initialization
Ido Schimmel [Mon, 3 Aug 2020 16:11:37 +0000 (19:11 +0300)]
mlxsw: spectrum_trap: Allow for per-ASIC trap groups initialization

Subsequent patches will need to register different trap groups for
Spectrum-1 and Spectrum-2 onwards.

Enable that by invoking a per-ASIC operation during trap groups
initialization.

Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_span: On policer_id_base_ref_count, use dec_and_test
Petr Machata [Mon, 3 Aug 2020 16:11:36 +0000 (19:11 +0300)]
mlxsw: spectrum_span: On policer_id_base_ref_count, use dec_and_test

When unsetting policer base, the SPAN code currently uses refcount_dec().
However that function splats when the counter reaches zero, because
reaching zero without actually testing is in general indicative of a
missing cleanup. There is no cleanup to be done here, but nonetheless, use
refcount_dec_and_test() as required.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_trap: Use 'size_t' for array sizes
Ido Schimmel [Mon, 3 Aug 2020 16:11:35 +0000 (19:11 +0300)]
mlxsw: spectrum_trap: Use 'size_t' for array sizes

Use 'size_t' instead of 'u64' for array sizes, as this this is correct
type to use for expressions involving sizeof().

Suggested-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Pass extack when setting trap's action and group's parameters
Ido Schimmel [Mon, 3 Aug 2020 16:11:34 +0000 (19:11 +0300)]
devlink: Pass extack when setting trap's action and group's parameters

A later patch will refuse to set the action of certain traps in mlxsw
and also to change the policer binding of certain groups. Pass extack so
that failure could be communicated clearly to user space.

Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add early_drop trap
Amit Cohen [Mon, 3 Aug 2020 16:11:33 +0000 (19:11 +0300)]
devlink: Add early_drop trap

Add the packet trap that can report packets that were ECN marked due to RED
AQM.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofib: Fix undef compile warning
YueHaibing [Mon, 3 Aug 2020 13:19:48 +0000 (21:19 +0800)]
fib: Fix undef compile warning

net/core/fib_rules.c:26:7: warning: "CONFIG_IP_MULTIPLE_TABLES" is not defined, evaluates to 0 [-Wundef]
 #elif CONFIG_IP_MULTIPLE_TABLES
       ^~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 8b66a6fd34f5 ("fib: fix another fib_rules_ops indirect call wrapper problem")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-By: Brian Vazquez <brianvv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomptcp: use mptcp_for_each_subflow in mptcp_stream_accept
Geliang Tang [Mon, 3 Aug 2020 13:00:44 +0000 (21:00 +0800)]
mptcp: use mptcp_for_each_subflow in mptcp_stream_accept

Use mptcp_for_each_subflow in mptcp_stream_accept instead of
open-coding.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mac80211-next-for-davem-2020-08-03' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Tue, 4 Aug 2020 01:00:22 +0000 (18:00 -0700)]
Merge tag 'mac80211-next-for-davem-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
A few more changes, notably:
 * handle new SAE (WPA3 authentication) status codes in the correct way
 * fix a while that should be an if instead, avoiding infinite loops
 * handle beacon filtering changing better
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: fix failed to suspend if phy based WOL is enabled
Jisheng Zhang [Mon, 3 Aug 2020 08:56:47 +0000 (16:56 +0800)]
net: stmmac: fix failed to suspend if phy based WOL is enabled

With the latest net-next tree, if test suspend/resume after enabling
WOL, we get error as below:

[  487.086365] dpm_run_callback(): mdio_bus_suspend+0x0/0x30 returns -16
[  487.086375] PM: Device stmmac-0:00 failed to suspend: error -16

-16 means -EBUSY, this is because I didn't enable wakeup of the correct
device when implementing phy based WOL feature. To be honest, I caught
the issue when implementing phy based WOL and then fix it locally, but
forgot to amend the phy based wol patch. Today, I found the issue by
testing net-next tree.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoseg6_iptunnel: Refactor seg6_lwt_headroom out of uapi header
Ioana-Ruxandra Stăncioi [Mon, 3 Aug 2020 07:33:33 +0000 (07:33 +0000)]
seg6_iptunnel: Refactor seg6_lwt_headroom out of uapi header

Refactor the function seg6_lwt_headroom out of the seg6_iptunnel.h uapi
header, because it is only used in seg6_iptunnel.c. Moreover, it is only
used in the kernel code, as indicated by the "#ifdef __KERNEL__".

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Ioana-Ruxandra Stăncioi <stancioi@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: apply a floor of 1 for RTT samples from TCP timestamps
Jianfeng Wang [Thu, 30 Jul 2020 23:49:16 +0000 (23:49 +0000)]
tcp: apply a floor of 1 for RTT samples from TCP timestamps

For retransmitted packets, TCP needs to resort to using TCP timestamps
for computing RTT samples. In the common case where the data and ACK
fall in the same 1-millisecond interval, TCP senders with millisecond-
granularity TCP timestamps compute a ca_rtt_us of 0. This ca_rtt_us
of 0 propagates to rs->rtt_us.

This value of 0 can cause performance problems for congestion control
modules. For example, in BBR, the zero min_rtt sample can bring the
min_rtt and BDP estimate down to 0, reduce snd_cwnd and result in a
low throughput. It would be hard to mitigate this with filtering in
the congestion control module, because the proper floor to apply would
depend on the method of RTT sampling (using timestamp options or
internally-saved transmission timestamps).

This fix applies a floor of 1 for the RTT sample delta from TCP
timestamps, so that seq_rtt_us, ca_rtt_us, and rs->rtt_us will be at
least 1 * (USEC_PER_SEC / TCP_TS_HZ).

Note that the receiver RTT computation in tcp_rcv_rtt_measure() and
min_rtt computation in tcp_update_rtt_min() both already apply a floor
of 1 timestamp tick, so this commit makes the code more consistent in
avoiding this edge case of a value of 0.

Signed-off-by: Jianfeng Wang <jfwang@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Kevin Yang <yyd@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: Use is_broadcast_ether_addr() instead of memcmp()
Huang Guobin [Mon, 3 Aug 2020 02:00:55 +0000 (22:00 -0400)]
tipc: Use is_broadcast_ether_addr() instead of memcmp()

Using is_broadcast_ether_addr() instead of directly use
memcmp() to determine if the ethernet address is broadcast
address.

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Huang Guobin <huangguobin4@huawei.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'DPAA-FMan-driver-fixes'
David S. Miller [Mon, 3 Aug 2020 23:20:15 +0000 (16:20 -0700)]
Merge branch 'DPAA-FMan-driver-fixes'

Florinel Iordache says:

====================
DPAA FMan driver fixes

Here are several fixes for the DPAA FMan driver.

v2 changes:
* corrected patch 4 by removing the line added by mistake
* used longer fixes tags with the first 12 characters of the SHA-1 ID

v3 changes:
* remove the empty line inserted after fixes tag
====================

Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofsl/fman: fix eth hash table allocation
Florinel Iordache [Mon, 3 Aug 2020 07:07:34 +0000 (10:07 +0300)]
fsl/fman: fix eth hash table allocation

Fix memory allocation for ethernet address hash table.
The code was wrongly allocating an array for eth hash table which
is incorrect because this is the main structure for eth hash table
(struct eth_hash_t) that contains inside a number of elements.

Fixes: 57ba4c9b56d8 ("fsl/fman: Add FMan MAC support")
Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofsl/fman: check dereferencing null pointer
Florinel Iordache [Mon, 3 Aug 2020 07:07:33 +0000 (10:07 +0300)]
fsl/fman: check dereferencing null pointer

Add a safe check to avoid dereferencing null pointer

Fixes: 57ba4c9b56d8 ("fsl/fman: Add FMan MAC support")
Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofsl/fman: fix unreachable code
Florinel Iordache [Mon, 3 Aug 2020 07:07:32 +0000 (10:07 +0300)]
fsl/fman: fix unreachable code

The parameter 'priority' is incorrectly forced to zero which ultimately
induces logically dead code in the subsequent lines.

Fixes: 57ba4c9b56d8 ("fsl/fman: Add FMan MAC support")
Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofsl/fman: fix dereference null return value
Florinel Iordache [Mon, 3 Aug 2020 07:07:31 +0000 (10:07 +0300)]
fsl/fman: fix dereference null return value

Check before using returned value to avoid dereferencing null pointer.

Fixes: 18a6c85fcc78 ("fsl/fman: Add FMan Port Support")
Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofsl/fman: use 32-bit unsigned integer
Florinel Iordache [Mon, 3 Aug 2020 07:07:30 +0000 (10:07 +0300)]
fsl/fman: use 32-bit unsigned integer

Potentially overflowing expression (ts_freq << 16 and intgr << 16)
declared as type u32 (32-bit unsigned) is evaluated using 32-bit
arithmetic and then used in a context that expects an expression of
type u64 (64-bit unsigned) which ultimately is used as 16-bit
unsigned by typecasting to u16. Fixed by using an unsigned 32-bit
integer since the value is truncated anyway in the end.

Fixes: 414fd46e7762 ("fsl/fman: Add FMan support")
Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Mon, 3 Aug 2020 23:03:18 +0000 (16:03 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) UAF in chain binding support from previous batch, from Dan Carpenter.

2) Queue up delayed work to expire connections with no destination,
   from Andrew Sy Kim.

3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva.

4) Replace HTTP links with HTTPS, from Alexander A. Klimov.

5) Remove superfluous null header checks in ip6tables, from
   Gaurav Singh.

6) Add extended netlink error reporting for expression.

7) Report EEXIST on overlapping chain, set elements and flowtable
   devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: spider_net: Remove a useless memset
Christophe JAILLET [Sun, 2 Aug 2020 13:53:48 +0000 (15:53 +0200)]
net: spider_net: Remove a useless memset

Avoid a memset after a call to 'dma_alloc_coherent()'.
This is useless since
commit 518a2f1925c3 ("dma-mapping: zero memory returned from dma_alloc_*")

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: spider_net: Fix the size used in a 'dma_free_coherent()' call
Christophe JAILLET [Sun, 2 Aug 2020 13:53:33 +0000 (15:53 +0200)]
net: spider_net: Fix the size used in a 'dma_free_coherent()' call

Update the size used in 'dma_free_coherent()' in order to match the one
used in the corresponding 'dma_alloc_coherent()', in
'spider_net_init_chain()'.

Fixes: d4ed8f8d1fb7 ("Spidernet DMA coalescing")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sgi: ioc3-eth: Fix the size used in some 'dma_free_coherent()' calls
Christophe JAILLET [Sun, 2 Aug 2020 13:52:04 +0000 (15:52 +0200)]
net: sgi: ioc3-eth: Fix the size used in some 'dma_free_coherent()' calls

Update the size used in 'dma_free_coherent()' in order to match the one
used in the corresponding 'dma_alloc_coherent()'.

Fixes: 369a782af0f1 ("net: sgi: ioc3-eth: ensure tx ring is 16k aligned.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoliquidio: Fix wrong return value in cn23xx_get_pf_num()
Tianjia Zhang [Sun, 2 Aug 2020 11:15:44 +0000 (19:15 +0800)]
liquidio: Fix wrong return value in cn23xx_get_pf_num()

On an error exit path, a negative error code should be returned
instead of a positive return value.

Fixes: 0c45d7fe12c7e ("liquidio: fix use of pf in pass-through mode in a virtual machine")
Cc: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/enetc: Fix wrong return value in enetc_psfp_parse_clsflower()
Tianjia Zhang [Sun, 2 Aug 2020 11:15:38 +0000 (19:15 +0800)]
net/enetc: Fix wrong return value in enetc_psfp_parse_clsflower()

In the case of invalid rule, a positive value EINVAL is returned here.
I think this is a typo error. It is necessary to return an error value.

Cc: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: aquantia: Fix wrong return value
Tianjia Zhang [Sun, 2 Aug 2020 11:15:37 +0000 (19:15 +0800)]
net: ethernet: aquantia: Fix wrong return value

In function hw_atl_a0_hw_multicast_list_set(), when an invalid
request is encountered, a negative error code should be returned.

Fixes: bab6de8fd180b ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions")
Cc: David VomLehn <vomlehn@texas.net>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoatm: idt77252: avoid accessing the data mapped to streaming DMA
Jia-Ju Bai [Sun, 2 Aug 2020 09:33:40 +0000 (17:33 +0800)]
atm: idt77252: avoid accessing the data mapped to streaming DMA

In queue_skb(), skb->data is mapped to streaming DMA on line 850:
  dma_map_single(..., skb->data, ...);

Then skb->data is accessed on lines 862 and 863:
  tbd->word_4 = (skb->data[0] << 24) | (skb->data[1] << 16) |
           (skb->data[2] <<  8) | (skb->data[3] <<  0);
and on lines 893 and 894:
  tbd->word_4 = (skb->data[0] << 24) | (skb->data[1] << 16) |
           (skb->data[2] <<  8) | (skb->data[3] <<  0);

These accesses may cause data inconsistency between CPU cache and
hardware.

To fix this problem, the calculation result of skb->data is stored in a
local variable before DMA mapping, and then the driver accesses this
local variable instead of skb->data.

Signed-off-by: Jia-Ju Bai <baijiaju@tsinghua.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoatm: eni: avoid accessing the data mapped to streaming DMA
Jia-Ju Bai [Sun, 2 Aug 2020 09:16:11 +0000 (17:16 +0800)]
atm: eni: avoid accessing the data mapped to streaming DMA

In do_tx(), skb->data is mapped to streaming DMA on line 1111:
  paddr = dma_map_single(...,skb->data,DMA_TO_DEVICE);

Then skb->data is accessed on line 1153:
  (skb->data[3] & 0xf)

This access may cause data inconsistency between CPU cache and hardware.

To fix this problem, skb->data[3] is assigned to a local variable before
DMA mapping, and then the driver accesses this local variable instead of
skb->data[3].

Signed-off-by: Jia-Ju Bai <baijiaju@tsinghua.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: mdio-mvusb: select MDIO_DEVRES in Kconfig
Bartosz Golaszewski [Sun, 2 Aug 2020 07:49:53 +0000 (09:49 +0200)]
net: phy: mdio-mvusb: select MDIO_DEVRES in Kconfig

PHYLIB is not selected by the mvusb driver but it uses mdio devres
helpers. Explicitly select MDIO_DEVRES in this driver's Kconfig entry.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1814cff26739 ("net: phy: add a Kconfig option for mdio_devres")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoappletalk: Fix atalk_proc_init() return path
Vincent Duvert [Sun, 2 Aug 2020 05:06:51 +0000 (07:06 +0200)]
appletalk: Fix atalk_proc_init() return path

Add a missing return statement to atalk_proc_init so it doesn't return
-ENOMEM when successful.  This allows the appletalk module to load
properly.

Fixes: e2bcd8b0ce6e ("appletalk: use remove_proc_subtree to simplify procfs code")
Link: https://www.downtowndougbrown.com/2020/08/hacking-up-a-fix-for-the-broken-appletalk-kernel-module-in-linux-5-1-and-newer/
Reported-by: Christopher KOBAYASHI <chris@disavowed.jp>
Reported-by: Doug Brown <doug@downtowndougbrown.com>
Signed-off-by: Vincent Duvert <vincent.ldev@duvert.net>
[lukas: add missing tags]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v5.1+
Cc: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: qca8k: Add 802.1q VLAN support
Jonathan McDowell [Sat, 1 Aug 2020 17:06:46 +0000 (18:06 +0100)]
net: dsa: qca8k: Add 802.1q VLAN support

This adds full 802.1q VLAN support to the qca8k, allowing the use of
vlan_filtering and more complicated bridging setups than allowed by
basic port VLAN support.

Tested with a number of untagged ports with separate VLANs and then a
trunk port with all the VLANs tagged on it.

v3:
- Pull QCA8K_PORT_VID_DEF changes into separate cleanup patch
- Reverse Christmas tree notation for variable definitions
- Use untagged instead of tagged for consistency
v2:
- Return sensible errnos on failure rather than -1 (rmk)
- Style cleanups based on Florian's feedback
- Silently allow VLAN 0 as device correctly treats this as no tag

Signed-off-by: Jonathan McDowell <noodles@earth.li>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: qca8k: Add define for port VID
Jonathan McDowell [Sat, 1 Aug 2020 17:05:54 +0000 (18:05 +0100)]
net: dsa: qca8k: Add define for port VID

Rather than using a magic value of 1 when configuring the port VIDs add
a QCA8K_PORT_VID_DEF define and use that instead. Also fix up the
bitmask in the process; the top 4 bits are reserved so this wasn't a
problem, but only masking 12 bits is the correct approach.

Signed-off-by: Jonathan McDowell <noodles@earth.li>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Mon, 3 Aug 2020 22:43:59 +0000 (15:43 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2020-08-01

This series contains updates to the ice driver only.

Wei Yongjun marks power management functions with __maybe_unused.

Nick disables VLAN pruning in promiscuous mode and renames grst_delay to
grst_timeout.

Kiran modifies the check for linearization and corrects the vsi_id mask
value.

Vignesh replaces the use of flow profile locks to RSS profile locks for RSS
rule removal. Destroys flow profile lock on clearing XLT table and
clears extraction sequence entries.

Jesse adds some statistics and removes an unreported one.

Brett allows for 2 queue configuration for VFs.

Surabhi adds a check for failed allocation of an extraction sequence
table.

Tony updates the PTYPE lookup table and makes other trivial fixes.

Victor extends profile ID locks to be held until all references are
completed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Pass NULL to skb_network_protocol() when we don't care about vlan depth
Miaohe Lin [Sat, 1 Aug 2020 09:36:05 +0000 (17:36 +0800)]
net: Pass NULL to skb_network_protocol() when we don't care about vlan depth

When we don't care about vlan depth, we could pass NULL instead of the
address of a unused local variable to skb_network_protocol() as a param.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Use __skb_pagelen() directly in skb_cow_data()
Miaohe Lin [Sat, 1 Aug 2020 09:30:23 +0000 (17:30 +0800)]
net: Use __skb_pagelen() directly in skb_cow_data()

In fact, skb_pagelen() - skb_headlen() is equal to __skb_pagelen(), use it
directly to avoid unnecessary skb_headlen() call.

Also fix the CHECK note of checkpatch.pl:
    Comparison to NULL could be written "!__pskb_pull_tail"

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: qed: use eth_zero_addr() to clear mac address
Miaohe Lin [Sat, 1 Aug 2020 09:14:41 +0000 (17:14 +0800)]
net: qed: use eth_zero_addr() to clear mac address

Use eth_zero_addr() to clear mac address instead of memset().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: qede: use eth_zero_addr() to clear mac address
Miaohe Lin [Sat, 1 Aug 2020 09:13:54 +0000 (17:13 +0800)]
net: qede: use eth_zero_addr() to clear mac address

Use eth_zero_addr() to clear mac address instead of memset().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: fix extracting IP addresses in TC-FLOWER rules
Rahul Lakkireddy [Fri, 31 Jul 2020 20:53:40 +0000 (02:23 +0530)]
cxgb4: fix extracting IP addresses in TC-FLOWER rules

commit c8729cac2a11 ("cxgb4: add ethtool n-tuple filter insertion")
has removed checking control key for determining IP address types
for TC-FLOWER rules, which causes all the rules being inserted to
hardware to become IPv6 rule type always. So, add back the check
to select the correct IP address type to extract and hence fix the
correct rule type being inserted to hardware.

Also, ethtool_rx_flow_key doesn't have any control key and instead
directly sets the IPv4/IPv6 address keys. So, explicitly set the
IP address type for ethtool n-tuple filters to reuse the same code.

Fixes: c8729cac2a11 ("cxgb4: add ethtool n-tuple filter insertion")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: fix check for running offline ethtool selftest
Rahul Lakkireddy [Fri, 31 Jul 2020 20:50:52 +0000 (02:20 +0530)]
cxgb4: fix check for running offline ethtool selftest

The flag indicating the selftest to run is a bitmask. So, fix the
check. Also, the selftests will fail if adapter initialization has
not been completed yet. So, add appropriate check and bail sooner.

Fixes: 7235ffae3d2c ("cxgb4: add loopback ethtool self-test")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'ionic-txrx-updates'
David S. Miller [Mon, 3 Aug 2020 22:32:02 +0000 (15:32 -0700)]
Merge branch 'ionic-txrx-updates'

Shannon Nelson says:

====================
ionic txrx updates

These are a few patches to do some cleanup in the packet
handling and give us more flexibility in tuning performance
by allowing us to put Tx handling on separate interrupts
when it makes sense for particular traffic loads.

v3: simplified queue count change logging, removed unnecessary
    check for no count change
v2: dropped the original patch 2 for ringsize change
    changed the separated tx/rx interrupts to use ethtool -L
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoionic: separate interrupt for Tx and Rx
Shannon Nelson [Fri, 31 Jul 2020 20:15:36 +0000 (13:15 -0700)]
ionic: separate interrupt for Tx and Rx

Add the capability to split the Tx queues onto their own
interrupts with their own napi contexts.  This gives the
opportunity for more direct control of Tx interrupt
handling, such as CPU affinity and interrupt coalescing,
useful for some traffic loads.

v2: use ethtool -L, not a vendor specific priv-flag
v3: simplify logging, drop unnecessary "no-change" tests

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoionic: tx separate servicing
Shannon Nelson [Fri, 31 Jul 2020 20:15:35 +0000 (13:15 -0700)]
ionic: tx separate servicing

We give the tx clean path its own budget and service routine in
order to give a little more leeway to be more aggressive, and
in preparation for coming changes.  We've found this gives us
a little better performance in some packet processing scenarios
without hurting other scenarios.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoionic: use fewer firmware doorbells on rx fill
Shannon Nelson [Fri, 31 Jul 2020 20:15:34 +0000 (13:15 -0700)]
ionic: use fewer firmware doorbells on rx fill

We really don't need to hit the Rx queue doorbell so many times,
we can wait to the end and cause a little less thrash.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: gre: recompute gre csum for sctp over gre tunnels
Lorenzo Bianconi [Fri, 31 Jul 2020 18:12:05 +0000 (20:12 +0200)]
net: gre: recompute gre csum for sctp over gre tunnels

The GRE tunnel can be used to transport traffic that does not rely on a
Internet checksum (e.g. SCTP). The issue can be triggered creating a GRE
or GRETAP tunnel and transmitting SCTP traffic ontop of it where CRC
offload has been disabled. In order to fix the issue we need to
recompute the GRE csum in gre_gso_segment() not relying on the inner
checksum.
The issue is still present when we have the CRC offload enabled.
In this case we need to disable the CRC offload if we require GRE
checksum since otherwise skb_checksum() will report a wrong value.

Fixes: 90017accff61 ("sctp: Add GSO support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bridge: clear bridge's private skb space on xmit
Nikolay Aleksandrov [Fri, 31 Jul 2020 16:26:16 +0000 (19:26 +0300)]
net: bridge: clear bridge's private skb space on xmit

We need to clear all of the bridge private skb variables as they can be
stale due to the packet being recirculated through the stack and then
transmitted through the bridge device. Similar memset is already done on
bridge's input. We've seen cases where proxyarp_replied was 1 on routed
multicast packets transmitted through the bridge to ports with neigh
suppress which were getting dropped. Same thing can in theory happen with
the port isolation bit as well.

Fixes: 821f1b21cabb ("bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6/addrconf: use a boolean to choose between UNREGISTER/DOWN
Florent Fourcot [Fri, 31 Jul 2020 13:32:07 +0000 (15:32 +0200)]
ipv6/addrconf: use a boolean to choose between UNREGISTER/DOWN

"how" was used as a boolean. Change the type to bool, and improve
variable name

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6/addrconf: call addrconf_ifdown with consistent values
Florent Fourcot [Fri, 31 Jul 2020 13:32:06 +0000 (15:32 +0200)]
ipv6/addrconf: call addrconf_ifdown with consistent values

Second parameter of addrconf_ifdown "how" is used as a boolean
internally. It does not make sense to call it with something different
of 0 or 1.

This value is set to 2 in all git history.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-openvswitch-masks-cache-enhancements'
David S. Miller [Mon, 3 Aug 2020 22:17:48 +0000 (15:17 -0700)]
Merge branch 'net-openvswitch-masks-cache-enhancements'

Eelco Chaudron says:

====================
net: openvswitch: masks cache enhancements

This patchset adds two enhancements to the Open vSwitch masks cache.

Changes in v4 [patch 2/2 only]:
 - Remove null check before calling free_percpu()
 - Make ovs_dp_change() return appropriate error codes

Changes in v3 [patch 2/2 only]:
 - Use is_power_of_2() function
 - Use array_size() function
 - Fix remaining sparse errors

Changes in v2 [patch 2/2 only]:
 - Fix sparse warnings
 - Fix netlink policy items reported by Florian Westphal
====================

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: openvswitch: make masks cache size configurable
Eelco Chaudron [Fri, 31 Jul 2020 12:21:34 +0000 (14:21 +0200)]
net: openvswitch: make masks cache size configurable

This patch makes the masks cache size configurable, or with
a size of 0, disable it.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: openvswitch: add masks cache hit counter
Eelco Chaudron [Fri, 31 Jul 2020 12:20:56 +0000 (14:20 +0200)]
net: openvswitch: add masks cache hit counter

Add a counter that counts the number of masks cache hits, and
export it through the megaflow netlink statistics.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: fix memory leak in mvpp2_rx
Lorenzo Bianconi [Fri, 31 Jul 2020 08:38:32 +0000 (10:38 +0200)]
net: mvpp2: fix memory leak in mvpp2_rx

Release skb memory in mvpp2_rx() if mvpp2_rx_refill routine fails

Fixes: b5015854674b ("net: mvpp2: fix refilling BM pools in RX path")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoethtool: ethnl_set_linkmodes: remove redundant null check
Gaurav Singh [Fri, 31 Jul 2020 04:58:44 +0000 (00:58 -0400)]
ethtool: ethnl_set_linkmodes: remove redundant null check

info cannot be NULL here since its being accessed earlier
in the function: nlmsg_parse(info->nlhdr...). Remove this
redundant NULL check.

Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoopenvswitch: Prevent kernel-infoleak in ovs_ct_put_key()
Peilin Ye [Fri, 31 Jul 2020 04:48:38 +0000 (00:48 -0400)]
openvswitch: Prevent kernel-infoleak in ovs_ct_put_key()

ovs_ct_put_key() is potentially copying uninitialized kernel stack memory
into socket buffers, since the compiler may leave a 3-byte hole at the end
of `struct ovs_key_ct_tuple_ipv4` and `struct ovs_key_ct_tuple_ipv6`. Fix
it by initializing `orig` with memset().

Fixes: 9dd7f8907c37 ("openvswitch: Add original direction conntrack tuple to sw_flow_key.")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/sched: act_ct: fix miss set mru for ovs after defrag in act_ct
wenxu [Fri, 31 Jul 2020 02:45:01 +0000 (10:45 +0800)]
net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct

When openvswitch conntrack offload with act_ct action. Fragment packets
defrag in the ingress tc act_ct action and miss the next chain. Then the
packet pass to the openvswitch datapath without the mru. The over
mtu packet will be dropped in output action in openvswitch for over mtu.

"kernel: net2: dropped over-mtu packet: 1528 > 1500"

This patch add mru in the tc_skb_ext for adefrag and miss next chain
situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru
to the qdisc_skb_cb when the packet defrag. And When the chain miss,
The mru is set to tc_skb_ext which can be got by ovs datapath.

Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Improve-MDIO-Ethernet-PHY-reset'
David S. Miller [Mon, 3 Aug 2020 22:01:02 +0000 (15:01 -0700)]
Merge branch 'Improve-MDIO-Ethernet-PHY-reset'

Bruno Thomsen says:

====================
Improve MDIO Ethernet PHY reset

This patch series is a result of trying to upstream a new device
tree for a TQMa7D based board[1][2]. Initial this DTS used some
deprecated PHY reset properties on the FEC device; NXP Ethernet
MAC also known as Freescale Fast Ethernet Controller.

When switching from FEC properties[3]:
"phy-reset-gpios"
"phy-reset-duration"
"phy-reset-post-delay"

To MDIO PHY properties[4]:
"reset-gpios"
"reset-assert-us"
"reset-deassert-us"

The result was that no Ethernet PHY device was detected on boot.

This issue could be worked around by disabling PHY type ID auto-
detection by using "ethernet-phy-id0022.1560" as compatible
string and not "ethernet-phy-ieee802.3-c22".

Upstreaming a DTS with this workaround was not accepted, so I
digged into the MDIO reset flow and found that it had a few
missing parts compared to the deprecated FEC reset function.
After some more testing and logic analyzer traces it was
revealed that the failed PHY communication was due to missing
initial device reset.

I was suggested[5] in a earlier mail thread to use MDIO bus
reset as that was performed before auto-detection, but current
device tree binding was limited to reset assert in usec.
Microchip/Micrel Ethernet PHYs recommended reset circuit[8],
figure 7-12, is a little "slow" after reset deassert as that
is left to a RC circuit with a tau of ~100ms; using a 10k PU
resistor together with a 10uF decoupling capacitor. The diode
in serie of the reset signal converts the GPIO push-pull output
into a open-drain output. So a post reset delay in the range
of 500-1000ms is needed, depending on component tolerances
and general hardware design margins.

In the first version of this patch series[6] I reused the
"reset-delay-us" property for reset deassert in usec as that
would cause 50/50% duty-cycle, but that would always apply.
The solution in this patch series is to add a new MDIO bus
property, so post reset delay is optional and configured
separately.

MDIO bus properties[7]:
"reset-delay-us"
"reset-post-delay-us" (new)

I have not marked this with "Fixes:" as no single commit is the
cause and historically this code has only supported MDIO devices
that need reset after auto-detection. The patch series also uses
a new flexible sleep helper function that was introduced in
5.8-rc1, so the driver uses the optimal sleep function depending
on value loaded from device tree.

Future work in this area could add new properties on the MDIO
device, so reset points are configurable, e.g. no reset,
before/after auto-detection or both.

[1] https://lore.kernel.org/linux-devicetree/20200629114927.17379-2-bruno.thomsen@gmail.com/
[2] https://lore.kernel.org/linux-devicetree/20200716172611.5349-2-bruno.thomsen@gmail.com/
[3] https://elixir.bootlin.com/linux/v5.7.8/source/Documentation/devicetree/bindings/net/fsl-fec.txt#L44
[4] https://elixir.bootlin.com/linux/v5.8-rc4/source/Documentation/devicetree/bindings/net/mdio.yaml#L78
[5] https://lore.kernel.org/netdev/CAOMZO5DtYDomD8FDCZDwYCSr2AwNT81Ay4==aDxXyBxtyvPiJA@mail.gmail.com/
[6] https://lore.kernel.org/netdev/20200728090203.17313-1-bruno.thomsen@gmail.com/
[7] https://elixir.bootlin.com/linux/v5.8-rc4/source/Documentation/devicetree/bindings/net/mdio.yaml#L36
[8] http://ww1.microchip.com/downloads/en/DeviceDoc/00002202C.pdf
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mdio device: use flexible sleeping in reset function
Bruno Thomsen [Thu, 30 Jul 2020 19:57:49 +0000 (21:57 +0200)]
net: mdio device: use flexible sleeping in reset function

MDIO device reset assert and deassert length was created by
usleep_range() but that does not ensure optimal handling of
all the different values from device tree properties.
By switching to the new flexible sleeping helper function,
fsleep(), the correct delay function is called depending on
delay length, e.g. udelay(), usleep_range() or msleep().

Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mdiobus: add reset-post-delay-us handling
Bruno Thomsen [Thu, 30 Jul 2020 19:57:48 +0000 (21:57 +0200)]
net: mdiobus: add reset-post-delay-us handling

Load new "reset-post-delay-us" value from MDIO properties,
and if configured to a greater then zero delay do a
flexible sleeping delay after MDIO bus reset deassert.
This allows devices to exit reset state before start
bus communication.

Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mdiobus: use flexible sleeping for reset-delay-us
Bruno Thomsen [Thu, 30 Jul 2020 19:57:47 +0000 (21:57 +0200)]
net: mdiobus: use flexible sleeping for reset-delay-us

MDIO bus reset pulse width is created by using udelay()
and that function might not be optimal depending on
device tree value. By switching to the new fsleep() helper
the correct delay function is called depending on
delay length, e.g. udelay(), usleep_range() or msleep().

Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodt-bindings: net: mdio: add reset-post-delay-us property
Bruno Thomsen [Thu, 30 Jul 2020 19:57:46 +0000 (21:57 +0200)]
dt-bindings: net: mdio: add reset-post-delay-us property

Add "reset-post-delay-us" parameter to MDIO bus properties,
so it's possible to add a delay after reset deassert.
This is optional in case external hardware slows down
release of the reset signal.

Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobpf: Allow to specify ifindex for skb in bpf_prog_test_run_skb
Dmitry Yakunin [Mon, 3 Aug 2020 09:05:45 +0000 (12:05 +0300)]
bpf: Allow to specify ifindex for skb in bpf_prog_test_run_skb

Now skb->dev is unconditionally set to the loopback device in current net
namespace. But if we want to test bpf program which contains code branch
based on ifindex condition (eg filters out localhost packets) it is useful
to allow specifying of ifindex from userspace. This patch adds such option
through ctx_in (__sk_buff) parameter.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200803090545.82046-3-zeil@yandex-team.ru
5 years agobpf: Setup socket family and addresses in bpf_prog_test_run_skb
Dmitry Yakunin [Mon, 3 Aug 2020 09:05:44 +0000 (12:05 +0300)]
bpf: Setup socket family and addresses in bpf_prog_test_run_skb

Now it's impossible to test all branches of cgroup_skb bpf program which
accesses skb->family and skb->{local,remote}_ip{4,6} fields because they
are zeroed during socket allocation. This commit fills socket family and
addresses from related fields in constructed skb.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200803090545.82046-2-zeil@yandex-team.ru
5 years agonet/mlx5: Delete extra dump stack that gives nothing
Leon Romanovsky [Sun, 19 Jul 2020 08:04:30 +0000 (11:04 +0300)]
net/mlx5: Delete extra dump stack that gives nothing

The WARN_*() macros are intended to catch impossible situations
from the SW point of view. They gave a little in case HW<->SW interface
is out-of-sync.

Such out-of-sync scenario can be due to SW errors that are not part
of this flow or because some HW errors, where dump stack won't help
either.

This specific WARN_ON() is useless because mlx5_core code is prepared
to handle such situations and will unfold everything correctly while
providing enough information to the users to understand why FS is not
working.

WARNING: CPU: 0 PID: 3222 at drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:825 connect_fts_in_prio.isra.20+0x1dd/0x260 linux/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:825
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 3222 Comm: syz-executor861 Not tainted 5.5.0-rc6+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack linux/lib/dump_stack.c:77 [inline]
 dump_stack+0x94/0xce linux/lib/dump_stack.c:118
 panic+0x234/0x56f linux/kernel/panic.c:221
 __warn+0x1cc/0x1e1 linux/kernel/panic.c:582
 report_bug+0x200/0x310 linux/lib/bug.c:195
 fixup_bug.part.11+0x32/0x80 linux/arch/x86/kernel/traps.c:174
 fixup_bug linux/arch/x86/kernel/traps.c:273 [inline]
 do_error_trap+0xd3/0x100 linux/arch/x86/kernel/traps.c:267
 do_invalid_op+0x31/0x40 linux/arch/x86/kernel/traps.c:286
 invalid_op+0x1e/0x30 linux/arch/x86/entry/entry_64.S:1027
RIP: 0010:connect_fts_in_prio.isra.20+0x1dd/0x260
linux/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:825
Code: 00 00 48 c7 c2 60 8c 31 84 48 c7 c6 00 81 31 84 48 8b 38 e8 3c a8
cb ff 41 83 fd 01 8b 04 24 0f 8e 29 ff ff ff e8 83 7b bc fe <0f> 0b 8b
04 24 e9 1a ff ff ff 89 04 24 e8 c1 20 e0 fe 8b 04 24 eb
RSP: 0018:ffffc90004bb7858 EFLAGS: 00010293
RAX: ffff88805de98e80 RBX: 0000000000000c96 RCX: ffffffff827a853d
RDX: 0000000000000000 RSI: 0000000000000000 RDI: fffff52000976efa
RBP: 0000000000000007 R08: ffffed100da060e3 R09: ffffed100da060e3
R10: 0000000000000001 R11: ffffed100da060e2 R12: dffffc0000000000
R13: 0000000000000002 R14: ffff8880683a1a10 R15: ffffed100d07bc1c
 connect_prev_fts linux/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:844 [inline]
 connect_flow_table linux/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:975 [inline]
 __mlx5_create_flow_table+0x8f8/0x1710 linux/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:1064
 mlx5_create_flow_table linux/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:1094 [inline]
 mlx5_create_auto_grouped_flow_table+0xe1/0x210 linux/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:1136
 _get_prio linux/drivers/infiniband/hw/mlx5/main.c:3286 [inline]
 get_flow_table+0x2ea/0x760 linux/drivers/infiniband/hw/mlx5/main.c:3376
 mlx5_ib_create_flow+0x331/0x11c0 linux/drivers/infiniband/hw/mlx5/main.c:3896
 ib_uverbs_ex_create_flow+0x13e8/0x1b40 linux/drivers/infiniband/core/uverbs_cmd.c:3311
 ib_uverbs_write+0xaa5/0xdf0 linux/drivers/infiniband/core/uverbs_main.c:769
 __vfs_write+0x7c/0x100 linux/fs/read_write.c:494
 vfs_write+0x168/0x4a0 linux/fs/read_write.c:558
 ksys_write+0xc8/0x200 linux/fs/read_write.c:611
 do_syscall_64+0x9c/0x390 linux/arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45a059
Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fcc17564c98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fcc17564ca0 RCX: 000000000045a059
RDX: 0000000000000030 RSI: 00000000200003c0 RDI: 0000000000000005
RBP: 0000000000000007 R08: 0000000000000002 R09: 0000000000003131
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e636c
R13: 0000000000000000 R14: 00000000006e6360 R15: 00007ffdcbdaf6a0
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 1 seconds..

Fixes: f90edfd279f3 ("net/mlx5_core: Connect flow tables")
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/mlx5: convert to new udp_tunnel infrastructure
Jakub Kicinski [Tue, 28 Jul 2020 21:47:59 +0000 (14:47 -0700)]
net/mlx5: convert to new udp_tunnel infrastructure

Allocate nic_info dynamically - n_entries is not constant.

Attach the tunnel offload info only to the uplink representor.
We expect the "main" netdev to be unregistered in switchdev
mode, and there to be only one uplink representor.

Drop the udp_tunnel_drop_rx_info() call, it was not there until
commit b3c2ed21c0bd ("net/mlx5e: Fix VXLAN configuration restore after function reload")
so the device doesn't need it, and core should handle reloads and
reset just fine.

v2:
 - don't drop the ndos on reprs, and register info on uplink repr.
v4:
 - Move netdev tunnel structure handling to en_main.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoudp_tunnel: add the ability to hard-code IANA VXLAN
Jakub Kicinski [Tue, 28 Jul 2020 21:47:58 +0000 (14:47 -0700)]
udp_tunnel: add the ability to hard-code IANA VXLAN

mlx5 has the IANA VXLAN port (4789) hard coded by the device,
instead of being added dynamically when tunnels are created.

To support this add a workaround flag to struct udp_tunnel_nic_info.
Skipping updates for the port is fairly trivial, dumping the hard
coded port via ethtool requires some code duplication. The port
is not a part of any real table, we dump it in a special table
which has no tunnel types supported and only one entry.

This is the last known workaround / hack needed to convert
all drivers to the new infra.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: DR, Change push vlan action sequence
Alex Vesker [Mon, 13 Jul 2020 11:09:04 +0000 (14:09 +0300)]
net/mlx5: DR, Change push vlan action sequence

The DR TX state machine supports the following order:
modify header, push vlan and encapsulation.
Instead fs_dr would pass:
push vlan, modify header and encapsulation.

The above caused the rule creation to fail on invalid action
sequence provided error.

Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Enable users to change VF/PF representors carrier state
Parav Pandit [Sun, 21 Jun 2020 03:57:03 +0000 (06:57 +0300)]
net/mlx5e: Enable users to change VF/PF representors carrier state

Currently PF and VF representor netdevice carrier is always controlled
by controlling the representor netdevice device state as up/down.

Representor netdevice state change undergoes one or more txq/rxq
destroy/create commands to firmware, skb and its rx buffer allocation,
health reporters creation and more.

Due to this limitation users do not have the ability to just change
the carrier of the non uplink representors without modifying the
device state.

In one use case when the eswitch physical port carrier is down/up,
user needs to update the VF link state to same as physical port
carrier.

Example of updating VF representor carrier state:
$ ip link set enp0s8f0npf0vf0 carrier off
$ ip link set enp0s8f0npf0vf0 carrier on

This enhancement results into VF link state change which is
represented by the VF representor netdevice carrier.

This enables users to modify the representor carrier without modifying
the representor netdevice state.

A simple test is run using [1] to calculate the time difference between
updating carrier vs updating device state (to update just the carrier)
with one VF to simulate 255 VFs.

Time taken to update the carrier using device up/down:
$ time ./calculate.sh dev enp0s8f0npf0vf0
real    0m30.913s
user    0m0.200s
sys     0m11.168s

Time taken to update just the carrier using carrier iproute2 command:
$ time ./calculate.sh carrier enp0s8f0npf0vf0
real    0m2.142s
user    0m0.160s
sys     0m2.021s

Test shows that its better to use carrier on/off user interface to notify
link up/down event to VF compare to device up/down interface, because
carrier user interface delivers the same event 15 times faster.

[1] https://github.com/paravmellanox/myscripts/blob/master/calculate_carrier_time.sh

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoMerge branch 'bpf-libbpf-btf-parsing'
Daniel Borkmann [Mon, 3 Aug 2020 14:39:48 +0000 (16:39 +0200)]
Merge branch 'bpf-libbpf-btf-parsing'

Andrii Nakryiko says:

====================
It's pretty common for applications to want to parse raw (binary) BTF data
from file, as opposed to parsing it from ELF sections. It's also pretty common
for tools to not care whether given file is ELF or raw BTF format. This patch
series exposes internal raw BTF parsing API and adds generic variant of BTF
parsing, which will efficiently determine the format of a given fail and will
parse BTF appropriately.

Patches #2 and #3 removes re-implementations of such APIs from bpftool and
resolve_btfids tools.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>