www.infradead.org Git - users/dwmw2/linux.git/log
8 years ago Merge branch 'tcp-readd-hp'
David S. Miller [Wed, 30 Aug 2017 18:20:09 +0000 (11:20 -0700)]
Merge branch 'tcp-readd-hp'

Florian Westphal says:

====================
tcp: re-add header prediction

Eric reported a performance regression caused by header prediction
removal.

We now call tcp_ack() much more frequently, for some workloads
this brings in enough cache line misses to become noticeable.

We could possibly still kill HP provided we find a different
way to suppress unneeded tcp_ack, but given we're late in
the cycle it seems preferable to revert.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago tcp: Revert "tcp: remove header prediction"
Florian Westphal [Wed, 30 Aug 2017 17:24:58 +0000 (19:24 +0200)]
tcp: Revert "tcp: remove header prediction"

This reverts commit 45f119bf936b1f9f546a0b139c5b56f9bb2bdc78.

Eric Dumazet says:
  We found at Google a significant regression caused by
  45f119bf936b1f9f546a0b139c5b56f9bb2bdc78 tcp: remove header prediction

  In typical RPC  (TCP_RR), when a TCP socket receives data, we now call
  tcp_ack() while we used to not call it.

  This touches enough cache lines to cause a slowdown.

So the problem does not seem to be the HP removal itself but the tcp_ack()
call.  Therefore, it might be possible to remove HP after all, provided
one finds a way to elide tcp_ack() for most cases.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago tcp: Revert "tcp: remove CA_ACK_SLOWPATH"
Florian Westphal [Wed, 30 Aug 2017 17:24:57 +0000 (19:24 +0200)]
tcp: Revert "tcp: remove CA_ACK_SLOWPATH"

This change was a followup to the header prediction removal,
so first revert this as a prerequisite to back out hp removal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago staging: irda: fix init level for irda core
Greg KH [Wed, 30 Aug 2017 11:16:49 +0000 (13:16 +0200)]
staging: irda: fix init level for irda core

When moving the IRDA code out of net/ into drivers/staging/irda/net, the
link order changes when IRDA is built into the kernel.  That causes a
kernel crash at boot time as netfilter isn't initialized yet.

To fix this, move the init call level of the irda core to
device_initcall(), so that the link order still initializes it at the
correct time.

Reported-by: kernel test robot <fengguang.wu@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: bcmgenet: Do not return from void function
Florian Fainelli [Wed, 30 Aug 2017 04:48:51 +0000 (21:48 -0700)]
net: bcmgenet: Do not return from void function

A stray return was added in the macro bcmgenet_##name##_writel where it
should not be; drop it.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 69d2ea9c7989 ("net: bcmgenet: Use correct I/O accessors")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago neigh: increase queue_len_bytes to match wmem_default
Eric Dumazet [Tue, 29 Aug 2017 22:16:01 +0000 (15:16 -0700)]
neigh: increase queue_len_bytes to match wmem_default

Florian reported UDP xmit drops that could be root-caused to a neigh
limit that is too small.

Current limit is 64 KB, meaning that even a single UDP socket would hit
it, since its default sk_sndbuf comes from net.core.wmem_default
(~212992 bytes on 64bit arches).

Once ARP/ND resolution is in progress, we should allow a few more
packets to be queued, at least for one producer.

Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
limit and either block in sendmsg() or return -EAGAIN.
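
The size mismatch can be put into numbers with a rough sketch. It assumes
a deliberately simplified model in which one socket fills its whole send
buffer while resolution is pending and ignores per-skb overhead; the
function name is hypothetical and not taken from the kernel.

```python
# Simplified model, not kernel code: byte counts come from the text above.
OLD_NEIGH_LIMIT = 64 * 1024   # previous queue_len_bytes limit (64 KB)
WMEM_DEFAULT = 212992         # net.core.wmem_default on 64bit arches


def bytes_over_limit(sk_sndbuf: int, neigh_limit: int) -> int:
    """Bytes a single socket may send beyond what the neigh queue can hold."""
    return max(0, sk_sndbuf - neigh_limit)
```

With the old limit, bytes_over_limit(WMEM_DEFAULT, OLD_NEIGH_LIMIT) is
147456, so one producer alone can overflow the queue; once the limit
matches wmem_default the excess is zero.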

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: remove dmaengine.h inclusion from netdevice.h
Dave Jiang [Tue, 29 Aug 2017 20:17:51 +0000 (13:17 -0700)]
net: remove dmaengine.h inclusion from netdevice.h

Since the removal of NET_DMA, dmaengine.h header file shouldn't be needed
by netdevice.h anymore.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: bcmgenet: Use correct I/O accessors
Florian Fainelli [Tue, 29 Aug 2017 19:25:31 +0000 (12:25 -0700)]
net: bcmgenet: Use correct I/O accessors

The GENET driver currently uses __raw_{read,write}l which means
native I/O endian. This works correctly for an ARM LE kernel (default)
but fails miserably on an ARM BE (BE8) kernel where registers are kept
little endian, so replace uses with {read,write}l_relaxed here which is
what we want because this is all performance sensitive code.
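
Why the native-endian accessors break on ARM BE can be illustrated with a
small model. This is only a sketch in Python, not driver code: the two
functions stand in for the behaviour of __raw_readl() and readl_relaxed()
as described above, and their names are made up for the example.

```python
# Model of a 32-bit device register whose bytes are hardwired little endian.
REG = (1).to_bytes(4, "little")   # register holds the value 0x00000001


def raw_read32(reg_bytes: bytes, cpu_big_endian: bool) -> int:
    """Native-endian read: the bytes are interpreted in CPU byte order."""
    return int.from_bytes(reg_bytes, "big" if cpu_big_endian else "little")


def le_read32(reg_bytes: bytes) -> int:
    """Little-endian read: byte order is fixed, independent of the CPU."""
    return int.from_bytes(reg_bytes, "little")
```

On a little-endian CPU both reads return 1. On a big-endian CPU the
native read returns 0x01000000 while the little-endian read still
returns 1.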

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago liquidio: show NIC's U-Boot version in a dev_info() message
Weilin Chang [Tue, 29 Aug 2017 19:19:57 +0000 (12:19 -0700)]
liquidio: show NIC's U-Boot version in a dev_info() message

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: dsa: make some structures const
Bhumika Goyal [Tue, 29 Aug 2017 16:47:52 +0000 (22:17 +0530)]
net: dsa: make some structures const

Make these const as they are not modified anywhere.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago ipv6: Use rt6i_idev index for echo replies to a local address
David Ahern [Mon, 28 Aug 2017 20:53:34 +0000 (13:53 -0700)]
ipv6: Use rt6i_idev index for echo replies to a local address

Tariq reported that local pings to a link-local address are failing:
$ ifconfig ens8
ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 11.141.16.6  netmask 255.255.0.0  broadcast 11.141.255.255
        inet6 fe80::7efe:90ff:fecb:7502  prefixlen 64  scopeid 0x20<link>
        ether 7c:fe:90:cb:75:02  txqueuelen 1000  (Ethernet)
        RX packets 12  bytes 1164 (1.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30  bytes 2484 (2.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$  /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8
PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago amd-xgbe: Interrupt summary bits are h/w version dependent
Tom Lendacky [Mon, 28 Aug 2017 20:29:34 +0000 (15:29 -0500)]
amd-xgbe: Interrupt summary bits are h/w version dependent

There is a difference in the bit position of the normal interrupt summary
enable (NIE) and abnormal interrupt summary enable (AIE) between revisions
of the hardware.  For older revisions the NIE and AIE bits are positions
16 and 15 respectively.  For newer revisions the NIE and AIE bits are
positions 15 and 14.  The effect in changing the bit position is that
newer hardware won't receive AIE interrupts in the current version of the
driver.  Specifically, the driver uses this interrupt to collect
statistics on when a receive buffer unavailable event occurs and to
restart the driver/device when a fatal bus error occurs.

Update the driver to set the interrupt enable bit based on the reported
version of the hardware.
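
The bit positions above can be turned into a small sketch. The function
and its boolean parameter are hypothetical; how the driver actually
detects the hardware revision is not stated here and is not modelled.

```python
def summary_enable_mask(newer_revision: bool) -> int:
    """Return the NIE | AIE enable mask for the given hardware generation."""
    # Older revisions: NIE = bit 16, AIE = bit 15.
    # Newer revisions: NIE = bit 15, AIE = bit 14.
    nie_bit, aie_bit = (15, 14) if newer_revision else (16, 15)
    return (1 << nie_bit) | (1 << aie_bit)


OLD_MASK = summary_enable_mask(newer_revision=False)   # 0x18000
NEW_MASK = summary_enable_mask(newer_revision=True)    # 0x0C000
```

Writing OLD_MASK on newer hardware leaves bit 14 clear, which matches the
description that newer hardware does not receive AIE interrupts.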

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Merge branch 'nsh-headers-GSO'
David S. Miller [Tue, 29 Aug 2017 22:16:53 +0000 (15:16 -0700)]
Merge branch 'nsh-headers-GSO'

Jiri Benc says:

====================
nsh: headers, GSO

This adds header structs and helpers for NSH together with GSO support.

Note there is no code in this patchset that actually manipulates the NSH
headers. That was sent to netdev by Yi Yang ("[PATCH net-next v6 0/3]
openvswitch: add NSH support"). The aim of this series is to lay the
groundwork and ease the implementation for him.

In addition to openvswitch, the NSH support should be added to tc (flower to
match, act_nsh to push/pop NSH headers). That will come later. There's
currently no plan to support NSH by other means than those two.

Patch 3 in this patchset was written by Yi Yang; I took it from the
aforementioned series and slightly modified it - see the note in the patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago nsh: add GSO support
Jiri Benc [Mon, 28 Aug 2017 19:43:24 +0000 (21:43 +0200)]
nsh: add GSO support

Add a new nsh/ directory. It currently holds only GSO functions but more
will come: in particular, code shared by openvswitch and tc to manipulate
NSH headers.

For now, assume there's no hardware support for NSH segmentation. We can
always introduce netdev->nsh_features later.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: add NSH header structures and helpers
Yi Yang [Mon, 28 Aug 2017 19:43:23 +0000 (21:43 +0200)]
net: add NSH header structures and helpers

NSH (Network Service Header)[1] is a new protocol for service
function chaining. It can be handled as an L3 protocol like
IPv4 and IPv6; Eth + NSH + Inner packet and VxLAN-gpe + NSH +
Inner packet are two typical use cases.

This patch adds NSH header structures and helpers for NSH GSO
support and Open vSwitch NSH support.

[1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/

[Jiri: added nsh_hdr() helper and renamed the header struct to "struct
nshhdr" to match the usual pattern. Removed packet type defines, these are
now shared with VXLAN-GPE.]

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago vxlan: factor out VXLAN-GPE next protocol
Jiri Benc [Mon, 28 Aug 2017 19:43:22 +0000 (21:43 +0200)]
vxlan: factor out VXLAN-GPE next protocol

The values are shared between VXLAN-GPE and NSH. Originally probably by
coincidence but I notified both working groups about this last year and they
seem to keep the values in sync since then.

Hopefully they'll get a single IANA registry for the values, too. (I asked
them for that.)

Factor out the code to be shared by the NSH implementation.

NSH and MPLS values are added in this patch, too. For MPLS, the drafts
incorrectly assign only a single value, while we have two MPLS ethertypes.
I raised the problem with both groups. For now, I assume the value is for
unicast.
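
A shared next-protocol table could look roughly like the sketch below.
The constant names are illustrative and the numeric values are my
recollection of the VXLAN-GPE and NSH drafts plus the usual ethertypes, so
treat them as assumptions to check against the drafts rather than as the
contents of this patch.

```python
# Next-protocol values assumed to be shared by VXLAN-GPE and NSH.
NP_IPV4, NP_IPV6, NP_ETHERNET, NP_NSH, NP_MPLS = 1, 2, 3, 4, 5

# Assumed mapping to ethertypes; the single MPLS value is treated as unicast.
NEXT_PROTO_TO_ETHERTYPE = {
    NP_IPV4: 0x0800,
    NP_IPV6: 0x86DD,
    NP_ETHERNET: 0x6558,   # transparent Ethernet bridging
    NP_NSH: 0x894F,
    NP_MPLS: 0x8847,       # MPLS unicast
}


def next_proto_to_ethertype(next_proto: int):
    """Return the ethertype for a next-protocol value, or None if unknown."""
    return NEXT_PROTO_TO_ETHERTYPE.get(next_proto)
```

Keeping one table means both protocol implementations resolve the same
value to the same inner protocol.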

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago ether: add NSH ethertype
Jiri Benc [Mon, 28 Aug 2017 19:43:21 +0000 (21:43 +0200)]
ether: add NSH ethertype

The NSH draft says:

   An IEEE EtherType, 0x894F, has been allocated for NSH.
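
As an illustration of where the value ends up on the wire, the sketch
below packs a 14-byte Ethernet header carrying the NSH ethertype. The
helper name and the MAC addresses are made up for the example.

```python
import struct

ETH_P_NSH = 0x894F   # value quoted from the NSH draft above


def ethernet_header(dst: bytes, src: bytes, ethertype: int) -> bytes:
    """Pack destination MAC, source MAC and ethertype in network byte order."""
    return struct.pack("!6s6sH", dst, src, ethertype)


# Example: broadcast destination, locally administered source address.
HDR = ethernet_header(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01", ETH_P_NSH)
```

The last two bytes of HDR are 0x89 0x4f; the NSH header would follow
directly after them.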

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Merge branch 'ife-ethertype'
David S. Miller [Tue, 29 Aug 2017 22:14:19 +0000 (15:14 -0700)]
Merge branch 'ife-ethertype'

Alexander Aring says:

====================
tc: act_ife: handle IEEE IFE ethertype as default

This patch series introduces the IFE ethertype which is registered by
IEEE. If the act_ife type netlink attribute is not given, this value is
now used by default.
Finally, it adds some UAPI testcases to check that the default type is
used if none is specified and vice versa.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago tc-testing: add test for testing ife type
Alexander Aring [Mon, 28 Aug 2017 19:03:15 +0000 (15:03 -0400)]
tc-testing: add test for testing ife type

This patch adds a new testcase for the IFE type setting in tc. If the
user specifies the type, it checks that ife is correctly configured to
react to it. If the type is not specified, the default IFE type should
be used.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago act_ife: use registered ife_type as fallback
Alexander Aring [Mon, 28 Aug 2017 19:03:14 +0000 (15:03 -0400)]
act_ife: use registered ife_type as fallback

This patch applies a default IFE type if none is given through the user
space netlink API. The default IFE type is the ethertype registered by
IEEE for IFE ForCES.
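
The fallback logic amounts to the following sketch. The dictionary stands
in for parsed netlink attributes and the function name is hypothetical;
the value 0xED3E is my recollection of the registered ForCES ethertype and
should be checked against the IEEE list.

```python
ETH_P_IFE = 0xED3E   # assumed value of the IEEE-registered ForCES ethertype


def resolve_ife_type(attrs: dict) -> int:
    """Use the user-supplied type if present, else the registered default."""
    return attrs.get("type", ETH_P_IFE)
```

A request without a type attribute resolves to ETH_P_IFE, while an
explicit type is passed through unchanged.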

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago if_ether: add forces ife lfb type
Alexander Aring [Mon, 28 Aug 2017 19:03:13 +0000 (15:03 -0400)]
if_ether: add forces ife lfb type

This patch adds the forces IFE lfb type according to IEEE registered
ethertypes. See http://standards-oui.ieee.org/ethertype/eth.txt for more
information. Since there exists the IFE subsystem it can be used there.

This patch also uses the correct word "ForCES" instead of "FoRCES", which
is a spelling error inside the IEEE ethertype specification.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Documentation: networking: Add blurb about patches in patchwork
Florian Fainelli [Tue, 29 Aug 2017 22:07:51 +0000 (15:07 -0700)]
Documentation: networking: Add blurb about patches in patchwork

Explain that the patch queue in patchwork should not be touched by patch
submitters.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Merge branch 'mlx4-misc-patches'
David S. Miller [Tue, 29 Aug 2017 21:58:33 +0000 (14:58 -0700)]
Merge branch 'mlx4-misc-patches'

Tariq Toukan says:

====================
mlx4 misc patches

This patchset contains misc patches from the team
to the mlx4 Core and Eth drivers.

Patch 1 by Eran replaces large static allocations by dynamic ones.
Patch 2 by Leon makes an explicit conversion and solves a smatch warning.
In patch 3 I fix misplaced brackets in a sizeof operation.
Patch 4 by Moshe adds the ability to inform the FW regarding user mac updates.

Series generated against net-next commit:
901c5d2fbfcd ARM: dts: rk3228-evb: Fix the compiling error
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net/mlx4: Add user mac FW update support
Moshe Shemesh [Mon, 28 Aug 2017 13:38:23 +0000 (16:38 +0300)]
net/mlx4: Add user mac FW update support

Add support for updating the FW with the new port mac when a port mac
change is requested by the user. The FW needs this info because OEM
management tools read it directly from the NIC FW.
Check the device capability bit to verify that the FW supports user mac.
If the FW does support it, use the set_port command to notify the FW of
the new mac.
The feature is relevant only to the PF port mac.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net/mlx4_core: Fix misplaced brackets of sizeof
Tariq Toukan [Mon, 28 Aug 2017 13:38:22 +0000 (16:38 +0300)]
net/mlx4_core: Fix misplaced brackets of sizeof

When changing the sizeof style usage in the patch cited below,
one misplaced bracket was introduced. Here we fix it.

Fixes: 31975e27a4b5 ("mlx4: sizeof style usage")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net/mlx4_core: Make explicit conversion to 64bit value
Leon Romanovsky [Mon, 28 Aug 2017 13:38:21 +0000 (16:38 +0300)]
net/mlx4_core: Make explicit conversion to 64bit value

The "lg" variable is declared as int so in all places where this variable
is used as a shift operand, the output will be int too.

This produces the following smatch warning:
drivers/net/ethernet/mellanox/mlx4/fw.c:1532 mlx4_map_cmd() warn:
should '1 << lg' be a 64 bit type?

Simply declaring "1" as "1ULL" fixes the issue.
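
The range problem behind the warning can be checked with a short sketch.
It assumes a 32-bit int and only tests whether the mathematical result is
representable; what C does on overflow of a signed shift is undefined and
is not modelled here.

```python
INT_MAX = 2**31 - 1    # largest value of a signed 32-bit int (assumed width)
U64_MAX = 2**64 - 1    # largest value of an unsigned 64-bit integer


def shift_fits_in_int(lg: int) -> bool:
    """True if 1 << lg is representable in a signed 32-bit int."""
    return (1 << lg) <= INT_MAX


def shift_fits_in_u64(lg: int) -> bool:
    """True if 1 << lg is representable in an unsigned 64-bit integer."""
    return (1 << lg) <= U64_MAX
```

For lg = 31 the result no longer fits in an int but fits easily in 64
bits, which is the case the "1ULL" constant covers.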

Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap
Eran Ben Elisha [Mon, 28 Aug 2017 13:38:20 +0000 (16:38 +0300)]
net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap

In order to avoid temporary large structs on the stack,
allocate them dynamically.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tal Alon <talal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago bridge: fdb add and delete tracepoints
Roopa Prabhu [Tue, 29 Aug 2017 20:16:57 +0000 (13:16 -0700)]
bridge: fdb add and delete tracepoints

A few useful tracepoints to trace bridge forwarding
database updates.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Merge branch 'systemport-sf2-mdio-endian'
David S. Miller [Tue, 29 Aug 2017 21:42:17 +0000 (14:42 -0700)]
Merge branch 'systemport-sf2-mdio-endian'

Florian Fainelli says:

====================
Endian fixes for SYSTEMPORT/SF2/MDIO

While trying an ARM BE kernel for kinks, the 3 drivers below stopped
working. The reason became pretty obvious: the register space remains LE
(hardwired), except for Broadcom MIPS where it follows the CPU's native
endian (let's call that a feature).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: phy: mdio-bcm-unimac: Use correct I/O accessors
Florian Fainelli [Tue, 29 Aug 2017 20:35:18 +0000 (13:35 -0700)]
net: phy: mdio-bcm-unimac: Use correct I/O accessors

The driver currently uses __raw_{read,write}l which works for all
platforms supported: Broadcom MIPS LE/BE (native endian), ARM LE (native
endian) but not ARM BE (registers are still LE). Switch to using the
proper accessors for all platforms and explain why Broadcom MIPS BE is
special here. In doing so, we introduce a couple of helper functions to
abstract these differences.
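
The decision such a helper has to make can be modelled as below. This is
a Python sketch of the rule stated above, not the driver's helper: the
function name and both flags are hypothetical.

```python
def mdio_read32(reg_bytes: bytes, is_mips: bool, cpu_big_endian: bool) -> int:
    """Decode a 32-bit register according to the platform's register layout."""
    if is_mips and cpu_big_endian:
        # Broadcom MIPS BE: registers follow the CPU, so read them natively.
        return int.from_bytes(reg_bytes, "big")
    # MIPS LE, ARM LE and ARM BE: registers are little endian.
    return int.from_bytes(reg_bytes, "little")
```

Only the MIPS BE combination takes the native path; ARM BE falls through
to the little-endian read even though the CPU itself is big endian.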

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: systemport: Set correct RSB endian bits based on host
Florian Fainelli [Tue, 29 Aug 2017 20:35:17 +0000 (13:35 -0700)]
net: systemport: Set correct RSB endian bits based on host

RSB_SWAP0 needs to match the host CPU endian, and it needs to be set
for LE and clear for BE. RSB_SWAP1 must always be cleared for SYSTEMPORT
Lite.

With these settings, we have the Receive Status Block always match the
host endian and we do not need to perform any conversion. Since there is
not necessarily a CONFIG_CPU_LITTLE_ENDIAN option defined, we test for
!CONFIG_CPU_BIG_ENDIAN which is guaranteed to be set.
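
The rule for the two bits can be sketched as follows for the SYSTEMPORT
Lite case. The bit positions are placeholders chosen for the example and
do not reflect the real register layout.

```python
RSB_SWAP0 = 1 << 0   # placeholder bit position, not the real layout
RSB_SWAP1 = 1 << 1   # placeholder bit position, not the real layout


def rsb_swap_bits(ctrl: int, cpu_big_endian: bool) -> int:
    """Return ctrl with the swap bits set to match the host byte order."""
    ctrl &= ~(RSB_SWAP0 | RSB_SWAP1)   # start with both swap bits cleared
    if not cpu_big_endian:
        ctrl |= RSB_SWAP0              # set for LE hosts, left clear for BE
    return ctrl
```

Other bits in the control value are left untouched, so the helper can be
applied to whatever the register currently holds.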

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: dsa: bcm_sf2: Use correct I/O accessors
Florian Fainelli [Tue, 29 Aug 2017 20:35:16 +0000 (13:35 -0700)]
net: dsa: bcm_sf2: Use correct I/O accessors

The Starfighter 2 driver currently uses __raw_{read,write}l which means
native I/O endian. This works correctly for an ARM LE kernel (default)
but fails miserably on an ARM BE (BE8) kernel where registers are kept
little endian, so replace uses with {read,write}l_relaxed here which is
what we want because this is all performance sensitive code.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: systemport: Use correct I/O accessors
Florian Fainelli [Tue, 29 Aug 2017 20:35:15 +0000 (13:35 -0700)]
net: systemport: Use correct I/O accessors

The SYSTEMPORT driver currently uses __raw_{read,write}l which means
native I/O endian. This works correctly for an ARM LE kernel (default)
but fails miserably on an ARM BE (BE8) kernel where registers are kept
little endian, so replace uses with {read,write}l_relaxed here which is
what we want because this is all performance sensitive code.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Merge tag 'wireless-drivers-next-for-davem-2017-08-28' of git://git.kernel.org/pub...
David S. Miller [Tue, 29 Aug 2017 18:04:43 +0000 (11:04 -0700)]
Merge tag 'wireless-drivers-next-for-davem-2017-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.14

The rsi driver has been getting a lot of new features lately, but as usual
there is active development on iwlwifi as well as other drivers.

I pulled wireless-drivers to fix multiple conflicts in iwlwifi and to
make further development easier.

Major changes:

ath10k

* initial USB bus support (no full support yet)

* add tdls support for 10.4 firmware

ath9k

* add Dell Wireless 1802

wil6210

* support FW RSSI reporting

rsi

* support legacy power save, U-APSD, rf-kill and AP mode

* RTS threshold configuration

brcmfmac

* support CYW4373 SDIO/USB chipset

iwlwifi

* some more code moved to a new directory

* add new PCI ID for 7265D
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago net: stmmac: constify clk_div_table
Arvind Yadav [Mon, 28 Aug 2017 05:52:20 +0000 (11:22 +0530)]
net: stmmac: constify clk_div_table

clk_div_table structures are not supposed to change at runtime.
The meson8b_dwmac structure works with a const clk_div_table,
so mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Merge branch 'XDP-redirect-tracepoints'
David S. Miller [Tue, 29 Aug 2017 17:51:29 +0000 (10:51 -0700)]
Merge branch 'XDP-redirect-tracepoints'

Jesper Dangaard Brouer says:

====================
XDP redirect tracepoints

I feel this is as far as I can take the tracepoint infrastructure to
assist XDP monitoring.

Tracepoints come with a base overhead of 25 nanosec for an attached
bpf_prog, and 48 nanosec for using a full perf record. This is
problematic for the XDP use-case, but it is very convenient to use the
existing perf infrastructure.

From a performance perspective, the real solution would be to attach
another bpf_prog (that understands xdp_buff), but I'm not sure we want
to introduce yet another bpf attach API for this.

One thing left is to standardize the possible err return codes, to a
limited set, to allow easier (and faster) mapping into a bpf map.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago samples/bpf: xdp_monitor tool based on tracepoints
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:38:11 +0000 (16:38 +0200)]
samples/bpf: xdp_monitor tool based on tracepoints

The xdp_monitor tool demonstrates how to use the different xdp_redirect
tracepoints xdp_redirect{,_map}{,_err} from a BPF program.

The default mode is to only monitor the error counters, to avoid
affecting the per packet performance. Tracepoints come with a base
overhead of 25 nanosec for an attached bpf_prog, and 48 nanosec for
using a full perf record (with non-matching filter).  Thus, loading
the --stats mode by default could affect the maximum performance.

This version of the tool is very simple and counts all types of errors
as one.  It will be natural to extend this later with the different
types of errors that can occur, which should help users quickly
identify common mistakes.

Because the TP_STRUCT was kept in sync, all the tracepoints load the
same BPF code.  It would also be natural to extend the map version to
demonstrate how the map information could be used.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago samples/bpf: xdp_redirect load XDP dummy prog on TX device
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:38:06 +0000 (16:38 +0200)]
samples/bpf: xdp_redirect load XDP dummy prog on TX device

For supporting XDP_REDIRECT, a device driver must (obviously)
implement the "TX" function ndo_xdp_xmit().  An additional requirement
is that you cannot TX out of a device unless it also has an xdp bpf
program attached. This dependency exists because the driver code needs
to set up XDP resources before it can ndo_xdp_xmit.

Update bpf samples xdp_redirect and xdp_redirect_map to automatically
attach a dummy XDP program to the configured ifindex_out device.  Use
the XDP flag XDP_FLAGS_UPDATE_IF_NOEXIST on the dummy load, to avoid
overriding an existing XDP prog on the device.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago xdp: separate xdp_redirect tracepoint in map case
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:38:01 +0000 (16:38 +0200)]
xdp: separate xdp_redirect tracepoint in map case

Creating a specific xdp_redirect_map variant of the xdp tracepoints
allows users to write simpler/faster BPF progs that get attached to
these tracepoints.

The goal is to still keep the tracepoints in xdp_redirect and
xdp_redirect_map similar enough that a tool can read the top part of
the TP_STRUCT and produce similar monitor statistics.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago xdp: separate xdp_redirect tracepoint in error case
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:37:56 +0000 (16:37 +0200)]
xdp: separate xdp_redirect tracepoint in error case

There is a need to separate the xdp_redirect tracepoint into two
tracepoints, for separating the error case from the normal forward
case.

Due to the extreme speeds XDP is operating at, loading a tracepoint
has a measurable impact.  Single core XDP REDIRECT (ethtool tuned
rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple
bpf_prog at the tracepoint (with a return 0) reduces perf to 10.2 Mpps
(CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe).

The overhead of loading a bpf-based tracepoint can be calculated to
cost 25 nanosec ((1/13782002-1/10267937)*10^9 = -24.83 ns).

Using perf record on the tracepoint event, with a non-matching --filter
expression, the overhead is much larger. Performance drops to 8.3 Mpps,
a cost of 48 nanosec ((1/13782002-1/8312497)*10^9 = -47.74 ns).
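
The per-packet arithmetic above can be reproduced with a few lines. The
function names are made up for the example; the packet rates are the ones
quoted in this message. The result is reported as a positive cost, i.e.
with the opposite sign of the expressions above.

```python
def per_packet_ns(pps: float) -> float:
    """Time budget per packet in nanoseconds at a given packet rate."""
    return 1e9 / pps


def tracepoint_overhead_ns(baseline_pps: float, loaded_pps: float) -> float:
    """Extra nanoseconds per packet once the tracepoint consumer is attached."""
    return per_packet_ns(loaded_pps) - per_packet_ns(baseline_pps)


BPF_COST = tracepoint_overhead_ns(13782002, 10267937)    # attached bpf_prog
PERF_COST = tracepoint_overhead_ns(13782002, 8312497)    # perf record + filter
```

BPF_COST comes out at roughly 24.83 ns and PERF_COST at roughly 47.74 ns,
matching the rounded 25 and 48 nanosec figures.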

Having a separate tracepoint for err cases, which should be less
frequent, allows running a continuous monitor for errors while not
affecting the redirect forward performance (this has also been
verified by measurements).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago xdp: make xdp tracepoints report bpf prog id instead of prog_tag
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:37:51 +0000 (16:37 +0200)]
xdp: make xdp tracepoints report bpf prog id instead of prog_tag

Given that the previous patch exposes the map_id, it seems natural to
also report the bpf prog id.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago xdp: tracepoint xdp_redirect also need a map argument
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:37:45 +0000 (16:37 +0200)]
xdp: tracepoint xdp_redirect also need a map argument

To make sense of the map index, the tracepoint user also needs to know
which map we are talking about.  Supply the map pointer but only expose
the map->id.

The 'to_index' is renamed 'to_ifindex'.  In the xdp_redirect_map case,
this is the result of the devmap lookup. The map lookup key is exposed
as map_index, which is needed to troubleshoot in case the lookup failed.
The 'to_ifindex' is placed after 'err' to keep TP_STRUCT as common as
possible.

This also keeps the TP_STRUCT similar enough, that userspace can write
a monitor program, that doesn't need to care about whether
bpf_redirect or bpf_redirect_map were used.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago xdp: remove redundant argument to trace_xdp_redirect
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:37:40 +0000 (16:37 +0200)]
xdp: remove redundant argument to trace_xdp_redirect

Supplying the action argument XDP_REDIRECT to the tracepoint xdp_redirect
is redundant as it is only called in case this action was specified.

Remove the argument, but keep "act" member of the tracepoint struct and
populate it with XDP_REDIRECT.  This makes it easier to write a common bpf_prog
processing events.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Merge tag 'rxrpc-next-20170829' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Tue, 29 Aug 2017 16:42:48 +0000 (09:42 -0700)]
Merge tag 'rxrpc-next-20170829' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Miscellany

Here are a number of patches that make some changes/fixes and add a couple
of extensions to AF_RXRPC for kernel services to use.  The changes and
fixes are:

 (1) Use time64_t rather than u32 outside of protocol or
     UAPI-representative structures.

 (2) Use the correct time stamp when loading a key from an XDR-encoded
     Kerberos 5 key.

 (3) Fix IPv6 support.

 (4) Fix some places where the error code is being incorrectly made
     positive before returning.

 (5) Remove some white space.

And the extensions:

 (6) Add an end-of-Tx phase notification, thereby allowing kAFS to
     transition the state on its own call record at the correct point,
     rather than having to do it in advance and risk non-completion of the
     call in the wrong state.

 (7) Allow a kernel client call to be retried if it fails on a network
     error, thereby making it possible for kAFS to iterate over a number of
     IP addresses without having to reload the Tx queue and re-encrypt data
     each time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Merge branch 'addrlabel-no-rtnl-locking'
David S. Miller [Tue, 29 Aug 2017 16:41:56 +0000 (09:41 -0700)]
Merge branch 'addrlabel-no-rtnl-locking'

Florian Westphal says:

====================
addrlabel: don't use rtnl locking

addrlabel doesn't appear to require rtnl lock as the addrlabel
table uses a spinlock to serialize add/delete operations.

Also, entries are reference counted so it should be safe
to call the rtnl ops without the rtnl mutex.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago addrlabel: add/delete/get can run without rtnl
Florian Westphal [Tue, 29 Aug 2017 11:29:42 +0000 (13:29 +0200)]
addrlabel: add/delete/get can run without rtnl

There appears to be no need to use rtnl: addrlabel entries are refcounted
and add/delete is serialized by the addrlabel table spinlock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago selftests: add addrlabel add/delete to rtnetlink.sh
Florian Westphal [Tue, 29 Aug 2017 11:29:41 +0000 (13:29 +0200)]
selftests: add addrlabel add/delete to rtnetlink.sh

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago staging: irda: update MAINTAINERS
Greg Kroah-Hartman [Tue, 29 Aug 2017 07:09:29 +0000 (09:09 +0200)]
staging: irda: update MAINTAINERS

Now that the IRDA code has moved under drivers/staging/irda/, update the
MAINTAINERS file with the new location.

Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: add a dummy definition for bnxt_vf_rep_get_fid()
Sathya Perla [Tue, 29 Aug 2017 06:15:03 +0000 (11:45 +0530)]
bnxt_en: add a dummy definition for bnxt_vf_rep_get_fid()

When bnxt VF-reps are not compiled in (CONFIG_BNXT_SRIOV is off)
bnxt_tc.c needs a dummy definition of the routine bnxt_vf_rep_get_fid().

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 2ae7408fedfe ("bnxt_en: bnxt: add TC flower filter offload support")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agorxrpc: Allow failed client calls to be retried
David Howells [Tue, 29 Aug 2017 09:19:01 +0000 (10:19 +0100)]
rxrpc: Allow failed client calls to be retried

Allow a client call that failed on network error to be retried, provided
that the Tx queue still holds DATA packet 1.  This allows an operation to
be submitted to another server or another address for the same server
without having to repackage and re-encrypt the data so far processed.

Two new functions are provided:

 (1) rxrpc_kernel_check_call() - This is used to find out the completion
     state of a call to guess whether it can be retried and whether it
     should be retried.

 (2) rxrpc_kernel_retry_call() - Disconnect the call from its current
     connection, reset the state and submit it as a new client call to a
     new address.  The new address need not match the previous address.

A call may be retried even if all the data hasn't been loaded into it yet;
a partially constructed call will be retained at the same point it was at when
an error condition was detected.  msg_data_left() can be used to find out
how much data was packaged before the error occurred.

Signed-off-by: David Howells <dhowells@redhat.com>
8 years agorxrpc: Add notification of end-of-Tx phase
David Howells [Tue, 29 Aug 2017 09:18:56 +0000 (10:18 +0100)]
rxrpc: Add notification of end-of-Tx phase

Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
a notification that the AF_RXRPC call has transitioned out of the Tx phase and
is now waiting for a reply or a final ACK.

This is called from AF_RXRPC with the call state lock held so the
notification is guaranteed to come before any reply is passed back.

Further, modify the AFS filesystem to make use of this so that we don't have
to change the afs_call state before sending the last bit of data.
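
The shape of such an end-of-Tx callback can be sketched in plain C.  This is
only an illustration: every demo_* name below is hypothetical and none of it
is the AF_RXRPC API.  A real implementation would hold the call state lock
across the state change and the notification; the sketch is single-threaded
and simply keeps the two steps adjacent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical call states; the real state machine has many more. */
enum demo_state { DEMO_SENDING, DEMO_AWAIT_REPLY };

struct demo_call {
	enum demo_state state;
	int notified;		/* how many times the callback has fired */
};

typedef void (*demo_notify_end_tx_t)(struct demo_call *call);

/* Send one chunk; on the last chunk, leave the Tx phase and notify. */
static void demo_send_data(struct demo_call *call, size_t len, bool last,
			   demo_notify_end_tx_t notify_end_tx)
{
	assert(call->state == DEMO_SENDING);
	(void)len;		/* payload handling omitted */

	if (last) {
		call->state = DEMO_AWAIT_REPLY;
		if (notify_end_tx)
			notify_end_tx(call);
	}
}

/* The service updates its own record here, at the transition point. */
static void demo_mark_notified(struct demo_call *call)
{
	call->notified++;
}
```

With this shape the service no longer has to change its own state before
queueing the final chunk; it does so from the callback instead.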

Signed-off-by: David Howells <dhowells@redhat.com>
8 years agorxrpc: Remove some excess whitespace
David Howells [Tue, 29 Aug 2017 09:18:50 +0000 (10:18 +0100)]
rxrpc: Remove some excess whitespace

Remove indentation from some blank lines.

Signed-off-by: David Howells <dhowells@redhat.com>
8 years agorxrpc: Don't negate call->error before returning it
David Howells [Tue, 29 Aug 2017 09:18:43 +0000 (10:18 +0100)]
rxrpc: Don't negate call->error before returning it

call->error is stored as 0 or a negative error code.  Don't negate this
value (ie. make it positive) before returning it from a kernel function
(though it should still be negated before passing to userspace through a
control message).
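
A minimal sketch of the sign convention, assuming a stand-in call record
(struct demo_call and both helpers are hypothetical names, not AF_RXRPC
code): the stored value is returned unchanged inside the kernel and negated
only on the way out to userspace.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical call record: error is 0 or a negative errno value. */
struct demo_call {
	int error;
};

/* In-kernel return path: hand back the stored value unchanged. */
static int demo_kernel_result(const struct demo_call *call)
{
	assert(call->error <= 0);
	return call->error;
}

/* Value destined for a userspace control message: positive errno. */
static int demo_cmsg_error(const struct demo_call *call)
{
	assert(call->error <= 0);
	return -call->error;
}
```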

Signed-off-by: David Howells <dhowells@redhat.com>
8 years agorxrpc: Fix IPv6 support
David Howells [Tue, 29 Aug 2017 09:18:37 +0000 (10:18 +0100)]
rxrpc: Fix IPv6 support

Fix IPv6 support in AF_RXRPC in the following ways:

 (1) When extracting the address from a received IPv4 packet, if the local
     transport socket is open for IPv6 then fill out the sockaddr_rxrpc
     struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead
     of an AF_INET one.

 (2) When sending CHALLENGE or RESPONSE packets, the transport length needs
     to be set from the sockaddr_rxrpc::transport_len field rather than
     sizeof() on the IPv4 transport address.

 (3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up
     the address correctly before searching for the affected peer.
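
Point (1) relies on the IPv4-mapped IPv6 form, ::ffff:a.b.c.d.  The mapping
itself can be sketched in userspace with the standard socket structures; the
helper name demo_map_v4_to_v6 is made up for illustration, and the kernel
code fills in a sockaddr_rxrpc rather than a bare sockaddr_in6.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Build an IPv4-mapped IPv6 socket address from an IPv4 one.
 * Illustration only; not the AF_RXRPC implementation.
 */
static void demo_map_v4_to_v6(const struct sockaddr_in *v4,
			      struct sockaddr_in6 *v6)
{
	assert(v4->sin_family == AF_INET);

	memset(v6, 0, sizeof(*v6));
	v6->sin6_family = AF_INET6;
	v6->sin6_port = v4->sin_port;		/* already in network order */
	v6->sin6_addr.s6_addr[10] = 0xff;	/* bytes 0-9 stay zero */
	v6->sin6_addr.s6_addr[11] = 0xff;
	memcpy(&v6->sin6_addr.s6_addr[12], &v4->sin_addr, 4);
}
```

Point (2) follows from the same idea: once the address may be either family,
its length has to come from the stored transport length rather than from
sizeof() on the IPv4 structure.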

Signed-off-by: David Howells <dhowells@redhat.com>
8 years agorxrpc: Use correct timestamp from Kerberos 5 ticket
David Howells [Tue, 29 Aug 2017 09:15:40 +0000 (10:15 +0100)]
rxrpc: Use correct timestamp from Kerberos 5 ticket

When an XDR-encoded Kerberos 5 ticket is added as an rxrpc-type key, the
expiry time should be drawn from the k5 part of the token union (which was
what was filled in), rather than the kad part of the union.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
8 years agonet: rxrpc: Replace time_t type with time64_t type
Baolin Wang [Tue, 29 Aug 2017 09:15:40 +0000 (10:15 +0100)]
net: rxrpc: Replace time_t type with time64_t type

The 'expiry' field of 'struct key_preparsed_payload' has been changed to
the 'time64_t' type, which is year-2038 safe on 32-bit systems.

In the net/rxrpc subsystem, we need to convert 'u32' values to 'time64_t'
when copying the ticket expiry time to 'prep->expiry', so this patch
introduces two helper functions to convert 'u32' to 'time64_t'.

This patch also uses ktime_get_real_seconds() to get the current time
instead of get_seconds(), which is not year-2038 safe on 32-bit systems.
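
As a rough userspace illustration of the widening step (demo_time64_t and
demo_u32_to_time64 are hypothetical stand-ins, not the helpers added by the
patch), an unsigned 32-bit seconds count is zero-extended into a signed
64-bit value, so times past January 2038 stay positive:

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t demo_time64_t;	/* stand-in for the kernel's time64_t */

/* Widen an unsigned 32-bit seconds count; zero-extension keeps it >= 0. */
static demo_time64_t demo_u32_to_time64(uint32_t t)
{
	demo_time64_t wide = (demo_time64_t)t;

	assert(wide >= 0);
	return wide;
}
```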

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: David Howells <dhowells@redhat.com>
8 years agohinic: don't build the module by default
Vitaly Kuznetsov [Mon, 28 Aug 2017 13:16:05 +0000 (15:16 +0200)]
hinic: don't build the module by default

We probably don't want to enable code supporting particular hardware by
default e.g. when someone does 'make defconfig'. Other ethernet modules
don't do it.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bnxt_en-next'
David S. Miller [Mon, 28 Aug 2017 23:57:10 +0000 (16:57 -0700)]
Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Updates.

Various changes including updated firmware interface, improved TX ring
allocation scheme, improved out-of-memory logic in NAPI loop, reduced
default rings on multi-port devices, new PCI IDs. Of particular note,

CPU affinity hints from Vasundhara Volam.

TC Flower eswitch support from Sathya Perla.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: add code to query TC flower offload stats
Sathya Perla [Mon, 28 Aug 2017 17:40:35 +0000 (13:40 -0400)]
bnxt_en: add code to query TC flower offload stats

This patch adds code to implement TC_CLSFLOWER_STATS TC-cmd and the
required FW code to query the stats from the HW.

Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: add TC flower offload flow_alloc/free FW cmds
Sathya Perla [Mon, 28 Aug 2017 17:40:34 +0000 (13:40 -0400)]
bnxt_en: add TC flower offload flow_alloc/free FW cmds

This patch adds the hwrm_cfa_flow_alloc/free() routines
that are needed to issue the FW cmds needed for TC flower offload.

Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: bnxt: add TC flower filter offload support
Sathya Perla [Mon, 28 Aug 2017 17:40:33 +0000 (13:40 -0400)]
bnxt_en: bnxt: add TC flower filter offload support

This patch adds support for offloading TC based flow
rules and actions for the 'flower' classifier in the bnxt_en driver.
It includes logic to parse flow rules and actions received from the
TC subsystem, store them and issue the corresponding
hwrm_cfa_flow_alloc/free FW cmds. L2/IPv4/IPv6 flows and drop,
redir, vlan push/pop actions are supported in this patch.

In this patch the hwrm_cfa_flow_xxx routines are just stubs.
The code for these routines is introduced in the next patch for easier
review. Also, the code to query the TC/flower action stats will
be introduced in a subsequent patch.

Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: fix clearing devlink ptr from bnxt struct
Sathya Perla [Mon, 28 Aug 2017 17:40:32 +0000 (13:40 -0400)]
bnxt_en: fix clearing devlink ptr from bnxt struct

The routine bnxt_link_bp_to_dl() is used to set the devlink ptr
in bnxt struct (bp) and also to set the bnxt back ptr in
the devlink struct.  If devlink_register() fails, bp->dl must
be cleared which is not happening currently. This patch fixes
bnxt_link_bp_to_dl() to clear bp->dl by passing  a NULL dl ptr.

Fixes: 4ab0c6a8ffd7 ("bnxt_en: add support to enable VF-representors")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Reduce default rings on multi-port cards.
Michael Chan [Mon, 28 Aug 2017 17:40:31 +0000 (13:40 -0400)]
bnxt_en: Reduce default rings on multi-port cards.

Reduce default rings from 8 to 4 on multi-port cards to reduce memory
usage.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Improve -ENOMEM logic in NAPI poll loop.
Michael Chan [Mon, 28 Aug 2017 17:40:30 +0000 (13:40 -0400)]
bnxt_en: Improve -ENOMEM logic in NAPI poll loop.

If we cannot allocate RX buffers in the NAPI poll loop when processing
an RX event, the current code does not count that event towards the NAPI
budget.  This can cause us to potentially loop forever in NAPI if we
consistently cannot allocate new buffers.  Improve it by counting
-ENOMEM event as 1 towards the NAPI budget.
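
The budget accounting can be modelled outside the driver.  In this sketch
(demo_poll and its event encoding are assumptions made for illustration, not
bnxt_en code) each array entry is the outcome of one RX completion, and a
failed allocation is charged as one unit of work so the loop always
terminates within the budget.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of a NAPI-style poll loop.  Each entry of events[]
 * is the result of handling one RX completion: 1 for a delivered
 * packet, -ENOMEM when no replacement buffer could be allocated.
 * Returns the work charged to the budget; *consumed is the number of
 * completions handled.
 */
static int demo_poll(const int *events, int nr_events, int budget,
		     int *consumed)
{
	int work_done = 0;
	int i = 0;

	assert(budget > 0);
	while (i < nr_events && work_done < budget) {
		int rc = events[i++];

		if (rc > 0)
			work_done += rc;
		else if (rc == -ENOMEM)
			work_done++;	/* count the failure as one unit */
	}
	*consumed = i;
	return work_done;
}
```

Without the -ENOMEM branch, a run of failed allocations would never advance
work_done and the loop would keep going for as long as completions arrive.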

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt: initialize board_info values with proper enums
Scott Branden [Mon, 28 Aug 2017 17:40:29 +0000 (13:40 -0400)]
bnxt: initialize board_info values with proper enums

Initialize board_info values with proper enums for defensive programming
purposes.  This avoids errors caused by the declared enums not lining up
with the board_info array.
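
The technique is C designated array initializers keyed by the enum.  A small
sketch with an invented board list (the demo_* names are hypothetical; the
driver's enum and strings differ):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical board list, for illustration only. */
enum demo_board_idx {
	DEMO_BOARD_A,
	DEMO_BOARD_B,
	DEMO_BOARD_C,
	DEMO_BOARD_MAX
};

static const struct {
	const char *name;
} demo_board_info[DEMO_BOARD_MAX] = {
	/* Each entry names its index, so source order no longer matters. */
	[DEMO_BOARD_C] = { "Demo board C" },
	[DEMO_BOARD_A] = { "Demo board A" },
	[DEMO_BOARD_B] = { "Demo board B" },
};

static const char *demo_board_name(enum demo_board_idx idx)
{
	assert(idx < DEMO_BOARD_MAX);
	return demo_board_info[idx].name;
}
```

Reordering or inserting enum values then cannot silently shift the strings
onto the wrong boards.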

Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt: Add PCIe device IDs for bcm58802/bcm58808
Ray Jui [Mon, 28 Aug 2017 17:40:28 +0000 (13:40 -0400)]
bnxt: Add PCIe device IDs for bcm58802/bcm58808

Add PCIe device IDs for bcm58802 and bcm58808. Also add a chip number
update to declare bcm588xx as chip class phase 4 and later.

Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: assign CPU affinity hints to bnxt_en IRQs
Vasundhara Volam [Mon, 28 Aug 2017 17:40:27 +0000 (13:40 -0400)]
bnxt_en: assign CPU affinity hints to bnxt_en IRQs

This patch provides hints to irqbalance to map bnxt_en device IRQs
to specific CPU cores. cpumask_local_spread() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs
are mapped to far NUMA cores.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Improve tx ring reservation logic.
Michael Chan [Mon, 28 Aug 2017 17:40:26 +0000 (13:40 -0400)]
bnxt_en: Improve tx ring reservation logic.

When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX
rings, etc), the current code tries to reserve the new number of TX rings
before closing and re-opening the NIC.  If we are unable to reserve the
new TX rings, we abort the operation and keep the current TX rings.

The problem is that the firmware will disable the current TX rings even
when it cannot reserve the new set of TX rings.  We fix it as follows:

1. Instead of reserving the new set of TX rings, just ask the firmware
to check if the new set of TX rings is available.  There is a flag in
the firmware message to do that.  If not available, abort and the
current TX rings will not be disabled.

2. Do the actual TX ring reservation in the path that opens the NIC.
We keep the number of TX rings currently successfully reserved.  If the
number of TX rings is different than the reserved TX rings, we call
firmware and reserve again.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Update firmware interface spec. to 1.8.1.4.
Michael Chan [Mon, 28 Aug 2017 17:40:25 +0000 (13:40 -0400)]
bnxt_en: Update firmware interface spec. to 1.8.1.4.

Flow APIs are added in this firmware interface.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'NCSI-vlan-filtering'
David S. Miller [Mon, 28 Aug 2017 23:49:49 +0000 (16:49 -0700)]
Merge branch 'NCSI-vlan-filtering'

Samuel Mendoza-Jonas says:

====================
NCSI VLAN Filtering Support

This series (mainly patch 2) adds VLAN filtering to the NCSI implementation.
A fair amount of code already exists in the NCSI stack for VLAN filtering but
none of it is actually hooked up. This goes the final mile and fixes a few
bugs in the existing code found along the way (patch 1).

Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to
enable filtering as it's a large consumer of NCSI (and what I've been
testing on).

v3: - Add comment describing change to ncsi_find_filter()
- Catch NULL in clear_one_vid() from ncsi_get_filter()
- Simplify state changes when kicking updated channel
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoftgmac100: Support NCSI VLAN filtering when available
Samuel Mendoza-Jonas [Mon, 28 Aug 2017 06:18:43 +0000 (16:18 +1000)]
ftgmac100: Support NCSI VLAN filtering when available

Register the ndo_vlan_rx_{add,kill}_vid callbacks and set the
NETIF_F_HW_VLAN_CTAG_FILTER if NCSI is available.
This allows the VLAN core to notify the NCSI driver when changes occur
so that the remote NCSI channel can be properly configured to filter on
the set VLAN tags.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/ncsi: Configure VLAN tag filter
Samuel Mendoza-Jonas [Mon, 28 Aug 2017 06:18:42 +0000 (16:18 +1000)]
net/ncsi: Configure VLAN tag filter

Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
stack process new VLAN tags and configure the channel VLAN filter
appropriately.
Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
for each one, meaning the ncsi_dev_state_config_svf state must be
repeated. An internal list of VLAN tags is maintained, and compared
against the current channel's ncsi_channel_filter in order to keep track
within the state. VLAN filters are removed in a similar manner, with the
introduction of the ncsi_dev_state_config_clear_vids state. The maximum
number of VLAN tag filters is determined by the "Get Capabilities"
response from the channel.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/ncsi: Fix several packet definitions
Samuel Mendoza-Jonas [Mon, 28 Aug 2017 06:18:41 +0000 (16:18 +1000)]
net/ncsi: Fix several packet definitions

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Mon, 28 Aug 2017 23:46:25 +0000 (16:46 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-08-27

This series contains updates to i40e and i40evf only.

Sudheer updates code comments and a state variable so that adminq_subtask
will have accurate information whenever it gets scheduled.

Mariusz stores information about FEC modes, to be used when printing link
state information, so that we do not need to call the admin queue when
reporting link status.  He also adds VF support for controlling VLAN tag
stripping via ethtool.

Jake provides the majority of changes in this series, starting with
increasing the size of the prefix buffer so that it can hold enough
characters for every possible input, which prevents snprintf truncation.
Fixed other string truncation errors/warnings produced by GCC 7.x.
Removed an unnecessary workaround for resetting XPS.  Fixed an issue
where there is a mismatched affinity mask value, so initialize the value
to cpu_possible_mask and invert the logic for checking incorrect CPU vs
IRQ affinity so that the exceptional case is handled at the check.
Removed ULTRA latency mode due to several issues found and will be
looking at better solution for small packet workloads.

Akeem fixes an issue where the incorrect flag was being used to set
promiscuous mode for unicast, which was enabling promiscuous mode only
for multicast instead of unicast.

Carolyn fixes an issue where an error return value is set, but this
value can be overwritten before we actually do exit the function.  So
remove the error code assignment and add code comments for better
understanding on why we do not need to set and return the error.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet-next/hinic: fix comparison of a uint16_t type with -1
Aviad Krawczyk [Sun, 27 Aug 2017 17:35:30 +0000 (01:35 +0800)]
net-next/hinic: fix comparison of a uint16_t type with -1

Remove the search for index of constant buffer size
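
The problem named in the subject line comes from C's integer promotions: a
uint16_t operand is promoted to int before the comparison, so it can never
equal -1.  A minimal sketch (demo_matches is a hypothetical helper, not hinic
code):

```c
#include <assert.h>
#include <stdint.h>

/* Compare a 16-bit index against an int sentinel, as "idx == -1" would. */
static int demo_matches(uint16_t idx, int sentinel)
{
	/* The demonstration relies on int being wider than 16 bits. */
	assert(sizeof(int) > sizeof(uint16_t));

	/* idx is promoted to int here, giving a value in 0..65535. */
	return idx == sentinel;
}
```

Storing -1 in a uint16_t yields 65535, which matches 65535 but not -1.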

Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet-next/hinic: Fix MTU limitation
Aviad Krawczyk [Sun, 27 Aug 2017 17:20:26 +0000 (01:20 +0800)]
net-next/hinic: Fix MTU limitation

Fix the hw MTU limitation by setting max_mtu

Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'irda-move-to-staging'
David S. Miller [Mon, 28 Aug 2017 23:42:57 +0000 (16:42 -0700)]
Merge branch 'irda-move-to-staging'

Greg Kroah-Hartman says:

====================
irda: move it to drivers/staging so we can delete it

The IRDA code has long been obsolete and broken.  So, to keep people
from trying to use it, and to prevent people from having to maintain it,
let's move it to drivers/staging/ so that we can delete it entirely from
the kernel in a few releases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agostaging: irda: add a TODO file.
Greg Kroah-Hartman [Sun, 27 Aug 2017 15:03:34 +0000 (17:03 +0200)]
staging: irda: add a TODO file.

The irda code will be deleted in a future kernel release, so no need to
have anyone do any new work on it.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoirda: move include/net/irda into staging subdirectory
Greg Kroah-Hartman [Sun, 27 Aug 2017 15:03:33 +0000 (17:03 +0200)]
irda: move include/net/irda into staging subdirectory

And finally, move the irda include files into
drivers/staging/irda/include/net/irda.  Yes, it's a long path, but it
makes it easy for us to just add a Makefile directory path addition and
all of the net and drivers code "just works".

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoirda: move drivers/net/irda to drivers/staging/irda/drivers
Greg Kroah-Hartman [Sun, 27 Aug 2017 15:03:32 +0000 (17:03 +0200)]
irda: move drivers/net/irda to drivers/staging/irda/drivers

Move the irda drivers from drivers/net/irda/ to
drivers/staging/irda/drivers as they will be deleted in a future kernel
release.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoirda: move net/irda/ to drivers/staging/irda/net/
Greg Kroah-Hartman [Sun, 27 Aug 2017 15:03:31 +0000 (17:03 +0200)]
irda: move net/irda/ to drivers/staging/irda/net/

It's time to get rid of IRDA.  It's long been broken, and no one seems
to use it anymore.  So move it to staging and after a while, we can
delete it from there.

To start, move the network irda core from net/irda to
drivers/staging/irda/net/

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'dpaa_eth-rss'
David S. Miller [Mon, 28 Aug 2017 23:41:01 +0000 (16:41 -0700)]
Merge branch 'dpaa_eth-rss'

Madalin Bucur says:

====================
Add RSS to DPAA 1.x Ethernet driver

This patch set introduces Receive Side Scaling for the DPAA Ethernet
driver. Documentation is updated with details related to the new
feature and limitations that apply.
Added also a small fix.

v2: removed a C++ style comment
v3: move struct fman to header file to avoid exporting a function
v4: addressed compilation issues introduced in v3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodpaa_eth: check allocation result
Madalin Bucur [Sun, 27 Aug 2017 13:13:43 +0000 (16:13 +0300)]
dpaa_eth: check allocation result

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoDocumentation: networking: add RSS information
Madalin Bucur [Sun, 27 Aug 2017 13:13:42 +0000 (16:13 +0300)]
Documentation: networking: add RSS information

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodpaa_eth: add NETIF_F_RXHASH
Madalin Bucur [Sun, 27 Aug 2017 13:13:41 +0000 (16:13 +0300)]
dpaa_eth: add NETIF_F_RXHASH

Set the skb hash when the FMan Keygen hash result is available.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodpaa_eth: enable Rx hashing control
Madalin Bucur [Sun, 27 Aug 2017 13:13:40 +0000 (16:13 +0300)]
dpaa_eth: enable Rx hashing control

Allow ethtool control of the Rx flow hashing. RSS is enabled by
default; this allows it to be turned off by bypassing the FMan Keygen
block and sending all traffic on the default Rx frame queue.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodpaa_eth: use multiple Rx frame queues
Madalin Bucur [Sun, 27 Aug 2017 13:13:39 +0000 (16:13 +0300)]
dpaa_eth: use multiple Rx frame queues

Add a block of 128 Rx frame queues per port. The FMan hardware will
send traffic on one of these queues based on the FMan port Parse
Classify Distribute setup. The hash computed by the FMan Keygen
block will select the Rx FQ.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofsl/fman: enable FMan Keygen
Iordache Florinel-R70177 [Sun, 27 Aug 2017 13:13:38 +0000 (16:13 +0300)]
fsl/fman: enable FMan Keygen

Add support for the FMan Keygen with a hardcoded scheme to spread
incoming traffic on a FQ range based on source and destination IPs
and ports.

Signed-off-by: Iordache Florinel <florinel.iordache@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofsl/fman: move struct fman to header file
Madalin Bucur [Sun, 27 Aug 2017 13:13:37 +0000 (16:13 +0300)]
fsl/fman: move struct fman to header file

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: broadcom: Remove null check before kfree
Himanshu Jha [Sat, 26 Aug 2017 20:17:47 +0000 (01:47 +0530)]
net: ethernet: broadcom: Remove null check before kfree

kfree() on a NULL pointer is a no-op, so checking first is redundant.
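
The same rule holds for free() in standard C, which makes a convenient
userspace analogue.  The struct and function below are hypothetical and only
show the cleanup pattern without the redundant check:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object owning one optional buffer. */
struct demo_ctx {
	char *buf;
};

static void demo_ctx_release(struct demo_ctx *ctx)
{
	assert(ctx != NULL);

	/* No "if (ctx->buf)" needed: free(NULL) does nothing. */
	free(ctx->buf);
	ctx->buf = NULL;
}
```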

Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosched: sfq: drop packets after root qdisc lock is released
Gao Feng [Sat, 26 Aug 2017 14:58:58 +0000 (22:58 +0800)]
sched: sfq: drop packets after root qdisc lock is released

Commit 520ac30f4551 ("net_sched: drop packets after root qdisc lock
is released") made a big change to tc for performance, but two paths in
the SFQ enqueue operation were not converted:
1. Failing to find the SFQ hash slot;
2. When the queue is full.

Now use qdisc_drop() instead of freeing the skb directly.
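
The idea behind deferring the drop can be sketched in userspace: while the
lock is held the packet is only chained onto a to_free list, and the list is
freed once the lock has been released.  All demo_* names are hypothetical
stand-ins, not the net_sched API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct demo_skb {
	struct demo_skb *next;
};

#define DEMO_XMIT_DROP 1	/* placeholder for NET_XMIT_DROP */

/* Called with the (imaginary) root lock held: only chain the packet. */
static int demo_drop(struct demo_skb *skb, struct demo_skb **to_free)
{
	assert(skb != NULL);
	skb->next = *to_free;
	*to_free = skb;
	return DEMO_XMIT_DROP;
}

/* Called after the lock is released: free the chain, return its length. */
static int demo_free_list(struct demo_skb *list)
{
	int n = 0;

	while (list) {
		struct demo_skb *next = list->next;

		free(list);
		list = next;
		n++;
	}
	return n;
}
```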

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'mlxsw-dpipe-fixes'
David S. Miller [Mon, 28 Aug 2017 22:41:15 +0000 (15:41 -0700)]
Merge branch 'mlxsw-dpipe-fixes'

Jiri Pirko says:

====================
mlxsw: spectrum: Fix couple of dpipe ipv4 host table bugs

Arkadi Sharshevsky (1):
  mlxsw: spectrum_dpipe: Fix host table dump

Jiri Pirko (1):
  mlxsw: spectrum: compile-in dpipe support only if devlink is enabled
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum_dpipe: Fix host table dump
Arkadi Sharshevsky [Sat, 26 Aug 2017 06:35:39 +0000 (08:35 +0200)]
mlxsw: spectrum_dpipe: Fix host table dump

During the neighbor traversal the neighbors from different families
should be ignored.

Fixes: c58035a74aba ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: compile-in dpipe support only if devlink is enabled
Jiri Pirko [Sat, 26 Aug 2017 06:35:38 +0000 (08:35 +0200)]
mlxsw: spectrum: compile-in dpipe support only if devlink is enabled

It makes no sense to have dpipe compiled in when devlink is not enabled,
because the devlink dpipe registration is a no-op function. So don't compile
it in. This also fixes missing extern struct errors.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: a86f030915f2 ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
Dexuan Cui [Sat, 26 Aug 2017 04:52:43 +0000 (04:52 +0000)]
hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the traditional
socket APIs.

More info about Hyper-V Sockets is available here:

"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

The patch implements the necessary support in Linux guest by introducing a new
vsock transport for AF_VSOCK.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Reilly Grant <grantr@vmware.com>
Cc: Asias He <asias@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoselftests/bpf: check the instruction dumps are populated
Jakub Kicinski [Fri, 25 Aug 2017 21:39:57 +0000 (14:39 -0700)]
selftests/bpf: check the instruction dumps are populated

Add a basic test for checking whether the kernel is populating
the jited and xlated BPF images.  It was used to confirm
the behaviour change from commit d777b2ddbecf ("bpf: don't
zero out the info struct in bpf_obj_get_info_by_fd()"),
which made bpf_obj_get_info_by_fd() usable for retrieving
the image dumps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: fix oops on allocation failure
Dan Carpenter [Fri, 25 Aug 2017 20:27:14 +0000 (23:27 +0300)]
bpf: fix oops on allocation failure

"err" is set to zero if bpf_map_area_alloc() fails so it means we return
ERR_PTR(0) which is NULL.  The caller, find_and_alloc_map(), is not
expecting NULL returns and will oops.
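
Why ERR_PTR(0) is dangerous can be shown with a userspace re-creation of the
error-pointer encoding.  The demo_* helpers below are written for this
illustration and only mimic the kernel's ERR_PTR()/IS_ERR() behaviour: an
error value of 0 encodes to NULL, which an IS_ERR-style check does not flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode 0 or a negative errno value as a pointer. */
static void *demo_err_ptr(long error)
{
	assert(error <= 0 && error >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True only for the top DEMO_MAX_ERRNO addresses; NULL is not one. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

A caller that checks only for error pointers therefore treats the NULL as a
valid object and dereferences it.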

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: Add comment that early_demux can change via sysctl
David Ahern [Mon, 28 Aug 2017 22:14:20 +0000 (15:14 -0700)]
net: Add comment that early_demux can change via sysctl

Twice patches trying to constify inet{6}_protocol have been reverted:
39294c3df2a8 ("Revert "ipv6: constify inet6_protocol structures"") to
revert 3a3a4e3054137 and then 03157937fe0b5 ("Revert "ipv4: make
net_protocol const"") to revert aa8db499ea67.

Add a comment that the structures can not be const because the
early_demux field can change based on a sysctl.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoxen-netback: update ubuf_info initialization to anonymous union
Willem de Bruijn [Fri, 25 Aug 2017 17:10:43 +0000 (13:10 -0400)]
xen-netback: update ubuf_info initialization to anonymous union

The xen driver initializes struct ubuf_info fields using designated
initializers. I recently moved these fields inside a nested anonymous
struct inside an anonymous union. I had missed this use case.

This breaks compilation of xen-netback with older compilers.
From kbuild bot with gcc-4.4.7:

   drivers/net//xen-netback/interface.c: In function
   'xenvif_init_queue':
   >> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer
   >> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer
      drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>')
   >> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast
   >> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer

Add double braces around the designated initializers to match their
nested position in the struct. After this, compilation succeeds again.
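
The double-brace form can be sketched on a simplified layout.  struct
demo_ubuf below is a hypothetical reduction modelled on the description
above, not the real struct ubuf_info; the point is only that the outer braces
address the anonymous union and the inner braces its first member, the
anonymous struct.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical layout: fields nested two anonymous levels deep. */
struct demo_ubuf {
	void (*callback)(struct demo_ubuf *);
	union {
		struct {
			unsigned long desc;
			void *ctx;
		};
		struct {
			unsigned int id;
			unsigned short len;
		};
	};
};

static void demo_cb(struct demo_ubuf *u)
{
	(void)u;
}

static struct demo_ubuf demo_make(unsigned long i)
{
	/*
	 * Outer braces: the anonymous union.  Inner braces: its first
	 * member, the anonymous struct holding desc and ctx.
	 */
	struct demo_ubuf u = {
		.callback = demo_cb,
		{ { .ctx = NULL, .desc = i } },
	};

	assert(u.callback == demo_cb);
	return u;
}
```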

Fixes: 4ab6c99d99bb ("sock: MSG_ZEROCOPY notification coalescing")
Reported-by: kbuild bot <lpk@intel.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'gre-add-collect_md-mode-for-ERSPAN-tunnel'
David S. Miller [Mon, 28 Aug 2017 22:04:52 +0000 (15:04 -0700)]
Merge branch 'gre-add-collect_md-mode-for-ERSPAN-tunnel'

William Tu says:

====================
gre: add collect_md mode for ERSPAN tunnel

This patch series provides collect_md mode for ERSPAN tunnel.  The first patch
refactors the existing gre_fb_xmit function by extracting the route cache
portion into a new function called prepare_fb_xmit.  The second patch
introduces the collect_md mode for ERSPAN tunnel, by calling the
prepare_fb_xmit function and adding ERSPAN specific logic.  The final patch
adds the test case using bpf_skb_{set,get}_tunnel_{key,opt}.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>