]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
11 years agonet/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)
Arvid Brodin [Wed, 30 Oct 2013 20:10:47 +0000 (21:10 +0100)]
net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)

High-availability Seamless Redundancy ("HSR") provides instant failover
redundancy for Ethernet networks. It requires a special network topology where
all nodes are connected in a ring (each node having two physical network
interfaces). It is suited for applications that demand high availability and
very short reaction time.

HSR acts on the Ethernet layer, using a registered Ethernet protocol type to
send special HSR frames in both directions over the ring. The driver creates
virtual network interfaces that can be used just like any ordinary Linux
network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring
must be HSR capable.

This code is a "best effort" to comply with the HSR standard as described in
IEC 62439-3:2010 (HSRv0).

Signed-off-by: Arvid Brodin <arvid.brodin@xdin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: extend net_device allocation to vmalloc()
Eric Dumazet [Wed, 30 Oct 2013 20:10:44 +0000 (13:10 -0700)]
net: extend net_device allocation to vmalloc()

Joby Poriyath provided a xen-netback patch to reduce the size of
xenvif structure as some netdev allocation could fail under
memory pressure/fragmentation.

This patch is handling the problem at the core level, allowing
any netdev structures to use vmalloc() if kmalloc() failed.

As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Joby Poriyath <joby.poriyath@citrix.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'sctp_csum'
David S. Miller [Mon, 4 Nov 2013 04:05:16 +0000 (23:05 -0500)]
Merge branch 'sctp_csum'

Daniel Borkmann says:

====================
SCTP fix/updates

Please see patch 5 for the main description/motivation, the rest just
brings in the needed functionality for that. Although this is actually
a fix, I've based it against net-next as some additional work for
fixing it was needed.
====================

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: fix and consolidate SCTP checksumming code
Daniel Borkmann [Wed, 30 Oct 2013 10:50:52 +0000 (11:50 +0100)]
net: sctp: fix and consolidate SCTP checksumming code

This fixes an outstanding bug found through IPVS, where SCTP packets
with skb->data_len > 0 (non-linearized) and empty frag_list, but data
accumulated in frags[] member, are forwarded with incorrect checksum
letting SCTP initial handshake fail on some systems. Linearizing each
SCTP skb in IPVS to prevent that would not be a good solution as
this leads to an additional and unnecessary performance penalty on
the load-balancer itself for no good reason (as we actually only want
to update the checksum, and can do that in a different/better way
presented here).

The actual problem is elsewhere, namely, that SCTP's checksumming
in sctp_compute_cksum() does not take frags[] into account like
skb_checksum() does. So while we are fixing this up, we better reuse
the existing code that we have anyway in __skb_checksum() and use it
for walking through the data doing checksumming. This will not only
fix this issue, but also consolidates some SCTP code with core
sk_buff code, bringing it closer together and removing respectively
avoiding reimplementation of skb_checksum() for no good reason.

As crc32c() can use hardware implementation within the crypto layer,
we leave that intact (it wraps around / falls back to e.g. slice-by-8
algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
combinator for crc32c blocks.

Also, we remove all other SCTP checksumming code, so that we only
have to use sctp_compute_cksum() from now on; for doing that, we need
to transform SCTP checkumming in output path slightly, and can leave
the rest intact.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: skb_checksum: allow custom update/combine for walking skb
Daniel Borkmann [Wed, 30 Oct 2013 10:50:51 +0000 (11:50 +0100)]
net: skb_checksum: allow custom update/combine for walking skb

Currently, skb_checksum walks over 1) linearized, 2) frags[], and
3) frag_list data and calculats the one's complement, a 32 bit
result suitable for feeding into itself or csum_tcpudp_magic(),
but unsuitable for SCTP as we're calculating CRC32c there.

Hence, in order to not re-implement the very same function in
SCTP (and maybe other protocols) over and over again, use an
update() + combine() callback internally to allow for walking
over the skb with different algorithms.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agolib: crc32: add test cases for crc32{, c}_combine routines
Daniel Borkmann [Wed, 30 Oct 2013 10:50:50 +0000 (11:50 +0100)]
lib: crc32: add test cases for crc32{, c}_combine routines

We already have 100 test cases for crcs itself, so split the test
buffer with a-prio known checksums, and test crc of two blocks
against crc of the whole block for the same results.

Output/result with CONFIG_CRC32_SELFTEST=y:

  [    2.687095] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
  [    2.687097] crc32: self tests passed, processed 225944 bytes in 278177 nsec
  [    2.687383] crc32c: CRC_LE_BITS = 64
  [    2.687385] crc32c: self tests passed, processed 225944 bytes in 141708 nsec
  [    7.336771] crc32_combine: 113072 self tests passed
  [   12.050479] crc32c_combine: 113072 self tests passed
  [   17.633089] alg: No test for crc32 (crc32-pclmul)

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agolib: crc32: add functionality to combine two crc32{, c}s in GF(2)
Daniel Borkmann [Wed, 30 Oct 2013 10:50:49 +0000 (11:50 +0100)]
lib: crc32: add functionality to combine two crc32{, c}s in GF(2)

This patch adds a combinator to merge two or more crc32{,c}s
into a new one. This is useful for checksum computations of
fragmented skbs that use crc32/crc32c as checksums.

The arithmetics for combining both in the GF(2) was taken and
slightly modified from zlib. Only passing two crcs is insufficient
as two crcs and the length of the second piece is needed for
merging. The code is made generic, so that only polynomials
need to be passed for crc32_le resp. crc32c_le.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agolib: crc32: clean up spacing in test cases
Daniel Borkmann [Wed, 30 Oct 2013 10:50:48 +0000 (11:50 +0100)]
lib: crc32: clean up spacing in test cases

This is nothing more but a whitepace cleanup, as 80 chars is not a
hard but soft limit, and otherwise makes the test cases array really
look ugly. So fix it up.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec...
David S. Miller [Sat, 2 Nov 2013 06:13:48 +0000 (02:13 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Conflicts:
net/xfrm/xfrm_policy.c

Minor merge conflict in xfrm_policy.c, consisting of overlapping
changes which were trivial to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: bond_get_size() returns wrong size
Dan Carpenter [Fri, 1 Nov 2013 10:18:44 +0000 (13:18 +0300)]
bonding: bond_get_size() returns wrong size

There is an extra semi-colon so bond_get_size() doesn't return the
correct value.

Fixes: ec76aa49855f ('bonding: add Netlink support active_slave option')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'cdc_ncm'
David S. Miller [Sat, 2 Nov 2013 06:02:19 +0000 (02:02 -0400)]
Merge branch 'cdc_ncm'

Bjørn Mork says:

====================
cdc_ncm: many small and mostly trivial fixes

This series ended up longer than expected, and it is still not
complete. There is more to come when time allows...

Most changes are trivial. Notable non-trivial changes are
 - removed filtering of identical speed notifications
 - tx_max calulation is changed to count the pad byte if
   necessary, and respect the device limit as an absolute
   upper limit even if it is too low according to the spec
 - remove the bug preventing SET_MAX_DATAGRAM_SIZE from having
   any effect
 - drop the pad-to-max if ZLPs are enabled
 - the driver specific VERSION is dropped
 - dev->hard_mtu is set to tx_max instead of max_datagram_size
   causing usbnet to calculate the qlen based on the real max
   size of tx skbs

This series has been tested, along with the previously posted
cdc_mbim series, on the NCM and MBIM devices I have:
 - Ericsson F5521gw (NCM)
 - Huawei E367 (MBIM)
 - D-Link DWM-156 A7 (MBIM w/ too low dwNtb{In,Out}MaxSize bug)
 - Sierra Wireless MC7710 (MBIM w/ ZLP and CDC Union bugs)

Apart from the D-Link modem dropping a lot less oversized
frames with the fix dedicated to it, there are no end user
noticable functional changes as a result of this series.  But
all the non-trivial changes I listed above are of course
detectable by users looking at that specific area (except maybe
the removed speed notification, which requires a device sending
duplicates to be noticable - I don't have any such device).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: no not set tx_max higher than the device supports
Bjørn Mork [Fri, 1 Nov 2013 10:17:01 +0000 (11:17 +0100)]
net: cdc_ncm: no not set tx_max higher than the device supports

There are MBIM devices out there reporting

  dwNtbInMaxSize=2048 dwNtbOutMaxSize=2048

and since the spec require a datagram max size of at least
2048, this means that a full sized datagram will never fit.

Still, sending larger NTBs than the device supports is not
going to help.  We do not have any other options than either
 a) refusing to bindi, or
 b) respect the insanely low value.

Alternative b will at least make these devices work, so go
for it.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: improve bind error debug messages
Bjørn Mork [Fri, 1 Nov 2013 10:17:00 +0000 (11:17 +0100)]
net: cdc_ncm: improve bind error debug messages

Make it a bit easier for users to figure out what goes
wrong when bind fails.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: return proper error if setup fails
Bjørn Mork [Fri, 1 Nov 2013 10:16:59 +0000 (11:16 +0100)]
net: cdc_ncm: return proper error if setup fails

Most setup errors are ignored to ensure maximum firmware
compatibilty.  But GET_NTB_PARAMETERS and the functional
descriptors are required.  Use proper error codes and
log level if these fail.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: refactoring cdc_ncm_setup
Bjørn Mork [Fri, 1 Nov 2013 10:16:58 +0000 (11:16 +0100)]
net: cdc_ncm: refactoring cdc_ncm_setup

Rewriting the "set max datagram" part of dc_ncm_setup to
separate the selection and validatation of the size from
the code which optionally informs the device of this
value.  This ensures that we use the correct value
regardless of device support for the get and set commands.

Removing some of the many indent levels while doing this
to make the code more readable.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: drop "extern" from header declarations
Bjørn Mork [Fri, 1 Nov 2013 10:16:57 +0000 (11:16 +0100)]
net: cdc_ncm: drop "extern" from header declarations

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: endian convert constants instead of variables
Bjørn Mork [Fri, 1 Nov 2013 10:16:56 +0000 (11:16 +0100)]
net: cdc_ncm: endian convert constants instead of variables

Converting the constants used in these comparisons at build
time instead of converting the variables for every received
frame at run time.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: log signatures in hex
Bjørn Mork [Fri, 1 Nov 2013 10:16:55 +0000 (11:16 +0100)]
net: cdc_ncm: log signatures in hex

These signatures are well known bit patterns, mostly made up
of ascii characters.  Mentally parsing works best if they
are printed in hex.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: use netif_* and dev_* instead of pr_*
Bjørn Mork [Fri, 1 Nov 2013 10:16:54 +0000 (11:16 +0100)]
net: cdc_ncm: use netif_* and dev_* instead of pr_*

Take advantage of standard device name prefixing and
netdevice msglvl control where possible.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: log the length we warn about
Bjørn Mork [Fri, 1 Nov 2013 10:16:53 +0000 (11:16 +0100)]
net: cdc_ncm: log the length we warn about

Fix cut'n'paste typo.  Log the bogus length and not the
irrelevant signature.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: set correct dev->hard_mtu
Bjørn Mork [Fri, 1 Nov 2013 10:16:52 +0000 (11:16 +0100)]
net: cdc_ncm: set correct dev->hard_mtu

usbnet use the hard_mtu value for sizing the tx queue and nothing
else.  We will be transmitting buffers of up to tx_max size, so
that's the proper value to give usbnet.

The individual datagram size is completely irrelevant here.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: remove ethtool ops
Bjørn Mork [Fri, 1 Nov 2013 10:16:51 +0000 (11:16 +0100)]
net: cdc_ncm: remove ethtool ops

No need to keep this code duplicated from usbnet.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: remove probe and disconnect wrappers
Bjørn Mork [Fri, 1 Nov 2013 10:16:50 +0000 (11:16 +0100)]
net: cdc_ncm: remove probe and disconnect wrappers

These functions were merely wrappers around the usbnet
variants.  Remove them.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: no point in filling up the NTBs if we send ZLPs
Bjørn Mork [Fri, 1 Nov 2013 10:16:49 +0000 (11:16 +0100)]
net: cdc_ncm: no point in filling up the NTBs if we send ZLPs

Padding NTBs to max size is part of the support for devices
optimizing their DMA transfers. This optimization depends on
max sized NTBs not being ZLP terminated. So we are much better
off dropping the padding if we are going to send a ZLP anyway.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: only the control intf can be probed
Bjørn Mork [Fri, 1 Nov 2013 10:16:48 +0000 (11:16 +0100)]
net: cdc_ncm: only the control intf can be probed

The probed interface must be the master/control interface of the
function.  Make this explicit and simplify redundant tests.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: remove descriptor pointers
Bjørn Mork [Fri, 1 Nov 2013 10:16:47 +0000 (11:16 +0100)]
net: cdc_ncm: remove descriptor pointers

header_desc was completely unused and union_desc was never used
outside cdc_ncm_bind_common.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: fix SET_MAX_DATAGRAM_SIZE
Bjørn Mork [Fri, 1 Nov 2013 10:16:46 +0000 (11:16 +0100)]
net: cdc_ncm: fix SET_MAX_DATAGRAM_SIZE

We need to inform the device about the *new* value, not the
old one.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: remove ncm_parm field
Bjørn Mork [Fri, 1 Nov 2013 10:16:45 +0000 (11:16 +0100)]
net: cdc_ncm: remove ncm_parm field

Moving the call to cdc_ncm_setup() after the endpoint
setup removes the last remaining reference to ncm_parm
outside cdc_ncm_setup.

Collecting all the ncm_parm based calculations in
cdc_ncm_setup improves readability.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: remove tx_speed and rx_speed fields
Bjørn Mork [Fri, 1 Nov 2013 10:16:44 +0000 (11:16 +0100)]
net: cdc_ncm: remove tx_speed and rx_speed fields

These fields are only used to prevent printing the same speeds
multiple times if we receive multiple identical speed notifications.

The value of these printk's is questionable, and even more so when
we filter out some of the notifications sent us by the firmware. If
we are going to print any of these, then we should print them all.

Removing little used fields is a bonus.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: remove unused udev field
Bjørn Mork [Fri, 1 Nov 2013 10:16:43 +0000 (11:16 +0100)]
net: cdc_ncm: remove unused udev field

We already use the usbnet udev field everywhere this could have
been used.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: remove redundant netdev field
Bjørn Mork [Fri, 1 Nov 2013 10:16:42 +0000 (11:16 +0100)]
net: cdc_ncm: remove redundant netdev field

Too many pointers back and forth are likely to confuse developers,
creating subtle bugs whenever we forget to syncronize them all.

As a usbnet driver, we should stick with the standard struct
usbnet fields as much as possible.  The netdevice is one such
field.

Cc: Greg Suarez <gsuarez@smithmicro.com>
Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: remove redundant endpoint pointers
Bjørn Mork [Fri, 1 Nov 2013 10:16:41 +0000 (11:16 +0100)]
net: cdc_ncm: remove redundant endpoint pointers

No need to duplicate stuff already in the common usbnet
struct.  We still need to keep our special find_endpoints
function because we need explicit control over the selected
altsetting.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: remove redundant "intf" field
Bjørn Mork [Fri, 1 Nov 2013 10:16:40 +0000 (11:16 +0100)]
net: cdc_ncm: remove redundant "intf" field

This is always a duplicate of the "control" field. It causes
confusion wrt intf_data updates and cleanups.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: add include protection to cdc_ncm.h
Bjørn Mork [Fri, 1 Nov 2013 10:16:39 +0000 (11:16 +0100)]
net: cdc_ncm: add include protection to cdc_ncm.h

This makes it a lot easier to test modified versions

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ncm: simplify and optimize frame padding
Bjørn Mork [Fri, 1 Nov 2013 10:16:38 +0000 (11:16 +0100)]
net: cdc_ncm: simplify and optimize frame padding

We can avoid the costly division for the common case where
we pad the frame to tx_max size as long as we ensure that
tx_max is either the device specified dwNtbOutMaxSize or not
a multiplum of wMaxPacketSize.

Using the preconverted 'maxpacket' field avoids converting
wMaxPacketSize to CPU endianness for every transmitted frame

And since we only will hit the one byte padding rule for short
frames, we can drop testing the skb for tailroom.

The change means that tx_max now represents the real maximum
skb size, enabling us to allocate the correct size instead of
always making room for one extra byte.

Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_mbim: change the default to send ZLPs
Bjørn Mork [Thu, 31 Oct 2013 14:56:11 +0000 (15:56 +0100)]
net: cdc_mbim: change the default to send ZLPs

A number of devices in the wild have turned out to require ZLPs.
Even if this is a spec violation, our priority is to make any
device work as good as possible. Devices needing ZLPs will fail
to receive any full sized frame we send. On the other hand,
devices which do not need the ZLP will still work if we send
them.

This gives us no other option than sending ZLPs by default.

This will prevent devices conforming to the spec from making the
optimizations which are possible without ZLPs.  Adding known
such devices to a whitelist, to avoid the possible negative
impact of the new spec violating default.

Cc: Greg Suarez <gsuarez@smithmicro.com>
Cc: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_mbim: handle IPv6 Neigbor Solicitations
Bjørn Mork [Thu, 31 Oct 2013 14:56:10 +0000 (15:56 +0100)]
net: cdc_mbim: handle IPv6 Neigbor Solicitations

MBIM is a point-to-point protocol transporting raw IP packets
with no L2 headers. Only IPv4 and IPv6 are supported. ARP in
particular is not, which is quite logical given the lack of
L2 headers.

The driver still emulates an ethernet interface, dropping all
unsupported protocols, and avoiding neigbour resolving by
setting the IFF_NOARP flag.

The MBIM specification does not explicitly forbid IPv6 Neighbor
Discovery, and it seems the other OS support will respond to
Neighbor Solicitations on MBIM links. There are therefore
buggy devices out there, which despite the pointlessness, still
require Neighbor Discovery for IPv6 over MBIM.

This is incompatible with the IFF_NOARP flag which disables
both ARP and ND.  We cannot support ARP in any case, so we
have to keep that flag. This patch implements a workaround
for the buggy devices, letting the driver respond directly
to Neighbor Solicitations from the device.

This is not optimal, but will have minimal effect on any sane
device.

Cc: Greg Suarez <gsuarez@smithmicro.com>
Reported-and-tested-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosmsc9420: replace printk with netdev_ calls
Ben Boeckel [Fri, 1 Nov 2013 12:53:36 +0000 (08:53 -0400)]
smsc9420: replace printk with netdev_ calls

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosmc91c92_cs: replace printk with netdev_ calls
Ben Boeckel [Fri, 1 Nov 2013 12:53:35 +0000 (08:53 -0400)]
smc91c92_cs: replace printk with netdev_ calls

Also snipes some trailing whitespace.

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosmc9194: replace printk with netdev_ calls
Ben Boeckel [Fri, 1 Nov 2013 12:53:34 +0000 (08:53 -0400)]
smc9194: replace printk with netdev_ calls

Also snipes some whitespace errors.

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosmsc911x: replace printk with netdev_ calls
Ben Boeckel [Fri, 1 Nov 2013 12:53:33 +0000 (08:53 -0400)]
smsc911x: replace printk with netdev_ calls

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosmc911x: replace printk with netdev_ calls
Ben Boeckel [Fri, 1 Nov 2013 12:53:32 +0000 (08:53 -0400)]
smc911x: replace printk with netdev_ calls

Also fixes an incorrect function comment (probably copy/paste).

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosmc91x: replace printk with netdev_ calls
Ben Boeckel [Fri, 1 Nov 2013 12:53:31 +0000 (08:53 -0400)]
smc91x: replace printk with netdev_ calls

Also snipes some whitespace errors.

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoepic100: replace printk with netdev_ calls
Ben Boeckel [Fri, 1 Nov 2013 12:53:30 +0000 (08:53 -0400)]
epic100: replace printk with netdev_ calls

Also snipes some whitespace errors.

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Sat, 2 Nov 2013 05:16:15 +0000 (01:16 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to e1000, igb, ixgbe and ixgbevf.

Hong Zhiguo provides a fix for e1000 where tx_ring and adapter->tx_ring
are already of type "struct e1000_tx_ring" so no need to divide by
e1000_tx_ring size in the idx calculation.

Emil provides a fix for ixgbevf to remove a redundant workaround related
to header split and a fix for ixgbe to resolve an issue where the MTA table
can be cleared when the interface is reset while in promisc mode.

Todd provides a fix for igb to prevent ethtool from writing to the iNVM
in i210/i211 devices.  This issue was reported by Marek Vasut <marex@denx.de>.

Anton Blanchard provides a fix for ixgbe to reduce memory consumption
with larger page sizes, seen on PPC.

Don provides a cleanup in ixgbe to replace the IXGBE_DESC_UNUSED macro with
the inline function ixgbevf_desc_unused() to make the logic a bit more
readable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc...
David S. Miller [Sat, 2 Nov 2013 05:10:54 +0000 (01:10 -0400)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
A single fix by Alexandre Rames for the recent changes to TSO.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoixgbe: fix inconsistent clearing of the multicast table
Emil Tantilov [Sat, 26 Oct 2013 08:13:20 +0000 (08:13 +0000)]
ixgbe: fix inconsistent clearing of the multicast table

This patch resolves an issue where the MTA table can be cleared when the
interface is reset while in promisc mode. As result IPv6 traffic between
VFs will be interrupted.

This patch makes the update of the MTA table unconditional to avoid the
inconsistent clearing on reset.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: cleanup IXGBE_DESC_UNUSED
Don Skidmore [Wed, 23 Oct 2013 02:17:52 +0000 (02:17 +0000)]
ixgbe: cleanup IXGBE_DESC_UNUSED

This patch just replaces the IXGBE_DESC_UNUSED macro with a like named
inline function ixgbevf_desc_unused. The inline function makes the logic
a bit more readable.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: Reduce memory consumption with larger page sizes
Anton Blanchard [Tue, 22 Oct 2013 18:34:01 +0000 (18:34 +0000)]
ixgbe: Reduce memory consumption with larger page sizes

The ixgbe driver allocates pages for its receive rings. It currently
uses 512 pages, regardless of page size. During receive handling it
adds the unused part of the page back into the rx ring, avoiding the
need for a new allocation.

On a ppc64 box with 64 threads and 64kB pages, we end up with
512 entries * 64 rx queues * 64kB = 2GB memory used. Even more of a
concern is that we use up 2GB of IOMMU space in order to map all this
memory.

The driver makes a number of decisions based on if PAGE_SIZE is less
than 8kB, so use this as the breakpoint and only allocate 128 entries
on 8kB or larger page sizes.

Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoigb: Don't let ethtool try to write to iNVM in i210/i211
Fujinaka, Todd [Wed, 23 Oct 2013 05:52:11 +0000 (05:52 +0000)]
igb: Don't let ethtool try to write to iNVM in i210/i211

Don't let ethtool try to write to iNVM in i210/i211.

This fixes an issue seen by Marek Vasut.

Reported-by: Marek Vasut <marex@denx.de>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbevf: remove redundant workaround
Emil Tantilov [Tue, 29 Oct 2013 08:31:49 +0000 (08:31 +0000)]
ixgbevf: remove redundant workaround

This patch removes a workaround related to header split, which is redundant
because the driver does not support splitting packet headers on Rx.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000: fix wrong queue idx calculation
Hong Zhiguo [Tue, 22 Oct 2013 18:32:56 +0000 (18:32 +0000)]
e1000: fix wrong queue idx calculation

tx_ring and adapter->tx_ring are already of type "struct
e1000_tx_ring *"

Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agosfc: Fix DMA unmapping issue with firmware assisted TSO
Alexandre Rames [Thu, 31 Oct 2013 12:42:32 +0000 (12:42 +0000)]
sfc: Fix DMA unmapping issue with firmware assisted TSO

When using firmware assisted TSO, we use a single DMA mapping for
the linear area of a TSO skb.

We still have to segment the super-packet and insert a descriptor
containing the original headers before each segment of payload, so we
can unmap the linear area only after the last segment is completed.
The unmapping information for the linear area is therefore associated
with the last header descriptor.

We calculate the DMA address to unmap from using the map length and
the invariant that the end of the DMA mapping matches the end of
the data referenced by the last descriptor.  But this invariant is
broken when there is TCP payload in the linear area.

Fix this by adding and using an explicit dma_offset field.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agoMerge branch '6lowpan'
David S. Miller [Wed, 30 Oct 2013 21:18:57 +0000 (17:18 -0400)]
Merge branch '6lowpan'

Alexander Aring says:

====================
This patch series cleanup the 6LoWPAN header creation and extend the use
of skb_*_header functions.

Patch 2/4 fix issues of parsing the mac header. The ieee802.15.4 header
has a dynamic size which depends on frame control bits. This patch replaces the
static mac header len calculation with a dynamic one.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago6lowpan: cleanup skb copy data
Alexander Aring [Wed, 30 Oct 2013 08:18:24 +0000 (09:18 +0100)]
6lowpan: cleanup skb copy data

This patch drops the direct memcpy on skb and uses the right skb
memcpy functions. Also remove an unnecessary check if plen is non zero.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago6lowpan: set 6lowpan network and transport header
Alexander Aring [Wed, 30 Oct 2013 08:18:23 +0000 (09:18 +0100)]
6lowpan: set 6lowpan network and transport header

This is necessary to access network header with the skb_network_header
function instead of calculate the position with mac_len, etc.
Do the same for the transport header, when we replace the IPv6 header
with the 6LoWPAN header.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago6lowpan: set and use mac_len for mac header length
Alexander Aring [Wed, 30 Oct 2013 08:18:22 +0000 (09:18 +0100)]
6lowpan: set and use mac_len for mac header length

Set the mac header length while creating the 802.15.4 mac header.

Drop the function for recalculate mac header length in upper layers
which was static and works for intra pan communication only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago6lowpan: remove unnecessary set of headers
Alexander Aring [Wed, 30 Oct 2013 08:18:21 +0000 (09:18 +0100)]
6lowpan: remove unnecessary set of headers

On receiving side we don't need to set any headers in skb because the
6LoWPAN layer do not access it. Currently these values will set twice
after calling netif_rx.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: remove the unnecessary statement in find_match()
Duan Jiong [Wed, 30 Oct 2013 07:39:26 +0000 (15:39 +0800)]
ipv6: remove the unnecessary statement in find_match()

After reading the function rt6_check_neigh(), we can
know that the RT6_NUD_FAIL_SOFT can be returned only
when the IS_ENABLE(CONFIG_IPV6_ROUTER_PREF) is false.
so in function find_match(), there is no need to execute
the statement !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF).

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agomac802154: Use pr_err(...) rather than printk(KERN_ERR ...)
Chen Weilong [Wed, 30 Oct 2013 07:28:07 +0000 (15:28 +0800)]
mac802154: Use pr_err(...) rather than printk(KERN_ERR ...)

This change is inspired by checkpatch.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobgmac: pass received packet to the netif instead of copying it
Rafał Miłecki [Wed, 30 Oct 2013 07:00:00 +0000 (08:00 +0100)]
bgmac: pass received packet to the netif instead of copying it

Copying whole packets with skb_copy_from_linear_data_offset is a pretty
bad idea. CPU was spending time in __copy_user_common and network
performance was lower. With the new solution iperf-measured speed
increased from 116Mb/s to 134Mb/s.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: remove two indentation levels in tipc_recv_msg routine
Ying Xue [Wed, 30 Oct 2013 03:26:57 +0000 (11:26 +0800)]
tipc: remove two indentation levels in tipc_recv_msg routine

The message dispatching part of tipc_recv_msg() is wrapped layers of
while/if/if/switch, causing out-of-control indentation and does not
look very good. We reduce two indentation levels by separating the
message dispatching from the blocks that checks link state and
sequence numbers, allowing longer function and arg names to be
consistently indented without wrapping. Additionally we also rename
"cont" label to "discard" and add one new label called "unlock_discard"
to make code clearer. In all, these are cosmetic changes that do not
alter the operation of TIPC in any way.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Andreas Bofjäll <andreas.bofjall@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoarc_emac: drop redundant mac address check
Luka Perkov [Tue, 29 Oct 2013 23:11:00 +0000 (00:11 +0100)]
arc_emac: drop redundant mac address check

Checking if MAC address is valid using is_valid_ether_addr() is already done in
of_get_mac_address(). While at it, reorganize checking so it matches checks in
other drivers.

Signed-off-by: Luka Perkov <luka@openwrt.org>
CC: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agomvneta: drop redundant mac address check
Luka Perkov [Tue, 29 Oct 2013 23:10:01 +0000 (00:10 +0100)]
mvneta: drop redundant mac address check

Checking if MAC address is valid using is_valid_ether_addr() is already done in
of_get_mac_address().

Signed-off-by: Luka Perkov <luka@openwrt.org>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoocteon_mgmt: drop redundant mac address check
Luka Perkov [Tue, 29 Oct 2013 23:09:12 +0000 (00:09 +0100)]
octeon_mgmt: drop redundant mac address check

Checking if MAC address is valid using is_valid_ether_addr() is already done in
of_get_mac_address().

Signed-off-by: Luka Perkov <luka@openwrt.org>
Acked-by: David Daney <david.daney@cavium.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: temporarily disable Fast Open on SYN timeout
Yuchung Cheng [Tue, 29 Oct 2013 17:09:05 +0000 (10:09 -0700)]
tcp: temporarily disable Fast Open on SYN timeout

Fast Open currently has a fall back feature to address SYN-data being
dropped but it requires the middle-box to pass on regular SYN retry
after SYN-data. This is implemented in commit aab487435 ("net-tcp:
Fast Open client - detecting SYN-data drops")

However some NAT boxes will drop all subsequent packets after first
SYN-data and blackholes the entire connections.  An example is in
commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast
Open".

The sender should note such incidents and fall back to use the regular
TCP handshake on subsequent attempts temporarily as well: after the
second SYN timeouts the original Fast Open SYN is most likely lost.
When such an event recurs Fast Open is disabled based on the number of
recurrences exponentially.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Tue, 29 Oct 2013 22:57:49 +0000 (18:57 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to vxlan, net, ixgbe, ixgbevf, and i40e.

Joseph provides a single patch against vxlan which removes the burden
from the NIC drivers to check if the vxlan driver is enabled in the
kernel and also makes available the vxlan headrooms to the drivers.

Jacob provides majority of the patches, with patches against net, ixgbe
and ixgbevf.  His net patch adds might_sleep() call to napi_disable so
that every use of napi_disable during atomic context will be visible.
Then Jacob provides a patch to fix qv_lock_napi call in
ixgbe_napi_disable_all.  The other ixgbe patches cleanup
ixgbe_check_minimum_link function to correctly show that there are some
minor loss of encoding, even though we don't calculate it and remove
unnecessary duplication of PCIe bandwidth display.  Lastly, Jacob
provides 4 patches against ixgbevf to add ixgbevf_rx_skb in line with
how ixgbe handles the variations on how packets can be received, adds
support in order to track how many packets were cleaned during busy poll
as part of the extended statistics.

Wei Yongjun provides a fix for i40e to return -ENOMEN in the memory
allocation error handling case instead of returning 0, as done
elsewhere in this function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mvmdio: doc: mvmdio now used by mv643xx_eth
Leigh Brown [Tue, 29 Oct 2013 09:33:34 +0000 (09:33 +0000)]
net: mvmdio: doc: mvmdio now used by mv643xx_eth

Amend the documentation in the mvmdio driver to note the fact
that it is now used by both the mvneta and mv643xx_eth drivers.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mvmdio: slight optimisation of orion_mdio_write
Leigh Brown [Tue, 29 Oct 2013 09:33:33 +0000 (09:33 +0000)]
net: mvmdio: slight optimisation of orion_mdio_write

Make only a single call to mutex_unlock in orion_mdio_write.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mvmdio: orion_mdio_ready: remove manual poll
Leigh Brown [Tue, 29 Oct 2013 09:33:32 +0000 (09:33 +0000)]
net: mvmdio: orion_mdio_ready: remove manual poll

Replace manual poll of MVMDIO_SMI_READ_VALID with a call to
orion_mdio_wait_ready.  This ensures a consistent timeout,
eliminates a busy loop, and allows for use of interrupts on
systems that support them.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mvmdio: make orion_mdio_wait_ready consistent
Leigh Brown [Tue, 29 Oct 2013 09:33:31 +0000 (09:33 +0000)]
net: mvmdio: make orion_mdio_wait_ready consistent

Amend orion_mdio_wait_ready so that the same timeout is used when
polling or using wait_event_timeout.  Set the timeout to 1ms.

Replace udelay with usleep_range to avoid a busy loop, and set the
polling interval range as 45us to 55us, so that the first sleep
will be enough in almost all cases.

Generate the same log message at timeout when polling or using
wait_event_timeout.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/benet: Make lancer_wait_ready() static
Gavin Shan [Tue, 29 Oct 2013 09:30:57 +0000 (17:30 +0800)]
net/benet: Make lancer_wait_ready() static

The function needn't to be public, so to make it as static.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/benet: Remove interface type
Gavin Shan [Tue, 29 Oct 2013 09:30:56 +0000 (17:30 +0800)]
net/benet: Remove interface type

The interface type, which is being traced by "struct be_adapter::
if_type", isn't used currently. So we can remove that safely
according to Sathya's comments.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonetconsole: Convert to pr_<level>
Joe Perches [Mon, 28 Oct 2013 19:53:21 +0000 (12:53 -0700)]
netconsole: Convert to pr_<level>

Use a more current logging style.

Convert printks to pr_<level>.

Consolidate multiple printks into a single printk to avoid
any possible dmesg interleaving.  Add a default "event" msg
in case the listed types are ever expanded.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sched: cls_bpf: add BPF-based classifier
Daniel Borkmann [Mon, 28 Oct 2013 15:43:02 +0000 (16:43 +0100)]
net: sched: cls_bpf: add BPF-based classifier

This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:

     0: No match
    -1: Select classid given in "tc filter ..." command
  else: flowid, overwrite the default one

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobgmac: separate RX descriptor setup code into a new function
Rafał Miłecki [Mon, 28 Oct 2013 13:40:29 +0000 (14:40 +0100)]
bgmac: separate RX descriptor setup code into a new function

This cleans code a bit and will be useful when allocating buffers in
other places (like RX path, to avoid skb_copy_from_linear_data_offset).

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoi40e: fix error return code in i40e_probe()
Wei Yongjun [Tue, 24 Sep 2013 05:17:25 +0000 (05:17 +0000)]
i40e: fix error return code in i40e_probe()

Fix to return -ENOMEM in the memory alloc error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbevf: Add zero_base handler to network statistics
Don Skidmore [Wed, 25 Sep 2013 08:03:09 +0000 (08:03 +0000)]
ixgbevf: Add zero_base handler to network statistics

This patch removes the need to keep a zero_base variable in the adapter
structure. Now we just use two different macros to set the non-zero and
zero base. This adds to readability and shortens some of the structure
initialization under 80 columns. The gathering of status for ethtool was
slightly modified to again better fit into 80 columns and become a bit
more readable.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbevf: add BP_EXTENDED_STATS for CONFIG_NET_RX_BUSY_POLL
Jacob Keller [Sat, 21 Sep 2013 06:24:25 +0000 (06:24 +0000)]
ixgbevf: add BP_EXTENDED_STATS for CONFIG_NET_RX_BUSY_POLL

This patch adds the extended statistics similar to the ixgbe driver. These
statistics keep track of how often the busy polling yields, as well as how many
packets are cleaned or missed by the polling routine.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbevf: implement CONFIG_NET_RX_BUSY_POLL
Jacob Keller [Sat, 21 Sep 2013 06:24:20 +0000 (06:24 +0000)]
ixgbevf: implement CONFIG_NET_RX_BUSY_POLL

This patch enables CONFIG_NET_RX_BUSY_POLL support in the VF code. This enables
sockets which have enabled the SO_BUSY_POLL socket option to use the
ndo_busy_poll_recv operation which could result in lower latency, at the cost
of higher CPU utilization, and increased power usage. This support is similar
to how the ixgbe driver works.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbevf: have clean_rx_irq return total_rx_packets cleaned
Jacob Keller [Sat, 21 Sep 2013 06:24:14 +0000 (06:24 +0000)]
ixgbevf: have clean_rx_irq return total_rx_packets cleaned

Rather than return true/false indicating whether there was budget left, return
the total packets cleaned. This currently has no use, but will be used in a
following patch which enables CONFIG_NET_RX_BUSY_POLL support in order to track
how many packets were cleaned during the busy poll as part of the extended
statistics.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbevf: add ixgbevf_rx_skb
Jacob Keller [Sat, 21 Sep 2013 06:24:09 +0000 (06:24 +0000)]
ixgbevf: add ixgbevf_rx_skb

This patch adds ixgbevf_rx_skb in line with how ixgbe handles the variations on
how packets can be received. It will be extended in a following patch for
CONFIG_NET_RX_BUSY_POLL support.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: remove unnecessary duplication of PCIe bandwidth display
Jacob Keller [Fri, 18 Oct 2013 05:09:24 +0000 (05:09 +0000)]
ixgbe: remove unnecessary duplication of PCIe bandwidth display

This patch removes the unnecessary display of PCIe bandwidth twice. Since the
ixgbe_check_minimum_link does a better job, and ensures accurate detection on
even complex chains, this older check is no longer necessary.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: show <2% for encoding loss on PCIe Gen3
Jacob Keller [Fri, 18 Oct 2013 05:09:19 +0000 (05:09 +0000)]
ixgbe: show <2% for encoding loss on PCIe Gen3

This patch updates the ixgbe_check_minimum_link function to correctly show that
there is some minor loss of encoding, even though we don't calculate it in the
max GT/s equation. It is small enough to not bother, but is better to report it
than not.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: fix qv_lock_napi call in ixgbe_napi_disable_all
Jacob Keller [Sat, 21 Sep 2013 05:05:44 +0000 (05:05 +0000)]
ixgbe: fix qv_lock_napi call in ixgbe_napi_disable_all

ixgbe_napi_disable_all calls napi_disable on each queue, however the busy
polling code introduced a local_bh_disable()d context around the napi_disable.
The original author did not realize that napi_disable might sleep, which would
cause a sleep while atomic BUG. In addition, on a single processor system, the
ixgbe_qv_lock_napi loop shouldn't have to mdelay. This patch adds an
ixgbe_qv_disable along with a new IXGBE_QV_STATE_DISABLED bit, which it uses to
indicate to the poll and napi routines that the q_vector has been disabled. Now
the ixgbe_napi_disable_all function will wait until all pending work has been
finished and prevent any future work from being started.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: Alexander Duyck <alexander.duyck@intel.com>
Cc: Hyong-Youb Kim <hykim@myri.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Dmitry Kravkov <dmitry@broadcom.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agonet: add might_sleep() call to napi_disable
Jacob Keller [Sat, 21 Sep 2013 05:05:39 +0000 (05:05 +0000)]
net: add might_sleep() call to napi_disable

napi_disable uses an msleep() call to wait for outstanding napi work to be
finished after setting the disable bit. It does not always sleep incase there
was no outstanding work. This resulted in a rare bug in ixgbe_down operation
where a napi_disable call took place inside of a local_bh_disable()d context.
In order to enable easier detection of future sleep while atomic BUGs, this
patch adds a might_sleep() call, so that every use of napi_disable during
atomic context will be visible.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: Alexander Duyck <alexander.duyck@intel.com>
Cc: Hyong-Youb Kim <hykim@myri.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Dmitry Kravkov <dmitry@broadcom.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agovxlan: Have the NIC drivers do less work for offloads
Joseph Gasparakis [Thu, 24 Oct 2013 06:27:10 +0000 (06:27 +0000)]
vxlan: Have the NIC drivers do less work for offloads

This patch removes the burden from the NIC drivers to check if the
vxlan driver is enabled in the kernel and also makes available
the vxlan headrooms to them.

Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agonet: esp{4,6}: get rid of struct esp_data
Mathias Krause [Fri, 18 Oct 2013 10:09:05 +0000 (12:09 +0200)]
net: esp{4,6}: get rid of struct esp_data

struct esp_data consists of a single pointer, vanishing the need for it
to be a structure. Fold the pointer into 'data' direcly, removing one
level of pointer indirection.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
11 years agonet: esp{4,6}: remove padlen from struct esp_data
Mathias Krause [Fri, 18 Oct 2013 10:09:04 +0000 (12:09 +0200)]
net: esp{4,6}: remove padlen from struct esp_data

The padlen member of struct esp_data is always zero. Get rid of it.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
11 years agonet, mc: fix the incorrect comments in two mc-related functions
Zhi Yong Wu [Mon, 28 Oct 2013 08:15:50 +0000 (16:15 +0800)]
net, mc: fix the incorrect comments in two mc-related functions

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet, iovec: fix the incorrect comment in memcpy_fromiovecend()
Zhi Yong Wu [Mon, 28 Oct 2013 06:01:50 +0000 (14:01 +0800)]
net, iovec: fix the incorrect comment in memcpy_fromiovecend()

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet, datagram: fix the incorrect comment in zerocopy_sg_from_iovec()
Zhi Yong Wu [Mon, 28 Oct 2013 06:01:49 +0000 (14:01 +0800)]
net, datagram: fix the incorrect comment in zerocopy_sg_from_iovec()

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovxlan: silence one build warning
Zhi Yong Wu [Mon, 28 Oct 2013 06:01:48 +0000 (14:01 +0800)]
vxlan: silence one build warning

drivers/net/vxlan.c: In function ‘vxlan_sock_add’:
drivers/net/vxlan.c:2298:11: warning: ‘sock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/net/vxlan.c:2275:17: note: ‘sock’ was declared here
  LD      drivers/net/built-in.o

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK
Hannes Frederic Sowa [Sun, 27 Oct 2013 16:29:11 +0000 (17:29 +0100)]
ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK

UFO as well as UDP_CORK do not respect IP_PMTUDISC_DO and
IP_PMTUDISC_PROBE well enough.

UFO enabled packet delivery just appends all frags to the cork and hands
it over to the network card. So we just deliver non-DF udp fragments
(DF-flag may get overwritten by hardware or virtual UFO enabled
interface).

UDP_CORK does enqueue the data until the cork is disengaged. At this
point it sets the correct IP_DF and local_df flags and hands it over to
ip_fragment which in this case will generate an icmp error which gets
appended to the error socket queue. This is not reflected in the syscall
error (of course, if UFO is enabled this also won't happen).

Improve this by checking the pmtudisc flags before appending data to the
socket and if we still can fit all data in one packet when IP_PMTUDISC_DO
or IP_PMTUDISC_PROBE is set, only then proceed.

We use (mtu-fragheaderlen) to check for the maximum length because we
ensure not to generate a fragment and non-fragmented data does not need
to have its length aligned on 64 bit boundaries. Also the passed in
ip_options are already aligned correctly.

Maybe, we can relax some other checks around ip_fragment. This needs
more research.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovirtio_net: migrate mergeable rx buffers to page frag allocators
Michael Dalton [Mon, 28 Oct 2013 22:44:18 +0000 (15:44 -0700)]
virtio_net: migrate mergeable rx buffers to page frag allocators

The virtio_net driver's mergeable receive buffer allocator
uses 4KB packet buffers. For MTU-sized traffic, SKB truesize
is > 4KB but only ~1500 bytes of the buffer is used to store
packet data, reducing the effective TCP window size
substantially. This patch addresses the performance concerns
with mergeable receive buffers by allocating MTU-sized packet
buffers using page frag allocators. If more than MAX_SKB_FRAGS
buffers are needed, the SKB frag_list is used.

Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: Remove privacy config option.
David S. Miller [Tue, 29 Oct 2013 00:07:50 +0000 (20:07 -0400)]
ipv6: Remove privacy config option.

The code for privacy extentions is very mature, and making it
configurable only gives marginal memory/code savings in exchange
for obfuscation and hard to read code via CPP ifdef'ery.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch '6lowpan'
David S. Miller [Mon, 28 Oct 2013 23:48:29 +0000 (19:48 -0400)]
Merge branch '6lowpan'

Alexander Aring says:

====================
6lowpan: trivial changes

This patch series includes some trivial changes to prepare the 6lowpan stack
for upcomming patch-series which mainly fix fragmentation according to rfc4944
and udp handling(which is currently broken).

Changes since v3:
  - really fix intendation in patch 3/5

Changes since v2:
  - change intendation in patch 3/5
  - fix typo in 5/5 unecessary -> unnecessary
  - add missing 6lowpan tag in cover-letter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago6lowpan: remove unnecessary break
Alexander Aring [Mon, 28 Oct 2013 09:24:20 +0000 (10:24 +0100)]
6lowpan: remove unnecessary break

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago6lowpan: remove skb->dev assignment
Alexander Aring [Mon, 28 Oct 2013 09:24:19 +0000 (10:24 +0100)]
6lowpan: remove skb->dev assignment

This patch removes the assignment of skb->dev. We don't need it here because
we use the netdev_alloc_skb_ip_align function which already sets the
skb->dev.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago6lowpan: use netdev_alloc_skb instead dev_alloc_skb
Alexander Aring [Mon, 28 Oct 2013 09:24:18 +0000 (10:24 +0100)]
6lowpan: use netdev_alloc_skb instead dev_alloc_skb

This patch uses the netdev_alloc_skb instead dev_alloc_skb function and
drops the seperate assignment to skb->dev.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>