www.infradead.org Git - users/hch/misc.git/log
Andrew Lunn [Wed, 6 Jan 2016 19:11:18 +0000 (20:11 +0100)]
phy: Add API for {un}registering an mdio device to a bus.

Rather than have drivers directly manipulate the mii_bus structure,
provide an API for registering and unregistering devices on an MDIO
bus, and for performing lookups.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
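A minimal sketch of what such an API centralizes, assuming a bus that keeps one device pointer per address (32 addresses, as on a Clause 22 MDIO bus). The struct and function names below are hypothetical stand-ins for illustration, not the kernel's actual mdiobus interface:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MDIO_MAX_ADDR 32 /* Clause 22 MDIO buses have 5-bit addresses */

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mdio_dev_sketch {
    int addr;
};

struct mdio_bus_sketch {
    struct mdio_dev_sketch *map[MDIO_MAX_ADDR]; /* one slot per address */
};

/* Register a device at its address; fail if the slot is already taken. */
static int bus_register_device(struct mdio_bus_sketch *bus,
                               struct mdio_dev_sketch *dev)
{
    if (dev->addr < 0 || dev->addr >= MDIO_MAX_ADDR)
        return -EINVAL;
    if (bus->map[dev->addr])
        return -EBUSY;
    bus->map[dev->addr] = dev;
    return 0;
}

/* Unregister a device; only the device that owns the slot may clear it. */
static int bus_unregister_device(struct mdio_bus_sketch *bus,
                                 struct mdio_dev_sketch *dev)
{
    if (dev->addr < 0 || dev->addr >= MDIO_MAX_ADDR ||
        bus->map[dev->addr] != dev)
        return -EINVAL;
    bus->map[dev->addr] = NULL;
    return 0;
}

/* Look up whatever device lives at an address (NULL if none). */
static struct mdio_dev_sketch *bus_find_device(struct mdio_bus_sketch *bus,
                                               int addr)
{
    if (addr < 0 || addr >= MDIO_MAX_ADDR)
        return NULL;
    return bus->map[addr];
}
```

Because only these helpers write to the map, a driver can no longer leave it in an inconsistent state, which is the motivation the commit gives for not manipulating mii_bus directly.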
Andrew Lunn [Wed, 6 Jan 2016 19:11:17 +0000 (20:11 +0100)]
of: phy: Only register a phy device for phys

We will soon support devices other than phys on the MDIO bus. Look at
a child node's compatible string to determine whether it is a phy before
registering a phy device.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
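A hedged sketch of the kind of test this commit describes. It assumes the devicetree convention that a phy node's compatible string is "ethernet-phy-ieee802.3-c22", "ethernet-phy-ieee802.3-c45", or of the "ethernet-phy-idAAAA.BBBB" form, and that a child with no compatible string at all is still treated as a phy for backwards compatibility. The function name is hypothetical, and a node is reduced to a plain array of strings:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Decide whether an MDIO child node describes a phy, given its list of
 * compatible strings (n == 0 means the property is absent). */
static bool child_is_phy(const char *const *compat, size_t n)
{
    static const char id_prefix[] = "ethernet-phy-id";

    /* No compatible property at all: assume a phy (legacy bindings). */
    if (n == 0)
        return true;

    for (size_t i = 0; i < n; i++) {
        /* Explicit phy ID, "ethernet-phy-idAAAA.BBBB" form. */
        if (strncmp(compat[i], id_prefix, sizeof(id_prefix) - 1) == 0)
            return true;
        /* Generic Clause 22 / Clause 45 phy. */
        if (strcmp(compat[i], "ethernet-phy-ieee802.3-c22") == 0 ||
            strcmp(compat[i], "ethernet-phy-ieee802.3-c45") == 0)
            return true;
    }
    return false; /* some other MDIO device, e.g. a switch */
}
```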
Andrew Lunn [Wed, 6 Jan 2016 19:11:16 +0000 (20:11 +0100)]
phy: Add an mdio_device structure

Not all devices attached to an MDIO bus are phys, so add an
mdio_device structure to represent the generic parts of an MDIO
device, and embed this structure in phy_device.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
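The embedding can be sketched as follows, assuming heavily simplified versions of both structures (one or two fields each) and a locally defined container_of() that behaves like the kernel macro of the same name. Given a pointer to the generic part, code that knows the device is a phy can recover the enclosing phy_device; phy_from_mdio() is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic part shared by every device on the bus (simplified). */
struct mdio_device {
    int addr;
};

/* A phy embeds the generic part and adds phy-only state (simplified). */
struct phy_device {
    uint32_t phy_id;         /* contents of the ID registers */
    struct mdio_device mdio; /* deliberately not the first member */
};

/* Only valid when mdiodev is known to live inside a phy_device. */
static struct phy_device *phy_from_mdio(struct mdio_device *mdiodev)
{
    return container_of(mdiodev, struct phy_device, mdio);
}
```

Placing mdio after phy_id shows that the pointer arithmetic, not member order, is what makes the conversion work.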
Andrew Lunn [Wed, 6 Jan 2016 19:11:15 +0000 (20:11 +0100)]
mdio: Move allocation of interrupts into core

Have mdio_alloc() create the array of interrupt numbers and
initialize it to POLLING. This is what most MDIO drivers want, so
code can be removed from the drivers.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
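A sketch of the allocation side, assuming (as in the kernel's phy.h) that "polling" is encoded as PHY_POLL, i.e. -1, and that a bus has PHY_MAX_ADDR = 32 addresses. The SKETCH_* names and bus_alloc_sketch() are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_ADDR 32 /* plays the role of PHY_MAX_ADDR */
#define SKETCH_POLL (-1)   /* plays the role of PHY_POLL: no IRQ, poll */

struct bus_sketch {
    int irq[SKETCH_MAX_ADDR];
};

/* Allocate a bus with every address defaulting to polling, so a driver
 * only has to overwrite the entries that really have an interrupt. */
static struct bus_sketch *bus_alloc_sketch(void)
{
    struct bus_sketch *bus = calloc(1, sizeof(*bus));

    if (!bus)
        return NULL;
    for (int i = 0; i < SKETCH_MAX_ADDR; i++)
        bus->irq[i] = SKETCH_POLL;
    return bus;
}
```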
Andrew Lunn [Wed, 6 Jan 2016 19:11:14 +0000 (20:11 +0100)]
phy: mdio-octeon: Use devm_mdiobus_alloc_size()

Rather than use devm_kzalloc(), use the mdio helper function.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 6 Jan 2016 19:11:13 +0000 (20:11 +0100)]
phy: Centralise print about attached phy

Many Ethernet drivers contain the same netdev_info() print statement
about the attached phy. Move it into the phy device code. Additionally,
add a varargs function which can be used to append additional
information.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 6 Jan 2016 19:11:12 +0000 (20:11 +0100)]
phy: phy_{read|write}_mmd_indirect: get addr from phydev

The address of the device can be determined from the phydev structure,
rather than passing it as a parameter.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 6 Jan 2016 19:11:11 +0000 (20:11 +0100)]
net: dnet: Use phy_find_first() helper

Replace the open coded search for the first phy with a call to the
existing helper function.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 6 Jan 2016 19:11:10 +0000 (20:11 +0100)]
phy: add phydev_name() wrapper

Add a phydev_name() function to help with moving some structure members
out of phy_device.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 6 Jan 2016 19:11:09 +0000 (20:11 +0100)]
phy: Add phydev_err() and phydev_dbg() macros

In preparation for moving some of the phy_device structure members,
add macros for printing errors and debug information.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 6 Jan 2016 19:11:08 +0000 (20:11 +0100)]
phy: Use phy_read() instead of mdiobus_read()

Since we have a phydev, make use of it and the phy_read() function.
This will help with later refactoring.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 6 Jan 2016 19:11:07 +0000 (20:11 +0100)]
mdio: Move mdiobus_read/write operations into mdio.h

These are logically MDIO operations, not phy operations, so move them
into the mdio header.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 6 Jan 2016 19:11:06 +0000 (20:11 +0100)]
phy: Consistently use addr for address on an MII bus

Within phy.h, an address on an MII bus has been called both addr and
phy_id. phy_id is particularly confusing, since it also means the ID
found in registers 2 and 3, if the device on the bus is a phy.
Consistently use addr.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 7 Jan 2016 03:54:18 +0000 (22:54 -0500)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Linus Torvalds [Thu, 7 Jan 2016 00:15:03 +0000 (16:15 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "As usual, there are a couple straggler bug fixes:

   1) qlcnic_alloc_mbx_args() error returns are not checked in qlcnic
      driver.  Fix from Insu Yun.

   2) SKB refcounting bug in connector, from Florian Westphal.

   3) vrf_get_saddr() has to propagate fib_lookup() errors to its
      callers, from David Ahern.

   4) Fix AF_UNIX splice/bind deadlock, from Rainer Weikusat.

   5) qdisc_rcu_free() fails to free the per-cpu qstats.  Fix from John
      Fastabend.

   6) vmxnet3 driver passes wrong page to dma_map_page(), fix from
      Shrikrishna Khare.

   7) Don't allow zero cwnd in tcp_cwnd_reduction(), from Yuchung Cheng"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  tcp: fix zero cwnd in tcp_cwnd_reduction
  Driver: Vmxnet3: Fix regression caused by 5738a09
  net: qmi_wwan: Add WeTelecom-WPD600N
  mkiss: fix scribble on freed memory
  net: possible use after free in dst_release
  net: sched: fix missing free per cpu on qstats
  ARM: net: bpf: fix zero right shift
  6pack: fix free memory scribbles
  net: filter: make JITs zero A for SKF_AD_ALU_XOR_X
  bridge: Only call /sbin/bridge-stp for the initial network namespace
  af_unix: Fix splice-bind deadlock
  net: Propagate lookup failure in l3mdev_get_saddr to caller
  r8152: add reset_resume function
  connector: bump skb->users before callback invocation
  cxgb4: correctly handling failed allocation
  qlcnic: correctly handle qlcnic_alloc_mbx_args

Yuchung Cheng [Wed, 6 Jan 2016 20:42:38 +0000 (12:42 -0800)]
tcp: fix zero cwnd in tcp_cwnd_reduction

Patch 3759824da87b ("tcp: PRR uses CRB mode by default and SS mode
conditionally") introduced a bug where cwnd may become 0 when both
inflight and sndcnt are 0 (cwnd = inflight + sndcnt). This may lead
to a div-by-zero if the connection starts another cwnd reduction
phase by setting tp->prior_cwnd to the current cwnd (0) in
tcp_init_cwnd_reduction().

To prevent this, we skip PRR operation when nothing is acked or
sacked. Then cwnd must be positive in all cases as long as ssthresh
is positive:

1) The proportional reduction mode
   inflight > ssthresh > 0

2) The reduction bound mode
  a) inflight == ssthresh > 0

  b) inflight < ssthresh
     sndcnt > 0 since newly_acked_sacked > 0 and inflight < ssthresh

Therefore, in all cases inflight and sndcnt cannot both be 0.
We also check for an invalid (zero) tp->prior_cwnd to avoid potential
division-by-zero bugs.

In reality this bug is triggered only by a sequence of less common
events.  For example, the connection is terminating an ECN-triggered
cwnd reduction with an inflight of 0, then it receives reordered/old
ACKs or DSACKs from a prior transmission (which ack nothing). Or the
connection is in the fast recovery stage, has marked everything lost
but failed to retransmit due to local issues, and then receives data
packets from the other end that ack nothing.

Fixes: 3759824da87b ("tcp: PRR uses CRB mode by default and SS mode conditionally")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
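The guard and the two modes discussed in this commit can be sketched in plain C. This is a simplified model, not tcp_cwnd_reduction() itself: the struct is hypothetical (field names borrowed from struct tcp_sock), inflight is a stored number instead of a call to tcp_packets_in_flight(), and the reduction-bound branch uses only the conservative min(delta, newly_acked_sacked) form, omitting the slow-start variant:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of TCP sender state used by PRR. */
struct prr_state {
    uint32_t snd_ssthresh;
    uint32_t prior_cwnd;    /* cwnd when the reduction started */
    uint32_t prr_delivered; /* segments delivered during the reduction */
    uint32_t prr_out;       /* segments sent during the reduction */
    uint32_t inflight;      /* stands in for tcp_packets_in_flight() */
    uint32_t snd_cwnd;
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Returns 0 if cwnd was updated, -1 if PRR was skipped. */
static int prr_cwnd_reduction(struct prr_state *tp, int newly_acked_sacked)
{
    int delta = (int)tp->snd_ssthresh - (int)tp->inflight;
    int sndcnt;

    /* The fix: nothing newly acked/sacked, or a zero divisor -> skip. */
    if (newly_acked_sacked <= 0 || tp->prior_cwnd == 0)
        return -1;

    tp->prr_delivered += (uint32_t)newly_acked_sacked;
    if (delta < 0) {
        /* Proportional mode (inflight > ssthresh): rounded-up share. */
        uint64_t dividend = (uint64_t)tp->snd_ssthresh * tp->prr_delivered
                            + tp->prior_cwnd - 1;
        sndcnt = (int)(dividend / tp->prior_cwnd) - (int)tp->prr_out;
    } else {
        /* Reduction bound (inflight <= ssthresh). */
        sndcnt = min_int(delta, newly_acked_sacked);
    }
    /* Allow at least one segment if nothing was sent yet; never negative. */
    sndcnt = max_int(sndcnt, tp->prr_out ? 0 : 1);
    tp->snd_cwnd = tp->inflight + (uint32_t)sndcnt;
    return 0;
}
```

For example, with ssthresh 10, inflight 0, prr_out 5 and nothing newly acked, removing the guard would give sndcnt = min(10, 0) = 0 and cwnd = 0, the value that could later be copied into tp->prior_cwnd and used as a divisor.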
David S. Miller [Thu, 5 Nov 2015 16:34:57 +0000 (11:34 -0500)]
net: Add eth_platform_get_mac_address() helper.

A recurring pattern in drivers is to use OF node information
and, if none is found, platform-specific host information to extract
the Ethernet address for a given device.

Currently this is done with a call to of_get_mac_address() and then
some ifdef'd code for SPARC.

Consolidate this into a portable routine, and provide the
arch_get_platform_mac_address() weak function hook for all
architectures to implement if they want.

Signed-off-by: David S. Miller <davem@davemloft.net>
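A userspace sketch of the lookup order and the weak-hook pattern, assuming GCC/Clang's __attribute__((weak)) (the kernel spells this __weak). The OF lookup is replaced by a pointer that is NULL when the node has no address, and get_mac_sketch() is a hypothetical name; returning -ENODEV when nothing is found mirrors our understanding of the kernel helper but should be checked against the source:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6 /* bytes in an Ethernet address */

/* Default hook: "this platform stores no MAC address". Marked weak so an
 * architecture can override it with a strong definition elsewhere. */
__attribute__((weak)) unsigned char *arch_get_platform_mac_address(void)
{
    return NULL;
}

/* Prefer the OF-provided address (of_mac, NULL if absent), then fall back
 * to the architecture hook; copy the result into mac_addr. */
static int get_mac_sketch(const unsigned char *of_mac, unsigned char *mac_addr)
{
    const unsigned char *addr = of_mac;

    if (!addr)
        addr = arch_get_platform_mac_address();
    if (!addr)
        return -ENODEV;
    memcpy(mac_addr, addr, ETH_ALEN);
    return 0;
}
```

An architecture that does have a platform MAC store provides a strong definition of arch_get_platform_mac_address() in its own object file, and the linker then prefers it over the weak default, so no #ifdef is needed in the drivers.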
Shrikrishna Khare [Wed, 6 Jan 2016 18:44:27 +0000 (10:44 -0800)]
Driver: Vmxnet3: Fix regression caused by 5738a09

Reported-by: Bingkuo Liu <bingkuol@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 6 Jan 2016 14:53:50 +0000 (06:53 -0800)]
net: move ndo_features_check() close to ndo_start_xmit()

TX fast path uses ndo_start_xmit(), ndo_features_check() and
ndo_select_queue().

Move ndo_features_check() close to ndo_start_xmit() to increase
data locality.

All "struct net_device_ops" should now be using C99 initializers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kristian Evensen [Wed, 6 Jan 2016 13:15:50 +0000 (14:15 +0100)]
net: qmi_wwan: Add WeTelecom-WPD600N

The WeTelecom-WPD600N is an LTE module that, in addition to supporting most
"normal" bands, also supports LTE over 450MHz. Manual testing showed that
only interface number three replies to QMI messages.

Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 6 Jan 2016 09:59:10 +0000 (12:59 +0300)]
fsl/fman: double free on probe failure

"priv" is allocated with devm_kzalloc() so freeing it here with kfree()
will lead to a double free.

Fixes: 3933961682a3 ('fsl/fman: Add FMan MAC driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 6 Jan 2016 09:58:09 +0000 (12:58 +0300)]
fsl/fman: fix the pause_time test

pause_time is unsigned so it can't be less than zero.  The bug means
that we allow invalid pause-times.

Fixes: 57ba4c9b56d8 ('fsl/fman: Add FMan MAC support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
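Why the original test could never fire can be shown in a few lines. PAUSE_TIME_MAX below is a hypothetical bound chosen for the sketch; it is not taken from the FMan driver, and the actual patch may check something different:

```c
#include <assert.h>
#include <stdint.h>

#define PAUSE_TIME_MAX 0xFFFFu /* hypothetical limit for this sketch */

/* Buggy shape: an unsigned value is never negative, so this never rejects. */
static int pause_time_invalid_buggy(uint32_t pause_time)
{
    return pause_time < 0; /* always 0 */
}

/* A meaningful check compares against a real bound instead. */
static int pause_time_invalid_fixed(uint32_t pause_time)
{
    return pause_time > PAUSE_TIME_MAX;
}
```

GCC reports the first comparison as always false under -Wtype-limits (enabled by -Wextra), which is how this class of bug is usually caught.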
Dan Carpenter [Wed, 6 Jan 2016 09:56:30 +0000 (12:56 +0300)]
mlxsw: core: remove an unnecessary condition

We checked "err" on the lines before, so we know it's zero here.

Such checks cause a static checker warning because checking things that
are already known can indicate a bug: maybe there is a missing
assignment, or we are checking the wrong variable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alan [Wed, 6 Jan 2016 14:55:02 +0000 (14:55 +0000)]
mkiss: fix scribble on freed memory

commit d79f16c046086f4fe0d42184a458e187464eb83e fixed a user-triggerable
scribble on freed memory but added a new one, which allows the user to
scribble even more, and with user-controlled data, into freed space.

As with 6pack, we need to halt the queue before we free the buffers, because
the transmit logic is not protected by the semaphore.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Wed, 6 Jan 2016 14:36:37 +0000 (09:36 -0500)]
ethernet/atheros/alx: sanitize buffer sizing and padding

This is based on the work done by Przemek Rudy in bug 70761 at
bugzilla.kernel.org, but with some work done to disentangle and clarify
things a bit.

Similar to Przemek's work and other drivers, we're adding a padding of 16
here, but we're also disentangling mtu size calculations from max buffer
size calculations a bit, and adding ETH_HLEN to the value written into
ALX_MTU. Hopefully, with a bit more consistency and clarity, things behave
better here. Sadly, I can only test in my alx-driven E2200, which worked
just fine before this patch.

In comment #58 of bug 70761, Eugene A. Shatokhin reports that this patch
helps considerably for a ROSA Linux user of his with an AR8162 network
adapter when patched into a 4.1.x-based kernel, giving several days of
normal operation where the wired network previously wasn't usable without
setting the MTU to 9000 as a workaround.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
CC: "Eugene A. Shatokhin" <eugene.shatokhin@rosalab.ru>
CC: Przemek Rudy <prudy1@o2.pl>
CC: Jay Cliburn <jcliburn@gmail.com>
CC: Chris Snook <chris.snook@gmail.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francesco Ruggeri [Wed, 6 Jan 2016 08:18:48 +0000 (00:18 -0800)]
net: possible use after free in dst_release

dst_release should not access dst->flags after decrementing
__refcnt to 0. The dst_entry may be in dst_busy_list and
dst_gc_task may dst_destroy it before dst_release gets a chance
to access dst->flags.

Fixes: d69bbf88c8d0 ("net: fix a race in dst_release()")
Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst")
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
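The ordering rule behind this fix can be sketched with C11 atomics. The struct, flag and function names are hypothetical (OBJ_NOCACHE plays the role of DST_NOCACHE); the point is that any field needed after the decrement is read before it, because once the count reaches zero another thread may free the object:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define OBJ_NOCACHE 0x1 /* hypothetical flag for this sketch */

struct obj {
    atomic_int refcnt;
    unsigned short flags;
};

/* Drop one reference. Returns true if the caller dropped the last
 * reference to a NOCACHE object and must schedule its destruction. */
static bool obj_release(struct obj *o)
{
    /* Snapshot the flag BEFORE dropping the reference. */
    bool nocache = (o->flags & OBJ_NOCACHE) != 0;
    int newref = atomic_fetch_sub(&o->refcnt, 1) - 1;

    /* From here on *o must not be dereferenced: it may already be gone. */
    return newref == 0 && nocache;
}
```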
David S. Miller [Wed, 6 Jan 2016 19:42:42 +0000 (14:42 -0500)]
Merge branch 'mlxsw-vlan_filtering-offload'

Jiri Pirko says:

====================
mlxsw: add offload support for vlan_filtering option

Elad says:

This patch adds SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING port attribute.
When a bridge is offloaded to hardware, the hardware can learn whether the
bridge is an 802.1Q (VLAN-aware) bridge or a VLAN-unaware bridge.
In order to toggle the mode a user can use sysfs:
$ echo 1 > /sys/devices/virtual/net/br0/bridge/vlan_filtering
or via iproute2:
$ ip link set dev br0 type bridge vlan_filtering 1

---
v1->v2: small fix in patch #1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Wed, 6 Jan 2016 12:01:11 +0000 (13:01 +0100)]
mlxsw: Remember untagged VLANs

When a VLAN is configured, remember its untagged mode.
When displaying the list of configured VLANs, show the untagged attribute.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Wed, 6 Jan 2016 12:01:10 +0000 (13:01 +0100)]
mlxsw: Disable vlan_filtering for non .1D bridge

When a port is bridged, the bridge must be a VLAN-aware bridge (.1Q),
or the bridging must be on top of VLAN interfaces (.1D bridge).

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Wed, 6 Jan 2016 12:01:09 +0000 (13:01 +0100)]
mlxsw: Renaming local variable names for consistency

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Wed, 6 Jan 2016 12:01:08 +0000 (13:01 +0100)]
mlxsw: Fixing vlans init range

Initialize VLANs 0..4095 (Remove init for VID 4096).

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
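For reference, the 802.1Q VLAN ID is 12 bits wide, so a table indexed by VID needs 4096 entries (VLAN_N_VID in the kernel) and a loop over it must stop at 4095. A sketch of the correct bound, using a hypothetical table rather than the mlxsw code:

```c
#include <assert.h>
#include <stdbool.h>

#define VLAN_N_VID 4096 /* 12-bit VID space: valid VIDs are 0..4095 */

static bool vlan_ready[VLAN_N_VID]; /* hypothetical per-VID state */

/* Initialize every valid VID and return how many were touched.
 * Writing "vid <= VLAN_N_VID" would also touch VID 4096, one past the end. */
static int init_vlans(void)
{
    int count = 0;

    for (int vid = 0; vid < VLAN_N_VID; vid++) {
        vlan_ready[vid] = true;
        count++;
    }
    return count;
}
```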
Elad Raz [Wed, 6 Jan 2016 12:01:07 +0000 (13:01 +0100)]
bridge: add vlan filtering change for new bridged device

Notify hardware about VLAN-awareness changes on a newly bridged port.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Wed, 6 Jan 2016 12:01:06 +0000 (13:01 +0100)]
bridge: add vlan filtering change notification

Notify hardware about bridge VLAN-awareness changes.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Wed, 6 Jan 2016 12:01:05 +0000 (13:01 +0100)]
switchdev: add bridge vlan_filtering attribute

Add a vlan_filtering attribute to allow hardware vendors to support
VLAN-aware bridges. vlan_filtering is a per-bridge attribute.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Wed, 6 Jan 2016 12:01:04 +0000 (13:01 +0100)]
bridge: Propagate vlan add failure to user

Disallow adding interfaces to a bridge when the VLAN filtering operation
fails, and send the failure code to the user.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Tue, 5 Jan 2016 17:11:36 +0000 (09:11 -0800)]
net: sched: fix missing free per cpu on qstats

When a qdisc is using per-cpu stats (currently just the ingress
qdisc), only the bstats are being freed. This patch also frees the qstats.

Fixes: b0ab6f92752b9f9d8 ("net: sched: enable per cpu qstats")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rabin Vincent [Tue, 5 Jan 2016 17:34:04 +0000 (18:34 +0100)]
ARM: net: bpf: fix zero right shift

The LSR instruction cannot be used to perform a zero right shift, since a
0 as the immediate value (imm5) in the LSR instruction encoding means
that a shift of 32 is performed.  See DecodeImmShift() in the ARM ARM.

Make the JIT skip generation of the LSR if a zero-shift is requested.

This was found using american fuzzy lop.

Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
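The encoding rule and the fix can be modeled in C. lsr_imm_shift_amount() follows the LSR case of DecodeImmShift() from the ARM ARM; jit_rsh_k() is a hypothetical stand-in for the JIT's handling of a right shift by a constant k, returning the resulting value instead of emitting an instruction:

```c
#include <assert.h>
#include <stdint.h>

/* Effective LSR (immediate) shift amount: imm5 == 0 encodes a shift of 32. */
static unsigned lsr_imm_shift_amount(unsigned imm5)
{
    return imm5 == 0 ? 32 : (imm5 & 0x1f);
}

/* What the CPU computes for "LSR Rd, Rm, #imm5" on a 32-bit register. */
static uint32_t arm_lsr_imm(uint32_t rm, unsigned imm5)
{
    unsigned n = lsr_imm_shift_amount(imm5);

    return n >= 32 ? 0 : rm >> n;
}

/* JIT-side rule from the fix: for k == 0 emit nothing, leaving A as is;
 * for 1..31 the shift amount encodes directly in imm5. */
static uint32_t jit_rsh_k(uint32_t a, unsigned k)
{
    if (k == 0)
        return a; /* skip generating the LSR */
    return arm_lsr_imm(a, k);
}
```

Under this model, a naive "LSR A, A, #0" turns 0x80000000 into 0, while the fixed path leaves A unchanged.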
Craig Gallek [Tue, 5 Jan 2016 15:57:13 +0000 (10:57 -0500)]
soreuseport: change consume_skb to kfree_skb in error case

Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Craig Gallek [Tue, 5 Jan 2016 20:08:07 +0000 (15:08 -0500)]
soreuseport: pass skb to secondary UDP socket lookup

This socket-lookup path did not pass along the skb in question
in my original BPF-based socket selection patch.  The skb in the
udpN_lib_lookup2 path can be used for BPF-based socket selection just
like it is in the 'traditional' udpN_lib_lookup path.

udpN_lib_lookup2 kicks in when there are more than 10 sockets in
the same hlist slot.  Coincidentally, I chose 10 sockets per
reuseport group in my functional test, so the lookup2 path was not
exercised. This adds an additional set of tests with 20 sockets.

Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Fixes: 3ca8e4029969 ("soreuseport: BPF selection functional test")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One Thousand Gnomes [Tue, 5 Jan 2016 11:51:25 +0000 (11:51 +0000)]
6pack: fix free memory scribbles

commit acf673a3187edf72068ee2f92f4dc47d66baed47 fixed a user triggerable free
memory scribble but in doing so replaced it with a different one that allows
the user to control the data and scribble even more.

sixpack_close is called by the tty layer in tty context. The tty context is
protected by sp_get() and sp_put(). However network layer activity via
sp_xmit() is not protected this way. We must therefore stop the queue;
otherwise the user gets to dump a buffer mostly of their choice into freed
kernel pages.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 5 Jan 2016 10:36:40 +0000 (11:36 +0100)]
mlxsw: pci: Adjust value of CPU egress traffic class

During initialization, when creating the send descriptor queues (SDQs),
we specify the CPU egress traffic class of each SDQ. The maximum number
of classes of this type is different in the two ASICs supported by this
PCI driver.

New firmware versions check that this value is set correctly, which causes
errors on the Spectrum ASIC, as its maximum exposed egress traffic class is
lower than 7.

Solve this by setting this field to 3, which is an acceptable value for
both ASICs.

Note that we currently do not expose the QoS capabilities of the ASICs,
so setting this to a hardcoded value is OK for now.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rabin Vincent [Tue, 5 Jan 2016 15:23:07 +0000 (16:23 +0100)]
net: filter: make JITs zero A for SKF_AD_ALU_XOR_X

The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value.  All the BPF JITs fail to clear A if this is used as
the first instruction in a filter.  This was found using american fuzzy
lop.

Add a helper to determine if A needs to be cleared given the first
instruction in a filter, and use this in the JITs.  Except for ARM, the
rest have only been compile-tested.

Fixes: 3480593131e0 ("net: filter: get rid of BPF_S_* enum")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
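The kind of helper this commit adds can be sketched as a predicate on the filter's first instruction. This is a hedged reconstruction from the description, not the kernel's code; the opcode and ancillary constants are redefined locally with the values from <linux/filter.h> so the block is self-contained, and the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Classic BPF opcode pieces and ancillary offsets (values as in
 * <linux/filter.h>, redefined so the sketch needs no kernel headers). */
#define BPF_LD   0x00
#define BPF_RET  0x06
#define BPF_W    0x00
#define BPF_H    0x08
#define BPF_B    0x10
#define BPF_K    0x00
#define BPF_ABS  0x20
#define BPF_LEN  0x80
#define SKF_AD_OFF (-0x1000)
#define SKF_AD_ALU_XOR_X 40

struct sock_filter_sketch {
    uint16_t code;
    uint8_t jt, jf;
    uint32_t k;
};

/* Must the JIT zero A before running this filter? Only if the first
 * instruction might read A before anything has written it. */
static bool needs_clear_a(const struct sock_filter_sketch *first)
{
    switch (first->code) {
    case BPF_RET | BPF_K:
    case BPF_LD | BPF_W | BPF_LEN:
        return false; /* never reads A */
    case BPF_LD | BPF_W | BPF_ABS:
    case BPF_LD | BPF_H | BPF_ABS:
    case BPF_LD | BPF_B | BPF_ABS:
        /* Ordinary loads overwrite A, but the XOR ancillary computes
         * A ^= X and therefore reads A. */
        return first->k == (uint32_t)(SKF_AD_OFF + SKF_AD_ALU_XOR_X);
    default:
        return true; /* be conservative for everything else */
    }
}
```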
David S. Miller [Wed, 6 Jan 2016 05:05:04 +0000 (00:05 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2016-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
brcmfmac

* fix IBSS which got broken over time
* new USB id for bcm43242 dongle
* arp offload configuration through inet notifier

ath9k

* add random number generator support (CONFIG_ATH9K_HWRNG)

iwlwifi

* Make scan parameters low latency aware
* Fix in the NL80211_FEATURE_FULL_AP_CLIENT_STATE state case
* Fix enable injection mode (Chaya Rachel)
* Various cleanups (Dan / Julia / myself)
* Allow to stay more time on popular channels (David Spinadel)
* Bug fixes for D0i3 (Eliad / Luca)
* Fixes for GO uAPSD (myself)
* Start of TSO support (myself)
* Rate control bug fixes (Eyal / Gregory)
* Start the work on 9000 devices (Johannes / Sara / Oren)
* Start the work on a new Tx queue allocation model (Liad)
* Debug infrastructure enhancements (Golan)

mwifiex

* add a debugfs file for chip reset
* advertise SMS4 cipher suite
* increase ap and station interface limit to 3
* enable MSI support on newer pcie devices (8897 onwards)

rtlwifi

* fix lots of module parameter usage
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 1 Jan 2016 22:27:57 +0000 (23:27 +0100)]
net: hns: avoid uninitialized variable warning:

gcc fails to see that the 'last_offset' variable
in hns_nic_reuse_page() is used correctly and issues a bogus
warning:

drivers/net/ethernet/hisilicon/hns/hns_enet.c: In function 'hns_nic_reuse_page':
drivers/net/ethernet/hisilicon/hns/hns_enet.c:541:6: warning: 'last_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]

This simplifies the function to make it more obvious what is
going on to both readers and compilers, which makes the warning
go away.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Tue, 5 Jan 2016 21:17:55 +0000 (22:17 +0100)]
inet: kill unused skb_free op

The only user was removed in commit
029f7f3b8701cc7a ("netfilter: ipv6: nf_defrag: avoid/free clone operations").

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Tue, 5 Jan 2016 09:46:00 +0000 (10:46 +0100)]
bridge: Only call /sbin/bridge-stp for the initial network namespace

[I stole this patch from Eric Biederman. He wrote:]

> There is no defined mechanism to pass network namespace information
> into /sbin/bridge-stp therefore don't even try to invoke it except
> for bridge devices in the initial network namespace.
>
> It is possible for unprivileged users to cause /sbin/bridge-stp to be
> invoked for any network device name which if /sbin/bridge-stp does not
> guard against unreasonable arguments or being invoked twice on the
> same network device could cause problems.

[Hannes: changed patch using netns_eq]

Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
xypron.glpk@gmx.de [Tue, 5 Jan 2016 09:12:49 +0000 (10:12 +0100)]
include/uapi/linux/sockios.h: mark SIOCRTMSG unused

IOCTL SIOCRTMSG does nothing but return EINVAL.

So comment it as unused.

SIOCRTMSG is only used in:
* net/ipv4/af_inet.c
* include/uapi/linux/sockios.h

inet_ioctl calls ip_rt_ioctl.
ip_rt_ioctl only handles SIOCADDRT and SIOCDELRT and returns -EINVAL
otherwise.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
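The behaviour described in this commit amounts to a switch with a default error path. A hypothetical model (route_ioctl_sketch() is not a kernel function), with the request codes copied from include/uapi/linux/sockios.h:

```c
#include <assert.h>
#include <errno.h>

/* Routing-table request codes, as in include/uapi/linux/sockios.h. */
#define SIOCADDRT 0x890B
#define SIOCDELRT 0x890C
#define SIOCRTMSG 0x890D

/* Only add/delete are handled; anything else, including SIOCRTMSG,
 * falls through to -EINVAL. */
static int route_ioctl_sketch(unsigned int cmd)
{
    switch (cmd) {
    case SIOCADDRT:
    case SIOCDELRT:
        return 0; /* real code would add or delete a route here */
    default:
        return -EINVAL;
    }
}
```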
Linus Torvalds [Tue, 5 Jan 2016 21:32:39 +0000 (13:32 -0800)]
Merge tag 'trace-v4.4-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Two more fixes:

  1. The recordmcount change had an output that used sprintf()
     (incorrectly) when it should have been a fprintf() to stderr.

  2. The printk_formats file could crash if someone added a
     trace_printk() in the core kernel, and also added one in a module.
     This does not affect production kernels.  Only kernels where
     developers add trace_printk() for debugging can crash"

* tag 'trace-v4.4-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix setting of start_index in find_next()
  ftrace/scripts: Fix incorrect use of sprintf in recordmcount

Linus Torvalds [Tue, 5 Jan 2016 21:21:19 +0000 (13:21 -0800)]
Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile

Pull tile bugfix from Chris Metcalf:
 "This fixes a bug that Sudip's buildbot found for tilepro allmodconfig.

  I've tagged it for stable only back to 3.19, which was when most of
  the other affected architectures added their support for working
  around this issue"

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: provide CONFIG_PAGE_SIZE_64KB etc for tilepro

David S. Miller [Tue, 5 Jan 2016 19:11:51 +0000 (14:11 -0500)]
Merge branch 'mlx5e-tstamp'

Saeed Mahameed says:

====================
Introduce mlx5 ethernet timestamping

This patch series introduces the support for ConnectX-4 timestamping
and the PTP kernel interface.

Changes from V2:
net/mlx5_core: Introduce access function to read internal_timer
- Remove one line function
- Change function name

net/mlx5e: Add HW timestamping (TS) support:
- Data path performance optimization (caching tstamp struct in rq,sq)
- Change read/write_lock_irqsave to read/write_lock
- Move ioctl functions to en_clock file
- Changed overflow start algorithm according to comments from Richard
- Move timestamp init/cleanup to open/close ndos.

In details:

1st patch prevents the driver from modifying skb->data and SKB CB in
device xmit function.

2nd patch adds the needed low level helpers for:
- Fetching the hardware clock (hardware internal timer)
- Parsing CQEs timestamps
- Device frequency capability

3rd patch adds new en_clock.c file that handles all needed timestamping
operations:
- Internal clock structure initialization and other helper functions
- Added the needed ioctl for setting/getting the current timestamping
  configuration.
- used this configuration in RX/TX data path to fill the SKB with
  the timestamp.

4th patch introduces PTP (PHC) support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Tue, 29 Dec 2015 12:58:32 +0000 (14:58 +0200)]
net/mlx5e: Add PTP Hardware Clock (PHC) support

Add PHC support to the mlx5_en driver. Use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled.  This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.

The driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-4 card.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/mlx5e: Add HW timestamping (TS) support
Eran Ben Elisha [Tue, 29 Dec 2015 12:58:31 +0000 (14:58 +0200)]
net/mlx5e: Add HW timestamping (TS) support

Add support for enabling/disabling HW timestamping for incoming and/or
outgoing packets. HW timestamping is enabled/disabled through the
appropriate ioctl. Currently only HWTSTAMP_FILTER_ALL/NONE and
HWTSTAMP_TX_ON/OFF are supported. Make all relevant changes in the
RX/TX flows to honor TS requests and plant HW timestamps into the
relevant structures.

Add an internal clock for converting hardware timestamps to nanoseconds.
In addition, add a service task that catches internal clock overflow, to
keep timestamps accurate.
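The conversion and the overflow task can be pictured with a simplified userspace model of a cycle-counter clock: elapsed cycles are computed with a masked subtraction and scaled to nanoseconds with a `mult`/`shift` pair. This is a sketch under stated assumptions, not the driver's code: all names are hypothetical, and the fractional-nanosecond carry that the kernel's timecounter keeps is omitted.

```c
#include <stdint.h>

/* Hypothetical model of a free-running HW counter converted to ns. */
struct clock_model {
    uint64_t mask;       /* width of the free-running HW counter */
    uint32_t mult;       /* ns = (cycles * mult) >> shift */
    uint32_t shift;
    uint64_t cycle_last; /* counter value at the previous read */
    uint64_t nsec;       /* accumulated time in nanoseconds */
};

/* mult for a counter ticking at freq_khz (cf. device_frequency_khz):
 * one cycle lasts 1e6 / freq_khz ns, scaled by 2^shift. */
static uint32_t clock_mult(uint32_t freq_khz, uint32_t shift)
{
    return (uint32_t)((1000000ULL << shift) / freq_khz);
}

static void clock_init(struct clock_model *c, uint64_t mask, uint32_t freq_khz,
                       uint32_t shift, uint64_t now, uint64_t start_ns)
{
    c->mask = mask;
    c->shift = shift;
    c->mult = clock_mult(freq_khz, shift);
    c->cycle_last = now;
    c->nsec = start_ns;
}

/* Fold the cycles elapsed since the last read into nsec. The masked
 * subtraction survives one counter wrap only, and delta * mult must fit
 * in 64 bits, so this has to run at least once per wrap period: that is
 * the job of a periodic overflow task. */
static uint64_t clock_read(struct clock_model *c, uint64_t now)
{
    uint64_t delta = (now - c->cycle_last) & c->mask;

    c->nsec += (delta * c->mult) >> c->shift;
    c->cycle_last = now;
    return c->nsec;
}
```

For example, with a 32-bit counter at 1 GHz, a read at 0x100 after a previous read at 0xFFFFFF00 correctly yields 512 ns despite the wrap.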

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/mlx5_core: Introduce access function to read internal timer
Eran Ben Elisha [Tue, 29 Dec 2015 12:58:30 +0000 (14:58 +0200)]
net/mlx5_core: Introduce access function to read internal timer

A preparation step that adds support for reading the hardware
internal timer and the hardware timestamp from the CQE.
In addition, advertise the device_frequency_khz HCA capability.
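The message does not say how the internal timer is read. A common pattern for a 64-bit counter exposed as two 32-bit registers is to read high, low, then high again, and re-read low if the high word changed. Whether mlx5_core does exactly this is an assumption. The sketch below simulates such a device in userspace; `struct fake_timer` and the read helpers are hypothetical.

```c
#include <stdint.h>

/* Simulated device: a 64-bit counter that only exposes two 32-bit halves
 * and keeps ticking between register reads. */
struct fake_timer {
    uint64_t value;
    uint64_t ticks_per_read;
};

static uint32_t read_timer_h(struct fake_timer *t)
{
    uint32_t v = (uint32_t)(t->value >> 32);

    t->value += t->ticks_per_read;
    return v;
}

static uint32_t read_timer_l(struct fake_timer *t)
{
    uint32_t v = (uint32_t)t->value;

    t->value += t->ticks_per_read;
    return v;
}

/* High, low, high again: if the high word moved, the low word wrapped in
 * between, so read it again and pair it with the newer high word. */
static uint64_t read_timer64(struct fake_timer *t)
{
    uint32_t hi = read_timer_h(t);
    uint32_t lo = read_timer_l(t);
    uint32_t hi2 = read_timer_h(t);

    if (hi != hi2)
        lo = read_timer_l(t);
    return ((uint64_t)hi2 << 32) | lo;
}
```

Starting at 0x1FFFFFFFF, a naive high-then-low read would return 0x100000000, a time in the past; the re-read returns 0x200000002.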

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/mlx5e: Do not modify the TX SKB
Achiad Shochat [Tue, 29 Dec 2015 12:58:29 +0000 (14:58 +0200)]
net/mlx5e: Do not modify the TX SKB

If the SKB is cloned, or has an elevated users count, someone else
can be looking at it at the same time.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'sctp-transport-rhashtable'
David S. Miller [Tue, 5 Jan 2016 17:24:06 +0000 (12:24 -0500)]
Merge branch 'sctp-transport-rhashtable'

Xin Long says:

====================
sctp: use transport hashtable to replace association's with rhashtable

In a telecom center, the usual case is a server connected to thousands
of clients. If the server has only one endpoint (UDP style) and uses the
same sport and dport to communicate with every client, every assoc on
the server is hashed into the same chain of the global assoc hashtable,
because dport and sport are currently the only hash key.

When a packet is received, sctp_rcv() tries to find the assoc by sport
and dport. Since that chain is too long to search quickly, performance
drops sharply. Some test data follows:

in server:
$./ss [start a udp style server there]
in client:
$./cc [start 2500 sockets to connect server with same port and different ip,
       and use one of them to send data to server]

===== test on net-next
-- perf top
server:
  55.73%  [kernel]             [k] sctp_assoc_is_match
   6.80%  [kernel]             [k] sctp_assoc_lookup_paddr
   4.81%  [kernel]             [k] sctp_v4_cmp_addr
   3.12%  [kernel]             [k] _raw_spin_unlock_irqrestore
   1.94%  [kernel]             [k] sctp_cmp_addr_exact

client:
  46.01%  [kernel]                    [k] sctp_endpoint_lookup_assoc
   5.55%  libc-2.17.so                [.] __libc_calloc
   5.39%  libc-2.17.so                [.] _int_free
   3.92%  libc-2.17.so                [.] _int_malloc
   3.23%  [kernel]                    [k] __memset

-- spent time
time is 487s, send pkt is 10000000

We need to change how the hash key is calculated: using lport + rport +
paddr as the hash key avoids this issue.

Besides, this patchset uses a transport hashtable, accessed through the
rhashtable API, to replace the association hashtable for lookups: the
transport is found first, then the association through t->asoc. This
also makes TCP style sockets work better.
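The effect of the hash key can be checked with a small userspace experiment: 2500 peers that share both ports and differ only in IPv4 address. FNV-1a is used purely as a stand-in hash (sctp does not use it), and the bucket count and helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 4096u

/* FNV-1a: a stand-in hash for illustration only. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const uint8_t *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Old-style key: local and remote port only. */
static uint32_t bucket_ports(uint16_t lport, uint16_t rport)
{
    uint32_t h = 2166136261u;

    h = fnv1a(h, &lport, sizeof(lport));
    h = fnv1a(h, &rport, sizeof(rport));
    return h % NBUCKETS;
}

/* New-style key: ports plus the peer address. */
static uint32_t bucket_ports_paddr(uint16_t lport, uint16_t rport, uint32_t paddr)
{
    uint32_t h = 2166136261u;

    h = fnv1a(h, &lport, sizeof(lport));
    h = fnv1a(h, &rport, sizeof(rport));
    h = fnv1a(h, &paddr, sizeof(paddr));
    return h % NBUCKETS;
}

/* Count distinct buckets hit by nclients peers that share both ports and
 * differ only in their IPv4 address (10.0.0.0 + i). */
static unsigned buckets_used(int with_paddr, unsigned nclients)
{
    static unsigned char used[NBUCKETS];
    unsigned count = 0;

    memset(used, 0, sizeof(used));
    for (unsigned i = 0; i < nclients; i++) {
        uint32_t paddr = (10u << 24) | i;
        uint32_t b = with_paddr ? bucket_ports_paddr(5000, 5000, paddr)
                                : bucket_ports(5000, 5000);
        if (!used[b]) {
            used[b] = 1;
            count++;
        }
    }
    return count;
}
```

With ports alone, every peer lands in one bucket (one long chain); adding the peer address spreads them across many buckets, which is why the lookup stops degrading linearly.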

===== test with this patchset:
-- perf top
server:
  15.98%  [kernel]                 [k] _raw_spin_unlock_irqrestore
   9.92%  [kernel]                 [k] __pv_queued_spin_lock_slowpath
   7.22%  [kernel]                 [k] copy_user_generic_string
   2.38%  libpthread-2.17.so       [.] __recvmsg_nocancel
   1.88%  [kernel]                 [k] sctp_recvmsg

client:
  11.90%  [kernel]                   [k] sctp_hash_cmp
   8.52%  [kernel]                   [k] rht_deferred_worker
   4.94%  [kernel]                   [k] __pv_queued_spin_lock_slowpath
   3.95%  [kernel]                   [k] sctp_bind_addr_match
   2.49%  [kernel]                   [k] __memset

-- spent time
time is 22s, send pkt is 10000000
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc
Xin Long [Wed, 30 Dec 2015 15:50:50 +0000 (23:50 +0800)]
sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc

sctp_endpoint_lookup_assoc is called under the protection of the sock
lock, so there is no need to call local_bh_disable in this function.
Remove those calls.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp: drop the old assoc hashtable of sctp
Xin Long [Wed, 30 Dec 2015 15:50:49 +0000 (23:50 +0800)]
sctp: drop the old assoc hashtable of sctp

The transport hashtable replaces the association hashtable, so the
association hashtable is no longer used in sctp. Drop the code related
to it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp: apply rhashtable api to sctp procfs
Xin Long [Wed, 30 Dec 2015 15:50:48 +0000 (23:50 +0800)]
sctp: apply rhashtable api to sctp procfs

Traverse the transport rhashtable, and get each association only once by
skipping transports for which assoc->peer.primary_path != transport.
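A minimal sketch of this de-duplication, using hypothetical, stripped-down stand-ins for the SCTP structures: every transport in the table is visited, but an association is reported only when the visited transport is its primary path.

```c
#include <stddef.h>

struct transport;

/* Hypothetical stand-ins for the SCTP structures. */
struct association {
    int id;
    const struct transport *primary_path;  /* cf. assoc->peer.primary_path */
};

struct transport {
    struct association *asoc;
};

/* Returns the number of associations reported while walking all transports. */
static size_t count_assocs(const struct transport *table, size_t n)
{
    size_t reported = 0;

    for (size_t i = 0; i < n; i++) {
        const struct transport *t = &table[i];

        /* Non-primary paths are skipped; their association is reported
         * when its primary path is visited. */
        if (t->asoc->primary_path != t)
            continue;
        reported++;
    }
    return reported;
}
```

With two associations owning five transports between them, the walk reports exactly two entries.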

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp: apply rhashtable api to send/recv path
Xin Long [Wed, 30 Dec 2015 15:50:47 +0000 (23:50 +0800)]
sctp: apply rhashtable api to send/recv path

Apply the lookup APIs to two functions. __sctp_endpoint_lookup_assoc and
__sctp_lookup_association are invoked under the protection of the sock
lock, so they are safe, but sctp_lookup_association needs to call
rcu_read_lock() and check t->dead to protect the lookup.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp: add the rhashtable apis for sctp global transport hashtable
Xin Long [Wed, 30 Dec 2015 15:50:46 +0000 (23:50 +0800)]
sctp: add the rhashtable apis for sctp global transport hashtable

The transport hashtable will replace the association hashtable for
transport lookups, and the association is then obtained through t->asoc.
The rhashtable API is used because it is resizable, scalable and uses RCU.

lport + rport + paddr form the base hash key used to locate the chain,
mixed with net to keep one netns separate from another; laddr is then
also compared to find the target transport.
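The split between what is hashed and what is compared can be sketched as follows. The key layout (IPv4 only, `net` reduced to an opaque pointer) and the multiply-add hash are hypothetical simplifications, not sctp's actual types or hash function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified lookup key. */
struct tkey {
    const void *net;  /* network namespace identity */
    uint16_t lport;
    uint16_t rport;
    uint32_t paddr;   /* peer address: hashed */
    uint32_t laddr;   /* local address: compared, not hashed */
};

/* Chain selection: net + lport + rport + paddr. */
static uint32_t tkey_hash(const struct tkey *k)
{
    uint64_t h = (uint64_t)(uintptr_t)k->net;

    h = h * 31 + k->lport;
    h = h * 31 + k->rport;
    h = h * 31 + k->paddr;
    return (uint32_t)(h ^ (h >> 32));
}

/* An exact match inside the chain also requires the same laddr. */
static bool tkey_match(const struct tkey *a, const struct tkey *b)
{
    return a->net == b->net && a->lport == b->lport &&
           a->rport == b->rport && a->paddr == b->paddr &&
           a->laddr == b->laddr;
}
```

Two transports that differ only in local address share a chain but are told apart by the comparison, while identical addresses in different namespaces never match.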

This patch provides the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport

hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport

init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotile: provide CONFIG_PAGE_SIZE_64KB etc for tilepro
Chris Metcalf [Tue, 22 Dec 2015 17:28:51 +0000 (12:28 -0500)]
tile: provide CONFIG_PAGE_SIZE_64KB etc for tilepro

This allows the build system to know that it can't attempt to
configure the Lustre virtual block device, for example, when tilepro
is using 64KB pages (as it does by default).  The tilegx build
already provided those symbols.

Previously we required that the tilepro hypervisor be rebuilt with
a different hardcoded page size in its headers, and then Linux be
rebuilt using the updated hypervisor header.  Now we allow each of
the hypervisor and Linux to be built independently.  We still check
at boot time to ensure that the page size provided by the hypervisor
matches what Linux expects.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: stable@vger.kernel.org [3.19+]
9 years agoaf_unix: Fix splice-bind deadlock
Rainer Weikusat [Sun, 3 Jan 2016 18:56:38 +0000 (18:56 +0000)]
af_unix: Fix splice-bind deadlock

On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
system call and AF_UNIX sockets,

http://lists.openwall.net/netdev/2015/11/06/24

The situation was analyzed as

    (a while ago) A: socketpair()
    B: splice() from a pipe to /mnt/regular_file
        does sb_start_write() on /mnt
    C: try to freeze /mnt
        wait for B to finish with /mnt
    A: bind() try to bind our socket to /mnt/new_socket_name
        lock our socket, see it not bound yet
        decide that it needs to create something in /mnt
        try to do sb_start_write() on /mnt, block (it's
        waiting for C).
    D: splice() from the same pipe to our socket
        lock the pipe, see that socket is connected
        try to lock the socket, block waiting for A
    B: get around to actually feeding a chunk from
        pipe to file, try to lock the pipe.  Deadlock.

on 2015/11/10 by Al Viro,

http://lists.openwall.net/netdev/2015/11/10/4

The patch fixes this by removing the kern_path_create related code from
unix_mknod and executing it as part of unix_bind prior to acquiring the
readlock of the socket in question. This means that A (as used above)
will call sb_start_write on /mnt before it acquires the readlock; hence,
it won't indirectly block B, which first did a sb_start_write and then
waited for a thread trying to acquire the readlock. Consequently, A
being blocked by C waiting for B won't cause a deadlock anymore
(effectively, A and B previously acquired two locks in opposite order in
the situation described above).
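A lock-ordering cycle is what turns the scenario above into a deadlock. The following is a minimal lockdep-style sketch, not kernel code: it models sb_start_write() as a lock (the pending freeze from C makes it block), records which lock each of A, B and D holds while taking the next one, and checks the resulting graph for a cycle. The lock names and helpers are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

enum lock_id { SB_WRITE, PIPE_LOCK, SOCK_READLOCK, NLOCKS };

struct lock_graph {
    bool edge[NLOCKS][NLOCKS];  /* edge[a][b]: b acquired while holding a */
};

static void held_then_took(struct lock_graph *g, enum lock_id held, enum lock_id taken)
{
    g->edge[held][taken] = true;
}

/* state: 0 = unvisited, 1 = on the DFS stack, 2 = finished */
static bool dfs_finds_cycle(const struct lock_graph *g, int n, int *state)
{
    state[n] = 1;
    for (int m = 0; m < NLOCKS; m++) {
        if (!g->edge[n][m])
            continue;
        if (state[m] == 1)
            return true;  /* back edge: the ordering contains a cycle */
        if (state[m] == 0 && dfs_finds_cycle(g, m, state))
            return true;
    }
    state[n] = 2;
    return false;
}

static bool has_cycle(const struct lock_graph *g)
{
    int state[NLOCKS] = { 0 };

    for (int n = 0; n < NLOCKS; n++)
        if (state[n] == 0 && dfs_finds_cycle(g, n, state))
            return true;
    return false;
}

/* Orderings from the scenario; only bind() (A) changes with the fix. */
static void build_scenario(struct lock_graph *g, bool fixed_bind)
{
    memset(g, 0, sizeof(*g));
    held_then_took(g, SB_WRITE, PIPE_LOCK);          /* B: splice pipe -> file   */
    held_then_took(g, PIPE_LOCK, SOCK_READLOCK);     /* D: splice pipe -> socket */
    if (fixed_bind)
        held_then_took(g, SB_WRITE, SOCK_READLOCK);  /* A after the fix  */
    else
        held_then_took(g, SOCK_READLOCK, SB_WRITE);  /* A before the fix */
}
```

Before the fix the graph contains SOCK_READLOCK -> SB_WRITE -> PIPE_LOCK -> SOCK_READLOCK; moving sb_start_write ahead of the readlock in bind() removes the back edge.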

Dmitry Vyukov (<dvyukov@google.com>) tested the original patch.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Propagate lookup failure in l3mdev_get_saddr to caller
David Ahern [Mon, 4 Jan 2016 17:09:27 +0000 (09:09 -0800)]
net: Propagate lookup failure in l3mdev_get_saddr to caller

Commands run in a vrf context are not failing as expected on a route lookup:
    root@kenny:~# ip ro ls table vrf-red
    unreachable default

    root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
    ping: Warning: source address might be selected on device other than vrf-red.
    PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data.

    --- 10.100.1.254 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms

Since the vrf table does not have a route for 10.100.1.254, the ping
should have failed. The saddr lookup causes a full VRF table lookup.
Propagating a lookup failure to the user allows the command to fail as
expected:

    root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
    connect: No route to host

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'faster-soreuseport'
David S. Miller [Tue, 5 Jan 2016 03:49:59 +0000 (22:49 -0500)]
Merge branch 'faster-soreuseport'

Craig Gallek says:

====================
Faster SO_REUSEPORT

This series contains two optimizations for the SO_REUSEPORT feature:
Faster lookup when selecting a socket for an incoming packet and
the ability to select the socket from the group using a BPF program.

This series only includes the UDP path.  I plan to submit a follow-up
including the TCP path if the implementation in this series is
acceptable.

Changes in v4:
- pskb_may_pull is unnecessary with pskb_pull (per Alexei Starovoitov)

Changes in v3:
- skb_pull_inline -> pskb_pull (per Alexei Starovoitov)
- reuseport_attach* -> sk_reuseport_attach* and simple return statement
  syntax change (per Daniel Borkmann)

Changes in v2:
- Fix ARM build; remove unnecessary include.
- Handle case where protocol header is not in linear section (per
  Alexei Starovoitov).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosoreuseport: BPF selection functional test
Craig Gallek [Mon, 4 Jan 2016 22:41:48 +0000 (17:41 -0500)]
soreuseport: BPF selection functional test

This program will build classic and extended BPF programs and
validate the socket selection logic when used with
SO_ATTACH_REUSEPORT_CBPF and SO_ATTACH_REUSEPORT_EBPF.

It also validates the re-programming flow and several edge cases.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosoreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF
Craig Gallek [Mon, 4 Jan 2016 22:41:47 +0000 (17:41 -0500)]
soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF

Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosoreuseport: fast reuseport UDP socket selection
Craig Gallek [Mon, 4 Jan 2016 22:41:46 +0000 (17:41 -0500)]
soreuseport: fast reuseport UDP socket selection

Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.

This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind.  The new use case of adding a socket
to a reuseport group requires exact address matching.
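The two equality modes can be modeled for IPv4 as follows. This is a hypothetical sketch of the semantics described above, not the kernel's comparison function: with wildcard matching, a wildcard bind overlaps every address; without it, only identical addresses match.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_ANY 0u  /* INADDR_ANY, i.e. a wildcard bind */

/* match_wildcard = true:  bind-time conflict check.
 * match_wildcard = false: joining a reuseport group (exact match only,
 *                         so wildcard == wildcard still matches). */
static bool saddr_equal(uint32_t a, uint32_t b, bool match_wildcard)
{
    if (a == b)
        return true;
    return match_wildcard && (a == ADDR_ANY || b == ADDR_ANY);
}
```

A socket bound to 0.0.0.0 thus conflicts with one bound to 127.0.0.1 during bind, but the two never end up in the same reuseport group.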

Performance test (using a machine with 2 CPU sockets and a total of
48 cores):  Create reuseport groups of varying size.  Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop.  Record number of messages
received per second while saturating a 10G link.
  10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
  20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
  40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)

This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosoreuseport: define reuseport groups
Craig Gallek [Mon, 4 Jan 2016 22:41:45 +0000 (17:41 -0500)]
soreuseport: define reuseport groups

struct sock_reuseport is an optional shared structure referenced by each
socket belonging to a reuseport group.  When a socket is bound to an
address/port not yet in use and the reuseport flag has been set, the
structure will be allocated and attached to the newly bound socket.
When subsequent calls to bind are made for the same address/port, the
shared structure will be updated to include the new socket and the
newly bound socket will reference the group structure.

Previously, when an incoming packet was destined for a reuseport group,
all sockets in the same group needed to be considered before a
dispatching decision was made.  With this structure, an appropriate
socket can be found after looking up just one socket in the group.

This shared structure will also allow more complicated decisions to
be made when selecting a socket (e.g. a BPF filter).
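A simplified model of a shared group structure: each bind that joins the group appends its socket, and a packet is dispatched with a single indexed lookup instead of scoring every socket. Sockets are plain ints here, the capacity is arbitrary, and the fallback to hashing when a program's index is out of range is an assumption of this sketch. The multiply-shift mapping follows the same idea as the kernel's `reciprocal_scale()`.

```c
#include <stdint.h>

#define GROUP_MAX 16  /* hypothetical capacity */

/* Hypothetical stand-in for struct sock_reuseport. */
struct reuse_group {
    int socks[GROUP_MAX];
    unsigned num_socks;
};

/* Map a 32-bit hash onto [0, n) without a division. */
static uint32_t scale_hash(uint32_t hash, uint32_t n)
{
    return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Called for each bind() that joins the group; returns -1 when full. */
static int group_add(struct reuse_group *g, int sock)
{
    if (g->num_socks >= GROUP_MAX)
        return -1;
    g->socks[g->num_socks++] = sock;
    return 0;
}

/* prog_index < 0 means no program is attached. */
static int group_select(const struct reuse_group *g, uint32_t hash, long prog_index)
{
    if (g->num_socks == 0)
        return -1;
    if (prog_index >= 0 && (unsigned long)prog_index < g->num_socks)
        return g->socks[prog_index];
    return g->socks[scale_hash(hash, g->num_socks)];
}
```

With four sockets, a hash of 0 selects the first socket and 0xFFFFFFFF the last, while a program returning 2 picks the third regardless of the hash.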

This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'mlxsw-fixes'
David S. Miller [Tue, 5 Jan 2016 03:07:58 +0000 (22:07 -0500)]
Merge branch 'mlxsw-fixes'

Jiri Pirko says:

====================
mlxsw: couple of fixes

Couple of fixes from Ido.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomlxsw: spectrum: Change bridge port attributes only when bridged
Ido Schimmel [Mon, 4 Jan 2016 09:42:26 +0000 (10:42 +0100)]
mlxsw: spectrum: Change bridge port attributes only when bridged

Bridge port attributes are offloaded to hardware when invoked with the
SELF flag set, but it makes no sense to reflect them when the port is
not bridged.

Allow a user to change these attributes only when the port is bridged,
and initialize them correctly when joining or leaving a bridge.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomlxsw: spectrum: Set bridge status in appropriate functions
Ido Schimmel [Mon, 4 Jan 2016 09:42:25 +0000 (10:42 +0100)]
mlxsw: spectrum: Set bridge status in appropriate functions

Set the bridge status of physical ports in the appropriate functions, to
be consistent with LAG join/leave and vPorts joining/leaving bridge.

Also, remove the error messages in these two functions, as each of the
functions they call already emits errors.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomlxsw: spectrum: Return NOTIFY_BAD on bridge failure
Ido Schimmel [Mon, 4 Jan 2016 09:42:24 +0000 (10:42 +0100)]
mlxsw: spectrum: Return NOTIFY_BAD on bridge failure

It is possible for us to fail when joining or leaving a bridge, so let
the user know about that by returning NOTIFY_BAD, as already done for
LAG join/leave and 802.1D bridges.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomlxsw: spectrum: Initialize PVID only once
Ido Schimmel [Mon, 4 Jan 2016 09:42:23 +0000 (10:42 +0100)]
mlxsw: spectrum: Initialize PVID only once

We set PVID to 1 in mlxsw_sp_port_vlan_init(), so we can remove this
statement.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: add reset_resume function
hayeswang [Mon, 4 Jan 2016 06:38:46 +0000 (14:38 +0800)]
r8152: add reset_resume function

When reset_resume() is called, the SELECTIVE_SUSPEND flag should be
cleared and the device reinitialized, whether or not SELECTIVE_SUSPEND
is set. A call to reset_resume() means the power supply was cut or the
device was reset; that is, the device cannot be in the runtime suspend
state, and reinitialization is necessary.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agochelsio: constify cphy_ops structures
Julia Lawall [Sun, 3 Jan 2016 13:09:37 +0000 (14:09 +0100)]
chelsio: constify cphy_ops structures

The cphy_ops structures are never modified, so declare them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agofsl/fman: allow modular build
Arnd Bergmann [Fri, 1 Jan 2016 13:55:24 +0000 (14:55 +0100)]
fsl/fman: allow modular build

ARM allmodconfig fails because of the addition of the FMAN driver:

drivers/built-in.o: In function `dtsec_restart_autoneg':
binder.c:(.text+0x173328): undefined reference to `mdiobus_read'
binder.c:(.text+0x173348): undefined reference to `mdiobus_write'
drivers/built-in.o: In function `dtsec_config':
binder.c:(.text+0x173d24): undefined reference to `of_phy_find_device'
drivers/built-in.o: In function `init_phy':
binder.c:(.text+0x1763b0): undefined reference to `of_phy_connect'
drivers/built-in.o: In function `stop':
binder.c:(.text+0x176014): undefined reference to `phy_stop'
drivers/built-in.o: In function `start':
binder.c:(.text+0x176078): undefined reference to `phy_start'

The reason is that the driver uses PHYLIB, but that is a loadable
module here, and fman itself is built-in.

This patch makes it possible to configure fman as a module as well
so we don't change the status of PHYLIB in an allmodconfig kernel,
and it adds a 'select PHYLIB' statement to ensure that phylib is
always built-in when fman is.

The driver uses "builtin_platform_driver(fman_driver);", which means
it cannot be unloaded, but it's still possible to have it as a loadable
module that gets loaded once and never removed.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5adae51a64b8 ("fsl/fman: Add FMan MURAM support")
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: make ip6tunnel_xmit definition conditional
Arnd Bergmann [Fri, 1 Jan 2016 12:18:48 +0000 (13:18 +0100)]
net: make ip6tunnel_xmit definition conditional

Moving the caller of iptunnel_xmit_stats causes a build error in
randconfig builds that disable CONFIG_INET:

In file included from ../net/xfrm/xfrm_input.c:17:0:
../include/net/ip6_tunnel.h: In function 'ip6tunnel_xmit':
../include/net/ip6_tunnel.h:93:2: error: implicit declaration of function 'iptunnel_xmit_stats' [-Werror=implicit-function-declaration]
  iptunnel_xmit_stats(dev, pkt_len);

The reason is that the iptunnel_xmit_stats definition is hidden
inside #ifdef CONFIG_INET but the caller is not. We can change
one or the other to fix it, and this patch adds a second #ifdef
around ip6tunnel_xmit() to avoid seeing the invalid call.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()")
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'nfc-next-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo...
David S. Miller [Tue, 5 Jan 2016 02:48:15 +0000 (21:48 -0500)]
Merge tag 'nfc-next-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC 4.5 pull request

This is the first NFC pull request for 4.5 and it brings:

- A new driver for the STMicroelectronics ST95HF NFC chipset.
  The ST95HF is an NFC digital transceiver with an embedded analog
  front-end and as such relies on the Linux NFC digital
  implementation. This is the 3rd user of the NFC digital stack.

- ACPI support for the ST st-nci and st21nfca drivers.

- A small improvement for the nfcsim driver, as we can now tune
  the Rx delay through sysfs.

- A bunch of minor cleanups and small fixes from Christophe Ricard,
  for a few drivers and the NFC core code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoconnector: bump skb->users before callback invocation
Florian Westphal [Thu, 31 Dec 2015 13:26:33 +0000 (14:26 +0100)]
connector: bump skb->users before callback invocation

Dmitry reports a memory leak found with a syzkaller program.
The problem is that the connector bumps the skb use count but might not
invoke the callback.

So move skb_get() to where the callback is invoked.
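The leak is a reference taken on a path that never drops it. A minimal refcount sketch, with a hypothetical `users` field standing in for skb->users and a callback that consumes the reference it is handed, shows why moving the get next to the callback invocation fixes it.

```c
#include <stddef.h>

/* Hypothetical stand-in: 'users' plays the role of skb->users. */
struct buf {
    int users;
};

typedef void (*rx_callback)(struct buf *b);

static void buf_get(struct buf *b) { b->users++; }
static void buf_put(struct buf *b) { b->users--; }

/* A callback consumes the reference it is handed. */
static void consume(struct buf *b)
{
    /* ... process the message ... */
    buf_put(b);
}

/* Before: the reference is taken even when no callback is found,
 * and nothing ever drops it. */
static void deliver_before(struct buf *b, rx_callback cb)
{
    buf_get(b);
    if (cb == NULL)
        return;  /* leaked reference */
    cb(b);
}

/* After: the reference is taken only right before the callback runs. */
static void deliver_after(struct buf *b, rx_callback cb)
{
    if (cb == NULL)
        return;
    buf_get(b);
    cb(b);
}
```

With no matching callback, the old ordering leaves the count one higher than it started; the new ordering leaves it unchanged on both paths.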

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoudp: properly support MSG_PEEK with truncated buffers
Eric Dumazet [Wed, 30 Dec 2015 13:51:12 +0000 (08:51 -0500)]
udp: properly support MSG_PEEK with truncated buffers

Backport of this upstream commit into stable kernels:
89c22d8c3b27 ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.

In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                 msg->msg_iov);
returns -EFAULT.

This bug does not happen in upstream kernels, since Al Viro did a great
job replacing this with:
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.

For the time being, instead of reverting Herbert Xu's patch and adding
back the skb->ip_summed invalidation changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.

This patch can be applied on recent kernels, as it avoids double
checksumming, and then backported to stable kernels as a bug fix.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: correctly handling failed allocation
Insu Yun [Tue, 29 Dec 2015 22:20:11 +0000 (17:20 -0500)]
cxgb4: correctly handling failed allocation

Since t4_alloc_mem() can fail under memory pressure, a NULL pointer
dereference can occur if the failure is not properly handled.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoqlcnic: correctly handle qlcnic_alloc_mbx_args
Insu Yun [Tue, 29 Dec 2015 20:02:18 +0000 (15:02 -0500)]
qlcnic: correctly handle qlcnic_alloc_mbx_args

Since qlcnic_alloc_mbx_args() can fail, its return value should be
checked.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'r8169-hw-programming-typo-fixes'
David S. Miller [Mon, 4 Jan 2016 21:50:50 +0000 (16:50 -0500)]
Merge branch 'r8169-hw-programming-typo-fixes'

Chunhao Lin says:

====================
Fix some typos in setting hardware parameter

The typos are in the code that sets RTL8168DP, RTL8168EP and RTL8168H
hardware parameters. This patch series fixes these typos.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8169:Correct the way of setting RTL8168DP ephy
Chun-Hao Lin [Tue, 29 Dec 2015 14:13:39 +0000 (22:13 +0800)]
r8169:Correct the way of setting RTL8168DP ephy

The original code is wrong: it always writes ephy register 0x03.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8169:Fix typo in setting RTL8168H PHY PFM mode.
Chun-Hao Lin [Tue, 29 Dec 2015 14:13:38 +0000 (22:13 +0800)]
r8169:Fix typo in setting RTL8168H PHY PFM mode.

The PHY PFM register is in PHY page 0x0a44 register 0x11, not 0x14.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8169:Fix typo in setting RTL8168EP and RTL8168H D3cold PFM mode
Chun-Hao Lin [Tue, 29 Dec 2015 14:13:37 +0000 (22:13 +0800)]
r8169:Fix typo in setting RTL8168EP and RTL8168H D3cold PFM mode

The register for setting D3cold PFM mode is MISC_1, not DLLPR.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agol2tp: rely on ppp layer for skb scrubbing
Guillaume Nault [Tue, 29 Dec 2015 12:06:59 +0000 (13:06 +0100)]
l2tp: rely on ppp layer for skb scrubbing

Since 79c441ae505c ("ppp: implement x-netns support"), the PPP layer
calls skb_scrub_packet() whenever the skb is received on the PPP
device. Manually resetting packet meta-data in the L2TP layer is thus
redundant.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'sh_eth-remove-BE-desc-support'
David S. Miller [Mon, 4 Jan 2016 21:11:12 +0000 (16:11 -0500)]
Merge branch 'sh_eth-remove-BE-desc-support'

Sergei Shtylyov says:

====================
sh_eth: remove unused BE descriptor support

   Here's a set of 2 patches against DaveM's 'net-next.git' repo plus the
recently merged to 'net.git' repo fix for the 16-bit descriptor endianness.
We get rid of ~30 LoCs and ~300 bytes of code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosh_eth: get rid of {cpu|edmac}_to_{edmac|cpu}()
Sergei Shtylyov [Sun, 27 Dec 2015 23:10:47 +0000 (02:10 +0300)]
sh_eth: get rid of {cpu|edmac}_to_{edmac|cpu}()

Now that {cpu|edmac}_to_{edmac|cpu}() functions boiled down to the mere
{cpu|le32}_to_{le32|cpu}() calls, there's no need for these functions
anymore, so just get rid of them.
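What remains is plain little-endian conversion: the descriptor words are stored little-endian in memory whatever the CPU's byte order. A userspace analogue, assuming glibc's `<endian.h>` (`htole32()`/`le32toh()` standing in for `cpu_to_le32()`/`le32_to_cpu()`) and a hypothetical one-field descriptor:

```c
#define _DEFAULT_SOURCE   /* for htole32()/le32toh() in glibc's <endian.h> */
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor word that the DMA engine reads as little-endian. */
struct desc_model {
    uint32_t status;  /* stored little-endian */
};

static void desc_set_status(struct desc_model *d, uint32_t val)
{
    d->status = htole32(val);   /* analogue of cpu_to_le32() */
}

static uint32_t desc_get_status(const struct desc_model *d)
{
    return le32toh(d->status);  /* analogue of le32_to_cpu() */
}
```

On a little-endian CPU both conversions compile to nothing; on a big-endian CPU they byte-swap, and the in-memory layout is the same either way.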

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosh_eth: remove EDMAC_BIG_ENDIAN
Sergei Shtylyov [Sun, 27 Dec 2015 23:07:08 +0000 (02:07 +0300)]
sh_eth: remove EDMAC_BIG_ENDIAN

Commit 71557a37adb5 ("[netdrvr] sh_eth: Add SH7619 support") added support
for big-endian EDMAC descriptors. However, it was never used and never
worked right until the recent driver fixes. I think we can now just remove
this support; it only burdened the driver from the start. It should be
easy to do without disturbing the SH platform code, at least for now...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotilepro: use to_delayed_work
Geliang Tang [Fri, 1 Jan 2016 15:48:57 +0000 (23:48 +0800)]
tilepro: use to_delayed_work

Use to_delayed_work() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bnxt_en-combined-rx-tx-channels'
David S. Miller [Mon, 4 Jan 2016 20:54:41 +0000 (15:54 -0500)]
Merge branch 'bnxt_en-combined-rx-tx-channels'

Michael Chan says:

====================
bnxt_en: Support combined and rx/tx channels.

The bnxt hardware uses a completion ring for rx and tx events.  The driver
has to process the completion ring entries sequentially for the events.
The current code only supports an rx/tx ring pair for each completion ring.
This patch series adds support for using a dedicated completion ring for
rx only or tx only, as an option configurable using ethtool -L.

The benefits for using dedicated completion rings are:

1. A burst of rx packets can cause delay in processing tx events if the
completion ring is shared.  If tx queue is stopped by BQL, this can cause
delay in re-starting the tx queue.

2. A completion ring is sized according to the rx and tx ring size rounded
up to the nearest power of 2.  When the completion ring is shared, it is
sized by adding the rx and tx ring sizes and then rounded to the next power
of 2, often with a lot of wasted space.

3. Using dedicated completion rings, we can adjust the coalescing
parameters independently for rx and tx.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnxt_en: Modify ethtool -l|-L to support combined or rx/tx rings.
Michael Chan [Sun, 3 Jan 2016 04:45:04 +0000 (23:45 -0500)]
bnxt_en: Modify ethtool -l|-L to support combined or rx/tx rings.

The driver can support either all combined or all rx/tx rings.  The
default is combined, but the user can now select rx/tx rings.
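The "all combined or all rx/tx" rule can be expressed as a validation step over the ring counts requested with `ethtool -L dev combined N` or `ethtool -L dev rx N tx M`. The struct below mirrors three field names of `struct ethtool_channels`, but the struct, the helper, and the exact rejection rules are assumptions for illustration; checks against hardware maxima are omitted.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of struct ethtool_channels fields. */
struct channels_req {
    uint32_t rx_count;
    uint32_t tx_count;
    uint32_t combined_count;
};

/* Accept "all combined" or "all rx/tx"; reject a mix or an empty request.
 * Returns 0 or -EINVAL. */
static int check_channels(const struct channels_req *c)
{
    if (c->combined_count)
        return (c->rx_count || c->tx_count) ? -EINVAL : 0;
    return (c->rx_count && c->tx_count) ? 0 : -EINVAL;
}
```

For example, `combined 4` and `rx 2 tx 2` pass, while `rx 2 combined 2` is rejected.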

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnxt_en: Modify init sequence to support shared or non shared rings.
Michael Chan [Sun, 3 Jan 2016 04:45:03 +0000 (23:45 -0500)]
bnxt_en: Modify init sequence to support shared or non shared rings.

Modify ring memory allocation and MSIX setup to support shared or
non-shared rings and do the proper mapping.  The default is still to
use shared rings.
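The effect on MSIX setup is mostly a matter of how many completion rings, and therefore vectors, are needed. This is a simplified model. It assumes one MSIX vector per completion ring, and cp_rings_needed() is a hypothetical name.

```c
#include <assert.h>

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Completion rings (assumed: one MSIX vector each) needed for a ring layout. */
static int cp_rings_needed(int rx_rings, int tx_rings, int shared)
{
	assert(rx_rings > 0 && tx_rings > 0);
	/* Shared: tx ring i reuses completion ring i, so the larger count wins. */
	if (shared)
		return max_int(rx_rings, tx_rings);
	/* Non-shared: every rx ring and every tx ring has its own completion ring. */
	return rx_rings + tx_rings;
}
```

For 8 rx and 8 tx rings, shared mode needs 8 completion rings and non-shared mode needs 16. Non-shared mode therefore consumes vectors twice as fast.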

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.
Michael Chan [Sun, 3 Jan 2016 04:45:02 +0000 (23:45 -0500)]
bnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.

Add logic to calculate how many shared or non-shared rings can be
supported.  The default is to use shared rings.
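One way to express this calculation is the sketch below. It is loosely modeled, not the driver's code. It assumes that in shared mode the completion ring count caps rx and tx separately. In non-shared mode, rx + tx must fit in the completion rings, and the larger side is trimmed first. demo_max_rings() and the trimming policy are assumptions.

```c
#include <assert.h>
#include <errno.h>

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical: given hardware limits, compute usable rx/tx ring counts.
 * Returns 0 on success or -ENOMEM if non-shared mode cannot fit.
 */
static int demo_max_rings(int max_rx, int max_tx, int max_cp, int shared,
			  int *rx, int *tx)
{
	int r, t;

	assert(max_rx > 0 && max_tx > 0 && max_cp > 0);
	r = min_int(max_rx, max_cp);
	t = min_int(max_tx, max_cp);
	if (!shared) {
		if (max_cp < 2)
			return -ENOMEM;	/* need at least one rx and one tx cp ring */
		/* Trim the larger side first until both fit (assumed policy). */
		while (r + t > max_cp) {
			if (r > t && r > 1)
				r--;
			else
				t--;	/* r <= t here, so t >= 2 since r + t >= 3 */
		}
	}
	*rx = r;
	*tx = t;
	return 0;
}
```

With 8 completion rings, shared mode allows 8 rx and 8 tx rings, while non-shared mode ends up with 4 of each.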

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnxt_en: Re-structure ring indexing and mapping.
Michael Chan [Sun, 3 Jan 2016 04:45:01 +0000 (23:45 -0500)]
bnxt_en: Re-structure ring indexing and mapping.

In order to support dedicated or shared completion rings, the ring
indexing and mapping are re-structured as below:

1. bp->grp_info[] array index is 1:1 with bp->bnapi[] array index and
completion ring index.

2. rx rings 0 to n will be mapped to completion rings 0 to n.

3. If tx and rx rings share completion rings, then tx rings 0 to m will
be mapped to completion rings 0 to m.

4. If tx and rx rings use dedicated completion rings, then tx rings 0 to
m will be mapped to completion rings n + 1 to n + m + 1.

5. Each tx or rx ring will use the corresponding completion ring index
for doorbell mapping and MSIX mapping.
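The mapping rules can be written as two small index functions. Rx rings numbered 0 to n give n + 1 rx rings, so in dedicated mode tx ring i lands on completion ring n + 1 + i. The function names are hypothetical. Counting rx rings rather than using the last index avoids off-by-one mistakes.

```c
#include <assert.h>

/* Rule 2: rx ring i uses completion ring i. */
static int rx_cp_ring(int rx_idx)
{
	assert(rx_idx >= 0);
	return rx_idx;
}

/*
 * Rules 3 and 4: rx_rings is the number of rx rings (indices 0..rx_rings-1).
 * Shared: tx ring i uses completion ring i.
 * Dedicated: tx rings are placed after all rx completion rings.
 */
static int tx_cp_ring(int tx_idx, int rx_rings, int shared)
{
	assert(tx_idx >= 0 && rx_rings > 0);
	return shared ? tx_idx : rx_rings + tx_idx;
}
```

Under rule 5, the returned index would also select the doorbell and MSIX vector, and under rule 1 it would select the grp_info[]/bnapi[] slot.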

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnxt_en: Check for NULL rx or tx ring.
Michael Chan [Sun, 3 Jan 2016 04:45:00 +0000 (23:45 -0500)]
bnxt_en: Check for NULL rx or tx ring.

A bnxt_napi structure may no longer have both an rx ring and a tx
ring.  Check for a valid ring before using it.
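The check amounts to guarding each ring pointer before touching it. The sketch below uses simplified stand-in types, and service_rings() is a hypothetical poll helper rather than a driver function.

```c
#include <assert.h>
#include <stddef.h>

struct rx_ring { int pending; };
struct tx_ring { int pending; };

/* Simplified bnxt_napi stand-in: either ring pointer may now be NULL. */
struct napi_ctx {
	struct rx_ring *rx;
	struct tx_ring *tx;
};

/* Hypothetical poll helper: service only the rings this context owns. */
static int service_rings(struct napi_ctx *n)
{
	int work = 0;

	assert(n != NULL);
	if (n->tx) {			/* tx-only or shared context */
		work += n->tx->pending;
		n->tx->pending = 0;
	}
	if (n->rx) {			/* rx-only or shared context */
		work += n->rx->pending;
		n->rx->pending = 0;
	}
	return work;
}
```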

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnxt_en: Separate bnxt_{rx|tx}_ring_info structs from bnxt_napi struct.
Michael Chan [Sun, 3 Jan 2016 04:44:59 +0000 (23:44 -0500)]
bnxt_en: Separate bnxt_{rx|tx}_ring_info structs from bnxt_napi struct.

Currently, an rx and a tx ring are always paired with a completion ring.
We want to restructure it so that it is possible to have a dedicated
completion ring for tx or rx only.

The bnxt hardware uses a completion ring for rx and tx events.  The driver
has to process the completion ring entries sequentially for the rx and tx
events.  Using a dedicated completion ring for rx only or tx only has these
benefits:

1. A burst of rx packets can delay the processing of tx events if the
completion ring is shared.  If the tx queue is stopped by BQL, this can
delay restarting the tx queue.

2. A completion ring is sized according to the rx and tx ring size rounded
up to the nearest power of 2.  When the completion ring is shared, it is
sized by adding the rx and tx ring sizes and then rounded to the next power
of 2, often with a lot of wasted space.

3. With dedicated completion rings, the rx and tx coalescing parameters
can be adjusted independently.

The first step is to separate the rx and tx ring structures from the
bnxt_napi struct.

In this patch, an rx ring and a tx ring will point to the same bnxt_napi
struct to share the same completion ring.  No change in ring assignment
and mapping yet.
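The separated layout can be pictured as follows. Ring structs point back to their completion-ring context, and the context points to whichever rings it serves. The demo_ names and fields are hypothetical simplifications. pair_rings() models this patch's step, in which rx ring i and tx ring i still share napi i.

```c
#include <assert.h>

struct demo_napi;

struct demo_rx_ring_info {
	struct demo_napi *bnapi;	/* completion ring context for this rx ring */
	unsigned int rx_prod;
};

struct demo_tx_ring_info {
	struct demo_napi *bnapi;	/* completion ring context for this tx ring */
	unsigned int tx_prod;
};

/* Per-completion-ring context; the rings now live outside it. */
struct demo_napi {
	int index;
	struct demo_rx_ring_info *rx_ring;	/* NULL if no rx ring here */
	struct demo_tx_ring_info *tx_ring;	/* NULL if no tx ring here */
};

/* Pair rx ring i and tx ring i on napi i (no change in mapping yet). */
static void pair_rings(struct demo_napi *napi, struct demo_rx_ring_info *rx,
		       struct demo_tx_ring_info *tx, int n)
{
	assert(napi && rx && tx && n >= 0);
	for (int i = 0; i < n; i++) {
		napi[i].index = i;
		napi[i].rx_ring = &rx[i];
		napi[i].tx_ring = &tx[i];
		rx[i].bnapi = &napi[i];
		tx[i].bnapi = &napi[i];
	}
}
```

Because the rings hold a pointer rather than being embedded, a later change can point a tx ring at a different demo_napi without moving any ring state.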

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnxt_en: Refactor bnxt_dbg_dump_states().
Michael Chan [Sun, 3 Jan 2016 04:44:58 +0000 (23:44 -0500)]
bnxt_en: Refactor bnxt_dbg_dump_states().

Add 3 separate functions to dump the different ring states.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotracing: Fix setting of start_index in find_next()
Qiu Peiyang [Thu, 31 Dec 2015 05:11:28 +0000 (13:11 +0800)]
tracing: Fix setting of start_index in find_next()

When we run cat /sys/kernel/debug/tracing/printk_formats, we hit a kernel
panic in t_show.

general protection fault: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 2957 Comm: sh Tainted: G W  O 3.14.55-x86_64-01062-gd4acdc7 #2
RIP: 0010:[<ffffffff811375b2>]
 [<ffffffff811375b2>] t_show+0x22/0xe0
RSP: 0000:ffff88002b4ebe80  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
FS:  0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
Call Trace:
 [<ffffffff811dc076>] seq_read+0x2f6/0x3e0
 [<ffffffff811b749b>] vfs_read+0x9b/0x160
 [<ffffffff811b7f69>] SyS_read+0x49/0xb0
 [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
 ---[ end trace 5bd9eb630614861e ]---
Kernel panic - not syncing: Fatal exception

The first time find_next() calls find_next_mod_format(), it should
iterate the trace_bprintk_fmt_list to find the first print format of
the module. However, in the current code start_index is initially
smaller than *pos, so the code does not iterate the list. The later
container_of() then computes a wrong address from the previous v, which
turns mod_fmt into a meaningless object, and so is the returned
mod_fmt->fmt.

This patch fixes it by correcting start_index. With the fix, the first
time find_next_mod_format() is called, start_index is equal to *pos, so
the code iterates the trace_bprintk_fmt_list to get the right module
printk format, and the returned mod_fmt->fmt is correct as well.
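The index arithmetic behind the fix can be modeled in a few lines. This simplified model assumes that seq_file positions cover two built-in format sections followed by the module list. nr_bprintk, nr_tp_str and the helpers are hypothetical names. The model also assumes the buggy start_index counted only the second section, which matches the symptom that it was smaller than *pos.

```c
#include <assert.h>

enum fmt_region { REGION_BPRINTK, REGION_TP_STR, REGION_MODULE };

struct fmt_lookup {
	enum fmt_region region;
	int offset;	/* index within that region */
};

/* First position owned by the module list: must cover BOTH sections. */
static int mod_list_start(int nr_bprintk, int nr_tp_str)
{
	return nr_bprintk + nr_tp_str;
}

/* Map a seq_file position to its region (simplified model). */
static struct fmt_lookup lookup_pos(int pos, int nr_bprintk, int nr_tp_str)
{
	struct fmt_lookup l;

	assert(pos >= 0 && nr_bprintk >= 0 && nr_tp_str >= 0);
	if (pos < nr_bprintk) {
		l.region = REGION_BPRINTK;
		l.offset = pos;
	} else if (pos < nr_bprintk + nr_tp_str) {
		l.region = REGION_TP_STR;
		l.offset = pos - nr_bprintk;
	} else {
		l.region = REGION_MODULE;
		l.offset = pos - mod_list_start(nr_bprintk, nr_tp_str);
	}
	return l;
}

/* Modeled first-call condition: walk the module list when start == pos. */
static int walks_module_list(int start_index, int pos)
{
	return start_index == pos;
}
```

With 3 formats in the first section and 2 in the second, position 5 is the first module format. A start_index of 5 satisfies the walk condition. A start_index of 2, which counts the second section alone, does not, so the code would fall through to container_of() on a pointer that is not a module entry.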

Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com
Cc: stable@vger.kernel.org # 3.12+
Fixes: 102c9323c35a8 "tracing: Add __tracepoint_string() to export string pointers"
Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>