]> www.infradead.org Git - users/willy/linux.git/log
users/willy/linux.git
9 years agonet/sched/sch_hfsc.c: remove unused cl_myfadj
Michal Soltys [Tue, 2 Aug 2016 22:44:55 +0000 (00:44 +0200)]
net/sched/sch_hfsc.c: remove unused cl_myfadj

The code using this variable has been commented out in the past as it
was causing issues in upperlimited link-sharing scenarios.

Signed-off-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/sched/sch_hfsc.c: keep fsc and virtual times in sync; fix an old bug
Michal Soltys [Tue, 2 Aug 2016 22:44:54 +0000 (00:44 +0200)]
net/sched/sch_hfsc.c: keep fsc and virtual times in sync; fix an old bug

This patch simplifies how we update fsc and calculate vt from it - while
keeping the expected functionality identical with how hfsc behaves
curently. It also fixes a certain issue introduced with
a very old patch.

The idea is, that instead of correcting cl_vt before fsc curve update
(rtsc_min) and correcting cl_vt after calculation (rtsc_y2x) to keep
cl_vt local to the current period - we can simply rely on virtual times
and curve values always being in sync - analogously to how rsc and usc
function, except that we use virtual time here.

Why hasn't it been done since the beginning this way ? The likely scenario
(basing on the code trying to correct curves whenever possible) was to
keep the virtual times as small as possible - as they have tendency to
"gallop" forward whenever their siblings and other fair sharing
subtrees are idling. On top of that, current code is subtly bugged, so
cumulative time (without any corrections) is always kept and used in
init_vf() when a new backlog period begins (using cl_cvtoff).

Is cumulative value safe ? Generally yes, though corner cases are easy
to create. For example consider:

1gbit interface
some 100kbit leaf, everything else idle

With current tick (64ns) 1s is 15625000 ticks, but the leaf is alone and
it's virtual time, so in reality it's 10000 times more. ITOW 38 bits are
needed to hold 1 second. 54 - 1 day, 59 - 1 month, 63 - 1 year (all
logarithms rounded up). It's getting somewhat dangerous, but also
requires setup excusing this kind of values not mentioning permanently
backlogged class for a year. In near most extreme case (10gbit, 10kbit
leaf), we have "enough" to hold ~13.6 days in 64 bits.

Well, the issue remains mostly theoretical and cl_cvtoff has been
working fine for all those years. Sensible configuration are de-facto
immune to this issue, and not so sensible can solve it with a cronjob
and its period inversely proportional to the insanity of such setup =)

Now let's explain the subtle bug mentioned earlier.

The issue is related to how offsets are kept and how we calculate
virtual times and update fair service curve(s). The issue itself is
subtle, but easy to observe with long m1 segments. It was introduced in
rather old patch:

Commit 99296150c7: "[NET_SCHED]: O(1) children vtoff adjustment
in HFSC scheduler"

(available in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git)

Originally when a new backlog period was started, cl_vtoff of each
sibling was updated with cl_cvtmax from past period - naturally moving
all cl_vt to proper starting point. That patch adjusted it so cumulative
offset is kept in the parent, and there is no need for traversing the
list (as any subsequent child activation derives new vt from already
active sibling(s)).

But with this change, cl_vtoff (of each sibling) is no longer persistent
across the inactivity periods, as it's calculated from parent's
cl_cvtoff on a new backlog period, conflicting with the following curve
correction from the previous period:

if (cl->cl_virtual.x == vt) {
        cl->cl_virtual.x -= cl->cl_vtoff;
cl->cl_vtoff = 0;
}

This essentially tries to keep curve as if it was local to the period
and resets cl_vtoff (cumulative vt offset of the class) to 0 when
possible (read: when we have an intersection or if a new curve is below
the old one). But then it's recalculated from cl_cvtoff on next active
period.  Then rtsc_min() call preceding the above if() doesn't really
do what we expect it to do in such scenario - as it calculates the
minimum of corrected curve (from the previous backlog period) and the
new uncorrected curve (with offset derived from cl_cvtoff).

Example:

tc class add dev $ife parent 1:0 classid 1:1  hfsc ls m2 100mbit ul m2 100mbit
tc class add dev $ife parent 1:1 classid 1:10 hfsc ls m1 80mbit d 10s m2 20mbit
tc class add dev $ife parent 1:1 classid 1:11 hfsc ls m2 20mbit

start B, keep it backlogged, let it run 6s (30s worth of vt as A is idle)
pause B briefly to force cl_cvtoff update in parent (whole 1:1 going idle)
start A, let it run 10s
pause A briefly to force rtsc_min()

At this point we would expect A to continue at 20mbit after a brief
moment of 80mbit. But instead A will use 80mbit for full 10s again. It's
the effect of first correcting A (during 'start A'), and then - after
unpausing - calculating rtsc_min() from old corrected and new uncorrected
curve.

The patch fixes this bug and keepis vt and fsc in sync (virtual times
are cumulative, not local to the backlog period).

Signed-off-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoqed: Use DEFINE_SPINLOCK() for spinlock
Wei Yongjun [Tue, 2 Aug 2016 13:49:00 +0000 (13:49 +0000)]
qed: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/multicast: should not send source list records when have filter mode change
Hangbin Liu [Tue, 2 Aug 2016 10:02:57 +0000 (18:02 +0800)]
net/multicast: should not send source list records when have filter mode change

Based on RFC3376 5.1 and RFC3810 6.1

   If the per-interface listening change that triggers the new report is
   a filter mode change, then the next [Robustness Variable] State
   Change Reports will include a Filter Mode Change Record.  This
   applies even if any number of source list changes occur in that
   period.

   Old State         New State         State Change Record Sent
   ---------         ---------         ------------------------
   INCLUDE (A)       EXCLUDE (B)       TO_EX (B)
   EXCLUDE (A)       INCLUDE (B)       TO_IN (B)

So we should not send source-list change if there is a filter-mode change.

Here are two scenarios:
1. Group deleted and filter mode is EXCLUDE, which means we need send a
   TO_IN { }.
2. Not group deleted, but has pcm->crcount, which means we need send a
   normal filter-mode-change.

At the same time, if the type is ALLOW or BLOCK, and have psf->sf_crcount,
we stop add records and decrease sf_crcount directly

Reference: https://www.ietf.org/mail-archive/web/magma/current/msg01274.html

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ethernet: marvell: mvneta: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Sat, 30 Jul 2016 15:42:12 +0000 (17:42 +0200)]
net: ethernet: marvell: mvneta: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move the mvneta driver to new api {get|set}_link_ksettings.

We use the generic function phy_ethtool_get_link_ksettings,
and update old mvneta_ethtool_set_settings to the new api.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ethernet: marvell: mvneta: use phydev from struct net_device
Philippe Reynes [Sat, 30 Jul 2016 15:42:11 +0000 (17:42 +0200)]
net: ethernet: marvell: mvneta: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy_dev in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ethernet: greth: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Sat, 30 Jul 2016 14:22:06 +0000 (16:22 +0200)]
net: ethernet: greth: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ethernet: greth: use phydev from struct net_device
Philippe Reynes [Sat, 30 Jul 2016 14:22:05 +0000 (16:22 +0200)]
net: ethernet: greth: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ethernet: octeon: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Sat, 30 Jul 2016 14:14:44 +0000 (16:14 +0200)]
net: ethernet: octeon: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

There was a check on CAP_NET_ADMIN in cvm_oct_set_settings, but this
check is already done in dev_ethtool, so no need to repeat it before
calling the generic function.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ethernet: octeon: use phydev from struct net_device
Philippe Reynes [Sat, 30 Jul 2016 14:14:43 +0000 (16:14 +0200)]
net: ethernet: octeon: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bna-next'
David S. Miller [Mon, 8 Aug 2016 22:41:28 +0000 (15:41 -0700)]
Merge branch 'bna-next'

Ivan Vecera says:

====================
bna: remove useless global variables

The set removes useless global bnad_list as well as bnad->entry that track
a list of driver instances but it is not used anywhere. The associated
bnad_list_mutex is removed as well but as it is also used to protect
bna_id increment it is necessary to convert bna_id to atomic_t.
====================

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
9 years agobna: remove global bnad_list_mutex
Ivan Vecera [Fri, 29 Jul 2016 17:52:58 +0000 (19:52 +0200)]
bna: remove global bnad_list_mutex

Remove global bnad_list_mutex as it is not used anymore. This makes
bnad_add_to_list() and bnad_remove_from_list() empty so remove them too.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobna: change type of bna_id to atomic_t
Ivan Vecera [Fri, 29 Jul 2016 17:52:57 +0000 (19:52 +0200)]
bna: change type of bna_id to atomic_t

Change type of bna_id to atomic_t. The bnad_list_mutex is used to prevent
a race when bna_id is incremented. After the change the mutex can be
removed in the next step.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobna: remove useless linked list
Ivan Vecera [Fri, 29 Jul 2016 17:52:56 +0000 (19:52 +0200)]
bna: remove useless linked list

Remove global variable bnad_list and bnad->list_entry that are used
as list of bna driver instances. It is not necessary and useless.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'ipconfig-improve-dhcp-timeouts'
David S. Miller [Mon, 8 Aug 2016 22:40:06 +0000 (15:40 -0700)]
Merge branch 'ipconfig-improve-dhcp-timeouts'

Uwe Kleine-König says:

====================
net: ipconfig: improve DHCP timeout handling

this series teaches the ipconfig code to handle a DHCP reply on eth0 even if a
request on eth1 was already sent out.
This is a follow fix to 2513dfb83fc7 ("ipconfig: handle case of delayed DHCP
server") that dropped a late reply.

This makes it possible at all to work with slow DHCP servers at all in some
configurations and improves boot speed in general.

The first patch is not really necessary, it only helps decoding debug messages
when there is more than one device.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ipconfig: drop inter-device timeout
Uwe Kleine-König [Fri, 29 Jul 2016 09:30:39 +0000 (11:30 +0200)]
net: ipconfig: drop inter-device timeout

Now that ipconfig learned to handle "delayed replies" in the previous
commit, there is no reason any more to delay sending a first request per
device.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ipconfig: Support using "delayed" DHCP replies
Uwe Kleine-König [Fri, 29 Jul 2016 09:30:38 +0000 (11:30 +0200)]
net: ipconfig: Support using "delayed" DHCP replies

The dhcp code only waits 1s between sending DHCP requests on different
devices and only accepts an answer for the device that sent out the last
request. Only the timeout at the end of a loop is increased iteratively
which favours only the last device. This makes it impossible to work
with a dhcp server that takes little more than 1s connected to a device
that is not the last one.

Instead of also increasing the inter-device timeout, teach the code to
handle delayed replies.

To accomplish that, make *ic_dev track the current ic_device instead of
the current net_device and adapt all users accordingly. The relevant
change then is to reset d to ic_dev on a reply to assert that the
followup request goes through the right device.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ipconfig: Add device name to debug messages
Uwe Kleine-König [Fri, 29 Jul 2016 09:30:37 +0000 (11:30 +0200)]
net: ipconfig: Add device name to debug messages

This simplifies understanding what happens when there is more than one
device.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'be2net-next'
David S. Miller [Mon, 8 Aug 2016 22:38:27 +0000 (15:38 -0700)]
Merge branch 'be2net-next'

Sathya Perla says:

====================
be2net: patch set

Patch 1 fixes the driver to workaournd a bug in the Lancer FW in the
vlan-config cmd processing. The FW in some cases clears the vlan-promisc
setting even if it cannot apply the vlan filter. The driver has no means
of knowing if the vlan-promisc setting has been cleared or not. This
patch now explicitly clears the vlan-promisc setting via the RX-Filter cmd
and then tries to program the vlan-list.

Patch 2 fixes the failure path in the vlan vid add code.
The driver currently removes a new vid from the adapter->vids[] array if
be_vid_config() returns an error, which occurs when there is an error in
HW/FW. This is wrong. After the HW/FW error is recovered from, we need the
complete vids[] array to re-program the vlan list.

Patch 3 fixes the ndo_set_rx_mode() path to avoid unnecessary multicast
list updates to the FW. Each time the ndo_set_rx_mode() routine is called,
the driver programs the multicast list in the adapter without checking
if there are any changes to the list. This leads to a flood of RX_FILTER
cmds when a number of vlan interfaces are configured over the device,
as the ndo_ gets called for each vlan interface. To avoid this, we now
use __dev_mc_sync() and __dev_uc_sync() API, but only to detect if there
is a change in the mc/uc lists. Now that we use this API, the code has to
be-designed to issue these API calls for each invocation of the ndo_ call.

Patch 4 replaces polling with sleeping in the FW completion path.
The ndo_set_rx_mode() and ndo_add/del_vxlan_port() calls may be called with
BHs disabled. The driver currently issues the required cmds to the FW in
these contexts and polls on completions from the FW, while BHs remain
disabled.  This can cause either packet loss or packet reception to be
delayed on that CPU.  This patch defers processing of the above cmds to a
separate workqueue. With this change, FW cmds are now issued only in process
context. Now that the FW cmds are issued only in process context, they can
sleep waiting for a completion instead of polling.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: replace polling with sleeping in the FW completion path
Sathya Perla [Wed, 27 Jul 2016 09:26:18 +0000 (05:26 -0400)]
be2net: replace polling with sleeping in the FW completion path

The ndo_set_rx_mode() and ndo_add/del_vxlan_port() calls may be called with
BHs disabled. The driver currently issues the required cmds to the FW in
these contexts and polls on completions from the FW, while BHs remain
disabled.  This can cause either packet loss or packet reception to be
delayed on that CPU.

This patch defers processing of the above cmds to a separate workqueue.
With this change, FW cmds are now issued only in process context.
Now that the FW cmds are issued only in process context, they can sleep
waiting for a completion instead of polling. All the spin_lock_bh(mcc_lock)
calls are now replaced with mutex calls.

Also a new rx_filter_lock is now needed to protect the RX filtering fields
like vids[] between be_vlan_add/rem_vid() and __be_set_rx_mode() contexts.

Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: Avoid unnecessary firmware updates of multicast list
Sriharsha Basavapatna [Wed, 27 Jul 2016 09:26:17 +0000 (05:26 -0400)]
be2net: Avoid unnecessary firmware updates of multicast list

Eachtime the ndo_set_rx_mode() routine is called, the driver programs the
multicast list in the adapter without checking if there are any changes to
the list. This leads to a flood of RX_FILTER cmds when a number of vlan
interfaces are configured over the device, as the ndo_ gets
called for each vlan interface. To avoid this, we now use __dev_mc_sync()
and __dev_uc_sync() API, but only to detect if there is a change in the
mc/uc lists. Now that we use this API, the code has to be-designed to
issue these API calls for each invocation of the be_set_rx_mode() call.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: do not remove vids from driver table if be_vid_config() fails.
Sathya Perla [Wed, 27 Jul 2016 09:26:16 +0000 (05:26 -0400)]
be2net: do not remove vids from driver table if be_vid_config() fails.

The driver currently removes a new vid from the adapter->vids[] array if
be_vid_config() returns an error, which occurs when there is an error in
HW/FW. This is wrong. After the HW/FW error is recovered from, we need the
complete vids[] array to re-program the vlan list.

Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: clear vlan-promisc setting before programming the vlan list
Somnath Kotur [Wed, 27 Jul 2016 09:26:15 +0000 (05:26 -0400)]
be2net: clear vlan-promisc setting before programming the vlan list

The Lancer FW has a bug due to which in some cases vlan-promisc setting
is cleared eventhough the vlan-list programming did not succeed (via
VLAN_CONFIG) cmd. The driver has no way of knowing if the vlan-promisc
mode was cleared or not when this cmd fails. To work around this issue,
this patch first explicitly clears the vlan-promisc mode via RX_FILTER
cmd and then tries to program the vlan list.
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoneigh: allow admin to set NUD_STALE
Julian Anastasov [Wed, 27 Jul 2016 06:56:50 +0000 (09:56 +0300)]
neigh: allow admin to set NUD_STALE

Admin should be able to set any state. Currently, this fails
when lladdr is not changed and state is changed from
NUD_CONNECTED to NUD_STALE:

ip neigh add 192.168.8.1 lladdr 00:11:22:33:44:55 nud perm dev wlan0
ip neigh show to 192.168.8.1
192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
ip neigh change 192.168.8.1 lladdr 00:11:22:33:44:55 nud stale dev wlan0
ip neigh show to 192.168.8.1
192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT

Problem may be from 2.1.X days.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Chunhui He <hchunhui@mail.ustc.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp: use event->chunk when it's valid
Xin Long [Sun, 7 Aug 2016 06:15:13 +0000 (14:15 +0800)]
sctp: use event->chunk when it's valid

Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
it's available") used event->chunk->head_skb to get the head_skb in
sctp_ulpevent_set_owner().

But at that moment, the event->chunk was NULL, as it cloned the skb
in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
work.

This patch is to move the event->chunk initialization before calling
sctp_ulpevent_receive_data() so that it uses event->chunk when it's
valid.

Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's available")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: vxlan: lwt: Fix vxlan local traffic.
pravin shelar [Sat, 6 Aug 2016 00:45:37 +0000 (17:45 -0700)]
net: vxlan: lwt: Fix vxlan local traffic.

vxlan driver has bypass for local vxlan traffic, but that
depends on information about all VNIs on local system in
vxlan driver. This is not available in case of LWT.
Therefore following patch disable encap bypass for LWT
vxlan traffic.

Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
Reported-by: Jakub Libosvar <jlibosva@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: vxlan: lwt: Use source ip address during route lookup.
pravin shelar [Sat, 6 Aug 2016 00:45:36 +0000 (17:45 -0700)]
net: vxlan: lwt: Use source ip address during route lookup.

LWT user can specify destination as well as source ip address
for given tunnel endpoint. But vxlan is ignoring given source
ip address. Following patch uses both ip address to route the
tunnel packet. This consistent with other LWT implementations,
like GENEVE and GRE.

Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bpf-csum-complete'
David S. Miller [Mon, 8 Aug 2016 20:11:44 +0000 (13:11 -0700)]
Merge branch 'bpf-csum-complete'

Daniel Borkmann says:

====================
Few BPF helper related checksum fixes

The set contains three fixes with regards to CHECKSUM_COMPLETE
and BPF helper functions. For details please see individual
patches.

Thanks!

v1 -> v2:
  - Fixed make htmldocs issue reported by kbuild bot.
  - Rest as is.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobpf: fix checksum for vlan push/pop helper
Daniel Borkmann [Thu, 4 Aug 2016 22:11:13 +0000 (00:11 +0200)]
bpf: fix checksum for vlan push/pop helper

When having skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs don't
push rcsum of mac header back in and after BPF run back pull out again as
opposed to some other subsystems (ovs, for example).

For cases like q-in-q, meaning when a vlan tag for offloading is already
present and we're about to push another one, then skb_vlan_push() pushes the
inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing
the 4 bytes vlan header diff. Likewise, for the reverse operation in
skb_vlan_pop() for the case where vlan header needs to be pulled out of the
skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes
rcsum of the vlan header that was removed.

However mangling the rcsum here will lead to hw csum failure for BPF case,
since we're pulling or pushing data that was not part of the current rcsum.
Changing tc BPF programs in general to push/pull rcsum around BPF_PROG_RUN()
is also not really an option since current behaviour is ABI by now, but apart
from that would also mean to do quite a bit of useless work in the sense that
usually 12 bytes need to be rcsum pushed/pulled also when we don't need to
touch this vlan related corner case. One way to fix it would be to push the
necessary rcsum fixup down into vlan helpers that are (mostly) slow-path
anyway.

Fixes: 4e10df9a60d9 ("bpf: introduce bpf_skb_vlan_push/pop() helpers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobpf: fix checksum fixups on bpf_skb_store_bytes
Daniel Borkmann [Thu, 4 Aug 2016 22:11:12 +0000 (00:11 +0200)]
bpf: fix checksum fixups on bpf_skb_store_bytes

bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
Where we ran into an issue with bpf_skb_store_bytes() is when we did a
single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
underlying issue has been tracked down to a buffer alignment issue.

Meaning, that csum_partial() computations via skb_postpull_rcsum() and
skb_postpush_rcsum() pair invoked had a wrong result since they operated on
an odd address for the hoplimit, while other computations were done on an
even address. This mix doesn't work as-is with skb_postpull_rcsum(),
skb_postpush_rcsum() pair as it always expects at least half-word alignment
of input buffers, which is normally the case. Thus, instead of these helpers
using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
csum_block_add(), respectively. For unaligned offsets, they rotate the sum
to align it to a half-word boundary again, otherwise they work the same as
csum_sub() and csum_add().

Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
use a 0 constant for offset so that the compiler optimizes the offset & 1
test away and generates the same code as with csum_sub()/_add().

Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobpf: also call skb_postpush_rcsum on xmit occasions
Daniel Borkmann [Thu, 4 Aug 2016 22:11:11 +0000 (00:11 +0200)]
bpf: also call skb_postpush_rcsum on xmit occasions

Follow-up to commit f8ffad69c9f8 ("bpf: add skb_postpush_rcsum and fix
dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect
locations which need CHECKSUM_COMPLETE fixups on ingress.

For the same reasons as described in f8ffad69c9f8 already, we of course
also need this here, since dev_queue_xmit() on a veth device will let us
end up in the dev_forward_skb() helper again to cross namespaces.

Latter then calls into skb_postpull_rcsum() to pull out L2 header, so
that netif_rx_internal() sees CHECKSUM_COMPLETE as it is expected. That
is, CHECKSUM_COMPLETE on ingress covering L2 _payload_, not L2 headers.

Also here we have to address bpf_redirect() and bpf_clone_redirect().

Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper")
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/ethernet: tundra: fix dump_eth_one warning in tsi108_eth
Paul Gortmaker [Thu, 4 Aug 2016 20:07:58 +0000 (16:07 -0400)]
net/ethernet: tundra: fix dump_eth_one warning in tsi108_eth

The call site for this function appears as:

  #ifdef DEBUG
        data->msg_enable = DEBUG;
        dump_eth_one(dev);
  #endif

...leading to the following warning for !DEBUG builds:

drivers/net/ethernet/tundra/tsi108_eth.c:169:13: warning: 'dump_eth_one' defined but not used [-Wunused-function]
 static void dump_eth_one(struct net_device *dev)
             ^

...when using the arch/powerpc/configs/mpc7448_hpc2_defconfig

Put the function definition under the same #ifdef as the call site
to avoid the warning.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'mlxsw-dcb-fixes'
David S. Miller [Mon, 8 Aug 2016 19:57:28 +0000 (12:57 -0700)]
Merge branch 'mlxsw-dcb-fixes'

Ido Schimmel says:

====================
mlxsw: DCB fixes

Patches 1 and 2 fix a problem in which PAUSE frames settings are wrongly
overridden when ieee_setpfc() gets called.

Patch 3 adds a missing rollback in port's creation error path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomlxsw: spectrum: Add missing DCB rollback in error path
Ido Schimmel [Thu, 4 Aug 2016 14:36:22 +0000 (17:36 +0300)]
mlxsw: spectrum: Add missing DCB rollback in error path

We correctly execute mlxsw_sp_port_dcb_fini() when port is removed, but
I missed its rollback in the error path of port creation, so add it.

Fixes: f00817df2b42 ("mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomlxsw: spectrum: Do not override PAUSE settings
Ido Schimmel [Thu, 4 Aug 2016 14:36:21 +0000 (17:36 +0300)]
mlxsw: spectrum: Do not override PAUSE settings

The PFCC register is used to configure both PAUSE and PFC frames.
Therefore, when PFC frames are disabled we must make sure we don't
mistakenly also disable PAUSE frames (which might be enabled).

Fix this by packing the PFCC register with the current PAUSE settings.

Note that this register is also accessed via ethtool ops, but there we
are guaranteed to have PFC disabled.

Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomlxsw: spectrum: Do not assume PAUSE frames are disabled
Ido Schimmel [Thu, 4 Aug 2016 14:36:20 +0000 (17:36 +0300)]
mlxsw: spectrum: Do not assume PAUSE frames are disabled

When ieee_setpfc() gets called, PAUSE frames are not necessarily
disabled on the port.

Check if PAUSE frames are disabled or enabled and configure the port's
headroom buffer accordingly.

Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable-test: Fix max_size parameter description
Phil Sutter [Thu, 4 Aug 2016 10:37:17 +0000 (12:37 +0200)]
rhashtable-test: Fix max_size parameter description

Looks like a simple copy'n'paste error.

Fixes: 1aa661f5c3df1 ("rhashtable-test: Measure time to insert, remove & traverse entries")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'sctp_diag-fixes'
David S. Miller [Mon, 8 Aug 2016 19:51:59 +0000 (12:51 -0700)]
Merge branch 'sctp_diag-fixes'

Phil Sutter says:

====================
sctp_diag: A bunch of fixes for upcoming 'ss' support

The following series contains a number of fixes necessary to make my yet
unpublished 'ss' support patch functional.

Changes since v1:
- Fixed patch 2/3
- Rebased whole series onto current net-next/master

Changes since v2:
- Improved description of patch 2/3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp_diag: Respect ss adding TCPF_CLOSE to idiag_states
Phil Sutter [Thu, 4 Aug 2016 10:11:57 +0000 (12:11 +0200)]
sctp_diag: Respect ss adding TCPF_CLOSE to idiag_states

Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't
rely upon TCPF_LISTEN flag solely being present when listening sockets
are requested.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp_diag: Fix T3_rtx timer export
Phil Sutter [Thu, 4 Aug 2016 10:11:56 +0000 (12:11 +0200)]
sctp_diag: Fix T3_rtx timer export

The asoc's timer value is not kept in asoc->timeouts array but in it's
primary transport instead.

Furthermore, we must export the timer only if it is pending, otherwise
the value will underrun when stored in an unsigned variable and
user space will only see a very large timeout value.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp: Export struct sctp_info to userspace
Phil Sutter [Thu, 4 Aug 2016 10:11:55 +0000 (12:11 +0200)]
sctp: Export struct sctp_info to userspace

This is required to correctly interpret INET_DIAG_INFO messages exported
by sctp_diag module.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: macb: Correct CAPS mask
Harini Katakam [Fri, 5 Aug 2016 05:01:58 +0000 (10:31 +0530)]
net: macb: Correct CAPS mask

USRIO and JUMBO CAPS have the same mask.
Fix the same.

Fixes: ce721a702197 ("net: ethernet: cadence-macb: Add disabled usrio caps")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Harini Katakam <harinik@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'mac80211-for-davem-2016-08-05' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Sun, 7 Aug 2016 00:52:00 +0000 (20:52 -0400)]
Merge tag 'mac80211-for-davem-2016-08-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
First set of fixes for the current cycle:
 * fix 80+80 bandwidth warning
 * fix powersave with mac80211 TXQ implementation
 * use correct way to free SKBs from multicast buffering
 * mesh: fix operation ordering to work with all drivers
 * mesh: end service period even when peer goes away
 * mesh: correct HT opmode validity checks
 * pass hw pointer from mac80211 to driver in TPT method,
   fixing a bug (in a bit the wrong way, but that's what
   we have right now)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosamples/bpf: add bpf_map_update_elem() tests
Alexei Starovoitov [Fri, 5 Aug 2016 21:01:28 +0000 (14:01 -0700)]
samples/bpf: add bpf_map_update_elem() tests

increase test coverage to check previously missing 'update when full'

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobpf: restore behavior of bpf_map_update_elem
Alexei Starovoitov [Fri, 5 Aug 2016 21:01:27 +0000 (14:01 -0700)]
bpf: restore behavior of bpf_map_update_elem

The introduction of pre-allocated hash elements inadvertently broke
the behavior of bpf hash maps where users expected to call
bpf_map_update_elem() without considering that the map can be full.
Some programs do:
old_value = bpf_map_lookup_elem(map, key);
if (old_value) {
  ... prepare new_value on stack ...
  bpf_map_update_elem(map, key, new_value);
}
Before pre-alloc the update() for existing element would work even
in 'map full' condition. Restore this behavior.

The above program could have updated old_value in place instead of
update() which would be faster and most programs use that approach,
but sometimes the values are large and the programs use update()
helper to do atomic replacement of the element.
Note we cannot simply update element's value in-place like percpu
hash map does and have to allocate extra num_possible_cpu elements
and use this extra reserve when the map is full.

Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: dsa: b53: Add missing ULL suffix for 64-bit constant
Geert Uytterhoeven [Wed, 3 Aug 2016 19:42:18 +0000 (21:42 +0200)]
net: dsa: b53: Add missing ULL suffix for 64-bit constant

On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):

    drivers/net/dsa/b53/b53_common.c: In function ‘b53_arl_read’:
    drivers/net/dsa/b53/b53_common.c:1072: warning: integer constant is too large for ‘long’ type

Fixes: 1da6df85c6fbed8f ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4: panic in leaf_walk_rcu due to stale node pointer
David Forster [Wed, 3 Aug 2016 14:13:01 +0000 (15:13 +0100)]
ipv4: panic in leaf_walk_rcu due to stale node pointer

Panic occurs when issuing "cat /proc/net/route" whilst
populating FIB with > 1M routes.

Use of cached node pointer in fib_route_get_idx is unsafe.

 BUG: unable to handle kernel paging request at ffffc90001630024
 IP: [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0
 PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0
 Oops: 0000 [#1] SMP
 Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac
 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti
 acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd
tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod
 CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000
 RIP: 0010:[<ffffffff814cf6a0>]  [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0
 RSP: 0018:ffff88011a05fda0  EFLAGS: 00010202
 RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20
 RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950
 RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000
 R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950
 R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480
 FS:  00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0
 Stack:
  ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00
  ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000
  ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80
 Call Trace:
  [<ffffffff814d00d3>] ? fib_route_seq_start+0x93/0xc0
  [<ffffffff811dd8b9>] ? seq_read+0x149/0x380
  [<ffffffff811f8714>] ? fsnotify+0x3b4/0x500
  [<ffffffff8138dce0>] ? process_echoes+0x70/0x70
  [<ffffffff8121cfa7>] ? proc_reg_read+0x47/0x70
  [<ffffffff811bb823>] ? __vfs_read+0x23/0xd0
  [<ffffffff811bbd42>] ? rw_verify_area+0x52/0xf0
  [<ffffffff811bbe61>] ? vfs_read+0x81/0x120
  [<ffffffff811bcbc2>] ? SyS_read+0x42/0xa0
  [<ffffffff81549ab2>] ? entry_SYSCALL_64_fastpath+0x16/0x75
 Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89
 RIP  [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0
  RSP <ffff88011a05fda0>
 CR2: ffffc90001630024

Signed-off-by: Dave Forster <dforster@brocade.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorxrpc: Fix races between skb free, ACK generation and replying
David Howells [Wed, 3 Aug 2016 13:11:40 +0000 (14:11 +0100)]
rxrpc: Fix races between skb free, ACK generation and replying

Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.

Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released.  The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb.  ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.

To this end, the following changes are made:

 (1) kernel_rxrpc_data_consumed() is added.  This should be called whenever
     an skb is emptied so as to crank the ACK and call states.  This does
     not release the skb, however.  kernel_rxrpc_free_skb() must now be
     called to achieve that.  These together replace
     rxrpc_kernel_data_delivered().

 (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().

     This makes afs_deliver_to_call() easier to work as the skb can simply
     be discarded unconditionally here without trying to work out what the
     return value of the ->deliver() function means.

     The ->deliver() functions can, via afs_data_complete(),
     afs_transfer_reply() and afs_extract_data() mark that an skb has been
     consumed (thereby cranking the state) without the need to
     conditionally free the skb to make sure the state is correct on an
     incoming call for when the call processor tries to send the reply.

 (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
     has finished with a packet and MSG_PEEK isn't set.

 (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().

     Because of this, we no longer need to clear the destructor and put the
     call before we free the skb in cases where we don't want the ACK/call
     state to be cranked.

 (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
     than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
     the delivery function already), and the caller is now responsible for
     producing an abort if that was the last packet.

 (6) There are many bits of unmarshalling code where:

  ret = afs_extract_data(call, skb, last, ...);
switch (ret) {
case 0: break;
case -EAGAIN: return 0;
default: return ret;
}

     is to be found.  As -EAGAIN can now be passed back to the caller, we
     now just return if ret < 0:

  ret = afs_extract_data(call, skb, last, ...);
if (ret < 0)
return ret;

 (7) Checks for trailing data and empty final data packets has been
     consolidated as afs_data_complete().  So:

if (skb->len > 0)
return -EBADMSG;
if (!last)
return 0;

     becomes:

ret = afs_data_complete(call, skb, last);
if (ret < 0)
return ret;

 (8) afs_transfer_reply() now checks the amount of data it has against the
     amount of data desired and the amount of data in the skb and returns
     an error to induce an abort if we don't get exactly what we want.

Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:

general protection fault: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G            E   4.7.0-fsdevel+ #1303
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Workqueue: kafsd afs_async_workfn [kafs]
task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
RIP: 0010:[<ffffffff8108fd3c>]  [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
RSP: 0018:ffff88040c073bc0  EFLAGS: 00010002
RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
FS:  0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
Stack:
 0000000000000006 000000000be04930 0000000000000000 ffff880400000000
 ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
 ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
Call Trace:
 [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
 [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
 [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
 [<ffffffff810915f4>] lock_acquire+0x122/0x1b6
 [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
 [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
 [<ffffffff814c928f>] skb_dequeue+0x18/0x61
 [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
 [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
 [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
 [<ffffffff81063a3a>] process_one_work+0x29d/0x57c
 [<ffffffff81064ac2>] worker_thread+0x24a/0x385
 [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
 [<ffffffff810696f5>] kthread+0xf3/0xfb
 [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
 [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: arc_emac: add missing of_node_put() in arc_emac_probe()
Wei Yongjun [Wed, 3 Aug 2016 10:58:35 +0000 (10:58 +0000)]
net: arc_emac: add missing of_node_put() in arc_emac_probe()

commit a94efbd7cc45 ("ethernet: arc: emac_main: add missing of_node_put
after calling of_parse_phandle") added missing of_node_put after calling
of_parse_phandle, but missing the devm_ioremap_resource() error handling
case.

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoOVS: Ignore negative headroom value
Ian Wienand [Wed, 3 Aug 2016 05:44:57 +0000 (15:44 +1000)]
OVS: Ignore negative headroom value

net_device->ndo_set_rx_headroom (introduced in
871b642adebe300be2e50aa5f65a418510f636ec) says

  "Setting a negtaive value reset the rx headroom
   to the default value".

It seems that the OVS implementation in
3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets
dev->needed_headroom unconditionally.

This doesn't have an immediate effect, but can mess up later
LL_RESERVED_SPACE calculations, such as done in
net/ipv6/mcast.c:mld_newpack.  For reference, this issue was found
from a skb_panic raised there after the length calculations had given
the wrong result.

Note the other current users of this interface
(drivers/net/tun.c:tun_set_headroom and
drivers/net/veth.c:veth_set_rx_headroom) are both checking this
correctly thus need no modification.

Thanks to Ben for some pointers from the crash dumps!

Cc: Benjamin Poirier <bpoirier@suse.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414
Signed-off-by: Ian Wienand <iwienand@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomac80211: Add ieee80211_hw pointer to get_expected_throughput
Maxim Altshul [Thu, 4 Aug 2016 12:43:04 +0000 (15:43 +0300)]
mac80211: Add ieee80211_hw pointer to get_expected_throughput

The variable is added to allow the driver an easy access to
it's own hw->priv when the op is invoked.

This fixes a crash in wlcore because it was relying on a
station pointer that wasn't initialized yet. It's the wrong
way to fix the crash, but it solves the problem for now and
it does make sense to have the hw pointer here.

Signed-off-by: Maxim Altshul <maxim.altshul@ti.com>
[rewrite commit message, fix indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 years agonl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value
Masashi Honma [Wed, 3 Aug 2016 01:07:44 +0000 (10:07 +0900)]
nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value

Previously, NL80211_MESHCONF_HT_OPMODE validation rejected correct
flag combinations, e.g. IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED |
IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT.

Doing just a range-check allows setting flags that don't exist (0x8)
and invalid flag combinations.

Implements some checks based on IEEE 802.11 2012 8.4.2.59 "HT
Operation element".

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
[reword commit message, simplify a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 years agomac80211: End the MPSP even if EOSP frame was not acked
Masashi Honma [Tue, 2 Aug 2016 08:16:57 +0000 (17:16 +0900)]
mac80211: End the MPSP even if EOSP frame was not acked

If QoS frame with EOSP (end of service period) subfield=1 sent by local
peer was not acked by remote peer, local peer did not end the MPSP. This
prevents local peer from going to DOZE state. And if the remote peer
goes away without closing connection, local peer continues AWAKE state
and wastes battery.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 years agomac80211: fix purging multicast PS buffer queue
Felix Fietkau [Tue, 2 Aug 2016 09:13:41 +0000 (11:13 +0200)]
mac80211: fix purging multicast PS buffer queue

The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.

Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 years agoMerge branch 'qlcnic-fixes'
David S. Miller [Wed, 3 Aug 2016 19:03:35 +0000 (12:03 -0700)]
Merge branch 'qlcnic-fixes'

Manish Chopra says:

====================
qlcnic: bug fixes

This series fixes a data structure corruption bug in
VF's async mailbox commands handling and an issue realted
to napi poll budget in the driver.

Please consider applying this series to "net"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoqlcnic: Update version to 5.3.65
Manish Chopra [Wed, 3 Aug 2016 08:02:04 +0000 (04:02 -0400)]
qlcnic: Update version to 5.3.65

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoqlcnic: fix napi budget alteration
Manish Chopra [Wed, 3 Aug 2016 08:02:03 +0000 (04:02 -0400)]
qlcnic: fix napi budget alteration

Driver modifies the supplied NAPI budget in qlcnic_83xx_msix_tx_poll()
function. Instead, it should use the budget as it is.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoqlcnic: fix data structure corruption in async mbx command handling
Manish Chopra [Wed, 3 Aug 2016 08:02:02 +0000 (04:02 -0400)]
qlcnic: fix data structure corruption in async mbx command handling

This patch fixes a data structure corruption bug in the SRIOV VF mailbox
handler code. While handling mailbox commands from the atomic context,
driver is accessing and updating qlcnic_async_work_list_struct entry fields
in the async work list. These fields could be concurrently accessed by the
work function resulting in data corruption.

This patch restructures async mbx command handling by using a separate
async command list instead of using a list of work_struct structures.
A single work_struct is used to schedule and handle the async commands
with proper locking mechanism.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tg3-fixes'
David S. Miller [Wed, 3 Aug 2016 18:56:19 +0000 (11:56 -0700)]
Merge branch 'tg3-fixes'

Siva Reddy Kallam says:

====================
tg3: Disallow 0 rx coalesce time and correctly report RSS queues in tg3_get_rxnfc

First patch:
        Diasllow rx coalescing time to be 0

Second patch:
        Report the correct number of RSS queues through tg3_get_rxnfc
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotg3: Report the correct number of RSS queues through tg3_get_rxnfc
Siva Reddy Kallam [Wed, 3 Aug 2016 04:14:00 +0000 (09:44 +0530)]
tg3: Report the correct number of RSS queues through tg3_get_rxnfc

This patch remove the wrong substraction from info->data in
tg3_get_rxnfc function. Without this patch, the number of RSS
queues reported is less by one.

Reported-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotg3: Fix for diasllow rx coalescing time to be 0
Satish Baddipadige [Wed, 3 Aug 2016 04:13:59 +0000 (09:43 +0530)]
tg3: Fix for diasllow rx coalescing time to be 0

When the rx coalescing time is 0, interrupts
are not generated from the controller and rx path hangs.
To avoid this rx hang, updating the driver to not allow
rx coalescing time to be 0.

Signed-off-by: Satish Baddipadige <satish.baddipadige@broadcom.com>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobpf: fix method of PTR_TO_PACKET reg id generation
Jakub Kicinski [Tue, 2 Aug 2016 15:12:14 +0000 (16:12 +0100)]
bpf: fix method of PTR_TO_PACKET reg id generation

Using per-register incrementing ID can lead to
find_good_pkt_pointers() confusing registers which
have completely different values.  Consider example:

0: (bf) r6 = r1
1: (61) r8 = *(u32 *)(r6 +76)
2: (61) r0 = *(u32 *)(r6 +80)
3: (bf) r7 = r8
4: (07) r8 += 32
5: (2d) if r8 > r0 goto pc+9
 R0=pkt_end R1=ctx R6=ctx R7=pkt(id=0,off=0,r=32) R8=pkt(id=0,off=32,r=32) R10=fp
6: (bf) r8 = r7
7: (bf) r9 = r7
8: (71) r1 = *(u8 *)(r7 +0)
9: (0f) r8 += r1
10: (71) r1 = *(u8 *)(r7 +1)
11: (0f) r9 += r1
12: (07) r8 += 32
13: (2d) if r8 > r0 goto pc+1
 R0=pkt_end R1=inv56 R6=ctx R7=pkt(id=0,off=0,r=32) R8=pkt(id=1,off=32,r=32) R9=pkt(id=1,off=0,r=32) R10=fp
14: (71) r1 = *(u8 *)(r9 +16)
15: (b7) r7 = 0
16: (bf) r0 = r7
17: (95) exit

We need to get a UNKNOWN_VALUE with imm to force id
generation so lines 0-5 make r7 a valid packet pointer.
We then read two different bytes from the packet and
add them to copies of the constructed packet pointer.
r8 (line 9) and r9 (line 11) will get the same id of 1,
independently.  When either of them is validated (line
13) - find_good_pkt_pointers() will also mark the other
as safe.  This leads to access on line 14 being mistakenly
considered safe.

Fixes: 969bf05eb3ce ("bpf: direct packet access")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: xgene: fix maybe-uninitialized variable
Arnd Bergmann [Tue, 2 Aug 2016 10:23:29 +0000 (12:23 +0200)]
net: xgene: fix maybe-uninitialized variable

Building with -Wmaybe-uninitialized shows a potential use of
an uninitialized variable:

drivers/net/ethernet/apm/xgene/xgene_enet_hw.c: In function 'xgene_enet_phy_connect':
drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:802:23: warning: 'phy_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]

Although the compiler correctly identified this based on the function,
the current code is still safe as long dev->of_node is non-NULL
for the case of CONFIG_ACPI=n, which is currently the case.

The warning is now disabled by default, but still appears when
building with W=1, and other build test tools should be able to
detect it as well. Adding an #else clause here makes the code
more robust and makes it clear to the compiler that this cannot
happen.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 8089a96f601b ("drivers: net: xgene: Add backward compatibility")
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoopenvswitch: Remove incorrect WARN_ONCE().
Jarno Rajahalme [Tue, 2 Aug 2016 02:36:07 +0000 (19:36 -0700)]
openvswitch: Remove incorrect WARN_ONCE().

ovs_ct_find_existing() issues a warning if an existing conntrack entry
classified as IP_CT_NEW is found, with the premise that this should
not happen.  However, a newly confirmed, non-expected conntrack entry
remains IP_CT_NEW as long as no reply direction traffic is seen.  This
has resulted into somewhat confusing kernel log messages.  This patch
removes this check and warning.

Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.")
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 3 Aug 2016 16:50:06 +0000 (12:50 -0400)]
Merge tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "A few updates and fixes:

   - move the suppressing of the __builtin_return_address >0 warning to
     the tracing directory only.

   - metag recordmcount fix for newer glibc's

   - two tracing histogram fixes that were reported by KASAN"

* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix use-after-free in hist_register_trigger()
  tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
  Makefile: Mute warning for __builtin_return_address(>0) for tracing only
  ftrace/recordmcount: Work around for addition of metag magic but not relocations

9 years agofs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2
Geert Uytterhoeven [Wed, 3 Aug 2016 15:33:32 +0000 (17:33 +0200)]
fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2

With gcc < 4.2 (e.g. 4.1.2):

      CC      fs/proc/task_mmu.o
    cc1: error: unrecognized command line option "-Wno-override-init"

To fix this, only enable the compiler option when it is actually
supported by the compiler.

Fixes: ca52953f5f24 ("fs/proc/task_mmu.c: suppress compilation warnings with W=1")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 3 Aug 2016 11:26:11 +0000 (07:26 -0400)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix several cases of missing of_node_put() calls in various
    networking drivers.  From Peter Chen.

 2) Don't try to remove unconfigured VLANs in qed driver, from Yuval
    Mintz.

 3) Unbalanced locking in TIPC error handling, from Wei Yongjun.

 4) Fix lockups in CPDMA driver, from Grygorii Strashko.

 5) More MACSEC refcount et al fixes, from Sabrina Dubroca.

 6) Fix MAC address setting in r8169 during runtime suspend, from
    Chun-Hao Lin.

 7) Various printf format specifier fixes, from Heinrich Schuchardt.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
  qed: Fail driver load in 100g MSI mode.
  ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle
  ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle
  ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle
  ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle
  ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle
  ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle
  ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle
  ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle
  ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle
  ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle
  ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle
  ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle
  ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle
  ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle
  ethernet: altera: add missing of_node_put
  8139too: fix system hang when there is a tx timeout event.
  qed: Fix error return code in qed_resc_alloc()
  net: qlcnic: avoid superfluous assignement
  dsa: b53: remove redundant if
  ...

9 years agomac80211: mesh: flush stations before beacons are stopped
Maital Hahn [Wed, 13 Jul 2016 11:44:41 +0000 (14:44 +0300)]
mac80211: mesh: flush stations before beacons are stopped

Some drivers (e.g. wl18xx) expect that the last stage in the
de-initialization process will be stopping the beacons, similar to AP flow.
Update ieee80211_stop_mesh() flow accordingly.
As peers can be removed dynamically, this would not impact other drivers.

Tested also on Ralink RT3572 chipset.

Signed-off-by: Maital Hahn <maitalm@ti.com>
Signed-off-by: Yaniv Machani <yanivma@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 3 Aug 2016 01:08:07 +0000 (21:08 -0400)]
Merge branch 'akpm' (patches from Andrew)

Merge yet more updates from Andrew Morton:

 - the rest of ocfs2

 - various hotfixes, mainly MM

 - quite a bit of misc stuff - drivers, fork, exec, signals, etc.

 - printk updates

 - firmware

 - checkpatch

 - nilfs2

 - more kexec stuff than usual

 - rapidio updates

 - w1 things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
  ipc: delete "nr_ipc_ns"
  kcov: allow more fine-grained coverage instrumentation
  init/Kconfig: add clarification for out-of-tree modules
  config: add android config fragments
  init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
  relay: add global mode support for buffer-only channels
  init: allow blacklisting of module_init functions
  w1:omap_hdq: fix regression
  w1: add helper macro module_w1_family
  w1: remove need for ida and use PLATFORM_DEVID_AUTO
  rapidio/switches: add driver for IDT gen3 switches
  powerpc/fsl_rio: apply changes for RIO spec rev 3
  rapidio: modify for rev.3 specification changes
  rapidio: change inbound window size type to u64
  rapidio/idt_gen2: fix locking warning
  rapidio: fix error handling in mbox request/release functions
  rapidio/tsi721_dma: advance queue processing from transfer submit call
  rapidio/tsi721: add messaging mbox selector parameter
  rapidio/tsi721: add PCIe MRRS override parameter
  rapidio/tsi721_dma: add channel mask and queue size parameters
  ...

9 years agoMerge tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux
Linus Torvalds [Tue, 2 Aug 2016 23:47:06 +0000 (19:47 -0400)]
Merge tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux

Pull orangefs update from Martin Brandenburg:
 "Kernel side caching and executable bugfix

  This allows OrangeFS to utilize the dcache and adds an in kernel
  attribute cache.  We previously used the user side client for this
  purpose.

  We see a modest performance increase on small file operations.  For
  example, without the cache, compiling coreutils takes about 17
  minutes.  With the patch and a 50 millisecond timeout for
  dcache_timeout_msecs and getattr_timeout_msecs (the default),
  compiling coreutils takes about 6 minutes 20 seconds.  On the same
  hardware, compiling coreutils on an xfs filesystem takes 90 seconds.
  We see similar improvements with mdtest and a test involving writing,
  reading, and deleting a large number of small files.

  Interested parties can review more data at the following URL.

    https://docs.google.com/spreadsheets/d/1v4aUeppKexIbRMz_Yn9k4eaM3uy2KCaPoe_93YKWOtA/pubhtml

  The eventual goal of this is to allow getdents to turn into a
  readdirplus to the OrangeFS server.  The cache will be filled then,
  which should provide a performance benefit to the common case of
  readdir followed by getattr on each entry (i.e.  ls -l).

  This also fixes a bug.  When orangefs_inode_permission was added, it
  did not collect i_size from the OrangeFS server, since this presses an
  unnecessary load on the OrangeFS server.  However, it left a case
  where i_size is never initialized.  Then running an executable could
  fail.

  With this patch, size is always collected to be inserted into the
  cache.  Thus the bug disappears.  If this patch is not accepted during
  this merge window, we will send a one-line band-aid for this bug
  instead"

* tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux:
  Orangefs: update orangefs.txt
  orangefs: Account for jiffies wraparound.
  orangefs: Change default dcache and getattr timeout to 50 msec.
  orangefs: Allow dcache and getattr cache time to be configured.
  orangefs: Cache getattr results.
  orangefs: Use d_time to avoid excessive lookups

9 years agoMerge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client
Linus Torvalds [Tue, 2 Aug 2016 23:39:09 +0000 (19:39 -0400)]
Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client

Pull Ceph updates from Ilya Dryomov:
 "The highlights are:

   - RADOS namespace support in libceph and CephFS (Zheng Yan and
     myself).  The stopgaps added in 4.5 to deny access to inodes in
     namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
     bit is now fully supported

   - A large rework of the MDS cap flushing code (Zheng Yan)

   - Handle some of ->d_revalidate() in RCU mode (Jeff Layton).  We were
     overly pessimistic before, bailing at the first sight of LOOKUP_RCU

  On top of that we've got a few CephFS bug fixes, a couple of cleanups
  and Arnd's workaround for a weird genksyms issue"

* tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
  ceph: fix symbol versioning for ceph_monc_do_statfs
  ceph: Correctly return NXIO errors from ceph_llseek
  ceph: Mark the file cache as unreclaimable
  ceph: optimize cap flush waiting
  ceph: cleanup ceph_flush_snaps()
  ceph: kick cap flushes before sending other cap message
  ceph: introduce an inode flag to indicates if snapflush is needed
  ceph: avoid sending duplicated cap flush message
  ceph: unify cap flush and snapcap flush
  ceph: use list instead of rbtree to track cap flushes
  ceph: update types of some local varibles
  ceph: include 'follows' of pending snapflush in cap reconnect message
  ceph: update cap reconnect message to version 3
  ceph: mount non-default filesystem by name
  libceph: fsmap.user subscription support
  ceph: handle LOOKUP_RCU in ceph_d_revalidate
  ceph: allow dentry_lease_is_valid to work under RCU walk
  ceph: clear d_fsinfo pointer under d_lock
  ceph: remove ceph_mdsc_lease_release
  ceph: don't use ->d_time
  ...

9 years agoipc: delete "nr_ipc_ns"
Alexey Dobriyan [Tue, 2 Aug 2016 21:07:32 +0000 (14:07 -0700)]
ipc: delete "nr_ipc_ns"

Write-only variable.

Link: http://lkml.kernel.org/r/20160708214356.GA6785@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agokcov: allow more fine-grained coverage instrumentation
Vegard Nossum [Tue, 2 Aug 2016 21:07:30 +0000 (14:07 -0700)]
kcov: allow more fine-grained coverage instrumentation

For more targeted fuzzing, it's better to disable kernel-wide
instrumentation and instead enable it on a per-subsystem basis.  This
follows the pattern of UBSAN and allows you to compile in the kcov
driver without instrumenting the whole kernel.

To instrument a part of the kernel, you can use either

    # for a single file in the current directory
    KCOV_INSTRUMENT_filename.o := y

or

    # for all the files in the current directory (excluding subdirectories)
    KCOV_INSTRUMENT := y

or

    # (same as above)
    ccflags-y += $(CFLAGS_KCOV)

or

    # for all the files in the current directory (including subdirectories)
    subdir-ccflags-y += $(CFLAGS_KCOV)

Link: http://lkml.kernel.org/r/1464008380-11405-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoinit/Kconfig: add clarification for out-of-tree modules
Valdis Kletnieks [Tue, 2 Aug 2016 21:07:27 +0000 (14:07 -0700)]
init/Kconfig: add clarification for out-of-tree modules

It doesn't trim just symbols that are totally unused in-tree - it trims
the symbols unused by any in-tree modules actually built.  If you've
done a 'make localmodconfig' and only build a hundred or so modules,
it's pretty likely that your out-of-tree module will come up lacking
something...

Hopefully this will save the next guy from a Homer Simpson "D'oh!"
moment.

Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.edu
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoconfig: add android config fragments
Rob Herring [Tue, 2 Aug 2016 21:07:24 +0000 (14:07 -0700)]
config: add android config fragments

Copy the config fragments from the AOSP common kernel android-4.4
branch.  It is becoming possible to run mainline kernels with Android,
but the kernel defconfigs don't work as-is and debugging missing config
options is a pain.  Adding the config fragments into the kernel tree,
makes configuring a mainline kernel as simple as:

  make ARCH=arm multi_v7_defconfig android-base.config android-recommended.config

The following non-upstream config options were removed:

  CONFIG_NETFILTER_XT_MATCH_QTAGUID
  CONFIG_NETFILTER_XT_MATCH_QUOTA2
  CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
  CONFIG_PPPOLAC
  CONFIG_PPPOPNS
  CONFIG_SECURITY_PERF_EVENTS_RESTRICT
  CONFIG_USB_CONFIGFS_F_MTP
  CONFIG_USB_CONFIGFS_F_PTP
  CONFIG_USB_CONFIGFS_F_ACC
  CONFIG_USB_CONFIGFS_F_AUDIO_SRC
  CONFIG_USB_CONFIGFS_UEVENT
  CONFIG_INPUT_KEYCHORD
  CONFIG_INPUT_KEYRESET

Link: http://lkml.kernel.org/r/1466708235-28593-1-git-send-email-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoinit/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
Alexey Dobriyan [Tue, 2 Aug 2016 21:07:21 +0000 (14:07 -0700)]
init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig

Doing patches with allmodconfig kernel compiled and committing stuff
into local tree have unfortunate consequence: kernel version changes (as
it should) leading to recompiling and relinking of several files even if
they weren't touched (or interesting at all).  This and "git-whatever"
figuring out current version slow down compilation for no good reason.

But lets face it, "allmodconfig" kernels don't care about kernel
version, they are simply compile check guinea pigs.

Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into
allmodconfig .config.

Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorelay: add global mode support for buffer-only channels
Akash Goel [Tue, 2 Aug 2016 21:07:18 +0000 (14:07 -0700)]
relay: add global mode support for buffer-only channels

Commit 20d8b67c06fa ("relay: add buffer-only channels; useful for early
logging") added support to use channels with no associated files.

This is useful when the exact location of relay file is not known or the
the parent directory of relay file is not available, while creating the
channel and the logging has to start right from the boot.

But there was no provision to use global mode with buffer-only channels,
which is added by this patch, without modifying the interface where
initially there will be a dummy invocation of create_buf_file callback
through which kernel client can convey the need of a global buffer.

For the use case where drivers/kernel clients want a simple interface
for the userspace, which enables them to capture data/logs from relay
file inorder & without any post processing, support of Global buffer
mode is warranted.

Modules, like i915, using relay_open() in early init would have to later
register their buffer-only relays, once debugfs is available, by calling
relay_late_setup_files().  Hence relay_late_setup_files() symbol also
needs to be exported.

Link: http://lkml.kernel.org/r/1468404563-11653-1-git-send-email-akash.goel@intel.com
Signed-off-by: Akash Goel <akash.goel@intel.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoinit: allow blacklisting of module_init functions
Prarit Bhargava [Tue, 2 Aug 2016 21:07:15 +0000 (14:07 -0700)]
init: allow blacklisting of module_init functions

sprint_symbol_no_offset() returns the string "function_name
[module_name]" where [module_name] is not printed for built in kernel
functions.  This means that the blacklisting code will fail when
comparing module function names with the extended string.

This patch adds the functionality to block a module's module_init()
function by finding the space in the string and truncating the
comparison to that length.

Link: http://lkml.kernel.org/r/1466124387-20446-1-git-send-email-prarit@redhat.com
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agow1:omap_hdq: fix regression
H. Nikolaus Schaller [Tue, 2 Aug 2016 21:07:12 +0000 (14:07 -0700)]
w1:omap_hdq: fix regression

Commit e93762bbf681 ("w1: masters: omap_hdq: add support for 1-wire
mode") added a statement to clear the hdq_irqstatus flags in
hdq_read_byte().

If the hdq reading process is scheduled slowly or interrupts are
disabled for a while the hardware read activity might already be
finished on entry of hdq_read_byte().  And hdq_isr() already has set the
hdq_irqstatus to 0x6 (can be seen in debug mode) denoting that both, the
TXCOMPLETE and RXCOMPLETE interrupts occurred in parallel.

This means there is no need to wait and the hdq_read_byte() can just
read the byte from the hdq controller.

By resetting hdq_irqstatus to 0 the read process is forced to be always
waiting again (because the if statement always succeeds) but the
hardware will not issue another RXCOMPLETE interrupt.  This results in a
false timeout.

After such a situation the hdq bus hangs.

Link: http://lkml.kernel.org/r/b724765f87ad276a69625bc19806c8c8844c4590.1469513669.git.hns@goldelico.com
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agow1: add helper macro module_w1_family
Andrew F. Davis [Tue, 2 Aug 2016 21:07:09 +0000 (14:07 -0700)]
w1: add helper macro module_w1_family

The helper macro module_w1_family can be used in module drivers that
only register a w1 driver in their module init functions.  Add this
macro and use it in all applicable drivers.

Link: http://lkml.kernel.org/r/20160531204313.20979-2-afd@ti.com
Signed-off-by: Andrew F. Davis <afd@ti.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agow1: remove need for ida and use PLATFORM_DEVID_AUTO
Andrew F. Davis [Tue, 2 Aug 2016 21:07:06 +0000 (14:07 -0700)]
w1: remove need for ida and use PLATFORM_DEVID_AUTO

PLATFORM_DEVID_AUTO can be used to have the platform core assign a
unique ID instead of manually creating one with IDA.  Do this in all
applicable drivers.

Link: http://lkml.kernel.org/r/20160531204313.20979-1-afd@ti.com
Signed-off-by: Andrew F. Davis <afd@ti.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio/switches: add driver for IDT gen3 switches
Alexandre Bounine [Tue, 2 Aug 2016 21:07:03 +0000 (14:07 -0700)]
rapidio/switches: add driver for IDT gen3 switches

Add RapidIO switch driver for IDT Gen3 switch devices: RXS1632 and
RXS2448.

[alexandre.bounine@idt.com: fixup for original driver patch]
Link: http://lkml.kernel.org/r/1469137596-18241-1-git-send-email-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-14-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agopowerpc/fsl_rio: apply changes for RIO spec rev 3
Alexandre Bounine [Tue, 2 Aug 2016 21:07:00 +0000 (14:07 -0700)]
powerpc/fsl_rio: apply changes for RIO spec rev 3

 - Remove check for parallel PHY

 - Set LP-Serial Register Map type

[akpm@linux-foundation.org: fix build]
[alexandre.bounine@idt.com: fix build fix]
Link: http://lkml.kernel.org/r/20160802184932.2755-1-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-13-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio: modify for rev.3 specification changes
Alexandre Bounine [Tue, 2 Aug 2016 21:06:57 +0000 (14:06 -0700)]
rapidio: modify for rev.3 specification changes

Implement changes made in RapidIO specification rev.3 to LP-Serial Physical
Layer register definitions:

 - use per-port register offset calculations based on LP-Serial Extended
   Features Block (EFB) Register Map type (I or II) with different
   per-port offset step (0x20 vs 0x40 respectfully).

 - remove deprecated Parallel Physical layer definitions and related
   code.

[alexandre.bounine@idt.com: fix DocBook warning for gen3 update]
Link: http://lkml.kernel.org/r/1469191173-19338-1-git-send-email-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-12-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio: change inbound window size type to u64
Alexandre Bounine [Tue, 2 Aug 2016 21:06:54 +0000 (14:06 -0700)]
rapidio: change inbound window size type to u64

Current definition of map_inb() mport operations callback uses u32 type
to specify required inbound window (IBW) size.  This is limiting factor
because existing hardware - tsi721 and fsl_rio, both support IBW size up
to 16GB.

Changing type of size parameter to u64 to allow IBW size configurations
larger than 4GB.

[alexandre.bounine@idt.com: remove compiler warning about size of constant]
Link: http://lkml.kernel.org/r/20160802184856.2566-1-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-11-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio/idt_gen2: fix locking warning
Alexandre Bounine [Tue, 2 Aug 2016 21:06:52 +0000 (14:06 -0700)]
rapidio/idt_gen2: fix locking warning

Fix lockdep warning during device probing: move sysfs initialization out
of code protected by a spin lock.

Link: http://lkml.kernel.org/r/1469125134-16523-10-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio: fix error handling in mbox request/release functions
Alexandre Bounine [Tue, 2 Aug 2016 21:06:49 +0000 (14:06 -0700)]
rapidio: fix error handling in mbox request/release functions

Add checking for error code returned by HW-specific mbox open routines.
Ensure that resources are properly release if failed.

This patch is applicable to kernel versions starting from v2.6.15.

Link: http://lkml.kernel.org/r/1469125134-16523-9-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio/tsi721_dma: advance queue processing from transfer submit call
Alexandre Bounine [Tue, 2 Aug 2016 21:06:46 +0000 (14:06 -0700)]
rapidio/tsi721_dma: advance queue processing from transfer submit call

Add advancing transfer queue immediately from transfer submit call.  DMA
performance improvement: This will start transfer without waiting for
'issue_pending' command if there is no DMA transfer in progress.

Link: http://lkml.kernel.org/r/1469125134-16523-8-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio/tsi721: add messaging mbox selector parameter
Alexandre Bounine [Tue, 2 Aug 2016 21:06:43 +0000 (14:06 -0700)]
rapidio/tsi721: add messaging mbox selector parameter

Add module parameter to allow load time configuration of available
RapidIO messaging mailboxes (MBOX1 - MBOX4).

Having a messaging MBOX selector mask allows to define which MBOXes are
controlled by the mport device driver and reserve some of them for
direct use by other drivers.

Link: http://lkml.kernel.org/r/1469125134-16523-7-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio/tsi721: add PCIe MRRS override parameter
Alexandre Bounine [Tue, 2 Aug 2016 21:06:40 +0000 (14:06 -0700)]
rapidio/tsi721: add PCIe MRRS override parameter

Add PCIe Maximum Read Request Size (MRRS) adjustment parameter to allow
users to override configuration register value set during PCIe bus
initialization.

Performance of Tsi721 device as PCIe bus master can be improved if MRRS
is set to its maximum value (4096 bytes).  Some platforms have
limitations for supported MRRS and therefore the default value should be
preserved, unless it is known that given platform supports full set of
MRRS values defined by PCI Express specification.

Link: http://lkml.kernel.org/r/1469125134-16523-6-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio/tsi721_dma: add channel mask and queue size parameters
Alexandre Bounine [Tue, 2 Aug 2016 21:06:37 +0000 (14:06 -0700)]
rapidio/tsi721_dma: add channel mask and queue size parameters

Add module parameters to allow load time configuration of DMA channels.

Depending on application, performance of DMA data transfers can benefit
from adjusted sizes of buffer descriptor ring and/or transaction
requests queue.

Having HW DMA channel selector mask allows to define which channels
(from seven available) are controlled by the mport device driver and
reserve some of them for direct use by other drivers.

Link: http://lkml.kernel.org/r/1469125134-16523-5-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio: fix return value description for dma_prep functions
Alexandre Bounine [Tue, 2 Aug 2016 21:06:34 +0000 (14:06 -0700)]
rapidio: fix return value description for dma_prep functions

Update return value description for rio_dma_prep_...  functions to
include error-valued pointer that can be returned by HW mport device
drivers.  Return values from these functions must be checked using
IS_ERR_OR_NULL macro.

This patch is applicable to kernel versions starting from v4.6-rc1.

Link: http://lkml.kernel.org/r/1469125134-16523-4-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio/documentation: fix mangled paragraph in mport_cdev
Alexandre Bounine [Tue, 2 Aug 2016 21:06:31 +0000 (14:06 -0700)]
rapidio/documentation: fix mangled paragraph in mport_cdev

Minor edits to correct parameter description.

This patch is applicable to kernel versions starting from v4.6.

Link: http://lkml.kernel.org/r/1469125134-16523-3-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Reported-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio: remove unnecessary 0x prefixes before %pa extension uses
Joe Perches [Tue, 2 Aug 2016 21:06:28 +0000 (14:06 -0700)]
rapidio: remove unnecessary 0x prefixes before %pa extension uses

Patch series "RapidIO subsystem updates".

This set of patches contains RapidIO subsystem fixes and updates that
have been made since kernel v4.6.  The most significant update brings
changes related to the latest revision of RapidIO specification
(rev.3.x) and introduction of next generation of RapidIO switches by IDT
(RXS1632 and RXS2448).

This patch (of 13):

This is RapidIO part of the original patch submitted by Joe Perches.
(see: https://lkml.org/lkml/2016/3/5/19)

Since commit 3cab1e711297 ("lib/vsprintf: refactor duplicate code
to special_hex_number()") %pa uses have been output with a 0x prefix.

These 0x prefixes in the formats are unnecessary.

Link: http://lkml.kernel.org/r/1469125134-16523-2-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agorapidio: add RapidIO channelized messaging driver
Alexandre Bounine [Tue, 2 Aug 2016 21:06:25 +0000 (14:06 -0700)]
rapidio: add RapidIO channelized messaging driver

Add channelized messaging driver to support native RapidIO messaging
exchange between multiple senders/recipients on devices that use kernel
RapidIO subsystem services.

This device driver is the result of collaboration within the RapidIO.org
Software Task Group (STG) between Texas Instruments, Prodrive
Technologies, Nokia Networks, BAE and IDT.  Additional input was
received from other members of RapidIO.org.

The objective was to create a character mode driver interface which
exposes messaging capabilities of RapidIO endpoint devices (mports)
directly to applications, in a manner that allows the numerous and
varied RapidIO implementations to interoperate.

This char mode device driver allows user-space applications to setup
messaging communication channels using single shared RapidIO messaging
mailbox.

By default this driver uses RapidIO MBOX_1 (MBOX_0 is reserved for use by
RIONET Ethernet emulation driver).

[weiyj.lk@gmail.com: rapidio/rio_cm: fix return value check in riocm_init()]
Link: http://lkml.kernel.org/r/1469198221-21970-1-git-send-email-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1468952862-18056-1-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agokexec: add restriction on kexec_load() segment sizes
zhong jiang [Tue, 2 Aug 2016 21:06:22 +0000 (14:06 -0700)]
kexec: add restriction on kexec_load() segment sizes

I hit the following issue when run trinity in my system.  The kernel is
3.4 version, but mainline has the same issue.

The root cause is that the segment size is too large so the kerenl
spends too long trying to allocate a page.  Other cases will block until
the test case quits.  Also, OOM conditions will occur.

Call Trace:
  __alloc_pages_nodemask+0x14c/0x8f0
  alloc_pages_current+0xaf/0x120
  kimage_alloc_pages+0x10/0x60
  kimage_alloc_control_pages+0x5d/0x270
  machine_kexec_prepare+0xe5/0x6c0
  ? kimage_free_page_list+0x52/0x70
  sys_kexec_load+0x141/0x600
  ? vfs_write+0x100/0x180
  system_call_fastpath+0x16/0x1b

The patch changes sanity_check_segment_list() to verify that the usage by
all segments does not exceed half of memory.

[akpm@linux-foundation.org: fix for kexec-return-error-number-directly.patch, update comment]
Link: http://lkml.kernel.org/r/1469625474-53904-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agokexec: allow kdump with crash_kexec_post_notifiers
Petr Tesarik [Tue, 2 Aug 2016 21:06:19 +0000 (14:06 -0700)]
kexec: allow kdump with crash_kexec_post_notifiers

If a crash kernel is loaded, do not crash the running domain.  This is
needed if the kernel is loaded with crash_kexec_post_notifiers, because
panic notifiers are run before __crash_kexec() in that case, and this
Xen hook prevents its being called later.

[akpm@linux-foundation.org: build fix: unconditionally include kexec.h]
Link: http://lkml.kernel.org/r/20160713122000.14969.99963.stgit@hananiah.suse.cz
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agokexec: add a kexec_crash_loaded() function
Petr Tesarik [Tue, 2 Aug 2016 21:06:16 +0000 (14:06 -0700)]
kexec: add a kexec_crash_loaded() function

Provide a wrapper function to be used by kernel code to check whether a
crash kernel is loaded.  It returns the same value that can be seen in
/sys/kernel/kexec_crash_loaded by userspace programs.

I'm exporting the function, because it will be used by Xen, and it is
possible to compile Xen modules separately to enable the use of PV
drivers with unmodified bare-metal kernels.

Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agokexec: use core_param for crash_kexec_post_notifiers boot option
Hidehiro Kawai [Tue, 2 Aug 2016 21:06:13 +0000 (14:06 -0700)]
kexec: use core_param for crash_kexec_post_notifiers boot option

crash_kexec_post_notifiers ia a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time.  So, use core_param instead of early_param.

Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoARM: kexec: fix kexec for Keystone 2
Russell King [Tue, 2 Aug 2016 21:06:10 +0000 (14:06 -0700)]
ARM: kexec: fix kexec for Keystone 2

Provide kexec with the boot view of memory by overriding the normal
kexec translation functions added in a previous patch.  We also need to
fix a call to memblock in machine_kexec_prepare() so that we provide it
with a running-view physical address rather than a boot- view physical
address.

Link: http://lkml.kernel.org/r/E1b8koa-0004Hl-Ey@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>