www.infradead.org Git - users/jedix/linux-maple.git/log

ixgbe: Fix FCRTH value in VM-to-VM loopback mode

Orabug: 21918732

The 82599 and X540 datasheets require that FCRTH be "set" for Tx
switching (VM-to-VM loopback), but they did not previously specify
what value it should be set to. It has now been determined that
the correct value is RXPBSIZE - (24*1024).

This setting is also required for later devices.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit bc1fc64fd2d9093496e5b04c6d94d26bfa629c9c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
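
As a rough illustration of the FCRTH formula in the commit above, the value can be derived from the Rx packet buffer size as sketched here. The function name and the underflow guard are assumptions made for this example; they are not taken from the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the driver's code): compute the FCRTH value
 * for Tx switching from the Rx packet buffer size, both in bytes,
 * using RXPBSIZE - (24*1024) from the commit message.
 */
uint32_t fcrth_for_tx_switching(uint32_t rxpbsize_bytes)
{
    const uint32_t headroom = 24u * 1024u;

    /* Assumption: clamp to 0 rather than wrap if the buffer is tiny. */
    if (rxpbsize_bytes < headroom)
        return 0;

    return rxpbsize_bytes - headroom;
}
```

For a 512 KB packet buffer this yields 488 KB; how the result is encoded into the register is left out of the sketch.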

ixgbe: Only clear adapter_stopped if ixgbe_setup_fc succeeded

Orabug: 21918732

A logic error here results in the adapter_stopped flag only being
cleared when ixgbe_setup_fc returns an error. Correct the logic.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3507a9b8c9d1684b5095c97f587ee46184e590da)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
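
The corrected condition in the commit above might look roughly like the following. The struct and function names are hypothetical stand-ins; only the adapter_stopped flag and the "clear on success" rule come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hardware state structure. */
struct hw_sketch {
    bool adapter_stopped;
};

/*
 * setup_fc_ret is the return code of the flow control setup step,
 * where 0 means success. The flag is cleared only on success; the
 * logic error described above cleared it only on failure.
 */
int finish_start_hw(struct hw_sketch *hw, int setup_fc_ret)
{
    if (setup_fc_ret == 0)
        hw->adapter_stopped = false;

    return setup_fc_ret;
}
```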

ixgbe: Correct several flaws with DCA setup

Orabug: 21918732

This change does two things. First, it makes it so that we always
set the relaxed ordering bits related to the DCA registers even if
DCA is not enabled. Second, it moves the configuration out of the
ixgbe_down function and into the ixgbe_configure function before
enabling the Rx and Tx rings. This ensures that DCA is configured
correctly before starting to process packets.

Thanks to Alex Duyck for this fix.

CC: Alex Duyck <aduyck@mirantis.com>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 9de7605ea2389d5ab86d6fbb3f1a11b87665a35c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add new X550EM SFP+ device ID

Orabug: 21918732

Add new device ID for X550EM device with SFPs.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 018d7146eee1942f27675bdabf9b43586bfaef72)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Update ixgbe_disable_pcie_master flow for X550*

Orabug: 21918732

This patch skips the PCI transactions pending check in
ixgbe_disable_pcie_master. This is done to address a known HW
issue where the PCI transactions pending bit sticks high when there
are pending transactions. HW engineering instructed us to work around
this issue by waiting and then continuing with our reset flow.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7fc151035487916b266257c2e7b8b6cb2a5cd04f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add small packet padding support for X550

Orabug: 21918732

This patch sets RDRXCTL.PSP when the driver is in SRIOV mode, which
enables padding of small packets.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f961ddae164a5288a62146aae191da7bc1ecedb4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Correct setting of RDRXCTL register for X550* devices

Orabug: 21918732

Setting the X550* RDRXCTL register should fall through into X540
and 82599, not 82598.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 052a1a724338bbf4721f8b4d7de8486701fc37cb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Correct error path in semaphore handling

Orabug: 21918732

The timeout path is supposed to release the semaphore, so do that.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5967fe225686bcae17352de172573964a15b17d5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add I2C bus mux support

Orabug: 21918732

Take control of an I2C mux that selects which SFP is attached to
the I2C bus. The control of the mux is captured in the taking and
releasing of the related semaphore. Because only port 1 can control
the mux, port 1 always leaves the mux set to select port 0.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 449e21a92411ba35bfa68b4464aa7dbd1f705d28)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Limit SFP polling rate

Orabug: 21918732

Reduce the frequency of polling for SFP modules. Because the
service task sometimes runs at high rates, we can poll for
SFPs too often. When an SFP is not present, the I2C timeouts
that result are very costly. So, prevent SFP polling from
being done more than once every two seconds. To reduce latency,
the poll time is cleared in a couple of cases to permit the
next service task execution to poll the SFP module.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 58e7cd24d474c87763387f606e403012f562760b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
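
The rate limiting described in the commit above can be sketched as a timestamp check. All names here are hypothetical, and millisecond timestamps are used for simplicity, whereas a kernel driver would more likely compare jiffies.

```c
#include <stdbool.h>
#include <stdint.h>

#define SFP_POLL_INTERVAL_MS 2000u /* at most one poll every two seconds */

/* Hypothetical per-adapter polling state. */
struct sfp_poll_state {
    uint64_t next_poll_ms; /* 0 means "poll on the next service run" */
};

/* Returns true if the service task may poll for an SFP module now. */
bool sfp_poll_due(struct sfp_poll_state *st, uint64_t now_ms)
{
    if (st->next_poll_ms != 0 && now_ms < st->next_poll_ms)
        return false;

    st->next_poll_ms = now_ms + SFP_POLL_INTERVAL_MS;
    return true;
}

/* Clears the poll time so the next service run polls immediately. */
void sfp_poll_clear(struct sfp_poll_state *st)
{
    st->next_poll_ms = 0;
}
```

Clearing the timestamp corresponds to the "couple of cases" where latency matters; which cases those are is not spelled out in the commit message.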

ixgbe: Allow SFP+ on more than 82598 and 82599

Orabug: 21918732

Since SFP+ can be used with some X550 devices, permit them to be
detected.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 69eec0c2fa8781a6abae96af1f11069e1965cbfe)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add logic to reset CS4227 when needed

Orabug: 21918732

On some hardware platforms, the CS4227 does not initialize properly.
Detect those cases and reset it appropriately.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 542b6eecf4c3640f15a84ff89525131d421e7c8c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Fix 1G and 10G link stability for X550EM_x SFP+

Orabug: 21918732

Configures the CS4227 correctly for both 1G and 10G operation,
by moving the code to ixgbe_setup_mac_link_sfp_x550em(). It
needs to be in this function because we need both the module
type and the speed, and this is the only function in the init
flow that knows the speed. In contrast,
ixgbe_setup_sfp_modules_X550em() does not know the speed, so we
can't do anything useful there. This is a fundamental difference
from the previous flow, and is due to the way the CS4227 is
implemented.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit e23f33367882450c66f7de8805b98ce7665a7ba9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add X550EM_x dual-speed SFP+ support

Orabug: 21918732

This patch adds X550EM_x SFP+ dual-speed support. 82599 fiber link
code was moved from ixgbe_82599.c to ixgbe_common.c for use by
X550EM. SFP MAC link code is added to X550EM.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6d373a1bbb99bdfb9ce820aec9ae5f2e02c8891f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Allow reduced delays during SFP detection

Orabug: 21918732

Reduce the number of retries during PHY detection. This reduces
pauses when no SFP is present. Once an SFP is detected, the normal
retry count will be used.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 56f6ed1ce13b0cb85ae9537f839df7c4ba1f5369)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Clear I2C destination location

Orabug: 21918732

Clear the destination location for I2C data initially so that
the received data will not be affected by previous attempts.
This could have returned wrong data in certain retry sequences.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6ee8c9a70d65ee37251465348501a067138050d7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Enable bit-banging mode on X550

Orabug: 21918732

Set the bit banging mode in the hardware when performing bit banging
I2C operations on X550. Also control the output enable on both the
clock and data lines.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 25b1029789f98f945a03a2d04662a94b357aacb9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Set lan_id before first I2C eeprom access

Orabug: 21918732

The lan_id is being set after a previous I2C eeprom access which
makes no sense because it needs to be set before any access. Move
the setting to before the access.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit da4ea4baf77c9e45c53671e465043ffaf26fd45d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Provide unlocked I2C methods

Orabug: 21918732

Most I2C accesses take and release semaphores for each access. Now
there is a reason to perform multiple I2C operations under the same
holding of the semaphore, so provide unlocked I2C methods for that
purpose.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit bb5ce9a5cb6e915a2b284a8785686716823679d1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
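
The locked/unlocked split described in the commit above follows a common pattern, sketched here with a fake bus. Every name is hypothetical, and the "semaphore" is just a counter so the number of acquisitions can be observed.

```c
/* Hypothetical fake I2C bus with an observable lock. */
struct bus_sketch {
    int held;                /* 1 while the semaphore is held */
    int acquisitions;        /* how many times it was taken */
    unsigned char regs[256]; /* fake device contents */
};

void bus_lock(struct bus_sketch *b)
{
    b->held = 1;
    b->acquisitions++;
}

void bus_unlock(struct bus_sketch *b)
{
    b->held = 0;
}

/* Unlocked variant: the caller must already hold the semaphore. */
unsigned char read_byte_unlocked(struct bus_sketch *b, unsigned char off)
{
    return b->regs[off];
}

/* Locked variant: takes and releases the semaphore per access. */
unsigned char read_byte(struct bus_sketch *b, unsigned char off)
{
    unsigned char val;

    bus_lock(b);
    val = read_byte_unlocked(b, off);
    bus_unlock(b);
    return val;
}

/* Two operations under a single holding of the semaphore. */
unsigned short read_word(struct bus_sketch *b, unsigned char off)
{
    unsigned short hi, lo;

    bus_lock(b);
    hi = read_byte_unlocked(b, off);
    lo = read_byte_unlocked(b, (unsigned char)(off + 1));
    bus_unlock(b);
    return (unsigned short)((hi << 8) | lo);
}
```

The point of the unlocked variant is visible in read_word(): the semaphore is taken once for both byte reads instead of once per read.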

ixgbe: Provide I2C combined on X550EM

Orabug: 21918732

Provide I2C combined operations on X550EM, not X550 devices.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 4f9e3a3de0e2fbc49c036322cb2ee656ea8b93fc)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add X550EM support for SFP insertion interrupt

Orabug: 21918732

Add support for the SFP insertion interrupt on X550EM devices with
SFPs.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit cbd45ec7aae9a20835d1a64c7a1910eb5dcec57b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Accept SFP not present errors on all devices

Orabug: 21918732

When an SFP not present error is returned by the reset_hw method,
accept it and go on, since an SFP can still be inserted. Previously
it was only accepted for 82598 devices.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 29a8dca1997f880563e53e9ba0fcb50b03bd23af)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbevf: Enables TSO for stacked VLAN

Orabug: 21918732

Setting ndo_features_check to passthru_features_check allows the driver
to skip the check for multiple tagged TSO packets and enables stacked
VLAN TSO.
Tested with 82599ES.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0f90300f4fd30968a4d40fe47a9043be9912cb31)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add fdir support for SCTP on X550

Orabug: 21918732

X550 has HW support for the SCTP mask in flow director filters. This
patch adds it, as we already do for UDP and TCP.

Signed-off-by: Donald C Skidmore <donald.c.skidmore@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5532408b48834bd762ed53c22aabed5dae0748d6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add SFP+ detection for X550 hardware

Orabug: 21918732

This patch is part of the future enablement of X550 SFP+ support. This
HW uses different SDP so the interrupts need to be set up accordingly.

Signed-off-by: Donald C Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a023bbd0b1a3716397d8d54ba5b95e09b8e27699)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Limit lowest interrupt rate for adaptive interrupt moderation to 12K

Orabug: 21918732

This patch updates the lowest limit for adaptive interrupt
moderation to roughly 12K interrupts per second.

The way I came about reaching 12K as the desired interrupt rate is by
testing with UDP flows.  Specifically I had a simple test that ran a
netperf UDP_STREAM test at varying sizes.  What I found was as the packet
sizes increased the performance fell steadily behind until we were only
able to receive at ~4Gb/s with a message size of 65507.  A bit of digging
found that we were dropping packets for the socket in the network stack,
and looking at things further what I found was I could solve it by increasing
the interrupt rate, or increasing the rmem_default/rmem_max.  What I found was
that when the interrupt coalescing resulted in more data being processed
per interrupt than could be stored in the socket buffer we started losing
packets and the performance dropped.  So I reached 12K based on the
following math.

rmem_default = 212992
skb->truesize = 2994
212992 / 2994 = 71.14 packets to fill the buffer

packet rate at 1514 packet size is 812744pps
71.14 / 812744 = 87.5us to fill socket buffer

From there it was just a matter of choosing the interrupt rate and
providing a bit of wiggle room which is why I decided to go with 12K
interrupts per second as that uses a value of 84us.

The data below is based on VM to VM over a direct assigned ixgbe interface.
The test run was:
netperf -H <ip> -t UDP_STREAM

Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
Before:
212992   65507   60.00     1100662      0     9613.4     10.89    0.557
212992           60.00      473474            4135.4     11.27    0.576

After:
212992   65507   60.00     1100413      0     9611.2     10.73    0.549
212992           60.00      974132            8508.3     11.69    0.598

Using bare metal the data is similar but not as dramatic as the throughput
increases from about 8.5Gb/s to 9.5Gb/s.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8ac34f10a5ea4c7b6f57dfd52b0693a2b67d9ac4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
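
The arithmetic in the commit above can be reproduced with a few helpers. This is only a sketch: the function names are made up, and the 24 bytes of per-frame wire overhead (4-byte FCS, 8-byte preamble, 12-byte inter-frame gap) is an assumption used to arrive at the quoted packet rate for a 10 Gb/s link.

```c
/* Packets of a given truesize that fit in the socket buffer. */
double packets_to_fill(double rmem_bytes, double truesize_bytes)
{
    return rmem_bytes / truesize_bytes;
}

/*
 * Line-rate packets per second for a given frame size.
 * Assumption: 24 bytes of wire overhead per frame
 * (4 FCS + 8 preamble + 12 inter-frame gap).
 */
double line_rate_pps(double link_bps, double frame_bytes)
{
    return link_bps / ((frame_bytes + 24.0) * 8.0);
}

/* Time, in microseconds, for that many packets to arrive. */
double fill_time_us(double packets, double pps)
{
    return packets / pps * 1e6;
}

/* Interval between interrupts, in microseconds, at a fixed rate. */
double interrupt_interval_us(double interrupts_per_sec)
{
    return 1e6 / interrupts_per_sec;
}
```

With rmem_default = 212992, truesize = 2994 and 1514-byte frames at 10 Gb/s, the fill time comes out a little under 88 us, and the interval at 12K interrupts per second is shorter than that, which is the wiggle room the message refers to.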

ixgbe: Teardown SR-IOV before unregister_netdev()

Orabug: 21918732

When the .remove() callback for a PF is called, SR-IOV support for the
device is disabled, which requires unbinding and removing the VFs.
The VFs may be in-use either by the host kernel or userspace, such as
assigned to a VM through vfio-pci.  In this latter case, the VFs may
be removed either by shutting down the VM or hot-unplugging the
devices from the VM.  Unfortunately in the case of a Windows 2012 R2
guest, hot-unplug is broken due to the ordering of the PF driver
teardown.  Disabling SR-IOV prior to unregister_netdev() avoids this
issue.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6b010e9b1f0a406d1d35202a694fa724a559bf77)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: fix issue with SFP events with new X550 devices

Orabug: 21918732

Add checks for systems that don't have SFPs to avoid incorrectly
acting on interrupts that are falsely interpreted as SFP events.
This also includes a modified check generating the EICR mask to be
more forward-looking.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 4ccc650cc845476885f73660b2e6335852f0f75c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Resolve "initialized field overwritten" warnings

Orabug: 21918732

Resolve warnings resulting from redundant initialization of the
get_bus_info field in the mac_ops_X550* structures.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 990a2d6ed543bd18b864b8a11f7be3368c67ccea)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Remove bimodal SR-IOV disabling

Orabug: 21918732

When unbinding an SR-IOV device with VFs configured from ixgbe, the
driver behaves in one of two ways.  If max_vfs was specified, the
SR-IOV state is disabled, removing the VFs.  This occurs regardless of
whether the VF count was later modified through sysfs.  If however
max_vfs is zero, such as by not specifying the module parameter, the
VFs persist after the PF is unbound from ixgbe.  If the PF is then
bound to vfio-pci to be assigned to a VM, the PF is non-functional.

From the comment, commit da36b64736cf ("ixgbe: Implement PCI SR-IOV
sysfs callback operation") clearly intended this alternate behavior,
but probably didn't realize the PF doesn't work in this mode.

This bimodal behavior is confusing to users and results in a state
where the PF is broken for other uses unless the user sets
sriov_numvfs to zero prior to unbinding the device.  Remove this
behavior so that VFs are removed and the PF is functional for other
uses after unbind, regardless of the way VFs are enabled.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7837e2867f56ec4435e75af54236732885303694)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add support for reporting 2.5G link speed

Orabug: 21918732

Now that we can do 2.5G link speed, we need to be able to report it.
Also replace the nested ternary expression that created the log message
with a simpler switch statement that sets a string pointer.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 454adb008d78e4ecdfec3f2e5e9eb08ee5a60f1a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
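
The switch-based approach mentioned in the commit above could look something like this. The enum values and strings are illustrative assumptions, not the driver's actual constants or log text.

```c
/* Hypothetical link speed values for this sketch only. */
enum link_speed_sketch {
    SPEED_SKETCH_UNKNOWN = 0,
    SPEED_SKETCH_100MB,
    SPEED_SKETCH_1GB,
    SPEED_SKETCH_2_5GB,
    SPEED_SKETCH_10GB
};

/* Select the log string with a switch instead of nested ternaries. */
const char *link_speed_str(enum link_speed_sketch speed)
{
    const char *speed_str;

    switch (speed) {
    case SPEED_SKETCH_10GB:
        speed_str = "10 Gbps";
        break;
    case SPEED_SKETCH_2_5GB:
        speed_str = "2.5 Gbps";
        break;
    case SPEED_SKETCH_1GB:
        speed_str = "1 Gbps";
        break;
    case SPEED_SKETCH_100MB:
        speed_str = "100 Mbps";
        break;
    default:
        speed_str = "unknown speed";
        break;
    }

    return speed_str;
}
```

Adding another speed is then one more case label rather than another level of nesting.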

ixgbe: fix bounds checking in ixgbe_setup_tc for 82598

Orabug: 21918732

This patch resolves an issue where users were not able to dynamically
set the number of queues for 82598 via ethtool -L.

Reported-by: Tal Abudi <talabudi@gmail.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7e3f5c8881ba45eba1c74344b00558920008e6e6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: support for ethtool set_rxfh

Orabug: 21918732

Allow changing the rxfh indirection table and/or key through the
ethtool interface.

Signed-off-by: Tom Barbette <tom.barbette@ulg.ac.be>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 1c7cf0784e4d448ed8a07c5fc1e3aac1528272f1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Avoid needless PHY access on copper phys

Orabug: 21918732

Avoid a needless PHY access on copper phys to save the 10ms wait
time for each PHY access. A helper function is introduced to
actually do the register access and process the contents.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ae8140aa6bf5c7aafc0d9c2f612c5b59bea1ce9f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: cleanup to use cached mask value

Orabug: 21918732

We already cache this FW/SW semaphore mask so might as well use it
for consistency.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 897b9349f056d1c1cf5141ded4ec26766d845f8b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Remove second instance of lan_id variable

Orabug: 21918732

This patch removes the redundant lan_id in the phy struct and uses
the bus version. Both variables exist and intend to represent the
STATUS register LAN_ID field. However, phy.lan_id is not bit shifted,
so phy.lan_id = 0x0 for LAN Id 0 and phy.lan_id = 0x4 for LAN Id 1,
whereas bus.lan_id is bit shifted, so bus.lan_id = 0x0 for LAN Id 0 and
bus.lan_id = 0x1 for LAN Id 1. There seems to be no need for the additional
lan_id variable, and removing it should make the code less confusing.

Signed-off-by: Donald C Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d5702dea43fc517c389f2d9825213dabbfdaed5e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
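
The difference between the two lan_id variables in the commit above comes down to a mask with or without a shift. In this sketch the mask and shift values are assumptions chosen to match the 0x4 versus 0x1 values quoted in the message, and the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: LAN_ID occupies bits 3:2 of the STATUS register. */
#define STATUS_LAN_ID_MASK_SKETCH  0x0000000Cu
#define STATUS_LAN_ID_SHIFT_SKETCH 2

/* Masked only, like the removed phy.lan_id: LAN Id 1 reads as 0x4. */
uint32_t lan_id_unshifted(uint32_t status)
{
    return status & STATUS_LAN_ID_MASK_SKETCH;
}

/* Masked and shifted, like bus.lan_id: LAN Id 1 reads as 0x1. */
uint32_t lan_id_shifted(uint32_t status)
{
    return (status & STATUS_LAN_ID_MASK_SKETCH) >> STATUS_LAN_ID_SHIFT_SKETCH;
}
```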

ixgbe: use kzalloc for allocating one thing

Orabug: 21918732

Use kzalloc rather than kcalloc(1, ...).

The semantic patch that makes this change is as follows:

// <smpl>
@@
@@

- kcalloc(1,
+ kzalloc(
...)
// </smpl>

This also resolves the checkpatch CHECK below:
CHECK: Prefer kzalloc(sizeof(*fwd_adapter)...) over
kzalloc(sizeof(struct ixgbe_fwd_adapter)...)

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Reviewed-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit bc52f951e344b2ec64388c71890d88c5fc154a41)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
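
A userspace analogue of the allocation style in the commit above is sketched here. calloc() stands in for the kernel allocator, the GFP flags argument is omitted, and the struct is a hypothetical placeholder rather than the real ixgbe_fwd_adapter.

```c
#include <stdlib.h>

/* Hypothetical placeholder for the structure being allocated. */
struct fwd_adapter_sketch {
    int pool;
    unsigned long active_vlans;
};

/* Userspace stand-in for kzalloc(): one zeroed object of the given size. */
void *zalloc_sketch(size_t size)
{
    return calloc(1, size);
}

struct fwd_adapter_sketch *alloc_fwd_adapter_sketch(void)
{
    struct fwd_adapter_sketch *fwd_adapter;

    /*
     * sizeof(*fwd_adapter) follows the pointer's type, so the size
     * stays correct if the declaration above ever changes. This is
     * the form the checkpatch CHECK prefers.
     */
    fwd_adapter = zalloc_sketch(sizeof(*fwd_adapter));

    return fwd_adapter; /* may be NULL; the caller must check */
}
```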

ixgbe: Remove unused PCI bus types

Orabug: 21918732

The ixgbe driver never has supported, and very doubtfully ever will
support, either PCI or PCI-X devices. So remove the unused types from
ixgbe_bus_type. Thanks to Alex Duyck for suggesting this.

Signed-off-by: Donald C Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit fa888b891384ccbf18e70af2e02f5173e55e5e7f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: add new bus type for integrated I/O interface (IOSF)

Orabug: 21918732

With this patch we add support for a new bus type, ixgbe_bus_type_internal.
X550em devices use IOSF rather than a PCIe bus, so this new type accommodates
them.

Signed-off-by: Donald C Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f9328bc6a7edc0fbaea836007b4261ca6233d96f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: add get_bus_info method for X550

Orabug: 21918732

Added ixgbe_get_bus_info_X550em to X550 code. ixgbe_get_bus_info_X550em
sets bus.width to ixgbe_bus_width_unknown and bus.speed to
ixgbe_bus_speed_unknown, because IOSF does not report a PCIe bus
width or speed.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 454c65dd1a1e7fdaa5bbd3a34e14ab5560fbfad7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add support for entering low power link up state

Orabug: 21918732

When the device is closing or suspending, call ixgbe_enter_lplu to
enter low power link up state on devices that support it. When this
is done, prevent the phy from being reset in the ixgbe_down path
so that link is present when calling ixgbe_enter_lplu.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6ac7439459606a57265800e60b14d58365ab19eb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add support for VXLAN RX offloads

Orabug: 21918732

Add support for VXLAN RX offloads for the X55x devices that support
them.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 67359c3c9fc8e9fbed991bbe0cfeda55c7e0a64c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Add support for UDP-encapsulated tx checksum offload

Orabug: 21918732

By using GSO for UDP-encapsulated packets, all ixgbe devices can
be directed to generate checksums for the inner headers because
the outer UDP checksum can be zero. So point the machinery at the
inner headers and have the hardware generate the checksum.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f467bc06022d4d37de459f9498ff4fbc7e9b0fca)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: Check whether FDIRCMD writes actually complete

Orabug: 21918732

Wait up to about 100 us for FDIRCMD writes to complete and return
failure indications.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d490d15877b2e6fc2d800ea232a0eca54cf4592c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
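
A bounded completion poll of the kind described in the commit above might be sketched as follows. The names, the busy mask, and the callback-based register read are all hypothetical; a driver would read the real register and delay between polls (for example 10 polls spaced 10 us apart to reach roughly 100 us, which is an assumption).

```c
#include <stdint.h>

/* Hypothetical register read callback. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll until none of the bits in busy_mask are set, or give up after
 * max_polls attempts. Returns 0 on completion and -1 on timeout so
 * the caller can report the failure.
 */
int wait_cmd_complete(read_reg_fn read_cmd, void *ctx,
                      uint32_t busy_mask, unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        if (!(read_cmd(ctx) & busy_mask))
            return 0;
        /* A real driver would insert a short delay here. */
    }

    return -1;
}

/* Test double: reports busy until the counter behind ctx reaches 0. */
uint32_t fake_cmd_read(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining == 0)
        return 0;

    (*remaining)--;
    return 0x3u;
}
```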

ixgbe: Assign set_phy_power dynamically where needed

Orabug: 21918732

There are various reasons why this method may or may not need to be
defined and some of these we don't know until runtime. So we will
set the value in get_invariants.

Signed-off-by: Donald C Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b5529ef5be1f0a0089988ec51541aa9573e94476)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: add new function to check for management presence

Orabug: 21918732

This patch adds a support function that indicates whether
management FW is present.

Signed-off-by: Donald C Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit bd8069ace513dd2741bc7177eeebc9a392451db1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ixgbe: do not set low power mode

Orabug: 21823210

The following commit added the capability of entering low power mode:

ixgbe: Add a PHY power state method

This works fine with newer drivers that support this capability.
However, older drivers that don't support it encounter a regression,
as they are not able to restore the power mode at boot when the driver
loads. The regression is encountered when booting a newer kernel/driver
that supports low power mode, then doing a warm reboot to an older
kernel or another OS, such as FreeBSD, that does not know how to restore
the power mode. In this case a cold reboot is required to restore the
power mode.

Signed-off-by: Brian Maly <brian.maly@oracle.com>

Merge tag 'v4.1.6-11.e1000e.3.2.6#bug21792108' of git://ca-git.us.oracle.com/linux-bmaly-public into topic/uek-4.1/drivers

Merge tag 'v4.1.6-12.igb.5.3.0#bug21792102' of git://ca-git.us.oracle.com/linux-bmaly-public into topic/uek-4.1/drivers

sxge/sxgevf: port to uek4

Orabug: 20509061

Signed-off-by: Joyce Yu <joyce.yu@sun.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

igb: bump version to igb-5.3.0

Orabug: 21792102

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6fb469023cd995d7be5ab3bf12b79387710382ff)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: use ARRAY_SIZE to replace calculating sizeof(a)/sizeof(a[0])

Orabug: 21792102

Use the ARRAY_SIZE macro rather than calculating sizeof(a)/sizeof(a[0]).
Also directly replace the code rather than using an unnecessary define.

Reported-by: Maninder Singh <maninder1.s@samsung.com>
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 9fa0452b645efdff439948a5cf448b8e497340e9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
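
For readers unfamiliar with the macro named in the commit above, a simplified userspace version is sketched here. The kernel's own ARRAY_SIZE additionally checks at compile time that its argument is really an array; that check is left out, and the table and function names are made up for the example.

```c
#include <stddef.h>

/* Simplified form of the macro, without the kernel's array type check. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table standing in for a driver's register list. */
static const int sample_regs[] = { 0x10, 0x14, 0x18, 0x1C };

/* Number of entries, computed directly at the point of use. */
size_t sample_reg_count(void)
{
    return ARRAY_SIZE(sample_regs);
}
```

Using the macro at the point of use, rather than hiding it behind another define, is the second half of what the commit describes.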

igb: report unsupported ethtool settings in set_coalesce

Orabug: 21792102

There are many settings possible using ethtool -C/--coalesce, but not
all of them are supported in igb. Report failure when an unsupported
option is set.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0c5bbeb8839172990e3b8aa82ae3c166e85a09bc)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: Fix i354 88E1112 PHY on RCC boards using AutoMediaDetect

Orabug: 21792102

e1000_check_for_link_media_swap() checks PHY page 0 for copper and PHY
page 1 for "other" (fiber) link. The switch back from page 1 to page 0
happened too soon, before e1000_check_for_link_82575() is executed, and
link on fiber (other) was never detected. Check for link while still on
the proper PHY page.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2ba6c0797c8b5a9f945345ef2b9193bd47e5f18e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: Pull timestamp from fragment before adding it to skb

Orabug: 21792102

This change makes it so that we pull the timestamp from the fragment before
we add it to the skb. By doing this we avoid a possible issue in which
the fragment ends up shorter than IGB_RX_HDR_LEN because the timestamp
is pulled after the copybreak check.

While making this change I realized we could also pull the rest of the
igb_pull_tail function into igb_add_rx_frag since in the case of igb,
unlike ixgbe, we are able to unmap the entire buffer before calling
add_rx_frag so merging the two allows for sharing of code between the two
merged functions.

Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f56e7bba22fad16c0d4fac996623ce1c13244f8f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: only report generic filters in get_ts_info

Orabug: 21792102

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 97aebc1b3cdfd445a0a051090f0dcc6018b6df2c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: bump version of igb to 5.2.18

Orabug: 21792102

Bump version of igb to igb-5.2.18

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 73cd63598dcbc95f51d5becf548e0643aa7a49fa)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: disable IPv6 extension header processing

Orabug: 21792102

Disable IPv6 extension header processing as per hardware errata.

Also fix copyright date.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8d0a88a959f0768d6b46436ea2517926fb682e53)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: Don't use NETDEV_FRAG_PAGE_MAX_SIZE in descriptor calculation

Orabug: 21792102

This change updates igb so that it will correctly perform the descriptor
count calculation. Previously it was taking NETDEV_FRAG_PAGE_MAX_SIZE
into account, which isn't really correct since a different value is used to
determine the size of the pages used for TCP. That is actually determined
by SKB_FRAG_PAGE_ORDER.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2ee52ad4962b32797bac33fa29ec8159e64c4ee3)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: simplify and clean up igb_enable_mas()

Orabug: 21792102

igb_enable_mas() should only be called for the 82575 and has no clear
return value, so change it to void. Also simplify the odd conditional
expression.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8cfb879d1b118e190bf9aea1b50da62c0d8a4a77)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek:
  IB/rds_rdma: unloading of ofed stack causes page fault panic
  RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.
  RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net
  net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
  net: Modify sk_alloc to not reference count the netns of kernel sockets.
  net: Pass kern from net_proto_family.create to sk_alloc
  net: Add a struct net parameter to sock_create_kern

Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek:
DCA: fix over-warning in ioat3_dca_init

e1000e: Increase driver version number

Orabug: 21792108

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d2d7d4e4a60f1aeefb38d7a0bede3742ddb76a68)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Fix tight loop implementation of systime read algorithm

Orabug: 21792108

Change the algorithm. Read systimel twice and check for overflow.
If there was no overflow, use the first value.
If there was an overflow, read systimeh again and use the second
systimel value.
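
The read sequence described above can be sketched in user-space C. This is
only an illustration under stated assumptions: struct fake_clock and the
read_systiml()/read_systimh() helpers are hypothetical stand-ins for the
hardware registers, and the placement of the high-word read between the two
low-word reads is an assumption, not taken from the driver source.

```c
#include <stdint.h>

/* Hypothetical device model: a free-running 64-bit counter that advances
 * by `step` ticks every time one of its 32-bit halves is read. */
struct fake_clock {
	uint64_t ticks;
	uint64_t step;
};

static uint32_t read_systiml(struct fake_clock *c)
{
	uint32_t v = (uint32_t)c->ticks;

	c->ticks += c->step;
	return v;
}

static uint32_t read_systimh(struct fake_clock *c)
{
	uint32_t v = (uint32_t)(c->ticks >> 32);

	c->ticks += c->step;
	return v;
}

/* Read the low word twice. If it did not wrap, pair the first low word
 * with the high word read in between. If it wrapped, re-read the high
 * word and pair it with the second low word. */
static uint64_t read_systim(struct fake_clock *c)
{
	uint32_t lo1 = read_systiml(c);
	uint32_t hi = read_systimh(c);
	uint32_t lo2 = read_systiml(c);

	if (lo1 < lo2)
		return ((uint64_t)hi << 32) | lo1;

	hi = read_systimh(c);
	return ((uint64_t)hi << 32) | lo2;
}
```

The point of the second low-word read is that no loop is needed: one extra
read is enough to tell whether the high word may belong to a different
epoch than the first low word.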

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 37b12910dd11d9ab969f2c310dc9160b7f3e3405)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Fix incorrect ASPM locking

Orabug: 21792108

This patch fixes wrong locking usage.
In the context of slot reset, we should use lock.
And during resume, there is no need of lock.

Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2758f9edb7bd5a06a2ecee83cc2ebaf8822a0cb5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Cosmetic changes

Orabug: 21792108

1) Replace spaces with tab.
2) Move ich8lan related define to the proper context.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d582891594104adeea89307ddd31b31bcf2d95fa)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Fix EEE in Sx implementation

Orabug: 21792108

This patch implements the EEE in Sx code so that it only applies to parts
that support EEE in Sx (as opposed to all parts that support EEE).
It also uses the existing eee_advert and eee_lp_ability to set just the
bits (100/1000) that should be set.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f5ac7445ebdbfa8cd2d90ef2a58b8f4455bcb664)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Cleanup qos request in error handling of e1000_open

Orabug: 21792108

The driver lacks pm_qos_remove_request in error handling (err_req_irq) of
e1000_open, and qos request inserted by pm_qos_add_request is not removed.
This patch adds pm_qos_remove_request in error handling to fix it.
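
The unwind pattern behind this fix can be sketched as follows. It is a
user-space model, not the driver code: struct fake_state, qos_add(),
qos_remove() and request_irq_fake() are hypothetical names that stand in
for pm_qos_add_request, pm_qos_remove_request and the IRQ request step.

```c
#include <stdbool.h>

/* Hypothetical model of the resources touched by the open path. */
struct fake_state {
	int qos_requests;	/* number of outstanding QoS requests */
	bool irq_ok;		/* whether the simulated IRQ request succeeds */
};

static void qos_add(struct fake_state *s)
{
	s->qos_requests++;
}

static void qos_remove(struct fake_state *s)
{
	s->qos_requests--;
}

static int request_irq_fake(struct fake_state *s)
{
	return s->irq_ok ? 0 : -1;
}

/* Every resource taken before the failing step is released on the
 * error path, so a failed open leaves no QoS request behind. */
static int open_device(struct fake_state *s)
{
	int err;

	qos_add(s);

	err = request_irq_fake(s);
	if (err)
		goto err_req_irq;

	return 0;

err_req_irq:
	qos_remove(s);
	return err;
}
```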

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7faae96421870ed990b0a84797c6b2377e81d079)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 - k1 workaround for LPT is not required for SPT

Orabug: 21792108

SPT hardware does not require this driver workaround.
Removed the conditional that caused K1 workaround execution on SPT.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 352f8ead753402d6c0496cb83b902128925459eb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 - Increase minimum FIFO read/write min gap

Orabug: 21792108

Due to clocking changes in the Skylake platform, there was i219
data corruption. To work around this, HW team reported the need
to increase the minimum gap between the PHY FIFO read and write pointers.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 93cbfc709047a5bc3f8d86e0b55079b5077c8e00)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 - increase IPG for speed 10/100 full duplex

Orabug: 21792108

In SPT/i219, there were CRC errors in speed 10/100 full duplex.
The solution given by the HW team is to increase the IPG from 8 to 0xC.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 69cfbc95bdbfa2bd9a82f27dc131b08c48542f19)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 - fix to enable both ULP and EEE in Sx state

Orabug: 21792108

In i219, there is a hardware bug that prevented ULP entry.
A side effect of the original software fix for this was that EEE in
Sx couldn't be enabled.
This patch implements a modified flow that allows both ULP and EEE in Sx.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6607c99e7034e7565a1559a24dd35083d6719788)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: synchronization of MAC-PHY interface only on non-ME systems

Orabug: 21792108

On power up, the MAC-PHY interface needs to be set to PCIe, even if
cable is disconnected. In ME systems, the ME handles this on exit from
Sx state. In non-ME, the driver handles it. Added a check for non-ME
system to the driver code that handles that.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit beee8072c39cee74d4ff6e3b4ca8942f9966ed2e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix locking issue with e1000e_disable_aspm

Orabug: 21792108

e1000e_disable_aspm called pci_disable_link_state_locked which requires
pci_bus_sem to be held, but is also called from places where this semaphore
was not previously acquired. This patch implements two flavors of
disable_aspm, one that acquires the lock, and the other (_locked) which
should be called when the semaphore is already acquired.
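
A minimal sketch of the two-flavor pattern, assuming a simplified model:
struct fake_bus, bus_lock() and bus_unlock() are hypothetical stand-ins for
the PCI bus semaphore, and the -1 return for a missing lock exists only to
make the contract visible in the model.

```c
#include <stdbool.h>

/* Hypothetical model of a bus whose link state is guarded by a lock. */
struct fake_bus {
	bool sem_held;		/* stands in for holding pci_bus_sem */
	int aspm_state;		/* bitmask of enabled ASPM states */
};

static void bus_lock(struct fake_bus *b)
{
	b->sem_held = true;
}

static void bus_unlock(struct fake_bus *b)
{
	b->sem_held = false;
}

/* "_locked" flavor: the caller must already hold the semaphore. */
static int disable_aspm_locked(struct fake_bus *b, int mask)
{
	if (!b->sem_held)
		return -1;	/* contract violated in this model */

	b->aspm_state &= ~mask;
	return 0;
}

/* Plain flavor: takes the semaphore itself, then reuses the core. */
static int disable_aspm(struct fake_bus *b, int mask)
{
	int ret;

	bus_lock(b);
	ret = disable_aspm_locked(b, mask);
	bus_unlock(b);
	return ret;
}
```

Keeping the real work in the "_locked" variant means both entry points
share one implementation and differ only in who takes the lock.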

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit beb0a1520bec17cfaf0c3c77bbdd56cbf942883a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Bump the version to 3.2.5

Orabug: 21792108

Bump the version to reflect the driver changes and bug fixes for i219.
Also update the copyright, while we are at it.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 529498cde04537211cc3aa8f920c371b91c0f7d8)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix unit hang during loopback test

Orabug: 21792108

System would hang during execution of "ethtool -t <NIC>" for the same
reason that required flushing the descriptor rings. This fix disables
MULR for the loopback test to avoid the hang state.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2ec7d2974c5665780e5f2b532833b76531f0a8d3)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix systim issues

Orabug: 21792108

Two issues involving systim were reported.
1. Clock is not running at the correct frequency
2. In some situations, systim values were not incremented linearly
This patch fixes the hardware clock configuration and the spurious
non-linear increment.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 83129b37ef35bb6a7f01c060129736a8db5d31c4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix legacy interrupt handling in i219

Orabug: 21792108

This fix handles a hardware issue that prevented i219 from
working in legacy interrupts mode (IntMode=0).

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ec945cfbbf918dd862d7574f9b75588ba1f4a729)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix flush_desc_ring implementation

Orabug: 21792108

The indication that a descriptor ring flush is required was read from
FEXTNVM7 by mistake. It should be read from the PCI config space.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ff9174291eddb3f42a1e9429f8b919bebc33533b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix logical error in flush_desc_rings

Orabug: 21792108

The condition under which the flush should occur was reversed. The fix
should be applied before any HW reset (unless followed by bus reset)
and before any power state transition from D0.

If E1000_FEXTNVM7_NEED_DESCRING_FLUSH bit is set in FEXTNVM7 and TDLEN > 0
the Tx ring should be flushed. (fixes ~95% of the hang states).
If the E1000_FEXTNVM7_NEED_DESCRING_FLUSH did not clear, we should also
flush the RX ring. Bug was caught by Alexander Duyck during a code review
when examining this fix.
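
The corrected condition can be sketched as a small user-space model. The
names here (struct fake_nic, NEED_DESCRING_FLUSH, the flush helpers) are
hypothetical, and whether a Tx flush clears the bit is a knob of the model
rather than hardware behaviour described in this log.

```c
#include <stdbool.h>
#include <stdint.h>

#define NEED_DESCRING_FLUSH 0x0100u	/* bit 8 of the modelled status word */

/* Hypothetical device model used only to exercise the decision logic. */
struct fake_nic {
	uint32_t status;		/* holds the flush-required bit */
	uint32_t tdlen;			/* Tx descriptor ring length */
	bool tx_flush_clears_bit;	/* model knob: does a Tx flush suffice? */
	int tx_flushes;
	int rx_flushes;
};

static void flush_tx_ring(struct fake_nic *n)
{
	n->tx_flushes++;
	if (n->tx_flush_clears_bit)
		n->status &= ~NEED_DESCRING_FLUSH;
}

static void flush_rx_ring(struct fake_nic *n)
{
	n->rx_flushes++;
	n->status &= ~NEED_DESCRING_FLUSH;
}

/* Flush Tx only when the bit is set and TDLEN > 0, then flush Rx only
 * if the bit is still set afterwards. */
static void flush_desc_rings(struct fake_nic *n)
{
	if (!(n->status & NEED_DESCRING_FLUSH) || n->tdlen == 0)
		return;

	flush_tx_ring(n);

	if (n->status & NEED_DESCRING_FLUSH)
		flush_rx_ring(n);
}
```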

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 95f0d950467f1228d4e326c11150e1750a6dd1ef)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: remove call to do_div and sign mismatch warning

Orabug: 21792108

Fixes a warning that was reported by Yanjiang Jin
<yanjiang.jin@windriver.com> by implementing the solution suggested by
Alexander Duyck <alexander.h.duyck@redhat.com>.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit bfc9473bf90457bf31d3f675d82234897c6480cd)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 execute unit hang fix on every reset or power state transition

Orabug: 21792108

After testing various cases, the conclusion is that the fix MUST be
executed BEFORE any event in which the HW is reset or transitions to D3.
To fix that, I moved the execution to the relevant places and, per
Alexander Duyck's review, now ensure that the DMA is valid and was not
freed before manipulating the ring.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0ffc56464bbbb8e2a78e319a36e1eafcbaaab9d8)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 fix unit hang on reset and runtime D3

Orabug: 21792108

Unit hang may occur if multiple descriptors are available in the rings
during reset or runtime suspend. This state can be detected by testing
bit 8 in the FEXTNVM7 register. If this bit is set and there are pending
descriptors in one of the rings, we must flush them prior to reset. Same
applies entering runtime suspend.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ad851fbb73a3d6564707281aa253418ef6aab878)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix call to do_div() to use u64 arg

Orabug: 21792108

We were using s64 for lat_ns (latency nano-second value) since in
our calculations a negative value could result. For negative
values, we then assign lat_ns to be zero, so the value passed to
do_div() was never negative, but do_div() expects the argument type
to be u64, so do a cast to resolve a compile warning seen on
PowerPC.
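
The clamp-then-cast idea can be sketched in portable C. This is an
illustration only: latency_units() and its unit_ns parameter are
hypothetical, and plain 64-bit division stands in for the kernel's
do_div() helper, which is not available in user space.

```c
#include <stdint.h>

/* Convert a signed latency to whole units of unit_ns. Negative
 * intermediate results are clamped to zero first, so the value handed
 * to the unsigned division is never negative. unit_ns must be nonzero. */
static uint64_t latency_units(int64_t lat_ns, uint32_t unit_ns)
{
	uint64_t lat;

	if (lat_ns < 0)
		lat = 0;
	else
		lat = (uint64_t)lat_ns;	/* explicit cast, no sign mismatch */

	return lat / unit_ns;
}
```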

CC: Yanjiang Jin <yanjiang.jin@windriver.com>
CC: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Reported-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
(cherry picked from commit 30544af5483755b11bb5924736e9e0b45ef0644a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Do not allow CRC stripping to be disabled on 82579 w/ jumbo frames

Orabug: 21792108

The driver wasn't allowing jumbo frames to be
enabled when CRC stripping was disabled, however it was allowing CRC
stripping to be disabled while jumbo frames were enabled. This fixes that by
making it so that the NETIF_F_RXFCS flag cannot be set when jumbo frames are
enabled on 82579 and newer parts.
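
A feature-fixup hook of this kind might look like the sketch below. The
constants and the needs_crc_strip_for_jumbo flag are hypothetical:
FEATURE_RXFCS stands in for NETIF_F_RXFCS, and the flag stands in for the
"82579 and newer" check.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_RXFCS	(1u << 0)	/* stands in for NETIF_F_RXFCS */
#define STD_MTU		1500u		/* standard Ethernet payload size */

/* Mask out the "keep CRC" feature whenever jumbo frames are active on
 * a part that requires CRC stripping for them. Other feature bits pass
 * through unchanged. */
static uint32_t fix_features(uint32_t features,
			     bool needs_crc_strip_for_jumbo, uint32_t mtu)
{
	if (needs_crc_strip_for_jumbo && mtu > STD_MTU)
		features &= ~FEATURE_RXFCS;

	return features;
}
```

Filtering the requested features, rather than rejecting the request,
keeps the restriction enforced in both directions regardless of which
setting the user changes first.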

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 55e7fe5b9cd94e6accb128e6a1e5902e9018deef)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

DCA: fix over-warning in ioat3_dca_init

We keep seeing such dmesg messages on boxes

WARNING: CPU: 0 PID: 457 at drivers/dma/ioat/dca.c:697
ioat3_dca_init+0x19c/0x1b0 [ioatdma]()
[ 16.609614] ioatdma 0000:00:04.0: APICID_TAG_MAP set incorrectly by
BIOS, disabling DCA
...
[<ffffffff8172807e>] dump_stack+0x4d/0x66
[<ffffffff81067f7d>] warn_slowpath_common+0x7d/0xa0
[<ffffffff81068034>] warn_slowpath_fmt_taint+0x44/0x50
[<ffffffffa00228bc>] ioat3_dca_init+0x19c/0x1b0
[ioatdma]
[<ffffffffa0021cd6>] ioat3_dma_probe+0x386/0x3e0
[ioatdma]
[<ffffffffa001a192>] ioat_pci_probe+0x122/0x1b0
[ioatdma]
[<ffffffff81329385>] local_pci_probe+0x45/0xa0
[<ffffffff81080d34>] work_for_cpu_fn+0x14/0x20
[<ffffffff81083c33>] process_one_work+0x183/0x490
[<ffffffff81084bd3>] worker_thread+0x2a3/0x410
[<ffffffff81084930>] ? rescuer_thread+0x410/0x410
[<ffffffff8108b852>] kthread+0xd2/0xf0
[<ffffffff8108b780>] ?

No need to use WARN_TAINT_ONCE to generate such a big noise if this is
not a critical error for the kernel. The DCA driver could print out a debug
message and then quit quietly.

If this is a real BIOS bug, please ignore this patch. Let's transfer
this issue to BIOS guys.

Thread: https://lkml.org/lkml/2014/5/8/446

Orabug: 21666295

Signed-off-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.rds-p2:
  IB/rds_rdma: unloading of ofed stack causes page fault panic
  RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.
  RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net

IB/rds_rdma: unloading of ofed stack causes page fault panic

This issue surfaced at the tail end of OFED functional automatic test suite
while unloading ofed modules resulting in following stack trace:
BUG: unable to handle kernel paging request at ffffffffa0abd1a0
IP: [<ffffffffa0abd1a0>] 0xffffffffa0abd1a0

Modules linked in: rds(-) ib_ipoib ... dm_mod [last unloaded: rds_rdma]

Workqueue: krdsd 0xffffffffa0abd1a0
task: ffff880670ac8df0 ti: ffff880666654000 task.ti: ffff880666654000
RIP: 0010:[<ffffffffa0abd1a0>]  [<ffffffffa0abd1a0>] 0xffffffffa0abd1a0
RSP: 0018:ffff880666657de0  EFLAGS: 00010286
RAX: 0000000000000600 RBX: ffff880664a03380 RCX: dead000000200200
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880664a03380
RBP: ffff880666657e38 R08: ffff880664a03388 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880674279c80
R13: ffff880675169800 R14: ffff880671a5dd00 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88067fc00000(0000) GS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa0abd1a0 CR3: 0000000001a56000 CR4: 00000000000007e0
Stack:
  ffffffff810962d6 000000000000000b ffff880664a03388 ffff880675169800
  ffff880671a5dd15 ffff880674279cb0 ffff880674279c80 ffff880675169800
  ffff880675169bc0 ffff880674279cb0 ffff880675169818 ffff880666657eb8
Call Trace:
  [<ffffffff810962d6>] ? process_one_work+0x146/0x450

The root cause for panic is failure to purge an active delayed work
request for active bonding initial failover work.

The fix is to cancel active bonding initial failover delayed work if
still active at module unload.

Orabug: 20861212

Signed-off-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Acked-by: Mukesh Kacker <mukesh.kacker@oracle.com>

RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.

Register pernet subsys init/stop functions that will set up
and tear down per-net RDS-TCP listen endpoints. Unregister
pernet subsys functions on 'modprobe -r' to clean up these
end points.

Enable keepalive on both accept and connect socket endpoints.
The keepalive timer expiration will ensure that client socket
endpoints will be removed as appropriate from the netns when
an interface is removed from a namespace.

Register a device notifier callback that will clean up all
sockets (and thus avoid the need to wait for keepalive timeout)
when the loopback device is unregistered from the netns indicating
that the netns is getting deleted.

Backport of upstream commit: 467fa15356acfb7b2efa38839c3e76caa4e6e0ea

Orabug: 21437445

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net

Open the sockets calling sock_create_kern() with the correct struct net
pointer, and use that struct net pointer when verifying the
address passed to rds_bind().

Backport of upstream commit: d5a8ac28a7ff2f250d1bedbb6008dd2f6f6f1638

Orabug: 21437445

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket

The newsk returned by sk_clone_lock should hold a get_net()
reference if, and only if, the parent is not a kernel socket
(making this similar to sk_alloc()).

E.g., for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock
sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
socket is a kernel socket (defined in sk_alloc() as having
sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
and should not hold a get_net() reference.
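
The rule can be modelled in a few lines of user-space C. Everything here
is a simplified assumption: struct fake_net, struct fake_sock and
clone_sock() are hypothetical, with net_refcnt playing the role of
sk_net_refcnt and get_net_ref() the role of get_net().

```c
#include <stdbool.h>

/* Hypothetical namespace with a plain reference counter. */
struct fake_net {
	int refs;
};

/* Hypothetical socket: net_refcnt is false for kernel sockets. */
struct fake_sock {
	struct fake_net *net;
	bool net_refcnt;
};

static void get_net_ref(struct fake_net *n)
{
	n->refs++;
}

/* The clone inherits the parent's net_refcnt flag and takes a namespace
 * reference only when that flag is set, so a child of a kernel socket
 * does not pin the namespace. */
static struct fake_sock clone_sock(const struct fake_sock *parent)
{
	struct fake_sock newsk = *parent;

	if (newsk.net_refcnt)
		get_net_ref(newsk.net);

	return newsk;
}
```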

Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the
netns of kernel sockets.")

Backport of upstream commit: 8a68173691f036613e3d4e6bf8dc129d4a7bf383

Orabug: 21437445

Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Modify sk_alloc to not reference count the netns of kernel sockets.

Now that sk_alloc knows when a kernel socket is being allocated modify
it to not reference count the network namespace of kernel sockets.

Keep track of if a socket needs reference counting by adding a flag to
struct sock called sk_net_refcnt.

Update all of the callers of sock_create_kern to stop using
sk_change_net and sk_release_kernel as those hacks are no longer
needed, to avoid reference counting a kernel socket.

Backport of upstream commit: 26abe14379f8e2fa3fd1bcf97c9a7ad9364886fe

Orabug: 21437445

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

net: Pass kern from net_proto_family.create to sk_alloc

In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.

Backport of upstream commit: 11aa9c28b4209242a9de0a661a7b3405adb568a0

Orabug: 21437445

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

net: Add a struct net parameter to sock_create_kern

This is long overdue, and is part of cleaning up how we allocate
kernel sockets that don't reference count struct net.

Backport of upstream commit: eeb1bd5c40edb0e2fd925c8535e2fdebdbc5cef2

Orabug: 21437445

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
uek-rpm: configs: Enable Chelsio T4 and T5 NIC on OL6

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek: (61 commits)
  i40e/i40evf: Bump version to 1.3.6 for i40e and 1.3.2 for i40evf
  i40e: Refine an error message to avoid confusion
  i40e/i40evf: Add support for pre-allocated pages for PD
  i40evf: add MAC address filter in open, not init
  i40evf: don't delete all the filters
  i40e: un-disable VF after reset
  i40e: do a proper reset when disabling a VF
  i40e: correctly program filters for VFs
  i40e/i40evf: Update the admin queue command header
  i40e: Remove incorrect #ifdef's
  i40e: ignore duplicate port VLAN requests
  i40evf: Allow for an abundance of vectors
  i40e/i40evf: improve Tx performance with a small tweak
  i40e/i40evf: Update Flex-10 related device/function capabilities
  i40e/i40evf: Add stats to track FD ATR and SB dynamic enable state
  i40e: Implement ndo_features_check()
  i40evf: don't configure unused RSS queues
  i40evf: fix panic during MTU change
  i40e: Bump version to 1.3.4
  i40e/i40evf: remove time_stamp member
  ...

Merge branch 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek:
NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
  nfs: take extra reference to fl->fl_file when running a LOCKU operation
  mm: madvise allow remove operation for hugetlbfs
  mmotm: build fix hugetlbfs fallocate if not CONFIG_NUMA
  hugetlbfs: add hugetlbfs_fallocate()
  hugetlbfs: New huge_add_to_page_cache helper routine
  mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate
  mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch
  mm/hugetlb.c: make vma_has_reserves() return bool
  hugetlbfs: truncate_hugepages() takes a range of pages
  hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete
  mm/hugetlb: expose hugetlb fault mutex for use by fallocate
  mm/hugetlb: add region_del() to delete a specific range of entries
  mm-hugetlb-add-cache-of-descriptors-to-resv_map-for-region_add-fix
  mm/hugetlb: add cache of descriptors to resv_map for region_add
  mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
  mm/hugetlb: compute/return the number of regions added by region_add()
  mm/hugetlb: document the reserve map/region tracking routines

nfs: take extra reference to fl->fl_file when running a LOCKU operation

Jean reported another crash, similar to the one fixed by feaff8e5b2cf:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
    IP: [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    PGD 0
    Oops: 0000 [#1] SMP
    Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock cfg80211 rfkill coretemp crct10dif_pclmul ppdev vmw_balloon crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr vmxnet3 parport_pc i2c_piix4 microcode serio_raw parport nfsd floppy vmw_vmci acpi_cpufreq auth_rpcgss shpchp nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih ata_generic mptbase i2c_core pata_acpi
    CPU: 0 PID: 329 Comm: kworker/0:1H Not tainted 4.1.0-rc7+ #2
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
    Workqueue: rpciod rpc_async_schedule [sunrpc]
    30ec000
    RIP: 0010:[<ffffffff8124ef7f>]  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    RSP: 0018:ffff8802330efc08  EFLAGS: 00010296
    RAX: ffff8802330efc58 RBX: ffff880097187c80 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff8802330efc18 R08: ffff88023fc173d8 R09: 3038b7bf00000000
    R10: 00002f1a02000000 R11: 3038b7bf00000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8802337a2300 R15: 0000000000000020
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000148 CR3: 000000003680f000 CR4: 00000000001407f0
    Stack:
     ffff880097187c80 ffff880097187cd8 ffff8802330efc98 ffffffff81250281
     ffff8802330efc68 ffffffffa013e7df ffff8802330efc98 0000000000000246
     ffff8801f6901c00 ffff880233d2b8d8 ffff8802330efc58 ffff8802330efc58
    Call Trace:
     [<ffffffff81250281>] __posix_lock_file+0x31/0x5e0
     [<ffffffffa013e7df>] ? rpc_wake_up_task_queue_locked.part.35+0xcf/0x240 [sunrpc]
     [<ffffffff8125088b>] posix_lock_file_wait+0x3b/0xd0
     [<ffffffffa03890b2>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
     [<ffffffffa0365808>] ? nfs41_sequence_done+0xd8/0x300 [nfsv4]
     [<ffffffffa0367525>] do_vfs_lock+0x35/0x40 [nfsv4]
     [<ffffffffa03690c1>] nfs4_locku_done+0x81/0x120 [nfsv4]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e33c>] rpc_exit_task+0x2c/0x90 [sunrpc]
     [<ffffffffa0134400>] ? call_refreshresult+0x170/0x170 [sunrpc]
     [<ffffffffa013ece4>] __rpc_execute+0x84/0x410 [sunrpc]
     [<ffffffffa013f085>] rpc_async_schedule+0x15/0x20 [sunrpc]
     [<ffffffff810add67>] process_one_work+0x147/0x400
     [<ffffffff810ae42b>] worker_thread+0x11b/0x460
     [<ffffffff810ae310>] ? rescuer_thread+0x2f0/0x2f0
     [<ffffffff810b35d9>] kthread+0xc9/0xe0
     [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pmd+0xa0/0x160
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
     [<ffffffff8173c222>] ret_from_fork+0x42/0x70
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
    Code: a5 81 e8 85 75 e4 ff c6 05 31 ee aa 00 01 eb 98 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 9f 48 01 00 00 48 85 db 74 08 48 89 d8 5b 41 5c 5d c3 83
    RIP  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
     RSP <ffff8802330efc08>
    CR2: 0000000000000148
    ---[ end trace 64484f16250de7ef ]---

The problem is almost exactly the same as the one fixed by feaff8e5b2cf.
We must take a reference to the struct file when running the LOCKU
compound to prevent the final fput from running until the operation is
complete.

Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 21687670
(cherry picked from mainline commit db2efec0caba4f81a22d95a34da640b86c313c8e)
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock

Orabug: 20933419

NFS on a 2 node ocfs2 cluster each node exporting dir. The lock causing
the hang is the global bit map inode lock.  Node 1 is master, has
the lock granted in PR mode; Node 2 is in the converting list (PR ->
EX). There are no holders of the lock on the master node so it should
downconvert to NL and grant EX to node 2 but that does not happen.
BLOCKED + QUEUED in lock res are set and it is on osb blocked list.
Threads are waiting in __ocfs2_cluster_lock on BLOCKED.  One thread wants
EX, rest want PR. So it is as though the downconvert thread needs to be
kicked to complete the conv.

The hang is caused by an EX req coming into  __ocfs2_cluster_lock on
the heels of a PR req after it sets BUSY (drops l_lock, releasing EX
thread), forcing the incoming EX to wait on BUSY without doing anything.
PR has called ocfs2_dlm_lock, which  sets the node 1 lock from NL ->
PR, queues ast.

At this time, upconvert (PR ->EX) arrives from node 2, finds conflict with
node 1 lock in PR, so the lock res is put on dlm thread's dirty list.

After ret from ocfs2_dlm_lock, PR thread now waits behind EX on BUSY till
awoken by ast.

Now it is dlm_thread that serially runs dlm_shuffle_lists, ast,  bast,
in that order.  dlm_shuffle_lists queues a bast on behalf of node 2
(which will be run by dlm_thread right after the ast).  ast does its
part, sets UPCONVERT_FINISHING, clears BUSY and wakes its waiters. Next,
dlm_thread runs  bast. It sets BLOCKED and kicks dc thread.  dc thread
runs ocfs2_unblock_lock, but since UPCONVERT_FINISHING set, skips doing
anything and requeues.

Inside of __ocfs2_cluster_lock, since EX has been waiting on BUSY ahead
of PR, it wakes up first, finds BLOCKED set and skips doing anything
but clearing UPCONVERT_FINISHING (which was actually "meant" for the
PR thread), and this time waits on BLOCKED.  Next, the PR thread comes
out of wait but since UPCONVERT_FINISHING is not set, it skips updating
the l_ro_holders and goes straight to wait on BLOCKED. So there, we
have a hang! Threads in __ocfs2_cluster_lock wait on BLOCKED, lock
res in osb blocked list. Only when dc thread is awoken, it will run
ocfs2_unblock_lock and things will unhang.

One way to fix this is to wake the dc thread on the flag after clearing
UPCONVERT_FINISHING.

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>