www.infradead.org Git - users/jedix/linux-maple.git/log

igb: Allow asymmetric configuration of MTU versus Rx frame size

Since the igb driver is using page based receive there is no point in
limiting the Rx capabilities of the device. The driver can receive 9K
jumbo frames at all times. The only changes needed due to MTU changes are
updates for the FIFO sizes and flow-control watermarks.

Update the maximum frame size to reflect the 9.5K limitation of the
hardware, and replace all instances of max_frame_size with
MAX_JUMBO_FRAME_SIZE when referring to an Rx FIFO or frame.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 45693bcb00cbd379c373ab22ccd9a9d4755cc7ed)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Refactor VFTA configuration

This patch starts the clean-up process on the VFTA configuration.
Specifically in this patch I attempt to address and simplify several items
while also updating the code to bring it more inline with what is already
in ixgbe.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 832e821c51e381966464c8a0f30f12eb1514eba0)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: clean up code for setting MAC address

Drop a bunch of hand written byte swapping code in favor of just doing the
byte swapping ourselves. The registers are little endian registers storing
a big endian value so if we read the MAC address array as little endian
then we will get the CPU registers into the proper layout.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c3278587e7d34cfbc1d38d3ae25923343af7752a)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb/igbvf: don't give up

The driver shouldn't just give up if it fails to get the hardware
mailbox lock. This can happen in a situation where the PF-VF
communication channel is heavily loaded and causes complete
communications failure between the PF and VF drivers.

Add a counter and a delay. The driver will now retry ten times, waiting
one millisecond between retries.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 9ce0e8d72678b5b60c99ce4c7af15ec127c761cb)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Unpair the queues when changing the number of queues

By the commit 72ddef0506da ("igb: Fix oops caused by missing queue
pairing"), the IGB_FLAG_QUEUE_PAIRS flag can now be set when changing the
number of queues by "ethtool -L", but it is never cleared unless the igb
driver is reloaded.
This patch clears it if queue pairing becomes unnecessary as a result of
"ethtool -L".

Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 37a5d163fb447b39f7960d0534de30e88ad395bb)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Remove unnecessary flag setting in igb_set_flag_queue_pairs()

If VFs are enabled (max_vfs >= 1), both max_rss_queues and
adapter->rss_queues are set to 2 in the case of e1000_82576.
In this case, IGB_FLAG_QUEUE_PAIRS is always set in the default block as a
result of fall-through, thus setting it in the e1000_82576 block is not
necessary.

Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ceb27759987ec10ba22332bd7fdf1cfb35b86991)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Explicitly label self-test result indices

Previously, the ethtool self-test gstrings/data arrays were accessed via
hardcoded indices, which made the code difficult to follow. This patch
replaces the hardcoded values with enum-based labels.

Signed-off-by: Joe Schultz <jschultz@xes-inc.com>
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d602de05934c1d3022b153ff879e81f65df2a7b6)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Improve cable length function for I210, etc.

Previously, the PHY-specific code to get the cable length for the
I210 internal and related PHYs was reporting the cable length of a
single pair and reporting it as the min, max, and total cable length.
Update it so that all four pairs are checked so the true min, max,
and average cable lengths are reported.

Signed-off-by: Joe Schultz <jschultz@xes-inc.com>
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3627f8f1d137f6487e51ff1199a54a087a2a6446)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Don't add PHY address to PCDL address

There is no reason to add the PHY address into the PCDL register address.

Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 06b0dd64923b5598a52de4c889a116c49493bf97)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Remove GS40G specific defines/functions

The I210 internal PHY can be accessed just as well with the access
functions shared by 82580, I350, and I354 devices. A side effect of
relying on the common functions, is that I210 cable length support
is folded back into the common case which effectively reverts the
following commit:

    commit 59f301046b276f87483b3afa3201a4273def06a9
    Author: Carolyn Wyborny <carolyn.wyborny@intel.com>
    Date:   Wed Oct 10 04:42:59 2012 +0000

    igb: Update get cable length function for i210/i211

Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2a3cdead8b408351fa1e3079b220fa331480ffbc)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: improve handling of disconnected adapters

Clean up array_rd32 so that it uses igb_rd32 the same as rd32, per the
suggestion of Alexander Duyck, and use io_addr in more places, so that
we don't have the need to call E1000_REMOVED (which simply looks for a
null hw_addr) nearly as much.

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7b06a6909555ffb0140733cc4420222604140b27)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: fix NULL derefs due to skipped SR-IOV enabling

The combined effect of commits 6423fc3416 ("igb: do not re-init SR-IOV
during probe") and ceee3450b3 ("igb: make sure SR-IOV init uses the
right number of queues") causes VFs no longer getting set up, leading
to NULL pointer dereferences due to the adapter's ->vf_data being NULL
while ->vfs_allocated_count is non-zero. The first commit not only
neglected the side effect of igb_sriov_reinit() that the second commit
tried to account for, but also that of setting IGB_FLAG_HAS_MSIX,
without which igb_enable_sriov() is effectively a no-op. Calling
igb_{,re}set_interrupt_capability() as done here seems to address this,
but I'm not sure whether this is better than sinply reverting the other
two commits.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit be06998f96ecb93938ad2cce46c4289bf7cf45bc)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: use the correct i210 register for EEMNGCTL

The i210 has two EEPROM access registers that are located in
non-standard offsets: EEARBC and EEMNGCTL. EEARBC was fixed previously
and EEMNGCTL should also be corrected.

Reported-by: Roman Hodek <roman.aud@siemens.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 08c991297582114a6e1220f913eec91789c4eac6)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: don't unmap NULL hw_addr

I've got a startech thunderbolt dock someone loaned me, which among other
things, has the following device in it:

08:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

This hotplugs just fine (kernel 4.2.0 plus a patch or two here):

[  863.020315] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
[  863.020316] igb: Copyright (c) 2007-2014 Intel Corporation.
[  863.028657] igb 0000:08:00.0: enabling device (0000 -> 0002)
[  863.062089] igb 0000:08:00.0: added PHC on eth0
[  863.062090] igb 0000:08:00.0: Intel(R) Gigabit Ethernet Network Connection
[  863.062091] igb 0000:08:00.0: eth0: (PCIe:2.5Gb/s:Width x1) e8:ea:6a:00:1b:2a
[  863.062194] igb 0000:08:00.0: eth0: PBA No: 000200-000
[  863.062196] igb 0000:08:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[  863.064889] igb 0000:08:00.0 enp8s0: renamed from eth0

But disconnecting it is another story:

[ 1002.807932] igb 0000:08:00.0: removed PHC on enp8s0
[ 1002.807944] igb 0000:08:00.0 enp8s0: PCIe link lost, device now detached
[ 1003.341141] ------------[ cut here ]------------
[ 1003.341148] WARNING: CPU: 0 PID: 199 at lib/iomap.c:43 bad_io_access+0x38/0x40()
[ 1003.341149] Bad IO access at port 0x0 ()
[ 1003.342767] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi igb dca firewire_ohci firewire_core crc_itu_t rfcomm ctr ccm arc4 iwlmvm mac80211 fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter bnep dm_mirror dm_region_hash dm_log dm_mod coretemp x86_pkg_temp_thermal intel_powerclamp kvm_intel snd_hda_codec_hdmi kvm
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg
[ 1003.342793]  ansi_cprng aesni_intel hp_wmi aes_x86_64 iTCO_wdt lrw iTCO_vendor_support ppdev gf128mul sparse_keymap glue_helper ablk_helper cryptd snd_hda_codec_realtek snd_hda_codec_generic
microcode snd_hda_intel uvcvideo iwlwifi snd_hda_codec videobuf2_vmalloc videobuf2_memops snd_hda_core videobuf2_core snd_hwdep btusb v4l2_common btrtl snd_seq btbcm btintel videodev cfg80211
snd_seq_device rtsx_pci_ms bluetooth pcspkr input_leds i2c_i801 media parport_pc memstick rfkill sg lpc_ich snd_pcm 8250_fintek parport joydev snd_timer snd soundcore hp_accel ie31200_edac
mei_me lis3lv02d edac_core input_polldev mei hp_wireless shpchp tpm_infineon sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables autofs4 xfs libcrc32c sd_mod sr_mod cdrom
rtsx_pci_sdmmc mmc_core crc32c_intel serio_raw rtsx_pci
[ 1003.342822]  nouveau ahci libahci mxm_wmi e1000e xhci_pci hwmon ptp drm_kms_helper pps_core xhci_hcd ttm wmi video ipv6
[ 1003.342839] CPU: 0 PID: 199 Comm: kworker/0:2 Not tainted 4.2.0-2.el7_UNSUPPORTED.x86_64 #1
[ 1003.342840] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 Ver. 01.07 02/26/2015
[ 1003.342843] Workqueue: pciehp-3 pciehp_power_thread
[ 1003.342844]  ffffffff81a90655 ffff8804866d3b48 ffffffff8164763a 0000000000000000
[ 1003.342846]  ffff8804866d3b98 ffff8804866d3b88 ffffffff8107134a ffff8804866d3b88
[ 1003.342847]  ffff880486f46000 ffff88046c8a8000 ffff880486f46840 ffff88046c8a8098
[ 1003.342848] Call Trace:
[ 1003.342852]  [<ffffffff8164763a>] dump_stack+0x45/0x57
[ 1003.342855]  [<ffffffff8107134a>] warn_slowpath_common+0x8a/0xc0
[ 1003.342857]  [<ffffffff810713c6>] warn_slowpath_fmt+0x46/0x50
[ 1003.342859]  [<ffffffff8133719e>] ? pci_disable_msix+0x3e/0x50
[ 1003.342860]  [<ffffffff812f6328>] bad_io_access+0x38/0x40
[ 1003.342861]  [<ffffffff812f6567>] pci_iounmap+0x27/0x40
[ 1003.342865]  [<ffffffffa0b728d7>] igb_remove+0xc7/0x160 [igb]
[ 1003.342867]  [<ffffffff8132189f>] pci_device_remove+0x3f/0xc0
[ 1003.342869]  [<ffffffff81433426>] __device_release_driver+0x96/0x130
[ 1003.342870]  [<ffffffff814334e3>] device_release_driver+0x23/0x30
[ 1003.342871]  [<ffffffff8131b404>] pci_stop_bus_device+0x94/0xa0
[ 1003.342872]  [<ffffffff8131b3ad>] pci_stop_bus_device+0x3d/0xa0
[ 1003.342873]  [<ffffffff8131b3ad>] pci_stop_bus_device+0x3d/0xa0
[ 1003.342874]  [<ffffffff8131b516>] pci_stop_and_remove_bus_device+0x16/0x30
[ 1003.342876]  [<ffffffff81333f5b>] pciehp_unconfigure_device+0x9b/0x180
[ 1003.342877]  [<ffffffff81333a73>] pciehp_disable_slot+0x43/0xb0
[ 1003.342878]  [<ffffffff81333b6d>] pciehp_power_thread+0x8d/0xb0
[ 1003.342885]  [<ffffffff810881b2>] process_one_work+0x152/0x3d0
[ 1003.342886]  [<ffffffff8108854a>] worker_thread+0x11a/0x460
[ 1003.342887]  [<ffffffff81088430>] ? process_one_work+0x3d0/0x3d0
[ 1003.342890]  [<ffffffff8108ddd9>] kthread+0xc9/0xe0
[ 1003.342891]  [<ffffffff8108dd10>] ? kthread_create_on_node+0x180/0x180
[ 1003.342893]  [<ffffffff8164e29f>] ret_from_fork+0x3f/0x70
[ 1003.342894]  [<ffffffff8108dd10>] ? kthread_create_on_node+0x180/0x180
[ 1003.342895] ---[ end trace 65a77e06d5aa9358 ]---

Upon looking at the igb driver, I see that igb_rd32() attempted to read from
hw_addr and failed, so it set hw->hw_addr to NULL and spit out the message
in the log output above, "PCIe link lost, device now detached".

Well, now that hw_addr is NULL, the attempt to call pci_iounmap is obviously
not going to go well. As suggested by Mark Rustad, do something similar to
what ixgbe does, and save a copy of hw_addr as adapter->io_addr, so we can
still call pci_iounmap on it on teardown. Additionally, for consistency,
make the pci_iomap call assignment directly to io_addr, so map and unmap
match.

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 73bf8048d7c86a20a59d427e55deb1a778e94df7)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: add 88E1543 initialization code

Initialize the 88E1543 PHY.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 18f7ce5412027232890143ccfae23668d0872d27)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

net: igb: avoid using timespec

We want to deprecate the use of 'struct timespec' on 32-bit
architectures, as it is will overflow in 2038. The igb
driver uses it to read the current time, and can simply
be changed to use ktime_get_real_ts64() instead.

Because of hardware limitations, there is still an overflow
in year 2106, which we cannot really avoid, but this documents
the overflow.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 40c9b0796d46523fffb93e46ed8c691456146743)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: assume MSI-X interrupts during initialization

In igb_sw_init() the sequence of calls was changed from
igb_init_queue_configuration()
igb_init_interrupt_scheme()
igb_probe_vfs()
to
igb_probe_vfs()
igb_init_queue_configuration()
igb_init_interrupt_scheme()

This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not
get enabled properly and we run into a NULL pointer if the max_vfs
module parameter is specified (adapter->vf_data does not get allocated,
crash on accessing the structure).

[    7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[    7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb]
[    7.419370] PGD 0
[    7.419373] Oops: 0002 [#1] SMP
[    7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
[    7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
[    7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
[...]
[    7.419431] Call Trace:
[    7.419442]  [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb]
[    7.419447]  [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0

Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
igb_probe_vfs(). The real interrupt capabilities will be checked during
igb_init_interrupt_scheme() so this is safe to do.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit cbfe360a1541a32e9e28f8f8ac925d2b7979d767)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igbvf: Enable TSO for stacked VLAN

Setting ndo_features_check to passthru_features_check allows the driver
to skip the check for multiple tagged TSO packets and enables stacked
VLAN TSO.
Tested with I350.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 213246d3fade6772e07138597bca0bdf9fbe754d)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: make sure SR-IOV init uses the right number of queues

Recent changes to igb_probe_vfs() could lead to the PF holding onto all
of the queues. Reorder igb_probe_vfs() to be before
gb_init_queue_configuration() and add some more error checking.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ceee3450b3a85db05a107d54fbea031c77d30401)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igbvf: clear buffer_info->dma after dma_unmap_single()

The driver doesn't clear buffer_info->dma after calling
dma_unmap_single() in all cases. This has been discovered by changing
the mtu twice, which caused the following backtrace.

[   68.569280] WARNING: CPU: 2 PID: 1860 at drivers/iommu/intel-iommu.c:3517 intel_unmap+0x20c/0x220()
[   68.579392] Driver unmaps unmatched page at PFN fffc2a40
[   68.585322] Modules linked in: igbvf ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat kvm_intel kvm igb megs
[   68.599163] CPU: 2 PID: 1860 Comm: ifconfig Not tainted 4.2.0-rc4+ #147
[   68.606543] Hardware name: IBM  -[546025Z]-/00Y7630, BIOS -[VVE134TUS-1.51]- 10/17/2013
[   68.615473]  0000000000000dbd ffff88046441bb08 ffffffff81a5ad0b ffffffff81e2f9ea
[   68.623775]  ffff88046441bb58 ffff88046441bb48 ffffffff81056b55 ffff88047fc583c0
[   68.632075]  0000000000000000 ffff880469a8e600 00000000fffc2a40 ffff880465b32098
[   68.640375] Call Trace:
[   68.643109]  [<ffffffff81a5ad0b>] dump_stack+0x48/0x5d
[   68.648844]  [<ffffffff81056b55>] warn_slowpath_common+0x95/0xe0
[   68.655549]  [<ffffffff81056c56>] warn_slowpath_fmt+0x46/0x70
[   68.661960]  [<ffffffff8158a614>] ? find_iova+0x54/0x90
[   68.667791]  [<ffffffff815988dc>] intel_unmap+0x20c/0x220
[   68.673815]  [<ffffffff8159891e>] intel_unmap_page+0xe/0x10
[   68.680038]  [<ffffffffa0067536>] igbvf_clean_rx_ring+0x96/0x370 [igbvf]
[   68.687516]  [<ffffffffa0067915>] igbvf_down+0x105/0x110 [igbvf]
[   68.694219]  [<ffffffffa0067beb>] igbvf_change_mtu+0x16b/0x180 [igbvf]
[...]

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit fae5ecaee3e6265c2aec29ac238ccc29c6f11fc3)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Fix a memory leak in igb_probe

In error handling code of igb_probe, the memory adapter->shadow_vfta
allocated by kcalloc in igb_sw_init is not freed. So when register_netdev
or igb_init_i2c is failed, a memory leak will occur.
This patch adds kfree to fix it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 42ad1a03b4caf4d95b980bce17c46242e6728ddc)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Fix a deadlock in igb_sriov_reinit

When igb_init_interrupt_scheme in igb_sriov_reinit is failed, the lock
acquired by rtnl_lock() is not released, which causes a deadlock.
This patch adds rtnl_unlock() in error handling to fix it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3eb14ea8d9584e96680729c2c7a9129bafa13a39)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: Teardown SR-IOV before unregister_netdev()

When the .remove() callback for a PF is called, SR-IOV support for the
device is disabled, which requires unbinding and removing the VFs.
The VFs may be in-use either by the host kernel or userspace, such as
assigned to a VM through vfio-pci.  In this latter case, the VFs may
be removed either by shutting down the VM or hot-unplugging the
devices from the VM.  Unfortunately in the case of a Windows 2012 R2
guest, hot-unplug is broken due to the ordering of the PF driver
teardown.  Disabling SR-IOV prior to unregister_netdev() avoids this
issue.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c23d92b80e0b44d4c17085f0413e7574a7583615)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: add support for 1512 PHY

This patch adds support for Marvell PHY 1512 (required for I354).

Submitted by: Maciej Szwed <maciej.szwed@intel.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 51045ecff09e33dcf4027f4aa6e6a05a840899d3)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

igb: implement high frequency periodic output signals

In addition to interrupt driven target time output events, the i210
also has two programmable clock outputs. These clocks support periods
between 16 nanoseconds and 140 milliseconds. This patch implements
the periodic output function using the clock outputs when possible,
falling back to the target time for longer periods.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 30c72916d71b7970b16dca2bb1234aef2d37b695)

Orabug: 26325580

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

Merge remote-tracking branch 'linux-uek/topic/uek-4.1/dtrace' into uek/uek-4.1-next

e1000e: use disable_hardirq() also for MSIX vectors in e1000_netpoll()

Replace disable_irq() which waits for threaded irq handlers with
disable_hardirq() which waits only for hardirq part.

Fixes: 311191297125 ("e1000: use disable_hardirq() for e1000_netpoll()")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit fd8e597ba4afb769a8fb642555a6e0c5349a6ae8)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Don't return uninitialized stats

Some statistics passed to ethtool are garbage because e1000e_get_stats64()
doesn't write them, for example: tx_heartbeat_errors. This leaks kernel
memory to userspace and confuses users.

Do like ixgbe and use dev_get_stats() which first zeroes out
rtnl_link_stats64.

Fixes: 5944701df90d ("net: remove useless memset's in drivers get_stats64")
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 24ad2a9209a0bf1ec37fac25a011c98551865abb)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: fix race condition around skb_tstamp_tx()

The e1000e driver and related hardware has a limitation on Tx PTP
packets which requires we limit to timestamping a single packet at once.
We do this by verifying that we never request a new Tx timestamp while
we still have a tx_hwtstamp_skb pointer.

Unfortunately the driver suffers from a race condition around this. The
tx_hwtstamp_skb pointer is not set to NULL until after skb_tstamp_tx()
is called. This function notifies the stack and applications of a new
timestamp. Even a well behaved application that only sends a new request
when the first one is finished might be woken up and possibly send
a packet before we can free the timestamp in the driver again. The
result is that we needlessly ignore some Tx timestamp requests in this
corner case.

Fix this by assigning the tx_hwtstamp_skb pointer prior to calling
skb_tstamp_tx() and use a temporary pointer to hold the timestamped skb
until that function finishes. This ensures that the application is not
woken up until the driver is ready to begin timestamping a new packet.

This ensures that well behaved applications do not accidentally race
with condition to skip Tx timestamps. Obviously an application which
sends multiple Tx timestamp requests at once will still only timestamp
one packet at a time. Unfortunately there is nothing we can do about
this.

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 5012863b7347866764c4a4e58b62fb05346b0d06)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Add Support for 38.4MHZ frequency

Add support for 38.4MHz frequency is required for PTP
on CannonLake. SYSTIM frequency adjustment attributes for TIMINCA are
get/set dependent on the hardware clock frequency for a different
types of adapters. 38.4MHz frequency supported by CannonLake
and active once time synchronisation mechanism was enabled
Changed abbreviation from Hz to HZ to be compliant checkpatch code style

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 68fe1d5da548aab2b6b1c28a9137248d6ccfcc43)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Add Support for CannonLake

The propagation of CannonLake mac type to driver functionality

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit c8744f44aeaee1caf5d6595e9351702253260088)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Initial Support for CannonLake

i219 (6) and i219 (7) are the next LOM generations that will be
available on the nextIntel Client platform (CannonLake)
This patch provides the initial support for these devices

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 3a3173b9c37aa1f07f8a71021114ee29a5712acb)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: fix PTP on e1000_pch_lpt variants

I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used
as a PTP slave experiences random ~10 hour clock jumps, which are resolved
if the same workaround for the 82574 and 82583 is employed, so set the
appropriate flag2 in e1000_pch_lpt_info too.

Reported-by: Rupesh Patel <rupatel@redhat.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 10ed1e0bd144b5c3a92ca0c8fd06eb80b80649e5)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: fix timing for 82579 Gigabit Ethernet controller

After an upgrade to Linux kernel v4.x the hardware timestamps of the
82579 Gigabit Ethernet Controller are different than expected.
The values that are being read are almost four times as big as before
the kernel upgrade.

The difference is that after the upgrade the driver sets the clock
frequency to 25MHz, where before the upgrade it was set to 96MHz. Intel
confirmed that the correct frequency for this network adapter is 96MHz.

Signed-off-by: Bernd Faust <berndfaust@gmail.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 5313eeccd2d7f486be4e5c7560e3e2be239ec8f7)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: Omit private ndo_get_stats function

e1000_get_stats() just returns dev->stats so we can leave it
out altogether and let dev_get_stats() do the job.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 14cf2ddf089c897ed660f5b4fbfd00872f0fc619)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

Revert "e1000e: driver trying to free already-free irq"

This reverts commit 7e54d9d063fa239c95c21548c5267f0ef419ff56.

After additional regression testing, several users are experiencing
kernel panics during shutdown on e1000e devices. Reverting this
change resolves the issue.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26243014
(cherry picked from commit 9f47a48e6eb97d793db85373e3ef4c55d876d334)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: driver trying to free already-free irq

During systemd reboot sequence network driver interface is shutdown
by e1000_close. The PCI driver interface is shut by e1000_shutdown.
The e1000_shutdown checks for netif_running status, if still up it
brings down driver. But it disables msi outside of this if statement,
regardless of netif status. All this is OK when e1000_close happens
after shutdown. However, by default, everything in systemd is done
in parallel. This creates a conditions where e1000_shutdown is called
after e1000_close, therefore hitting BUG_ON assert in free_msi_irqs.

CC: xe-kernel@external.cisco.com
Signed-off-by: khalidm <khalidm@cisco.com>
Signed-off-by: David Singleton <davsingl@cisco.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 7e54d9d063fa239c95c21548c5267f0ef419ff56)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: use disable_hardirq() for e1000_netpoll()

In commit 02cea3958664 ("genirq: Provide disable_hardirq()")
Peter introduced disable_hardirq() for netpoll, but it is forgotten
to use it for e1000.

This patch changes disable_irq() to disable_hardirq() for e1000.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26243014
(cherry picked from commit 311191297125156319be8f86d546ea1c569f7e95)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: fix PTP on e1000_pch_lpt variants

I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used
as a PTP slave experiences random ~10 hour clock jumps, which are resolved
if the same workaround for the 82574 and 82583 is employed, so set the
appropriate flag2 in e1000_pch_lpt_info too.

Reported-by: Rupesh Patel <rupatel@redhat.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 8037dd60f45264c3fbbea4cc0cea5f2f0a774b5e)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: factor out systim sanitization

This is prepatory work for an expanding list of adapter families that have
occasional ~10 hour clock jumps when being used for PTP. Factor out the
sanitization function and convert to using a feature (bug) flag, per
suggestion from Jesse Brandeburg.

Littering functional code with device-specific checks is much messier than
simply checking a flag, and having device-specific init set flags as needed.
There are probably a number of other cases in the e1000e code that
could/should be converted similarly.

Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 0be5b96cd8400aeb4bf3f8c5e7f5efaa38ae5055)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: prevent division by zero if TIMINCA is zero

Users report that under VMWare, er32(TIMINCA) returns zero.
This causes division by zero at init time as follows:

==>       incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
           for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
                   /* latch SYSTIMH on read of SYSTIML */
                   systim_next = (cycle_t)er32(SYSTIML);
                   systim_next |= (cycle_t)er32(SYSTIMH) << 32;

                   time_delta = systim_next - systim;
                   temp = time_delta;
====>             rem = do_div(temp, incvalue);

This change makes kernel survive this, and users report that
NIC does work after this change.

Since on real hardware incvalue is never zero, this should not affect
real hardware use case.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 3d05b15b03daa4e8c350a97d0d83d2c2abc8b8ef)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: keep Rx/Tx HW_VLAN_CTAG in sync

The bit in the e1000 driver that mentions explicitly that the hardware
has no support for separate RX/TX VLAN accel toggling rings true for
e1000e as well, and thus both NETIF_F_HW_VLAN_CTAG_RX and
NETIF_F_HW_VLAN_CTAG_TX need to be kept in sync.

Revert a portion of commit 889ad456660461 ("e1000e: keep VLAN interfaces
functional after rxvlan off") since keeping the bits in sync resolves
the original issue.

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 838086414b3cda5c592591f2b82256996306dab6)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: keep VLAN interfaces functional after rxvlan off

I've got a bug report about an e1000e interface, where a VLAN interface is
set up on top of it:

$ ip link add link ens1f0 name ens1f0.99 type vlan id 99
$ ip link set ens1f0 up
$ ip link set ens1f0.99 up
$ ip addr add 192.168.99.92 dev ens1f0.99

At this point, I can ping another host on vlan 99, ip 192.168.99.91.
However, if I do the following:

$ ethtool -K ens1f0 rxvlan off

Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on
again. I'm not sure if this is actually intended behavior, or if there's a
lack of software VLAN stripping fallback, or what, but things continue to
work if I simply don't call e1000e_vlan_strip_disable() if there are
active VLANs (plagiarizing a function from the e1000 driver here) on the
interface.

Also slipped a related-ish fix to the kerneldoc text for
e1000e_vlan_strip_disable here...

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26243014
(cherry picked from commit 889ad4566604610804df984e1a3dd5e2c66256e5)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: don't modify SYSTIM registers during SIOCSHWTSTAMP ioctl

The e1000e_config_hwtstamp function was incorrectly resetting the SYSTIM
registers every time the ioctl was being run. If you happened to be
running ptp4l and lost the PTP connect (removing cable, or blocking the
UDP traffic for example), then ptp4l will eventually perform a restart
which involves re-requesting timestamp settings. In e1000e this has the
unfortunate and incorrect result of resetting SYSTIME to the kernel
time. Since kernel time is usually in UTC, and PTP time is in TAI, this
results in the leap second being re-applied.

Fix this by extracting the SYSTIME reset out into its own function,
e1000e_ptp_reset, which we call during reset to restore the hardware
registers. This function will (a) restart the timecounter based on the
new system time, (b) restore the previous PPB setting, and (c) restore
the previous hwtstamp settings.

In order to perform (b), I had to modify the adjfreq ptp function
pointer to store the old delta each time it is called. This also has the
side effect of restoring the correct base timinca register correctly.
The driver does not need to explicitly zero the ptp_delta variable since
the entire adapter structure comes zero-initialized.

Reported-by: Brian Walsh <brian@walsh.ws>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Brian Walsh <brian@walsh.ws>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit aa524b66c5efd1d3220b74168d803e8b2ee1d212)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: mark shifted values as unsigned

The E1000_ICH_NVM_SIG_MASK value is shifted, out to the 31st bit, which
is the signed bit for signed constants. Mark these values as unsigned to
prevent compiler warnings and issues on platforms which a different
signed bit implementation.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 942c711206d1e0cd3dffc591829cbcbb9bcc0b1b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: use BIT() macro for bit defines

This prevents signed bitshift issues when the shift would overwrite the
signed bit, and prevents making this mistake in the future when copying
and modifying code.

Use GENMASK or the unsigned postfix for cases which aren't suitable for
BIT() macro.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 18dd23920703891c39c7965873f8ae369bd3a237)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: e1000e_cyclecounter_read(): do overflow check only if needed

SYSTIMH:SYSTIML registers are incremented by 24-bit value TIMINCA[23..0]

er32(SYSTIML) are probably moderately expensive (they are pci bus reads).
Can we avoid one of them? Yes, we can.

If the SYSTIML value we see is smaller than 0xff000000, the overflow
into SYSTIMH would require at least two increments.

We do two reads, er32(SYSTIML) and er32(SYSTIMH), in this order.

Even if one increment happens between them, the overflow into SYSTIMH
is impossible, and we can avoid doing another er32(SYSTIML) read
and overflow check.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit ab507c9a54ce3580e6a3829c9f4c24a13c32cbac)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: e1000e_cyclecounter_read(): fix er32(SYSTIML) overflow check

If two consecutive reads of the counter are the same, it is also
not an overflow. "systimel_1 < systimel_2" should be
"systimel_1 <= systimel_2".

Before the patch, we could perform an *erroneous* correction:

Let's say that systimel_1 == systimel_2 == 0xffffffff.
"systimel_1 < systimel_2" is false, we think it's an overflow,
we read "systimeh = er32(SYSTIMH)" which meanwhile had incremented,
and use "(systimeh << 32) + systimel_2" value which is 2^32 too large.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: intel-wired-lan@lists.osuosl.org
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit a07fd74d5ea9c45a5c6e41f7cb4b997cf40d50f3)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64

"incvalue" variable holds a result of "er32(TIMINCA) &
E1000_TIMINCA_INCVALUE_MASK" and used in "do_div(temp, incvalue)"
as a divisor.

Thus, "u64 incvalue" declaration is probably a mistake.
Even though it seems to be a harmless one, let's fix it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit fb5277f2c2e4db4a29740ff071072a688892d2df)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Cleanup consistency in ret_val variable usage

Fixed the file to use a consistent ret_val for return value checking.

Signed-off-by: Brian Walsh <brian@walsh.ws>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 847042a6a51e6dbb789c259750609b78aa3f27a3)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: fix ethtool autoneg off for non-copper

This patch fixes the issues for disabling auto-negotiation and forcing
speed and duplex settings for the non-copper media.

For non-copper media, e1000_get_settings should return ETH_TP_MDI_INVALID for
eth_tp_mdix_ctrl instead of ETH_TP_MDI_AUTO so subsequent e1000_set_settings
call would not fail with -EOPNOTSUPP.

e1000_set_spd_dplx should not automatically turn autoneg back on for forced
1000 Mbps full duplex settings for non-copper media.

Cc: xe-kernel@external.cisco.com
Cc: Daniel Walker <dwalker@fifo99.com>
Signed-off-by: Steve Shih <sshih@cisco.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit e11f303e3d0731a7379252192e7d02a1ae319238)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: call ndo_stop() instead of dev_close() when running offline selftest

Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 1f2f83f838489d386ecad9d0c77c3d6ec983102c)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: call ndo_stop() instead of dev_close() when running offline selftest

Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit d5ea45da1f04a3443710306e16db3b3aeae92918)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: Double Tx descriptors needed check for 82544

The 82544 has code that adds one additional descriptor per data buffer.
However we weren't taking that into account when determining the descriptors
needed for the next transmit at the end of the xmit_frame path.

This change takes that into account by doubling the number of descriptors
needed for the 82544 so that we can avoid a potential issue where we could
hang the Tx ring by loading frames with xmit_more enabled and then stopping
the ring without writing the tail.

In addition it adds a few more descriptors to account for some additional
workarounds that have been added over time.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit a4605fef7132f19afded76ee025c957558271a7d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: Do not overestimate descriptor counts in Tx pre-check

The current code path is capable of grossly overestimating the number of
descriptors needed to transmit a new frame.  This specifically occurs if
the skb contains a number of 4K pages.  The issue is that the logic for
determining the descriptors needed is ((S) >> (X)) + 1.  When X is 12 it
means that we were indicating that we required 2 descriptors for each 4K
page when we only needed one.

This change corrects this by instead adding (1 << (X)) - 1 to the S value
instead of adding 1 after the fact.  This way we get an accurate descriptor
needed count as we are essentially doing a DIV_ROUNDUP().

Reported-by: Ivan Suzdal <isuzdal@mirantis.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 847a1d6796c767f8b697ead60997b847a84b897b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Initial support for KabeLake

i219 (4) and i219 (5) are the next LOM generations that will be
available on the next Intel platform (KabeLake).
This patch provides the initial support for the devices.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 9cd34b3a1cfd47692cbef8cb0761475021883e18)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Clear ULP configuration register on ULP exit

There have been bugs caused by HW ULP configuration settings not being
properly cleared after cable connect in V-Pro capable systems.
This caused HW to get out of sync occasionally.
The fix ensures that ULP settings are cleared in HW after
LAN cable re-connect.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit c5c6d07761a9ff64f0ffff2ca410a578fb7c4579)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Set HW FIFO minimum pointer gap for non-gig speeds

Based on feedback from HW team, the configured value of the internal PHY
HW FIFO pointer gap was incorrect for non-gig speeds.
This patch provides the correct configuration.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit c26f40daf4e32f970b8337a88b65a8d00332ae6f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Increase PHY PLL clock gate timing

Several packet loss issues were reported for which the root cause for
them was an incorrect configuration of internal HW PHY clock gating
mechanism by SW.
This patch provides the correct mechanism.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 74f31299a41e729226d60426087592b6790f22b7)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Increase ULP timer

Due to system level changes introduced in Skylake, ULP exit takes
significantly longer to occur. Therefore, driver must wait longer for.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 6721e9d568741ced04b1fe6eed42f2ddf585eac4)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Fix msi-x interrupt automask

Since the introduction of 82574 support in e1000e, the driver has worked
on the assumption that msi-x interrupt generation is automatically
disabled after each irq. As it turns out, this is not the case.
Currently, rx interrupts can fire multiple times before and during napi
processing. This can be a problem for users because frames that arrive
in a certain window (after adapter->clean_rx() but before
napi_complete_done() has cleared NAPI_STATE_SCHED) generate an interrupt
which does not lead to napi_schedule(). These frames sit in the rx queue
until another frame arrives (a tcp retransmit for example).

While the EIAC and CTRL_EXT registers are properly configured for irq
automask, the modification of IAM in e1000_configure_msix() is what
prevents automask from working as intended.

This patch removes that erroneous write and fixes interrupt rearming for
tx interrupts. It also clears IAME from CTRL_EXT. This is not strictly
necessary for operation of the driver but it is to avoid disruption from
potential programs that access the registers directly, like `ethregs -c`.

Reported-by: Frank Steiner <steiner-reg@bio.ifi.lmu.de>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 0a8047ac68e50e4ccbadcfc6b6b070805b976885)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Do not write lsc to ics in msi-x mode

In msi-x mode, there is no handler for the lsc interrupt so there is no
point in writing that to ics now that we always assume Other interrupts
are caused by lsc.

Reviewed-by: Jasna Hodzic <jhodzic@ucdavis.edu>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit a61cfe4ffad7864a07e0c74969ca7ceb77ab2f1f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Do not read ICR in Other interrupt

Removes the ICR read in the other interrupt handler, uses EIAC to
autoclear the Other bit from ICR and IMS. This allows us to avoid
interference with Rx and Tx interrupts in the Other interrupt handler.

The information read from ICR is not needed. IMS is configured such that
the only interrupt cause that can trigger the Other interrupt is Link
Status Change.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 16ecba59bc333d6282ee057fb02339f77a880beb)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Remove unreachable code

msi-x interrupts are not shared so there's no need to check if the
interrupt was really from this adapter.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 4d432f67ff004dc387ba307d418d0eae4fa9dc13)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Switch e1000e_up to void, drop code checking for error result

The function e1000e_up always returns 0. As such we can convert it to a
void and just ignore the results. This allows us to drop some code in a
couple spots as we no longer need to worry about non-zero return values.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 386164d9b36b1f6f1396978110de85c7e186491d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: initial support for i219-LM (3)

i219-LM (3) is a LOM that will be available on systems with the
Lewisburg Platform Controller Hub (PCH) chipset from Intel.
This patch provides the initial support for the device.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit f3ed935de059b83394c3ecf2c64c93b57c8915fe)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Increase timeout of polling bit RSPCIPHY

Due to timing changes to the ME firmware in Skylake, this timer
needs to be increased to 300ms.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit d17c7868b2f8e329dcee4ecd2f5d16cfc9b26ac8)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: fix division by zero on jumbo MTUs

This patch fixes possible division by zero in receive
interrupt handler when working without adaptive interrupt
moderation.

The adaptive interrupt moderation mechanism is typically
disabled on jumbo MTUs.

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Leonid Bloch <leonid@daynix.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit b77ac46bbae862dcb3f51296825c940404c69b0f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: Elementary checkpatch warnings and checks removed

Signed-off-by: Janusz Wolak <januszvdm@gmail.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 13a87c124ec8a717b557cd8bc08693af021c8812)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: get rid of duplicate exit path

By using goto statement, we can achieve sharing the same exit path so
that code duplication could be minimized.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit c619581a7903fadccb900d1d497d2366fdf7da9d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: fix kernel-doc argument being missing

Due to historical reason, 'phy_data' has never been included in the
kernel doc. Fix it so that the requirement could be fulfilled.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit f03fed668a9193036db0258229af87e9ba46ed5e)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: clean up the local variable

The local variable 'ret' doesn't serve much purpose so we might as well
clean it up.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 5a5e889c80cef7513a40143ee1e474acccdee13b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: fix a typo in the comment

Use 'That' to replace 'The' so that the comment would make sense.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit b6fad9f9fc5d4deb1cb50173bd729de26570fbbe)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: clean up the checking logic

The checking logic needed some clean-up work, so we rewrite it by
checking for break first. With that change in place, we can even move
the second check for goto statement outside of the loop.

As this is merely a cleanup, no functional change is involved. The
questionable 'tmp != 0xFF' is intentionally left alone.

Mark Rustad and Alexander Duyck contributed to this patch.

CC: Mark Rustad <mark.d.rustad@intel.com>
CC: Alex Duyck <aduyck@mirantis.com>
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 4e01f3a802b5910b25814e1d0fd05907edffed6f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: Remove checkpatch coding style errors

Signed-off-by: Janusz Wolak <januszvdm@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit a48954c88b5a9ec0a8323fff1d47a6affba31d61)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: fix data race between tx_ring->next_to_clean

e1000_clean_tx_irq cleans buffers and sets tx_ring->next_to_clean,
then e1000_xmit_frame reuses the cleaned buffers. But there are no
memory barriers when buffers gets recycled, so the recycled buffers
can be corrupted.

Use smp_store_release to update tx_ring->next_to_clean and
smp_load_acquire to read tx_ring->next_to_clean to properly
hand off buffers from e1000_clean_tx_irq to e1000_xmit_frame.

The data race was found with KernelThreadSanitizer (KTSAN).

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 9eab46b7cb8d0b0dcf014bf7b25e0e72b9e4d929)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: make eeprom read/write scheduler friendly

Code was responsible for ~150ms scheduler latencies.

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Spencer Baugh <sbaugh@catern.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit e09b89069ff9e3877bdbca8e64da05cde88bc3a2)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Enable TSO for stacked VLAN

Setting ndo_features_check to passthru_features_check allows the driver
to skip the check for multiple tagged TSO packets and enables stacked
VLAN TSO.
Tested with I217-LM.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit f2701b185e05d0897a47f6a14da40a068b0644ff)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: remove dead e1000_init_eeprom_params calls

The device probe method e1000_probe calls e1000_init_eeprom_params
itself so there's no reason to call it again from e1000_do_write_eeprom
or e1000_do_read_eeprom.

The sentence above assumes that e1000_init_eeprom_params is effective.
e1000_init_eeprom_params depends mostly on hw->mac_type and e1000_probe
bails out early if it can't set mac_type (see e1000_init_hw_struct, then
e1000_set_mac_type), qed.

Btw, if effective, the removed paths would had been deadlock prone when
e1000_eeprom_spi was set:
-> e1000_write_eeprom (takes e1000_eeprom_lock)
   -> e1000_do_write_eeprom
      -> e1000_init_eeprom_params
         -> e1000_read_eeprom (takes e1000_eeprom_lock)

(same narrative with e1000_read_eeprom -> e1000_do_read_eeprom etc.)

As a final note, the candidate deadlock above can't happen in e1000_probe
due to the way eeprom->word_size is set / tested.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 307723255a05242ab252dd7047d4970ab60c7dfd)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Modify Tx/Rx configurations to avoid null pointer dereferences in e1000_open

When e1000e_setup_rx_resources is failed in e1000_open,
e1000e_free_tx_resources in "err_setup_rx" segment is executed.
"writel(0, tx_ring->head)" statement in e1000_clean_tx_ring
in e1000e_free_tx_resources will cause a null poonter dereference(crash),
because "tx_ring->head" is only assigned in e1000_configure_tx
in e1000_configure, but it is after e1000e_setup_rx_resources.

This patch moves head/tail register writing to e1000_configure_tx/rx,
which can fix this problem. It is inspired by igb_configure_tx_ring
in the igb driver.

Specially, thank Alexander Duyck for his valuable suggestion.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26243014
(cherry picked from commit 0845d45e900cad5f7f855a7a6a21c33477800b1f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

blkback/blktap: don't leak stack data via response ring

Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other backends do.
Build on the fact that all response structure flavors are actually
identical (the old code did make this assumption too).

This is XSA-216.

Reported-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26315576
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
drivers/block/xen-blkback/blkback.c (code base)

DTrace: IP provider use-after-free for drop-out probe points

KASan warnings showed a possible use-after-free for skbs in error handling
codepath after netfilter hooks have run. Hooks may free the skb, so we
should not derefence it at drop-out probe points after NF_HOOK().

Orabug: 26267376

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>

ctf: fix a variety of memory leaks and use-after-free bugs

These fall into two classes, but are sufficiently intertwined that it's
easier to commit them in one go.

The first is outright leaks, which exceed 1GiB on a normal run, varying
from the tiny (failure to free getline()'s line), through the disastrous
(failure to free items filtered from a list by list_filter(), leading to
the leaking of nearly the whole of the named_structs state, which is
huge).  We also leak the structs_seen hash due to recreating it on
alias_fixup file switch without bothering to destroy it first,

The second is lifetime problems, centred around the stuff allocated and
freed in the detect_duplicates_tu_{init,/done}() functions.  These were
comparing the module name against a saved copy to see if a new vars_seen
needed to be allocated, or whether this was just a flip of TU without a
change of object file and we could get away with just flushing its
contents out -- but unfortunately the state->module_name is assigned
directly from its parameter, and *that* has a lifetime lasting only
within process_file() -- and a deduplication run, of course, involves
iterating over a great many object files.  So everything works as long as
we're flipping from TU to TU within a single object file, and then we
switch object files and are suddenly strcmp()ing with freed memory.
Discard this faulty optimization entirely, and just flush the vars_seen
hash in tu_done() and both create and destroy it in scan_duplicates(),
right where we create and destroy related stuff too.

Something similar happens with the state->dwfl_file_name due to its
derivation from id_file->file_name: if no duplicates are found, we
list_filter() that id_file straight out of the structs_seen list and
free it, and then on the next call state->dwfl_file_name points to freed
memory.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Orabug: 26283357

dtrace: fix compilation with O=

Separate objdir compilation was broken because we were looking for
autoconf.h in the wrong place.

Fix it by looking in the same place as everything else in scripts/ does.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 26167475

IB/core: Remove stray semicolon in cma_init

Stray semicolon - remove it!

Fixes: f913294 ("IB/core: Fix a potential array overrun in CMA and SA
agent");

Orabug: 26270931

Reported-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-By: Wei Lin Guay<wei.lin.guay@oracle.com>
Reviewed-By: Gerd Rausch <gerd.rausch@oracle.com>

percpu_ref: allow operation mode switching operations to be called concurrently

percpu_ref initially didn't have explicit mode switching operations.
It started out in percpu mode and switched to atomic mode on kill and
then released.  Ensuring that kill operation is initiated only after
init completes was naturally the caller's responsibility.

percpu_ref_reinit() was introduced later but it didn't shift the
synchronization responsibility.  Reinit can't be performed until kill
is confirmed, so there was nothing to worry about
synchronization-wise.  Also, as both reinit and kill manipulate the
base reference, invocations of the same function couldn't be allowed
to race each other.

The latest additions of percpu_ref_switch_to_atomic/percpu() changed
the situation.  These two functions can be called any time as long as
the percpu_ref is between init and exit and thus there are valid valid
usage scenarios where these new functions race with each other or
against reinit/kill.  Mostly from inertia, f47ad4578461 ("percpu_ref:
decouple switching to percpu mode and reinit") still left
synchronization among percpu mode switching operations to its users.

That the new switch functions can be freely mixed with kill/reinit but
the operations themselves should be synchronized is too subtle a
requirement and led to a very subtle race condition in blk-mq freezing
path.

This patch fixes the situation by introducing percpu_ref_switch_lock
to protect mode switching operations.  This ensures that percpu-ref
users don't have to worry about mode changing operations racing
against each other, e.g. switch_to_percpu against kill, as long as the
sequence of operations is valid.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Link: http://lkml.kernel.org/g/1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
Fixes: f47ad4578461 ("percpu_ref: decouple switching to percpu mode and reinit")
(cherry picked from commit 33e465ce7cb30b71c113a26f36d293b545a28e12)

Orabug: 26254388

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

percpu_ref: restructure operation mode switching

Restructure atomic/percpu mode switching.

* The users of __percpu_ref_switch_to_atomic/percpu() now call a new
  function __percpu_ref_switch_mode() which calls either of the
  original switching functions depending on the current state of
  ref->force_atomic and the __PERCPU_REF_DEAD flag.  The callers no
  longer check whether switching is necessary but always invoke
  __percpu_ref_switch_mode().

* !ref->confirm_switch waiting is collected into
  __percpu_ref_switch_mode().

This patch doesn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 3f49bdd95855a33eea749304d2e10530a869218b)

Orabug: 26254388

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

percpu_ref: unify staggered atomic switching wait behavior

When an atomic or percpu switching starts before the previous atomic
switching finishes, the taken behaviors are

* If the new atomic switching has confirmation callback, it waits
  for the previous atomic switching to complete.

* If the new percpu switching is the first percpu switching following
  the previous atomic switching, it waits the previous atomic
  switching to complete.

No percpu_ref user depends on these subtleties.  The only meaningful
part is that, if the caller ensures that atomic switching isn't in
progress, mode switching operations can be issued from any context.

This patch pulls the wait logic to the top of both switching functions
so that they always wait for the previous atomic switching to
complete.  This makes the behavior simpler and consistent for both
directions and will help allowing concurrent invocations of mode
switching functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 18808354b79622ed11857e41f9044ba17aec5b01)

Orabug: 26254388

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic()

Reorganize __percpu_ref_switch_to_atomic() so that it looks
structurally similar to __percpu_ref_switch_to_percpu() and relocate
percpu_ref_switch_to_atomic so that the two internal functions are
co-located.

This patch doesn't introduce any functional differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit b2302c7fdc654d249c546aac6228b8e10969bc1e)

Orabug: 26254388

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation

At the beginning, percpu_ref guaranteed a RCU grace period between a
call to percpu_ref_kill_and_confirm() and the invocation of the
confirmation callback. This guarantee exposed internal implementation
details and got rescinded while switching over to sched RCU; however,
__percpu_ref_switch_to_atomic() still inserts a full sched RCU grace
period even when it can simply wait for the previous attempt.

Remove the unnecessary grace period and perform the confirmation
synchronously for staggered atomic switching attempts. Update
comments accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit a2f5630cb737787c1bfd9aa894b1bf9f3f4554ea)

Orabug: 26254388

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

block: Fix mismerge in queue freeze logic

Commit 7466bf8e2078 ("blk-mq: fix freeze queue race") introduced a mutex
to protect the queue freeze/unfreeze logic.

The locking requirement was obsoleted by commit c03fa711de6a ("block:
use an atomic_t for mq_freeze_depth") but the mutex was left in place in
our backport.

During the c311ca8a3d93 merge of the pmem tree conflicts arose in
blk-mq. The mutex lock calls were removed but the mutex unlocks left in
place. This lead to NVMe controller reset failures in ED testing. Remove
the last remnants of commit 7466bf8e2078.

Orabug: 26254388

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

nvme: Remove timeout when deleting queue

Avoid waiting for ADMIN_TIMEOUT when deleting a queue since chances are
that the controller is wedged.

Orabug: 26277582

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

rds: tcp: Set linger when rejecting an incoming conn in rds_tcp_accept_one

Each time we get an incoming SYN to the RDS_TCP_PORT, the TCP
layer accepts the connection and then the rds_tcp_accept_one()
callback is invoked to process the incoming connection.

rds_tcp_accept_one() may reject the incoming syn for a number of
reasons, e.g., commit 1a0e100fb2c9 ("RDS: TCP: Force every connection
to be initiated by numerically smaller IP address"), or because
we are getting spammed by a malicious node that is triggering
a flood of connection attempts to RDS_TCP_PORT. If the incoming
syn is rejected, no data would have been sent on the TCP socket,
and we do not need to be in TIME_WAIT state, so we set linger on
the TCP socket before closing, thereby closing the socket efficiently
with a RST.

Orabug: 26289770

(Cherry-pick of upstream 10beea7d7408d0b1c9208757f445c5c710239e0e)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Imanti Mendez <imanti.mendez@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: various endian-ness fixes

Found when testing between sparc and x86 machines on different
subnets, so the address comparison patterns hit the corner cases and
brought out some bugs fixed by this patch.

Orabug: 26289770

(Cherry-pick of upstream commit 00354de5779db4aa9c019db787ef89bd1a6b149b)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Imanti Mendez <imanti.mendez@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: remove cp_outgoing

After commit 1a0e100fb2c9 ("RDS: TCP: Force every connection to be
initiated by numerically smaller IP address") we no longer need
the logic associated with cp_outgoing, so clean up usage of this
field.

Orabug: 26289770

(Cherry-pick of upstream commit 41500c3e2a19ffcf40a7158fce1774de08e26ba2)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Imanti Mendez <imanti.mendez@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races

Commit a93d01f5777e ("RDS: TCP: avoid bad page reference in
rds_tcp_listen_data_ready") added the function
rds_tcp_listen_sock_def_readable()  to handle the case when a
partially set-up acceptor socket drops into rds_tcp_listen_data_ready().
However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going
through a tear-down via rds_tcp_listen_stop(), the (*ready)() will be
null and we would hit a panic  of the form
  BUG: unable to handle kernel NULL pointer dereference at   (null)
  IP:           (null)
       :
  ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp]
  tcp_data_queue+0x39d/0x5b0
  tcp_rcv_established+0x2e5/0x660
  tcp_v4_do_rcv+0x122/0x220
  tcp_v4_rcv+0x8b7/0x980
        :
In the above case, it is not fatal to encounter a NULL value for
ready- we should just drop the packet and let the flush of the
acceptor thread finish gracefully.

In general, the tear-down sequence for listen() and accept() socket
that is ensured by this commit is:
     rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */
     In rds_tcp_listen_stop():
         serialize with, and prevent, further callbacks using lock_sock()
         flush rds_wq
         flush acceptor workq
         sock_release(listen socket)

Orabug: 26289770

(Cherry-pick of upstream commit b21dd4506b71bdb9c5a20e759255cd2513ea7ebe)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: Reorder initialization sequence in rds_tcp_init to avoid races

Order of initialization in rds_tcp_init needs to be done so
that resources are set up and destroyed in the correct synchronization
sequence with both the data path, as well as netns create/destroy
path. Specifically,

- we must call register_pernet_subsys and get the rds_tcp_netid
  before calling register_netdevice_notifier, otherwise we risk
  the sequence
    1. register_netdevice_notifier sets up netdev notifier callback
    2. rds_tcp_dev_event -> rds_tcp_kill_sock uses netid 0, and finds
       the wrong rtn, resulting in a panic with string that is of the form:

  BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
  IP: rds_tcp_kill_sock+0x3a/0x1d0 [rds_tcp]
             :

- the rds_tcp_incoming_slab kmem_cache must be initialized before the
  datapath starts up. The latter can happen any time after the
  pernet_subsys registration of rds_tcp_net_ops, whose -> init
  function sets up the listen socket. If the rds_tcp_incoming_slab has
  not been set up at that time, a panic of the form below may be
  encountered

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
  IP: kmem_cache_alloc+0x90/0x1c0
         :
  rds_tcp_data_recv+0x1e7/0x370 [rds_tcp]
  tcp_read_sock+0x96/0x1c0
  rds_tcp_recv_path+0x65/0x80 [rds_tcp]
         :

Orabug: 26289770

(Cherry-pick of upstream 16c09b1c7657522a321b04aa7f4300865b7cb292)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: Take explicit refcounts on struct net

It is incorrect for the rds_connection to piggyback on the
sock_net() refcount for the netns because this gives rise to
a chicken-and-egg problem during rds_conn_destroy. Instead explicitly
take a ref on the net, and hold the netns down till the connection
tear-down is complete.

Orabug: 26289770

(Cherry-pick of upstream 8edc3affc0770886c7bfb3436b0fdd09bce13167)

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

nvme: Quirks for PM1725 controllers

Samsung PM1725 controllers have a few quirks that need to be handled in
the driver:

  - The host interface registers go offline briefly while resetting
    the controller. Thus a delay is needed before checking whether the
    controller is ready.

Orabug: 26275976

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too

Commit 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter
readiness") introduced a quirk to adapters that cannot read the bit
NVME_CSTS_RDY right after register NVME_REG_CC is set; these adapters
need a delay or else the action of reading the bit NVME_CSTS_RDY could
somehow corrupt adapter's registers state and it never recovers.

When this quirk was added, we checked ctrl->tagset in order to avoid
quirking in probe time, supposing we would never require such delay
during probe. Well, it was too optimistic; we in fact need this quirk
at probe time in some cases, like after a kexec.

In some experiments, after abnormal shutdown of machine (aka power cord
unplug), we booted into our bootloader in Power, which is a Linux kernel,
and kexec'ed into another distro. If this kexec is too quick, we end up
reaching the probe of NVMe adapter in that distro when adapter is in
bad state (not fully initialized on our bootloader). What happens next
is that nvme_wait_ready() is unable to complete, except if the quirk is
enabled.

So, this patch removes the original ctrl->tagset verification in order
to enable the quirk even on probe time.

Fixes: 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter readiness")
Reported-by: Andrew Byrne <byrneadw@ie.ibm.com>
Reported-by: Jaime A. H. Gomez <jahgomez@mx1.ibm.com>
Reported-by: Zachary D. Myers <zdmyers@us.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: Jeffrey Lien <Jeff.Lien@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
(cherry picked from commit b5a10c5f7532b7473776da87e67f8301bbc32693)

Orabug: 26275976

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>