www.infradead.org Git - users/jedix/linux-maple.git/log

be2iscsi : Fix memory check before unmapping.

Orabug: 21862307

Check DMA memory before it is unmapped.

Signed-off-by: John Soni Jose <sony.john-n@emulex.com>
Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Dan Duval <dan.duval@oracle.com>

Merge tag 'v4.1.6-11.e1000e.3.2.6#bug21792108' of git://ca-git.us.oracle.com/linux-bmaly-public into topic/uek-4.1/drivers

Merge tag 'v4.1.6-12.igb.5.3.0#bug21792102' of git://ca-git.us.oracle.com/linux-bmaly-public into topic/uek-4.1/drivers

sxge/sxgevf: port to uek4

Orabug: 20509061

Signed-off-by: Joyce Yu <joyce.yu@sun.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

igb: bump version to igb-5.3.0

Orabug: 21792102

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6fb469023cd995d7be5ab3bf12b79387710382ff)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: use ARRAY_SIZE to replace calculating sizeof(a)/sizeof(a[0])

Orabug: 21792102

Use the ARRAY_SIZE macro rather than calculating sizeof(a)/sizeof(a[0]).
Also directly replace the code rather than using an unnecessary define.

Reported-by: Maninder Singh <maninder1.s@samsung.com>
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 9fa0452b645efdff439948a5cf448b8e497340e9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: report unsupported ethtool settings in set_coalesce

Orabug: 21792102

There are many settings possible using ethtool -C/--coalesce, but not
all of them are supported in igb. Report failure when an unsupported
option is set.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0c5bbeb8839172990e3b8aa82ae3c166e85a09bc)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: Fix i354 88E1112 PHY on RCC boards using AutoMediaDetect

Orabug: 21792102

e1000_check_for_link_media_swap() checks PHY page 0 for copper and PHY
page 1 for "other" (fiber) link. The switch back from page 1 to page 0
happened too soon, before e1000_check_for_link_82575() is executed, and
link on fiber (other) was never detected. Check for link while still on
the proper PHY page.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2ba6c0797c8b5a9f945345ef2b9193bd47e5f18e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: Pull timestamp from fragment before adding it to skb

Orabug: 21792102

This change makes it so that we pull the timestamp from the fragment before
we add it to the skb. By doing this we can avoid a possible issue in which
the fragment can possibly be less than IGB_RX_HDR_LEN due to the timestamp
being pulled after the copybreak check.

While making this change I realized we could also pull the rest of the
igb_pull_tail function into igb_add_rx_frag since in the case of igb,
unlike ixgbe, we are able to unmap the entire buffer before calling
add_rx_frag so merging the two allows for sharing of code between the two
merged functions.

Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f56e7bba22fad16c0d4fac996623ce1c13244f8f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: only report generic filters in get_ts_info

Orabug: 21792102

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 97aebc1b3cdfd445a0a051090f0dcc6018b6df2c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: bump version of igb to 5.2.18

Orabug: 21792102

Bump version of igb to igb-5.2.18

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 73cd63598dcbc95f51d5becf548e0643aa7a49fa)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: disable IPv6 extension header processing

Orabug: 21792102

Disable IPv6 extension header processing as per hardware errata.

Also fix copyright date.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8d0a88a959f0768d6b46436ea2517926fb682e53)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: Don't use NETDEV_FRAG_PAGE_MAX_SIZE in descriptor calculation

Orabug: 21792102

This change updates igb so that it will correctly perform the descriptor
count calculation. Previously it was taking NETDEV_FRAG_PAGE_MAX_SIZE
into account with isn't really correct since a different value is used to
determine the size of the pages used for TCP. That is actually determined
by SKB_FRAG_PAGE_ORDER.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2ee52ad4962b32797bac33fa29ec8159e64c4ee3)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

igb: simplify and clean up igb_enable_mas()

Orabug: 21792102

igb_enable_mas() should only be called for the 82575 and has no clear
return so changing it to void. Also simplify the odd conditional
expression.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8cfb879d1b118e190bf9aea1b50da62c0d8a4a77)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek:
  IB/rds_rdma: unloading of ofed stack causes page fault panic
  RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.
  RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net
  net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
  net: Modify sk_alloc to not reference count the netns of kernel sockets.
  net: Pass kern from net_proto_family.create to sk_alloc
  net: Add a struct net parameter to sock_create_kern

Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek:
DCA: fix over-warning in ioat3_dca_init

e1000e: Increase driver version number

Orabug: 21792108

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d2d7d4e4a60f1aeefb38d7a0bede3742ddb76a68)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Fix tight loop implementation of systime read algorithm

Orabug: 21792108

Change the algorithm. Read systimel twice and check for overflow.
If there was no overflow, use the first value.
If there was an overflow, read systimeh again and use the second
systimel value.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 37b12910dd11d9ab969f2c310dc9160b7f3e3405)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Fix incorrect ASPM locking

Orabug: 21792108

This patch fixes wrong locking usage.
In the context of slot reset, we should use lock.
And during resume, there is no need of lock.

Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2758f9edb7bd5a06a2ecee83cc2ebaf8822a0cb5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Cosmetic changes

Orabug: 21792108

1) Replace spaces with tab.
2) Move ich8lan related define to the proper context.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d582891594104adeea89307ddd31b31bcf2d95fa)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Fix EEE in Sx implementation

Orabug: 21792108

This patch implements the EEE in Sx code so that it only applies to parts
that support EEE in Sx (as opposed to all parts that support EEE).
It also uses the existing eee_advert and eee_lp_abiliity to set just the
bits (100/1000) that should be set.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f5ac7445ebdbfa8cd2d90ef2a58b8f4455bcb664)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Cleanup qos request in error handling of e1000_open

Orabug: 21792108

The driver lacks pm_qos_remove_request in error handling (err_req_irq) of
e1000_open, and qos request inserted by pm_qos_add_request is not removed.
This patch add pm_qos_remove_request in error handling to fix it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7faae96421870ed990b0a84797c6b2377e81d079)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 - k1 workaround for LPT is not required for SPT

Orabug: 21792108

In SPT hardware does not require this driver workaround.
Removed the conditional that caused K1 workaround execution on SPT.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 352f8ead753402d6c0496cb83b902128925459eb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 - Increase minimum FIFO read/write min gap

Orabug: 21792108

Due to clocking changes in the Skylake platform, there was i219
data corruption. To work around this, HW team reported the need
to increase the minimum gap between the PHY FIFO read and write pointers.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 93cbfc709047a5bc3f8d86e0b55079b5077c8e00)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 - increase IPG for speed 10/100 full duplex

Orabug: 21792108

In SPT/i219, there were CRC errors in speed 10/100 full duplex.
The solution given by the HW team is to increase the IPG from 8 to 0xC

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 69cfbc95bdbfa2bd9a82f27dc131b08c48542f19)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 - fix to enable both ULP and EEE in Sx state

Orabug: 21792108

In i219, there is a hardware bug that prevented ULP entry.
A side effect of the original software fix for this was that EEE in
Sx couldn't be enabled.
This patch implements a modified flow that allows both ULP and EEE in Sx.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6607c99e7034e7565a1559a24dd35083d6719788)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: synchronization of MAC-PHY interface only on non- ME systems

Orabug: 21792108

On power up, the MAC - PHY interface needs to be set to PCIe, even if
cable is disconnected. In ME systems, the ME handles this on exit from
Sx state. In non-ME, the driver handles it. Added a check for non-ME
system to the driver code that handles that.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit beee8072c39cee74d4ff6e3b4ca8942f9966ed2e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix locking issue with e1000e_disable_aspm

Orabug: 21792108

e1000e_disable_aspm called pci_disable_link_state_locked which requires
pci_bus_sem to be held, but is also called from places where this semaphore
was not previously acquired. This patch implements two flavors of
disable_aspm, one that acquires the lock, and the other (_locked) which
should be called when the semaphore is already acquired.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit beb0a1520bec17cfaf0c3c77bbdd56cbf942883a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Bump the version to 3.2.5

Orabug: 21792108

Bump the version to reflect the driver changes and bug fixes for i219.
Also update the copyright, while we are at it.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 529498cde04537211cc3aa8f920c371b91c0f7d8)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix unit hang during loopback test

Orabug: 21792108

System would hang during execution of "ethtool -t <NIC>" for the same
reason that required flushing the descriptor rings. This fix disables
MULR for the loopback test to avoid the hang state.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2ec7d2974c5665780e5f2b532833b76531f0a8d3)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix systim issues

Orabug: 21792108

Two issues involving systim were reported.
1. Clock is not running in the correct frequency
2. In some situations, systim values were not incremented linearly
This patch fixes the hardware clock configuration and the spurious
non-linear increment.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 83129b37ef35bb6a7f01c060129736a8db5d31c4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix legacy interrupt handling in i219

Orabug: 21792108

This fix handles a hardware issue that prevented i219 from
working in legacy interrupts mode (IntMode=0)

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ec945cfbbf918dd862d7574f9b75588ba1f4a729)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix flush_desc_ring implementation

Orabug: 21792108

The indication that a descriptor ring flush is required was read from
FEXTNVM7 by mistake. It should be read from the PCI config space.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ff9174291eddb3f42a1e9429f8b919bebc33533b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix logical error in flush_desc_rings

Orabug: 21792108

The condition under which the flush should occur was reversed. The fix
should be applied before any HW reset (unless followed by bus reset)
and before any power state transition from D0.

If E1000_FEXTNVM7_NEED_DESCRING_FLUSH bit is set in FEXTNVM7 and TDLEN > 0
the Tx ring should be flushed. (fixes ~95% of the hang states).
If the E1000_FEXTNVM7_NEED_DESCRING_FLUSH did not clear, we should also
flush the RX ring. Bug was caught by Alexander Duyck during a code review
when examining this fix.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 95f0d950467f1228d4e326c11150e1750a6dd1ef)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: remove call to do_div and sign mismatch warning

Orabug: 21792108

Fixes a warning that was reported by Yanjiang Jin
<yanjiang.jin@windriver.com> by implementing the solution suggested by
Alexander Duyck <alexander.h.duyck@redhat.com>.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit bfc9473bf90457bf31d3f675d82234897c6480cd)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 execute unit hang fix on every reset or power state transition

Orabug: 21792108

After testing various cases, the conclusion is that the fix MUST be
executed BEFORE any event that the HW is reset or transition to D3.
To fix that I moved the execution to the relevant places but per
Alexander Duyck's review, ensure now that the DMA is valid and was not
freed before manipulating the ring.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0ffc56464bbbb8e2a78e319a36e1eafcbaaab9d8)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: i219 fix unit hang on reset and runtime D3

Orabug: 21792108

Unit hang may occur if multiple descriptors are available in the rings
during reset or runtime suspend. This state can be detected by testing
bit 8 in the FEXTNVM7 register. If this bit is set and there are pending
descriptors in one of the rings, we must flush them prior to reset. Same
applies entering runtime suspend.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ad851fbb73a3d6564707281aa253418ef6aab878)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: fix call to do_div() to use u64 arg

Orabug: 21792108

We were using s64 for lat_ns (latency nano-second value) since in
our calculations a negative value could be a resultant. For negative
values, we then assign lat_ns to be zero, so the value passed to
do_div() was never negative, but do_div() expects the argument type
to be u64, so do a cast to resolve a compile warning seen on
PowerPC.

CC: Yanjiang Jin <yanjiang.jin@windriver.com>
CC: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Reported-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
(cherry picked from commit 30544af5483755b11bb5924736e9e0b45ef0644a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

e1000e: Do not allow CRC stripping to be disabled on 82579 w/ jumbo frames

Orabug: 21792108

The driver wasn't allowing jumbo frames to be
enabled when CRC stripping was disabled, however it was allowing CRC
stripping to be disabled while jumbo frames were enabled. This fixes that by
making it so that the NETIF_F_RXFCS flag cannot be set when jumbo frames are
enabled on 82579 and newer parts.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 55e7fe5b9cd94e6accb128e6a1e5902e9018deef)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

DCA: fix over-warning in ioat3_dca_init

We keep seeing such dmesg messages on boxes

WARNING: CPU: 0 PID: 457 at drivers/dma/ioat/dca.c:697
ioat3_dca_init+0x19c/0x1b0 [ioatdma]()
[ 16.609614] ioatdma 0000:00:04.0: APICID_TAG_MAP set incorrectly by
BIOS, disabling DCA
...
[<ffffffff8172807e>] dump_stack+0x4d/0x66
[<ffffffff81067f7d>] warn_slowpath_common+0x7d/0xa0
[<ffffffff81068034>] warn_slowpath_fmt_taint+0x44/0x50
[<ffffffffa00228bc>] ioat3_dca_init+0x19c/0x1b0
[ioatdma]
[<ffffffffa0021cd6>] ioat3_dma_probe+0x386/0x3e0
[ioatdma]
[<ffffffffa001a192>] ioat_pci_probe+0x122/0x1b0
[ioatdma]
[<ffffffff81329385>] local_pci_probe+0x45/0xa0
[<ffffffff81080d34>] work_for_cpu_fn+0x14/0x20
[<ffffffff81083c33>] process_one_work+0x183/0x490
[<ffffffff81084bd3>] worker_thread+0x2a3/0x410
[<ffffffff81084930>] ? rescuer_thread+0x410/0x410
[<ffffffff8108b852>] kthread+0xd2/0xf0
[<ffffffff8108b780>] ?

No need to use WARN_TAINT_ONCE to generate a such big noise if this is
not a critical error for kernel. DCA driver could print out a debug
messages then quit quietly.

If this is a real BIOS bug, please ignore this patch. Let's transfer
this issue to BIOS guys.

Thread: https://lkml.org/lkml/2014/5/8/446

Orabug: 21666295

Signed-off-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.rds-p2:
  IB/rds_rdma: unloading of ofed stack causes page fault panic
  RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.
  RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net

IB/rds_rdma: unloading of ofed stack causes page fault panic

This issue surfaced at the tail end of OFED functional automatic test suite
while unloading ofed modules resulting in following stack trace:
BUG: unable to handle kernel paging request at ffffffffa0abd1a0
IP: [<ffffffffa0abd1a0>] 0xffffffffa0abd1a0

Modules linked in: rds(-) ib_ipoib ... dm_mod [last unloaded: rds_rdma]

Workqueue: krdsd 0xffffffffa0abd1a0
task: ffff880670ac8df0 ti: ffff880666654000 task.ti: ffff880666654000
RIP: 0010:[<ffffffffa0abd1a0>]  [<ffffffffa0abd1a0>] 0xffffffffa0abd1a0
RSP: 0018:ffff880666657de0  EFLAGS: 00010286
RAX: 0000000000000600 RBX: ffff880664a03380 RCX: dead000000200200
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880664a03380
RBP: ffff880666657e38 R08: ffff880664a03388 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880674279c80
R13: ffff880675169800 R14: ffff880671a5dd00 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88067fc00000(0000) GS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa0abd1a0 CR3: 0000000001a56000 CR4: 00000000000007e0
Stack:
  ffffffff810962d6 000000000000000b ffff880664a03388 ffff880675169800
  ffff880671a5dd15 ffff880674279cb0 ffff880674279c80 ffff880675169800
  ffff880675169bc0 ffff880674279cb0 ffff880675169818 ffff880666657eb8
Call Trace:
  [<ffffffff810962d6>] ? process_one_work+0x146/0x450

The root cause for panic is failure to purge an active delayed work
request for active bonding initial failover work.

The fix is to cancel active bonding initial failover delayed work if
still active at module unload.

Orabug: 20861212

Signed-off-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Acked-by: Mukesh Kacker <mukesh.kacker@oracle.com>

RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.

Register pernet subsys init/stop functions that will set up
and tear down per-net RDS-TCP listen endpoints. Unregister
pernet subusys functions on 'modprobe -r' to clean up these
end points.

Enable keepalive on both accept and connect socket endpoints.
The keepalive timer expiration will ensure that client socket
endpoints will be removed as appropriate from the netns when
an interface is removed from a namespace.

Register a device notifier callback that will clean up all
sockets (and thus avoid the need to wait for keepalive timeout)
when the loopback device is unregistered from the netns indicating
that the netns is getting deleted.

Backport of upstream commit: 467fa15356acfb7b2efa38839c3e76caa4e6e0ea

Orabug: 21437445

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net

Open the sockets calling sock_create_kern() with the correct struct net
pointer, and use that struct net pointer when verifying the
address passed to rds_bind().

Backport of upstream commit: d5a8ac28a7ff2f250d1bedbb6008dd2f6f6f1638

Orabug: 21437445

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket

The newsk returned by sk_clone_lock should hold a get_net()
reference if, and only if, the parent is not a kernel socket
(making this similar to sk_alloc()).

E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock
sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
socket is a kernel socket (defined in sk_alloc() as having
sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
and should not hold a get_net() reference.

Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the
netns of kernel sockets.")

Backport of upstream commit: 8a68173691f036613e3d4e6bf8dc129d4a7bf383

Orabug: 21437445

Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Modify sk_alloc to not reference count the netns of kernel sockets.

Now that sk_alloc knows when a kernel socket is being allocated modify
it to not reference count the network namespace of kernel sockets.

Keep track of if a socket needs reference counting by adding a flag to
struct sock called sk_net_refcnt.

Update all of the callers of sock_create_kern to stop using
sk_change_net and sk_release_kernel as those hacks are no longer
needed, to avoid reference counting a kernel socket.

Backport of upstream commits: 26abe14379f8e2fa3fd1bcf97c9a7ad9364886fe

Orabug 21437445

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

net: Pass kern from net_proto_family.create to sk_alloc

In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.

Backport of upstream commit: 11aa9c28b4209242a9de0a661a7b3405adb568a0

Orabug 21437445

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

net: Add a struct net parameter to sock_create_kern

This is long overdue, and is part of cleaning up how we allocate
kernel sockets that don't reference count struct net.

Backport of upstream commit: eeb1bd5c40edb0e2fd925c8535e2fdebdbc5cef2

Orabug: 21437445

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
uek-rpm: configs: Enable Chelsio T4 and T5 NIC on OL6

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek: (61 commits)
  i40e/i40evf: Bump version to 1.3.6 for i40e and 1.3.2 for i40evf
  i40e: Refine an error message to avoid confusion
  i40e/i40evf: Add support for pre-allocated pages for PD
  i40evf: add MAC address filter in open, not init
  i40evf: don't delete all the filters
  i40e: un-disable VF after reset
  i40e: do a proper reset when disabling a VF
  i40e: correctly program filters for VFs
  i40e/i40evf: Update the admin queue command header
  i40e: Remove incorrect #ifdef's
  i40e: ignore duplicate port VLAN requests
  i40evf: Allow for an abundance of vectors
  i40e/i40evf: improve Tx performance with a small tweak
  i40e/i40evf: Update Flex-10 related device/function capabilities
  i40e/i40evf: Add stats to track FD ATR and SB dynamic enable state
  i40e: Implement ndo_features_check()
  i40evf: don't configure unused RSS queues
  i40evf: fix panic during MTU change
  i40e: Bump version to 1.3.4
  i40e/i40evf: remove time_stamp member
  ...

Merge branch 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek:
NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
  nfs: take extra reference to fl->fl_file when running a LOCKU operation
  mm: madvise allow remove operation for hugetlbfs
  mmotm: build fix hugetlbfs fallocate if not CONFIG_NUMA
  hugetlbfs: add hugetlbfs_fallocate()
  hugetlbfs: New huge_add_to_page_cache helper routine
  mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate
  mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch
  mm/hugetlb.c: make vma_has_reserves() return bool
  hugetlbfs: truncate_hugepages() takes a range of pages
  hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete
  mm/hugetlb: expose hugetlb fault mutex for use by fallocate
  mm/hugetlb: add region_del() to delete a specific range of entries
  mm-hugetlb-add-cache-of-descriptors-to-resv_map-for-region_add-fix
  mm/hugetlb: add cache of descriptors to resv_map for region_add
  mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
  mm/hugetlb: compute/return the number of regions added by region_add()
  mm/hugetlb: document the reserve map/region tracking routines

nfs: take extra reference to fl->fl_file when running a LOCKU operation

Jean reported another crash, similar to the one fixed by feaff8e5b2cf:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
    IP: [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    PGD 0
    Oops: 0000 [#1] SMP
    Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock cfg80211 rfkill coretemp crct10dif_pclmul ppdev vmw_balloon crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr vmxnet3 parport_pc i2c_piix4 microcode serio_raw parport nfsd floppy vmw_vmci acpi_cpufreq auth_rpcgss shpchp nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih ata_generic mptbase i2c_core pata_acpi
    CPU: 0 PID: 329 Comm: kworker/0:1H Not tainted 4.1.0-rc7+ #2
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
    Workqueue: rpciod rpc_async_schedule [sunrpc]
    30ec000
    RIP: 0010:[<ffffffff8124ef7f>]  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    RSP: 0018:ffff8802330efc08  EFLAGS: 00010296
    RAX: ffff8802330efc58 RBX: ffff880097187c80 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff8802330efc18 R08: ffff88023fc173d8 R09: 3038b7bf00000000
    R10: 00002f1a02000000 R11: 3038b7bf00000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8802337a2300 R15: 0000000000000020
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000148 CR3: 000000003680f000 CR4: 00000000001407f0
    Stack:
     ffff880097187c80 ffff880097187cd8 ffff8802330efc98 ffffffff81250281
     ffff8802330efc68 ffffffffa013e7df ffff8802330efc98 0000000000000246
     ffff8801f6901c00 ffff880233d2b8d8 ffff8802330efc58 ffff8802330efc58
    Call Trace:
     [<ffffffff81250281>] __posix_lock_file+0x31/0x5e0
     [<ffffffffa013e7df>] ? rpc_wake_up_task_queue_locked.part.35+0xcf/0x240 [sunrpc]
     [<ffffffff8125088b>] posix_lock_file_wait+0x3b/0xd0
     [<ffffffffa03890b2>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
     [<ffffffffa0365808>] ? nfs41_sequence_done+0xd8/0x300 [nfsv4]
     [<ffffffffa0367525>] do_vfs_lock+0x35/0x40 [nfsv4]
     [<ffffffffa03690c1>] nfs4_locku_done+0x81/0x120 [nfsv4]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e33c>] rpc_exit_task+0x2c/0x90 [sunrpc]
     [<ffffffffa0134400>] ? call_refreshresult+0x170/0x170 [sunrpc]
     [<ffffffffa013ece4>] __rpc_execute+0x84/0x410 [sunrpc]
     [<ffffffffa013f085>] rpc_async_schedule+0x15/0x20 [sunrpc]
     [<ffffffff810add67>] process_one_work+0x147/0x400
     [<ffffffff810ae42b>] worker_thread+0x11b/0x460
     [<ffffffff810ae310>] ? rescuer_thread+0x2f0/0x2f0
     [<ffffffff810b35d9>] kthread+0xc9/0xe0
     [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pmd+0xa0/0x160
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
     [<ffffffff8173c222>] ret_from_fork+0x42/0x70
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
    Code: a5 81 e8 85 75 e4 ff c6 05 31 ee aa 00 01 eb 98 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 9f 48 01 00 00 48 85 db 74 08 48 89 d8 5b 41 5c 5d c3 83
    RIP  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
     RSP <ffff8802330efc08>
    CR2: 0000000000000148
    ---[ end trace 64484f16250de7ef ]---

The problem is almost exactly the same as the one fixed by feaff8e5b2cf.
We must take a reference to the struct file when running the LOCKU
compound to prevent the final fput from running until the operation is
complete.

Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 21687670
(cherry picked from mainline commit db2efec0caba4f81a22d95a34da640b86c313c8e)
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock

Orabug: 20933419

NFS on a 2 node ocfs2 cluster each node exporting dir. The lock causing
the hang is the global bit map inode lock.  Node 1 is master, has
the lock granted in PR mode; Node 2 is in the converting list (PR ->
EX). There are no holders of the lock on the master node so it should
downconvert to NL and grant EX to node 2 but that does not happen.
BLOCKED + QUEUED in lock res are set and it is on osb blocked list.
Threads are waiting in __ocfs2_cluster_lock on BLOCKED.  One thread wants
EX, rest want PR. So it is as though the downconvert thread needs to be
kicked to complete the conv.

The hang is caused by an EX req coming into  __ocfs2_cluster_lock on
the heels of a PR req after it sets BUSY (drops l_lock, releasing EX
thread), forcing the incoming EX to wait on BUSY without doing anything.
PR has called ocfs2_dlm_lock, which  sets the node 1 lock from NL ->
PR, queues ast.

At this time, upconvert (PR ->EX) arrives from node 2, finds conflict with
node 1 lock in PR, so the lock res is put on dlm thread's dirty listt.

After ret from ocf2_dlm_lock, PR thread now waits behind EX on BUSY till
awoken by ast.

Now it is dlm_thread that serially runs dlm_shuffle_lists, ast,  bast,
in that order.  dlm_shuffle_lists ques a bast on behalf of node 2
(which will be run by dlm_thread right after the ast).  ast does its
part, sets UPCONVERT_FINISHING, clears BUSY and wakes its waiters. Next,
dlm_thread runs  bast. It sets BLOCKED and kicks dc thread.  dc thread
runs ocfs2_unblock_lock, but since UPCONVERT_FINISHING set, skips doing
anything and reques.

Inside of __ocfs2_cluster_lock, since EX has been waiting on BUSY ahead
of PR, it wakes up first, finds BLOCKED set and skips doing anything
but clearing UPCONVERT_FINISHING (which was actually "meant" for the
PR thread), and this time waits on BLOCKED.  Next, the PR thread comes
out of wait but since UPCONVERT_FINISHING is not set, it skips updating
the l_ro_holders and goes straight to wait on BLOCKED. So there, we
have a hang! Threads in __ocfs2_cluster_lock wait on BLOCKED, lock
res in osb blocked list. Only when dc thread is awoken, it will run
ocfs2_unblock_lock and things will unhang.

One way to fix this is to wake the dc thread on the flag after clearing
UPCONVERT_FINISHING

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: Bump version to 1.3.6 for i40e and 1.3.2 for i40evf

Orabug: 21570582

Bump.

Change-ID: I84573d9fa51effc5b29bf5b8c74e3cc8b2673f48
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 76945bf9ff8a2433f1efb777ec64475c1eec08ab)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: Refine an error message to avoid confusion

Orabug: 21570582

Change a warning message to indicate what may have really happened when
the init_shared_code call fails.

Change-ID: I616ace40fed120d0dec86dfc91ab2d7cde466904
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b2a75c5819ec910f430a2ff12fec6cce202899a0)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: Add support for pre-allocated pages for PD

Orabug: 21570582

The i40e_add_pd_table_entry() routine is being modified to handle both
cases where a backing page is passed and where backing page is allocated
in i40e_add_pd_table_entry().

For PBLE resource management, it is more efficient for it to manage its
backing pages. For VF, PBLE backing page addresses will be send to PF
driver for PBLE resource.

The i40e_remove_pd_bp() is also modified to not free pre-allocated pages and
free only ones which were allocated in i40e_add_pd_table_entry().

Change-ID: Ie673f0403f22979e9406f5a94048dceb91bcf9a8
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3bbf0faa90cb8d541d8b2ce01610dcec6828bd00)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40evf: add MAC address filter in open, not init

Orabug: 21570582

During close, all of the MAC filters are cleared, so the driver would be
unable to receive unicast packets after being closed and reopened.

Add the adapter's "hardware" MAC address filter in open, not init. This
ensures that the correct filter is present each time.

Change-ID: I51a11e9c1200139dab6f66a5353bd38c7d26f875
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 44151cd32deb1074530f3beba51d535fa0887d9a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40evf: don't delete all the filters

Orabug: 21570582

Due to an inverted conditional, the driver was marking all of its MAC
filters for deletion every time set_rx_mode was called. Depending upon
the timing of the calls to set_rx_mode and the processing of the admin
queue, the driver would (accidentally) end up with a varying number of
functional filters.

Correct this logic so that MAC filters are added and removed correctly.
Add a check for the driver's "hardware" MAC address so that this filter
doesn't get removed incorrectly.

Change-ID: Ib3e7c4a5b53df6835f164fe44cb778cb71f8aff8
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 68ef169204e3a88ea4823645038d5496f66200f6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: un-disable VF after reset

Orabug: 21570582

When a VF is disabled, there is no way for it to recover until either
the PF driver is reloaded or SR-IOV is disabled and enabled. To correct
this, enable the VF after a successful reset.

Change-ID: I9e0788476c4d53d5407961b503febdfff2b8a7c6
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5b8f8505d37c63d492391e5fafcd43332671b36b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: do a proper reset when disabling a VF

Orabug: 21570582

The VF disable code was just whanging on the reset bit without properly
cleaning up the VF, which would leave the VF in an indeterminate state
from which it could not recover. Fix this by notifying the VF and then
by calling the normal VF reset routine.

Change-ID: I862b9dfa919368773cbdc212b805b520db2f7430
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 54f455eeb56c0ab92db87bed6bd767d206d9e743)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: correctly program filters for VFs

Orabug: 21570582

MAC filters for VFs were being programmed with 0 for the VLAN value when
there was no VLAN assigned. This is incorrect and actually assigns the
VF to VLAN 0. Instead, we must use -1 to indicate that no VLAN is in
use. This change programs the filters correctly and gets rid of a bogus
error message when setting a port VLAN on an active VF.

Change-ID: Ica9a9906d768405377ff3308e27f7d0b5b2ea96e
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit e995163cdcf9b70c7840a8d6a7ea7c0ce81c761b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: Update the admin queue command header

Orabug: 21570582

Make the necessary updates to i40e_adminq_cmd.h.

Change-ID: Ib031c86cc6cab78e5aa44c64d8ce5474be8d7e42
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit cb2f65bc0c64015e8fa45fe1065ad241bf31a994)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: Remove incorrect #ifdef's

Orabug: 21570582

This patch removes some #ifdef's that should not be there. They
were stopping code that is needed from being compiled in.

With these #ifdef's removed, changes are needed in the driver
to fix some compile errors: adding missing parameters to
the definition of ndo_bridge_setlink and a ndo_dflt_brige_getlink call.

Change-ID: I5516614e1bc50b6bca0647cef971bc96161ba2de
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 9df70b66418e284dc1e7f272ac445c1d1e990b97)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/intel/i40e/i40e_main.c
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: ignore duplicate port VLAN requests

Orabug: 21570582

If user attempts to set a port VLAN on a VF that already has the same
port VLAN configured, the driver will go through a completely
unnecessary flurry of filter removals and filter adds. Just check for
this condition and return success instead of doing a bunch of busywork.

Change-ID: Ia1a9e83e6ed48b3f4658bc20dfc6af0cf525d54a
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 85927ec1b369c880407aa82eba70d49c04c35062)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40evf: Allow for an abundance of vectors

Orabug: 21570582

The driver currently only maps TX and RX queues to a single MSI-X vector
per queue pair if there are exactly enough vectors for this.
Unfortunately, if we have too many vectors it will fail and allocate
queues to vectors in a suboptimal manner. Change the condition check to
allow for excess vectors. In this case, the extras just won't be used.

Change-ID: I23e1e2955c64739c86612db88a25583e6a7e0b17
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 973371da4d66b96736143bd3f2b2ff2331faae8f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: improve Tx performance with a small tweak

Orabug: 21570582

Add a prefetch for the next Tx descriptor to be used when we know
there are more coming.

Change-ID: Ibb9acab11d508eec2db7da795df74debc16eeacb
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 489ce7a46306052ab4ef26c6305051c5f1b24bb4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: Update Flex-10 related device/function capabilities

Orabug: 21570582

The Flex10 device/function capability has been upgraded to include
information needed to support Flex-10 configurations. This patch adds new
fields to the i40e_hw_capabilities structure and updates
i40e_parse_discover_capabilities functions to extract them from the AQ
response. Naming convention has changed to use flex10 mode instead of
existing mfp_mode_1.

Change-ID: I305dd888866985a30293acb3fb14fa43ca6b79ea
Signed-off-by: Pawel Orlowski <pawel.orlowski@intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c78b953e0f189824f5eaa2d60123cfd12ea6db0d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: Add stats to track FD ATR and SB dynamic enable state

Orabug: 21570582

Since the driver can dynamically enable/disable FD ATR and SB features,
these stats help keep track of the current state and along with
fd_flush count provide a means to debug what could be going on
with the flow director filters. This will take away the need for
being verbose in our debug logs with respect to FD.

Change-ID: I29224f750fe6602391043655d18996570720377d
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d0389e51fc9b3c74e7935ded5d22eab4ea004589)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: Implement ndo_features_check()

Orabug: 21570582

i40e supports UDP tunnel headers up to 80 bytes in length, so
this adds a check to ensure that it doesn't try to offload
packets that exceed that.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f44a75e27d5eb4b1788f59c2bc185baaaf732f75)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40evf: don't configure unused RSS queues

Orabug: 21570582

The driver will only configure as many queues as there are available
CPUs, up the maximum number of queues. However, it always configures
RSS as though it is using the maximum number of queues. This can cause
the device to drop a lot of RX traffic, as the packets get assigned to
nonfunctional queues.

Fix this by only configuring RSS with the number of active queues.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 40746eb14c6b44f4d635c2f4cf8c67550db9b3ab)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40evf: fix panic during MTU change

Orabug: 21570582

Down was requesting queue disables, but then exited immediately
without waiting for the queues to actually disable. This could
allow any function called after i40evf_down to run immediately,
including i40evf_up, and causes a memory leak.

Removing the whole reinit_locked function is the best way
to go about this, and allows for the driver to handle the
state changes by requesting reset from the periodic timer.

Also, add a couple WARN_ONs in slow path to help us recognize
if we re-introduce this issue or missed any cases.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 67c818a1d58c7897b8a6f531684516f9c236fe1b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: Bump version to 1.3.4

Orabug: 21570582

Bump.

Change-ID: I54ec2787a9fead5e18447078f26e5dd27f01da44
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f029094e49814b56fdb3261a694c8890983b7a2d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: remove time_stamp member

Orabug: 21570582

The driver doesn't use the time_stamp member to determine if there is a
tx_hang any more. There really isn't any point to the variable at all
so just remove it. It was left over from a previous tx_hang design.

Change-ID: I4c814827e1bcb46e45118fe37acdcfa814fb62a0
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 335075989fbb3c3fffc3ba238b893fa92508a6f1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: force inline transmit functions

Orabug: 21570582

Inlining these functions gives us about 15% more 64 byte packets per
second when using pktgen. 13.3 million to 15 million with a single
queue.

Also fix the function names in i40evf to i40evf not i40e while we are
touching the function header.

Change-ID: I3294ae9b085cf438672b6db5f9af122490ead9d0
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3e587cf3c1cc2996c39f8a19e453cb8233112416)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40evf: skb->xmit_more support

Orabug: 21570582

Eric added support for skb->xmit_more in i40e, this ports that into
i40evf as well.

Support skb->xmit_more in i40evf is straightforward; we need to move
around i40e_maybe_stop_tx() call to correctly test netif_xmit_stopped()
before taking the decision to not kick the NIC.

Change-ID: Idddda6a2e4a7ab335631c91ced51f55b25eb8468
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8f6a2b05c67d915cef66b2c9636404e0d531def2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: Move the FD ATR/SB messages to a higher debug level

Orabug: 21570582

These are not useful unless SV is happening as there is a FD flush counter
that tracks this.

Change-ID: If2655b5a29687247d03a51d35f69854bbeb711ce
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2e4875e38c288702c2002c7bcf527d8aa0083979)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: fix unrecognized FCOE EOF case

Orabug: 21570582

Because i40e_fcoe_ctxt_eof should never be called without
i40e_fcoe_eof_is_supported being called first, the EOF in fcoe_ctxt_eof
should always be valid and therefore we do not need to print an error
if it is not valid.

However, a WARN ON to easily catch any calls to i40e_fcoe_ctxt_eof that
aren't preceded with a call to i40e_fcoe_eof_is_supported is helpful.

Change-ID: I3b536b1981ec0bce80576a74440b7dea3908bdb9
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 41837cad54fe22d29f021f6cb0e9d151acb104a0)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: Remove unneeded TODO

Orabug: 21570582

There's no need for a counter so remove the TODO comment.

Change-ID: I3321dda04934c4f5fda9b279ab666192bda44214
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6b02a174c1542486eeaa1de94e6c38e9271b89d8)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: Remove unnecessary pf members

Orabug: 21570582

We can use the stat index macro directly, a variable is not required.

Change-ID: I19f08ac16353dc0cd87a1a8248d714e15a54aa8a
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 0bf4b1b0c3fda4dd72910cba3c40b3273a2de756)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: Add stats to count Tunnel ATR hits

Orabug: 21570582

Add a 3rd dynamic filter counter to track Tunneled ATR hits separately.
Ethtool port stat "fdir_atr_tunnel_match"

Change-ID: Idd978b6db2a462b5722397cd2ffd04ef055f8655
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 60ccd45cbabdc058061b860c43c48877558cc176)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e/i40evf: Add ATR support for tunneled TCP/IPv4/IPv6 packets.

Orabug: 21570582

Without this, RSS would have done inner header load balancing. Now we can
get the benefits of ATR for tunneled packets to better align TX and RX
queues with the right core/interrupt.

Change-ID: I07d0e0a192faf28fdd33b2f04c32b2a82ff97ddd
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 89232c3bf78b3799699e48201f60892283564b78)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: Disable offline diagnostics if VFs are enabled

Orabug: 21570582

Require the user to disable virtual functions before running the device
offline diagnostics. The offline diagnostics are intended to ensure
basic operation of the device - it is beyond the scope of the diagnostic
test to handle the additional complexity of bringing all the virtual
functions offline and then back online for each test run.

Change-ID: Ic0b854851a09fc85df0c9e82c220e45885457c30
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit e17bc411aea8fbebc51857037f104ab09f765120)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

i40e: Collect PFC XOFF RX stats even in single TC case

Orabug: 21570582

When PFC is enabled for any UP in single TC configuration the driver didn't
collect the PFC XOFF RX stats. Though a single TC with PFC enabled is not a
common scenario do not prevent the driver from collecting stats if firmware
indicates that PFC is enabled.

Change-ID: Ie20bd58b07608b528f3c6d95894c9ae56b00077a
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit e120814d74bc805769d18ed7177f43a17a88fd40)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

uek-rpm: configs: Enable Chelsio T4 and T5 NIC on OL6

Orabug: 21754829

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm: madvise allow remove operation for hugetlbfs

Orabug: 21652814

Now that we have hole punching support for hugetlbfs, we can
also support the MADV_REMOVE interface to it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8f787c8989ce599cbf0feb10ecea912d07111439)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mmotm: build fix hugetlbfs fallocate if not CONFIG_NUMA

Orabug: 21652814

Commit 56bb4d795 introduced a build error if CONFIG_NUMA is not
defined. When fallocate preallocation allocates pages, it will
use the defined numa policy. However, if numa is not defined
there is no such policy and no code should reference numa policy.
Create wrappers to isolate policy manipulation code that are a
NOOP in the non-NUMA case.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0ce9057732d1dd94ef2bd32c8acb68ae68b08a2f)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: add hugetlbfs_fallocate()

Orabug: 21652814

This is based on the shmem version, but it has diverged quite a bit.  We
have no swap to worry about, nor the new file sealing.  Add
synchronication via the fault mutex table to coordinate page faults,
fallocate allocation and fallocate hole punch.

What this allows us to do is move physical memory in and out of a
hugetlbfs file without having it mapped.  This also gives us the ability
to support MADV_REMOVE since it is currently implemented using
fallocate().  MADV_REMOVE lets madvise() remove pages from the middle of a
hugetlbfs file, which wasn't possible before.

hugetlbfs fallocate only operates on whole huge pages.

Based on code by Dave Hansen.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 79925b07fd22d7c4e4e77cdc26edb26dc4ff2701)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: New huge_add_to_page_cache helper routine

Orabug: 21652814

Currently, there is only a single place where hugetlbfs pages are added to
the page cache. The new fallocate code be adding a second one, so break
the functionality out into its own helper.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4b153581930e8c61250078efcdcce3e19bc2a45b)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate

Orabug: 21652814

Areas hole punched by fallocate will not have entries in the
region/reserve map. However, shared mappings with min_size subpool
reservations may still have reserved pages. alloc_huge_page needs to
handle this special case and do the proper accounting.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b21aa74b8077ce5b9c5fea566fe37af866934746)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch

Orabug: 21652814

In vma_has_reserves(), the current assumption is that reserves are always
present for shared mappings.  However, this will not be the case with
fallocate hole punch.  When punching a hole, the present page will be
deleted as well as the region/reserve map entry (and hence any
reservation).  vma_has_reserves is passed "chg" which indicates whether or
not a region/reserve map is present.  Use this to determine if reserves
are actually present or were removed via hole punch.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 320fb799e7cedc1edb96fd69c686547c731d5fc8)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb.c: make vma_has_reserves() return bool

Orabug: 21652814

This makes vma_has_reserves() return bool due to this particular function
only returning either one or zero as its return value.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f7bba6ea9b80a19c4849b4d870fb2f2bd492713b)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: truncate_hugepages() takes a range of pages

Orabug: 21652814

Modify truncate_hugepages() to take a range of pages (start, end) instead
of simply start.  If an end value of LLONG_MAX is passed, the current
"truncate" functionality is maintained.  Existing callers are modified to
pass LLONG_MAX as end of range.  By keying off end == LLONG_MAX, the
routine behaves differently for truncate and hole punch.  Page removal is
now synchronized with page allocation via faults by using the fault mutex
table.  The hole punch case can experience the rare region_del error and
must handle accordingly.

Add the routine hugetlb_fix_reserve_counts to fix up reserve counts in the
case where region_del returns an error.

Since the routine handles more than just the truncate case, it is renamed
to remove_inode_hugepages().  To be consistent, the routine
truncate_huge_page() is renamed remove_huge_page().

Downstream of remove_inode_hugepages(), the routine
hugetlb_unreserve_pages() is also modified to take a range of pages.
hugetlb_unreserve_pages is modified to detect an error from region_del and
pass it back to the caller.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 6a57804ccdfb77b8f333b736a3ee7cb1bf8732e1)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete

Orabug: 21652814

fallocate hole punch will want to unmap a specific range of pages. Modify
the existing hugetlb_vmtruncate_list() routine to take a start/end range.
If end is 0, this indicates all pages after start should be unmapped.
This is the same as the existing truncate functionality. Modify existing
callers to add 0 as end of range.

Since the routine will be used in hole punch as well as truncate
operations, it is more appropriately renamed to hugetlb_vmdelete_list().

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit dea23e2a9e811e5fba895a134f701455908aa0d3)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: expose hugetlb fault mutex for use by fallocate

Orabug: 21652814

hugetlb page faults are currently synchronized by the table of mutexes
(htlb_fault_mutex_table).  fallocate code will need to synchronize with
the page fault code when it allocates or deletes pages.  Expose interfaces
so that fallocate operations can be synchronized with page faults.  Minor
name changes to be more consistent with other global hugetlb symbols.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit fec73245c33b067c60f520908a93c971003664c8)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: add region_del() to delete a specific range of entries

Orabug: 21652814

fallocate hole punch will want to remove a specific range of pages.  The
existing region_truncate() routine deletes all region/reserve map entries
after a specified offset.  region_del() will provide this same
functionality if the end of region is specified as LONG_MAX.  Hence,
region_del() can replace region_truncate().

Unlike region_truncate(), region_del() can return an error in the rare
case where it can not allocate memory for a region descriptor.  This ONLY
happens in the case where an existing region must be split.  Current
callers passing LONG_MAX as end of range will never experience this error
and do not need to deal with error handling.  Future callers of
region_del() (such as fallocate hole punch) will need to handle this
error.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c2cfad5701106f8ddb0607b9e09d524ef55ef0ec)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm-hugetlb-add-cache-of-descriptors-to-resv_map-for-region_add-fix

Orabug: 21652814

fix typo in comment, use more cols

Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 203a5abc2eb0bcd7f8e5a3742467e845de368df8)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: add cache of descriptors to resv_map for region_add

Orabug: 21652814

hugetlbfs is used today by applications that want a high degree of control
over huge page usage.  Often, large hugetlbfs files are used to map a
large number huge pages into the application processes.  The applications
know when page ranges within these large files will no longer be used, and
ideally would like to release them back to the subpool or global pools for
other uses.  The fallocate() system call provides an interface for
preallocation and hole punching within files.  This patch set adds
fallocate functionality to hugetlbfs.

fallocate hole punch will want to remove a specific range of pages.  When
pages are removed, their associated entries in the region/reserve map will
also be removed.  This will break an assumption in the
region_chg/region_add calling sequence.  If a new region descriptor must
be allocated, it is done as part of the region_chg processing.  In this
way, region_add can not fail because it does not need to attempt an
allocation.

To prepare for fallocate hole punch, create a "cache" of descriptors that
can be used by region_add if necessary.  region_chg will ensure there are
sufficient entries in the cache.  It will be necessary to track the number
of in progress add operations to know a sufficient number of descriptors
reside in the cache.  A new routine region_abort is added to adjust this
in progress count when add operations are aborted.  vma_abort_reservation
is also added for callers creating reservations with
vma_needs_reservation/vma_commit_reservation.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 27af163113310a86b6d19bb5693c1a08eb89b0f7)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages

Orabug: 21652814

alloc_huge_page and hugetlb_reserve_pages use region_chg to calculate the
number of pages which will be added to the reserve map.  Subpool and
global reserve counts are adjusted based on the output of region_chg.
Before the pages are actually added to the reserve map, these routines
could race and add fewer pages than expected.  If this happens, the
subpool and global reserve counts are not correct.

Compare the number of pages actually added (region_add) to those expected
to added (region_chg).  If fewer pages are actually added, this indicates
a race and adjust counters accordingly.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 33039678c8da8133e30ea3250d10ae14701dae2b)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: compute/return the number of regions added by region_add()

Orabug: 21652814

Modify region_add() to keep track of regions(pages) added to the reserve
map and return this value. The return value can be compared to the return
value of region_chg() to determine if the map was modified between calls.

Make vma_commit_reservation() also pass along the return value of
region_add(). In the normal case, we want vma_commit_reservation to
return the same value as the preceding call to vma_needs_reservation.
Create a common __vma_reservation_common routine to help keep the special
case return values in sync

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit cf3ad20bfeadda693e408d85684790714fc29b08)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>