www.infradead.org Git - users/jedix/linux-maple.git/log

Merge branch 'topic/uek-4.1/xen' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/xen' of git://ca-git.us.oracle.com/linux-uek:
  xen/events/fifo: Consume unprocessed events when a CPU dies
  Revert "xen/fb: allow xenfb initialization for hvm guests"
  xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
  xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
  xen/pciback: Do not install an IRQ handler for MSI interrupts.
  xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled
  xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
  xen/pciback: Save xen_pci_op commands before processing it
  xen-scsiback: safely copy requests
  xen-blkback: read from indirect descriptors only once
  xen-blkback: only read request operation from shared ring once
  xen-netback: use RING_COPY_REQUEST() throughout
  xen-netback: don't use last request to determine minimum Tx credit
  xen: Add RING_COPY_REQUEST()

Merge branch 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek:
ocfs2: return non-zero st_blocks for inline data

ocfs2: return non-zero st_blocks for inline data

Some versions of tar assume that files with st_blocks == 0 do not contain
any data and will skip reading them entirely. See also commit
9206c561554c ("ext4: return non-zero st_blocks for inline data").

Signed-off-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Gang He <ghe@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ca426103429543e7a9be9017537fc3ffc37b5724)
Orabug: 22218243
Signed-off-by: John Haxby <john.haxby@oracle.com>

xen/events/fifo: Consume unprocessed events when a CPU dies

When a CPU is offlined, there may be unprocessed events on a port for
that CPU. If the port is subsequently reused on a different CPU, it
could be in an unexpected state with the link bit set, resulting in
interrupts being missed. Fix this by consuming any unprocessed events
for a particular CPU when that CPU dies.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: <stable@vger.kernel.org> # 3.14+
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 3de88d622fd68bd4dbee0f80168218b23f798fd0)

Orabug: 22498877
Tested-by: Carson Hovey <carson.hovey@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Merge branch 'uek4/bug20386370' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen

* 'uek4/bug20386370' of git://ca-git.us.oracle.com/linux-konrad-public:
Revert "xen/fb: allow xenfb initialization for hvm guests"

Merge branch 'uek4/xsa157' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen

* 'uek4/xsa157' of git://ca-git.us.oracle.com/linux-konrad-public:
  xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
  xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
  xen/pciback: Do not install an IRQ handler for MSI interrupts.
  xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled
  xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled

Merge branch 'uek4/xsa155' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen

* 'uek4/xsa155' of git://ca-git.us.oracle.com/linux-konrad-public:
  xen/pciback: Save xen_pci_op commands before processing it
  xen-scsiback: safely copy requests
  xen-blkback: read from indirect descriptors only once
  xen-blkback: only read request operation from shared ring once
  xen-netback: use RING_COPY_REQUEST() throughout
  xen-netback: don't use last request to determine minimum Tx credit
  xen: Add RING_COPY_REQUEST()

Revert "xen/fb: allow xenfb initialization for hvm guests"

This reverts commit d9581c7dcac15c02ad4d47c60c60f4d8f197db55.

which is:
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Fri Jan 3 19:02:09 2014 +0000

    xen/fb: allow xenfb initialization for hvm guests

    There is no reasons why an HVM guest shouldn't be allowed to use xenfb.

as "Xend" exposes an PV and VGA (via QEMU) driver to the guests (either
PV or HVM).

The backed in this case is QEMU - and it only provides the emulation
for the VGA backend (for HVM guests). Upstream QEMU can do both - but
since we are using Xend - we only expose QEMU-traditional.

Upstreamwise we hadn't reached a good decisions. Poking for XenStore
keys to see if the backend is Xend vs xl seems odd. And the issue
goes away in 'xl' as you can't define both PV and VGA drivers. Only
one of them is allowed.

The 'proper' fix would be to teach Xend to not expose PV FB for HVM
guests but that code is pretty genric.

So continuing on with this revert.

OraBug: 20386370 - XENBUS_PROBE_FRONTEND: TIMEOUT CONNECTING TO DEVICE EROR WITH UEK4
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.

commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way")
teaches us that dealing with MSI-X can be troublesome.

Further checks in the MSI-X architecture shows that if the
PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
may not be able to access the BAR (since they are memory regions).

Since the MSI-X tables are located in there.. that can lead
to us causing PCIe errors. Inhibit us performing any
operation on the MSI-X unless the MEMORY bit is set.

Note that Xen hypervisor with:
"x86/MSI-X: access MSI-X table only after having enabled MSI-X"
will return:
xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!

When the generic MSI code tries to setup the PIRQ without
MEMORY bit set. Which means with later versions of Xen
(4.6) this patch is not neccessary.

This is part of XSA-157

CC: stable@vger.kernel.org
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 408fb0e5aa7fda0059db282ff58c3b2a4278baa0)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.

Otherwise just continue on, returning the same values as
previously (return of 0, and op->result has the PIRQ value).

This does not change the behavior of XEN_PCI_OP_disable_msi[|x].

The pci_disable_msi or pci_disable_msix have the checks for
msi_enabled or msix_enabled so they will error out immediately.

However the guest can still call these operations and cause
us to disable the 'ack_intr'. That means the backend IRQ handler
for the legacy interrupt will not respond to interrupts anymore.

This will lead to (if the device is causing an interrupt storm)
for the Linux generic code to disable the interrupt line.

Naturally this will only happen if the device in question
is plugged in on the motherboard on shared level interrupt GSI.

This is part of XSA-157

CC: stable@vger.kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 7cfb905b9638982862f0331b36ccaaca5d383b49)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/pciback: Do not install an IRQ handler for MSI interrupts.

Otherwise an guest can subvert the generic MSI code to trigger
an BUG_ON condition during MSI interrupt freeing:

for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));

Xen PCI backed installs an IRQ handler (request_irq) for
the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
(or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
done in case the device has legacy interrupts the GSI line
is shared by the backend devices.

To subvert the backend the guest needs to make the backend
to change the dev->irq from the GSI to the MSI interrupt line,
make the backend allocate an interrupt handler, and then command
the backend to free the MSI interrupt and hit the BUG_ON.

Since the backend only calls 'request_irq' when the guest
writes to the PCI_COMMAND register the guest needs to call
XEN_PCI_OP_enable_msi before any other operation. This will
cause the generic MSI code to setup an MSI entry and
populate dev->irq with the new PIRQ value.

Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
and cause the backend to setup an IRQ handler for dev->irq
(which instead of the GSI value has the MSI pirq). See
'xen_pcibk_control_isr'.

Then the guest disables the MSI: XEN_PCI_OP_disable_msi
which ends up triggering the BUG_ON condition in 'free_msi_irqs'
as there is an IRQ handler for the entry->irq (dev->irq).

Note that this cannot be done using MSI-X as the generic
code does not over-write dev->irq with the MSI-X PIRQ values.

The patch inhibits setting up the IRQ handler if MSI or
MSI-X (for symmetry reasons) code had been called successfully.

P.S.
Xen PCIBack when it sets up the device for the guest consumption
ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
XSA-120 addendum patch removed that - however when upstreaming said
addendum we found that it caused issues with qemu upstream. That
has now been fixed in qemu upstream.

This is part of XSA-157

CC: stable@vger.kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a396f3a210c3a61e94d6b87ec05a75d0be2a60d0)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled

The guest sequence of:

a) XEN_PCI_OP_enable_msix
b) XEN_PCI_OP_enable_msix

results in hitting an NULL pointer due to using freed pointers.

The device passed in the guest MUST have MSI-X capability.

The a) constructs and SysFS representation of MSI and MSI groups.
The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
in a) pdev->msi_irq_groups is still set) and also free's ALL of the
MSI-X entries of the device (the ones allocated in step a) and b)).

The unwind code: 'free_msi_irqs' deletes all the entries and tries to
delete the pdev->msi_irq_groups (which hasn't been set to NULL).
However the pointers in the SysFS are already freed and we hit an
NULL pointer further on when 'strlen' is attempted on a freed pointer.

The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
against that. The check for msi_enabled is not stricly neccessary.

This is part of XSA-157

CC: stable@vger.kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 5e0ce1455c09dd61d029b8ad45d82e1ac0b6c4c9)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled

The guest sequence of:

a) XEN_PCI_OP_enable_msi
b) XEN_PCI_OP_enable_msi
c) XEN_PCI_OP_disable_msi

results in hitting an BUG_ON condition in the msi.c code.

The MSI code uses an dev->msi_list to which it adds MSI entries.
Under the above conditions an BUG_ON() can be hit. The device
passed in the guest MUST have MSI capability.

The a) adds the entry to the dev->msi_list and sets msi_enabled.
The b) adds a second entry but adding in to SysFS fails (duplicate entry)
and deletes all of the entries from msi_list and returns (with msi_enabled
is still set). c) pci_disable_msi passes the msi_enabled checks and hits:

BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));

and blows up.

The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
against that. The check for msix_enabled is not stricly neccessary.

This is part of XSA-157.

CC: stable@vger.kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 56441f3c8e5bd45aab10dd9f8c505dd4bec03b0d)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/pciback: Save xen_pci_op commands before processing it

Double fetch vulnerabilities that happen when a variable is
fetched twice from shared memory but a security check is only
performed the first time.

The xen_pcibk_do_op function performs a switch statements on the op->cmd
value which is stored in shared memory. Interestingly this can result
in a double fetch vulnerability depending on the performed compiler
optimization.

This patch fixes it by saving the xen_pci_op command before
processing it. We also use 'barrier' to make sure that the
compiler does not perform any optimization.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-scsiback: safely copy requests

The copy of the ring request was lacking a following barrier(),
potentially allowing the compiler to optimize the copy away.

Use RING_COPY_REQUEST() to ensure the request is copied to local
memory.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit be69746ec12f35b484707da505c6c76ff06f97dc)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-blkback: read from indirect descriptors only once

Since indirect descriptors are in memory shared with the frontend, the
frontend could alter the first_sect and last_sect values after they have
been validated but before they are recorded in the request. This may
result in I/O requests that overflow the foreign page, possibly
overwriting local pages when the I/O request is executed.

When parsing indirect descriptors, only read first_sect and last_sect
once.

This is part of XSA155.

(cherry-pick from 18779149101c0dd43ded43669ae2a92d21b6f9cb)
CC: stable@vger.kernel.org
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
----
v2: This is against v4.3

xen-blkback: only read request operation from shared ring once

A compiler may load a switch statement value multiple times, which could
be bad when the value is in memory shared with the frontend.

When converting a non-native request to a native one, ensure that
src->operation is only loaded once by using READ_ONCE().

This is part of XSA155.

CC: stable@vger.kernel.org
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 1f13d75ccb806260079e0679d55d9253e370ec8a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-netback: use RING_COPY_REQUEST() throughout

Instead of open-coding memcpy()s and directly accessing Tx and Rx
requests, use the new RING_COPY_REQUEST() that ensures the local copy
is correct.

This is more than is strictly necessary for guest Rx requests since
only the id and gref fields are used and it is harmless if the
frontend modifies these.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 68a33bfd8403e4e22847165d149823a2e0e67c9c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-netback: don't use last request to determine minimum Tx credit

The last from guest transmitted request gives no indication about the
minimum amount of credit that the guest might need to send a packet
since the last packet might have been a small one.

Instead allow for the worst case 128 KiB packet.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 0f589967a73f1f30ab4ac4dd9ce0bb399b4d6357)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: Add RING_COPY_REQUEST()

Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly
(i.e., by not considering that the other end may alter the data in the
shared ring while it is being inspected). Safe usage of a request
generally requires taking a local copy.

Provide a RING_COPY_REQUEST() macro to use instead of
RING_GET_REQUEST() and an open-coded memcpy(). This takes care of
ensuring that the copy is done correctly regardless of any possible
compiler optimizations.

Use a volatile source to prevent the compiler from reordering or
omitting the copy.

This is part of XSA155.

CC: stable@vger.kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 454d5d882c7e412b840e3c99010fe81a9862f6fb)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring
KEYS: Fix race between key destruction and finding a keyring by name

KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring

The following sequence of commands:

    i=`keyctl add user a a @s`
    keyctl request2 keyring foo bar @t
    keyctl unlink $i @s

tries to invoke an upcall to instantiate a keyring if one doesn't already
exist by that name within the user's keyring set.  However, if the upcall
fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
other error code.  When the key is garbage collected, the key destroy
function is called unconditionally and keyring_destroy() uses list_empty()
on keyring->type_data.link - which is in a union with reject_error.
Subsequently, the kernel tries to unlink the keyring from the keyring names
list - which oopses like this:

BUG: unable to handle kernel paging request at 00000000ffffff8a
IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88
...
Workqueue: events key_garbage_collector
...
RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88
RSP: 0018:ffff88003e2f3d30  EFLAGS: 00010203
RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40
RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000
R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900
R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000
...
CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0
...
Call Trace:
[<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f
[<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351
[<ffffffff8105ec9b>] process_one_work+0x28e/0x547
[<ffffffff8105fd17>] worker_thread+0x26e/0x361
[<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8
[<ffffffff810648ad>] kthread+0xf3/0xfb
[<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
[<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70
[<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2

Note the value in RAX.  This is a 32-bit representation of -ENOKEY.

The solution is to only call ->destroy() if the key was successfully
instantiated.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Orabug: 22373388
CVE: CVE-2015-7872
(cherry picked from mainline commit f05819df10d7b09f6d1eb6f8534a8f68e5a4fe61)
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

KEYS: Fix race between key destruction and finding a keyring by name

There appears to be a race between:

(1) key_gc_unused_keys() which frees key->security and then calls
     keyring_destroy() to unlink the name from the name list

(2) find_keyring_by_name() which calls key_permission(), thus accessing
     key->security, on a key before checking to see whether the key usage is 0
     (ie. the key is dead and might be cleaned up).

Fix this by calling ->destroy() before cleaning up the core key data -
including key->security.

Reported-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Orabug: 22373388
(cherry picked from mainline commit 94c4554ba07adbdde396748ee7ae01e86cf2d8d7)
pre-requisite patch for mainline f05819df10d7b09f6d1eb6f8534a8f68e5a4fe61
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

Merge branch 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek:
x86/efi: back-out bug fix 22353360 which causes efi regression

x86/efi: back-out bug fix 22353360 which causes efi regression

Revert "conditionalize Secure Boot initialization on x86 platform"
Revert "x86/efi: Set securelevel when loaded without efi stub"

Orabug: 22363222

This reverts commit 289ef170a213f9b078886327a162cbcb7a325838.
This reverts commit a954f7350658a8fde4b893c7b74de8137864ad12.

We're reverting the two commits for now, but will continue to
investigate a correct fix to the original problem.

Signed-off-by: Dan Duval <dan.duval@oracle.com>

Merge branch 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek:
conditionalize Secure Boot initialization on x86 platform
x86/efi: Set securelevel when loaded without efi stub

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
KVM: svm: unconditionally intercept #DB
KVM: x86: work around infinite loop in microcode when #AC is delivered

conditionalize Secure Boot initialization on x86 platform

Orabug: 22353360

A previous commit, bbb87ba5690c72821614acdd465c24eeb99fa923,
introduced a call to efi_secure_boot_init() into the start_kernel()
sequence. The new call was not conditionalized based on platform.

Secure Boot is only implmented on the x86 platform; the definition
of efi_secure_boot_init() is in arch/x86/platform/efi/efi.c. On
non-x86 architectures, this caused a implicit-declaration error and
broke the build.

This commit wraps the call in #ifdef CONFIG_X86/#endif to eliminate
the error.

Signed-off-by: Dan Duval <dan.duval@oracle.com>

x86/efi: Set securelevel when loaded without efi stub

Orabug: 22353360

With UEFI Secure Boot enabled and securelevel set, after a kernel is loaded
using kexec and booted, securelevel is disabled. With the securelevel patch
set, the state of UEFI Secure Boot is queried when booted via the efi stub, but
kexec does not use the efi stub.

To allow kernels which are not loaded through the efi stub to properly set
securelevel as well, add a new init routine to start_kernel() to query the
state of UEFI Secure Boot and enable securelevel if needed.

Taken from https://bugzilla.redhat.com/attachment.cgi?id=1052836 .

Signed-off-by: Linn Crosetto <linn@hp.com>
Signed-off-by: Dan Duval <dan.duval@oracle.com>

KVM: svm: unconditionally intercept #DB

This is needed to avoid the possibility that the guest triggers
an infinite stream of #DB exceptions (CVE-2015-8104).

VMX is not affected: because it does not save DR6 in the VMCS,
it already intercepts #DB unconditionally.

Reported-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Orabug: 22333633
CVE: 2015-8104
mainline commit cbdb967af3d54993f5814f1cee0ed311a055377d
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

KVM: x86: work around infinite loop in microcode when #AC is delivered

It was found that a guest can DoS a host by triggering an infinite
stream of "alignment check" (#AC) exceptions. This causes the
microcode to enter an infinite loop where the core never receives
another interrupt. The host kernel panics pretty quickly due to the
effects (CVE-2015-5307).

Signed-off-by: Eric Northup <digitaleric@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Orabug: 22333632
CVE: CVE-2015-5307
mainline commit 54a20552e1eae07aa240fa370a0293e006b5faed
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
uek-rpm: module signing key verification on sparc
uek-rpm: rebuild module kabi list

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek:
ib_core: Add udata argument to alloc_shpd()

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek:
scsi: Fix a bdi reregistration race
i40e: Fix for recursive RTNL lock during PROMISC change

Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek:
KABI: padding optimization

Merge branch 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
Revert "netlink: Fix autobind race condition that leads to zero port ID"
Revert "netlink: Replace rhash_portid with bound"

uek-rpm: module signing key verification on sparc

Orabug: 21900415

Config fixes for bug:
ORACLE LINUX KERNEL MODULE SIGNING KEY VERIFICATION FAILS ON SPARC

Signed-off-by: Allen Pais <allen.pais@oracle.com>

scsi: Fix a bdi reregistration race

Unregister and reregister BDI devices in the proper order. This patch
avoids that the following kernel warning can get triggered:

WARNING: CPU: 7 PID: 203 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x68/0x80()
sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:32'
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[<ffffffff814ff5a4>] dump_stack+0x4c/0x65
[<ffffffff810746ba>] warn_slowpath_common+0x8a/0xc0
[<ffffffff81074736>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81237ca8>] sysfs_warn_dup+0x68/0x80
[<ffffffff81237d8e>] sysfs_create_dir_ns+0x7e/0x90
[<ffffffff81291f58>] kobject_add_internal+0xa8/0x320
[<ffffffff812923a0>] kobject_add+0x60/0xb0
[<ffffffff8138c937>] device_add+0x107/0x5e0
[<ffffffff8138d018>] device_create_groups_vargs+0xd8/0x100
[<ffffffff8138d05c>] device_create_vargs+0x1c/0x20
[<ffffffff8117f233>] bdi_register+0x63/0x2a0
[<ffffffff8117f497>] bdi_register_dev+0x27/0x30
[<ffffffff81281549>] add_disk+0x1a9/0x4e0
[<ffffffffa00c5739>] sd_probe_async+0x119/0x1d0 [sd_mod]
[<ffffffff8109a81a>] async_run_entry_fn+0x4a/0x140
[<ffffffff81091078>] process_one_work+0x1d8/0x7c0
[<ffffffff81091774>] worker_thread+0x114/0x460
[<ffffffff81097878>] kthread+0xf8/0x110
[<ffffffff8150801f>] ret_from_fork+0x3f/0x70

See also patch "block: destroy bdi before blockdev is unregistered"
(commit ID 6cd18e711dd8).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
(cherry picked from commit bf2cf3baa20b0a6cd2d08707ef05dc0e992a8aa0)

Orabug: 22250360
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
Conflicts:
drivers/scsi/scsi_sysfs.c

i40e: Fix for recursive RTNL lock during PROMISC change

The sync_vsi_filters function can be called directly under RTNL
or through the timer subtask without one. This was causing a deadlock.

If sync_vsi_filters is called from a thread which held the lock,
and in another thread the PROMISC setting got changed we would
be executing the PROMISC change in the thread which already held
the lock alongside the other filter update. The PROMISC change
requires a reset if we are on a VEB, which requires it to be called
under RTNL.

Earlier the driver would call reset for PROMISC change without
checking if we were already under RTNL and would try to grab it
causing a deadlock. This patch changes the flow to see if we are
already under RTNL before trying to grab it.

Orabug: 22328907

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

KABI: padding optimization

Orabug : 22239883

Removed pads affecting the network datapath in sock , sk_buff
net , pci_dev and iptunnel

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>

uek-rpm: rebuild module kabi list

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>

ib_core: Add udata argument to alloc_shpd()

ib_core: Add udata argument to alloc_shpd()

Add udata argument to shared pd interface alloc_shpd()
consistent with evolution of other similar ib_core
interfaces so providers that wish to support it can use it.

For providers (like current Mellanox driver code) that
do not expect user user data, we assert a warning.

Orabug: 21884873

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

Revert "netlink: Fix autobind race condition that leads to zero port ID"

Orabug: 22284865

This reverts commit 4e27762417669cb459971635be550eb7b5598286.

That commit, along with d48623677191e0f035d7afd344f92cf880b01f8e, have
been shown to produce a hang in the Oracle Real Application Clusters
(RAC) silent-installation procedure.

Signed-off-by: Dan Duval <dan.duval@oracle.com>

Revert "netlink: Replace rhash_portid with bound"

Orabug: 22284865

This reverts commit d48623677191e0f035d7afd344f92cf880b01f8e.

That commit, along with 4e27762, have been shown to provoke a hang
in the Oracle Real Application Clusters (RAC) silent-installation
procedure.

Signed-off-by: Dan Duval <dan.duval@oracle.com>

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
  mm-hugetlb-resv-map-memory-leak-for-placeholder-entries-v2
  mm/hugetlb.c: fix resv map memory leak for placeholder entries
  Prevent syncing frozen file system

Merge branch 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek:
ocfs2: fix SGID not inherited issue

Merge branch 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek:
kbuild: Set objects.builtin dependency to bzImage for CONFIG_CTF

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek:
ixgbe: make a workaround to tx hang issue under dom

ixgbe: make a workaround to tx hang issue under dom

report 1 tx queue to net core to workaround the tx hang
issue reported in Xen environment.
The change is only limited to dom0 and baremetal is left unchanged.

Orabug: 22171500
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

mm-hugetlb-resv-map-memory-leak-for-placeholder-entries-v2

Orabug: 22302415

V2: The original version of the patch did not correctly handle placeholder
entries before the range to be deleted. The new check is more specific
and only matches placeholders at the start of range.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit fd2e0def3e0954be0453b625ce12c48e4a83bc70)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

mm/hugetlb.c: fix resv map memory leak for placeholder entries

Orabug: 22302415

Dmitry Vyukov reported the following memory leak

unreferenced object 0xffff88002eaafd88 (size 32):
  comm "a.out", pid 5063, jiffies 4295774645 (age 15.810s)
  hex dump (first 32 bytes):
    28 e9 4e 63 00 88 ff ff 28 e9 4e 63 00 88 ff ff  (.Nc....(.Nc....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<     inline     >] kmalloc include/linux/slab.h:458
    [<ffffffff815efa64>] region_chg+0x2d4/0x6b0 mm/hugetlb.c:398
    [<ffffffff815f0c63>] __vma_reservation_common+0x2c3/0x390 mm/hugetlb.c:1791
    [<     inline     >] vma_needs_reservation mm/hugetlb.c:1813
    [<ffffffff815f658e>] alloc_huge_page+0x19e/0xc70 mm/hugetlb.c:1845
    [<     inline     >] hugetlb_no_page mm/hugetlb.c:3543
    [<ffffffff815fc561>] hugetlb_fault+0x7a1/0x1250 mm/hugetlb.c:3717
    [<ffffffff815fd349>] follow_hugetlb_page+0x339/0xc70 mm/hugetlb.c:3880
    [<ffffffff815a2bb2>] __get_user_pages+0x542/0xf30 mm/gup.c:497
    [<ffffffff815a400e>] populate_vma_page_range+0xde/0x110 mm/gup.c:919
    [<ffffffff815a4207>] __mm_populate+0x1c7/0x310 mm/gup.c:969
    [<ffffffff815b74f1>] do_mlock+0x291/0x360 mm/mlock.c:637
    [<     inline     >] SYSC_mlock2 mm/mlock.c:658
    [<ffffffff815b7a4b>] SyS_mlock2+0x4b/0x70 mm/mlock.c:648

Dmitry identified a potential memory leak in the routine region_chg, where
a region descriptor is not free'ed on an error path.

However, the root cause for the above memory leak resides in region_del.
In this specific case, a "placeholder" entry is created in region_chg.
The associated page allocation fails, and the placeholder entry is left in
the reserve map.  This is "by design" as the entry should be deleted when
the map is released.  The bug is in the region_del routine which is used
to delete entries within a specific range (and when the map is released).
region_del did not handle the case where a placeholder entry exactly
matched the start of the range range to be deleted.  In this case, the
entry would not be deleted and leaked.  The fix is to take these special
placeholder entries into account in region_del.

The region_chg error path leak is also fixed.

Fixes: feba16e25a57 ("mm/hugetlb: add region_del() to delete a specific range of entries")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8ea87f6c208bd070a2701b53d2479a415f4159f4)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Prevent syncing frozen file system

Orabug: 22332381

port from RH. https://lkml.org/lkml/2015/7/9/724
fs] vfs: Prevent syncing frozen file system (Lukas Czerner)
      [1243404 1241791]
port from RH. https://lkml.org/lkml/2015/7/9/487

From Lukas Czerner <>
Subject [RFC][PATCH] fs: Prevent syncing frozen file system
Date Thu, 9 Jul 2015 19:45:45 +0200

Currently we can end up in a deadlock because of broken
sb_start_write -> s_umount ordering.

The race goes like this:

- write the file
- unlink the file - final_iput will not be calles as file is opened
- freeze the file system
- Now simultaneously close the file and call sync (or syncfs on that
   particular file system). Sync will get to wait_sb_inodes() where it will
   grab the referece to the inode (__iget()) and later to call iput().
   If we manage to close the file and drop the reference in between those
   calls sync will attempt to do a iput_final() because the inode is now
   unlinked and we're holding the last reference to it. This will
   however block on a frozen file system (ext4_delete_inode for
   example).

Note that I've not been able to reproduce the issue, I've only seen this
happen once. However with some instrumentation (like msleep() in the
wait_sb_inodes() it can be achieved.

Fix this by properly doing sb_start_write/sb_end_write to prevent us
from fsfreeze.

Note that with this patch syncfs will block on the frozen file system
which is probably ok, but sync will block if any file system happens to
be frozen - not sure if that's a problem, but it's certainly different
from what we've been used to.

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>

ocfs2: fix SGID not inherited issue

Oracle-bug: 22311520

commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an issue,
SGID of sub dir was not inherited from its parents dir. It is because SGID
is set into "inode->i_mode" in ocfs2_get_init_inode(), but is overwritten
by "mode" which don't have SGID set later.

Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>

kbuild: Set objects.builtin dependency to bzImage for CONFIG_CTF

For x86 architecture use dependency on bzImage target instead of
vmlinux, otherwise the linux_banner in the debug vmlinux and the
vmlinuz that is shipped are different because vmlinux is now getting
rebuilt for ctf.

Orabug: 17510915
Orabug: 22329011

Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>

Merge branch 'topic/uek-4.1/sparc' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/sparc' of git://ca-git.us.oracle.com/linux-uek:
Do not execute i40e_macaddr_init if the macaddr is default

Do not execute i40e_macaddr_init if the macaddr is default

Orabug 22302847

If the macaddr is not from Open Firmwre or IDPROM (i.e., defaults
macaddr was used) then do not call i40e_macaddr_init again, else
you will get a driver init failure like this:

[ 8127.050926] WARNING: CPU: 18 PID: 878 at kernel/irq/manage.c:1346
+__free_irq+0x9f/0x230()
[ 8127.050927] Trying to free already-free IRQ 177
:
[ 8127.051005]  [<ffffffff810dd7ef>] __free_irq+0x9f/0x230
[ 8127.051006]  [<ffffffff810dda1d>] free_irq+0x4d/0xb0
[ 8127.051013]  [<ffffffffa043ddd0>] i40e_clear_interrupt_scheme+0xb0/0xc0
+[i40e]
[ 8127.051018]  [<ffffffffa044b538>] i40e_probe.part.64+0x1018/0x1320 [i40e]
[ 8127.051023]  [<ffffffff813dfc2f>] ? acpi_ut_remove_reference+0x2f/0x33
[ 8127.051026]  [<ffffffff813dc79a>] ? acpi_rs_get_prt_method_data+0x50/0x6d
[ 8127.051029]  [<ffffffff8170b2b6>] ? mutex_lock+0x16/0x37
[ 8127.051034]  [<ffffffff81057e3e>] ? mp_map_pin_to_irq+0xee/0x250
[ 8127.051035]  [<ffffffff81058044>] ? mp_map_gsi_to_irq+0xa4/0xd0
[ 8127.051038]  [<ffffffff8104e404>] ? acpi_register_gsi_ioapic+0x54/0x1d0
[ 8127.051043]  [<ffffffff815c577e>] ? pci_conf1_read+0xbe/0x120
[ 8127.051045]  [<ffffffff815c99e3>] ? raw_pci_read+0x23/0x40
[ 8127.051048]  [<ffffffff813676f0>] ? pci_bus_read_config_word+0xa0/0xb0
[ 8127.051053]  [<ffffffff813709f0>] ? do_pci_enable_device+0xf0/0x120
[ 8127.051057]  [<ffffffffa044b862>] i40e_probe+0x22/0x30 [i40e]
                                     :

Merge branch 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek:
dtrace: ensure return value of access_process_vm() is > 0

Merge branch 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek:
ocfs2: fix umask ignored issue

Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek:
ksplice: correctly clear garbage on signal handling.

dtrace: ensure return value of access_process_vm() is > 0

If access_process_vm() returns a value <= 0, the loop intended to replace
NULs with whitespace in the ps string could turn into a memory sprayer
by incrementing its loop counter all the way to (uintptr_t)-1. This
often shows up as crashes with PTEs showing 0x20 stuffed inside, e.g.:

[  630.136149] PGD 66f7f067 PUD 802a5067 PMD 18fff0067 PTE 2020200208ad2025
[  630.136194] Bad pagetable: 000d [#1] SMP

[ 1698.220469] PGD 5cebd067 PUD 5ced6067 PMD 17fe6d067 PTE 8020202032652867
[ 1698.220511] Bad pagetable: 000d [#1] SMP

[  438.486319] PGD 136cee067 PUD 136d9b067 PMD 13ba2067 PTE 802020012c8bf867
[  438.486364] Bad pagetable: 000d [#1] SMP

though it can cause other crashes. To fix, bracket the loop logic in
a test of access_process_vm()'s return value.

Orabug: 22295336

Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>

ocfs2: fix umask ignored issue

Oracle-bug: 22155833

New created file's mode is not masked with umask, and this makes umask not
work for ocfs2 volume.

Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure")
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8f1eb48758aacf6c1ffce18179295adbf3bd7640)

ksplice: correctly clear garbage on signal handling.

The test for _TIF_SIGPENDING was inverted, and so the stack was being
cleared when there were no signals pending rather than signals pending.
Correctly test _TIF_SIGPENDING so that the freezer can be used to clear
the stack of garbage when applying Ksplice updates.

Orabug: 22194459

Reviewed-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Jamie Iles <jamie.iles@oracle.com>

Merge branch 'topic/uek-4.1/nfs-rdma' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/nfs-rdma' of git://ca-git.us.oracle.com/linux-uek:
NFSoRDMA: for local permissions, pull lkey value from the correct ib_mr

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"

NFSoRDMA: for local permissions, pull lkey value from the correct ib_mr

When not using local_dma_lkey, fmr_op_open() fetches a ib_mr associated
with local write permission. This value is stuffed into the returned
rpcrdma_ia, but ia->ri_dma_mr->lkey was accessed _before_ the ia->ri_dma_mr
was updated.

Fix this by pulling lkey directly from the newly fetched ib_mr rather than
the value previously stored in ia->ri_dma_mr.

Orabug: 22202841

Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>

Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"

Orabug: 22186705

This reverts commit 0ea14034782db03efc55a7d8dbfa7cd140e7e959.

This commit has been seen to cause hard hangs or crashes when
flock(..., LOCK_EX) is attempted on an NFSv4-mounted filesystem.

Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
  uek-rpm: builds: Enable kabi check
  uek-rpm: builds: generate module kabi files
  uek-rpm: builds: add kabi whitelist debug version

uek-rpm: builds: Enable kabi check

Orabug: 21882206

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

uek-rpm: builds: generate module kabi files

Orabug: 17437969

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

uek-rpm: builds: add kabi whitelist debug version

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
uek-rpm: builds: add kabi name tags

uek-rpm: builds: add kabi name tags

Orabug: 17437969

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
uek-rpm: builds: Add kabi whitelist

uek-rpm: builds: Add kabi whitelist

Orabug: 17437969

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
  uek-rpm: configs: change the x86_64 default governor to ondemand
  uek-rpm: configs: sync up the EFIVAR_FS between ol6 and ol7
  uek-rpm: use the latest 0.5 version of linux-firmware

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
  mm-hugetlbfs-fix-bugs-in-fallocate-hole-punch-of-areas-with-holes-v3
  mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes
  btrfs: Print Warning only if ENOSPC_DEBUG is enabled
  rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats

Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek:
pci: Limit VPD length for megaraid_sas adapter
KABI Padding to allow future extensions

Merge branch 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek:
  dtrace: fire proc:::signal-send for queued signals too
  dtrace: correct signal-handle probe semantics
  dtrace: remove trailing space in psargs

mm-hugetlbfs-fix-bugs-in-fallocate-hole-punch-of-areas-with-holes-v3

Orabug: 22220400

V3:
  Add more descriptive comments and minor improvements as suggested by
  Naoya Horiguchi
v2:
  Make remove_inode_hugepages simpler after verifying truncate can not
  race with page faults here.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d4d7420e86e29267ff2e20226cbcadcf7e6eab8c)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes

Orabug: 22220400

Hugh Dickins pointed out problems with the new hugetlbfs fallocate hole
punch code.  These problems are in the routine remove_inode_hugepages and
mostly occur in the case where there are holes in the range of pages to be
removed.  These holes could be the result of a previous hole punch or
simply sparse allocation.  The current code could access pages outside the
specified range.

remove_inode_hugepages handles both hole punch and truncate operations.
Page index handling was fixed/cleaned up so that the loop index always
matches the page being processed.  The code now only makes a single pass
through the range of pages as it was determined page faults could not race
with truncate.  A cond_resched() was added after removing up to
PAGEVEC_SIZE pages.

Some totally unnecessary code in hugetlbfs_fallocate() that remained from
early development was also removed.

Tested with fallocate tests submitted here:
http://librelist.com/browser//libhugetlbfs/2015/6/25/patch-tests-add-tests-for-fallocate-system-call/
And, some ftruncate tests under development

Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.3]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ef9a2b7a46755b6b2d4ab522c2ffa53c6e1a0729)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

btrfs: Print Warning only if ENOSPC_DEBUG is enabled

Orabug: 21626666

Signed-off-by : Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Bo Liu <bo.li.liu@oracle.com>

rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats

Many commonly used functions like getifaddrs() invoke RTM_GETLINK
to dump the interface information, and do not need the
the AF_INET6 statististics that are always returned by default
from rtnl_fill_ifinfo().

Computing the statistics can be an expensive operation that impacts
scaling, so it is desirable to avoid this if the information is
not needed.

This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that
can be passed with netlink_request() to avoid statistics computation
for the ifinfo path.

Orabug: 21857538

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

pci: Limit VPD length for megaraid_sas adapter

Reading or Writing of PCI VPD data causes system panic.
We saw this problem by running "lspci -vvv" in the beginning.
However this can be easily reproduced by running
cat /sys/bus/devices/XX../vpd

VPD length has been set as 32768 by default. Accessing vpd
will trigger read/write of 32k. This causes problem as we
could read data beyond the VPD end tag. Behaviour is un-
predictable when this happens. I see some other adapter doing
similar quirks(commit bffadffd43d4 ("PCI: fix VPD limit quirk
for Broadcom 5708S"))

I see there is an attempt to fix this right way.
https://patchwork.ozlabs.org/patch/534843/ or
https://lkml.org/lkml/2015/10/23/97

Tried to fix it this way, but problem is I dont see the proper
start/end TAGs(at least for this adapter) at all. The data is
mostly junk or zeros. This patch fixes the issue by setting the
vpd length to 0x80.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Orabug: 22104511

Changes since v2 -> v3
Changed the vpd length from 0 to 0x80 which leaves the
option open for someone to read first few bytes.

Changes since v1 -> v2
Removed the changes in pci_id.h. Kept all the vendor
ids in quirks.c

uek-rpm: configs: change the x86_64 default governor to ondemand

Orabug: 21910845

Signed-off-by: Todd Vierling <todd.vierling@oracle.com>

uek-rpm: configs: sync up the EFIVAR_FS between ol6 and ol7

Orabug: 21806900

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

KABI Padding to allow future extensions

Padding for struct Scsi_Host , pci_dev , pci_driver for future extension while preserving ABI

Orabug: 22227652

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>

uek-rpm: use the latest 0.5 version of linux-firmware

Orabug: 22227047

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
  uek-rpm: configs: Sparc64: enable RDS modules
  uek-rpm: ol7: update linux-firmware dependency to 20140911-0.1.git365e80c.0.4
  uek-rpm: configs: disable PS2_VMMOUSE to avoid vmware platform breakage
  uek-rpm: configs: ol7: don't set EFI_VARS_PSTORE_DEFAULT_DISABLE

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
  mm: do not ignore mapping_gfp_mask in page cache allocation paths
  PCI: Set SR-IOV NumVFs to zero after enumeration
  virtio-net: drop NETIF_F_FRAGLIST
  blk-mq: avoid excessive boot delays with large lun counts

Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek:
KABI PADDING FOR ORACLE SPECIFIC FUTURE EXTENSIONS
Disable VLAN 0 tagging for none VLAN traffic

Merge branch 'topic/uek-4.1/sparc' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/sparc' of git://ca-git.us.oracle.com/linux-uek:
  SPARC64: UEK4 LDOMS DOMAIN SERVICES UPDATE 1
  i40e: Look up MAC address in Open Firmware or IDPROM
  sparc64, vdso: update the CLOCK_MONOTONIC_COARSE clock

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek:
  RDS: establish connection for legitimate remote RDMA message
  rds: remove the _reuse_ rds ib pool statistics
  RDS: Add support for per socket SO_TIMESTAMP for incoming messages
  RDS: Fix out-of-order RDS_CMSG_RDMA_SEND_STATUS
  Integrate Uvnic functionality into uek-4.1 Revision 8008
  1) S_IRWXU causing kernel soft crash changing to 0644 WARNING: CPU: 0 PID: 20907 at fs/sysfs/group.c:61 create_files+0x171/0x180() Oct 12 21:43:14 ovn87-180 kernel: [252606.588541] Attribute vhba_default_scsi_timeout: Invalid permissions 0700 [Rev 8008]
  1) Support vnic for EDR based platform(uVnic) 2) Supported Types now Type 0 - XSMP_XCM_OVN - Xsigo VP780/OSDN standalone Chassis, (add pvi) Type 1 - XSMP_XCM_NOUPLINK - EDR Without uplink (add public-network) Type 2 - XSMP_XCM_UPLINK -EDR with uplink (add public-network <with -if> 3) Intelligence in driver to support all the modes 4) Added Code for printing Multicast LID [Revision 8008] 5) removed style errors
  net/rds: start rdma listening after ib/iw initialization is done

mm: do not ignore mapping_gfp_mask in page cache allocation paths

page_cache_read, do_generic_file_read, __generic_file_splice_read and
__ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling
add_to_page_cache_lru which might cause recursion into fs down in the
direct reclaim path if the mapping really relies on GFP_NOFS semantic.

This doesn't seem to be the case now because page_cache_read (page fault
path) doesn't seem to suffer from the reclaim recursion issues and
do_generic_file_read and __generic_file_splice_read also shouldn't be
called under fs locks which would deadlock in the reclaim path. Anyway it
is better to obey mapping gfp mask and prevent from later breakage.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6afdb859b71019143b8eecda02b8b29b03185055)

Orabug: 22066703

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

PCI: Set SR-IOV NumVFs to zero after enumeration

Orabug: 21547430

The enumeration path should leave NumVFs set to zero. But after
4449f079722c ("PCI: Calculate maximum number of buses required for VFs"),
we call virtfn_max_buses() in the enumeration path, which changes NumVFs.
This NumVFs change is visible via lspci and sysfs until a driver enables
SR-IOV.

Set NumVFs to zero after virtfn_max_buses() computes the maximum number of
buses.

Fixes: 4449f079722c ("PCI: Calculate maximum number of buses required for VFs")
Based-on-patch-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

virtio-net: drop NETIF_F_FRAGLIST

virtio-net: drop NETIF_F_FRAGLIST

Orabug: 22154074
CVE-2015-5156

virtio declares support for NETIF_F_FRAGLIST, but assumes
that there are at most MAX_SKB_FRAGS + 2 fragments which isn't
always true with a fraglist.

A longer fraglist in the skb will make the call to skb_to_sgvec overflow
the sg array, leading to memory corruption.

Drop NETIF_F_FRAGLIST so we only get what we can handle.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 48900cb6af4282fa0fb6ff4d72a81aa3dadb5c39)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

blk-mq: avoid excessive boot delays with large lun counts

Zhangqing Luo reported long boot times on a system with thousands of
LUNs when scsi-mq was enabled.  He narrowed the problem down to
blk_mq_add_queue_tag_set, where every queue is frozen in order to set
the BLK_MQ_F_TAG_SHARED flag.  Each added device will freeze all queues
added before it in sequence, which involves waiting for an RCU grace
period for each one.  We don't need to do this.  After the second queue
is added, only new queues need to be initialized with the shared tag.
We can do that by percolating the flag up to the blk_mq_tag_set, and
updating the newly added queue's hctxs if the flag is set.

This problem was introduced by commit 0d2602ca30e41 (blk-mq: improve
support for shared tags maps).

Reported-and-tested-by: Jason Luo <zhangqing.luo@oracle.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2404e607a9ee36db361bebe32787dafa1f7d6c00)

Orabug: 21879600
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>

RDS: establish connection for legitimate remote RDMA message

The first message to a remote node should prompt a new
connection even if it is RDMA operation via CMSG. So that
means before CMSG parsing, the connection needs to be
established. Commit 3d6e0fed8edc ("rds_rdma: rds_sendmsg
should return EAGAIN if connection not setup")' tried to
address that issue as part of bug 20232581.

But it inadvertently broke the QoS policy evaluation. Basically
QoS has opposite requirement where it needs information from
CMSG to evaluate if the message is legitimate to be sent over
the wire. It basically needs to know how the total payload
which should include the actual payload and additional rdma
bytes. It then evaluates total payload with the systems QoS
thresholds to determine if the message is legitimate to be
sent.

Patch addresses these two opposite requirement by fetching
only the rdma bytes information for QoS evaluation and let
the full CMSG parsing happen after the connection is
initiated. Since the connection establishment is asynchronous,
we make sure the map failure because of unavailable
connection reach to the user by appropriate error code.

Orabug: 22139696

Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

rds: remove the _reuse_ rds ib pool statistics

Orabug: 22124214
fix a regress introduced by ceb99ba579a769f4e02375a3d52e36f44ae5f27f

The above commit introduced the two new statistics to rds_ib_statistics.

uint64_t s_ib_rdma_mr_1m_pool_reuse;
uint64_t s_ib_rdma_mr_8k_pool_reuse;

But didn't have rds-info changed accordingly thus rds-info gets shifted stats.

this removes these two stats.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>

RDS: Add support for per socket SO_TIMESTAMP for incoming messages

The SO_TIMESTAMP generates time stamp for each incoming RDS message
using the wall time. Result is returned via recv_msg() in a control
message as timeval (usec resolution).

User app can enable it by using SO_TIMESTAMP setsocketopt() at
SOL_SOCKET level. CMSG data of cmsg type SO_TIMESTAMP contains the
time stamp in struct timeval format.

Orabug: 22190837

Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Tested-by: Namrata Jampani <namrata.jampani@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

KABI PADDING FOR ORACLE SPECIFIC FUTURE EXTENSIONS

Orabug 21903851

Padding added before ABI freeze to allow future extensions while
preserving ABI on interface structures scsi_disk ,queue_limits,block_device_operations,
sk_buff,user_namespace,ip_tunnel,net,sock,scsi_cmnd ,scsi_device

Reviwed-by: Martin Petersen <martin.petersen@oracle.com>
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>