www.infradead.org Git - users/jedix/linux-maple.git/log

]> www.infradead.org Git - users/jedix/linux-maple.git/log

projects / users / jedix / linux-maple.git / log

commit | commitdiff | tree

Chuck Anderson [Fri, 28 Oct 2016 21:36:16 +0000 (14:36 -0700)]

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Chuck Anderson [Fri, 28 Oct 2016 21:34:57 +0000 (14:34 -0700)]

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Mukesh Kacker [Thu, 27 Oct 2016 13:07:16 +0000 (06:07 -0700)]

mlx4_ib: remove WARN_ON() based on incorrect assumptions

A WARN_ON() was inserted when user data was introduced to the
ibv_cmd_alloc_shpd() by another infiniband provider to make
sure that no user data is sent to older providers such as this
one which do not expect it.

It assumed when no user data is sent the udata->inlen is zero.
The user-kernel API however always sends at least 8 octets
(which may not be initialized in case of provider libraries that
do not user user data).

We remove the WARN_ON and rely on providers not to touch that
field if their companion library is not expected to initialize it.

Orabug: 24972331

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

commit | commitdiff | tree

Ashok Vairavan [Thu, 27 Oct 2016 19:31:52 +0000 (12:31 -0700)]

nvme: fix max_segments integer truncation

The block layer uses an unsigned short for max_segments. The way we
calculate the value for NVMe tends to generate very large 32-bit values,
which after integer truncation may lead to a zero value instead of
the desired outcome.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 24928835
Cherry picked commit: 45686b6198bd824f083ff5293f191d78db9d708a
Conflicts:
UEK4 QU2 nvme module doesn't have core.c file. All the functions
resides in pci.c. Hence, this patch is manually ported to the respective
function in pci.c
drivers/nvme/host/core.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

commit | commitdiff | tree

Chuck Anderson [Thu, 27 Oct 2016 17:53:17 +0000 (10:53 -0700)]

Merge branch topic/uek-4.1/xen of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Conflicts:
arch/x86/xen/enlighten.c

commit | commitdiff | tree

Guru Anbalagane [Tue, 25 Oct 2016 00:26:50 +0000 (17:26 -0700)]

Revert "ib/mlx4: Initialize multiple Mellanox HCAs in parallel"

This reverts commit a661980d2a809dbe208914b8eec46c18b78c18fb.

Orabug: 24951493
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

commit | commitdiff | tree

Chuck Anderson [Tue, 25 Oct 2016 08:15:49 +0000 (01:15 -0700)]

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Chuck Anderson [Tue, 25 Oct 2016 08:12:56 +0000 (01:12 -0700)]

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Ashok Vairavan [Sat, 22 Oct 2016 01:02:59 +0000 (18:02 -0700)]

No ILOM web console keyboard support in ueknano kernel

Enable HID driver in the OL6 ueknano kernel config.

Orabug: 24946756
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

commit | commitdiff | tree

Santosh Shilimkar [Fri, 14 Oct 2016 23:47:49 +0000 (16:47 -0700)]

mlx4_core/ib: set the IB port MTU to 2K

'commit 096335b3f983 ("mlx4_core: Allow dynamic MTU configuration for IB
ports")' overwrite the default port MTU and sets it as 4K. Since this
directly impacts the HW VLs supported and Oracle workloads heavily uses
all supported 8 VLs for traffic classification, 2K default needs to
be kept as is.

We initilise it to default 2k so that the feature(dynamic MTU configuration)
is still available for non DB users to set the desired MTU value using sysctl.

Also for CX2 cards, commit 596c5ff4b7b3 ("net/mlx4: adjust initial
value of vl_cap in mlx4_SET_PORT") broke the vl_cap which made the
supported VLs to 4 irrespective of MTU size.

Orabug: 24946479

Tested-by: Pierre Orzechowski <pierre.e.orzechowski@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

commit | commitdiff | tree

Knut Omang [Fri, 21 Oct 2016 06:23:39 +0000 (08:23 +0200)]

sif: cq: transfer headroom attribute to user mode

This commit makes sure old libsif versions works
with the driver while providing a forward compatible
way of making additional changes to the extra
headroom in the CQs.

We anticipate to be able to trim
down the extra entries once we have PQP errors
handled transparently. This commit then ensures that
the headroom is only set in one place, at the
driver side, and that user mode just can
pick up the configured headroom from the kernel.
This is done by providing the used headroom
in a formerly reserved 32 bit field, thus no changes
to the packet size is necessary.

Nevertheless we increment the abi version from
3.6 to 3.7 to allow libsif to detect whether
the headroom field can be trusted.

Orabug: 24926265

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: franklin <osl04sys_no_grp@oracle.com>

commit | commitdiff | tree

Knut Omang [Thu, 20 Oct 2016 12:06:14 +0000 (14:06 +0200)]

sif: Minor cleanup commit

- Remove some unused variables
- Minor changes due to bug fixes in hardware header
file generator

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Francisco Triviño <francisco.trivino@oracle.com>

commit | commitdiff | tree

Knut Omang [Thu, 13 Oct 2016 10:07:33 +0000 (12:07 +0200)]

sif: Add vendor flag to support testing without oversized CQs

After introduction of extra CQ entries to reduce risk of
having duplicate completions overflow a CQ, we no longer can
trigger various CQ overflow scenarios without running a lot of
requests. We need to be able to test with a minimal set of operations
to allow co-sim based tests for further analysis.

Introduce a new vendor_flag no_x_cqe = 0x80 to turn off
the allocation of extra CQEs.

Orabug: 24919301

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Francisco Triviño <francisco.trivino@oracle.com>

commit | commitdiff | tree

Wei Lin Guay [Thu, 13 Oct 2016 07:06:15 +0000 (09:06 +0200)]

sif: cq: Fix the max_cqe capability supported by SIF

Orabug: 24673784

This patch fixes an incomplete patch in commit "cq: Add
additional SIF visible cqes to CQ". The max_cqe
capability reported by query_device is incorrect because
it includes the SIF visible cqes.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

commit | commitdiff | tree

Vinay Shaw [Wed, 14 Sep 2016 20:31:13 +0000 (22:31 +0200)]

sif: qp_attr: Fix qp attributes for query_qp verb

Orabug: 21946858

Following QP attributes were incorrectly reported:
1) max_rd_atomic
2) service level
3) alternate pkey index
4) alternate ack timeout
5) alternate address handle

The initial commit with the same title was somehow probably
subject to a merge issue and it's effect got lost entirely
by a subsequent patch.

Signed-off-by: Vinay Shaw <vinay.shaw@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

commit | commitdiff | tree

Wei Lin Guay [Fri, 7 Oct 2016 08:07:11 +0000 (10:07 +0200)]

sif: qp: Fix modify_qp_hw from SQE to RTS

Orabug: 24810237

Fix an issue in modify_qp_hw from SQE to RTS returns
EPSC_MODIFY_CANNOT_CHANGE_QP_ATTR. In sif, qp transition
from SQE to RTS must explicitly set the req_access_error
to 0.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

commit | commitdiff | tree

Wei Lin Guay [Tue, 4 Oct 2016 10:41:32 +0000 (12:41 +0200)]

sif: pd: Implement Oracle ib_core compliance shared pd

Orabug: 24713410

shared pd is not an IBTA defined feature, but an Oracle
Linux extension. Even though PSIF can share a pd easily,
it must comply with the Oracle ib_core implementation which
requires a new pd "object" when reusing a pd (via share_pd
verbs).

Without a new pd "object", it causes a NULL pointer deference
during pd clean-up phase. Thus, this patch creates a new pd
"object" when reusing a pd, and this pd "object" is pointing
to the original pd index.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

commit | commitdiff | tree

Francisco Triviño [Fri, 30 Sep 2016 08:35:14 +0000 (10:35 +0200)]

sif: eq: Add timeout to the threaded interrupt handler

This commit implements a timeout that prevents soft lockup issues when
the threaded interrupt function (sif_intr_worker) keeps processing
events for a long period. If the timeout is reached, the threaded
handler returns IRQ_HANDLED even if there are more events to be
processed. In such a case, the coalescing mechanism will generate
an IRQ for the last event.

Orabug: 24839976

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

commit | commitdiff | tree

Majd Dibbiny [Mon, 19 Sep 2016 15:32:22 +0000 (18:32 +0300)]

IB/mlx4: Scatter CQs to different EQs

If the user does not request a specific comp vector, use a weight based
algorithm to set the EQ.

Orabug: 24705943

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

commit | commitdiff | tree

Santosh Shilimkar [Thu, 29 Sep 2016 18:07:11 +0000 (11:07 -0700)]

RDS: IB: fix panic with handlers running post teardown

Shutdown cqe reaping loop takes care of emptying the
CQ's before they being destroyed. And once tasklets are
killed, the hanlders are not expected to run.

But because of core tasklet BUG, tasklet handler could
still run after tasklet_kill which lead can lead to kernel
panic. Fix for core tasklet code was proposed and accepted
upstream, but it comes with bagage of fixing quite a
few bad users of it. Also for receive, we have additional
kthread to take care.

The BUG fix done as part of Orabug 2446085, had an additional
assumption that reaping code won't reap all the CQEs after
QP moved to error state which was not correct. QP is
moved to error state as part of rdma_disconnect() and
all the CQEs are reaped by the loop properly.

Any handler running after above and trying to access the
qp/cq resources gets exposed to race conditions. Patch
fixes this race by makes sure that handlers returns
without any action post teardown.

Orabug: 24460805

Reviewed-by: Wengang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

commit | commitdiff | tree

Chuck Anderson [Mon, 24 Oct 2016 00:05:13 +0000 (17:05 -0700)]

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Chuck Anderson [Mon, 24 Oct 2016 00:02:46 +0000 (17:02 -0700)]

uek-rpm nano: replace linux-firmware dependency with linux-nano-firmware

The Linux nano kernel now uses linux-nano-firmware. Replace the
linux-firmware dependency with it.

Orabug: 24938352

Reviewed-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

commit | commitdiff | tree

Chuck Anderson [Fri, 21 Oct 2016 02:38:24 +0000 (19:38 -0700)]

Merge branch topic/uek-4.1/dtrace of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Chuck Anderson [Fri, 21 Oct 2016 02:29:52 +0000 (19:29 -0700)]

Merge branch topic/uek-4.1/xen of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Chuck Anderson [Thu, 20 Oct 2016 23:56:47 +0000 (16:56 -0700)]

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Chuck Anderson [Thu, 20 Oct 2016 23:53:37 +0000 (16:53 -0700)]

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Conflicts:
include/linux/mm.h

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Oct 2016 20:07:36 +0000 (13:07 -0700)]

mm: remove gup_flags FOLL_WRITE games from __get_user_pages()

This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").

In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better).  The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9.  Earlier kernels will
have to look at the page state itself.

Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.

Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619)
Orabug: 24926639
Conflicts:
        include/linux/mm.h
        mm/gup.c
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:36 +0000 (17:56 +0200)]

x86/acpi: store ACPI ids from MADT for future usage

Currently we don't save ACPI ids (unlike LAPIC ids which go to
x86_cpu_to_apicid) from MADT and we may need this information later.
Particularly, ACPI ids is the only existent way for a PVHVM Xen guest
to figure out Xen's idea of its vCPUs ids before these CPUs boot and
in some cases these ids diverge from Linux's cpu ids.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 3e9e57fad3d8530aa30787f861c710f598ddc4e7)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Filipe Manco [Thu, 15 Sep 2016 15:10:46 +0000 (17:10 +0200)]

xen-netback: fix error handling on netback_probe()

In case of error during netback_probe() (e.g. an entry missing on the
xenstore) netback_remove() is called on the new device, which will set
the device backend state to XenbusStateClosed by calling
set_backend_state(). However, the backend state wasn't initialized by
netback_probe() at this point, which will cause and invalid transaction
and set_backend_state() to BUG().

Initialize the backend state at the beginning of netback_probe() to
XenbusStateInitialising, and create two new valid state transitions on
set_backend_state(), from XenbusStateInitialising to XenbusStateClosed,
and from XenbusStateInitialising to XenbusStateInitWait.

Signed-off-by: Filipe Manco <filipe.manco@neclab.eu>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cce94483e47e8e3d74cf4475dea33f9fd4b6ad9f)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Vitaly Kuznetsov [Fri, 29 Jul 2016 09:06:48 +0000 (11:06 +0200)]

xen: change the type of xen_vcpu_id to uint32_t

We pass xen_vcpu_id mapping information to hypercalls which require
uint32_t type so it would be cleaner to have it as uint32_t. The
initializer to -1 can be dropped as we always do the mapping before using
it and we never check the 'not set' value anyway.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 55467dea2967259f21f4f854fc99d39cc5fea60e)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Mon, 15 Aug 2016 15:02:38 +0000 (09:02 -0600)]

xenbus: don't look up transaction IDs for ordinary writes

This should really only be done for XS_TRANSACTION_END messages, or
else at least some of the xenstore-* tools don't work anymore.

Fixes: 0beef634b8 ("xenbus: don't BUG() on user mode induced condition")
Reported-by: Richard Schütz <rschuetz@uni-koblenz.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Richard Schütz <rschuetz@uni-koblenz.de>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 9a035a40f7f3f6708b79224b86c5777a3334f7ea)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Bob Liu [Wed, 27 Jul 2016 09:42:04 +0000 (17:42 +0800)]

xen-blkfront: free resources if xlvbd_alloc_gendisk fails

Current code forgets to free resources in the failure path of
xlvbd_alloc_gendisk(), this patch fix it.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 4e876c2bd37fbb5c37a4554a79cf979d486f0e82)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

Conflicts:
drivers/block/xen-blkfront.c

commit | commitdiff | tree

Juergen Gross [Tue, 26 Jul 2016 12:15:11 +0000 (14:15 +0200)]

xen: add static initialization of steal_clock op to xen_time_ops

pv_time_ops might be overwritten with xen_time_ops after the
steal_clock operation has been initialized already. To prevent calling
a now uninitialized function pointer add the steal_clock static
initialization to xen_time_ops.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit d34c30cc1fa80f509500ff192ea6bc7d30671061)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:43 +0000 (17:56 +0200)]

xen/pvhvm: run xen_vcpu_setup() for the boot CPU

Historically we didn't call VCPUOP_register_vcpu_info for CPU0 for
PVHVM guests (while we had it for PV and ARM guests). This is usually
fine as we can use vcpu info in the shared_info page but when we try
booting on a vCPU with Xen's vCPU id > 31 (e.g. when we try to kdump
after crashing on this CPU) we're not able to boot.

Switch to always doing VCPUOP_register_vcpu_info for the boot CPU.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ee42d665d3f5db975caf87baf101a57235ddb566)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:42 +0000 (17:56 +0200)]

xen/evtchn: use xen_vcpu_id mapping

Use the newly introduced xen_vcpu_id mapping to get Xen's idea of vCPU
id for CPU0.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit cbbb4682394c45986a34d8c77a02e7a066e30235)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:41 +0000 (17:56 +0200)]

xen/events: fifo: use xen_vcpu_id mapping

EVTCHNOP_init_control has vCPU id as a parameter and Xen's idea of
vCPU id should be used. Use the newly introduced xen_vcpu_id mapping
to convert it from Linux's id.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit be78da1cf43db4c1a9e13af8b6754199a89d5d75)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:40 +0000 (17:56 +0200)]

xen/events: use xen_vcpu_id mapping in events_base

EVTCHNOP_bind_ipi and EVTCHNOP_bind_virq pass vCPU id as a parameter
and Xen's idea of vCPU id should be used. Use the newly introduced
xen_vcpu_id mapping to convert it from Linux's id.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 8058c0b897e7d1ba5c900cb17eb82aa0d88fca53)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:39 +0000 (17:56 +0200)]

x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info

shared_info page has space for 32 vcpu info slots for first 32 vCPUs
but these are the first 32 vCPUs from Xen's perspective and we should
map them accordingly with the newly introduced xen_vcpu_id mapping.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e15a8621935cac527b4e0ed4078d24c3e5ef73a6)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

Conflicts:
arch/x86/xen/enlighten.c

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:38 +0000 (17:56 +0200)]

x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op

HYPERVISOR_vcpu_op() passes Linux's idea of vCPU id as a parameter
while Xen's idea is expected. In some cases these ideas diverge so we
need to do remapping.

Convert all callers of HYPERVISOR_vcpu_op() to use xen_vcpu_nr().

Leave xen_fill_possible_map() and xen_filter_cpu_maps() intact as
they're only being called by PV guests before perpu areas are
initialized. While the issue could be solved by switching to
early_percpu for xen_vcpu_id I think it's not worth it: PV guests will
probably never get to the point where their idea of vCPU id diverges
from Xen's.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ad5475f9faf5186b7f59de2c6481ee3e211f1ed7)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

Conflicts:
arch/x86/xen/enlighten.c
arch/x86/xen/time.c

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:37 +0000 (17:56 +0200)]

xen: introduce xen_vcpu_id mapping

It may happen that Xen's and Linux's ideas of vCPU id diverge. In
particular, when we crash on a secondary vCPU we may want to do kdump
and unlike plain kexec where we do migrate_to_reboot_cpu() we try
booting on the vCPU which crashed. This doesn't work very well for
PVHVM guests as we have a number of hypercalls where we pass vCPU id
as a parameter. These hypercalls either fail or do something
unexpected.

To solve the issue introduce percpu xen_vcpu_id mapping. ARM and PV
guests get direct mapping for now. Boot CPU for PVHVM guest gets its
id from CPUID. With secondary CPUs it is a bit more
trickier. Currently, we initialize IPI vectors before these CPUs boot
so we can't use CPUID. Use ACPI ids from MADT instead.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 88e957d6e47f1232ad15b21e54a44f1147ea8c1b)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

Conflicts:
arch/arm/xen/enlighten.c

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 30 Jun 2016 15:56:35 +0000 (17:56 +0200)]

x86/xen: update cpuid.h from Xen-4.7

Update cpuid.h header from xen hypervisor tree to get
XEN_HVM_CPUID_VCPU_ID_PRESENT definition.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit de2f5537b397249e91cafcbed4de64a24818542e)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

David Vrabel [Mon, 11 Jul 2016 14:45:51 +0000 (15:45 +0100)]

xen/evtchn: add IOCTL_EVTCHN_RESTRICT

IOCTL_EVTCHN_RESTRICT limits the file descriptor to being able to bind
to interdomain event channels from a specific domain. Event channels
that are already bound continue to work for sending and receiving
notifications.

This is useful as part of deprivileging a user space PV backend or
device model (QEMU). e.g., Once the device model as bound to the
ioreq server event channels it can restrict the file handle so an
exploited DM cannot use it to create or bind to arbitrary event
channels.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
(cherry picked from commit fbc872c38c8fed31948c85683b5326ee5ab9fccc)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Thu, 7 Jul 2016 07:38:13 +0000 (01:38 -0600)]

xen-blkback: really don't leak mode property

Commit 9d092603cc ("xen-blkback: do not leak mode property") left one
path unfixed; correct this.

Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit aea305e11f7a7af12aa2beb7c7e053a338659c49)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Thu, 7 Jul 2016 07:38:58 +0000 (01:38 -0600)]

xen-blkback: constify instance of "struct attribute_group"

The functions these get passed to have been taking pointers to const
since at least 2.6.16.

Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 530439484d2d9f2a7f1038b1afd3d3543ecc63f6)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Thu, 7 Jul 2016 08:05:46 +0000 (02:05 -0600)]

xen-blkfront: prefer xenbus_scanf() over xenbus_gather()

... for single items being collected: It is more typesafe (as the
compiler can check format string and to-be-written-to variable match)
and requires one less parameter to be passed.

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit ff595325ed556fb4b83af5b9ffd5c427c18405d7)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Thu, 7 Jul 2016 08:05:21 +0000 (02:05 -0600)]

xen-blkback: prefer xenbus_scanf() over xenbus_gather()

... for single items being collected: It is more typesafe (as the
compiler can check format string and to-be-written-to variable match)
and requires one less parameter to be passed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 6694389af9be4d1eb8d3313788a902f0590fb8c2)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Paul Gortmaker [Thu, 14 Jul 2016 00:18:59 +0000 (20:18 -0400)]

x86/xen: Audit and remove any unnecessary uses of module.h

Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20160714001901.31603-7-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7a2463dcacee3f2f36c78418c201756372eeea6b)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Sat, 9 Jul 2016 00:35:30 +0000 (17:35 -0700)]

Input: xen-kbdfront - prefer xenbus_write() over xenbus_printf() where possible

... as being the simpler variant.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
(cherry picked from commit cd6763be8f553c7db421d38ddcb36466fb8512cd)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

Conflicts:
drivers/input/misc/xen-kbdfront.c

commit | commitdiff | tree

Juergen Gross [Wed, 6 Jul 2016 05:00:30 +0000 (07:00 +0200)]

xen: support runqueue steal time on xen

Up to now reading the stolen time of a remote cpu was not possible in a
performant way under Xen. This made support of runqueue steal time via
paravirt_steal_rq_enabled impossible.

With the addition of an appropriate hypervisor interface this is now
possible, so add the support.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6ba286ad845799b135e5af73d1fbc838fa79f709)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Juergen Gross [Wed, 6 Jul 2016 05:00:28 +0000 (07:00 +0200)]

xen: update xen headers

Update some Xen headers to be able to use new functionality.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 7ba8dba95cb227eb6c270b1aa77f942e45f5e47c)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Wed, 6 Jul 2016 07:00:14 +0000 (01:00 -0600)]

xen-pciback: drop superfluous variables

req_start is simply an alias of the "offset" function parameter, and
req_end is being used just once in each function. (And both variables
were loop invariant anyway, so should at least have got initialized
outside the loop.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 1ad6344acfbf19288573b4a5fa0b07cbb5af27d7)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Wed, 6 Jul 2016 06:59:35 +0000 (00:59 -0600)]

xen-pciback: short-circuit read path used for merging write values

There's no point calling xen_pcibk_config_read() here - all it'll do is
return whatever conf_space_read() returns for the field which was found
here (and which would be found there again). Also there's no point
clearing tmp_val before the call.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ee87d6d0d36d98c550f99274a81841033226e3bf)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Wed, 6 Jul 2016 06:58:58 +0000 (00:58 -0600)]

xen-pciback: use const and unsigned in bar_init()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 585203609c894db11dea724b743c04d0c9927f39)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Wed, 6 Jul 2016 06:58:19 +0000 (00:58 -0600)]

xen-pciback: simplify determination of 64-bit memory resource

Other than for raw BAR values, flags are properly separated in the
internal representation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c8670c22e04e4e42e752cc5b53922106b3eedbda)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Wed, 6 Jul 2016 06:57:43 +0000 (00:57 -0600)]

xen-pciback: fold read_dev_bar() into its now single caller

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6ad2655d87d2d35c1de4500402fae10fe7b30b4a)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Wed, 6 Jul 2016 06:57:07 +0000 (00:57 -0600)]

xen-pciback: drop rom_init()

It is now identical to bar_init().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 664093bb6b797c8ba0a525ee0a36ad8cbf89413e)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Jan Beulich [Wed, 6 Jul 2016 06:56:27 +0000 (00:56 -0600)]

xen-pciback: drop unused function parameter of read_dev_bar()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6c6e4caa2006ab82587a3648967314ec92569a98)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Amitoj Kaur Chawla [Wed, 29 Jun 2016 15:00:38 +0000 (20:30 +0530)]

x86/xen: Use DIV_ROUND_UP

The kernel.h macro DIV_ROUND_UP performs the computation
(((n) + (d) - 1) /(d)) but is perhaps more readable.

The Coccinelle script used to make this change is as follows:
@haskernel@
@@

@depends on haskernel@
expression n,d;
@@

(
- (n + d - 1) / d
+ DIV_ROUND_UP(n,d)
|
- (n + (d - 1)) / d
+ DIV_ROUND_UP(n,d)
)

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 585423c8c4d2f39a2c299bc6dd16433e6141fba5)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Bhaktipriya Shridhar [Tue, 31 May 2016 16:56:30 +0000 (22:26 +0530)]

xen: xenbus: Remove create_workqueue

System workqueues have been able to handle high level of concurrency
for a long time now and there's no reason to use dedicated workqueues
just to gain concurrency. Replace dedicated xenbus_frontend_wq with the
use of system_wq.

Unlike a dedicated per-cpu workqueue created with create_workqueue(),
system_wq allows multiple work items to overlap executions even on
the same CPU; however, a per-cpu workqueue doesn't have any CPU
locality or global ordering guarantees unless the target CPU is
explicitly specified and the increase of local concurrency shouldn't
make any difference.

In this case, there is only a single work item, increase of concurrency
level by switching to system_wq should not make any difference.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 5ee405d9d234ee5641741c07a654e4c6ba3e2a9d)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Bhaktipriya Shridhar [Wed, 1 Jun 2016 14:15:08 +0000 (19:45 +0530)]

xen: xen-pciback: Remove create_workqueue

System workqueues have been able to handle high level of concurrency
for a long time now and there's no reason to use dedicated workqueues
just to gain concurrency. Replace dedicated xen_pcibk_wq with the
use of system_wq.

Unlike a dedicated per-cpu workqueue created with create_workqueue(),
system_wq allows multiple work items to overlap executions even on
the same CPU; however, a per-cpu workqueue doesn't have any CPU
locality or global ordering guarantees unless the target CPU is
explicitly specified and thus the increase of local concurrency shouldn't
make any difference.

Since the work items could be pending, flush_work() has been used in
xen_pcibk_disconnect(). xen_pcibk_xenbus_remove() calls free_pdev()
which in turn calls xen_pcibk_disconnect() for every pdev to ensure that
there is no pending task while disconnecting the driver.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 429eafe60943bdfa33b15540ab2db5642a1f8c3c)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Boris Ostrovsky [Tue, 21 Jun 2016 14:17:33 +0000 (10:17 -0400)]

xen/PMU: Log VPMU initialization error at lower level

This will match how PMU errors are reported at check_hw_exists()'s
msr_fail label, which is reached when VPMU initialzation fails.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6ab9507ed96a6c0b24174d3430064a90b3dddd0a)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Juergen Gross [Wed, 20 Apr 2016 12:03:32 +0000 (20:03 +0800)]

xen: add steal_clock support on x86

The pv_time_ops structure contains a function pointer for the
"steal_clock" functionality used only by KVM and Xen on ARM. Xen on x86
uses its own mechanism to account for the "stolen" time a thread wasn't
able to run due to hypervisor scheduling.

Add support in Xen arch independent time handling for this feature by
moving it out of the arm arch into drivers/xen and remove the x86 Xen
hack.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ecb23dc6f2eff0ce64dd60351a81f376f13b12cc)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

Conflicts:
arch/arm/xen/enlighten.c

commit | commitdiff | tree

Muhammad Falak R Wani [Sun, 24 Apr 2016 12:03:32 +0000 (20:03 +0800)]

xen: use vma_pages().

Replace explicit computation of vma page count by a call to
vma_pages().

Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c7ebf9d9c6b4e9402b978da0b0785db4129c1f79)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Shannon Zhao [Thu, 7 Apr 2016 12:03:32 +0000 (20:03 +0800)]

ARM64: XEN: Add a function to initialize Xen specific UEFI runtime services

When running on Xen hypervisor, runtime services are supported through
hypercall. Add a Xen specific function to initialize runtime services.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit be1aaf4e4026118e4191117a48f8a8078d1c0ed4)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Shannon Zhao [Thu, 7 Apr 2016 12:03:31 +0000 (20:03 +0800)]

XEN: EFI: Move x86 specific codes to architecture directory

Move x86 specific codes to architecture directory and export those EFI
runtime service functions. This will be useful for initializing runtime
service on ARM later.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit a62ed500307bfaf4c1a818b69f7c1e7df1039a16)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

Conflicts:
drivers/xen/efi.c

commit | commitdiff | tree

Shannon Zhao [Thu, 7 Apr 2016 12:03:26 +0000 (20:03 +0800)]

xen/hvm/params: Add a new delivery type for event-channel in HVM_PARAM_CALLBACK_IRQ

This new delivery type which is for ARM shares the same value with
HVM_PARAM_CALLBACK_TYPE_VECTOR which is for x86.

val[15:8] is flag: val[7:0] is a PPI.
To the flag, bit 8 stands the interrupt mode is edge(1) or level(0) and
bit 9 stands the interrupt polarity is active low(1) or high(0).

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 383ff518a79fe3dcece579b9d30be77b219d10f8)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Shannon Zhao [Thu, 7 Apr 2016 12:03:25 +0000 (20:03 +0800)]

Xen: public/hvm: sync changes of HVM_PARAM_CALLBACK_VIA ABI from Xen

Sync the changes of HVM_PARAM_CALLBACK_VIA ABI introduced by
Xen commit <ca5c54b6ff05> (public/hvm: export the HVM_PARAM_CALLBACK_VIA
ABI in the API).

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit b6f0bcc23fa9cb32752cbf263d4014a21f132f92)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Shannon Zhao [Thu, 7 Apr 2016 12:03:24 +0000 (20:03 +0800)]

Xen: ARM: Add support for mapping AMBA device mmio

Add a bus_notifier for AMBA bus device in order to map the device
mmio regions when DOM0 booting with ACPI.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 5789afeb0efb4b4eb914ee10c12b597044cf1d22)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Shannon Zhao [Thu, 7 Apr 2016 12:03:23 +0000 (20:03 +0800)]

Xen: ARM: Add support for mapping platform device mmio

Add a bus_notifier for platform bus device in order to map the device
mmio regions when DOM0 booting with ACPI.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 4ba04bec3755b765bb10b21943afbee60c33288d)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Shannon Zhao [Thu, 7 Apr 2016 12:03:22 +0000 (20:03 +0800)]

xen: memory : Add new XENMAPSPACE type XENMAPSPACE_dev_mmio

Add a new type of Xen map space for Dom0 to map device's MMIO region.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 712a5b77cb5a1d3028a8c202f4f839802a03f19f)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Shannon Zhao [Thu, 7 Apr 2016 12:03:20 +0000 (20:03 +0800)]

Xen: xlate: Use page_to_xen_pfn instead of page_to_pfn

Make xen_xlate_map_ballooned_pages work with 64K pages. In that case
Kernel pages are 64K in size but Xen pages remain 4K in size. Xen pfns
refer to 4K pages.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 975fac3c4f38e0b47514abdb689548a8e9971081)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Shannon Zhao [Thu, 7 Apr 2016 12:03:19 +0000 (20:03 +0800)]

xen/grant-table: Move xlated_setup_gnttab_pages to common place

Move xlated_setup_gnttab_pages to common place, so it can be reused by
ARM to setup grant table.

Rename it to xen_xlate_map_ballooned_pages.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 243848fc018cb98c2a70c39fe1f93eb266c79835)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Andy Lutomirski [Tue, 24 May 2016 22:48:38 +0000 (15:48 -0700)]

x86/xen: Simplify set_aliased_prot()

A year ago, via the following commit:

aa1acff356bb ("x86/xen: Probe target addresses in set_aliased_prot() before the hypercall")

I added an explicit probe to work around a hypercall issue. The code can
be simplified by using probe_kernel_read().

No change in functionality.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0706f1a2538e481194514197298cca6b5e3f2638.1464129798.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 99158f10e91768d34c5004c40c42f802b719bcae)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Andy Lutomirski [Thu, 30 Jul 2015 21:31:31 +0000 (14:31 -0700)]

x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables. Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix. This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit aa1acff356bbedfd03b544051f5b371746735d89)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

commit | commitdiff | tree

Shan Hai [Wed, 12 Oct 2016 14:24:57 +0000 (22:24 +0800)]

drivers/nvme: provide a module parameter for setting number of I/O queues

Orabug: 24914952

The current NVME driver allocates I/O queue per-cpu and alloctes IRQ
per-queue for the devices, a large number of IRQs will be allotted to
the I/O queues on a large NUMA/SMP system with multiple NVME devices
installed because of this design.

It would cause failure of CPU hotplug operations on the above mentioned
system, the problem is that the CPU cores could not be hotplugged after
certain number of them are offlined because the remaining online CPU cores
have not enough IRQ vectors to accept the large number of migrated IRQs.

This patch fixes it by providing a way to reduce the NVME queue IRQs to
an acceptable number.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

commit | commitdiff | tree

Jens Axboe [Wed, 24 Aug 2016 21:38:01 +0000 (15:38 -0600)]

blk-mq: improve warning for running a queue on the wrong CPU

__blk_mq_run_hw_queue() currently warns if we are running the queue on a
CPU that isn't set in its mask. However, this can happen if a CPU is
being offlined, and the workqueue handling will place the work on CPU0
instead. Improve the warning so that it only triggers if the batch cpu
in the hardware queue is currently online. If it triggers for that
case, then it's indicative of a flow problem in blk-mq, so we want to
retain it for that case.

Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 0e87e58bf60edb6bb28e493c7a143f41b091a5e5)

Orabug: 24914952
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

commit | commitdiff | tree

Shan Hai [Sun, 9 Oct 2016 15:13:10 +0000 (08:13 -0700)]

blk-mq: fix freeze queue race

Orabug: 24914952

There are several race conditions while freezing queue.

When unfreezing queue, there is a small window between decrementing
q->mq_freeze_depth to zero and percpu_ref_reinit() call with
q->mq_usage_counter.  If the other calls blk_mq_freeze_queue_start()
in the window, q->mq_freeze_depth is increased from zero to one and
percpu_ref_kill() is called with q->mq_usage_counter which is already
killed.  percpu refcount should be re-initialized before killed again.

Also, there is a race condition while switching to percpu mode.
percpu_ref_switch_to_percpu() and percpu_ref_kill() must not be
executed at the same time as the following scenario is possible:

1. q->mq_usage_counter is initialized in atomic mode.
   (atomic counter: 1)

2. After the disk registration, a process like systemd-udev starts
   accessing the disk, and successfully increases refcount successfully
   by percpu_ref_tryget_live() in blk_mq_queue_enter().
   (atomic counter: 2)

3. In the final stage of initialization, q->mq_usage_counter is being
   switched to percpu mode by percpu_ref_switch_to_percpu() in
   blk_mq_finish_init().  But if CONFIG_PREEMPT_VOLUNTARY is enabled,
   the process is rescheduled in the middle of switching when calling
   wait_event() in __percpu_ref_switch_to_percpu().
   (atomic counter: 2)

4. CPU hotplug handling for blk-mq calls percpu_ref_kill() to freeze
   request queue.  q->mq_usage_counter is decreased and marked as
   DEAD.  Wait until all requests have finished.
   (atomic counter: 1)

5. The process rescheduled in the step 3. is resumed and finishes
   all remaining work in __percpu_ref_switch_to_percpu().
   A bias value is added to atomic counter of q->mq_usage_counter.
   (atomic counter: PERCPU_COUNT_BIAS + 1)

6. A request issed in the step 2. is finished and q->mq_usage_counter
   is decreased by blk_mq_queue_exit().  q->mq_usage_counter is DEAD,
   so atomic counter is decreased and no release handler is called.
   (atomic counter: PERCPU_COUNT_BIAS)

7. CPU hotplug handling in the step 4. will wait forever as
   q->mq_usage_counter will never be zero.

Also, percpu_ref_reinit() and percpu_ref_kill() must not be executed
at the same time.  Because both functions could call
__percpu_ref_switch_to_percpu() which adds the bias value and
initialize percpu counter.

Fix those races by serializing with per-queue mutex.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
(cherry picked from https://patchwork.kernel.org/patch/7269471/)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

commit | commitdiff | tree

Chuck Anderson [Sun, 16 Oct 2016 03:41:32 +0000 (20:41 -0700)]

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Ashok Vairavan [Mon, 10 Oct 2016 20:31:35 +0000 (13:31 -0700)]

nvme: Remove RCU namespace protection

We can't sleep with RCU read lock held, but we need to do potentially
blocking stuff to namespace queues when iterating the list. This patch
removes the RCU locking and holds a mutex instead.

To prevent deadlocks, this patch removes holding the mutex during
namespace scanning and removal. The unlocked namespace scanning is made
safe by holding a reference to the namespace being scanned.

List iteration that does IO has to be unlocked to allow error recovery.
The caller must ensure the list can not be manipulated during such an
event, so this patch adds a comment explaining this requirement to the
only function that iterates an unlocked list. All callers currently
meet this requirement, so no further changes required.

List iterations that do not do IO can safely use the lock since it couldn't
block recovery from missing forced IO completions.

Reported-by: Ming Lin <mlin at kernel.org>
[fixes 0bf77e9 nvme: switch to RCU freeing the namespace]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 32f0c4afb4363e31dad49202f1554ba591d649f2)

Orabug: 24583236
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

commit | commitdiff | tree

Ashok Vairavan [Mon, 10 Oct 2016 19:15:14 +0000 (12:15 -0700)]

nvme: synchronize access to ctrl->namespaces

Currently traversal and modification of ctrl->namespaces happens completely
unsynchronized, which can be fixed by the addition of a simple mutex.

Note: nvme_dev_ioctl will be handled in the next patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d3b8ac15a5eb938e6a01909f6cc8ae4b5d3a17)

Orabug: 24583236
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

commit | commitdiff | tree

Ashok Vairavan [Mon, 10 Oct 2016 19:13:01 +0000 (12:13 -0700)]

NVMe: Implement namespace list scanning

The NVMe 1.1 specification provides an identify mode to return a
list of active namespaces. This is more efficient to discover which
namespace identifiers are active on a controller, providing potentially
significant improvement in scan time for controllers with sparesly
populated namespaces.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: add quirk for the broken Qemu Identify implementation. To be relaxed
later]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 540c801c65eb58e05e0ca38b6fd644a83d7e2b33)

Orabug: 24583236
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

commit | commitdiff | tree

Chuck Anderson [Fri, 14 Oct 2016 18:13:30 +0000 (11:13 -0700)]

Merge branch topic/uek-4.1/uek-carry of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Chuck Anderson [Fri, 14 Oct 2016 18:08:19 +0000 (11:08 -0700)]

Merge branch 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Chuck Anderson [Fri, 14 Oct 2016 18:03:11 +0000 (11:03 -0700)]

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Chuck Anderson [Fri, 14 Oct 2016 17:52:52 +0000 (10:52 -0700)]

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

commit | commitdiff | tree

Kris Van Hees [Wed, 14 Sep 2016 10:40:19 +0000 (06:40 -0400)]

dtrace: ensure new SDT info generation works on sparc64

Due to addresses on sparc64 being in the lower range of the memory map, the
calculated addresses for SDT probe points were represented with less hex
characters then their function base address counterparts (obtained from
objdump output). This messed up the sorting based on address values, and
resulted in all probe points being associated with the last function in the
list.

This commit ensures that all addresses are 16 hex characters long.

Orabug: 24655168

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Nick Alcock <nick.alcock@oracle.com>

commit | commitdiff | tree

Sabrina Dubroca [Tue, 11 Oct 2016 23:35:06 +0000 (19:35 -0400)]

net: add recursion limit to GRO

Orabug: 24829124
CVE: CVE-2016-7039

Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.

Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd () queasysnail net>
Reviewed-by: Jiri Benc <jbenc () redhat com>
Acked-by: Hannes Frederic Sowa <hannes () stressinduktion org>
(cherry picked from commit e71f3b1fca2ae5d0ae9d9c1a02a93d52beaae322)

Signed-off-by: Brian Maly <brian.maly@oracle.com>

commit | commitdiff | tree

Junxiao Bi [Fri, 9 Sep 2016 01:43:20 +0000 (09:43 +0800)]

ocfs2: fix trans extend while free cached blocks

Orabug: 24759174

Root cause of this issue is the same with the one fixed by last patch,
but this time credits for allocator inode and group descriptor may not
be consumed before trans extend.

The following error was caught.

[  685.240276] WARNING: CPU: 0 PID: 2037 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
[  685.240294] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_netfront parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
[  685.240296] CPU: 0 PID: 2037 Comm: rm Tainted: G        W       4.1.12-37.6.3.el6uek.bug24573128v2.x86_64 #2
[  685.240296] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
[  685.240298]  000000000000010d ffff88007ac3f808 ffffffff816bc5bc 000000000000010d
[  685.240300]  0000000000000000 ffff88007ac3f848 ffffffff81081475 ffff88007ac3f828
[  685.240301]  ffff880037bbf000 ffff880037688210 0000000000000095 0000000000000050
[  685.240301] Call Trace:
[  685.240305]  [<ffffffff816bc5bc>] dump_stack+0x48/0x5c
[  685.240308]  [<ffffffff81081475>] warn_slowpath_common+0x95/0xe0
[  685.240310]  [<ffffffff810814da>] warn_slowpath_null+0x1a/0x20
[  685.240313]  [<ffffffffa0080993>] start_this_handle+0x4c3/0x510 [jbd2]
[  685.240317]  [<ffffffffa0088f95>] ? __jbd2_log_start_commit+0xe5/0xf0 [jbd2]
[  685.240319]  [<ffffffff810c4eb3>] ? __wake_up+0x53/0x70
[  685.240322]  [<ffffffffa0080b41>] jbd2__journal_restart+0x161/0x1b0 [jbd2]
[  685.240325]  [<ffffffffa0080ba3>] jbd2_journal_restart+0x13/0x20 [jbd2]
[  685.240340]  [<ffffffffa06d1cf4>] ocfs2_extend_trans+0x74/0x220 [ocfs2]
[  685.240347]  [<ffffffffa069762b>] ocfs2_free_cached_blocks+0x16b/0x4e0 [ocfs2]
[  685.240349]  [<ffffffff810e9131>] ? internal_add_timer+0x91/0xc0
[  685.240356]  [<ffffffffa06981b0>] ocfs2_run_deallocs+0x70/0x270 [ocfs2]
[  685.240363]  [<ffffffffa06a2894>] ocfs2_commit_truncate+0x474/0x6f0 [ocfs2]
[  685.240374]  [<ffffffffa0744a40>] ? ocfs2_xattr_tree_et_ops+0x60/0xfffffffffffe8c00 [ocfs2]
[  685.240384]  [<ffffffffa06d1960>] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2]
[  685.240385]  [<ffffffff81202303>] ? __sb_end_write+0x33/0x70
[  685.240394]  [<ffffffffa06ca57d>] ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
[  685.240402]  [<ffffffffa06ca1f4>] ? ocfs2_query_inode_wipe+0xf4/0x320 [ocfs2]
[  685.240409]  [<ffffffffa06caed6>] ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
[  685.240415]  [<ffffffffa06ca1f4>] ? ocfs2_query_inode_wipe+0xf4/0x320 [ocfs2]
[  685.240422]  [<ffffffffa06cb6e2>] ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
[  685.240424]  [<ffffffff812298c9>] ? __inode_wait_for_writeback+0x69/0xc0
[  685.240437]  [<ffffffffa0732100>] ? __PRETTY_FUNCTION__.112282+0x20/0xffffffffffffb500 [ocfs2]
[  685.240444]  [<ffffffffa06cc1f8>] ocfs2_evict_inode+0x28/0x60 [ocfs2]
[  685.240445]  [<ffffffff8121b81b>] evict+0xab/0x1a0
[  685.240456]  [<ffffffffa0732100>] ? __PRETTY_FUNCTION__.112282+0x20/0xffffffffffffb500 [ocfs2]
[  685.240457]  [<ffffffff8121ba06>] iput_final+0xf6/0x190
[  685.240458]  [<ffffffff8121bb68>] iput+0xc8/0xe0
[  685.240460]  [<ffffffff8120f9b7>] do_unlinkat+0x1b7/0x310
[  685.240462]  [<ffffffff81126dbc>] ? __audit_syscall_entry+0xac/0x110
[  685.240464]  [<ffffffff810236cc>] ? do_audit_syscall_entry+0x6c/0x70
[  685.240465]  [<ffffffff81023823>] ? syscall_trace_enter_phase1+0x153/0x180
[  685.240467]  [<ffffffff8120fd52>] SyS_unlinkat+0x22/0x40
[  685.240468]  [<ffffffff816c122e>] system_call_fastpath+0x12/0x71
[  685.240469] ---[ end trace a62437cb060baa71 ]---
[  685.240470] JBD2: rm wants too many credits (149 > 128)

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>

commit | commitdiff | tree

Junxiao Bi [Thu, 8 Sep 2016 06:57:15 +0000 (14:57 +0800)]

ocfs2: fix trans extend while flush truncate log

Orabug: 24759174

Every time,  ocfs2_extend_trans() included a credit for truncate log inode,
but as that inode had been managed by jbd2 running transaction first time,
it will not consume that credit until jbd2_journal_restart(). Since total
credits to extend always included the un-consumed ones, there will be more
and more un-consumed credit, at last jbd2_journal_restart() will fail due
to credit number over the half of max transction credit.

The following error was caught when unlink a large file with many extents.

[233096.013936] ------------[ cut here ]------------
[233096.018586] WARNING: CPU: 0 PID: 13626 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
[233096.028335] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
[233096.081751] CPU: 0 PID: 13626 Comm: unlink Tainted: G        W       4.1.12-37.6.3.el6uek.x86_64 #2
[233096.088556] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
[233096.093125]  000000000000010d ffff88000018b768 ffffffff816bc5bc 000000000000010d
[233096.099082]  0000000000000000 ffff88000018b7a8 ffffffff81081475 ffff88000018b788
[233096.105038]  ffff88007a99a000 ffff88007b573390 00000000000000fb 0000000000000050
[233096.110540] Call Trace:
[233096.111893]  [<ffffffff816bc5bc>] dump_stack+0x48/0x5c
[233096.114637]  [<ffffffff81081475>] warn_slowpath_common+0x95/0xe0
[233096.117797]  [<ffffffff810814da>] warn_slowpath_null+0x1a/0x20
[233096.120984]  [<ffffffffa0080993>] start_this_handle+0x4c3/0x510 [jbd2]
[233096.124505]  [<ffffffffa0088f95>] ? __jbd2_log_start_commit+0xe5/0xf0 [jbd2]
[233096.128115]  [<ffffffff810c4eb3>] ? __wake_up+0x53/0x70
[233096.130924]  [<ffffffffa0080b41>] jbd2__journal_restart+0x161/0x1b0 [jbd2]
[233096.134523]  [<ffffffffa0080ba3>] jbd2_journal_restart+0x13/0x20 [jbd2]
[233096.137986]  [<ffffffffa06d1d94>] ocfs2_extend_trans+0x74/0x220 [ocfs2]
[233096.141407]  [<ffffffffa06d156a>] ? ocfs2_journal_dirty+0x3a/0x90 [ocfs2]
[233096.144921]  [<ffffffffa0692943>] ocfs2_replay_truncate_records+0x93/0x360 [ocfs2]
[233096.148819]  [<ffffffffa0697ace>] __ocfs2_flush_truncate_log+0x13e/0x3a0 [ocfs2]
[233096.152644]  [<ffffffffa0697304>] ? ocfs2_reserve_blocks_for_rec_trunc.clone.0+0x44/0x1f0 [ocfs2]
[233096.157310]  [<ffffffffa069f768>] ocfs2_remove_btree_range+0x458/0x7f0 [ocfs2]
[233096.161099]  [<ffffffffa0696777>] ? __ocfs2_find_path+0x187/0x2d0 [ocfs2]
[233096.164612]  [<ffffffffa06a2673>] ocfs2_commit_truncate+0x1b3/0x6f0 [ocfs2]
[233096.168204]  [<ffffffffa0744ac0>] ? ocfs2_xattr_tree_et_ops+0x60/0xfffffffffffe8c20 [ocfs2]
[233096.172539]  [<ffffffffa06d1a00>] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2]
[233096.176285]  [<ffffffff81202303>] ? __sb_end_write+0x33/0x70
[233096.179226]  [<ffffffffa06ca61d>] ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
[233096.183009]  [<ffffffffa06ca294>] ? ocfs2_query_inode_wipe+0xf4/0x320 [ocfs2]
[233096.186738]  [<ffffffffa06caf76>] ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
[233096.190165]  [<ffffffffa06ca294>] ? ocfs2_query_inode_wipe+0xf4/0x320 [ocfs2]
[233096.193846]  [<ffffffffa06cb782>] ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
[233096.197274]  [<ffffffff812298c9>] ? __inode_wait_for_writeback+0x69/0xc0
[233096.200736]  [<ffffffffa0732180>] ? __PRETTY_FUNCTION__.112282+0x20/0xffffffffffffb520 [ocfs2]
[233096.205146]  [<ffffffffa06cc298>] ocfs2_evict_inode+0x28/0x60 [ocfs2]
[233096.208462]  [<ffffffff8121b81b>] evict+0xab/0x1a0
[233096.211020]  [<ffffffffa0732180>] ? __PRETTY_FUNCTION__.112282+0x20/0xffffffffffffb520 [ocfs2]
[233096.215396]  [<ffffffff8121ba06>] iput_final+0xf6/0x190
[233096.218169]  [<ffffffff8121bb68>] iput+0xc8/0xe0
[233096.220586]  [<ffffffff8120f9b7>] do_unlinkat+0x1b7/0x310
[233096.223487]  [<ffffffff8106ae5b>] ? __do_page_fault+0x18b/0x480
[233096.226655]  [<ffffffff81126dbc>] ? __audit_syscall_entry+0xac/0x110
[233096.230009]  [<ffffffff810236cc>] ? do_audit_syscall_entry+0x6c/0x70
[233096.233346]  [<ffffffff81023823>] ? syscall_trace_enter_phase1+0x153/0x180
[233096.237103]  [<ffffffff8120fb26>] SyS_unlink+0x16/0x20
[233096.239800]  [<ffffffff816c122e>] system_call_fastpath+0x12/0x71
[233096.244346] ---[ end trace 28aa7410e69369cf ]---
[233096.247798] JBD2: unlink wants too many credits (251 > 128)

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>

commit | commitdiff | tree

Xue jiufei [Fri, 25 Mar 2016 21:21:44 +0000 (14:21 -0700)]

ocfs2: extend enough credits for freeing one truncate record while replaying truncate records

Orabug: 24759174

Now function ocfs2_replay_truncate_records() first modifies tl_used,
then calls ocfs2_extend_trans() to extend transactions for gd and alloc
inode used for freeing clusters. jbd2_journal_restart() may be called
and it may happen that tl_used in truncate log is decreased but the
clusters are not freed, which means these clusters are lost. So we
should avoid extending transactions in these two operations.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Acked-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 102c2595aa193f598c0f4b1bf2037d168c80e551)

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>

commit | commitdiff | tree

Johannes Thumshirn [Tue, 5 Apr 2016 09:50:45 +0000 (11:50 +0200)]

Revert "scsi: fix soft lockup in scsi_remove_target() on module removal"

Now that we've done a more comprehensive fix with the intermediate
target state we can remove the previous hack introduced with commit
90a88d6ef88e ("scsi: fix soft lockup in scsi_remove_target() on module
removal").

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: stable@vger.kernel.org
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 305c2e71b3d733ec065cb716c76af7d554bd5571)

Orabug: 24844559
Signed-off-by: ashok.vairavan <ashok.vairavan@oracle.com>

commit | commitdiff | tree

ashok.vairavan [Sat, 8 Oct 2016 01:47:01 +0000 (21:47 -0400)]

scsi: Add intermediate STARGET_REMOVE state to scsi_target_state

Add intermediate STARGET_REMOVE state to scsi_target_state to avoid
running into the BUG_ON() in scsi_target_reap(). The STARGET_REMOVE
state is only valid in the path from scsi_remove_target() to
scsi_target_destroy() indicating this target is going to be removed.

This re-fixes the problem introduced in commits bc3f02a795d3 ("[SCSI]
scsi_remove_target: fix softlockup regression on hot remove") and
40998193560d ("scsi: restart list search after unlock in
scsi_remove_target") in a more comprehensive way.

[mkp: Included James' fix for scsi_target_destroy()]

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38
Cc: stable@vger.kernel.org
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 24844559
Mainline commit: f05795d3d771f30a7bdc3a138bf714b06d42aa95
Conflicts:
    The following changes are done to fix kabi breakage.

        #ifndef __GENKSYMS__
        STARGET_REMOVE,
        #endif

commit | commitdiff | tree

James Bottomley [Wed, 10 Feb 2016 16:03:26 +0000 (08:03 -0800)]

scsi: fix soft lockup in scsi_remove_target() on module removal

This softlockup is currently happening:

[  444.088002] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:29]
[  444.088002] Modules linked in: lpfc(-) qla2x00tgt(O) qla2xxx_scst(O) scst_vdisk(O) scsi_transport_fc libcrc32c scst(O) dlm configfs nfsd lockd grace nfs_acl auth_rpcgss sunrpc ed
d snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device dm_mod iTCO_wdt snd_hda_codec_realtek snd_hda_codec_generic gpio_ich iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hda
_core snd_hwdep tg3 snd_pcm snd_timer libphy lpc_ich parport_pc ptp acpi_cpufreq snd pps_core fjes parport i2c_i801 ehci_pci tpm_tis tpm sr_mod cdrom soundcore floppy hwmon sg 8250_
fintek pcspkr i915 drm_kms_helper uhci_hcd ehci_hcd drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit usbcore button video usb_common fan ata_generic ata_piix libata th
ermal
[  444.088002] CPU: 1 PID: 29 Comm: kworker/1:1 Tainted: G           O    4.4.0-rc5-2.g1e923a3-default #1
[  444.088002] Hardware name: FUJITSU SIEMENS ESPRIMO E           /D2164-A1, BIOS 5.00 R1.10.2164.A1               05/08/2006
[  444.088002] Workqueue: fc_wq_4 fc_rport_final_delete [scsi_transport_fc]
[  444.088002] task: f6266ec0 ti: f6268000 task.ti: f6268000
[  444.088002] EIP: 0060:[<c07e7044>] EFLAGS: 00000286 CPU: 1
[  444.088002] EIP is at _raw_spin_unlock_irqrestore+0x14/0x20
[  444.088002] EAX: 00000286 EBX: f20d3800 ECX: 00000002 EDX: 00000286
[  444.088002] ESI: f50ba800 EDI: f2146848 EBP: f6269ec8 ESP: f6269ec8
[  444.088002]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  444.088002] CR0: 8005003b CR2: 08f96600 CR3: 363ae000 CR4: 000006d0
[  444.088002] Stack:
[  444.088002]  f6269eec c066b0f7 00000286 f2146848 f50ba808 f50ba800 f50ba800 f2146a90
[  444.088002]  f2146848 f6269f08 f8f0a4ed f3141000 f2146800 f2146a90 f619fa00 00000040
[  444.088002]  f6269f40 c026cb25 00000001 166c6392 00000061 f6757140 f6136340 00000004
[  444.088002] Call Trace:
[  444.088002]  [<c066b0f7>] scsi_remove_target+0x167/0x1c0
[  444.088002]  [<f8f0a4ed>] fc_rport_final_delete+0x9d/0x1e0 [scsi_transport_fc]
[  444.088002]  [<c026cb25>] process_one_work+0x155/0x3e0
[  444.088002]  [<c026cde7>] worker_thread+0x37/0x490
[  444.088002]  [<c027214b>] kthread+0x9b/0xb0
[  444.088002]  [<c07e72c1>] ret_from_kernel_thread+0x21/0x40

What appears to be happening is that something has pinned the target
so it can't go into STARGET_DEL via final release and the loop in
scsi_remove_target spins endlessly until that happens.

The fix for this soft lockup is to not keep looping over a device that
we've called remove on but which hasn't gone into DEL state.  This
patch will retain a simplistic memory of the last target and not keep
looping over it.

Reported-by: Sebastian Herbszt <herbszt@gmx.de>
Tested-by: Sebastian Herbszt <herbszt@gmx.de>
Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
(cherry picked from commit 90a88d6ef88edcfc4f644dddc7eef4ea41bccf8b)

Orabug: 24844559
Signed-off-by: ashok.vairavan <ashok.vairavan@oracle.com>

commit | commitdiff | tree

Christoph Hellwig [Mon, 19 Oct 2015 14:35:46 +0000 (16:35 +0200)]

scsi: restart list search after unlock in scsi_remove_target

When dropping a lock while iterating a list we must restart the search
as other threads could have manipulated the list under us.  Without this
we can get stuck in an endless loop.  This bug was introduced by

commit bc3f02a795d3b4faa99d37390174be2a75d091bd
Author: Dan Williams <djbw@fb.com>
Date:   Tue Aug 28 22:12:10 2012 -0700

    [SCSI] scsi_remove_target: fix softlockup regression on hot remove

Which was itself trying to fix a reported soft lockup issue

http://thread.gmane.org/gmane.linux.kernel/1348679

However, we believe even with this revert of the original patch, the soft
lockup problem has been fixed by

commit f2495e228fce9f9cec84367547813cbb0d6db15a
Author: James Bottomley <JBottomley@Parallels.com>
Date:   Tue Jan 21 07:01:41 2014 -0800

    [SCSI] dual scan thread bug fix

Thanks go to Dan Williams <dan.j.williams@intel.com> for tracking all this
prior history down.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: bc3f02a795d3b4faa99d37390174be2a75d091bd
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Odin.com>
(cherry picked from commit 40998193560dab6c3ce8d25f4fa58a23e252ef38)

Orabug: 24844559
Signed-off-by: ashok.vairavan <ashok.vairavan@oracle.com>

commit | commitdiff | tree

Santosh Shilimkar [Thu, 13 Oct 2016 17:44:03 +0000 (10:44 -0700)]

RDS: ib: build fix rds_conn_drop() takes extra parameter now

rds_conn_drop() now takes reason as a parametr. Fixes
commit 052f62099 build issue

Orabug: 22506032

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

commit | commitdiff | tree

Avinash Repaka [Wed, 7 Sep 2016 00:54:02 +0000 (17:54 -0700)]

RDS: Drop the connection as part of cancel to avoid hangs

To avoid waiting indefinitely in rds_send_drop_to(), drop the connection
proactively if one of the cancelled messages is mapped.

Orabug: 22506032

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

commit | commitdiff | tree

Santosh Shilimkar [Tue, 12 Jul 2016 06:39:45 +0000 (23:39 -0700)]

RDS: add reconnect retry scheme for stalled connections

RDS IB connections gets stalled at times and letting the connections
take its sweet time to reconnect. On passive side, we wait for 15 seconds
for such stalled connections which is too slow based on application
IO timeouts. IB connections are established in milliseconds so we better
drop these stuck connections early and retry.

The retry timeout is kept tunable via reconnect_retry_ms sysctl. The
upper bound for retries is tunbale via rds_sysctl_reconnect_max_retries.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

commit | commitdiff | tree

Santosh Shilimkar [Thu, 15 Sep 2016 21:50:17 +0000 (14:50 -0700)]

RDS: restore the exponential back-off scheme

Lower IP and exponential back-off scheme was added to save the
SM queries because of races but it doesn't do what its intended.
The exponential back-off scheme does a good job of backing off
for races. The code just falls back to the original scheme.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

commit | commitdiff | tree

Santosh Shilimkar [Sat, 17 Sep 2016 02:20:53 +0000 (19:20 -0700)]

RDS: avoid duplicate connection drop for self loopback

For self-IB loopback is special mode and the c_passive conn is just a
place holder to stick the the second QP.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

commit | commitdiff | tree

Santosh Shilimkar [Thu, 15 Sep 2016 02:03:30 +0000 (19:03 -0700)]

RDS: don't modify conn state directly in rds_connect_complete

Avoid modifying the conn state directly and let
the APIs handle it to be consistent across.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom