www.infradead.org Git - users/jedix/linux-maple.git/log
9 years ago Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux...
Chuck Anderson [Thu, 10 Mar 2016 21:00:28 +0000 (13:00 -0800)]
Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

9 years ago bpf: fix branch offset adjustment on backjumps after patching ctx expansion
Daniel Borkmann [Wed, 10 Feb 2016 15:47:11 +0000 (16:47 +0100)]
bpf: fix branch offset adjustment on backjumps after patching ctx expansion

Orabug: 22740787
CVE: CVE-2016-2383

When ctx access is used, the kernel often needs to expand/rewrite
instructions, so after that patching, branch offsets have to be
adjusted for both forward and backward jumps in the new eBPF program,
but for backward jumps it fails to account for the delta. Meaning, for
example, if the expansion happens exactly on the insn that sits at
the jump target, it doesn't fix up the back jump offset.

Analysis on what the check in adjust_branches() is currently doing:

  /* adjust offset of jmps if necessary */
  if (i < pos && i + insn->off + 1 > pos)
    insn->off += delta;
  else if (i > pos && i + insn->off + 1 < pos)
    insn->off -= delta;

First condition (forward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] <--- i/insn            insns[1] <--- i/insn
  insns[2] <--- pos               insns[P] <--- pos
  insns[3]                        insns[P]  `------| delta
  insns[4] <--- target_X          insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] <--- target_X
                                  insns[5]

The first case is where we cross the pos-boundary and the jump instruction
was before pos. This is handled correctly. I.e. if i == pos, then this
would mean our jump that we currently check was the patchlet itself
that we just injected. Since such patchlets are self-contained and
have no awareness of any insns before or after the patched one, the
delta is correctly not adjusted. Also, for the second part of the
condition, i + insn->off + 1 == pos means we jump to that newly patched
instruction, so no offset adjustment is needed. That part is correct.

Second condition (backward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] <--- target_X          insns[1] <--- target_X
  insns[2] <--- pos <-- target_Y  insns[P] <--- pos <-- target_Y
  insns[3]                        insns[P]  `------| delta
  insns[4] <--- i/insn            insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] <--- i/insn
                                  insns[5]

Second interesting case is where we cross pos-boundary and the jump
instruction was after pos. Backward jump with i == pos would be
impossible and pose a bug somewhere in the patchlet, so the first
condition checking i > pos is okay only by itself. However, i +
insn->off + 1 < pos does not always work as intended to trigger the
adjustment. It works when jump targets would be far off where the
delta wouldn't matter. But, for example, where the fixed insn->off
before pointed to pos (target_Y), it now points to pos + delta, so
that additional room needs to be taken into account for the check.
This means that i) both tests here need to be adjusted into pos + delta,
and ii) for the second condition, the test needs to be <= as pos
itself can be a target in the backjump, too.
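
The two checks can be modelled outside the kernel. The sketch below is
only an illustration, not the kernel code: it assumes that i indexes the
already-patched program (as in the "After" columns above) and that delta
is the number of extra insns the patchlet added. The function names are
made up for this example.

```python
def adjust_old(i, off, pos, delta):
    """Offset fix-up as quoted in the analysis above (pre-fix check)."""
    if i < pos and i + off + 1 > pos:
        return off + delta
    if i > pos and i + off + 1 < pos:
        return off - delta
    return off


def adjust_fixed(i, off, pos, delta):
    """Same check with both backward-jump tests moved to pos + delta and
    the target comparison relaxed to <=, as the analysis proposes."""
    if i < pos and i + off + 1 > pos:
        return off + delta
    if i > pos + delta and i + off + 1 <= pos + delta:
        return off - delta
    return off


def target(i, off):
    """Index of the insn that a jump at index i with offset off lands on."""
    return i + off + 1


# Diagram values: the patch at pos = 2 grows one insn into three (delta = 2).
POS, DELTA = 2, 2
# The back jump sat at old index 4 and is now at index 6. Its stale offset
# of -3 used to reach pos (target_Y).
NEW_I = 4 + DELTA
STALE_OFF = POS - 4 - 1

old_target = target(NEW_I, adjust_old(NEW_I, STALE_OFF, POS, DELTA))
fixed_target = target(NEW_I, adjust_fixed(NEW_I, STALE_OFF, POS, DELTA))
```

With these values the old check leaves the jump landing on index 4, which
is inside the expanded patchlet, while the adjusted check lands on index
2, i.e. on pos.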

Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a1b14d27ed0965838350f1377ff97c93ee383492)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
9 years ago ALSA: usb-audio: avoid freeing umidi object twice
Andrey Konovalov [Sat, 13 Feb 2016 08:08:06 +0000 (11:08 +0300)]
ALSA: usb-audio: avoid freeing umidi object twice

Orabug: 22740866
CVE: CVE-2016-2384

The 'umidi' object will be free'd on the error path by snd_usbmidi_free()
when tearing down the rawmidi interface. So we shouldn't try to free it
in snd_usbmidi_create() after having registered the rawmidi interface.

Found by KASAN.

Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 07d86ca93db7e5cdf4743564d98292042ec21af7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
9 years ago bio: Fix kabi error
Jason Luo [Thu, 25 Feb 2016 08:52:10 +0000 (16:52 +0800)]
bio: Fix kabi error

The two commits:
    bio: skip atomic inc/dec of ->bi_remaining for non-chains
    bio: skip atomic inc/dec of ->bi_cnt for most use cases
rename some members of struct bio, which causes KABI changes.

Orabug: 22820562
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago block: remove management of bi_remaining when restoring original bi_end_io
Mike Snitzer [Fri, 22 May 2015 13:14:03 +0000 (09:14 -0400)]
block: remove management of bi_remaining when restoring original bi_end_io

Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
non-chains") regressed all existing callers that followed this pattern:
 1) saving a bio's original bi_end_io
 2) wiring up an intermediate bi_end_io
 3) restoring the original bi_end_io from intermediate bi_end_io
 4) calling bio_endio() to execute the restored original bi_end_io

The regression was due to BIO_CHAIN only ever getting set if
bio_inc_remaining() is called.  For the above pattern it isn't set until
step 3 above (step 2 would've needed to establish BIO_CHAIN).  As such
the first bio_endio(), in step 2 above, never decremented __bi_remaining
before calling the intermediate bi_end_io -- leaving __bi_remaining with
the value 1 instead of 0.  When bio_inc_remaining() occurred during step
3 it brought it to a value of 2.  When the second bio_endio() was
called, in step 4 above, it should've called the original bi_end_io but
it didn't because there was an extra reference that wasn't dropped (due
to atomic operations being optimized away since BIO_CHAIN wasn't set
upfront).

Fix this issue by removing the __bi_remaining management complexity for
all callers that use the above pattern -- bio_chain() is the only
interface that _needs_ to be concerned with __bi_remaining.  For the
above pattern callers just expect the bi_end_io they set to get called!
Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
that aren't associated with the bio_chain() interface.

Also, the bio_inc_remaining() interface has been moved local to bio.c.
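
The accounting problem can be replayed with a toy model. This is a
sketch under assumptions, not the block layer code: the Bio class, its
fields and the run() helper are invented here, and only the counter and
flag behaviour described above is modelled.

```python
class Bio:
    """Toy stand-in for struct bio: one counter, one flag, one callback."""

    def __init__(self, end_io):
        self.remaining = 1    # stands in for __bi_remaining
        self.chain = False    # stands in for the BIO_CHAIN flag
        self.end_io = end_io


def bio_inc_remaining(bio):
    """Flag the bio as chained and take an extra completion reference."""
    bio.chain = True
    bio.remaining += 1


def bio_endio(bio):
    """The decrement is skipped unless the bio was flagged as chained."""
    if bio.chain:
        bio.remaining -= 1
        if bio.remaining > 0:
            return            # still referenced: end_io is not called
    bio.end_io(bio)


def run(restore_with_inc):
    """Replay the save/intermediate/restore pattern, return the call order."""
    calls = []

    def original(bio):
        calls.append("original")

    def intermediate(bio):
        calls.append("intermediate")
        bio.end_io = original            # step 3: restore the original
        if restore_with_inc:
            bio_inc_remaining(bio)       # what pre-fix callers did
        bio_endio(bio)                   # step 4

    bio = Bio(original)                  # step 1: original end_io saved
    bio.end_io = intermediate            # step 2: intermediate wired up
    bio_endio(bio)                       # first completion
    return calls
```

In this model run(True) never reaches the original callback because the
counter ends at 1, while run(False), which drops the extra increment as
the fix does for non-chain callers, reaches it.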

Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 326e1dbb57368087a36607aaebe9795b8d5453e5)

Orabug: 22820562
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago bio: skip atomic inc/dec of ->bi_cnt for most use cases
Jens Axboe [Fri, 17 Apr 2015 22:23:59 +0000 (16:23 -0600)]
bio: skip atomic inc/dec of ->bi_cnt for most use cases

Struct bio has a reference count that controls when it can be freed.
The most common use case is allocating the bio, which then returns with a
single reference to it, doing IO, and then dropping that single
reference. We can remove this atomic_dec_and_test() in the completion
path, if nobody else is holding a reference to the bio.

If someone does call bio_get() on the bio, then we flag the bio as
now having a valid count, meaning that we must properly honor the
reference count when it's being put.
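
A minimal sketch of the idea, assuming a boolean flag that bio_get()
sets. The class, the flag name and the atomic_ops counter are
hypothetical and only serve to show which path touches the count.

```python
class Bio:
    """Toy stand-in for struct bio with a lazily validated refcount."""

    def __init__(self):
        self.cnt = 1           # the single reference held by the allocator
        self.reffed = False    # hypothetical "count is valid" flag
        self.freed = False
        self.atomic_ops = 0    # counts simulated atomic operations


def bio_get(bio):
    """Take an extra reference and mark the count as one to honor."""
    bio.reffed = True
    bio.cnt += 1
    bio.atomic_ops += 1


def bio_put(bio):
    """Drop a reference; skip the atomic op when nobody called bio_get()."""
    if not bio.reffed:
        bio.freed = True       # fast path: sole owner
        return
    bio.cnt -= 1
    bio.atomic_ops += 1
    if bio.cnt == 0:
        bio.freed = True


fast = Bio()
bio_put(fast)                  # allocate, do IO, drop: no atomic op

shared = Bio()
bio_get(shared)
bio_put(shared)                # one holder left, not freed yet
still_alive = not shared.freed
bio_put(shared)                # last reference dropped
```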

Tested-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit dac56212e8127dbc0bff7be35c508bc280213309)

Orabug: 22820562
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago bio: skip atomic inc/dec of ->bi_remaining for non-chains
Jens Axboe [Fri, 17 Apr 2015 22:15:18 +0000 (16:15 -0600)]
bio: skip atomic inc/dec of ->bi_remaining for non-chains

Struct bio has an atomic ref count for chained bio's, and we use this
to know when to end IO on the bio. However, most bio's are not chained,
so we don't need to always introduce this atomic operation as part of
ending IO.

Add a helper to elevate the bi_remaining count, and flag the bio as
now actually needing the decrement at end_io time. Rename the field
to __bi_remaining to catch any current users of this doing the
incrementing manually.

For high IOPS workloads, this reduces the overhead of bio_endio()
substantially.

Tested-by: Robert Elliott <elliott@hp.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit c4cf5261f8bffd9de132b50660a69148e7575bd6)

Orabug: 22820562
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago VSOCK: Fix lockdep issue.
Jorgen Hansen [Thu, 22 Oct 2015 15:25:25 +0000 (08:25 -0700)]
VSOCK: Fix lockdep issue.

The recent fix for the vsock sock_put issue used the wrong
initializer for the transport spin_lock causing an issue when
running with lockdep checking.

Testing: Verified fix on kernel with lockdep enabled.

Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8566b86ab9f0f45bc6f7dd422b21de9d0cf5415a)

Orabug: 22820522
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago VSOCK: sock_put wasn't safe to call in interrupt context
Jorgen Hansen [Wed, 21 Oct 2015 11:53:56 +0000 (04:53 -0700)]
VSOCK: sock_put wasn't safe to call in interrupt context

In the vsock vmci_transport driver, sock_put wasn't safe to call
in interrupt context, since that may call the vsock destructor
which in turn calls several functions that should only be called
from process context. This change defers the calling of these
functions to a worker thread. All of these functions deallocate
resources related to the transport itself.
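
The deferral pattern can be sketched in user space with a queue and a
worker thread. This only illustrates the idea and is not the vmci
transport code: the function names are invented, and a Python thread
stands in for the kernel worker.

```python
import queue
import threading

work_queue = queue.Queue()
events = []


def release_transport_resources(sock_name):
    """Stand-in for the cleanup that needs process context."""
    events.append(("cleanup", sock_name, threading.current_thread().name))


def cleanup_worker():
    """Worker thread: drains the queue until it sees the None sentinel."""
    while True:
        sock_name = work_queue.get()
        if sock_name is None:
            break
        release_transport_resources(sock_name)


def destructor_in_interrupt_context(sock_name):
    """Only records and queues the work; the cleanup itself runs later."""
    events.append(("queued", sock_name, threading.current_thread().name))
    work_queue.put(sock_name)


worker = threading.Thread(target=cleanup_worker, name="cleanup-worker")
worker.start()
destructor_in_interrupt_context("sock0")
work_queue.put(None)   # tell the worker to exit once the queue is drained
worker.join()
```

The caller returns right after queueing, and the cleanup entry in events
is recorded by the thread named cleanup-worker.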

Furthermore, an unused callback was removed to simplify the
cleanup.

Multiple customers have been hitting this issue when using
VMware tools on vSphere 2015.

Also added a version to the vmci transport module (starting from
1.0.2.0-k since up until now it appears that this module was
sharing version with vsock that is currently at 1.0.1.0-k).

Reviewed-by: Aditya Asarwade <asarwade@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4ef7ea9195ea73262cd9730fb54e1eb726da157b)

Orabug: 22820522
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago xfs: checksum log record ext headers based on record size
Brian Foster [Tue, 18 Aug 2015 23:59:50 +0000 (09:59 +1000)]
xfs: checksum log record ext headers based on record size

The first 4 bytes of every basic block in the physical log are stamped
with the current lsn. To support this mechanism, the log record header
(first block of each new log record) contains space for the original
first byte of each log record block before it is replaced with the lsn.
The log record header has space for 32k worth of blocks. The version 2
log adds new extended record headers for each additional 32k worth of
blocks beyond what is supported by the record header.

The log record checksum incorporates the log record header, the extended
headers and the record payload. xlog_cksum() checksums the extended
headers based on log->l_iclog_heads, which specifies the number of
extended headers in a log record based on the log buffer size mount
option. The log buffer size is variable, however, and thus means the
checksum can be calculated differently based on how a filesystem is
mounted. This is problematic if a filesystem crashes and recovery occurs
on a subsequent mount using a different log buffer size. For example,
crash an active filesystem that is mounted with the default (32k)
logbsize, attempt remount/recovery using '-o logbsize=64k' and the mount
fails on or warns about log checksum failures.

To avoid this problem, update xlog_cksum() to calculate the checksum
based on the size of the log buffer according to the log record. The
size is already included in the h_size field of the log record header
and thus is available at log recovery time. Extended log record headers
are also only written when the log record is large enough to require
them. This makes checksum calculation of log records consistent with the
extended record header mechanism as well as how on-disk records are
checksummed with various log buffer size mount options.
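
As a rough illustration of why keying on the record size makes the
result mount-independent, the number of headers can be derived from
h_size alone. This is a sketch under assumptions: the constant and
function names are hypothetical, and only the "one header per 32k"
rule stated above is modelled.

```python
HEADER_CYCLE_SIZE = 32 * 1024  # bytes of log blocks covered by one header


def headers_for_record(h_size):
    """Record header plus extended headers for a record of h_size bytes."""
    return max(1, -(-h_size // HEADER_CYCLE_SIZE))  # ceiling division


def ext_headers_for_record(h_size):
    """Extended headers only, i.e. everything beyond the first header."""
    return headers_for_record(h_size) - 1
```

A record written with a 32k log buffer yields 0 extended headers here no
matter which logbsize the recovering mount uses, because the mount
option never enters the calculation.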

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Orabug: 22682565
Mainline v4.3 commit a3f20014659a1566a4e516e2bf95287960fe2c44
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years ago xfs: always drain dio before extending aio write submission
Brian Foster [Mon, 12 Oct 2015 05:02:05 +0000 (16:02 +1100)]
xfs: always drain dio before extending aio write submission

XFS supports and typically allows concurrent asynchronous direct I/O
submission to a single file. One exception to the rule is that file
extending dio writes that start beyond the current EOF (e.g.,
potentially create a hole at EOF) require exclusive I/O access to the
file. This is because such writes must zero any pre-existing blocks
beyond EOF that are exposed by virtue of now residing within EOF as a
result of the write about to be submitted.

Before EOF zeroing can occur, the current file i_size must be stabilized
to avoid data corruption. In this scenario, XFS upgrades the iolock to
exclude any further I/O submission, waits on in-flight I/O to complete
to ensure i_size is up to date (i_size is updated on dio write
completion) and restarts the various checks against the state of the
file. The problem is that this protection sequence is triggered only
when the iolock is currently held shared. While this is true for async
dio in most cases, the caller may upgrade the lock in advance based on
arbitrary circumstances with respect to EOF zeroing. For example, the
iolock is always acquired exclusively if the start offset is not block
aligned. This means that even though the iolock is already held
exclusive for such I/Os, pending I/O is not drained and thus EOF zeroing
can occur based on an unstable i_size.

This problem has been reproduced as guest data corruption in virtual
machines with file-backed qcow2 virtual disks hosted on an XFS
filesystem. The virtual disks must be configured with aio=native mode
and they must not be truncated out to the maximum file size (as some virt
managers will do).

Update xfs_file_aio_write_checks() to unconditionally drain in-flight
dio before EOF zeroing can occur. Rather than trigger the wait based on
iolock state, use a new flag and upgrade the iolock when necessary. Note
that this results in a full restart of the inode checks even when the
iolock was already held exclusive when technically it is only required
to recheck i_size. This should be a rare enough occurrence that it is
preferable to keep the code simple rather than create an alternate
restart jump target.
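
The difference between the two triggers can be shown with a small model.
This is a sketch under assumptions rather than the XFS code: the Inode
class, the lock constants and write_checks() are invented, and a single
pending write stands in for all in-flight dio.

```python
SHARED, EXCL = "shared", "excl"


class Inode:
    """Toy inode: i_size only grows when pending dio completes."""

    def __init__(self, i_size, inflight_end):
        self.i_size = i_size
        self.inflight_end = inflight_end  # end offset of a pending write

    def dio_wait(self):
        """Complete pending dio; completion is what updates i_size."""
        if self.inflight_end is not None:
            self.i_size = max(self.i_size, self.inflight_end)
            self.inflight_end = None


def write_checks(inode, pos, iolock, always_drain):
    """Return (size that EOF zeroing would start from, final iolock)."""
    drained = False
    while True:
        if pos <= inode.i_size:
            return None, iolock             # not extending: nothing to zero
        if always_drain:
            need_drain = not drained        # new trigger: a one-shot flag
        else:
            need_drain = iolock == SHARED   # old trigger: lock state only
        if need_drain:
            iolock = EXCL                   # upgrade when necessary
            inode.dio_wait()
            drained = True
            continue                        # restart the checks
        return inode.i_size, iolock


old_result = write_checks(Inode(4096, 8192), 12288, EXCL, always_drain=False)
new_result = write_checks(Inode(4096, 8192), 12288, EXCL, always_drain=True)
```

When the lock is already exclusive, the old trigger zeroes from the
stale size 4096, while the flag-based trigger first waits and then sees
8192.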

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Orabug: 22682207
Mainline v4.4 commit 3136e8bb3054d3bb68942f8f1ee6c26c05f798b0
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years ago iw_cxgb3: Fix incorrectly returning error on success
Hariprasad S [Fri, 11 Dec 2015 08:29:17 +0000 (13:59 +0530)]
iw_cxgb3: Fix incorrectly returning error on success

The cxgb3_*_send() functions return NET_XMIT_ values, which are
positive integer values. So don't treat positive return values
as an error.
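
A minimal sketch of the resulting rule, with a hypothetical helper name.
It assumes the usual convention that only negative values are errnos and
is not the driver code itself.

```python
NET_XMIT_SUCCESS = 0
NET_XMIT_DROP = 1
NET_XMIT_CN = 2


def send_status(ret):
    """Map a send result to 0 (no error) or a negative errno."""
    return ret if ret < 0 else 0
```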

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 67f1aee6f45059fd6b0f5b0ecb2c97ad0451f6b3)

Orabug: 22713209

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
9 years ago Btrfs: use received_uuid of parent during send
Josef Bacik [Thu, 4 Jun 2015 21:17:25 +0000 (17:17 -0400)]
Btrfs: use received_uuid of parent during send

Neil Horman pointed out a problem where if he did something like this

receive A
snap A B
change B
send -p A B

and then on another box do

receive A
receive B

the receive B would fail because we use the UUID of A for the clone sources for
B.  This makes sense most of the time because normally you are sending from the
original sources, not a received source.  However when you use a received subvol
its UUID is going to be something completely different, so if you then try to
receive the diff on a different volume it won't find the UUID because the new A
will be something else.  The only constant is the received uuid.  So instead
check to see if we have received_uuid set on the root, and if so use that as the
clone source, as btrfs receive looks for matches either in received_uuid or
uuid.  Thanks,
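
The selection rule and the matching on the receiving side can be
sketched as follows. The helper names and the dictionary layout are
invented for illustration, and an all-zero UUID is assumed to mean
"received_uuid not set".

```python
import uuid

NIL = uuid.UUID(int=0)  # assumed marker for "no received_uuid"


def clone_source_uuid(own_uuid, received_uuid):
    """Prefer received_uuid when it is set, else fall back to the own uuid."""
    return received_uuid if received_uuid != NIL else own_uuid


def find_subvol(subvols, wanted):
    """Receive side: match either received_uuid or uuid."""
    for name, (own, received) in subvols.items():
        if wanted in (received, own):
            return name
    return None


origin_a = uuid.uuid4()   # uuid of A where it was originally created
box1_a = uuid.uuid4()     # uuid of A after 'receive A' on the first box
box2_a = uuid.uuid4()     # uuid of A after 'receive A' on the second box

# Second box: A was received there, so it has its own uuid and the
# received_uuid it inherited from the original A.
box2 = {"A": (box2_a, origin_a)}

old_source = box1_a                                 # old behaviour: uuid of A
new_source = clone_source_uuid(box1_a, origin_a)    # received_uuid of A
```

In this model the second box cannot resolve old_source, but it does
resolve new_source to A.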

Reported-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: Chris Mason <clm@fb.com>
Orabug: 22580612
Mainline v4.2 commit 37b8d27de5d0079e1ecef2711061048e13054ebe
Signed-off-by: bo.li.liu@oracle.com
9 years ago Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into...
Chuck Anderson [Wed, 2 Mar 2016 13:44:53 +0000 (05:44 -0800)]
Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

9 years ago Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into...
Chuck Anderson [Wed, 2 Mar 2016 13:43:12 +0000 (05:43 -0800)]
Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

9 years ago Merge branch 'topic/uek-4.1/nfs-rdma' of git://ca-git.us.oracle.com/linux-uek into...
Chuck Anderson [Wed, 2 Mar 2016 13:40:23 +0000 (05:40 -0800)]
Merge branch 'topic/uek-4.1/nfs-rdma' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

9 years ago Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek...
Chuck Anderson [Wed, 2 Mar 2016 13:39:19 +0000 (05:39 -0800)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

9 years ago Merge branch 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek into uek...
Chuck Anderson [Wed, 2 Mar 2016 13:38:17 +0000 (05:38 -0800)]
Merge branch 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

9 years ago Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into...
Chuck Anderson [Wed, 2 Mar 2016 13:37:07 +0000 (05:37 -0800)]
Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

9 years ago Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux...
Chuck Anderson [Wed, 2 Mar 2016 13:33:03 +0000 (05:33 -0800)]
Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

9 years ago config: Enable CONFIG_XEN_PCIDEV_BACKEND to be built-in.
Konrad Rzeszutek Wilk [Wed, 9 Dec 2015 17:50:56 +0000 (12:50 -0500)]
config: Enable CONFIG_XEN_PCIDEV_BACKEND to be built-in.

Doing PCI passthrough using the xen-pciback.hide= parameter
works great, except when the code is a module, at which
point you have to do a lot of 'unbind'.

OraBug: 22338679 - UEK4: CONFIG_XEN_PCIDEV_BACKEND should be set to y
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years ago kernel: VirtBox workaround for dynamically allocated text
Mike Kravetz [Wed, 16 Dec 2015 18:26:40 +0000 (10:26 -0800)]
kernel: VirtBox workaround for dynamically allocated text

Orabug: 22377612

VirtualBox dynamically allocates space for executable text within the
kernel.  When these text addresses are encountered by stack trace back
code, they are dropped or the stack trace is terminated.

Ideally, an interface would be created so that routines could be registered
to validate kernel text addresses.  Drivers like VirtualBox would register
routines which know about dynamically allocated text.  However, this will
need more use cases from code in the mainline linux tree.

As a workaround, assume executable pages in vmalloc or module areas are
valid text addresses.  The #ifdef CONFIG_X86 is acceptable as VirtualBox
only supports x86 architecture.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
9 years ago svcrdma: Fix NFS server crash triggered by 1MB NFS WRITE
Chuck Lever [Mon, 12 Oct 2015 14:53:39 +0000 (10:53 -0400)]
svcrdma: Fix NFS server crash triggered by 1MB NFS WRITE

Now that the NFS server advertises a maximum payload size of 1MB
for RPC/RDMA again, it crashes in svc_process_common() when an NFS
client sends a 1MB NFS WRITE on an NFS/RDMA mount.

The server has set up a 259 element array of struct page pointers
in rq_pages[] for each incoming request. The last element of the
array is NULL.

When an incoming request has been completely received,
rdma_read_complete() attempts to set the starting page of the
incoming page vector:

  rqstp->rq_arg.pages = &rqstp->rq_pages[head->hdr_count];

and the page to use for the reply:

  rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];

But the value of page_no has already accounted for head->hdr_count.
Thus rq_respages now points past the end of the incoming pages.

For NFS WRITE operations smaller than the maximum, this is harmless.
But when the NFS WRITE operation is as large as the server's max
payload size, rq_respages now points at the last entry in rq_pages,
which is NULL.
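
The double counting is plain index arithmetic and can be replayed with a
list. This is a sketch with assumed numbers: hdr_count = 1 and
page_no = 257 are example values chosen to hit the last slot, and
respage_direct() is one way to avoid the double count, not necessarily
the change made by this commit.

```python
RQ_PAGES_LEN = 259  # array size quoted above; the last element is NULL
rq_pages = [f"page{n}" for n in range(RQ_PAGES_LEN - 1)] + [None]


def respage_buggy(hdr_count, page_no):
    """Index reached by &rq_arg.pages[page_no], where rq_arg.pages is
    &rq_pages[hdr_count]: hdr_count is applied a second time."""
    return hdr_count + page_no


def respage_direct(hdr_count, page_no):
    """Index rq_pages directly, since page_no already includes hdr_count."""
    return page_no


HDR_COUNT, PAGE_NO = 1, 257  # assumed values for a maximum-size WRITE

buggy_page = rq_pages[respage_buggy(HDR_COUNT, PAGE_NO)]
direct_page = rq_pages[respage_direct(HDR_COUNT, PAGE_NO)]
```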

Fixes: cc9a903d915c ('svcrdma: Change maximum server payload . . .')
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Orabug: 22204799

9 years ago RDS: Add interface for receive MSG latency trace
Santosh Shilimkar [Fri, 11 Dec 2015 20:01:56 +0000 (12:01 -0800)]
RDS: Add interface for receive MSG latency trace

Socket option to tap receive path latency.
SO_RDS: SO_RDS_MSG_RXPATH_LATENCY
with parameter,
struct rds_rx_trace_so {
        u8 rx_traces;
        u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
}

CMSG:
RDS_CMSG_RXPATH_LATENCY(recvmsg)
Returns rds message latencies in various stages of the receive
path in ns. It's set per socket using the SO_RDS_MSG_RXPATH_LATENCY
socket option. Legitimate points are defined in
enum rds_message_rxpath_latency. More points can be added in the
future.

CMSG format:
struct rds_cmsg_rx_trace {
        u8 rx_traces;
        u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
        u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];
}

Receive MSG trace points: RDS message Receive Path Latency points
enum rds_message_rxpath_latency {
        RDS_MSG_RX_HDR_TO_DGRAM_START = 0,
        RDS_MSG_RX_DGRAM_REASSEMBLE,
        RDS_MSG_RX_DGRAM_DELIVERED,
        RDS_MSG_RX_DGRAM_TRACE_MAX
}
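
For user-space experiments, the layouts above can be mirrored with
ctypes. This is a sketch only: the Python class names are invented, the
meaning of rx_traces as "number of requested points" is an assumption,
and no setsockopt() call is shown because the numeric option values are
not given here.

```python
import ctypes
from enum import IntEnum


class RxPathLatency(IntEnum):
    """Mirror of enum rds_message_rxpath_latency."""
    RDS_MSG_RX_HDR_TO_DGRAM_START = 0
    RDS_MSG_RX_DGRAM_REASSEMBLE = 1
    RDS_MSG_RX_DGRAM_DELIVERED = 2
    RDS_MSG_RX_DGRAM_TRACE_MAX = 3


TRACE_MAX = int(RxPathLatency.RDS_MSG_RX_DGRAM_TRACE_MAX)


class RdsRxTraceSo(ctypes.Structure):
    """Mirror of struct rds_rx_trace_so (socket option parameter)."""
    _fields_ = [
        ("rx_traces", ctypes.c_uint8),
        ("rx_trace_pos", ctypes.c_uint8 * TRACE_MAX),
    ]


class RdsCmsgRxTrace(ctypes.Structure):
    """Mirror of struct rds_cmsg_rx_trace (control message payload)."""
    _fields_ = [
        ("rx_traces", ctypes.c_uint8),
        ("rx_trace_pos", ctypes.c_uint8 * TRACE_MAX),
        ("rx_trace", ctypes.c_uint64 * TRACE_MAX),
    ]


def request_all_points():
    """Build an option value that asks for every defined trace point."""
    opt = RdsRxTraceSo()
    opt.rx_traces = TRACE_MAX
    for n in range(TRACE_MAX):
        opt.rx_trace_pos[n] = n
    return opt
```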

Tested-by: Namrata Jampani <namrata.jampani@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Orabug: 22630180
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years ago ocfs2: call ocfs2_abort when journal abort
Ryan Ding [Mon, 21 Dec 2015 03:01:12 +0000 (11:01 +0800)]
ocfs2: call ocfs2_abort when journal abort

orabug: 22293201

The journal cannot recover from the abort state, so we should take the
following actions to prevent the file system from being corrupted:

1. Change to a read-only filesystem on a local mount. We cannot afford further
   writes, so changing to the RO state is reasonable.

2. Panic on a cluster mount. Because we cannot release lock resources in this
   state, other nodes will hang when they require a lock owned by this node. So
   panic and remaster is a reasonable choice.

ocfs2_abort() will do all the above work.

Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
9 years ago ocfs2: o2hb: increase unsteady iterations
Junxiao Bi [Thu, 14 Jan 2016 23:17:15 +0000 (15:17 -0800)]
ocfs2: o2hb: increase unsteady iterations

Oracle-bug: 21886612

When running the multiple xattr test of ocfs2-test on a three-node cluster,
mount sometimes failed with the following message.

  o2hb: Unable to stabilize heartbeart on region D18B775E758D4D80837E8CF3D086AD4A (xvdb)

Stabilizing the heartbeat depends on the order in which the cluster nodes
mount ocfs2 and on how fast the tcp connections are established.  So
increase the unsteady iterations to leave more time for it.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a84ac334dcb44c76f0b051513a6c27a2d747f883)
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
9 years ago vmxnet3: Bump up driver version number
Shreyas Bhatewara [Mon, 29 Jun 2015 11:14:43 +0000 (04:14 -0700)]
vmxnet3: Bump up driver version number

Bump up the driver version number to reflect the changes done to
work with vmxnet3 adapter version 2

Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a694717437c14efd489566540e821bc83ec234f3)

Orabug: 22380674
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)
Shreyas Bhatewara [Fri, 19 Jun 2015 20:38:29 +0000 (13:38 -0700)]
vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)

Make the driver understand adapter version 2.

Cc: Rachel Lunnon <rachel_lunnon@stormagic.com>
Signed-off-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 45dac1d6ea045ae56e4df8d9c70c92c7412bd4fc)

Orabug: 22380674
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago vmxnet3: Fix memory leaks in rx path (fwd)
Shreyas Bhatewara [Fri, 19 Jun 2015 20:37:03 +0000 (13:37 -0700)]
vmxnet3: Fix memory leaks in rx path (fwd)

If rcd length was zero, the page used for frag was not being released. It
was being replaced with a newly allocated page. This change takes care
of that memory leak.

Signed-off-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c41fcce997d2caa039a46495d40423348c51ad61)

Orabug: 22380674
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago vmxnet3: Register shutdown handler for device (fwd)
Shreyas Bhatewara [Fri, 19 Jun 2015 20:36:02 +0000 (13:36 -0700)]
vmxnet3: Register shutdown handler for device (fwd)

Implement a handler for pci shutdown so that the driver has an
opportunity to make sure that device is quiesced before the PCI
switches to legacy IRQs. This way the possibility of
"screaming interrupt" is avoided.

Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e9ba47bfe381888d8dc79123a20b2ec8b6751a47)

Orabug: 22380674
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago lpfc driver updates for UEK4 R1 11.0.0.13
rkennedy [Tue, 2 Feb 2016 19:51:56 +0000 (11:51 -0800)]
lpfc driver updates for UEK4 R1 11.0.0.13

Orabug: 22493326

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years ago lpfc: Use kzalloc instead of kmalloc
Punit Vara [Wed, 16 Dec 2015 23:12:07 +0000 (18:12 -0500)]
lpfc: Use kzalloc instead of kmalloc

This patch to lpfc_els.c resolves the following warning
reported by coccicheck:

WARNING: kzalloc should be used for rdp_context, instead of
kmalloc/memset

Signed-off-by: Punit Vara <punitvara@gmail.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Sebastian Herbszt <herbszt@gmx.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 699acd6220ea5b20b25d5eec0ab448827d745357)

Orabug: 22493326

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years ago lpfc: Add logging for misconfigured optics.
James Smart [Wed, 16 Dec 2015 23:12:05 +0000 (18:12 -0500)]
lpfc: Add logging for misconfigured optics.

Add logging for misconfigured optics acqe reported by fw.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 448193b5b5e2471fc90ea11e78c39bcfd167efb6)

Orabug: 22493326

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years ago lpfc: Fix external loopback failure.
James Smart [Wed, 16 Dec 2015 23:12:04 +0000 (18:12 -0500)]
lpfc: Fix external loopback failure.

Fix external loopback failure.

Rx sequence reassembly was incorrect.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4360ca9c24388e44cb0e14861a62fff43cf225c0)

Orabug: 22493326

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years ago lpfc: Fix mbox reuse in PLOGI completion
James Smart [Wed, 16 Dec 2015 23:12:03 +0000 (18:12 -0500)]
lpfc: Fix mbox reuse in PLOGI completion

Fix mbox reuse in PLOGI completion. Moved allocations so that the buffer
is properly init'd.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 01c73bbcd7cc4f31f45a1b0caeacdba46acd9c9c)

Orabug: 22493326

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years ago lpfc: Use new FDMI speed definitions for 10G, 25G and 40G FCoE.
James Smart [Wed, 16 Dec 2015 23:12:02 +0000 (18:12 -0500)]
lpfc: Use new FDMI speed definitions for 10G, 25G and 40G FCoE.

Use new FDMI speed definitions for 10G, 25G and 40G FCoE.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a085e87c814567c94e5d375e7362f9f25030aac1)

Orabug: 22493326

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years ago lpfc: Make write check error processing more resilient
James Smart [Wed, 16 Dec 2015 23:12:01 +0000 (18:12 -0500)]
lpfc: Make write check error processing more resilient

Make write check error processing more resilient.

The checks that catch writes which fw reports as not fully complete, even
though the SCSI status indicated success, needed correction.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5afab6bbf3f026b7d50451acbfdc12300c5f4353)

Orabug: 22493326
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: Fix RDP ACC being too long.
James Smart [Wed, 16 Dec 2015 23:12:00 +0000 (18:12 -0500)]
lpfc: Fix RDP ACC being too long.

Fix RDP ACC being too long.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit eb8d68c9930f7f9c8f3f4a6059b051b32077a735)

Orabug: 22493326
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: Fix RDP Speed reporting.
James Smart [Wed, 16 Dec 2015 23:11:59 +0000 (18:11 -0500)]
lpfc: Fix RDP Speed reporting.

Fix RDP Speed reporting.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 81e7517723fc17396ba91f59312b3177266ddbda)

Orabug: 22493326
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: Modularize and cleanup FDMI code in driver
James Smart [Wed, 16 Dec 2015 23:11:58 +0000 (18:11 -0500)]
lpfc: Modularize and cleanup FDMI code in driver

Modularize, cleanup, add comments - for FDMI code in driver

Note: I don't like the comments with leading # - but as we have a lot of
them present, I'm deferring handling them to one big fix later.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4258e98ee3862ca7036654b43c839ab7668043e0)

Orabug: 22493326
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: Fix crash in fcp command completion path.
James Smart [Wed, 16 Dec 2015 23:11:57 +0000 (18:11 -0500)]
lpfc: Fix crash in fcp command completion path.

Fix crash in fcp command completion path.

Missed null check.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c90261dcd86e4eb5c9c1627fde037e902db8aefa)

Orabug: 22493326
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: Fix driver crash when module parameter lpfc_fcp_io_channel set to 16
James Smart [Wed, 16 Dec 2015 23:11:56 +0000 (18:11 -0500)]
lpfc: Fix driver crash when module parameter lpfc_fcp_io_channel set to 16

Fix driver crash when module parameter lpfc_fcp_io_channel set to 16

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6690e0d4fc5cccf74534abe0c9f9a69032bc02f0)

Orabug: 22493326
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: Fix RegLogin failed error seen on Lancer FC during port bounce
James Smart [Wed, 16 Dec 2015 23:11:55 +0000 (18:11 -0500)]
lpfc: Fix RegLogin failed error seen on Lancer FC during port bounce

Fix RegLogin failed error seen on Lancer FC during port bounce

Fix the statemachine and ref counting.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4b7789b71c916f79a3366da080101014473234c3)

Orabug: 22493326
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: Fix the FLOGI discovery logic to comply with T11 standards
James Smart [Wed, 16 Dec 2015 23:11:53 +0000 (18:11 -0500)]
lpfc: Fix the FLOGI discovery logic to comply with T11 standards

Fix the FLOGI discovery logic to comply with T11 standards

We weren't properly setting fabric parameters, such as R_A_TOV and E_D_TOV,
when we registered the vfi object in default configs and pt2pt configs.
Revise to now pass service params with the values to the firmware and
ensure they are reset on link bounce. Required reworking the call sequence
in the discovery threads.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d6de08cc46269899988b4f40acc7337279693d4b)

Orabug: 22493326

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: Fix FCF Infinite loop in lpfc_sli4_fcf_rr_next_index_get.
James Smart [Wed, 16 Dec 2015 23:11:52 +0000 (18:11 -0500)]
lpfc: Fix FCF Infinite loop in lpfc_sli4_fcf_rr_next_index_get.

Orabug: 22493326

Fix FCF Infinite loop in lpfc_sli4_fcf_rr_next_index_get.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f5cb5304eb26d307c9b30269fb0e007e0b262b7d)

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: fix memory leak and NULL dereference
Sudip Mukherjee [Wed, 23 Sep 2015 13:32:32 +0000 (19:02 +0530)]
lpfc: fix memory leak and NULL dereference

Orabug: 22493326

kmalloc() can return NULL and without checking we were dereferencing it.
Moreover if kmalloc succeeds but the function fails in other parts then
we were returning the error code but we missed freeing lcb_context.
While at it fixed one related checkpatch warning.
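
The two fixes described above can be sketched in userspace C as follows.
This is only an illustration of the pattern, not the driver code:
malloc()/free() stand in for kmalloc()/kfree(), and the names
lcb_context_sketch, issue_request() and lcb_prepare() are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's context structure. */
struct lcb_context_sketch {
	int command;
};

/* Hypothetical later step that can fail after the allocation succeeded. */
static int issue_request(const struct lcb_context_sketch *ctx)
{
	return ctx->command < 0 ? -EINVAL : 0;
}

/* On success, hand the context to the caller; on failure, leak nothing. */
static int lcb_prepare(int command, struct lcb_context_sketch **out)
{
	struct lcb_context_sketch *ctx;
	int rc;

	ctx = malloc(sizeof(*ctx));
	if (!ctx)
		return -ENOMEM;	/* check before the first dereference */

	ctx->command = command;

	rc = issue_request(ctx);
	if (rc) {
		free(ctx);	/* error path must release the context */
		return rc;
	}

	*out = ctx;
	return 0;
}
```

The caller owns the context only when lcb_prepare() returns 0; every
other return path has already released it.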

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Reviewed-by: James Smart <james.smart@avagotech.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
(cherry picked from commit e79504236548e4c909959ba444f87a12224555ac)

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agolpfc: Fix rport leak.
James Smart [Thu, 21 May 2015 17:55:28 +0000 (13:55 -0400)]
lpfc: Fix rport leak.

Orabug: 22493326

Correct locking and refcounting in tracking our rports

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
(cherry picked from commit 466e840b7809e00ab3a1af9b4a5b5751e681730d)

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
9 years agoDrivers: hv: vmbus: Fix a Host signaling bug
K. Y. Srinivasan [Tue, 15 Dec 2015 00:01:54 +0000 (16:01 -0800)]
Drivers: hv: vmbus: Fix a Host signaling bug

Currently we have two policies for deciding when to signal the host:
One based on the ring buffer state and the other based on what the
VMBUS client driver wants to do. Consider the case when the client
wants to explicitly control when to signal the host. In this case,
if the client were to defer signaling, we will not be able to signal
the host subsequently when the client does want to signal since the
ring buffer state will prevent the signaling. Implement logic to
have only one signaling policy in force for a given channel.
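
A minimal sketch of "one policy per channel", assuming the ring-buffer
policy signals when the ring was empty before the write. The enum and
function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical names; the commit message does not show the actual fields. */
enum signal_policy_sketch {
	SIGNAL_BY_RING_STATE,	/* core decides from the ring buffer state */
	SIGNAL_BY_CLIENT,	/* client driver decides explicitly */
};

/*
 * Decide whether to signal the host after a write. Exactly one policy is
 * consulted per channel, so a signal the client deferred earlier cannot
 * be suppressed later by the ring buffer state.
 */
static bool should_signal_host(enum signal_policy_sketch policy,
			       bool ring_was_empty, bool client_wants_signal)
{
	if (policy == SIGNAL_BY_CLIENT)
		return client_wants_signal;

	return ring_was_empty;
}
```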

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Tested-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: <stable@vger.kernel.org> # v4.2+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8599846d73997cdbccf63f23394d871cfad1e5e6)

Orabug: 22725962
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agocpu-hotplug: export cpu_hotplug_enable/cpu_hotplug_disable
Vitaly Kuznetsov [Wed, 5 Aug 2015 07:52:47 +0000 (00:52 -0700)]
cpu-hotplug: export cpu_hotplug_enable/cpu_hotplug_disable

Hyper-V module needs to disable cpu hotplug (offlining) as there is no
support from hypervisor side to reassign already opened event channels
to a different CPU. Currently this is done by altering
smp_ops.cpu_disable, but that is hackish.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 32145c4677d2c46b9d877a33ae82c6fcacd002f9)

As hv driver depends on it
Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoclockevents: Add helpers to check the state of a clockevent device
Viresh Kumar [Thu, 21 May 2015 08:03:45 +0000 (13:33 +0530)]
clockevents: Add helpers to check the state of a clockevent device

Some clockevent drivers, once migrated to use per-state callbacks,
need to check the state of the clockevent device in their callbacks or
interrupt handler.

Add accessor functions clockevent_state_*() to get this information.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/04a717d490335c688dd7af899fbcede97e1bb8ee.1432192527.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 3434d23b694e5cb6e44e966914563406c31c4053)

As hv drivers depend on it
Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoclockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state
Viresh Kumar [Fri, 3 Apr 2015 03:34:04 +0000 (09:04 +0530)]
clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state

When no timers/hrtimers are pending, the expiry time is set to a
special value: 'KTIME_MAX'. This normally happens with
NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.

When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer
(NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device
(NOHZ_MODE_LOWRES).  But, the clockevent device is already
reprogrammed from the tick-handler for next tick.

As the clock event device is programmed in ONESHOT mode it will at
least fire one more time (unnecessarily). Timers on a few
implementations (like arm_arch_timer, etc.) only support PERIODIC mode
and their drivers emulate ONESHOT over that, which means that on these
platforms we will get spurious interrupts periodically (at the last
programmed interval rate, normally the tick rate).

In order to avoid spurious interrupts, the clockevent device should be
stopped or its interrupts should be masked.

A simple (yet hacky) solution to get this fixed could be: update
hrtimer_force_reprogram() to always reprogram clockevent device and
update clockevent drivers to STOP generating events (or delay it to
max time) when 'expires' is set to KTIME_MAX. But the drawback here is
that every clockevent driver has to be hacked for this particular case
and it's very easy for new ones to miss this.

However, Thomas suggested adding an optional state ONESHOT_STOPPED to
solve this problem: lkml.org/lkml/2014/5/9/508.

This patch adds support for ONESHOT_STOPPED state in clockevents
core. It will only be available to drivers that implement the
state-specific callbacks instead of the legacy ->set_mode() callback.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/b8b383a03ac07b13312c16850b5106b82e4245b5.1428031396.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 8fff52fd50934580c5108afed12043a774edf728)

As hv driver depends on it
Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: util: introduce hv_utils_transport abstraction
Vitaly Kuznetsov [Sun, 12 Apr 2015 01:07:51 +0000 (18:07 -0700)]
Drivers: hv: util: introduce hv_utils_transport abstraction

The intention is to make KVP/VSS drivers work through misc char devices.
Introduce an abstraction for kernel/userspace communication to make the
migration smoother. Transport operational mode (netlink or char device)
is determined by the first received message. To support driver upgrades
the switch from netlink to chardev operational mode is supported.

Every hv_util daemon is supposed to register 2 callbacks:
1) on_msg() to get notified when the userspace daemon sent a message;
2) on_reset() to get notified when the userspace daemon drops the connection.
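
A rough userspace sketch of the abstraction described above. All type
and function names here are hypothetical; rejecting a chardev-to-netlink
switch is an assumption of the sketch, since the text only says that the
netlink-to-chardev switch is supported.

```c
#include <stddef.h>

enum transport_mode_sketch { MODE_UNSET, MODE_NETLINK, MODE_CHARDEV };

struct transport_sketch {
	enum transport_mode_sketch mode;
	int (*on_msg)(const void *msg, size_t len);	/* daemon sent a message */
	void (*on_reset)(void);			/* daemon dropped the connection */
};

/*
 * Deliver a message that arrived over 'via'. The first message fixes the
 * operational mode; a later switch from netlink to chardev is accepted.
 */
static int transport_receive(struct transport_sketch *t,
			     enum transport_mode_sketch via,
			     const void *msg, size_t len)
{
	if (via == MODE_UNSET)
		return -1;

	if (t->mode == MODE_UNSET ||
	    (t->mode == MODE_NETLINK && via == MODE_CHARDEV))
		t->mode = via;
	else if (t->mode != via)
		return -1;	/* assumption: no switch back to netlink */

	return t->on_msg(msg, len);
}

static void transport_disconnect(struct transport_sketch *t)
{
	t->on_reset();
}

/* Example callbacks a util driver might register; they only count calls. */
static int msgs_seen;
static int resets_seen;

static int count_msg(const void *msg, size_t len)
{
	(void)msg;
	(void)len;
	msgs_seen++;
	return 0;
}

static void count_reset(void)
{
	resets_seen++;
}
```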

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 14b50f80c32dd4e84b6baeaa8bf4049cc5ecf56d)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: util: introduce state machine for util drivers
Vitaly Kuznetsov [Sun, 12 Apr 2015 01:07:46 +0000 (18:07 -0700)]
Drivers: hv: util: introduce state machine for util drivers

KVP/VSS/FCOPY drivers work in fully serialized mode: we wait till userspace
daemon registers, wait for a message from the host, send this message to the
daemon, get the reply, send it back to host, wait for another message.
Introduce enum hvutil_device_state to represent this state in all 3 drivers.
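
The serialized cycle described above can be modelled as a small
transition table. This is a sketch only: the state and event names below
are illustrative and are not the members of enum hvutil_device_state.

```c
#include <stddef.h>

enum util_state_sketch {
	STATE_WAIT_DAEMON,	/* userspace daemon has not registered yet */
	STATE_READY,		/* waiting for a message from the host */
	STATE_HOST_MSG,		/* host message received, not yet forwarded */
	STATE_WAIT_REPLY,	/* message sent to the daemon, awaiting reply */
	STATE_REPLY_RECEIVED,	/* daemon replied, reply not yet sent to host */
};

enum util_event_sketch {
	EV_DAEMON_REGISTERED,
	EV_HOST_MSG,
	EV_SENT_TO_DAEMON,
	EV_DAEMON_REPLY,
	EV_SENT_TO_HOST,
};

struct util_transition_sketch {
	enum util_state_sketch from;
	enum util_event_sketch ev;
	enum util_state_sketch to;
};

static const struct util_transition_sketch util_transitions[] = {
	{ STATE_WAIT_DAEMON,    EV_DAEMON_REGISTERED, STATE_READY },
	{ STATE_READY,          EV_HOST_MSG,          STATE_HOST_MSG },
	{ STATE_HOST_MSG,       EV_SENT_TO_DAEMON,    STATE_WAIT_REPLY },
	{ STATE_WAIT_REPLY,     EV_DAEMON_REPLY,      STATE_REPLY_RECEIVED },
	{ STATE_REPLY_RECEIVED, EV_SENT_TO_HOST,      STATE_READY },
};

/*
 * Advance *state and return 0 if 'ev' is valid in the current state;
 * otherwise leave *state unchanged and return -1.
 */
static int util_step(enum util_state_sketch *state, enum util_event_sketch ev)
{
	size_t i;

	for (i = 0; i < sizeof(util_transitions) / sizeof(util_transitions[0]); i++) {
		if (util_transitions[i].from == *state &&
		    util_transitions[i].ev == ev) {
			*state = util_transitions[i].to;
			return 0;
		}
	}
	return -1;
}
```

Because every state accepts exactly one event, an out-of-order message
(for example a daemon reply while no request is outstanding) is rejected
rather than silently processed.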

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 636c88da6df3bb2f978b48d3a7ed55423da84d19)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: use cpu_hotplug_enable/disable
Vitaly Kuznetsov [Wed, 5 Aug 2015 07:52:48 +0000 (00:52 -0700)]
Drivers: hv: vmbus: use cpu_hotplug_enable/disable

Commit e513229b4c38 ("Drivers: hv: vmbus: prevent cpu offlining on newer
hypervisors") was altering smp_ops.cpu_disable to prevent CPU offlining.
We can do better by using cpu_hotplug_enable/disable functions instead of
such hard-coding.

Reported-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f39c4280a3872b0e6c7b01076132c12ad7a90392)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: add a sysfs attr to show the binding of channel/VP
Dexuan Cui [Wed, 5 Aug 2015 07:52:43 +0000 (00:52 -0700)]
Drivers: hv: vmbus: add a sysfs attr to show the binding of channel/VP

This is useful for analyzing performance issues.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 042ab0313bbb7e776e9510da3f07fb300d08a8ba)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: Implement a clocksource based on the TSC page
K. Y. Srinivasan [Wed, 5 Aug 2015 07:52:42 +0000 (00:52 -0700)]
Drivers: hv: vmbus: Implement a clocksource based on the TSC page

The current Hyper-V clock source is based on the per-partition reference counter
and this counter is being accessed via a synthetic MSR - HV_X64_MSR_TIME_REF_COUNT.
Hyper-V has a more efficient way of computing the per-partition reference
counter value that does not involve reading a synthetic MSR. We implement
a time source based on this mechanism.

Tested-by: Vivek Yadav <vyadav@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ca9357bd26c2f8e7b909321eedd651f52cc30d04)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agodrivers/hv: Migrate to new 'set-state' interface
Viresh Kumar [Wed, 5 Aug 2015 07:52:41 +0000 (00:52 -0700)]
drivers/hv: Migrate to new 'set-state' interface

Migrate hv driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: devel@linuxdriverproject.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bc609cb47fb2e74654e23cef0a1d4db38b6570a3)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv_vmbus: Fix signal to host condition
Christopher Oo [Wed, 5 Aug 2015 07:52:40 +0000 (00:52 -0700)]
Drivers: hv_vmbus: Fix signal to host condition

Fixes a bug where previously hv_ringbuffer_read would pass in the old
number of bytes available to read instead of the expected old read index
when calculating when to signal to the host that the ringbuffer is empty.
Since the previous write size is already saved, also changes the
hv_need_to_signal_on_read to use the previously read value rather than
recalculating it.

Signed-off-by: Christopher Oo <t-chriso@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a5cca686ce0ef4909deaee4ed46dd991e3a9ece4)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: Improve the CPU affiliation for channels
K. Y. Srinivasan [Wed, 5 Aug 2015 07:52:38 +0000 (00:52 -0700)]
Drivers: hv: vmbus: Improve the CPU affiliation for channels

The current code tracks the assigned CPUs within a NUMA node in the context of
the primary channel. So, if we have a VM with a single NUMA node with 8 VCPUs, we may
end up unevenly distributing the channel load. Fix the issue by tracking affiliations
globally.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9f01ec53458d9e9b68f1c555e773b5d1a1f66e94)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agodrivers:hv: Move MMIO range picking from hyper_fb to hv_vmbus
Jake Oshins [Wed, 5 Aug 2015 07:52:37 +0000 (00:52 -0700)]
drivers:hv: Move MMIO range picking from hyper_fb to hv_vmbus

This patch deletes the logic from hyperv_fb which picked a range of MMIO space
for the frame buffer and adds new logic to hv_vmbus which picks ranges for
child drivers.  The new logic isn't quite the same as the old, as it considers
more possible ranges.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3546448338e76a52d4f86eb3680cb2934e22d89b)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agodrivers:hv: Modify hv_vmbus to search for all MMIO ranges available.
Jake Oshins [Wed, 5 Aug 2015 07:52:36 +0000 (00:52 -0700)]
drivers:hv: Modify hv_vmbus to search for all MMIO ranges available.

This patch changes the logic in hv_vmbus to record all of the ranges in the
VM's firmware (BIOS or UEFI) that offer regions of memory-mapped I/O space for
use by paravirtual front-end drivers.  The old logic just found one range
above 4GB and called it good.  This logic will find any ranges above 1MB.

It would have been possible with this patch to just use existing resource
allocation functions, rather than keep track of the entire set of Hyper-V
related MMIO regions in VMBus.  This strategy, however, is not sufficient
when the resource allocator needs to be aware of the constraints of a
Hyper-V virtual machine, which is what happens in the next patch in the series.
So this first patch exists to show the first steps in reworking the MMIO
allocation paths for Hyper-V front-end drivers.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7f163a6fd957a85f7f66a129db1ad243a44399ee)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: Consider ND NIC in binding channels to CPUs
K. Y. Srinivasan [Sat, 1 Aug 2015 23:08:21 +0000 (16:08 -0700)]
Drivers: hv: vmbus: Consider ND NIC in binding channels to CPUs

We cycle through all the "high performance" channels to distribute
load across the available CPUs. Process the NetworkDirect as a
high performance device.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 379e4f756b915bcc35958365e5d1326b3b54efce)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agomshyperv: fix recognition of Hyper-V guest crash MSR's
Denis V. Lunev [Sat, 1 Aug 2015 23:08:20 +0000 (16:08 -0700)]
mshyperv: fix recognition of Hyper-V guest crash MSR's

Hypervisor Top Level Functional Specification v3.1/4.0 notes that cpuid
(0x40000003) EDX's 10th bit should be used to check that Hyper-V guest
crash MSR functionality is available.

This patch should fix this recognition. Currently the code checks EAX
register instead of EDX.
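
A small sketch of the corrected check, assuming the four CPUID output
registers for leaf 0x40000003 have already been read into a struct. The
struct, macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 10 of EDX, per the text above; the macro name is made up here. */
#define CRASH_MSR_AVAILABLE_SKETCH	(UINT32_C(1) << 10)

/* Hypothetical container for the output of cpuid(0x40000003). */
struct cpuid_regs_sketch {
	uint32_t eax;
	uint32_t ebx;
	uint32_t ecx;
	uint32_t edx;
};

/* Test the feature bit in EDX; the same bit position in EAX is unrelated. */
static bool crash_msrs_available(const struct cpuid_regs_sketch *features)
{
	return (features->edx & CRASH_MSR_AVAILABLE_SKETCH) != 0;
}
```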

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cc2dd4027a43bb36c846f195a764edabc0828602)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: prefer 'die' notification chain to 'panic'
Vitaly Kuznetsov [Sat, 1 Aug 2015 23:08:10 +0000 (16:08 -0700)]
Drivers: hv: vmbus: prefer 'die' notification chain to 'panic'

current_pt_regs() sometimes returns regs of the userspace process and in
case of a kernel crash this is not what we need to report. E.g. when we
trigger crash with sysrq we see the following:
...
 RIP: 0010:[<ffffffff815b8696>]  [<ffffffff815b8696>] sysrq_handle_crash+0x16/0x20
 RSP: 0018:ffff8800db0a7d88  EFLAGS: 00010246
 RAX: 000000000000000f RBX: ffffffff820a0660 RCX: 0000000000000000
...
at the same time current_pt_regs() gives us:
ip=7f899ea7e9e0, ax=ffffffffffffffda, bx=26c81a0, cx=7f899ea7e9e0, ...
These registers come from the userspace process that triggered the crash. As
we don't even know which process it was, this information is rather useless.

When kernel crash happens through 'die' proper regs are being passed to
all receivers on the die_chain (and panic_notifier_list is being notified
with the string passed to panic() only). If panic() is called manually
(e.g. on BUG()) we won't get 'die' notification so keep the 'panic'
notification reporter as well but guard against double reporting.
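
The "guard against double reporting" idea can be sketched as a
report-once flag shared by both notifiers. Everything below is an
illustrative userspace model with hypothetical names; it records the
instruction pointer instead of writing to the hypervisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced register snapshot. */
struct regs_sketch {
	uint64_t ip;
	uint64_t ax;
};

static bool panic_reported;	/* set once the crash has been reported */
static uint64_t reported_ip;	/* what the single report contained */
static int reports_sent;

/* Report at most once, whichever notifier gets here first. */
static void report_crash(const struct regs_sketch *regs)
{
	if (panic_reported)
		return;
	panic_reported = true;

	reported_ip = regs->ip;
	reports_sent++;
}

/* 'die' path: receives the registers of the faulting context. */
static void on_die(const struct regs_sketch *fault_regs)
{
	report_crash(fault_regs);
}

/* 'panic' path: only has whatever the current context provides. */
static void on_panic(const struct regs_sketch *current_regs)
{
	report_crash(current_regs);
}
```

When a crash goes through 'die' first, the later 'panic' notification
becomes a no-op, so the report keeps the faulting registers.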

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 510f7aef65bb7ed22cf9c7f94f955727f963ede4)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: unregister panic notifier on module unload
Vitaly Kuznetsov [Thu, 23 Apr 2015 04:31:29 +0000 (21:31 -0700)]
Drivers: hv: vmbus: unregister panic notifier on module unload

Commit 96c1d0581d00f7abe033350edb021a9d947d8d81 ("Drivers: hv: vmbus: Add
support for VMBus panic notifier handler") introduced
atomic_notifier_chain_register() call on module load. We also need to call
atomic_notifier_chain_unregister() on module unload as otherwise the following
crash is observed when we bring hv_vmbus back:

[   39.788877] BUG: unable to handle kernel paging request at ffffffffa00078a8
[   39.788877] IP: [<ffffffff8109d63f>] notifier_call_chain+0x3f/0x80
...
[   39.788877] Call Trace:
[   39.788877]  [<ffffffff8109de7d>] __atomic_notifier_call_chain+0x5d/0x90
...
[   39.788877]  [<ffffffff8109d788>] ? atomic_notifier_chain_register+0x38/0x70
[   39.788877]  [<ffffffff8109d767>] ? atomic_notifier_chain_register+0x17/0x70
[   39.788877]  [<ffffffffa002814f>] hv_acpi_init+0x14f/0x1000 [hv_vmbus]
[   39.788877]  [<ffffffff81002144>] do_one_initcall+0xd4/0x210

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 096c605feb3d85b309e95db2afc01584b967cc23)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: fix typo in hv_port_info struct
Nik Nyby [Sat, 1 Aug 2015 23:08:18 +0000 (16:08 -0700)]
Drivers: hv: vmbus: fix typo in hv_port_info struct

This fixes a typo: base_flag_bumber to base_flag_number

Signed-off-by: Nik Nyby <nikolas@gnu.org>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e26009aad095feae45a6e79bb022c55a969ecded)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: Permit sending of packets without payload
K. Y. Srinivasan [Sat, 1 Aug 2015 23:08:14 +0000 (16:08 -0700)]
Drivers: hv: vmbus: Permit sending of packets without payload

The guest may have to send a completion packet back to the host.
To support this usage, permit sending a packet without a payload -
we would be only sending the descriptor in this case.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b81658cf5d44e07c70c93e3b2aefe848eaaba99f)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: balloon: Enable dynamic memory protocol negotiation with Windows 10...
Alex Ng [Sat, 1 Aug 2015 23:08:13 +0000 (16:08 -0700)]
Drivers: hv: balloon: Enable dynamic memory protocol negotiation with Windows 10 hosts

Support Win10 protocol for Dynamic Memory. This patch allows guests on Win10 hosts
to hot-add memory even when dynamic memory is not enabled on the guest.

Signed-off-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b6ddeae1603dfa55e857ba1520f5acea83f8cf1c)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: don't do hypercalls when hypercall_page is NULL
Vitaly Kuznetsov [Sat, 1 Aug 2015 23:08:08 +0000 (16:08 -0700)]
Drivers: hv: don't do hypercalls when hypercall_page is NULL

At the very late stage of kexec a driver (drivers are not being unloaded) can
try to post a message or signal an event. This will crash the kernel as we
already did hv_cleanup() and the hypercall page is NULL.

Move all common (between 32 and 64 bit code) declarations to the beginning
of the do_hypercall() function. Unfortunately we have to write the
!hypercall_page check twice to not mix declarations and code.
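
A minimal sketch of the guard, modelling the hypercall page as a
function pointer. The names are hypothetical, and returning an all-ones
value on failure is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*hypercall_fn_sketch)(uint64_t control, void *in, void *out);

/* NULL before setup and again after cleanup. */
static hypercall_fn_sketch hypercall_page;

/* Refuse to issue a hypercall once the page is gone instead of crashing. */
static uint64_t do_hypercall_sketch(uint64_t control, void *input, void *output)
{
	if (!hypercall_page)
		return UINT64_MAX;	/* assumed failure value */

	return hypercall_page(control, input, output);
}

/* Stand-in for a working hypercall page: always reports success (0). */
static uint64_t fake_hypercall(uint64_t control, void *in, void *out)
{
	(void)control;
	(void)in;
	(void)out;
	return 0;
}
```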

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d7646eaa7678fe5adc42247b4bdfbe9d9db8c253)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: add special kexec handler
Vitaly Kuznetsov [Sat, 1 Aug 2015 23:08:07 +0000 (16:08 -0700)]
Drivers: hv: vmbus: add special kexec handler

When general-purpose kexec (not kdump) is being performed in Hyper-V guest
the newly booted kernel fails with an MCE error coming from the host. It
is the same error which was fixed in the "Drivers: hv: vmbus: Implement
the protocol for tearing down vmbus state" commit - monitor pages remain
special and when they're being written to (as the new kernel doesn't know
these pages are special) bad things happen. We need to perform some
minimalistic cleanup before booting a new kernel on kexec. To do so we
need to register a special machine_ops.shutdown handler to be executed
before the native_machine_shutdown(). Registering a shutdown notification
handler via the register_reboot_notifier() call is not sufficient as it
happens too early for our purposes. machine_ops is not being exported to
modules (and I don't think we want to export it) so let's do this in
mshyperv.c

The minimalistic cleanup consists of cleaning up clockevents, synic MSRs,
guest os id MSR, and hypercall MSR.

Kdump doesn't require all this stuff as it lives in a separate memory
space.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2517281d63a2b09d94aedfb522943617048f337e)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: remove hv_synic_free_cpu() call from hv_synic_cleanup()
Vitaly Kuznetsov [Sat, 1 Aug 2015 23:08:05 +0000 (16:08 -0700)]
Drivers: hv: vmbus: remove hv_synic_free_cpu() call from hv_synic_cleanup()

We already have hv_synic_free() which frees all per-cpu pages for all
CPUs, let's remove the hv_synic_free_cpu() call from hv_synic_cleanup()
so it will be possible to do separate cleanup (writing to MSRs) and final
freeing. This is going to be used to assist kexec.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 06210b42f33ea1c29a90f4db2d88be91c511154b)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agoDrivers: hv: vmbus: kill tasklets on module unload
Vitaly Kuznetsov [Thu, 7 May 2015 00:47:41 +0000 (17:47 -0700)]
Drivers: hv: vmbus: kill tasklets on module unload

Explicitly kill tasklets we create on module unload.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1959a28e2671004c1e9c30ccd2914b868f100742)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agohv_netvsc: Add structs and handlers for VF messages
Haiyang Zhang [Fri, 24 Jul 2015 17:08:40 +0000 (10:08 -0700)]
hv_netvsc: Add structs and handlers for VF messages

This patch adds data structures and handlers for messages related
to SRIOV Virtual Function.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 71790a2792c8772e29bf5aa726215d9256ef93dc)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years agohv_netvsc: Wait for sub-channels to be processed during probe
KY Srinivasan [Wed, 22 Jul 2015 18:42:32 +0000 (11:42 -0700)]
hv_netvsc: Wait for sub-channels to be processed during probe

The current code returns from probe without waiting for the proper handling
of subchannels that may be requested. If the netvsc driver were to be rapidly
loaded/unloaded, we can trigger a panic as the unload will be tearing
down state that may not have been fully set up yet. We fix this issue by making
sure that we return from the probe call only after ensuring that the
sub-channel offers in flight are properly handled.

Reviewed-and-tested-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b3e6b82a0099dfef038e40c630a554ed1e402504)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago hv_netvsc: Add close of RNDIS filter into change mtu call
Haiyang Zhang [Mon, 13 Jul 2015 20:09:16 +0000 (13:09 -0700)]
hv_netvsc: Add close of RNDIS filter into change mtu call

The current change mtu call only stops tx before removing the RNDIS filter.
If the ring buffer is not empty, rndis_filter_device_remove() may
hang while removing the buffers.

This patch closes the RNDIS filter before removing it and adds a
gradual waiting loop that runs until the ring is empty. This resolves the
change_mtu hang under heavy traffic.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2de8530ba0c71a2fba02590681af0f3a2a187a9b)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago x86: hyperv: add CPUID bit for crash handlers
Paolo Bonzini [Tue, 7 Jul 2015 10:17:36 +0000 (12:17 +0200)]
x86: hyperv: add CPUID bit for crash handlers

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5d75a747596be046546eeb9d6ba39a3af851a1af)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago hv_netvsc: Add support to set MTU reservation from guest side
Haiyang Zhang [Mon, 6 Jul 2015 21:11:37 +0000 (14:11 -0700)]
hv_netvsc: Add support to set MTU reservation from guest side

When packet encapsulation is in use, the MTU needs to be reduced for
headroom reservation.
The existing code takes the updated MTU value only from the host side.
But vSwitch extensions, such as Open vSwitch, require the flexibility
to change the MTU to different values from within a guest during the
lifecycle of a vNIC, when the encapsulation protocol is changed. The
patch supports this kind of MTU change.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f9cbce34c34bcc05ea0dd78c8999bfe88b5b6b86)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago kvm: add hyper-v crash msrs values
Andrey Smetanin [Thu, 2 Jul 2015 16:07:46 +0000 (19:07 +0300)]
kvm: add hyper-v crash msrs values

Add Hyper-V crash MSR values - HV_X64_MSR_CRASH*.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a88464a8b0ffb2f8dfb69d3ab982169578b50f22)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago storvsc: use shost_for_each_device() instead of open coding
Vitaly Kuznetsov [Wed, 1 Jul 2015 09:31:27 +0000 (11:31 +0200)]
storvsc: use shost_for_each_device() instead of open coding

Comment in struct Scsi_Host says that drivers are not supposed to access
__devices directly. storvsc_host_scan() doesn't happen in irq context
so we can just use shost_for_each_device().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Long Li <longli@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
(cherry picked from commit 8d6a9f5676f0e734967ac3739f5c6a28a0b047d9)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago storvsc: be more picky about scmnd->sc_data_direction
Vitaly Kuznetsov [Thu, 25 Jun 2015 16:12:11 +0000 (18:12 +0200)]
storvsc: be more picky about scmnd->sc_data_direction

Under the 'default' case in scmnd->sc_data_direction we have 3 options:
- DMA_NONE which we handle correctly.
- DMA_BIDIRECTIONAL which is never supposed to be set by SCSI stack.
- Garbage value.

Do WARN() and return -EINVAL in the last two cases. virtio_scsi does
BUG_ON() here but that looks like overkill.
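As a rough userspace illustration of the three cases above (not the storvsc code itself): the enum values below are local stand-ins for the kernel's DMA_* constants, map_direction() and enum io_type are hypothetical names, and fprintf() stands in for WARN().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Local stand-ins for the kernel's DMA_* direction constants. */
enum data_dir {
	DIR_BIDIRECTIONAL = 0,
	DIR_TO_DEVICE = 1,
	DIR_FROM_DEVICE = 2,
	DIR_NONE = 3
};

/* Hypothetical request type chosen from the direction. */
enum io_type { IO_UNKNOWN = 0, IO_READ, IO_WRITE };

/*
 * Map a data direction to a request type. DIR_NONE is handled
 * explicitly; DIR_BIDIRECTIONAL and garbage values fall through to
 * the default case, which warns and returns -EINVAL.
 */
int map_direction(int dir, enum io_type *out)
{
	assert(out != NULL);

	switch (dir) {
	case DIR_TO_DEVICE:
		*out = IO_WRITE;
		return 0;
	case DIR_FROM_DEVICE:
		*out = IO_READ;
		return 0;
	case DIR_NONE:
		*out = IO_UNKNOWN;
		return 0;
	default:
		/* Userspace substitute for WARN(): report and reject. */
		fprintf(stderr, "unexpected data direction: %d\n", dir);
		return -EINVAL;
	}
}
```

Rejecting the request instead of crashing keeps a single bad command from taking the whole guest down, which is the trade-off the message argues for.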

Reported-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
(cherry picked from commit cb1cf0804fe582f8a626c3cc591cb3127536137c)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus: Allocate ring buffer memory in NUMA aware fashion
K. Y. Srinivasan [Mon, 1 Jun 2015 04:27:03 +0000 (21:27 -0700)]
Drivers: hv: vmbus: Allocate ring buffer memory in NUMA aware fashion

Allocate ring buffer memory from the NUMA node assigned to the channel.
Since this is a performance and not a correctness issue, if the node specific
allocation were to fail, fall back and allocate without specifying the node.
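The fallback described above could look roughly like the following userspace sketch. Everything here is hypothetical: demo_alloc_on_node() is a toy allocator that pretends node 1 is out of memory, and ANY_NODE is a made-up "no preference" marker. It is not the vmbus allocation code.

```c
#include <assert.h>
#include <stdlib.h>

#define ANY_NODE (-1)	/* hypothetical "no node preference" marker */

/* Node that the demo allocator last served; for illustration only. */
int demo_last_node = ANY_NODE;

/*
 * Hypothetical node-aware allocator: pretends node 1 has no free
 * memory and serves every other request, including ANY_NODE, from
 * the ordinary heap.
 */
void *demo_alloc_on_node(size_t size, int node)
{
	if (node == 1)
		return NULL;
	demo_last_node = node;
	return malloc(size);
}

/*
 * Try the channel's node first. Node locality only affects
 * performance, so on failure retry without a node preference
 * instead of reporting an error.
 */
void *alloc_ring_buffer(size_t size, int node)
{
	void *buf;

	assert(size > 0);

	buf = demo_alloc_on_node(size, node);
	if (!buf)
		buf = demo_alloc_on_node(size, ANY_NODE);
	return buf;
}
```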

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 294409d20572e9bcf857328286433f851168d54a)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus: Implement NUMA aware CPU affinity for channels
K. Y. Srinivasan [Sun, 31 May 2015 06:37:48 +0000 (23:37 -0700)]
Drivers: hv: vmbus: Implement NUMA aware CPU affinity for channels

Channels/sub-channels can be affinitized to VCPUs in the guest. Implement
this affinity in a way that is NUMA aware. The current protocol distributes
the primary channels uniformly across all available CPUs. The new protocol
is NUMA aware: primary channels are distributed across the available NUMA
nodes while the sub-channels within a primary channel are distributed amongst
CPUs within the NUMA node assigned to the primary channel.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1f656ff3fdddc2f59649cc84b633b799908f1f7b)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus: Use the vp_index map even for channels bound to CPU 0
K. Y. Srinivasan [Sun, 31 May 2015 06:37:47 +0000 (23:37 -0700)]
Drivers: hv: vmbus: Use the vp_index map even for channels bound to CPU 0

Map target_cpu to target_vcpu using the mapping table.
We should use the mapping table to transform guest CPU ID to VP Index
as is done for the non-performance critical channels.
While the value CPU 0 is special and will
map to VP index 0, it is good to be consistent.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9c6e64adf200d3bac0dd47d52cdbd3bd428384a5)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus: distribute subchannels among all vcpus
Vitaly Kuznetsov [Thu, 7 May 2015 00:47:46 +0000 (17:47 -0700)]
Drivers: hv: vmbus: distribute subchannels among all vcpus

Primary channels are distributed evenly across all vcpus we have. When the host
asks us to create subchannels it usually makes us num_cpus-1 offers and we are
supposed to distribute the work evenly among the channel itself and all its
subchannels. Make sure they are all assigned to different vcpus.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ce59fec836a9b4dc51cbcf9cb245b59e0ef53bea)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: balloon: check if ha_region_mutex was acquired in MEM_CANCEL_ONLINE...
Vitaly Kuznetsov [Fri, 29 May 2015 18:18:02 +0000 (11:18 -0700)]
Drivers: hv: balloon: check if ha_region_mutex was acquired in MEM_CANCEL_ONLINE case

Memory notifiers are being executed in a sequential order and when one of
them fails returning something different from NOTIFY_OK the remainder of
the notification chain is not being executed. When a memory block is being
onlined in online_pages() we do memory_notify(MEM_GOING_ONLINE, ) and if
one of the notifiers in the chain fails we end up doing
memory_notify(MEM_CANCEL_ONLINE, ) so it is possible for a notifier to see
MEM_CANCEL_ONLINE without seeing the corresponding MEM_GOING_ONLINE event.
E.g. when CONFIG_KASAN is enabled the kasan_mem_notifier() is being used
to prevent memory hotplug, it returns NOTIFY_BAD for all MEM_GOING_ONLINE
events. As kasan_mem_notifier() comes before the hv_memory_notifier() in
the notification chain we don't see the MEM_GOING_ONLINE event and we do
not take the ha_region_mutex. We, however, see the MEM_CANCEL_ONLINE event
and unconditionally try to release the lock. The following is observed:

[  110.850927] =====================================
[  110.850927] [ BUG: bad unlock balance detected! ]
[  110.850927] 4.1.0-rc3_bugxxxxxxx_test_xxxx #595 Not tainted
[  110.850927] -------------------------------------
[  110.850927] systemd-udevd/920 is trying to release lock (&dm_device.ha_region_mutex) at:
[  110.850927] [<ffffffff81acda0e>] mutex_unlock+0xe/0x10
[  110.850927] but there are no more locks to release!

At the same time, the ha_region_mutex can be held when we get the
MEM_CANCEL_ONLINE event if one of the memory notifiers after
hv_memory_notifier() in the notification chain failed, so we need to add
the mutex_is_locked() check. In case of MEM_ONLINE we are always supposed
to have the mutex locked.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4e4bd36f97b1492f19b3329ac74ed313da13de34)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago hv_netvsc: Allocate the sendbuf in a NUMA aware way
K. Y. Srinivasan [Fri, 29 May 2015 00:08:07 +0000 (17:08 -0700)]
hv_netvsc: Allocate the sendbuf in a NUMA aware way

Allocate the send buffer in a NUMA aware way.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5defde5946676ee23cd6a9d0e1de899410f4a33f)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago hv_netvsc: Allocate the receive buffer from the correct NUMA node
K. Y. Srinivasan [Fri, 29 May 2015 00:08:06 +0000 (17:08 -0700)]
hv_netvsc: Allocate the receive buffer from the correct NUMA node

Allocate the receive buffer from the NUMA node assigned to the primary
channel.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0a726c2b499e390b1c1fc3092bd789f2192a2d03)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago hv_netvsc: Properly size the vrss queues
KY Srinivasan [Wed, 27 May 2015 20:16:57 +0000 (13:16 -0700)]
hv_netvsc: Properly size the vrss queues

The current algorithm for deciding on the number of VRSS channels is
not optimal since we open up the min of number of CPUs online and the
number of VRSS channels the host is offering. So on a 32 VCPU guest
we could potentially open 32 VRSS subchannels. Experimentation has
shown that it is best to limit the number of VRSS channels to the number
of CPUs within a NUMA node.

Here is the new algorithm for deciding on the number of sub-channels we
would open up:
        1) Pick the minimum of what the host is offering and what the driver
           in the guest is specifying as the default value.
        2) Pick the minimum of (1) and the numbers of CPUs in the NUMA
           node the primary channel is bound to.
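For illustration only, the two steps above amount to taking a minimum of three values. The function and parameter names in this sketch are hypothetical and do not come from the driver.

```c
#include <assert.h>

/* Small helper; the kernel has its own min() macro. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * host_offered:   number of channels the host is offering
 * driver_default: default the guest driver specifies
 * cpus_in_node:   CPUs in the NUMA node the primary channel is bound to
 */
unsigned int pick_vrss_channels(unsigned int host_offered,
				unsigned int driver_default,
				unsigned int cpus_in_node)
{
	unsigned int n;

	assert(cpus_in_node > 0);

	n = min_uint(host_offered, driver_default);	/* step 1 */
	return min_uint(n, cpus_in_node);		/* step 2 */
}
```

With a host offering 32 channels, a driver default of 8 and 4 CPUs in the node, the sketch yields 4.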

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e01ec2199ef22e2cabd7d6e68a192f3eb728029f)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus:Update preferred vmbus protocol version to windows 10.
Keith Mange [Tue, 26 May 2015 21:23:01 +0000 (14:23 -0700)]
Drivers: hv: vmbus:Update preferred vmbus protocol version to windows 10.

Add support for Windows 10.

Signed-off-by: Keith Mange <keith.mange@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6c4e5f9c9ff41ea997fd0f345b3b2b88c113eb68)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago hv_netvsc: use per_cpu stats to calculate TX/RX data
sixiao@microsoft.com [Thu, 14 May 2015 08:00:25 +0000 (01:00 -0700)]
hv_netvsc: use per_cpu stats to calculate TX/RX data

Current code does not lock anything when calculating the TX and RX stats.
As a result, the RX and TX data reported by ifconfig are not accurate in a
system with high network throughput and multiple CPUs (in my test,
RX/TX = 83% between 2 HyperV VM nodes which have 8 vCPUs and 40G Ethernet).

This patch fixes the above issue by using per_cpu stats.
netvsc_get_stats64() summarizes TX and RX data by iterating over all CPUs
to get their respective stats.
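The summation step can be sketched in userspace C as below. The struct and function names are hypothetical, and the sketch leaves out the synchronization the kernel needs when reading 64-bit counters that other CPUs are updating; it only shows the idea of one counter set per CPU that is added up on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU counter set; each CPU only writes its own. */
struct cpu_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Add up the counters of all CPUs into one total. */
struct cpu_stats sum_stats(const struct cpu_stats *per_cpu, size_t ncpus)
{
	struct cpu_stats total = { 0, 0 };
	size_t i;

	assert(per_cpu != NULL || ncpus == 0);

	for (i = 0; i < ncpus; i++) {
		total.packets += per_cpu[i].packets;
		total.bytes += per_cpu[i].bytes;
	}
	return total;
}
```

Because no two CPUs write the same counter, the hot path needs no shared lock; the cost moves to the comparatively rare read side.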

This v2 patch addressed David's comments on the cleanup path when
netdev_alloc_pcpu_stats() failed.

Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7eafd9b4005643cfc24f1daf78f4dd56ff71f559)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago hv_netvsc: Use the xmit_more skb flag to optimize signaling the host
KY Srinivasan [Mon, 11 May 2015 22:39:46 +0000 (15:39 -0700)]
hv_netvsc: Use the xmit_more skb flag to optimize signaling the host

Based on the information given to this driver (via the xmit_more skb flag),
we can defer signaling the host if more packets are on the way. This will help
make the host more efficient since it can potentially process a larger batch of
packets. Implement this optimization.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 82fa3c776e5abba7ed6e4b4f4983d14731c37d6a)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer()
Vitaly Kuznetsov [Thu, 7 May 2015 00:47:45 +0000 (17:47 -0700)]
Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer()

We need to call init_vp_index() after we added the channel to the appropriate
list (global or subchannel) to be able to use this information when assigning
the channel to the particular vcpu. To do so we need to move a couple of
functions around. The only real change is the init_vp_index() call. This is a
small refactoring without a functional change.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f38e7dd72337d83cced910cfbf6016475ef85bf7)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus: briefly comment num_sc and next_oc
Vitaly Kuznetsov [Thu, 7 May 2015 00:47:43 +0000 (17:47 -0700)]
Drivers: hv: vmbus: briefly comment num_sc and next_oc

next_oc and num_sc fields of struct vmbus_channel deserve a description. Move
them closer to sc_list as these fields are related to it.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fea844a2b0edd6540d5cde2cd54a8a3c86e9c53f)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus: unify calls to percpu_channel_enq()
Vitaly Kuznetsov [Thu, 7 May 2015 00:47:42 +0000 (17:47 -0700)]
Drivers: hv: vmbus: unify calls to percpu_channel_enq()

Remove some code duplication, no functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8dfd332674758135039d0d2d2a7479934ff0b9c5)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus: do cleanup on all vmbus_open() failure paths
Vitaly Kuznetsov [Thu, 7 May 2015 00:47:40 +0000 (17:47 -0700)]
Drivers: hv: vmbus: do cleanup on all vmbus_open() failure paths

In case there was an error reported in the response to the CHANNELMSG_OPENCHANNEL
call we need to do the cleanup as a vmbus_open() user won't be doing it after
receiving an error. The cleanup should be done on all failure paths. We also need
to avoid returning open_info->response.open_result.status as the return value as
all other errors we return from vmbus_open() are -EXXX and vmbus_open() callers
are not supposed to analyze host error codes.
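The pattern described above, sketched in userspace C under stated assumptions: channel_open_sketch() is a hypothetical function, malloc() stands in for the ring buffer setup, and the specific errno returned for a host-reported failure is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * host_status models the status field of the host's open response
 * (0 means success). On success the buffer is handed to the caller
 * via ring_out; on any failure it is freed here and a negative
 * errno is returned, never the raw host status.
 */
int channel_open_sketch(int host_status, void **ring_out)
{
	void *ring;
	int err;

	assert(ring_out != NULL);
	*ring_out = NULL;

	ring = malloc(4096);
	if (!ring)
		return -ENOMEM;

	if (host_status != 0) {
		err = -EAGAIN;	/* illustrative errno, not the host code */
		goto error_free;
	}

	*ring_out = ring;
	return 0;

error_free:
	/* The caller will not clean up after an error, so do it here. */
	free(ring);
	return err;
}
```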

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ffc151f3c83c25ec06d5ad13a78d0fc066c7167e)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: vmbus: Implement the protocol for tearing down vmbus state
K. Y. Srinivasan [Thu, 23 Apr 2015 04:31:32 +0000 (21:31 -0700)]
Drivers: hv: vmbus: Implement the protocol for tearing down vmbus state

Implement the protocol for tearing down the monitor state established with
the host.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2db84eff127e3f4b3635edc589cd6a56db8755a3)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago drivers: hv: vmbus: Get rid of some unused definitions
K. Y. Srinivasan [Thu, 23 Apr 2015 04:31:31 +0000 (21:31 -0700)]
drivers: hv: vmbus: Get rid of some unused definitions

Get rid of some unused definitions.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit db9ba2088f6507fee370904f02db1eb9b49bd088)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Drivers: hv: fcopy: full handshake support
Vitaly Kuznetsov [Sun, 12 Apr 2015 01:07:58 +0000 (18:07 -0700)]
Drivers: hv: fcopy: full handshake support

Introduce FCOPY_VERSION_1 to support kernel replying to the negotiation
message with its own version.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a4d1ee5b0255a135fead1d62a7fc7e6fe718b66e)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Tools: hv: vss: use misc char device to communicate with kernel
Vitaly Kuznetsov [Sun, 12 Apr 2015 01:07:56 +0000 (18:07 -0700)]
Tools: hv: vss: use misc char device to communicate with kernel

Use /dev/vmbus/hv_vss instead of netlink.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f5722b9bd418e29b7429bd9a43bd100599b26d4f)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
9 years ago Tools: hv: kvp: use misc char device to communicate with kernel
Vitaly Kuznetsov [Sun, 12 Apr 2015 01:07:55 +0000 (18:07 -0700)]
Tools: hv: kvp: use misc char device to communicate with kernel

Use /dev/vmbus/hv_kvp instead of netlink.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8ddca8088586303cfe3db4209a4682f7a4cf7d2d)

Orabug: 21886720
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>