www.infradead.org Git - users/jedix/linux-maple.git/log

Merge DTrace topic branch into uek-4.1-next

dtrace: modules provide called from rcu atomic section

The per-module provide callback is called from within an RCU read-side
critical section. This results in provider code running in atomic
context, which can cause trouble, mainly because of sleeping
allocation calls in this code path.

Orabug: 26680982

Signed-off-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Acked-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

dtrace: Implement high precision walltimestamp

There are lock-free implementations for other timers (mono & raw) but
lock-free access to realtime clock is missing. This patch allows DTrace
to provide CLOCK_REALTIME_COARSE time via walltimestamp without taking
any lock.

Orabug: 25883559

Signed-off-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Acked-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

virtio_net: clear MTU when out of range

virtio attempts to clear the MTU feature bit if the value is out of the
supported range, but this has no real effect since FEATURES_OK has
already been set.

Fix this up by checking the MTU in the new validate callback.
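
A minimal userspace sketch of the ordering issue, not the driver code
itself: the MTU range check has to run while the feature set can still
change, before negotiation is marked final. The struct, function names
and MTU bounds are hypothetical; only the feature bit number follows the
virtio specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; only the feature bit number comes from the virtio spec. */
#define SKETCH_F_MTU   (1ULL << 3)   /* VIRTIO_NET_F_MTU is feature bit 3 */
#define SKETCH_MIN_MTU 68
#define SKETCH_MAX_MTU 65535

struct sketch_dev {
    uint64_t features;     /* features still being negotiated */
    int      advised_mtu;  /* MTU advised by the device */
    bool     features_ok;  /* set once negotiation is final */
};

/* Validate step: runs while the feature set can still change. */
void sketch_validate(struct sketch_dev *dev)
{
    if (!(dev->features & SKETCH_F_MTU))
        return;
    if (dev->advised_mtu < SKETCH_MIN_MTU || dev->advised_mtu > SKETCH_MAX_MTU)
        dev->features &= ~SKETCH_F_MTU;   /* drop the MTU feature early */
}

/* Finalize step: validation happens first, then the feature set is frozen. */
void sketch_finalize(struct sketch_dev *dev)
{
    sketch_validate(dev);
    dev->features_ok = true;
}
```

Calling sketch_finalize() on a device advising an MTU of 10 leaves the
MTU bit cleared by the time features_ok is set.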

Fixes: 14de9d114a82 ("virtio-net: Add initial MTU advice feature")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit fe36cbe0671e868cbd2f534a50ac60273fa5acf2)

Orabug: 26584452

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/net/virtio_net.c
Due to the lack of commit d0c2c9973ecd ("net: use core MTU range
checking in virt drivers") the MTU size check is still done in the
virtio_net.

virtio_net: enable big packets for large MTU values

If one enables e.g. jumbo frames without mergeable
buffers, packets won't fit in 1500 byte buffers
we use. Switch to big packet mode instead.
TODO: make sizing more exact, possibly extend small
packet mode to use larger pages.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2e123b44a3c19de75f40ee0081d6d4fc04adfdc7)

Orabug: 26584452

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/net/virtio_net.c
Due to the lack of commit d0c2c9973ecd ("net: use core MTU range
checking in virt drivers") the MTU size check is still done in the
virtio_net.

virtio: allow drivers to validate features

Some drivers can't support all features in all configurations. At the
moment we blindly set FEATURES_OK and later FAILED. Support this better
by adding a callback drivers can use to do some early checks.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 404123c2db798027e852480ed9c4accef9f1d9e6)

Orabug: 26584452

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

virtio-net: Add initial MTU advice feature

This commit adds the feature bit and associated mtu device entry for the
virtio network device. When a virtio device comes up, it checks the
feature bit for the VIRTIO_NET_F_MTU feature. If such feature bit is
enabled, the driver will read the advised MTU and use it as the initial
value.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 14de9d114a82a564b94388c95af79a701dc93134)

Orabug: 26584452

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

virtio-net: correctly enable multiqueue

Commit 4490001029012539937ff02778fe6180613fa949 ("virtio-net: enable
multiqueue by default") blindly set the affinity instead of queues
during probe which can cause a mismatch of #queues between guest and
host. This patch fixes it by setting queues.

Reported-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Fixes: 49000102901 ("virtio-net: enable multiqueue by default")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a220871be66f99d8957c693cf22ec67ecbd9c23a)

Orabug: 26584452

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

virtio-net: enable multiqueue by default

We use a single queue even if multiqueue is enabled and let the admin
enable it through ethtool later. This is done to avoid a possible
regression (small packet TCP stream transmission). But it looks like
overkill since:

- a single queue user can disable multiqueue when launching qemu
- it brings extra trouble for management since it needs an extra admin
  tool in the guest to enable multiqueue
- multiqueue performs much better than single queue in most cases

So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable all of the queue pairs; if #queues is greater
than #vcpu, enable #vcpu queue pairs.
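
The rule above amounts to taking the smaller of the two counts. A small
sketch, with a hypothetical function name that does not appear in the
driver:

```c
/*
 * Number of queue pairs to enable by default: all of them when there
 * are no more queues than vCPUs, otherwise one pair per vCPU.
 */
unsigned int sketch_default_queue_pairs(unsigned int max_queue_pairs,
                                        unsigned int num_vcpus)
{
    return max_queue_pairs <= num_vcpus ? max_queue_pairs : num_vcpus;
}
```

For example, a device offering 16 queue pairs to an 8-vCPU guest would
get 8 pairs enabled.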

Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Jeremy Eder <jeder@redhat.com>
Cc: Marko Myllynen <myllynen@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4490001029012539937ff02778fe6180613fa949)

Orabug: 26584452

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

virtio_net: add gro capability

Straightforward patch to add GRO processing to virtio_net.

napi_complete_done() usage allows more aggressive aggregation,
opted-in by setting /sys/class/net/xxx/gro_flush_timeout

Tested:

Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec,
Rick Jones reported following results.

One VM of each on a pair of OpenStack compute nodes with E5-2650Lv3 CPUs
and Intel 82599ES-based NICs. So, two "before" and two "after" VMs.
The OpenStack compute nodes were running OpenStack Kilo, with VxLAN
encapsulation being used through OVS so no GRO coming-up the host
stack.  The compute nodes themselves were running a 3.14-based kernel.

Single-stream netperf, CPU utilizations and thus service demands are
based on intra-guest reported CPU.

Throughput Mbit/s, bigger is better
                    Min     Median  Average Max
4.2.0-rc3+          1364    1686    1678    1938
4.2.0-rc3+flush1k   1824    2269    2275    2647

Send Service Demand, smaller is better
                    Min     Median  Average Max
4.2.0-rc3+          0.236   0.558   0.524   0.802
4.2.0-rc3+flush1k   0.176   0.503   0.471   0.738

Receive Service Demand, smaller is better.
                    Min     Median  Average Max
4.2.0-rc3+          1.906   2.188   2.191   2.531
4.2.0-rc3+flush1k   0.448   0.529   0.533   0.692

Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Rick Jones <rick.jones2@hp.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0fbd050a7d262b74527a289ae75a33626d1060a8)

Orabug: 26584452

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

Merge DTrace topic branch into uek-4.1-next

dtrace: fix lquantize for 32-bit overflow on values

Fix dtrace_aggregate_lquantize() so that it no longer truncates the
value to 32 bits or computes the bin index in 32 bits. The Linux bug is
26268136 ("dtrace_aggregate_lquantize() suffers from 32-bit overflow").
It references a corresponding Solaris bug.
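
A sketch of a linear-quantize bin lookup done entirely in 64 bits. This
is an illustration of the overflow issue, not the kernel function; the
name, the parameter types and the bin numbering (0 for underflow,
levels + 1 for overflow) are assumptions.

```c
#include <stdint.h>

/*
 * Return the bin for `value`: 0 below `base`, 1..levels for the linear
 * bins of width `step`, and levels + 1 for anything past the last bin.
 */
uint64_t sketch_lquantize_bin(int64_t value, int32_t base,
                              uint16_t step, uint16_t levels)
{
    if (step == 0 || value < (int64_t)base)
        return 0;

    /* Unsigned 64-bit subtraction: no truncation and no signed overflow. */
    uint64_t offset = ((uint64_t)value - (uint64_t)(int64_t)base) / step;

    if (offset >= levels)
        return (uint64_t)levels + 1;
    return offset + 1;
}
```

With base 0, step 1 and 10 levels, the value 4294967299 (2^32 + 3) lands
in the overflow bin; a 32-bit truncation would have treated it as 3.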

Orabug: 26268136

Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>

Bluetooth: Properly check L2CAP config option output buffer length

Orabug: 26790014
CVE: CVE-2017-1000251

Validate the output buffer length for L2CAP config requests and responses
to avoid overflowing the stack buffer used for building the option blocks.
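
The kind of check described above can be sketched as a bounded append of
a type/length/value option. This is a userspace illustration with a
hypothetical function name, not the Bluetooth stack's code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Append one option (1-byte type, 1-byte length, then the value) at
 * offset `off` in `buf`. Returns the new offset, or `off` unchanged if
 * the option does not fit in the `size`-byte buffer.
 */
size_t sketch_add_conf_opt(uint8_t *buf, size_t size, size_t off,
                           uint8_t type, const void *val, uint8_t len)
{
    size_t need = 2 + (size_t)len;

    if (off > size || size - off < need)
        return off;              /* would overflow: write nothing */

    buf[off] = type;
    buf[off + 1] = len;
    memcpy(buf + off + 2, val, len);
    return off + need;
}
```

The caller passes the remaining capacity along with every append, so a
peer that triggers many options cannot push the write past the end of
the buffer.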

Cc: stable@vger.kernel.org
Signed-off-by: Ben Seri <ben@armis.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e860d2c904d1a9f38a24eb44c9f34b8f915a6ea3)
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>

gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP

In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.

However, since b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.

This patch resets remcsum_offload at the same time as the GRO remcsum
cleanup, so that this case works as it did before.

Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO")
Signed-off-by: Koichiro Den <den@klaipeden.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit 1bff8a0c1f8c236209ee369b7952751c04eaa71a)
Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP

In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.

However, since b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.

This patch resets remcsum_offload at the same time as the GRO remcsum
cleanup, so that this case works as it did before.

Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO")
Signed-off-by: Koichiro Den <den@klaipeden.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit be73b3043bf465455d4c9b88f68e03b6447bcfb0)
Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacks

Do WARN_ON_ONCE instead of WARN_ON in gue_gro_receive when the offload
callbacks are bad (either don't exist or gro_receive is not specified).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit 270136613bf7306e2b83457628e2b2f6c6be3989)
Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

vxlan: GRO support at tunnel layer

Add calls to gro_cells infrastructure to do GRO when receiving on a tunnel.

Testing:

Ran 200 netperf TCP_STREAM instances

  - With fix (GRO enabled on VXLAN interface)

    Verified GRO is happening.

    9084 MBps tput
    3.44% CPU utilization

  - Without fix (GRO disabled on VXLAN interface)

    Verified no GRO is happening.

    9084 MBps tput
    5.54% CPU utilization

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit 58ce31cca1ffe057f4744c3f671e3e84606d3d4a)
Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

gro: Fix remcsum offload to deal with frags in GRO

The remote checksum offload GRO did not consider the case that frag0
might be in use. This patch fixes that by accessing headers using the
skb_gro functions and not saving offsets relative to skb->head.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 25879842
(cherry picked from commit b7fe10e5ebac2a3f37e95535e616494b65fa020f)
Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

NFSv4.1: Don't deadlock the state manager on the SEQUENCE status flags

As described in RFC5661, section 18.46, some of the status flags exist
in order to tell the client when it needs to acknowledge the existence of
revoked state on the server and/or to recover state.
Those flags will then remain set until the recovery procedure is done.

In order to avoid looping, the client therefore needs to ignore
those particular flags while recovering.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
(cherry picked from commit 0a014a44a50839a8064618e959fae5bbc44c2fd5)

orabug: 25513155
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-By: Jack Vogel <jack.vogel@oracle.com>
Tested-by: xuan.qi@oracle.com
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>

NFSv4.1: Defer bumping the slot sequence number until we free the slot

For operations like OPEN or LAYOUTGET, which return recallable state
(i.e. delegations and layouts) we want to enable the mechanism for
resolving recall races in RFC5661 Section 2.10.6.3.
To do so, we will want to defer bumping the slot's sequence number until
we have finished processing the RPC results.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(cherry picked from commit 07e8dcbda71ef87e9cbdc42b5bb16a44c1ab839b)

orabug: 25513155
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-By: Jack Vogel <jack.vogel@oracle.com>
Tested-by: xuan.qi@oracle.com
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>

NFSv4: Leases are renewed in sequence_done when we have sessions

Ensure that the calls to renew_lease() in open_done() etc. only apply
to session-less versions of NFSv4.x (i.e. NFSv4.0).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(cherry picked from commit be824167e33a8b747423c90f72479deb03255d54)

orabug: 25513155
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-By: Jack Vogel <jack.vogel@oracle.com>
Tested-by: xuan.qi@oracle.com
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>

NFSv4.1: nfs41_sequence_done should handle sequence flag errors

Instead of just kicking off lease recovery, we should look into the
sequence flag errors and handle them.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(cherry picked from commit b15c7cdde4991be5058f442c6d08d404d56f662c)

orabug: 25513155
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-By: Jack Vogel <jack.vogel@oracle.com>
Tested-by: xuan.qi@oracle.com
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>

Revert "RDMA CM: Add reason code for IB_CM_REJ_CONSUMER_DEFINED"

This reverts commit 5e86bae96237 ("RDMA CM: Add reason code for
IB_CM_REJ_CONSUMER_DEFINED") because rolling downgrade has been
de-featured. Thus, the underlying rdma_cm/ib_cm changes can be removed.

Orabug: 26124147

Suggested-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Revert "RDS: base connection dependency needed for rolling downgrade from version 4.1 to 3.1""

This commit partially reverts commit 5e86bae96237 ("RDMA CM: Add reason code
for IB_CM_REJ_CONSUMER_DEFINED") as it contains two changes in a single
commit. The rolling downgrade support from 4.1 to 3.1 is no longer needed
because there is no practical use case for this feature. This commit removes
the base connection dependency for rolling downgrade, that is, it changes the
code back to its original form before 5e86bae96237. The clean-up of the RDS
backward compatibility issue is not addressed in this patch; please
refer to bugdb #26772473 for more details.

Orabug: 26124147

Suggested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Revert "RDS: Ensure non-zero SL uses correct path before lane 0 connection is dropped"

This reverts commit 5fe5f2d6e883 ("RDS: Ensure non-zero SL uses correct
path before lane 0 connection is dropped") because RDS specific path record
caching has been removed in commit 81be7fc4f495 ("net/rds: remove the RDS
specific path record caching"). Thus, there is no dependency that TOS
connections can only be re-established after the base connection (lane 0)
is up.

Orabug: 26124147

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Suggested-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>

Revert "rds: make sure base connection is up on both sides"

This reverts commit cf80f396af3a ("rds: make sure base connection is up on
both sides") because RDS specific path record caching has been removed in
commit 81be7fc4f495 ("net/rds: remove the RDS specific path record
caching"). Thus, there is no dependency that TOS connections can only be
re-established after the base connection (lane 0) is up.

Orabug: 26124147

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>

net/rds: remove the RDS specific path record caching

This patch partially reverts commit b12826152417 ("RDS: SA query
optimization"), which has RDS specific path record caching, and uses the
underlying ibacm path record caching. ibacm considers all <source,dest,N>
entries as a similar path record (N is the TOS). Thus, RDS needs to update
the SL manually during the QP creation. RDS also assumes that it is a 1:1
mapping in the TOS to SL mapping.

Orabug: 26124147

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE

Orabug: 26797298

During testing, I discovered that __generic_file_splice_read() returns
0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first
page of a single/multi-page splice read operation. This EOF return code
causes the userspace test to (correctly) report a zero-length read error
when it was expecting otherwise.

The current strategy of returning a partial non-zero read when ->readpage
returns AOP_TRUNCATED_PAGE works only when the failed page is not the
first of the lot being processed.

This patch attempts to retry lookup and call ->readpage again on pages
that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my
tests pass and I haven't noticed any unwanted side effects.

This version removes the thrice-retry loop and instead indefinitely
retries lookups on AOP_TRUNCATED_PAGE errors from ->readpage. This
behavior is now similar to do_generic_file_read().
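
The retry strategy can be sketched in userspace as a loop around a read
callback. The names are hypothetical, the constant is a stand-in for
AOP_TRUNCATED_PAGE, and sketch_flaky_readpage() exists only to exercise
the loop:

```c
/* Stand-in for the kernel's AOP_TRUNCATED_PAGE return code. */
#define SKETCH_TRUNCATED_PAGE 0x80001

typedef int (*sketch_readpage_fn)(void *ctx);

/*
 * Keep retrying while the read reports a truncated page. In the kernel
 * the page would be looked up again before each retry.
 */
int sketch_read_with_retry(sketch_readpage_fn readpage, void *ctx)
{
    int err;

    do {
        err = readpage(ctx);
    } while (err == SKETCH_TRUNCATED_PAGE);

    return err;
}

/* Demo callback: fails with "truncated" *ctx times, then succeeds. */
int sketch_flaky_readpage(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_TRUNCATED_PAGE;
    }
    return 0;
}
```

Any other return value, success or a real error, ends the loop and is
passed back to the caller unchanged.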

Signed-off-by: Abhi Das <adas@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
AOP_TRUNCATED_PAGE is not used much in the kernel now, but ocfs2 uses it to
avoid deadlocks. Specifically, ocfs2_readpage() fails the read and returns
AOP_TRUNCATED_PAGE in order to avoid deadlock on page lock with the
downconvert thread, if it fails to get the inode cluster lock. It also uses
this return value to avoid livelock on the ip_alloc_sem semaphore. This is
done with the expectation that the VFS will check for this return value and
retry the read on the page and do_generic_file_read() does exactly this.
However, in case of splice read, __generic_file_splice_read() fails the read
and returns a partial/zero-length read back. This causes upper layers that use
splice read (such as nfs) to return EIO or other failures to userspace. Saar
ran into this issue while testing database workloads over knfs with ocfs2 as
the backend fs on the nfs server. This issue is fixed with this patch in place.

(cherry picked from commit 90330e689c32e5105265c461c54af6ecec3373fa)
Tested-by: Saar Maoz <saar.maoz@oracle.com>
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>

Remove dma_unmap_single_attrs call.

Mistaken addition of dma_unmap call in bnxt_free_rx_skbs
was causing panics in some circumstances, remove the call.

Orabug: 26713916
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ken Brucker <ken.brucker@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>

Merge branch 'qu6-rel-topic' into uek-build

dtrace: cyclics taking lock in atomic context

Holding the lock taken with spin_lock_irqsave() makes the CPU notifier
addition, which may block, run in atomic context. The fix is to move
this to module init time, which should run only once and before DTrace
comes up during kernel boot.

Orabug: 26782572

Signed-off-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: should not sleep in idr code paths

idr_preload() causes the thread to continue in atomic context.
Taking a mutex in this code path may lead to deadlock or scheduler
problems. This fix alters the locking scheme in a way that may cause
some latency issues in the DTrace framework but will keep the host safe.

Orabug: 26680802

Signed-off-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: Removal of XCalls from dtrace_sync()

Replace the synchronization mechanism in the framework with a lock-free
algorithm.

Orabug: 26671843

Signed-off-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>

dtrace: implement tracemem optional third arg (dyn size)

Solaris DTrace added a third, optional argument to tracemem(): dynamic
size. Here we implement this for Linux, including introducing the
new user-visible arguments
    DTRACE_TRACEMEM_STATIC
    DTRACE_TRACEMEM_DYNAMIC
    DTRACE_TRACEMEM_SIZE
    DTRACE_TRACEMEM_SSIZE
as well as refactoring the dt_print_bytes() raw label into its own
function, dt_print_rawbytes().

Orabug: 26223475

Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>

dtrace: implement llquantize log/linear aggregation

DTrace offers a histogram aggregation, quantize(), whose bins are base-2
logarithmic.  It also has a linear function lquantize().

Linux DTrace should also implement a log-linear function llquantize().
Such functionality is supported by Solaris DTrace and other tracing tools.
Motivations for such a function include:
  - a logarithmic aggregation with base other than 2 (e.g. base 10)
  - finer control than simply logarithmic
  - greater dynamic range than simply linear
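
To illustrate what "log-linear" means here, the following simplified
sketch finds the order of magnitude of a value for a given factor and
then splits that magnitude into linear steps. It is not DTrace's exact
bin layout; the function name and the way steps are sized are
assumptions made for the example.

```c
#include <stdint.h>

/*
 * Return the lower bound of the bucket holding `value`. Each range
 * [factor^m, factor^(m+1)) is cut into `nsteps` linear steps; if the
 * range does not divide evenly, the last bucket absorbs the remainder.
 */
uint64_t sketch_llq_bucket_floor(uint64_t value, uint64_t factor,
                                 uint64_t nsteps)
{
    if (value == 0 || factor < 2 || nsteps == 0)
        return 0;

    /* Logarithmic part: largest power of `factor` not above `value`. */
    uint64_t lo = 1;
    while (lo <= UINT64_MAX / factor && lo * factor <= value)
        lo *= factor;

    /* Linear part: step width within this order of magnitude. */
    uint64_t span = (lo <= UINT64_MAX / factor) ? lo * factor - lo
                                                : UINT64_MAX - lo;
    uint64_t step = span / nsteps;
    if (step == 0)
        step = 1;

    return lo + ((value - lo) / step) * step;
}
```

With factor 10 and 9 steps per magnitude, 57 falls in the bucket
starting at 50 and 5700 in the bucket starting at 5000, so resolution
scales with the size of the value.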

Orabug: 26675659

Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>

dtrace: IO provider unused variables when DTrace is disabled

Fix unused variables warnings caused by IO provider probes when
CONFIG_DTRACE is not set.

Orabug: 26570995

Signed-off-by: Nicolas Droux <nicolas.droux@oracle.com>
Acked-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

dtrace: failing to allocate more ECB space can cause a crash

The existing code was not taking into consideration that when the
table of ECBs needs to be expanded, the memory allocation can fail.
This could lead to a NULL pointer access, and a kernel crash. We
now check the result of the allocation, and bail out if it fails.
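
A userspace sketch of the pattern, with hypothetical struct and function
names: grow the table into a new allocation, and leave the old table
untouched if that allocation fails.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_ecb_table {
    void   **ecbs;    /* array of ECB pointers */
    size_t   necbs;   /* current capacity */
};

/* Double the table. Returns 0 on success, -1 if allocation fails. */
int sketch_ecb_table_grow(struct sketch_ecb_table *t)
{
    size_t new_n = t->necbs ? t->necbs * 2 : 1;
    void **new_ecbs = calloc(new_n, sizeof(*new_ecbs));

    if (new_ecbs == NULL)
        return -1;            /* bail out; old table stays valid */

    if (t->ecbs != NULL)
        memcpy(new_ecbs, t->ecbs, t->necbs * sizeof(*new_ecbs));
    free(t->ecbs);

    t->ecbs = new_ecbs;
    t->necbs = new_n;
    return 0;
}
```

Callers check the return value before touching the new slots, which is
the step the old code was missing.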

Orabug: 26503342
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>

uek-rpm: Add CPU Time Jitter Based Non-Physical True RNG support

Orabug: 26330509

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: jitterentropy - drop duplicate header module.h

Orabug: 26330509

Drop duplicate header module.h from jitterentropy-kcapi.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit e8b2fa476e7272631f00efae10ae6c17a9978993)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: jitterentropy - use ktime_get_ns as fallback

Orabug: 26330509

As part of the Y2038 development, __getnstimeofday is not supposed to be
used any more. It is now replaced with ktime_get_ns. The Jitter RNG uses
the time stamp to measure the execution time of a given code path and
tries to detect variations in the execution time. Therefore, the only
requirement the Jitter RNG has is a sufficiently high resolution to
detect these variations.

The change was tested on x86 and showed behavior identical to RDTSC. The
test code used simply measures the execution time of the heart of the
RNG:

        jent_get_nstime(&time);
        jent_memaccess(ec, min);
        jent_fold_time(NULL, time, &folded, min);
        jent_get_nstime(&time2);
        return ((time2 - time));

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit b578456c342ecd4266dac96c87ca803602ea9c48)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: jitterentropy - always select CRYPTO_RNG

Orabug: 26330509

When building the jitterentropy driver by itself, we get a link error
when CRYPTO_RNG is not enabled as well:

crypto/built-in.o: In function `jent_mod_init':
jitterentropy-kcapi.c:(.init.text+0x98): undefined reference to `crypto_register_rng'
crypto/built-in.o: In function `jent_mod_exit':
jitterentropy-kcapi.c:(.exit.text+0x60): undefined reference to `crypto_unregister_rng'

This adds a 'select CRYPTO_RNG' to CRYPTO_JITTERENTROPY to ensure the API
is always there when it's used, not just when DRBG is also enabled.
CRYPTO_DRBG would set it implicitly through CRYPTO_JITTERENTROPY now,
but this leaves it in place to make it explicit what the driver does.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 2f313e029020f1fa5f58f38f48ff6988d67fc3c1)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: jitterentropy - remove unnecessary information from a comment

Orabug: 26330509

The clocksource code has not provided a clocksource_register() function
since commit f893598 ("clocksource: Mostly kill clocksource_register()"),
so let's remove unnecessary information about this function from a comment.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f5128432b08c3e263e1a7ce709d686b1ded51131)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: jitterentropy - use safe format string parameters

Orabug: 26330509

Since the API for jent_panic() does not include format string parameters,
adjust the call to panic() to use a literal string to avoid any future
callers from leaking format strings into the panic message.
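
The same idea in a userspace sketch, using snprintf() in place of
panic() and a hypothetical wrapper name: the caller's string is passed
as an argument to a literal "%s" format rather than used as the format
itself.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy `msg` into `out` verbatim. Because the format is the literal
 * "%s", any '%' sequences inside `msg` are never interpreted.
 */
int sketch_format_msg(char *out, size_t n, const char *msg)
{
    return snprintf(out, n, "%s", msg);
}
```

A message such as "rate %d%%" comes out unchanged instead of being
parsed for conversion specifiers.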

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 0c5f0aa5dd92a36a2c6491695abcb95196b88ef6)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: jitterentropy - Delete unnecessary checks before the function call "kzfree"

Orabug: 26330509

The kzfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit cea0a3c305fa348cfad3bae4a226c241720daf55)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: jitterentropy - avoid compiler warnings

Orabug: 26330509

The core of the Jitter RNG is intended to be compiled with -O0. To
ensure that the Jitter RNG can be compiled on all architectures,
separate out the RNG core into a stand-alone C file that can be compiled
with -O0 which does not depend on any kernel include file.

As no kernel includes can be used in the C file implementing the core
RNG, any dependencies on kernel code must be extracted.

A second file provides the link to the kernel and the kernel crypto API
that can be compiled with the regular compile options of the kernel.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit dfc9fa91938bd0cd5597a3da33d613986149a1e6)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: drbg - use pragmas for disabling optimization

Orabug: 26330509

Replace the global -O0 compiler flag from the Makefile with GCC
pragmas to mark only the functions required to be compiled without
optimizations.
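
A sketch of what such pragmas look like with GCC. The function here is a
hypothetical placeholder, not one of the functions marked in the patch:

```c
/* GCC-specific: everything between push and pop is built at -O0. */
#pragma GCC push_options
#pragma GCC optimize ("-O0")

/* Placeholder for code whose timing must not be optimized away. */
unsigned long sketch_timing_loop(unsigned long n)
{
    unsigned long acc = 0;

    for (unsigned long i = 0; i < n; i++)
        acc += i;
    return acc;
}

#pragma GCC pop_options
/* Code after this point uses the normal optimization level again. */
```

Compared with a file-wide -O0 in the Makefile, this keeps the rest of
the file at the regular optimization level.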

This patch also adds a comment describing the rationale for the
functions chosen to be compiled without optimizations.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit fbb145bc0a1c03b90a96cca99dc07c33aaad2318)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: jitterentropy - remove timekeeping_valid_for_hres

Orabug: 26330509

The patch removes the use of timekeeping_valid_for_hres which is now
marked as internal for the time keeping subsystem. The jitterentropy
does not really require this verification as a coarse timer (when
random_get_entropy is absent) is discovered by the initialization test
of jent_entropy_init, which would cause the jitter rng to not load in
that case.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit cf58fcb1bea9e0fcf3447bdb959ef5bcd22cfbcb)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: jitterentropy - add jitterentropy RNG

Orabug: 26330509

The CPU Jitter RNG provides a source of good entropy by
collecting CPU execution time jitter. The entropy in the CPU
execution time jitter is magnified by the CPU Jitter Random
Number Generator. The CPU Jitter Random Number Generator uses
the CPU execution timing jitter to generate a bit stream
which complies with different statistical measurements that
determine the bit stream is random.

The CPU Jitter Random Number Generator delivers entropy which
follows information theoretical requirements. Based on these
studies and the implementation, the caller can assume that
one bit of data extracted from the CPU Jitter Random Number
Generator holds one bit of entropy.

The CPU Jitter Random Number Generator provides a decentralized
source of entropy, i.e. every caller can operate on a private
state of the entropy pool.

The RNG does not have any dependencies on any other service
in the kernel. The RNG only needs a high-resolution time
stamp.

Further design details, the cryptographic assessment and
a large array of test results are documented at
http://www.chronox.de/jent.html.

CC: Andreas Steffen <andreas.steffen@strongswan.org>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Sandy Harris <sandyinchina@gmail.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit bb5530e4082446aac3a3d69780cd4dbfa4520013)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: rng - Add multiple algorithm registration interface

Orabug: 26330509

This patch adds the helpers that allow the registration and removal
of multiple RNG algorithms.
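
A minimal sketch of what such a bulk-registration helper usually looks like, assuming the common "register all or roll back" pattern. The struct and function names are hypothetical stand-ins, not the kernel's crypto API.

```c
#include <stddef.h>

/* Hypothetical algorithm descriptor; only what the sketch needs. */
struct alg {
	const char *name;
	int registered;
};

/* Stand-in for a single-algorithm registration call; fails on a NULL name. */
int register_one(struct alg *a)
{
	if (a->name == NULL)
		return -1;
	a->registered = 1;
	return 0;
}

void unregister_one(struct alg *a)
{
	a->registered = 0;
}

/* Register count algorithms; on failure, undo the ones already registered. */
int register_many(struct alg *algs, int count)
{
	int i, ret;

	for (i = 0; i < count; i++) {
		ret = register_one(&algs[i]);
		if (ret)
			goto err;
	}
	return 0;

err:
	for (--i; i >= 0; --i)
		unregister_one(&algs[i]);
	return ret;
}

/* Removal is the simple mirror image. */
void unregister_many(struct alg *algs, int count)
{
	int i;

	for (i = count - 1; i >= 0; --i)
		unregister_one(&algs[i]);
}
```

The rollback on error means a caller never has to track which entries of a partially registered array need cleanup.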

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 881cd6c570af412c2fab278b0656f7597dc5ee74)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: rng - Add crypto_rng_set_entropy

Orabug: 26330509

This patch adds the function crypto_rng_set_entropy. It is only
meant to be used by testmgr when testing RNG implementations by
providing fixed entropy data in order to verify test vectors.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 7ca99d814821e8a8ac6d7c48b2ccfc24bda27b1f)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: rng - Convert low-level crypto_rng to new style

Orabug: 26330509

This patch converts the low-level crypto_rng interface to the
"new" style.

This allows existing implementations to be converted over one-
by-one. Once that is complete we can then remove the old rng
interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit acec27ff35af9caf34d76d16ee17ff3b292e7d83)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: rng - Mark crypto_rng_reset seed as const

Orabug: 26330509

There is no reason why crypto_rng_reset should modify the seed
so this patch marks it as const. Since our algorithms don't
export a const seed function yet we have to go through some
contortions for now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 3c5d8fa9f56ad0928e7a1f06003e5034f5eedb52)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: rng - Introduce crypto_rng_generate

Orabug: 26330509

This patch adds the new top-level function crypto_rng_generate
which generates random numbers with additional input. It also
extends the mid-level rng_gen_random function to take additional
data as input.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ff030b099a21a4753af575b4304249e88400e506)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

crypto: rng - Convert crypto_rng to new style crypto_type

Orabug: 26330509

This patch converts the top-level crypto_rng to the "new" style.
It was the last algorithm type added before we switched over
to the new way of doing things exemplified by shash.

All users will automatically switch over to the new interface.

Note that this patch does not touch the low-level interface to
rng implementations.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit d0e83059a6c9b04f00264a74b8f6439948de4613)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

mm: Tighten x86 /dev/mem with zeroing reads

Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is
disallowed. However, on x86, the first 1MB was always allowed for BIOS
and similar things, regardless of it actually being System RAM. It was
possible for heap to end up getting allocated in low 1MB RAM, and then
read by things like x86info or dd, which would trip hardened usercopy:

usercopy: kernel memory exposure attempt detected from ffff880000090000 (dma-kmalloc-256) (4096 bytes)

This changes the x86 exception for the low 1MB by reading back zeros for
System RAM areas instead of blindly allowing them. More work is needed to
extend this to mmap, but currently mmap doesn't go through usercopy, so
hardened usercopy won't Oops the kernel.

Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
(cherry picked from commit a4866aa812518ed1a37d8ea0c881dc946409de94)

Orabug: 25917914
CVE: CVE-2017-7889

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

[media] saa7164: fix double fetch PCIe access condition

Avoid a double fetch by reusing the values from the prior transfer.
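
The general pattern can be sketched as follows. This is an illustrative userspace model, not the saa7164 driver code: the header layout and function name are assumptions. The header is copied from device-shared memory once, and both the validation and the later payload copy use that single snapshot.

```c
#include <stdint.h>

/* Hypothetical message header living in memory the device can rewrite. */
struct msg_hdr {
	uint32_t size;	/* payload length in bytes */
	uint32_t cmd;
};

/*
 * Copy one message out of shared memory. Returns the payload length,
 * or -1 if the payload would not fit into the output buffer.
 */
int read_msg(const volatile uint8_t *shared, uint8_t *out, uint32_t out_len)
{
	struct msg_hdr hdr;
	uint32_t i;

	/* Single fetch: take a private snapshot of the header. */
	for (i = 0; i < sizeof(hdr); i++)
		((uint8_t *)&hdr)[i] = shared[i];

	/* Validate the snapshot, never the shared copy. */
	if (hdr.size > out_len)
		return -1;

	/* Use the validated snapshot for the payload copy as well. */
	for (i = 0; i < hdr.size; i++)
		out[i] = shared[sizeof(hdr) + i];

	return (int)hdr.size;
}
```

Had the size been re-read from shared memory after the check, the device could change it in between and the copy could overrun the output buffer.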

Originally reported via https://bugzilla.kernel.org/show_bug.cgi?id=195559

Thanks to Pengfei Wang <wpengfeinudt@gmail.com> for reporting.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
(cherry picked from commit 6fb05e0dd32e566facb96ea61a48c7488daa5ac3)

Orabug: 26093949
CVE: CVE-2017-8831

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

Revert "net/rds: Revert "RDS: add reconnect retry scheme for stalled
connections""

This commit restores commit 5acb959ad599 ("RDS: add reconnect retry scheme
for stalled connections"). Even though this retry scheme "workaround"
causes a long brownout time in the OVM configuration, it is needed to avoid
RDS loopback connection stalls after a switch reboot on bare-metal
systems. For now, the plan agreed with Exadata is to put this commit back
first and keep a similar code path across QU6, QU5 and QU4.

Orabug: 26497333

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Revert "net/rds: prioritize the base connection establishment"

This reverts commit 1bc87d23681a ("net/rds: prioritize the base connection
establishment"). This patch is reverted because the RDS path record caching
is replaced by the underlying ibacm path record caching. By doing so, all
the TOS connections can be established without depending on its base
connection to perform SA query. Thus, it is not required to prioritize
the base connection establishment.

Orabug: 26497333

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Revert "net/rds: determine active/passive connection with IP addresses"

This reverts commit 1f2ea7a020a1 ("net/rds: determine active/passive
connection with IP addresses"). The plan is to use the original, well
tested one-sided reconnection that was reverted in "812c02791: RDS: restore
the exponential back-off scheme".

Orabug: 26497333

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Revert "net/rds: use different workqueue for base_conn"

This reverts commit ad7312bb8b8e ("net/rds: use different workqueue for
base_conn"). The RDS path record caching will be replaced by ibacm. Thus,
all the TOS connections do not rely on its base connection to perform SA
query. As a result, RDS does not need a separate "priority" workqueue for
the base connection.

Orabug: 26497333

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

blk-mq: add missing blk_mq_put_ctx

In case of failure to queue to the lower level driver, we
enqueue the request via blk_mq_insert_request() and return while
still holding a cpu reference.

Give up the currently held reference before calling
blk_mq_insert_request().

Orabug: 26339553

Suggested-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

blk-mq: avoid re-initialize request which is failed in direct dispatch

If we directly issue a request and it fails, we use
blk_mq_merge_queue_io(). But we already assigned bio to a request in
blk_mq_bio_to_request. blk_mq_merge_queue_io shouldn't run
blk_mq_bio_to_request again.

Orabug: 26339553
Suggested-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 239ad215f0d8388cbe6c09a0fab8ad8ff5dba420)
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

bnxt_en: Add bnxt_get_num_stats() to centrally get the number of ethtool stats.

Orabug: 26726982

Instead of duplicating the logic multiple times. Also, it is unnecessary
to zero the buffer in .get_ethtool_stats() because it is already zeroed
by the caller.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5c8227d0d3b1eb1ad8f98d0b6dc619d70f2cfa04)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Implement ndo_bridge_{get|set}link methods.

Orabug: 26726982

To allow users to set the hardware bridging mode to VEB or VEPA. Only
single function PF can change the bridging mode.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 39d8ba2e71fbdde686d7e31ad141a01994dc0793)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

bnxt_en: Retrieve the hardware bridge mode from the firmware.

Orabug: 26726982

Retrieve and store the hardware bridge mode, so that we can implement
ndo_bridge_{get|set}link methods in the next patch.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 32e8239c9138a050bc1feeea7cf41f27d79e6664)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Update firmware interface spec to 1.8.0.

Orabug: 26726982

VF representors and PTP are added features in the new firmware spec.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit acb2005463612930b07723e852b2483d669ff856)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/mm: Fix flush_tlb_page() on Xen

flush_tlb_page() passes a bogus range to flush_tlb_others() and
expects the latter to fix it up. native_flush_tlb_others() has the
fixup but Xen's version doesn't. Move the fixup to
flush_tlb_others().

AFAICS the only real effect is that, without this fix, Xen would
flush everything instead of just the one page on remote vCPUs
when flush_tlb_page() was called.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: e7b52ffd45a6 ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range")
Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
OraBug: 26662731

(cherry picked from commit dbd68d8e84c606673ebbcf15862f8c155fa92326)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
arch/x86/mm/tlb.c
(uek's native_flush_tlb_others() did not have end's non-zero check)

xen-netback: correctly schedule rate-limited queues

Add a flag to indicate if a queue is rate-limited. Test the flag in
NAPI poll handler and avoid rescheduling the queue if true, otherwise
we risk locking up the host. The rescheduling will be done in the
timer callback function.
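
The control flow described above can be modelled roughly like this. It is a simplified sketch with hypothetical types and names, not the xen-netback code: the poll path only re-arms itself when the queue is not throttled, and the credit timer takes over the rescheduling.

```c
#include <stdbool.h>

/* Hypothetical queue state for the sketch. */
struct queue {
	bool rate_limited;	/* set when the queue has run out of credit */
	int pending;		/* requests still waiting */
	int reschedules;	/* how often polling was re-armed */
};

/* End of a poll pass: re-arm only if work remains and we are not throttled. */
void poll_complete(struct queue *q)
{
	if (q->pending > 0 && !q->rate_limited)
		q->reschedules++;
}

/* Credit timer callback: clear the flag and re-arm polling if needed. */
void credit_timer_fired(struct queue *q)
{
	q->rate_limited = false;
	if (q->pending > 0)
		q->reschedules++;
}
```

Without the flag check, a throttled queue with pending work would be rescheduled on every poll pass and spin without making progress.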

Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
OraBug: 26662731

(cherry picked from commit dfa523ae9f2542bee4cddaea37b3be3e157f6e6b)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/net/xen-netback/interface.c
(Upstream's code used napi_complete_done())

xen/blkback: don't use xen_blkif_get() in xen-blkback kthread

There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread
of xen-blkback. Thread stopping is synchronous, and using the blkif
reference counting in the kthread would prevent the reference count
from ever dropping to zero at the end of an I/O running concurrently
with disconnecting and multiple rings.

Setting ring->xenblkd to NULL after stopping the kthread isn't needed
as the kthread does this already.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
OraBug: 26662731

(cherry picked from commit a24fa22ce22ae302b3bf8f7008896d52d5d57b8d)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen/blkback: don't free be structure too early

The be structure must not be freed before the blkif structure has been
freed. Otherwise a use-after-free of be when unmapping the ring
used for communicating with the frontend will occur in case of a
late call of xenblk_disconnect() (e.g. due to an I/O still active
when trying to disconnect).

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
OraBug: 26662731

(cherry picked from commit 71df1d7ccad1c36f7321d6b3b48f2ea42681c363)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen: make xen_flush_tlb_all() static

xen_flush_tlb_all() is used in arch/x86/xen/mmu.c only. Make it static.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit c71e6d804c88168ecf02aaf14e1fd5773d683b5f)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
arch/x86/xen/mmu.c
(Different location of xen_flush_tlb_all())

block: xen-blkback: add null check to avoid null pointer dereference

Add null check before calling xen_blkif_put() to avoid potential
null pointer dereference.

Addresses-Coverity-ID: 1350942
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
OraBug: 26662731

(cherry picked from commit 2d4456c73a487abe53863e10641c2f73537edf5c)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen: adjust early dom0 p2m handling to xen hypervisor behavior

When booted as a pv-guest, the p2m list presented by Xen is already
mapped to virtual addresses. In dom0 case the hypervisor might make use
of 2M- or 1G-pages for this mapping. Unfortunately while being properly
aligned in virtual and machine address space, those pages might not be
aligned properly in guest physical address space.

So when trying to obtain the guest physical address of such a page
pud_pfn() and pmd_pfn() must be avoided as those will mask away guest
physical address bits not being zero in this special case.
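
A small worked example of the masking problem, under assumed constants (4K base pages, 2M large pages, a 46-bit physical address width) and with hypothetical helper names. A large-page style lookup clears address bits 12..20, which is only harmless when the address is 2M aligned.

```c
#include <stdint.h>

#define PAGE_SHIFT	12
#define PMD_SHIFT	21	/* 2M large page */
/* Assumed physical address width for the sketch. */
#define PHYS_MASK	((UINT64_C(1) << 46) - 1)
#define PTE_PFN_MASK	(PHYS_MASK & ~((UINT64_C(1) << PAGE_SHIFT) - 1))
#define PMD_PAGE_MASK	(PHYS_MASK & ~((UINT64_C(1) << PMD_SHIFT) - 1))

/* Large-page style lookup: assumes 2M alignment, drops bits 12..20. */
uint64_t pfn_via_pmd_mask(uint64_t entry)
{
	return (entry & PMD_PAGE_MASK) >> PAGE_SHIFT;
}

/* 4K style lookup: keeps every address bit above the page offset. */
uint64_t pfn_via_pte_mask(uint64_t entry)
{
	return (entry & PTE_PFN_MASK) >> PAGE_SHIFT;
}
```

For an entry whose address part is 0x12345000, the 4K mask yields frame 0x12345 while the 2M mask yields 0x12200, silently losing the low frame bits.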

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit 69861e0a52f8733355ce246f0db15e1b240ad667)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
arch/x86/xen/mmu_pv.c
(No mmu_pv.c in our tree)

xen/x86: Do not call xen_init_time_ops() until shared_info is initialized

Routines that are set by xen_init_time_ops() use shared_info's
pvclock_vcpu_time_info area. This area is not properly available until
shared_info is mapped in xen_setup_shared_info().

This became especially problematic due to commit dd759d93f4dd ("x86/timers:
Add simple udelay calibration") where we end up reading tsc_to_system_mul
from xen_dummy_shared_info (i.e. getting zero value) and then trying
to divide by it in pvclock_tsc_khz().
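
To show why a zero multiplier is fatal, here is a simplified userspace model of the frequency computation. The struct and function names are hypothetical, and the explicit zero check is only an illustrative guard; the fix described here is to not run the time setup before shared_info is mapped.

```c
#include <stdint.h>

/* Hypothetical subset of the per-vCPU time info. */
struct time_info {
	uint32_t tsc_to_system_mul;
	int8_t tsc_shift;
};

/* Derive the TSC frequency in kHz from the multiplier and shift. */
unsigned long tsc_khz_from(const struct time_info *src)
{
	uint64_t khz = UINT64_C(1000000) << 32;

	/* A zeroed (dummy) info block would otherwise divide by zero. */
	if (src->tsc_to_system_mul == 0)
		return 0;

	khz /= src->tsc_to_system_mul;
	if (src->tsc_shift < 0)
		khz <<= -src->tsc_shift;
	else
		khz >>= src->tsc_shift;
	return (unsigned long)khz;
}
```

With a multiplier of 0x80000000 and a shift of 0 this yields 2000000 kHz, while an all-zero block has no meaningful answer at all.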

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit d162809f85b4f54ef075517ffa2f3d02e55d5a53)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
arch/x86/xen/enlighten_pv.c
(No enlighten_pv.c in our tree. Note also that
we don't have dd759d93f4dd. Still, this fixes a
latent bug).

xen: Implement EFI reset_system callback

When rebooting DOM0 with ACPI on ARM64, the kernel is crashing with the stack
trace [1].

This is happening because when EFI runtimes are enabled, the reset code
(see machine_restart) will first try to use EFI restart method.

However, the EFI restart code is expecting the reset_system callback to
be always set. This is not the case for Xen and will lead to crash.

The EFI restart helper is used in multiple places and some of them do
not have a fallback (see machine_power_off). So implement the reset_system
callback as a call to xen_reboot when using EFI Xen.

[   36.999270] reboot: Restarting system
[   37.002921] Internal error: Attempting to execute userspace memory: 86000004 [#1] PREEMPT SMP
[   37.011460] Modules linked in:
[   37.014598] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.11.0-rc1-00003-g1e248b60a39b-dirty #506
[   37.023903] Hardware name: (null) (DT)
[   37.027734] task: ffff800902068000 task.stack: ffff800902064000
[   37.033739] PC is at 0x0
[   37.036359] LR is at efi_reboot+0x94/0xd0
[   37.040438] pc : [<0000000000000000>] lr : [<ffff00000880f2c4>] pstate: 404001c5
[   37.047920] sp : ffff800902067cf0
[   37.051314] x29: ffff800902067cf0 x28: ffff800902068000
[   37.056709] x27: ffff000008992000 x26: 000000000000008e
[   37.062104] x25: 0000000000000123 x24: 0000000000000015
[   37.067499] x23: 0000000000000000 x22: ffff000008e6e250
[   37.072894] x21: ffff000008e6e000 x20: 0000000000000000
[   37.078289] x19: ffff000008e5d4c8 x18: 0000000000000010
[   37.083684] x17: 0000ffffa7c27470 x16: 00000000deadbeef
[   37.089079] x15: 0000000000000006 x14: ffff000088f42bef
[   37.094474] x13: ffff000008f42bfd x12: ffff000008e706c0
[   37.099870] x11: ffff000008e70000 x10: 0000000005f5e0ff
[   37.105265] x9 : ffff800902067a50 x8 : 6974726174736552
[   37.110660] x7 : ffff000008cc6fb8 x6 : ffff000008cc6fb0
[   37.116055] x5 : ffff000008c97dd8 x4 : 0000000000000000
[   37.121453] x3 : 0000000000000000 x2 : 0000000000000000
[   37.126845] x1 : 0000000000000000 x0 : 0000000000000000
[   37.132239]
[   37.133808] Process systemd-shutdow (pid: 1, stack limit = 0xffff800902064000)
[   37.141118] Stack: (0xffff800902067cf0 to 0xffff800902068000)
[   37.146949] 7ce0:                                   ffff800902067d40 ffff000008085334
[   37.154869] 7d00: 0000000000000000 ffff000008f3b000 ffff800902067d40 ffff0000080852e0
[   37.162787] 7d20: ffff000008cc6fb0 ffff000008cc6fb8 ffff000008c7f580 ffff000008c97dd8
[   37.170706] 7d40: ffff800902067d60 ffff0000080e2c2c 0000000000000000 0000000001234567
[   37.178624] 7d60: ffff800902067d80 ffff0000080e2ee8 0000000000000000 ffff0000080e2df4
[   37.186544] 7d80: 0000000000000000 ffff0000080830f0 0000000000000000 00008008ff1c1000
[   37.194462] 7da0: ffffffffffffffff 0000ffffa7c4b1cc 0000000000000000 0000000000000024
[   37.202380] 7dc0: ffff800902067dd0 0000000000000005 0000fffff24743c8 0000000000000004
[   37.210299] 7de0: 0000fffff2475f03 0000000000000010 0000fffff2474418 0000000000000005
[   37.218218] 7e00: 0000fffff2474578 000000000000000a 0000aaaad6b722c0 0000000000000001
[   37.226136] 7e20: 0000000000000123 0000000000000038 ffff800902067e50 ffff0000081e7294
[   37.234055] 7e40: ffff800902067e60 ffff0000081e935c ffff800902067e60 ffff0000081e9388
[   37.241973] 7e60: ffff800902067eb0 ffff0000081ea388 0000000000000000 00008008ff1c1000
[   37.249892] 7e80: ffffffffffffffff 0000ffffa7c4a79c 0000000000000000 ffff000000020000
[   37.257810] 7ea0: 0000010000000004 0000000000000000 0000000000000000 ffff0000080830f0
[   37.265729] 7ec0: fffffffffee1dead 0000000028121969 0000000001234567 0000000000000000
[   37.273651] 7ee0: ffffffffffffffff 8080000000800000 0000800000008080 feffa9a9d4ff2d66
[   37.281567] 7f00: 000000000000008e feffa9a9d5b60e0f 7f7fffffffff7f7f 0101010101010101
[   37.289485] 7f20: 0000000000000010 0000000000000008 000000000000003a 0000ffffa7ccf588
[   37.297404] 7f40: 0000aaaad6b87d00 0000ffffa7c4b1b0 0000fffff2474be0 0000aaaad6b88000
[   37.305326] 7f60: 0000fffff2474fb0 0000000001234567 0000000000000000 0000000000000000
[   37.313240] 7f80: 0000000000000000 0000000000000001 0000aaaad6b70d4d 0000000000000000
[   37.321159] 7fa0: 0000000000000001 0000fffff2474ea0 0000aaaad6b5e2e0 0000fffff2474e80
[   37.329078] 7fc0: 0000ffffa7c4b1cc 0000000000000000 fffffffffee1dead 000000000000008e
[   37.336997] 7fe0: 0000000000000000 0000000000000000 9ce839cffee77eab fafdbf9f7ed57f2f
[   37.344911] Call trace:
[   37.347437] Exception stack(0xffff800902067b20 to 0xffff800902067c50)
[   37.353970] 7b20: ffff000008e5d4c8 0001000000000000 0000000080f82000 0000000000000000
[   37.361883] 7b40: ffff800902067b60 ffff000008e17000 ffff000008f44c68 00000001081081b4
[   37.369802] 7b60: ffff800902067bf0 ffff000008108478 0000000000000000 ffff000008c235b0
[   37.377721] 7b80: ffff800902067ce0 0000000000000000 0000000000000000 0000000000000015
[   37.385643] 7ba0: 0000000000000123 000000000000008e ffff000008992000 ffff800902068000
[   37.393557] 7bc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   37.401477] 7be0: 0000000000000000 ffff000008c97dd8 ffff000008cc6fb0 ffff000008cc6fb8
[   37.409396] 7c00: 6974726174736552 ffff800902067a50 0000000005f5e0ff ffff000008e70000
[   37.417318] 7c20: ffff000008e706c0 ffff000008f42bfd ffff000088f42bef 0000000000000006
[   37.425234] 7c40: 00000000deadbeef 0000ffffa7c27470
[   37.430190] [<          (null)>]           (null)
[   37.434982] [<ffff000008085334>] machine_restart+0x6c/0x70
[   37.440550] [<ffff0000080e2c2c>] kernel_restart+0x6c/0x78
[   37.446030] [<ffff0000080e2ee8>] SyS_reboot+0x130/0x228
[   37.451337] [<ffff0000080830f0>] el0_svc_naked+0x24/0x28
[   37.456737] Code: bad PC value
[   37.459891] ---[ end trace 76e2fc17e050aecd ]---

Signed-off-by: Julien Grall <julien.grall@arm.com>
--

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
The x86 code theoretically has a similar issue, although EFI does not
seem to be the preferred method. I have only build-tested it on x86.

This should also probably be fixed in stable tree.

    Changes in v2:
        - Implement xen_efi_reset_system using xen_reboot
        - Move xen_efi_reset_system in drivers/xen/efi.c
Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit e371fd7607999fabbd955b4d22c8e912594a7997)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen: Export xen_reboot

The helper xen_reboot will be called by the EFI code in a later patch.

Note that the ARM version does not yet exist and will be added in a
later patch too.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit 5d9404e1185de8d508cd042761306495f727d7eb)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
arch/x86/xen/xen-ops.h
arch/x86/xen/enlighten.c
(No changes to xen-ops.h, make xen_reboot() non-static)

xen/pvh: Do not fill kernel's e820 map in init_pvh_bootparams()

e820 map is updated with information from the zeropage (i.e. pvh_bootparams)
by default_machine_specific_memory_setup(). With the way things are done
now, we end up with a duplicated e820 map.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit 5f6a1614fab801834e32b420b60acdc27acfcdec)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
arch/x86/xen/enlighten_pvh.c
(No enlighten_pvh.c in our tree)

xen/scsifront: use offset_in_page() macro

Use offset_in_page() macro instead of open-coding.
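
For reference, the two forms compute the same value. The sketch below is a userspace approximation that assumes 4K pages and defines its own PAGE_SIZE, PAGE_MASK and offset_in_page(); the helper function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE	4096UL			/* assumed 4K pages */
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define offset_in_page(p) ((unsigned long)(uintptr_t)(p) & ~PAGE_MASK)

/* Open-coded form of the kind the patch replaces. */
unsigned long open_coded_offset(const void *p)
{
	return (unsigned long)(uintptr_t)p & (PAGE_SIZE - 1);
}

/* Same computation through the macro. */
unsigned long macro_offset(const void *p)
{
	return offset_in_page(p);
}
```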

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit 6483e3135a693548874429db901c0544d3a9b4cd)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen,kdump: handle pv domain in paddr_vmcoreinfo_note()

For kdump to work correctly it needs the physical address of
vmcoreinfo_note. When running as dom0 this means the virtual address
has to be translated to the related machine address.

paddr_vmcoreinfo_note() is meant to do the translation via
__pa_symbol() only, but being attributed "weak" it can be replaced
easily in Xen case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Petr Tesarik <ptesarik@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit 29985b09613ba106a1ed0496988636d288600515)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
arch/x86/xen/mmu_pv.c
( no mmu_pv.c in our tree)

Input: xen-kbdfront - add module parameter for setting resolution

Add a parameter for setting the resolution of xen-kbdfront in order to
be able to cope with a (virtual) frame buffer of arbitrary resolution.

While at it remove the pointless second reading of parameters from
Xenstore in the device connection phase: all parameters are available
during device probing already and that is where they should be read.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
OraBug: 26662731

(cherry picked from commit 8b3afdfa48c70144479a2a5ca51a66e96ec60934)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/input/misc/xen-kbdfront.c
(xenbus_read_unsigned vs. xenbus_scanf)

blkfront: add uevent for size change

When a blkfront device is resized from dom0, emit a KOBJ_CHANGE uevent to
notify the guest about the change. This allows for custom udev rules, such
as automatically resizing a filesystem, when an event occurs.

With this patch you get these udev events:

KERNEL[577.206230] change /devices/vbd-51728/block/xvdb (block)
UDEV [577.226218] change /devices/vbd-51728/block/xvdb (block)

Signed-off-by: Marc Olson <marcolso@amazon.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
OraBug: 26662731

(cherry picked from commit 89515d0255c918e08aa4085956c79bf17615fda5)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

x86/xen/time: Set ->min_delta_ticks and ->max_delta_ticks

In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.

Make the x86 arch's xen clockevent driver initialize these fields properly.

This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
OraBug: 26662731

(cherry picked from commit 3d18d661aaad5a22f4d37a0592acc9d784f2a11b)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
arch/x86/xen/time.c
(whitespaces)

xen, fbfront: add support for specifying size via xenstore

Today xen-fbfront supports specifying the display size via module
parameters only. Add support for specifying the size via Xenstore in
order to enable doing this easily via the domain's Xen configuration.

Add an error message in case the configured display size conflicts
with video memory size.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
OraBug: 26662731

(cherry picked from commit 5a93db427ab170c9793d76abf3e4be1ebd09375f)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen: Create KABI-compatible version of struct xenbus_watch

Commit 5584ea250ae4 ("xen: modify xenstore watch event interface")
modified the definition of struct xenbus_watch, breaking UEK KABI.

This patch introduces a compat version of the struct, with callback_
pointing to the KABI-compatible definition.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen, fbfront: fix connecting to backend

Connecting to the backend isn't working reliably in xen-fbfront: if
the backend's XenbusStateInitWait state has been missed, the backend's
transition to XenbusStateConnected will only trigger the connected
state, without doing the actions required when the backend has
connected.

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
OraBug: 26662731

(cherry picked from commit 9121b15b5628b38b4695282dc18c553440e0f79b)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xenbus: remove transaction holder from list before freeing

After allocation the item is being placed on the list right away.
Consequently it needs to be taken off the list before freeing in the
case xenbus_dev_request_and_reply() failed, as in that case the
callback (xenbus_dev_queue_reply()) is not being called (and if it
was called, it should do both).
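
The pattern described above can be sketched in userspace C. This is a
minimal illustration, assuming a hypothetical singly linked list and a
request_ok flag that stands in for the result of the real request call;
none of these names come from the xenbus code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical transaction holder; not the kernel's structure. */
struct trans_holder {
    struct trans_holder *next;
    unsigned int id;
};

/* Unlink item from the list headed by *head; returns 1 if it was found. */
static int holder_unlink(struct trans_holder **head, struct trans_holder *item)
{
    for (struct trans_holder **pp = head; *pp; pp = &(*pp)->next) {
        if (*pp == item) {
            *pp = item->next;
            return 1;
        }
    }
    return 0;
}

/* Count the holders currently on the list. */
static unsigned int holder_count(const struct trans_holder *head)
{
    unsigned int n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}

/*
 * Allocate a holder, put it on the list right away, then "issue" the
 * request. On failure the holder is unlinked before it is freed, so the
 * list never keeps a pointer to freed memory.
 */
static int start_transaction(struct trans_holder **head, unsigned int id,
                             int request_ok)
{
    struct trans_holder *h = malloc(sizeof(*h));

    assert(head != NULL);
    if (!h)
        return -1;
    h->id = id;
    h->next = *head;
    *head = h;

    if (!request_ok) {
        holder_unlink(head, h);   /* take it off the list first ... */
        free(h);                  /* ... and only then free it */
        return -1;
    }
    return 0;
}
```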

Fixes: 5584ea250ae44f929feb4c7bd3877d1c5edbf813
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit ac4cde398a96c1d28b1c28a0f69b6efd892a1c8a)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen/acpi: upload PM state from init-domain to Xen

This was broken in commit cd979883b9ed ("xen/acpi-processor:
fix enabling interrupts on syscore_resume"). do_suspend (from
xen/manage.c) and thus xen_resume_notifier never get called on
the initial-domain at resume (they are when running as a guest).

The rationale for the breaking change was that upload_pm_data()
potentially does blocking work in syscore_resume(). This patch
addresses the original issue by scheduling upload_pm_data() to
execute in workqueue context.

Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: stable@vger.kernel.org
Based-on-patch-by: Konrad Wilk <konrad.wilk@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit 1914f0cd203c941bba72f9452c8290324f1ef3dc)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen/acpi: Replace hard coded "ACPI0007"

Replace hard coded "ACPI0007" with ACPI_PROCESSOR_DEVICE_HID

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit 1c2593cc8fd5960f8861de1be67135851f884836)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen/privcmd: add IOCTL_PRIVCMD_RESTRICT

The purpose of this ioctl is to allow a user of privcmd to restrict its
operation such that it will no longer service arbitrary hypercalls via
IOCTL_PRIVCMD_HYPERCALL, and will check for a matching domid when
servicing IOCTL_PRIVCMD_DM_OP or IOCTL_PRIVCMD_MMAP*. The aim of this
is to limit the attack surface for a compromised device model.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit 4610d240d691768203fdd210a5da0a2e02eddb76)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen/privcmd: Add IOCTL_PRIVCMD_DM_OP

Recently a new dm_op[1] hypercall was added to Xen to provide a mechanism
for restricting device emulators (such as QEMU) to a limited set of
hypervisor operations, and being able to audit those operations in the
kernel of the domain in which they run.

This patch adds IOCTL_PRIVCMD_DM_OP as a gateway for __HYPERVISOR_dm_op.

NOTE: There is no requirement for user-space code to bounce data through
      locked memory buffers (as with IOCTL_PRIVCMD_HYPERCALL) since
      privcmd has enough information to lock the original buffers
      directly.

[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=524a98c2

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit ab520be8cd5d56867fc95cfbc34b90880faf1f9d)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
arch/arm/xen/enlighten.c
arch/arm/xen/hypercall.S
arch/arm64/xen/hypercall.S
        (uek doesn't define HYPERVISOR_vm_assist() for ARM)
arch/arm/include/asm/xen/hypercall.h
        (Upstream defines HYPERVISOR_dm_op() in
         include/xen/arm/hypercall.h, which uek4 does not have)

tpm xen: drop unneeded chip variable

The call that used chip was dropped in 1f0f30e404b3. Drop the
leftover declaration and initialization.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
OraBug: 26662731

(cherry picked from commit 5cec5bacd37784a1596d2f8fea377212948a0bf4)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

swiotlb-xen: implement xen_swiotlb_get_sgtable callback

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
OraBug: 26662731

(cherry picked from commit 69369f52d28a34c84acb6f2a8a585e743441566a)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

swiotlb-xen: implement xen_swiotlb_dma_mmap callback

This function creates userspace mapping for the DMA-coherent memory.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Oleksandr Dmytryshyn <oleksandr.dmytryshyn@globallogic.com>
Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
OraBug: 26662731

(cherry picked from commit 7e91c7df29b5e196de3dc6f086c8937973bd0b88)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen/privcmd: return -ENOTTY for unimplemented IOCTLs

The code sets the default return code to -ENOSYS but then overrides this
to -EINVAL in the switch() statement's default case, which is clearly
silly.

This patch removes the override and sets the default return code to
-ENOTTY, which is the conventional return for an unimplemented ioctl.
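
The resulting shape of the dispatcher can be sketched as below. The command
numbers and the function name are hypothetical; only the use of -ENOTTY as
the default for unimplemented commands reflects the description above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command numbers for this sketch. */
enum { SKETCH_CMD_A = 1, SKETCH_CMD_B = 2 };

static long sketch_ioctl(unsigned int cmd)
{
    long ret = -ENOTTY;   /* default: command not implemented */

    switch (cmd) {
    case SKETCH_CMD_A:
        ret = 0;          /* stand-in for the real handler's result */
        break;
    case SKETCH_CMD_B:
        ret = 0;
        break;
    default:
        /* No override here: the -ENOTTY default is returned as-is. */
        break;
    }
    return ret;
}
```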

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit dc9eab6fd94dd26340749321bba2c58634761516)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen: optimize xenbus driver for multiple concurrent xenstore accesses

Handling of multiple concurrent Xenstore accesses through the xenbus
driver, either from the kernel or from user land, is rather limited
today: xenbus can have only one access active at any point in time.

Rewrite xenbus to handle multiple requests concurrently by making use
of the request id of the Xenstore protocol. This requires the following
changes:

- Instead of blocking inside xb_read() when trying to read data from
  the xenstore ring buffer do so only in the main loop of
  xenbus_thread().

- Instead of doing writes to the xenstore ring buffer in the context of
  the caller just queue the request and do the write in the dedicated
  xenbus thread.

- Instead of just forwarding the request id specified by the caller of
  xenbus to xenstore use a xenbus internal unique request id. This will
  allow multiple outstanding requests.

- Modify the locking scheme in order to allow multiple requests being
  active in parallel.

- Instead of waiting for the reply of a user's xenstore request after
  writing the request to the xenstore ring buffer return directly to
  the caller and do the waiting in the read path.

Additionally signal handling was optimized by avoiding waking up the
xenbus thread or sending an event to Xenstore in case the addressed
entity is known to be running already.

As a result communication with Xenstore is sped up by a factor of up
to 5: depending on the request type (read or write) and the amount of
data transferred the gain was at least 20% (small reads) and went up to
a factor of 5 for large writes.

In the end some more rough edges of xenbus have been smoothed:

- Handling of memory shortage when reading from xenstore ring buffer in
  the xenbus driver was not optimal: it was busy looping and issuing a
  warning in each loop.

- In case of xenstore not running in dom0 but in a stubdom we end up
  with two xenbus threads running as the initialization of xenbus in
  dom0 expecting a local xenstored will be redone later when connecting
  to the xenstore domain. Up to now this was no problem as locking
  would prevent the two xenbus threads interfering with each other, but
  this was just a waste of kernel resources.

- An out of memory situation while writing to or reading from the
  xenstore ring buffer no longer will lead to a possible loss of
  synchronization with xenstore.

- The user read and write part are now interruptible by signals.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit fd8aa9095a95c02dcc35540a263267c29b8fda9d)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/xen/xenbus/xenbus_comms.c
        (Use mb() instead of virt_mb() for consistency in our code)

xen: modify xenstore watch event interface

Today a Xenstore watch event is delivered via a callback function
declared as:

    void (*callback)(struct xenbus_watch *,
                     const char **vec, unsigned int len);

As all watch events only ever come with two parameters (path and token)
changing the prototype to:

    void (*callback)(struct xenbus_watch *,
                     const char *path, const char *token);

is the natural thing to do.

Apply this change and adapt all users.
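
A userspace sketch of a handler written against the new prototype, together
with a small dispatcher that unpacks an old-style vector into the two
arguments. The struct, the handler and the dispatcher are hypothetical
stand-ins, not the kernel's struct xenbus_watch or its callers; the sketch
assumes the path is vector element 0 and the token is element 1.

```c
#include <assert.h>
#include <string.h>

struct watch_sketch;

/* New-style callback: path and token are passed directly. */
typedef void (*watch_cb)(struct watch_sketch *,
                         const char *path, const char *token);

/* Hypothetical stand-in for a watch; last_path exists only for the demo. */
struct watch_sketch {
    const char *node;
    watch_cb callback;
    char last_path[64];
};

/* Example handler: records which path fired, ignoring the token. */
static void record_path(struct watch_sketch *w,
                        const char *path, const char *token)
{
    (void)token;
    strncpy(w->last_path, path, sizeof(w->last_path) - 1);
    w->last_path[sizeof(w->last_path) - 1] = '\0';
}

/* Dispatcher: turns a two-element event vector into (path, token). */
static void deliver_event(struct watch_sketch *w,
                          const char **vec, unsigned int len)
{
    assert(w != NULL && w->callback != NULL);
    if (len >= 2)
        w->callback(w, vec[0], vec[1]);
}
```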

Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Cc: wei.liu2@citrix.com
Cc: paul.durrant@citrix.com
Cc: netdev@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit 5584ea250ae44f929feb4c7bd3877d1c5edbf813)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/xen/ovmapi.c
(Updated for new interface)

xen: clean up xenbus internal headers

The xenbus driver has an awful mixture of internally and globally
visible headers: some of the internally used only stuff is defined in
the global header include/xen/xenbus.h while some stuff defined in
internal headers is used by other drivers, too.

Clean this up by moving the externally used symbols to
include/xen/xenbus.h and the symbols used internally only to a new
header drivers/xen/xenbus/xenbus.h replacing xenbus_comms.h and
xenbus_probe.h

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit 332f791dc98d98116f4473b726f67c9321b0f31e)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Conflicts:
drivers/xen/xenbus/xenbus_dev_frontend.c
(MODULE_LICENSE() is present in our code)

xenbus: Neaten xenbus_va_dev_error

This function's error path can be simplified, so do so.

Remove the fail: label and the somewhat obfuscating, used-once
"error_path" function.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
OraBug: 26662731

(cherry picked from commit c0d197d55e8e8aeeea55f79bdf67e1c957bfa25d)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen/pvh: Use Xen's emergency_restart op for PVH guests

Using native_machine_emergency_restart (called during reboot) will
lead PVH guests to machine_real_restart() where we try to use
real_mode_header which is not initialized.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit 7a1c44ebc5ac2e2c28d95b0da6060728c334e7e4)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

xen/pvh: Enable CPU hotplug

PVH guests don't (yet) receive ACPI hotplug interrupts and therefore
need to monitor xenstore for CPU hotplug events.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
OraBug: 26662731

(cherry picked from commit 2a7197f02dddf1f9cee300bd12512375ed56524a)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>