www.infradead.org Git - users/jedix/linux-maple.git/log

rds: IB: fix returned value not set error

When IB port link up/down too frequently, the following will occur.
"
16:57:08  kernel: mlx5_core 0000:af:00.1 enp175s0f1: Link up
16:57:08  kernel: RDS/IB: PORT-EVENT: port active, PORT:
          mlx5_0/port_2/enp175s0f1 : port state transition to UP
          (portlayers 0x7)
16:57:08  kernel: RDS/IB: NET-EVENT: NETDEV-CHANGE, PORT
          mlx5_0/port_2/enp175s0f1 : port state transition NONE -
          port retained in state UP (portlayers 0x7)
16:57:13  kernel: mlx5_core 0000:af:00.1 enp175s0f1: Link down
16:57:13  kernel: RDS/IB: PORT-EVENT: port error, PORT:
          mlx5_0/port_2/enp175s0f1 : port state transition to DOWN
          (portlayers 0x4)
16:57:13  kernel: RDS/IB: NET-EVENT: NETDEV-CHANGE, PORT
          mlx5_0/port_2/enp175s0f1 : port state transition NONE -
          port retained in state DOWN (portlayers 0x4)
16:57:13  kernel: RDS/IB: Address already exist
16:57:17  kernel: mlx5_core 0000:af:00.1 enp175s0f1: Link up
16:57:17  kernel: RDS/IB: PORT-EVENT: port active, PORT:
          mlx5_0/port_2/enp175s0f1 : port state transition to UP
          (portlayers 0x7)
16:57:17  kernel: RDS/IB: NET-EVENT: NETDEV-CHANGE, PORT
          mlx5_0/port_2/enp175s0f1 : port state transition NONE -
          port retained in state UP (portlayers 0x7)
16:57:18  kernel: RDS/IB: IPv4 172.16.2.52 migrated from enp175s0f0
          (port 1) to enp175s0f1 (port 2)
"
From the above, rds_ib_addr_exist is called to check ip address. If
exists, the function goes to the end directly while the ret is not set.
This is not reasonable. This will make some global variables error.

Orabug: 28356474

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

fs: proc: array.c: fix Speculation_Store_Bypass print format

This patch fixes commit id f590427c8b01298b416f043107e654918107c579 While
backporting, we introduce a \n in front of Speculation_Store_Bypass ending up
with a suplimentary newline which is affecting different tools (like iotop):
Seccomp: 0

Speculation_Store_Bypass: thread vulnerable

Orabug: 28128750

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xfs: don't chain ioends during writepage submission

Currently we can build a long ioend chain during ->writepages that
gets attached to the writepage context. IO submission only then
occurs when we finish all the writepage processing. This means we
can have many ioends allocated and pending, and this violates the
mempool guarantees that we need to give about forwards progress.
i.e. we really should only have one ioend being built at a time,
otherwise we may drain the mempool trying to allocate a new ioend
and that blocks submission, completion and freeing of ioends that
are already in progress.

To prevent this situation from happening, we need to submit ioends
for IO as soon as they are ready for dispatch rather than queuing
them for later submission. This means the ioends have bios built
immediately and they get queued on any plug that is current active.
Hence if we schedule away from writeback, the ioends that have been
built will make forwards progress due to the plug flushing on
context switch. This will also prevent context switches from
creating unnecessary IO submission latency.

We can't completely avoid having nested IO allocation - when we have
a block size smaller than a page size, we still need to hold the
ioend submission until after we have marked the current page dirty.
Hence we may need multiple ioends to be held while the current page
is completely mapped and made ready for IO dispatch. We cannot avoid
this problem - the current code already has this ioend chaining
within a page so we can mostly ignore that it occurs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit e10de3723c53378e7cf441529f563c316fdc0dd3)

Orabug: 28193043

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xfs: factor mapping out of xfs_do_writepage

Separate out the bufferhead based mapping from the writepage code so
that we have a clear separation of the page operations and the
bufferhead state.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit bfce7d2e2d5ee05e9d465888905c66a70a9c243c)

Orabug: 28193043

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xfs: xfs_cluster_write is redundant

xfs_cluster_write() is not necessary now that xfs_vm_writepages()
aggregates writepage calls across a single mapping. This means we no
longer need to do page lookups in xfs_cluster_write, so writeback
only needs to look up th epage cache once per page being written.
This also removes a large amount of mostly duplicate code between
xfs_do_writepage() and xfs_convert_page().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit ad68972acb82c3e8bba316d542ab204984cb1f1c)

Orabug: 28193043

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xfs: Introduce writeback context for writepages

xfs_vm_writepages() calls generic_writepages to writeback a range of
a file, but then xfs_vm_writepage() clusters pages itself as it does
not have any context it can pass between->writepage calls from
__write_cache_pages().

Introduce a writeback context for xfs_vm_writepages() and call
__write_cache_pages directly with our own writepage callback so that
we can pass that context to each writepage invocation. This
encapsulates the current mapping, whether it is valid or not, the
current ioend and it's IO type and the ioend chain being built.

This requires us to move the ioend submission up to the level where
the writepage context is declared. This does mean we do not submit
IO until we packaged the entire writeback range, but with the block
plugging in the writepages call this is the way IO is submitted,
anyway.

It also means that we need to handle discontiguous page ranges. If
the pages sent down by write_cache_pages to the writepage callback
are discontiguous, we need to detect this and put each discontiguous
page range into individual ioends. This is needed to ensure that the
ioend accurately represents the range of the file that it covers so
that file size updates during IO completion set the size correctly.
Failure to take into account the discontiguous ranges results in
files being too small when writeback patterns are non-sequential.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit fbcc025613590d7b1d15521555dcc6393a148a6b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/xfs/xfs_aops.c

Orabug: 28193043

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xfs: remove xfs_cancel_ioend

We currently have code to cancel ioends being built because we
change bufferhead state as we build the ioend. On error, this needs
to be unwound and so we have cancelling code that walks the buffers
on the ioend chain and undoes these state changes.

However, the IO submission path already handles state changes for
buffers when a submission error occurs, so we don't really need a
separate cancel function to do this - we can simply submit the
ioend chain with the specific error and it will be cancelled rather
than submitted.

Hence we can remove the explicit cancel code and just rely on
submission to deal with the error correctly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit 150d5be09ce49a9bed6feb7b7dc4e5ae188778ec)

Orabug: 28193043

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xfs: remove nonblocking mode from xfs_vm_writepage

Remove the nonblocking optimisation done for mapping lookups during
writeback. It's not clear that leaving a hole in the writeback range
just because we couldn't get a lock is really a win, as it makes us
do another small random IO later on rather than a large sequential
IO now.

As this gets in the way of sane error handling later on, just remove
for the moment and we can re-introduce an equivalent optimisation in
future if we see problems due to extent map lock contention.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit 988ef927926aa3481cbf144f235c0cefd7deb9e4)

Orabug: 28193043

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

rds: tcp: cancel all worker threads before shutting down socket

We could end up executing rds_conn_shutdown before the rds_recv_worker
thread, then rds_conn_shutdown -> rds_tcp_conn_shutdown can do a
sock_release and set sock->sk to null, which may interleave in bad
ways with rds_recv_worker, e.g., it could result in:

    "BUG: unable to handle kernel NULL pointer dereference at 0000000000000078"
       [ffff881769f6fd70] release_sock at ffffffff815f337b
       [ffff881769f6fd90] rds_tcp_recv at ffffffffa043c888 [rds_tcp]
       [ffff881769f6fdb0] rds_recv_worker at ffffffffa04a4810 [rds]
       [ffff881769f6fde0] process_one_work at ffffffff810a14c1
       [ffff881769f6fe40] worker_thread at ffffffff810a1940
       [ffff881769f6fec0] kthread at ffffffff810a6b1e

cancel send/recv worker threads before shutting down connection.

Also, do not enqueue any new shutdown workq items when the connection is
shutting down (this may happen for rds-tcp in softirq mode, if a FIN
or CLOSE is received while the modules is in the middle of an unload)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
(cherry picked from commit UEK4 bf1f0c8c007c58eae14622c6f9a3fe36f6b19d5a)

Orabug: 28298156

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

rds: signedness bug

In the original code if the copy_from_user() fails in rds_rdma_pages()
then the error handling fails and we get a stack trace from kmalloc().

<amended commit message by hbugge>

This bug was originally fixed in v2.6.36-rc5. However, commit
edc31ebe9e36 ("[patch1/3] Merge for Mellanox OFED R2, 0080 release")
re-introduces the bug, most probably because the Mellanox OFED didn't
have the upstream fix.

So, this bug has been present since v2.6.39-400.9.0 and is present in
all our releases from uek2 to uek5.

</amended>

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9b9d2e00bfa592aceda7b43da76c670df61faa97)

Orabug: 28319166

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Fixes: edc31ebe9e36d87dcd08931784a49c2e75eed359
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

block: update integrity interval after queue limits change

A disk's integrity interval is calculated based on the logical block
size of it's queue. If the queue or it's block size is not set, default
512 integrity interval is set.
The issue is that the integrity interval of dm device is set before it's
queue limits are set and it is not updated once the appropriate queue
limits are set.
So, dm integrity interval is set to 512 by default. This causes
following integrity mismatch errors for devices with logical block size
!=512.
This further leads to dm paths not being setup correctly.

blk_integrity_compare: dm-X/sdx protection interval 512 != 4096

Solution is to set integrity interval all the time in
blk_integrity_register() according to the logical block size of the
queue. Once logical block size of the queue is updated, the integrity
interval will be set correctly on the next blk_integrity_register().

This issue does not occur in UEK5 or upstream as they contain the
following
commits which resolve the current and couple of other issues as well.
These when backported to UEK4 cause KABI breakage. To work around it,
the current patch resolves this issue.

aff34e192e4eeacfb8b5ffc68e10a240f2c0c6d7
0f8087ecdeac921fc4920f1328f55c15080bc6aa
a48f041d91bf1aee599fa2adb53b780ed20c2ee5
4c241d08dbfcbdc7a949b91d72707a289d464954
25520d55cdb6ee289abc68f553d364d22478ff54
2859323e35ab5fc42f351fbda23ab544eaa85945

Orabug: 27586756

Signed-off-by: Ritika Srivastava <ritika.srivastava@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

dccp: check sk for closed state in dccp_sendmsg()

dccp_disconnect() sets 'dp->dccps_hc_tx_ccid' tx handler to NULL,
therefore if DCCP socket is disconnected and dccp_sendmsg() is
called after it, it will cause a NULL pointer dereference in
dccp_write_xmit().

This crash and the reproducer was reported by syzbot. Looks like
it is reproduced if commit 69c64866ce07 ("dccp: CVE-2017-8824:
use-after-free in DCCP code") is applied.

Reported-by: syzbot+f99ab3887ab65d70f816@syzkaller.appspotmail.com
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 67f93df79aeefc3add4e4b31a752600f834236e2)

Orabug: 28001529
CVE: CVE-2018-1130

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds: Implement ARP flushing correctly

If a remote peer has moved its IP address from one port to the other,
the local node may have an incorrect ARP entry in its cache. During
connection management, we will then get back a route-error-event from
the CM.

Current code attempts to flush the ARP entry from the cache. However,
1) it does not check for return values, 2) it does not supply the
device name, 3) it does not iterate over all possible device names,
and 4) its doesn't supply the correct flags.

Due to 2-4 above, the flushing doesn't work.

This commit fixes this.

On a system with a single CX-3 and 16 VFs, fail-over just after a
fail-back is reduced from ~60 seconds down to ~10 seconds with the fix
(1156 RDS connections).

Orabug: 28219857

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---

V1 -> V2:
- Move NOTE section down as suggested by Santosh
- Remove unnecessary initialization of ret
- Fix a nit in the commit message (2-4 instead of 1-4)
- Use pr_err() instead of printk() as suggested by Yuval
- Use sizeof(*r) as suggested by Yuval

NOTE:
* The uek-4-QU7 version of this bug has an approved exception from
Greg Marsden, that it can be merged first in uek-4-QU7, before
uek5/master-next.

Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds: Fix incorrect bigger vs. smaller IP address check

In commit d8bd5dfb5de4 ("net/rds: use one sided reconnection during a
race"), the function rds_addr_cmp() was called without comparing the
return value to less-than or bigger-than-equal to zero.

This led to a situation where both peers thought they were the winner,
and consequently, racing RDS connections didn't form or took a long
time to form.

The issue is fixed by this commit.

Orabug: 28236599

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Suggested-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Fixes: d8bd5dfb5de4 ("net/rds: use one sided reconnection during a race")
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
(cherry picked from commit 653ccd08be01f82c6129f5060bd8110489aca110
repo UEK/linux-hbugge-public.git)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
* Had to "rename" the SHA of the Fixes: uek-5-master commit to
the correct one for uek-4.1-qu7.

Signed-off-by: Brian Maly <brian.maly@oracle.com>

ocfs2: Fix locking for res->tracking and dlm->tracking_list

Orabug: 28256391

In dlm_init_lockres() we access and modify res->tracking and
dlm->tracking_list without holding dlm->track_lock. This can cause list
corruptions and can end up in kernel panic.

Fix this by locking res->tracking and dlm->tracking_list with
dlm->track_lock instead of dlm->spinlock.

Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
CC: stable@vger.kernel.org
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xfrm: policy: check policy direction value

Orabug: 28256487
CVE: CVE-2017-11600

The 'dir' parameter in xfrm_migrate() is a user-controlled byte which is used
as an array index. This can lead to an out-of-bound access, kernel lockup and
DoS. Add a check for the 'dir' value.

This fixes CVE-2017-11600.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1474928
Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Cc: <stable@vger.kernel.org> # v2.6.21-rc1
Reported-by: "bo Zhang" <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit 7bab09631c2a303f87a7eb7e3d69e888673b9b7e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>

add kernel param to pre-allocate NICs

Orabug: 27870400

This is so that eth0, eth1, ... ethX are available for ip link rename set for certain
applications (that will not be named).

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Saar Maoz <Saar.Maoz@oracle.com>

mm/mempolicy.c: fix error handling in set_mempolicy and mbind.

Orabug: 28242475
CVE: CVE-2017-7616

In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.

Signed-off-by: Chris Salls <salls@cs.ucsb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit cf01fb9985e8deb25ccf0ea54d916b8871ae0e62)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xhci: Fix USB3 NULL pointer dereference at logical disconnect.

Hub driver will try to disable a USB3 device twice at logical disconnect,
racing with xhci_free_dev() callback from the first port disable.

This can be triggered with "udisksctl power-off --block-device <disk>"
or by writing "1" to the "remove" sysfs file for a USB3 device
in 4.17-rc4.

USB3 devices don't have a similar disabled link state as USB2 devices,
and use a U3 suspended link state instead. In this state the port
is still enabled and connected.

hub_port_connect() first disconnects the device, then later it notices
that device is still enabled (due to U3 states) it will try to disable
the port again (set to U3).

The xhci_free_dev() called during device disable is async, so checking
for existing xhci->devs[i] when setting link state to U3 the second time
was successful, even if device was being freed.

The regression was caused by, and whole thing revealed by,
Commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device")
which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned.
and causes a NULL pointer dereference the second time we try to set U3.

Fix this by checking xhci->devs[i]->udev exists before setting link state.

The original patch went to stable so this fix needs to be applied there as
well.

Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device")
Cc: <stable@vger.kernel.org>
Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2278446e2b7cd33ad894b32e7eb63afc7db6c86e)

Orabug: 27426023
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Hand merge the change since multiple patches have been added to
xhci-hub.c since this cherry pick became available.

Signed-off-by: Joe Moriarty <joe.moriarty@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

mlx4_core: restore optimal ICM memory allocation

Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
brought two regressions caught in our regression suite.

The big one is an additional cost of 256 bytes of overhead per 4096 bytes,
or 6.25 % which is unacceptable since ICM can be pretty large.

This comes from having to allocate one struct mlx4_icm_chunk (256 bytes)
per MLX4_TABLE_CHUNK, which the buggy commit shrank to 4KB
(instead of prior 256KB)

Note that mlx4_alloc_icm() is already able to try high order allocations
and fallback to low-order allocations under high memory pressure.

Most of these allocations happen right after boot time, when we get
plenty of non fragmented memory, there is really no point being so
pessimistic and break huge pages into order-0 ones just for fun.

We only have to tweak gfp_mask a bit, to help falling back faster,
without risking OOM killings.

Second regression is an KASAN fault, that will need further investigations.

Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Tarick Bedeir <tarick@google.com>
Cc: Qing Huang <qing.huang@oracle.com>
Cc: Daniel Jurgens <danielj@mellanox.com>
Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from upstream commit
885892fb378dc096693557ba4f2b875188619b36)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
replaced __GFP_DIRECT_RECLAIM with __GFP_WAIT due to kernel difference
between UEK4 and upstream

Orabug: 27718303

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Tested-by: June Wang <june.wang@oracle.com>
Tested-by: Rose Wang <rose.wang@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

mlx4_core: allocate ICM memory in page size chunks

When a system is under memory presure (high usage with fragments),
the original 256KB ICM chunk allocations will likely trigger kernel
memory management to enter slow path doing memory compact/migration
ops in order to complete high order memory allocations.

When that happens, user processes calling uverb APIs may get stuck
for more than 120s easily even though there are a lot of free pages
in smaller chunks available in the system.

Syslog:
...
Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
oracle_205573_e:205573 blocked for more than 120 seconds.
...

With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.

However in order to support smaller ICM chunk size, we need to fix
another issue in large size kcalloc allocations.

E.g.
Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
entry). So we need a 16MB allocation for a table->icm pointer array to
hold 2M pointers which can easily cause kcalloc to fail.

The solution is to use kvzalloc to replace kcalloc which will fall back
to vmalloc automatically if kmalloc fails.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Acked-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from upstream commit
1383cb8103bb166e50cbab1543bb3b5118fccf82)

conflicts:
drivers/net/ethernet/mellanox/mlx4/icm.c

Orabug: 27718303

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Tested-by: June Wang <june.wang@oracle.com>
Tested-by: Rose Wang <rose.wang@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

kernel/signal.c: avoid undefined behaviour in kill_something_info
When running kill(72057458746458112, 0) in userspace I hit the following
issue.

  UBSAN: Undefined behaviour in kernel/signal.c:1462:11
  negation of -2147483648 cannot be represented in type 'int':
  CPU: 226 PID: 9849 Comm: test Tainted: G    B          ---- -------   3.10.0-327.53.58.70.x86_64_ubsan+ #116
  Hardware name: Huawei Technologies Co., Ltd. RH8100 V3/BC61PBIA, BIOS BLHSV028 11/11/2014
  Call Trace:
    dump_stack+0x19/0x1b
    ubsan_epilogue+0xd/0x50
    __ubsan_handle_negate_overflow+0x109/0x14e
    SYSC_kill+0x43e/0x4d0
    SyS_kill+0xe/0x10
    system_call_fastpath+0x16/0x1b

Add code to avoid the UBSAN detection.

[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/1496670008-59084-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(Cherry-picked from commit 4ea77014af0d)

Orabug: 28078687
CVE: CVE-2018-10124

Signed-off-by: mridula shastry <mridula.c.shastry@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

rds: tcp: compute m_ack_seq as offset from ->write_seq

rds-tcp uses m_ack_seq to track the tcp ack# that indicates
that the peer has received a rds_message. The m_ack_seq is
used in rds_tcp_is_acked() to figure out when it is safe to
drop the rds_message from the RDS retransmit queue.

The m_ack_seq must be calculated as an offset from the right
edge of the in-flight tcp buffer, i.e., it should be based on
the ->write_seq, not the ->snd_nxt.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b589513e6354a5fd6934823b7fd66bffad41137a)

Orabug: 28085214

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: fix bitmap position validation

[ Upstream commit 22be37acce25d66ecf6403fc8f44df9c5ded2372 ]

Currently in ext4_valid_block_bitmap() we expect the bitmap to be
positioned anywhere between 0 and s_blocksize clusters, but that's
wrong because the bitmap can be placed anywhere in the block group. This
causes false positives when validating bitmaps on perfectly valid file
system layouts. Fix it by checking whether the bitmap is within the group
boundary.

The problem can be reproduced using the following

mkfs -t ext3 -E stride=256 /dev/vdb1
mount /dev/vdb1 /mnt/test
cd /mnt/test
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz
tar xf linux-4.16.3.tar.xz

This will result in the warnings in the logs

EXT4-fs error (device vdb1): ext4_validate_block_bitmap:399: comm tar: bg 84: block 2774529: invalid block bitmap

[ Changed slightly for clarity and to not drop a overflow test -- TYT ]

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Fixes: 7dac4a1726a9 ("ext4: add validity checks for bitmap block numbers")
Cc: stable@vger.kernel.org
(cherry picked from commit 22be37acce25d66ecf6403fc8f44df9c5ded2372)

Orabug: 28167032
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds: Fix bug in failover_group parsing

When supplying a module parameter value for the fail-over group as:

ib0,ib1;ib2,ib3;ib4,ib5;ib6,ib7;ib8,ib9

rdmaip / rds_rdma is unable to parse it correctly. Based on debug
prints, it does:

lab38 kernel: RDS/IB: ib0 is designated group  1
lab38 kernel: RDS/IB: ib1 is designated group  1
lab38 kernel: RDS/IB: ib2 is designated group  2
(no more prints).

This implies, that for Xn-8 systems, fail-over will not be distributed
correctly. We see:

lab38 kernel: RDS/IP: IP 192.2.23.100 migrated from ib0 to ib1:P01
lab38 kernel: RDS/IP: IP 192.2.23.102 migrated from ib2 to ib1:P03
lab38 kernel: RDS/IP: IP 192.2.23.104 migrated from ib4 to ib3:P05
lab38 kernel: RDS/IP: IP 192.2.23.106 migrated from ib6 to ib3:P07
lab38 kernel: RDS/IP: IP 192.2.23.108 migrated from ib8 to ib3:P09

That is, all fail-overs except for ib0 are migrated to ib3.

This commit fixes this. The following values for the module parameter
have been tested (in user space):

ib0
ib0,ib1
ib0,ib1,ib2
ib0,ib1,ib2,ib3
ib0;ib1
ib0,ib1;ib2,ib3
ib0,ib1,ib2;ib3,ib4,ib5
ib0,ib1,ib2,ib3;ib4,ib5,ib6,ib7
ib0;ib1;ib2
ib0,ib1;ib2,ib3;ib4,ib5
ib0,ib1,ib2;ib3,ib4,ib5;ib6,ib7,ib8
ib0,ib1,ib2,ib3;ib4,ib5,ib6,ib7;ib8,ib9,ib10,ib11
ib0;ib1;ib2;ib3
ib0,ib1;ib2,ib3;ib4,ib5;ib6,ib7
ib0,ib1,ib2;ib3,ib4,ib5;ib6,ib7,ib8;ib9,ib10,ib11
ib0,ib1,ib2,ib3;ib4,ib5,ib6,ib7;ib8,ib9,ib10,ib11;ib12,ib13,ib14,ib15

Orabug: 28198749

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
(inspired by uek-5-next commit bf8cd0080482fa23ee859cc6118c964693cb3a72)

Signed-off-by: Brian Maly <brian.maly@oracle.com>

sctp: verify size of a new chunk in _sctp_make_chunk()

When SCTP makes INIT or INIT_ACK packet the total chunk length
can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when
transmitting these packets, e.g. the crash on sending INIT_ACK:

[  597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168
               put:120156 head:000000007aa47635 data:00000000d991c2de
               tail:0x1d640 end:0xfec0 dev:<NULL>
...
[  597.976970] ------------[ cut here ]------------
[  598.033408] kernel BUG at net/core/skbuff.c:104!
[  600.314841] Call Trace:
[  600.345829]  <IRQ>
[  600.371639]  ? sctp_packet_transmit+0x2095/0x26d0 [sctp]
[  600.436934]  skb_put+0x16c/0x200
[  600.477295]  sctp_packet_transmit+0x2095/0x26d0 [sctp]
[  600.540630]  ? sctp_packet_config+0x890/0x890 [sctp]
[  600.601781]  ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp]
[  600.671356]  ? sctp_cmp_addr_exact+0x3f/0x90 [sctp]
[  600.731482]  sctp_outq_flush+0x663/0x30d0 [sctp]
[  600.788565]  ? sctp_make_init+0xbf0/0xbf0 [sctp]
[  600.845555]  ? sctp_check_transmitted+0x18f0/0x18f0 [sctp]
[  600.912945]  ? sctp_outq_tail+0x631/0x9d0 [sctp]
[  600.969936]  sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp]
[  601.041593]  ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp]
[  601.104837]  ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
[  601.175436]  ? sctp_eat_data+0x1710/0x1710 [sctp]
[  601.233575]  sctp_do_sm+0x182/0x560 [sctp]
[  601.284328]  ? sctp_has_association+0x70/0x70 [sctp]
[  601.345586]  ? sctp_rcv+0xef4/0x32f0 [sctp]
[  601.397478]  ? sctp6_rcv+0xa/0x20 [sctp]
...

Here the chunk size for INIT_ACK packet becomes too big, mostly
because of the state cookie (INIT packet has large size with
many address parameters), plus additional server parameters.

Later this chunk causes the panic in skb_put_data():

  skb_packet_transmit()
      sctp_packet_pack()
          skb_put_data(nskb, chunk->skb->data, chunk->skb->len);

'nskb' (head skb) was previously allocated with packet->size
from u16 'chunk->chunk_hdr->length'.

As suggested by Marcelo we should check the chunk's length in
_sctp_make_chunk() before trying to allocate skb for it and
discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 07f2c7ab6f8d0a7e7c5764c4e6cc9c52951b9d9c)

Orabug: 28240074
CVE: CVE-2018-5803

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/sctp/sm_make_chunk.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

netfilter: xt_TCPMSS: add more sanity tests on tcph->doff

Orabug: 27896802
CVE: CVE-2017-18017

Denys provided an awesome KASAN report pointing to an use
after free in xt_TCPMSS

I have provided three patches to fix this issue, either in xt_TCPMSS or
in xt_tcpudp.c. It seems xt_TCPMSS patch has the smallest possible
impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 2638fd0f92d4397884fd991d8f4925cb3f081901)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

kernel/exit.c: avoid undefined behaviour when calling wait4()
wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
UBSAN: Undefined behaviour in kernel/exit.c:1651:9

The related calltrace is as follows:

  negation of -2147483648 cannot be represented in type 'int':
  CPU: 9 PID: 16482 Comm: zj Tainted: G    B          ---- -------   3.10.0-327.53.58.71.x86_64+ #66
  Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285          /BC11BTSA              , BIOS CTSAV036 04/27/2011
  Call Trace:
    dump_stack+0x19/0x1b
    ubsan_epilogue+0xd/0x50
    __ubsan_handle_negate_overflow+0x109/0x14e
    SyS_wait4+0x1cb/0x1e0
    system_call_fastpath+0x16/0x1b

Exclude the overflow to avoid the UBSAN warning.

Link: http://lkml.kernel.org/r/1497264618-20212-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 28049778
CVE: CVE-2018-10087

(Cherry-picked from commit dd83c161fbcc)
Signed-off-by: mridula shastry <mridula.c.shastry@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs/module: Provide retpoline_modules_only parameter to fail non-retpoline modules

This allows developers to have right away a mechanism to see
if their module is incorrect.

Note that we enable this regardless whether the OS is using
retpoline or IBRS. This is on purpose.

Orabug: 28071992
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
---
v2: s/nonretpoline_fail/retpoline_modules_only/

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/fpu: Make eager FPU default

This is equivalent of upstream's commit 58122bf1d856 ("x86/fpu:
Default eagerfpu=on on all CPUs").

Orabug: 28135099
CVE: CVE-2018-3665

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

crypto: drbg - fix shadow copy of the test buffer

drbg->test_data.list is expected to be empty.
tmp->test_data.list should be empty(head->next == head) as well,
otherwise the specified test entropy will not be used in drbg_seed.

Fixes: b07ca65d8 ("crypto: drbg - Convert to new rng interface")
Orabug: 28078838

Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

netlink: add a start callback for starting a netlink dump

The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fc9e50f5a5a4e1fa9ba2756f745a13e693cf6a06)

Orabug: 27169581
CVE: CVE-2017-16939

Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ipsec: Fix aborted xfrm policy dump crash

An independent security researcher, Mohamed Ghannam, has reported
this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
program.

The xfrm_dump_policy_done function expects xfrm_dump_policy to
have been called at least once or it will crash. This can be
triggered if a dump fails because the target socket's receive
buffer is full.

This patch fixes it by using the cb->start mechanism to ensure that
the initialisation is always done regardless of the buffer situation.

Fixes: 12a169e7d8f4 ("ipsec: Put dumpers on the dump list")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit 1137b5e2529a8f5ca8ee709288ecba3e68044df2)

Orabug: 27169581
CVE: CVE-2017-16939

Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds: prevent RDS connections using stale ARP entries

Active bonding fallback is an asynchronous event in the cluster. Thus,
there is a potential race where a host has sent out a CM REQ, using a
wrong remote gid, before IP migration happens at the remote end. The
side effect of this issue causes RDS connections only use one physical
IB port. When that particular IB port goes down, it causes send
completion error on those RDS connections and eventually increases the
brownout time.

To avoid this phenomenon, RDS has to reject all incoming CM REQ after
RDMA_CM_EVENT_DISCONNECTED or RDMA_CM_EVENT_ADDR_CHANGE events until
route resolution has been performed locally. By doing so, we can
ensure that the gratituos ARPs are sent to the remote end and RDS
connections are established based on the correct ARP entries.

Orabug: 28149101

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
(cherry picked from commit d7f00e2a0d30b0f14a763d96e6c8ae2ddb40a281
repo https://linux-git.us.oracle.com/UEK/linux-wguay-public)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/rds/rds.h
net/rds/ib_cm.c

Fixed checkpatch issues.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds: Avoid stalled connection due to CM REQ retries

RDS drops a connection and destroys its cm_id once a CM REJ is sent. In a
congested fabric, there is a race where a remote node receives a CM REJ
after CM has retried another CM REQ. In this scenario, the cm_id that sends
the CM REQ is no longer exists even though the remote end might respond
with a CM REP, and wait for an incoming CM RTU. This RDS connection
establishment is stuck until the connection is destroyed after the CM
timeout. As a result, this leads to a very long brownout time. Thus, this
patch adds a mechanism to detect a rejected CM REQ and rejects all the
subsequent CM REQ that are retried by the CM.

Orabug: 28068627

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
(cherry picked from commit c5c4f1472bc788ddc69af713f975ad92bdefe206
repo https://linux-git.us.oracle.com/UEK/linux-wguay-public)

Conflict:
net/rds/ib_cm.c

Made it checkpatch clean.

v1->v2:
Added Shannon's recommendations

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds: use one sided reconnection during a race

This commit reverts commit 812c02791add ("RDS: restore the exponential
back-off scheme") to use one sided reconnection when a race is
detected. When a race is detected, the active side reconnects as fast
as possible, whereas the passive side wait for 15s.

Orabug: 28068627

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
(cherry picked from commit 464c84386ab55a2700d963619a470a55e53a1b66
repo https://linux-git.us.oracle.com/UEK/linux-wguay-public)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/rds/ib_cm.c
net/rds/rdma_transport.c
net/rds/threads.c

Made it checkpatch clean.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "Revert "net/rds: Revert "RDS: add reconnect retry scheme for stalled"

This reverts commit fde8284e499b ("Revert "net/rds: Revert "RDS: add
reconnect retry scheme for stalled connections""").

Here is the commit message for the first revert:

<quote>
Commit 54312602fb26 ("RDS: add reconnect retry scheme for stalled
connections") introduces the rds_reconnect_timeout to retry the
connection establishment after sysctl_reconnect_retry_ms (default is
1000ms). Nevertheless, this proactive mechanism is overkilled and it
is causing long brownout time in the virtualized environment. In
short, below are the justifications why said commit is reverted.

a) The retry counter starts ticking after RDS received an ADDR_CHANGE
event. After receiving an ADDR_CHANGE event, RDS needs to perform shutdown
via shutdown_worker. Then, initiate a new connection via connect_worker.
Eventually, a CM REQ is only sent out after RDS received
RDMA_CM_EVENT_ADDR_RESOLVED and RDMA_CM_EVENT_ROUTE_RESOLVED events. If the
retry is made to cater for stalled connection due to missing CM messages,
the retry should only happen after a CM REQ is sent. With the current
retry scheme (and with the default 1000 ms) that happens after ADDR_CHANGE
event, it introduces congestion in the single threaded workqueue.

b) Assuming that we modify the retry counter to start ticking after a
CM REQ message is sent out. By introducing another retry timeout, it
complicates the system tuning. Why? First, the
sysctl_reconnect_retry_ms relies on the underlying
cma_response_timeout. Any modication of cma_response_timeout requires
to tune sysctl_reconnect_retry_ms. Second, it is hard to find a
universal timing that fits all configurations (bare-metal,
virtualized, mixed environment, and homo/heterogenous system).
</quote>

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/rds/ib_cm.c
net/rds/rdma_transport.c
net/rds/threads.c

Fixed checkpatch warnings.

Orabug: 28068627

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent

When run raidconfig from Dom0 we found that the Xen DMA heap is reduced,
but Dom Heap is increased by the same size. Tracing raidconfig we found
that the related ioctl() in megaraid_sas will call dma_alloc_coherent()
to apply memory. If the memory allocated by Dom0 is not in the DMA area,
it will exchange memory with Xen to meet the requiment. Later drivers
call dma_free_coherent() to free the memory, on xen_swiotlb_free_coherent()
the check condition (dev_addr + size - 1 <= dma_mask) is always false,
it prevents calling xen_destroy_contiguous_region() to return the memory
to the Xen DMA heap.

This issue introduced by commit 6810df88dcfc2 "xen-swiotlb: When doing
coherent alloc/dealloc check before swizzling the MFNs.".

(cherry picked from commit 4855c92dbb7b3b85c23e88ab7ca04f99b9677b41)
Orabug: 22910685

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Tested-by: John Sobecki <john.sobecki@oracle.com>
Reviewed-by: Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds: Assign the correct service level

Commit 2bbf6158c769 ("net/rds: remove the RDS specific path record
caching") has changed the service level (SL) to TOS mapping to be
1:1. Nevertheless, it has assigned the SL to be a 8-bit value even though
IB specification section 7.7.3 mentioned that the SL is only a 4-bit
field. Thus, this patch assigns the SL correctly.

Orabug: 27607213

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reported-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
(cherry picked from commit 8305552e624618c7196575d5a6ae64c0c024fc20
repo uek5)

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

target: Re-add missing SCF_ACK_KREF assignment in v4.1.y

This patch fixes a regression in >= v4.1.y code where the original
SCF_ACK_KREF assignment in target_get_sess_cmd() was dropped upstream
in commit 054922bb, but the series for addressing TMR ABORT_TASK +
LUN_RESET with fabric session reinstatement in commit febe562c20 still
depends on this code in transport_cmd_finish_abort().

The regression manifests itself as a se_cmd->cmd_kref +1 leak, where
ABORT_TASK + LUN_RESET can hang indefinately for a specific I_T session
for drivers using SCF_ACK_KREF, resulting in hung kthreads.

This patch has been verified with v4.1.y code.

Reported-by: Vaibhav Tandon <vst@datera.io>
Tested-by: Vaibhav Tandon <vst@datera.io>
Cc: Vaibhav Tandon <vst@datera.io>
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 527268df31e57cf2b6d417198717c6d6afdb1e3e)

Orabug: 27781132

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

target: Fix LUN_RESET active I/O handling for ACK_KREF

This patch fixes a NULL pointer se_cmd->cmd_kref < 0
refcount bug during TMR LUN_RESET with active se_cmd
I/O, that can be triggered during se_cmd descriptor
shutdown + release via core_tmr_drain_state_list() code.

To address this bug, add common __target_check_io_state()
helper for ABORT_TASK + LUN_RESET w/ CMD_T_COMPLETE
checking, and set CMD_T_ABORTED + obtain ->cmd_kref for
both cases ahead of last target_put_sess_cmd() after
TFO->aborted_task() -> transport_cmd_finish_abort()
callback has completed.

It also introduces SCF_ACK_KREF to determine when
transport_cmd_finish_abort() needs to drop the second
extra reference, ahead of calling target_put_sess_cmd()
for the final kref_put(&se_cmd->cmd_kref).

It also updates transport_cmd_check_stop() to avoid
holding se_cmd->t_state_lock while dropping se_cmd
device state via target_remove_from_state_list(), now
that core_tmr_drain_state_list() is holding the
se_device lock while checking se_cmd state from
within TMR logic.

Finally, move transport_put_cmd() release of SGL +
TMR + extended CDB memory into target_free_cmd_mem()
in order to avoid potential resource leaks in TMR
ABORT_TASK + LUN_RESET code-paths. Also update
target_release_cmd_kref() accordingly.

Reviewed-by: Quinn Tran <quinn.tran@qlogic.com>
Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 27781132

(backport upstream commit febe562c20dfa8f33bee7d419c6b517986a5aa33)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/target/target_core_tmr.c
drivers/target/target_core_transport.c

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

target: Invoke release_cmd() callback without holding a spinlock

This patch fixes the following kernel warning because it avoids that
IRQs are disabled while ft_release_cmd() is invoked (fc_seq_set_resp()
invokes spin_unlock_bh()):

WARNING: CPU: 3 PID: 117 at kernel/softirq.c:150 __local_bh_enable_ip+0xaa/0x110()
Call Trace:
[<ffffffff814f71eb>] dump_stack+0x4f/0x7b
[<ffffffff8105e56a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8105e65a>] warn_slowpath_null+0x1a/0x20
[<ffffffff81062b2a>] __local_bh_enable_ip+0xaa/0x110
[<ffffffff814ff229>] _raw_spin_unlock_bh+0x39/0x40
[<ffffffffa03a7f94>] fc_seq_set_resp+0xe4/0x100 [libfc]
[<ffffffffa02e604a>] ft_free_cmd+0x4a/0x90 [tcm_fc]
[<ffffffffa02e6972>] ft_release_cmd+0x12/0x20 [tcm_fc]
[<ffffffffa042bd66>] target_release_cmd_kref+0x56/0x90 [target_core_mod]
[<ffffffffa042caf0>] target_put_sess_cmd+0xc0/0x110 [target_core_mod]
[<ffffffffa042cb81>] transport_release_cmd+0x41/0x70 [target_core_mod]
[<ffffffffa042d975>] transport_generic_free_cmd+0x35/0x420 [target_core_mod]

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joern Engel <joern@logfs.org>
Reviewed-by: Andy Grover <agrover@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Orabug: 27781132

(backport upstream commit 9ff9d15eddd13ecdd41876c5e1f31ddbb127101c)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/target/target_core_tmr.c

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Remove the Disabling Spectre v2 mitigation retpoline

as we actually not disabling it - just enabling IBRS (and still
having retpoline enabled).

Orabug: 27897282
Reported-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Report properly retpoline+IBRS

If the retpoline fallback is activated we want to properly
report that both of them are on.

This involves a bit of dance where we do not change
the spectre_v2_enabled ever again. But we still need the logic
to detect whether IBRS + retpoline is enabled. To make it
easier we change the name of the function to be more obvious.

OraBug: 27897282
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
v2: Remove the pr_err('Spectre v2 mitigation IBRS is set.\n')
Make the retpoline_only_enabled optimized.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/spec_ctrl.c

x86/bugs: Don't lie when fallback retpoline is engaged

That is we actually have two mitigations in effect: retpoline
and IBRS.

OraBug: 27897282

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

fs: aio: fix the increment of aio-nr and counting against aio-max-nr

Currently, aio-nr is incremented in steps of 'num_possible_cpus() * 8'
for io_setup(nr_events, ..) with 'nr_events < num_possible_cpus() * 4':

    ioctx_alloc()
    ...
        nr_events = max(nr_events, num_possible_cpus() * 4);
        nr_events *= 2;
    ...
        ctx->max_reqs = nr_events;
    ...
        aio_nr += ctx->max_reqs;
    ....

This limits the number of aio contexts actually available to much less
than aio-max-nr, and is increasingly worse with greater number of CPUs.

For example, with 64 CPUs, only 256 aio contexts are actually available
(with aio-max-nr = 65536) because the increment is 512 in that scenario.

Note: 65536 [max aio contexts] / (64*4*2) [increment per aio context]
is 128, but make it 256 (double) as counting against 'aio-max-nr * 2':

    ioctx_alloc()
    ...
        if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
        ...
            goto err_ctx;
    ...

This patch uses the original value of nr_events (from userspace) to
increment aio-nr and count against aio-max-nr, which resolves those.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reported-by: Lekshmi C. Pillai <lekshmi.cpillai@in.ibm.com>
Tested-by: Lekshmi C. Pillai <lekshmi.cpillai@in.ibm.com>
Tested-by: Paul Nguyen <nguyenp@us.ibm.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Orabug: 28079082

(cherry picked from commit 2a8a98673c13cb2a61a6476153acf8344adfa992)
Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com>
Reviewed-by: Calum Mackay <calum.mackay@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qla2xxx: Enable buffer boundary check when DIF bundling is on.

The Qlogic firmware requires the upper 32-bit of dma buffer address
not flipped, this happens when DIF bundling is enabled and the SGE
buffer address plus length changes the upper 32-bit address, a local
buffer is used for DIF information.

Orabug: 28130589

Co-authored-by: Giri Malavali <giridhar.malavali@cavium.com>
Co-authored-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Reviewed-by: Giri Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

kernel: sys.c: missing break for prctl spec ctrl

In the process of backporting speculation control bits a break was missed which
is causing prctl with PR_SET_SPECULATION_CTRL to also fail.

Orabug: 28144775

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs/IBRS: Keep SSBD mitigation in effect if spectre_v2=ibrs is selected

From: Boris Ostrovsky <boris.ostrovsky@oracle.com>

If the system admins picks to disable memory disambiguation at bootup
(spec_store_bypass_disable=on) and enable IBRS (spectre_v2=ibrs) we
end up briefly at bootup disabling memory disambiguation and then
IBRS SPEC_CTRL kicks - and memory disambiguation is enabled back again.

The logic is there for the 'auto' case, but we missed it for
the other ones. Lets fix it up.

OraBug: 28071800

Fixes: 89981b51b9240ec16e506304990ce2311e93285b ("x86/speculation: Add prctl for Speculative Store Bypass mitigation")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

fs/pstore: update the backend parameter in pstore module

This patch update the module parameter backend, so it is visible
through /sys/module/pstore/parameters/backend.

For example:
if pstore backend is ramoops, with this patch:
# cat /sys/module/pstore/parameters/backend
ramoops
and without this patch:
# cat /sys/module/pstore/parameters/backend
(null)

Signed-off-by: Wang Long <long.wanglong@huawei.com>
Acked-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
(cherry picked from commit 42222c2a5d5da7fe4839491d5c44034f40761071)
Orabug: 27994372
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

kvm: vmx: Reinstate support for CPUs without virtual NMI

[ Upstream commit 8a1b43922d0d1279e7936ba85c4c2a870403c95f ]

This is more or less a revert of commit 2c82878b0cb3 ("KVM: VMX: require
virtual NMI support", 2017-03-27); it turns out that Core 2 Duo machines
only had virtual NMIs in some SKUs.

The revert is not trivial because in the meanwhile there have been several
fixes to nested NMI injection.  Therefore, the entire vNMI state is moved
to struct loaded_vmcs.

Another change compared to before the patch is a simplification here:

       if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked &&
           !(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis(
                                       get_vmcs12(vcpu))))) {

The final condition here is always true (because nested_cpu_has_virtual_nmis
is always false) and is removed.

Fixes: 2c82878b0cb38fd516fd612c67852a6bbf282003
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1490803
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit 8a1b43922d0d1279e7936ba85c4c2a870403c95f)

Orabug: 28041210

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kvm/vmx.c

Conflicts are due to the fact that a large number of patches affecting this
area of the code have not been ported to UEK4-QU7. Portions of the
cherry-picked patch had to be manually inserted in the correct places, but
no logical changes were required.

Signed-off-by: Brian Maly <brian.maly@oracle.com>

dm crypt: add big-endian variant of plain64 IV

The big-endian IV (plain64be) is needed to map images from extracted
disks that are used in some external (on-chip FDE) disk encryption
drives, e.g.: data recovery from external USB/SATA drives that support
"internal" encryption.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 7e3fd855ad66ffc0dd926911da23dd21e59f9462)
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Qiang Wang <qiang.z.wang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 28043932
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/md/dm-crypt.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Rename SSBD_NO to SSB_NO

The "336996 Speculative Execution Side Channel Mitigations" from
May defines this as SSB_NO, hence lets sync-up.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit 240da953)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/msr-index.h
[msr-index.h: different file location]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD

Expose the new virtualized architectural mechanism, VIRT_SSBD, for using
speculative store bypass disable (SSBD) under SVM. This will allow guests
to use SSBD on hardware that uses non-architectural mechanisms for enabling
SSBD.

[ tglx: Folded the migration fixup from Paolo Bonzini ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit bc226f07)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/kvm_host.h
arch/x86/kernel/cpu/common.c
arch/x86/kvm/cpuid.c
arch/x86/kvm/svm.c
arch/x86/kvm/vmx.c
arch/x86/kvm/x86.c
[
We did not have cpu_has_high_real_mode_segbase entry at all.
Also msr_info is not in this patchset, I will take care of it
in Orabug: 28069548 in a future patchset.
]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG

Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to
x86_virt_spec_ctrl(). If either X86_FEATURE_LS_CFG_SSBD or
X86_FEATURE_VIRT_SPEC_CTRL is set then use the new guest_virt_spec_ctrl
argument to check whether the state must be modified on the host. The
update reuses speculative_store_bypass_update() so the ZEN-specific sibling
coordination can be reused.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit 47c61b39)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/spec-ctrl.h
arch/x86/kernel/cpu/bugs.c
[
bugs.c: different file name
spec-ctrl.h: contextual changes. Diff couldn't be applied
]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Rework spec_ctrl base and mask logic

x86_spec_ctrL_mask is intended to mask out bits from a MSR_SPEC_CTRL value
which are not to be modified. However the implementation is not really used
and the bitmask was inverted to make a check easier, which was removed in
"x86/bugs: Remove x86_spec_ctrl_set()"

Aside of that it is missing the STIBP bit if it is supported by the
platform, so if the mask would be used in x86_virt_spec_ctrl() then it
would prevent a guest from setting STIBP.

Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to
sanitize the value which is supplied by the guest.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit be6fcb54)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c

[Konrad:
As we have the IBRS support and boy that makes it double hard.
The first part of this patch is to invert the mask, no biggie.

But the mask for the IBRS mode (that is - we want to set SPEC_CTRL
MSR to 1<<0 in kernel space, but in user-space we want it to be
1<<1) didn't set the SSBD bit as we should not set the SSBD
in kernel mode. But with the inversion that is OK.

Next part is the two values - x86_spec_ctrl_base and x86_spec_ctrl_priv.

The x86_spec_ctrl_base is what userspace is going to have (so
tack on SSBD), and x86_spec_ctrl_priv what runs in kernel (so
tack on IBRS, but _NOT_ SSBD).

That means the whole logic of filtering the supported SPEC_CTRL
value depending on what the host supports should be seeded
with x86_spec_ctrl_priv.

With all that the logic works - we end up ANDing our mask
and what we can support (and the initial boot-time value of the
MSR), and then ORing what the guest wants with our mask.

All the while supporting any other bits in the SPEC_CTRL that
may come in the future.

And this logic is fine on AMD too - where the SSBD bit does not
show up in the SPEC_CTRL mask

P.S.
To make it more fun the x86_spec_ctrl_priv |= IBRS is set in a
header (see set_ibrs_inuse).]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Expose x86_spec_ctrl_base directly

x86_spec_ctrl_base is the system wide default value for the SPEC_CTRL MSR.
x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to
prevent modification to that variable. Though the variable is read only
after init and globaly visible already.

Remove the function and export the variable instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit fa8ac498)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/nospec-branch.h
arch/x86/include/asm/spec-ctrl.h
arch/x86/kernel/cpu/bugs.c
[Contextual changes: things weren't in the expected place]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}

Function bodies are very similar and are going to grow more almost
identical code. Add a bool arg to determine whether SPEC_CTRL is being set
for the guest or restored to the host.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit cc69b349)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
[Different filename bugs_64.c]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation: Rework speculative_store_bypass_update()

The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse
speculative_store_bypass_update() to avoid code duplication. Add an
argument for supplying a thread info (TIF) value and create a wrapper
speculative_store_bypass_update_current() which is used at the existing
call site.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit 0270be3e)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
[Different filename (bugs_64.c)]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation: Add virtualized speculative store bypass disable support

Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD). To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.

Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit 11fb0683)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
arch/x86/include/asm/msr-index.h
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/process.c
[
cpufeatures.h: different file name and different index
msr-index.h: different file location
bugs.c: different file name
process.c: different file structure
common.c: This is because we skipped the first two patches from the patch
series. We do no have enough feature bits to align all the actual feature in
our cpufeature structure. We created a synthetic feature and this is where we
detect and set it.
]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL

AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store
Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care
about the bit position of the SSBD bit and thus facilitate migration.
Also, the sibling coordination on Family 17H CPUs can only be done on
the host.

Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an
extra argument for the VIRT_SPEC_CTRL MSR.

Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU
data structure which is going to be used in later patches for the actual
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit ccbcd267)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
arch/x86/kvm/svm.c
arch/x86/kvm/vmx.c
[
We skipped cherry-picking two commits from upstream:
x86/cpufeatures: Disentangle SSBD enumeration ...
x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS ...
x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP ...

Because we do not enough space in word 7 of synthetic bits for cpufeature. Also
we do not have the word 13 to move the bits around (like upstream did). We
cannot add word 13 because we will break kABI. So we do no have VIRT_SPEC_CTRL
cpufeature but we have VIRT_SSBD cpufeature which is a bit in virt_spec_ctrl
MSR.
]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation: Handle HT correctly on AMD

The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when
hyperthreading is enabled the SSBD bit toggle needs to take both cores into
account. Otherwise the following situation can happen:

CPU0 CPU1

disable SSB
disable SSB
enable SSB <- Enables it for the Core, i.e. for CPU0 as well

So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled
again.

On Intel the SSBD control is per core as well, but the synchronization
logic is implemented behind the per thread SPEC_CTRL MSR. It works like
this:

CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL

i.e. if one of the threads enables a mitigation then this affects both and
the mitigation is only disabled in the core when both threads disabled it.

Add the necessary synchronization logic for AMD family 17H. Unfortunately
that requires a spinlock to serialize the access to the MSR, but the locks
are only shared between siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit 1f50ddb4)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/process.c
arch/x86/kernel/smpboot.c
[Contextual changes: in UEK4 files are pretty different and do not match the diff]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/cpufeatures: Add FEATURE_ZEN

Add a ZEN feature bit so family-dependent static_cpu_has() optimizations
can be built for ZEN.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit d1035d97)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
[Different filename and differnet bit for FEATURE_ZEN]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/cpu/AMD: Fix erratum 1076 (CPB bit)

CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do
support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Orabug: 28063992
CVE: CVE-2018-3639

(cherry picked from commit f7f3dc00)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

perf/hwbp: Simplify the perf-hwbp code, fix documentation

Orabug: 27947602
CVE: CVE-2018-1000199

Annoyingly, modify_user_hw_breakpoint() unnecessarily complicates the
modification of a breakpoint - simplify it and remove the pointless
local variables.

Also update the stale Docbook while at it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit f67b15037a7a50c57f72e69a6d59941ad90a0f0f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "perf/hwbp: Simplify the perf-hwbp code, fix documentation"

Orabug: 27947602

Wrong CVE tag was included in this commit

This reverts commit 2ff296f2eb2475d4c3f206e145c4b2132d86bc2c.

KVM: SVM: Move spec control call after restore of GS

svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but
before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current'
to determine the host SSBD state of the thread. 'current' is GS based, but
host GS is not yet restored and the access causes a triple fault.

Move the call after the host GS restore.

OraBug: 28041771
CVE: CVE-2018-3639

Fixes: 885f82bfbc6f x86/process: Allow runtime control of Speculative Store Bypass
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 15e6c22fd8e5a42c5ed6d487b7c9fe44c2517765)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kvm/svm.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Fix the parameters alignment and missing void

Fixes: 7bb4d366c ("x86/bugs: Make cpu_show_common() static")
Fixes: 24f7fc83b ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation")
OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit ffed645e3be0e32f8e9ab068d257aee8d0fe8eec)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
[It is called bugs_64 in UEK4]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Make cpu_show_common() static

cpu_show_common() is not used outside of arch/x86/kernel/cpu/bugs.c, so
make it static.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 7bb4d366cba992904bffa4820d24e70a3de93e76)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
[It is called bugs_64.c]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Fix __ssb_select_mitigation() return type

__ssb_select_mitigation() returns one of the members of enum ssb_mitigation,
not ssb_mitigation_cmd; fix the prototype to reflect that.

OraBug: 28041771
CVE: CVE-2018-3639

Fixes: 24f7fc83b9204 ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit d66d8ff3d21667b41eddbe86b35ab411e40d8c5f)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
[It is called bugs_64.c]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

Documentation/spec_ctrl: Do some minor cleanups

Fix some typos, improve formulations, end sentences with a fullstop.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit dd0792699c4058e63c0715d9a7c2d40226fcdddc)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

proc: Use underscores for SSBD in 'status'

The style for the 'status' file is CamelCase or this. _.

Fixes: fae1fa0fc ("proc: Provide details on speculation flaw mitigations")
OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit e96f46ee8587607a828f783daa6eb5b44d25004d)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Rename _RDS to _SSBD

Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2]
as SSBD (Speculative Store Bypass Disable).

Hence changing it.

It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name
is going to be. Following the rename it would be SSBD_NO but that rolls out
to Speculative Store Bypass Disable No.

Also fixed the missing space in X86_FEATURE_AMD_SSBD.

[ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ]

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 9f65fb29374ee37856dbad847b4e121aab72b510)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
arch/x86/include/asm/msr-index.h
arch/x86/include/asm/spec-ctrl.h
arch/x86/include/asm/thread_info.h
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/intel.c
arch/x86/kernel/process.c
arch/x86/kvm/cpuid.c
arch/x86/kvm/vmx.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation: Make "seccomp" the default mode for Speculative Store Bypass

Unless explicitly opted out of, anything running under seccomp will have
SSB mitigations enabled. Choosing the "prctl" mode will disable this.

[ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ]

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit f21b53b20c754021935ea43364dbf53778eeba32)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
[It is called Documentation/kernel-paramters.txt]

arch/x86/include/asm/nospec-branch.h

[Different name..]
arch/x86/kernel/cpu/bugs.c
[And again, bugs_64.c, and also we did provide the SPEC_STORE_BYPASS_USERSPACE]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

seccomp: Move speculation migitation control to arch code

The migitation control is simpler to implement in architecture code as it
avoids the extra function call to check the mode. Aside of that having an
explicit seccomp enabled mode in the architecture mitigations would require
even more workarounds.

Move it into architecture code and provide a weak function in the seccomp
code. Remove the 'which' argument as this allows the architecture to decide
which mitigations are relevant for seccomp.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 8bf37d8c067bb7eb8e7c381bdadf9bd89182b6bc)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
[Which is called bugs_64.c in UEK4]
include/linux/nospec.h

[Which is called nospec-branch.h]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

seccomp: Add filter flag to opt-out of SSB mitigation

If a seccomp user is not interested in Speculative Store Bypass mitigation
by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when
adding filters.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 00a02d0c502a06d15e07b857f8ff921e3e402675)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
include/linux/seccomp.h
include/uapi/linux/seccomp.h
tools/testing/selftests/seccomp/seccomp_bpf.c
[No eBPF in UEK4]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

seccomp: Use PR_SPEC_FORCE_DISABLE

Use PR_SPEC_FORCE_DISABLE in seccomp() because seccomp does not allow to
widen restrictions.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit b849a812f7eb92e96d1c8239b06581b2cfd8b275)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

prctl: Add force disable speculation

For certain use cases it is desired to enforce mitigations so they cannot
be undone afterwards. That's important for loader stubs which want to
prevent a child from disabling the mitigation again. Will also be used for
seccomp(). The extra state preserving of the prctl state for SSB is a
preparatory step for EBPF dymanic speculation control.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 356e4bfff2c5489e016fdb925adbf12a1e3950ee)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c

[File is called bugs_64.c]
include/linux/sched.h

Signed-off-by: Brian Maly <brian.maly@oracle.com>

seccomp: Enable speculation flaw mitigations

When speculation flaw mitigations are opt-in (via prctl), using seccomp
will automatically opt-in to these protections, since using seccomp
indicates at least some level of sandboxing is desired.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 5c3070890d06ff82eecb808d02d2ca39169533ef)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
kernel/seccomp.c
[The include file is called nospec-branch.h instead of nospec.h]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

proc: Provide details on speculation flaw mitigations

As done with seccomp and no_new_privs, also show speculation flaw
mitigation state in /proc/$pid/status.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit fae1fa0fc6cca8beee3ab8ed71d54f9a78fa3f64)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
fs/proc/array.c
[Missing seq_putc]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

nospec: Allow getting/setting on non-current task

Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than
current.

This is needed both for /proc/$pid/status queries and for seccomp (since
thread-syncing can trigger seccomp in non-current threads).

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 7bbf1373e228840bb0295a2ca26d548ef37f448e)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
include/linux/nospec.h
kernel/sys.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs/IBRS: Disable SSB (RDS) if IBRS is sslected for spectre_v2.

If =userspace is selected we want frob the SPEC_CTRL MSR on every
userspace entrace (disable memory disambigation), and also on every
kernel entrace (enable memory disambiguation). However we have
to be careful as having MSR frobbed and retpoline being enabled
slows the machine even further.

Therefore if possible swap over to using SPEC_CTRL MSR (IBRS) on
every kernel entrace instead of using retpoline.

Naturally this heuristic is controlled by various knobs.

To summarize, if "spectre_v2=retpoline spec_store_bypass_disable=userspace"
is set then we will switch the spectre_v2 to IBRS.

This table may explain this better:
effect    | spectre_v2  | spec_store_bypass_disable | remark
==========+=============+===========================+======
IBRS      | ibrs        | userspace                 |
IBRS      | auto        | userspace                 | *1 *2
IBRS      | retpoline   | userspace                 | *1
IBRS      | ibrs        | boot                      |
retpoline | auto        | boot                      |
retpoline | retpoline   | boot                      |
retpoline | auto        | boot                      |
retpoline | auto        | auto                      |

*1: If spectre_v2_heuristic=off or spectre_v2_heuristic=rds=off
is selected then the spec_store_bypass_disable=userspace parameter
is not followed and the effect is both retpoline and IBRS enabled
in the kernel.

*2: If we run under Skylake+ the 'spec_store_bypass_disable=auto'
will disable retpoline and enable IBRS. If not on Skylake+, then
retpoline and IBRS are both enabled.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation: Add prctl for Speculative Store Bypass mitigation

Add prctl based control for Speculative Store Bypass mitigation and make it
the default mitigation for Intel and AMD.

Andi Kleen provided the following rationale (slightly redacted):

There are multiple levels of impact of Speculative Store Bypass:

1) JITed sandbox.
    It cannot invoke system calls, but can do PRIME+PROBE and may have call
    interfaces to other code

2) Native code process.
    No protection inside the process at this level.

3) Kernel.

4) Between processes.

The prctl tries to protect against case (1) doing attacks.

If the untrusted code can do random system calls then control is already
lost in a much worse way. So there needs to be system call protection in
some way (using a JIT not allowing them or seccomp). Or rather if the
process can subvert its environment somehow to do the prctl it can already
execute arbitrary code, which is much worse than SSB.

To put it differently, the point of the prctl is to not allow JITed code
to read data it shouldn't read from its JITed sandbox. If it already has
escaped its sandbox then it can already read everything it wants in its
address space, and do much worse.

The ability to control Speculative Store Bypass allows to enable the
protection selectively without affecting overall system performance.

Based on an initial patch from Tim Chen. Completely rewritten.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a73ec77ee17ec556fe7f165d00314cb7c047b1ac)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
arch/x86/kernel/cpu/bugs.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86: thread_info.h: move RDS from index 5 to 23

In UEK4, the thread flags field is split in two parts:
- lower bits of the word which are used usually for "pending work-to-be-done"
- upper bits of the word

There is a comment in arch/x86/include/asm/thread_info.h:88 where it says that
the lower bits are hard-coded in entry_64.S. In entry_64.S a mask of 0x0000ffff
is used to check the state of the thread and determine if it would go to
userspace or not. Because we used bit "5", which was in the lower bits part,
one of the checked condition was always true and the program never returned
from kernel.

We moved RDS to bit 23 which was free to solve the issue.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/process: Allow runtime control of Speculative Store Bypass

The Speculative Store Bypass vulnerability can be mitigated with the
Reduced Data Speculation (RDS) feature. To allow finer grained control of
this eventually expensive mitigation a per task mitigation control is
required.

Add a new TIF_RDS flag and put it into the group of TIF flags which are
evaluated for mismatch in switch_to(). If these bits differ in the previous
and the next task, then the slow path function __switch_to_xtra() is
invoked. Implement the TIF_RDS dependent mitigation control in the slow
path.

If the prctl for controlling Speculative Store Bypass is disabled or no
task uses the prctl then there is no overhead in the switch_to() fast
path.

Update the KVM related speculation control functions to take TID_RDS into
account as well.

Based on a patch from Tim Chen. Completely rewritten.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 885f82bfbc6fefb6664ea27965c3ab9ac4194b8c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/include/asm/msr-index.h
arch/x86/include/asm/thread_info.h
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/process.c
[u64->u32]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

prctl: Add speculation control prctls

Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.

PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:

Bit  Define           Description
0    PR_SPEC_PRCTL    Mitigation can be controlled per task by
                      PR_SET_SPECULATION_CTRL
1    PR_SPEC_ENABLE   The speculation feature is enabled, mitigation is
                      disabled
2    PR_SPEC_DISABLE  The speculation feature is disabled, mitigation is
                      enabled

If all bits are 0 the CPU is not affected by the speculation misfeature.

If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.

PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.

The common return values are:

EINVAL  prctl is not implemented by the architecture or the unused prctl()
        arguments are not 0
ENODEV  arg2 is selecting a not supported speculation misfeature

PR_SET_SPECULATION_CTRL has these additional return values:

ERANGE  arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO   prctl control of the selected speculation misfeature is disabled

The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.

Based on an initial patch from Tim Chen and mostly rewritten.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit b617cfc858161140d69cc0b5cc211996b557a1c7)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
Documentation/userspace-api/index.rst
include/linux/nospec.h
include/uapi/linux/prctl.h
kernel/sys.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation: Create spec-ctrl.h to avoid include hell

Having everything in nospec-branch.h creates a hell of dependencies when
adding the prctl based switching mechanism. Move everything which is not
required in nospec-branch.h to spec-ctrl.h and fix up the includes in the
relevant files.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 28a2775217b17208811fa43a9e96bd1fdf417b86)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
arch/x86/kvm/svm.c
arch/x86/kvm/vmx.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest

Expose the CPUID.7.EDX[31] bit to the guest, and also guard against various
combinations of SPEC_CTRL MSR values.

The handling of the MSR (to take into account the host value of SPEC_CTRL
Bit(2)) is taken care of in patch:

KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit da39556f66f5cfe8f9c989206974f1cb16ca5d7c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kvm/cpuid.c
arch/x86/kvm/vmx.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs/AMD: Add support to disable RDS on Fam[15,16,17]h if requested

AMD does not need the Speculative Store Bypass mitigation to be enabled.

The parameters for this are already available and can be done via MSR
C001_1020. Each family uses a different bit in that MSR for this.

[ tglx: Expose the bit mask via a variable and move the actual MSR fiddling
into the bugs code as that's the right thing to do and also required
to prepare for dynamic enable/disable ]

OraBug: 28041771
CVE: CVE-2018-3639

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 764f3c21588a059cd783c6ba0734d4db2d72822d)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
arch/x86/kernel/cpu/amd.c
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/common.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Whitelist allowed SPEC_CTRL MSR values

Intel and AMD SPEC_CTRL (0x48) MSR semantics may differ in the
future (or in fact use different MSRs for the same functionality).

As such a run-time mechanism is required to whitelist the appropriate MSR
values.

[ tglx: Made the variable __ro_after_init ]

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1115a859f33276fe8afb31c60cf9d8e657872558)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
[It is called bugs_64.c]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs/intel: Set proper CPU features and setup RDS

Intel CPUs expose methods to:

- Detect whether RDS capability is available via CPUID.7.0.EDX[31],

- The SPEC_CTRL MSR(0x48), bit 2 set to enable RDS.

- MSR_IA32_ARCH_CAPABILITIES, Bit(4) no need to enable RRS.

With that in mind if spec_store_bypass_disable=[auto,on] is selected set at
boot-time the SPEC_CTRL MSR to enable RDS if the platform requires it.

Note that this does not fix the KVM case where the SPEC_CTRL is exposed to
guests which can muck with it, see patch titled :
KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS.

And for the firmware (IBRS to be set), see patch titled:
x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits

[ tglx: Distangled it from the intel implementation and kept the call order ]

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 772439717dbf703b39990be58d8d4e3e4ad0598a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/include/asm/msr-index.h

[Different file names]
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/common.c
arch/x86/kernel/cpu/cpu.h
arch/x86/kernel/cpu/intel.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation

Contemporary high performance processors use a common industry-wide
optimization known as "Speculative Store Bypass" in which loads from
addresses to which a recent store has occurred may (speculatively) see an
older value. Intel refers to this feature as "Memory Disambiguation" which
is part of their "Smart Memory Access" capability.

Memory Disambiguation can expose a cache side-channel attack against such
speculatively read values. An attacker can create exploit code that allows
them to read memory outside of a sandbox environment (for example,
malicious JavaScript in a web page), or to perform more complex attacks
against code running within the same privilege level, e.g. via the stack.

As a first step to mitigate against such attacks, provide two boot command
line control knobs:

nospec_store_bypass_disable
spec_store_bypass_disable=[off,auto,on]

By default affected x86 processors will power on with Speculative
Store Bypass enabled. Hence the provided kernel parameters are written
from the point of view of whether to enable a mitigation or not.
The parameters are as follows:

- auto - Kernel detects whether your CPU model contains an implementation
  of Speculative Store Bypass and picks the most appropriate
  mitigation.

- on   - disable Speculative Store Bypass
- off  - enable Speculative Store Bypass

[ tglx: Reordered the checks so that the whole evaluation is not done
   when the CPU does not support RDS ]

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 24f7fc83b9204d20f878c57cb77d261ae825e033)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt

[It is called Documentation/kernel-parameters.txt]

arch/x86/include/asm/cpufeatures.h

[It is called cpufeature.h]

arch/x86/kernel/cpu/bugs.c

[And it is bugs_64.c]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/cpufeatures: Add X86_FEATURE_RDS

Add the CPU feature bit CPUID.7.0.EDX[31] which indicates whether the CPU
supports Reduced Data Speculation.

[ tglx: Split it out from a later patch ]

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 0cc5fa00b0a88dad140b4e5c2cead9951ad36822)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
[It is called cpufeature.h]
[We also need to use the scattered function to set the flag similar
to how the rest of CPUID.7.0.EDX[31] are done]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Expose /sys/../spec_store_bypass

Add the sysfs file for the new vulerability. It does not do much except
show the words 'Vulnerable' for recent x86 cores.

Intel cores prior to family 6 are known not to be vulnerable, and so are
some Atoms and some Xeon Phi.

It assumes that older Cyrix, Centaur, etc. cores are immune.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit c456442cd3a59eeb1d60293c26cbe2ff2c4e42cf)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
[It is a different file - cpufeature.h]

arch/x86/kernel/cpu/bugs.c
[As well, called bugs_64.c]

arch/x86/kernel/cpu/common.c
[Location of cpu_set_bug_bits is different and also had to drop the __initconst]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/cpu/intel: Add Knights Mill to Intel family

Add CPUID of Knights Mill (KNM) processor to Intel family list.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161012180520.30976-1-piotr.luc@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 0047f59834e5947d45f34f5f12eb330d158f700b)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/cpu: Rename Merrifield2 to Moorefield

Merrifield2 is actually Moorefield.

Rename it accordingly and drop tail digit from Merrifield1.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906184254.94440-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit f5fbf848303c8704d0e1a1e7cabd08fd0a49552f)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/platform/atom/punit_atom_debug.c
[Does not exist]
drivers/pci/pci-mid.c
drivers/powercap/intel_rapl.c
[We never added the support for it]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs, KVM: Support the combination of guest and host IBRS

A guest may modify the SPEC_CTRL MSR from the value used by the
kernel. Since the kernel doesn't use IBRS, this means a value of zero is
what is needed in the host.

But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to
the other bits as reserved so the kernel should respect the boot time
SPEC_CTRL value and use that.

This allows to deal with future extensions to the SPEC_CTRL interface if
any at all.

Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any
difference as paravirt will over-write the callq *0xfff.. with the wrmsrl
assembler code.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 5cf687548705412da47c9cec342fd952d71ed3d5)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/kvm/svm.c
arch/x86/kvm/vmx.c
[We need to preserve the check for ibrs_inuse - which we can do now in the
functions]

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs/IBRS: Warn if IBRS is enabled during boot.

It should never be. But in case it is lets warn and clear it.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs/IBRS: Use variable instead of defines for enabling IBRS

This follows what "x86/bugs: Read SPEC_CTRL MSR during boot and re-use
reserved bits" patch does - that is respect the other bits of the
SPEC CTRL MSR (if any at all).

This necessitates to convert all the assembler macros over, all
the various uses of the SPEC CTRL guarded by 'use_ibrs'.

Note the not so obvious change in the assembler macro from 'cmp' to
'test' to verify the right bit being set.

And to make sure it works with the IBRS support we need to
recognize it in x86_spec_ctrl_set.

This is not upstreamed. It builds on top of IBRS backport.

OraBug: 28041771
CVE: CVE-2018-3639

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs_64.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits

The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all
the other bits as reserved. The Intel SDM glossary defines reserved as
implementation specific - aka unknown.

As such at bootup this must be taken it into account and proper masking for
the bits in use applied.

A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511

[ tglx: Made x86_spec_ctrl_base __ro_after_init ]

OraBug: 28041771
CVE: CVE-2018-3639

Suggested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1b86883ccb8d5d9506529d42dbe1a5257cb30b18)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Conflicts:
arch/x86/include/asm/nospec-branch.h
[As we don't have the firmware_restrict_branch_speculation_start and
firmware_restrict_branch_speculation_end and end up with a different
name. See commit 473ad76ea8d76f34555d764a3d5820bc1b33cabf
"x86/speculation: Use IBRS if available before calling into firmware"]

arch/x86/kernel/cpu/bugs.c
[File is called bugs_64.c in UEK4]

[Also the backport needs nospec-branch.h in different files ,and we can't
use __ro_after_init]

Signed-off-by: Brian Maly <brian.maly@oracle.com>