Signed-off-by: William Roche <william.roche@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
arch/x86/include/asm/spec_ctrl.h
arch/x86/kernel/cpu/bugs.c
cpufeatures.h vs cpufeature.h in UEK4
include <linux/jump_label.h> header in spec_ctrl.h to use this feature
bugs.c vs bugs_64.c in UEK4
William Roche [Tue, 19 Feb 2019 14:11:12 +0000 (09:11 -0500)]
int3 handler better address space detection on interrupts
In order to prepare for the possibility of dynamically changing an
interrupt handler's code with static_branch_enable/disable, and since
the interrupt can equally appear while in user space or
kernel space, the int3 handler itself must better identify whether
the original interrupt came from kernel or userland.
Signed-off-by: William Roche <william.roche@oracle.com> Co-developed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
(cherry picked from commit 594fc07cd96784004254680c9e1e4b757fb0a1f5)
Signed-off-by: William Roche <william.roche@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S
entry/entry_64.S vs kernel/entry_64.S in UEK4
Mark Nicholson [Tue, 7 May 2019 23:25:59 +0000 (16:25 -0700)]
repairing out-of-tree build functionality
The current uek4 tree (only) cannot build the binrpm-pkg with an objdir
outside the source tree. This fix redirects the incorrectly placed
generated firmware files and firmware parsers into the objtree.
Signed-off-by: Mark Nicholson <mark.j.nicholson@oracle.com> Reviewed-by: Tianyue Lan <tianyue.lan@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Shuning Zhang [Wed, 29 May 2019 07:41:35 +0000 (15:41 +0800)]
ext4: fix false negatives *and* false positives in ext4_check_descriptors()
Ext4_check_descriptors() was getting called before s_gdb_count was
initialized. So for file systems w/o the meta_bg feature, allocation
bitmaps could overlap the block group descriptors and ext4 wouldn't
notice.
For file systems with the meta_bg feature enabled, there was a
fencepost error which would cause the ext4_check_descriptors() to
incorrectly believe that the block allocation bitmap overlaps with the
block group descriptor blocks, and it would reject the mount.
Fix both of these problems.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
(cherry picked from commit 44de022c4382541cebdd6de4465d1f4f465ff1dd) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/super.c
[The context has been changed]
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Shuning Zhang [Wed, 22 May 2019 01:32:40 +0000 (09:32 +0800)]
ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
In some cases, ocfs2_iget() reads the data of an inode which has been
deleted for some reason. That will make the system panic. So we should
check whether this inode has been deleted, and tell the caller that the
inode is a bad inode.
For example, when ocfs2 is used as the backend of NFS and the client is
NFSv3, this issue can be reproduced by the following steps.
on the nfs server side,
..../patha/pathb
Step 1: Process A was scheduled out before calling the function fh_verify.
Step 2: Process B is removing 'pathb', and has just completed the call
to the function dput. The dentry of 'pathb' has then been deleted from the
dcache, and all its ancestors have been deleted as well. The link between
dentry and inode was removed through the function hlist_del_init. The
following is the call stack.
dentry_iput->hlist_del_init(&dentry->d_u.d_alias)
At this time, the inode is still in the dcache.
Step 3: Process A calls the function ocfs2_get_dentry, which gets the
inode from the dcache. Then the refcount of the inode is 1. The following
is the call stack.
nfsd3_proc_getacl->fh_verify->exportfs_decode_fh->fh_to_dentry(ocfs2_get_dentry)
Step 4: Dirty pages are flushed by bdi threads, so the inode of 'patha'
is evicted and this directory is deleted. But the inode of 'pathb'
can't be evicted, because its refcount is 1.
Step 5: Process A keeps running and calls the function
reconnect_path (in exportfs_decode_fh), which calls the function
ocfs2_get_parent of ocfs2. It gets the block number of the parent
directory (patha) by the name '..', then reads the data from disk by
that block number. But this inode has been deleted, so the system panics.
Process A                          Process B
1. in nfsd3_proc_getacl            |
2.                                 | dput
3. fh_to_dentry(ocfs2_get_dentry)  |
4. bdi flush dirty cache           |
5. ocfs2_iget                      |
[283465.542049] OCFS2: ERROR (device sdp): ocfs2_validate_inode_block:
Invalid dinode #580640: OCFS2_VALID_FL not set
[283465.545490] Kernel panic - not syncing: OCFS2: (device sdp): panic forced
after error
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@Oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Marcel Holtmann [Fri, 18 Jan 2019 12:43:19 +0000 (13:43 +0100)]
Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer
The function l2cap_get_conf_opt will return L2CAP_CONF_OPT_SIZE + opt->len
as its length value. The opt->len however is under the control of the
remote user and can be used by an attacker to gain access beyond the
bounds of the actual packet.
To prevent any potential leak of heap memory, it is enough to check that
the resulting len calculation after calling l2cap_get_conf_opt is not
below zero. A well formed packet will always return >= 0 here and will
end with the length value being zero after the last option has been
parsed. In case of malformed packets messing with the opt->len field the
length value will become negative. If that is the case, then just abort
and ignore the option.
In case an attacker uses a too short opt->len value, then garbage will
be parsed, but that is protected by the unknown option handling and also
the option parameter size checks.
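A standalone sketch of the guard described above, with names that only loosely mirror the kernel's L2CAP code (this is an illustration, not the actual patch): each iteration subtracts the header size plus the remote-supplied opt->len from the remaining length, and aborts as soon as the result goes negative.

```c
#include <stdint.h>

/* 2-byte type/length option header, as in L2CAP configuration options. */
#define CONF_OPT_HDR_SIZE 2

/* Returns the number of options parsed, or -1 on a malformed packet
 * whose opt->len overruns the remaining data. */
static int parse_conf_options(const uint8_t *data, int len)
{
    int count = 0;

    while (len >= CONF_OPT_HDR_SIZE) {
        uint8_t opt_len = data[1];          /* attacker-controlled length */

        len -= CONF_OPT_HDR_SIZE + opt_len;
        if (len < 0)                        /* option overruns the packet */
            return -1;                      /* abort and ignore the option */
        data += CONF_OPT_HDR_SIZE + opt_len;
        count++;
    }
    return count;                           /* well-formed: len ended >= 0 */
}
```

A well-formed packet walks to exactly len == 0; only a lying opt->len drives len negative.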
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Orabug: 29526426
CVE: CVE-2019-3459
(cherry picked from commit 7c9cbd0b5e38a1672fcd137894ace3b042dfbf69) Signed-off-by: Dan Duval <dan.duval@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Marcel Holtmann [Fri, 18 Jan 2019 11:56:20 +0000 (12:56 +0100)]
Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt
When doing option parsing for standard type values of 1, 2 or 4 octets,
the value is converted directly into a variable instead of a pointer. To
avoid being tricked into following a bogus pointer, check that the sizes
actually match for these option types. In L2CAP every option is of fixed
size, and thus it is prudent anyway to ensure that the remote side sends
us the right option size along with the option parameters.
If the option size is not matching the option type, then that option is
silently ignored. It is a protocol violation and instead of trying to
give the remote attacker any further hints just pretend that option is
not present and proceed with the default values. Implementations
following the specification and its qualification procedures will always
use the correct size and are thus not impacted here.
To keep the code readable and consistent across all options, a few
cosmetic changes were also required.
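The size check can be sketched with a hypothetical standalone helper (not the kernel function): accept the 1-, 2- or 4-octet value only when the advertised length equals the size the option type mandates, otherwise fall back to the default as if the option were absent.

```c
#include <stdint.h>

/* 'expected' is the size the option type mandates. A mismatched length
 * is a protocol violation, so the option is silently ignored and the
 * default value is used instead. Little-endian, as on the wire. */
static uint32_t get_opt_val(const uint8_t *val, uint8_t len,
                            uint8_t expected, uint32_t dflt)
{
    if (len != expected)
        return dflt;                        /* pretend option is absent */
    switch (len) {
    case 1:
        return val[0];
    case 2:
        return (uint32_t)val[0] | ((uint32_t)val[1] << 8);
    case 4:
        return (uint32_t)val[0] | ((uint32_t)val[1] << 8) |
               ((uint32_t)val[2] << 16) | ((uint32_t)val[3] << 24);
    default:
        return dflt;
    }
}
```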
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Orabug: 29526426
CVE: CVE-2019-3459
(cherry picked from commit af3d5d1c87664a4f150fcf3534c6567cb19909b0) Signed-off-by: Dan Duval <dan.duval@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The ring buffer implementation in hid_debug_event() and hid_debug_events_read()
is strange, allowing data to be lost or corrupted. After commit 717adfdaf147
("HID: debug: check length before copy_to_user()") it is possible to enter
an infinite loop in hid_debug_events_read() by providing 0 as count, which
locks up the system. Fix this by rewriting the ring buffer implementation
with kfifo and simplifying the code.
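The kfifo approach can be sketched in userspace as a power-of-two ring buffer with free-running indices, where a read of 0 bytes returns immediately instead of waiting, which is the lockup the patch removes. This is a simplified illustration, not the kernel's kfifo API.

```c
#include <stddef.h>

#define FIFO_SIZE 64            /* must be a power of two */

struct fifo {
    unsigned char buf[FIFO_SIZE];
    unsigned int in, out;       /* free-running indices, wrapped by masking */
};

static size_t fifo_len(const struct fifo *f) { return f->in - f->out; }

static size_t fifo_put(struct fifo *f, const void *src, size_t n)
{
    size_t avail = FIFO_SIZE - fifo_len(f);
    size_t i;

    if (n > avail)
        n = avail;              /* drop what does not fit */
    for (i = 0; i < n; i++)
        f->buf[(f->in + i) & (FIFO_SIZE - 1)] = ((const unsigned char *)src)[i];
    f->in += n;
    return n;
}

static size_t fifo_get(struct fifo *f, void *dst, size_t n)
{
    size_t len = fifo_len(f);
    size_t i;

    if (n == 0)
        return 0;               /* count == 0 must not loop or block */
    if (n > len)
        n = len;
    for (i = 0; i < n; i++)
        ((unsigned char *)dst)[i] = f->buf[(f->out + i) & (FIFO_SIZE - 1)];
    f->out += n;
    return n;
}
```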
This fixes CVE-2019-3819.
v2: fix an execution logic and add a comment
v3: use __set_current_state() instead of set_current_state()
Backport to v4.4: some (tree-wide) patches are missing in v4.4 so
cherry-pick relevant pieces from:
* 6396bb22151 ("treewide: kzalloc() -> kcalloc()")
* a9a08845e9ac ("vfs: do bulk POLL* -> EPOLL* replacement")
* 92529623d242 ("HID: debug: improve hid_debug_event()")
* 174cd4b1e5fb ("sched/headers: Prepare to move signal wakeup & sigpending
methods from <linux/sched.h> into <linux/sched/signal.h>")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1669187 Cc: stable@vger.kernel.org # v4.18+ Fixes: cd667ce24796 ("HID: use debugfs for events/reports dumping") Fixes: 717adfdaf147 ("HID: debug: check length before copy_to_user()") Signed-off-by: Vladis Dronov <vdronov@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b661fff5f8a0f19824df91cc3905ba2c5b54dc87)
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
This change has the following effects, in order of decreasing importance:
1) Prevent a stack buffer overflow
2) Do not append an unnecessary NULL to an already binary buffer, which
was writing one byte past client_digest when the caller is:
chap_string_to_hex(client_digest, chap_r, strlen(chap_r));
The latter was found by KASAN (see below) when the input value has the
expected size (32 hex chars), and further analysis revealed that a stack
buffer overflow can happen when the network-received value is longer,
allowing an unauthenticated remote attacker to smash up to 17 bytes after
the destination buffer (16 bytes attacker-controlled and one null). As
switching to hex2bin requires specifying the destination buffer length and
does not internally append any null, it solves both issues.
This addresses CVE-2018-14633.
Beyond this:
- Validate received value length and check hex2bin accepted the input, to log
this rejection reason instead of just failing authentication.
- Only log received CHAP_R and CHAP_C values once they passed sanity checks.
==================================================================
BUG: KASAN: stack-out-of-bounds in chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
Write of size 1 at addr ffff8801090ef7c8 by task kworker/0:0/1021
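The fix pattern can be sketched as follows, with a userspace stand-in for hex2bin() (decode_hex and its signature are illustrative): the destination length is explicit, input of the wrong length is rejected up front, and no NUL is ever appended past the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;                  /* non-hex character */
}

/* Decode exactly dstlen bytes from a hex string. Rejects input whose
 * length is not 2*dstlen, and never writes a terminating NUL. */
static int decode_hex(uint8_t *dst, size_t dstlen, const char *src)
{
    size_t i;

    if (strlen(src) != dstlen * 2)
        return -1;              /* too-short or too-long input: reject */
    for (i = 0; i < dstlen; i++) {
        int hi = hex_digit(src[2 * i]);
        int lo = hex_digit(src[2 * i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        dst[i] = (uint8_t)((hi << 4) | lo);
    }
    return 0;                   /* exactly dstlen bytes written */
}
```

Logging the rejection reason at this point, as the commit does, is what distinguishes a malformed value from a plain authentication failure.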
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 755e45f3155cc51e37dc1cce9ccde10b84df7d93)
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Jason Yan [Tue, 25 Sep 2018 02:56:54 +0000 (10:56 +0800)]
scsi: libsas: fix a race condition when smp task timeout
When the LLDD is processing the completed sas task in interrupt context
and sets the task state to SAS_TASK_STATE_DONE, the smp timeout timer can
be triggered at the same time, and smp_task_timedout() will complete the
task whether SAS_TASK_STATE_DONE is set or not. The sas task may then be
freed before the LLDD ends its interrupt processing. Thus a use-after-free
will happen.
Fix this by calling complete() only when SAS_TASK_STATE_DONE is not
set, and remove the check of the return value of del_timer(). Once the
LLDD sets DONE, it must call task->done(), which will call
smp_task_done()->complete(), and the task will be completed and freed
correctly.
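The rule the fix enforces reduces to a single predicate (illustrative names, not the kernel code): the timeout path calls complete() only when DONE is not already set, so exactly one path completes and frees the task.

```c
#include <stdbool.h>

#define TASK_STATE_DONE 0x01    /* stand-in for SAS_TASK_STATE_DONE */

/* Timeout-handler side: only complete the task if the LLDD has not
 * already marked it DONE; in that case the task->done() path owns the
 * call to complete(), so completing here would race and free twice. */
static bool timeout_may_complete(unsigned int task_state)
{
    return !(task_state & TASK_STATE_DONE);
}
```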
Reported-by: chenxiang <chenxiang66@hisilicon.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> CC: John Garry <john.garry@huawei.com> CC: Johannes Thumshirn <jthumshirn@suse.de> CC: Ewan Milne <emilne@redhat.com> CC: Christoph Hellwig <hch@lst.de> CC: Tomas Henzl <thenzl@redhat.com> CC: Dan Williams <dan.j.williams@intel.com> CC: Hannes Reinecke <hare@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b90cd6f2b905905fb42671009dc0e27c310a16ae)
Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bcf3b67d16a4c8ffae0aa79de5853435e683945c)
Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Young Xiao [Fri, 12 Apr 2019 07:24:30 +0000 (15:24 +0800)]
Bluetooth: hidp: fix buffer overflow
Struct ca is copied from userspace. It is not checked whether the "name"
field is NULL terminated, which allows local users to obtain potentially
sensitive information from kernel stack memory, via a HIDPCONNADD command.
This vulnerability is similar to CVE-2011-1079.
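A minimal sketch of the fix pattern (the struct and field size are illustrative, not the real hidp request layout): after copying the request from userspace, force NUL-termination of the fixed-size name field before it is ever treated as a string.

```c
#include <string.h>

/* Illustrative stand-in for a request struct copied from userspace. */
struct connadd_req {
    char name[128];
};

/* Call right after the copy-from-userspace step: guarantees that later
 * strlen()/strcpy() on 'name' cannot walk past the field and leak
 * adjacent kernel stack memory. */
static void sanitize_req(struct connadd_req *ca)
{
    ca->name[sizeof(ca->name) - 1] = '\0';
}
```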
Signed-off-by: Young Xiao <YangX92@hotmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org
(cherry picked from commit a1616a5ac99ede5d605047a9012481ce7ff18b16)
Kanth Ghatraju [Tue, 21 May 2019 20:54:16 +0000 (16:54 -0400)]
x86/speculation/mds: Add 'mitigations=' support for MDS
Add MDS to the new 'mitigations=' cmdline option.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 5c14068f87d04adc73ba3f41c2a303d3c3d1fa12) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
bugs.c equivalent for UEK4 is bugs64.c
Documentation/admin-guide/kernel-parameters.txt is
Documentation/kernel-parameters.txt in UEK4
Orabug: 29791046 Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Mao Wenan [Thu, 28 Mar 2019 09:10:56 +0000 (17:10 +0800)]
net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
When a net namespace is being cleaned up, rds_tcp_exit_net() will call
rds_tcp_kill_sock(). If t_sock is NULL, it will not call
rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
the connection, and the worker cp_conn_w is not stopped. Afterwards the net
is freed in net_drop_ns(), while the cp_conn_w worker rds_connect_worker()
will call rds_tcp_conn_path_connect() and reference the 'net' which has
already been freed.
In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock =
sock before sock->ops->connect, but if connect() fails, it will call
rds_tcp_restore_callbacks() and set t_sock = NULL. If connect always
fails, rds_connect_worker() will try to reconnect all the time, so
rds_tcp_kill_sock() will never cancel the worker cp_conn_w and free the
connections.
Therefore, the condition !tc->t_sock is not needed on the
cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock path, because tc->t_sock
is always NULL there, and there is no other path to cancel cp_conn_w and
free the connection. This patch fixes that.
rds_tcp_kill_sock():
...
if (net != c_net || !tc->t_sock)
...
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
==================================================================
BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
net/ipv4/af_inet.c:340
Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cb66ddd156203daefb8d71158036b27b0e2caf63)
Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Kanth Ghatraju [Tue, 21 May 2019 21:55:18 +0000 (17:55 -0400)]
x86/speculation/mds: Check for the right microcode before setting mitigation
With the addition of the new mitigation for idle, we need to check the
availability of microcode when mds=idle. The fix is to take the check
out of the if statement.
Orabug: 29797118 Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Eric Dumazet [Sun, 10 Mar 2019 17:36:40 +0000 (10:36 -0700)]
vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
Same reasons as the ones explained in commit 4179cb5a4c92
("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
netif_rx() or gro_cells_receive() must be called under a strict contract.
At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.
A similar protocol is used for gro_cells infrastructure, as
gro_cells_destroy() will be called only after a full rcu
grace period is observed after IFF_UP has been cleared.
Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP.
Virtual drivers do not have this guarantee, and must
therefore make the check themselves.
Fixes: d342894c5d2f ("vxlan: virtual extensible lan") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 59cbf56fcd98ba2a715b6e97c4e43f773f956393)
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/vxlan.c
It was applying the patch at vxlan_udp_encap_recv()
instead of vxlan_rcv().
James Smart [Thu, 7 Mar 2019 01:08:12 +0000 (09:08 +0800)]
nvme: allow timed-out ios to retry
Currently nvme_req_needs_retry() applies several checks to see if
a retry is allowed. One of those is whether the current time has exceeded
the start time of the io plus the timeout length. This check means that,
if an io times out, a retry is never allowed for it, so applications see
the io failure.
Remove this check and allow the io to timeout, like it does on other
protocols, and retries to be made.
On the FC transport, a frame can be lost for an individual io, and there
may be no other errors that escalate for the connection/association.
The io will timeout, which causes the transport to escalate into creating
a new association, but the io that timed out, due to this retry logic, has
already failed back to the application and things are hosed.
Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(backport from upstream commit 0951338d9677f546e230685d68631dfd3f81cca5)
There were two changes to nvme_req_needs_retry before this commit,
but neither has any functional bearing on our issue. To keep the work
simpler, we don't backport them.
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
rds: Introduce a pool of worker threads for connection management
RDS uses a single threaded work queue for connection management. This
involves creation and deletion of QPs and CQs. On certain HCAs, such
as CX-3, these operations are para-virtualized and some part of the
work has to be conducted by the Physical Function (PF) driver.
In fail-over and fail-back situations, there might be 1000s of
connections to tear down and re-establish. Hence, expand the number of
work queues.
The local_wq is removed for simplicity and symmetry reasons.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Tested-by: Rosa Lopez <rosa.lopez@oracle.com> Reviewed-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
---
RDS has two global work queues, one for loop-back connections and
another one for remote connections. The struct rds_conn_path has a
member cp_wq which is set to one of them. Use cp_wq consistently
instead of the global ones.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Tested-by: Rosa Lopez <rosa.lopez@oracle.com> Reviewed-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
RDS attempts to establish if two rdma_cm_ids actually are the
same. This was implemented by comparing their addresses. But, an
rdma_cm_id may be destroyed and allocated again. Thus, a *new* id, but
the address could still be the same as the *old* one.
Solved by using an idr id as the cm_id's context. Added helper
functions to create and destroy rdma_cm_ids and comparing the
rdma_cm_ids.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Tested-by: Rosa Lopez <rosa.lopez@oracle.com> Reviewed-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
This commit is reverted for two reasons. Firstly, it doesn't fix the
problem it is supposed to fix. Secondly, it may, in some special
circumstances, create a long-lasting connection reject scenario.
As to the first reason, consider the following scenario during
fail-back. Let's say node A fails back first. It sends a DREQ. Node B
drops the IB connection. Now, both will attempt to connect and both
will perform route resolution. Since both nodes attempt to connect at
the same time, you get a race, and then the lower IP will connect. It
does so and succeeds, because both ends have done route
resolution. Traffic continues to flow. Then, node B fails back and the
connection is torn down again. This is exactly what commit 48c2d5f5e258
("net/rds: prevent RDS connections using stale ARP entries") said it
would prevent.
As to the second reason, the following is an excerpt from the kernel
trace buffer (slightly edited for better brevity):
rds_ib_cm_handle_connect: 1033: saddr ::ffff:192.168.217.20 daddr ::ffff:192.168.216.252 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001778a52 tos 0
rds_ib_cm_handle_connect: 1077: no route resolution saddr 0.0.0.0 daddr 0.0.0.0 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001778a52 tos 0
rds_ib_cm_handle_connect: 1033: saddr ::ffff:192.168.216.253 daddr ::ffff:192.168.216.252 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001887599 tos 0
rds_ib_cm_handle_connect: 1077: no route resolution saddr 0.0.0.0 daddr 0.0.0.0 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001887599 tos 0
rds_ib_cm_handle_connect: 1033: saddr ::ffff:192.168.217.20 daddr ::ffff:192.168.216.252 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001778a52 tos 2
rds_ib_cm_handle_connect: 1077: no route resolution saddr 0.0.0.0 daddr 0.0.0.0 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001778a52 tos 2
rds_ib_cm_handle_connect: 1033: saddr ::ffff:192.168.216.253 daddr ::ffff:192.168.216.252 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001887599 tos 0
rds_ib_cm_handle_connect: 1077: no route resolution saddr 0.0.0.0 daddr 0.0.0.0 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001887599 tos 0
rds_ib_cm_handle_connect: 1033: saddr ::ffff:192.168.217.20 daddr ::ffff:192.168.216.252 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001778a52 tos 4
rds_ib_cm_handle_connect: 1077: no route resolution saddr 0.0.0.0 daddr 0.0.0.0 RDSv4.1 lguid 0x10e00001888efa fguid 0x10e00001778a52 tos 4
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
* net/rds/ib_cm.c
* net/rds/rdma_transport.c
The nature of the conflicts was ftrace points that had been
IPv6-ified.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Tested-by: Rosa Lopez <rosa.lopez@oracle.com> Reviewed-by: Dag Moxnes <dag.moxnes@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Håkon Bugge [Thu, 7 Mar 2019 11:36:44 +0000 (12:36 +0100)]
rds: ib: Flush ARP cache when needed
During Active/Active fail-over and fail-back, the ARP cache may
contain stale entries. Hence flush the foreign address from the ARP
cache when the following events are received:
Suggested-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Tested-by: Rosa Lopez <rosa.lopez@oracle.com> Reviewed-by: Dag Moxnes <dag.moxnes@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Håkon Bugge [Thu, 2 May 2019 12:31:40 +0000 (14:31 +0200)]
rds: Add simple heuristics to determine connect delay
Introduce heuristics to get an estimate of how long it takes to form a
connection. We use this estimate as a factor to delay the passive
side, in case the active is unaware of the connection being down.
The delay should be as short as possible, to avoid extended
connection times in case the remote peer is unaware of the connection
being brought down. At the same time, the delay should be long enough
for the remote active peer to initiate a connect, in case it is
aware of the connection being brought down.
A sysctl variable rds_sysctl_passive_connect_delay_percent is
introduced to have the ability to tune the delay of the allegedly
passive side.
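The heuristic reduces to scaling the connect-time estimate by the sysctl percentage. A sketch under the assumption that the delay is simply estimate * percent / 100 (the exact formula is not spelled out above; the helper name is hypothetical):

```c
/* est_ms: running estimate of how long connection setup takes.
 * percent: tunable in the spirit of
 *          rds_sysctl_passive_connect_delay_percent.
 * Returns how long the allegedly passive side should hold back before
 * initiating its own connect. */
static unsigned int passive_connect_delay_ms(unsigned int est_ms,
                                             unsigned int percent)
{
    return est_ms * percent / 100;
}
```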
Håkon Bugge [Thu, 7 Mar 2019 11:39:22 +0000 (12:39 +0100)]
rds: Fix one-sided connect
The decision to designate a peer to be the active side did not take
loopback connections into account. Further, due to a bug in
rds_shutdown_worker, the passive side, in case no reconnect was
racing, did not attempt to restart the connection.
Håkon Bugge [Thu, 7 Mar 2019 11:32:46 +0000 (12:32 +0100)]
rds: ib: Fix gratuitous ARP storm
When the rds Active/Active bonding moves an address to another port,
it informs its peer by sending out 100 gratuitous ARPs (gARPs)
back-to-back.
The gARPs are broadcasts, so this mechanism may flood the fabric with
gARPs. These broadcasts may be dropped, and since the 100 gARPs for
one address are sent consecutively, all the gARPs for a particular
address may be lost.
Fix this by sending far fewer gARPs (3) and adding some interval between
them (5 msec).
The module parameter rds_ib_active_bonding_arps_gap_ms has been
introduced. Both the existing rds_ib_active_bonding_arps and
rds_ib_active_bonding_arps_gap_ms are now writable.
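The revised scheme amounts to a short, spaced-out send schedule rather than a burst. A sketch computing the send offsets (the function is illustrative; its parameters echo the module parameters named above):

```c
/* Fill 'offsets_ms' with the time offset of each gARP transmission:
 * 3 ARPs with a 5 ms gap yields 0, 5, 10 instead of a back-to-back
 * burst of 100, so a transient drop no longer loses every gARP for
 * an address. */
static int garp_schedule(unsigned int arps, unsigned int gap_ms,
                         unsigned int *offsets_ms, unsigned int max)
{
    unsigned int i;

    if (arps > max)
        return -1;              /* caller's buffer too small */
    for (i = 0; i < arps; i++)
        offsets_ms[i] = i * gap_ms;
    return (int)arps;
}
```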
The default values were suggested by Ka-Cheong Poon <ka-cheong.poon@oracle.com>.
Orabug: 29391909
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Tested-by: Rosa Lopez <rosa.lopez@oracle.com> Reviewed-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Håkon Bugge [Sun, 17 Feb 2019 14:45:12 +0000 (15:45 +0100)]
IB/mlx4: Increase the timeout for CM cache
Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.
Since the VF drivers have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping
in a cache.
Following the RDMA Connection Manager (CM) protocol, it is clear when
an entry has to be evicted from the cache. But life is not perfect:
remote peers may die or be rebooted. Hence, there is a timeout for
wiping out a cache entry, after which the PF driver assumes the remote
peer has gone.
During workloads where a high number of QPs are destroyed concurrently,
an excessive number of CM DREQ retries has been observed.
The problem can be demonstrated in a bare-metal environment, where two
nodes have instantiated 8 VFs each, using dual-ported HCAs, so we
have 16 vPorts per physical server.
64 processes are associated with each vPort, and each creates and
destroys one QP for each of the remote 64 processes. That is, 1024 QPs
per vPort, 16K QPs in all. The QPs are created/destroyed using the
CM.
When tearing down these 16K QPs, excessive CM DREQ retries (and
duplicates) are observed. With some cat/paste/awk wizardry on the
infiniband_cm sysfs, we observe the following, summed over the 16 vPorts
on one of the nodes:
Note that the active/passive side is equally distributed between the
two nodes.
Enabling pr_debug in cm.c gives tons of:
[171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave:
1,sl_cm_id: 0xd393089f} is NULL!
By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
tear-down phase of the application is reduced from approximately 90 to
50 seconds. Retries/duplicates are also significantly reduced:
Increasing the timeout further didn't help, as these duplicates and
retries stem from a too short CMA timeout, which was 20 (~4 seconds)
on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
numbers fell down to about 10 for both of them.
Adjustment of the CMA timeout is not part of this commit.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
(cherry picked from upstream commit 2612d723aadcf8281f9bf8305657129bd9f3cd57)
Alejandro Jimenez [Wed, 20 Mar 2019 16:55:38 +0000 (12:55 -0400)]
kvm/speculation: Allow KVM guests to use SSBD even if host does not
The bits set in x86_spec_ctrl_mask are used to determine the
allowed value that is written to SPEC_CTRL MSR before VMENTRY,
and controls which mitigations the guest can enable. In the
case of SSBD, unless the host has enabled SSBD always on
(which sets SSBD bit on x86_spec_ctrl_mask), the guest is
unable to use the SSBD mitigation. This was confirmed by
running the SSBD PoC and verifying that guests are always
vulnerable regardless of their own SSBD setting, unless
the host has booted with "spec_store_bypass_disable=on".
Set the SSBD bit in x86_spec_ctrl_mask when the host
CPU supports it, whether or not the host has chosen to
enable the mitigation in any of its modes.
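The effect can be sketched as a simple bitmask gate (bit positions per the Intel SDM; guest_spec_ctrl is an illustrative helper, not the kernel function): only bits present in x86_spec_ctrl_mask survive into the value written before VMENTRY, so adding SSBD to the mask is what lets the guest's own setting through.

```c
#include <stdint.h>

#define SPEC_CTRL_IBRS (1ULL << 0)  /* IA32_SPEC_CTRL bit 0 */
#define SPEC_CTRL_SSBD (1ULL << 2)  /* IA32_SPEC_CTRL bit 2 */

/* Gate the guest-requested SPEC_CTRL bits through the host mask: a bit
 * the host never put in the mask can never reach the MSR at VMENTRY. */
static uint64_t guest_spec_ctrl(uint64_t guest_wants, uint64_t mask)
{
    return guest_wants & mask;
}
```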
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Alejandro Jimenez [Wed, 20 Mar 2019 16:49:58 +0000 (12:49 -0400)]
x86/speculation: Keep enhanced IBRS on when spec_store_bypass_disable=on is used
When SSBD is unconditionally enabled using the kernel parameter
"spec_store_bypass_disable=on", enhanced IBRS is inadvertently turned
off. This happens because the SSBD initialization runs after the code
which selects enhanced IBRS as the spectre V2 mitigation and sets the
IBRS bit on the SPEC_CTRL MSR.
When "spec_store_bypass_disable=on" is used, ssb_init() calls
x86_spec_ctrl_set(SPEC_CTRL_INITIAL), which writes to the SPEC_CTRL
MSR to set the SSBD bit. The value written does not have the IBRS bit
set, since if basic IBRS is in use it will be set during the next
userspace to kernel transition. However, this is not the case for
enhanced IBRS where setting the bit once is sufficient. As a result,
enhanced IBRS remains disabled in this scenario unless manually
enabled afterwards using the sysfs knobs.
Fix the issue by using the correct value with the IBRS bit set when
the enhanced IBRS mitigation is in use.
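A sketch of the corrected value composition (bit definitions per the SDM; the helper is illustrative, not the kernel's x86_spec_ctrl_set): when enhanced IBRS is the selected spectre V2 mitigation, the value written for ssbd=on must keep the IBRS bit, since with enhanced IBRS that bit is written once and expected to stay set.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPEC_CTRL_IBRS (1ULL << 0)  /* IA32_SPEC_CTRL bit 0 */
#define SPEC_CTRL_SSBD (1ULL << 2)  /* IA32_SPEC_CTRL bit 2 */

/* Value to write for spec_store_bypass_disable=on: include IBRS when
 * enhanced IBRS is in use, so the one-time IBRS write is not undone
 * by the SSBD initialization running later. */
static uint64_t ssbd_on_spec_ctrl(bool enhanced_ibrs)
{
    uint64_t val = SPEC_CTRL_SSBD;

    if (enhanced_ibrs)
        val |= SPEC_CTRL_IBRS;
    return val;
}
```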
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Alejandro Jimenez [Wed, 20 Mar 2019 16:40:58 +0000 (12:40 -0400)]
x86/speculation: Clean up enhanced IBRS checks in bugs_64.c
There are multiple instances in bugs_64.c where the initialization
code for the various mitigations must check if enhanced IBRS is
selected as the spectre V2 mitigation. Create a short function
to do just that.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Andrea Arcangeli [Fri, 2 Nov 2018 22:47:59 +0000 (15:47 -0700)]
mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings
THP allocation might be really disruptive when allocated on a NUMA system
with the local node full or hard to reclaim. Stefan has posted an
allocation stall report on a 4.12-based SLES kernel which suggests the
same issue:
The defrag mode is "madvise" and from the above report it is clear that
the THP has been allocated for a MADV_HUGEPAGE vma.
Andrea has identified that the main source of the problem is
__GFP_THISNODE usage:
: The problem is that direct compaction combined with the NUMA
: __GFP_THISNODE logic in mempolicy.c is telling reclaim to swap very
: hard the local node, instead of failing the allocation if there's no
: THP available in the local node.
:
: Such logic was ok until __GFP_THISNODE was added to the THP allocation
: path even with MPOL_DEFAULT.
:
: The idea behind the __GFP_THISNODE addition, is that it is better to
: provide local memory in PAGE_SIZE units than to use remote NUMA THP
: backed memory. That largely depends on the remote latency though, on
: threadrippers for example the overhead is relatively low in my
: experience.
:
: The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in
: extremely slow qemu startup with vfio, if the VM is larger than the
: size of one host NUMA node. This is because it will try very hard to
: unsuccessfully swapout get_user_pages pinned pages as result of the
: __GFP_THISNODE being set, instead of falling back to PAGE_SIZE
: allocations and instead of trying to allocate THP on other nodes (it
: would be even worse without vfio type1 GUP pins of course, except it'd
: be swapping heavily instead).
Fix this by removing __GFP_THISNODE for THP requests which are
requesting direct reclaim. This effectively reverts 5265047ac301
on the grounds that the zone/node reclaim was known to be disruptive due
to premature reclaim when there was memory free. While it made sense at
the time for HPC workloads without NUMA awareness on rare machines, it
was ultimately harmful in the majority of cases. The existing behaviour
is similar, if not as widespread, as it applies to a corner case, but
crucially, it cannot be tuned around like zone_reclaim_mode can. The
default behaviour should always be to cause the least harm for the
common case.
If there are specialised use cases out there that want zone_reclaim_mode
in specific cases, then it can be built on top. Longterm we should
consider a memory policy which allows for node-reclaim-like behavior
for specific memory ranges.
: Both patches look correct to me but I'm responding to this one because
: it's the fix. The change makes sense and moves further away from the
: severe stalling behaviour we used to see with both THP and zone reclaim
: mode.
:
: I put together a basic experiment with usemem configured to reference a
: buffer multiple times that is 80% the size of main memory on a 2-socket
: box with symmetric node sizes and defrag set to "always". The defrag
: setting is not the default but it would be functionally similar to
: accessing a buffer with madvise(MADV_HUGEPAGE). Usemem is configured to
: reference the buffer multiple times and while it's not an interesting
: workload, it would be expected to complete reasonably quickly as it fits
: within memory. The results were;
:
: usemem
: vanilla noreclaim-v1
: Amean Elapsd-1 42.78 ( 0.00%) 26.87 ( 37.18%)
: Amean Elapsd-3 27.55 ( 0.00%) 7.44 ( 73.00%)
: Amean Elapsd-4 5.72 ( 0.00%) 5.69 ( 0.45%)
:
: This shows the elapsed time in seconds for 1 thread, 3 threads and 4
: threads referencing buffers 80% the size of memory. With the patches
: applied, it's 37.18% faster for the single thread and 73% faster with two
: threads. Note that 4 threads showing little difference does not indicate
: the problem is related to thread counts. It's simply the case that 4
: threads gets spread so their workload mostly fits in one node.
:
: The overall view from /proc/vmstats is more startling
:
: 4.19.0-rc1 4.19.0-rc1
: vanillanoreclaim-v1r1
: Minor Faults 35593425 708164
: Major Faults 484088 36
: Swap Ins 3772837 0
: Swap Outs 3932295 0
:
: Massive amounts of swap in/out without the patch
:
: Direct pages scanned 6013214 0
: Kswapd pages scanned 0 0
: Kswapd pages reclaimed 0 0
: Direct pages reclaimed 4033009 0
:
: Lots of reclaim activity without the patch
:
: Kswapd efficiency 100% 100%
: Kswapd velocity 0.000 0.000
: Direct efficiency 67% 100%
: Direct velocity 11191.956 0.000
:
: Mostly from direct reclaim context as you'd expect without the patch.
:
: Page writes by reclaim 3932314.000 0.000
: Page writes file 19 0
: Page writes anon 3932295 0
: Page reclaim immediate 42336 0
:
: Writes from reclaim context is never good but the patch eliminates it.
:
: We should never have default behaviour to thrash the system for such a
: basic workload. If zone reclaim mode behaviour is ever desired but on a
: single task instead of a global basis then the sensible option is to build
: a mempolicy that enforces that behaviour.
This was a severe regression compared to previous kernels that made
important workloads unusable and it starts when __GFP_THISNODE was
added to THP allocations under MADV_HUGEPAGE. It is not a significant
risk to go to the previous behavior before __GFP_THISNODE was added, it
worked like that for years.
This was simply an optimization to some lucky workloads that can fit in
a single node, but it ended up breaking the VM for others that can't
possibly fit in a single node, so going back is safe.
[mhocko@suse.com: rewrote the changelog based on the one from Andrea] Link: http://lkml.kernel.org/r/20180925120326.24392-2-mhocko@kernel.org Fixes: 5265047ac301 ("mm, thp: really limit transparent hugepage allocation to local node") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Debugged-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: <stable@vger.kernel.org> [4.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ac5b2c18911ffe95c08d69273917f90212cf5659) Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Larry Bassel <larry.bassel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
mm/mempolicy.c
[__GFP_DIRECT_RECLAIM does not exist in UEK4, use __GFP_WAIT]
Michael Chan [Mon, 8 Apr 2019 21:39:55 +0000 (17:39 -0400)]
bnxt_en: Reset device on RX buffer errors.
If the RX completion indicates RX buffer errors, the RX ring will be
disabled by firmware and no packets will be received on that ring from
that point on. Recover by resetting the device.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8e44e96c6c8e8fb80b84a2ca11798a8554f710f2)
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Boris Ostrovsky [Mon, 13 May 2019 22:29:42 +0000 (18:29 -0400)]
x86/mitigations: Fix the test for Xen PV guest
Commit 6af1c37c19ea ("x86/pti: Don't report XenPV as vulnerable")
looks at the current hypervisor to determine whether we are running as a Xen
PV guest. This is incorrect since the test will be true for HVM guests
as well.
Instead, we should check xen_pv_domain().
(Using Xen-specific primitives in this file is not ideal. This is not
for upstream though so we are going to have to live with this)
Fixes: 6af1c37c19ea ("x86/pti: Don't report XenPV as vulnerable") Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Patrick Colp <patrick.colp@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Kanth Ghatraju [Thu, 16 May 2019 19:54:40 +0000 (15:54 -0400)]
x86/speculation/mds: Fix verw usage to use memory operand
The VERW instruction needs to be called with a memory operand instead
of a register operand to correctly flush the buffers affected by
MDS. The buffer overwriting occurs regardless of the permission check,
and works even with the null selector.
Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Hannes Reinecke [Thu, 13 Oct 2016 13:10:41 +0000 (15:10 +0200)]
scsi: libfc: sanitize E_D_TOV and R_A_TOV setting
When setting the FCP timeout we need to ensure a lower boundary
for E_D_TOV and R_A_TOV; otherwise we'd get spurious I/O
issues due to the FCP timer firing too early.
Reviewed-by: John Sobecki <john.sobecki@oracle.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Hannes Reinecke [Fri, 30 Sep 2016 09:01:19 +0000 (11:01 +0200)]
scsi: libfc: don't advance state machine for incoming FLOGI
When we receive an FLOGI but have already sent our own we should
not advance the state machine but rather wait for our FLOGI to
return before continuing with PLOGI.
Reviewed-by: John Sobecki <john.sobecki@oracle.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Hannes Reinecke [Fri, 30 Sep 2016 09:01:17 +0000 (11:01 +0200)]
scsi: libfc: Do not drop down to FLOGI for fc_rport_login()
When fc_rport_login() is called while the rport is not
in RPORT_ST_INIT, RPORT_ST_READY, or RPORT_ST_DELETE
login is already in progress and there's no need to
drop down to FLOGI; doing so will only confuse the
other side.
Reviewed-by: John Sobecki <john.sobecki@oracle.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Chad Dupuis [Fri, 30 Sep 2016 09:01:16 +0000 (11:01 +0200)]
scsi: libfc: Do not take rdata->rp_mutex when processing a -FC_EX_CLOSED ELS response.
When an ELS response handler receives a -FC_EX_CLOSED, the rdata->rp_mutex is
already held which can lead to a deadlock condition like the following stack trace:
The other ELS handlers need to follow the FLOGI response handler and simply do
a kref_put against the fc_rport_priv struct and exit when receiving a
-FC_EX_CLOSED response.
Reviewed-by: John Sobecki <john.sobecki@oracle.com> Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Hannes Reinecke [Fri, 30 Sep 2016 09:01:15 +0000 (11:01 +0200)]
scsi: libfc: Fixup disc_mutex handling
The list of attached 'rdata' remote port structures is RCU
protected, so there is no need to take the 'disc_mutex' when
traversing it.
Rather we should be using rcu_read_lock() and kref_get_unless_zero()
to validate the entries.
We do, however, need to take the disc_mutex when deleting an entry;
otherwise we risk clashing with list_add().
Reviewed-by: John Sobecki <john.sobecki@oracle.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Ajaykumar Hotchandani [Wed, 27 Feb 2019 22:53:57 +0000 (14:53 -0800)]
xve: arm ud tx cq to generate completion interrupts
IPoIB polls the UD send CQ on every 16th post_send() request to reduce
the interrupt count, and it does not arm the UD send CQ (the value 16 is
controlled by the MAX_SEND_CQE variable).
XVE followed the IPoIB methodology in its handling of the UD send CQ;
however, it failed to poll the send CQ after a certain number of iterations.
This makes freeing of the resources related to work requests unreliable,
since completion arrival is not controlled. This caused a problem for
live migration: the initial UDP and ICMP skbs which use UD work
requests were not getting freed, and the xenwatch process got stuck
waiting for these skbs to be freed.
This patch does following:
- arm send cq at initialization. This will generate interrupt for
initial ud send requests.
- Once polling of send cq is completed, arm send cq again to generate
interrupt whenever next cqe arrives.
I'm going back to the interrupt mechanism, since the UD workload for xve
is extremely limited, and I don't expect an interrupt flood here.
I also don't want to miss freeing an skb (for example, if only 10
post_send() calls are attempted on a UD QP and we then try to live
migrate that VM, we may miss a completion if our logic is to poll the
CQ only at every 16th post_send() iteration).
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Reviewed-by: Chien Yen <chien.yen@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Alexei Starovoitov [Fri, 1 May 2015 03:14:07 +0000 (20:14 -0700)]
net: sched: run ingress qdisc without locks
TC classifiers/actions were converted to RCU by John in the series:
http://thread.gmane.org/gmane.linux.network/329739/focus=329739
and many follow on patches.
This is the last patch from that series that finally drops
ingress spin_lock.
Single cpu ingress+u32 performance goes from 22.9 Mpps to 24.5 Mpps.
In two cpu case when both cores are receiving traffic on the same
device and go into the same ingress+u32 the performance jumps
from 4.5 + 4.5 Mpps to 23.5 + 23.5 Mpps
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 087c1a601ad7f851a2d31f5fa0e5e9dfc766df55)
Orabug: 29395374 Signed-off-by: Calum Mackay <calum.mackay@oracle.com> Reviewed-by: Laurence Rochfort <laurence.rochfort@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The logic that polls for the firmware message response uses a shorter
sleep interval for the first few passes. But due to a typo, it was
using the wrong (larger) counter for these short sleep
passes. The result is a slightly shorter timeout period for these
firmware messages than intended. Fix it by using the proper counter.
Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Allen Pais <allen.pais@oracle.com>
The code waits up to 20 usec for the firmware response to complete
once we've seen the valid response header in the buffer. It turns
out that in some scenarios, this wait time is not long enough.
Extend it to 150 usec and use usleep_range() instead of udelay().
Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Allen Pais <allen.pais@oracle.com>
Syzbot caught an oops at unregister_shrinker() because the combination of
commit 1d3d4437eae1bb29 ("vmscan: per-node deferred work") and fault
injection made register_shrinker() fail and the caller of
register_shrinker() did not check for failure.
Since allowing register_shrinker() callers to call unregister_shrinker()
when register_shrinker() failed can simplify the error recovery path, this
patch makes unregister_shrinker() a no-op when register_shrinker() failed.
Also, reset shrinker->nr_deferred in case unregister_shrinker() is
erroneously called twice.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Glauber Costa <glauber@scylladb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7880fc541566166d140954825fc83c826534e622)
Signed-off-by: John Sobecki <john.sobecki@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
David Howells [Wed, 24 Feb 2016 14:37:54 +0000 (14:37 +0000)]
X.509: Handle midnight alternative notation in GeneralizedTime
The ASN.1 GeneralizedTime object carries an ISO 8601 format date and time.
The time is permitted to show midnight as 00:00 or 24:00 (the latter being
equivalent of 00:00 of the following day).
The permitted value is checked in x509_decode_time() but the actual
handling is left to mktime64().
Without this patch, certain X.509 certificates will be rejected, which
could lead to an unbootable kernel.
Note that with this patch we also permit any 24:mm:ss time and extend
this to UTCTime, which, whilst not strictly correct, doesn't permit much
leeway for fiddling date strings.
Reported-by: Rudolf Polzer <rpolzer@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: David Woodhouse <David.Woodhouse@intel.com>
cc: John Stultz <john.stultz@linaro.org>
Orabug: 29460344
CVE: CVE-2015-5327
(cherry picked from commit 7650cb80e4e90b0fae7854b6008a46d24360515f) Signed-off-by: Dan Duval <dan.duval@oracle.com> Reviewed-by: John Haxby <john.haxby@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
David Howells [Wed, 24 Feb 2016 14:37:53 +0000 (14:37 +0000)]
X.509: Support leap seconds
The format of ASN.1 GeneralizedTime seems to be specified by ISO 8601
[X.680 46.3] and this apparently supports leap seconds (ie. the seconds
field is 60). It's not entirely clear that ASN.1 expects it, but we can
relax the seconds check slightly for GeneralizedTime.
This results in us passing a time with sec as 60 to mktime64(), which
handles it as being a duplicate of the 0th second of the next minute.
We can't really do otherwise without giving the kernel much greater
knowledge of where all the leap seconds are. Unfortunately, this would
require changing the mapping of the kernel's current-time-in-seconds.
UTCTime, however, only supports a seconds value in the range 00-59, but
for the sake of simplicity we allow this with UTCTime also.
Without this patch, certain X.509 certificates will be rejected,
potentially making a kernel unbootable.
Reported-by: Rudolf Polzer <rpolzer@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: David Woodhouse <David.Woodhouse@intel.com>
cc: John Stultz <john.stultz@linaro.org>
Orabug: 29460344
CVE: CVE-2015-5327
(cherry picked from commit da02559c9f864c8d62f524c1e0b64173711a16ab) Signed-off-by: Dan Duval <dan.duval@oracle.com> Reviewed-by: John Haxby <john.haxby@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
David Howells [Thu, 12 Nov 2015 09:36:40 +0000 (09:36 +0000)]
X.509: Fix the time validation [ver #2]
This fixes CVE-2015-5327. It affects kernels from 4.3-rc1 onwards.
Fix the X.509 time validation to use month number-1 when looking up the
number of days in that month. Also put the month number validation before
doing the lookup so as not to risk overrunning the array.
Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
is_broadcast_packet() expands to compare_ether_addr() which doesn't
exist since commit 7367d0b573d1 ("drivers/net: Convert uses of
compare_ether_addr to ether_addr_equal"). It turns out it's actually not
used.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The be2net implementation of .ndo_tunnel_{add,del}() changes the value of
NETIF_F_GSO_UDP_TUNNEL bit in 'features' and 'hw_features', but it forgets
to call netdev_features_change(). Moreover, ethtool setting for that bit
can potentially be reverted after a tunnel is added or removed.
GSO already does software segmentation when 'hw_enc_features' is 0, even
if VXLAN offload is turned on. In addition, commit 096de2f83ebc ("benet:
stricter vxlan offloading check in be_features_check") avoids hardware
segmentation of non-VXLAN tunneled packets, or VXLAN packets having wrong
destination port. So, it's safe to avoid flipping the above feature on
addition/deletion of VXLAN tunnels.
Fixes: 630f4b70567f ("be2net: Export tunnel offloads only when a VxLAN tunnel is created") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
DMA allocated memory is lost in be_cmd_get_profile_config() when we
call it with non-NULL port_res parameter.
Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Enabling only Skyhawk support reduces the module size by ~20kb.
Also convert the Kconfig help text to the new style.
Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 114787 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Trivial fix to spelling mistake in dev_info message.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
1) This patch gathers and prints the following info that can
help in diagnosing the cause of a TX-timeout.
a) TX queue and completion queue entries.
b) SKB and TCP/UDP header details.
2) For Lancer NICs (TX-timeout recovery is not supported for
BE3/Skyhawk-R NICs), it recovers from the TX timeout as follows:
a) On a TX-timeout, driver sets the PHYSDEV_CONTROL_FW_RESET_MASK
bit in the PHYSDEV_CONTROL register. Lancer firmware goes into
an error state and indicates this back to the driver via a bit
in a doorbell register.
b) Driver detects this and calls be_err_recover(). DMA is disabled,
all pending TX skbs are unmapped and freed (be_close()). All rings
are destroyed (be_clear()).
c) The driver waits for the FW to re-initialize and re-creates all
rings along with other data structs (be_resume())
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The current position of the .rss_flags field in struct rss_info causes
the fields .rsstable and .rssqueue (both 128 bytes long) to cross
cache-line boundaries. Moving it to the end properly aligns all fields.
Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
- Unionize two u8 fields where only one of them is used, depending on the
NIC chipset.
- Move the recovery_supported field after that union
These changes eliminate a 7-byte hole in the struct and make it smaller
by 8 bytes.
Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Re-order fields in struct be_eq_obj to ensure that the .napi field begins
at the start of a cache-line. Also move the .adapter field to the first
cache-line, next to the .q field, and move 3 fields (idx, msi_idx,
spurious_intr) along with the 4-byte hole to the 3rd cache-line.
Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The commit fb6113e688e0 ("be2net: get rid of custom busy poll code")
replaced custom busy-poll code by the generic one but left several
macros and fields in struct be_eq_obj that are currently unused.
Remove this stuff.
Fixes: fb6113e688e0 ("be2net: get rid of custom busy poll code") Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The commit 2632bafd74ae ("be2net: fix adaptive interrupt coalescing")
introduced a separate struct be_aic_obj to hold AIC information but
unfortunately left the old stuff in be_eq_obj. So remove it.
Fixes: 2632bafd74ae ("be2net: fix adaptive interrupt coalescing") Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Check for 0xE00 (RECOVERABLE_ERR) along with ARMFW UE (0x0)
in be_detect_error() to determine whether the error is a valid error.
Fixes: 673c96e5a ("be2net: Fix UE detection logic for BE3") Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Martin K. Petersen [Thu, 28 Sep 2017 01:38:59 +0000 (21:38 -0400)]
scsi: sd: Do not override max_sectors_kb sysfs setting
A user may lower the max_sectors_kb setting in sysfs to accommodate
certain workloads. Previously we would always set the max I/O size to
either the block layer default or the optional preferred I/O size
reported by the device.
Keep the current heuristics for the initial setting of max_sectors_kb.
For subsequent invocations, only update the current queue limit if it
exceeds the capabilities of the hardware.
Cc: <stable@vger.kernel.org> Reported-by: Don Brace <don.brace@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Tested-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 77082ca503bed061f7fbda7cfd7c93beda967a41)
Signed-off-by: John Sobecki <john.sobecki@oracle.com> Tested-by: Dustin Samko <Dustin.Samko@gm.com> Reviewed-by: Ritika Srivastava <ritika.srivastava@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
There have been reports of oversize UDP packets being sent to the
driver to be transmitted, causing error conditions. The issue is
likely caused by the dst of the SKB switching between 'lo' with
64K MTU and the hardware device with a smaller MTU. Patches are
being proposed by Mahesh Bandewar <maheshb@google.com> to fix the
issue.
In the meantime, add a quick length check in the driver to prevent
the error. The driver uses the TX packet size as index to look up an
array to setup the TX BD. The array is large enough to support all MTU
sizes supported by the driver. The oversize TX packet causes the
driver to index beyond the array and put garbage values into the
TX BD. Add a simple check to prevent this.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2b3c6885386020b1b9d92d45e8349637e27d1f66) Signed-off-by: Brian Maly <brian.maly@oracle.com>
x86/speculation: Read per-cpu value of x86_spec_ctrl_priv in x86_virt_spec_ctrl()
In x86_virt_spec_ctrl(), when IBRS is in use on the host, the baseline to
restore the host SPEC_CTRL must be taken from the privileged value which
has the IBRS bit set. In addition, it must be read from the per-cpu variable
(x86_spec_ctrl_priv_cpu) that holds the SPEC_CTRL MSR for the current cpu.
Currently, this line:
hostval = this_cpu_read(x86_spec_ctrl_priv);
incorrectly uses the global x86_spec_ctrl_priv instead of the correct
per-cpu variable x86_spec_ctrl_priv_cpu, which assigns spurious values
to hostval.
Fix this issue by reading the correct per-cpu value instead of the
global.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Alejandro Jimenez [Wed, 20 Mar 2019 15:00:49 +0000 (11:00 -0400)]
x86/speculation: Keep enhanced IBRS on when prctl is used for SSBD control
When using the prctl system call to enable/disable SSBD mitigation for a
specific thread, it is necessary to update the SPEC_CTRL MSR on the CPU
running it. The value used as the base for the MSR that will be written
is x86_spec_ctrl_base, which does not have the IBRS bit set. The relevant
SSBD bits are OR'd to this value before it is written to the MSR, but
the IBRS bit will remain unset.
As a result, the thread that requested the SSBD protection will run without
IBRS enabled, and when it is context switched out, IBRS will not be turned
back on again. This is not a problem in processors that use basic IBRS since
the bit is constantly toggled on kernel entry, but with enhanced IBRS this
is not necessary and therefore the bit remains unset.
Fix it by adding a check to detect when enhanced IBRS is in use, and add
the bit to the MSR value that will be used as the baseline.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Hui Peng [Wed, 12 Dec 2018 11:42:24 +0000 (12:42 +0100)]
USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data
The function hso_probe reads if_num from the USB device (as a u8) and uses
it without a length check to index an array, resulting in an OOB memory read
in hso_probe or hso_get_config_data.
Add a length check in both locations and update hso_probe to bail out on
error.
This issue has been assigned CVE-2018-19985.
Reported-by: Hui Peng <benquike@gmail.com> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net> Signed-off-by: Hui Peng <benquike@gmail.com> Signed-off-by: Mathias Payer <mathias.payer@nebelwelt.net> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5146f95df782b0ac61abde36567e718692725c89)
swiotlb: save io_tlb_used to local variable before leaving critical section
When swiotlb is full, the kernel would print io_tlb_used. However, the
result might be inaccurate at that time because we have left the critical
section protected by spinlock.
Therefore, we backup the io_tlb_used into local variable before leaving
critical section.
Fixes: 83ca25948940 ("swiotlb: dump used and total slots when swiotlb buffer is full") Suggested-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 29637525
(cherry picked from commit 53b29c336830db48ad3dc737f88b8c065b1f0851) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
kernel/dma/swiotlb.c does not exist.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-By: Joe Jin <joe.jin@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
swiotlb: dump used and total slots when swiotlb buffer is full
So far the kernel only prints the requested size when the swiotlb buffer
is full. It is not possible to know whether the buffer is simply
exhausted, or whether swiotlb cannot allocate a buffer of the requested
size due to fragmentation.
As 'io_tlb_used' is available since commit 71602fe6d4e9 ("swiotlb: add
debugfs to track swiotlb buffer usage"), both 'io_tlb_used' and
'io_tlb_nslabs' are printed when swiotlb buffer is full.
(cherry picked from commit 83ca259489409a1fe8a83dad83a82f32174d4f31) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
kernel/dma/swiotlb.c does not exist.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-By: Joe Jin <joe.jin@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
x86/bugs, kvm: don't miss SSBD when IBRS is in use.
When IBRS is in use, we unconditionally need to write to MSR_IA32_SPEC_CTRL
(it acts as a barrier) but we were failing to take into account the SSBD
state from the thread info flags, potentially disabling SSBD on the host on
tasks that need it after a vmexit.
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Shuning Zhang [Mon, 15 Apr 2019 15:01:37 +0000 (23:01 +0800)]
cifs: Fix use after free of a mid_q_entry
With protocol version 2.0 mounts we have seen crashes with corrupt mid
entries. Either the server->pending_mid_q list becomes corrupt with a
cyclic reference in one element or a mid object fetched by the
demultiplexer thread becomes overwritten during use.
Code review identified a race between the demultiplexer thread and the
request issuing thread. The demultiplexer thread seems to be written
with the assumption that it is the sole user of the mid object until
it calls the mid callback which either wakes the issuer task or
deletes the mid.
This assumption is not true because the issuer task can be woken up
earlier by a signal. If the demultiplexer thread has proceeded as far
as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer
thread will happily end up calling cifs_delete_mid while the
demultiplexer thread still is using the mid object.
Inserting a delay in the cifs demultiplexer thread widens the race
window and makes reproduction of the race very easy:
if (server->large_buf)
buf = server->bigbuf;
+ usleep_range(500, 4000);
server->lstrp = jiffies;
To resolve this I think the proper solution involves putting a
reference count on the mid object. This patch makes sure that the
demultiplexer thread holds a reference until it has finished
processing the transaction.
Cc: stable@vger.kernel.org Signed-off-by: Lars Persson <larper@axis.com> Acked-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
(cherry picked from commit 696e420bb2a6624478105651d5368d45b502b324)
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/cifs/connect.c
fs/cifs/smb2ops.c
fs/cifs/smb2transport.c
fs/cifs/transport.c
[
connect.c: context has changed.
smb2ops.c: hdr->Command changed to shdr->Command in the third line.
smb2transport.c: context has changed; the code was the statement
block of an else, but that block has been moved outside.
transport.c: context has changed; the code was the statement
block of an else, but that block has been moved outside.
]
Linus Torvalds [Mon, 22 Aug 2016 23:41:46 +0000 (16:41 -0700)]
binfmt_elf: switch to new creds when switching to new mm
We used to delay switching to the new credentials until after we had
mapped the executable (and possible elf interpreter). That was kind of
odd to begin with, since the new executable will actually then _run_
with the new creds, but whatever.
The bigger problem was that we also want to make sure that we turn off
prof events and tracing before we start mapping the new executable
state. So while this is a cleanup, it's also a fix for a possible
information leak.
Reported-by: Robert Święcki <robert@swiecki.net> Tested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 9f834ec18defc369d73ccf9e87a2790bfa05bf46)
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Boris Ostrovsky [Thu, 9 May 2019 15:04:38 +0000 (11:04 -0400)]
x86/microcode: Don't return error if microcode update is not needed
Commit 347b54683 ("x86/microcode: Synchronize late microcode loading")
incorrectly returns -EINVAL error on all request_microcode_fw() failures
in reload_store(). In fact, when update is not needed or if there is no
microcode to load we don't need to treat this as an error.
Konrad Rzeszutek Wilk [Fri, 3 May 2019 02:01:38 +0000 (22:01 -0400)]
x86/mds: Add empty commit for CVE-2019-11091
The fixes for MDS also cover this CVE, which is described as:
"Microarchitectural Data Sampling Uncacheable Memory (MDSUM): Uncacheable
memory on some microprocessors utilizing speculative execution may allow
an authenticated user to potentially enable information disclosure
via a side channel with local access"
Boris Ostrovsky [Wed, 8 May 2019 18:50:39 +0000 (14:50 -0400)]
x86/microcode: Add loader version file in debugfs
We want to be able to find out whether the late microcode loader is using a
"safe" method where the system is in stop_machine() --- i.e. all cores
are pinned in the kernel with interrupts disabled. This is especially
important for core siblings --- if one thread is loading microcode while
the other is executing instructions that are being patched then bad
things may happen, including MCEs.
Presence of this file indicates that we are all good. We will also
provide a version value of "1".
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Borislav Petkov [Wed, 14 Mar 2018 18:36:15 +0000 (19:36 +0100)]
x86/microcode: Fix CPU synchronization routine
Emanuel reported an issue with a hang during microcode update because my
dumb idea to use one atomic synchronization variable for both rendezvous
- before and after update - was simply bollocks:
microcode: microcode_reload_late: late_cpus: 4
microcode: __reload_late: cpu 2 entered
microcode: __reload_late: cpu 1 entered
microcode: __reload_late: cpu 3 entered
microcode: __reload_late: cpu 0 entered
microcode: __reload_late: cpu 1 left
microcode: Timeout while waiting for CPUs rendezvous, remaining: 1
CPU1 above would finish, leave and the others will still spin waiting for
it to join.
So do two synchronization atomics instead, which makes the code a lot more
straightforward.
Also, since the update is serialized and it also takes quite some time per
microcode engine, increase the exit timeout by the number of CPUs on the
system.
That's ok because the moment all CPUs are done, that timeout will be cut
short.
Furthermore, panic when some of the CPUs timeout when returning from a
microcode update: we can't allow a system with not all cores updated.
Also, as an optimization, do not do the exit sync if microcode wasn't
updated.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Borislav Petkov [Wed, 8 May 2019 15:10:41 +0000 (11:10 -0400)]
x86/microcode: Synchronize late microcode loading
Original idea by Ashok, completely rewritten by Borislav.
Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.
Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.
[ Borislav: Rewrite completely. ]
Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-8-bp@alien8.de
(cherry picked from commit a5321aec6412b20b5ad15db2d6b916c05349dbff)
Conflicts --- quite a few. Notable ones:
* We don't have microcode cache and so call request_microcode_fw() for
each CPU
* No need to get/put_online_cpus() --- they are part of stop_machine()
* No stop_machine_cpuslocked() in uek4 but uek4's version of
stop_machine() prevents CPU hotplug.
* uek4 has fewer result codes for microcode operations and thus error
handling is slightly different.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Configure x86 runtime CPU speculation bug mitigations in accordance with
the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2,
Speculative Store Bypass, and L1TF.
The default behavior is unchanged.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoimboe@redhat.com
(cherry picked from commit aaa95f2f1112dd4ec31ae13c4cf877dc7c7fcbc8)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
arch/x86/kernel/cpu/bugs.c
arch/x86/mm/pti.c
Documentation/admin-guide/kernel-parameters.txt: different location
arch/x86/kernel/cpu/bugs.c: different name (bugs_64.c). Also we have different logic in nospectre_v2.
arch/x86/mm/pti.c: different location for mitigation arch/x86/mm/kaiser.c.
Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users. It's getting more and more
complicated to decide which mitigations are needed for a given
architecture. Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.
Most users fall into a few basic categories:
a) they want all mitigations off;
b) they want all reasonable mitigations on, with SMT enabled even if
it's vulnerable; or
c) they want all reasonable mitigations on, with SMT disabled if
vulnerable.
Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:
- mitigations=off: Disable all mitigations.
- mitigations=auto: [default] Enable all the default mitigations, but
leave SMT enabled, even if it's vulnerable.
- mitigations=auto,nosmt: Enable all the default mitigations, disabling
SMT if needed by a mitigation.
Currently, these options are placeholders which don't actually do
anything. They will be fleshed out in upcoming patches.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
(cherry picked from commit 6cbbaa933b325234d6ffc93836d6b7c06dea7a56)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
Different location: Documentation/kernel-parameters.txt
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
bugs.c vs bug_64.c in UEK4
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
bugs.c vs bugs_64.c on UEK4
Mihai Carabas [Fri, 12 Apr 2019 11:20:40 +0000 (14:20 +0300)]
x86/speculation/mds: update mds_mitigation to reflect debugfs configuration
- If we enable mds_user_clear, we set mds_mitigation to MDS_MITIGATION_FULL
or MDS_MITIGATION_VMWERV.
- When we disable mds_user_clear, we set mds_mitigation to
MDS_MITIGATION_IDLE if mds_idle_clear is enabled, otherwise MDS_MITIGATION_OFF.
- If we enable mds_idle_clear, we set mds_mitigation to MDS_MITIGATION_IDLE
only if mds_user_clear is disabled.
- When we disable mds_idle_clear, we set mds_mitigation to
MDS_MITIGATION_OFF if mds_user_clear is disabled.
Mihai Carabas [Mon, 8 Apr 2019 10:48:09 +0000 (13:48 +0300)]
x86/speculation/mds: fix microcode late loading
In the microcode late loading case we have to:
- clear the CPU bugs related to MDS to be re-evaluated
- add proper evaluation of the MDS state and enable mitigation if necessary.
If the user has enforced off or idle mitigation, we keep it. Also if the
microcode fixes the MDS bug, mitigation will be turned off.
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
bugs.c vs bugs_64.c: we do not have arch_smt_update. Squash the check in mds_select_mitigation.
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
arch/x86/kernel/cpu/bugs.c
bugs.c vs bugs_64.c: different boot command line parsing code
Documentation/admin-guide/kernel-parameters.txt vs Documentation/kernel-parameters.txt
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 3b5d5994ef174daf5f77ba3f101cd879cf0c5fd6)
Move L1TF to a separate directory so the MDS stuff can be added at the
side. Otherwise all the hardware vulnerabilities have their own top level
entry. Should have done that right away.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 19049ee5b2543696dd3cb164a4c4e566984f9615)
In virtualized environments it can happen that the host has the microcode
update which utilizes the VERW instruction to clear CPU buffers, but the
hypervisor is not yet updated to expose the X86_FEATURE_MD_CLEAR CPUID bit
to guests.
Introduce an internal mitigation mode VMWERV which enables the invocation
of the CPU buffer clearing even if X86_FEATURE_MD_CLEAR is not set. If the
system has no updated microcode this results in a pointless execution of
the VERW instruction wasting a few CPU cycles. If the microcode is updated,
but not exposed to a guest then the CPU buffers will be cleared.
That said: Virtual Machines Will Eventually Receive Vaccine
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit a2227b3f734fad5ace7c99103d6a7bc020c193fd)
Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
The changes to arch/x86/kernel/cpu/bugs.c instead need to be made to
arch/x86/kernel/cpu/bugs_64.c.
Add the sysfs reporting file for MDS. It exposes the vulnerability and
mitigation state similar to the existing files for the other speculative
hardware vulnerabilities.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit db366061fff1f76407cb5d1b0975fcc381400cc3)
Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
The changes to arch/x86/kernel/cpu/bugs.c instead need to be made to
arch/x86/kernel/cpu/bugs_64.c.
X86_HYPER_NATIVE doesn't exist so just leave that change out.
sched_smt_active() does not exist; use cpu_smp_control instead.
hypervisor_is_type() is replaced with cpu_has_hypervisor.
Now that the mitigations are in place, add a command line parameter to
control the mitigation, a mitigation selector function and a SMT update
mechanism.
This is the minimal, straightforward initial implementation which just
provides an always on/off mode. The command line parameter is:
mds=[full|off]
This is consistent with the existing mitigations for other speculative
hardware vulnerabilities.
The idle invocation is dynamically updated according to the SMT state of
the system similar to the dynamic update of the STIBP mitigation. The idle
mitigation is limited to CPUs which are only affected by MSBDS and not any
other variant, because the other variants cannot be mitigated on SMT
enabled systems.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com>
(cherry picked from commit 4cad86e4abd472f637038e0ad70a70d0d7333f83)
Signed-off-by: Kanth Ghatraju <kanth.ghatraju@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Conflicts:
The changes to arch/x86/kernel/cpu/bugs.c instead need to be made to
arch/x86/kernel/cpu/bugs_64.c.