users/jedix/linux-maple.git
6 years ago x86/cpufeatures: rename X86_FEATURE_AMD_SSBD to X86_FEATURE_LS_CFG_SSBD
Mihai Carabas [Wed, 7 Nov 2018 07:00:03 +0000 (09:00 +0200)]
x86/cpufeatures: rename X86_FEATURE_AMD_SSBD to X86_FEATURE_LS_CFG_SSBD

The commit 52817587e706 ('x86/cpufeatures: Disentangle SSBD enumeration') from
upstream disentangles SSBD enumeration. We did not backport that commit because
there was nothing to disentangle on UEK4: our cpufeature was already
synthetic.

That commit also renames X86_FEATURE_AMD_SSBD to X86_FEATURE_LS_CFG_SSBD. We
need this rename to avoid conflicting cpu features when backporting upstream
commit 6ac2f49edb1e ('x86/bugs: Add AMD's SPEC_CTRL MSR usage'), which
introduces the SPEC_CTRL MSR. That MSR will be the preferred method.

Orabug: 28870524
CVE: CVE-2018-3639

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Make file credentials available to the seqfile interfaces
Linus Torvalds [Thu, 14 Apr 2016 18:22:00 +0000 (11:22 -0700)]
Make file credentials available to the seqfile interfaces

A lot of seqfile users seem to be using things like %pK that uses the
credentials of the current process, but that is actually completely
wrong for filesystem interfaces.

The unix semantics for permission checking files is to check permissions
at _open_ time, not at read or write time, and that is not just a small
detail: passing off stdin/stdout/stderr to a suid application and making
the actual IO happen in privileged context is a classic exploit
technique.

So if we want to be able to look at permissions at read time, we need to
use the file open credentials, not the current ones.  Normal file
accesses can just use "f_cred" (or any of the helper functions that do
that, like file_ns_capable()), but the seqfile interfaces do not have
any such options.

It turns out that seq_file _does_ save away the user_ns information of
the file, though.  Since user_ns is just part of the full credential
information, replace that special case with saving off the cred pointer
instead, and suddenly seq_file has all the permission information it
needs.
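
A minimal user-space sketch of the open-time versus read-time distinction
described above. It is not the kernel code: the types and functions
(model_cred, model_file, model_open, model_may_show_pointer and
model_may_show_pointer_current) are hypothetical stand-ins for the real
credential structures, and the single "privileged" bit is an assumption
made for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of a credential: just one privilege bit. */
struct model_cred {
    bool privileged;
};

/* Hypothetical model of an open file: it remembers who opened it. */
struct model_file {
    const struct model_cred *f_cred;   /* captured at open time */
};

/* Open-time capture: the opener's credentials stick to the file. */
struct model_file model_open(const struct model_cred *opener)
{
    struct model_file f = { .f_cred = opener };
    return f;
}

/* Problematic: decides based on whoever happens to perform the read. */
bool model_may_show_pointer_current(const struct model_cred *current_cred)
{
    return current_cred->privileged;
}

/* Intended: decides based on the credentials saved at open time. */
bool model_may_show_pointer(const struct model_file *file)
{
    return file->f_cred->privileged;
}
```

If an unprivileged opener hands the file to a privileged reader, the
read-time check would allow the access while the open-time check still
denies it.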

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 34dbbcdbf63360661ff7bda6c5f52f99ac515f92)

Orabug: 29114879
CVE: CVE-2018-17972

Conflict:  Refactored include/linux/seq_file.h to include __GENKSYM__
and UEK_KABI_REPLACE() to pass check_kabi test.

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago proc: restrict kernel stack dumps to root
Jann Horn [Fri, 5 Oct 2018 22:51:58 +0000 (15:51 -0700)]
proc: restrict kernel stack dumps to root

Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU.  That means that
the stack can change under the stack walker.  The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers.  This can cause exposure of
kernel stack contents to userspace.

Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents.  See the added comment for a longer
rationale.

There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails.  Therefore, I believe
that this change is unlikely to break things.  In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.

Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
Fixes: 2ec220e27f50 ("proc: add /proc/*/stack")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f8a00cef17206ecd1b30d3d9f99e10d9fa707aa7)

Orabug: 29114879
CVE: CVE-2018-17972

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago x86/speculation: Clean up retpoline code in bugs.c
Alejandro Jimenez [Tue, 22 Jan 2019 21:40:04 +0000 (16:40 -0500)]
x86/speculation: Clean up retpoline code in bugs.c

Now that the minimal retpoline modes are removed, also remove
unnecessary checks to simplify retpoline code.

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
WANG Chao [Mon, 10 Dec 2018 16:37:25 +0000 (00:37 +0800)]
x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE

Commit

  4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")

replaced the RETPOLINE define with CONFIG_RETPOLINE checks. Remove the
remaining pieces.

 [ bp: Massage commit message. ]

Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-kbuild@vger.kernel.org
Cc: srinivas.eeda@oracle.com
Cc: stable <stable@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181210163725.95977-1-chao.wang@ucloud.cn
(cherry picked from commit e4f358916d528d479c3c12bd2fd03f2d5a576380)

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
include/linux/compiler-gcc.h
include/linux/module.h
UEK4 either implements the changes in different files
or it does not have the patches that introduce the
lines changed by this cherry-picked commit.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago x86/build: Fix compiler support check for CONFIG_RETPOLINE
Masahiro Yamada [Wed, 5 Dec 2018 06:27:19 +0000 (15:27 +0900)]
x86/build: Fix compiler support check for CONFIG_RETPOLINE

It is troublesome to add a diagnostic like this to the Makefile
parse stage because the top-level Makefile could be parsed with
a stale include/config/auto.conf.

Once you are hit by the error about non-retpoline compiler, the
compilation still breaks even after disabling CONFIG_RETPOLINE.

The easiest fix is to move this check to the "archprepare" like
this commit did:

  829fe4aa9ac1 ("x86: Allow generating user-space headers without a compiler")

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Link: http://lkml.kernel.org/r/1543991239-18476-1-git-send-email-yamada.masahiro@socionext.com
Link: https://lkml.org/lkml/2018/12/4/206
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 25896d073d8a0403b07e6dec56f58e6c33678207)

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/Makefile
The archprepare rule is different in UEK and upstream makefiles

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago x86/retpoline: Remove minimal retpoline support
Zhenzhong Duan [Fri, 2 Nov 2018 08:45:41 +0000 (01:45 -0700)]
x86/retpoline: Remove minimal retpoline support

Now that CONFIG_RETPOLINE hard depends on compiler support, there is no
reason to keep the minimal retpoline support around which only provided
basic protection in the assembly files.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <srinivas.eeda@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/f06f0a89-5587-45db-8ed2-0a9d6638d5c0@default
(cherry picked from commit ef014aae8f1cd2793e4e014bbb102bed53f852b7)

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
UEK4 has the corresponding code in bugs_64.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support
Zhenzhong Duan [Fri, 2 Nov 2018 08:45:41 +0000 (01:45 -0700)]
x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support

Since retpoline capable compilers are widely available, make
CONFIG_RETPOLINE hard depend on the compiler capability.

Break the build when CONFIG_RETPOLINE is enabled and the compiler does not
support it. Emit an error message in that case:

 "arch/x86/Makefile:226: *** You are building kernel with non-retpoline
  compiler, please update your compiler..  Stop."

[dwmw: Fail the build with non-retpoline compiler]

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: <srinivas.eeda@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/cca0cb20-f9e2-4094-840b-fb0f8810cd34@default
(cherry picked from commit 4cd24de3a0980bf3100c9dcb08ef65ca7c31af48)

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/Kconfig
arch/x86/include/asm/nospec-branch.h
Minor differences between UEK and upstream.
arch/x86/Makefile
Need to add line defining RETPOLINE_CFLAGS.
arch/x86/kernel/cpu/bugs.c
UEK4 has the corresponding code in bugs_64.c
scripts/Makefile.build
Commit e699314 (objtool: Add retpoline validation) has not
been ported to UEK4, nothing to change.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago nl80211: check for the required netlink attributes presence
Vladis Dronov [Tue, 12 Sep 2017 22:21:21 +0000 (00:21 +0200)]
nl80211: check for the required netlink attributes presence

nl80211_set_rekey_data() does not check if the required attributes
NL80211_REKEY_DATA_{REPLAY_CTR,KEK,KCK} are present when processing
NL80211_CMD_SET_REKEY_OFFLOAD request. This request can be issued by
users with CAP_NET_ADMIN privilege and may result in NULL dereference
and a system crash. Add a check for the required attributes presence.
This patch is based on the patch by bo Zhang.
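
A minimal user-space sketch of the presence check described above, under
stated assumptions: the enum, its values and model_check_rekey_attrs() are
hypothetical stand-ins, and tb[] models a parsed attribute table in which
NULL means the sender omitted the attribute. It is not the patch itself.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative attribute indices; the real enum lives in nl80211.h. */
enum model_rekey_attr {
    MODEL_REKEY_DATA_KEK,
    MODEL_REKEY_DATA_KCK,
    MODEL_REKEY_DATA_REPLAY_CTR,
    MODEL_REKEY_DATA_COUNT
};

/*
 * tb[] holds one pointer per parsed attribute; NULL means "absent".
 * Returns 0 when all three required attributes are present and
 * -EINVAL otherwise, so the caller never dereferences a NULL entry.
 */
int model_check_rekey_attrs(const void *const *tb)
{
    if (!tb[MODEL_REKEY_DATA_REPLAY_CTR] ||
        !tb[MODEL_REKEY_DATA_KEK] ||
        !tb[MODEL_REKEY_DATA_KCK])
        return -EINVAL;
    return 0;
}
```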

This fixes CVE-2017-12153.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1491046
Fixes: e5497d766ad ("cfg80211/nl80211: support GTK rekey offload")
Cc: <stable@vger.kernel.org> # v3.1-rc1
Reported-by: bo Zhang <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit e785fa0a164aa11001cba931367c7f94ffaff888)

Orabug: 29245533
CVE: CVE-2017-12153

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago scsi: lpfc: Fix PT2PT PRLI reject (reapply patch)
James Smart [Wed, 12 Jul 2017 22:28:24 +0000 (18:28 -0400)]
scsi: lpfc: Fix PT2PT PRLI reject (reapply patch)

[backport of 114e80db15039e248eb4e458559cef57737930a8]
From: rkennedy <dick.kennedy@avagotech.com>

Orabug: 29281346

lpfc cannot establish connection with targets that send PRLI in P2P
configurations.

If lpfc rejects a PRLI that is sent from a target, the target will not
resend and will reject the PRLI sent from the initiator.

[tv: original mistakenly applied in reverse, because change was already
present in the code at that point; this reapplies forwards]

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Fred Herard <fred.herard@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago rds: congestion updates can be missed when kernel low on memory
Mukesh Kacker [Wed, 1 Aug 2018 18:37:01 +0000 (11:37 -0700)]
rds: congestion updates can be missed when kernel low on memory

The congestion updates are allocated under GFP_NOWAIT and can
fail under temporary memory pressure. Failed updates were not retried;
this change retries them until they are sent.

On receiving congestion updates,  corrupt packet check failures
are not logged as warnings.

Orabug: 28425811

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago net/rds: ib: Fix endless RNR Retries caused by memory allocation failures
Venkat Venkatsubra [Thu, 17 Jan 2019 14:02:23 +0000 (06:02 -0800)]
net/rds: ib: Fix endless RNR Retries caused by memory allocation failures

Temporary memory allocation failures may cause an RDS connection
to be stuck in an endless RNR (Receiver Not Ready) retry loop. Right
around the time the RDS connection becomes stuck, it reports these recv
buffer allocation failures:

rcuos/10: page allocation failure: order:2, mode:0x2
Call Trace:
<IRQ> [<ffffffff81698cc0>] dump_stack+0x63/0x83
[<ffffffff8118e59a>] warn_alloc_failed+0xea/0x140
[<ffffffff810b93fa>] ? select_idle_sibling+0x2a/0x120
[<ffffffff81191e09>] __alloc_pages_slowpath+0x409/0x760
[<ffffffff81192411>] __alloc_pages_nodemask+0x2b1/0x2d0
[<ffffffff810bae62>] ? check_preempt_wakeup+0x112/0x230
[<ffffffff811dc3af>] alloc_pages_current+0xaf/0x170
[<ffffffffa12e2090>] rds_page_remainder_alloc+0x60/0x2a4
[<ffffffffa0b1c0ac>] rds_ib_refill_one_frag+0x13c/0x200 [rds_rdma]
[<ffffffffa12994cd>] rds_ib_recv_refill_one+0x8d/0x220
[<ffffffffa0b1dfbf>] rds_ib_recv_refill+0x11f/0x340 [rds_rdma]
[<ffffffffa129989e>] rds_ib_recv_cqe_handler+0x23e/0x290
[<ffffffffa0b19326>] poll_cq+0x66/0xe0 [rds_rdma]
[<ffffffffa0b1945d>] rds_ib_rx+0xbd/0x210 [rds_rdma]
[<ffffffffa0b1964a>] rds_ib_tasklet_fn_recv+0x3a/0x50 [rds_rdma]
[<ffffffff81088361>] tasklet_action+0xb1/0xc0
[<ffffffff8108871a>] __do_softirq+0x10a/0x350
[<ffffffff8169f53c>] do_softirq_own_stack+0x1c/0x30
 <EOI> [<ffffffff81088445>] do_softirq+0x55/0x60
[<ffffffff81088528>] __local_bh_enable_ip+0x88/0x90
[<ffffffff810e86d1>] rcu_nocb_kthread+0xf1/0x180
[<ffffffff810e85e0>] ? print_cpu_stall+0x170/0x170
[<ffffffff810e85e0>] ? print_cpu_stall+0x170/0x170
[<ffffffff810a465e>] kthread+0xce/0xf0
[<ffffffff810a4590>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8169dda2>] ret_from_fork+0x42/0x70
[<ffffffff810a4590>] ? kthread_freezable_should_stop+0x70/0x70

The recv buffer refiller is rescheduled when the following condition holds:

if (rds_conn_up(conn) &&
   (must_wake || (can_wait && ring_low)
              || rds_ib_ring_empty(&ic->i_recv_ring))) {
   queue_delayed_work(conn->c_wq, &conn->c_recv_w, 1);
}

This currently doesn't take into account memory allocation failures.

A bit later the memory pressure clears away.
But RDS does not refill receive buffers for that connection any more.
This is because the receiver is only woken up on the last packet of a
multi-packet message. But the last packet is never received, because the
recv queue becomes empty and we end up in the endless RNR Retry situation.
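
A minimal user-space sketch of the rescheduling decision quoted above,
with the inputs modeled as plain flags. The alloc_failed flag and
model_should_requeue_new() are assumptions showing one way the condition
could take allocation failures into account. They are not taken from the
actual patch.

```c
#include <stdbool.h>

/* Inputs to the rescheduling decision, modeled as plain flags. */
struct model_refill_state {
    bool conn_up;
    bool must_wake;
    bool can_wait;
    bool ring_low;
    bool ring_empty;
    bool alloc_failed;   /* assumption: a recv buffer allocation failed */
};

/* The condition quoted in the commit message, unchanged. */
bool model_should_requeue_old(const struct model_refill_state *s)
{
    return s->conn_up &&
           (s->must_wake || (s->can_wait && s->ring_low) || s->ring_empty);
}

/* Hypothetical variant that also retries after an allocation failure. */
bool model_should_requeue_new(const struct model_refill_state *s)
{
    return s->conn_up &&
           (s->must_wake || (s->can_wait && s->ring_low) || s->ring_empty ||
            s->alloc_failed);
}
```

With a partially filled ring, no permission to wait and a failed
allocation, the original condition does not requeue the refiller while
the hypothetical variant does.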

Orabug: 28127993

Consultation with: Haakon Bugge

Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Reviewed-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago net: rds: fix excess initialization of the recv SGEs
Zhu Yanjun [Fri, 25 Jan 2019 02:14:52 +0000 (21:14 -0500)]
net: rds: fix excess initialization of the recv SGEs

In rds_ib_recv_init_ring(), an excess array element is incorrectly
initialized. This is not an OOB situation, as the sge array is
initialized to eight entries. With a maximum fragment size of 16KiB
and a minimum page size of 4KiB, num_send_sge can be at most five.
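
The arithmetic behind the bound of five can be sketched as follows. The
assumption here, not stated in the commit message, is that a fragment
needs one SGE per page of payload plus one additional SGE (taken to be
for the header). The macro and function names are hypothetical.

```c
#define MODEL_MAX_FRAG_BYTES (16u * 1024u)  /* maximum fragment size from the text */
#define MODEL_MIN_PAGE_BYTES (4u * 1024u)   /* minimum page size from the text */
#define MODEL_SGE_ARRAY_LEN  8u             /* sge array size from the text */

/*
 * Number of SGEs for one fragment: one per page of payload (rounded up),
 * plus one extra entry that is assumed to carry the header.
 */
unsigned int model_num_sge(unsigned int frag_bytes, unsigned int page_bytes)
{
    unsigned int data_sges = (frag_bytes + page_bytes - 1) / page_bytes;
    return data_sges + 1;
}
```

Under that assumption the worst case is 16KiB / 4KiB + 1 = 5, which stays
below the eight-entry array.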

Orabug: 29004503

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago xhci: fix usb2 resume timing and races.
Mathias Nyman [Fri, 11 Dec 2015 12:38:06 +0000 (14:38 +0200)]
xhci: fix usb2 resume timing and races.

According to USB 2 specs ports need to signal resume for at least 20ms,
in practice even longer, before moving to U0 state.
Both host and devices can initiate resume.

On device initiated resume, a port status interrupt with the port in resume
state is issued. The interrupt handler tags a resume_done[port]
timestamp with current time + USB_RESUME_TIMEOUT, and kicks the roothub timer.
The root hub timer requests the port status, finds the port in resume state,
checks if the resume_done[port] timestamp has passed, and sets the port to U0 state.

On host initiated resume, the current code sets the port to resume state,
sleeps 20ms, and finally sets the port to U0 state. This should also
be changed to work in a similar way as the device initiated resume, with
timestamp tagging, but that is not yet tested and will be a separate
fix later.

There are a few issues with this approach:

1. A host initiated resume will also generate a resume event. The event
   handler will find the port in resume state, believe it's a device
   initiated resume, and act accordingly.

2. A port status request might cut the resume signalling short if a
   get_port_status request is handled during the host resume signalling.
   The port will be found in resume state. The timestamp is not set leading
   to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
   get_port_status will proceed with moving the port to U0.

3. If an error, or anything else happens to the port during device
   initiated resume signalling it will leave all the device resume
   parameters hanging uncleared, preventing further suspend, returning
   -EBUSY, and cause the pm thread to busyloop trying to enter suspend.

Fix this by using the existing resuming_ports bitfield to indicate that
resume signalling timing is taken care of.
Check if resume_done[port] is set before using it for timestamp
comparison, and also clear out any resume signalling related variables
if the port is not in U0 or Resume state.
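
The unset-timestamp problem from issue 2 and the guard described in the
fix can be sketched in user space as follows. model_time_after_eq()
imitates the wrap-safe comparison style of time_after_eq(). The two
resume_done helpers are hypothetical names for illustration, not
functions from the driver.

```c
#include <stdbool.h>

/* Wrap-safe "a is at or after b" comparison, in the style of time_after_eq(). */
bool model_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Unguarded check: a zero (unset) timestamp looks as if it already expired. */
bool model_resume_done_unguarded(unsigned long now, unsigned long resume_done)
{
    return model_time_after_eq(now, resume_done);
}

/* Guarded check: an unset timestamp never counts as expired. */
bool model_resume_done_guarded(unsigned long now, unsigned long resume_done)
{
    return resume_done != 0 && model_time_after_eq(now, resume_done);
}
```

With the guard, a port whose timestamp was never set is not moved to U0
early just because the comparison against zero succeeds.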

This issue was discovered when a PM thread busylooped, trying to runtime
suspend the xhci USB 2 roothub on a Dell XPS.

Cc: stable <stable@vger.kernel.org>
Reported-by: Daniel J Blueman <daniel@quora.org>
Tested-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 29028940

(cherry picked from commit f69115fdbc1ac0718e7d19ad3caa3da2ecfe1c96)
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago xhci: Fix a race in usb2 LPM resume, blocking U3 for usb2 devices
Mathias Nyman [Wed, 18 Nov 2015 08:48:22 +0000 (10:48 +0200)]
xhci: Fix a race in usb2 LPM resume, blocking U3 for usb2 devices

Clear device initiated resume variables once device is fully up and running
in U0 state.

Resume needs to be signaled for 20ms for usb2 devices before they can be
moved to U0 state.

An interrupt is triggered if a device initiates resume. As we handle the
event in interrupt context we cannot sleep for 20ms, so we instead set
a resume flag, a timestamp, and start the roothub polling.

The roothub code will later move the port to U0 when it finds a port in
resume state with the resume flag set, and timestamp passed by 20ms.

A host initiated resume is however not done in interrupt context, and
host initiated resume code will directly signal resume, wait 20ms and then
move the port to U0.

These two codepaths can race. If we are in the middle of a host initiated
resume, while sleeping for 20ms, we may handle a port event and find the
port in resume state. The port event handling code will assume the resume
was device initiated and set the resume flag and timestamp.

Root hub code will however not catch the port in resume state again as the
host initiated resume code has already moved the port to U0.
The resume flag and timestamp will remain set for this port, preventing the
port from suspending again (LPM setting port to U3).

Fix this for now by always clearing the device initiated resume parameters
once the port is in U0.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 29028940

(cherry picked from commit dad67d5f3d0efe01d38c6cebcb6698280e51927b)
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/usb/host/xhci-hub.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
Andrea Arcangeli [Fri, 14 Dec 2018 22:17:17 +0000 (14:17 -0800)]
userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered

Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
could trigger a harmless false positive WARN_ON.  Check that the vma is
already registered before checking VM_MAYWRITE to shut off the false
positive warning.
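
The ordering of the two checks can be sketched in user space as follows.
The flag values and model_unregister_should_warn() are hypothetical
placeholders chosen for illustration. They are not the kernel's
definitions or the code changed by this commit.

```c
#include <stdbool.h>

/* Illustrative flag bits; the values are placeholders, not the kernel's. */
#define MODEL_VM_MAYWRITE  0x1u
#define MODEL_VM_UFFD      0x2u   /* vma is registered with a userfaultfd */

/*
 * Returns true when the unregister path should warn about a vma.
 * Checking registration first means unregistered, non-writable
 * ranges are skipped silently instead of tripping the warning.
 */
bool model_unregister_should_warn(unsigned int vm_flags)
{
    if (!(vm_flags & MODEL_VM_UFFD))
        return false;                        /* not registered: nothing to check */
    return !(vm_flags & MODEL_VM_MAYWRITE);  /* registered but not writable */
}
```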

Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com
Cc: <stable@vger.kernel.org>
Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 29163750
CVE: CVE-2018-18397

commit 01e881f5a1fca4677e82733061868c6d6ea05ca7 upstream

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/userfaultfd.c

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
Andrea Arcangeli [Fri, 30 Nov 2018 22:09:32 +0000 (14:09 -0800)]
userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas

After the VMA to register the uffd onto is found, check that it has
VM_MAYWRITE set before allowing registration.  This way we inherit all
common code checks before allowing file holes in shmem and
hugetlbfs to be filled with UFFDIO_COPY.

The userfaultfd memory model is not applicable for readonly files unless
it's a MAP_PRIVATE.

Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
Fixes: ff62a3421044 ("hugetlb: implement memfd sealing")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Reported-by: Jann Horn <jannh@google.com>
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 29163750
CVE: CVE-2018-18397

commit 29ec90660d68bbdd69507c1c8b4e33aa299278b1 upstream

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/userfaultfd.c
mm/userfaultfd.c

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago x86/apic/x2apic: set affinity of a single interrupt to one cpu
Jianchao Wang [Wed, 9 Jan 2019 08:20:24 +0000 (03:20 -0500)]
x86/apic/x2apic: set affinity of a single interrupt to one cpu

A customer wanted to offline CPUs until only 2 remained per node. As a
result, an lpfc HBA stopped working because no irq vectors were
available:
[   51.031812] IRQ 284 set affinity failed because there are no available vectors.  The device assigned to this IRQ is unstable.
[   51.031817] IRQ 285 set affinity failed because there are no available vectors.  The device assigned to this IRQ is unstable.
[   51.031822] IRQ 286 set affinity failed because there are no available vectors.  The device assigned to this IRQ is unstable.
[   51.031827] IRQ 287 set affinity failed because there are no available vectors.  The device assigned to this IRQ is unstable.

This was caused by cluster_vector_allocation_domain, which tries to set
the affinity of a single interrupt to multiple CPUs and therefore needs
the same irq vector to be available on all of those CPUs. That is hard to
satisfy in the customer's case, where there are many HBAs on node 0 and only
2 or 4 CPUs online there.

In fact, this feature has been dropped upstream:
https://lkml.org/lkml/2017/9/13/576
We disable this feature by setting just one cpu in retmask in
cluster_vector_allocation_domain.
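
Reducing a set of candidate CPUs to a single one can be sketched in user
space with a plain bitmask. model_single_cpu_mask() is a hypothetical
helper, and picking the lowest set bit is an assumption made for
illustration. The commit message does not say which CPU the patch
selects.

```c
/*
 * Reduce a candidate CPU mask to a single CPU (the lowest set bit).
 * Returns 0 when the input mask is empty.
 */
unsigned long model_single_cpu_mask(unsigned long candidates)
{
    /* x & -x isolates the lowest set bit; written with unsigned arithmetic. */
    return candidates & (~candidates + 1UL);
}
```

Because only one bit survives, a vector has to be free on just that one
CPU rather than on every CPU in the cluster.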

The customer that encountered this issue was running RHCK. Since UEK4 has
the same code, post the same patch for UEK4.

Orabug: 29196396

Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago xen/blkback: rework validate_io_op()
Dongli Zhang [Wed, 23 Jan 2019 07:48:36 +0000 (15:48 +0800)]
xen/blkback: rework validate_io_op()

Rework many if statements in validate_io_op() into a switch statement.

Orabug: 29199843

Suggested-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago xen/blkback: optimize validate_io_op() to filter BLKIF_OP_RESERVED_1 operation
Dongli Zhang [Wed, 23 Jan 2019 07:48:00 +0000 (15:48 +0800)]
xen/blkback: optimize validate_io_op() to filter BLKIF_OP_RESERVED_1 operation

Instead of hardcoding the operation value 4, define BLKIF_OP_RESERVED_1 = 4
in the header file and use that name.
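
A minimal user-space sketch of filtering on a named constant rather than
a bare number. Only the value 4 for the reserved operation comes from the
commit message. The enum, the read and write values and
model_op_is_supported() are illustrative assumptions and do not reproduce
validate_io_op().

```c
#include <stdbool.h>

/*
 * Illustrative operation codes. RESERVED_1 = 4 is taken from the text;
 * the read and write values are assumptions for the sketch.
 */
enum model_blkif_op {
    MODEL_BLKIF_OP_READ       = 0,
    MODEL_BLKIF_OP_WRITE      = 1,
    MODEL_BLKIF_OP_RESERVED_1 = 4
};

/* Hypothetical filter: true for operations this sketch accepts. */
bool model_op_is_supported(int op)
{
    switch (op) {
    case MODEL_BLKIF_OP_READ:
    case MODEL_BLKIF_OP_WRITE:
        return true;
    case MODEL_BLKIF_OP_RESERVED_1:   /* named constant instead of a bare 4 */
    default:
        return false;                 /* reject instead of crashing */
    }
}
```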

Orabug: 29199843

Suggested-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago xen/blkback: do not BUG() for invalid blkif_request from frontend
Dongli Zhang [Wed, 23 Jan 2019 07:47:33 +0000 (15:47 +0800)]
xen/blkback: do not BUG() for invalid blkif_request from frontend

Upstream commit 0e367ae46503 ("xen/blkback: correctly respond to unknown,
non-native requests") fixed a bug to correctly respond to unknown,
non-native requests,  e.g., BLKIF_OP_RESERVED_1 or BLKIF_OP_PACKET for
64-bit SLES 11 guests when using a 32-bit backend.

Although that fix is already in UEK4, it was broken by commit f0af2f840606
("xen-blkback: move indirect req allocation out-of-line"), which
reintroduced the BUG().

This patch removes the BUG() so that an invalid blkif_request from the
frontend can no longer panic the backend.

Orabug: 29199843

Fixes: f0af2f840606 ("xen-blkback: move indirect req allocation out-of-line")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago net/rds: WARNING: at net/rds/recv.c:222 rds_recv_hs_exthdrs+0xf8/0x1e0
Venkat Venkatsubra [Fri, 18 Jan 2019 16:25:21 +0000 (08:25 -0800)]
net/rds: WARNING: at net/rds/recv.c:222 rds_recv_hs_exthdrs+0xf8/0x1e0

The stack trace looks as follows:

WARNING: at net/rds/recv.c:222 rds_recv_hs_exthdrs+0xf8/0x1e0 [rds]()
Call Trace:
dump_stack+0x63/0x81
warn_slowpath_common+0x8a/0xc0
warn_slowpath_null+0x1a/0x20
rds_recv_hs_exthdrs+0xf8/0x1e0 [rds]
rds_recv_local.isra.7+0x396/0x440 [rds]
rds_recv_incoming+0x2d8/0x3c0 [rds]
rds_ib_recv_cqe_handler+0x44f/0x6d0 [rds_rdma]
poll_rcq+0x7a/0xa0 [rds_rdma]
rds_ib_rx+0xa4/0x220 [rds_rdma]
rds_ib_tasklet_fn_recv+0x30/0x40 [rds_rdma]
...

commit 041dc3e4d3
("Backport multipath RDS from upstream to UEK4") treats an
incoming rds ping or rds pong differently if the local (in case of pong) or
sender's port (in case of ping) is 1 (RDS_FLAG_PROBE_PORT).

There is nothing stopping rds-ping from picking this port for its local side,
since it does a wildcard socket bind.

The fix is to check for t_mp_capable transport.

Orabug: 29201779

Reviewed-by: Haakon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago xen-netback: wake up xenvif_dealloc_kthread when it should stop
Dongli Zhang [Mon, 21 Jan 2019 01:56:55 +0000 (09:56 +0800)]
xen-netback: wake up xenvif_dealloc_kthread when it should stop

The feature 'staging grant' changed the behaviour of
xenvif_zerocopy_callback() that queue->dealloc_prod may not increase during
the do-while loop because of 'staging grant'. As a result,
xenvif_skb_zerocopy_complete() would not wake up xenvif_dealloc_kthread
because (prod == queue->dealloc_prod).

This causes trouble when the xenvif_dealloc_kthread is requested to stop by
xenvif_disconnect(). When xenvif_dealloc_kthread is stopped while
inflight_packets is not 0, xenvif_dealloc_kthread would not exit until
inflight_packets becomes 0.

However, because of 'staging grant', xenvif_skb_zerocopy_complete() would
not wake up xenvif_dealloc_kthread() although inflight_packets is
decremented and already becomes 0. As a result, xenvif_dealloc_kthread will
never wake up.

xenvif_skb_zerocopy_complete() should wake up xenvif_dealloc_kthread when
the latter is in the process of stopping.

Orabug: 29217927

Fixes: fdbb2e3659b3 ("xen-netback: use gref mappings for Tx requests")
Reported-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Revert "xfs: remove nonblocking mode from xfs_vm_writepage"
Wengang Wang [Tue, 29 Jan 2019 03:21:20 +0000 (19:21 -0800)]
Revert "xfs: remove nonblocking mode from xfs_vm_writepage"

This reverts commit 6e2de7d4578d4f6ae76979286de5c5ee8e91754a.

These commits very likely cause the SIGBUS issue (we cannot verify
that in the customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Revert "xfs: remove xfs_cancel_ioend"
Wengang Wang [Tue, 29 Jan 2019 03:19:09 +0000 (19:19 -0800)]
Revert "xfs: remove xfs_cancel_ioend"

This reverts commit c680537035066177fa845053354974f7245c02d8.

These commits very likely cause the SIGBUS issue (we cannot verify
that in the customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Revert "xfs: Introduce writeback context for writepages"
Wengang Wang [Tue, 29 Jan 2019 03:18:57 +0000 (19:18 -0800)]
Revert "xfs: Introduce writeback context for writepages"

This reverts commit b104054b547e9034c5c7bf763d08e9803b5b58ed.

These commits very likely cause the SIGBUS issue (we cannot verify
that in the customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Revert "xfs: xfs_cluster_write is redundant"
Wengang Wang [Tue, 29 Jan 2019 03:18:49 +0000 (19:18 -0800)]
Revert "xfs: xfs_cluster_write is redundant"

This reverts commit e58eae1b82358f6df9a88b1312cac667b3d968db.

These commits very likely cause the SIGBUS issue (we cannot verify
that in the customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Revert "xfs: factor mapping out of xfs_do_writepage"
Wengang Wang [Tue, 29 Jan 2019 03:18:40 +0000 (19:18 -0800)]
Revert "xfs: factor mapping out of xfs_do_writepage"

This reverts commit 40a82631dc131f1b7f61b2a2fe6351c382aaf04f.

These commits very likely cause the SIGBUS issue (we cannot verify
that in the customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Revert "xfs: don't chain ioends during writepage submission"
Wengang Wang [Tue, 29 Jan 2019 03:18:16 +0000 (19:18 -0800)]
Revert "xfs: don't chain ioends during writepage submission"

This reverts commit 34457adcadaf557febddb9f715368bbd5c3fd239.

These commits very likely cause the SIGBUS issue (we cannot verify
that in the customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago mstflint: Fix coding style issues - left with LINUX_VERSION_CODE
Idan Mehalel [Tue, 24 Jul 2018 12:24:09 +0000 (15:24 +0300)]
mstflint: Fix coding style issues - left with LINUX_VERSION_CODE

Description:
Issue: 1471556

Orabug: 28878697

(cherry picked from commit 30e70911bcc22ac77b13d537225d7499261caac8)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago mstflint: Fix coding-style issues
Idan Mehalel [Tue, 24 Jul 2018 08:17:46 +0000 (11:17 +0300)]
mstflint: Fix coding-style issues

Description:
Issue: 1471556

Orabug: 28878697

(cherry picked from commit d514e6f02dcd8436e864e8113fe010898be56d10)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago mstflint: Fix errors found with checkpatch script
Idan Mehalel [Mon, 23 Jul 2018 12:24:12 +0000 (15:24 +0300)]
mstflint: Fix errors found with checkpatch script

Description:
Issue: 1471556

Title: Fix compilation issue

Description:
Issue: N/A

Orabug: 28878697

(cherry picked from commit 8154be122d0f841208b787b728085c565710e0f7
and dfec3c77f977344d234c93704e59a5ca12832ab1)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'
Squashed two commits since the 1st commit has a compilation
issue which is fixed by the 2nd commit.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Added support for 5th Gen devices in Secure Boot module and mtcr
Adham Masarwah [Tue, 13 Mar 2018 09:21:09 +0000 (11:21 +0200)]
Added support for 5th Gen devices in Secure Boot module and mtcr

Signed-off-by: Adham Masarwah <adham@mellanox.com>
Orabug: 28878697

(cherry picked from commit 4cbcf2923e05d74694fa2a5355960ca979ee8a97)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Fix typos in mst_kernel
Adham Masarwah [Mon, 26 Feb 2018 08:16:07 +0000 (10:16 +0200)]
Fix typos in mst_kernel

Signed-off-by: Adham Masarwah <adham@mellanox.com>
Orabug: 28878697

(cherry picked from commit 5fd539b720c95b557f55aa6465fc220415d3dca4)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago bnxt_en: Report PCIe link properties with pcie_print_link_status()
Brian Maly [Thu, 25 Oct 2018 20:50:38 +0000 (16:50 -0400)]
bnxt_en: Report PCIe link properties with pcie_print_link_status()

Orabug: 28942099

Previously the driver used pcie_get_minimum_link() to warn when the NIC
is in a slot that can't supply as much bandwidth as the NIC could use.

pcie_get_minimum_link() can be misleading because it finds the slowest link
and the narrowest link (which may be different links) without considering
the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
bandwidth, not the true available bandwidth of about 1969 MB/s for a
16 GT/s x1 link.

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.  This finds
the slowest link in the path to the device by computing the total bandwidth
of each link and compares that with the capabilities of the device.

The dmesg change is:

  - PCIe: Speed %s Width x%d
  + %u.%03u Gb/s available PCIe bandwidth (%s x%d link)

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[backport of upstream commit af125b754e2f09e6061e65db8f4eda0f7730011d]

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
[backport of UEK5 commit 48b32a7f2b4dddafbf42cde882c3c84c556fb477]
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt_compat.c
drivers/net/ethernet/broadcom/bnxt/bnxt_compat.h

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago selinux: Perform both commoncap and selinux xattr checks
Eric W. Biederman [Mon, 2 Oct 2017 14:38:20 +0000 (09:38 -0500)]
selinux: Perform both commoncap and selinux xattr checks

When selinux is loaded the relaxed permission checks for writing
security.capability are not honored, which keeps file capabilities
from being used in user namespaces.

Stephen Smalley <sds@tycho.nsa.gov> writes:
> Originally SELinux called the cap functions directly since there was no
> stacking support in the infrastructure and one had to manually stack a
> secondary module internally.  inode_setxattr and inode_removexattr
> however were special cases because the cap functions would check
> CAP_SYS_ADMIN for any non-capability attributes in the security.*
> namespace, and we don't want to impose that requirement on setting
> security.selinux.  Thus, we inlined the capabilities logic into the
> selinux hook functions and adapted it appropriately.

Now that the permission checks in commoncap have evolved this
inlining of their contents has become a problem.  So restructure
selinux_inode_removexattr, and selinux_inode_setxattr to call
both the corresponding cap_inode_ function and dentry_has_perm
when the attribute is not a selinux security xattr.   This ensures
the policies of both commoncap and selinux are enforced.

This results in smack and selinux having the same basic structure
for setxattr and removexattr.  Performing their own special permission
checks when it is their module's xattr being written to, and deferring
to commoncap when that is not the case.  Then finally performing their
generic module policy on all xattr writes.

This structure is fine when you only consider stacking with the
commoncap lsm, but it becomes a problem if two lsms that don't want
the commoncap security checks on their own attributes need to be
stacked.  This means there will need to be updates in the future as lsm
stacking is improved, but at least now the structure between smack and
selinux is common making the code easier to refactor.

This change also has the effect that selinux_inode_setotherxattr becomes
unnecessary so it is removed.

Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities")
Fixes: 7bbf0e052b76 ("[PATCH] selinux merge")
Historical Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit 6b240306ee1631587a87845127824df54a0a5abe)

Orabug: 28951521

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
security/selinux/hooks.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Introduce v3 namespaced file capabilities
Serge E. Hallyn [Mon, 8 May 2017 18:11:56 +0000 (13:11 -0500)]
Introduce v3 namespaced file capabilities

Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

This patch introduces v3 of the security.capability xattr.  It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.

When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change.  Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid.  This is done during the
execution of setxattr() to catch user-space-initiated capability writes.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.

Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.

If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace.  The kernel will
translate that rootid to an absolute uid, and write that to disk.  After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.

Only a single security.capability xattr may exist at a time for a given
file.  A task may overwrite an existing xattr so long as it is
privileged over the inode.  Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr.  This check can be re-added if deemed useful.

This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.

Example using tar:

 $ cp /bin/sleep sleepx
 $ mkdir b1 b2
 $ lxc-usernsexec -m b:0:100000:1 -m b:1:$(id -u):1 -- chown 0:0 b1
 $ lxc-usernsexec -m b:0:100001:1 -m b:1:$(id -u):1 -- chown 0:0 b2
 $ lxc-usernsexec -m b:0:100000:1000 -- tar --xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx
 $ lxc-usernsexec -m b:0:100001:1000 -- tar --xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar
 $ lxc-usernsexec -m b:0:100001:1000 -- getcap b2/sleepx
   b2/sleepx = cap_sys_admin+ep
 # /opt/ltp/testcases/bin/getv3xattr b2/sleepx
   v3 xattr, rootid is 100001

A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp

Changelog:
   Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
   Nov 07 2016: convert rootid from and to fs user_ns
   (From ebiederm: mar 28 2017)
     commoncap.c: fix typos - s/v4/v3
     get_vfs_caps_from_disk: clarify the fs_ns root access check
     nsfscaps: change the code split for cap_inode_setxattr()
   Apr 09 2017:
       don't return v3 cap for caps owned by current root.
      return a v2 cap for a true v2 cap in non-init ns
   Apr 18 2017:
      . Change the flow of fscap writing to support s_user_ns writing.
      . Remove refuse_fcap_overwrite().  The value of the previous
        xattr doesn't matter.
   Apr 24 2017:
      . incorporate Eric's incremental diff
      . move cap_convert_nscap to setxattr and simplify its usage
   May 8, 2017:
      . fix leaking dentry refcount in cap_inode_getsecurity

Signed-off-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
(cherry picked from commit 8db6c34f1dbc8e06aa016a9b829b06902c3e1340)

Orabug: 28951521

UEK4 does not support marking user namespace owner for a filesystem.
Adding that support requires cherrypicking below commit from mainline

6e4eab577a0cae15b3da9b888cff16fe57981b3e
("fs: Add user namespace member to struct super_block")

This would break KABI. So, in UEK4, the user namespace owner
for a super_block is always init_user_ns.

UEK4 also does not have the lsm hook framework which
was added to mainline by the following commit

b1d9e6b0646d0e5ee5d9050bd236b6c65d66faef
("LSM: Switch to lists of hooks")

So, this backport ignores the change in LSM_HOOK.

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
security/commoncap.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago rds: ib: Use a delay when reconnecting to the very same IP address
Håkon Bugge [Wed, 2 Jan 2019 13:59:35 +0000 (14:59 +0100)]
rds: ib: Use a delay when reconnecting to the very same IP address

An RDS IB connection may be formed from the very same IB port using
HCA level internal loop-back. If this connection attempt is performed
after RDS has cleared the ARP cache of the same IP address, an ARP IB
multicast is sent out on the IPoIB interface.

If the above scenario is performed on IPoIB interfaces that are
members of an IB Limited Partition, the ARP multicast will be dropped
by the HCA port. A corresponding PKey Violation is counted and a
corresponding PKey Violation Trap is sent to the OpenSM, subject to
rate control.

Now, due to a bug in RDS connection management, where it was not
anticipated that the peers of a connection could actually be the very
same port and have the same IP address, the reconnect attempts happen
with zero delay.

This leads to about 7700 connection attempts per second, about
4400 PKey Violations per second, and 8500 ARP multicasts per second.

This commit reduces the reconnect rate to one attempt per second. This
is because RDS uses exponential backoff to calculate the delay, which
quickly reaches rds_sysctl_reconnect_max_jiffies, which by
default is HZ; in other words, a delay of one second after the first
10 reconnects.

Orabug: 29138813

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Ka-cheong Poon <ka-cheong.poon@oracle.com>
---

v1 -> v2:
   * Amended commit message as per Ka-Cheong's suggestions

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago Change mincore() to count "mapped" pages rather than "cached" pages
Linus Torvalds [Sun, 6 Jan 2019 01:50:59 +0000 (17:50 -0800)]
Change mincore() to count "mapped" pages rather than "cached" pages

The semantics of what "in core" means for the mincore() system call are
somewhat unclear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache" rather than "page is mapped in the mapping".

The problem with that traditional semantic is that it exposes a lot of
system cache state that it really probably shouldn't, and that users
shouldn't really even care about.

So let's try to avoid that information leak by simply changing the
semantics to be that mincore() counts actual mapped pages, not pages
that might be cheaply mapped if they were faulted (note the "might be"
part of the old semantics: being in the cache doesn't actually guarantee
that you can access them without IO anyway, since things like network
filesystems may have to revalidate the cache before use).

In many ways the old semantics were somewhat insane even aside from the
information leak issue.  From the very beginning (and that beginning is
a long time ago: 2.3.52 was released in March 2000, I think), the code
had a comment saying

  Later we can get more picky about what "in core" means precisely.

and this is that "later".  Admittedly it is much later than is really
comfortable.

NOTE! This is a real semantic change, and it is for example known to
change the output of "fincore", since that program literally does a
mmap without populating it, and then doing "mincore()" on that mapping
that doesn't actually have any pages in it.

I'm hoping that nobody actually has any workflow that cares, and the
info leak is real.

We may have to do something different if it turns out that people have
valid reasons to want the old semantics, and if we can limit the
information leak sanely.

Cc: Kevin Easton <kevin@guarana.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Masatake YAMATO <yamato@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 574823bfab82d9d8fa47f422778043fbb4b4f50e)
Orabug: 29187415
CVE: CVE-2019-5489
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: John Donnelly <John.P.Donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
mm/mincore.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1
Kinglong Mee [Thu, 30 Jul 2015 13:55:02 +0000 (21:55 +0800)]
NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1

According to rfc5661 18.16.4,
"If EXCLUSIVE4_1 was used, the client determines the attributes
 used for the verifier by comparing attrset with cva_attrs.attrmask;"

So, EXCLUSIVE4_1 also needs those bitmasks used to store the verifier.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Orabug: 29204157

(cherry picked from commit ead8fb8c24411722b92198b3dccd102a76cdd050)
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Bill Baker <Bill.Baker@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago ext4: update i_disksize when new eof exceeds it
Shan Hai [Sat, 29 Dec 2018 05:34:53 +0000 (13:34 +0800)]
ext4: update i_disksize when new eof exceeds it

Orabug: 28940828

This patch is a helper for backporting upstream commit 45d8ec4d9fd5
(ext4: update i_disksize if direct write past ondisk size). It adds a condition
to allow updating i_disksize through calling ext4_ind_direct_IO when the new
eof exceeds both i_size and i_disksize.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago ext4: update i_disksize if direct write past ondisk size
Eryu Guan [Thu, 22 Mar 2018 15:44:59 +0000 (11:44 -0400)]
ext4: update i_disksize if direct write past ondisk size

Currently in ext4 direct write path, we update i_disksize only when
new eof is greater than i_size, and don't update it even when new
eof is greater than i_disksize but less than i_size. This doesn't
work well with delalloc buffer write, which updates i_size and
i_disksize only when delalloc blocks are resolved (at writeback
time), the i_disksize from direct write can be lost if a previous
buffer write succeeded at write time but failed at writeback time,
then results in corrupted ondisk inode size.

Consider this case, first buffer write 4k data to a new file at
offset 16k with delayed allocation, then direct write 4k data to the
same file at offset 4k before delalloc blocks are resolved, which
doesn't update i_disksize because it writes within i_size(20k), but
the extent tree metadata has been committed in journal. Then
writeback of the delalloc blocks fails (due to device error etc.),
and i_size/i_disksize from buffer write can't be written to disk
(still zero). A subsequent umount/mount cycle recovers journal and
writes extent tree metadata from direct write to disk, but with
i_disksize being zero.

Fix it by updating i_disksize too in direct write path when new eof
is greater than i_disksize but less than i_size, so i_disksize is
always consistent with direct write.

This fixes occasional i_size corruption in fstests generic/475.

Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Orabug: 28940828

commit 45d8ec4d9fd5468c08f2ef0b2b132bb62dc81a3d upstream

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/indirect.c
code line mismatch

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago ext4: protect i_disksize update by i_data_sem in direct write path
Eryu Guan [Thu, 22 Mar 2018 15:41:25 +0000 (11:41 -0400)]
ext4: protect i_disksize update by i_data_sem in direct write path

i_disksize update should be protected by i_data_sem, by either taking
the lock explicitly or by using ext4_update_i_disksize() helper. But the
i_disksize updates in ext4_direct_IO_write() are not protected at all,
which may be racing with i_disksize updates in writeback path in
delalloc buffer write path.

This is found by code inspection, and I didn't hit any i_disksize
corruption due to this bug. Thanks to Jan Kara for catching this bug and
suggesting the fix!

Reported-by: Jan Kara <jack@suse.cz>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Orabug: 28940828

commit 73fdad00b208b139cf43f3163fbc0f67e4c6047c upstream

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/indirect.c
code line mismatch

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago ALSA: usb-audio: Fix UAF decrement if card has no live interfaces in card.c
Hui Peng [Mon, 3 Dec 2018 15:09:34 +0000 (16:09 +0100)]
ALSA: usb-audio: Fix UAF decrement if card has no live interfaces in card.c

If a USB sound card reports 0 interfaces, an error condition is triggered
and the function usb_audio_probe errors out. In the error path, there was a
use-after-free vulnerability where the memory object of the card was first
freed, followed by a decrement of the number of active chips. Moving the
atomic_dec above the free fixes the UAF.

[ The original problem was introduced in 3.1 kernel, while it was
  developed in a different form.  The Fixes tag below indicates the
  original commit but it doesn't mean that the patch is applicable
  cleanly. -- tiwai ]

Fixes: 362e4e49abe5 ("ALSA: usb-audio - clear chip->probing on error exit")
Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Signed-off-by: Hui Peng <benquike@gmail.com>
Signed-off-by: Mathias Payer <mathias.payer@nebelwelt.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Orabug: 29042981
CVE: CVE-2018-19824
(cherry picked from commit 5f8cf712582617d523120df67d392059eaf2fc4b)
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago ALSA: usb-audio: Replace probing flag with active refcount
Takashi Iwai [Wed, 26 Aug 2015 08:20:59 +0000 (10:20 +0200)]
ALSA: usb-audio: Replace probing flag with active refcount

We can use active refcount for preventing autopm during probe.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Orabug: 29042981
CVE: CVE-2018-19824
(cherry picked from commit a6da499b76b1a75412f047ac388e9ffd69a5c55b)
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago ALSA: usb-audio: Avoid nested autoresume calls
Takashi Iwai [Tue, 25 Aug 2015 14:09:00 +0000 (16:09 +0200)]
ALSA: usb-audio: Avoid nested autoresume calls

After the recent fix of runtime PM for USB-audio driver, we got a
lockdep warning like:

  =============================================
  [ INFO: possible recursive locking detected ]
  4.2.0-rc8+ #61 Not tainted
  ---------------------------------------------
  pulseaudio/980 is trying to acquire lock:
   (&chip->shutdown_rwsem){.+.+.+}, at: [<ffffffffa0355dac>] snd_usb_autoresume+0x1d/0x52 [snd_usb_audio]
  but task is already holding lock:
   (&chip->shutdown_rwsem){.+.+.+}, at: [<ffffffffa0355dac>] snd_usb_autoresume+0x1d/0x52 [snd_usb_audio]

This comes from snd_usb_autoresume() invoking down_read() and it's
used in a nested way.  Although it's basically safe, per se (as these
are read locks), it's better to reduce such spurious warnings.

The read lock is needed to guarantee the execution of "shutdown"
(cleanup at disconnection) task after all concurrent tasks are
finished.  This can be implemented in another better way.

Also, the current check of chip->in_pm isn't good enough for
protecting the racy execution of multiple auto-resumes.

This patch rewrites the logic of snd_usb_autoresume() & co; namely,
- The recursive call of autopm is avoided by the new refcount,
  chip->active.  The chip->in_pm flag is removed accordingly.
- Instead of rwsem, another refcount, chip->usage_count, is introduced
  for tracking the period to delay the shutdown procedure.  At
  the last clear of this refcount, wake_up() to the shutdown waiter is
  called.
- The shutdown flag is replaced with shutdown atomic count; this is
  for reducing the lock.
- Two new helpers are introduced to simplify the management of these
  refcounts; snd_usb_lock_shutdown() increases the usage_count, checks
  the shutdown state, and does autoresume.  snd_usb_unlock_shutdown()
  does the opposite.  Most of mixer and other codes just need this,
  and simply returns an error if it receives an error from lock.

Fixes: 9003ebb13f61 ('ALSA: usb-audio: Fix runtime PM unbalance')
Reported-and-tested-by: Alexnader Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Orabug: 29042981
CVE: CVE-2018-19824
(cherry picked from commit 47ab154593827b1a8f0713a2b9dd445753d551d8)
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>
Conflict:

sound/usb/mixer.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago ext4: validate that metadata blocks do not overlap superblock
Theodore Ts'o [Mon, 1 Aug 2016 04:51:02 +0000 (00:51 -0400)]
ext4: validate that metadata blocks do not overlap superblock

A number of fuzzing failures seem to be caused by allocation bitmaps
or other metadata blocks being pointed at the superblock.

This can cause kernel BUG or WARNings once the superblock is
overwritten, so validate the group descriptor blocks to make sure this
doesn't happen.

Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 829fa70dddadf9dd041d62b82cd7cea63943899d)

Orabug: 29114440
CVE: CVE-2018-1094

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years ago ext4: update inline int ext4_has_metadata_csum(struct super_block *sb)
John Donnelly [Mon, 7 Jan 2019 18:10:50 +0000 (10:10 -0800)]
ext4: update inline int ext4_has_metadata_csum(struct super_block *sb)

 to include ext4_has_feature_metadata_csum(sb) check.

Orabug: 29114440
CVE: CVE-2018-1094

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoext4: always initialize the crc32c checksum driver
Theodore Ts'o [Fri, 30 Mar 2018 02:10:31 +0000 (22:10 -0400)]
ext4: always initialize the crc32c checksum driver

The extended attribute code now uses the crc32c checksum for hashing
purposes, so we should just always initialize it.  We also want
to prevent NULL pointer dereferences if one of the metadata checksum
features is enabled after the file system is originally mounted.

This issue has been assigned CVE-2018-1094.

https://bugzilla.kernel.org/show_bug.cgi?id=199183
https://bugzilla.redhat.com/show_bug.cgi?id=1560788

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
(cherry picked from commit a45403b51582a87872927a3e0fc0a389c26867f1)

Orabug: 29114440
CVE: CVE-2018-1094

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoRevert "bnxt_en: Reduce default rings on multi-port cards."
Brian Maly [Thu, 10 Jan 2019 17:05:29 +0000 (12:05 -0500)]
Revert "bnxt_en: Reduce default rings on multi-port cards."

Orabug: 28687746

This reverts commit 143bdb401ce42631af3030f192c8fa6d148b9197.

This commit caused IRQs per dev to be reduced from 8 to 4 which resulted in TPCC throughput dropping by 18%.
Revert this commit so we have 8 IRQs per dev again.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
6 years agomlx4_core: Disable P_Key Violation Traps
Håkon Bugge [Thu, 4 Oct 2018 11:04:38 +0000 (13:04 +0200)]
mlx4_core: Disable P_Key Violation Traps

Exadata virt edition, actively using IB partitions, is exposed to
excessive P_Key Violation Traps being sent to the SM. This is close to
a DoS attack. In addition, the OpenSM logs are flooded with these
messages, hiding potential other log messages deemed important to
investigate customer issues.

In fw version 2.35.6312, the traps are disabled, still counting the
P-Key Violations.

This commit will conditionally disable the P_Key Violation Traps
subject to fw version.

Orabug: 27693633

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
---

v1 -> v2:
   * Incorporated review comments from jch
   * Made the disabling dependent on fw version

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agords: RDS connection does not reconnect after CQ access violation error
Venkat Venkatsubra [Mon, 7 Jan 2019 13:04:16 +0000 (05:04 -0800)]
rds: RDS connection does not reconnect after CQ access violation error

The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown forever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 28733324

Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoKVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
KarimAllah Ahmed [Sat, 3 Feb 2018 14:56:23 +0000 (15:56 +0100)]
KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL

[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]

... basically doing exactly what we do for VMX:

- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
  actually used it.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b2ac58f90540e39324e7a29a7ad471407ae0bf48)

Orabug: 28069548

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kvm/svm.c
Contextual and also we dropped msr_write_intercepted because we do not use it
(we have other logic for IBRS usage). No changes to svm_vcpu_run() because we
support IBRS and we have other code in place.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoKVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL - reloaded
Mihai Carabas [Fri, 7 Dec 2018 13:09:51 +0000 (15:09 +0200)]
KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL - reloaded

This commit fills in the blanks that were missed in the backport
26a0cd21bb76 ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL") due to lack
of different interfaces. 26a0cd21bb76 ("KVM/VMX: Allow direct access to
MSR_IA32_SPEC_CTRL") is basically an incomplete cherry-pick from
d28b387fb74da95d69d2615732f50cceb38e9a4d.

Also added the interception of MSR_IA32_SPEC_CTRL and
MSR_IA32_PRED_CMD so that the get/set MSR handling makes sense.

Orabug: 28069548

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoKVM/x86: Add IBPB support
Ashok Raj [Thu, 1 Feb 2018 21:59:43 +0000 (22:59 +0100)]
KVM/x86: Add IBPB support

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate against three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issuing IBPB when we do a guest switch.

* Mitigate attacks from guest/ring3->host/ring3.
  These would require an IBPB during context switch in host, or after
  VMEXIT. The host process has two ways to mitigate
  - Either it can be compiled with retpoline
  - If it's going through context switch, and has set !dumpable then
    there is an IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where after a VMEXIT you return back to Qemu might make
    Qemu attackable from guest when Qemu isn't compiled with retpoline.
  There are issues reported when doing IBPB on every VMEXIT that resulted
  in some tsc calibration woes in guest.

* Mitigate guest/ring0->host/ring0 attacks.
  When host kernel is using retpoline it is safe against these attacks.
  If host kernel isn't using retpoline we might need to do an IBPB flush on
  every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here to get documentation about mitigations.

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
        PRED_CMD is a write-only MSR]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: kvm@vger.kernel.org
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de
(cherry picked from commit 15d45071523d89b3fb7372e2135fbd72f6af9506)

Orabug: 28069548

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kvm/cpuid.c
arch/x86/kvm/svm.c
arch/x86/kvm/vmx.c

All the conflicts were contextual. Major differences in the code between UEK4
and upstream (also in UEK4 we only have the feature IBRS, not SPEC_CTRL). We
had to introduce guest_cpuid_has_* functions in cpuid.h for each feature. Also
moved defines in cpuid.h that were needed in cpuid.h and cpuid.c.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoKVM: x86: pass host_initiated to functions that read MSRs
Paolo Bonzini [Fri, 15 Jun 2018 09:04:25 +0000 (12:04 +0300)]
KVM: x86: pass host_initiated to functions that read MSRs

SMBASE is only readable from SMM for the VCPU, but it must be always
accessible if userspace is accessing it.  Thus, all functions that
read MSRs are changed to accept a struct msr_data; the host_initiated
and index fields are pre-initialized, while the data field is filled
on return.
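
The calling convention can be pictured with the userspace sketch below.
It is an illustration only: the _sketch types and get_msr_sketch() are
hypothetical, the SMBASE index value is just a placeholder for the
example, and only the SMBASE rule described above is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_SMBASE_SKETCH 0x9eu	/* placeholder index used by this example */

struct msr_data_sketch {
	bool host_initiated;	/* true when userspace, not the guest, asks */
	uint32_t index;		/* which MSR to read; set by the caller */
	uint64_t data;		/* filled in on success */
};

struct vcpu_sketch {
	bool in_smm;		/* VCPU is currently in SMM */
	uint64_t smbase;
};

/* Returns 0 on success, 1 if the access must be refused. */
int get_msr_sketch(const struct vcpu_sketch *vcpu, struct msr_data_sketch *msr)
{
	assert(vcpu != NULL && msr != NULL);
	switch (msr->index) {
	case MSR_SMBASE_SKETCH:
		/* Guest reads need SMM; host-initiated reads always pass. */
		if (!msr->host_initiated && !vcpu->in_smm)
			return 1;
		msr->data = vcpu->smbase;
		return 0;
	default:
		return 1;	/* other MSRs are out of scope for the sketch */
	}
}
```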

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 609e36d372ad9329269e4a1467bd35311893d1d6)

Orabug: 28069548

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoKVM: VMX: make MSR bitmaps per-VCPU
Paolo Bonzini [Fri, 15 Jun 2018 09:04:24 +0000 (12:04 +0300)]
KVM: VMX: make MSR bitmaps per-VCPU

Place the MSR bitmap in struct loaded_vmcs, and update it in place
every time the x2apic or APICv state can change.  This is rare and
the loop can handle 64 MSRs per iteration, in a similar fashion as
nested_vmx_prepare_msr_bitmap.

This prepares for choosing, on a per-VM basis, whether to intercept
the SPEC_CTRL and PRED_CMD MSRs.
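
For readers unfamiliar with the bitmap, the sketch below shows how one
MSR's intercept bit could be located and flipped in a private 4K bitmap.
It is a userspace illustration, not the KVM code: all function names are
hypothetical, and the layout (read-low at 0x000, read-high at 0x400,
write-low at 0x800, write-high at 0xc00, a set bit meaning "intercept")
is the usual VMX arrangement as understood here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSR_BITMAP_BYTES 4096u

/* Locate the bit for (msr, access type); false if the MSR is not covered. */
static bool msr_bit(uint32_t msr, bool write, size_t *byte, uint8_t *mask)
{
	size_t base = write ? 0x800 : 0x000;

	if (msr >= 0xc0000000u && msr <= 0xc0001fffu) {
		base += 0x400;		/* "high" MSR range */
		msr &= 0x1fffu;
	} else if (msr > 0x1fffu) {
		return false;		/* outside both ranges */
	}
	*byte = base + msr / 8;
	*mask = (uint8_t)(1u << (msr % 8));
	return true;
}

/* A fresh bitmap intercepts every MSR access. */
void msr_bitmap_init(uint8_t *bitmap)
{
	assert(bitmap != NULL);
	memset(bitmap, 0xff, MSR_BITMAP_BYTES);
}

/* Set (intercept) or clear (pass through) one access type for one MSR. */
bool msr_bitmap_set(uint8_t *bitmap, uint32_t msr, bool write, bool intercept)
{
	size_t byte;
	uint8_t mask;

	assert(bitmap != NULL);
	if (!msr_bit(msr, write, &byte, &mask))
		return false;
	if (intercept)
		bitmap[byte] |= mask;
	else
		bitmap[byte] &= (uint8_t)~mask;
	return true;
}

/* MSRs outside the covered ranges are treated as always intercepted. */
bool msr_bitmap_intercepted(const uint8_t *bitmap, uint32_t msr, bool write)
{
	size_t byte;
	uint8_t mask;

	assert(bitmap != NULL);
	if (!msr_bit(msr, write, &byte, &mask))
		return true;
	return (bitmap[byte] & mask) != 0;
}
```

With one such bitmap per VCPU, clearing the read bit for index 0x48
(SPEC_CTRL) on one VCPU leaves every other VCPU's interception intact.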

Cc: stable@vger.kernel.org # prereq for Spectre mitigation
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry-picked from 904e14fb7cb96401a7dc803ca2863fd5ba32ffe6)

Orabug: 28069548

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kvm/vmx.c
Contextual - different content. Also vmx_enable_intercept_for_msr was already
in UEK4 as part of commit 8d14695f9542e9e0195d6e41ddaa52c32322adf5. We just
changed the signature.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoKVM: VMX: introduce alloc_loaded_vmcs
Paolo Bonzini [Fri, 15 Jun 2018 09:04:23 +0000 (12:04 +0300)]
KVM: VMX: introduce alloc_loaded_vmcs

Group together the calls to alloc_vmcs and loaded_vmcs_init.  Soon we'll also
allocate an MSR bitmap there.

Cc: stable@vger.kernel.org # prereq for Spectre mitigation
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry-picked from f21f165ef922c2146cc5bdc620f542953c41714b)

Orabug: 28069548

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
    arch/x86/kvm/vmx.c
Contextual

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoKVM: nVMX: Eliminate vmcs02 pool
Jim Mattson [Fri, 15 Jun 2018 09:04:22 +0000 (12:04 +0300)]
KVM: nVMX: Eliminate vmcs02 pool

The potential performance advantages of a vmcs02 pool have never been
realized. To simplify the code, eliminate the pool. Instead, a single
vmcs02 is allocated per VCPU when the VCPU enters VMX operation.

Cc: stable@vger.kernel.org # prereq for Spectre mitigation
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Ameya More <ameya.more@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry-picked from de3a0021a60635de96aa92713c1a31a96747d72c)

Orabug: 28069548

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoKVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
Radim Krčmář [Fri, 15 Jun 2018 09:04:21 +0000 (12:04 +0300)]
KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC

msr bitmap can be used to avoid a VM exit (interception) on guest MSR
accesses.  In some configurations of VMX controls, the guest can even
directly access host's x2APIC MSRs.  See SDM 29.5 VIRTUALIZING MSR-BASED
APIC ACCESSES.

L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI.
To do so, L1 would first trick KVM to disable all possible interceptions
by enabling APICv features and then would turn those features off;
nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would
not intercept previously enabled MSRs even though they were not safe
with the new configuration.

Correctly re-enabling interceptions is not enough as a second bug would
still allow L1+L2 to access host's MSRs: msr bitmap was shared for all
VMCSs, so L1 could trigger a race to get the desired combination of msr
bitmap and VMX controls.

This fix allocates a msr bitmap for every L1 VCPU, allows only safe
x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would
have to intercept everything anyway.

Fixes: 3af18d9c5fe9 ("KVM: nVMX: Prepare for using hardware MSR bitmap")
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry-picked from d048c098218e91ed0e10dfa1f0f80e2567fe4ef7)

Orabug: 28069548

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
    arch/x86/kvm/vmx.c
Contextual: Elements like cached_vmcs12 were omitted from this cherry-pick.
They do not exist in UEK4.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoocfs2: don't clear bh uptodate for block read
Junxiao Bi [Fri, 28 Dec 2018 08:32:57 +0000 (00:32 -0800)]
ocfs2: don't clear bh uptodate for block read

For a sync io read, ocfs2_read_blocks_sync() first clears the bh uptodate
flag and submits the io, then waits for the io to complete, and finally
checks whether the bh is uptodate; if not, it returns an io error.

If two sync ios are issued for the same bh, the first io may complete and
set the uptodate flag, but just before that flag is checked, the second io
may come in and clear uptodate; ocfs2_read_blocks_sync() for the first io
then returns an IO error.

Indeed it's not necessary to clear the uptodate flag, as the io end handler
end_buffer_read_sync() will set or clear it based on whether the io
succeeded or failed.

The following messages were found on an nfs server even though the
underlying storage returned no error.

[4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
[4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
[4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
[4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
[4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5

The same issue exists in the non-sync read ocfs2_read_blocks(); fix it as well.

Link: http://lkml.kernel.org/r/20181121020023.3034-4-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 70306d9dce75abde855cefaf32b3f71eed8602a3)

Orabug: 28762940

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoocfs2: clear journal dirty flag after shutdown journal
Junxiao Bi [Fri, 28 Dec 2018 08:32:53 +0000 (00:32 -0800)]
ocfs2: clear journal dirty flag after shutdown journal

The dirty flag of the journal should be cleared at the last stage of umount.
If it is cleared before jbd2_journal_destroy(), then some metadata in an
uncommitted transaction could be lost due to an io error, but as the dirty
flag of the journal was already cleared, we can't detect that until a full
fsck is run.  This may cause a system panic or other corruption.

Link: http://lkml.kernel.org/r/20181121020023.3034-3-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@versity.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit d85400af790dba2aa294f0a77e712f166681f977)

Orabug: 28924775

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoocfs2: fix panic due to unrecovered local alloc
Junxiao Bi [Fri, 28 Dec 2018 08:32:50 +0000 (00:32 -0800)]
ocfs2: fix panic due to unrecovered local alloc

mount.ocfs2 ignores the inconsistency where the journal is clean but the
local alloc is unrecovered.  After mount, the local alloc is not empty, so
reserving clusters does not allocate a new local alloc window and the
reservation map stays empty (ocfs2_reservation_map.m_bitmap_len = 0), which
triggers the following panic.

This issue was reported at

  https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html

and the advice was to fix it during mount.  But this is a very unusual
inconsistent state; usually the journal dirty flag is cleared at the
last stage of umount, only after everything else has gone right.  We may
need to debug further to check that.  In any case, to avoid possible further
corruption, mount should be aborted and fsck should be run.

  (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered!
  found = 6518, set = 6518, taken = 8192, off = 15912372
  ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode.
  o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes
  ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode.
  o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device
  o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777
  o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
  o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
  ------------[ cut here ]------------
  kernel BUG at fs/ocfs2/reservations.c:507!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod
  CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2
  Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018
  task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000
  RIP: 0010:[<ffffffffa05e96a8>]  [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
  Call Trace:
    ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2]
    ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2]
    __ocfs2_claim_clusters+0x178/0x360 [ocfs2]
    ocfs2_claim_clusters+0x1f/0x30 [ocfs2]
    ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2]
    ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2]
    ocfs2_write_begin+0x13e/0x230 [ocfs2]
    generic_perform_write+0xbf/0x1c0
    __generic_file_write_iter+0x19c/0x1d0
    ocfs2_file_write_iter+0x589/0x1360 [ocfs2]
    __vfs_write+0xb8/0x110
    vfs_write+0xa9/0x1b0
    SyS_write+0x46/0xb0
    system_call_fastpath+0x18/0xd7
  Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85
  RIP   __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
   RSP <ffff8800ea4db668>
  ---[ end trace 566f07529f2edf3c ]---
  Kernel panic - not syncing: Fatal exception
  Kernel Offset: disabled

Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 532e1e54c8140188e192348c790317921cb2dc1c)

Orabug: 28924775

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agonet: rds: fix rds_ib_sysctl_max_recv_allocation error
Zhu Yanjun [Wed, 26 Dec 2018 00:33:02 +0000 (19:33 -0500)]
net: rds: fix rds_ib_sysctl_max_recv_allocation error

Before the commit c682e8474bd4 ("net/rds: reduce memory footprint
during ib_post_recv in IB transport"), rds_ib_allocation increases
by one. So the function atomic_add_unless will work. After the commit,
rds_ib_allocation increases by 4 if the frag is 16K. Then
atomic_add_unless will not work.
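
The failure mode can be reproduced in userspace with the sketch below.
It is an illustration under stated assumptions: add_unless() mimics the
"add a unless the value equals u" semantics with C11 atomics, and
add_if_within() is a hypothetical bounded variant shown only for
contrast; it is not claimed to be the fix applied by this commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Add a to *v unless *v == u; returns true if the add happened. */
bool add_unless(atomic_int *v, int a, int u)
{
	int cur;

	assert(v != NULL);
	cur = atomic_load(v);
	while (cur != u) {
		/* On failure cur is reloaded and the test is repeated. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Hypothetical alternative: refuse any add that would exceed max. */
bool add_if_within(atomic_int *v, int a, int max)
{
	int cur;

	assert(v != NULL);
	cur = atomic_load(v);
	while (cur + a <= max) {
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}
```

With a step of 1 the counter always lands exactly on the limit, so the
equality test stops it.  With a step of 4 and a limit that the counter
never hits exactly, the equality test is stepped over and the counter
keeps growing.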

Fixes: c682e8474bd4 ("net/rds: reduce memory footprint during ib_post_recv in IB transport")
Orabug: 28947481

Change-Id: Ib032cd170d28e403a888c86124b67892b25ed5a5
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reported-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agox86/speculation: Always disable IBRS in disable_ibrs_and_friends()
Alejandro Jimenez [Fri, 21 Dec 2018 19:15:36 +0000 (14:15 -0500)]
x86/speculation: Always disable IBRS in disable_ibrs_and_friends()

Booting with "spectre_v2=retpoline" calls disable_ibrs_and_friends(false),
which does not set ibrs_disabled. Then, when attempting to enable
IBRS by writing 1 to ibrs_enabled, change_spectre_v2_mitigation() is called
and it looks like IBRS is already in use (ibrs_used = !ibrs_disabled) so it
does not activate IBRS.

Because now we use change_spectre_v2_mitigation() to activate IBRS when
using the retpoline fallback mechanism, and this function clears
ibrs_disabled before activating IBRS, we can fix disable_ibrs_and_friends()
and set ibrs_disabled there every time.

Orabug: 29139710

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agopinctrl: amd: Use devm_pinctrl_register() for pinctrl registration
Laxman Dewangan [Wed, 24 Feb 2016 09:14:07 +0000 (14:44 +0530)]
pinctrl: amd: Use devm_pinctrl_register() for pinctrl registration

Use devm_pinctrl_register() for pin control registration and clean
error path.

Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
(cherry picked from commit 251e22abde21833b3d29577e4d8c7aaccd650eee)

Orabug: 27539246
CVE: CVE-2017-18174

Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Minor conflict which checks the pointer with IS_ERR
macro.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/pinctrl/pinctrl-amd.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agomlock: fix mlock count can not decrease in race condition
Yisheng Xie [Fri, 2 Jun 2017 21:46:43 +0000 (14:46 -0700)]
mlock: fix mlock count can not decrease in race condition

Kefeng reported that when running the following test, the mlock count in
meminfo will increase permanently:

 [1] testcase
 linux:~ # cat test_mlockal
 grep Mlocked /proc/meminfo
  for j in `seq 0 10`
  do
  for i in `seq 4 15`
  do
  ./p_mlockall >> log &
  done
  sleep 0.2
 done
 # wait some time to let mlock counter decrease and 5s may not enough
 sleep 5
 grep Mlocked /proc/meminfo

 linux:~ # cat p_mlockall.c
 #include <sys/mman.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define SPACE_LEN 4096

 int main(int argc, char ** argv)
 {
  int ret;
  void *adr = malloc(SPACE_LEN);
  if (!adr)
  return -1;

  ret = mlockall(MCL_CURRENT | MCL_FUTURE);
  printf("mlcokall ret = %d\n", ret);

  ret = munlockall();
  printf("munlcokall ret = %d\n", ret);

  free(adr);
  return 0;
 }

In __munlock_pagevec() we should decrement NR_MLOCK for each page where
we clear the PageMlocked flag.  Commit 1ebb7cc6a583 ("mm: munlock: batch
NR_MLOCK zone state updates") has introduced a bug where we don't
decrement NR_MLOCK for pages where we clear the flag, but fail to
isolate them from the lru list (e.g.  when the pages are on some other
cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
accounting gets permanently disrupted by this.

Fix it by counting the number of pages whose PageMlocked flag is cleared.
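
The difference between the two ways of counting can be shown with a
small userspace model.  This is only a sketch: struct page_sketch and
both functions are hypothetical, and a plain bool stands in for the page
flag and for whether lru isolation would succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page_sketch {
	bool mlocked;		/* stands in for the PageMlocked flag */
	bool isolated_ok;	/* whether lru isolation would succeed */
};

/* Count every page whose flag gets cleared, isolated or not. */
int munlock_delta_fixed(struct page_sketch *pages, size_t nr)
{
	int delta = 0;

	assert(pages != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (!pages[i].mlocked)
			continue;
		pages[i].mlocked = false;
		delta--;
	}
	return delta;
}

/* Buggy variant: only pages that were also isolated are counted. */
int munlock_delta_buggy(struct page_sketch *pages, size_t nr)
{
	int delta = 0;

	assert(pages != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (!pages[i].mlocked)
			continue;
		pages[i].mlocked = false;
		if (pages[i].isolated_ok)
			delta--;
	}
	return delta;
}
```

In the buggy variant a page whose flag was cleared but which could not
be isolated is never subtracted, so the counter stays too high.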

Fixes: 1ebb7cc6a583 ("mm: munlock: batch NR_MLOCK zone state updates")
Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joern Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhongjiang <zhongjiang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 70feee0e1ef331b22cc51f383d532a0d043fbdcc)

Orabug: 27677611
CVE: CVE-2017-18221

Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoperf/core: Fix the perf_cpu_time_max_percent check
Tan Xiaojun [Thu, 23 Feb 2017 06:04:39 +0000 (14:04 +0800)]
perf/core: Fix the perf_cpu_time_max_percent check

Use "proc_dointvec_minmax" instead of "proc_dointvec" to check the input
value from user-space.

Otherwise, a user can set a large value and variables such as
"sysctl_perf_event_sample_rate" will overflow, which causes a lot of
unexpected problems.
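
The difference between an unchecked and a range-checked store can be
sketched in user space as below. model_store_int_minmax() is a
hypothetical helper written for this note, not a kernel API, and the
0..100 bounds in the usage note are an assumption based on the value
being a percentage:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical model of a range-checked integer write: parse the text,
 * reject anything outside [min, max], and only then update *dst.
 */
static int model_store_int_minmax(const char *buf, int min, int max, int *dst)
{
	char *end;
	long val;

	assert(buf != NULL && dst != NULL);
	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno != 0 || end == buf || *end != '\0')
		return -EINVAL;  /* not a clean decimal number */
	if (val < min || val > max)
		return -EINVAL;  /* out of range: *dst stays untouched */
	*dst = (int)val;
	return 0;
}
```

With bounds of 0 and 100, a write of "50" is accepted, while "1000000"
and "-1" are rejected and leave the previous value in place.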

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <acme@kernel.org>
Cc: <alexander.shishkin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1487829879-56237-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1572e45a924f254d9570093abde46430c3172e3d)

Orabug: 27823815
CVE: CVE-2017-18255

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agox86/microcode/intel: Fix a wrong assignment of revision in _save_mc
Zhenzhong Duan [Fri, 1 Jun 2018 06:47:53 +0000 (14:47 +0800)]
x86/microcode/intel: Fix a wrong assignment of revision in _save_mc

We should compare the revision of the saved microcode with the current
one; otherwise revision_is_newer() always returns false.

Orabug: 28190263

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agomm: cma: fix incorrect type conversion for size during dma allocation
Rohit Vaswani [Thu, 22 Oct 2015 20:32:11 +0000 (13:32 -0700)]
mm: cma: fix incorrect type conversion for size during dma allocation

This was found during a userspace fuzzing test, when a driver (such as
ion) made a large DMA CMA allocation on behalf of userspace.

  show_stack+0x10/0x1c
  dump_stack+0x74/0xc8
  kasan_report_error+0x2b0/0x408
  kasan_report+0x34/0x40
  __asan_storeN+0x15c/0x168
  memset+0x20/0x44
  __dma_alloc_coherent+0x114/0x18c
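
The class of bug can be illustrated with a small user-space sketch. It
makes no claim about the exact types used in the kernel code paths; the
helper names and the 4 KiB page size are assumptions. A page count that
passes through a narrower integer type silently loses its high bits:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* assumes 4 KiB pages */

/* Page count kept in a 64-bit type: no information is lost. */
static uint64_t model_pages_wide(uint64_t size_bytes)
{
	return size_bytes >> MODEL_PAGE_SHIFT;
}

/* Page count squeezed through a 32-bit type: the high bits are dropped. */
static uint32_t model_pages_narrow(uint64_t size_bytes)
{
	return (uint32_t)(size_bytes >> MODEL_PAGE_SHIFT);
}
```

For a request of 2^44 bytes the wide helper reports 2^32 pages, while
the narrow helper reports 0, so later code would operate on a size that
no longer matches the request.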

Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 67a2e213e7e937c41c52ab5bc46bf3f4de469f6e)

Orabug: 28407826
CVE: CVE-2017-9725

Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agox86/speculation: Make enhanced IBRS the default spectre v2 mitigation
Alejandro Jimenez [Tue, 6 Nov 2018 04:55:04 +0000 (23:55 -0500)]
x86/speculation: Make enhanced IBRS the default spectre v2 mitigation

Currently we use retpoline as the default spectre v2 mitigation.
On future processors that support the capability, enhanced IBRS
will be the default, and otherwise retpoline will be used.
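
The selection rule described above amounts to the following sketch. The
enum and function names are hypothetical and only model the default
choice; they are not the identifiers used in bugs_64.c:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two default mitigations discussed here. */
enum model_mitigation {
	MODEL_RETPOLINE,
	MODEL_IBRS_ENHANCED,
};

/* Prefer enhanced IBRS when the CPU reports it, else fall back. */
static enum model_mitigation model_select_default(bool cpu_has_enhanced_ibrs)
{
	return cpu_has_enhanced_ibrs ? MODEL_IBRS_ENHANCED : MODEL_RETPOLINE;
}
```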

From the upstream patch at:
https://lore.kernel.org/lkml/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com/

"The reason why Enhanced IBRS is the recommended mitigation on
processors which support it is that these processors also support
CET which provides a defense against ROP attacks. Retpoline is
very similar to ROP techniques and might trigger false positives
in the CET defense."

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Co-developed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
(cherry picked from commit 79bb6288902479281622b4ba0d6723d45732a2cc from UEK5)

Orabug: 28474851

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
(In UEK4, the relevant code is in arch/x86/kernel/cpu/bugs_64.c)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agox86/speculation: Enable enhanced IBRS usage
Alejandro Jimenez [Tue, 6 Nov 2018 05:00:39 +0000 (00:00 -0500)]
x86/speculation: Enable enhanced IBRS usage

Enhanced IBRS supports an 'always on' model (aka IBRS_ALL) in
which IBRS is enabled once and never disabled, while basic IBRS
requires IBRS to be set after every transition to a more
privileged predictor mode.

IBRS is enabled at boot if selected as the spectre v2
mitigation, or by using the debugfs interface at
/sys/kernel/debug/x86/ibrs_enabled
to dynamically toggle between IBRS and retpoline during
regular system operation. In both cases, if enhanced IBRS is
available it will be preferred over basic IBRS.

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Co-developed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
(cherry picked from commit d19b574a5e5dabca5158b3331aa1a31070da753c from UEK5)

Orabug: 28474851

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
(File named cpufeature.h in UEK4)

arch/x86/kernel/cpu/bugs.c
(File named bugs_64.c in UEK4)

arch/x86/kernel/cpu/common.c
arch/x86/kernel/cpu/spec_ctrl.c
(Corresponding code located in bugs_64.c in UEK4)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agox86/speculation: functions for supporting enhanced IBRS
Alejandro Jimenez [Fri, 26 Oct 2018 16:11:32 +0000 (12:11 -0400)]
x86/speculation: functions for supporting enhanced IBRS

Indirect Branch Restricted Speculation (IBRS) is available either
with a basic support (basic IBRS) or with an enhanced support
(enhanced IBRS). Currently only basic IBRS is implemented, and it
requires IBRS to be set after every transition to a more privileged
predictor mode.

Enhanced IBRS supports an 'always on' model in which IBRS is enabled
once and never disabled. We enhance the existing functions, introduce
new identifiers, and rename existing ones in order to be able to
differentiate between basic and enhanced IBRS.

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Co-developed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
(cherry picked from commit d762c3e419e1df1e7671c346ab4247a495fbc3dd from UEK5)

Orabug: 28474851

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/spec_ctrl.h
(Differences in the IBRS related macro definitions. Not adding
spec_ctrl_flush_all_cpus() that is already in bugs_64.c. Change
set_ibrs_inuse() return value from boolean to void.)

arch/x86/kernel/cpu/bugs.c
(UEK4 uses bugs_64.c. Slight differences in
disable_ibrs_and_friends() and cpu_show_common().)

arch/x86/kernel/cpu/spec_ctrl.c
(No need to remove the spec_ctrl_flush_all_cpus(), it is in bugs_64.c)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoxen/blkback: fix disconnect while I/Os in flight
Juergen Gross [Wed, 19 Dec 2018 00:31:01 +0000 (08:31 +0800)]
xen/blkback: fix disconnect while I/Os in flight

Today disconnecting xen-blkback is broken in case there are still
I/Os in flight: xen_blkif_disconnect() will bail out early without
releasing all resources in the hope it will be called again when
the last request has terminated. This, however, won't happen as
xen_blkif_free() won't be called on termination of the last running
request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
calls in xen_blkif_disconnect() didn't happen.

To solve this deadlock xen_blkif_disconnect() and
xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
xen_blkif_get() but use some other way to do their accounting of
resources.

This at once fixes another error in xen_blkif_disconnect(): when it
returned early with -EBUSY for another ring than 0 it would call
xen_blkif_put() again for already handled rings on a subsequent call.
This will lead to inconsistencies in the refcnt handling.

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 28744234

(cherry picked from commit 46464411307746e6297a034a9983a22c9dfc5a0c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
  drivers/block/xen-blkback/xenbus.c

This backport does not target the deadlock issue, as there is no
xen_blkif_put() called in xen_blkif_disconnect() due to conflicts.

xen_blkif_disconnect() may be entered twice during VM destroy. When there
is in-flight I/O for any ring, entering xen_blkif_disconnect() for the
second time would trigger the
"WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));". The
'active' flag guarantees that a ring is skipped if it was already
cleaned up when xen_blkif_disconnect() is entered the second time.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agomlx4_vnic: use the mlid while calling ib_detach_mcast
aru kolappan [Fri, 21 Dec 2018 01:28:45 +0000 (17:28 -0800)]
mlx4_vnic: use the mlid while calling ib_detach_mcast

In mlx4_vnic, vnic_mcast_detach_ll() calls ib_detach_mcast() with the port lid
instead of the mlid, causing the multicast detach to fail. This in turn caused
the subsequent multicast attach to fail.

Orabug: 29029705

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: aru kolappan <aru.kolappan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoext4: fail ext4_iget for root directory if unallocated
Theodore Ts'o [Fri, 30 Mar 2018 01:56:09 +0000 (21:56 -0400)]
ext4: fail ext4_iget for root directory if unallocated

If the root directory has an i_links_count of zero, then when the file
system is mounted and ext4_fill_super() notices the problem and tries to
call iput() on the root directory in the error return path,
ext4_evict_inode() will try to free the inode on disk before all of
the file system structures are set up, and this will result in an OOPS
caused by a NULL pointer dereference.

This issue has been assigned CVE-2018-1092.

https://bugzilla.kernel.org/show_bug.cgi?id=199179
https://bugzilla.redhat.com/show_bug.cgi?id=1560777

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
(cherry picked from commit 8e4b5eae5decd9dfe5a4ee369c22028f90ab4c44)

Orabug: 29048557
CVE: CVE-2018-1092

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
The current patch borrows EFSBADCRC & EFSCORRUPTED flags from the patch below
6a797d27: ext4: call out CRC and corruption errors with specific error codes

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoBluetooth: hidp: buffer overflow in hidp_process_report
Mark Salyzyn [Tue, 31 Jul 2018 22:02:13 +0000 (15:02 -0700)]
Bluetooth: hidp: buffer overflow in hidp_process_report

commit 7992c18810e568b95c869b227137a2215702a805 upstream.

CVE-2018-9363

The buffer length is unsigned at all layers, but gets cast to int and
checked in hidp_process_report and can lead to a buffer overflow.
Switch len parameter to unsigned int to resolve issue.
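
A minimal user-space sketch of why the signedness matters (the helper
names and MODEL_BUF_SIZE are hypothetical, not taken from the hidp
code): an upper-bound check on a signed length accepts negative values,
while the same check on an unsigned length rejects them.

```c
#include <assert.h>

#define MODEL_BUF_SIZE 16  /* hypothetical buffer size for the sketch */

/* Signed length: a negative value slips past the upper-bound check. */
static int model_check_signed(int len)
{
	return len <= MODEL_BUF_SIZE;  /* 1 means "accepted" */
}

/* Unsigned length: the same value is seen as huge and is rejected. */
static int model_check_unsigned(unsigned int len)
{
	return len <= MODEL_BUF_SIZE;
}
```

Here model_check_signed(-1) accepts the length, whereas
model_check_unsigned((unsigned int)-1) rejects it.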

This affects 3.18 and newer kernels.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Fixes: a4b1b5877b514b276f0f31efe02388a9c2836728 ("HID: Bluetooth: hidp: make sure input buffers are big enough")
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: security@kernel.org
Cc: kernel-team@android.com
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 17c1e0b1f6a161cc4f533d4869ff574273dbfe8d)

Orabug: 29121215
CVE: CVE-2018-9363

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoHID: debug: check length before copy_to_user()
Daniel Rosenberg [Mon, 2 Jul 2018 23:59:37 +0000 (16:59 -0700)]
HID: debug: check length before copy_to_user()

If our length is greater than the size of the buffer, we
overflow the buffer.

Cc: stable@vger.kernel.org
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
(cherry picked from commit 717adfdaf14704fd3ec7fa2c04520c0723247eac)
Orabug: 29128165
CVE: CVE-2018-9516
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: John.Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agox86/MCE: Serialize sysfs changes
Seunghun Han [Tue, 6 Mar 2018 14:21:43 +0000 (15:21 +0100)]
x86/MCE: Serialize sysfs changes

The check_interval file in

  /sys/devices/system/machinecheck/machinecheck<cpu number>

directory is a global timer value for MCE polling. If it is changed by one
CPU, mce_restart() broadcasts the event to other CPUs to delete and restart
the MCE polling timer and __mcheck_cpu_init_timer() reinitializes the
mce_timer variable.

If more than one CPU writes a specific value to the check_interval file
concurrently, mce_timer is not protected from such concurrent accesses and
all kinds of explosions happen. Since only root can write to those sysfs
variables, the issue is not a big deal security-wise.

However, concurrent writes to these configuration variables are void of
reason so the proper thing to do is to serialize the access with a mutex.

Boris:

 - Make store_int_with_restart() use device_store_ulong() to filter out
   negative intervals
 - Limit min interval to 1 second
 - Correct locking
 - Massage commit message

Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180302202706.9434-1-kkamagui@gmail.com
(cherry picked from commit b3b7c4795ccab5be71f080774c45bbbcc75c2aaf)

Orabug: 29149888
CVE: CVE-2018-7995

Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoInput: i8042 - fix crash at boot time
Chen Hong [Sun, 2 Jul 2017 22:11:10 +0000 (15:11 -0700)]
Input: i8042 - fix crash at boot time

The driver checks port->exists twice in i8042_interrupt(), first when
trying to assign temporary "serio" variable, and second time when deciding
whether it should call serio_interrupt(). The value of port->exists may
change between the 2 checks, and we may end up calling serio_interrupt()
with a NULL pointer:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
IP: [<ffffffff8150feaf>] _spin_lock_irqsave+0x1f/0x40
PGD 0
Oops: 0002 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1 QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:[<ffffffff8150feaf>]  [<ffffffff8150feaf>] _spin_lock_irqsave+0x1f/0x40
RSP: 0018:ffff880028203cc0  EFLAGS: 00010082
RAX: 0000000000010000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000282 RSI: 0000000000000098 RDI: 0000000000000050
RBP: ffff880028203cc0 R08: ffff88013e79c000 R09: ffff880028203ee0
R10: 0000000000000298 R11: 0000000000000282 R12: 0000000000000050
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000098
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000050 CR3: 0000000001a85000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88013e79c000, task ffff88013e79b500)
Stack:
ffff880028203d00 ffffffff813de186 ffffffffffffff02 0000000000000000
<d> 0000000000000000 0000000000000000 0000000000000000 0000000000000098
<d> ffff880028203d70 ffffffff813e0162 ffff880028203d20 ffffffff8103b8ac
Call Trace:
<IRQ>
 [<ffffffff813de186>] serio_interrupt+0x36/0xa0
[<ffffffff813e0162>] i8042_interrupt+0x132/0x3a0
[<ffffffff8103b8ac>] ? kvm_clock_read+0x1c/0x20
[<ffffffff8103b8b9>] ? kvm_clock_get_cycles+0x9/0x10
[<ffffffff810e1640>] handle_IRQ_event+0x60/0x170
[<ffffffff8103b154>] ? kvm_guest_apic_eoi_write+0x44/0x50
[<ffffffff810e3d8e>] handle_edge_irq+0xde/0x180
[<ffffffff8100de89>] handle_irq+0x49/0xa0
[<ffffffff81516c8c>] do_IRQ+0x6c/0xf0
[<ffffffff8100b9d3>] ret_from_intr+0x0/0x11
[<ffffffff81076f63>] ? __do_softirq+0x73/0x1e0
[<ffffffff8109b75b>] ? hrtimer_interrupt+0x14b/0x260
[<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
[<ffffffff8100de05>] ? do_softirq+0x65/0xa0
[<ffffffff81076d95>] ? irq_exit+0x85/0x90
[<ffffffff81516d80>] ? smp_apic_timer_interrupt+0x70/0x9b
[<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20

To avoid the issue let's change the second check to test whether serio is
NULL or not.

Also, let's take i8042_lock in i8042_start() and i8042_stop() instead of
trying to be overly smart and using memory barriers.
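
The check-twice pattern and its fix can be modelled in user space as
follows. All names are hypothetical, and the "racer" callback merely
simulates a concurrent update between the two checks; the sketch does
not model the locking change:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_serio { int id; };

struct model_port {
	bool exists;
	struct model_serio *serio;
};

/* First check: take a single snapshot of the pointer. */
static struct model_serio *model_pick_serio(const struct model_port *port)
{
	assert(port != NULL);
	return port->exists ? port->serio : NULL;
}

/* Simulates another context flipping port->exists mid-interrupt. */
static void model_flip_exists(struct model_port *port)
{
	port->exists = !port->exists;
}

/* Returns true if the modelled interrupt would be forwarded. */
static bool model_deliver(struct model_port *port,
			  void (*racer)(struct model_port *))
{
	struct model_serio *serio = model_pick_serio(port);

	if (racer)
		racer(port);  /* state changes between the two checks */

	/* Second check tests the snapshot, not port->exists again. */
	return serio != NULL;
}
```

If port->exists flips from false to true between the checks, the
snapshot is still NULL and nothing is forwarded, whereas re-reading
port->exists would have forwarded a NULL pointer.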

Signed-off-by: Chen Hong <chenhong3@huawei.com>
[dtor: take lock in i8042_start()/i8042_stop()]
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
(cherry picked from commit 340d394a789518018f834ff70f7534fc463d3226)

Orabug: 29152328
CVE: CVE-2017-18079

Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agobase/memory, hotplug: fix a kernel oops in show_valid_zones()
Toshi Kani [Fri, 3 Feb 2017 21:13:23 +0000 (13:13 -0800)]
base/memory, hotplug: fix a kernel oops in show_valid_zones()

Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB.  [1] An example of such
systems is described below.  0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range.  show_valid_zones() then proceeds with the valid range.
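
The address arithmetic in the example can be checked with a short
user-space sketch (model_block_start() is a hypothetical helper; the
2GiB block size is the one named above):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BLOCK_SIZE (2ULL << 30)  /* 2 GiB memory block */

/* Round an address down to the start of its memory block. */
static uint64_t model_block_start(uint64_t addr)
{
	return addr & ~(MODEL_BLOCK_SIZE - 1);
}
```

model_block_start(0x3240000000) evaluates to 0x3200000000, so the first
1GiB of that block lies below the start of the usable range.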

Orabug: 29050538

[1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a96dfddbcc04336bbed50dc2b24823e45e09e80c)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/base/memory.c
(retained existing show_valid_zones() code and modified
 to use valid pfns)

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agomm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()
Toshi Kani [Fri, 3 Feb 2017 21:13:20 +0000 (13:13 -0800)]
mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()

Patch series "fix a kernel oops when reading sysfs valid_zones", v2.

A sysfs memory file is created for each 2GiB memory block on x86-64 when
the system has 64GiB or more memory.  [1] When the start address of a
memory block is not backed by struct page, i.e.  a memory range is not
aligned by 2GiB, reading its 'valid_zones' attribute file leads to a
kernel oops.  This issue was observed on multiple x86-64 systems with
more than 64GiB of memory.  This patch-set fixes this issue.

Patch 1 first fixes an issue in test_pages_in_a_zone(), which does not
test the start section.

Patch 2 then fixes the kernel oops by extending test_pages_in_a_zone()
to return valid [start, end).

Note for stable kernels: The memory block size change was made by commit
bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64
systems"), which was accepted to 3.9.  However, this patch-set depends
on (and fixes) the change to test_pages_in_a_zone() made by commit
5f0f2887f4de ("mm/memory_hotplug.c: check for missing sections in
test_pages_in_a_zone()"), which was accepted to 4.4.

So, I recommend that we backport it up to 4.4.

[1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

This patch (of 2):

test_pages_in_a_zone() does not check 'start_pfn' when it is aligned by
section since 'sec_end_pfn' is set equal to 'pfn'.  Since this function
is called for testing the range of a sysfs memory file, 'start_pfn' is
always aligned by section.

Fix it by properly setting 'sec_end_pfn' to the next section pfn.

Also make sure that this function returns 1 only when the range belongs
to a zone.
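
The off-by-one can be seen in a small user-space model. The macro and
helper names are hypothetical, and 32768 pages per section is an
assumption (128MiB sections with 4KiB pages):

```c
#include <assert.h>

#define MODEL_PAGES_PER_SECTION 32768UL  /* assumed section size in pages */
#define MODEL_SECTION_ALIGN_UP(pfn) \
	(((pfn) + MODEL_PAGES_PER_SECTION - 1) & ~(MODEL_PAGES_PER_SECTION - 1))

/* Old behaviour: an aligned start gives an empty first range [pfn, pfn). */
static unsigned long model_sec_end_old(unsigned long start_pfn)
{
	return MODEL_SECTION_ALIGN_UP(start_pfn);
}

/* Fixed behaviour: always the first pfn of the next section. */
static unsigned long model_sec_end_new(unsigned long start_pfn)
{
	return MODEL_SECTION_ALIGN_UP(start_pfn + 1);
}
```

For a section-aligned start_pfn of 32768, the old computation returns
32768 (so the start section is never tested) and the fixed one returns
65536.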

Orabug: 29050538

Link: http://lkml.kernel.org/r/20170127222149.30893-2-toshi.kani@hpe.com
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: <stable@vger.kernel.org> [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit deb88a2a19e85842d79ba96b05031739ec327ff4)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agodrivers/base/memory.c: prohibit offlining of memory blocks with missing sections
Seth Jennings [Fri, 11 Dec 2015 21:40:57 +0000 (13:40 -0800)]
drivers/base/memory.c: prohibit offlining of memory blocks with missing sections

Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory
x86-64 systems") and 982792c782ef ("x86, mm: probe memory block size for
generic x86 64bit") introduced large block sizes for x86.  This made it
possible to have multiple sections per memory block where previously
there was only ever one section per block.

Since blocks consist of contiguous ranges of sections, there can be holes
in the blocks where sections are not present.  If one attempts to
offline such a block, a crash occurs since the code is not designed to
deal with this.

This patch is a quick fix to guard against the crash by not allowing
blocks with non-present sections to be offlined.
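
In user-space terms the guard looks roughly like this. The helper name,
the bool-array representation of a block, and the -EINVAL return value
are assumptions of this sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return 0 if every section of the modelled block is present, or
 * -EINVAL as soon as a hole is found, so the caller can refuse to
 * offline the block.
 */
static int model_check_block_offline(const bool *section_present, size_t nr)
{
	assert(section_present != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (!section_present[i])
			return -EINVAL;  /* hole in the block */
	}
	return 0;
}
```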

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781

Orabug: 29050538

Signed-off-by: Seth Jennings <sjennings@variantweb.net>
Reported-by: Andrew Banman <abanman@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 26bbe7ef6d5cdc7ec08cba6d433fca4060f258f3)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agomm: Check if section present during memory block (un)registering
Yinghai Lu [Wed, 26 Aug 2015 05:12:37 +0000 (22:12 -0700)]
mm: Check if section present during memory block (un)registering

Tony found that, on his setup, a memory block size of 512M causes a
crash during boot.

 BUG: unable to handle kernel paging request at ffffea0074000020
 IP: [<ffffffff81670527>] get_nid_for_pfn+0x17/0x40
 PGD 128ffcb067 PUD 128ffc9067 PMD 0
 Oops: 0000 [#1] SMP
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 #1
...
 Call Trace:
  [<ffffffff81453b56>] ? register_mem_sect_under_node+0x66/0xe0
  [<ffffffff81453eeb>] register_one_node+0x17b/0x240
  [<ffffffff81b1f1ed>] ? pci_iommu_alloc+0x6e/0x6e
  [<ffffffff81b1f229>] topology_init+0x3c/0x95
  [<ffffffff8100213d>] do_one_initcall+0xcd/0x1f0

The system has non-contiguous RAM address ranges:
 BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable
 BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable
 BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable
 BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable
 BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable

So some sections at the start of a memory block are not present.
For example:
memory block : [0x2c18000000, 0x2c20000000) 512M
first three sections are not present.

The current register_mem_sect_under_node() assumes the first section is
present, but the memory block's section number range
[start_section_nr, end_section_nr] may include sections that are not
present.

For architectures that support vmemmap, we don't set up the memmap
(struct page area) for sections that are not present.

So skip the pfn ranges that belong to absent sections.

Also fix unregister_mem_sect_under_nodes(), which assumes one section per
memory block.

Orabug: 29050538

Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Fixes: bdee237c0343 ("x86: mm: Use 2GB memory block size on large memory x86-64 systems")
Fixes: 982792c782ef ("x86, mm: probe memory block size for generic x86 64bit")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@vger.kernel.org #v3.15
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7568fb63f57ac8672f8bf2018171255441238882)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agohugetlb: take PMD sharing into account when flushing tlb/caches
Mike Kravetz [Thu, 30 Aug 2018 23:27:48 +0000 (16:27 -0700)]
hugetlb: take PMD sharing into account when flushing tlb/caches

When fixing an issue with PMD sharing and migration, it was discovered
via code inspection that other callers of huge_pmd_unshare potentially
have an issue with cache and tlb flushing.

Use the routine adjust_range_if_pmd_sharing_possible() to calculate
worst case ranges for mmu notifiers.  Ensure that this range is flushed
if huge_pmd_unshare succeeds and unmaps a PUD_SIZE area.

Based on upstream dff11abe280b.  Ported to UEK4.

Orabug: 28951854

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agomm: migration: fix migration of huge PMD shared pages
Mike Kravetz [Wed, 21 Nov 2018 01:20:31 +0000 (17:20 -0800)]
mm: migration: fix migration of huge PMD shared pages

The page migration code employs try_to_unmap() to try and unmap the
source page.  This is accomplished by using rmap_walk to find all
vmas where the page is mapped.  This search stops when page mapcount
is zero.  For shared PMD huge pages, the page map count is always 1
no matter the number of mappings.  Shared mappings are tracked via
the reference count of the PMD page.  Therefore, try_to_unmap stops
prematurely and does not completely unmap all mappings of the source
page.

This problem can result in data corruption as writes to the original
source page can happen after contents of the page are copied to the
target page.  Hence, data is lost.

This problem was originally seen as DB corruption of shared global
areas after a huge page was soft offlined due to ECC memory errors.
DB developers noticed they could reproduce the issue by (hotplug)
offlining memory used to back huge pages.  A simple testcase can
reproduce the problem by creating a shared PMD mapping (note that
this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on
x86)), and using migrate_pages() to migrate process pages between
nodes while continually writing to the huge pages being migrated.

To fix, have the try_to_unmap_one routine check for huge PMD sharing
by calling huge_pmd_unshare for hugetlbfs huge pages.  If it is a
shared mapping it will be 'unshared' which removes the page table
entry and drops the reference on the PMD page.  After this, flush
caches and TLB.

mmu notifiers are called before locking page tables, but we can not
be sure of PMD sharing until page tables are locked.  Therefore,
check for the possibility of PMD sharing before locking so that
notifiers can prepare for the worst possible case.  The mmu notifier
calls in this commit are different than upstream.  That is because
upstream went to a different model here.  Instead of moving to the
new model, we leave existing model unchanged and only use the
mmu_*range* calls in this special case.

Based on upstream 017b1660df89.  Ported to UEK4.

Orabug: 28951854

Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agohugetlbfs: use truncate mutex to prevent pmd sharing race
Mike Kravetz [Thu, 8 Nov 2018 00:10:28 +0000 (16:10 -0800)]
hugetlbfs: use truncate mutex to prevent pmd sharing race

The synchronization mechanism for hugetlbfs pagefaults/truncation and
pmd sharing ideally needs to be modified to use i_mmap_rwsem.  See:
http://lkml.kernel.org/r/20181024045053.1467-1-mike.kravetz@oracle.com

In UEK, we have introduced a hugetlbfs truncate mutex in an inode
extension.  By taking this mutex earlier in hugetlb_fault (before calling
huge_pte_alloc), we eliminate the most common cause of problems where
ptep can be altered by a call to huge_pmd_unshare.

Orabug: 28896255

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agords: ib: Improve tracing during failover/back
Håkon Bugge [Tue, 13 Nov 2018 14:36:57 +0000 (15:36 +0100)]
rds: ib: Improve tracing during failover/back

Orabug: 28860366

Signed-off-by: Håkon Bugge <Haakon.Bugge@oracle.com>
Reviewed-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
---

v1 -> v2:
   * Added Sudhakar's r-b

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agords: ib: Remove superfluous add of address on fail-back device
Håkon Bugge [Tue, 30 Oct 2018 13:09:07 +0000 (14:09 +0100)]
rds: ib: Remove superfluous add of address on fail-back device

During failover, we see in the ibacm log:

acm_ipnl_handler: Link added : ib0
acm_ipnl_handler: System address removed ib0 : 192.168.200.200
acm_ipnl_handler: New system address available ib1 : 192.168.200.200
acm_ipnl_handler: System address removed ib1 : 192.168.200.200
acm_ipnl_handler: New system address available ib1 : 192.168.200.200

and everything is OK. Fail-back:

acm_ipnl_handler: Link added : ib0
acm_ipnl_handler: New system address available ib0 : 192.168.200.200
acm_ipnl_handler: System address removed ib0 : 192.168.200.200
acm_ipnl_handler: New system address available ib0 : 192.168.200.200
acm_ipnl_handler: System address removed ib1 : 192.168.200.200

The address is moved from ib1 to ib0, thereafter deleted.

This implies that ibacm loses the address when it's moved back to the
original device.

With this patch, we see:

acm_ipnl_handler: System address removed ib0 : 192.168.200.200
acm_ipnl_handler: New system address available ib1 : 192.168.200.200
acm_ipnl_handler: System address removed ib1 : 192.168.200.200
acm_ipnl_handler: New system address available ib1 : 192.168.200.200
acm_ipnl_handler: Link added : ib0
acm_ipnl_handler: System address removed ib1 : 192.168.200.200
acm_ipnl_handler: New system address available ib0 : 192.168.200.200
acm_ipnl_handler: System address removed ib0 : 192.168.200.200
acm_ipnl_handler: New system address available ib0 : 192.168.200.200

The first lines are failover; after "Link added : ib0", it's
fail-back (which is done 10 seconds after link up).

Now we see that the fail-back address is properly restored.

Orabug: 28860366

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
---

v1 -> v2:
   * Changed $Subject
   * Added Sudhakar's r-b

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agolibiscsi: Fix NULL pointer dereference in iscsi_eh_session_reset
Fred Herard [Fri, 16 Nov 2018 17:50:13 +0000 (09:50 -0800)]
libiscsi: Fix NULL pointer dereference in iscsi_eh_session_reset

This commit addresses NULL pointer dereference in iscsi_eh_session_reset.
Reference should not be made to session->leadconn when session->state
is set to ISCSI_STATE_TERMINATE.

Orabug: 28946207

Signed-off-by: Fred Herard <fred.herard@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 315b38414a1a6830740d0bf27eab034c989f7563)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/libiscsi.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agowil6210: missing length check in wmi_set_ie
Lior David [Tue, 14 Nov 2017 13:25:39 +0000 (15:25 +0200)]
wil6210: missing length check in wmi_set_ie

Add a length check in wmi_set_ie to detect unsigned integer
overflow.
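
As a minimal user-space sketch of this kind of check (not the driver
code itself): the total length is a fixed header plus the IE length,
held in a 16-bit variable, and a wrapped sum is detected by comparing
it with the operand.  SKETCH_HDR_LEN and sketch_cmd_len() are
hypothetical names used for illustration only.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical fixed header size; the real command header differs. */
#define SKETCH_HDR_LEN 4u

/*
 * Return the total command length (header + IE payload) as a
 * non-negative int, or -EINVAL if the sum does not fit in 16 bits.
 */
int sketch_cmd_len(uint16_t ie_len)
{
	/* The cast truncates to 16 bits, so an oversized sum wraps around. */
	uint16_t len = (uint16_t)(SKETCH_HDR_LEN + ie_len);

	/* A wrapped sum is smaller than the operand that was added. */
	if (len < ie_len)
		return -EINVAL;

	return len;
}
```

With a 4-byte header, an ie_len of 65532 or more wraps and is rejected,
while 65531 still yields a valid length of 65535.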

Signed-off-by: Lior David <qca_liord@qca.qualcomm.com>
Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
(cherry picked from commit b5a8ffcae4103a9d823ea3aa3a761f65779fbe2a)

Orabug: 28951265
CVE: CVE-2018-5848

Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflict:

drivers/net/wireless/ath/wil6210/wmi.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agonetfilter: xt_osf: Add missing permission checks
Kevin Cernekee [Tue, 5 Dec 2017 23:42:41 +0000 (15:42 -0800)]
netfilter: xt_osf: Add missing permission checks

The capability check in nfnetlink_rcv() verifies that the caller
has CAP_NET_ADMIN in the namespace that "owns" the netlink socket.
However, xt_osf_fingers is shared by all net namespaces on the
system.  An unprivileged user can create user and net namespaces
in which he holds CAP_NET_ADMIN to bypass the netlink_net_capable()
check:

    vpnns -- nfnl_osf -f /tmp/pf.os

    vpnns -- nfnl_osf -f /tmp/pf.os -d

These non-root operations successfully modify the systemwide OS
fingerprint list.  Add new capable() checks so that they can't.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 916a27901de01446bcf57ecca4783f6cff493309)

Orabug: 29037831
CVE: CVE-2017-17450

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agox86/speculation: Fix bad argument to rdmsrl() in cpu_set_bug_bits()
Alejandro Jimenez [Wed, 12 Dec 2018 02:09:34 +0000 (21:09 -0500)]
x86/speculation: Fix bad argument to rdmsrl() in cpu_set_bug_bits()

At the beginning of cpu_set_bug_bits(), rdmsrl() is incorrectly
passed as its first argument the value of X86_FEATURE_IA32_ARCH_CAPS,
which is a CPUID feature bit and not a valid MSR value. The correct
parameter to pass in the first argument to rdmsrl() is
MSR_IA32_ARCH_CAPABILITIES (0x10a).

The value returned by rdmsrl(), specifically the RDCL_NO bit, is
later used to determine if the CPU is vulnerable to L1TF and
Meltdown exploits.
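
A rough user-space sketch of how the MSR value is interpreted, assuming
RDCL_NO is bit 0 of IA32_ARCH_CAPABILITIES.  Reading the MSR needs
privilege, so the value is passed in as a parameter; the sketch_*
helpers are hypothetical and are not the kernel's functions.  The
kernel also consults other information (such as the CPU vendor and
model) that is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR index quoted in the commit message above. */
#define MSR_IA32_ARCH_CAPABILITIES	0x10a
/* Assumption: RDCL_NO is bit 0 of that MSR. */
#define ARCH_CAP_RDCL_NO		(1ULL << 0)

/* True if the MSR value reports "not susceptible to rogue data cache load". */
bool sketch_rdcl_no(uint64_t arch_caps)
{
	return (arch_caps & ARCH_CAP_RDCL_NO) != 0;
}

/* In this simplified model, a CPU without RDCL_NO is treated as affected. */
bool sketch_needs_meltdown_mitigation(uint64_t arch_caps)
{
	return !sketch_rdcl_no(arch_caps);
}
```

The point of the fix is that the first argument to rdmsrl() must be an
MSR index such as 0x10a; a CPUID feature bit number belongs to a
different number space and selects an unrelated (or nonexistent) MSR.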

Orabug: 29044805

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agon_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
Linus Torvalds [Thu, 21 Dec 2017 01:57:06 +0000 (17:57 -0800)]
n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)

Orabug: 28855335

We added support for EXTPROC back in 2010 in commit 26df6d13406d ("tty:
Add EXTPROC support for LINEMODE") and the intent was to allow it to
override some (all?) ICANON behavior.  Quoting from that original commit
message:

         There is a new bit in the termios local flag word, EXTPROC.
         When this bit is set, several aspects of the terminal driver
         are disabled.  Input line editing, character echo, and mapping
         of signals are all disabled.  This allows the telnetd to turn
         off these functions when in linemode, but still keep track of
         what state the user wants the terminal to be in.

but the problem turns out that "several aspects of the terminal driver
are disabled" is a bit ambiguous, and you can really confuse the n_tty
layer by setting EXTPROC and then causing some of the ICANON invariants
to no longer be maintained.

This fixes at least one such case (TIOCINQ) becoming unhappy because of
the confusion over whether ICANON really means ICANON when EXTPROC is set.

This basically makes TIOCINQ match the case of read: if EXTPROC is set,
we ignore ICANON.  Also, make sure to reset the ICANON state if EXTPROC
changes, not just if ICANON changes.
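
The two rules above can be sketched as plain predicates on the local
flag word.  This is an illustrative model only: the SK_* flag values
and the sketch_* helpers are hypothetical stand-ins, not the termios
constants or the n_tty functions.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for ICANON and EXTPROC. */
#define SK_ICANON	0x1u
#define SK_EXTPROC	0x2u

/* TIOCINQ uses the canonical (line-based) count only if EXTPROC is clear. */
bool sketch_use_canon_count(unsigned int lflag)
{
	return (lflag & SK_ICANON) && !(lflag & SK_EXTPROC);
}

/* Reset the canonical state when either ICANON or EXTPROC changes. */
bool sketch_needs_canon_reset(unsigned int old_lflag, unsigned int new_lflag)
{
	return ((old_lflag ^ new_lflag) & (SK_ICANON | SK_EXTPROC)) != 0;
}
```

With both flags set, the first predicate is false, which matches the
read path ignoring ICANON under EXTPROC.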

Fixes: 26df6d13406d ("tty: Add EXTPROC support for LINEMODE")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 966031f340185eddd05affcf72b740549f056348)
CVE: CVE-2018-18386
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agonfs: Don't take a reference on fl->fl_file for LOCK operation
Benjamin Coddington [Thu, 5 Jan 2017 15:20:16 +0000 (10:20 -0500)]
nfs: Don't take a reference on fl->fl_file for LOCK operation

I have reports of a crash that look like __fput() was called twice for
a NFSv4.0 file.  It seems possible that the state manager could try to
reclaim a lock and take a reference on the fl->fl_file at the same time the
file is being released if, during the close(), a signal interrupts the wait
for outstanding IO while removing locks which then skips the removal
of that lock.

Since 83bfff23e9ed ("nfs4: have do_vfs_lock take an inode pointer") has
removed the need to traverse fl->fl_file->f_inode in nfs4_lock_done(),
taking that reference is no longer necessary.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(cherry picked from commit 4b09ec4b14a168bf2c687e1f598140c3c11e9222)

Orabug: 28887442
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agox86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU...
Samuel Neves [Wed, 21 Feb 2018 20:50:36 +0000 (20:50 +0000)]
x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations

Without this fix, /proc/cpuinfo will display an incorrect number
of CPU cores, after bringing them offline and online again, as
exemplified below:

  $ cat /proc/cpuinfo | grep cores
  cpu cores : 4
  cpu cores : 8
  cpu cores : 8
  cpu cores : 20
  cpu cores : 4
  cpu cores : 3
  cpu cores : 2
  cpu cores : 2

This patch fixes this by always zeroing the booted_cores variable
upon turning off a logical CPU.

Tested-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jgross@suse.com
Cc: luto@kernel.org
Cc: prarit@redhat.com
Cc: vkuznets@redhat.com
Link: http://lkml.kernel.org/r/20180221205036.5244-1-sneves@dei.uc.pt
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 4596749339e06dc7a424fc08a15eded850ed78b7)

Orabug: 28933009

Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoALSA: seq: Fix regression by incorrect ioctl_mutex usages
Takashi Iwai [Mon, 19 Feb 2018 16:16:01 +0000 (17:16 +0100)]
ALSA: seq: Fix regression by incorrect ioctl_mutex usages

This is the revised backport of the upstream commit
b3defb791b26ea0683a93a4f49c77ec45ec96f10

We had another backport (e.g. 623e5c8ae32b in 4.4.115), but it applies
the new mutex also to the code paths that are invoked via faked
kernel-to-kernel ioctls.  As reported recently, this leads to a
deadlock at suspend (or other scenarios triggering the kernel
sequencer client).

This patch addresses the issue by taking the mutex only in the code
paths invoked by user-space, just like the original fix patch does.

Reported-and-tested-by: Andres Bertens <abertensu@yahoo.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 29005188
CVE: CVE-2018-1000004

(cherry picked from commit 8e8992a93d66adb640631a6778a5110f01118202)
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agonet: phy: mdio-bcm-unimac: fix potential NULL dereference in unimac_mdio_probe()
Wei Yongjun [Thu, 11 Jan 2018 11:21:51 +0000 (11:21 +0000)]
net: phy: mdio-bcm-unimac: fix potential NULL dereference in unimac_mdio_probe()

platform_get_resource() may fail and return NULL, so we should
better check its return value to avoid a NULL pointer dereference
a bit later in the code.

This is detected by Coccinelle semantic patch.

@@
expression pdev, res, n, t, e, e1, e2;
@@

res = platform_get_resource(pdev, t, n);
+ if (!res)
+   return -EINVAL;
... when != res == NULL
e = devm_ioremap(e1, res->start, e2);

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 297a6961ffb8ff4dc66c9fbf53b924bd1dda05d5)

Orabug: 29012346
CVE: CVE-2018-8043

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoxfs: don't call xfs_da_shrink_inode with NULL bp
Eric Sandeen [Fri, 8 Jun 2018 16:53:49 +0000 (09:53 -0700)]
xfs: don't call xfs_da_shrink_inode with NULL bp

xfs_attr3_leaf_create may have errored out before instantiating a buffer,
for example if the blkno is out of range.  In that case there is no work
to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops
if we try.

This also seems to fix a flaw where the original error from
xfs_attr3_leaf_create gets overwritten in the cleanup case, and it
removes a pointless assignment to bp which isn't used after this.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969
Reported-by: Xu, Wen <wen.xu@gatech.edu>
Tested-by: Xu, Wen <wen.xu@gatech.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
(cherry picked from commit bb3d48dcf86a97dc25fe9fc2c11938e19cb4399a)

Orabug: 28898616
CVE: CVE-2018-13094

Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
6 years agoALSA: rawmidi: Change resized buffers atomically
Takashi Iwai [Tue, 17 Jul 2018 15:26:43 +0000 (17:26 +0200)]
ALSA: rawmidi: Change resized buffers atomically

The SNDRV_RAWMIDI_IOCTL_PARAMS ioctl may resize the buffers and the
current code is racy.  For example, the sequencer client may write to
the buffer while it is being resized.

As a simple workaround, let's switch to the resized buffer inside the
stream runtime lock.
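
A user-space sketch of the same pattern, assuming a pthread mutex in
place of the kernel's runtime lock: the new buffer is allocated outside
the lock, the pointer and size are switched inside it, and the old
buffer is freed afterwards.  struct sketch_stream and the sketch_*
functions are hypothetical and only illustrate the ordering.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct sketch_stream {
	pthread_mutex_t lock;	/* stands in for the stream runtime lock */
	char *buf;
	size_t size;
	size_t appl_ptr;	/* next write position */
	size_t hw_ptr;		/* next read position */
};

int sketch_stream_init(struct sketch_stream *s, size_t size)
{
	s->buf = malloc(size);
	if (!s->buf)
		return -ENOMEM;
	s->size = size;
	s->appl_ptr = s->hw_ptr = 0;
	if (pthread_mutex_init(&s->lock, NULL)) {
		free(s->buf);
		return -EINVAL;
	}
	return 0;
}

/* Writer side: touches buf and size only while holding the lock. */
int sketch_stream_put(struct sketch_stream *s, char c)
{
	int ret = -EAGAIN;

	pthread_mutex_lock(&s->lock);
	if (s->appl_ptr < s->size) {
		s->buf[s->appl_ptr++] = c;
		ret = 0;
	}
	pthread_mutex_unlock(&s->lock);
	return ret;
}

int sketch_stream_resize(struct sketch_stream *s, size_t new_size)
{
	char *newbuf, *oldbuf;

	/* Allocate outside the lock; allocation may be slow. */
	newbuf = malloc(new_size);
	if (!newbuf)
		return -ENOMEM;

	/* Switch buffer, size and pointers as one step under the lock. */
	pthread_mutex_lock(&s->lock);
	oldbuf = s->buf;
	s->buf = newbuf;
	s->size = new_size;
	s->appl_ptr = s->hw_ptr = 0;
	pthread_mutex_unlock(&s->lock);

	/* No writer can still see the old buffer once the lock is dropped. */
	free(oldbuf);
	return 0;
}
```

Because a writer only sees buf and size together under the lock, it can
never index the new buffer with the old size or the other way around.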

Reported-by: syzbot+52f83f0ea8df16932f7f@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 39675f7a7c7e7702f7d5341f1e0d01db746543a0)

Orabug: 28898636
CVE: CVE-2018-10902

Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>