www.infradead.org Git - users/jedix/linux-maple.git/log

percpu: fix synchronization between synchronous map extension and chunk destruction

For non-atomic allocations, pcpu_alloc() can try to extend the area
map synchronously after dropping pcpu_lock; however, the extension
wasn't synchronized against chunk destruction and the chunk might get
freed while extension is in progress.

This patch fixes the bug by putting most non-atomic allocations
under pcpu_alloc_mutex to synchronize against pcpu_balance_work, which
is responsible for async chunk management, including destruction.
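A minimal userspace sketch of the idea, assuming a pthread mutex as the stand-in for pcpu_alloc_mutex: the synchronous extension and the destruction path take the same lock, so a chunk cannot be freed while its map is being grown. All names (demo_chunk, demo_alloc_mutex, demo_extend_map, demo_destroy_chunk) are hypothetical. Unlike the kernel, the sketch keeps the chunk struct alive and marks it dead so the outcome is observable.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_chunk {
	int *map;       /* area map that may need to grow */
	int map_alloc;  /* number of slots currently allocated */
	bool dead;      /* set once the chunk has been destroyed */
};

/* Hypothetical stand-in for pcpu_alloc_mutex. */
static pthread_mutex_t demo_alloc_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Grow chunk->map to at least new_alloc slots. The whole operation runs
 * under the mutex, so demo_destroy_chunk() cannot free the map mid-way. */
int demo_extend_map(struct demo_chunk *chunk, int new_alloc)
{
	int ret = 0;

	assert(chunk != NULL && new_alloc > 0);
	pthread_mutex_lock(&demo_alloc_mutex);
	if (chunk->dead) {
		ret = -ENOENT;
	} else if (new_alloc > chunk->map_alloc) {
		int *new_map = realloc(chunk->map,
				       (size_t)new_alloc * sizeof(*new_map));

		if (!new_map) {
			ret = -ENOMEM;
		} else {
			chunk->map = new_map;
			chunk->map_alloc = new_alloc;
		}
	}
	pthread_mutex_unlock(&demo_alloc_mutex);
	return ret;
}

/* Stand-in for the balance work's destruction path: same mutex. */
void demo_destroy_chunk(struct demo_chunk *chunk)
{
	pthread_mutex_lock(&demo_alloc_mutex);
	free(chunk->map);
	chunk->map = NULL;
	chunk->map_alloc = 0;
	chunk->dead = true;
	pthread_mutex_unlock(&demo_alloc_mutex);
}
```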

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # v3.18+
Fixes: 1a4d76076cda ("percpu: implement asynchronous chunk population")
Orabug: 25060076
CVE: CVE-2016-4794
Mainline v4.7 commit 6710e594f71ccaad8101bc64321152af7cd9ea28
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

percpu: fix synchronization between chunk->map_extend_work and chunk destruction

Atomic allocations can trigger async map extensions, which are serviced
by chunk->map_extend_work. pcpu_balance_work, which is responsible for
destroying idle chunks wasn't synchronizing properly against
chunk->map_extend_work and may end up freeing the chunk while the work
item is still in flight.

This patch fixes the bug by rolling async map extension operations
into pcpu_balance_work.
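A single-threaded sketch of "rolling extension into the balance work", assuming extension requests are simply recorded on the chunk and later serviced by the same function that destroys idle chunks. Because both steps run sequentially inside one work function, no extension can be in flight when a chunk is freed. The names are hypothetical, and a real implementation would still need a lock around the request bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CHUNKS 4

struct demo_chunk {
	bool in_use;    /* chunk exists */
	bool idle;      /* no live allocations: eligible for destruction */
	int map_alloc;  /* current map size */
	int extend_to;  /* pending async extension target, 0 if none */
};

static struct demo_chunk demo_chunks[DEMO_NR_CHUNKS];

/* Atomic allocation path: only records the request, does no work. */
void demo_request_extend(struct demo_chunk *chunk, int new_alloc)
{
	assert(chunk->in_use);
	if (new_alloc > chunk->extend_to)
		chunk->extend_to = new_alloc;
}

/* The one balance work function: extension first, then destruction. */
void demo_balance_work(void)
{
	size_t i;

	/* 1. Service and clear every pending map extension. */
	for (i = 0; i < DEMO_NR_CHUNKS; i++) {
		struct demo_chunk *c = &demo_chunks[i];

		if (c->in_use && c->extend_to > c->map_alloc)
			c->map_alloc = c->extend_to;
		c->extend_to = 0;
	}

	/* 2. Destroy idle chunks; no request refers to them any more. */
	for (i = 0; i < DEMO_NR_CHUNKS; i++) {
		struct demo_chunk *c = &demo_chunks[i];

		if (c->in_use && c->idle) {
			c->in_use = false;
			c->map_alloc = 0;
		}
	}
}
```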

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # v3.18+
Fixes: 9c824b6a172c ("percpu: make sure chunk->map array has available space")
Orabug: 25060076
CVE: CVE-2016-4794
Mainline v4.7 commit 4f996e234dad488e5d9ba0858bc1bae12eff82c3
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt

The stack object “r1” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to userspace without being initialized.
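A sketch of the usual fix for this class of leak: zero the whole stack object before filling its fields, so padding bytes copied to userspace are initialized. demo_tread only mirrors the shape described here (an int, a timestamp, an unsigned int); it is a hypothetical stand-in, not the ALSA definition, and demo_timespec replaces struct timespec to keep the sketch portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_timespec {  /* stand-in for struct timespec */
	long tv_sec;
	long tv_nsec;
};

struct demo_tread {
	int event;                   /* 4 bytes of padding follow on LP64 */
	struct demo_timespec tstamp;
	unsigned int val;            /* 4 bytes of tail padding on LP64 */
};

/* Fill a record the way a fixed handler would: memset first, so every
 * byte later copied out (e.g. via copy_to_user) is initialized. */
void demo_fill_tread(struct demo_tread *r1, int event, unsigned int val)
{
	assert(r1 != NULL);
	memset(r1, 0, sizeof(*r1));
	r1->event = event;
	r1->tstamp.tv_sec = 0;
	r1->tstamp.tv_nsec = 0;
	r1->val = val;
}

/* Returns 1 if all padding bytes are zero; offsets come from offsetof,
 * so this adapts to whatever layout the compiler chose. */
int demo_padding_is_zero(const struct demo_tread *r)
{
	const unsigned char *p = (const unsigned char *)r;
	size_t i;

	for (i = offsetof(struct demo_tread, event) + sizeof(r->event);
	     i < offsetof(struct demo_tread, tstamp); i++)
		if (p[i] != 0)
			return 0;
	for (i = offsetof(struct demo_tread, val) + sizeof(r->val);
	     i < sizeof(*r); i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```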

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Orabug: 25059885
CVE: CVE-2016-4578
Mainline v4.7 commit e4ec8cc8039a7063e24204299b462bd1383184a5
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

ALSA: timer: Fix leak in events via snd_timer_user_ccallback

The stack object “r1” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to userspace without being initialized.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Orabug: 25059885
CVE: CVE-2016-4578
Mainline v4.7 commit 9a47e9cff994f37f7f0dbd9ae23740d0f64f9fe6
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS

The stack object “tread” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to userspace without being initialized.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Orabug: 25059408
CVE: CVE-2016-4569
Mainline v4.7 commit cec8f96e49d9be372fdb0c3836dcf31ec71e457e
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/rpm-build:
uek-rpm ol7: change uek-rpm/ol7/update-el release value from 7.1 to 7.3

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/upstream-cherry-picks:
perf tools: handle spaces in file names obtained from /proc/pid/maps

perf tools: handle spaces in file names obtained from /proc/pid/maps

Steam frequently puts game binaries in folders with spaces.

Note: "(deleted)" markers are now treated as part of the file name.
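A minimal sketch of the parsing change, assuming the usual /proc/<pid>/maps layout (address range, permissions, offset, device, inode, path). The key point is reading the path with a "rest of line" scanset (%[^\n]) instead of %s, which stops at the first space; as a side effect, a trailing "(deleted)" stays in the name. The struct, function, and format string are illustrative, not perf's exact code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

struct demo_map {
	uint64_t start, end, pgoff, ino;
	unsigned int maj, min;
	char prot[5];
	char filename[4096];
};

/* Returns 0 on success, -1 if the line has no file name (e.g. anonymous). */
int demo_parse_maps_line(const char *line, struct demo_map *m)
{
	int n;

	assert(line != NULL && m != NULL);
	memset(m, 0, sizeof(*m));
	/* " %4095[^\n]" skips the column padding, then takes the rest of the
	 * line, spaces included, as the file name. */
	n = sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %"
		   SCNu64 " %4095[^\n]",
		   &m->start, &m->end, m->prot, &m->pgoff,
		   &m->maj, &m->min, &m->ino, m->filename);
	return n == 8 ? 0 : -1;
}
```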

Signed-off-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Fixes: 6064803313ba ("perf tools: Use sscanf for parsing /proc/pid/maps")
Link: http://lkml.kernel.org/r/20160119190303.GA17579@marcin-Inspiron-7720
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from commit 89fee59b504f86925894fcc9ba79d5c933842f93)

Orabug: 25072114
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

uek-rpm ol7: change uek-rpm/ol7/update-el release value from 7.1 to 7.3

Change release value in uek-rpm/ol7/update-el to 7.3 so that manual builds
will pick up the new OL7.3 secure boot key.
uek-rpm/ol6/update-el is not affected.

Orabug: 25050588

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: Guru Anbalagane <guru.anbalagane@oracle.com>

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/ofed:
  xsigo: send nack codes
  xsigo: xve driver has excessive messages
  xsigo: hard LOCKUP in freeing paths
  xsigo: Crash in xscore_port_num
  xsigo: Resize uVNIC/PVI CQ size
  xsigo: Optimizing Transmit completions
  xsigo: Implementing Jumbo MTU support
  RDS: rds debug messages are enabled by default
  net/rds: Fix new sparse warning
  net/rds: fix unaligned memory access

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/upstream-cherry-picks: (23 commits)
  NFS: Fix an LOCK/OPEN race when unlinking an open file
  intel_idle: correct BXT support
  intel_idle: re-work bxt_idle_state_table_update() and its helper
  x86/intel_idle: Use Intel family macros for intel_idle
  x86/cpu/intel: Introduce macros for Intel family numbers
  intel_idle: add BXT support
  intel_idle: Add KBL support
  intel_idle: Add SKX support
  intel_idle: Clean up all registered devices on exit.
  intel_idle: Propagate hot plug errors.
  intel_idle: Don't overreact to a cpuidle registration failure.
  intel_idle: Setup the timer broadcast only on successful driver load.
  intel_idle: Avoid a double free of the per-CPU data.
  intel_idle: Fix dangling registration on error path.
  intel_idle: Fix deallocation order on the driver exit path.
  intel_idle: Remove redundant initialization calls.
  intel_idle: Fix a helper function's return value.
  intel_idle: remove useless return from void function.
  intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
  intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
  ...

xsigo: send nack codes

Orabug: 24442792

Sometimes uVNIC removal on OFOS doesn't trigger an actual removal
of the Vstar interface; in that case the uVNIC driver has to send a NACK
code so that XCM will start cleaning up its database.

Added additional codes as per the XCM specification.

Reported-by: jie zhu <jie.x.zhu@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: Qingjun Wang <qingjun.wang@oracle.com>
Reviewed-by: Manish Kumar Singh <mk.singh@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: xve driver has excessive messages

Orabug: 24758335

Moved some message types from warning to debug level.

Consolidated multiple messages into a single one to avoid
flooding the console.

Added more counters to identify the state of the vnic.

Added a debug type, xve_info.

Reported-by: chien yen <chien.yen@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: Aravind Kini <aravind.kini@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: hard LOCKUP in freeing paths

Orabug: 24669507

When path->users becomes zero, the uVNIC driver starts
cleaning up the forwarding table entries.

In some corner cases the cleanup is invoked from the transmit
function, which runs in interrupt context, and that results
in a hard LOCKUP.

With these changes, path->users is still decremented in the transmit
function, but the cleanup itself happens from another thread.
Care is taken to avoid races between these
two contexts.
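A sketch of the deferral pattern described above, using C11 atomics: the transmit path (interrupt context) only drops its reference and, if it was the last user, flags the cleanup as pending; a thread-context worker later claims that flag exactly once and does the teardown. The struct and function names are hypothetical, and the sketch assumes no new references are taken once the count reaches zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_path {
	atomic_int users;             /* references held on the path */
	atomic_bool cleanup_pending;  /* set by the last user */
	int fwt_entries;              /* forwarding table entries to tear down */
};

/* Safe from interrupt context: no blocking and no teardown work here. */
void demo_path_put_from_tx(struct demo_path *p)
{
	/* atomic_fetch_sub returns the old value; 1 means last reference. */
	if (atomic_fetch_sub(&p->users, 1) == 1)
		atomic_store(&p->cleanup_pending, true);
}

/* Thread context: performs the heavier cleanup. Returns true if it did. */
bool demo_cleanup_worker(struct demo_path *p)
{
	bool expected = true;

	/* Claim the pending cleanup exactly once, even if workers race. */
	if (!atomic_compare_exchange_strong(&p->cleanup_pending, &expected,
					    false))
		return false;
	assert(atomic_load(&p->users) == 0);
	p->fwt_entries = 0;
	return true;
}
```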

Reported-by: chien yen <chien.yen@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: Aravind Kini <aravind.kini@oracle.com>
Reviewed-by: viswa krishnamurthy <viswa.krishnamurthy@oracle.com>
Reviewed-by: Manish Kumar Singh <mk.singh@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: Crash in xscore_port_num

Orabug: 24760465

When the Server Profile context is not present,
xcpm_get_xsmp_session_info returns an error, and the uVNIC
driver has to handle that condition.

Reported-by: scarlett chen <scarlett.chen@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: viswa krishnamurthy <viswa.krishnamurthy@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: Resize uVNIC/PVI CQ size

Orabug: 24765034

uVNIC/PVI should avoid CQ overflow conditions.

Resize CQs to 16k to handle multiple connections
being flushed simultaneously per path.

Increase the send queue and receive queue to 2k for
better performance.

Added counters to print CQ sizes.
Added stats to count RC completions.

Reported-by: scarlett chen <scarlett.chen@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: Aravind Kini <aravind.kini@oracle.com>
Reviewed-by: viswa krishnamurthy <viswa.krishnamurthy@oracle.com>
Reviewed-by: Manish Kumar Singh <mk.singh@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: Optimizing Transmit completions

Orabug: 24928865

Added a timer for polling transmit completions and
removed completion polling from thread context.

Seeing good performance improvements with these changes.
In some cases uVNIC sees a 10% increase in throughput.

Reported-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

xsigo: Implementing Jumbo MTU support

Orabug: 24928804

With Titan and Saturn supporting jumbo InfiniBand frames,
uVNIC can have an MTU greater than 4k and up to 10k.

Allocate multiple pages for receive descriptors, with code changes
for handling the mapping and unmapping of multiple pages.

Care is taken to enable jumbo MTU only for Titan and only
in EoiB mode.

If jumbo MTU is used for non-Titan cards, the uVNIC driver will NACK
the install and OFOS will display a failure message for the
install.

Added stats to display jumbo usage and removed legacy EoiB heartbeat code.
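A sketch of the sizing rule implied above: a receive buffer for an MTU above one page spans several pages (rounded up), and a jumbo MTU on hardware or a mode that cannot carry it is refused so the caller can NACK the install. The 4096-byte page, the 10240-byte ceiling, the error codes, and all names are assumptions for illustration; the sketch also ignores any per-buffer headroom a real driver would add.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_PAGE_SIZE  4096
#define DEMO_LEGACY_MTU 4096
#define DEMO_MAX_MTU    10240  /* assumed meaning of "up to 10k" */

/* Returns pages per receive descriptor, or a negative errno that the
 * caller would turn into a NACK of the install request. */
int demo_rx_pages_for_mtu(int mtu, bool is_titan, bool eoib_mode)
{
	assert(mtu > 0);
	if (mtu > DEMO_MAX_MTU)
		return -EINVAL;
	if (mtu > DEMO_LEGACY_MTU && !(is_titan && eoib_mode))
		return -EOPNOTSUPP;
	/* Round up: e.g. a 9000-byte MTU needs three 4096-byte pages. */
	return (mtu + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```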

Reported-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

NFS: Fix an LOCK/OPEN race when unlinking an open file

Orabug: 24476280

At Connectathon 2016, we found that recent upstream Linux clients
would occasionally send a LOCK operation with a zero stateid. This
appeared to happen in close proximity to another thread returning
a delegation before unlinking the same file while it remained open.

Earlier, the client received a write delegation on this file and
returned the open stateid. Now, as it is getting ready to unlink the
file, it returns the write delegation. But there is still an open
file descriptor on that file, so the client must OPEN the file
again before it returns the delegation.

Since commit 24311f884189 ('NFSv4: Recovery of recalled read
delegations is broken'), nfs_open_delegation_recall() clears the
NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
racing LOCK on the same inode to be put on the wire before the OPEN
operation has returned a valid open stateid.

To eliminate this race, serialize delegation return with the
acquisition of a file lock on the same file. Adopt the same approach
as is used in the unlock path.

This patch also eliminates a similar race seen when sending a LOCK
operation at the same time as returning a delegation on the same file.
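A sketch of the serialization idea, assuming one per-inode lock held across the whole "clear delegated flag, then re-OPEN" sequence and also taken by the LOCK path. With that, a LOCK can never observe the window where the file is no longer delegated but has no valid open stateid yet. A pthread mutex stands in for whatever primitive the NFS client actually uses; all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct demo_nfs_inode {
	pthread_mutex_t lock;       /* serializes delegation return vs LOCK */
	bool delegated;             /* analogue of NFS_DELEGATED_STATE */
	unsigned int open_stateid;  /* 0 = no valid open stateid yet */
};

/* Clearing the flag and installing the new open stateid happen as one
 * step under the lock. */
void demo_return_delegation(struct demo_nfs_inode *ino, unsigned int stateid)
{
	pthread_mutex_lock(&ino->lock);
	ino->delegated = false;
	ino->open_stateid = stateid;  /* result of the OPEN sent to the server */
	pthread_mutex_unlock(&ino->lock);
}

/* Returns 0 if the lock is handled locally under the delegation, the
 * open stateid to send with LOCK (> 0), or -EIO if neither is valid. */
long demo_setlk(struct demo_nfs_inode *ino)
{
	long ret;

	assert(ino != NULL);
	pthread_mutex_lock(&ino->lock);
	if (ino->delegated)
		ret = 0;
	else if (ino->open_stateid != 0)
		ret = (long)ino->open_stateid;
	else
		ret = -EIO;  /* the zero-stateid LOCK the race used to produce */
	pthread_mutex_unlock(&ino->lock);
	return ret;
}
```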

Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Anna: Add sentence about LOCK / delegation race]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
(cherry picked from commit 11476e9dec39d90fe1e9bf12abc6f3efe35a073d)
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>

intel_idle: correct BXT support

Orabug: 24810432

Commit 5dcef69486 ("intel_idle: add BXT support") added an 8-element
lookup array with just a 2-bit value used for lookups. As per the SDM
that bit field is really 3 bits wide. While this is supposedly benign
here, future re-use of the code for other CPUs might expose the issue.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit bef450962597ff39a7f9d53a30523aae9eb55843)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: re-work bxt_idle_state_table_update() and its helper

Orabug: 24810432

Since irtl_ns_units[] itself has zero entries, make sure the caller
recognizes those cases, along with the MSR read returning zero, as zero
is not a valid value for exit_latency and target_residency.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 3451ab3ebf92b12801878d8b5c94845afd4219f0)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/intel_idle: Use Intel family macros for intel_idle

Orabug: 24810432

Use the new INTEL_FAM6_* macros for intel_idle.c.  Also fix up
some of the macros to be consistent with how some of the
intel_idle code refers to the model.

There's one oddity here: model 0x1F is uniquely referred to here
and nowhere else that I could find.  0x1E/0x1F are just spelled
out as "Intel Core i7 and i5 Processors" in the SDM or as "Intel
processors based on the Nehalem, Westmere microarchitectures" in
the RDPMC section.  Comments between tables 19-19 and 19-20 in
the SDM seem to point to 0x1F being some kind of Westmere, so
let's call it "WESTMERE2".

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jacob.jun.pan@intel.com
Cc: linux-pm@vger.kernel.org
Link: http://lkml.kernel.org/r/20160603001932.EE978EB9@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit db73c5a8c80decbb6ddf208e58f3865b4df5384d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/cpu/intel: Introduce macros for Intel family numbers

Orabug: 24810432

Problem:

We have a boatload of open-coded family-6 model numbers.  Half of
them have these model numbers in hex and the other half in
decimal.  This makes grepping for them tons of fun, if you were
to try.

Solution:

Consolidate all the magic numbers.  Put all the definitions in
one header.

The names here are closely derived from the comments describing
the models from arch/x86/events/intel/core.c.  We could easily
make them shorter by doing things like s/SANDYBRIDGE/SNB/, but
they seemed fine even with the longer versions to me.

Do not take any of these names too literally, like "DESKTOP"
or "MOBILE".  These are all colloquial names and not precise
descriptions of everywhere a given model will show up.
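A small sketch of the "one header of named model numbers" approach: call sites switch on named constants instead of open-coded hex or decimal values. Only the two models discussed in this log are used (0x1E, commonly named Nehalem, and 0x1F, the "WESTMERE2" name chosen above); the DEMO_ prefix marks these as illustrative rather than a copy of the real header.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_INTEL_FAM6_NEHALEM   0x1E
#define DEMO_INTEL_FAM6_WESTMERE2 0x1F  /* "some kind of Westmere" */

/* Compile-time guard against accidentally reusing a model number. */
static_assert(DEMO_INTEL_FAM6_NEHALEM != DEMO_INTEL_FAM6_WESTMERE2,
	      "family-6 model numbers must be distinct");

/* Callers match on names, which are easy to grep for. */
const char *demo_model_name(unsigned int model)
{
	switch (model) {
	case DEMO_INTEL_FAM6_NEHALEM:
		return "NEHALEM";
	case DEMO_INTEL_FAM6_WESTMERE2:
		return "WESTMERE2";
	default:
		return NULL;
	}
}
```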

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com>
Cc: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vishwanath Somayaji <vishwanath.somayaji@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: jacob.jun.pan@intel.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: platform-driver-x86@vger.kernel.org
Link: http://lkml.kernel.org/r/20160603001927.F2A7D828@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 970442c599b22ccd644ebfe94d1d303bf6f87c05)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: add BXT support

Orabug: 24810432

Broxton has all the HSW C-states, except C3.
BXT C-state timing is slightly different.

Here we trust the IRTL MSRs as authority
on maximum C-state latency, and override the driver's tables
with the values found in the associated IRTL MSRs.
Further we set the target_residency to 1x maximum latency,
trusting the hardware demotion logic.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 5dcef694860100fd16885f052591b1268b764d21)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/msr-index.h

intel_idle: Add KBL support

Orabug: 24810432

KBL is similar to SKL

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 3ce093d4de753d6c92cc09366e29d0618a62f542)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Add SKX support

Orabug: 24810432

SKX is similar to BDX

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit f9e71657c2c0a8f1c50884ab45794be2854e158e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Clean up all registered devices on exit.

Orabug: 24810432

This driver registers cpuidle devices when a CPU comes online, but it
leaves the registrations in place when a CPU goes offline. The module
exit code only unregisters the currently online CPUs, leaving the
devices for offline CPUs dangling.

This patch changes the driver to clean up all registrations on exit,
even those from CPUs that are offline.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 3e66a9ab53641a0f7a440e56f7b35bf5d77494b3)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Propagate hot plug errors.

Orabug: 24810432

If a cpuidle registration error occurs during the hot plug notifier
callback, we should really inform the hot plug machinery instead of
just ignoring the error. This patch changes the callback to properly
return on error.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 08820546e4c30c84d0a1f1a49df055e1719c07ea)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Don't overreact to a cpuidle registration failure.

Orabug: 24810432

The helper function, intel_idle_cpu_init, registers one new device
with the cpuidle layer. If the registration should fail, that
function immediately calls intel_idle_cpuidle_devices_uninit() to
unregister every last CPU's device. However, it makes no sense to do
so, when called from the hot plug notifier callback.

This patch moves the call to intel_idle_cpuidle_devices_uninit()
outside of the helper function to the one call site that actually
needs to perform the de-registrations.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit b69ef2c099c3e5f11bd5c33a9530d6522f72c9aa)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Setup the timer broadcast only on successful driver load.

Orabug: 24810432

This driver sets the broadcast tick quite early on during probe and does
not clean up again in case of failure. This patch moves the setup call
after the registration, placing the on_each_cpu() calls within the global
CPU lock region.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 2259a819a8d37e472f08c88bc0dd22194754adb4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Avoid a double free of the per-CPU data.

Orabug: 24810432

The helper function, intel_idle_cpuidle_devices_uninit, frees the
globally allocated per-CPU data. However, this function is invoked
from the hot plug notifier callback at a time when freeing that data
is not safe.

If the call to cpuidle_register_driver() should fail (say, due to lack
of memory), then the driver will free its per-CPU region. On the
*next* CPU_ONLINE event, the driver will happily use the region again
and even free it again if the failure repeats.

This patch fixes the issue by moving the call to free_percpu() outside
of the helper function at the two call sites that actually need to
free the per-CPU data.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ca42489d9ee3262482717c83428e087322fdc39c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Fix dangling registration on error path.

Orabug: 24810432

In the module_init() method, if the per-CPU allocation fails, then the
active cpuidle registration is not cleaned up. This patch fixes the
issue by attempting the allocation before registration, and then
cleaning it up again on registration failure.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e9df69ccd1322e87eee10f28036fad9e6c71f8dd)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Fix deallocation order on the driver exit path.

Orabug: 24810432

In the module_exit() method, this driver first frees its per-CPU
pointer, then unregisters a callback making use of the pointer.
Furthermore, the function, intel_idle_cpuidle_devices_uninit, is racy
against CPU hot plugging as it calls for_each_online_cpu().

This patch corrects the issues by unregistering first on the exit path
while holding the hot plug lock.
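A sketch of the init/exit ordering these intel_idle fixes converge on: allocate the per-CPU data before registering, undo exactly that allocation if registration fails, and on exit unregister before freeing. All names are hypothetical stand-ins, and the sketch omits the CPU hot plug locking the real driver takes around these steps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int *demo_percpu;         /* stands in for the per-CPU device data */
static bool demo_registered;     /* stands in for the cpuidle registration */
static bool demo_fail_register;  /* test hook: simulate a failure */

static int demo_register_driver(void)
{
	if (demo_fail_register)
		return -ENODEV;
	demo_registered = true;
	return 0;
}

static void demo_unregister_driver(void)
{
	demo_registered = false;
}

int demo_init(int nr_cpus)
{
	int ret;

	demo_percpu = calloc((size_t)nr_cpus, sizeof(*demo_percpu));
	if (!demo_percpu)
		return -ENOMEM;  /* nothing registered yet: nothing to undo */

	ret = demo_register_driver();
	if (ret) {
		free(demo_percpu);  /* undo the allocation, exactly once */
		demo_percpu = NULL;
		return ret;
	}
	return 0;
}

void demo_exit(void)
{
	assert(demo_registered);
	demo_unregister_driver();  /* stop users of the data first... */
	free(demo_percpu);         /* ...then free it */
	demo_percpu = NULL;
}
```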

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 51319918bcc31f901646fc66348d41cf74ee0566)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Remove redundant initialization calls.

Orabug: 24810432

The function, intel_idle_cpuidle_driver_init, makes calls on each CPU
to auto_demotion_disable() and c1e_promotion_disable(). These calls
are redundant, as intel_idle_cpu_init() does the same calls just a bit
later on. They are also premature, as the driver registration may yet
fail.

This patch removes the redundant code.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 4a3dfb3fc0fb0fc9acd36c94b7145f9c9dd4d93a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Fix a helper function's return value.

Orabug: 24810432

The function, intel_idle_cpuidle_driver_init, delivers no error codes
at all. This patch changes the function to return 'void' instead of
returning zero.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 5469c827d20ab013f43d4f5f94e101d0cf7afd2c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: remove useless return from void function.

Orabug: 24810432

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit f70415496d5ddf06fe7e0a22250d60bab2b2d7cc)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Support for Intel Xeon Phi Processor x200 Product Family

Orabug: 24810432

Enables "Intel(R) Xeon Phi(TM) Processor x200 Product Family" support,
formerly code-named KNL. It is based on a modified Intel Atom Silvermont
microarchitecture.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
[micah.barany@intel.com: adjusted values of residency and latency]
Signed-off-by: Micah Barany <micah.barany@intel.com>
[hubert.chrzaniuk@intel.com: removed deprecated CPUIDLE_FLAG_TIME_VALID flag]
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Pawel Karczewski <pawel.karczewski@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 281baf7a702693deaa45c98ef0c5161006b48257)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled

Orabug: 24810432

Some SKL-H configurations require "intel_idle.max_cstate=7" to boot.
While that is an effective workaround, it disables C10.

This patch detects the problematic configuration,
and disables C8 and C9, keeping C10 enabled.

Note that enabling SGX in BIOS SETUP can also prevent this issue,
if the system BIOS provides that option.

https://bugzilla.kernel.org/show_bug.cgi?id=109081
"Freezes with Intel i7 6700HQ (Skylake), unless intel_idle.max_cstate=7"

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
(cherry picked from commit d70e28f57e14a481977436695b0c9ba165472431)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Skylake Client Support - updated

Orabug: 24810432

Addition of PC9 state, and minor tweaks to existing PC6 and PC8 states.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 135919a3a80565070b9645009e65f73e72c661c0)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: Skylake Client Support

Orabug: 24810432

Skylake Client CPU idle Power states (C-states)
are similar to the previous generation, Broadwell.
However, Skylake does get its own table with updated
worst-case latency and average energy-break-even residency values.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 493f133f47750aa5566fafa9403617e3f0506f8c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

intel_idle: allow idle states to be freeze-mode specific

Orabug: 24810432

intel_idle uses a NULL "enter" field in a cpuidle state
to recognize the invalid entry terminating a variable-length array.

Linux-4.0 added support for the system-wide "freeze" state
in cpuidle drivers via the new "enter_freeze" field.

The natural way to expose a deep idle state for freeze,
but not for run-time idle is to supply "enter_freeze" without "enter";
so we update the driver to accept such states.
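A sketch of the table-termination change: the state table ends at an entry where both callbacks are NULL, so a state that supplies only a freeze callback is still counted. Struct and field names echo the text but are simplified, hypothetical stand-ins for the cpuidle structures.

```c
#include <assert.h>
#include <stddef.h>

struct demo_state {
	const char *name;
	int (*enter)(int index);          /* run-time idle entry */
	void (*enter_freeze)(int index);  /* system-wide freeze entry */
};

static int demo_enter(int index) { return index; }
static void demo_enter_freeze(int index) { (void)index; }

static const struct demo_state demo_states[] = {
	{ "C1",  demo_enter, demo_enter_freeze },
	{ "C6",  demo_enter, demo_enter_freeze },
	{ "C10", NULL,       demo_enter_freeze },  /* freeze-only state */
	{ NULL,  NULL,       NULL },               /* terminator */
};

/* Count usable states, stopping only when BOTH callbacks are NULL.
 * Testing ->enter alone would stop early at the freeze-only entry. */
size_t demo_count_states(const struct demo_state *tbl)
{
	size_t n = 0;

	assert(tbl != NULL);
	while (tbl[n].enter != NULL || tbl[n].enter_freeze != NULL)
		n++;
	return n;
}
```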

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 7dd0e0af64afe4aa08ccdd167f64bd007f09b515)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

RDS: rds debug messages are enabled by default

RDS uses a Kconfig option called "RDS_DEBUG" to enable RDS debug messages.
This option causes the RDS Makefile to add -DDEBUG to the gcc command
line used to build RDS.

When CONFIG_DYNAMIC_DEBUG is enabled, the "DEBUG" macro is used by
include/linux/dynamic_debug.h to decide if dynamic debug prints should
be sent by default to the kernel log.

RDS should not define this macro in production builds.

Orabug: 24956522

Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

net/rds: Fix new sparse warning

c0adf54a109 introduced new sparse warnings:
  CHECK   /home/dahern/kernels/linux.git/net/rds/ib_cm.c
net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types)
net/rds/ib_cm.c:191:34:    expected unsigned long long [unsigned] [usertype] dp_ack_seq
net/rds/ib_cm.c:191:34:    got restricted __be64 <noident>
net/rds/ib_cm.c:194:51: warning: cast to restricted __be64

The temporary variable for sequence number should have been declared as __be64
rather than u64. Make it so.

Orabug: 24817685

Signed-off-by: David Ahern <david.ahern@oracle.com>
Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e2783717a71e9babfdd7c36c7e35b790d2c01022)
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

net/rds: fix unaligned memory access

rdma_conn_param private data is copied using memcpy after headers such
as cma_hdr (see cma_resolve_ib_udp for an example), so the start of the
private data is aligned to the end of the structure that comes before it.
If that structure ends with a u32, the start of the private data may be
only 4-byte aligned. Structures that use u8/u16/u32/u64 are naturally
aligned, but if the structure itself does not start on an 8-byte
boundary, none of its u64 members will be aligned. To solve this issue
we must use special macros that allow unaligned access to those
unaligned members.
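A sketch of reading a big-endian 64-bit field that sits at a 4-byte (not 8-byte) aligned offset. Assembling the value byte by byte never performs a misaligned 8-byte load; in kernel code, helpers such as get_unaligned() combined with be64_to_cpu() serve this purpose. The buffer layout in the usage below (a 4-byte header followed by the 64-bit field) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load a big-endian u64 from any address, aligned or not. */
uint64_t demo_load_be64(const void *p)
{
	const unsigned char *b = p;
	uint64_t v = 0;
	size_t i;

	assert(p != NULL);
	for (i = 0; i < 8; i++)
		v = (v << 8) | b[i];  /* most significant byte first */
	return v;
}
```

For example, with a 4-byte header, demo_load_be64(buf + 4) reads the field safely even on architectures that trap on unaligned 8-byte accesses.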

Addresses the following kernel log seen when attempting to use RDMA:

Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma]

Orabug: 24817685

Acked-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
[Minor tweaks for top of tree by:]
Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c0adf54a10903b59037a4c5fcb933dfeeb7b2624)
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/upstream-cherry-picks:
  sched: panic on corrupted stack end
  ecryptfs: forbid opening files without mmap handler
  proc: prevent stacking filesystems on top

sched: panic on corrupted stack end

Orabug: 24971905
CVE: CVE-2016-1583

Until now, hitting this BUG_ON caused a recursive oops (because oops
handling involves do_exit(), which calls into the scheduler, which in
turn raises an oops), which caused stuff below the stack to be
overwritten until a panic happened (e.g. via an oops in interrupt
context, caused by the overwritten CPU index in the thread_info).

Just panic directly.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 29d6455178a09e1dc340380c582b13356227e8df)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
kernel/sched/core.c

ecryptfs: forbid opening files without mmap handler

Orabug: 24971905
CVE: CVE-2016-1583

This prevents users from triggering a stack overflow through a recursive
invocation of pagefault handling that involves mapping procfs files into
virtual memory.

Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 2f36db71009304b3f0b95afacd8eba1f9f046b87)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

proc: prevent stacking filesystems on top

Orabug: 24971905
CVE: CVE-2016-1583

This prevents stacking filesystems (ecryptfs and overlayfs) from using
procfs as lower filesystem. There is too much magic going on inside
procfs, and there is no good reason to stack stuff on top of procfs.

(For example, procfs does access checks in VFS open handlers, and
ecryptfs by design calls open handlers from a kernel thread that doesn't
drop privileges or so.)

Signed-off-by: Jann Horn <jannh@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e54ad7f1ee263ffa5a2de9c609d58dfa27b21cd9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/rpm-build:
uek-rpm nano: remove the OL6 nano kernel dependency on kernel-firmware

uek-rpm nano: remove the OL6 nano kernel dependency on kernel-firmware

linux-nano-firmware obsoletes kernel-firmware. Remove the requirement
for it from the OL6 nano kernel-uek.spec.

Orabug: 25023723

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

Merge branch topic/uek-4.1/fuse of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/fuse:
fuse: direct-io: don't dirty ITER_BVEC pages

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/upstream-cherry-picks:
  btrfs: Handle unaligned length in extent_same
  panic, x86: Fix re-entrance problem due to panic on NMI
  kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup
  Fix compilation error introduced by "cancel the setfilesize transation when io error happen"
  cancel the setfilesize transation when io error happen
  mm/hugetlb: optimize minimum size (min_size) accounting
  Btrfs: fix device replace of a missing RAID 5/6 device
  Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
  bpf: fix double-fdput in replace_map_fd_with_map_ptr()

Merge branch topic/uek-4.1/stable-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/stable-cherry-picks:
kvm:vmx: more complete state update on APICv on/off

btrfs: Handle unaligned length in extent_same

The extent-same code rejects requests with an unaligned length. This
poses a problem when we want to dedupe the tail extent of files as we
skip cloning the portion between i_size and the extent boundary.

If we don't clone the entire extent, it won't be deleted. So the
combination of these behaviors winds up giving us worst-case dedupe on
many files.

We can fix this by allowing a length that extends to i_size and
internally aligning it to the end of the block. This is what
btrfs_ioctl_clone() does, so we can just copy that check over.
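A minimal sketch of the length rule described above, assuming a power-of-two block size. The helper name is hypothetical; it is not the btrfs function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of the power-of-two block size bs. */
static uint64_t align_up(uint64_t x, uint64_t bs)
{
	return (x + bs - 1) & ~(bs - 1);
}

/*
 * Accept a block-aligned length as-is. Accept an unaligned length only
 * when the range ends exactly at i_size, and then extend it to the end
 * of the last block so the whole tail extent is covered.
 */
static bool extent_same_fix_len(uint64_t off, uint64_t *len,
				uint64_t i_size, uint64_t bs)
{
	if (off % bs)
		return false;
	if (*len % bs == 0)
		return true;
	if (off + *len != i_size)
		return false;
	*len = align_up(i_size, bs) - off;
	return true;
}
```

For a 5000-byte file with 4096-byte blocks, a request of (0, 5000) becomes (0, 8192), so the tail extent is cloned in full.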

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit e1d227a42ea2b4664f94212bd1106b9a3413ffb8)
Signed-off-by: Divya Indi <divya.indi@oracle.com>
Orabug: 24696342

panic, x86: Fix re-entrance problem due to panic on NMI

If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. Kernel stalls, as a result, after failing to acquire
panic_lock.

To avoid this problem, don't call panic() in NMI context if we've
already entered panic().

For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.
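The core of the nmi_panic() idea can be sketched in userspace with a C11 compare-and-exchange on a "panic CPU" variable. This is an illustration of the technique only; the enum and function names are hypothetical and the kernel macro differs in detail.

```c
#include <stdatomic.h>

#define PANIC_CPU_INVALID (-1)

/* Which CPU, if any, currently owns the panic path. */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

enum nmi_action { DO_PANIC, ALREADY_PANICKING_HERE, STOP_THIS_CPU };

/* Only the first CPU to claim panic_cpu calls panic(). A nested NMI on
 * that same CPU must not re-enter panic(); other CPUs stop themselves. */
static enum nmi_action nmi_panic_decision(int cpu)
{
	int expected = PANIC_CPU_INVALID;

	if (atomic_compare_exchange_strong(&panic_cpu, &expected, cpu))
		return DO_PANIC;
	/* On failure, `expected` now holds the CPU that owns the panic. */
	if (expected == cpu)
		return ALREADY_PANICKING_HERE;
	return STOP_THIS_CPU;
}
```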

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 1717f2096b543cede7a380c858c765c41936bc35)

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Orabug: 24327572

kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup

In many cases of hardlockup reports, it's actually not possible to know
why it triggered, because the CPU that got stuck is usually waiting on a
resource (with IRQs disabled) that some other CPU is holding.

IOW, we are often looking at the stacktrace of the victim and not the
actual offender.

Introduce a sysctl / cmdline parameter that makes it possible to have the
hardlockup detector perform an all-CPU backtrace.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 55537871ef666b4153fd1ef8782e4a13fee142cc)

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Orabug: 24327572

Fix compilation error introduced by "cancel the setfilesize transation
when io error happen"

xfs_trans_cancel() has two args in UEK4.

Orabug: 24385189
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

cancel the setfilesize transation when io error happen

When I ran the xfstests 073 case, the remount process blocked waiting for
the transaction count to drop to zero. I found that an I/O error had
occurred and the setfilesize transaction was not released properly. The
transaction should be cancelled when an I/O error happens in this case.

Reproduction steps:
1. dd if=/dev/zero of=xfs1.img bs=1M count=2048
2. mkfs.xfs xfs1.img
3. losetup /dev/loop0 ./xfs1.img
4. mount -t xfs /dev/loop0 /home/test_dir/
5. mkdir /home/test_dir/test
6. mkfs.xfs -dfile,name=image,size=2g
7. mount -t xfs -o loop image /home/test_dir/test
8. cp a file bigger than 2g to /home/test_dir/test
9. mount -t xfs -o remount,ro /home/test_dir/test

[ dchinner: moved io error detection to xfs_setfilesize_ioend() after
transaction context restoration. ]

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Orabug: 24385189
mainline commit: 5cb13dcd0fac071b45c4bebe1801a08ff0d89cad

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

mm/hugetlb: optimize minimum size (min_size) accounting

Orabug: 24450029

It was observed that minimum size accounting associated with the
hugetlbfs min_size mount option may not perform optimally and as
expected.  As huge pages/reservations are released from the filesystem
and given back to the global pools, they are reserved for subsequent
filesystem use as long as the subpool reserved count is less than
subpool minimum size.  It does not take into account used pages within
the filesystem.  The filesystem size limits are not exceeded and this is
technically not a bug.  However, better behavior would be to wait for
the number of used pages/reservations associated with the filesystem to
drop below the minimum size before taking reservations to satisfy
minimum size.

An optimization is also made to the hugepage_subpool_get_pages() routine
which is called when pages/reservations are allocated.  This does not
change behavior, but simply avoids the accounting if all reservations
have already been taken (subpool reserved count == 0).
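The policy described above can be sketched with a simplified subpool. This is a sketch of the described behavior under stated assumptions, not the upstream hugetlb code: the struct is hypothetical, counts are in huge pages, and the upstream handling of max_size is omitted.

```c
/* Hypothetical, simplified subpool; all counts are in huge pages. */
struct subpool_sketch {
	long min_hpages;  /* min_size mount option, or -1 if not set */
	long used_hpages; /* pages/reservations used by the filesystem */
	long rsv_hpages;  /* pages held back to guarantee min_size */
};

/* Release delta pages. Only the shortfall below min_size (used pages plus
 * existing reserves) is kept as a reserve; the rest goes back to the
 * global pool. Returns the number of pages returned globally. */
static long subpool_put_pages_sketch(struct subpool_sketch *sp, long delta)
{
	long keep = 0;

	sp->used_hpages -= delta;
	if (sp->min_hpages != -1) {
		long shortfall = sp->min_hpages - sp->used_hpages - sp->rsv_hpages;

		if (shortfall > 0)
			keep = shortfall < delta ? shortfall : delta;
		sp->rsv_hpages += keep;
	}
	return delta - keep;
}

/* Allocate delta pages. Returns how many must come from the global pool.
 * Once rsv_hpages is 0 there is nothing to account for, so skip it. */
static long subpool_get_pages_sketch(struct subpool_sketch *sp, long delta)
{
	long from_global = delta;

	sp->used_hpages += delta;
	if (sp->min_hpages != -1 && sp->rsv_hpages) {
		long take = delta < sp->rsv_hpages ? delta : sp->rsv_hpages;

		sp->rsv_hpages -= take;
		from_global -= take;
	}
	return from_global;
}
```

With min_size = 10 and 15 pages in use, freeing 3 pages returns all 3 to the global pool; only once usage drops below 10 are freed pages held back.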

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 24450029
(cherry picked from commit 09a95e29cb30a3930db22d340ddd072a82b6b0db)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Btrfs: fix device replace of a missing RAID 5/6 device

The original implementation of device replace on RAID 5/6 seems to have
missed support for replacing a missing device. When this is attempted,
we end up calling bio_add_page() on a bio with a NULL ->bi_bdev, which
crashes when we try to dereference it. This happens because
btrfs_map_block() has no choice but to return us the missing device
because RAID 5/6 don't have any alternate mirrors to read from, and a
missing device has a NULL bdev.

The idea implemented here is to handle the missing device case
separately, which should only happen when we're replacing a missing RAID
5/6 device. We use the new BTRFS_RBIO_REBUILD_MISSING operation to
reconstruct the data from parity, check it with
scrub_recheck_block_checksum(), and write it out with
scrub_write_block_to_dev_replace().

Reported-by: Philip <bugzilla@philip-seeger.de>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96141
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Orabug: 24447930
Signed-off-by: Divya Indi <divya.indi@oracle.com>
(cherry picked from commit 73ff61dbe5edeb1799d7e91c8b0641f87feb75fa)

Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation

The current RAID 5/6 recovery code isn't quite prepared to handle
missing devices. In particular, it expects a bio that we previously
attempted to use in the read path, meaning that it has valid pages
allocated. However, missing devices have a NULL blkdev, and we can't
call bio_add_page() on a bio with a NULL blkdev. We could do manual
manipulation of bio->bi_io_vec, but that's pretty gross. So instead, add
a separate path that allows us to manually add pages to the rbio.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Orabug: 24447930
Signed-off-by: Divya Indi <divya.indi@oracle.com>
(cherry picked from commit b4ee1782686d5b7a97826d67fdeaefaedbca23ce)

kvm:vmx: more complete state update on APICv on/off

The function to update APICv on/off state (in particular, to deactivate
it when enabling Hyper-V SynIC) is incomplete: it doesn't adjust
APICv-related fields among secondary processor-based VM-execution
controls. As a result, Windows 2012 guests get stuck when a SynIC-based
auto-EOI interrupt intersects with, e.g., an IPI in the guest.

In addition, the MSR intercept bitmap isn't updated every time "virtualize
x2APIC mode" is toggled. This path can only be triggered by a malicious
guest, because Windows didn't use x2APIC but rather their own synthetic
APIC access MSRs; however a guest running in a SynIC-enabled VM could
switch to x2APIC and thus obtain direct access to host APIC MSRs
(CVE-2016-4440).

The patch fixes those omissions.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Reported-by: Steve Rutherford <srutherford@google.com>
Reported-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Orabug: 23347009
CVE: CVE-2016-4440
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>

fuse: direct-io: don't dirty ITER_BVEC pages

When reading from a loop device backed by a fuse file it deadlocks on
lock_page().

This is because the page is already locked by the read() operation done on
the loop device. In this case we don't want to either lock the page or
dirty it.

So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC vectors.

Orabug: 22652336

Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>

bpf: fix double-fdput in replace_map_fd_with_map_ptr()

When bpf(BPF_PROG_LOAD, ...) was invoked with a BPF program whose bytecode
references a non-map file descriptor as a map file descriptor, the error
handling code called fdput() twice instead of once (in __bpf_map_get() and
in replace_map_fd_with_map_ptr()). If the file descriptor table of the
current task is shared, this causes f_count to be decremented too much,
allowing the struct file to be freed while it is still in use
(use-after-free). This can be exploited to gain root privileges by an
unprivileged user.

This bug was introduced in
commit 0246e64d9a5f ("bpf: handle pseudo BPF_LD_IMM64 insn"), but is only
exploitable since
commit 1be7f75d1668 ("bpf: enable non-root eBPF programs") because
previously, CAP_SYS_ADMIN was required to reach the vulnerable code.
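The reference-counting rule behind the fix can be sketched with a toy refcount. The struct and helpers are hypothetical stand-ins for struct file, f_count, fdget()/fdput() and __bpf_map_get(), assuming a shared fd table so that fdget() actually takes a reference.

```c
#include <stdbool.h>

/* Hypothetical refcounted file, standing in for struct file / f_count. */
struct file_sketch {
	int f_count;
	bool is_map;
};

static void fput_sketch(struct file_sketch *f)
{
	f->f_count--;
}

/* Like __bpf_map_get(): on a type mismatch it drops the reference itself,
 * so on failure the caller no longer owns that reference. */
static int map_get_sketch(struct file_sketch *f)
{
	if (!f->is_map) {
		fput_sketch(f);
		return -1;	/* stands in for -EINVAL */
	}
	return 0;
}

/* Caller after the fix: on error it must not drop the reference again.
 * The buggy version also called fput_sketch() on the error path. */
static int replace_fd_sketch(struct file_sketch *f)
{
	f->f_count++;	/* fdget() took a reference */
	if (map_get_sketch(f) < 0)
		return -1;
	fput_sketch(f);	/* success path: done with the fd */
	return 0;
}
```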

(posted publicly according to request by maintainer)

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8358b02bf67d3a5d8a825070e1aa73f25fb2e4c7)

Orabug: 23268285
Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

mlx4_ib: remove WARN_ON() based on incorrect assumptions

A WARN_ON() was inserted when user data was introduced to the
ibv_cmd_alloc_shpd() by another infiniband provider to make
sure that no user data is sent to older providers such as this
one which do not expect it.

It assumed that when no user data is sent, udata->inlen is zero.
The user-kernel API, however, always sends at least 8 octets
(which may not be initialized in the case of provider libraries that
do not use user data).

We remove the WARN_ON and rely on providers not to touch that
field if their companion library is not expected to initialize it.

Orabug: 24972331

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

uek-rpm nano: remove ql23xx-firmware from kernel_prereq

Orabug: 24938352

Remove kernel_prereq ql23xx-firmware from the OL6 nano kernel-uek.spec

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

nvme: fix max_segments integer truncation

The block layer uses an unsigned short for max_segments. The way we
calculate the value for NVMe tends to generate very large 32-bit values,
which after integer truncation may lead to a zero value instead of
the desired outcome.
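A short illustration of the truncation, assuming a 16-bit unsigned short as on Linux targets. The clamp is in the spirit of min_t(u32, ..., USHRT_MAX); the helper names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* The bug: storing a large 32-bit count in an unsigned short keeps only
 * the low 16 bits, so a value of 0x10000 becomes 0. */
static unsigned short max_segments_truncated(uint32_t computed)
{
	return (unsigned short)computed;
}

/* Clamping first makes oversized values saturate at USHRT_MAX instead. */
static unsigned short max_segments_clamped(uint32_t computed)
{
	return computed > USHRT_MAX ? USHRT_MAX : (unsigned short)computed;
}
```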

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 24928835
Cherry picked commit: 45686b6198bd824f083ff5293f191d78db9d708a
Conflicts:
The UEK4 QU2 nvme module doesn't have a core.c file; all the functions
reside in pci.c. Hence, this patch is manually ported to the respective
function in pci.c
drivers/nvme/host/core.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

Merge branch topic/uek-4.1/xen of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Conflicts:
arch/x86/xen/enlighten.c

Revert "ib/mlx4: Initialize multiple Mellanox HCAs in parallel"

This reverts commit a661980d2a809dbe208914b8eec46c18b78c18fb.

Orabug: 24951493
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

No ILOM web console keyboard support in ueknano kernel

Enable HID driver in the OL6 ueknano kernel config.

Orabug: 24946756
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>

mlx4_core/ib: set the IB port MTU to 2K

Commit 096335b3f983 ("mlx4_core: Allow dynamic MTU configuration for IB
ports") overwrites the default port MTU and sets it to 4K. Since this
directly impacts the number of HW VLs supported, and Oracle workloads
heavily use all 8 supported VLs for traffic classification, the 2K
default needs to be kept as is.

We initialise it to the 2K default so that the feature (dynamic MTU
configuration) is still available for non-DB users to set the desired
MTU value using sysctl.

Also, for CX2 cards, commit 596c5ff4b7b3 ("net/mlx4: adjust initial
value of vl_cap in mlx4_SET_PORT") broke vl_cap, which limited the
number of supported VLs to 4 irrespective of MTU size.

Orabug: 24946479

Tested-by: Pierre Orzechowski <pierre.e.orzechowski@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

sif: cq: transfer headroom attribute to user mode

This commit makes sure old libsif versions work
with the driver while providing a forward compatible
way of making additional changes to the extra
headroom in the CQs.

We anticipate being able to trim
down the extra entries once we have PQP errors
handled transparently. This commit then ensures that
the headroom is only set in one place, on the
driver side, and that user mode can simply
pick up the configured headroom from the kernel.
This is done by providing the used headroom
in a formerly reserved 32-bit field, so no change
to the packet size is necessary.

Nevertheless, we increment the ABI version from
3.6 to 3.7 to allow libsif to detect whether
the headroom field can be trusted.
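A minimal sketch of how user mode might consume such a field, assuming a hypothetical response struct and a caller-supplied legacy headroom value. The real libsif/sif ABI structures are not shown in this log.

```c
#include <stdint.h>

/* Hypothetical CQ-creation response: a formerly reserved 32-bit field now
 * carries the headroom, so the structure size is unchanged. */
struct cq_resp_sketch {
	uint32_t cq_idx;
	uint32_t headroom;	/* was: reserved */
};

/* Trust the kernel-provided headroom only when the reported ABI version is
 * 3.7 or later; otherwise fall back to what an older libsif assumed. */
static uint32_t cq_headroom_sketch(unsigned int abi_major, unsigned int abi_minor,
				   const struct cq_resp_sketch *resp,
				   uint32_t legacy_headroom)
{
	if (abi_major > 3 || (abi_major == 3 && abi_minor >= 7))
		return resp->headroom;
	return legacy_headroom;
}
```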

Orabug: 24926265

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: franklin <osl04sys_no_grp@oracle.com>

sif: Minor cleanup commit

- Remove some unused variables
- Minor changes due to bug fixes in hardware header
file generator

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Francisco Triviño <francisco.trivino@oracle.com>

sif: Add vendor flag to support testing without oversized CQs

After the introduction of extra CQ entries to reduce the risk of
duplicate completions overflowing a CQ, we can no longer
trigger various CQ overflow scenarios without running a lot of
requests. We need to be able to test with a minimal set of operations
to allow co-sim based tests for further analysis.

Introduce a new vendor_flag no_x_cqe = 0x80 to turn off
the allocation of extra CQEs.
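The flag check can be sketched as follows. The value 0x80 comes from the text above; the macro and function names are hypothetical, not the sif driver's.

```c
#include <stdint.h>

/* Flag value from the commit message; the macro name is hypothetical. */
#define SIF_VF_NO_X_CQE 0x80u

/* Number of CQEs to allocate: the extra headroom entries are skipped when
 * the test-only vendor flag is set. */
static uint32_t cq_entries_to_alloc(uint32_t requested, uint32_t extra,
				    uint32_t vendor_flags)
{
	if (vendor_flags & SIF_VF_NO_X_CQE)
		return requested;
	return requested + extra;
}
```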

Orabug: 24919301

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Francisco Triviño <francisco.trivino@oracle.com>

sif: cq: Fix the max_cqe capability supported by SIF

Orabug: 24673784

This patch fixes an incomplete patch in commit "cq: Add
additional SIF visible cqes to CQ". The max_cqe
capability reported by query_device is incorrect because
it includes the SIF visible cqes.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: qp_attr: Fix qp attributes for query_qp verb

Orabug: 21946858

The following QP attributes were incorrectly reported:
1) max_rd_atomic
2) service level
3) alternate pkey index
4) alternate ack timeout
5) alternate address handle

The initial commit with the same title was probably
subject to a merge issue, and its effect was lost entirely
in a subsequent patch.

Signed-off-by: Vinay Shaw <vinay.shaw@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: qp: Fix modify_qp_hw from SQE to RTS

Orabug: 24810237

Fix an issue where modify_qp_hw from SQE to RTS returns
EPSC_MODIFY_CANNOT_CHANGE_QP_ATTR. In sif, a QP transition
from SQE to RTS must explicitly set req_access_error
to 0.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: pd: Implement Oracle ib_core compliance shared pd

Orabug: 24713410

Shared PD is not an IBTA-defined feature, but an Oracle
Linux extension. Even though PSIF can share a PD easily,
it must comply with the Oracle ib_core implementation, which
requires a new PD "object" when reusing a PD (via the share_pd
verbs).

Without a new PD "object", a NULL pointer dereference occurs
during the PD clean-up phase. Thus, this patch creates a new PD
"object" when reusing a PD, and this PD "object" points
to the original PD index.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: eq: Add timeout to the threaded interrupt handler

This commit implements a timeout that prevents soft lockup issues when
the threaded interrupt function (sif_intr_worker) keeps processing
events for a long period. If the timeout is reached, the threaded
handler returns IRQ_HANDLED even if there are more events to be
processed. In such a case, the coalescing mechanism will generate
an IRQ for the last event.
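The time-budget pattern can be sketched in userspace as follows. The event queue is a hypothetical counter and the clock is C11 timespec_get(); kernel code would typically use jiffies and time_after() instead.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical event queue: just a count of pending events. */
struct eq_sketch {
	int pending;
};

static bool eq_consume_one(struct eq_sketch *eq)
{
	if (eq->pending == 0)
		return false;
	eq->pending--;
	return true;
}

/* Current time in milliseconds. */
static double now_ms(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Process events until the queue drains or budget_ms elapses, then return
 * (the point where the handler would return IRQ_HANDLED). Leftover events
 * are assumed to raise a new interrupt via coalescing. */
static int intr_worker_sketch(struct eq_sketch *eq, double budget_ms)
{
	double deadline = now_ms() + budget_ms;
	int handled = 0;

	while (eq_consume_one(eq)) {
		handled++;
		if (now_ms() >= deadline)
			break;
	}
	return handled;
}
```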

Orabug: 24839976

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

IB/mlx4: Scatter CQs to different EQs

If the user does not request a specific comp vector, use a weight based
algorithm to set the EQ.
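One plausible reading of a weight-based choice is "pick the EQ with the fewest CQs attached". The sketch below assumes that interpretation; the load array and function are hypothetical, not the mlx4 implementation.

```c
/* eq_load[i] counts CQs currently attached to EQ i (assumed weight).
 * A valid requested vector is honoured; otherwise the least-loaded EQ
 * is chosen. The chosen EQ's load is incremented. */
static int pick_eq(int *eq_load, int n_eqs, int requested_vector)
{
	int best = 0;

	if (requested_vector >= 0 && requested_vector < n_eqs) {
		best = requested_vector;
	} else {
		for (int i = 1; i < n_eqs; i++)
			if (eq_load[i] < eq_load[best])
				best = i;
	}
	eq_load[best]++;
	return best;
}
```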

Orabug: 24705943

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: fix panic with handlers running post teardown

The shutdown CQE reaping loop takes care of emptying the
CQs before they are destroyed, and once the tasklets are
killed, the handlers are not expected to run.

However, because of a bug in the core tasklet code, a tasklet
handler could still run after tasklet_kill(), which can lead
to a kernel panic. A fix for the core tasklet code was proposed
and accepted upstream, but it comes with the baggage of fixing
quite a few bad users of it. Also, for receive, we have an
additional kthread to take care of.

The bug fix done as part of Orabug 2446085 made the additional
assumption that the reaping code won't reap all the CQEs after
the QP has moved to the error state, which was not correct. The
QP is moved to the error state as part of rdma_disconnect(), and
all the CQEs are reaped by the loop properly.

Any handler that runs after the above and tries to access the
QP/CQ resources is exposed to race conditions. This patch
fixes the race by making sure that handlers return
without any action after teardown.
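The "return without any action after teardown" pattern can be sketched with an atomic flag. The names are hypothetical. Note that a flag check alone only helps if it is set before the resources are freed; it does not cover a handler that already passed the check, which is why it is paired with draining the CQs and killing the tasklets.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection state; names are illustrative only. */
struct ib_conn_sketch {
	atomic_bool torn_down;
	int cqes_processed;
};

/* Teardown marks the connection before its QP/CQ resources are released. */
static void conn_teardown_sketch(struct ib_conn_sketch *ic)
{
	atomic_store(&ic->torn_down, true);
	/* ... drain CQs, kill tasklets, then free QP/CQ resources ... */
}

/* A late-running handler bails out instead of touching freed resources. */
static void cq_handler_sketch(struct ib_conn_sketch *ic)
{
	if (atomic_load(&ic->torn_down))
		return;
	ic->cqes_processed++;	/* stands in for reaping CQEs */
}
```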

Orabug: 24460805

Reviewed-by: Wengang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

uek-rpm nano: replace linux-firmware dependency with linux-nano-firmware

The Linux nano kernel now uses linux-nano-firmware. Replace the
linux-firmware dependency with it.

Orabug: 24938352

Reviewed-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

Merge branch topic/uek-4.1/dtrace of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Merge branch topic/uek-4.1/xen of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

Conflicts:
include/linux/mm.h

mm: remove gup_flags FOLL_WRITE games from __get_user_pages()

This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").

In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better).  The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9.  Earlier kernels will
have to look at the page state itself.

Also, the VM has become more scalable, and what used to be a purely
theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.
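The resulting check can be sketched with booleans. The flag values are placeholders chosen only for this sketch (not the kernel's), and the PTE struct is hypothetical; the shape of the condition follows the description above.

```c
#include <stdbool.h>

/* Placeholder bit values for this sketch only. */
#define SK_FOLL_FORCE 0x1u
#define SK_FOLL_COW   0x2u

struct pte_sketch {
	bool write;
	bool dirty;
};

/* A write may follow a read-only PTE only if it was forced, COW was already
 * broken (FOLL_COW), and the PTE is still dirty, i.e. it still maps our
 * private COW copy rather than a page that was re-shared in the meantime. */
static bool can_follow_write_sketch(struct pte_sketch pte, unsigned int flags)
{
	return pte.write ||
	       ((flags & SK_FOLL_FORCE) && (flags & SK_FOLL_COW) && pte.dirty);
}
```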

Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619)
Orabug: 24926639
Conflicts:
        include/linux/mm.h
        mm/gup.c
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

x86/acpi: store ACPI ids from MADT for future usage

Currently we don't save ACPI ids (unlike LAPIC ids which go to
x86_cpu_to_apicid) from MADT and we may need this information later.
In particular, ACPI ids are the only existing way for a PVHVM Xen guest
to figure out Xen's idea of its vCPU ids before these CPUs boot, and
in some cases these ids diverge from Linux's cpu ids.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 3e9e57fad3d8530aa30787f861c710f598ddc4e7)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

xen-netback: fix error handling on netback_probe()

In case of error during netback_probe() (e.g. an entry missing on the
xenstore) netback_remove() is called on the new device, which will set
the device backend state to XenbusStateClosed by calling
set_backend_state(). However, the backend state wasn't initialized by
netback_probe() at this point, which causes an invalid state transition
and makes set_backend_state() BUG().

Initialize the backend state at the beginning of netback_probe() to
XenbusStateInitialising, and create two new valid state transitions on
set_backend_state(), from XenbusStateInitialising to XenbusStateClosed,
and from XenbusStateInitialising to XenbusStateInitWait.
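A reduced model of the fix: initialise the state at probe time so the early error path performs a valid transition. Only the two edges added by the patch are modelled; the enum and functions are hypothetical, and the real set_backend_state() handles many more states.

```c
#include <stdbool.h>

enum xb_state_sketch { ST_UNKNOWN, ST_INITIALISING, ST_INITWAIT, ST_CLOSED };

struct backend_sketch {
	enum xb_state_sketch state;
};

/* Only Initialising -> Closed and Initialising -> InitWait are modelled;
 * anything else is outside this sketch (the kernel would BUG() on an
 * unexpected transition). */
static bool set_state_sketch(struct backend_sketch *be,
			     enum xb_state_sketch target)
{
	if (be->state == ST_INITIALISING &&
	    (target == ST_CLOSED || target == ST_INITWAIT)) {
		be->state = target;
		return true;
	}
	return false;
}

/* The fix: set the state first, so an early failure that calls remove()
 * performs a valid Initialising -> Closed transition. */
static void probe_sketch(struct backend_sketch *be)
{
	be->state = ST_INITIALISING;
}
```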

Signed-off-by: Filipe Manco <filipe.manco@neclab.eu>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cce94483e47e8e3d74cf4475dea33f9fd4b6ad9f)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

xen: change the type of xen_vcpu_id to uint32_t

We pass xen_vcpu_id mapping information to hypercalls which require
uint32_t type so it would be cleaner to have it as uint32_t. The
initializer to -1 can be dropped as we always do the mapping before using
it and we never check the 'not set' value anyway.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 55467dea2967259f21f4f854fc99d39cc5fea60e)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

xenbus: don't look up transaction IDs for ordinary writes

This should really only be done for XS_TRANSACTION_END messages, or
else at least some of the xenstore-* tools don't work anymore.

Fixes: 0beef634b8 ("xenbus: don't BUG() on user mode induced condition")
Reported-by: Richard Schütz <rschuetz@uni-koblenz.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Richard Schütz <rschuetz@uni-koblenz.de>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 9a035a40f7f3f6708b79224b86c5777a3334f7ea)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

xen-blkfront: free resources if xlvbd_alloc_gendisk fails

The current code forgets to free resources in the failure path of
xlvbd_alloc_gendisk(); this patch fixes it.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 4e876c2bd37fbb5c37a4554a79cf979d486f0e82)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937

Conflicts:
drivers/block/xen-blkfront.c

xen: add static initialization of steal_clock op to xen_time_ops

pv_time_ops might be overwritten with xen_time_ops after the
steal_clock operation has been initialized already. To prevent calling
a now uninitialized function pointer, add the steal_clock static
initialization to xen_time_ops.
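Why the static initializer helps: when a whole ops structure is copied over a live one, any member missing from the source initializer becomes NULL. The struct below is a hypothetical stand-in, not the real pv_time_ops layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table standing in for pv_time_ops. */
struct time_ops_sketch {
	uint64_t (*sched_clock)(void);
	uint64_t (*steal_clock)(int cpu);
};

static uint64_t xen_sched_clock_sketch(void) { return 1; }
static uint64_t xen_steal_clock_sketch(int cpu) { (void)cpu; return 42; }

/* Because steal_clock is listed in the static initializer, copying the
 * whole struct over the live table cannot leave steal_clock NULL. */
static const struct time_ops_sketch xen_time_ops_sketch = {
	.sched_clock = xen_sched_clock_sketch,
	.steal_clock = xen_steal_clock_sketch,
};

static struct time_ops_sketch pv_time_ops_sketch;

static void install_xen_time_ops(void)
{
	pv_time_ops_sketch = xen_time_ops_sketch;
}
```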

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit d34c30cc1fa80f509500ff192ea6bc7d30671061)
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Orabug: 24820937