]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
9 years agojbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path
OGAWA Hirofumi [Thu, 10 Mar 2016 04:47:25 +0000 (23:47 -0500)]
jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path

Orabug: stable_rc4

[ Upstream commit c0a2ad9b50dd80eeccd73d9ff962234590d5ec93 ]

On umount path, jbd2_journal_destroy() writes latest transaction ID
(->j_tail_sequence) to be used at next mount.

The bug is that ->j_tail_sequence is not holding latest transaction ID
in some cases. So, at next mount, there is chance to conflict with
remaining (not overwritten yet) transactions.

mount (id=10)
write transaction (id=11)
write transaction (id=12)
umount (id=10) <= the bug doesn't write latest ID

mount (id=10)
write transaction (id=11)
crash

mount
[recovery process]
transaction (id=11)
transaction (id=12) <= valid transaction ID, but old commit
                                       must not replay

Like above, this bug become the cause of recovery failure, or FS
corruption.

So why ->j_tail_sequence doesn't point latest ID?

Because if checkpoint transactions was reclaimed by memory pressure
(i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated.
(And another case is, __jbd2_journal_clean_checkpoint_list() is called
with empty transaction.)

So in above cases, ->j_tail_sequence is not pointing latest
transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not
done too.

So, to fix this problem with minimum changes, this patch updates
->j_tail_sequence, and issue REQ_FLUSH.  (With more complex changes,
some optimizations would be possible to avoid unnecessary REQ_FLUSH
for example though.)

BTW,

journal->j_tail_sequence =
++journal->j_transaction_sequence;

Increment of ->j_transaction_sequence seems to be unnecessary, but
ext3 does this.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 3b6a70271a4e1e72585b3a7236cbd17d646b38d1)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agosg: fix dxferp in from_to case
Douglas Gilbert [Thu, 3 Mar 2016 05:31:29 +0000 (00:31 -0500)]
sg: fix dxferp in from_to case

Orabug: 23331104

[ Upstream commit 5ecee0a3ee8d74b6950cb41e8989b0c2174568d4 ]

One of the strange things that the original sg driver did was let the
user provide both a data-out buffer (it followed the sg_header+cdb)
_and_ specify a reply length greater than zero. What happened was that
the user data-out buffer was copied into some kernel buffers and then
the mid level was told a read type operation would take place with the
data from the device overwriting the same kernel buffers. The user would
then read those kernel buffers back into the user space.

From what I can tell, the above action was broken by commit fad7f01e61bf
("sg: set dxferp to NULL for READ with the older SG interface") in 2008
and syzkaller found that out recently.

Make sure that a user space pointer is passed through when data follows
the sg_header structure and command.  Fix the abnormal case when a
non-zero reply_len is also given.

Fixes: fad7f01e61bf737fe8a3740d803f000db57ecac6
Cc: <stable@vger.kernel.org> #v2.6.28+
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 685a50b681ddd07ff2b7714797b5793adcc691e7)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomd/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list
NeilBrown [Wed, 9 Mar 2016 01:58:25 +0000 (12:58 +1100)]
md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list

Orabug: 23331103

[ Upstream commit 550da24f8d62fe81f3c13e3ec27602d6e44d43dc ]

break_stripe_batch_list breaks up a batch and copies some flags from
the batch head to the members, preserving others.

It doesn't preserve or copy STRIPE_PREREAD_ACTIVE.  This is not
normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
stripe_head is added to a batch, and is not set on stripe_heads
already in a batch.

However there is no locking to ensure one thread doesn't set the flag
after it has just been cleared in another.  This does occasionally happen.

md/raid5 maintains a count of the number of stripe_heads with
STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes.  When
break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently
this could becomes incorrect and will never again return to zero.

md/raid5 delays the handling of some stripe_heads until
preread_active_stripes becomes zero.  So when the above mention race
happens, those stripe_heads become blocked and never progress,
resulting is write to the array handing.

So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
in the members of a batch.

URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
URL: http://thread.gmane.org/5649C0E9.2030204@zoner.cz
Reported-by: Martin Svec <martin.svec@zoner.cz> (and others)
Tested-by: Tom Weber <linux@junkyard.4t2.com>
Fixes: 1b956f7a8f9a ("md/raid5: be more selective about distributing flags across batch.")
Cc: stable@vger.kernel.org (v4.1 and later)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 88e2df1b99cfc0087ccd2faa8ca61d7263fc007a)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agobe2iscsi: set the boot_kset pointer to NULL in case of failure
Maurizio Lombardi [Fri, 4 Mar 2016 09:41:49 +0000 (10:41 +0100)]
be2iscsi: set the boot_kset pointer to NULL in case of failure

Orabug: 23331102

[ Upstream commit 84bd64993f916bcf86270c67686ecf4cea7b8933 ]

In beiscsi_setup_boot_info(), the boot_kset pointer should be set to
NULL in case of failure otherwise an invalid pointer dereference may
occur later.

Cc: <stable@vger.kernel.org>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d5226186331401679a52df554a9607022e470983)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agox86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs
Bjorn Helgaas [Fri, 26 Feb 2016 15:15:11 +0000 (09:15 -0600)]
x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs

Orabug: stable_rc4

[ Upstream commit b894157145e4ac7598d7062bc93320898a5e059e ]

The Home Agent and PCU PCI devices in Broadwell-EP have a non-BAR register
where a BAR should be.  We don't know what the side effects of sizing the
"BAR" would be, and we don't know what address space the "BAR" might appear
to describe.

Mark these devices as having non-compliant BARs so the PCI core doesn't
touch them.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
CC: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit fcc5834794fdcb62637a1deb25bff5787bf73757)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agobcache: fix cache_set_flush() NULL pointer dereference on OOM
Eric Wheeler [Mon, 7 Mar 2016 23:17:50 +0000 (15:17 -0800)]
bcache: fix cache_set_flush() NULL pointer dereference on OOM

Orabug: 23331099

[ Upstream commit f8b11260a445169989d01df75d35af0f56178f95 ]

When bch_cache_set_alloc() fails to kzalloc the cache_set, the
asyncronous closure handling tries to dereference a cache_set that
hadn't yet been allocated inside of cache_set_flush() which is called
by __cache_set_unregister() during cleanup.  This appears to happen only
during an OOM condition on bcache_register.

Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d09a05998d79dcfaa25de84624dce9f806fe4e7c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agobcache: cleaned up error handling around register_cache()
Eric Wheeler [Fri, 26 Feb 2016 22:33:56 +0000 (14:33 -0800)]
bcache: cleaned up error handling around register_cache()

Orabug: 23331096

[ Upstream commit 9b299728ed777428b3908ac72ace5f8f84b97789 ]

Fix null pointer dereference by changing register_cache() to return an int
instead of being void.  This allows it to return -ENOMEM or -ENODEV and
enables upper layers to handle the OOM case without NULL pointer issues.

See this thread:
  http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3521

Fixes this error:
  gargamel:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register

  bcache: register_cache() error opening sdh2: cannot allocate memory
  BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
  IP: [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache]
  PGD 120dff067 PUD 1119a3067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in: veth ip6table_filter ip6_tables
  (...)
  CPU: 4 PID: 3371 Comm: kworker/4:3 Not tainted 4.4.2-amd64-i915-volpreempt-20160213bc1 #3
  Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
  Workqueue: events cache_set_flush [bcache]
  task: ffff88020d5dc280 ti: ffff88020b6f8000 task.ti: ffff88020b6f8000
  RIP: 0010:[<ffffffffc05a7e8d>]  [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache]

Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Tested-by: Marc MERLIN <marc@merlins.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 0e6555443a206655885bc4126d9a3a0e2d9d17a3)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agobcache: fix race of writeback thread starting before complete initialization
Eric Wheeler [Fri, 26 Feb 2016 22:39:06 +0000 (14:39 -0800)]
bcache: fix race of writeback thread starting before complete initialization

Orabug: stable_rc4

[ Upstream commit 07cc6ef8edc47f8b4fc1e276d31127a0a5863d4d ]

The bch_writeback_thread might BUG_ON in read_dirty() if
dc->sb==BDEV_STATE_DIRTY and bch_sectors_dirty_init has not yet completed
its related initialization.  This patch downs the dc->writeback_lock until
after initialization is complete, thus preventing bch_writeback_thread
from proceeding prematurely.

See this thread:
  http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3453

Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Tested-by: Marc MERLIN <marc@merlins.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 7269554a57352f66aefb3e85cb7e11c4b63bba59)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agosched/cputime: Fix steal_account_process_tick() to always return jiffies
Chris Friesen [Sun, 6 Mar 2016 05:18:48 +0000 (23:18 -0600)]
sched/cputime: Fix steal_account_process_tick() to always return jiffies

Orabug: stable_rc4

[ Upstream commit f9c904b7613b8b4c85b10cd6b33ad41b2843fa9d ]

The callers of steal_account_process_tick() expect it to return
whether a jiffy should be considered stolen or not.

Currently the return value of steal_account_process_tick() is in
units of cputime, which vary between either jiffies or nsecs
depending on CONFIG_VIRT_CPU_ACCOUNTING_GEN.

If cputime has nsecs granularity and there is a tiny amount of
stolen time (a few nsecs, say) then we will consider the entire
tick stolen and will not account the tick on user/system/idle,
causing /proc/stats to show invalid data.

The fix is to change steal_account_process_tick() to accumulate
the stolen time and only account it once it's worth a jiffy.

(Thanks to Frederic Weisbecker for suggestions to fix a bug in my
first version of the patch.)

Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/56DBBDB8.40305@mail.usask.ca
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 23745ba7ffac8cffbe648812fe7dc485d6df9404)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoperf/x86/intel: Add definition for PT PMI bit
Stephane Eranian [Thu, 3 Mar 2016 19:50:40 +0000 (20:50 +0100)]
perf/x86/intel: Add definition for PT PMI bit

Orabug: 23331092

[ Upstream commit 5690ae28e472d25e330ad0c637a5cea3fc39fb32 ]

This patch adds a definition for GLOBAL_OVFL_STATUS bit 55
which is used with the Processor Trace (PT) feature.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1457034642-21837-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 559920294e5db893cf5abedb00f56c2d72bca8c8)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agox86: Add new MSRs and MSR bits used for Intel Skylake PMU support
Andi Kleen [Sun, 10 May 2015 19:22:41 +0000 (12:22 -0700)]
x86: Add new MSRs and MSR bits used for Intel Skylake PMU support

Orabug: 23331091

[ Upstream commit b83ff1c8617aac03a1cf807aafa848fe0f0908f2 ]

Add new MSRs (LBR_INFO) and some new MSR bits used by the Intel Skylake
PMU driver.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit c7af1256a07538167fe1b14a6714e7b92cf82179)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agousb: hub: fix a typo in hub_port_init() leading to wrong logic
Oliver Neukum [Wed, 17 Feb 2016 10:52:43 +0000 (11:52 +0100)]
usb: hub: fix a typo in hub_port_init() leading to wrong logic

Orabug: 23331090

[ Upstream commit 0d5ce778c43bf888328231bcdce05d5c860655aa ]

A typo of j for i led to a logic bug. To rule out future
confusion, the variable names are made meaningful.

Signed-off-by: Oliver Neukum <ONeukum@suse.com>
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit 8e1682fddbd122a565965a83a5f8235e8bcadc10)

Conflict:

drivers/usb/core/hub.c

9 years agoof: alloc anywhere from memblock if range not specified
Vinayak Menon [Mon, 22 Feb 2016 13:45:44 +0000 (19:15 +0530)]
of: alloc anywhere from memblock if range not specified

Orabug: 23331089

[ Upstream commit e53b50c0cbe392c946807abf7d07615a3c588642 ]

early_init_dt_alloc_reserved_memory_arch passes end as 0 to
__memblock_alloc_base, when limits are not specified. But
__memblock_alloc_base takes end value of 0 as MEMBLOCK_ALLOC_ACCESSIBLE
and limits the end to memblock.current_limit. This results in regions
never being placed in HIGHMEM area, for e.g. CMA.
Let __memblock_alloc_base allocate from anywhere in memory if limits are
not specified.

Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 444cf5487d5f51a3ecce2a0dfe237156290dfc7f)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomtip32xx: Handle FTL rebuild failure state during device initialization
Asai Thambi SP [Thu, 25 Feb 2016 05:18:20 +0000 (21:18 -0800)]
mtip32xx: Handle FTL rebuild failure state during device initialization

Orabug: stable_rc4

[ Upstream commit aae4a033868c496adae86fc6f9c3e0c405bbf360 ]

Allow device initialization to finish gracefully when it is in
FTL rebuild failure state. Also, recover device out of this state
after successfully secure erasing it.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 65963ead8aefa685ec2e22d403461101c243683e)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomtip32xx: fix incorrectly setting MTIP_DDF_SEC_LOCK_BIT
Asai Thambi SP [Mon, 11 May 2015 22:50:50 +0000 (15:50 -0700)]
mtip32xx: fix incorrectly setting MTIP_DDF_SEC_LOCK_BIT

Orabug: 23331087

[ Upstream commit ee04bed690cb49a49512a641405bac42d13c2b2a ]

Fix incorrectly setting MTIP_DDF_SEC_LOCK_BIT

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 0e536ed27652e8d5d74e13378a8f48b52cb21c95)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomtip32xx: Handle safe removal during IO
Asai Thambi SP [Thu, 25 Feb 2016 05:18:10 +0000 (21:18 -0800)]
mtip32xx: Handle safe removal during IO

Orabug: 23331085

[ Upstream commit 51c6570eb922146470c2fe660c34585414679bd6 ]

Flush inflight IOs using fsync_bdev() when the device is safely
removed. Also, block further IOs in device open function.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit afc16b3aab195d12ad40546d3cffcab5e3511ead)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomtip32xx: fix crash on surprise removal of the drive
Asai Thambi SP [Mon, 11 May 2015 22:53:18 +0000 (15:53 -0700)]
mtip32xx: fix crash on surprise removal of the drive

Orabug: 23331084

[ Upstream commit 2132a544727eb17f76bfef8b550a016a41c38821 ]

pci and block layers have changed a lot compared to when SRSI support was added.
Given the current state of pci and block layers, this driver do not have to do
any specific handling.

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 13af0df20f8e78dc1e7239a29b2862addde3953e)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomtip32xx: fix rmmod issue
Asai Thambi SP [Mon, 11 May 2015 22:48:00 +0000 (15:48 -0700)]
mtip32xx: fix rmmod issue

Orabug: 23331083

[ Upstream commit 02b48265e7437bfe153af16337b14ee74f00905f ]

put_disk() need to be called after del_gendisk() to free the disk object structure.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 6b9d9c35930bd75ddcfafb8eb7db909ceb63af10)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomtip32xx: Avoid issuing standby immediate cmd during FTL rebuild
Asai Thambi SP [Thu, 25 Feb 2016 05:17:32 +0000 (21:17 -0800)]
mtip32xx: Avoid issuing standby immediate cmd during FTL rebuild

Orabug: 23331082

[ Upstream commit d8a18d2d8f5de55666c6011ed175939d22c8e3d8 ]

Prevent standby immediate command from being issued in remove,
suspend and shutdown paths, while drive is in FTL rebuild process.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 15d38f73263562c2ebe3cdb7c6380e0b9a98c76c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomtip32xx: Print exact time when an internal command is interrupted
Asai Thambi SP [Thu, 25 Feb 2016 05:16:38 +0000 (21:16 -0800)]
mtip32xx: Print exact time when an internal command is interrupted

Orabug: 23331081

[ Upstream commit 5b7e0a8ac85e2dfd83830dc9e0b3554d153a37e3 ]

Print exact time when an internal command is interrupted.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit c9d3e69a692eed01045bfa718505c3e887e87d85)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoquota: Fix possible GPF due to uninitialised pointers
Nikolay Borisov [Thu, 3 Mar 2016 09:54:57 +0000 (10:54 +0100)]
quota: Fix possible GPF due to uninitialised pointers

Orabug: 23331080

[ Upstream commit ab73ef46398e2c0159f3a71de834586422d2a44a ]

When dqget() in __dquot_initialize() fails e.g. due to IO error,
__dquot_initialize() will pass an array of uninitialized pointers to
dqput_all() and thus can lead to deference of random data. Fix the
problem by properly initializing the array.

CC: stable@vger.kernel.org
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit ab1cc52b3f62f2445c60cbe390d26c50ebc0f3bd)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoxfs: fix two memory leaks in xfs_attr_list.c error paths
Mateusz Guzik [Tue, 1 Mar 2016 22:51:09 +0000 (09:51 +1100)]
xfs: fix two memory leaks in xfs_attr_list.c error paths

Orabug: 23331078

[ Upstream commit 2e83b79b2d6c78bf1b4aa227938a214dcbddc83f ]

This plugs 2 trivial leaks in xfs_attr_shortform_list and
xfs_attr3_leaf_list_int.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 594103da3005639712b3123a612791c8f4d3f4e9)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agonfsd4: fix bad bounds checking
J. Bruce Fields [Tue, 1 Mar 2016 01:21:21 +0000 (20:21 -0500)]
nfsd4: fix bad bounds checking

Orabug: 23331077

[ Upstream commit 4aed9c46afb80164401143aa0fdcfe3798baa9d5 ]

A number of spots in the xdr decoding follow a pattern like

n = be32_to_cpup(p++);
READ_BUF(n + 4);

where n is a u32.  The only bounds checking is done in READ_BUF itself,
but since it's checking (n + 4), it won't catch cases where n is very
large, (u32)(-4) or higher.  I'm not sure exactly what the consequences
are, but we've seen crashes soon after.

Instead, just break these up into two READ_BUF()s.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d876f71611ad9b720cc890075b3c4bec25bd54b5)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoIB/srpt: Simplify srpt_handle_tsk_mgmt()
Bart Van Assche [Thu, 11 Feb 2016 19:03:09 +0000 (11:03 -0800)]
IB/srpt: Simplify srpt_handle_tsk_mgmt()

Orabug: 23331076

[ Upstream commit 51093254bf879bc9ce96590400a87897c7498463 ]

Let the target core check task existence instead of the SRP target
driver. Additionally, let the target core check the validity of the
task management request instead of the ib_srpt driver.

This patch fixes the following kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [<ffffffffa0565f37>] srpt_handle_new_iu+0x6d7/0x790 [ib_srpt]
Oops: 0002 [#1] SMP
Call Trace:
 [<ffffffffa05660ce>] srpt_process_completion+0xde/0x570 [ib_srpt]
 [<ffffffffa056669f>] srpt_compl_thread+0x13f/0x160 [ib_srpt]
 [<ffffffff8109726f>] kthread+0xcf/0xe0
 [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Fixes: 3e4f574857ee ("ib_srpt: Convert TMR path to target_submit_tmr")
Tested-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 179e72b561d3d331c850e1a5779688d7a7de5246)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoX.509: Fix leap year handling again
David Howells [Wed, 24 Feb 2016 14:37:15 +0000 (14:37 +0000)]
X.509: Fix leap year handling again

Orabug: 23331075

[ Upstream commit ac4cbedfdf55455b4c447f17f0fa027dbf02b2a6 ]

There are still a couple of minor issues in the X.509 leap year handling:

 (1) To avoid doing a modulus-by-400 in addition to a modulus-by-100 when
     determining whether the year is a leap year or not, I divided the year
     by 100 after doing the modulus-by-100, thereby letting the compiler do
     one instruction for both, and then did a modulus-by-4.

     Unfortunately, I then passed the now-modified year value to mktime64()
     to construct a time value.

     Since this isn't a fast path and since mktime64() does a bunch of
     divisions, just condense down to "% 400".  It's also easier to read.

 (2) The default month length for any February where the year doesn't
     divide by four exactly is obtained from the month_length[] array where
     the value is 29, not 28.

     This is fixed by altering the table.

Reported-by: Rudolf Polzer <rpolzer@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit e62c5259a62f3da2a911f8fe6275dbf43d3b624f)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoPKCS#7: Improve and export the X.509 ASN.1 time object decoder
David Howells [Wed, 29 Jul 2015 15:58:32 +0000 (16:58 +0100)]
PKCS#7: Improve and export the X.509 ASN.1 time object decoder

Orabug: 23331074

[ Upstream commit fd19a3d195be23e8d9d0d66576b96ea25eea8323 ]

Make the X.509 ASN.1 time object decoder fill in a time64_t rather than a
struct tm to make comparison easier (unfortunately, this makes readable
display less easy) and export it so that it can be used by the PKCS#7 code
too.

Further, tighten up its parsing to reject invalid dates (eg. weird
characters, non-existent hour numbers) and unsupported dates (eg. timezones
other than 'Z' or dates earlier than 1970).

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit f85d91f88486b34679c532a5687466eaf335258f)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoX.509: Extract both parts of the AuthorityKeyIdentifier
David Howells [Mon, 20 Jul 2015 20:16:26 +0000 (21:16 +0100)]
X.509: Extract both parts of the AuthorityKeyIdentifier

Orabug: 23331073

[ Upstream commit b92e6570a992c7d793a209db282f68159368201c ]

Extract both parts of the AuthorityKeyIdentifier, not just the keyIdentifier,
as the second part can be used to match X.509 certificates by issuer and
serialNumber.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 3ccbbbf7b5ceb75bef47692c39f443a3efb38437)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agocrypto: ccp - memset request context to zero during import
Tom Lendacky [Thu, 25 Feb 2016 22:48:13 +0000 (16:48 -0600)]
crypto: ccp - memset request context to zero during import

Orabug: 23331072

[ Upstream commit ce0ae266feaf35930394bd770c69778e4ef03ba9 ]

Since a crypto_ahash_import() can be called against a request context
that has not had a crypto_ahash_init() performed, the request context
needs to be cleared to insure there is no random data present. If not,
the random data can result in a kernel oops during crypto_ahash_update().

Cc: <stable@vger.kernel.org> # 3.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit dad41d54081e1bd2ef601c702ff4ea0f7428a965)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoRAID5: check_reshape() shouldn't call mddev_suspend
Shaohua Li [Thu, 25 Feb 2016 01:38:28 +0000 (17:38 -0800)]
RAID5: check_reshape() shouldn't call mddev_suspend

Orabug: 23331070

[ Upstream commit 27a353c026a879a1001e5eac4bda75b16262c44a ]

check_reshape() is called from raid5d thread. raid5d thread shouldn't
call mddev_suspend(), because mddev_suspend() waits for all IO finish
but IO is handled in raid5d thread, we could easily deadlock here.

This issue is introduced by
738a273 ("md/raid5: fix allocation of 'scribble' array.")

Cc: stable@vger.kernel.org (v4.1+)
Reported-and-tested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 503f8305ab1b82d8788a2f161e7c52c0c0f6aeac)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomd/raid5: Compare apples to apples (or sectors to sectors)
Jes Sorensen [Tue, 16 Feb 2016 21:44:24 +0000 (16:44 -0500)]
md/raid5: Compare apples to apples (or sectors to sectors)

Orabug: 23331068

[ Upstream commit e7597e69dec59b65c5525db1626b9d34afdfa678 ]

'max_discard_sectors' is in sectors, while 'stripe' is in bytes.

This fixes the problem where DISCARD would get disabled on some larger
RAID5 configurations (6 or more drives in my testing), while it worked
as expected with smaller configurations.

Fixes: 620125f2bf8 ("MD: raid5 trim support")
Cc: stable@vger.kernel.org v3.7+
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 5255a738ee6ecc0e479728efe5668efd64901197)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agofix kABI breakage from pci_dev changes
Dan Duval [Sun, 22 May 2016 15:35:04 +0000 (11:35 -0400)]
fix kABI breakage from pci_dev changes

Orabug: 23331203

Orabug: 23331203

Commit bb44fa317be6c1dd0650feaae3326ac11f2d37a4 ("PCI: Add
dev->has_secondary_link to track downstream PCIe links") and commit
0af1534fb7b3dba6e11d6c2670912725a2087217 ("PCI: Disable IO/MEM decoding
for devices with non-compliant BARs") added one-bit flags to the
pci_dev structure.  Technically, this broke kABI (according to the
checker, anyway).  In reality, these bits were added after a bunch
of other one-bit fields, the result being that their addition didn't
extend the size of the structure, nor did it change the offsets of
any existing fields of the structure.

This commit simply wraps these two fields in "#ifndef __GENKSYMS__"
to hide them from the checker.

Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit 32594c00fba3c410b5a339f2ddd0e4c583170186)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoPCI: Disable IO/MEM decoding for devices with non-compliant BARs
Bjorn Helgaas [Thu, 25 Feb 2016 20:35:57 +0000 (14:35 -0600)]
PCI: Disable IO/MEM decoding for devices with non-compliant BARs

Orabug: 23331067

[ Upstream commit b84106b4e2290c081cdab521fa832596cdfea246 ]

The PCI config header (first 64 bytes of each device's config space) is
defined by the PCI spec so generic software can identify the device and
manage its usage of I/O, memory, and IRQ resources.

Some non-spec-compliant devices put registers other than BARs where the
BARs should be.  When the PCI core sizes these "BARs", the reads and writes
it does may have unwanted side effects, and the "BAR" may appear to
describe non-sensical address space.

Add a flag bit to mark non-compliant devices so we don't touch their BARs.
Turn off IO/MEM decoding to prevent the devices from consuming address
space, since we can't read the BARs to find out what that address space
would be.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
CC: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit aa57ba13f44426a076ac567e965654453d4be1f1)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoPCI: Add dev->has_secondary_link to track downstream PCIe links
Yijing Wang [Thu, 21 May 2015 07:05:02 +0000 (15:05 +0800)]
PCI: Add dev->has_secondary_link to track downstream PCIe links

Orabug: 23331066

[ Upstream commit d0751b98dfa391f862e02dc36a233a54615e3f1d ]

A PCIe Port is an interface to a Link.  A Root Port is a PCI-PCI bridge in
a Root Complex and has a Link on its secondary (downstream) side.  For
other Ports, the Link may be on either the upstream (closer to the Root
Complex) or downstream side of the Port.

The usual topology has a Root Port connected to an Upstream Port.  We
previously assumed this was the only possible topology, and that a
Downstream Port's Link was always on its downstream side, like this:

                  +---------------------+
  +------+        |          Downstream |
  | Root |        | Upstream       Port +--Link--
  | Port +--Link--+ Port                |
  +------+        |          Downstream |
                  |                Port +--Link--
                  +---------------------+

But systems do exist (see URL below) where the Root Port is connected to a
Downstream Port.  In this case, a Downstream Port's Link may be on either
the upstream or downstream side:

                  +---------------------+
  +------+        |            Upstream |
  | Root |        | Downstream     Port +--Link--
  | Port +--Link--+ Port                |
  +------+        |          Downstream |
                  |                Port +--Link--
                  +---------------------+

We can't use the Port type to determine which side the Link is on, so add a
bit in struct pci_dev to keep track.

A Root Port's Link is always on the Port's secondary side.  A component
(Endpoint or Port) on the other end of the Link obviously has the Link on
its upstream side.  If that component is a Port, it is part of a Switch or
a Bridge.  A Bridge has a PCI or PCI-X bus on its secondary side, not a
Link.  The internal bus of a Switch connects the Port to another Port whose
Link is on the downstream side.

[bhelgaas: changelog, comment, cache "type", use if/else]
Link: http://lkml.kernel.org/r/54EB81B2.4050904@pobox.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=94361
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 517a021fdba44206722c85bd9267dabd67475fa6)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomtd: onenand: fix deadlock in onenand_block_markbad
Aaro Koskinen [Sat, 20 Feb 2016 20:27:48 +0000 (22:27 +0200)]
mtd: onenand: fix deadlock in onenand_block_markbad

Orabug: 23331065

[ Upstream commit 5e64c29e98bfbba1b527b0a164f9493f3db9e8cb ]

Commit 5942ddbc500d ("mtd: introduce mtd_block_markbad interface")
incorrectly changed onenand_block_markbad() to call mtd_block_markbad
instead of onenand_chip's block_markbad function. As a result the function
will now recurse and deadlock. Fix by reverting the change.

Fixes: 5942ddbc500d ("mtd: introduce mtd_block_markbad interface")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 084b44e9cbc6eec0b0b13138918fb5935208f3b4)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoaic7xxx: Fix queue depth handling
Alan [Mon, 15 Feb 2016 18:53:15 +0000 (18:53 +0000)]
aic7xxx: Fix queue depth handling

Orabug: 23331064

[ Upstream commit 5a51a7abca133860a6f4429655a9eda3c4afde32 ]

We were setting the queue depth correctly, then setting it back to
two. If you hit this as a bisection point then please send me an email
as it would imply we've been hiding other bugs with this one.

Cc: <stable@vger.kernel.org>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit cf438ddac48b27e7de0514f96d912e132c908df4)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoaacraid: Fix memory leak in aac_fib_map_free
Raghava Aditya Renukunta [Wed, 3 Feb 2016 23:06:02 +0000 (15:06 -0800)]
aacraid: Fix memory leak in aac_fib_map_free

Orabug: 23331062

[ Upstream commit f88fa79a61726ce9434df9b4aede36961f709f17 ]

aac_fib_map_free() calls pci_free_consistent() without checking that
dev->hw_fib_va is not NULL and dev->max_fib_size is not zero.If they are
indeed NULL/0, this will result in a hang as pci_free_consistent() will
attempt to invalidate cache for the entire 64-bit address space
(which would take a very long time).

Fixed by adding a check to make sure that dev->hw_fib_va and
dev->max_fib_size are not NULL and 0 respectively.

Fixes: 9ad5204d6 - "[SCSI]aacraid: incorrect dma mapping mask during blinked recover or user initiated reset"
Cc: stable@vger.kernel.org
Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@pmcs.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 6d2cd58f288e320c5a023219ac8726b30657ddbb)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoaacraid: Fix RRQ overload
Raghava Aditya Renukunta [Wed, 3 Feb 2016 23:06:00 +0000 (15:06 -0800)]
aacraid: Fix RRQ overload

Orabug: 23331060

[ Upstream commit 3f4ce057d51a9c0ed9b01ba693df685d230ffcae ]

The driver utilizes an array of atomic variables to keep track of IO
submissions to each vector. To submit an IO multiple threads iterate
through the array to find a vector which has empty slots to send an
IO. The reading and updating of the variable is not atomic, causing race
conditions when a thread uses a full vector to submit an IO.

Fixed by mapping each FIB to a vector, the submission path then uses
said vector to submit IO thereby removing the possibly of a race
condition.The vector assignment is started from 1 since vector 0 is
reserved for the use of AIF management FIBS.If the number of MSIx
vectors is 1 (MSI or INTx mode) then all the fibs are allocated to
vector 0.

Fixes: 495c0217 "aacraid: MSI-x support"
Cc: stable@vger.kernel.org # v4.1
Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@pmcs.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 34bb7098221c2a37f5d9ef3591971a25719ae21a)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agodm: fix excessive dm-mq context switching
Mike Snitzer [Fri, 5 Feb 2016 13:49:01 +0000 (08:49 -0500)]
dm: fix excessive dm-mq context switching

Orabug: 23331058

[ Upstream commit 6acfe68bac7e6f16dc312157b1fa6e2368985013 ]

Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly.  One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true.  This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request.  The ftrace
function_graph tracer showed:

  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...
  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...

Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.

In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:

1)  don't blk_mq_run_hw_queues in blk-mq request completion

    Motivated by desire to reduce overhead of dm-mq, punting to kblockd
    just increases context switches.

    In my testing against a really fast null_blk device there was no benefit
    to running blk_mq_run_hw_queues() on completion (and no other blk-mq
    driver does this).  So hopefully this change doesn't induce the need for
    yet another revert like commit 621739b00e16ca2d !

2)  use blk_mq_complete_request() in dm_complete_request()

    blk_complete_request() doesn't offer the traditional q->mq_ops vs
    .request_fn branching pattern that other historic block interfaces
    do (e.g. blk_get_request).  Using blk_mq_complete_request() for
    blk-mq requests is important for performance.  It should be noted
    that, like blk_complete_request(), blk_mq_complete_request() doesn't
    natively handle partial completions -- but the request-based
    DM-multipath target does provide the required partial completion
    support by dm.c:end_clone_bio() triggering requeueing of the request
    via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.

dm-mq fix #2 is _much_ more important than #1 for eliminating the
context switches.
Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472

With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.

Fixes: 7fb4898e0 ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1cdb ("dm: add full blk-mq support to request-based DM")
Cc: stable@vger.kernel.org # 4.1+
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit fb1840e257d26b6aad69b864cb7ed4a67ab986b6)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoext4: iterate over buffer heads correctly in move_extent_per_page()
Eryu Guan [Sun, 21 Feb 2016 23:38:44 +0000 (18:38 -0500)]
ext4: iterate over buffer heads correctly in move_extent_per_page()

Orabug: 23331057

[ Upstream commit 87f9a031af48defee9f34c6aaf06d6f1988c244d ]

In commit bcff24887d00 ("ext4: don't read blocks from disk after extents
being swapped") bh is not updated correctly in the for loop and wrong
data has been written to disk. generic/324 catches this on sub-page
block size ext4.

Fixes: bcff24887d00 ("ext4: don't read blocks from disk after extentsbeing swapped")
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 31a37d7c7ef7b4f90b600fcddd1c385a39f9d34c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agotools/hv: Use include/uapi with __EXPORTED_HEADERS__
Kamal Mostafa [Thu, 28 Jan 2016 06:29:33 +0000 (22:29 -0800)]
tools/hv: Use include/uapi with __EXPORTED_HEADERS__

Orabug: 23331055

[ Upstream commit 50fe6dd10069e7c062e27f29606f6e91ea979399 ]

Use the local uapi headers to keep in sync with "recently" added #define's
(e.g. VSS_OP_REGISTER1).

Fixes: 3eb2094c59e8 ("Adding makefile for tools/hv")
Cc: <stable@vger.kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 2168661199ff5eeab52a8aa2408a6b21fbf4f743)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agonet: irda: Fix use-after-free in irtty_open()
Peter Hurley [Sun, 10 Jan 2016 01:48:45 +0000 (17:48 -0800)]
net: irda: Fix use-after-free in irtty_open()

Orabug: 23331054

[ Upstream commit 401879c57f01cbf2da204ad2e8db910525c6dbea ]

The N_IRDA line discipline may access the previous line discipline's closed
and already-fre private data on open [1].

The tty->disc_data field _never_ refers to valid data on entry to the
line discipline's open() method. Rather, the ldisc is expected to
initialize that field for its own use for the lifetime of the instance
(ie. from open() to close() only).

[1]
    ==================================================================
    BUG: KASAN: use-after-free in irtty_open+0x422/0x550 at addr ffff8800331dd068
    Read of size 4 by task a.out/13960
    =============================================================================
    BUG kmalloc-512 (Tainted: G    B          ): kasan: bad access detected
    -----------------------------------------------------------------------------
    ...
    Call Trace:
     [<ffffffff815fa2ae>] __asan_report_load4_noabort+0x3e/0x40 mm/kasan/report.c:279
     [<ffffffff836938a2>] irtty_open+0x422/0x550 drivers/net/irda/irtty-sir.c:436
     [<ffffffff829f1b80>] tty_ldisc_open.isra.2+0x60/0xa0 drivers/tty/tty_ldisc.c:447
     [<ffffffff829f21c0>] tty_set_ldisc+0x1a0/0x940 drivers/tty/tty_ldisc.c:567
     [<     inline     >] tiocsetd drivers/tty/tty_io.c:2650
     [<ffffffff829da49e>] tty_ioctl+0xace/0x1fd0 drivers/tty/tty_io.c:2883
     [<     inline     >] vfs_ioctl fs/ioctl.c:43
     [<ffffffff816708ac>] do_vfs_ioctl+0x57c/0xe60 fs/ioctl.c:607
     [<     inline     >] SYSC_ioctl fs/ioctl.c:622
     [<ffffffff81671204>] SyS_ioctl+0x74/0x80 fs/ioctl.c:613
     [<ffffffff852a7876>] entry_SYSCALL_64_fastpath+0x16/0x7a

Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 0f412b8aa88883f7e3059c5a2c1e56ce0dd8bf86)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agocrypto: ccp - Don't assume export/import areas are aligned
Tom Lendacky [Tue, 2 Feb 2016 17:38:21 +0000 (11:38 -0600)]
crypto: ccp - Don't assume export/import areas are aligned

Orabug: 23331052

[ Upstream commit b31dde2a5cb1bf764282abf934266b7193c2bc7c ]

Use a local variable for the exported and imported state so that
alignment is not an issue. On export, set a local variable from the
request context and then memcpy the contents of the local variable to
the export memory area. On import, memcpy the import memory area into
a local variable and then use the local variable to set the request
context.

Cc: <stable@vger.kernel.org> # 3.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit b053d66b66e702f74ddc986863b30eaac41e7f4c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agocrypto: ccp - Limit the amount of information exported
Tom Lendacky [Fri, 29 Jan 2016 18:45:14 +0000 (12:45 -0600)]
crypto: ccp - Limit the amount of information exported

Orabug: 23331051

[ Upstream commit d1662165ae612ec8b5f94a6b07e65ea58b6dce34 ]

Since the exported information can be exposed to user-space, instead of
exporting the entire request context only export the minimum information
needed.

Cc: <stable@vger.kernel.org> # 3.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 5badf7e00f0968d820bf7bba9081339bfca3489c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agotty: Fix GPF in flush_to_ldisc(), part 2
Peter Hurley [Mon, 11 Jan 2016 04:36:12 +0000 (20:36 -0800)]
tty: Fix GPF in flush_to_ldisc(), part 2

Orabug: 23331050

[ Upstream commit f33798deecbd59a2955f40ac0ae2bc7dff54c069 ]

commit 9ce119f318ba ("tty: Fix GPF in flush_to_ldisc()") fixed a
GPF caused by a line discipline which does not define a receive_buf()
method.

However, the vt driver (and speakup driver also) pushes selection
data directly to the line discipline receive_buf() method via
tty_ldisc_receive_buf(). Fix the same problem in tty_ldisc_receive_buf().

Cc: <stable@vger.kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 3beac2b0bffd8e977ef1d464dd424f22af965a96)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agocrypto: ccp - Add hash state import and export support
Tom Lendacky [Tue, 12 Jan 2016 17:17:38 +0000 (11:17 -0600)]
crypto: ccp - Add hash state import and export support

Orabug: 23331048

[ Upstream commit 952bce9792e6bf36fda09c2e5718abb5d9327369 ]

Commit 8996eafdcbad ("crypto: ahash - ensure statesize is non-zero")
added a check to prevent ahash algorithms from successfully registering
if the import and export functions were not implemented. This prevents
an oops in the hash_accept function of algif_hash. This commit causes
the ccp-crypto module SHA support and AES CMAC support from successfully
registering and causing the ccp-crypto module load to fail because the
ahash import and export functions are not implemented.

Update the CCP Crypto API support to provide import and export support
for ahash algorithms.

Cc: <stable@vger.kernel.org> # 3.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 1ad241d40ef8d9a50ced5c35bf55e9a21a997516)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years ago[media] usbvision: fix crash on detecting device with invalid configuration
Vladis Dronov [Mon, 16 Nov 2015 17:55:11 +0000 (15:55 -0200)]
[media] usbvision: fix crash on detecting device with invalid configuration

Orabug: stable_rc4

[ Upstream commit fa52bd506f274b7619955917abfde355e3d19ffe ]

The usbvision driver crashes when a specially crafted usb device with invalid
number of interfaces or endpoints is detected. This fix adds checks that the
device has proper configuration expected by the driver.

Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 37dee22181885e7847e8c95843b6e94138edbd43)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoinclude/linux/poison.h: fix LIST_POISON{1,2} offset
Vasily Kulikov [Wed, 9 Sep 2015 22:36:00 +0000 (15:36 -0700)]
include/linux/poison.h: fix LIST_POISON{1,2} offset

Orabug: 23331045

[ Upstream commit 8a5e5e02fc83aaf67053ab53b359af08c6c49aaf ]

Poison pointer values should be small enough to find a room in
non-mmap'able/hardly-mmap'able space.  E.g.  on x86 "poison pointer space"
is located starting from 0x0.  Given unprivileged users cannot mmap
anything below mmap_min_addr, it should be safe to use poison pointers
lower than mmap_min_addr.

The current poison pointer values of LIST_POISON{1,2} might be too big for
mmap_min_addr values equal or less than 1 MB (common case, e.g.  Ubuntu
uses only 0x10000).  There is little point to use such a big value given
the "poison pointer space" below 1 MB is not yet exhausted.  Changing it
to a smaller value solves the problem for small mmap_min_addr setups.

The values are suggested by Solar Designer:
http://www.openwall.com/lists/oss-security/2015/05/02/6

Signed-off-by: Vasily Kulikov <segoon@openwall.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 46460a03f44f1915ded434057fa46332438b3a6e)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoKEYS: Fix handling of stored error in a negatively instantiated user key
David Howells [Tue, 24 Nov 2015 21:36:31 +0000 (21:36 +0000)]
KEYS: Fix handling of stored error in a negatively instantiated user key

Orabug: stable_rc4

[ Upstream commit 096fe9eaea40a17e125569f9e657e34cdb6d73bd ]

If a user key gets negatively instantiated, an error code is cached in the
payload area.  A negatively instantiated key may be then be positively
instantiated by updating it with valid data.  However, the ->update key
type method must be aware that the error code may be there.

The following may be used to trigger the bug in the user key type:

    keyctl request2 user user "" @u
    keyctl add user user "a" @u

which manifests itself as:

BUG: unable to handle kernel paging request at 00000000ffffff8a
IP: [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
PGD 7cc30067 PUD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 3 PID: 2644 Comm: a.out Not tainted 4.3.0+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88003ddea700 ti: ffff88003dd88000 task.ti: ffff88003dd88000
RIP: 0010:[<ffffffff810a376f>]  [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280
 [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
RSP: 0018:ffff88003dd8bdb0  EFLAGS: 00010246
RAX: 00000000ffffff82 RBX: 0000000000000000 RCX: 0000000000000001
RDX: ffffffff81e3fe40 RSI: 0000000000000000 RDI: 00000000ffffff82
RBP: ffff88003dd8bde0 R08: ffff88007d2d2da0 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88003e8073c0 R12: 00000000ffffff82
R13: ffff88003dd8be68 R14: ffff88007d027600 R15: ffff88003ddea700
FS:  0000000000b92880(0063) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000ffffff8a CR3: 000000007cc5f000 CR4: 00000000000006e0
Stack:
 ffff88003dd8bdf0 ffffffff81160a8a 0000000000000000 00000000ffffff82
 ffff88003dd8be68 ffff88007d027600 ffff88003dd8bdf0 ffffffff810a39e5
 ffff88003dd8be20 ffffffff812a31ab ffff88007d027600 ffff88007d027620
Call Trace:
 [<ffffffff810a39e5>] kfree_call_rcu+0x15/0x20 kernel/rcu/tree.c:3136
 [<ffffffff812a31ab>] user_update+0x8b/0xb0 security/keys/user_defined.c:129
 [<     inline     >] __key_update security/keys/key.c:730
 [<ffffffff8129e5c1>] key_create_or_update+0x291/0x440 security/keys/key.c:908
 [<     inline     >] SYSC_add_key security/keys/keyctl.c:125
 [<ffffffff8129fc21>] SyS_add_key+0x101/0x1e0 security/keys/keyctl.c:60
 [<ffffffff8185f617>] entry_SYSCALL_64_fastpath+0x12/0x6a arch/x86/entry/entry_64.S:185

Note the error code (-ENOKEY) in EDX.

A similar bug can be tripped by:

    keyctl request2 trusted user "" @u
    keyctl add trusted user "a" @u

This should also affect encrypted keys - but that has to be correctly
parameterised or it will fail with EINVAL before getting to the bit that
will crashes.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d979e967f848caf908a1401b7ad67cf13f06ef9f)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoKVM: x86: Reload pit counters for all channels when restoring state
Andrew Honig [Wed, 18 Nov 2015 22:50:23 +0000 (14:50 -0800)]
KVM: x86: Reload pit counters for all channels when restoring state

Orabug: 23331042

[ Upstream commit 0185604c2d82c560dab2f2933a18f797e74ab5a8 ]

Currently if userspace restores the pit counters with a count of 0
on channels 1 or 2 and the guest attempts to read the count on those
channels, then KVM will perform a mod of 0 and crash.  This will ensure
that 0 values are converted to 65536 as per the spec.

This is CVE-2015-7513.

Signed-off-by: Andy Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 90352f3f473a29db1289ec31facc1ac18cc66e9e)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoovl: fix permission checking for setattr
Miklos Szeredi [Fri, 4 Dec 2015 18:18:48 +0000 (19:18 +0100)]
ovl: fix permission checking for setattr

Orabug: 23331041

[ Upstream commit acff81ec2c79492b180fade3c2894425cd35a545 ]

[Al Viro] The bug is in being too enthusiastic about optimizing ->setattr()
away - instead of "copy verbatim with metadata" + "chmod/chown/utimes"
(with the former being always safe and the latter failing in case of
insufficient permissions) it tries to combine these two.  Note that copyup
itself will have to do ->setattr() anyway; _that_ is where the elevated
capabilities are right.  Having these two ->setattr() (one to set verbatim
copy of metadata, another to do what overlayfs ->setattr() had been asked
to do in the first place) combined is where it breaks.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 2cadb57dff500076a87b934cac64bb5a2293b644)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agobtrfs: async-thread: Fix a use-after-free error for trace
Qu Wenruo [Fri, 22 Jan 2016 01:28:38 +0000 (09:28 +0800)]
btrfs: async-thread: Fix a use-after-free error for trace

Orabug: 23331040

[ Upstream commit 0a95b851370b84a4b9d92ee6d1fa0926901d0454 ]

Parameter of trace_btrfs_work_queued() can be freed in its workqueue.
So no one use use that pointer after queue_work().

Fix the user-after-free bug by move the trace line before queue_work().

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit b9a54ed91c7bbd5c18a4170be078d9f7e28560ed)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agobtrfs: Fix no_space in write and rm loop
Zhao Lei [Tue, 1 Dec 2015 10:39:40 +0000 (18:39 +0800)]
btrfs: Fix no_space in write and rm loop

Orabug: 23331039

[ Upstream commit 08acfd9dd845dc052c5eae33e6c3976338070069 ]

commit e1746e8381cd2af421f75557b5cae3604fc18b35 upstream.

I see no_space in v4.4-rc1 again in xfstests generic/102.
It happened randomly in some node only.
(one of 4 phy-node, and a kvm with non-virtio block driver)

By bisect, we can found the first-bad is:
 commit bdced438acd8 ("block: setup bi_phys_segments after splitting")'
But above patch only triggered the bug by making bio operation
faster(or slower).

Main reason is in our space_allocating code, we need to commit
page writeback before wait it complish, this patch fixed above
bug.

BTW, there is another reason for generic/102 fail, caused by
disable default mixed-blockgroup, I'll fix it in xfstests.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d5b55a7aae08c0e0785430126bcc4a9ae7f5c737)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agobtrfs: wait for delayed iputs on no space
Zhao Lei [Thu, 9 Apr 2015 04:34:43 +0000 (12:34 +0800)]
btrfs: wait for delayed iputs on no space

Orabug: 23331038

[ Upstream commit 9a4e7276d39071576d369e607d7accb84b41d0b4 ]

btrfs will report no_space when we run following write and delete
file loop:
 # FILE_SIZE_M=[ 75% of fs space ]
 # DEV=[ some dev ]
 # MNT=[ some dir ]
 #
 # mkfs.btrfs -f "$DEV"
 # mount -o nodatacow "$DEV" "$MNT"
 # for ((i = 0; i < 100; i++)); do dd if=/dev/zero of="$MNT"/file0 bs=1M count="$FILE_SIZE_M"; rm -f "$MNT"/file0; done
 #

Reason:
 iput() and evict() is run after write pages to block device, if
 write pages work is not finished before next write, the "rm"ed space
 is not freed, and caused above bug.

Fix:
 We can add "-o flushoncommit" mount option to avoid above bug, but
 it have performance problem. Actually, we can to wait for on-the-fly
 writes only when no-space happened, it is which this patch do.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 42bd8f4fda813558c3045c60ad6436b1c7430ec7)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agosecurity: let security modules use PTRACE_MODE_* with bitmasks
Jann Horn [Wed, 20 Jan 2016 23:00:01 +0000 (15:00 -0800)]
security: let security modules use PTRACE_MODE_* with bitmasks

Orabug: 23331036

[ Upstream commit 3dfb7d8cdbc7ea0c2970450e60818bb3eefbad69 ]

It looks like smack and yama weren't aware that the ptrace mode
can have flags ORed into it - PTRACE_MODE_NOAUDIT until now, but
only for /proc/$pid/stat, and with the PTRACE_MODE_*CREDS patch,
all modes have flags ORed into them.

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit ee6ad435c9872610c5a52cad02e331951cf2fb25)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agox86/entry/compat: Add missing CLAC to entry_INT80_32
Andy Lutomirski [Wed, 24 Feb 2016 20:18:49 +0000 (12:18 -0800)]
x86/entry/compat: Add missing CLAC to entry_INT80_32

Orabug: 23331033

[ Upstream commit 3d44d51bd339766f0178f0cf2e8d048b4a4872aa ]

This doesn't seem to fix a regression -- I don't think the CLAC was
ever there.

I double-checked in a debugger: entries through the int80 gate do
not automatically clear AC.

Stable maintainers: I can provide a backport to 4.3 and earlier if
needed.  This needs to be backported all the way to 3.10.

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # v3.10 and later
Fixes: 63bcff2a307b ("x86, smap: Add STAC and CLAC instructions to control user space access")
Link: http://lkml.kernel.org/r/b02b7e71ae54074be01fc171cbd4b72517055c0e.1456345086.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 1f9780e372264c0cd71571c94da08cc49ae327e3)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agokernel/resource.c: fix muxed resource handling in __request_region()
Simon Guinot [Wed, 9 Sep 2015 22:15:18 +0000 (00:15 +0200)]
kernel/resource.c: fix muxed resource handling in __request_region()

Orabug: 23331032

[ Upstream commit 59ceeaaf355fa0fb16558ef7c24413c804932ada ]

In __request_region, if a conflict with a BUSY and MUXED resource is
detected, then the caller goes to sleep and waits for the resource to be
released.  A pointer on the conflicting resource is kept.  At wake-up
this pointer is used as a parent to retry to request the region.

A first problem is that this pointer might well be invalid (if for
example the conflicting resource have already been freed).  Another
problem is that the next call to __request_region() fails to detect a
remaining conflict.  The previously conflicting resource is passed as a
parameter and __request_region() will look for a conflict among the
children of this resource and not at the resource itself.  It is likely
to succeed anyway, even if there is still a conflict.

Instead, the parent of the conflicting resource should be passed to
__request_region().

As a fix, this patch doesn't update the parent resource pointer in the
case we have to wait for a muxed region right after.

Reported-and-tested-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit aa1311b426d5cc249887c8cbfa21a6dda2c5a201)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agobtrfs: initialize the seq counter in struct btrfs_device
Sebastian Andrzej Siewior [Fri, 15 Jan 2016 13:37:15 +0000 (14:37 +0100)]
btrfs: initialize the seq counter in struct btrfs_device

Orabug: 23331031

[ Upstream commit 546bed631203344611f42b2af1d224d2eedb4e6b ]

I managed to trigger this:
| INFO: trying to register non-static key.
| the code is fine but needs lockdep annotation.
| turning off the locking correctness validator.
| CPU: 1 PID: 781 Comm: systemd-gpt-aut Not tainted 4.4.0-rt2+ #14
| Hardware name: ARM-Versatile Express
| [<80307cec>] (dump_stack)
| [<80070e98>] (__lock_acquire)
| [<8007184c>] (lock_acquire)
| [<80287800>] (btrfs_ioctl)
| [<8012a8d4>] (do_vfs_ioctl)
| [<8012ac14>] (SyS_ioctl)

so I think that btrfs_device_data_ordered_init() is not invoked behind
a macro somewhere.

Fixes: 7cc8e58d53cd ("Btrfs: fix unprotected device's variants on 32bits machine")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 2068256b08824ad53b28fb08952b62dd35e66593)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoBtrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume...
Chandan Rajendra [Thu, 7 Jan 2016 13:26:59 +0000 (18:56 +0530)]
Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots

Orabug: stable_rc4

[ Upstream commit f32e48e925964c4f8ab917850788a87e1cef3bad ]

The following call trace is seen when btrfs/031 test is executed in a loop,

[  158.661848] ------------[ cut here ]------------
[  158.662634] WARNING: CPU: 2 PID: 890 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea()
[  158.664102] BTRFS: Transaction aborted (error -2)
[  158.664774] Modules linked in:
[  158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711a #2
[  158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  158.667392]  ffffffff81c0a6b0 ffff8806c7c4f8e8 ffffffff81431fc8 ffff8806c7c4f930
[  158.668515]  ffff8806c7c4f920 ffffffff81051aa1 ffff880c85aff000 ffff8800bb44d000
[  158.669647]  ffff8808863b5c98 0000000000000000 00000000fffffffe ffff8806c7c4f980
[  158.670769] Call Trace:
[  158.671153]  [<ffffffff81431fc8>] dump_stack+0x44/0x5c
[  158.671884]  [<ffffffff81051aa1>] warn_slowpath_common+0x81/0xc0
[  158.672769]  [<ffffffff81051b27>] warn_slowpath_fmt+0x47/0x50
[  158.673620]  [<ffffffff813bc98d>] create_subvol+0x3d1/0x6ea
[  158.674440]  [<ffffffff813777c9>] btrfs_mksubvol.isra.30+0x369/0x520
[  158.675376]  [<ffffffff8108a4aa>] ? percpu_down_read+0x1a/0x50
[  158.676235]  [<ffffffff81377a81>] btrfs_ioctl_snap_create_transid+0x101/0x180
[  158.677268]  [<ffffffff81377b52>] btrfs_ioctl_snap_create+0x52/0x70
[  158.678183]  [<ffffffff8137afb4>] btrfs_ioctl+0x474/0x2f90
[  158.678975]  [<ffffffff81144b8e>] ? vma_merge+0xee/0x300
[  158.679751]  [<ffffffff8115be31>] ? alloc_pages_vma+0x91/0x170
[  158.680599]  [<ffffffff81123f62>] ? lru_cache_add_active_or_unevictable+0x22/0x70
[  158.681686]  [<ffffffff813d99cf>] ? selinux_file_ioctl+0xff/0x1d0
[  158.682581]  [<ffffffff8117b791>] do_vfs_ioctl+0x2c1/0x490
[  158.683399]  [<ffffffff813d3cde>] ? security_file_ioctl+0x3e/0x60
[  158.684297]  [<ffffffff8117b9d4>] SyS_ioctl+0x74/0x80
[  158.685051]  [<ffffffff819b2bd7>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  158.685958] ---[ end trace 4b63312de5a2cb76 ]---
[  158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry
[  158.709508] BTRFS info (device loop0): forced readonly
[  158.737113] BTRFS info (device loop0): disk space caching is enabled
[  158.738096] BTRFS error (device loop0): Remounting read-write after error is not allowed
[  158.851303] BTRFS error (device loop0): cleaner transaction attach returned -30

This occurs because,

Mount filesystem
Create subvol with ID 257
Unmount filesystem
Mount filesystem
Delete subvol with ID 257
  btrfs_drop_snapshot()
    Add root corresponding to subvol 257 into
    btrfs_transaction->dropped_roots list
Create new subvol (i.e. create_subvol())
  257 is returned as the next free objectid
  btrfs_read_fs_root_no_name()
    Finds the btrfs_root instance corresponding to the old subvol with ID 257
    in btrfs_fs_info->fs_roots_radix.
    Returns error since btrfs_root_item->refs has the value of 0.

To fix the issue the commit initializes tree root's and subvolume root's
highest_objectid when loading the roots from disk.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit c19cd7e350c5ad9f3f4b3e486dbb4dab737a22a4)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoBtrfs: fix transaction handle leak on failure to create hard link
Filipe Manana [Tue, 5 Jan 2016 16:24:05 +0000 (16:24 +0000)]
Btrfs: fix transaction handle leak on failure to create hard link

Orabug: 23331029

[ Upstream commit 271dba4521aed0c37c063548f876b49f5cd64b2e ]

If we failed to create a hard link we were not always releasing the
the transaction handle we got before, resulting in a memory leak and
preventing any other tasks from being able to commit the current
transaction.
Fix this by always releasing our transaction handle.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 9bf972e8aa6110d750cf1ddab68511f478a6a751)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoBtrfs: fix number of transaction units required to create symlink
Filipe Manana [Thu, 31 Dec 2015 18:16:29 +0000 (18:16 +0000)]
Btrfs: fix number of transaction units required to create symlink

Orabug: 23331027

[ Upstream commit 9269d12b2d57d9e3d13036bb750762d1110d425c ]

We weren't accounting for the insertion of an inline extent item for the
symlink inode nor that we need to update the parent inode item (through
the call to btrfs_add_nondir()). So fix this by including two more
transaction units.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit a1f535acffbd95ae6ae81656e8bba39af094c3f0)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoBtrfs: send, don't BUG_ON() when an empty symlink is found
Filipe Manana [Thu, 31 Dec 2015 18:07:59 +0000 (18:07 +0000)]
Btrfs: send, don't BUG_ON() when an empty symlink is found

Orabug: 23331026

[ Upstream commit a879719b8c90e15c9e7fa7266d5e3c0ca962f9df ]

When a symlink is successfully created it always has an inline extent
containing the source path. However if an error happens when creating
the symlink, we can leave in the subvolume's tree a symlink inode without
any such inline extent item - this happens if after btrfs_symlink() calls
btrfs_end_transaction() and before it calls the inode eviction handler
(through the final iput() call), the transaction gets committed and a
crash happens before the eviction handler gets called, or if a snapshot
of the subvolume is made before the eviction handler gets called. Sadly
we can't just avoid this by making btrfs_symlink() call
btrfs_end_transaction() after it calls the eviction handler, because the
later can commit the current transaction before it removes any items from
the subvolume tree (if it encounters ENOSPC errors while reserving space
for removing all the items).

So make send fail more gracefully, with an -EIO error, and print a
message to dmesg/syslog informing that there's an empty symlink inode,
so that the user can delete the empty symlink or do something else
about it.

Reported-by: Stephen R. van den Berg <srb@cuci.nl>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit e92c51b734d57e04f0c9b43106b5294834339be6)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agobtrfs: statfs: report zero available if metadata are exhausted
David Sterba [Sat, 10 Oct 2015 15:59:53 +0000 (17:59 +0200)]
btrfs: statfs: report zero available if metadata are exhausted

Orabug: 23331025

[ Upstream commit ca8a51b3a979d57b082b14eda38602b7f52d81d1 ]

There is one ENOSPC case that's very confusing. There's Available
greater than zero but no file operation succeds (besides removing
files). This happens when the metadata are exhausted and there's no
possibility to allocate another chunk.

In this scenario it's normal that there's still some space in the data
chunk and the calculation in df reflects that in the Avail value.

To at least give some clue about the ENOSPC situation, let statfs report
zero value in Avail, even if there's still data space available.

Current:
  /dev/sdb1             4.0G  3.3G  719M  83% /mnt/test

New:
  /dev/sdb1             4.0G  3.3G     0 100% /mnt/test

We calculate the remaining metadata space minus global reserve. If this
is (supposedly) smaller than zero, there's no space. But this does not
hold in practice, the exhausted state happens where's still some
positive delta. So we apply some guesswork and compare the delta to a 4M
threshold. (Practically observed delta was 2M.)

We probably cannot calculate the exact threshold value because this
depends on the internal reservations requested by various operations, so
some operations that consume a few metadata will succeed even if the
Avail is zero. But this is better than the other way around.

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 4e3fa12f124507ad17f999c28ef35803596ff2c6)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoBtrfs: igrab inode in writepage
Josef Bacik [Thu, 22 Oct 2015 19:05:09 +0000 (15:05 -0400)]
Btrfs: igrab inode in writepage

Orabug: 23331024

[ Upstream commit be7bd730841e69fe8f70120098596f648cd1f3ff ]

We hit this panic on a few of our boxes this week where we have an
ordered_extent with an NULL inode.  We do an igrab() of the inode in writepages,
but weren't doing it in writepage which can be called directly from the VM on
dirty pages.  If the inode has been unlinked then we could have I_FREEING set
which means igrab() would return NULL and we get this panic.  Fix this by trying
to igrab in btrfs_writepage, and if it returns NULL then just redirty the page
and return AOP_WRITEPAGE_ACTIVATE; so the VM knows it wasn't successful.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit bb055e837f904fdc80d4d82819b0a9aaf35dce4a)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoBtrfs: add missing brelse when superblock checksum fails
Anand Jain [Wed, 7 Oct 2015 09:23:23 +0000 (17:23 +0800)]
Btrfs: add missing brelse when superblock checksum fails

Orabug: 23331023

[ Upstream commit b2acdddfad13c38a1e8b927d83c3cf321f63601a ]

Looks like oversight, call brelse() when checksum fails. Further down the
code, in the non error path, we do call brelse() and so we don't see
brelse() in the goto error paths.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit c0109d289de5a48e54a2d070981a629fc241f112)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoext4: fix bh->b_state corruption
Jan Kara [Fri, 19 Feb 2016 05:18:25 +0000 (00:18 -0500)]
ext4: fix bh->b_state corruption

Orabug: 23331022

[ Upstream commit ed8ad83808f009ade97ebbf6519bc3a97fefbc0c ]

ext4 can update bh->b_state non-atomically in _ext4_get_block() and
ext4_da_get_block_prep(). Usually this is fine since bh is just a
temporary storage for mapping information on stack but in some cases it
can be fully living bh attached to a page. In such case non-atomic
update of bh->b_state can race with an atomic update which then gets
lost. Usually when we are mapping bh and thus updating bh->b_state
non-atomically, nobody else touches the bh and so things work out fine
but there is one case to especially worry about: ext4_finish_bio() uses
BH_Uptodate_Lock on the first bh in the page to synchronize handling of
PageWriteback state. So when blocksize < pagesize, we can be atomically
modifying bh->b_state of a buffer that actually isn't under IO and thus
can race e.g. with delalloc trying to map that buffer. The result is
that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.

Fix the problem by always updating bh->b_state bits atomically.

CC: stable@vger.kernel.org
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit a7635d6a0849007d2192bc02c038cc1b9d91b274)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agodax: don't abuse get_block mapping for endio callbacks
Dave Chinner [Wed, 3 Jun 2015 23:18:18 +0000 (09:18 +1000)]
dax: don't abuse get_block mapping for endio callbacks

Orabug: 23331020

[ Upstream commit e842f2903908934187af7232fb5b21da527d1757 ]

dax_fault() currently relies on the get_block callback to attach an
io completion callback to the mapping buffer head so that it can
run unwritten extent conversion after zeroing allocated blocks.

Instead of this hack, pass the conversion callback directly into
dax_fault() similar to the get_block callback. When the filesystem
allocates unwritten extents, it will set the buffer_unwritten()
flag, and hence the dax_fault code can call the completion function
in the contexts where it is necessary without overloading the
mapping buffer head.

Note: The changes to ext4 to use this interface are suspect at best.
In fact, the way ext4 did this end_io assignment in the first place
looks suspect because it only set a completion callback when there
wasn't already some other write() call taking place on the same
inode. The ext4 end_io code looks rather intricate and fragile with
all it's reference counting and passing to different contexts for
modification via inode private pointers that aren't protected by
locks...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 0e3029cfab4a7884b32e1b2d3c19c12eded9804a)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoiio: adis_buffer: Fix out-of-bounds memory access
Lars-Peter Clausen [Fri, 27 Nov 2015 13:55:56 +0000 (14:55 +0100)]
iio: adis_buffer: Fix out-of-bounds memory access

Orabug: 23331019

[ Upstream commit d590faf9e8f8509a0a0aa79c38e87fcc6b913248 ]

The SPI tx and rx buffers are both supposed to be scan_bytes amount of
bytes large and a common allocation is used to allocate both buffers. This
puts the beginning of the tx buffer scan_bytes bytes after the rx buffer.
The initialization of the tx buffer pointer is done adding scan_bytes to
the beginning of the rx buffer, but since the rx buffer is of type __be16
this will actually add two times as much and the tx buffer ends up pointing
after the allocated buffer.

Fix this by using scan_count, which is scan_bytes / 2, instead of
scan_bytes when initializing the tx buffer pointer.

Fixes: aacff892cbd5 ("staging:iio:adis: Preallocate transfer message")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 3e77cb858ce67861e142186e3b131c04ddbc6edd)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoptrace: use fsuid, fsgid, effective creds for fs access checks
Jann Horn [Wed, 20 Jan 2016 23:00:04 +0000 (15:00 -0800)]
ptrace: use fsuid, fsgid, effective creds for fs access checks

Orabug: 23331018

[ Upstream commit caaee6234d05a58c5b4d05e7bf766131b810a657 ]

By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

 /proc/$pid/stat - uses the check for determining whether pointers
     should be visible, useful for bypassing ASLR
 /proc/$pid/maps - also useful for bypassing ASLR
 /proc/$pid/cwd - useful for gaining access to restricted
     directories that contain files with lax permissions, e.g. in
     this scenario:
     lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
     drwx------ root root /root
     drwxr-xr-x root root /root/foobar
     -rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit ab88ce5feca4204ecf4e7ef6c6693ff67edc2169)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agosched: Fix crash in sched_init_numa()
Raghavendra K T [Fri, 15 Jan 2016 19:01:23 +0000 (00:31 +0530)]
sched: Fix crash in sched_init_numa()

Orabug: 23331017

[ Upstream commit 9c03ee147193645be4c186d3688232fa438c57c7 ]

The following PowerPC commit:

  c118baf80256 ("arch/powerpc/mm/numa.c: do not allocate bootmem memory for non existing nodes")

avoids allocating bootmem memory for non existent nodes.

But when DEBUG_PER_CPU_MAPS=y is enabled, my powerNV system failed to boot
because in sched_init_numa(), cpumask_or() operation was done on
unallocated nodes.

Fix that by making cpumask_or() operation only on existing nodes.

[ Tested with and w/o DEBUG_PER_CPU_MAPS=y on x86 and PowerPC. ]

Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <gkurz@linux.vnet.ibm.com>
Cc: <grant.likely@linaro.org>
Cc: <nikunj@linux.vnet.ibm.com>
Cc: <vdavydov@parallels.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Cc: <linux-mm@kvack.org>
Cc: <peterz@infradead.org>
Cc: <benh@kernel.crashing.org>
Cc: <paulus@samba.org>
Cc: <mpe@ellerman.id.au>
Cc: <anton@samba.org>
Link: http://lkml.kernel.org/r/1452884483-11676-1-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 8cf0abcfb3b1ce60a9bd866db451a093dc015233)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoHID: usbhid: fix recursive deadlock
Ioan-Adrian Ratiu [Fri, 20 Nov 2015 20:19:02 +0000 (22:19 +0200)]
HID: usbhid: fix recursive deadlock

Orabug: 23331016

[ Upstream commit e470127e9606b1fa151c4184243e61296d1e0c0f ]

The critical section protected by usbhid->lock in hid_ctrl() is too
big and because of this it causes a recursive deadlock. "Too big" means
the case statement and the call to hid_input_report() do not need to be
protected by the spinlock (no URB operations are done inside them).

The deadlock happens because in certain rare cases drivers try to grab
the lock while handling the ctrl irq which grabs the lock before them
as described above. For example newer wacom tablets like 056a:033c try
to reschedule proximity reads from wacom_intuos_schedule_prox_event()
calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
which tries to grab the usbhid lock already held by hid_ctrl().

There are two ways to get out of this deadlock:
    1. Make the drivers work "around" the ctrl critical region, in the
    wacom case for ex. by delaying the scheduling of the proximity read
    request itself to a workqueue.
    2. Shrink the critical region so the usbhid lock protects only the
    instructions which modify usbhid state, calling hid_input_report()
    with the spinlock unlocked, allowing the device driver to grab the
    lock first, finish and then grab the lock afterwards in hid_ctrl().

This patch implements the 2nd solution.

Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit af24c621219ec87b221c1bbade56a506bf09deb9)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoperf/core: Fix perf_sched_count derailment
Alexander Shishkin [Thu, 24 Mar 2016 11:14:53 +0000 (11:14 +0000)]
perf/core: Fix perf_sched_count derailment

Orabug: 23331015

[ Upstream commit 927a5570855836e5d5859a80ce7e91e963545e8f ]

The error path in perf_event_open() is such that asking for a sampling
event on a PMU that doesn't generate interrupts will end up in dropping
the perf_sched_count even though it hasn't been incremented for this
event yet.

Given a sufficient amount of these calls, we'll end up disabling
scheduler's jump label even though we'd still have active events in the
system, thereby facilitating the arrival of the infernal regions upon us.

I'm fixing this by moving account_event() inside perf_event_alloc().

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1456917854-29427-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 919e67a63aa967566a909f4f6e1c13f8e88cf76e)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoperf: Cure event->pending_disable race
Peter Zijlstra [Thu, 24 Mar 2016 11:14:52 +0000 (11:14 +0000)]
perf: Cure event->pending_disable race

Orabug: 23331014

[ Upstream commit 28a967c3a2f99fa3b5f762f25cb2a319d933571b ]

Because event_sched_out() checks event->pending_disable _before_
actually disabling the event, it can happen that the event fires after
it checks but before it gets disabled.

This would leave event->pending_disable set and the queued irq_work
will try and process it.

However, if the event trigger was during schedule(), the event might
have been de-scheduled by the time the irq_work runs, and
perf_event_disable_local() will fail.

Fix this by checking event->pending_disable _after_ we call
event->pmu->del(). This depends on the latter being a compiler
barrier, such that the compiler does not lift the load and re-creates
the problem.

Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174948.040469884@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 882f862db7f3509f055208b2e3e5bd263265a03b)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoperf: Do not double free
Peter Zijlstra [Thu, 24 Mar 2016 11:14:51 +0000 (11:14 +0000)]
perf: Do not double free

Orabug: 23331013

[ Upstream commit 130056275ade730e7a79c110212c8815202773ee ]

In case of: err_file: fput(event_file), we'll end up calling
perf_release() which in turn will free the event.

Do not then free the event _again_.

Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.697350349@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 5709e7ba03717ab760b5ad6ebcf4a9e2f633dcc4)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoext4: fix races of writeback with punch hole and zero range
Jan Kara [Mon, 7 Dec 2015 19:34:49 +0000 (14:34 -0500)]
ext4: fix races of writeback with punch hole and zero range

Orabug: 23331012

When doing delayed allocation, update of on-disk inode size is postponed
until IO submission time. However hole punch or zero range fallocate
calls can end up discarding the tail page cache page and thus on-disk
inode size would never be properly updated.

Make sure the on-disk inode size is updated before truncating page
cache.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Mingming Cao <mingming.cao@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit f2b132595b89d9236b386e1d6ed3fcf5e9edf4cb)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoext4: fix races between buffered IO and collapse / insert range
Jan Kara [Mon, 7 Dec 2015 19:31:11 +0000 (14:31 -0500)]
ext4: fix races between buffered IO and collapse / insert range

Orabug: 23331011

Current code implementing FALLOC_FL_COLLAPSE_RANGE and
FALLOC_FL_INSERT_RANGE is prone to races with buffered writes and page
faults. If buffered write or write via mmap manages to squeeze between
filemap_write_and_wait_range() and truncate_pagecache() in the fallocate
implementations, the written data is simply discarded by
truncate_pagecache() although it should have been shifted.

Fix the problem by moving filemap_write_and_wait_range() call inside
i_mutex and i_mmap_sem. That way we are protected against races with
both buffered writes and page faults.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Mingming Cao <mingming.cao@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 181aaebde9360b8235df647ee36dafdc041d4964)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoext4: move unlocked dio protection from ext4_alloc_file_blocks()
Jan Kara [Mon, 7 Dec 2015 19:29:17 +0000 (14:29 -0500)]
ext4: move unlocked dio protection from ext4_alloc_file_blocks()

Orabug: 23331010

Currently ext4_alloc_file_blocks() was handling protection against
unlocked DIO. However we now need to sometimes call it under i_mmap_sem
and sometimes not and DIO protection ranks above it (although strictly
speaking this cannot currently create any deadlocks). Also
ext4_zero_range() was actually getting & releasing unlocked DIO
protection twice in some cases. Luckily it didn't introduce any real bug
but it was a land mine waiting to be stepped on.  So move DIO protection
out from ext4_alloc_file_blocks() into the two callsites.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Mingming Cao <mingming.cao@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 9621787d69783fc23d14e1332377d7170d6928ed)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoext4: fix races between page faults and hole punching
Jan Kara [Mon, 7 Dec 2015 19:28:03 +0000 (14:28 -0500)]
ext4: fix races between page faults and hole punching

Orabug: 23331007

Currently, page faults and hole punching are completely unsynchronized.
This can result in page fault faulting in a page into a range that we
are punching after truncate_pagecache_range() has been called and thus
we can end up with a page mapped to disk blocks that will be shortly
freed. Filesystem corruption will shortly follow. Note that the same
race is avoided for truncate by checking page fault offset against
i_size but there isn't similar mechanism available for punching holes.

Fix the problem by creating new rw semaphore i_mmap_sem in inode and
grab it for writing over truncate, hole punching, and other functions
removing blocks from extent tree and for read over page faults. We
cannot easily use i_data_sem for this since that ranks below transaction
start and we need something ranking above it so that it can be held over
the whole truncate / hole punching operation. Also remove various
workarounds we had in the code to reduce race window when page fault
could have created pages with stale mapping information.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Mingming Cao <mingming.cao@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 248766f068fd1d3d95479f470bc926d1136141d6)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoKVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
Paolo Bonzini [Tue, 8 Mar 2016 11:13:39 +0000 (12:13 +0100)]
KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

Orabug: stable_rc4

[ Upstream commit 844a5fe219cf472060315971e15cbf97674a3324 ]

Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
specially when pte.u=1/pte.w=0/CR0.WP=0.  Such writes cause a fault
when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
restarts execution.  This will still cause a user write to fault, while
supervisor writes will succeed.  User reads will fault spuriously now,
and KVM will then flip U and W again in the SPTE (U=1, W=0).  User reads
will be enabled and supervisor writes disabled, going back to the
originary situation where supervisor writes fault spuriously.

When SMEP is in effect, however, U=0 will enable kernel execution of
this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0.  If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.  (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
control, so they do not use user-return notifiers for EFER---if they did,
EFER.NX would be forced to the same value as the host).

There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.

Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit eac525506a083a389ba173880979a6291401af2d)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoKVM: VMX: disable PEBS before a guest entry
Radim Krčmář [Fri, 4 Mar 2016 14:08:42 +0000 (15:08 +0100)]
KVM: VMX: disable PEBS before a guest entry

Orabug: 23331004

[ Upstream commit 7099e2e1f4d9051f31bbfa5803adf954bb5d76ef ]

Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
would crash if you decided to run a host command that uses PEBS, like
  perf record -e 'cpu/mem-stores/pp' -a

This happens because KVM is using VMX MSR switching to disable PEBS, but
SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
isn't safe:
  When software needs to reconfigure PEBS facilities, it should allow a
  quiescent period between stopping the prior event counting and setting
  up a new PEBS event. The quiescent period is to allow any latent
  residual PEBS records to complete its capture at their previously
  specified buffer address (provided by IA32_DS_AREA).

There might not be a quiescent period after the MSR switch, so a CPU
ends up using host's MSR_IA32_DS_AREA to access an area in guest's
memory.  (Or MSR switching is just buggy on some models.)

The guest can learn something about the host this way:
If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results
in #PF where we leak host's MSR_IA32_DS_AREA through CR2.

After that, a malicious guest can map and configure memory where
MSR_IA32_DS_AREA is pointing and can therefore get an output from
host's tracing.

This is not a critical leak as the host must initiate with PEBS tracing
and I have not been able to get a record from more than one instruction
before vmentry in vmx_vcpu_run() (that place has most registers already
overwritten with guest's).

We could disable PEBS just few instructions before vmentry, but
disabling it earlier shouldn't affect host tracing too much.
We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
optimization isn't worth its code, IMO.

(If you are implementing PEBS for guests, be sure to handle the case
 where both host and guest enable PEBS, because this patch doesn't.)

Fixes: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.")
Cc: <stable@vger.kernel.org>
Reported-by: Jiří Olša <jolsa@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit eb34a645aee3906baac9cad7defdabf61ac40bfd)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agojffs2: reduce the breakage on recovery from halfway failed rename()
Al Viro [Tue, 8 Mar 2016 04:07:10 +0000 (23:07 -0500)]
jffs2: reduce the breakage on recovery from halfway failed rename()

Orabug: 23331003

[ Upstream commit f93812846f31381d35c04c6c577d724254355e7f ]

d_instantiate(new_dentry, old_inode) is absolutely wrong thing to
do - it will oops if new_dentry used to be positive, for starters.
What we need is d_invalidate() the target and be done with that.

Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit c62aadae234ffad0901c20ac1a1aa4e13cce1c20)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoncpfs: fix a braino in OOM handling in ncp_fill_cache()
Al Viro [Tue, 8 Mar 2016 03:17:07 +0000 (22:17 -0500)]
ncpfs: fix a braino in OOM handling in ncp_fill_cache()

Orabug: 23331002

[ Upstream commit 803c00123a8012b3a283c0530910653973ef6d8f ]

Failing to allocate an inode for child means that cache for *parent* is
incompletely populated.  So it's parent directory inode ('dir') that
needs NCPI_DIR_CACHE flag removed, *not* the child inode ('inode', which
is what we'd failed to allocate in the first place).

Fucked-up-in: commit 5e993e25 ("ncpfs: get rid of d_validate() nonsense")
Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org # v3.19
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 906e5a6e6e73316fa4741ca53be014c9477a100c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoovl: copy new uid/gid into overlayfs runtime inode
Konstantin Khlebnikov [Sun, 31 Jan 2016 13:21:29 +0000 (16:21 +0300)]
ovl: copy new uid/gid into overlayfs runtime inode

Orabug: 23331001

[ Upstream commit b81de061fa59f17d2730aabb1b84419ef3913810 ]

Overlayfs must update uid/gid after chown, otherwise functions
like inode_owner_or_capable() will check user against stale uid.
Catched by xfstests generic/087, it chowns file and calls utimes.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 546a8b3c4059af5fd8466f3d1848321e7613904c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoovl: ignore lower entries when checking purity of non-directory entries
Konstantin Khlebnikov [Sun, 31 Jan 2016 13:17:53 +0000 (16:17 +0300)]
ovl: ignore lower entries when checking purity of non-directory entries

Orabug: stable_rc4

[ Upstream commit 45d11738969633ec07ca35d75d486bf2d8918df6 ]

After rename file dentry still holds reference to lower dentry from
previous location. This doesn't matter for data access because data comes
from upper dentry. But this stale lower dentry taints dentry at new
location and turns it into non-pure upper. Such file leaves visible
whiteout entry after remove in directory which shouldn't have whiteouts at
all.

Overlayfs already tracks pureness of file location in oe->opaque.  This
patch just uses that for detecting actual path type.

Comment from Vivek Goyal's patch:

Here are the details of the problem. Do following.

$ mkdir upper lower work merged upper/dir/
$ touch lower/test
$ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=
work merged
$ mv merged/test merged/dir/
$ rm merged/dir/test
$ ls -l merged/dir/
/usr/bin/ls: cannot access merged/dir/test: No such file or directory
total 0
c????????? ? ? ? ?            ? test

Basic problem seems to be that once a file has been unlinked, a whiteout
has been left behind which was not needed and hence it becomes visible.

Whiteout is visible because parent dir is of not type MERGE, hence
od->is_real is set during ovl_dir_open(). And that means ovl_iterate()
passes on iterate handling directly to underlying fs. Underlying fs does
not know/filter whiteouts so it becomes visible to user.

Why did we leave a whiteout to begin with when we should not have.
ovl_do_remove() checks for OVL_TYPE_PURE_UPPER() and does not leave
whiteout if file is pure upper. In this case file is not found to be pure
upper hence whiteout is left.

So why file was not PURE_UPPER in this case? I think because dentry is
still carrying some leftover state which was valid before rename. For
example, od->numlower was set to 1 as it was a lower file. After rename,
this state is not valid anymore as there is no such file in lower.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Viktor Stanchev <me@viktorstanchev.com>
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=109611
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 091baa9c784fe57b8778a4b754931ffe57245db3)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoovl: fix getcwd() failure after unsuccessful rmdir
Rui Wang [Fri, 8 Jan 2016 15:09:59 +0000 (23:09 +0800)]
ovl: fix getcwd() failure after unsuccessful rmdir

Orabug: 23330998

[ Upstream commit ce9113bbcbf45a57c082d6603b9a9f342be3ef74 ]

ovl_remove_upper() should do d_drop() only after it successfully
removes the dir, otherwise a subsequent getcwd() system call will
fail, breaking userspace programs.

This is to fix: https://bugzilla.kernel.org/show_bug.cgi?id=110491

Signed-off-by: Rui Wang <rui.y.wang@intel.com>
Reviewed-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit e786702fff38e2b5142029d6de615abf1c8e436f)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomac80211: check PN correctly for GCMP-encrypted fragmented MPDUs
Johannes Berg [Fri, 26 Feb 2016 21:13:40 +0000 (22:13 +0100)]
mac80211: check PN correctly for GCMP-encrypted fragmented MPDUs

Orabug: 23330996

[ Upstream commit 9acc54beb474c81148e2946603d141cf8716b19f ]

Just like for CCMP we need to check that for GCMP the fragments
have PNs that increment by one; the spec was updated to fix this
security issue and now has the following text:

The receiver shall discard MSDUs and MMPDUs whose constituent
MPDU PN values are not incrementing in steps of 1.

Adapt the code for CCMP to work for GCMP as well, luckily the
relevant fields already alias each other so no code duplication
is needed (just check the aliasing with BUILD_BUG_ON.)

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 87e0016ccb1f9cbe377d4af19cb840acbbdff206)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agocan: gs_usb: fixed disconnect bug by removing erroneous use of kfree()
Maximilain Schneider [Tue, 23 Feb 2016 01:17:28 +0000 (01:17 +0000)]
can: gs_usb: fixed disconnect bug by removing erroneous use of kfree()

Orabug: stable_rc4

[ Upstream commit e9a2d81b1761093386a0bb8a4f51642ac785ef63 ]

gs_destroy_candev() erroneously calls kfree() on a struct gs_can *, which is
allocated through alloc_candev() and should instead be freed using
free_candev() alone.

The inappropriate use of kfree() causes the kernel to hang when
gs_destroy_candev() is called.

Only the struct gs_usb * which is allocated through kzalloc() should be freed
using kfree() when the device is disconnected.

Signed-off-by: Maximilian Schneider <max@schneidersoft.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 01ff3a0a01366a231593476cfe775596ebdba30f)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomac80211: fix use of uninitialised values in RX aggregation
Chris Bainbridge [Wed, 27 Jan 2016 15:46:18 +0000 (15:46 +0000)]
mac80211: fix use of uninitialised values in RX aggregation

Orabug: 23330994

[ Upstream commit f39ea2690bd61efec97622c48323f40ed6e16317 ]

Use kzalloc instead of kmalloc for struct tid_ampdu_rx to
initialize the "removed" field (all others are initialized
manually). That fixes:

UBSAN: Undefined behaviour in net/mac80211/rx.c:932:29
load of value 2 is not a valid value for type '_Bool'
CPU: 3 PID: 1134 Comm: kworker/u16:7 Not tainted 4.5.0-rc1+ #265
Workqueue: phy0 rt2x00usb_work_rxdone
 0000000000000004 ffff880254a7ba50 ffffffff8181d866 0000000000000007
 ffff880254a7ba78 ffff880254a7ba68 ffffffff8188422d ffffffff8379b500
 ffff880254a7bab8 ffffffff81884747 0000000000000202 0000000348620032
Call Trace:
 [<ffffffff8181d866>] dump_stack+0x45/0x5f
 [<ffffffff8188422d>] ubsan_epilogue+0xd/0x40
 [<ffffffff81884747>] __ubsan_handle_load_invalid_value+0x67/0x70
 [<ffffffff82227b4d>] ieee80211_sta_reorder_release.isra.16+0x5ed/0x730
 [<ffffffff8222ca14>] ieee80211_prepare_and_rx_handle+0xd04/0x1c00
 [<ffffffff8222db03>] __ieee80211_rx_handle_packet+0x1f3/0x750
 [<ffffffff8222e4a7>] ieee80211_rx_napi+0x447/0x990

While at it, convert to use sizeof(*tid_agg_rx) instead.

Fixes: 788211d81bfdf ("mac80211: fix RX A-MPDU session reorder timer deletion")
Cc: stable@vger.kernel.org
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
[reword commit message, use sizeof(*tid_agg_rx)]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d5bb89facc7b689292d85471be1fdbae1928e224)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoipv6: drop frames with attached skb->sk in forwarding
Hannes Frederic Sowa [Thu, 8 Oct 2015 16:19:53 +0000 (18:19 +0200)]
ipv6: drop frames with attached skb->sk in forwarding

Orabug: 23330993

[ Upstream commit 9ef2e965e55481a52d6d91ce61977a27836268d3 ]

This is a clone of commit 2ab957492d13b ("ip_forward: Drop frames with
attached skb->sk") for ipv6.

This commit has exactly the same reasons as the above mentioned commit,
namely to prevent panics during netfilter reload or a misconfigured stack.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit b014bae072a1dad1767c5c6e7e7b480165685c3e)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoubi: Fix out of bounds write in volume update code
Richard Weinberger [Sun, 21 Feb 2016 09:53:03 +0000 (10:53 +0100)]
ubi: Fix out of bounds write in volume update code

Orabug: 23330992

[ Upstream commit e4f6daac20332448529b11f09388f1d55ef2084c ]

ubi_start_leb_change() allocates too few bytes.
ubi_more_leb_change_data() will write up to req->upd_bytes +
ubi->min_io_size bytes.

Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 325940deb74b23351f507d5f1e1e01592c1efa1c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoPM / sleep / x86: Fix crash on graph trace through x86 suspend
Todd E Brandt [Thu, 3 Mar 2016 00:05:29 +0000 (16:05 -0800)]
PM / sleep / x86: Fix crash on graph trace through x86 suspend

Orabug: 23330991

[ Upstream commit 92f9e179a702a6adbc11e2fedc76ecd6ffc9e3f7 ]

Pause/unpause graph tracing around do_suspend_lowlevel as it has
inconsistent call/return info after it jumps to the wakeup vector.
The graph trace buffer will otherwise become misaligned and
may eventually crash and hang on suspend.

To reproduce the issue and test the fix:
Run a function_graph trace over suspend/resume and set the graph
function to suspend_devices_and_enter. This consistently hangs the
system without this fix.

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 8ef267aabd98f9df0279b9bb4245a3b985ead692)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agouse ->d_seq to get coherency between ->d_inode and ->d_flags
Al Viro [Mon, 29 Feb 2016 17:12:46 +0000 (12:12 -0500)]
use ->d_seq to get coherency between ->d_inode and ->d_flags

Orabug: 23330989

[ Upstream commit a528aca7f359f4b0b1d72ae406097e491a5ba9ea ]

Games with ordering and barriers are way too brittle.  Just
bump ->d_seq before and after updating ->d_inode and ->d_flags
type bits, so that verifying ->d_seq would guarantee they are
coherent.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit c8ce76e3c6cd937e8b3fd8ae3573f23767b70eca)

Conflict:

fs/dcache.c

9 years agoALSA: hdspm: Fix zero-division
Takashi Iwai [Mon, 29 Feb 2016 13:32:42 +0000 (14:32 +0100)]
ALSA: hdspm: Fix zero-division

Orabug: 23330988

[ Upstream commit c1099c3294c2344110085a38c50e478a5992b368 ]

HDSPM driver contains a code issuing zero-division potentially in
system sample rate ctl code.  This patch fixes it by not processing
a zero or invalid rate value as a divisor, as well as excluding the
invalid value to be passed via the given ctl element.

Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 6177e82a6b6586a057e2f00940e2e220b993547e)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoALSA: hdspm: Fix wrong boolean ctl value accesses
Takashi Iwai [Mon, 29 Feb 2016 13:25:16 +0000 (14:25 +0100)]
ALSA: hdspm: Fix wrong boolean ctl value accesses

Orabug: 23330987

[ Upstream commit 537e48136295c5860a92138c5ea3959b9542868b ]

snd-hdspm driver accesses enum item values (int) instead of boolean
values (long) wrongly for some ctl elements.  This patch fixes them.

Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit b9800dd1d9eeb4b5f81a485b270702a02b142b9b)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agoCIFS: Fix SMB2+ interim response processing for read requests
Pavel Shilovsky [Sat, 27 Feb 2016 08:58:18 +0000 (11:58 +0300)]
CIFS: Fix SMB2+ interim response processing for read requests

Orabug: 23330986

[ Upstream commit 6cc3b24235929b54acd5ecc987ef11a425bd209e ]

For interim responses we only need to parse a header and update
a number credits. Now it is done for all SMB2+ command except
SMB2_READ which is wrong. Fix this by adding such processing.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 78b821d76e779822877604052f06219294e9e038)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agocifs: fix out-of-bounds access in lease parsing
Justin Maggard [Tue, 9 Feb 2016 23:52:08 +0000 (15:52 -0800)]
cifs: fix out-of-bounds access in lease parsing

Orabug: 23330983

[ Upstream commit deb7deff2f00bdbbcb3d560dad2a89ef37df837d ]

When opening a file, SMB2_open() attempts to parse the lease state from the
SMB2 CREATE Response.  However, the parsing code was not careful to ensure
that the create contexts are not empty or invalid, which can lead to out-
of-bounds memory access.  This can be seen easily by trying
to read a file from a OSX 10.11 SMB3 server.  Here is sample crash output:

BUG: unable to handle kernel paging request at ffff8800a1a77cc6
IP: [<ffffffff8828a734>] SMB2_open+0x804/0x960
PGD 8f77067 PUD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 3 PID: 2876 Comm: cp Not tainted 4.5.0-rc3.x86_64.1+ #14
Hardware name: NETGEAR ReadyNAS 314          /ReadyNAS 314          , BIOS 4.6.5 10/11/2012
task: ffff880073cdc080 ti: ffff88005b31c000 task.ti: ffff88005b31c000
RIP: 0010:[<ffffffff8828a734>]  [<ffffffff8828a734>] SMB2_open+0x804/0x960
RSP: 0018:ffff88005b31fa08  EFLAGS: 00010282
RAX: 0000000000000015 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff88007eb8c8b0
RBP: ffff88005b31fad8 R08: 666666203d206363 R09: 6131613030383866
R10: 3030383866666666 R11: 00000000000002b0 R12: ffff8800660fd800
R13: ffff8800a1a77cc2 R14: 00000000424d53fe R15: ffff88005f5a28c0
FS:  00007f7c8a2897c0(0000) GS:ffff88007eb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8800a1a77cc6 CR3: 000000005b281000 CR4: 00000000000006e0
Stack:
 ffff88005b31fa70 ffffffff88278789 00000000000001d3 ffff88005f5a2a80
 ffffffff00000003 ffff88005d029d00 ffff88006fde05a0 0000000000000000
 ffff88005b31fc78 ffff88006fde0780 ffff88005b31fb2f 0000000100000fe0
Call Trace:
 [<ffffffff88278789>] ? cifsConvertToUTF16+0x159/0x2d0
 [<ffffffff8828cf68>] smb2_open_file+0x98/0x210
 [<ffffffff8811e80c>] ? __kmalloc+0x1c/0xe0
 [<ffffffff882685f4>] cifs_open+0x2a4/0x720
 [<ffffffff88122cef>] do_dentry_open+0x1ff/0x310
 [<ffffffff88268350>] ? cifsFileInfo_get+0x30/0x30
 [<ffffffff88123d92>] vfs_open+0x52/0x60
 [<ffffffff88131dd0>] path_openat+0x170/0xf70
 [<ffffffff88097d48>] ? remove_wait_queue+0x48/0x50
 [<ffffffff88133a29>] do_filp_open+0x79/0xd0
 [<ffffffff8813f2ca>] ? __alloc_fd+0x3a/0x170
 [<ffffffff881240c4>] do_sys_open+0x114/0x1e0
 [<ffffffff881241a9>] SyS_open+0x19/0x20
 [<ffffffff8896e257>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: 4d 8d 6c 07 04 31 c0 4c 89 ee e8 47 6f e5 ff 31 c9 41 89 ce 44 89 f1 48 c7 c7 28 b1 bd 88 31 c0 49 01 cd 4c 89 ee e8 2b 6f e5 ff <45> 0f b7 75 04 48 c7 c7 31 b1 bd 88 31 c0 4d 01 ee 4c 89 f6 e8
RIP  [<ffffffff8828a734>] SMB2_open+0x804/0x960
 RSP <ffff88005b31fa08>
CR2: ffff8800a1a77cc6
---[ end trace d9f69ba64feee469 ]---

Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 94a7d752e43717119822fee8b04be903e694017e)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agovfio: fix ioctl error handling
Michael S. Tsirkin [Sun, 28 Feb 2016 14:31:39 +0000 (16:31 +0200)]
vfio: fix ioctl error handling

Orabug: 23330982

[ Upstream commit 8160c4e455820d5008a1116d2dca35f0363bb062 ]

Calling return copy_to_user(...) in an ioctl will not
do the right thing if there's a pagefault:
copy_to_user returns the number of bytes not copied
in this case.

Fix up vfio to do
return copy_to_user(...)) ?
-EFAULT : 0;

everywhere.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 1590808b43559d8330599158453f28f1b16ffd54)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agonamei: ->d_inode of a pinned dentry is stable only for positives
Al Viro [Sun, 28 Feb 2016 00:23:16 +0000 (19:23 -0500)]
namei: ->d_inode of a pinned dentry is stable only for positives

Orabug: 23330981

[ Upstream commit d4565649b6d6923369112758212b851adc407f0c ]

both do_last() and walk_component() risk picking a NULL inode out
of dentry about to become positive, *then* checking its flags and
seeing that it's not negative anymore and using (already stale by
then) value they'd fetched earlier.  Usually ends up oopsing soon
after that...

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 9b77cd137fd841d7a14e1c9428cfc49f4df0306e)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agodo_last(): don't let a bogus return value from ->open() et.al. to confuse us
Al Viro [Sun, 28 Feb 2016 00:17:33 +0000 (19:17 -0500)]
do_last(): don't let a bogus return value from ->open() et.al. to confuse us

Orabug: stable_rc4

[ Upstream commit c80567c82ae4814a41287618e315a60ecf513be6 ]

... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.

Cc: stable@vger.kernel.org # v3.6+, at least
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 3960cde3e356057bd60adce1b625a7d178b9581c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomm: numa: quickly fail allocations for NUMA balancing on full nodes
Mel Gorman [Fri, 26 Feb 2016 23:19:31 +0000 (15:19 -0800)]
mm: numa: quickly fail allocations for NUMA balancing on full nodes

Orabug: 23330978

[ Upstream commit 8479eba7781fa9ffb28268840de6facfc12c35a7 ]

Commit 4167e9b2cf10 ("mm: remove GFP_THISNODE") removed the GFP_THISNODE
flag combination due to confusing semantics.  It noted that
alloc_misplaced_dst_page() was one such user after changes made by
commit e97ca8e5b864 ("mm: fix GFP_THISNODE callers and clarify").

Unfortunately when GFP_THISNODE was removed, users of
alloc_misplaced_dst_page() started waking kswapd and entering direct
reclaim because the wrong GFP flags are cleared.  The consequence is
that workloads that used to fit into memory now get reclaimed which is
addressed by this patch.

The problem can be demonstrated with "mutilate" that exercises memcached
which is software dedicated to memory object caching.  The configuration
uses 80% of memory and is run 3 times for varying numbers of clients.
The results on a 4-socket NUMA box are

mutilate
                            4.4.0                 4.4.0
                          vanilla           numaswap-v1
Hmean    1      8394.71 (  0.00%)     8395.32 (  0.01%)
Hmean    4     30024.62 (  0.00%)    34513.54 ( 14.95%)
Hmean    7     32821.08 (  0.00%)    70542.96 (114.93%)
Hmean    12    55229.67 (  0.00%)    93866.34 ( 69.96%)
Hmean    21    39438.96 (  0.00%)    85749.21 (117.42%)
Hmean    30    37796.10 (  0.00%)    50231.49 ( 32.90%)
Hmean    47    18070.91 (  0.00%)    38530.13 (113.22%)

The metric is queries/second with the more the better.  The results are
way outside of the noise and the reason for the improvement is obvious
from some of the vmstats

                                 4.4.0       4.4.0
                               vanillanumaswap-v1r1
Minor Faults                1929399272  2146148218
Major Faults                  19746529        3567
Swap Ins                      57307366        9913
Swap Outs                     50623229       17094
Allocation stalls                35909         443
DMA allocs                           0           0
DMA32 allocs                  72976349   170567396
Normal allocs               5306640898  5310651252
Movable allocs                       0           0
Direct pages scanned         404130893      799577
Kswapd pages scanned         160230174           0
Kswapd pages reclaimed        55928786           0
Direct pages reclaimed         1843936       41921
Page writes file                  2391           0
Page writes anon              50623229       17094

The vanilla kernel is swapping like crazy with large amounts of direct
reclaim and kswapd activity.  The figures are aggregate but it's known
that the bad activity is throughout the entire test.

Note that simple streaming anon/file memory consumers also see this
problem but it's not as obvious.  In those cases, kswapd is awake when
it should not be.

As there are at least two reclaim-related bugs out there, it's worth
spelling out the user-visible impact.  This patch only addresses bugs
related to excessive reclaim on NUMA hardware when the working set is
larger than a NUMA node.  There is a bug related to high kswapd CPU
usage but the reports are against laptops and other UMA hardware and is
not addressed by this patch.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 419ddc3099727a291a67d41498c3d1caddb75392)

Signed-off-by: Dan Duval <dan.duval@oracle.com>
9 years agomm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
Andrea Arcangeli [Fri, 26 Feb 2016 23:19:28 +0000 (15:19 -0800)]
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED

Orabug: stable_rc4

[ Upstream commit ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433 ]

pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).

While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable().  The old pmd_trans_huge() left a
tiny window for a race.

Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.

[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d347d0e9ae617bd44ca7679786ebf11a06d50372)

Signed-off-by: Dan Duval <dan.duval@oracle.com>