www.infradead.org Git - users/jedix/linux-maple.git/log

Merge branch topic/uek-4.1/stable-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/stable-cherry-picks:
btrfs: trimming some start_transaction() code away
dm flakey: fix reads to be issued if drop_writes configured

btrfs: trimming some start_transaction() code away

Orabug: 25615755

Just call kmem_cache_zalloc() instead of calling kmem_cache_alloc().
We're just initializing most fields to 0, false and NULL later on
_anyway_, so to make the code more readable and potentially gain
a bit of performance (completely untested claim), we should fill our
btrfs_trans_handle with zeros on allocation then just initialize
those five remaining fields (not counting the list_heads) as normal.
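
A minimal userspace sketch of the same idea, assuming calloc() as a
stand-in for kmem_cache_zalloc(). The struct and its field names are
hypothetical and do not mirror the real btrfs_trans_handle:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a transaction handle; fields are illustrative. */
struct handle {
    unsigned long transid;
    void *transaction;
    int blocks_used;
    int aborted;      /* wants 0 */
    void *block_rsv;  /* wants NULL */
    int type;
};

/*
 * calloc() plays the role of kmem_cache_zalloc(): the memory arrives
 * zeroed, so only fields with non-zero initial values are assigned.
 */
struct handle *handle_alloc(unsigned long transid, int type)
{
    struct handle *h = calloc(1, sizeof(*h));

    if (!h)
        return NULL;
    h->transid = transid;
    h->type = type;
    return h;
}
```

Every field not assigned explicitly reads back as 0 or NULL, which is
what the separate zero-initialisation statements used to provide.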

Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
(cherry picked from commit f2f767e7345dfe56102d6809f647ba38a238f718)
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Conflicts:
fs/btrfs/transaction.c

dm flakey: fix reads to be issued if drop_writes configured

v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the
down_interval") overlooked the 'drop_writes' feature, which is meant to
allow reads to be issued rather than errored, during the down_interval.
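
The intended behaviour during the down_interval can be sketched as a
small decision function. This is a simplified model, not the driver
code: the enum and function names are hypothetical, and every other
dm-flakey feature is ignored:

```c
#include <stdbool.h>

enum bio_dir { BIO_READ, BIO_WRITE };
enum action { ACT_ISSUE, ACT_ERROR, ACT_DROP };

/* What happens to a bio that arrives during the down_interval. */
enum action down_interval_action(enum bio_dir dir, bool drop_writes)
{
    if (dir == BIO_READ)
        /* With drop_writes set, reads must still reach the device. */
        return drop_writes ? ACT_ISSUE : ACT_ERROR;

    /* Writes are silently dropped with drop_writes, errored otherwise. */
    return drop_writes ? ACT_DROP : ACT_ERROR;
}
```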

Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval")
Orabug: 25444528

Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Aniket Alshi <aniket.alshi@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/ofed:
  Revert "RDS: Make message size limit compliant with spec"
  RDS: ActiveBonding: Make its own thread for active active
  RDS: correct condition check in reconnect_timeout()
  RDS: ActiveBonding: Create a cluster sync point for failback

Revert "RDS: Make message size limit compliant with spec"

This change *partially* reverts commit
3a0891b9a49e42d05519a2f8b33c86267eb63d35

This partial revert is needed due to regression with some RDS
applications. It reverts RDS message size limit changes but retains the
QOS related changes in the data send path.

Orabug: 25472193

Tested-by: Efrain Galaviz <efrain.galaviz@oracle.com>
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

RDS: ActiveBonding: Make its own thread for active active

The IP-related work has no relation to the RDS internal workers, and
active active itself is for the full host rather than just RDS. Move
all the IP-specific work to its own dedicated workqueue.

This will reduce queue contention, further reduce the direct
link between RDS and Active Bonding, and help to separate the
active active code from RDS.

Orabug: 25026643

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: correct condition check in reconnect_timeout()

Correct the following condition check, which can never be true:
if (rds_conn_up(conn) == RDS_CONN_DISCONNECTING)
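
Why such a check can never be true, as a sketch. It assumes that
rds_conn_up() is a boolean predicate and that the disconnecting state
has a numeric value other than 0 or 1; the enum values and helper names
below are illustrative, not the RDS definitions, and the "fixed" form
is only one plausible correction:

```c
#include <stdbool.h>

/* Illustrative state values; the real enum lives in the RDS headers. */
enum conn_state { CONN_DOWN, CONN_CONNECTING, CONN_DISCONNECTING, CONN_UP };

struct conn { enum conn_state state; };

/* Boolean predicate, modelled on what rds_conn_up() is assumed to do. */
static bool conn_up(const struct conn *c)
{
    return c->state == CONN_UP;
}

/* Buggy form: compares a 0/1 result against an enum value of 2. */
bool is_disconnecting_buggy(const struct conn *c)
{
    return (int)conn_up(c) == CONN_DISCONNECTING;
}

/* One plausible corrected form: compare the state itself. */
bool is_disconnecting_fixed(const struct conn *c)
{
    return c->state == CONN_DISCONNECTING;
}
```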

Orabug: 25026643
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

RDS: ActiveBonding: Create a cluster sync point for failback

On hardware port link-ups, the multicast joins sometimes fail,
which delays the IP layer in bringing up the interface.
A subsequent multicast retry might succeed, and then the IP
layer will be ready for IP migration. This happens very
sporadically on bare metal systems but more often on VM systems,
and the number of multicast queries also goes up with the number of VMs.

This creates a load of RC connection thrashing across the cluster
since the IP migration gets staggered, which is not ideal for
active active. So we create a sync point so that the entire cluster
gets synced up. This helps to reduce the thrashing and premature
failover attempts. Obviously it is only applicable to failback.

A user sysctl, "active_bonding_failback_ms", is provided
in case there is a need to tune the sync point.

Orabug: 25026643

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/rpm-build:
uek-rpm nano: enable ol6 secureboot signing

uek-rpm nano: enable ol6 secureboot signing

Enable image signing in uek-rpm/ol6-nano/kernel-uek.spec and add certs:
uek-rpm/ol6-nano/secureboot.cer
uek-rpm/ol6-nano/securebootca.cer

Orabug: 25422956
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

Merge branch topic/uek-4.1/xen of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/xen:
xen-netback: fix extra_info handling in xenvif_tx_err()

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/drivers:
ib_uverbs: Allocate pd in a lazy manner to conserve resources

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/upstream-cherry-picks:
  net: Documentation: Fix default value tcp_limit_output_bytes
  tcp: double default TSQ output bytes limit
  scsi: qla2xxx: Get mutex lock before checking optrom_state
  kvm: x86: Check memopp before dereference (CVE-2016-8630)
  firewire: net: guard against rx buffer overflows
  USB: usbfs: fix potential infoleak in devio
  usbnet: cleanup after bind() in probe()
  cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
  cdc_ncm: Add support for moving NDP to end of NCM frame
  x86/mm/32: Enable full randomization on i386 and X86_32

xen-netback: fix extra_info handling in xenvif_tx_err()

Patch 562abd39 "xen-netback: support multiple extra info fragments
passed from frontend" contained a mistake which can result in an
incorrect number of responses being generated when handling errors
encountered when processing packets containing extra info fragments.
This patch fixes the problem.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 72eec92accabe3ec34f27a9d3cd459bf5a877c33)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 25445336
Tested-by: Majid Valiollahzadeh <majid.valiollahzadeh@oracle.com>

net: Documentation: Fix default value tcp_limit_output_bytes

Commit c39c4c6abb89 ("tcp: double default TSQ output bytes limit")
updated the default value of tcp_limit_output_bytes. Update the
documentation to match.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 821b414405a78c3d38921c2545b492eb974d3814)
Oracle-Bug: 25424818
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Raj Matharasi <raj.matharasi@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

tcp: double default TSQ output bytes limit

Xen virtual network driver has higher latency than a physical NIC.
Having only 128K as limit for TSQ introduced 30% regression in guest
throughput.

This patch raises the limit to 256K. This reduces the regression to 8%.
This buys us more time to work out a proper solution in the long run.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c39c4c6abb89d24454b63798ccbae12b538206a5)
Oracle-Bug: 25424818
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Raj Matharasi <raj.matharasi@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

scsi: qla2xxx: Get mutex lock before checking optrom_state

There is a race condition in the qla2xxx optrom functions where one thread
might modify the optrom buffer and optrom_state while another thread is
still reading from them.

In a couple of crashes, it was found that we had successfully passed the
following 'if' check, where we confirm optrom_state to be
QLA_SREADING. But by the time we acquired the mutex to proceed with the
memory_read_from_buffer function, some other thread/process had already
modified the option ROM buffer and changed optrom_state from QLA_SREADING
to QLA_SWAITING. We then got ha->optrom_buffer 0x0 and crashed the system:

        if (ha->optrom_state != QLA_SREADING)
                return 0;

        mutex_lock(&ha->optrom_mutex);
        rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
            ha->optrom_region_size);
        mutex_unlock(&ha->optrom_mutex);

With the current optrom function we get the following crash due to a race
condition:

[ 1479.466679] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1479.466707] IP: [<ffffffff81326756>] memcpy+0x6/0x110
[...]
[ 1479.473673] Call Trace:
[ 1479.474296]  [<ffffffff81225cbc>] ? memory_read_from_buffer+0x3c/0x60
[ 1479.474941]  [<ffffffffa01574dc>] qla2x00_sysfs_read_optrom+0x9c/0xc0 [qla2xxx]
[ 1479.475571]  [<ffffffff8127e76b>] read+0xdb/0x1f0
[ 1479.476206]  [<ffffffff811fdf9e>] vfs_read+0x9e/0x170
[ 1479.476839]  [<ffffffff811feb6f>] SyS_read+0x7f/0xe0
[ 1479.477466]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b

The patch below modifies the qla2x00_sysfs_read_optrom and
qla2x00_sysfs_write_optrom functions to take the mutex before
checking ha->optrom_state to avoid similar crashes.
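
The lock-before-check ordering can be sketched in userspace as follows.
This is an illustration only: a pthread mutex stands in for the kernel
mutex, and the struct, enum and function names are hypothetical:

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

enum optrom_state { SWAITING, SREADING };

struct hw {
    pthread_mutex_t lock;
    enum optrom_state state;
    const char *buffer;
    size_t size;
};

/* Returns the number of bytes copied, or 0 when no read is in progress. */
size_t read_optrom(struct hw *ha, char *dst, size_t count)
{
    size_t n = 0;

    /* Take the lock first, so state and buffer are checked and used together. */
    pthread_mutex_lock(&ha->lock);
    if (ha->state == SREADING && ha->buffer) {
        n = count < ha->size ? count : ha->size;
        memcpy(dst, ha->buffer, n);
    }
    pthread_mutex_unlock(&ha->lock);
    return n;
}
```

Because the state test now happens inside the critical section, a
writer that resets the state and frees the buffer cannot slip in
between the test and the copy.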

The patch was applied and tested, and the same crashes were no longer
observed.

Orabug: 25344639

Tested-by: Milan P. Gandhi <mgandhi@redhat.com>
Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c7702b8c22712a06080e10f1d2dee1a133ec8809)
Signed-off-by: Ritika Srivastava <ritika.srivastava@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

kvm: x86: Check memopp before dereference (CVE-2016-8630)

Orabug: 25133227
CVE: CVE-2016-8630

Commit 41061cdb98 ("KVM: emulate: do not initialize memopp") removes a
check for non-NULL under incorrect assumptions. An undefined instruction
with a ModR/M byte with Mod=0 and R/M=5 (e.g. 0xc7 0x15) will attempt
to dereference a null pointer here.
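
For illustration, the ModR/M byte from the example decodes as shown
below, followed by a sketch of a guarded fix-up. The ModR/M field layout
is the standard x86 encoding; the operand struct and the fix-up helper
are hypothetical and are not the emulator's code:

```c
#include <stddef.h>
#include <stdint.h>

struct modrm { unsigned mod, reg, rm; };

/* Split a ModR/M byte into its mod (bits 7-6), reg (5-3) and rm (2-0) fields. */
struct modrm decode_modrm(uint8_t b)
{
    struct modrm m;

    m.mod = (unsigned)(b >> 6);
    m.reg = (unsigned)((b >> 3) & 7);
    m.rm = (unsigned)(b & 7);
    return m;
}

/* Hypothetical memory operand. */
struct operand { unsigned long ea; };

/* Adjust the operand address only if a memory operand actually exists. */
int fixup_rip_relative(struct operand *memopp, unsigned long delta)
{
    if (!memopp)
        return 0;       /* nothing to fix up, and nothing to dereference */
    memopp->ea += delta;
    return 1;
}
```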

Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5
Message-Id: <1477592752-126650-2-git-send-email-osh@google.com>
Signed-off-by: Owen Hofmann <osh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d9092f52d7e61dd1557f2db2400ddb430e85937e)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

firewire: net: guard against rx buffer overflows

Orabug: 25063191
CVE: CVE-2016-8633

The IP-over-1394 driver firewire-net lacked input validation when
handling incoming fragmented datagrams.  A maliciously formed fragment
with a respectively large datagram_offset would cause a memcpy past the
datagram buffer.

So, drop any packets carrying a fragment with offset + length larger
than datagram_size.

In addition, ensure that
  - GASP header, unfragmented encapsulation header, or fragment
    encapsulation header actually exists before we access it,
  - the encapsulated datagram or fragment is of nonzero size.
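
The offset and length validation can be sketched as below. The function
and parameter names are hypothetical; the check is written with a
subtraction so that offset + length cannot wrap around:

```c
#include <stdbool.h>
#include <stddef.h>

/* Accept a fragment only if it is non-empty and lies inside the datagram. */
bool fragment_ok(size_t dg_size, size_t fg_off, size_t fg_len)
{
    if (fg_len == 0)
        return false;
    if (fg_off > dg_size || fg_len > dg_size - fg_off)
        return false;
    return true;
}
```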

Reported-by: Eyal Itkin <eyal.itkin@gmail.com>
Reviewed-by: Eyal Itkin <eyal.itkin@gmail.com>
Fixes: CVE 2016-8633
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
(cherry picked from commit 667121ace9dbafb368618dbabcf07901c962ddac)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

USB: usbfs: fix potential infoleak in devio

Orabug: 23267548
CVE: CVE-2016-4482

The stack object “ci” has a total size of 8 bytes. Its last 3 bytes
are padding bytes which are not initialized and leaked to userland
via “copy_to_user”.
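
A sketch of the usual remedy for this class of leak: zero the whole
object, padding included, before filling in its fields. The struct below
is a hypothetical stand-in with the same shape (an unsigned int followed
by an unsigned char), not the uapi definition:

```c
#include <stddef.h>
#include <string.h>

/* Stand-in struct: on common ABIs it is 8 bytes with 3 trailing padding bytes. */
struct connectinfo {
    unsigned int devnum;
    unsigned char slow;
};

void fill_connectinfo(struct connectinfo *ci, unsigned int devnum, int slow)
{
    /* Clear every byte, including padding, before the object is copied out. */
    memset(ci, 0, sizeof(*ci));
    ci->devnum = devnum;
    ci->slow = slow ? 1 : 0;
}
```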

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 681fef8380eb818c0b845fca5d2ab1dcbab114ee)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

usbnet: cleanup after bind() in probe()

If bind() works but a later error forces probe() to bail out,
work and a timer may already be scheduled in the error cases.
They must be killed. This fixes an error case related to
the double free reported in
http://www.spinics.net/lists/netdev/msg367669.html
and needs to go on top of Linus' fix to cdc-ncm.

Orabug: 23070825
CVE-2016-3951

Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1666984c8625b3db19a9abc298931d35ab7bc64b)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind

usbnet_link_change will call schedule_work and should be
avoided if bind is failing. Otherwise we will end up with
scheduled work referring to a netdev which has gone away.

Instead of making the call conditional, we can just defer
it to usbnet_probe, using the driver_info flag made for
this purpose.

Orabug: 23070825
CVE-2016-3951

Fixes: 8a34b0ae8778 ("usbnet: cdc_ncm: apply usbnet_link_change")
Reported-by: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4d06dd537f95683aba3651098ae288b7cbff8274)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

cdc_ncm: Add support for moving NDP to end of NCM frame

NCM specs are not actually mandating a specific position in the frame for
the NDP (Network Datagram Pointer). However, some Huawei devices will
ignore our aggregates if it is not placed after the datagrams it points
to. Add support for doing just this, in a per-device configurable way.
While at it, update NCM subdrivers, disabling this functionality in all of
them, except in huawei_cdc_ncm where it is enabled instead.
We aren't making any distinction between different Huawei NCM devices,
based on what the vendor driver does. Standard NCM devices are left
unaffected: if they are compliant, they should always be usable, but we
still stay on the safe side.

This change has been tested and working with a Huawei E3131 device (which
works regardless of NDP position), a Huawei E3531 (also working both
ways) and an E3372 (which mandates NDP to be after indexed datagrams).

V1->V2:
- corrected wrong NDP acronym definition
- fixed possible NULL pointer dereference
- patch cleanup
V2->V3:
- Properly account for the NDP size when writing new packets to SKB

Orabug: 23070825
CVE-2016-3951

Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4a0e3e989d66bb7204b163d9cfaa7fa96d0f2023)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

x86/mm/32: Enable full randomization on i386 and X86_32

Currently on i386 and on X86_64 when emulating X86_32 in legacy mode, only
the stack and the executable are randomized but not other mmapped files
(libraries, vDSO, etc.). This patch enables randomization for the
libraries, vDSO and mmap requests on i386 and in X86_32 in legacy mode.

By default on i386 there are 8 bits for the randomization of the libraries,
vDSO and mmaps which only uses 1MB of VA.

This patch preserves the original randomness, using 1MB of VA out of 3GB or
4GB. We think that 1MB out of 3GB is not a big cost for having the ASLR.
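
The 1MB figure follows from the number of random bits and the page
size. A worked sketch, assuming 4 KiB pages (a page shift of 12), which
the text does not state explicitly; the function name is hypothetical:

```c
#include <stdint.h>

/* Bytes of virtual address space spanned by rnd_bits of page-granular randomness. */
uint64_t rnd_span_bytes(unsigned rnd_bits, unsigned page_shift)
{
    return (uint64_t)1 << (rnd_bits + page_shift);
}
```

With 8 random bits and a page shift of 12, the span is 2^20 bytes,
that is 1 MiB.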

The first obvious security benefit is that all objects are randomized (not
only the stack and the executable) in legacy mode which highly increases
the ASLR effectiveness, otherwise the attackers may use these
non-randomized areas. But also sensitive setuid/setgid applications are
more secure because currently, attackers can disable the randomization of
these applications by setting the ulimit stack to "unlimited". This is a
very old and widely known trick to disable the ASLR in i386 which has been
allowed for too long.

Another trick used to disable the ASLR was to set the ADDR_NO_RANDOMIZE
personality flag, but fortunately this doesn't work on setuid/setgid
applications because there are security checks which clear security-relevant
flags.

This patch always randomizes the mmap_legacy_base address, removing the
possibility to disable the ASLR by setting the stack to "unlimited".

Orabug: 23070708
CVE-2016-3672

Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Acked-by: Ismael Ripoll Ripoll <iripoll@upv.es>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1457639460-5242-1-git-send-email-hecmargi@upv.es
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8b8addf891de8a00e4d39fc32f93f7c5eb8feceb)
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ib_uverbs: Allocate pd in a lazy manner to conserve resources

For devices where the maximum number of pd
resources is limited (usnic devices), it's a waste to
allocate this resource on device initialization.

We delay the allocation to first use.
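
Allocation on first use can be sketched as below. The types and the
accessor name are hypothetical, and calloc() stands in for the real
pd allocation:

```c
#include <stdlib.h>

struct pd { int id; };

struct device {
    struct pd *pd;      /* NULL until someone actually needs it */
    int allocations;    /* counts real allocations, for illustration */
};

/* Return the device's pd, allocating it on the first call only. */
struct pd *device_get_pd(struct device *dev)
{
    if (!dev->pd) {
        dev->pd = calloc(1, sizeof(*dev->pd));
        if (dev->pd)
            dev->allocations++;
    }
    return dev->pd;
}
```

A device that never calls the accessor never consumes a pd.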

Orabug: 22378991

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/drivers: (66 commits)
  ib/mlx4: add msi-x allocation kernel msg logging
  NVMe: reduce admin queue depth as workaround for Samsung EPIC SQ errata
  nvme: Limit command retries
  nvme: avoid cqe corruption when update at the same time as read
  NVMe: Don't unmap controller registers on reset
  net: ena: change the return type of ena_set_push_mode() to be void.
  net: ena: Fix error return code in ena_device_init()
  net: ena: Remove unnecessary pci_set_drvdata()
  net: ena: Add a driver for Amazon Elastic Network Adapters  (ENA)
  bnxt_en: Add interface to support RDMA driver.
  bnxt_en: Refactor the driver registration function with firmware.
  bnxt_en: Reserve RDMA resources by default.
  bnxt_en: Improve completion ring allocation for VFs.
  bnxt_en: Move function reset to bnxt_init_one().
  bnxt_en: Enable MSIX early in bnxt_init_one().
  bnxt_en: Add bnxt_set_max_func_irqs().
  bnxt_en: Add PFC statistics.
  bnxt_en: Implement DCBNL to support host-based DCBX.
  bnxt_en: Update firmware header file to latest 1.6.0.
  bnxt_en: Re-factor bnxt_setup_tc().
  ...

Conflicts:
drivers/nvme/host/pci.c

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/upstream-cherry-picks:
Don't feed anything but regular iovec's to blk_rq_map_user_iov
crypto: algif_hash - Only export and import on sockets with data

ib/mlx4: add msi-x allocation kernel msg logging

Kernel msg prints are added in the mlx4 driver when enabling msi-x
vectors during device initialization. This would help us to debug
issues when we encounter errors in this area on both bare metal and
VM.

Orabug: 25307234

Related Orabug: 23479018, 20597484

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

Don't feed anything but regular iovec's to blk_rq_map_user_iov

In theory we could map other things, but there's a reason that function
is called "user_iov". Using anything else (like splice can do) just
confuses it.

Reported-and-tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a0ac402cfcdc904f9772e1762b3fda112dcc56a0)

Orabug: 25230657
CVE: CVE-2016-9576
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Conflicts:
block/blk-map.c

crypto: algif_hash - Only export and import on sockets with data

Orabug: 25097996
CVE: CVE-2016-8646

The hash_accept call fails to work on sockets that have not received
any data. For some algorithm implementations it may cause crashes.

This patch fixes this by ensuring that we only export and import on
sockets that have received data.

Cc: stable@vger.kernel.org
Reported-by: Harsh Jain <harshjain.prof@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Stephan Mueller <smueller@chronox.de>
(cherry picked from commit 4afa5f9617927453ac04b24b584f6c718dfb4f45)
Signed-off-by: Aniket Alshi <aniket.alshi@oracle.com>

Merge branch topic/uek-4.1/sparc of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/sparc:
  Revert "sparc64: struct adi_caps should use __u64, not u64"
  SPARC64: ds driver: Make memory allocations ATOMIC and enhance debugging
  sparc64: Add symbolic access to M7 performance counters to perf
  sonoma: perf: add support for sonoma (s7) into perf
  sparc64:M8 cpu recognition typo fix
  sparc64: Add M7 hardware cache events into perf
  sparc64: Fix the watchdog corrupting performance counters
  sparc64: Fix incorrect counting when using multiple perf counters
  sparc64: Fix a race condition when stopping performance counters
  sparc64: Stop performance counter before updating
  sparc64: enable cpu hotplug feature for UEK4
  sparc64: release thirds level cache reference for cpu hotplug feature
  sparc64: fix compile warning section mismatch in find_node()
  sparc64: fix sun4v_build_irq NULL pointer dereference
  SPARC64: ldmvsw: tx queue stuck in stopped state after LDC reset
  sparc: Implement watchdog_nmi_enable and watchdog_nmi_disable
  sparc64: Setup a scheduling domain for highest level cache.

Merge branch topic/uek-4.1/xen of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/xen:
xenbus: fix deadlock on writes to /proc/xen/xenbus

xenbus: fix deadlock on writes to /proc/xen/xenbus

/proc/xen/xenbus does not work correctly. A read blocked waiting for
a xenstore message holds the mutex needed for atomic file position
updates. This blocks any writes on the same file handle, which can
deadlock if the write is needed to unblock the read.

Clear FMODE_ATOMIC_POS when opening this device to always get
character-device-like semantics.
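
Clearing the flag amounts to masking one bit out of the file mode. A
sketch with stand-in constants: the names and numeric values below are
assumptions for illustration and are not taken from the kernel headers:

```c
/* Stand-in mode bits; the values are hypothetical. */
#define SKETCH_FMODE_READ       0x1u
#define SKETCH_FMODE_WRITE      0x2u
#define SKETCH_FMODE_ATOMIC_POS 0x8000u

/* Return the mode with atomic file-position updates switched off. */
unsigned int clear_atomic_pos(unsigned int f_mode)
{
    return f_mode & ~SKETCH_FMODE_ATOMIC_POS;
}
```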

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Orabug: 25425387
(cherry picked from commit 581d21a2d02a798ee34e56dbfa13f891b3a90c30)
Jira: OCC-36718
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

Merge branch topic/uek-4.1/rpm-build of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/rpm-build:
  net: ena: enable driver in kernel configs
  userfaultfd: enable userfaultfd in UEK OL6 and OL7 configs
  bnxt: enable BNXT_DCB in uek kernel configs

Merge branch topic/uek-4.1/stable-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/stable-cherry-picks: (187 commits)
  ext4: verify extent header depth
  nfsd: check permissions when setting ACLs
  posix_acl: Add set_posix_acl
  sysv, ipc: fix security-layer leaking
  dm: set DMF_SUSPENDED* _before_ clearing DMF_NOFLUSH_SUSPENDING
  dm rq: fix the starting and stopping of blk-mq queues
  dm flakey: error READ bios during the down_interval
  CIFS: Fix a possible invalid memory access in smb2_query_symlink()
  fs/cifs: make share unaccessible at root level mountable
  Input: i8042 - break load dependency between atkbd/psmouse and i8042
  module: Invalidate signatures on force-loaded modules
  Documentation/module-signing.txt: Note need for version info if reusing a key
  net/irda: fix NULL pointer dereference on memory allocation failure
  fs/dcache.c: avoid soft-lockup in dput()
  iscsi-target: Fix panic when adding second TCP connection to iSCSI session
  audit: fix a double fetch in audit_log_single_execve_arg()
  Fix broken audit tests for exec arg len
  audit: Fix check of return value of strnlen_user()
  cifs: fix crash due to race in hmac(md5) handling
  dm: fix second blk_delay_queue() parameter to be in msec units not jiffies
  ...

Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* topic/uek-4.1/upstream-cherry-picks: (55 commits)
  userfaultfd: fix SIGBUS resulting from false rwsem wakeups
  userfaultfd: hugetlbfs: fix add copy_huge_page_from_user for hugetlb userfaultfd support
  userfaultfd: hugetlbfs: reserve count on error in __mcopy_atomic_hugetlb
  userfaultfd: hugetlbfs: gup: support VM_FAULT_RETRY
  userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges
  userfaultfd: hugetlbfs: add userfaultfd_hugetlb test
  userfaultfd: hugetlbfs: allow registration of ranges containing huge pages
  userfaultfd: hugetlbfs: add userfaultfd hugetlb hook
  userfaultfd: hugetlbfs: fix __mcopy_atomic_hugetlb retry/error processing
  userfaultfd: hugetlbfs: add __mcopy_atomic_hugetlb for huge page UFFDIO_COPY
  userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support
  userfaultfd: hugetlbfs: add copy_huge_page_from_user for hugetlb userfaultfd support
  mm/hugetlb: fix huge page reservation leak in private mapping error paths
  mm/hugetlb: fix huge page reserve accounting for private mappings
  userfaultfd: don't pin the user memory in userfaultfd_file_create()
  userfaultfd: don't block on the last VM updates at exit time
  sparc: add waitfd to 32 bit system call tables
  userfaultfd: remove kernel header include from uapi header
  userfaultfd: register uapi generic syscall (aarch64)
  userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical
  ...

Conflicts:
arch/x86/syscalls/syscall_32.tbl
arch/x86/syscalls/syscall_64.tbl
fs/Makefile
include/linux/mm_types.h
mm/hugetlb.c

Revert "sparc64: struct adi_caps should use __u64, not u64"

This reverts commit 04b6750492f8551a82a0336803922f736917639a.

Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: ds driver: Make memory allocations ATOMIC and enhance debugging

This patch fixes the following issues:

1. BUG 25107317 - Kernel Panic: Watchdog HARD LOCKUP out of ds_cap_fini()
2. BUG 24787856 - Forward port 19811909 - Unnecessary
warning - ldom_req_sp_token

BUG 25107317 appears to be caused by the ds driver allocating memory using
the GFP_KERNEL flag (which can result in sleeping) while holding a spinlock.
Sleeping while holding a spinlock is not allowed, and it resulted in the panic.

To fix BUG 24787856, the error message in question was changed to a
printk_once() which will result in the message only appearing once
in the console log instead of repeatedly.
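
The print-once behaviour can be sketched in userspace as below. This is
an analogy only: the function name is hypothetical, it keeps a single
flag for all callers, and it is not thread-safe:

```c
#include <stdbool.h>
#include <stdio.h>

/* Print msg the first time only; returns 1 if printed, 0 if suppressed. */
int warn_once(const char *msg)
{
    static bool printed;

    if (printed)
        return 0;
    printed = true;
    fprintf(stderr, "%s\n", msg);
    return 1;
}
```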

The debugging facility in the driver was also enhanced by adding 3 separate
debug levels for the ds driver debug messages.

Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Reviewed-By: Liam Merwick <Liam.Merwick@oracle.com>
Orabug: 25107317, 24787856
(cherry picked from commit f3bf272f0512120708a2966a7916b51c34efe56d)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Add symbolic access to M7 performance counters to perf

This commit provides symbolic access to every performance counter
provided in the M7. The 'perf list' command can be used to provide
a complete list of these new events, which will be reported as
shown below.

Br_mispred OR cpu/Br_mispred/                      [Kernel PMU event]
Br_taken OR cpu/Br_taken/                          [Kernel PMU event]
Br_tgt_mispred OR cpu/Br_tgt_mispred/              [Kernel PMU event]

Orabug: 23313970

Note: This commit is based on a cherry-pick of the following:
3bc29d39f2cb5ba72d945d79f82dd0c98dc55643
bd91767dfdbee52537ec3f1454c8c2cf0cf77a84

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Acked-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 39f70b2fa98ea10931133ab983f521c70cb7429f)

sonoma: perf: add support for sonoma (s7) into perf

This commit ensures that perf will now recognise that
it is running on a sonoma device and will initialise
correctly.

Orabug: 24931042

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit f39f00c4536c8c6ca0585a200a56894c2c158743)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64:M8 cpu recognition typo fix

(cherry picked from commit 764d030ec66da2e0be166af0fac0f36f1f4aacae)
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 6deca46c941b66734021c2feff6eb9a1eef8d173)

sparc64: Add M7 hardware cache events into perf

Use the enhanced performance instrumentation provided
in the M7 to enable the following hardware cache
events in perf.

L1-dcache-load-misses
L1-dcache-loads
L1-dcache-prefetches
L1-dcache-store-misses
L1-dcache-stores
L1-icache-load-misses
L1-icache-loads
L1-icache-prefetches
LLC-load-misses
LLC-loads
LLC-prefetches
LLC-store-misses
LLC-stores
branch-load-misses
dTLB-load-misses
dTLB-store-misses
iTLB-load-misses

Orabug: 24621144

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit b1d3b6ce6d4a3e5cf88a16c1a99bf37e0b805131)
(cherry picked from commit 16f97e434978b46f8b92d911b907478a4fb3d00a)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix the watchdog corrupting performance counters

There is a race condition in the perf_event_grab_pmc() which
means that we do not increment the active_events count correctly
when a new event is added. Ultimately, we end up with a negative
value for the active_events count. This means that the next time
we try and add a new event the watchdog will not be stopped
correctly and corruption of the performance count will
be observed.

Note: In sparc64 land the watchdog is implemented using one
of the performance counters.

This issue is fixed by moving the mutex lock to make
sure it encompasses the whole critical section in the
perf_event_grab_pmc().

Orabug: 23106709

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
(cherry picked from commit 54ed00318fec5db3fab1b035ade5d95926d84799)
(cherry picked from commit d9ad125578c9f2fa015beb9dc10bd3d1eb9004ec)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix incorrect counting when using multiple perf counters

Commit 165050c1 introduced a change to the way we deal with
performance counter overflow interrupts. This change had the
side effect that when a performance counter overflow was
detected it assumed all performance counters in use
had overflowed. Thus, when using multiple performance
counters the event counting was incorrect.

This commit fixes this incorrect counting behaviour.
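
The corrected idea, handling only the counters that actually
overflowed, can be sketched as below. The bitmask layout, the counter
count and all names are hypothetical:

```c
#include <stdint.h>

#define NUM_COUNTERS 4

/*
 * ov_mask: bit i set means counter i overflowed.
 * in_use_mask: bit i set means counter i is in use.
 * Returns how many counters were handled.
 */
int handle_overflow(uint32_t ov_mask, uint32_t in_use_mask,
                    unsigned handled[NUM_COUNTERS])
{
    int n = 0;

    for (unsigned i = 0; i < NUM_COUNTERS; i++) {
        if (!(in_use_mask & (1u << i)))
            continue;
        if (!(ov_mask & (1u << i)))
            continue;   /* in use, but did not overflow: leave it alone */
        handled[i]++;
        n++;
    }
    return n;
}
```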

Orabug: 23106709

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit ef4dab8459ac6dd32538dc9448caf55ab68c2231)
(cherry picked from commit 741d96c0e37d7a73e17433355bb5bf513f2053af)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Fix a race condition when stopping performance counters

When stopping a performance counter that is close to overflowing,
there is a race condition that can occur between writing to the
PCRx register to stop the counter (and also clearing the PCRx.ov
bit at the same time) vs the performance counter overflowing and
setting the PCRx.ov bit in the PCRx register.
The result of this race condition is that we occasionally miss
a performance counter overflow interrupt, which in turn leads
to incorrect event counting.
This race condition has been observed when counting cpu cycles.
To fix this issue when stopping a performance counter,
we simply allow it to continue counting and overflow before
stopping it. This allows the performance counter overflow
interrupt to be generated and acted upon.
This fix is applied for M7, T5 and T4 devices.

Note: This commit is based on the following commits:
8b9b5b404e754e5c271341f5d7ea4797374c9844
a2d17bc33bdcc1cefd84bca44f2fd27075b16058
960f1607bec735e8da7dbd5df818da0a2e2b0305

Orabug: 22876587

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
(cherry picked from commit e5b7619e1de2f3e0dd858f632bc08ce64c344245)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Stop performance counter before updating

In order to reliably clear the PCRx.ov bit when updating a
performance counter value, we need to stop it counting first.
If we do not do this, then we can miss performance counter
overflow events.

Orabug: 22876587

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
(cherry picked from commit 6de93dc001ed2f440ed3881722934fbda2de0d4f)
(cherry picked from commit b36dd4d8040cd53f7e8de5a1d145be483d185105)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: enable cpu hotplug feature for UEK4

This patch provides users with an option to
disable/enable a cpu at runtime by writing to the
/sys/devices/system/cpu/cpuX/online file.

Eg:
$ echo [0/1] > /sys/devices/system/cpu/cpu2/online

Orabug: 24946811
Orabug: 22546196

Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit a53c94ca8afc7a7603ff3c1154d81abb113a9e71)

sparc64: release third level cache reference for cpu hotplug feature

This crash was seen on T7-4 and was related to the introduction of
the 3rd level caching patch.

issue: sysfs: cannot create duplicate filename '/devices/system/cpu/cpu0/cache'

Orabug: 24841354

Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit c33aebff52457ee7d0bacc922dc23b07cee4139a)

sparc64: fix compile warning section mismatch in find_node()

A compile warning was introduced by a commit that fixed find_node().
This patch fixes the compile warning by moving find_node() into the
__init section. find_node() is only used by memblock_nid_range(), which
is only used by the __init function add_node_ranges(), so find_node()
and memblock_nid_range() should also be inside the __init section.

Orabug: 24674753

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
(cherry picked from commit e58d08f923190fc4dc2a1962710f84672c2bc9b2)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: fix sun4v_build_irq NULL pointer dereference

sun4v_build_irq() assumes the given irq number is valid and uses
it to get the handler pointer. The pointer is dereferenced
without being checked, which causes a kernel panic.

The cause of the invalid irq is that the tx/rx irqs were never
freed during device removal, so irq numbers end up exhausted during
a continuous device add/removal test.

The tx/rx irqs are allocated during vio_device_probe() using
irq_alloc() and cookie_assign(). To free the tx/rx irqs,
cookie_unassign() and irq_free() are called when the device is removed.

Orabug: 23082240

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit 80043637b8fb1eabc16ab5947019f4dcdbb8c79f)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

SPARC64: ldmvsw: tx queue stuck in stopped state after LDC reset

The following patch fixes an issue with the ldmvsw driver where
the network connection of a guest domain becomes non-functional after
the guest domain has panic'd and rebooted.

The root cause was determined to be from the following series of
events:

1. Guest domain panics - resulting in the guest no longer processing
   network packets (from ldmvsw driver)
2. The ldmvsw driver (in the control domain) eventually exerts flow
   control due to no more available tx drings and stops the tx queue
   for the guest domain
3. The LDC of the network connection for the guest is reset when
   the guest domain reboots after the panic.
4. The LDC reset event is received by the ldmvsw driver and the ldmvsw
   responds by clearing the tx queue for the guest.
5. ldmvsw waits indefinitely for a DATA ACK from the guest - which is
   the normal method to re-enable the tx queue. But the ACK never comes
   because the tx queue was cleared due to the LDC reset.

To fix this issue, in addition to clearing the tx queue, re-enable the
tx queue on a LDC reset. This prevents the ldmvsw from getting caught in
this deadlocked state of waiting for a DATA ACK which will never come.
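
The reset handling described above can be sketched as a small state
model. This is an illustration only: struct vsw_port, port_tx_full()
and port_ldc_reset() are hypothetical names, not the ldmvsw driver's
data structures or functions.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one guest-facing port. */
struct vsw_port {
	int  tx_pending;    /* descriptors still waiting for a DATA ACK */
	bool tx_stopped;    /* flow control has stopped the tx queue */
};

/* Flow control: stop the queue once no tx descriptors remain free. */
void port_tx_full(struct vsw_port *p)
{
	p->tx_stopped = true;
}

/* LDC reset: drop pending tx state and also wake the queue. */
void port_ldc_reset(struct vsw_port *p)
{
	p->tx_pending = 0;      /* no ACK will ever arrive for these */
	p->tx_stopped = false;  /* so the queue is re-enabled here */
}
```

Without the second assignment in port_ldc_reset(), the port would stay
stopped until an ACK that can no longer arrive.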

Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Orabug: 24714685
(cherry picked from commit d84ad41602ceb070c05d2633bc09d81f66796e15)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc: Implement watchdog_nmi_enable and watchdog_nmi_disable

Implement functions watchdog_nmi_enable and watchdog_nmi_disable
to enable/disable nmi watchdogs. Sparc uses an arch specific nmi
watchdog handler. Currently, we do not have a way to enable/disable
the nmi watchdog dynamically. With these patches we can enable or
disable arch specific nmi watchdogs using the proc or sysctl interface.

Example commands.
To enable: echo 1 > /proc/sys/kernel/nmi_watchdog
To disable: echo 0 > /proc/sys/kernel/nmi_watchdog

It can also be achieved using the sysctl parameter kernel.nmi_watchdog.

Orabug: 24796651

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 43e96774e0a338e883e9ced9e717424df126b153)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

sparc64: Setup a scheduling domain for highest level cache.

Each scheduler domain should represent one level of the hierarchy,
made up of cores that share a common property. Currently, no
scheduler domain is defined separately for the cores that share
the last level cache. As a result, the scheduler fails to take
advantage of cache locality while migrating tasks during load
balancing.

Here are the cpu masks currently present for sparc that are/can
be used in scheduler domain construction.
cpu_core_map : set based on the cores that share l1 cache.
core_core_sib_map : set based on the socket id or max cache id.
The prior SPARC notion of socket was defined as highest level of
shared cache. However, the MD record on T7 platforms now describes
the CPUs that share the physical socket and this is no longer tied
to shared cache.

That's why a separate cpu mask needs to be created that truly
represents the highest level of shared cache for all platforms.

Modified after cherry picked from upstream commit.
d624716b6c67e60681180786564b92ddb521148a
The implementation is largely based on Chris's patches.

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1e655ca52bb2727471f20cf4d8f62b4b9f69e6fc)
Signed-off-by: Allen Pais <allen.pais@oracle.com>

NVMe: reduce admin queue depth as workaround for Samsung EPIC SQ errata

Orabug: 25186219

PCIe analyzer tracing by Oracle and Samsung revealed an erratum in Samsung's
firmware for EPIC SSDs where invalid completion entries in the admin queue
and IO queue can occur when the queues straddle an 8MB DMA address boundary.

This patch limits admin queue depth to 64 for EPIC SSDs.

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

nvme: Limit command retries

Many controller implementations will return errors to commands that will
not succeed, but without the DNR bit set. The driver previously retried
these commands an unlimited number of times until the command timeout
was exceeded, which takes an unnecessarily long period of time.

This patch limits the number of retries a command can have, defaulting
to 5, but is user tunable at load or runtime.

The struct request's 'retries' field is used to track the number of
retries attempted. This is in contrast with scsi's use of this field,
which indicates how many retries are allowed.
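
A minimal sketch of the retry decision described above, assuming the
status word carries a "do not retry" bit and that the counter records
attempts already made. needs_retry(), SC_DNR and max_retries are
illustrative names for this sketch rather than the driver's own, and
the bit value is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define SC_DNR 0x4000u      /* "do not retry" bit; value assumed here */

static unsigned int max_retries = 5;    /* default from the text above */

/* retries counts attempts already made, not attempts still allowed. */
bool needs_retry(uint16_t status, unsigned int retries)
{
	if (status & SC_DNR)
		return false;               /* controller said not to retry */
	return retries < max_retries;   /* otherwise retry up to the cap */
}
```

With the default of 5, a failing command without DNR is retried five
times and then completed with its error.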

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 25256529
Conflicts:
Patched the commits manually due to the lack of core.c file
drivers/nvme/host/pci.c
drivers/nvme/host/nvme.h

Signed-off-by: Ashok Vairavan <ashok.vairavan@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

nvme: avoid cqe corruption when update at the same time as read

Make sure the CQE phase (validity) is read before the rest of the
structure. The phase bit is the highest address and the CQE
read will happen on most platforms from lower to upper addresses
and will be done by multiple non-atomic loads. If the structure
is updated by PCI during the reads from the processor, the
processor may get a corrupted copy.

The addition of the new nvme_cqe_valid function that verifies
the validity bit also allows refactoring of the other CQE read
sequences.
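
The ordering can be illustrated with a simplified userspace model:
check the phase bit first, and copy the rest of the entry only once it
matches. The struct layout and the names cqe_valid() and cqe_read() are
assumptions for this sketch. A real driver also has to deal with DMA
ordering, which is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified completion entry; the field layout is an assumption. */
struct cqe {
	uint32_t result;
	uint16_t command_id;
	uint16_t status;    /* bit 0 is the phase (validity) bit */
};

/* True when the entry at head carries the phase the host expects. */
bool cqe_valid(const struct cqe *ring, uint16_t head, uint16_t phase)
{
	return (ring[head].status & 1) == phase;
}

/* Copy an entry out only after its phase bit has been seen as valid. */
bool cqe_read(const struct cqe *ring, uint16_t head, uint16_t phase,
	      struct cqe *out)
{
	if (!cqe_valid(ring, head, phase))
		return false;   /* the device has not finished this entry */
	*out = ring[head];  /* now read the remaining fields */
	return true;
}
```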

Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d783e0bd02e700e7a893ef4fa71c69438ac1c276)

Orabug: 24960824
Conflicts:
nvme_poll() function is not available in UEK4QU2. Resolved
the conflicts around nvme poll function.

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

NVMe: Don't unmap controller registers on reset

Unmapping the registers on reset or shutdown is not necessary. Keeping
the mapping simplifies reset handling.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b00a726a9fd82ddd4c10344e46f0d371e1674303)

Orabug: 24758839
Conflicts:
The changes are merged manually as the nvme upstream is not in sync with
UEK4 QU2
drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

net: ena: enable driver in kernel configs

Orabug: 25307221

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

net: ena: change the return type of ena_set_push_mode() to be void.

Orabug: 25307221

This patch changes the return type of ena_set_push_mode() to be void,
as it always returns 0.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 184b49c89f39f5c5ad262a6456248284e10984c6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

net: ena: Fix error return code in ena_device_init()

Orabug: 25307221

Fix to return a negative error code from the invalid dma width
error handling case instead of 0.

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6e22066fd02b675260b980b3e42b7d616a9839c5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

net: ena: Remove unnecessary pci_set_drvdata()

Orabug: 25307221

The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 557bc7d44d52d52374bc72e9cc3b0beb41026886)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

net: ena: Add a driver for Amazon Elastic Network Adapters
(ENA)

Orabug: 25307221

This is a driver for the ENA family of networking devices.

Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1738cd3ed342294360d6a74d4e58800004bff854)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ext4: verify extent header depth

Orabug: 25308146

[ Upstream commit 7bc9491645118c9461bd21099c31755ff6783593 ]

Although the extent tree depth of 5 should be enough for the worst
case of 2**32 extents of length 1, the extent tree code does not
currently merge nodes which are less than half-full with a sibling
node, or shrink the tree depth if possible.  So it's possible, at
least in theory, for the tree depth to be greater than 5.  However,
even in the worst case, a tree depth of 32 is highly unlikely, and if
the file system is maliciously corrupted, an insanely large eh_depth
can cause memory allocation failures that will trigger kernel warnings
(here, eh_depth = 65280):

    JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
    CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
    Stack:
     604a8947 625badd8 0002fd09 00000000
     60078643 00000000 62623910 601bf9bc
     62623970 6002fc84 626239b0 900000125
    Call Trace:
     [<6001c2dc>] show_stack+0xdc/0x1a0
     [<601bf9bc>] dump_stack+0x2a/0x2e
     [<6002fc84>] __warn+0x114/0x140
     [<6002fdff>] warn_slowpath_null+0x1f/0x30
     [<60165829>] start_this_handle+0x569/0x580
     [<60165d4e>] jbd2__journal_start+0x11e/0x220
     [<60146690>] __ext4_journal_start_sb+0x60/0xa0
     [<60120a81>] ext4_truncate+0x131/0x3a0
     [<60123677>] ext4_setattr+0x757/0x840
     [<600d5d0f>] notify_change+0x16f/0x2a0
     [<600b2b16>] do_truncate+0x76/0xc0
     [<600c3e56>] path_openat+0x806/0x1300
     [<600c55c9>] do_filp_open+0x89/0xf0
     [<600b4074>] do_sys_open+0x134/0x1e0
     [<600b4140>] SyS_open+0x20/0x30
     [<6001ea68>] handle_syscall+0x88/0x90
     [<600295fd>] userspace+0x3fd/0x500
     [<6001ac55>] fork_handler+0x85/0x90

    ---[ end trace 08b0b88b6387a244 ]---

[ Commit message modified and the extent tree depth check changed
from 5 to 32 -- tytso ]
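
As a rough sketch of the kind of check involved (not the ext4 code
itself), the header depth read from disk can be compared against the
limit of 32 before it is used to size anything. extent_depth_ok() and
MAX_EXTENT_DEPTH are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_EXTENT_DEPTH 32     /* limit named in the note above */

/* Reject an on-disk extent header whose depth is implausibly large. */
bool extent_depth_ok(uint16_t eh_depth)
{
	return eh_depth <= MAX_EXTENT_DEPTH;
}
```

The eh_depth value of 65280 from the warning above would be rejected
by such a check.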

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 867df5e12818bca1d37fc4c9a35346764879f2af)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

nfsd: check permissions when setting ACLs

Orabug: 25308145

[ Upstream commit 999653786df6954a31044528ac3f7a5dadca08f4 ]

Use set_posix_acl, which includes proper permission checks, instead of
calling ->set_acl directly. Without this anyone may be able to grant
themselves permissions to a file by setting the ACL.

Lock the inode to make the new checks atomic with respect to set_acl.
(Also, nfsd was the only caller of set_acl not locking the inode, so I
suspect this may fix other races.)

This also simplifies the code, and ensures our ACLs are checked by
posix_acl_valid.

The permission checks and the inode locking were lost with commit
4ac7249e, which changed nfsd to use the set_acl inode operation directly
instead of going through xattr handlers.

Reported-by: David Sinquin <david@sinquin.eu>
[agreunba@redhat.com: use set_posix_acl]
Fixes: 4ac7249e
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 28a6d048eb841a6bca10558a6c9e30ec5ca2b1af)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

posix_acl: Add set_posix_acl

Orabug: 25308144

[ Upstream commit 485e71e8fb6356c08c7fc6bcce4bf02c9a9a663f ]

Factor out part of posix_acl_xattr_set into a common function that takes
a posix_acl, which nfsd can also call.

The prototype already exists in include/linux/posix_acl.h.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 173f43c05f782df4fe42cc1152f9306ef76dc6eb)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

sysv, ipc: fix security-layer leaking

Orabug: 25308143

[ Upstream commit 9b24fef9f0410fb5364245d6cc2bd044cc064007 ]

Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.

Running LTP msgsnd06 with kmemleak gives the following:

  cat /sys/kernel/debug/kmemleak

  unreferenced object 0xffff88003c0a11f8 (size 8):
    comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
    hex dump (first 8 bytes):
      1b 00 00 00 01 00 00 00                          ........
    backtrace:
      kmemleak_alloc+0x23/0x40
      kmem_cache_alloc_trace+0xe1/0x180
      selinux_msg_queue_alloc_security+0x3f/0xd0
      security_msg_queue_alloc+0x2e/0x40
      newque+0x4e/0x150
      ipcget+0x159/0x1b0
      SyS_msgget+0x39/0x40
      entry_SYSCALL_64_fastpath+0x13/0x8f

Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
only use ipc_rcu_free in case of security allocation failure in newary()

Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit e2b438fdfa4d7041b883e5f2acd2c39da1fe5e68)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

dm: set DMF_SUSPENDED* _before_ clearing DMF_NOFLUSH_SUSPENDING

Orabug: 25308142

[ Upstream commit eaf9a7361f47727b166688a9f2096854eef60fbe ]

Otherwise, there is potential for both DMF_SUSPENDED* and
DMF_NOFLUSH_SUSPENDING to not be set during dm_suspend() -- which is
definitely _not_ a valid state.

This fix, in conjunction with "dm rq: fix the starting and stopping of
blk-mq queues", addresses the potential for request-based DM multipath's
__multipath_map() to see !dm_noflush_suspending() during suspend.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 834ced1a13bdf510eae34d08bae1f1ae49b33141)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

dm rq: fix the starting and stopping of blk-mq queues

Orabug: 25308141

[ Upstream commit 7d9595d848cdff5c7939f68eec39e0c5d36a1d67 ]

Improve dm_stop_queue() to cancel any requeue_work. Also, have
dm_start_queue() and dm_stop_queue() clear/set the QUEUE_FLAG_STOPPED
for the blk-mq request_queue.

On suspend dm_stop_queue() handles stopping the blk-mq request_queue
BUT: even though the hw_queues are marked BLK_MQ_S_STOPPED at that point
there is still a race that is allowing block/blk-mq.c to call ->queue_rq
against a hctx that it really shouldn't. Add a check to
dm_mq_queue_rq() that guards against this rarity (albeit _not_
race-free).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # must patch dm.c on < 4.8 kernels
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 655fe78746d0b9141fe763535fc16d6652665c13)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

dm flakey: error READ bios during the down_interval

Orabug: 25308140

[ Upstream commit 99f3c90d0d85708e7401a81ce3314e50bf7f2819 ]

When the corrupt_bio_byte feature was introduced it caused READ bios to
no longer be errored with -EIO during the down_interval. This had to do
with the complexity of needing to submit READs if the corrupt_bio_byte
feature was used.

Fix it so READ bios are properly errored with -EIO; doing so early in
flakey_map() as long as there isn't a match for the corrupt_bio_byte
feature.

Fixes: a3998799fb4df ("dm flakey: add corrupt_bio_byte feature")
Reported-by: Akira Hayakawa <ruby.wktk@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 980b6556a8df88e6160da185f72a4bc548e72199)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

CIFS: Fix a possible invalid memory access in smb2_query_symlink()

Orabug: 25308139

[ Upstream commit 7893242e2465aea6f2cbc2639da8fa5ce96e8cc2 ]

While following a symbolic link we receive err_buf from SMB2_open().
The validity of the SMB2 error response is checked earlier in
smb2_check_message(), but the symbolic link payload is not checked
at all. Fix it by adding such checks.

Cc: Dan Carpenter <dan.carpenter@oracle.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit a3b180a9da61b9be52e1bcf8ff54b4cea3ce332c)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

fs/cifs: make share unaccessible at root level mountable

Orabug: 25308138

[ Upstream commit a6b5058fafdf508904bbf16c29b24042cef3c496 ]

if, when mounting //HOST/share/sub/dir/foo we can query /sub/dir/foo but
not any of the path components above:

- store the /sub/dir/foo prefix in the cifs super_block info
- in the superblock, set root dentry to the subpath dentry (instead of
the share root)
- set a flag in the superblock to remember it
- use prefixpath when building path from a dentry

fixes bso#8950

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit b7e61a108f9fccd8d1b90ffe62704b929ba841eb)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

Input: i8042 - break load dependency between atkbd/psmouse and i8042

Orabug: 25308136

[ Upstream commit 4097461897df91041382ff6fcd2bfa7ee6b2448c ]

As explained in 1407814240-4275-1-git-send-email-decui@microsoft.com we
have a hard load dependency between i8042 and atkbd which prevents
keyboard from working on Gen2 Hyper-V VMs.

> hyperv_keyboard invokes serio_interrupt(), which needs a valid serio
> driver like atkbd.c.  atkbd.c depends on libps2.c because it invokes
> ps2_command().  libps2.c depends on i8042.c because it invokes
> i8042_check_port_owner().  As a result, hyperv_keyboard actually
> depends on i8042.c.
>
> For a Generation 2 Hyper-V VM (meaning no i8042 device emulated), if a
> Linux VM (like Arch Linux) happens to configure CONFIG_SERIO_I8042=m
> rather than =y, atkbd.ko can't load because i8042.ko can't load(due to
> no i8042 device emulated) and finally hyperv_keyboard can't work and
> the user can't input: https://bugs.archlinux.org/task/39820
> (Ubuntu/RHEL/SUSE aren't affected since they use CONFIG_SERIO_I8042=y)

To break the dependency we move away from using i8042_check_port_owner()
and instead allow the serio port owner to specify a mutex that clients
should use to serialize the PS/2 command stream.

Reported-by: Mark Laws <mdl@60hz.org>
Tested-by: Mark Laws <mdl@60hz.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit b5e8e7f655d3d01430c357bdebe72a6ddc19e9a7)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

module: Invalidate signatures on force-loaded modules

Orabug: 25308135

[ Upstream commit bca014caaa6130e57f69b5bf527967aa8ee70fdd ]

Signing a module should only make it trusted by the specific kernel it
was built for, not anything else. Loading a signed module meant for a
kernel with a different ABI could have interesting effects.
Therefore, treat all signatures as invalid when a module is
force-loaded.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 6ac9857245bfb71d836f46db817b0c11e3e4bf69)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

Documentation/module-signing.txt: Note need for version info if reusing a key

Orabug: 25308134

[ Upstream commit b8612e517c3c9809e1200b72c474dbfd969e5a83 ]

Signing a module should only make it trusted by the specific kernel it
was built for, not anything else. If a module signing key is used for
multiple ABI-incompatible kernels, the modules need to include enough
version information to distinguish them.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit e9071d07878c865449c5afb062e6d305c62d1e85)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

net/irda: fix NULL pointer dereference on memory allocation failure

Orabug: 25308133

[ Upstream commit d3e6952cfb7ba5f4bfa29d4803ba91f96ce1204d ]

I ran into this:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
    RIP: 0010:[<ffffffff82bbf066>]  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
    RSP: 0018:ffff880111747bb8  EFLAGS: 00010286
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
    RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
    RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
    R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
    FS:  00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
    Stack:
     0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
     ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
     ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
    Call Trace:
     [<ffffffff82bca542>] irda_connect+0x562/0x1190
     [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
     [<ffffffff825b4489>] SyS_connect+0x9/0x10
     [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
     [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
    Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
    RIP  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
     RSP <ffff880111747bb8>
    ---[ end trace 4cda2588bc055b30 ]---

The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
and then irttp_connect_request() almost immediately dereferences it.

Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 7e6f0e1e3091dc9aada9d067aac7adbb5a9d9b06)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

fs/dcache.c: avoid soft-lockup in dput()

Orabug: 25308132

[ Upstream commit 47be61845c775643f1aa4d2a54343549f943c94c ]

We triggered a soft lockup under a stress test which
opens/accesses/writes/closes one file concurrently on more than
five different CPUs:

WARN: soft lockup - CPU#0 stuck for 11s! [who:30631]
...
[<ffffffc0003986f8>] dput+0x100/0x298
[<ffffffc00038c2dc>] terminate_walk+0x4c/0x60
[<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8
[<ffffffc00038f780>] filename_lookup+0x38/0xf0
[<ffffffc000391180>] user_path_at_empty+0x78/0xd0
[<ffffffc0003911f4>] user_path_at+0x1c/0x28
[<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230

The ->d_lock trylock may fail many times because of concurrent
operations, and dput() may run for a long time.

Fix this by replacing cpu_relax() with cond_resched().
dput() used to be sleepable, so making it sleepable again
should be safe.

Cc: <stable@vger.kernel.org>
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 7d06f7f8963d8a705aeda1a5b60a69bfca75935e)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

iscsi-target: Fix panic when adding second TCP connection to iSCSI session

Orabug: 25308130

[ Upstream commit 8abc718de6e9e52d8a6bfdb735060554aeae25e4 ]

In an MC/S scenario, conn->sess has been set to NULL in
iscsi_login_non_zero_tsih_s1 when the second connection arrives,
which leads to a kernel panic.

conn->sess will be assigned in iscsi_login_non_zero_tsih_s2, so
we should check whether it's NULL before calling.

Signed-off-by: Feng Li <lifeng1519@gmail.com>
Tested-by: Sumit Rai <sumit.rai@calsoftinc.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 7f5a3c76f3d4fe9a0aa9c659249757a2b31db42f)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

audit: fix a double fetch in audit_log_single_execve_arg()

Orabug: 25308129

[ Upstream commit 43761473c254b45883a64441dd0bc85a42f3645c ]

There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) arguments for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1].  Of course this leaves a window of
opportunity for an unsavory application to munge with the data.

This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
record(s).  In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
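
The single-fetch idea can be sketched in plain C. This is only an illustration under stated assumptions: `needs_hex()` and `log_arg_once()` are hypothetical names, `memcpy()` stands in for the copy from user space, and the "needs hex encoding" rule used here (any byte outside 0x21..0x7e, or a double quote) is this sketch's own choice rather than something quoted from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rule for this sketch: a byte needs hex encoding if it is
 * outside 0x21..0x7e or is a double quote. */
int needs_hex(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] == '"' || buf[i] < 0x21 || buf[i] > 0x7e)
            return 1;
    return 0;
}

/* Copy the argument once into snap (at most cap bytes), then make the
 * encoding decision on the snapshot only.  The source is never read a
 * second time, so it cannot change between the check and the logging. */
size_t log_arg_once(const char *src, size_t src_len,
                    unsigned char *snap, size_t cap, int *hex)
{
    size_t n = src_len < cap ? src_len : cap;

    memcpy(snap, src, n);       /* the only read of the source */
    *hex = needs_hex(snap, n);  /* scan the private copy */
    return n;                   /* caller would log snap[0..n) */
}
```

The point of the design is that both the scan and the logging operate on `snap`, so there is no window in which the original data can be altered.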

As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:

* https://github.com/linux-audit/audit-testsuite/issues/25

[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.

[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data.  I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation).  The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.

Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 634a3fc5f16470e9b78ccd7ce643305122d5ebb2)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

Fix broken audit tests for exec arg len

Orabug: 25308128

[ Upstream commit 45820c294fe1b1a9df495d57f40585ef2d069a39 ]

The "fix" in commit 0b08c5e5944 ("audit: Fix check of return value of
strnlen_user()") didn't fix anything, it broke things.  As reported by
Steven Rostedt:

"Yes, strnlen_user() returns 0 on fault, but if you look at what len is
  set to, then you would notice that on fault len would be -1"

because we just subtracted one from the return value.  So testing
against 0 doesn't test for a fault condition, it tests against a
perfectly valid empty string.
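
A small user-space sketch of the arithmetic described above. The names are hypothetical: `fake_strnlen_user()` merely imitates the convention "length including the NUL, or 0 on fault", with a NULL pointer standing in for a faulting address, and the `len < 0` test is one way to express the fault check, not necessarily the exact condition used in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Imitates the strnlen_user() convention: string length including the
 * terminating NUL, or 0 on fault.  NULL plays the faulting address. */
long fake_strnlen_user(const char *s)
{
    if (s == NULL)
        return 0;
    return (long)strlen(s) + 1;
}

/* The caller subtracts one, so a fault yields -1 and "" yields 0. */
long arg_len(const char *s)
{
    return fake_strnlen_user(s) - 1;
}

/* The broken test: fires on a valid empty string, misses the fault. */
int broken_is_fault(long len)
{
    return len == 0;
}

/* A test that matches the arithmetic: only a fault gives a negative len. */
int fixed_is_fault(long len)
{
    return len < 0;
}
```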

Also fix up the usual braindamage wrt using WARN_ON() inside a
conditional - make it part of the conditional and remove the explicit
unlikely() (which is already part of the WARN_ON*() logic, exactly so
that you don't have to write unreadable code).

Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Paul Moore <pmoore@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit a4664afa0dffd5340c61511d3da14e30bfd01517)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

audit: Fix check of return value of strnlen_user()

Orabug: 25308127

[ Upstream commit 0b08c5e59441d08ab4b5e72afefd5cd98a4d83df ]

strnlen_user() returns 0 when it hits a fault, not -1. Fix the test in
audit_log_single_execve_arg(). Luckily this shouldn't ever happen unless
there's a kernel bug so it's mostly a cosmetic fix.

CC: Paul Moore <pmoore@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit a49b282f08d96cd73838e4e1a5ace747d432ba7d)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

cifs: fix crash due to race in hmac(md5) handling

Orabug: 25308126

[ Upstream commit bd975d1eead2558b76e1079e861eacf1f678b73b ]

The secmech hmac(md5) structures are present in the TCP_Server_Info
struct and can be shared among multiple CIFS sessions.  However, the
server mutex is not currently held when these structures are allocated
and used, which can lead to kernel crashes, as in the scenario below:

mount.cifs(8) #1                        mount.cifs(8) #2

Is secmech.sdeschmaccmd5 allocated?
// false

                                        Is secmech.sdeschmaccmd5 allocated?
                                        // false

secmech.hmacmd = crypto_alloc_shash..
secmech.sdeschmaccmd5 = kzalloc..
sdeschmaccmd5->shash.tfm = &secmec.hmacmd;

                                        secmech.sdeschmaccmd5 = kzalloc
                                        // sdeschmaccmd5->shash.tfm
                                        // not yet assigned

crypto_shash_update()
 deref NULL sdeschmaccmd5->shash.tfm

Unable to handle kernel paging request at virtual address 00000030
epc   : 8027ba34 crypto_shash_update+0x38/0x158
ra    : 8020f2e8 setup_ntlmv2_rsp+0x4bc/0xa84
Call Trace:
  crypto_shash_update+0x38/0x158
  setup_ntlmv2_rsp+0x4bc/0xa84
  build_ntlmssp_auth_blob+0xbc/0x34c
  sess_auth_rawntlmssp_authenticate+0xac/0x248
  CIFS_SessSetup+0xf0/0x178
  cifs_setup_session+0x4c/0x84
  cifs_get_smb_ses+0x2c8/0x314
  cifs_mount+0x38c/0x76c
  cifs_do_mount+0x98/0x440
  mount_fs+0x20/0xc0
  vfs_kern_mount+0x58/0x138
  do_mount+0x1e8/0xccc
  SyS_mount+0x88/0xd4
  syscall_common+0x30/0x54

Fix this by locking the srv_mutex around the code which uses these
hmac(md5) structures.  All the other secmech algos already have similar
locking.

Fixes: 95dc8dd14e2e84cc ("Limit allocation of crypto mechanisms to dialect which requires")
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit dd265663a1a9a14d80ea34d9bde9b15d7b614bd2)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

dm: fix second blk_delay_queue() parameter to be in msec units not jiffies

Orabug: 25308125

[ Upstream commit bd9f55ea1cf6e14eb054b06ea877d2d1fa339514 ]

Commit d548b34b062 ("dm: reduce the queue delay used in dm_request_fn
from 100ms to 10ms") always intended the value to be 10 msecs -- it
just expressed it in jiffies because earlier commit 7eaceaccab ("block:
remove per-queue plugging") did.
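
The unit mismatch can be illustrated with a short sketch. It assumes the earlier call passed an expression of the form HZ / 100 (ten milliseconds expressed in jiffies) to a parameter that is interpreted as milliseconds; the helper names and the HZ values tried are illustrative and not taken from the patch.

```c
/* Value that the jiffies-style expression HZ / 100 hands to a parameter
 * that is read as milliseconds.  hz is the tick rate being modelled. */
unsigned long delay_from_jiffies_expr(unsigned long hz)
{
    return hz / 100;
}

/* The intended delay, written directly in milliseconds. */
unsigned long delay_in_msecs(void)
{
    return 10;
}
```

Under that assumption the two only agree when the tick rate is 1000; at 250 the queue would be delayed by 2 ms and at 100 by 1 ms instead of the intended 10 ms.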

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: d548b34b062 ("dm: reduce the queue delay used in dm_request_fn from 100ms to 10ms")
Cc: stable@vger.kernel.org # 4.1+ -- stable@ backports must be applied to drivers/md/dm.c
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit abf9569225763ab83a538530454f7d280fd08e4a)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ext4: short-cut orphan cleanup on error

Orabug: 25308123

[ Upstream commit c65d5c6c81a1f27dec5f627f67840726fcd146de ]

If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.

    EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
    EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
    Aborting journal on device loop0-8.
    EXT4-fs (loop0): Remounting filesystem read-only
    EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
    EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
    EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
    EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
    EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
    EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
    EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
    EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
    [...]
    EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
    [...]
    EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
    [...]

See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 881052c264a9a44481f1a29d4877478e54b4c690)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

cifs: Check for existing directory when opening file with O_CREAT

Orabug: 25308121

[ Upstream commit 8d9535b6efd86e6c07da59f97e68f44efb7fe080 ]

When opening a file with O_CREAT flag, check to see if the file opened
is an existing directory.

This prevents the directory from being opened which subsequently causes
a crash when the close function for directories cifs_closedir() is called
which frees up the file->private_data memory while the file is still
listed on the open file list for the tcon.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 9b01eafbc9514e71056e0a1a4714606385a431a4)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ext4: validate s_reserved_gdt_blocks on mount

Orabug: 25308119

[ Upstream commit 5b9554dc5bf008ae7f68a52e3d7e76c0920938a2 ]

If s_reserved_gdt_blocks is extremely large, it's possible for
ext4_init_block_bitmap(), which is called when ext4 sets up an
uninitialized block bitmap, to corrupt random kernel memory. Add the
same checks which e2fsck has --- it must never be larger than
blocksize / sizeof(__u32) --- and then add a backup check in
ext4_init_block_bitmap() in case the superblock gets modified after
the file system is mounted.
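
The bound quoted above amounts to a one-line check. The sketch below uses a hypothetical helper name and plain integer parameters in place of the superblock fields; it only shows the arithmetic, not where the kernel performs the test.

```c
#include <stdint.h>

/* Returns 1 if the reserved GDT block count respects the limit
 * blocksize / sizeof(__u32), and 0 otherwise.  uint32_t stands in
 * for the kernel's __u32 here. */
int reserved_gdt_blocks_ok(uint32_t reserved_gdt_blocks, uint32_t blocksize)
{
    return reserved_gdt_blocks <= blocksize / sizeof(uint32_t);
}
```

For a 4096-byte block size this caps the field at 1024; for 1024-byte blocks, at 256.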

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit d4d783d829cd3ade06c6b39fd7d02f00026ba699)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ext4: don't call ext4_should_journal_data() on the journal inode

Orabug: 25308118

[ Upstream commit 6a7fd522a7c94cdef0a3b08acf8e6702056e635c ]

If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up. In that case, the iput() on the journal inode can
end up causing a BUG().

Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.

Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data")
Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang")
Cc: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit e19f0ec5aeb659cb7dd2bf6e4f1e842f5ad71fcf)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ext4: fix deadlock during page writeback

Orabug: 25308117

[ Upstream commit 646caa9c8e196880b41cd3e3d33a2ebc752bdb85 ]

Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
deadlock in ext4_writepages() which was previously much harder to hit.
After this commit xfstest generic/130 reproduces the deadlock on small
filesystems.

The problem happens when ext4_do_update_inode() sets LARGE_FILE feature
and marks current inode handle as synchronous. That subsequently results
in ext4_journal_stop() called from ext4_writepages() to block waiting for
transaction commit while still holding page locks, reference to io_end,
and some prepared bio in mpd structure each of which can possibly block
transaction commit from completing and thus results in deadlock.

Fix the problem by releasing page locks, io_end reference, and
submitting prepared bio before calling ext4_journal_stop().

[ Changed to defer the call to ext4_journal_stop() only if the handle
is synchronous. --tytso ]

Reported-and-tested-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 906d6f4d9cdc8509c505f29f6146ec627fef2f06)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ext4: check for extents that wrap around

Orabug: 25308116

[ Upstream commit f70749ca42943faa4d4dcce46dfdcaadb1d0c4b6 ]

An extent with lblock = 4294967295 and len = 1 will pass the
ext4_valid_extent() test:

    ext4_lblk_t last = lblock + len - 1;

    if (len == 0 || lblock > last)
        return 0;

since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().

We can simplify it by removing the - 1 altogether and changing the test
to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
lblock and it fails, and if len > 0 then lblock + len > lblock in order
to pass (i.e. it doesn't overflow).
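
The two tests can be compared side by side in user space. In this sketch `lblk_t` is a 32-bit unsigned stand-in for ext4_lblk_t and the function names are hypothetical; only the two conditions themselves come from the description above.

```c
#include <stdint.h>

typedef uint32_t lblk_t; /* stand-in for ext4_lblk_t in this sketch */

/* Old test: computes the last block, which hides the wrap-around. */
int old_valid_extent(lblk_t lblock, uint32_t len)
{
    lblk_t last = lblock + len - 1;

    if (len == 0 || lblock > last)
        return 0;
    return 1;
}

/* New test: lblock + len must be strictly greater than lblock, which
 * rejects both len == 0 and any extent whose end wraps past 2^32. */
int new_valid_extent(lblk_t lblock, uint32_t len)
{
    if ((lblk_t)(lblock + len) <= lblock)
        return 0;
    return 1;
}
```

With lblock = 4294967295 and len = 1 the old test accepts the extent, while the new one rejects it because lblock + len wraps to 0.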

Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Fixes: 2f974865f ("ext4: check for zero length extent explicitly")
Cc: Eryu Guan <guaneryu@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit c580d82e1d5532b785e339450d43f82e5a8b4e79)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

fs/proc/task_mmu.c: fix mm_access() mode parameter in pagemap_read()

Orabug: 25308115

Backport of caaee6234d05a58c5b4d05e7bf766131b810a657 ("ptrace: use fsuid,
fsgid, effective creds for fs access checks") to v4.1 failed to update the
mode parameter in the mm_access() call in pagemap_read() to have one of the
new PTRACE_MODE_*CREDS flags.

Attempting to read any other process's pagemap results in a WARN():

WARNING: CPU: 0 PID: 883 at kernel/ptrace.c:229 __ptrace_may_access+0x14a/0x160()
denying ptrace access check without PTRACE_MODE_*CREDS
Modules linked in: loop sg e1000 i2c_piix4 ppdev virtio_balloon virtio_pci parport_pc i2c_core virtio_ring ata_generic serio_raw pata_acpi virtio parport pcspkr floppy acpi_cpufreq ip_tables ext3 mbcache jbd sd_mod ata_piix crc32c_intel libata
CPU: 0 PID: 883 Comm: cat Tainted: G        W       4.1.12-51.el7uek.x86_64 #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000286 00000000619f225a ffff88003b6fbc18 ffffffff81717021
  ffff88003b6fbc70 ffffffff819be870 ffff88003b6fbc58 ffffffff8108477a
  000000003b6fbc58 0000000000000001 ffff88003d287000 0000000000000001
Call Trace:
  [<ffffffff81717021>] dump_stack+0x63/0x81
  [<ffffffff8108477a>] warn_slowpath_common+0x8a/0xc0
  [<ffffffff81084805>] warn_slowpath_fmt+0x55/0x70
  [<ffffffff8108e57a>] __ptrace_may_access+0x14a/0x160
  [<ffffffff8108f372>] ptrace_may_access+0x32/0x50
  [<ffffffff81081bad>] mm_access+0x6d/0xb0
  [<ffffffff81278c81>] pagemap_read+0xe1/0x360
  [<ffffffff811a046b>] ? lru_cache_add_active_or_unevictable+0x2b/0xa0
  [<ffffffff8120d2e7>] __vfs_read+0x37/0x100
  [<ffffffff812b9ab4>] ? security_file_permission+0x84/0xa0
  [<ffffffff8120d8b6>] ? rw_verify_area+0x56/0xe0
  [<ffffffff8120d9c6>] vfs_read+0x86/0x140
  [<ffffffff8120e945>] SyS_read+0x55/0xd0
  [<ffffffff8171eb6e>] system_call_fastpath+0x12/0x71
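
Why a mode without a credentials flag is refused can be sketched as follows. The DEMO_ constants and `demo_creds_check()` are hypothetical stand-ins defined only for this example; the sketch assumes the access check requires exactly one of the two credential flags to be set and denies the request otherwise, which is what the warning text above suggests.

```c
/* Hypothetical flag values for this sketch only. */
#define DEMO_MODE_READ      0x01u
#define DEMO_MODE_FSCREDS   0x08u
#define DEMO_MODE_REALCREDS 0x10u

/* Returns 0 if the mode names exactly one credential set, -1 if it
 * names none or both (the case that triggers the warning and denial). */
int demo_creds_check(unsigned int mode)
{
    int fs = (mode & DEMO_MODE_FSCREDS) != 0;
    int real = (mode & DEMO_MODE_REALCREDS) != 0;

    if (fs == real)
        return -1;
    return 0;
}
```

A call that passes only the read flag, as the unpatched pagemap_read() effectively did, falls into the denied case.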

Fixes: ab88ce5feca4 (ptrace: use fsuid, fsgid, effective creds for fs access checks)
Signed-off-by: Kenny Keslar <kenny.keslar@oracle.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 5c576457aca8fc07bdb800a4589357801133f81b)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

pps: do not crash when failed to register

Orabug: 25308113

[ Upstream commit 368301f2fe4b07e5fb71dba3cc566bc59eb6705f ]

With this command sequence:

  modprobe plip
  modprobe pps_parport
  rmmod pps_parport

the pps_parport module causes this crash:

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: parport_detach+0x1d/0x60 [pps_parport]
  Oops: 0000 [#1] SMP
  ...
  Call Trace:
    parport_unregister_driver+0x65/0xc0 [parport]
    SyS_delete_module+0x187/0x210

The sequence that builds up to this is:

1) plip is loaded and takes the parport device for exclusive use:

    plip0: Parallel port at 0x378, using IRQ 7.

2) pps_parport then fails to grab the device:

    pps_parport: parallel port PPS client
    parport0: cannot grant exclusive access for device pps_parport
    pps_parport: couldn't register with parport0

3) rmmod of pps_parport is then killed because it tries to access
    pardev->name, but pardev (taken from port->cad) is NULL.

So add a check for NULL in the test there too.

Link: http://lkml.kernel.org/r/20160714115245.12651-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit bd6d85d6ebaae18815d9c75157a2c86990c6e748)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

radix-tree: fix radix_tree_iter_retry() for tagged iterators.

Orabug: 25308112

[ Upstream commit 3cb9185c67304b2a7ea9be73e7d13df6fb2793a1 ]

radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
A NULL slot and non-zero iter.tags are then passed to
radix_tree_next_slot(), leading to a crash:

  RIP: radix_tree_next_slot include/linux/radix-tree.h:473
    find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
  ....
  Call Trace:
    pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
    mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
    ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
    do_writepages+0x97/0x100 mm/page-writeback.c:2364
    __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
    filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
    ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
    vfs_fsync_range+0x10a/0x250 fs/sync.c:195
    vfs_fsync fs/sync.c:209
    do_fsync+0x42/0x70 fs/sync.c:219
    SYSC_fdatasync fs/sync.c:232
    SyS_fdatasync+0x19/0x20 fs/sync.c:230
    entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207

We must reset iterator's tags to bail out from radix_tree_next_slot()
and go to the slow-path in radix_tree_next_chunk().

Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit bea9acd81cc894c89ac81335467278d189547d05)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

libceph: apply new_state before new_up_client on incrementals

Orabug: 25308111

[ Upstream commit 930c532869774ebf8af9efe9484c597f896a7d46 ]

Currently, osd_weight and osd_state fields are updated in the encoding
order.  This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state.  A non-existent OSD is considered down by the
mapping code

    for (i = 0; i < pg->pg_temp.len; i++) {
            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
                    if (ceph_can_shift_osds(pi))
                            continue;

                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;

and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:

[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

and hung rbds on the client:

[  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[  493.566805] rbd: rbd0:   result -6 xferred 400000
[  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed
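
The ordering problem can be reproduced with two flag bits. The sketch below models only the EXISTS and UP bits; the flag values and function names are hypothetical, and the weight, address and "destroyed" handling from the list above are left out.

```c
/* Hypothetical flag values for this sketch. */
#define OSD_EXISTS (1u << 0)
#define OSD_UP     (1u << 1)

/* new_up_client: mark the OSD as existing and up. */
unsigned int apply_new_up_client(unsigned int state)
{
    return state | OSD_EXISTS | OSD_UP;
}

/* new_state: XOR the given bits into the current state. */
unsigned int apply_new_state(unsigned int state, unsigned int xorstate)
{
    return state ^ xorstate;
}

/* Encoding order: new_up_client first, then new_state. */
unsigned int encoding_order(unsigned int state, unsigned int xorstate)
{
    state = apply_new_up_client(state);
    return apply_new_state(state, xorstate);
}

/* Fixed order: new_state first, then new_up_client. */
unsigned int fixed_order(unsigned int state, unsigned int xorstate)
{
    state = apply_new_state(state, xorstate);
    return apply_new_up_client(state);
}
```

Starting from EXISTS with xorstate=EXISTS, the encoding order ends in the "!EXISTS but UP" state described above, while the fixed order ends in EXISTS | UP.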

Fixes: http://tracker.ceph.com/issues/14901
Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 6831c98ce0b8a3e88db64aa224372effd0dcc694)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

libceph: set 'exists' flag for newly up osd

Orabug: 25308110

[ Upstream commit 6dd74e44dc1df85f125982a8d6591bc4a76c9f5d ]

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 5210f975ba197ead8b703ec818260191c0a51133)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tty/vt/keyboard: fix OOB access in do_compute_shiftstate()

Orabug: 25308109

[ Upstream commit 510cccb5b0c8868a2b302a0ab524da7912da648b ]

The size of an individual keymap in drivers/tty/vt/keyboard.c is NR_KEYS,
which is currently 256, whereas the number of keys/buttons in an input
device (and therefore in key_down) is much larger - KEY_CNT - 768, and that
can cause out-of-bound access when we do

sym = U(key_maps[0][k]);

with large 'k'.

To fix it we should not attempt iterating beyond the smaller of NR_KEYS
and KEY_CNT.

Also while at it let's switch to for_each_set_bit() instead of open-coding
it.
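
A user-space sketch of the bounded iteration. All DEMO_ names and helpers are hypothetical; the loop is an open-coded scan over a bitmap, not the kernel's for_each_set_bit(), and it only counts the keys that would be looked up in the keymap.

```c
#include <stddef.h>

#define DEMO_NR_KEYS 256 /* entries in one keymap */
#define DEMO_KEY_CNT 768 /* bits in the key_down bitmap */
#define DEMO_BITS_PER_LONG (8 * sizeof(unsigned long))
#define DEMO_LONGS ((DEMO_KEY_CNT + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

void demo_set_key(unsigned long *key_down, size_t k)
{
    key_down[k / DEMO_BITS_PER_LONG] |= 1UL << (k % DEMO_BITS_PER_LONG);
}

/* Count pressed keys, never looking past the smaller of the two sizes,
 * so every visited k is a valid index into a DEMO_NR_KEYS-entry keymap. */
size_t demo_count_mapped(const unsigned long *key_down)
{
    size_t limit = DEMO_NR_KEYS < DEMO_KEY_CNT ? DEMO_NR_KEYS : DEMO_KEY_CNT;
    size_t k, n = 0;

    for (k = 0; k < limit; k++)
        if (key_down[k / DEMO_BITS_PER_LONG] & (1UL << (k % DEMO_BITS_PER_LONG)))
            n++;
    return n;
}

/* Press keys 5, 255, 300 and 767; only the first two are within range. */
size_t demo_run(void)
{
    unsigned long key_down[DEMO_LONGS] = {0};

    demo_set_key(key_down, 5);
    demo_set_key(key_down, 255);
    demo_set_key(key_down, 300);
    demo_set_key(key_down, 767);
    return demo_count_mapped(key_down);
}
```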

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 9524cc41374df87b9c8d200e3561ad408bfa5844)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

media: fix airspy usb probe error path

Orabug: 25308108

[ Upstream commit aa93d1fee85c890a34f2510a310e55ee76a27848 ]

Fix a memory leak on probe error of the airspy usb device driver.

The problem is triggered when more than 64 usb devices register with
v4l2 of type VFL_TYPE_SDR or VFL_TYPE_SUBDEV.

The memory leak is caused by the probe function of the airspy driver
mishandling errors and not freeing the corresponding control structures
when an error occurs registering the device to v4l2 core.

A badusb device can emulate 64 of these devices, and then through
continual emulated connect/disconnect of the 65th device, cause the
kernel to run out of RAM and crash, thus causing a local DoS
vulnerability.

Fixes CVE-2016-5400

Signed-off-by: James Patrick-Evans <james@jmp-e.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit ce05d315cec02835c77fa3f4b5119960e1654913)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

mm, compaction: prevent VM_BUG_ON when terminating freeing scanner

Orabug: 25308107

[ Upstream commit a46cbf3bc53b6a93fb84a5ffb288c354fa807954 ]

It's possible to isolate some freepages in a pageblock and then fail
split_free_page() due to the low watermark check.  In this case, we hit
VM_BUG_ON() because the freeing scanner terminated early without a
contended lock or enough freepages.

This should never have been a VM_BUG_ON() since it's not a fatal
condition.  It should have been a VM_WARN_ON() at best, or even handled
gracefully.

Regardless, we need to terminate anytime the full pageblock scan was not
done.  The logic belongs in isolate_freepages_block(), so handle its
state gracefully by terminating the pageblock loop and making a note to
restart at the same pageblock next time since it was not possible to
complete the scan this time.

[rientjes@google.com: don't rescan pages in a pageblock]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1607111244150.83138@chino.kir.corp.google.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606291436300.145590@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Minchan Kim <minchan@kernel.org>
Tested-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit fe071fb0d4e9fd40fe7c46c6a9f8f23d5f27e92f)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

mm, compaction: simplify handling restart position in free pages scanner

Orabug: 25308106

[ Upstream commit f5f61a320bf6275f37fcabf6645b4ac8e683c007 ]

Handling the position where compaction free scanner should restart
(stored in cc->free_pfn) got more complex with commit e14c720efdd7 ("mm,
compaction: remember position within pageblock in free pages scanner").
Currently the position is updated in each loop iteration of
isolate_freepages(), although it should be enough to update it only when
breaking from the loop.  There's also an extra check outside the loop
that updates the position in case we have met the migration scanner.

This can be simplified if we move the test for having isolated enough
from the for-loop header next to the test for contention, and
determining the restart position only in these cases.  We can reuse the
isolate_start_pfn variable for this instead of setting cc->free_pfn
directly.  Outside the loop, we can simply set cc->free_pfn to current
value of isolate_start_pfn without any extra check.

Also add a VM_BUG_ON to catch a possible mistake in the future, in case we
later add a new condition that terminates isolate_freepages_block()
prematurely without also considering the condition in
isolate_freepages().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit ca0d868322c49b0d6ee4dfaae94a28e12969552c)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ALSA: pcm: Free chmap at PCM free callback, too

Orabug: 25308105

[ Upstream commit a8ff48cb70835f48de5703052760312019afea55 ]

The chmap ctls assigned to PCM streams are freed in the PCM disconnect
callback. However, since the disconnect callback isn't called when
the card gets freed before registering, the chmap ctls may still be
left assigned. They are eventually freed together with other ctls,
but it may cause an Oops at pcm_chmap_ctl_private_free(), as the
function refers to the assigned PCM stream, while the PCM objects have
been already freed beforehand.

The fix is to free the chmap ctls also at PCM free callback, not only
at PCM disconnect.

Reported-by: Laxminath Kasam <b_lkasam@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 43506e749d3f8d7e012a1bc4cb57b18a03ecfee6)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ovl: Copy up underlying inode's ->i_mode to overlay inode

Orabug: 25308103

[ Upstream commit 07a2daab49c549a37b5b744cbebb6e3f445f12bc ]

Right now when a new overlay inode is created, we initialize overlay
inode's ->i_mode from underlying inode ->i_mode but we retain only
file type bits (S_IFMT) and discard permission bits.

This patch changes it and retains permission bits too. This should allow
overlay to do permission checks on overlay inode itself in task context.

[SzM] It also fixes clearing suid/sgid bits on write.
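
The difference between the two behaviours is a single mask. In this sketch the DEMO_ constants carry the usual Linux octal values for the file-type mask, the regular-file type and the setuid bit, but they and the function names are local to the example rather than taken from the patch.

```c
/* Local copies of the usual Linux mode constants, for this sketch only. */
#define DEMO_S_IFMT  0170000u /* file type mask */
#define DEMO_S_IFREG 0100000u /* regular file */
#define DEMO_S_ISUID 0004000u /* set-user-ID bit */

/* Before: only the file type bits survive in the overlay inode. */
unsigned int old_overlay_mode(unsigned int real_mode)
{
    return real_mode & DEMO_S_IFMT;
}

/* After: the whole mode, including permission and suid/sgid bits. */
unsigned int new_overlay_mode(unsigned int real_mode)
{
    return real_mode;
}
```

With the old behaviour a setuid 0755 regular file shows up in the overlay inode as a bare regular file, so there is no suid bit left for a write to clear.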

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 31534f8fead7e0dff7ba68bd5dfcf6a9dfe908bc)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ovl: handle ATTR_KILL*

Orabug: 25308102

[ Upstream commit 51234eac5dd8b5feda9a3a8fa766f5398ecf91e3 ]

commit b99c2d913810e56682a538c9f2394d76fca808f8 upstream.

Before 4bacc9c9234c ("overlayfs: Make f_path...") file->f_path pointed to
the underlying file, hence suid/sgid removal on write worked fine.

After that patch file->f_path pointed to the overlay file, and the file
mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid
removal simply stopped working.

The fix is to copy the mode bits, but then ovl_setattr() needs to clear
ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in
the next patch copy the mode.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit cb75f65fe798bcac694f6bde299c52d31bdc8e96)
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>