www.infradead.org Git - users/jedix/linux-maple.git/log

fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list()

Orabug: 22514912

Hillf Danton noticed bugs in the hugetlb_vmtruncate_list routine.  The
argument end is of type pgoff_t.  It was being converted to a vaddr offset
and passed to unmap_hugepage_range.  However, end was also being used as
an argument to the vma_interval_tree_foreach controlling loop.  In
addition, the conversion of end to vaddr offset was incorrect.

hugetlb_vmtruncate_list is called as part of a file truncate or fallocate
hole punch operation.

When truncating a hugetlbfs file, this bug could prevent some pages from
being unmapped.  This is possible if there are multiple vmas mapping the
file, and there is a sufficiently sized hole between the mappings.  The
size of the hole between two vmas (A,B) must be such that the starting
virtual address of B is greater than (ending virtual address of A <<
PAGE_SHIFT).  In this case, the pages in B would not be unmapped.  If
pages are not properly unmapped during truncate, the following BUG is hit:

kernel BUG at fs/hugetlbfs/inode.c:428!

In the fallocate hole punch case, this bug could prevent pages from being
unmapped as in the truncate case.  However, for hole punch the result is
that unmapped pages will not be removed during the operation.  For hole
punch, it is also possible that more pages than desired will be unmapped.
This unnecessary unmapping will cause page faults to reestablish the
mappings on subsequent page access.

Fixes: 1bfad99ab (" hugetlbfs: hugetlb_vmtruncate_list() needs to take a range")Reported-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org> [4.3]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 29bfa18ca579419db54d93250c199bcb54d80047)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Btrfs: incremental send, fix clone operations for compressed extents

Marc reported a problem where the receiving end of an incremental send
was performing clone operations that failed with -EINVAL. This happened
because, unlike for uncompressed extents, we were not checking if the
source clone offset and length, after summing the data offset, falls
within the source file's boundaries.

So make sure we do such checks when attempting to issue clone operations
for compressed extents.

Problem reproducible with the following steps:

  $ mkfs.btrfs -f /dev/sdb
  $ mount -o compress /dev/sdb /mnt
  $ mkfs.btrfs -f /dev/sdc
  $ mount -o compress /dev/sdc /mnt2

  # Create the file with a single extent of 128K. This creates a metadata file
  # extent item with a data start offset of 0 and a logical length of 128K.
  $ xfs_io -f -c "pwrite -S 0xaa 64K 128K" -c "fsync" /mnt/foo

  # Now rewrite the range 64K to 112K of our file. This will make the inode's
  # metadata continue to point to the 128K extent we created before, but now
  # with an extent item that points to the extent with a data start offset of
  # 112K and a logical length of 16K.
  # That metadata file extent item is associated with the logical file offset
  # at 176K and covers the logical file range 176K to 192K.
  $ xfs_io -c "pwrite -S 0xbb 64K 112K" -c "fsync" /mnt/foo

  # Now rewrite the range 180K to 12K. This will make the inode's metadata
  # continue to point the the 128K extent we created earlier, with a single
  # extent item that points to it with a start offset of 112K and a logical
  # length of 4K.
  # That metadata file extent item is associated with the logical file offset
  # at 176K and covers the logical file range 176K to 180K.
  $ xfs_io -c "pwrite -S 0xcc 180K 12K" -c "fsync" /mnt/foo

  $ btrfs subvolume snapshot -r /mnt /mnt/snap1

  $ touch /mnt/bar
  # Calls the btrfs clone ioctl.
  $ ~/xfstests/src/cloner -s $((176 * 1024)) -d $((176 * 1024)) \
    -l $((4 * 1024)) /mnt/foo /mnt/bar

  $ btrfs subvolume snapshot -r /mnt /mnt/snap2

  $ btrfs send /mnt/snap1 | btrfs receive /mnt2
  At subvol /mnt/snap1
  At subvol snap1

  $ btrfs send -p /mnt/snap1 /mnt/snap2 | btrfs receive /mnt2
  At subvol /mnt/snap2
  At snapshot snap2
  ERROR: failed to clone extents to bar
  Invalid argument

A test case for fstests follows soon.

Orabug: 22466327

Reported-by: Marc MERLIN <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: David Sterba <dsterba@suse.cz>
Tested-by: Jan Alexander Steffens (heftig) <jan.steffens@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit 619d8c4ef7c5dd346add55da82c9179cd2e3387e)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

xprtrdma: xprt_rdma_free() must not release backchannel reqs

Orabug: 22365704

Preserve any rpcrdma_req that is attached to rpc_rqst's allocated
for the backchannel. Otherwise, after all the pre-allocated
backchannel req's are consumed, incoming backward calls start
writing on freed memory.

Somehow this hunk got lost.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: John Sobecki <john.sobecki@oracle.com>
Tested-by: Dai Ngo <dai.ngo@oracle.com>

block: bump BLK_DEF_MAX_SECTORS to 2560

Orabug: 22611290

A value of 2560 (1280k) will accommodate a 10-data-disk stripe
write with chunk size 128k. In the testing I've done using
iozone, fio, and aio-stress across a number of different storage
devices, a value of 1280 does not show a big performance
difference from 512, but will hopefully help software RAID
setups using SATA disks, as reported by Christoph.

NOTE: drivers/block/aoe/aoeblk.c sets its own max_hw_sectors_kb to
BLK_DEF_MAX_SECTORS. So, this patch essentially changes aeoblk to
Use a larger maximum sector size, and I did not test this.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d2be537c3ba3568acd79cd178327b842e60d035e)

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>

Revert "block: remove artifical max_hw_sectors cap"

Orabug: 22611290

EXT4-fs warning (device dm-X): ext4_end_bio:313: I/O error writing to ino
Below is revert text from mainline:
This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc.
That commit caused performance regressions for streaming I/O
workloads on a number of different storage devices, from
SATA disks to external RAID arrays.  It also managed to
trip up some buggy firmware in at least one drive, causing
data corruption.

The next patch will bump the default max_sectors_kb value to
1280, which will accommodate a 10-data-disk stripe write
with chunk size 128k.  In the testing I've done using iozone,
fio, and aio-stress, a value of 1280 does not show a big
performance difference from 512.  This will hopefully still
help the software RAID setup that Christoph saw the original
performance gains with while still not regressing other
storage configurations.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 30e2bc08b2bb7c069246feee78f7ed4006e130fe)

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>

KEYS: Fix keyring ref leak in join_session_keyring()

If a thread is asked to join as a session keyring the keyring that's already
set as its session, we leak a keyring reference.

This can be tested with the following program:

      #include <stddef.h>
      #include <stdio.h>
      #include <sys/types.h>
      #include <keyutils.h>

      int main(int argc, const char *argv[])
      {
        int i = 0;
        key_serial_t serial;

        serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING,
            "leaked-keyring");
        if (serial < 0) {
          perror("keyctl");
          return -1;
        }

        if (keyctl(KEYCTL_SETPERM, serial,
             KEY_POS_ALL | KEY_USR_ALL) < 0) {
          perror("keyctl");
          return -1;
        }

        for (i = 0; i < 100; i++) {
          serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING,
              "leaked-keyring");
          if (serial < 0) {
            perror("keyctl");
            return -1;
          }
        }

        return 0;
      }

If, after the program has run, there something like the following line in
/proc/keys:

    3f3d898f I--Q---   100 perm 3f3f0000     0     0 keyring leaked-keyring: empty

with a usage count of 100 * the number of times the program has been run,
then the kernel is malfunctioning.  If leaked-keyring has zero usages or
has been garbage collected, then the problem is fixed.

Reported-by: Yevgeny Pats <yevgeny@perception-point.io>
Signed-off-by: David Howells <dhowells@redhat.com>
Orabug: 22563965
CVE: CVE-2016-0728
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring

The following sequence of commands:

    i=`keyctl add user a a @s`
    keyctl request2 keyring foo bar @t
    keyctl unlink $i @s

tries to invoke an upcall to instantiate a keyring if one doesn't already
exist by that name within the user's keyring set.  However, if the upcall
fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
other error code.  When the key is garbage collected, the key destroy
function is called unconditionally and keyring_destroy() uses list_empty()
on keyring->type_data.link - which is in a union with reject_error.
Subsequently, the kernel tries to unlink the keyring from the keyring names
list - which oopses like this:

BUG: unable to handle kernel paging request at 00000000ffffff8a
IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88
...
Workqueue: events key_garbage_collector
...
RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88
RSP: 0018:ffff88003e2f3d30  EFLAGS: 00010203
RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40
RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000
R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900
R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000
...
CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0
...
Call Trace:
[<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f
[<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351
[<ffffffff8105ec9b>] process_one_work+0x28e/0x547
[<ffffffff8105fd17>] worker_thread+0x26e/0x361
[<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8
[<ffffffff810648ad>] kthread+0xf3/0xfb
[<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
[<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70
[<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2

Note the value in RAX.  This is a 32-bit representation of -ENOKEY.

The solution is to only call ->destroy() if the key was successfully
instantiated.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Orabug: 22373388
CVE: CVE-2015-7872
(cherry picked from mainline commit f05819df10d7b09f6d1eb6f8534a8f68e5a4fe61)
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

KEYS: Fix race between key destruction and finding a keyring by name

There appears to be a race between:

(1) key_gc_unused_keys() which frees key->security and then calls
     keyring_destroy() to unlink the name from the name list

(2) find_keyring_by_name() which calls key_permission(), thus accessing
     key->security, on a key before checking to see whether the key usage is 0
     (ie. the key is dead and might be cleaned up).

Fix this by calling ->destroy() before cleaning up the core key data -
including key->security.

Reported-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Orabug: 22373388
(cherry picked from mainline commit 94c4554ba07adbdde396748ee7ae01e86cf2d8d7)
pre-requisite patch for mainline f05819df10d7b09f6d1eb6f8534a8f68e5a4fe61
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

KVM: svm: unconditionally intercept #DB

This is needed to avoid the possibility that the guest triggers
an infinite stream of #DB exceptions (CVE-2015-8104).

VMX is not affected: because it does not save DR6 in the VMCS,
it already intercepts #DB unconditionally.

Reported-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Orabug: 22333633
CVE: 2015-8104
mainline commit cbdb967af3d54993f5814f1cee0ed311a055377d
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

KVM: x86: work around infinite loop in microcode when #AC is delivered

It was found that a guest can DoS a host by triggering an infinite
stream of "alignment check" (#AC) exceptions. This causes the
microcode to enter an infinite loop where the core never receives
another interrupt. The host kernel panics pretty quickly due to the
effects (CVE-2015-5307).

Signed-off-by: Eric Northup <digitaleric@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Orabug: 22333632
CVE: CVE-2015-5307
mainline commit 54a20552e1eae07aa240fa370a0293e006b5faed
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

mm-hugetlb-resv-map-memory-leak-for-placeholder-entries-v2

Orabug: 22302415

V2: The original version of the patch did not correctly handle placeholder
entries before the range to be deleted. The new check is more specific
and only matches placeholders at the start of range.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit fd2e0def3e0954be0453b625ce12c48e4a83bc70)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

mm/hugetlb.c: fix resv map memory leak for placeholder entries

Orabug: 22302415

Dmitry Vyukov reported the following memory leak

unreferenced object 0xffff88002eaafd88 (size 32):
  comm "a.out", pid 5063, jiffies 4295774645 (age 15.810s)
  hex dump (first 32 bytes):
    28 e9 4e 63 00 88 ff ff 28 e9 4e 63 00 88 ff ff  (.Nc....(.Nc....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<     inline     >] kmalloc include/linux/slab.h:458
    [<ffffffff815efa64>] region_chg+0x2d4/0x6b0 mm/hugetlb.c:398
    [<ffffffff815f0c63>] __vma_reservation_common+0x2c3/0x390 mm/hugetlb.c:1791
    [<     inline     >] vma_needs_reservation mm/hugetlb.c:1813
    [<ffffffff815f658e>] alloc_huge_page+0x19e/0xc70 mm/hugetlb.c:1845
    [<     inline     >] hugetlb_no_page mm/hugetlb.c:3543
    [<ffffffff815fc561>] hugetlb_fault+0x7a1/0x1250 mm/hugetlb.c:3717
    [<ffffffff815fd349>] follow_hugetlb_page+0x339/0xc70 mm/hugetlb.c:3880
    [<ffffffff815a2bb2>] __get_user_pages+0x542/0xf30 mm/gup.c:497
    [<ffffffff815a400e>] populate_vma_page_range+0xde/0x110 mm/gup.c:919
    [<ffffffff815a4207>] __mm_populate+0x1c7/0x310 mm/gup.c:969
    [<ffffffff815b74f1>] do_mlock+0x291/0x360 mm/mlock.c:637
    [<     inline     >] SYSC_mlock2 mm/mlock.c:658
    [<ffffffff815b7a4b>] SyS_mlock2+0x4b/0x70 mm/mlock.c:648

Dmitry identified a potential memory leak in the routine region_chg, where
a region descriptor is not free'ed on an error path.

However, the root cause for the above memory leak resides in region_del.
In this specific case, a "placeholder" entry is created in region_chg.
The associated page allocation fails, and the placeholder entry is left in
the reserve map.  This is "by design" as the entry should be deleted when
the map is released.  The bug is in the region_del routine which is used
to delete entries within a specific range (and when the map is released).
region_del did not handle the case where a placeholder entry exactly
matched the start of the range range to be deleted.  In this case, the
entry would not be deleted and leaked.  The fix is to take these special
placeholder entries into account in region_del.

The region_chg error path leak is also fixed.

Fixes: feba16e25a57 ("mm/hugetlb: add region_del() to delete a specific range of entries")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8ea87f6c208bd070a2701b53d2479a415f4159f4)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Prevent syncing frozen file system

Orabug: 22332381

port from RH. https://lkml.org/lkml/2015/7/9/724
fs] vfs: Prevent syncing frozen file system (Lukas Czerner)
      [1243404 1241791]
port from RH. https://lkml.org/lkml/2015/7/9/487

From Lukas Czerner <>
Subject [RFC][PATCH] fs: Prevent syncing frozen file system
Date Thu, 9 Jul 2015 19:45:45 +0200

Currently we can end up in a deadlock because of broken
sb_start_write -> s_umount ordering.

The race goes like this:

- write the file
- unlink the file - final_iput will not be calles as file is opened
- freeze the file system
- Now simultaneously close the file and call sync (or syncfs on that
   particular file system). Sync will get to wait_sb_inodes() where it will
   grab the referece to the inode (__iget()) and later to call iput().
   If we manage to close the file and drop the reference in between those
   calls sync will attempt to do a iput_final() because the inode is now
   unlinked and we're holding the last reference to it. This will
   however block on a frozen file system (ext4_delete_inode for
   example).

Note that I've not been able to reproduce the issue, I've only seen this
happen once. However with some instrumentation (like msleep() in the
wait_sb_inodes() it can be achieved.

Fix this by properly doing sb_start_write/sb_end_write to prevent us
from fsfreeze.

Note that with this patch syncfs will block on the frozen file system
which is probably ok, but sync will block if any file system happens to
be frozen - not sure if that's a problem, but it's certainly different
from what we've been used to.

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>

Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"

Orabug: 22186705

This reverts commit 0ea14034782db03efc55a7d8dbfa7cd140e7e959.

This commit has been seen to cause hard hangs or crashes when
flock(..., LOCK_EX) is attempted on an NFSv4-mounted filesystem.

Signed-off-by: Dan Duval <dan.duval@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm-hugetlbfs-fix-bugs-in-fallocate-hole-punch-of-areas-with-holes-v3

Orabug: 22220400

V3:
  Add more descriptive comments and minor improvements as suggested by
  Naoya Horiguchi
v2:
  Make remove_inode_hugepages simpler after verifying truncate can not
  race with page faults here.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d4d7420e86e29267ff2e20226cbcadcf7e6eab8c)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes

Orabug: 22220400

Hugh Dickins pointed out problems with the new hugetlbfs fallocate hole
punch code.  These problems are in the routine remove_inode_hugepages and
mostly occur in the case where there are holes in the range of pages to be
removed.  These holes could be the result of a previous hole punch or
simply sparse allocation.  The current code could access pages outside the
specified range.

remove_inode_hugepages handles both hole punch and truncate operations.
Page index handling was fixed/cleaned up so that the loop index always
matches the page being processed.  The code now only makes a single pass
through the range of pages as it was determined page faults could not race
with truncate.  A cond_resched() was added after removing up to
PAGEVEC_SIZE pages.

Some totally unnecessary code in hugetlbfs_fallocate() that remained from
early development was also removed.

Tested with fallocate tests submitted here:
http://librelist.com/browser//libhugetlbfs/2015/6/25/patch-tests-add-tests-for-fallocate-system-call/
And, some ftruncate tests under development

Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.3]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ef9a2b7a46755b6b2d4ab522c2ffa53c6e1a0729)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

btrfs: Print Warning only if ENOSPC_DEBUG is enabled

Orabug: 21626666

Signed-off-by : Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Bo Liu <bo.li.liu@oracle.com>

rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats

Many commonly used functions like getifaddrs() invoke RTM_GETLINK
to dump the interface information, and do not need the
the AF_INET6 statististics that are always returned by default
from rtnl_fill_ifinfo().

Computing the statistics can be an expensive operation that impacts
scaling, so it is desirable to avoid this if the information is
not needed.

This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that
can be passed with netlink_request() to avoid statistics computation
for the ifinfo path.

Orabug: 21857538

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm: do not ignore mapping_gfp_mask in page cache allocation paths

page_cache_read, do_generic_file_read, __generic_file_splice_read and
__ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling
add_to_page_cache_lru which might cause recursion into fs down in the
direct reclaim path if the mapping really relies on GFP_NOFS semantic.

This doesn't seem to be the case now because page_cache_read (page fault
path) doesn't seem to suffer from the reclaim recursion issues and
do_generic_file_read and __generic_file_splice_read also shouldn't be
called under fs locks which would deadlock in the reclaim path. Anyway it
is better to obey mapping gfp mask and prevent from later breakage.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6afdb859b71019143b8eecda02b8b29b03185055)

Orabug: 22066703

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

PCI: Set SR-IOV NumVFs to zero after enumeration

Orabug: 21547430

The enumeration path should leave NumVFs set to zero. But after
4449f079722c ("PCI: Calculate maximum number of buses required for VFs"),
we call virtfn_max_buses() in the enumeration path, which changes NumVFs.
This NumVFs change is visible via lspci and sysfs until a driver enables
SR-IOV.

Set NumVFs to zero after virtfn_max_buses() computes the maximum number of
buses.

Fixes: 4449f079722c ("PCI: Calculate maximum number of buses required for VFs")
Based-on-patch-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

virtio-net: drop NETIF_F_FRAGLIST

virtio-net: drop NETIF_F_FRAGLIST

Orabug: 22154074
CVE-2015-5156

virtio declares support for NETIF_F_FRAGLIST, but assumes
that there are at most MAX_SKB_FRAGS + 2 fragments which isn't
always true with a fraglist.

A longer fraglist in the skb will make the call to skb_to_sgvec overflow
the sg array, leading to memory corruption.

Drop NETIF_F_FRAGLIST so we only get what we can handle.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 48900cb6af4282fa0fb6ff4d72a81aa3dadb5c39)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

blk-mq: avoid excessive boot delays with large lun counts

Zhangqing Luo reported long boot times on a system with thousands of
LUNs when scsi-mq was enabled.  He narrowed the problem down to
blk_mq_add_queue_tag_set, where every queue is frozen in order to set
the BLK_MQ_F_TAG_SHARED flag.  Each added device will freeze all queues
added before it in sequence, which involves waiting for an RCU grace
period for each one.  We don't need to do this.  After the second queue
is added, only new queues need to be initialized with the shared tag.
We can do that by percolating the flag up to the blk_mq_tag_set, and
updating the newly added queue's hctxs if the flag is set.

This problem was introduced by commit 0d2602ca30e41 (blk-mq: improve
support for shared tags maps).

Reported-and-tested-by: Jason Luo <zhangqing.luo@oracle.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2404e607a9ee36db361bebe32787dafa1f7d6c00)

Orabug: 21879600
Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>

tcp_cubic: better follow cubic curve after idle period

Orabug: 21920285

Jana Iyengar found an interesting issue on CUBIC :

The epoch is only updated/reset initially and when experiencing losses.
The delta "t" of now - epoch_start can be arbitrary large after app idle
as well as the bic_target. Consequentially the slope (inverse of
ca->cnt) would be really large, and eventually ca->cnt would be
lower-bounded in the end to 2 to have delayed-ACK slow-start behavior.

This particularly shows up when slow_start_after_idle is disabled
as a dangerous cwnd inflation (1.5 x RTT) after few seconds of idle
time.

Jana initial fix was to reset epoch_start if app limited,
but Neal pointed out it would ask the CUBIC algorithm to recalculate the
curve so that we again start growing steeply upward from where cwnd is
now (as CUBIC does just after a loss). Ideally we'd want the cwnd growth
curve to be the same shape, just shifted later in time by the amount of
the idle period.

Reported-by: Jana Iyengar <jri@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Sangtae Ha <sangtae.ha@gmail.com>
Cc: Lawrence Brakmo <lawrence@brakmo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 30927520dbae297182990bb21d08762bcc35ce1d)

Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>

NVMe: Setup max hardware sector count to 512KB

Linux in box NVMe driver does not handle 0 MDTS as expected
•0 MDTS - the drive can accept any request size.
•The device driver set up max hardware sector size by
BLK_SAFE_MAX_SECTORS or 124KB.
•Every IO size greater than 124KB is splitted by 124KB and remainder.

Hence performance drop at 128KB IO size.

Orabug: 21818316

Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

intel_pstate: enable HWP per CPU

intel_pstate: enable HWP per CPU

Orabug: 21325983

HWP previously was only enabled at driver load time, on the boot
CPU, however, HWP must be enabled per package. Move the code to
enable HWP to the cpufreq driver init path so that it will be
called per CPU.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Tested-by: David Zhuang <david.zhuang@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ba88d4338f226766f510e207911dde8c1875e072)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/cpufreq/intel_pstate.c
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

nfs: take extra reference to fl->fl_file when running a LOCKU operation

Jean reported another crash, similar to the one fixed by feaff8e5b2cf:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
    IP: [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    PGD 0
    Oops: 0000 [#1] SMP
    Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock cfg80211 rfkill coretemp crct10dif_pclmul ppdev vmw_balloon crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr vmxnet3 parport_pc i2c_piix4 microcode serio_raw parport nfsd floppy vmw_vmci acpi_cpufreq auth_rpcgss shpchp nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih ata_generic mptbase i2c_core pata_acpi
    CPU: 0 PID: 329 Comm: kworker/0:1H Not tainted 4.1.0-rc7+ #2
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
    Workqueue: rpciod rpc_async_schedule [sunrpc]
    30ec000
    RIP: 0010:[<ffffffff8124ef7f>]  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    RSP: 0018:ffff8802330efc08  EFLAGS: 00010296
    RAX: ffff8802330efc58 RBX: ffff880097187c80 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff8802330efc18 R08: ffff88023fc173d8 R09: 3038b7bf00000000
    R10: 00002f1a02000000 R11: 3038b7bf00000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8802337a2300 R15: 0000000000000020
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000148 CR3: 000000003680f000 CR4: 00000000001407f0
    Stack:
     ffff880097187c80 ffff880097187cd8 ffff8802330efc98 ffffffff81250281
     ffff8802330efc68 ffffffffa013e7df ffff8802330efc98 0000000000000246
     ffff8801f6901c00 ffff880233d2b8d8 ffff8802330efc58 ffff8802330efc58
    Call Trace:
     [<ffffffff81250281>] __posix_lock_file+0x31/0x5e0
     [<ffffffffa013e7df>] ? rpc_wake_up_task_queue_locked.part.35+0xcf/0x240 [sunrpc]
     [<ffffffff8125088b>] posix_lock_file_wait+0x3b/0xd0
     [<ffffffffa03890b2>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
     [<ffffffffa0365808>] ? nfs41_sequence_done+0xd8/0x300 [nfsv4]
     [<ffffffffa0367525>] do_vfs_lock+0x35/0x40 [nfsv4]
     [<ffffffffa03690c1>] nfs4_locku_done+0x81/0x120 [nfsv4]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e33c>] rpc_exit_task+0x2c/0x90 [sunrpc]
     [<ffffffffa0134400>] ? call_refreshresult+0x170/0x170 [sunrpc]
     [<ffffffffa013ece4>] __rpc_execute+0x84/0x410 [sunrpc]
     [<ffffffffa013f085>] rpc_async_schedule+0x15/0x20 [sunrpc]
     [<ffffffff810add67>] process_one_work+0x147/0x400
     [<ffffffff810ae42b>] worker_thread+0x11b/0x460
     [<ffffffff810ae310>] ? rescuer_thread+0x2f0/0x2f0
     [<ffffffff810b35d9>] kthread+0xc9/0xe0
     [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pmd+0xa0/0x160
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
     [<ffffffff8173c222>] ret_from_fork+0x42/0x70
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
    Code: a5 81 e8 85 75 e4 ff c6 05 31 ee aa 00 01 eb 98 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 9f 48 01 00 00 48 85 db 74 08 48 89 d8 5b 41 5c 5d c3 83
    RIP  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
     RSP <ffff8802330efc08>
    CR2: 0000000000000148
    ---[ end trace 64484f16250de7ef ]---

The problem is almost exactly the same as the one fixed by feaff8e5b2cf.
We must take a reference to the struct file when running the LOCKU
compound to prevent the final fput from running until the operation is
complete.

Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 21687670
(cherry picked from mainline commit db2efec0caba4f81a22d95a34da640b86c313c8e)
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

mm: madvise allow remove operation for hugetlbfs

Orabug: 21652814

Now that we have hole punching support for hugetlbfs, we can
also support the MADV_REMOVE interface to it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8f787c8989ce599cbf0feb10ecea912d07111439)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mmotm: build fix hugetlbfs fallocate if not CONFIG_NUMA

Orabug: 21652814

Commit 56bb4d795 introduced a build error if CONFIG_NUMA is not
defined. When fallocate preallocation allocates pages, it will
use the defined numa policy. However, if numa is not defined
there is no such policy and no code should reference numa policy.
Create wrappers to isolate policy manipulation code that are a
NOOP in the non-NUMA case.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0ce9057732d1dd94ef2bd32c8acb68ae68b08a2f)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: add hugetlbfs_fallocate()

Orabug: 21652814

This is based on the shmem version, but it has diverged quite a bit.  We
have no swap to worry about, nor the new file sealing.  Add
synchronication via the fault mutex table to coordinate page faults,
fallocate allocation and fallocate hole punch.

What this allows us to do is move physical memory in and out of a
hugetlbfs file without having it mapped.  This also gives us the ability
to support MADV_REMOVE since it is currently implemented using
fallocate().  MADV_REMOVE lets madvise() remove pages from the middle of a
hugetlbfs file, which wasn't possible before.

hugetlbfs fallocate only operates on whole huge pages.

Based on code by Dave Hansen.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 79925b07fd22d7c4e4e77cdc26edb26dc4ff2701)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: New huge_add_to_page_cache helper routine

Orabug: 21652814

Currently, there is only a single place where hugetlbfs pages are added to
the page cache. The new fallocate code be adding a second one, so break
the functionality out into its own helper.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4b153581930e8c61250078efcdcce3e19bc2a45b)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate

Orabug: 21652814

Areas hole punched by fallocate will not have entries in the
region/reserve map. However, shared mappings with min_size subpool
reservations may still have reserved pages. alloc_huge_page needs to
handle this special case and do the proper accounting.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b21aa74b8077ce5b9c5fea566fe37af866934746)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch

Orabug: 21652814

In vma_has_reserves(), the current assumption is that reserves are always
present for shared mappings.  However, this will not be the case with
fallocate hole punch.  When punching a hole, the present page will be
deleted as well as the region/reserve map entry (and hence any
reservation).  vma_has_reserves is passed "chg" which indicates whether or
not a region/reserve map is present.  Use this to determine if reserves
are actually present or were removed via hole punch.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 320fb799e7cedc1edb96fd69c686547c731d5fc8)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb.c: make vma_has_reserves() return bool

Orabug: 21652814

This makes vma_has_reserves() return bool due to this particular function
only returning either one or zero as its return value.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f7bba6ea9b80a19c4849b4d870fb2f2bd492713b)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: truncate_hugepages() takes a range of pages

Orabug: 21652814

Modify truncate_hugepages() to take a range of pages (start, end) instead
of simply start.  If an end value of LLONG_MAX is passed, the current
"truncate" functionality is maintained.  Existing callers are modified to
pass LLONG_MAX as end of range.  By keying off end == LLONG_MAX, the
routine behaves differently for truncate and hole punch.  Page removal is
now synchronized with page allocation via faults by using the fault mutex
table.  The hole punch case can experience the rare region_del error and
must handle accordingly.

Add the routine hugetlb_fix_reserve_counts to fix up reserve counts in the
case where region_del returns an error.

Since the routine handles more than just the truncate case, it is renamed
to remove_inode_hugepages().  To be consistent, the routine
truncate_huge_page() is renamed remove_huge_page().

Downstream of remove_inode_hugepages(), the routine
hugetlb_unreserve_pages() is also modified to take a range of pages.
hugetlb_unreserve_pages is modified to detect an error from region_del and
pass it back to the caller.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 6a57804ccdfb77b8f333b736a3ee7cb1bf8732e1)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete

Orabug: 21652814

fallocate hole punch will want to unmap a specific range of pages. Modify
the existing hugetlb_vmtruncate_list() routine to take a start/end range.
If end is 0, this indicates all pages after start should be unmapped.
This is the same as the existing truncate functionality. Modify existing
callers to add 0 as end of range.

Since the routine will be used in hole punch as well as truncate
operations, it is more appropriately renamed to hugetlb_vmdelete_list().

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit dea23e2a9e811e5fba895a134f701455908aa0d3)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: expose hugetlb fault mutex for use by fallocate

Orabug: 21652814

hugetlb page faults are currently synchronized by the table of mutexes
(htlb_fault_mutex_table).  fallocate code will need to synchronize with
the page fault code when it allocates or deletes pages.  Expose interfaces
so that fallocate operations can be synchronized with page faults.  Minor
name changes to be more consistent with other global hugetlb symbols.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit fec73245c33b067c60f520908a93c971003664c8)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: add region_del() to delete a specific range of entries

Orabug: 21652814

fallocate hole punch will want to remove a specific range of pages.  The
existing region_truncate() routine deletes all region/reserve map entries
after a specified offset.  region_del() will provide this same
functionality if the end of region is specified as LONG_MAX.  Hence,
region_del() can replace region_truncate().

Unlike region_truncate(), region_del() can return an error in the rare
case where it can not allocate memory for a region descriptor.  This ONLY
happens in the case where an existing region must be split.  Current
callers passing LONG_MAX as end of range will never experience this error
and do not need to deal with error handling.  Future callers of
region_del() (such as fallocate hole punch) will need to handle this
error.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c2cfad5701106f8ddb0607b9e09d524ef55ef0ec)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm-hugetlb-add-cache-of-descriptors-to-resv_map-for-region_add-fix

Orabug: 21652814

fix typo in comment, use more cols

Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 203a5abc2eb0bcd7f8e5a3742467e845de368df8)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: add cache of descriptors to resv_map for region_add

Orabug: 21652814

hugetlbfs is used today by applications that want a high degree of control
over huge page usage.  Often, large hugetlbfs files are used to map a
large number huge pages into the application processes.  The applications
know when page ranges within these large files will no longer be used, and
ideally would like to release them back to the subpool or global pools for
other uses.  The fallocate() system call provides an interface for
preallocation and hole punching within files.  This patch set adds
fallocate functionality to hugetlbfs.

fallocate hole punch will want to remove a specific range of pages.  When
pages are removed, their associated entries in the region/reserve map will
also be removed.  This will break an assumption in the
region_chg/region_add calling sequence.  If a new region descriptor must
be allocated, it is done as part of the region_chg processing.  In this
way, region_add can not fail because it does not need to attempt an
allocation.

To prepare for fallocate hole punch, create a "cache" of descriptors that
can be used by region_add if necessary.  region_chg will ensure there are
sufficient entries in the cache.  It will be necessary to track the number
of in progress add operations to know a sufficient number of descriptors
reside in the cache.  A new routine region_abort is added to adjust this
in progress count when add operations are aborted.  vma_abort_reservation
is also added for callers creating reservations with
vma_needs_reservation/vma_commit_reservation.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 27af163113310a86b6d19bb5693c1a08eb89b0f7)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages

Orabug: 21652814

alloc_huge_page and hugetlb_reserve_pages use region_chg to calculate the
number of pages which will be added to the reserve map.  Subpool and
global reserve counts are adjusted based on the output of region_chg.
Before the pages are actually added to the reserve map, these routines
could race and add fewer pages than expected.  If this happens, the
subpool and global reserve counts are not correct.

Compare the number of pages actually added (region_add) to those expected
to added (region_chg).  If fewer pages are actually added, this indicates
a race and adjust counters accordingly.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 33039678c8da8133e30ea3250d10ae14701dae2b)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: compute/return the number of regions added by region_add()

Orabug: 21652814

Modify region_add() to keep track of regions(pages) added to the reserve
map and return this value. The return value can be compared to the return
value of region_chg() to determine if the map was modified between calls.

Make vma_commit_reservation() also pass along the return value of
region_add(). In the normal case, we want vma_commit_reservation to
return the same value as the preceding call to vma_needs_reservation.
Create a common __vma_reservation_common routine to help keep the special
case return values in sync

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit cf3ad20bfeadda693e408d85684790714fc29b08)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: document the reserve map/region tracking routines

Orabug: 21652814

While working on hugetlbfs fallocate support, I noticed the following race
in the existing code.  It is unlikely that this race is hit very often in
the current code.  However, if more functionality to add and remove pages
to hugetlbfs mappings (such as fallocate) is added the likelihood of
hitting this race will increase.

alloc_huge_page and hugetlb_reserve_pages use information from the reserve
map to determine if there are enough available huge pages to complete the
operation, as well as adjust global reserve and subpool usage counts.  The
order of operations is as follows:

- call region_chg() to determine the expected change based on reserve map
- determine if enough resources are available for this operation
- adjust global counts based on the expected change
- call region_add() to update the reserve map

The issue is that reserve map could change between the call to region_chg
and region_add.  In this case, the counters which were adjusted based on
the output of region_chg will not be correct.

In order to hit this race today, there must be an existing shared hugetlb
mmap created with the MAP_NORESERVE flag.  A page fault to allocate a huge
page via this mapping must occur at the same another task is mapping the
same region without the MAP_NORESERVE flag.

The patch set does not prevent the race from happening.  Rather, it adds
simple functionality to detect when the race has occurred.  If a race is
detected, then the incorrect counts are adjusted.

Review comments pointed out the need for documentation of the existing
region/reserve map routines.  This patch set also adds documentation in
this area.

This patch (of 3):

This is a documentation only patch and does not modify any code.
Descriptions of the routines used for reserve map/region tracking are
added.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1dd308a7b49d4bdbc17bfa570675ecc8cf7bedb3)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

CVE-2015-666: Revert "sched/x86_64: Don't save flags on context switch"

This reverts commit 2c7577a7583747c9b71f26dced7f696b739da745.

CVE Request: Linux x86_64 NT flag issue

When I fixed Linux's NT flag handling, I added an optimization to
Linux 3.19 and up.  A malicious 32-bit program might be able to leak
NT into an unrelated task.  On a CONFIG_PREEMPT=y kernel, this is a
straightforward DoS.  On a CONFIG_PREEMPT=n kernel, it's probably
still exploitable for DoS with some more care.

I believe that this could be used for privilege escalation, too, but
it won't be easy.

The fix is just to revert the optimization:

Orabug: 21689349
CVE:  CVE-2015-666

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

intel_pstate: append more Oracle OEM table id to vendor bypass list

Orabug: 21447179

Append more Oracle X86 servers that have their own power management,

SUN FIRE X4275 M3
SUN FIRE X4170 M3
and
SUN FIRE X6-2

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

NVMe: Fix filesystem deadlock on removal

Move gendisk deletion before controller shutdown so filesystem may sync
dirty pages. Before, this would deadlock trying to allocate requests
on frozen queues that are about to be deleted.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Conflicts:

drivers/block/nvme-core.c

Orabug: 21569452
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

NVMe: Failed controller initialization fixes

This fixes an infinite device reset loop that may occur on devices that
fail initialization. If the drive fails to become ready for any reason
that does not involve an admin command timeout, the probe task should
assume the drive is unavailable and remove it from the topology. In
the case an admin command times out during device probing, the driver's
existing reset action will handle removing the drive.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit de3eff2bad56f0a29d3915105223d368f2bbc94e)

Conflicts:

drivers/block/nvme-core.c

Orabug: 21569452

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

NVMe: Unify controller probe and resume

This unifies probe and resume so they both may be scheduled in the same
way. This is necessary for error handling that may occur during device
initialization since the task to cleanup the device wouldn't be able to
run if it is blocked on device initialization.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Conflicts:

drivers/block/nvme-core.c

Orabug: 21569452

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

NVMe: Automatic namespace rescan

Namespaces may be dynamically allocated and deleted or attached and
detached. This has the driver rescan the device for namespace changes
after each device reset or namespace change asynchronous event.

There could potentially be many detached namespaces that we don't want
polluting /dev/ with unusable block handles, so this will delete disks
if the namespace is not active as indicated by the response from identify
namespace. This also skips adding the disk if no capacity is provisioned
to the namespace in the first place.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: add blk_set_queue_dying() to blkdev.h

We export this function and NVMe wants to use it, but for some reason
it was never added to the block header. Do that.

Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 3f21c265cd5f7ae867cc0e86a1f6d5093f1963cc)

Orabug: 21569452
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

NVMe: Don't use fake status on cancelled command

Synchronized commands do different things for timed out commands
vs. controller returned errors.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 17188bb403e9098a815dd850aedb6c150d2a3a6b)

Conflicts:

drivers/block/nvme-core.c

Orabug: 21569452

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

NVMe: Fix device cleanup on initialization failure

Don't release block queue and tagging resoureces if the driver never
got them in the first place. This can happen if the controller fails to
become ready, if memory wasn't available to allocate a tagset or admin
queue, or if the resources were released as part of error recovery.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4af0e21caf8e676e57ea27d7cce3426e473e498c)

Conflicts:

drivers/block/nvme-core.c

Orabug: 21569452

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

NVMe: add sysfs and ioctl controller reset

We need the ability to perform an nvme controller reset as discussed on
the mailing list thread:

http://lists.infradead.org/pipermail/linux-nvme/2015-March/001585.html

This adds a sysfs entry that when written to will reset perform an NVMe
controller reset if the controller was successfully initialized in the
first place.

This also adds locking around resetting the device in the async probe
method so the driver can't schedule two resets.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Brandon Schultz <brandon.schulz@hgst.com>
Cc: David Sariel <david.sariel@pmcs.com>
Updated by Jens to:

1) Merge this with the ioctl reset patch from David Sariel. The ioctl
path now shares the reset code from the sysfs path.

2) Don't flush work if we fail issuing the reset.

Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 21569452
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

NVMe: Return busy status on suspended queue

The sync path previously scheduled itself out unconditionally, but
that shouldn't happen if the command wasn't posted to the submission
queue. This patch returns immediately when the suspended queue returns
busy status.

Orabug: 21316131

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Linux 4.1

Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull scsi target fixes from Nicholas Bellinger:
"Apologies for the late pull request.

  Here are the outstanding target-pending fixes for v4.1 code.

  The series contains three patches from Sagi + Co that address a few
  iser-target issues that have been uncovered during recent testing at
  Mellanox.

  Patch #1 has a v3.16+ stable tag, and #2-3 have v3.10+ stable tags"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iser-target: Fix possible use-after-free
  iser-target: release stale iser connections
  iser-target: Fix variable-length response error completion

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"A smattering of fixes,

  mgag200:
      don't accept modes that aren't aligned properly as hw can't do it

  i915:
      two regression fixes

  radeon:
      one query to allow userspace fixes
      one oops fixer for older hw with new options enabled"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: don't probe MST on hw we don't support it on
  drm/radeon: Add RADEON_INFO_VA_UNMAP_WORKING query
  drm/mgag200: Reject non-character-cell-aligned mode widths
  Revert "drm/i915: Don't skip request retirement if the active list is empty"
  drm/i915: Always reset vma->ggtt_view.pages cache on unbinding

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Michael Turquette:
"Very late clk regression fixes for the ARM-based AT91 platform.

  These went unnoticed by me until recently, hence the late pull
  request"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: at91: fix h32mx prototype inclusion in pmc header
  clk: at91: trivial: typo in peripheral clock description
  clk: at91: fix PERIPHERAL_MAX_SHIFT definition
  clk: at91: pll: fix input range validity check

Merge tag 'sound-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Nothing looks scary, just a few usual HD-audio regression fixes and
  fixup, in addition to a minor Kconfig dependency fix for the old MIPS
  drivers"

* tag 'sound-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix unused label skip_i915
  ALSA: hda - Fix noisy outputs on Dell XPS13 (2015 model)
  ALSA: mips: let SND_SGI_O2 select SND_PCM
  ALSA: hda - Fix audio crackles on Dell Latitude E7x40
  ALSA: hda - adding a DAC/pin preference map for a HP Envy TS machine

Merge branch 'ccf/atmel-fixes-for-4.1' of https://github.com/bbrezillon/linux-at91 into clk-fixes

clk: at91: fix h32mx prototype inclusion in pmc header

Trivial fix that prevents to compile this pmc clock driver if h32mx clock is
present but smd clock isn't.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Fixes: bcc5fd49a0fd ("clk: at91: add a driver for the h32mx clock")
Cc: <stable@vger.kernel.org> # 3.18+

clk: at91: trivial: typo in peripheral clock description

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

clk: at91: fix PERIPHERAL_MAX_SHIFT definition

Fix the PERIPHERAL_MAX_SHIFT definition (3 instead of 4) and adapt the
round_rate and set_rate logic accordingly.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: "Wu, Songjun" <Songjun.Wu@atmel.com>

clk: at91: pll: fix input range validity check

The PLL impose a certain input range to work correctly, but it appears that
this input range does not apply on the input clock (or parent clock) but
on the input clock after it has passed the PLL divisor.
Fix the implementation accordingly.

Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Jonas Andersson <jonas@microbit.se>

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c documentation fix from Wolfram Sang:
"Here is a small documentation fix for I2C.

  We already had a user who unsuccessfully tried to get the new slave
  framework running with the currently broken example.  So, before this
  happens again, I'd like to have this how-to-use section fixed for 4.1
  already.  So that no more hacking time is wasted"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: slave: fix the example how to instantiate from userspace

revert "cpumask: don't perform while loop in cpumask_next_and()"

Revert commit 534b483a86e6 ("cpumask: don't perform while loop in
cpumask_next_and()").

This was a minor optimization, but it puts a `struct cpumask' on the
stack, which consumes too much stack space.

Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'drm-intel-fixes-2015-06-18' of git://anongit.freedesktop.org/drm-intel into drm-fixes

one fix, one revert
* tag 'drm-intel-fixes-2015-06-18' of git://anongit.freedesktop.org/drm-intel:
Revert "drm/i915: Don't skip request retirement if the active list is empty"
drm/i915: Always reset vma->ggtt_view.pages cache on unbinding

Merge branch 'drm-fixes-4.1' of git://people.freedesktop.org/~deathsimple/linux into drm-fixes

two radeon fixes
one MST fix,
one query addition, destined for stable, and to fix a regression
* 'drm-fixes-4.1' of git://people.freedesktop.org/~deathsimple/linux:
drm/radeon: don't probe MST on hw we don't support it on
drm/radeon: Add RADEON_INFO_VA_UNMAP_WORKING query

drm/radeon: don't probe MST on hw we don't support it on

If you do radeon.mst=1 on a gpu without mst hw, and then
plug some mst hw it will oops instead of falling back.

So check we have DCE5 at least before proceeding.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Christian König <christian.koenig@amd.com>

drm/radeon: Add RADEON_INFO_VA_UNMAP_WORKING query

This tells userspace that it's safe to use the RADEON_VA_UNMAP operation
of the DRM_RADEON_GEM_VA ioctl.

Cc: stable@vger.kernel.org
(NOTE: Backporting this commit requires at least backports of commits
26d4d129b6042197b4cbc8341c0618f99231af2f,
48afbd70ac7b6aa62e8d452091023941d8085f8a and
c29c0876ec05d51a93508a39b90b92c29ba6423d as well, otherwise using
RADEON_VA_UNMAP runs into trouble)

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>

Merge tag 'trace-fix-filter-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing filter fix from Steven Rostedt:
"Vince Weaver reported a warning when he added perf event filters into
  his fuzzer tests.  There's a missing check of balanced operations when
  parenthesis are used, and this triggers a WARN_ON() and when reading
  the failure, the filter reports no failure occurred.

  The operands were not being checked if they match, this adds that"

* tag 'trace-fix-filter-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Have filter check for balanced ops

Merge git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm bugfix from Marcelo Tosatti:
"Rrestore APIC migration functionality"

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix lapic.timer_mode on restore

Kconfig: disable Media Controller for DVB

Since when we start discussions about the usage Media Controller for
complex hardware, one thing become clear: the way it is, MC fails to
map anything different than capture/output/m2m video-only streaming.

The point is that MC has entities named as devnodes, but the only
devnode used (before the DVB patches) is MEDIA_ENT_T_DEVNODE_V4L.
Due to the way MC got implemented, however, this entity actually
doesn't represent the devnode, but the hardware I/O engine that
receives data via DMA.

By coincidence, such DMA is associated with the V4L device node
on webcam hardware, but this is not true even for other V4L2
devices. For example, on USB hardware, the DMA is done via the
USB controller. The data passes though a in-kernel filter that
strips off the URB headers. Other V4L2 devices like radio may not
even have DMA. When it have, the DMA is done via ALSA, and not
via the V4L devnode.

In other words, MC is broken as a whole, but tagging it as BROKEN
right now would do more harm than good.

So, instead, let's mark, for now, the DVB part as broken and
block all new changes to MC while we fix this mess, whith
we hopefully will do for the next Kernel version.

Requested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"This fixes the following issues:

   - Crash in caam hash due to uninitialised buffer lengths.

   - Alignment issue in caam RNG that may lead to non-random output"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: caam - fix RNG buffer cache alignment
  crypto: caam - improve initalization for context state saves

mm: shmem_zero_setup skip security check and lockdep conflict with XFS

It appears that, at some point last year, XFS made directory handling
changes which bring it into lockdep conflict with shmem_zero_setup():
it is surprising that mmap() can clone an inode while holding mmap_sem,
but that has been so for many years.

Since those few lockdep traces that I've seen all implicated selinux,
I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
v3.13's commit c7277090927a ("security: shmem: implement kernel private
shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.

This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
(and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers
which cloned inode in mmap(), but if so, I cannot locate them now.

Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com>
Reported-and-tested-by: Daniel Wagner <wagi@monom.org>
Reported-and-tested-by: Morten Stevens <mstevens@fedoraproject.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

i2c: slave: fix the example how to instantiate from userspace

I copied the wrong shell code into the documentation. Sorry to all who
tried to get sense out of this current example :/ Slight rewording while
we are here.

Reported-by: Tim Bakker <bakkert@mymail.vcu.edu>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org

tracing: Have filter check for balanced ops

When the following filter is used it causes a warning to trigger:

# cd /sys/kernel/debug/tracing
# echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
# cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: No error

------------[ cut here ]------------
WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
Modules linked in: bnep lockd grace bluetooth  ...
CPU: 3 PID: 1223 Comm: bash Tainted: G        W       4.1.0-rc3-test+ #450
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
  0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
  0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
  ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
Call Trace:
  [<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
  [<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
  [<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
  [<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff81159065>] replace_preds+0x3c5/0x990
  [<ffffffff811596b2>] create_filter+0x82/0xb0
  [<ffffffff81159944>] apply_event_filter+0xd4/0x180
  [<ffffffff81152bbf>] event_filter_write+0x8f/0x120
  [<ffffffff811db2a8>] __vfs_write+0x28/0xe0
  [<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
  [<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
  [<ffffffff811dc408>] vfs_write+0xb8/0x1b0
  [<ffffffff811dc72f>] SyS_write+0x4f/0xb0
  [<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
---[ end trace e11028bd95818dcd ]---

Worse yet, reading the error message (the filter again) it says that
there was no error, when there clearly was. The issue is that the
code that checks the input does not check for balanced ops. That is,
having an op between a closed parenthesis and the next token.

This would only cause a warning, and fail out before doing any real
harm, but it should still not caues a warning, and the error reported
should work:

# cd /sys/kernel/debug/tracing
# echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
# cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: Meaningless filter expression

And give no kernel warning.

Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: stable@vger.kernel.org # 2.6.31+
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ALSA: hda - Fix unused label skip_i915

When CONFIG_SND_HDA_I915=n, we get a compile warning:
  sound/pci/hda/hda_intel.c: In function ‘azx_probe_continue’:
  sound/pci/hda/hda_intel.c:1882:2: warning: label ‘skip_i915’ defined but not used [-Wunused-label]

Fix it by putting again ifdef to it.  Sigh.

Fixes: bf06848bdbe5 ('ALSA: hda - Continue probing even if i915 binding fails')
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

crypto: caam - fix RNG buffer cache alignment

The hwrng output buffers (2) are cast inside of a a struct (caam_rng_ctx)
allocated in one DMA-tagged region. While the kernel's heap allocator
should place the overall struct on a cacheline aligned boundary, the 2
buffers contained within may not necessarily align. Consenquently, the ends
of unaligned buffers may not fully flush, and if so, stale data will be left
behind, resulting in small repeating patterns.

This fix aligns the buffers inside the struct.

Note that not all of the data inside caam_rng_ctx necessarily needs to be
DMA-tagged, only the buffers themselves require this. However, a fix would
incur the expense of error-handling bloat in the case of allocation failure.

Cc: stable@vger.kernel.org
Signed-off-by: Steve Cornelius <steve.cornelius@freescale.com>
Signed-off-by: Victoria Milhoan <vicki.milhoan@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - improve initalization for context state saves

Multiple function in asynchronous hashing use a saved-state block,
a.k.a. struct caam_hash_state, which holds a stash of information
between requests (init/update/final). Certain values in this state
block are loaded for processing using an inline-if, and when this
is done, the potential for uninitialized data can pose conflicts.
Therefore, this patch improves initialization of state data to
prevent false assignments using uninitialized data in the state block.

This patch addresses the following traceback, originating in
ahash_final_ctx(), although a problem like this could certainly
exhibit other symptoms:

kernel BUG at arch/arm/mm/dma-mapping.c:465!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 80004000
[00000000] *pgd=00000000
Internal error: Oops: 805 [#1] PREEMPT SMP
Modules linked in:
CPU: 0    Not tainted  (3.0.15-01752-gdd441b9-dirty #40)
PC is at __bug+0x1c/0x28
LR is at __bug+0x18/0x28
pc : [<80043240>]    lr : [<8004323c>]    psr: 60000013
sp : e423fd98  ip : 60000013  fp : 0000001c
r10: e4191b84  r9 : 00000020  r8 : 00000009
r7 : 88005038  r6 : 00000001  r5 : 2d676572  r4 : e4191a60
r3 : 00000000  r2 : 00000001  r1 : 60000093  r0 : 00000033
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c53c7d  Table: 1000404a  DAC: 00000015
Process cryptomgr_test (pid: 1306, stack limit = 0xe423e2f0)
Stack: (0xe423fd98 to 0xe4240000)
fd80:                                                       11807fd1 80048544
fda0: 88005000 e4191a00 e5178040 8039dda0 00000000 00000014 2d676572 e4191008
fdc0: 88005018 e4191a60 00100100 e4191a00 00000000 8039ce0c e423fea8 00000007
fde0: e4191a00 e4227000 e5178000 8039ce18 e419183c 80203808 80a94a44 00000006
fe00: 00000000 80207180 00000000 00000006 e423ff08 00000000 00000007 e5178000
fe20: e41918a4 80a949b4 8c4844e2 00000000 00000049 74227000 8c4844e2 00000e90
fe40: 0000000e 74227e90 ffff8c58 80ac29e0 e423fed4 8006a350 8c81625c e423ff5c
fe60: 00008576 e4002500 00000003 00030010 e4002500 00000003 e5180000 e4002500
fe80: e5178000 800e6d24 007fffff 00000000 00000010 e4001280 e4002500 60000013
fea0: 000000d0 804df078 00000000 00000000 00000000 00000000 00000000 00000000
fec0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
fee0: 00000000 00000000 e4227000 e4226000 e4753000 e4752000 e40a5000 e40a4000
ff00: e41e7000 e41e6000 00000000 00000000 00000000 e423ff14 e423ff14 00000000
ff20: 00000400 804f9080 e5178000 e4db0b40 00000000 e4db0b80 0000047c 00000400
ff40: 00000000 8020758c 00000400 ffffffff 0000008a 00000000 e4db0b40 80206e00
ff60: e4049dbc 00000000 00000000 00000003 e423ffa4 80062978 e41a8bfc 00000000
ff80: 00000000 e4049db4 00000013 e4049db0 00000013 00000000 00000000 00000000
ffa0: e4db0b40 e4db0b40 80204cbc 00000013 00000000 00000000 00000000 80204cfc
ffc0: e4049da0 80089544 80040a40 00000000 e4db0b40 00000000 00000000 00000000
ffe0: e423ffe0 e423ffe0 e4049da0 800894c4 80040a40 80040a40 00000000 00000000
[<80043240>] (__bug+0x1c/0x28) from [<80048544>] (___dma_single_dev_to_cpu+0x84)
[<80048544>] (___dma_single_dev_to_cpu+0x84/0x94) from [<8039dda0>] (ahash_fina)
[<8039dda0>] (ahash_final_ctx+0x180/0x428) from [<8039ce18>] (ahash_final+0xc/0)
[<8039ce18>] (ahash_final+0xc/0x10) from [<80203808>] (crypto_ahash_op+0x28/0xc)
[<80203808>] (crypto_ahash_op+0x28/0xc0) from [<80207180>] (test_hash+0x214/0x5)
[<80207180>] (test_hash+0x214/0x5b8) from [<8020758c>] (alg_test_hash+0x68/0x8c)
[<8020758c>] (alg_test_hash+0x68/0x8c) from [<80206e00>] (alg_test+0x7c/0x1b8)
[<80206e00>] (alg_test+0x7c/0x1b8) from [<80204cfc>] (cryptomgr_test+0x40/0x48)
[<80204cfc>] (cryptomgr_test+0x40/0x48) from [<80089544>] (kthread+0x80/0x88)
[<80089544>] (kthread+0x80/0x88) from [<80040a40>] (kernel_thread_exit+0x0/0x8)
Code: e59f0010 e1a01003 eb126a8d e3a03000 (e5833000)
---[ end trace d52a403a1d1eaa86 ]---

Cc: stable@vger.kernel.org
Signed-off-by: Steve Cornelius <steve.cornelius@freescale.com>
Signed-off-by: Victoria Milhoan <vicki.milhoan@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

KVM: x86: fix lapic.timer_mode on restore

lapic.timer_mode was not properly initialized after migration, which
broke few useful things, like login, by making every sleep eternal.

Fix this by calling apic_update_lvtt in kvm_apic_post_state_restore.

There are other slowpaths that update lvtt, so this patch makes sure
something similar doesn't happen again by calling apic_update_lvtt
after every modification.

Cc: stable@vger.kernel.org
Fixes: f30ebc312ca9 ("KVM: x86: optimize some accesses to LVTT and SPIV")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

drm/mgag200: Reject non-character-cell-aligned mode widths

Turns out 1366x768 does not in fact work on this hardware.

Signed-off-by: Adam Jackson <ajax@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

ALSA: hda - Fix noisy outputs on Dell XPS13 (2015 model)

The new Dell XPS13 also requires the similar quirk for fixing the
noisy outputs. (But, as the codec was changed, now the fixup for
Latitude is used instead.)

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=99851
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Revert "drm/i915: Don't skip request retirement if the active list is empty"

This reverts commit 0aedb1626566efd72b369c01992ee7413c82a0c5.

I messed things up while applying [1] to drm-intel-fixes. Rectify.

[1] http://mid.gmane.org/1432827156-9605-1-git-send-email-ville.syrjala@linux.intel.com

Fixes: 0aedb1626566 ("drm/i915: Don't skip request retirement if the active list is empty")
Cc: stable@vger.kernel.org
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

ALSA: mips: let SND_SGI_O2 select SND_PCM

Fix the missing dependency on PCM stuff.

[Add the same fix for HAL2, too -- tiwai]

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Fix audio crackles on Dell Latitude E7x40

We still got a report that the audio crackles and noises occur with
the recent 4.1 kernels on Dell machines. These machines seem to need
similar workarounds that have been applied to the recent Dell XPS 13
models. Since the codec of these machines (Dell Latitute E7240 and
E7440) is different from XPS 13's one, we need a new fixup entry.

Also, it was confirmed that the previous workaround to disable the
widget power-save (commit [219f47e4f964: ALSA: hda - Disable widget
power-saving for ALC292 & co]) is no longer needed after this fix.
So, this patch includes the partial revert of the commit, too.

Reported-and-tested-by: Mihai Donțu <mihai.dontu@gmail.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - adding a DAC/pin preference map for a HP Envy TS machine

On a HP Envy TouchSmart laptop, there are 2 speakers (main speaker
and subwoofer speaker), 1 headphone and 2 DACs, without this fixup,
the headphone will be assigned to a DAC and the 2 speakers will be
assigned to another DAC, this assignment makes the surround-2.1
channels invalid.

To fix it, here using a DAC/pin preference map to bind the main
speaker to 1 DAC and the subwoofer speaker will be assigned to another
DAC.

Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

drm/i915: Always reset vma->ggtt_view.pages cache on unbinding

With the introduction of multiple views of an obj in the same vm, each
vma was taught to cache its copy of the pages (so that different views
could have different page arrangements). However, this missed decoupling
those vma->ggtt_view.pages when the vma released its reference on the
obj->pages. As we don't always free the vma, this leads to a possible
scenario (e.g. execbuffer interrupted by the shrinker) where the vma
points to a stale obj->pages, and explodes.

Fixes regression from commit fe14d5f4e5468c5b80a24f1a64abcbe116143670
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Wed Dec 10 17:27:58 2014 +0000

drm/i915: Infrastructure for supporting different GGTT views per object

Tvrtko says, if someone else will be confused how this can happen, key
is the reservation execbuffer path. That puts the VMA on the exec_list
which prevents i915_vma_unbind and i915_gem_vma_destroy from fully
destroying the VMA. So the VMA is left existing as an empty object in
the list - unbound and disassociated with the backing store. Kind of a
cached memory object. And then re-using it needs to clear the cached
pages pointer which is fixed above.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1227892
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[Jani: Added Tvrtko's explanation to commit message.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

Linux 4.1-rc8

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
"Here are hopefully last set of fixes for 4.1. This time we have:

   - fixing pause capability reporting on both dmaengine pause & resume
     support by Krzysztof

   - locking fix fir at_xdmac by Ludovic

   - slave configuration fix for at_xdmac by Ludovic"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: Fix choppy sound because of unimplemented resume
  dmaengine: at_xdmac: rework slave configuration part
  dmaengine: at_xdmac: lock fixes

Merge tag 'ntb-4.1' of git://github.com/jonmason/ntb

Pull NTB fixes from Jon Mason:
"I apologize for the tardiness of this request.  Here are a couple of
  last minute NTB bug fixes for v4.1:

  NTB bug fixes to address issues in unmapping the MW reg base and
  vbase, and an uninitialized variable on Atom platforms"

* tag 'ntb-4.1' of git://github.com/jonmason/ntb:
  ntb: initialize max_mw for Atom before using it
  ntb: iounmap MW reg and vbase in error path

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull more MIPS fixes from Ralf Baechle:
"Another round of 4.1 MIPS fixes, one fix to a MIPS-specific #if
  condition in lib/mpi, one fix to the MIPS GIC irqchip driver and one
  SSB fix.

  Details:
   - fix handling of clock in chipco SSB driver.
   - fix two MIPS-specific #if conditions to correctly work for GCC 5.1.
   - fix damage to R6 pgtable bits done by XPA support.
   - fix possible crash due to unloading modules that contain statically
     defined platform devices.
   - fix disabling of the MSA ASE on context switch to also work
     correctly when a new thread/process has the CPU for the very first
     time.

  This is part of linux-next and has been beaten to death on
  Imagination's test farm.

  While things are not looking too grim this pull request also means the
  rate of fixes for 4.1 remains nearly constant so I'd not be unhappy if
  you'd delay the release"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MPI: MIPS: Fix compilation error with GCC 5.1
  IRQCHIP: mips-gic: Don't nest calls to do_IRQ()
  MIPS: MSA: bugfix - disable MSA correctly for new threads/processes.
  MIPS: Loongson: Do not register 8250 platform device from module.
  MIPS: Cobalt: Do not build MTD platform device registration code as module.
  SSB: Fix handling of ssb_pmu_get_alp_clock()
  MIPS: pgtable-bits: Fix XPA damage to R6 definitions.

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irqchip fix from Thomas Gleixner:
"A single fix for an off by one bug in the sunxi irqchip driver"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: sunxi-nmi: Fix off-by-one error in irq iterator

Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull lockdep fix from Ingo Molnar:
"A lockdep/modules unload race fix that can oops"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Fix a race between /proc/lock_stat and module unload

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"A regression fix for a crash, and a Intel HSW uncore PMU driver fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization"
perf/x86/intel/uncore: Fix CBOX bit wide and UBOX reg on Haswell-EP

Merge tag 'sound-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Most of commits are regression fixes for HD-audio: a few corner case
  fixes for regmap transition, and i915 binding issues.

  In addition, a quirk for another USB-audio device supporting DSD"

* tag 'sound-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Abort the probe without i915 binding for HSW/BDW
  ALSA: hda - Re-add the lost fake mute support
  ALSA: hda - Continue probing even if i915 binding fails
  ALSA: hda - Don't actually write registers for caps overwrites
  ALSA: hda - fix number of devices query on hotplug
  ALSA: usb-audio: add native DSD support for JLsounds I2SoverUSB

MPI: MIPS: Fix compilation error with GCC 5.1

This patch fixes mips compilation error:

lib/mpi/generic_mpih-mul1.c: In function 'mpihelp_mul_1':
lib/mpi/longlong.h:651:2: error: impossible constraint in 'asm'

Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
Cc: Linux-MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/10546/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

IRQCHIP: mips-gic: Don't nest calls to do_IRQ()

The GIC chained handlers use do_IRQ() to call the subhandlers. This
means that irq_enter() calls get nested, which leads to preempt count
looking like we're in nested interrupts, which in turn leads to all
system time being accounted as IRQ time in account_system_time().

Fix it by using generic_handle_irq(). Since these same functions are
used in some systems (if cpu_has_veic) from a low-level vectored
interrupt handler which does not go throught do_IRQ(), we need to do it
conditionally.

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Cc: tglx@linutronix.de
Cc: jason@lakedaemon.net
Patchwork: https://patchwork.linux-mips.org/patch/10545/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix uninitialized struct station_info in cfg80211_wireless_stats(),
    from Johannes Berg.

2) Revert commit attempt to fix ipv6 protocol resubmission, it adds
    regressions.

3) Endless loops can be created in bridge port lists, fix from Nikolay
    Aleksandrov.

4) Don't WARN_ON() if sk->sk_forward_alloc is non-zero in
    sk_clear_memalloc, it is a legal situation during swap deactivation.
    Fix from Mel Gorman.

5) Fix order of disabling interrupts and unlocking NAPI in enic driver
    to avoid a race.  From Govindarajulu Varadarajan.

6) High and low register writes are swapped when programming the start
    of periodic output in igb driver.  From Richard Cochran.

7) Fix device rename handling in mpls stack, from Robert Shearman.

8) Do not trigger compaction synchronously when optimistically trying
    to allocate an order 3 page in alloc_skb_with_frags() and
    skb_page_frag_refill().  From Shaohua Li.

9) Authentication with COOKIE_ECHO is not handled properly in SCTP, fix
    from Marcelo Ricardo Leitner.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt
  sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO
  net: don't wait for order-3 page allocation
  mpls: handle device renames for per-device sysctls
  net: igb: fix the start time for periodic output signals
  enic: fix memory leak in rq_clean
  enic: check return value for stat dump
  enic: unlock napi busy poll before unmasking intr
  net, swap: Remove a warning and clarify why sk_mem_reclaim is required when deactivating swap
  bridge: fix multicast router rlist endless loop
  tipc: disconnect socket directly after probe failure
  Revert "ipv6: Fix protocol resubmission"
  cfg80211: wext: clear sinfo struct before calling driver

Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt

This patch fix URL (http to https) for wiki.wireshark.org.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO

Currently, we can ask to authenticate DATA chunks and we can send DATA
chunks on the same packet as COOKIE_ECHO, but if you try to combine
both, the DATA chunk will be sent unauthenticated and peer won't accept
it, leading to a communication failure.

This happens because even though the data was queued after it was
requested to authenticate DATA chunks, it was also queued before we
could know that remote peer can handle authenticating, so
sctp_auth_send_cid() returns false.

The fix is whenever we set up an active key, re-check send queue for
chunks that now should be authenticated. As a result, such packet will
now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.

Reported-by: Liu Wei <weliu@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>