www.infradead.org Git - users/jedix/linux-maple.git/log

Btrfs: protect orphan block rsv with spin_lock

We've been seeing warnings coming out of the orphan commit stuff forever from
ceph.  Turns out it's because we're racing with checking if the orphan block
reserve is set, because we clear it outside of the spin_lock.  So leave the
normal fastpath checks where they are, but take the spin_lock and _recheck_ to
make sure we haven't had an orphan block rsv added in the meantime.  Then clear
the root's orphan block rsv and release the lock.  With this patch a user said
the warnings went away and they usually showed up pretty soon after he started
ceph.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
(cherry picked from commit 90290e19820e3323ce6b9c2888eeb68bf29c278b)

Btrfs: don't call btrfs_throttle in file write

Btrfs_throttle will make us wait if there is a currently committing transaction
until we can open new transactions, which is ridiculous since we don't actually
start any transactions within the file write path anyway, so all this does is
introduce big latencies if we have a sync/fsync heavy workload going on while
somebody else is trying to do work. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 45a8090e626ab470c91142954431a93846030b0d)

Btrfs: release space on error in page_mkwrite

If updating the inode gave us an ENOSPC we were just returning in page_mkwrite,
which is a problem since we make our reservation right before trying to update
the inode, so fix the out label so that we actually free our reservation.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit ec39e180fd3188c983c94603634bfcd019f42ae7)

Btrfs: fix btrfsck error 400 when truncating a compressed

Reproduce steps:
# mkfs.btrfs /dev/sdb5
# mount /dev/sdb5 -o compress=lzo /mnt
# dd if=/dev/zero of=/mnt/tmpfile bs=128K count=1
# sync
# truncate -s 64K /mnt/tmpfile
root 5 inode 257 errors 400

This is because of the wrong if condition, which is used to check if we should
subtract the bytes of the dropped range from i_blocks/i_bytes of i-node or not.
When we truncate a compressed extent, btrfs substracts the bytes of the whole
extent, it's wrong. We should substract the real size that we truncate, no
matter it is a compressed extent or not. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit f70a9a6b94af86fca069a7552ab672c31b457786)

Btrfs: do not use btrfs_end_transaction_throttle everywhere

A user reported a problem where things like open with O_CREAT would take up to
30 seconds when he had nfs activity on the same mount.  This is because all of
our quick metadata operations, like create, symlink etc all do
btrfs_end_transaction_throttle, which if the transaction is blocked will wait
for the commit to complete before it returns.  This adds a ridiculous amount of
latency and isn't really needed.  The normal btrfs_end_transaction will mark the
transaction as blocked and wake the transaction kthread up if it thinks the
transaction needs to end (this being in the running out of global reserve space
scenario), and this is all that is really needed since we've already done
everything we're going to do, we just need to return.  This should help people
with the latency they were seeing when using synchronous heavy workloads.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 7ad85bb76a61801362701b77c5cee5aa09f35369)

Btrfs: fix possible deadlock when opening a seed device

The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
but when we mount a filesystem which has backing seed devices, we have
this lock chain:

    open_ctree()
        lock(chunk_mutex);
        read_chunk_tree();
            read_one_dev();
                open_seed_devices();
                    lock(uuid_mutex);

and then we hit a lockdep splat.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit b367e47fb3a70f5d24ebd6faf7d42436d485fb2d)

Btrfs: update global block_rsv when creating a new block group

A bug was triggered while using seed device:

    # mkfs.btrfs /dev/loop1
    # btrfstune -S 1 /dev/loop1
    # mount -o /dev/loop1 /mnt
    # btrfs dev add /dev/loop2 /mnt

btrfs: block rsv returned -28
------------[ cut here ]------------
WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]()
...
Call Trace:
...
[<f7b7c31c>] btrfs_cow_block+0x101/0x147 [btrfs]
[<f7b7eaa6>] btrfs_search_slot+0x1b8/0x55f [btrfs]
[<f7b7f844>] btrfs_insert_empty_items+0x42/0x7f [btrfs]
[<f7b7f8c1>] btrfs_insert_item+0x40/0x7e [btrfs]
[<f7b8ac02>] btrfs_make_block_group+0x243/0x2aa [btrfs]
[<f7bb3f53>] __btrfs_alloc_chunk+0x672/0x70e [btrfs]
[<f7bb41ff>] init_first_rw_device+0x77/0x13c [btrfs]
[<f7bb5a62>] btrfs_init_new_device+0x664/0x9fd [btrfs]
[<f7bbb65a>] btrfs_ioctl+0x694/0xdbe [btrfs]
[<c04f55f7>] do_vfs_ioctl+0x496/0x4cc
[<c04f5660>] sys_ioctl+0x33/0x4f
[<c07b9edf>] sysenter_do_call+0x12/0x38
---[ end trace 906adac595facc7d ]---

Since seed device is readonly, there's no usable space in the filesystem.
Afterwards we add a sprout device to it, and the kernel creates a METADATA
block group and a SYSTEM block group where comes free space we can reserve,
but we still get revervation failure because the global block_rsv hasn't
been updated accordingly.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit c7c144db531fda414e532adac56e965ce332e2a5)

Btrfs: rewrite btrfs_trim_block_group()

There are various bugs in block group trimming:

- It may trim from offset smaller than user-specified offset.
- It may trim beyond user-specified range.
- It may leak free space for extents smaller than specified minlen.
- It may truncate the last trimmed extent thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (even
none will be trimmed). Even for those trimmed, not all the free space
in the bitmaps will be trimmed.

I rewrite btrfs_trim_block_group() and break it into two functions.
One is to trim extents only, and the other is to trim bitmaps only.

Before patching:

# fstrim -v /mnt/
/mnt/: 1496465408 bytes were trimmed

After patching:

# fstrim -v /mnt/
/mnt/: 2193768448 bytes were trimmed

And this matches the total free space:

# btrfs fi df /mnt
Data: total=3.58GB, used=1.79GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=205.12MB, used=97.14MB
Metadata: total=8.00MB, used=0.00

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit 7fe1e641502616220437079258506196bc4d8cbf)

Btrfs: simplfy calculation of stripe length for discard operation

For btrfs raid, while discarding a range of space, we'll need to know
the start offset and length to discard for each device, and it's done
in btrfs_map_block().

However the calculation is a bit complex for raid0 and raid10, so I
reimplement it based on a fact that:

        dev1          dev2           dev3    (raid0)
        -----------------------------------
        s0 s3 s6      s1 s4 s7       s2 s5

Each device has (total_stripes / nr_dev) stripes, or plus one.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit ec9ef7a13be4dcce964c8503e8999087945e5b9e)

Btrfs: don't pre-allocate btrfs bio

We pre-allocate a btrfs bio with fixed size, and then may re-allocate
memory if we find stripes are bigger than the fixed size. But this
pre-allocation is not necessary.

Also we don't have to calcuate the stripe number twice.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit de11cc12df17337979e0929d2831887432f236ca)

Btrfs: don't pass a trans handle unnecessarily in volumes.c

Some functions never use the transaction handle passed to them.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit 125ccb0ae6806dbec31abf4a85448971df3b4e39)

Btrfs: reserve metadata space in btrfs_ioctl_setflags()

Check and reserve space for btrfs_update_inode().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit 4da6f1a332f6c16b6594c7892f13c31459b9b1c8)

Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags()

We can recover from errors and return -errno to user space.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit f062abf089ff705e09bbaa6fa1e2fd7688a0f2ea)

Btrfs: check the return value of io_ctl_init()

It can return -ENOMEM.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit 706efc6630c2722602541a6a2fc5900a4e38456a)

Btrfs: avoid possible NULL deref in io_ctl_drop_pages()

If we run into some failure path in io_ctl_prepare_pages(),
io_ctl->pages[] array may have some NULL pointers.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit a1ee5a45818acc7f9c13e560827cf3e8735ac919)

Btrfs: add pinned extents to on-disk free space cache correctly

I got this while running xfstests:

[24256.836098] block group 317849600 has an wrong amount of free space
[24256.836100] btrfs: failed to load free space cache for block group 317849600

We should clamp the extent returned by find_first_extent_bit(),
so the start of the extent won't smaller than the start of the
block group.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
(cherry picked from commit db804f23a72bada58f083dfad6a65d019ddb3bd4)

Btrfs: revamp clustered allocation logic

Parameterize clusters on minimum total size, minimum chunk size and
minimum contiguous size for at least one chunk, without limits on
cluster, window or gap sizes. Don't tolerate any fragmentation for
SSD_SPREAD; accept it for metadata, but try to keep data dense.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 1bb91902dc90e25449893e693ad45605cb08fbe5)

Btrfs: don't set up allocation result twice

We store the allocation start and length twice in ins, once right
after the other, but with intervening calls that may prevent the
duplicate from being optimized out by the compiler. Remove one of the
assignments.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit fc7c1077ceb99c35e5f9d0ce03dc7740565bb2bf)

Btrfs: test free space only for unclustered allocation

Since the clustered allocation may be taking extents from a different
block group, there's no point in spin-locking and testing the current
block group free space before attempting to allocate space from a
cluster, even more so when we might refrain from even trying the
cluster in the current block group because, after the cluster was set
up, not enough free space remained. Furthermore, cluster creation
attempts fail fast when the block group doesn't have enough free
space, so the test was completely superfluous.

I've move the free space test past the cluster allocation attempt,
where it is more useful, and arranged for a cluster in the current
block group to be released before trying an unclustered allocation,
when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in
the cluster stands a chance of being combined with additional free
space in the block group so as to succeed in the allocation attempt.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit a5f6f719a5cd7caeee8ed8137cf3f94c3bbebc65)

Btrfs: use bigger metadata chunks on bigger filesystems

The 256MB chunk is a little small on a huge FS. This scales up the
chunk size.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 1100373f8aa69e377386499350496e3d8565605f)

Btrfs: lower the bar for chunk allocation

The chunk allocation code has tried to keep a pretty tight lid on creating new
metadata chunks. This is partially because in the past the reservation
code didn't give us an accurate idea of how much space was being used.

The new code is much more accurate, so we're able to get rid of some of these
checks.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit cf1d72c9ceec391d34c48724da57282e97f01122)

Btrfs: run chunk allocations while we do delayed refs

Btrfs tries to batch extent allocation tree changes to improve performance
and reduce metadata trashing. But it doesn't allocate new metadata chunks
while it is doing allocations for the extent allocation tree.

This commit changes the delayed refence code to do chunk allocations if we're
getting low on room. It prevents crashes and improves performance.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 203bf287cb01a5dc26c20bd3737cecf3aeba1d48)

Btrfs: call d_instantiate after all ops are setup

This closes races where btrfs is calling d_instantiate too soon during
inode creation. All of the callers of btrfs_add_nondir are updated to
instantiate after the inode is fully setup in memory.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
(cherry picked from commit 08c422c27f855d27b0b3d9fa30ebd938d4ae6f1f)

Btrfs: fix worker lock misuse in find_worker

Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.

This fixes both errors.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
(cherry picked from commit 8d532b2afb2eacc84588db709ec280a3d1219be3)

xen/config: turn CONFIG_XEN_DEBUG_FS off.

That option makes the Xen spinlock code (xen/spinlock.c) accumulate
statistics about how many locks taken, time in slowpath, etc.
Good information during debugging but not in production.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

proc: clean up and fix /proc/<pid>/mem handling

Orabug: 13618927
CVE-2012-0056
Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.

This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open.  That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.

That is different from our previous behavior, but much simpler.  If
somebody actually finds a load where this matters, we'll need to revert
this commit.

I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve.  So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.

Reported-by: Jüri Aedla <asd@ut.ee>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:

fs/proc/base.c

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

set XEN_MAX_DOMAIN_MEMORY for 512

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

add __init arguments to init functions

Fix following issues:
WARNING: vmlinux.o(.text+0x3aba): Section mismatch in reference from the function xen_align_and_add_e820_region() to the function .init.text:e820_add_region()
The function xen_align_and_add_e820_region() references
the function __init e820_add_region().
This is often because xen_align_and_add_e820_region lacks a __init
annotation or the annotation of e820_add_region is wrong.

WARNING: vmlinux.o(.text+0x2e9ec): Section mismatch in reference from the function acpi_map_cpu2node() to the variable .cpuinit.data:__apicid_to_node
The function acpi_map_cpu2node() references
the variable __cpuinitdata __apicid_to_node.
This is often because acpi_map_cpu2node lacks a __cpuinitdata
annotation or the annotation of __apicid_to_node is wrong.

WARNING: vmlinux.o(.text+0x2e9f1): Section mismatch in reference from the function acpi_map_cpu2node() to the function .cpuinit.text:numa_set_node()
The function acpi_map_cpu2node() references
the function __cpuinit numa_set_node().
This is often because acpi_map_cpu2node lacks a __cpuinit
annotation or the annotation of numa_set_node is wrong.

WARNING: vmlinux.o(.text+0x3f9b4): Section mismatch in reference from the function enable_iommus() to the function .init.text:iommu_set_device_table()
The function enable_iommus() references
the function __init iommu_set_device_table().
This is often because enable_iommus lacks a __init
annotation or the annotation of iommu_set_device_table is wrong.

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

hpwdt: clean up set_memory_x call for 32 bit

1. addess has to be page aligned.
2. set_memory_x uses page size argument, not size.
Bug causes with following commit:
commit da28179b4e90dda56912ee825c7eaa62fc103797
Author: Mingarelli, Thomas <Thomas.Mingarelli@hp.com>
Date:   Mon Nov 7 10:59:00 2011 +0100

     watchdog: hpwdt: Changes to handle NX secure bit in 32bit path

    commit e67d668e147c3b4fec638c9e0ace04319f5ceccd upstream.

    This patch makes use of the set_memory_x() kernel API in order
    to make necessary BIOS calls to source NMIs.

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

SPEC: v2.6.39-100.0.20

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Enable Kabi Check
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

net/bna driver update from 2.3.2.3 to 3.0.2.2

Orabug: 13255
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

scsi/bfa driver update from 2.3.2.3 to 3.0.2.2

Orabug: 13254
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Updated version to 5.02.00.00.06.02-uek1

Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: Added error logging for firmware abort

JIRA Key: IUEKR2ISCSI-15

Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: Disable generating pause frames in case of FW hung

JIRA Key: IUEKR2ISCSI-14

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: Temperature monitoring for ISP82XX core.

During watchdog, need to monitor temperature of ISP82XX core
and set device state to FAILED when temperature reaches
"Panic" level.

JIRA Key: IUEKR2ISCSI-13

Signed-off-by: Mike Hernandez <michael.hernandez@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: check for FW alive before calling chip_reset

Check for firmware alive and do premature completion of
mbox commands in case of FW hung before doing chip_reset

Jira Key: IUEKR2ISCSI-16

Signed-off-by: Shyam Sunder <shyam.sunder@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: Remove the unused macros

Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Lalit Chandivade <lalit.chadnivade@qlogic.com>

qla4xxx: cleanup, make qla4xxx_build_ddb_list short

Make qla4xxx_build_ddb_list shorter by adding more helper functions.

JIRA Key: IUEKR2ISCSI-12

Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: clear the RISC interrupt bit during firmware init

Fix for the kdump kernel panic issue with 80xx adapters.

JIRA Key: IUEKR2ISCSI-11

Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Sarang Radke <sarang.radke@qlogic.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: clear the SCSI COMPLETION INTERRUPT bit during firmware init

Fix for the kdump kernel panic issue on 40xx adapters.

JIRA Key: IUEKR2ISCSI-10

Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Prasanna Mumbai <prasanna.mumbai@qlogic.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: Fixed BFS with sendtargets as boot index.

If ql4xdisablesysfsboot = 0 and sendtargets entry as boot index then
driver does export sendtarget entries in sysfs but iscsistart does not
do discovery. So in this case let driver do the discovery and
login to the targets.

JIRA Key: IUEKR2ISCSI-9

Signed-off-by: Manish Rangankar <manish.rangankar@qlogic.com>
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: Correct the default relogin timeout value

The ACB default timeout value is used to set the default
relogin timeout value. For ISP4022 adapters where
the ACB default value is set to 2560s, limit the relogin
timeout to 12s.

JIRA Key: IUEKR2ISCSI-8

Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

qla4xxx: Limit the ACB Default Timeout value to 12s

The ACB default timeout value is set to 2560s in the
ISP4022 firmware. This caused the driver to loop
for 2560s. Hence limit the default timeout at the driver
level to min 12s.

Also break out from the loop if the sendtargets list was empty.

JIRA Key: IUEKR2ISCSI-7

Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

bond_alb: don't disable softirq under bond_alb_xmit

Orabug: 13323879
No need to lock soft irqs under bond_alb_xmit()
which already has softirq disabled.

Changes:
1. add non-bh/bh version to tlb_clear_slave()

2. represent BH and non BH hash table locks
_lock_rx_hashtbl_bh/_unlock_rx_hashtbl_bh
_lock_rx_hashtbl/_unlock_rx_hashtbl
_lock_tx_hashtbl_bh/_unlock_tx_hashtbl_bh
_lock_tx_hashtbl/_unlock_tx_hashtbl

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>
Signed-off-by: Cong Wang <amwang@redhat.com>

fix kernel version

SPEC: v2.6.39-100.0.19

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

scripts/git-changelog: generate rpm changelog script

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Revert "hpwd watchdog mark page executable"

Orabug: 13115973
This reverts commit e7494242c42201128204e50b2a707de335dd6334.
Bug fix is better covered with following upstream commit:

commit da28179b4e90dda56912ee825c7eaa62fc103797
Author: Mingarelli, Thomas <Thomas.Mingarelli@hp.com>
Date: Mon Nov 7 10:59:00 2011 +0100

watchdog: hpwdt: Changes to handle NX secure bit in 32bit path

commit e67d668e147c3b4fec638c9e0ace04319f5ceccd upstream.
This patch makes use of the set_memory_x() kernel API in order
to make necessary BIOS calls to source NMIs.

This is needed for SLES11 SP2 and the latest upstream kernel as it appears
the NX Execute Disable has grown in its control.

Signed-off by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Partial revert of mainline removal of deprecated sysfs interface for 13568528

Jan. 06, 2012
Oracle bug 13568528
Patch written by Andrew Thomas
Ported by Chuck Anderson

This patch partialy reverts the removal in mainline of a deprecated sysfs
interface needed by the OVM3.0.4 UEK2 based dom0 kernel when it is used
to install OVM3.0.4

Comments from Andrew:

The problem is that in newer kernels, even with the
CONFIG_SYSFS_DEPRECATED[_V2] flags set, some nodes have been removed so to
tools looking in sysfs, pieces are missing.  This breaks anaconda (actually
kudzu) for us.  For OVM3 we use the dom0 kernel as the install kernel, so we
need UEK2 to provide the right "shape" sysfs.  This isn't an issue for OL
because you use the old RHEL kernel to install UEK1/2].  That said, this
issue affects more than us.  As Joe Jin points out, bug 13100678, required
kudzu fixes for eth devices.  Arguably the OVM3 anaconda issue can also be
fixed in kudzu, but what no one knows is if the missing sysfs nodes are
symptoms of a wider set of tools related problems and therefore whether the
correct fix is to revert sysfs changes in UEK2 so that the sysfs it presents
is isomorphic to what 2.6.18 based kernels provide.

A "better" set of tools would be from 6uX, but in order to get those
installed/upgraded on OVM3 is not a trivial task because the system customers
have already installed is 5u5 based.  We have other tools in dom0 [eg our
agent] which "know" about the old flavour of sysfs and these would need
porting.  You either change the kernel OR you change all the tools that rely
on sysfs... the problem is that customers can install there own tools on OL5.

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

scsi:lpfc update to 8.3.5.58

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Let KERNEL_VERSION be 3.0.x, and override UTSNAME

This will let out-of-tree modules correctly detect the kernel version
when building against it, but it will still identify as 2.6.39
everywhere in userspace.

Signed-off-by: Nelson Elhage <nelson.elhage@oracle.com>
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

qla4xxx: Fix qla4xxx_dump_buffer to dump buffer correctly

KERN_INFO in printk adding new line character that mess-up
dump print format. Remove KERN_INFO to fix dump format.

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

qla4xxx: Fix the IDC locking mechanism

This ensures the transition of dev_state from COLD to
INITIALIZING is within lock and atomic.

Jira Key: IUEKR2ISCSI-6

Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

qla4xxx: Wait for disable_acb before doing set_acb

In function qla4xxx_iface_set_param wait for disable_acb to
complete so that set_acb will not fail.

Jira Key: IUEKR2ISCSI-5

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

qla4xxx: Don't recover adapter if device state is FAILED

Multiple reset request don't get handled correctly as
the driver tries to recover adapter which is in FAILED state.

Jira Key: IUEKR2ISCSI-4

Signed-off-by: Sarang Radke <sarang.radke@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

qla4xxx: fix call trace on rmmod with ql4xdontresethba=1

abort all active commands from eh_host_reset in-case
of ql4xdontresethba=1

Fix following call trace:-
Nov 21 14:50:47 172.17.140.111 qla4xxx 0000:13:00.4: qla4_8xxx_disable_msix: qla4xxx (rsp_q)
Nov 21 14:50:47 172.17.140.111 qla4xxx 0000:13:00.4: PCI INT A disabled
Nov 21 14:50:47 172.17.140.111 slab error in kmem_cache_destroy(): cache `qla4xxx_srbs': Can't free all objects
Nov 21 14:50:47 172.17.140.111 Pid: 9154, comm: rmmod Tainted: G           O 3.2.0-rc2+ #2
Nov 21 14:50:47 172.17.140.111 Call Trace:
Nov 21 14:50:47 172.17.140.111  [<c051231a>] ? kmem_cache_destroy+0x9a/0xb0
Nov 21 14:50:47 172.17.140.111  [<c0489c4a>] ? sys_delete_module+0x14a/0x210
Nov 21 14:50:47 172.17.140.111  [<c04fd552>] ? do_munmap+0x202/0x280
Nov 21 14:50:47 172.17.140.111  [<c04a6d4e>] ? audit_syscall_entry+0x1ae/0x1d0
Nov 21 14:50:47 172.17.140.111  [<c083019f>] ? sysenter_do_call+0x12/0x28
Nov 21 14:51:50 172.17.140.111 SLAB: cache with size 64 has lost its name
Nov 21 14:51:50 172.17.140.111 iscsi: registered transport (qla4xxx)
Nov 21 14:51:50 172.17.140.111 qla4xxx 0000:13:00.4: PCI INT A -> GSI 28 (level, low) -> IRQ 28

Jira Key: IUEKR2ISCSI-3

Signed-off-by: Sarang Radke <sarang.radke@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

qla4xxx: Fix CPU lockups when ql4xdontresethba set

Fix issue where CPU lockup is seen when ql4xdontresethba is set and
driver is "stuck" in NEED_RESET state handler.

Jira Key: IUEKR2ISCSI-2

Signed-off-by: Mike Hernandez <michael.hernandez@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

qla4xxx: Perform context resets in case of context failures.

For 4032, context reset was the same as chip reset, and any firmware
issue was recovered by performing a chip reset.
For 82xx, the iSCSI firmware runs along with FCoE and the NIC
firmware contexts, and an error encountered doesnot essentially mean
that a chip reset is necessary.

Perform Chip resets only in the following cases:
1. Mailbox system error.
2. Mailbox command timeout.
3. fw_heartbeat_counter counter stops incrementing.

For all other cases, only perform a context reset.
1. Command Completion with an invalid srb.
2. Other mailbox failures.

Jira Key: IUEKR2ISCSI-1

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Shyam Sunder <shyam.sunder@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>

do not obsolete firmware

Orabug: 13535055
Do not remove firmware for other kernels.
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"

This reverts commit ddacf5ef684a655abe2bb50c4b2a5b72ae0d5e05.

We piggyback on upstream git commit 12275dd4b747f5d87fa36229774d76bca8e63068
which says:

commit 12275dd4b747f5d87fa36229774d76bca8e63068
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Mon Dec 19 09:30:35 2011 -0500

    Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"

    This reverts commit ddacf5ef684a655abe2bb50c4b2a5b72ae0d5e05.
    As when booting the kernel under Amazon EC2 as an HVM guest it ends up
    hanging during startup. Reverting this we loose the fix for kexec
    booting to the crash kernels.

    Fixes Canonical BZ #901305 (http://bugs.launchpad.net/bugs/901305)

Tested-by: Alessandro Salvatori <sandr8@gmail.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Revert "xen-blkback: convert hole punching to discard request on loop devices"

This reverts commit 7139323c5ef7b342d770efef6ae463ff201e3d83.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

ath9k: Fix kernel panic in AR2427 in AP mode

commit b25bfda38236f349cde0d1b28952f4eea2148d3f upstream.

don't do aggregation related stuff for 'AP mode client power save
handling' if aggregation is not enabled in the driver, otherwise it
will lead to panic because those data structures won't be never
intialized in 'ath_tx_node_init' if aggregation is disabled

EIP is at ath_tx_aggr_wakeup+0x37/0x80 [ath9k]
EAX: e8c09a20 EBX: f2a304e8 ECX: 00000001 EDX: 00000000
ESI: e8c085e0 EDI: f2a304ac EBP: f40e1ca4 ESP: f40e1c8c
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper/1 (pid: 0, ti=f40e0000 task=f408e860
task.ti=f40dc000)
Stack:
0001e966 e8c09a20 00000000 f2a304ac e8c085e0 f2a304ac
f40e1cb0 f8186741
f8186700 f40e1d2c f922988d f2a304ac 00000202 00000001
c0b4ba43 00000000
0000000f e8eb75c0 e8c085e0 205b0001 34383220 f2a304ac
f2a30000 00010020
Call Trace:
[<f8186741>] ath9k_sta_notify+0x41/0x50 [ath9k]
[<f8186700>] ? ath9k_get_survey+0x110/0x110 [ath9k]
[<f922988d>] ieee80211_sta_ps_deliver_wakeup+0x9d/0x350
[mac80211]
[<c018dc75>] ? __module_address+0x95/0xb0
[<f92465b3>] ap_sta_ps_end+0x63/0xa0 [mac80211]
[<f9246746>] ieee80211_rx_h_sta_process+0x156/0x2b0
[mac80211]
[<f9247d1e>] ieee80211_rx_handlers+0xce/0x510 [mac80211]
[<c018440b>] ? trace_hardirqs_on+0xb/0x10
[<c056936e>] ? skb_queue_tail+0x3e/0x50
[<f9248271>] ieee80211_prepare_and_rx_handle+0x111/0x750
[mac80211]
[<f9248bf9>] ieee80211_rx+0x349/0xb20 [mac80211]
[<f9248949>] ? ieee80211_rx+0x99/0xb20 [mac80211]
[<f818b0b8>] ath_rx_tasklet+0x818/0x1d00 [ath9k]
[<f8187a75>] ? ath9k_tasklet+0x35/0x1c0 [ath9k]
[<f8187a75>] ? ath9k_tasklet+0x35/0x1c0 [ath9k]
[<f8187b33>] ath9k_tasklet+0xf3/0x1c0 [ath9k]
[<c0151b7e>] tasklet_action+0xbe/0x180

Cc: Senthil Balasubramanian <senthilb@qca.qualcomm.com>
Cc: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Reported-by: Ashwin Mendonca <ashwinloyal@gmail.com>
Tested-by: Ashwin Mendonca <ashwinloyal@gmail.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race

commit 50b8d257486a45cba7b65ca978986ed216bbcc10 upstream.

Test-case:

int main(void)
{
int pid, status;

pid = fork();
if (!pid) {
for (;;) {
if (!fork())
return 0;
if (waitpid(-1, &status, 0) < 0) {
printf("ERR!! wait: %m\n");
return 0;
}
}
}

assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
assert(waitpid(-1, NULL, 0) == pid);

assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
PTRACE_O_TRACEFORK) == 0);

do {
ptrace(PTRACE_CONT, pid, 0, 0);
pid = waitpid(-1, NULL, 0);
} while (pid > 0);

return 1;
}

It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().

The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.

This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.

I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.

Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com>
Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Revert "rtc: Disable the alarm in the hardware"

commit 157e8bf8b4823bfcdefa6c1548002374b61f61df upstream.

This reverts commit c0afabd3d553c521e003779c127143ffde55a16f.

It causes failures on Toshiba laptops - instead of disabling the alarm,
it actually seems to enable it on the affected laptops, resulting in
(for example) the laptop powering on automatically five minutes after
shutdown.

There's a patch for it that appears to work for at least some people,
but it's too late to play around with this, so revert for now and try
again in the next merge window.

See for example

http://bugs.debian.org/652869

Reported-and-bisected-by: Andreas Friedrich <afrie@gmx.net> (Toshiba Tecra)
Reported-by: Antonio-M. Corbi Bellot <antonio.corbi@ua.es> (Toshiba Portege R500)
Reported-by: Marco Santos <marco.santos@waynext.com> (Toshiba Portege Z830)
Reported-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr> (Toshiba Portege R830)
Cc: Jonathan Nieder <jrnieder@gmail.com>
Requested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

hung_task: fix false positive during vfork

commit f9fab10bbd768b0e5254e53a4a8477a94bfc4b96 upstream.

vfork parent uninterruptibly and unkillably waits for its child to
exec/exit. This wait is of unbounded length. Ignore such waits
in the hung_task detector.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
LKML-Reference: <1325344394.28904.43.camel@lappy>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

drm/radeon/kms/atom: fix possible segfault in pm setup

commit 4376eee92e5a8332b470040e672ea99cd44c826a upstream.

If we end up with no power states, don't look up
current vddc.

fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=44130

agd5f: fix patch formatting

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

xfs: log all dirty inodes in xfs_fs_sync_fs

Commit be4f1ac828776bbc7868a68b465cd8eedb733cfd upstream.

Since Linux 2.6.36 the writeback code has introduces various measures for
live lock prevention during sync().  Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.

The older_than_this checks that are now more strictly enforced since

    writeback: avoid livelocking WB_SYNC_ALL writeback

by only calling into __writeback_inodes_sb and thus only sampling the
current cut off time once.  But on a slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut off time for
the blocking pass already happened.  I have not myself reproduced this
myself on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.

Fix this by iterating over all XFS inodes in ->sync_fs and log all that
are dirty.  This might log inode that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is it
isn't a major concern for performance.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

xfs: log the inode in ->write_inode calls for kupdate

Commit 0b8fd3033c308e4088760aa1d38ce77197b4e074 upstream.

If the writeback code writes back an inode because it has expired we currently
use the non-blockin ->write_inode path.  This means any inode that is pinned
is skipped.  With delayed logging and a workload that has very little log
traffic otherwise it is very likely that an inode that gets constantly
written to is always pinned, and thus we keep refusing to write it.  The VM
writeback code at that point redirties it and doesn't try to write it again
for another 30 seconds.  This means under certain scenarious time based
metadata writeback never happens.

Fix this by calling into xfs_log_inode for kupdate in addition to data
integrity syncs, and thus transfer the inode to the log ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mfd: Turn on the twl4030-madc MADC clock

commit 3d6271f92e98094584fd1e609a9969cd33e61122 upstream.

Without turning the MADC clock on, no MADC conversions occur.

$ cat /sys/class/hwmon/hwmon0/device/in8_input
[ 53.428436] twl4030_madc twl4030_madc: conversion timeout!
cat: read error: Resource temporarily unavailable

Signed-off-by: Kyle Manna <kyle@kylemanna.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mfd: Check for twl4030-madc NULL pointer

commit d0e84caeb4cd535923884735906e5730329505b4 upstream.

If the twl4030-madc device wasn't registered, and another device, such
as twl4030-madc-hwmon, calls twl4030_madc_conversion() a NULL pointer is
dereferenced.

Signed-off-by: Kyle Manna <kyle@kylemanna.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mfd: Copy the device pointer to the twl4030-madc structure

commit 66cc5b8e50af87b0bbd0f179d76d2826f4549c13 upstream.

Worst case this fixes the following error:
[ 72.086212] (NULL device *): conversion timeout!

Best case it prevents a crash

Signed-off-by: Kyle Manna <kyle@kylemanna.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

mfd: Fix mismatch in twl4030 mutex lock-unlock

commit e178ccb33569da17dc897a08a3865441b813bdfb upstream.

A mutex is locked on entry into twl4030_madc_conversion().
Immediate return on some error conditions leaves the
mutex locked.

This patch ensures that mutex is always unlocked before
leaving the function.

Signed-off-by: Sanjeev Premi <premi@ti.com>
Cc: Keerthy <j-keerthy@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

iwlwifi: update SCD BC table for all SCD queues

commit 96f1f05af76b601ab21a7dc603ae0a1cea4efc3d upstream.

Since we configure all the queues as CHAINABLE, we need to update the
byte count for all the queues, not only the AGGREGATABLE ones.

Not doing so can confuse the SCD and make the fw assert.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ipv4: using prefetch requires including prefetch.h

[ Upstream commit b9eda06f80b0db61a73bd87c6b0eb67d8aca55ad ]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ipv4: reintroduce route cache garbage collector

[ Upstream commit 9f28a2fc0bd77511f649c0a788c7bf9a5fd04edb ]

Commit 2c8cec5c10b (ipv4: Cache learned PMTU information in inetpeer)
removed IP route cache garbage collector a bit too soon, as this gc was
responsible for expired routes cleanup, releasing their neighbour
reference.

As pointed out by Robert Gladewitz, recent kernels can fill and exhaust
their neighbour cache.

Reintroduce the garbage collection, since we'll have to wait our
neighbour lookups become refcount-less to not depend on this stuff.

Reported-by: Robert Gladewitz <gladewitz@gmx.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ipv4: flush route cache after change accept_local

[ Upstream commit d01ff0a049f749e0bf10a35bb23edd012718c8c2 ]

After reset ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0,
we should flush route cache, or it will continue receive packets with local
source address, which should be dropped.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd

[ Upstream commit a76c0adf60f6ca5ff3481992e4ea0383776b24d2 ]

When checking whether a DATA chunk fits into the estimated rwnd a
full sizeof(struct sk_buff) is added to the needed chunk size. This
quickly exhausts the available rwnd space and leads to packets being
sent which are much below the PMTU limit. This can lead to much worse
performance.

The reason for this behaviour was to avoid putting too much memory
pressure on the receiver. The concept is not completely irational
because a Linux receiver does in fact clone an skb for each DATA chunk
delivered. However, Linux also reserves half the available socket
buffer space for data structures therefore usage of it is already
accounted for.

When proposing to change this the last time it was noted that this
behaviour was introduced to solve a performance issue caused by rwnd
overusage in combination with small DATA chunks.

Trying to reproduce this I found that with the sk_buff overhead removed,
the performance would improve significantly unless socket buffer limits
are increased.

The following numbers have been gathered using a patched iperf
supporting SCTP over a live 1 Gbit ethernet network. The -l option
was used to limit DATA chunk sizes. The numbers listed are based on
the average of 3 test runs each. Default values have been used for
sk_(r|w)mem.

Chunk
Size    Unpatched     No Overhead
-------------------------------------
   4    15.2 Kbit [!]   12.2 Mbit [!]
   8    35.8 Kbit [!]   26.0 Mbit [!]
  16    95.5 Kbit [!]   54.4 Mbit [!]
  32   106.7 Mbit      102.3 Mbit
  64   189.2 Mbit      188.3 Mbit
128   331.2 Mbit      334.8 Mbit
256   537.7 Mbit      536.0 Mbit
512   766.9 Mbit      766.6 Mbit
1024   810.1 Mbit      808.6 Mbit

Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sctp: fix incorrect overflow check on autoclose

[ Upstream commit 2692ba61a82203404abd7dd2a027bda962861f74 ]

Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for
limiting the autoclose value.  If userspace passes in -1 on 32-bit
platform, the overflow check didn't work and autoclose would be set
to 0xffffffff.

This patch defines a max_autoclose (in seconds) for limiting the value
and exposes it through sysctl, with the following intentions.

1) Avoid overflowing autoclose * HZ.

2) Keep the default autoclose bound consistent across 32- and 64-bit
   platforms (INT_MAX / HZ in this patch).

3) Keep the autoclose value consistent between setsockopt() and
   getsockopt() calls.

Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sch_gred: should not use GFP_KERNEL while holding a spinlock

[ Upstream commit 3f1e6d3fd37bd4f25e5b19f1c7ca21850426c33f ]

gred_change_vq() is called under sch_tree_lock(sch).

This means a spinlock is held, and we are not allowed to sleep in this
context.

We might pre-allocate memory using GFP_KERNEL before taking spinlock,
but this is not suitable for stable material.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

net: have ipconfig not wait if no dev is available

[ Upstream commit cd7816d14953c8af910af5bb92f488b0b277e29d ]

previous commit 3fb72f1e6e6165c5f495e8dc11c5bbd14c73385c
makes IP-Config wait for carrier on at least one network device.

Before waiting (predefined value 120s), check that at least one device
was successfully brought up. Otherwise (e.g. buggy bootloader
which does not set the MAC address) there is no point in waiting
for carrier.

Cc: Micha Nelissen <micha@neli.hopto.org>
Cc: Holger Brunck <holger.brunck@keymile.com>
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mqprio: Avoid panic if no options are provided

[ Upstream commit 7838f2ce36b6ab5c13ef20b1857e3bbd567f1759 ]

Userspace may not provide TCA_OPTIONS, in fact tc currently does
so not do so if no arguments are specified on the command line.
Return EINVAL instead of panicing.

Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

llc: llc_cmsg_rcv was getting called after sk_eat_skb.

[ Upstream commit 9cef310fcdee12b49b8b4c96fd8f611c8873d284 ]

Received non stream protocol packets were calling llc_cmsg_rcv that used a
skb after that skb was released by sk_eat_skb. This caused received STP
packets to generate kernel panics.

Signed-off-by: Alexandru Juncu <ajuncu@ixiacom.com>
Signed-off-by: Kunjan Naik <knaik@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ppp: fix pptp double release_sock in pptp_bind()

[ Upstream commit a454daceb78844a09c08b6e2d8badcb76a5d73b9 ]

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

net: bpf_jit: fix an off-one bug in x86_64 cond jump target

[ Upstream commit a03ffcf873fe0f2565386ca8ef832144c42e67fa ]

x86 jump instruction size is 2 or 5 bytes (near/long jump), not 2 or 6
bytes.

In case a conditional jump is followed by a long jump, conditional jump
target is one byte past the start of target instruction.

Signed-off-by: Markus Kötter <nepenthesdev@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sparc: Fix handling of orig_i0 wrt. debugging when restarting syscalls.

[ A combination of upstream commits 1d299bc7732c34d85bd43ac1a8745f5a2fed2078 and
e88d2468718b0789b4c33da2f7e1cef2a1eee279 ]

Although we provide a proper way for a debugger to control whether
syscall restart occurs, we run into problems because orig_i0 is not
saved and restored properly.

Luckily we can solve this problem without having to make debuggers
aware of the issue. Across system calls, several registers are
considered volatile and can be safely clobbered.

Therefore we use the pt_regs save area of one of those registers, %g6,
as a place to save and restore orig_i0.

Debuggers transparently will do the right thing because they save and
restore this register already.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sparc64: Fix masking and shifting in VIS fpcmp emulation.

[ Upstream commit 2e8ecdc008a16b9a6c4b9628bb64d0d1c05f9f92 ]

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sparc32: Correct the return value of memcpy.

[ Upstream commit a52312b88c8103e965979a79a07f6b34af82ca4b ]

Properly return the original destination buffer pointer.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Kjetil Oftedal <oftedal@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sparc32: Remove uses of %g7 in memcpy implementation.

[ Upstream commit 21f74d361dfd6a7d0e47574e315f780d8172084a ]

This is setting things up so that we can correct the return
value, so that it properly returns the original destination
buffer pointer.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Kjetil Oftedal <oftedal@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sparc32: Remove non-kernel code from memcpy implementation.

[ Upstream commit 045b7de9ca0cf09f1adc3efa467f668b89238390 ]

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Kjetil Oftedal <oftedal@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sparc: Kill custom io_remap_pfn_range().

[ Upstream commit 3e37fd3153ac95088a74f5e7c569f7567e9f993a ]

To handle the large physical addresses, just make a simple wrapper
around remap_pfn_range() like MIPS does.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sparc64: Patch sun4v code sequences properly on module load.

[ Upstream commit 0b64120cceb86e93cb1bda0dc055f13016646907 ]

Some of the sun4v code patching occurs in inline functions visible
to, and usable by, modules.

Therefore we have to patch them up during module load.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sparc32: Be less strict in matching %lo part of relocation.

[ Upstream commit b1f44e13a525d2ffb7d5afe2273b7169d6f2222e ]

The "(insn & 0x01800000) != 0x01800000" test matches 'restore'
but that is a legitimate place to see the %lo() part of a 32-bit
symbol relocation, particularly in tail calls.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sparc64: Fix MSIQ HV call ordering in pci_sun4v_msiq_build_irq().

[ Upstream commit 7cc8583372a21d98a23b703ad96cab03180b5030 ]

This silently was working for many years and stopped working on
Niagara-T3 machines.

We need to set the MSIQ to VALID before we can set it's state to IDLE.

On Niagara-T3, setting the state to IDLE first was causing HV_EINVAL
errors. The hypervisor documentation says, rather ambiguously, that
the MSIQ must be "initialized" before one can set the state.

I previously understood this to mean merely that a successful setconf()
operation has been performed on the MSIQ, which we have done at this
point. But it seems to also mean that it has been set VALID too.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mm: hugetlb: fix non-atomic enqueue of huge page

commit b0365c8d0cb6e79eb5f21418ae61ab511f31b575 upstream.

If a huge page is enqueued under the protection of hugetlb_lock, then the
operation is atomic and safe.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

drm/radeon/kms: bail on BTC parts if MC ucode is missing

commit 77e00f2ea94abee1ad13bdfde19cf7aa25992b0e upstream.

We already do this for cayman, need to also do it for
BTC parts. The default memory and voltage setup is not
adequate for advanced operation. Continuing will
result in an unusable display.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

watchdog: hpwdt: Changes to handle NX secure bit in 32bit path

commit e67d668e147c3b4fec638c9e0ace04319f5ceccd upstream.

This patch makes use of the set_memory_x() kernel API in order
to make necessary BIOS calls to source NMIs.

This is needed for SLES11 SP2 and the latest upstream kernel as it appears
the NX Execute Disable has grown in its control.

Signed-off by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

futex: Fix uninterruptible loop due to gate_area

commit e6780f7243eddb133cc20ec37fa69317c218b709 upstream.

It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.

While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?

In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.

But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

oprofile, arm/sh: Fix oprofile_arch_exit() linkage issue

commit 55205c916e179e09773d98d290334d319f45ac6b upstream.

This change fixes a linking problem, which happens if oprofile
is selected to be compiled as built-in:

  `oprofile_arch_exit' referenced in section `.init.text' of
  arch/arm/oprofile/built-in.o: defined in discarded section
  `.exit.text' of arch/arm/oprofile/built-in.o

The problem is appeared after commit 87121ca504, which
introduced oprofile_arch_exit() calls from __init function. Note
that the aforementioned commit has been backported to stable
branches, and the problem is known to be reproduced at least
with 3.0.13 and 3.1.5 kernels.

Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@nokia.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: oprofile-list <oprofile-list@lists.sourceforge.net>
Link: http://lkml.kernel.org/r/20111222151540.GB16765@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>